Glen
Glen is an IT Person who also plays games and occasionally writes things. He has a website at glenscott.net and tweets @memoryresident.
Some time ago I wrote a small Python script to search and parse out results from an IP: search on Bing. You can get it from the GitHub repo here. I had to work around some odd/broken behaviour from Bing along the way.
The Bing IP: operator allows you to search for an IP address and return results from any sites sharing that IP. Can be useful, but it was clunky to do manually so I wrote a script to do it all in the console.
I haven’t used it in years but recently thought to dust it off to publish on GitHub, and after searching through some old folders I dug out the most recent version I could find. It needed updating to Python 3, which was a simple matter of updating a few lines to print(), but I noticed some other problems which took longer to debug.
Firstly, the script worked but the result set was really short. After some manual testing I discovered this was due to some apparent bugs/faults with Bing itself which have developed since I first wrote the script.
In general, the Bing IP search seems to be quite neglected: if you visit the default search form at bing.com and enter ip:204.79.197.200 – (bing.com) – for example, it returns an empty page (really empty – a blank, 0kb http response body). Removing all but the actual query parameter from the url string causes the page to render, so you end up with a url like this which works:
https://www.bing.com/search?q=ip%3A204.79.197.200
Which is what the script uses.
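The parameter stripping described above could be sketched in Python roughly as follows. This is an illustration rather than the script's actual code: the function name strip_to_query is made up here, and it assumes the search term always lives in the q parameter, as it does in the URLs shown in this post.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def strip_to_query(url):
    """Return the same search URL with every parameter except 'q' removed."""
    parts = urlsplit(url)
    # parse_qs decodes the query string into {name: [values]}
    query_term = parse_qs(parts.query).get("q", [""])[0]
    # Re-encode only the search term; drop everything else and any fragment
    clean_query = urlencode({"q": query_term})
    return urlunsplit((parts.scheme, parts.netloc, parts.path, clean_query, ""))


if __name__ == "__main__":
    print(strip_to_query("https://www.bing.com/search?q=ip%3A204.79.197.200&foo=bar"))
```

Round-tripping through parse_qs and urlencode keeps the ip: term encoded the same way as the working URL above, so the result can be passed straight to whatever HTTP client is in use.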
The main problem however was the truncated results set. The script tries to load more pages but beyond the first page they are empty – even with the parameter stripping hack which works on the first page, the rest are back to an empty response. It turns out that to load additional results pages, the additional URL parameter ‘first’ is required (eg first=11 – start from result 11) and it appears that more than one parameter used with ‘ip:’ alone in the query string breaks the site.
I confirmed this behaviour with a simple test which resulted in an empty response:
https://www.bing.com/search?q=ip%3A204.79.197.200&foo=bar
So it seems something is definitely breaking on the Bing backend, and it happens when IP: is used as the query plus additional GET params; however I found that everything worked as expected as long as IP: was not the only search term in the query parameter itself. I still wanted just the results from the ip search without any modifiers/filters, so I tested out some Bing search operators and came up with these workarounds which resulted in all pages loading as advertised:
The () operator means (include these search terms). I left it blank to see what the behaviour was: it seemed to partially work so far as working around the page loading bug. Same with the OR operator, although both of these returned fewer results than the last one I tried (+) so I stuck with that.
The (+) operator simply means ‘this term must be included’. Seems redundant with only one search term, and I wasn’t sure how it would behave when applied to another search operator, but it worked and returned more results than the previous attempts so I settled on that.
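As a rough sketch of how the (+) workaround and the ‘first’ parameter fit together when paging through results: the helper names below are hypothetical, placing the + directly in front of ip: is my assumption about how the operator is applied, and the page size of 10 is inferred from the first=11 example rather than taken from any Bing documentation.

```python
from urllib.parse import urlencode

BING_SEARCH = "https://www.bing.com/search"


def build_search_url(ip, first=None, prefix="+"):
    """Build a search URL for an ip: query, optionally starting at result 'first'."""
    # Assumption: the operator is simply prepended to the ip: term
    params = {"q": f"{prefix}ip:{ip}"}
    if first is not None:
        params["first"] = first
    return f"{BING_SEARCH}?{urlencode(params)}"


def page_urls(ip, pages, page_size=10):
    """Return URLs for the first 'pages' result pages (page_size is an assumption)."""
    urls = [build_search_url(ip)]
    for page in range(1, pages):
        # Page 2 starts at result 11, page 3 at result 21, and so on
        urls.append(build_search_url(ip, first=page * page_size + 1))
    return urls


if __name__ == "__main__":
    for url in page_urls("204.79.197.200", 3):
        print(url)
```

urlencode takes care of escaping both the + and the colon, so the query survives intact when the extra parameter is appended.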
You can check it out on GitHub.
This post started as an email aimed at untangling some concepts around modern web development for a friend. Like myself, he had spent quite some time in infrastructure roles away from the web development space and found understanding the new scene to be a bit of a hurdle.
Understanding how non-traditional web apps worked became necessary for me as a pentester and I had been on the topic for a while at this point, so I wrote this primer mostly as a brain-dump and a way to organise some of the concepts I knew about.
Caveats: Opinionated, I am not a developer, I very likely get things wrong. May be vaguely insulting to front end devs but we are all points on the same curve and I really, honestly bear you no ills. I think modern web apps are wondrous.
You just stepped out of the time machine after arriving from the late 90s, or early 2000s. You have, or had, some coding skills, maybe ASP or probably PHP. You even did some development, maybe hacking a few WordPress themes and plugins, or creating some custom websites (this is before they were called ‘web apps’). You set up Windows servers with IIS, or Linux with Apache. The LAMP stack was the main scene. JavaScript and CSS were just starting to be more of a thing. You mostly understood how it all worked.
Then you spent the next 15 years happily doing infrastructure or some other non web-dev role. This was your time machine, and now you’ve emerged, you’re dipping your foot back in the web development scene, and things are… different.
If you’re just getting (re)started, there’s a big gap between how things were and the current state of the art and it never stops moving, so it is no wonder it can be disorienting getting back onboard. The new scene can seem very different. There are helpful, friendly faces everywhere, and quite a few old school people here too, but the current generation has brought a whole new language and culture to navigate.
The good news is, after you’ve learned to interpret the new colloquialisms and peeled back the labels on what’s trendy to see what’s underneath, what you’ll find is still good old fashioned programming (and server) paradigms. The stuff you learned back in class doing C, PHP, even Java is still relevant, even if it means translating the concepts to the current languages.
After being used to coding my own stuff with the blinkers on for a long time, I started out being fairly old-man suspicious of frameworks. ‘I don’t want no layer of hippie abstraction getting in between me and the real code, get off my lawn’ and so on. This was based in ignorance, and once I realised what frameworks are and how they actually work, I got over that fast.
Simply, a framework is a set of libraries – pre-written chunks of code – to do common and useful jobs in your app. A framework includes a whole set of functions, classes and so on done in the language of choice, so when you start making a (web) app, you don’t have to reinvent really common wheels, taking your base language and writing something to do the same job, except badly. Also if it’s an open source framework, it benefits from all the community eyes on the code, especially in areas like efficiency and security.
Effectively, if you make a web app *without* using an existing framework, you’re going to be writing parts of your own framework. Except it probably won’t be as good as what’s out there already.
This quote springs to mind and I think is a good analogy when comparing any particular language to a decent framework written in it:
“Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.”
https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule
Some examples of common jobs a framework might take care of for you:
So frameworks are good.
Last time you heard the term “API” was back in your Computer Science class, right? But you’re safe, because it has to do with systems programming or something you’re unlikely to be doing. Actually, no. In modern web apps, APIs are a big thing.
Quick refresher: an API (Application Programming Interface) is a high level term for a relatively friendly layer between your code and the actual work being done. A programming language or framework could have an API in the form of function calls within the code. So instead of, for example, having to work out how to convince the runtime environment (eg operating system, web browser, etc) to draw a circle on the screen via a highly granular, low level method doing something horrific and error prone like directly manipulating video memory, there would be a higher level API exposed which you could call instead by something like: draw_shape(‘circle’, ‘large’, ‘green’, ‘middle’), which would draw a large green circle in the middle of the screen.
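To make the idea concrete, here is a toy version of that draw_shape call in Python. Everything in it is made up for illustration: the "screen" is just a grid of characters, the size names map to arbitrary radii, and only a circle in the middle is supported. The point is that the caller never touches the grid directly.

```python
def draw_shape(shape, size, color, position, width=21, height=11):
    """Hypothetical high level API: returns a character grid with the shape drawn in."""
    radii = {"small": 2, "medium": 3, "large": 4}  # arbitrary illustrative sizes
    if shape != "circle" or position != "middle":
        raise ValueError("this sketch only knows how to draw a circle in the middle")
    radius = radii[size]
    centre_x, centre_y = width // 2, height // 2
    mark = color[0].upper()  # 'green' is drawn as 'G'

    # The low level work the caller is spared: visiting every cell and
    # deciding whether it falls inside the circle.
    grid = [["." for _ in range(width)] for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if (x - centre_x) ** 2 + (y - centre_y) ** 2 <= radius ** 2:
                grid[y][x] = mark
    return grid


if __name__ == "__main__":
    for row in draw_shape("circle", "large", "green", "middle"):
        print("".join(row))
```

The caller only needs the one line at the bottom; how the cells get filled in is the API's problem.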
In the same way, a modern web application uses Web APIs in the form of dedicated URLs to get things done in the app. One such URL might look like: https://my.web.site/api/search?query=test which is an interface to a search function on the site, which might be expected to return some useful data. Modern web apps make heavy use of Web APIs, which we will talk about a bit more soon.
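A minimal sketch of what might sit behind a URL like that, assuming a made-up handle_request function and a hard-coded list standing in for the site's real data. A real app would have a framework route the request, but the shape is the same: a URL goes in, structured data (here JSON) comes out.

```python
import json
from urllib.parse import parse_qs, urlsplit

# Made-up data standing in for whatever the site actually searches
DOCUMENTS = ["test page", "another page", "unit test notes"]


def handle_request(url):
    """Hypothetical handler: map an API URL to a (status, JSON body) pair."""
    parts = urlsplit(url)
    if parts.path != "/api/search":
        return 404, json.dumps({"error": "not found"})
    term = parse_qs(parts.query).get("query", [""])[0]
    hits = [doc for doc in DOCUMENTS if term and term in doc]
    return 200, json.dumps({"query": term, "results": hits})


if __name__ == "__main__":
    print(handle_request("https://my.web.site/api/search?query=test"))
```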
Next, let’s talk about the term ‘MVC’ (Model-View-Controller). MVC is a design pattern used to separate logical functionality in the code. It’s a useful idea and lots of frameworks are ‘MVC’ these days (or MVP, or MVVM – but let’s just stick with getting the idea of MVC for now). This is just a way of saying the framework conforms to the de facto standard of providing roughly three main chunks of functionality:
‘Model’ deals with functions to access the database. ‘View’ handles input from the user and creates interfaces. The third element ‘Controller’ is code you write to actually do things in your app, especially stuff which might not be directly related to the framework.
For example, I might make a simple app to check remote RSS feeds for me and render them to a HTML table. I write some controller code which does some web requests on demand to download the feeds from a remote site and pass the data to an API on the model interface to shove it in the database. Then when a user comes along to use the app, some view code generates a welcome screen, followed by an ‘update remote feeds’ button the user can push which calls the controller code, followed by a nice HTML presentation of the data (which it pulled back out of the database via the model interface).
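Here is a rough sketch of that RSS app split along the lines described above, using only the Python standard library. The class and function names are invented for the example, the feed download is replaced with canned data so it runs offline, and a real framework would supply most of this plumbing for you.

```python
import sqlite3
from html import escape


class FeedModel:
    """Model: the only code that talks to the database."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE items (title TEXT, link TEXT)")

    def save_items(self, items):
        self.db.executemany("INSERT INTO items VALUES (?, ?)", items)
        self.db.commit()

    def all_items(self):
        return self.db.execute("SELECT title, link FROM items").fetchall()


def fetch_feed():
    """Stand-in for the web request that would download a remote feed."""
    return [
        ("First post", "https://example.com/1"),
        ("Second <post>", "https://example.com/2"),
    ]


class FeedController:
    """Controller: the app-specific logic, wired to the model."""

    def __init__(self, model, fetcher=fetch_feed):
        self.model = model
        self.fetcher = fetcher

    def update_feeds(self):
        self.model.save_items(self.fetcher())


def render_table(items):
    """View: turn rows from the model into HTML for the user."""
    rows = "".join(
        f'<tr><td><a href="{escape(link)}">{escape(title)}</a></td></tr>'
        for title, link in items
    )
    return f"<table>{rows}</table>"


if __name__ == "__main__":
    model = FeedModel()
    FeedController(model).update_feeds()  # what the 'update remote feeds' button would trigger
    print(render_table(model.all_items()))
```

Each piece can be swapped out without touching the others, which is the whole point of the separation: a different database only changes FeedModel, a different page layout only changes render_table.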
Clear. As. Mud. So that’s MVC.
Next, nuts and bolts: languages, web servers, and so on. There are a lot of labels on everything, so brace yourself. React! Angular! Rails! NGINX! Python! DotNet! Node! Java! Full Stack! Webpack! Ruby! Javascript! CoffeeScript! JSX! JSON! And that’s just the start. It’s difficult to cleanly categorise; it’s more of a venn diagram, but here goes:
Frameworks and webservers – surprisingly, the lines blur. A framework can be ‘full stack’ and implement its own webserver – after all it’s just code, it can do whatever it wants, including serving HTTP requests. Usually there is dedicated code which is much better at being a webserver than the framework itself though, often a server like Apache, NGINX, IIS, so most often the framework just hangs off a ‘real’ webserver, which has functionality (eg WSGI) to interface nicely with your app/framework so they can live together and each part can get on with doing what it does best.
But, it’s useful to keep in mind the framework can be the webserver as well. Django has its own development server built in, for example (with a dedicated webserver recommended for production). When you run this you’re basically running a Python script which fires up some (Python) webserver code which relays web requests back and forth to the Django code. It’s functionally the same as you’d experience in production but a lot easier to run and debug, so that’s where you do your development.
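To show how thin the line between ‘app’ and ‘webserver’ can be, here is a small sketch using wsgiref from the Python standard library. The app is just a callable that follows the WSGI convention; the same callable could be handed to a production server or, as in run_dev_server below, to a few lines of Python acting as the webserver. The function names and the port are arbitrary choices for the example.

```python
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A minimal WSGI application: the server calls this once per request."""
    body = f"You asked for {environ['PATH_INFO']}\n".encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(path):
    """Play the part of the webserver by hand: build a request and call the app."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the keys a real server would set
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


def run_dev_server(port=8000):
    """Serve the app with the standard library's development webserver."""
    with make_server("127.0.0.1", port, app) as server:
        server.serve_forever()
```

Calling run_dev_server() and browsing to http://127.0.0.1:8000/hello exercises the same app function that call_app("/hello") does, just with a real socket in between.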
(I’m going to talk about JavaScript frameworks in a minute as a few rules are broken, but we’re getting there).
Languages. You have a bunch, I guess you can make a framework in anything you like, but here’s some of the key players:
Ok. JavaScript gets its very own section.
JavaScript is obviously the language traditionally associated with running cutesy bits of user interface stuff inside a browser, like making a button pop when clicked, or some cool looking tooltip. JavaScript is an OO language which looks kind of like C and mostly nothing like Java. It’s colloquially considered to be a bad language, but as long as it was confined to being used to create UI widgets by harmless ‘web designers’, limited damage could be done. And it was probably going to stay that way, and who could blame it, it was a de facto standard written in ten days by a guy at Netscape in the 90s and it ran in BROWSERS, nobody expected anyone was in danger of creating skynet with this thing anytime soon.
Then Google came along and basically strapped a rocket engine onto JavaScript by creating a hugely powerful and optimised engine called V8, mainly to make webpages faster in Chrome. V8 is skynet. It compiles JavaScript to native machine code at runtime. And reoptimises it dynamically. This is how you can get entire virtual machines implemented in JavaScript in the freaking browser. Google, what darned heck hast thou wrought.
So, it became a feasible thing to run big chunks of JavaScript in the browser nice and fast. Front end coders rejoiced.
Entire MVC frameworks have thus been implemented in JS. Functionally, when the whole MVC framework is in the browser, when you hit a site, effectively it dumps all or part of the app down the tube to your browser in a bunch of JavaScript files. After that, the webserver falls back to being a HTTP talking database API endpoint, and the JavaScript app running in the browser talks to it via JSON or XML, via HTTP requests. You don’t necessarily even have to be online for the web app to keep working. Just like when you go offline and your Gmail session is still kind of functional – it’s still running everything locally in the browser.
Google, by the way, sponsors the open source JavaScript MVC framework ‘AngularJS’. Doing your web app this way makes heaps of sense if you have a huge user base as it effectively pushes all the interface load off to the client.
Now JavaScript runs fast, there is a lot more of it in use. The web interface designers have rebranded into ‘frontend developers’, and why not, since the language they code in can sure enough be used to make full apps. Even ‘native’ desktop and mobile apps. There are a lot more open source JS projects available, running the gamut from simple widget libraries like jQuery, to user interface libraries like React (“The ‘V’ in MVC”), to full MVC frameworks like Angular.
On a side note, the promotion of JavaScript to major league status has corresponded with an explosion in the number of JavaScript based frameworks and tools, of reportedly wildly varying stability, development and support wise. There are a handful of major ones which can be relied on – jQuery, Angular (Google), and React (Facebook) are a few which spring to mind, backed by the big players, and sound advice seems to be: use those, at least when starting out, because they’re more likely to stick around.
After all this though, you’re still running JavaScript in the browser which is served by a ‘real’ backend webserver and database.
With JavaScript on the V8 engine running like a ferret on meth, the real Pandora’s box was opened when a developer decided to implement an entire webserver and framework in one running on top of JS, called NodeJS.
And depending which development circles you run in, opinion on Node can be polarised. Node is a webserver with big ambitions, using a crap language made powerful by Google’s engine, populated by a new generation and community of developers, creating applications used by millions of people daily.
The arrival of Node made possible the incursion of interface designers into server side development, an event which may have ruffled some feathers among the old guard, to whom JavaScript was best constrained at the browser, but then when did technological progress ever obey the rules? It’s called ‘disruptive’ because it disrupts.
My understanding of Node is limited but I’ll sketch it out.
Node runs as the web server, and is the framework as well (like Django in developer mode). The Node engine itself is actually mostly implemented in C and C++ – and it consumes JavaScript as its web app language. It is commonly paired with ‘NoSQL’ databases, which is another topic but is effectively a non-relational database type which runs quite fast with the tradeoff of it being, well, non-relational. (I hear that NoSQL may also be seen as suspiciously hippie by some traditional DBAs).
Node has a major advantage with certain types of applications when compared to apps written in more traditional languages; it is implemented to run code asynchronously by exploiting a feature of JavaScript known as the ‘non blocking event loop’. This effectively means it has a single thread which keeps spinning, processing whatever it can, not being blocked while it waits for some job to complete like a network connection or a database call. So it can be very fast/efficient when dealing with apps which juggle a lot of concurrent I/O. A bunch of big companies have jumped on the wagon, Node became flavor of the month, and a huge market sprang up for Node devs.
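Node itself is JavaScript, but the same single-threaded event loop idea can be sketched with Python's asyncio, which is what I've used here as a stand-in. The two ‘jobs’ are fake: asyncio.sleep simulates waiting on a database and a network call, and the names and delays are arbitrary.

```python
import asyncio


async def fake_io(name, delay, log):
    """Pretend to wait on something slow, such as a database or a remote host."""
    log.append(f"start {name}")
    await asyncio.sleep(delay)  # hands control back to the event loop while waiting
    log.append(f"done {name}")
    return name


async def main():
    log = []
    # Both jobs are started on the same thread; neither blocks the other
    results = await asyncio.gather(
        fake_io("db", 0.2, log),
        fake_io("net", 0.1, log),
    )
    return results, log


if __name__ == "__main__":
    results, log = asyncio.run(main())
    print(results)
    print(log)
```

The log shows the second job starting before the first has finished, and the shorter wait completing first, all without any threads. The total time is governed by the longest wait rather than the sum of the waits, which is where the efficiency for I/O-heavy apps comes from.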
As a true badge of membership to the big leagues, Node has ‘NPM’ – node package manager (like apt or rpm, but for JavaScript modules). This can be managed on the command line of a node server (just like apt-get), making it easy to use and popular with devs. Lots of code is shared via NPM, and interesting and critical chains of dependencies can eventuate, with bad consequences. Some of the Node packages in common use have been considered to reflect poorly on the developers publishing and using them, as these discrete libraries do very simple jobs which are generally thought to be stuff a good coder should be able to implement themselves with low effort. Padding strings, for example. (See also: ‘npmgate’).
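For a sense of scale, this is roughly what a string padding helper amounts to, written here in Python rather than JavaScript. It is my own sketch of the job, not the code of any particular NPM package, and the function name and error handling are arbitrary choices.

```python
def left_pad(text, length, fill=" "):
    """Pad text on the left with a fill character until it is 'length' long."""
    text = str(text)
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    missing = length - len(text)
    # Strings already long enough are returned unchanged
    return fill * max(missing, 0) + text


if __name__ == "__main__":
    print(left_pad("7", 3, "0"))
```

Python ships this behaviour built in as str.rjust, which is part of why a dedicated package for it raises eyebrows.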
Despite (or because of) its differences to traditional frameworks / engines, Node is very popular.
So there’s a brief overview of a few of the more popular moving parts in contemporary web application development. There is a lot more to dive into and the landscape evolves constantly (and I realise I haven’t even mentioned stuff like websockets, WebGL, asm.js, and so on). So there’s a lot to digest if you’re new, but hopefully this gives you the beginning of the lay of the land and goes some way to untangling things.
On a personal front, I’ve been having fun tinkering with Python and JavaScript, with a general aim of implementing my backends in Django and Flask, with ReactJS on the frontend. And with game dev always an enticing side track, and advancements like V8, HTML5 and WebGL, JavaScript is a real contender now to make games. A bunch of commercial game development tools (combo IDEs + frameworks) now offer the ability to compile directly to HTML5/JavaScript/WebGL for browser publishing.
A good example is the JS game framework called Phaser (http://phaser.io/) as it looks like a great vector for gamedev and aligns well if anyone wants to do some light game dev and learn more about JS at the same time.
Thanks for reading!
I was using OpenVAS to do some network auditing, and accessing report results via the web interface (Greenbone Security Assistant) quite often seemed somewhat slow and clunky. The report is downloadable as an XML file though, and I’ve recently been getting familiar with parsing nmap XML files in Python, so a bit of scripting later and voila! GOXParse (Glens OpenVAS XML Parser) – a command line tool to quickly search / filter through the OpenVAS scan results.
As an added bonus, you can output a .csv file from an nmap scan using gnxparse.py and feed it to goxparse.py to provide an inline comparison of open ports.
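The match file is plain CSV in the form HOSTIP,port1,port2,port3, so reading it into something comparable is straightforward. The sketch below is an assumption about how that could be done, not the code from goxparse.py; the function name and the choice of a dict of sets are mine.

```python
import csv


def parse_matchfile(lines):
    """Turn 'HOSTIP,port1,port2,...' lines into {host: set of open ports}."""
    open_ports = {}
    for row in csv.reader(lines):
        if not row or not row[0].strip():
            continue  # skip blank lines
        host = row[0].strip()
        # Ignore anything in the port columns that is not a plain number
        ports = {int(port) for port in row[1:] if port.strip().isdigit()}
        open_ports.setdefault(host, set()).update(ports)
    return open_ports


if __name__ == "__main__":
    sample = ["10.0.0.5,22,80,443", "", "10.0.0.9,3389"]
    print(parse_matchfile(sample))
```

csv.reader accepts any iterable of lines, so an open file object can be passed in place of the sample list.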
$ ./goxparse.py --help
usage: goxparse.py filename.xml [OPTIONS]

Glens OpenVas XML Parser (goxparse)

positional arguments:
  file                    File containing OpenVAS XML report

optional arguments:
  -h, --help              show this help message and exit
  -i, -ips                Output unfiltered list of scanned ipv4 addresses
  -host [HOSTIP]          Host to generate a report for
  -cvssmin [CVSSMIN]      Minimum CVSS level to report
  -cvssmax [CVSSMAX]      Maximum CVSS level to report
  -threatlevel [THREAT]   Threat Level to match, LOG/LOW/MEDIUM/HIGH/CRITICAL
  -matchfile [MATCHFILE]  .csv file from which to match open ports, in format HOSTIP,port1,port2,port3
  -v, --version           show program's version number and exit

usage examples:
  goxparse.py ./scan.xml -ips
  goxparse.py ./scan.xml -host <HOSTIP>
  goxparse.py ./scan.xml -cvssmin 5 -cvssmax 8
  goxparse.py ./scan.xml -threatlevel HIGH
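To give a feel for the kind of filtering involved, here is a small sketch using xml.etree.ElementTree from the standard library. It is not the goxparse source: the sample XML is a heavily simplified stand-in I made up, and the element names (result, host, port, nvt/cvss_base, nvt/name) are assumptions about the report layout which may differ between OpenVAS versions.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up sample; a real report carries far more detail
SAMPLE = """<report>
  <results>
    <result><host>10.0.0.5</host><port>443/tcp</port>
      <nvt><name>Example TLS issue</name><cvss_base>7.5</cvss_base></nvt></result>
    <result><host>10.0.0.5</host><port>22/tcp</port>
      <nvt><name>Example SSH banner</name><cvss_base>2.6</cvss_base></nvt></result>
    <result><host>10.0.0.9</host><port>80/tcp</port>
      <nvt><name>Example web flaw</name><cvss_base>9.3</cvss_base></nvt></result>
  </results>
</report>"""


def filter_results(xml_text, cvss_min=0.0, cvss_max=10.0, host=None):
    """Return (host, port, cvss, name) tuples within the CVSS range."""
    root = ET.fromstring(xml_text)
    matches = []
    for result in root.iter("result"):
        result_host = (result.findtext("host") or "").strip()
        try:
            cvss = float(result.findtext("nvt/cvss_base") or "")
        except ValueError:
            continue  # skip results without a usable score
        if host and result_host != host:
            continue
        if cvss_min <= cvss <= cvss_max:
            matches.append(
                (result_host, result.findtext("port"), cvss, result.findtext("nvt/name"))
            )
    return matches


if __name__ == "__main__":
    for row in filter_results(SAMPLE, cvss_min=5, cvss_max=8):
        print(row)
```

The cvss_min, cvss_max and host arguments mirror the -cvssmin, -cvssmax and -host options in the help text above.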