Just a quick post to let you know that it’s now possible to scrape JavaScript-heavy sites easily using our scraping platform, morph.io.
This is really useful for Microsoft .NET websites, which often keep complicated state in JavaScript and simulate links via JavaScript posts.
Also, we recently discovered another, more worrying example. The main website of the NSW Electoral Commission, which oversees state elections in NSW, is “protected” by some anti-scraping technology that stops you from downloading the contents of a web page unless JavaScript is enabled. This is clearly terrible for accessibility and, in our case, for getting access to basic electoral information that is not available by any means other than scraping.
Thankfully…
PhantomJS is now installed for everyone using the experimental buildpack support.
PhantomJS is essentially a headless browser that you can control from your scraper, either directly with JavaScript or via the wrapper libraries available for most major languages.
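To give a feel for what that looks like, here is a minimal sketch of a Python scraper that hands a URL to PhantomJS and gets back the HTML after the page's JavaScript has run. It is an illustration only, not a recommended setup: it assumes the binary is on the path as `phantomjs`, and the helper names (`build_command`, `fetch_rendered_html`) are made up for this example. Pages that load data asynchronously may need a delay before the content is read.

```python
import os
import subprocess
import tempfile

# PhantomJS script: open the URL passed as the first argument, then print
# the page's HTML as it stands once the load callback fires.
PHANTOM_SCRIPT = """
var system = require('system');
var page = require('webpage').create();
page.open(system.args[1], function (status) {
    if (status !== 'success') {
        console.log('FAILED to load ' + system.args[1]);
        phantom.exit(1);
    } else {
        console.log(page.content);
        phantom.exit(0);
    }
});
"""


def build_command(script_path, url, binary="phantomjs"):
    """Return the argument list used to run the script (hypothetical helper).

    Assumes the PhantomJS binary is available on the PATH as `phantomjs`.
    """
    return [binary, script_path, url]


def fetch_rendered_html(url, binary="phantomjs", timeout=60):
    """Run PhantomJS against `url` and return whatever it printed to stdout."""
    # Write the script to a temporary file because PhantomJS expects a path.
    with tempfile.NamedTemporaryFile("w", suffix=".js", delete=False) as f:
        f.write(PHANTOM_SCRIPT)
        script_path = f.name
    try:
        result = subprocess.run(
            build_command(script_path, url, binary),
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,  # raise if PhantomJS exits with a non-zero status
        )
    finally:
        os.remove(script_path)
    return result.stdout
```

Calling `fetch_rendered_html` with the address of the page you are after returns a string of HTML that you can then parse with whatever library your scraper already uses.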
If you want to use PhantomJS but are not yet using the buildpack support, treat this as a little bit of extra incentive to move over to it. All you need to do is ask us to enable it for you (letting us know which user or organisation you would like it for).