Ruby web scraping tutorial on morph.io – Part 3, continue writing your scraper

This post is part of a series of posts that provide step-by-step instructions on how to write a simple web scraper using Ruby on morph.io. If you find any problems, let us know in the comments so we can improve these tutorials.


In the last post we started writing our scraper and gathering some data. In this post we’ll expand our scraper to get more of the data we’re after.

So now that you’ve got the title for the first member, get their electorate (the place the member is ‘member for’) and party.

Looking at the page source again, you can see this information is in the first and second <dd> elements in the member’s <li>.

<li>
  <p class='title'>
    <a href="http://www.aph.gov.au/Senators_and_Members/Parliamentarian?MPID=WN6">
      The Hon Ian Macfarlane MP
    </a>
  </p>
  <p class='thumbnail'>
    <a href="http://www.aph.gov.au/Senators_and_Members/Parliamentarian?MPID=WN6">
      <img alt="Photo of The Hon Ian Macfarlane MP" src="http://parlinfo.aph.gov.au/parlInfo/download/handbook/allmps/WN6/upload_ref_binary/WN6.JPG" width="80" />
    </a>
  </p>
  <dl>
    <dt>Member for</dt>
    <dd>Groom, Queensland</dd>
    <dt>Party</dt>
    <dd>Liberal Party of Australia</dd>
    <dt>Connect</dt>
    <dd>
      <a class="social mail" href="mailto:Ian.Macfarlane.MP@aph.gov.au"
      target="_blank">Email</a>
    </dd>
  </dl>
</li>

Get the electorate and party by first getting an array of the <dd> elements and then selecting the one you want by its index in the array. Remember that [0] is the first item in an Array.
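If Array indexing is new to you, here’s the idea in plain Ruby, using sample strings as hypothetical stand-ins for the real <dd> elements (no Nokogiri needed):

```ruby
# Hypothetical stand-ins for the two <dd> elements Nokogiri would return
dds = ["Groom, Queensland", "Liberal Party of Australia"]

dds[0] # the first item: "Groom, Queensland"
dds[1] # the second item: "Liberal Party of Australia"
dds[2] # an index past the end quietly returns nil
```

One thing to keep in mind: an out-of-range index returns nil rather than raising an error, so calling inner_text on a <dd> that isn’t there would blow up with a NoMethodError. It’s worth eyeballing the page structure first.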

Try getting the data in your irb session:

>> page.at('.search-filter-results').at('li').search('dd')[0].inner_text
=> "Groom, Queensland"
>> page.at('.search-filter-results').at('li').search('dd')[1].inner_text
=> "Liberal Party of Australia"

Then add the code to expand your member object in your scraper.rb:

member = {
  title: page.at('.search-filter-results').at('li').at('.title').inner_text.strip,
  electorate: page.at('.search-filter-results').at('li').search('dd')[0].inner_text,
  party: page.at('.search-filter-results').at('li').search('dd')[1].inner_text
}

Save and run your scraper using bundle exec ruby scraper.rb and check that your object includes the attributes with values you expect.

OK, now you just need the url for the member’s individual page. Look at that source code again and you’ll find it in the href of the <a> inside the <p> with the class title.

In your irb session, first get the <a> element:

>> page.at('.search-filter-results').at('li').at('.title a')
=> #<Nokogiri::XML::Element:0x3fca485cfba0 name="a" attributes=[#<Nokogiri::XML::Attr:0x3fca48432a18 name="href" value="http://www.aph.gov.au/Senators_and_Members/Parliamentarian?MPID=WN6">] children=[#<Nokogiri::XML::Text:0x3fca4843b5c8 "The Hon Ian Macfarlane MP">]>

You get a Nokogiri XML Element with one attribute. The attribute has the name “href” and the value is the url you want. You can use the attr() method here to return this value:

>> page.at('.search-filter-results').at('li').at('.title a').attr('href')
=> "http://www.aph.gov.au/Senators_and_Members/Parliamentarian?MPID=WN6"

You can now add this final attribute to your member object in scraper.rb:

member = {
  title: page.at('.search-filter-results').at('li').at('.title').inner_text.strip,
  electorate: page.at('.search-filter-results').at('li').search('dd')[0].inner_text,
  party: page.at('.search-filter-results').at('li').search('dd')[1].inner_text,
  url: page.at('.search-filter-results').at('li').at('.title a').attr('href')
}

Save and run your scraper file to make sure all is well. This is a good time to do another git commit to save your progress.

Now you’ve written a scraper to get information about one member of Australian Parliament. It’s time to get information about all the members on the first page.

Currently you’re using page.at('.search-filter-results').at('li') to target the first list item in the members list. You can adapt this to get every list item using the search() method:

page.at('.search-filter-results').search('li')

Use a Ruby each loop to run your code, collecting and printing your member object once for each list item.

page.at('.search-filter-results').search('li').each do |li|
  member = {
    title: li.at('.title').inner_text.strip,
    electorate: li.search('dd')[0].inner_text,
    party: li.search('dd')[1].inner_text,
    url: li.at('.title a').attr('href')
  }

  p member
end

Save and run the file and see if it collects all the members on the page as expected. Now you’re really scraping!
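If you’d like to check the loop logic without making any web requests, the same pattern can be exercised on canned data. The sample rows below are hypothetical stand-ins for the parsed <li> elements:

```ruby
# Hypothetical sample rows standing in for the parsed <li> elements
rows = [
  { title: "The Hon Ian Macfarlane MP",
    dds: ["Groom, Queensland", "Liberal Party of Australia"] },
  { title: "The Hon Tony Abbott MP",
    dds: ["Warringah, New South Wales", "Liberal Party of Australia"] }
]

# Build a member hash for each row, just as the scraper does for each <li>
members = rows.map do |row|
  {
    title: row[:title],
    electorate: row[:dds][0],
    party: row[:dds][1]
  }
end

members.each { |member| p member }
```

This is only a sketch of the loop shape; in the real scraper the data comes from Nokogiri elements rather than hand-written hashes.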

You still don’t have all the members, though: they’re split over 3 pages and you only have the first. In our next post we’ll work out how to deal with this pagination.


Who comments in PlanningAlerts and how could it work better?

In our last two quarterly planning posts (see Q3 2015 and Q4 2015), we’ve talked about helping people write to their elected local councillors about planning applications through PlanningAlerts. As Matthew wrote in June, “The aim is to strengthen the connection between citizens and local councillors around one of the most important things that local government does which is planning”. We’re also trying to improve the whole commenting flow in PlanningAlerts.

I’ve been working on this new system for a while now, prototyping and iterating on the new comment options and folding improvements back into the general comment form so everybody benefits.

About a month ago I ran a survey of people who had made a comment on PlanningAlerts in the last few months. The survey went out to just over 500 people and we had 36 responders – about the same percentage turnout as our PlanningAlerts survey at the beginning of the year (6% of 20,000). As you can see, the vast majority of PlanningAlerts users don’t currently comment.

We’ve never asked users about the commenting process before, so I was initially trying to find out some quite general things:

  • What kind of people are commenting currently?
  • How do they feel about the experience of commenting?
  • How easily do they get through the process of commenting?
  • Do people see the comments as a discussion between neighbours or just a message to council? or both?
  • Who do they think these comments go to? Do they understand the difference between the council organisation and the councillors?

The responses include some clear patterns and have raised a bunch of questions to follow up with short structured interviews. I’m also going to have these people use the new form prototype. This is to weed out usability problems before we launch this new feature to some areas of PlanningAlerts.

Here are some of the observations from the survey responses:

Older people are more likely to comment in PlanningAlerts

We’ve now run two surveys of PlanningAlerts users asking them roughly how old they are. The first survey was sent to all users; this recent one went just to people who had recently commented on a planning application through the site.

Compared to the first survey of all users, responders to the recent commenters survey were relatively older. There were fewer people in their 30s and 40s and more in their 60s and 70s. Older people may be more likely to respond to surveys generally, but the difference between the two sets of results still suggests that commenters skew older.

Knowing this can help us better empathise with the people using PlanningAlerts and make it more usable. For example, there is currently a lot of very small, grey text on the site that is likely not noticeable or comfortable to read for people with diminished eye sight—almost everybody’s eye sight gets at least a little worse with age. Knowing that this could be an issue for lots of PlanningAlerts users makes improving the readability of text a higher priority.

Comparing recent commenters to all PlanningAlerts users
Age group | All users | Recent commenters
30s       | 20%       | 11%
40s       | 26%       | 14%
50s       | 26%       | 28%
60s       | 18%       | 33%
70s       | 5%        | 8%

There’s a good understanding that comments go to planning authorities, but not that they go to neighbours signed up to PlanningAlerts

In response to “Who do you think receives your comments made on PlanningAlerts?”, 86% (32) of responders checked “Local council staff”. Only 35% (13) checked “Neighbours who are signed up to PlanningAlerts”, and only one person thought their comments also went to elected councillors.

There seems to be a good understanding amongst these commenters that their comments are sent to the planning authority for the application, but not that they also go to other people in the area signed up to PlanningAlerts. They were also very clear that their comments did not go to elected councillors.

In the interviews I want to follow up on this and find out if people are positive or negative about their comments going to other locals. I personally think it’s an important part of PlanningAlerts that people in an area can learn about local development, local history and how to impact the planning process from their neighbours. It seems like an efficient way to share knowledge, a way to strengthen connections between people and to demonstrate how easy it is to comment. If people are negative about this then what are their concerns?

I have no idea if the comments will be listened to or what impact they will have if any

There’s a clear pattern in the responses that people don’t think their comments are being listened to by planning authorities. They also don’t know how they could find out if they are. One person noted this as a reason why they don’t make more comments.

  • I have no real way of knowing whether my concerns are given any attention by local council.
  • I have no idea if the comments will be listened to or what impact they will have if any
  • I believe that the [council] are going to go ahead and develop, come what may. However, if I and others don’t comment/object we will be seen as providing tacit approval to Council’s actions
  • Insufficient tools and transparency of processes from Planning Panel.
  • I don’t feel I have any influence. I was just sharing my observations, or thoughts with like minded people who may. (have influence)
  • I do get the ‘Form Letter’ from Council but I’m not in any way convinced they listen.
  • The process of being alerted and expressing an opinion works well but whether it has any effect is doubtful.
  • Although councils do respond to my comments, it is just an automated reply. The replies from City of Sydney are quite informative but the ones from Marrickville pretty meaningless.
  • I am not in any way convinced anyone listens. A previous mayor stated he ONLY listens to people whose property directly adjoins the building site.
  • –I know it’s money that matters, not people

Giving people simple access to their elected local representatives, and a way to have a public exchange with them, will hopefully provide a lever to increase their impact.

I would only comment on applications that really affect me

There was a strong pattern of people saying they only comment on applications that will affect them or that interest them:

  • I would only comment on applications that really affect me, don’t want to just restrict any application.
  • Not many are that relevant / interest me.
  • Sometimes it doesn’t feel like it is right making comments that don’t directly impact
  • I target the ones that are most important
  • Only interested in applications which either reflect major planning and development issues for the district as a whole (eg approval for demolition of old houses or repurposing of industrial structures) or which affect the immediate location around where I live.
  • I comment on those that affect my area
  • I only comment on applications that may effect my immediate area.
  • Comment on those that I get that are significant,ie: not on normal sheds,pools,dwellings etc.
  • only comment on ones that I feel directly impact myself or my suburb
  • I would only comment on an application, that adversely affected me or my community.
  • Not all relevant to me. Also don’t want to be seen as simply negative about a lot of the development
  • A lot are irrelevant to my interest.

How do people decide if an application is relevant to them? Are there common criteria?

Why don’t you comment on more applications? “It takes too much time”

A number of people mentioned that commenting was a time consuming process, and that this prevented them from commenting on more applications:

  • Time – not so much in writing the response but in being across the particulars of DAs and being able to write an informed response.
  • Not enough time in my life – I restrict myself to those most relevant to me
  • Time poor
  • It takes too much time, but one concern is that it generates too much paper and mail from the council.

What are people’s basic processes for commenting in PlanningAlerts? What are the most time consuming components of this? Can we save people time?

I have only commented on applications where I have a knowledge of the property or street amenities.

A few people mentioned that they feel you should have a certain amount of knowledge of an application or area to comment on it, and that they only comment on applications they are knowledgeable about.

How does someone become knowledgeable about an application? What is the most important and useful information about applications?

Comment in private

A small number of people mentioned that they would like to be able to comment without it being made public.

  • Would like an option to remain private on the internet – eg a “name withheld” type system.
  • Should be able to make comments in confidence ie only seen by council, not other residents
  • I prefer not to have my name published on the web. The first time I commented it wasn’t clear that the name was published.

Suggestions & improvements

There were a few suggestions for changes to PlanningAlerts:

  • Should be able to cut and paste photos diagrams, sketches etc.
  • I was pleased that the local council accepted the comments as an Objection. But it was not clear in making the comment that it would be going to the council.
  • There could be a button to share the objection via other social media or a process to enforce the council to contact us.
  • Some times it is hard to find a document to comment on if I don’t know the exact details, The search function is complex.

Summing up PlanningAlerts

We also had a few comments that are just nice summaries of what is good about PlanningAlerts. It’s great to see that there are people who understand and can articulate what PlanningAlerts does well:

  • PlanningAlerts removes the hurdles. I hear about developments I would not have otherwise known about, and I can quickly provide input without having to know any particular council processes.
  • Its an efficient system. I’m alerted to the various viewpoints of others.
  • Because it shares my opinion with other concerned people as well as council. Going directly to council wouldn’t share it with others concerned.

Next steps

If we want to make using PlanningAlerts an intuitive and enjoyable experience we need to understand the humans at the centre of its design. This is a small step towards understanding the type of people who comment in PlanningAlerts, some of their concerns, and some of the barriers to commenting.

We’ve already drawn on the responses to this survey in updating wording and information surrounding the commenting process to make it better fit people’s mental model and address their concerns.

I’m now lining up interviews with a handful of the people who responded to try and answer some of the questions raised above and get to know them more. They’ll also show us how they use PlanningAlerts and test out the new comment form. This will highlight current usability problems and hopefully suggest ways to make commenting easier for everyone.

Design research is still very new to the OpenAustralia Foundation. Like all our work, we’re always open to advice and contributions to help us improve our projects. If you’re experienced in user research and want to make a contribution to our open source projects to transform democracy, please drop us a line or come down to our monthly pub meet. We’d love to hear your ideas.


Ruby web scraping tutorial on morph.io – Part 2, start writing your scraper

This post is part of a series of posts that provide step-by-step instructions on how to write a simple web scraper using Ruby on morph.io. If you find any problems, let us know in the comments so we can improve these tutorials.


In the last post we set up our scraper. Now we’re going to start writing it.

It can be really helpful to start out writing your scraper in an interactive shell. In the shell you’ll get quick feedback as you explore the page you’re trying to scrape, instead of having to run your scraper file to see what your code does.

The interactive shell for Ruby is called irb. Start an irb session on the command line with:

> bundle exec irb

The bundle exec command executes your irb command in the context of your project’s Gemfile. This means that your specified gems will be available.

The first command you need to run in irb is:

>> require 'mechanize'

This loads the Mechanize library. Mechanize is a helpful library for requesting and interacting with webpages.

Now you can create an instance of Mechanize that will be your agent to do things like ‘get’ pages and ‘click’ on links:

>> agent = Mechanize.new

You want to get information for all the members you can. Looking at your target page, it turns out the members are spread across several pages; you’ll have to scrape all 3 pages to get all the members. Rather than worry about this now, let’s start small: just collect the information you want for the first member on the first page. Reducing the complexity as you start to write your code will make it easier to debug as you go along.

In your irb session, use the Mechanize get method to get the first page with members listed on it.

>> url = "https://morph.io/documentation/examples/australian_members_of_parliament"
>> page = agent.get(url)

This returns the source of your page as a Mechanize Page object. You’ll be pulling the information you want out of this object using the handy Nokogiri XML searching methods that Mechanize loads in for you.

Let’s review some of these methods.

at()

The at() method returns the first element that matches the selectors provided. For example, page.at('ul') returns the first <ul> element in the page as a Nokogiri XML Element that you can parse. There are a number of ways to target elements using the at() method. We’re using a CSS-style selector in this example because many people are familiar with this style from writing CSS or jQuery. You can also target elements by class, e.g. page.at('.search-filter-results'); or id, e.g. page.at('#content').

search()

The search() method works like the at() method, but returns an Array of every element that matches the target instead of just the first. Running page.search('li') returns an Array of every <li> element in page.

You can chain these methods together to find specific elements. page.at('.search-filter-results').at('li').search('p') will return an Array of all <p> elements found within the first <li> element found within the first element with the class .search-filter-results on the page.
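If you know Ruby’s Enumerable methods, at() and search() behave a lot like find and select on an ordinary Array. This is only a rough analogy, not how Nokogiri is implemented:

```ruby
numbers = [1, 2, 3, 4, 5]

first_even = numbers.find { |n| n.even? }   # like at(): only the first match
all_evens  = numbers.select { |n| n.even? } # like search(): every match

first_even # => 2
all_evens  # => [2, 4]
```

The difference matters when you chain calls: at() hands you a single element to keep digging into, while search() hands you a collection to index into or loop over.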

You can use the at() and search() methods to get the first member list item on the page:

>> page.at('.search-filter-results').at('li')

This returns a big blob of code that can be hard to read. You can use the inner_text() method to help work out if you’ve got the element you’re looking for: the first member in the list.

>> page.at('.search-filter-results').at('li').inner_text
=> "\n\nThe Hon Ian Macfarlane MP\n\n\n\n\n\nMember for\nGroom,Queensland\nParty\nLiberal Party of Australia\nConnect\n\nEmail\n\n\n"

Victory!
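As an aside, you can poke at a newline-riddled string like this with plain Ruby to see what it actually contains. Here’s the same text as a string literal, copied from the irb output above:

```ruby
blob = "\n\nThe Hon Ian Macfarlane MP\n\n\n\n\n\nMember for\nGroom,Queensland\nParty\nLiberal Party of Australia\nConnect\n\nEmail\n\n\n"

# Split on newlines and drop the empty pieces to see the real content
lines = blob.split("\n").reject(&:empty?)
p lines
# => ["The Hon Ian Macfarlane MP", "Member for", "Groom,Queensland",
#     "Party", "Liberal Party of Australia", "Connect", "Email"]
```

Every piece of information is in there, separated by newlines. In the rest of the tutorial you’ll target the individual elements instead of splitting strings like this, which is far more robust.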

Now that you’ve found your first member, you want to collect their title, electorate, party, and the url for their individual page. Let’s start with the title.

If you view the page source in your browser and look at the first member list item, you can see that the title of the member, “The Hon Ian Macfarlane MP”, is the text inside the link in the <p> with the class ‘title’.

<li>
  <p class='title'>
    <a href="http://www.aph.gov.au/Senators_and_Members/Parliamentarian?MPID=WN6">
      The Hon Ian Macfarlane MP
    </a>
  </p>
  <p class='thumbnail'>
    <a href="http://www.aph.gov.au/Senators_and_Members/Parliamentarian?MPID=WN6">
      <img alt="Photo of The Hon Ian Macfarlane MP" src="http://parlinfo.aph.gov.au/parlInfo/download/handbook/allmps/WN6/upload_ref_binary/WN6.JPG" width="80" />
    </a>
  </p>
  <dl>
    <dt>Member for</dt>
    <dd>Groom, Queensland</dd>
    <dt>Party</dt>
    <dd>Liberal Party of Australia</dd>
    <dt>Connect</dt>
    <dd>
      <a class="social mail" href="mailto:Ian.Macfarlane.MP@aph.gov.au"
      target="_blank">Email</a>
    </dd>
  </dl>
</li>

You can use the .inner_text method here.

>> page.at('.search-filter-results').at('li').at('.title').inner_text
=> "\nThe Hon Ian Macfarlane MP\n"

There it is: the title of the first member. It’s got messy \n whitespace characters around it though. Never fear, you can clean it up with the Ruby method strip.

>> page.at('.search-filter-results').at('li').at('.title').inner_text.strip
=> "The Hon Ian Macfarlane MP"

You’ve successfully scraped the first bit of information you want.

Now that you’ve got some code for your scraper, let’s add it to your scraper.rb file and make your first commit.

You’ll want to come back to your irb session, so leave it running and open your scraper.rb file in your code editor. Replace the commented out template code with the working code from your irb session.

Your scraper.rb should look like this:

require 'mechanize'

agent = Mechanize.new
url = 'https://morph.io/documentation/examples/australian_members_of_parliament'
page = agent.get(url)

page.at('.search-filter-results').at('li').at('.title').inner_text.strip

You actually want to collect members with this scraper, so create a member object and assign the text you’ve collected as its title:

require 'mechanize'

agent = Mechanize.new
url = 'https://morph.io/documentation/examples/australian_members_of_parliament'
page = agent.get(url)

member = {
  title: page.at('.search-filter-results').at('li').at('.title').inner_text.strip
}

Add a final line to the file to help confirm that everything is working as expected.

p member

Back on the command line, in the folder for your project, you can now run this file with Ruby:

> bundle exec ruby scraper.rb

The scraper runs and the p command prints your member:

> bundle exec ruby scraper.rb
{:title=>"The Hon Ian Macfarlane MP"}

This is a good time to make your first git commit for this project. Yay!

In our next post we’ll work out how to scrape more bits of information from the page.


We’ve got a lot to finish in 2015 – here is our plan

Catch a ferry to Cockatoo Island for a planning session? Why not.

Every three months the core team at the OpenAustralia Foundation gets together for a day to plan our next quarter ahead. It’s a good time to review what we got done, how we’re feeling about our work, what we want to get done next and to make any course corrections. It’s all in the context of a broader plan we mapped out at the start of the year.

Last Tuesday, the 6th of October, was our last planning session for 2015. We like to escape our normal working environments for these sessions to make sure our thinking is a bit fresher than usual. Thanks to the beautiful weather in Sydney lately we decided to take the ferry to Cockatoo Island.

After our last planning session in June we said it was going to be a really busy 3 months. It sure was.

Because of some much-needed holidays, for most of that time we were back to just two full-time people. This turned out to be good preparation for Matthew joining the Digital Transformation Office.

By and large we did what we set out to do last quarter, but with some interesting additions, including an exciting international collaboration. A few months ago we started discussing a project to bring They Vote For You to Ukraine with our partners Open North and OPORA. That project was confirmed last quarter; we’ve already started work on it and we’re making great progress.

To make way for this, Henare didn’t start work on PlanningAlerts – Write To Your Councillor as we’d planned; instead, Luke plunged into the most vital part of the project: making it work beautifully and simply within the existing PlanningAlerts application. We’re continuing work on this major project over the next quarter.

A problem that both Luke and Henare encountered over the last couple of months was the big overhead of context switching between so many little projects and jobs. We were really keen to avoid this in our next plan; however, it’s clear that we have a lot of small things to get done before the end of the year. A positive part of the plan ahead is that we’ve managed to let Luke focus mainly on the important work of our next major project.

Here’s what’s on the agenda

The brain dump of what we’ve been up to and, roughly, what the next 3 months will be like

October

Another great scraping workshop

In August we ran a scraping workshop. It was lots of fun and we had excellent feedback so we’re doing it all again on October the 25th. At the time of writing there are still places available if you want to register to join us.

After the last workshop Luke developed a detailed scraping tutorial. We’re in the process of publishing a series of blog posts of this tutorial in time for the next scraping course. You can already take a look at the first post of the series.

More work on They Vote For You for Ukraine. And a bit of a break

We’re planning to keep progressing TVFY for Ukraine but we’ll also take a short break to do some other work while the OPORA team start work on developing the translations. This will allow us to pick the work back up in November and make a lot more progress quickly when we do.

PlanningAlerts – Write To Your Councillor

Throughout October Luke will almost exclusively work on this project with the ambitious goal of launching a focussed minimum viable product early in November. This will likely involve rolling this feature out to a small number of councils to see what works and what needs improvement.

Charging commercial users of PlanningAlerts

Luke and Henare worked on this feature over August but so far we haven’t had a lot of take up. We still have a number of important questions unanswered and Henare is hoping to spend some time here and there trying to answer those questions and make this work.

It’s still one of the most promising opportunities we have to financially support the work the Foundation does but we’re not going to spend forever chasing something that doesn’t work. This quarter will be the decider.

Server upgrade

Last quarter we spent the time to verify a viable strategy for a vital server upgrade. We’ll start planning this in October and schedule it in for a weekend in the near future.

November

Write To Your Councillor & Ukraine

At the start of November Luke and Henare will work together on hopefully rolling out the first stages of the PlanningAlerts feature to write to your councillor. They’ll then switch to working together on TVFY for Ukraine, which Henare will do for the rest of the month.

Saying Hi to Melbourne

Since the start of this year we’ve been wanting to connect with the Melbourne civic tech community. It’s just been a matter of finding the right time.

Recently the good people at the Melbourne chapter of Open Knowledge Foundation Australia contacted us to see if we’d like to come and chat with them about what we do – we said of course, of course! Henare will head down around the 25th of November and is also hoping to say hi to some of the other tech and transparency communities in Melbourne.

December

Party Time

To celebrate all our achievements in 2015, and all you great people who’ve helped us on the way, we’re going to have a little party on the 6th of December. It’ll be pretty low key and casual (what else?) and will probably involve us heading to a park somewhere and enjoying some sunshine, a drink, and something to eat. We’d love for you to join us.

Write To Your Councillor

We’ll begin to look at the infrastructure of this project in December so we can start rolling it out more widely. Hopefully we’ll also have learnt a little about how people are using it so that we can also improve the design before a more general rollout in the new year.

Full steam ahead!


Ruby web scraping tutorial on morph.io – Part 1, setting up your scraper

This post is part of a series of posts that provide step-by-step instructions on how to write a simple web scraper using Ruby on morph.io. If you find any problems, let us know in the comments so we can improve these tutorials.



With just a few lines of code, you can write a scraper to collect data from messy web pages and save it in a structured format you can work with.

This tutorial will take you through the process of writing a simple scraper. This tutorial uses the Ruby programming language, but you can apply the steps and techniques to any language available on morph.io.

Over this tutorial you will:

  • create a scraper on morph.io
  • clone it using git to work with on your local machine
  • make sure you have the necessary dependencies installed
  • write scraping code to collect information from a website
  • publish and run your scraper on morph.io

In this first instalment you’ll create a scraper, clone it to your machine, and install the dependencies you need to run it.

You’ll use morph.io, the command line and a code editor on your local machine.

Let’s get started.

Find the data you want to scrape

In this tutorial you’re going to write a simple scraper to collect information about the elected members of Australia’s Federal Parliament. For each member let’s capture their title, electorate, party, and the url for their individual page on the Parliament’s website.

The data you want to scrape needs to be available on the web. We’ve copied a basic list of members from the Parliament’s website to https://morph.io/documentation/examples/australian_members_of_parliament for practice scraping. You will target this page to get the member information with your scraper.


The simplified list of Australian MPs for you to scrape on morph.io

Some web pages are much harder to scrape than others. The member information you’re trying to collect is published in a simple HTML list, which means you should be able to target and collect the information you want quite easily. If the information was in an image or PDF then it would be much harder to access programmatically and therefore much harder to write a scraper for.

Now that you’ve found the data you want to scrape and you know you can scrape it, the next step is to set up your scraper.

Create your scraper on morph.io and clone it to your machine

The easiest way to get started is to create a new scraper on morph.io.

Select the language you want to write your scraper in. This tutorial uses Ruby, so let’s go with that.


Fill out the new scraper form

If you are a member of organisations on GitHub, you can set the owner of your scraper to be either your own account or one of your organisations.

Choose a name and description for your scraper. Use keywords that will help you and others find this scraper on morph.io in the future. Let’s call this scraper “tutorial_members_of_australian_parliament” and describe it as “Collects members of Australia’s Federal Parliament (tutorial)”.

Click “Create Scraper”!

After morph.io has finished creating the new scraper you are taken to your fresh scraper page. You want to clone all the template scraper code morph.io provides to your local machine so you can work with it there.

On the scraper page there is a heading “Scraper code”, with a button to copy the “git clone URL”. This is the link to the GitHub repository of your scraper’s code. Click the button to copy the link to your clipboard.


Commands you’ll need to enter to clone your repository

Open your computer’s command line and cd to the directory you want to work in. Type git clone then paste in the url you copied to get something like:

git clone https://github.com/username/tutorial_members_of_australian_parliament.git

This command pulls down the code from GitHub and adds it to a new directory called tutorial_members_of_australian_parliament. Change to that directory with cd tutorial_members_of_australian_parliament and then list the files with ls -al. You should see a bunch of files including:

  • scraper.rb, the file that morph.io runs and that you’ll write your scraping code in
  • Gemfile, which defines the dependencies you’ll need to run your scraper.

Now that you have the template scraper on your local machine, you need to make sure you have the necessary software installed to run it.

Installing Ruby

Installing Ruby is out of the scope of this tutorial but there are lots of good guides on the web. You might like to use something like RailsInstaller that takes care of this for you. Tools like rbenv or rvm can also be helpful for installing and switching Ruby versions on your computer.

Install the required libraries

In the Gemfile, you’ll see a Ruby version and two libraries specified:

ruby "2.0.0"

gem "scraperwiki", git: "https://github.com/openaustralia/scraperwiki-ruby.git", branch: "morph_defaults"
gem "mechanize"

This is template code that helps you get started by defining some basic dependencies for your scraper. You can read more about language versions and libraries in the morph.io documentation.

You can use Bundler to manage a Ruby project’s dependencies. Run bundle install on the command line to check the Gemfile and install any libraries (called gems in Ruby) that are required.
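Once bundle install finishes, you can do a quick sanity check that the gems actually load. This is an optional, illustrative sketch (the file name and helper method here are made up for this example, not part of morph.io’s template):

```ruby
# Optional sanity check (illustrative, not part of the template scraper).
# Save as e.g. check_gems.rb and run with: bundle exec ruby check_gems.rb
def library_status(libs)
  libs.map do |lib|
    begin
      require lib
      [lib, :ok]
    rescue LoadError
      [lib, :missing] # if you see this, try `bundle install` again
    end
  end.to_h
end

puts library_status(%w[scraperwiki mechanize])
```

Running it through bundle exec matters: it ensures Ruby sees the exact gem versions Bundler installed from your Gemfile.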

So far you’ve set up all your files, cloned them to your machine, and installed the necessary dependencies. In our next post it’ll be time to write your scraper!

Posted in Morph

Matthew and the Digital Transformation Office

The Digital Transformation Office was established in July of this year, by the then Minister, now Prime Minister Malcolm Turnbull, to transform government and make government services simpler, clearer, faster and more humane.

The cornerstone of this huge endeavour is putting users first. Design and build government services for people, not for government, and all kinds of amazing things will happen.

This is a mission that is close to my heart. After all Kat Szuminska and I started the OpenAustralia Foundation with the goal of getting people more actively involved in the political process. This led us to create services with people at the centre.

The creation of the Digital Transformation Office brings a once in a generation opportunity to make things better and I want to help realise this amazing opportunity.

That is why I’m joining the Digital Transformation Office.

In order for the OpenAustralia Foundation to clearly and definitively maintain its independence I will step down as a director of the foundation.

This is certainly not a decision I take lightly. On a personal level I would have loved to stay on as a member of the board but with a much less active day-to-day involvement with the running of the foundation and the development of its projects. However, that would not have been the best thing for the foundation. In very practical terms the foundation needs to be able to praise government when it is doing the right thing and criticise government when it is doing the wrong thing. This is only possible with true independence.

I won’t be disappearing. In fact I still plan to contribute as a volunteer.

The foundation is People.

I want to thank Henare Degan for his unfailing “getting shit done” attitude and approach – nothing is too hard and little is too serious to not laugh about.

I want to thank Kat Szuminska for inciting me at just the right time with some carefully chosen words. She’s the true radical, yet ever-patient, with a keen bullshit detector and always the one to ask the questions that get to the core of a problem.

And to all the volunteers and donors over the years, there are far too many of you to mention here. Thank you for everything. Thank you for your support. Thank you for everything you’ve done.

For me everything started with a talk in 2004 given by Tom Steinberg, Tom Loosemore and Stefan Magdalinski. It was the launch of a new website TheyWorkForYou.com. It was really an accident that I was there. Little did I know where it would lead. Thank you to Tom, Tom and Stef for that.

For inventing what we now know as civic tech I want to thank Julian Todd, Francis Irving, Chris Lightfoot, Matthew Somerville and Tom Steinberg. You have all been a continuing source of inspiration to me. Thank you.

I’m very proud of what we’ve achieved at the OpenAustralia Foundation. We’ve helped millions of Australians connect with their communities, governments and politicians. We’ve made tools to help people create the change they want to see.

With Henare, Kat and Luke, the OpenAustralia Foundation is in excellent hands and I very much look forward to seeing it develop and grow into the future. I can’t wait to see what they will make next!

The foundation, not being part of government and fully independent, is still in a unique position to do things that no one else can do. What we need for Australia is a rich and diverse ecosystem of governmental and non-governmental organisations all working together for the best possible outcome for citizens.

This is one of many reasons why what the OpenAustralia Foundation does is now more important than ever.

Posted in Announcement, OpenAustralia Foundation | 4 Responses

Civic Tech Monthly, September 2015

Welcome to the eighth edition of Civic Tech Monthly. Below you’ll find news and notes about civic tech from Australia and around the world.

As always we’d love to see you at the OpenAustralia Foundation Sydney Pub Meet next Tuesday if you’re in town. Come along and give a lightning talk about something interesting in civic tech you’ve seen or done.

If you know someone who’d like this newsletter, pass it on: http://eepurl.com/bcE0DX.

News and Notes

A massive week for They Vote For You

Australia got a new Prime Minister last week. As the media detailed the step by step drama, tens of thousands of people visited They Vote For You to see how the players had voted in Parliament.

Over 30,000 people have visited They Vote For You since the leadership challenge was announced. This is over three times as many people as on the project’s launch day when it featured prominently in the Guardian Australia. In a week of so many words, it was fantastic to see people looking for some real information about what MPs do in Parliament.

Web scraping workshop success—we’re doing it again in October

We ran our first Introduction to Web Scraping Workshop a few weeks ago and it was a big success.

You can read more about how it went on information graphics designer Kelly Tall’s blog. We also blogged about the things we learned in this first experiment. We’re looking forward to seeing what everyone does with their new scraping skills!

We were really pleased and impressed with how much everyone learned—so we’re doing it all again. We’re still locking down our date and venue, but it will almost certainly be on Sunday, 25th October, near Central in Sydney. There will be 10 places available.

If you want to learn how to scrape structured data for your projects then let us know you’re interested via email or twitter. Then we can contact you when registrations open shortly.

EveryPolitician – 200 countries and counting

For Global Legislative Openness Week 2015 EveryPolitician set the ambitious target to add 66 more nations to their project and pass the 200 mark—and they did it! People from all around the world contributed research and scrapers to hit the goal.

EveryPolitician is an open, free repository of information on politicians from (now) over 200 national parliaments that you can use in your projects.

FOI and eyes wide shut: even public servants want to know

Suelette Dreyfus made a clear case for a more open, reliable Freedom of Information system in Australia to a conference hall full of public servants in August. Read a transcript of the address in The Mandarin.

Dreyfus argues that a more open government isn’t only in the public interest, but can also protect the independence of public servants and foster public trust in government.

Freedominfo.org is creating a deliberative process exemption library

In Freedom of Information law deliberative process exemptions are rules designed to protect the open and frank flow of advice and discussion inside government by restricting it from public access.

Freedominfo.org is creating a library of model examples of these rules and summaries of national laws that you can compare. How do your government’s rules compare to others? What would you change?

If your government isn’t listed yet you can contribute it to the resource.

Economist Joseph Stiglitz discusses the justification for these kinds of public access exemptions and their impact in his 1999 lecture on the importance of transparency to good governance (PDF).

Leadership transition at Open North

Canadian civic tech organisation Open North have a new executive director, Jean-Noé Landry, as founder James McKinney moves on to new things.

We’re looking forward to seeing how Open North continue to grow with a new director and can’t wait to see what James does next.

In this post James shares his current thoughts about where he fits as an individual inside the organisation he founded and looks back on his approach as executive director.

FoxScan

Since foxes were introduced into Australia in 1871, they have caused huge damage to native species. Foxes are a designated pest throughout the country.

FoxScan gives information to people trying to fix this problem. You report sightings of foxes and the damage they cause and Foxscan—rather than just sucking all this data into a black hole—tries to make it available to everyone. It shares some similarities with GrowStuff in that it encourages people to share disparate information and helps practitioners working independently to contribute to a social good.

morph.io got a big new server and continues to grow

A couple of months ago we celebrated the 3000th scraper running on morph.io. In the last week we passed 3600 scrapers and there are now 3000 people using the platform—yikes!

With all this new use morph.io’s server was beginning to struggle. Some scrapers were also requiring more memory than we could allocate them.

After a helpful discussion with some of the most active morph.io users we upgraded to a bigger, more powerful server. You can now scrape bigger documents and run more memory intensive processes, as we increased the memory allowance for your scraper runs from 100 MB to 512 MB.

If morph.io is useful to you, please become a supporter to keep it running and open to all.

Posted in Civic Tech Monthly

A massive week for They Vote For You

Forget what politicians say. What truly matters is what they do. And what they do is vote, to write our laws which affect us all.
They Vote For You

Australia got a new Prime Minister last week. As the media detailed the step by step drama, tens of thousands of people visited and shared They Vote For You to see how the players had actually voted in Parliament.

Well over 30,000 people have visited They Vote For You since the leadership challenge was announced. This is over three times as many people as on the project’s launch day when it featured in the Guardian Australia’s comment section and Datablog.

Most people looked at the new Prime Minister Malcolm Turnbull’s voting record, while others looked up their own representative or the newly promoted ministers when they were announced. Dozens more people have subscribed to be notified when policies that interest them are updated and we even saw one wonderful person start contributing summaries of divisions. The vast majority of people came from Facebook (70%!), and over 90% came from social media more generally.

In a week of so many words, it was fantastic to see people looking for and sharing real information about what MPs do in Parliament. We hope to see more people holding their representatives accountable for their votes on the laws that affect our society.

Henare went on FBi Radio’s Backchat on Saturday morning to talk about the project and why people are looking for the kind of information it provides.

Posted in Media, They Vote For You

Web scraping workshop success—we’re doing it again in October!


Henare from OpenAustralia Foundation with the wonderful attendees of the first Introduction to Web Scraping Workshop, September 2015

Earlier this month we ran our first Introduction to Web Scraping Workshop and it was a big success! We had 6 wonderful attendees. By the afternoon everyone had written a scraper to collect data from the web.

You can read more about how it went on information graphics designer Kelly Tall’s blog. We’re looking forward to seeing what everyone does with their new scraping skills!

We owe a huge thanks to the Media, Entertainment, & Arts Alliance for providing a great venue and to Fleetwood Macciatto for a delicious lunch.

We were really pleased and impressed with how much everyone learned—so we’re doing it all again.

If you want to learn how to scrape structured data for your projects then…

The next Introduction to Web Scraping Workshop is Sunday, 25th October, near Central in Sydney.

Register here

There are 10 places available.

If you have any questions, please email us (contact@oaf.org.au) or contact us on twitter (@OpenAustralia).

Things we learned from our first workshop

Pairing works well

At the OpenAustralia Foundation we’ve found pair programming to be one of the best ways to learn together and transfer knowledge between people.

In planning the workshop we thought pairing would be a good way for people to learn and problem solve, and also that it would be useful to get some experience working in this style.

We paired attendees up to write their scrapers and it was a big success. The pairs had fun working through problems together and were able to remind each other of the techniques they’d been shown when necessary.

We got very positive feedback on this format, so we’ll definitely do it again.

A reference guide would have been useful

We didn’t provide hand-outs with a step by step guide, but rather focused on a detailed, live walk-through of writing a real scraper.

While the live programming was an engaging and realistic intro to scraping, a simple list of methods and techniques could have been a useful aid during and after the workshop. We’ve since had feedback that a resource like this would help avoid easy mistakes after the workshop, such as missing the bundle exec command before running irb or ruby scraper.rb. We’ll prepare something like this for the next event.

Luke drafted a detailed guide to writing a first scraper based on Henare’s demonstration in the workshop—hopefully we can draw on this to create a simpler list of steps and techniques for attendees.

Setup is tricky

Getting your laptop set up for scraping is not straightforward. Only 3 of the 6 attendees managed to get Ruby and the necessary scraping libraries installed for the workshop. This wasn’t a problem on the day because we had already planned to be working in pairs, but it would be better if everyone went home with their laptop ready to run the scraper they had written.

We’re not sure of the best way to handle this problem. We want to keep the workshop focused on scraping rather than get bogged down installing Ruby.

We’re very open to ideas on this. We’ll start by recommending that attendees try a free Ruby on Rails Installfest event that aims to help people get past this often confusing step.

These workshop events are still an experiment for the OpenAustralia Foundation so we’re very open to ideas and keen for feedback.

Remember to get in contact if you want to attend the next workshop in late October as space is limited. You can email us at contact@oaf.org.au.

Posted in Event | 3 Responses

Civic Tech Monthly, August 2015

Welcome to the seventh edition of Civic Tech Monthly. Below you’ll find news and notes about civic tech from Australia and around the world.

It seems like August has been a busy month for civic hackers everywhere because this month we’ve got a bunch of new projects from around the world for you.

As always we’d love to see you at the OpenAustralia Foundation Sydney Pub Meet next Tuesday if you’re in town. Last month’s lightning talks and new venue in Surry Hills were a big success. We’re doing it all again this month so come along and tell us something interesting in civic tech you’ve seen or done.

If you know someone who’d like this newsletter, pass it on: http://eepurl.com/bcE0DX.

News and Notes

Introduction to Web Scraping Workshop in Sydney

Join us in Sydney in two weeks’ time to learn how to write a web scraper. You can use a scraper to quickly grab all kinds of information for analysis and processing. Scrapers are the backbone of OpenAustralia Foundation projects such as They Vote For You and PlanningAlerts—we’re always finding new ways that scrapers can help, and we’re keen to share this skill. You can find out more and register via our blog post.

This is a hands-on, half-day workshop and by the afternoon you will have written a web scraper. Tickets are $295 and the workshop is in Redfern, Sydney on Friday the 4th of September. You will need a laptop and some programming experience to attend but you don’t have to be an expert. If you know what a variable, loop and array are, then this is for you.

This is a new experiment for the OpenAustralia Foundation. We’re trying out a new approach to help people who want to make their own projects. If you can’t make the workshop, but know someone who might be interested, we’d really appreciate if you could pass this along.

Thank you Tom

It was Tom Steinberg’s last day at mySociety earlier this month. He’s made a huge contribution to civic tech everywhere. It’s been our absolute pleasure to work with him over the years and we’d like to especially thank him for being generous with his advice whenever we asked. We can’t wait to see what he gets up to after his chillax and we wish him very well.

We also love this delightfully mad post from the excellent Francis Irving celebrating Tom and all the other amazing people that have contributed to mySociety over the years – “Those brief moments when winning seems possible”.

Soft launch of Freedom of Information Portal for Malaysia

You can now use Malaysia’s first Freedom of Information portal to ask their governments for information. Of course it’s built on Alaveteli!

By law, we have Freedom of Information Enactments (FOIE) in the following two states: Selangor & Penang. However, we have also added the avenue to request for information from Public Authorities not covered by FOI in order to gauge what citizens want to know at the Federal level.

This is a nice example of one of the Civic Patterns we like to follow at OAF: “When designing a service, make your process reflect the legal rules that you wish existed, instead of those that do. Reality will catch up.”

Yo Quiero Saber 2015

This month Argentines got to compare the basic positions of candidates using a simple game. First you have to state your position on an issue, then you see the positions of the candidates. This dynamic leads you through to see where you stand on a range of issues compared to all the candidates.

The feedback so far is that people felt informed and used it to pick candidates that better represented their views. Martín Szyszlican, one of the creators, says they’ll keep running and developing the project, and that the big challenge is to reach more people. This time they reached over 1% of voters, doubling their 2013 audience, but not enough to impact election results.

Help work out the gender balance of 100 parliaments

Which country has the highest proportion of women in parliament? Do women vote differently on issues like defence, the environment, or maternity benefits? Exactly when did women come into power in different countries, and did their presence change the way the country was run? Frustratingly, these are questions for which it’s difficult to provide an answer, because the objective data just isn’t there … So we created Gender Balance, an easy game that crowd-sources gender data across every parliament in the world.

Believe it or not, you can have fun, learn about your parliament, and generate useful open data all at the same time. Get to it!

OpenPlanning Launches in Hampshire, UK

Hampshire Hub have teamed up with mySociety to prototype a tool to demystify the local planning process. This is a new take on making local planning records more useful. You can read more in Ben Nickolls’ blog post on the project.

In excellent open source fashion, this prototype is forked from the OpenAustralia Foundation’s PlanningAlerts, which in turn is based on the UK’s original planningalerts.com. As civicpatterns.org says, “Don’t Reinvent The Wheel”.

It looks like the team is considering how to merge some of their upgrades back into PlanningAlerts.org.au, which would be truly fantastic.

Dare to talk about your civic tech mistakes — submit your failure story

In a small room full of friends, talking about failure should be easy. But for some reason — maybe because of the relative novelty of using more expensive technologies for social innovation — people working around civic technology are not used to admitting when their projects don’t work. And while this sort of dishonesty might help with short-term opportunities (especially when it comes to funding), it has a serious effect on long-term sustainability: We cannot learn from our mistakes.

Tell the Sunlight Foundation about your unsuccessful projects so they can work out the patterns and help us all improve our work.

CKAN meetup and Hacks/Hackers in Sydney

In more Sydney event news, after the scraping workshop we can walk down to the CKAN meetup. There’ll be lots of people with experience working with and publishing open data.

Hacks/Hackers is also back on in Sydney. The next meetup, Data Journalism and Investigations in Political Reporting, is on Wednesday evening, September 16. You should give a lightning talk if you’re in town.

How far does your MP tread the party line?

You can greatly improve or reduce the usability of a resource just by changing its text. This is an interesting post about the impact of wording in a civic tech site and how the TheyWorkForYou team approached improving it.

101 web scraping and research tasks for the data journo (or civic hacker)

If you want to learn scraping or polish your skills (and can’t make it to our workshop ;-) ) then here’s 101 tasks to keep you busy.

Help develop the Influence Mapping Toolbox

You can help the development of new tools to map the role of personal ties and economic interests in politics. If you have an influence mapping project, or are interested in getting one started, you can help the team with their initial research and shape development of their new tools. You’ll also find lots of interesting discussion on the topic of influence mapping at the Influence Mapping Google Group.

Posted in Civic Tech Monthly