This post is part of a series of posts that provide step-by-step instructions on how to write a simple web scraper using Ruby on morph.io. If you find any problems, let us know in the comments so we can improve these tutorials.
In the last post we started writing our scraper and gathering some data. In this post we’ll expand our scraper to get more of the data we’re after.
So now that you’ve got the title for the first member, get the electorate (the place the member is ‘member for’) and party.
Looking at the page source again, you can see this information is in the first and second <dd> elements in the member’s <li>.
<li>
<p class='title'>
<a href="http://www.aph.gov.au/Senators_and_Members/Parliamentarian?MPID=WN6">
The Hon Ian Macfarlane MP
</a>
</p>
<p class='thumbnail'>
<a href="http://www.aph.gov.au/Senators_and_Members/Parliamentarian?MPID=WN6">
<img alt="Photo of The Hon Ian Macfarlane MP" src="http://parlinfo.aph.gov.au/parlInfo/download/handbook/allmps/WN6/upload_ref_binary/WN6.JPG" width="80" />
</a>
</p>
<dl>
<dt>Member for</dt>
<dd>Groom, Queensland</dd>
<dt>Party</dt>
<dd>Liberal Party of Australia</dd>
<dt>Connect</dt>
<dd>
<a class="social mail" href="mailto:Ian.Macfarlane.MP@aph.gov.au"
target="_blank">Email</a>
</dd>
</dl>
</li>
Get the electorate and party by first getting an array of the <dd> elements and then selecting the one you want by its index in the array. Remember that [0] is the first item in an Array.
Try getting the data in your irb session:
>> page.at('.search-filter-results').at('li').search('dd')[0].inner_text
=> "Groom, Queensland"
>> page.at('.search-filter-results').at('li').search('dd')[1].inner_text
=> "Liberal Party of Australia"
Then add the code to expand your member object in your scraper.rb:
member = {
title: page.at('.search-filter-results').at('li').at('.title').inner_text.strip,
electorate: page.at('.search-filter-results').at('li').search('dd')[0].inner_text,
party: page.at('.search-filter-results').at('li').search('dd')[1].inner_text
}
Save and run your scraper using bundle exec ruby scraper.rb and check that your object includes the attributes with values you expect.
OK, now you just need the url for the member’s individual page. Look at that source code again and you’ll find it in the href of the <a> inside the <p> with the class title.
In your irb session, first get the <a> element:
>> page.at('.search-filter-results').at('li').at('.title a')
=> #<Nokogiri::XML::Element:0x3fca485cfba0 name="a" attributes=[#<Nokogiri::XML::Attr:0x3fca48432a18 name="href" value="http://www.aph.gov.au/Senators_and_Members/Parliamentarian?MPID=WN6">] children=[#<Nokogiri::XML::Text:0x3fca4843b5c8 "The Hon Ian Macfarlane MP">]>
You get a Nokogiri XML Element with one attribute. The attribute has the name “href” and the value is the url you want. You can use the attr() method here to return this value:
>> page.at('.search-filter-results').at('li').at('.title a').attr('href')
=> "http://www.aph.gov.au/Senators_and_Members/Parliamentarian?MPID=WN6"
You can now add this final attribute to your member object in scraper.rb:
member = {
title: page.at('.search-filter-results').at('li').at('.title').inner_text.strip,
electorate: page.at('.search-filter-results').at('li').search('dd')[0].inner_text,
party: page.at('.search-filter-results').at('li').search('dd')[1].inner_text,
url: page.at('.search-filter-results').at('li').at('.title a').attr('href')
}
Save and run your scraper file to make sure all is well. This is a good time to do another git commit to save your progress.
Now you’ve written a scraper to get information about one member of Australian Parliament. It’s time to get information about all the members on the first page.
Currently you’re using page.at('.search-filter-results').at('li') to target the first list item in the members list. You can adapt this to get every list item using the search() method:
page.at('.search-filter-results').search('li')
Use a Ruby each loop to run your code to collect and print your member object once for each list item.
page.at('.search-filter-results').search('li').each do |li|
member = {
title: li.at('.title').inner_text.strip,
electorate: li.search('dd')[0].inner_text,
party: li.search('dd')[1].inner_text,
url: li.at('.title a').attr('href')
}
p member
end
Save and run the file and see if it collects all the members on the page as expected. Now you’re really scraping!
You still don’t have all the members though: they are split over 3 pages and you only have the first. In our next post we’ll work out how to deal with this pagination.
Who comments in PlanningAlerts and how could it work better?
In our last two quarterly planning posts (see Q3 2015 and Q4 2015), we’ve talked about helping people write to their elected local councillors about planning applications through PlanningAlerts. As Matthew wrote in June, “The aim is to strengthen the connection between citizens and local councillors around one of the most important things that local government does which is planning”. We’re also trying to improve the whole commenting flow in PlanningAlerts.
I’ve been working on this new system for a while now, prototyping and iterating on the new comment options and folding improvements back into the general comment form so everybody benefits.
About a month ago I ran a survey with people who had made a comment on PlanningAlerts in the last few months. The survey went out to just over 500 people and we had 36 responders, roughly 7% and about the same percentage turn-out as our PlanningAlerts survey at the beginning of the year (6% from 20,000). As you can see, the vast majority of PlanningAlerts users don’t currently comment.
We’ve never asked users about the commenting process before, so I was initially trying to find out some quite general things:
The responses include some clear patterns and have raised a bunch of questions to follow up with short structured interviews. I’m also going to have these people use the new form prototype. This is to weed out usability problems before we launch this new feature to some areas of PlanningAlerts.
Here are some of the observations from the survey responses:
Older people are more likely to comment in PlanningAlerts
We’ve now run two surveys of PlanningAlerts users asking them roughly how old they are. The first survey was sent to all users; this recent one went only to people who had recently commented on a planning application through the site.
Compared to the first survey to all users, responders to the recent commenters survey were relatively older. There were fewer people in their 30s and 40s and more in their 60s and 70s. Older people may be more likely to respond to these surveys generally, but we can still see from the different results that commenters are relatively older.
Knowing this can help us better empathise with the people using PlanningAlerts and make it more usable. For example, there is currently a lot of very small, grey text on the site that is likely not noticeable or comfortable to read for people with diminished eyesight, and almost everybody’s eyesight gets at least a little worse with age. Knowing that this could be an issue for lots of PlanningAlerts users makes improving the readability of text a higher priority.
There’s a good understanding that comments go to planning authorities, but not that they go to neighbours signed up to PlanningAlerts
To “Who do you think receives your comments made on PlanningAlerts?” 86% (32) of responders checked “Local council staff”. Only 35% (13) checked “Neighbours who are signed up to PlanningAlerts”. Only one person thought their comments also went to elected councillors.
There seems to be a good understanding amongst these commenters that their comments are sent to the planning authority for the application, but not that they go to other people in the area signed up to PlanningAlerts. They were also very clear that their comments did not go to elected councillors.
In the interviews I want to follow up on this and find out if people are positive or negative about their comments going to other locals. I personally think it’s an important part of PlanningAlerts that people in an area can learn about local development, local history and how to impact the planning process from their neighbours. It seems like an efficient way to share knowledge, a way to strengthen connections between people and to demonstrate how easy it is to comment. If people are negative about this then what are their concerns?
“I have no idea if the comments will be listened to or what impact they will have if any”
There’s a clear pattern in the responses that people don’t think their comments are being listened to by planning authorities. They also don’t know how they could find out if they are. One person noted this as a reason why they don’t make more comments.
Giving people simple access to their elected local representatives, and a way to have a public exchange with them, will hopefully provide a lever to increase their impact.
“I would only comment on applications that really affect me”
There was a strong pattern of people saying they only comment on applications that will affect them or that are interesting to them:
How do people decide if an application is relevant to them? Are there common criteria?
Why don’t you comment on more applications? “It takes too much time”
A number of people mentioned that commenting was a time-consuming process, and that this prevented them from commenting on more applications:
What are people’s basic processes for commenting in PlanningAlerts? What are the most time-consuming components of this? Can we save people time?
“I have only commented on applications where I have a knowledge of the property or street amenities.”
A few people mentioned that they feel you should have a certain amount of knowledge of an application or area to comment on it, and that they only comment on applications they are knowledgeable about.
How does someone become knowledgeable about an application? What is the most important and useful information about applications?
Comment in private
A small number of people mentioned that they would like to be able to comment without it being made public.
Suggestions & improvements
There were a few suggestions for changes to PlanningAlerts:
Summing up PlanningAlerts
We also had a few comments that are just nice summaries of what is good about PlanningAlerts. It’s great to see that there are people who understand and can articulate what PlanningAlerts does well:
Next steps
If we want to make using PlanningAlerts an intuitive and enjoyable experience we need to understand the humans at the centre of its design. This is a small step to improve our understanding of the type of people who comment in PlanningAlerts, some of their concerns, and some of the barriers to commenting.
We’ve already drawn on the responses to this survey in updating wording and information surrounding the commenting process to make it better fit people’s mental model and address their concerns.
I’m now lining up interviews with a handful of the people who responded to try and answer some of the questions raised above and get to know them more. They’ll also show us how they use PlanningAlerts and test out the new comment form. This will highlight current usability problems and hopefully suggest ways to make commenting easier for everyone.
Design research is still very new to the OpenAustralia Foundation. Like all our work, we’re always open to advice and contributions to help us improve our projects. If you’re experienced in user research and want to make a contribution to our open source projects to transform democracy, please drop us a line or come down to our monthly pub meet. We’d love to hear your ideas.