Tuesday, 29 July 2014

GovHack 2014 post-mortem

Earlier this month, I attended the GovHack 2014 hackathon, along with thousands of other fellow hackers all across the country. This was my first GovHack, but not my first hackathon. My previous hackathon was RHoK and having no idea how GovHack would turn out, I entered the GovHack event with a RHoK-based mindset of how I would expect this hackathon to turn out.

Bad idea.

I learned very quickly there was a major difference between RHoK and GovHack. Going into RHoK, you have an idea about what solutions you will get to hack on over the weekend as problem owners are present to pitch their ideas to the audience of prospective hackers. With GovHack, you need an idea about what solution you want to hack on over the weekend, all they were going to provide was the various open data and APIs. What on earth are we going to build?



So after losing nearly half the weekend to analysis paralysis, our team (named CreativeDrought, wonder why?) agreed with my suggestion of just building a MapGuide-based mashup of various open datasets, most notably, the VicRoads Crash Stats dataset and related transportation data. I obviously knew MapGuide inside-and-out and its capabilities to have a level of confidence that with the remaining weekend we should still be able to crank out some sort of workable solution. At the very least, we'd have a functional interactive map with some open data on it.

And that's the story of our CrashTest solution in a nutshell. It's a Fusion application, packed to the gills with out-of-the-box functionality from its rich array of widgets (including Google StreetView integration). The main objective of this solution was to allow users to view and analyse crash data, sliced and diced along various age, gender, vehicle type and various socio-economic parameters.



MapGuide's rich out-of-the-box capabilities, Maestro's rapid authoring functionality and GDAL/OGR's ubiquitous data support greatly helped us. I knew with this trio of tools, that we could assemble an application together in the remaining day and a bit left that we had to actually "hack" on something.

Sadly, we only got as far as putting the data on the map for the most part. Our team spent more time frantically trying to massage various datasets via ogr2ogr/Excel/GoogleDocs into something more usable than actually writing lines of code! Seriously VicRoads? Pseudo-AMG? Thank goodness I found the necessary proj4 string for this cryptic coordinate system so that we could re-project a fair chunk of the VicRoads spatial data into a coordinate system that better reflects the world we want to mash this data up with!

Still, our "solution" should hopefully still open up a lot of "what if" scenarios. Imagine looking at a cluster of accident events, not being able to ascertain any real patterns or correlation and so you then fire up the StreetView widget and lo-and-behold, Google StreetView providing additional insights that a birds-eye view could not. Also imagine the various reporting and number crunching possibilities that are available by tapping into the MapGuide API. Imagine what other useful information you could derive if we had more time to put up additional useful datasets. We didn't get very far on any of the above ideas, so just imagine such possibilities if you will :)

So here's our entry page if you want to have a look. It includes a working demo URL to a Amazon EC2 hosted instance of MapGuide. Getting acquainted with Amazon Web Services and putting MapGuide up there was an interesting exercise and much easier than I thought it would be, though I didn't have enough time to use the AWS credits I redeemed over the weekend to momentarily lift this demo site out of the free usage tier range performance-wise. Still, the site seems to perform respectably well on the free usage tier.

Also on that page is a link to a short video where we talk about the hack. Please excuse the sloppy editing, it was obviously recorded in haste in a race against time. Like the solution and/or the possibilities it can offer? Be sure to vote on our entry page.

Despite the initial setbacks, I was happy with what we produced given the severely depleted time constraints imposed on us. I think we got some nice feedback demo-ing CrashTest in person at the post-mortem event several days later, which is always good to hear. Good job team!


So what do I think could be improved with GovHack?
  • Have a list of hack ideas (by participants who actually have some ideas) up some time before the hackathon starts. This would facilitate team building, letting participants with the skills, but without ideas easily gravitate towards people/teams with the ideas.
  • The mandatory video requirement for each hack entry just doesn't work in its current form. Asking teams to produce their own videos puts lots of unnecessary stress on teams, who not only have to come up with the content for their video, but have to also deal with the logistics of producing said video. I would strongly prefer that teams who can/want to make their own video do so, while other teams can just do a <= 3 minute presentation and have that be recorded by the GovHack organisers. Presentations also lets teams find out how other teams fared over the weekend. While everyone else in the ThoughtWorks Melbourne office was counting down to the end of the hackathon, I was still frantically trying to record my lines and trying not to flub them! I raided the office fridge for whatever free booze that remained just to calm myself down afterwards. I don't want to be in that situation ever again!
  • Finally, the data itself. So many "spatial" datasets as CSV files! So many datasets with no coordinates, but have addresses, horribly formatted addresses, adding even more hoops to geocode them. KML/KMZ may be a decent consumer format, but it is a terrible data source format. If ogr2ogr can't convert your dataset, and requires a manual intervention of QGIS to fix it, then perhaps it's better to use a different spatial data format. Despite my loathing of its limitations, SHP files would've been heavily preferred for all of the above cases. I've made my thoughts known on the GovHack DataRater about the quality of some of these datasets we had to deal with and got plenty of imaginary ponies in the process.
Despite the above points, the event as a whole was a lot of fun. Thanks to the team (Jackie and Felicity) for your data wrangling and video production efforts.


Also thanks to Jordan Wilson-Otto and his flickr photostream where I was able to get some of these photos for this particular post.

Would I be interested in attending the 2015 edition of GovHack? Given I am now armed with 20/20 hindsight, yes I would!

No comments: