Linked Open Data for Air Quality: solid progress made to open up app development possibilities

The weekend has seen some solid steps forward with the technical platform collecting air quality data and presenting it as linked open data. Here are some of the highlights:

Recent highlights

We’re now running incremental updates from the official published data. The agents run every 20 minutes and nibble away at the (near) real-time data stream.
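As a rough illustration of what one incremental pass might look like, here is a minimal sketch. It assumes each reading is a dict with a `timestamp` key and that the agent remembers the newest timestamp it has already loaded; the function names and record shape are hypothetical and not taken from the actual agents.

```python
from datetime import datetime, timedelta

POLL_INTERVAL = timedelta(minutes=20)  # interval mentioned in the post


def select_new_readings(readings, last_seen):
    """Return (fresh readings in time order, updated watermark).

    `readings` is an iterable of dicts with a datetime under "timestamp"
    (an assumed record shape). `last_seen` is the newest timestamp that
    has already been loaded.
    """
    fresh = sorted(
        (r for r in readings if r["timestamp"] > last_seen),
        key=lambda r: r["timestamp"],
    )
    watermark = fresh[-1]["timestamp"] if fresh else last_seen
    return fresh, watermark


def next_run(after):
    """Time of the next scheduled pass after a run at `after`."""
    return after + POLL_INTERVAL
```

Keeping a watermark like this means a run that finds nothing new simply leaves the stored state unchanged.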

The advantage of our datagrid is that it’s now much easier to ask questions which cover the whole dataset – and that dataset is extensible. For example, instead of being limited to presenting data about one sensor over a specific period, it is now possible to ask questions of, say, all O2 sensors within a given geographical area.
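To make the "all sensors of one type within an area" question concrete, here is a small sketch in plain Python. On the datagrid itself this would be a query against the triplestore; the in-memory list and the `observes`, `lat` and `lon` field names are assumptions made purely for illustration.

```python
def sensors_in_area(sensors, observed_property, south, west, north, east):
    """Return sensors measuring `observed_property` inside a bounding box.

    `sensors` is a list of dicts with "observes", "lat" and "lon" keys
    (hypothetical field names, not the real data model).
    """
    return [
        s for s in sensors
        if s["observes"] == observed_property
        and south <= s["lat"] <= north
        and west <= s["lon"] <= east
    ]
```

The point is that the filter runs over every loaded sensor at once, rather than over one sensor's history.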

In the process, we’ve found some new friends like @CleanAirLondon. This is quite exciting, as widgets we develop to visualise this data can be applied to any loaded dataset that follows the Semantic Sensor Network ontology. So, if we collect data from any sensor network, any widgets we develop will be immediately applicable. For city dashboard applications, we can develop meaningful visualisations with worldwide applicability.

Anyone interested in writing an agent that will send AQ data to us from their own sensors is very much invited to get in touch!

We’ve also started to clean up some of the Sheffield data, initially with simple steps like making sure we have 4-digit years and a few other validity checks. It has quickly become clear, though, that being able to see all the data in one place makes anomalous data stand out.
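A check of the 4-digit-year kind could look something like the sketch below. It assumes dates arrive as day/month/year text and that 2-digit years below a pivot belong to the 2000s; both the input format and the pivot are assumptions, not a description of the real Sheffield feed.

```python
import re


def expand_two_digit_year(date_text, pivot=70):
    """Normalise a d/m/yy or d/m/yyyy date to dd/mm/yyyy.

    Two-digit years >= `pivot` are read as 19xx, the rest as 20xx
    (an assumed convention). Unrecognised input raises ValueError so
    that bad rows are flagged rather than silently loaded.
    """
    match = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{2}|\d{4})", date_text.strip())
    if match is None:
        raise ValueError(f"unrecognised date: {date_text!r}")
    day, month, year = match.groups()
    if len(year) == 2:
        century = 1900 if int(year) >= pivot else 2000
        year = str(century + int(year))
    return f"{int(day):02d}/{int(month):02d}/{year}"
```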

Next steps

We need to develop data publishing agents which will take the homogeneous Semantic Sensor Network data and turn it into CSV files suitable for re-publication in the Sheffield Socrata repository.
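A publishing agent of this kind might, at its simplest, flatten observations into CSV text. The sketch below assumes the observations have already been fetched as dicts; the column names are hypothetical and would need to match whatever layout the repository expects.

```python
import csv
import io


def observations_to_csv(observations,
                        columns=("sensor", "property", "timestamp", "value")):
    """Serialise observation dicts to CSV text with a header row.

    Keys not listed in `columns` are ignored; missing keys are written
    as empty cells. The column names are illustrative only.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for obs in observations:
        writer.writerow(obs)
    return buffer.getvalue()
```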

We’re developing a data submission app so that community-run sensors can contribute their readings to the data grid, either manually or automatically. This will give us substantially more fidelity in geographic coverage.

There are also some experiments in progress that allow data from the triplestore to be piped directly into the Google visualisation platform.

We’ve also got a challenge modelling mobile sensors in a sensible (and storage-friendly) way.

Some cool stuff

The linked open data is really showing its value. Sensors from the Sheffield health and safety network are tagged with the “platform” “scc_air_quality” (see this SPARQL Query). The agent collecting these readings needs some properties entirely proprietary to the Sheffield City Council network. You’ll see in the query result above some predicates like ‘uri://opensheffield.org/properties#sensorId.’ This facility is incredibly helpful for those writing agents to collect data.

For example, if we wish to ingest official data from Bath, Leeds or London, it is likely that agents for those city platforms will need their own state. This can be happily stored in the database without polluting the more general semantic sensor network ontology.
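One way to picture this separation is to treat agent state as triples whose predicate lives under the agent's own namespace. The sketch below uses the prefix from the predicate quoted earlier; the tuple representation and the function name are assumptions for illustration only.

```python
# Prefix taken from the sensorId predicate quoted in the post.
AGENT_NS = "uri://opensheffield.org/properties#"


def split_agent_state(triples, private_prefix=AGENT_NS):
    """Split (subject, predicate, object) tuples into (shared, private).

    A triple counts as agent-private when its predicate starts with
    `private_prefix`; everything else is treated as general data.
    """
    shared, private = [], []
    for triple in triples:
        target = private if triple[1].startswith(private_prefix) else shared
        target.append(triple)
    return shared, private
```

An agent for another city would use its own prefix in the same way, so its bookkeeping never collides with the shared vocabulary.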

Every measurement gets a URI – this means we can correct, comment, annotate, group and analyse at the lowest level possible.
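As a small sketch of the idea, a measurement URI could be minted from the sensor identifier and the reading's timestamp. The base URI and path layout below are hypothetical; the real scheme used by the datagrid may differ.

```python
from urllib.parse import quote


def measurement_uri(base, sensor_id, timestamp):
    """Build a URI for one measurement from its sensor id and timestamp.

    Both parts are percent-encoded so that spaces, colons and slashes
    cannot break the path. The layout is an assumption for illustration.
    """
    return (
        f"{base.rstrip('/')}/"
        f"{quote(sensor_id, safe='')}/{quote(timestamp, safe='')}"
    )
```

Because the same inputs always give the same URI, a correction or annotation can point at exactly one reading.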

Here are some cool queries for those of you so inclined:

Check back soon for more updates!

Ian Ibbotson

Open source developer and contributor working in libraries, local government, culture and learning.