Publishing Air Quality Data – Good Data, Well Published
How technical design decisions make AQ+ data useful as well as open

High friction data publishing

One of the aims of Air Quality+ is to investigate issues of publishing open-ended continuous data streams in an open, useful and reusable way.

Currently, raw air quality monitoring data in Sheffield is made available through this site. Visitors can download data for specific time periods and specific sensors, often in order to:

  1. graph similar kinds of data – for example, aggregate the NO2 sensors from across the city.
  2. perform statistical aggregation or processing.

The issue with the current publishing method is that these operations are ‘high friction’ – it takes a great deal of effort to select and download the required sets of readings, and then to use a tool (often a spreadsheet) to manipulate the data into the form required for presentation.

These issues are magnified because this proprietary approach is repeated in every city and region. The operations needed for Sheffield air quality investigations are different from the ones needed for, e.g., Barnsley, Bristol and Brighton. This means that crafting an air quality widget that can be reused across all local authorities (for example, as an HTML fragment that local authority content owners can drop into their content management systems) is essentially not possible at the moment, because each widget needs to be tightly coupled to local proprietary practice.

A standard semantic ontology

Better With Data has chosen to adopt the Semantic Sensor Network (SSN) ontology as the underlying model for our Air Quality+ work. Our hope is to produce a series of adapters and transformations that will convert local proprietary practice into an easily adopted common standard. Once data has been transformed in this manner, the SPARQL queries used to interrogate it could be reused against the data of any local authority that publishes in conformance with the SSN standard.
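As a rough illustration of what one such adapter might look like, the sketch below converts a locally formatted CSV export into generic subject-predicate-object triples. The CSV column names, the `example.org` base URI and the predicate names are all hypothetical, chosen only for this example; they are not the AQ+ schema or the actual SSN vocabulary.

```python
import csv
import io
from typing import Iterator, Tuple

Triple = Tuple[str, str, str]

# Hypothetical base URI, used only for illustration.
BASE = "http://example.org/aq/"


def rows_to_triples(csv_text: str, city: str) -> Iterator[Triple]:
    """Turn each row of a (hypothetical) local CSV export into triples."""
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        obs = f"{BASE}{city}/observation/{i}"
        sensor = f"{BASE}{city}/sensor/{row['sensor_id']}"
        # Predicate names below are placeholders, not real ontology terms.
        yield (obs, "madeBySensor", sensor)
        yield (obs, "observedProperty", row["pollutant"])
        yield (obs, "resultTime", row["timestamp"])
        yield (obs, "hasResult", row["value"])


# Made-up sample row in an assumed local format.
sample = "sensor_id,pollutant,timestamp,value\nS1,NO2,2014-03-07T14:00:00,41.5\n"
triples = list(rows_to_triples(sample, "sheffield"))
```

Only this adapter would need to know the local file layout; anything downstream would work against the common triple form.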

AQ+ Ontology
This diagram represents the overall structure of the SSN classes in AQ+.

  • An observation is made by a sensing device.
  • A sensing device has an attached location, measures a set measurement property and uses a specified metric. All of these are represented as URIs, using Wikipedia pages where possible, so users can find detailed explanations of the concepts involved.

This model lets users who want to extract and manipulate data for their own purposes select targeted observations based on location, measurement property or time period. In essence, this means that users can obtain the data they want, in the way they want, directly from the datastore, rather than having to download several separate files, merge and post-process them, and then manipulate the data.
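The structure described above can be sketched in a few lines of Python. This is an in-memory illustration of the relationships (observation, sensing device, location, measurement property, metric), not the AQ+ datastore itself; the class names, field names and the `select` helper are assumptions made for this example, and the metric URI is a placeholder.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class SensingDevice:
    uri: str
    location: str           # URI identifying where the device is
    measured_property: str  # URI identifying what it measures
    metric: str             # URI identifying the metric it reports in


@dataclass(frozen=True)
class Observation:
    made_by: SensingDevice
    time: datetime
    value: float


def select(observations: Iterable[Observation],
           location: Optional[str] = None,
           measured_property: Optional[str] = None,
           start: Optional[datetime] = None,
           end: Optional[datetime] = None) -> List[Observation]:
    """Keep observations matching every criterion that was supplied."""
    selected = []
    for obs in observations:
        if location is not None and obs.made_by.location != location:
            continue
        if (measured_property is not None
                and obs.made_by.measured_property != measured_property):
            continue
        if start is not None and obs.time < start:
            continue
        if end is not None and obs.time >= end:  # end is exclusive
            continue
        selected.append(obs)
    return selected


# Illustrative values only.
NO2 = "https://en.wikipedia.org/wiki/Nitrogen_dioxide"
SHEFFIELD = "https://en.wikipedia.org/wiki/Sheffield"
device = SensingDevice("http://example.org/sensor/1", SHEFFIELD, NO2,
                       "http://example.org/metric/placeholder")
observations = [
    Observation(device, datetime(2014, 3, 7, 14, 0), 41.5),
    Observation(device, datetime(2014, 3, 8, 9, 0), 28.0),
]
march_7 = select(observations, measured_property=NO2,
                 start=datetime(2014, 3, 7), end=datetime(2014, 3, 8))
```

Because each criterion is optional, the same helper covers selection by location, by measurement property, by time period, or by any combination of the three.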

Natural experience querying

We make no assumptions about how a user wanting the data will structure their enquiry, but we do try to model the semantic structure closely on the user's real-life understanding and experience – making it easier to form queries and express requirements.

For example, a user wanting hourly average Friday afternoon (between 2pm and 6pm) NO2 readings from within the Sheffield city region, reaching back to the start of measurement, would face a substantial challenge in obtaining the initial data and then processing and manipulating the individual CSV files. With the semantic approach, this data can be selected immediately using a single query – perhaps five minutes' effort for an experienced SPARQL author.
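To make the selection logic of that example concrete, here is a minimal sketch in plain Python rather than SPARQL. It assumes the readings have already been restricted to the Sheffield city region, represents each one as a hypothetical `(measured property, time, value)` tuple, and interprets "hourly average" as one mean per hour of the day across all Fridays; the sample values are made up.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (measured property, time of reading, value).
Reading = Tuple[str, datetime, float]

FRIDAY = 4  # datetime.weekday(): Monday is 0, so Friday is 4


def friday_afternoon_hourly_mean(readings: Iterable[Reading],
                                 prop: str = "NO2") -> Dict[int, float]:
    """Mean value per hour of day, for Friday readings from 14:00 up to 18:00."""
    by_hour = defaultdict(list)
    for measured, when, value in readings:
        if measured == prop and when.weekday() == FRIDAY and 14 <= when.hour < 18:
            by_hour[when.hour].append(value)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


# Made-up readings; 7 and 14 March 2014 were Fridays, 8 March a Saturday.
readings = [
    ("NO2", datetime(2014, 3, 7, 14, 5), 40.0),
    ("NO2", datetime(2014, 3, 14, 14, 30), 50.0),
    ("NO2", datetime(2014, 3, 7, 17, 0), 30.0),
    ("NO2", datetime(2014, 3, 8, 15, 0), 99.0),   # Saturday: excluded
    ("PM10", datetime(2014, 3, 7, 15, 0), 12.0),  # other property: excluded
    ("NO2", datetime(2014, 3, 7, 18, 0), 70.0),   # 6pm is outside the window
]
result = friday_afternoon_hourly_mean(readings)
```

With the sample data, `result` is `{14: 45.0, 17: 30.0}`. A query against the semantic store would express the same three filters (property, day of week, hour range) and the same grouping, without the download and merge steps.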

The best thing, though, is that once written, this query will continue to work. If new sensors are added to the network, their measurements will simply be included in the query output, provided the query is correctly written. There is no need to re-run complex data downloading procedures, or to engage in costly rework to re-engineer the information when the situation on the ground changes. This is because the data is expressed closely to its real-life semantics, rather than as a grid of static data in a single file.
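One way to read "correctly written" is that the query selects by what is measured rather than by naming individual sensors. The toy example below, using a hypothetical in-memory store and made-up values, shows the same selection picking up a newly added sensor without being changed.

```python
from typing import Dict, Tuple

# Hypothetical store: sensor id -> (measured property, latest value).
store: Dict[str, Tuple[str, float]] = {
    "S1": ("NO2", 41.0),
    "S2": ("PM10", 18.0),
}


def latest_by_property(store: Dict[str, Tuple[str, float]],
                       prop: str) -> Dict[str, float]:
    """Select by measured property, never by a hard-coded list of sensors."""
    return {sensor: value
            for sensor, (measured, value) in sorted(store.items())
            if measured == prop}


before = latest_by_property(store, "NO2")

# A new NO2 sensor joins the network; the selection itself is unchanged.
store["S3"] = ("NO2", 37.5)
after = latest_by_property(store, "NO2")
```

Here `before` contains only `S1`, while `after` contains `S1` and `S3`. A selection that listed sensor ids explicitly would have had to be edited to see `S3`.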

Presenting data to end users

Another aim of AQ+ is to investigate meaningful and purposeful ways of presenting open data to end users.

The project is developing the concept of a ‘display network’ capable of presenting open data visualisations of a smart city – from traffic maps, to air quality, walking routes, permit provision – anything we can think of. By adopting the semantic standards, the project has identified an opportunity to create an ‘auto-discovery’ and ‘self-configuring’ display station. In essence, this station will use a published open data catalog to identify datastreams amenable to display using its registered set of display widgets.

Initially, for Air Quality+, this will involve detecting which sensors are available in the air quality network and automatically presenting visualisations of them as options in a kiosk application.
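The matching step behind such auto-discovery could look something like the following sketch: the station walks a catalogue of datastreams and keeps those for which it has a registered widget. The catalogue entries, property labels and widget names are invented for illustration and do not describe the actual AQ+ catalogue format.

```python
from typing import Dict, List, Tuple

# Hypothetical registry: measured property -> widget the station can render.
WIDGET_REGISTRY: Dict[str, str] = {
    "NO2": "pollutant-map",
    "PM10": "pollutant-map",
    "traffic-flow": "traffic-map",
}


def discover_display_options(catalog: List[Dict[str, str]],
                             registry: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair each catalogued datastream with a widget, skipping unsupported ones."""
    options = []
    for entry in catalog:
        widget = registry.get(entry["property"])
        if widget is not None:
            options.append((entry["stream"], widget))
    return options


# Made-up catalogue entries.
catalog = [
    {"stream": "sheffield/no2", "property": "NO2"},
    {"stream": "sheffield/noise", "property": "noise"},  # no widget registered
    {"stream": "sheffield/traffic", "property": "traffic-flow"},
]
options = discover_display_options(catalog, WIDGET_REGISTRY)
```

Registering a new widget, or publishing a new datastream in the catalogue, changes the list of options without any change to the station's own code.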

Sheffield data visualisation display with AQ+ data presented in kiosk mode
The image above shows our initial investigation into a kiosk application. The data comes from the live AQ+ semantic store, but is currently configured in a static manner.

This display is an example of the loose coupling described above. If a new sensor is added to the network, this display will automatically add a new ‘blob’ to the diagram representing that station and measurement. If the area of the map is changed, the spatial extent of the query will change and the sensors that fall within the new map area will be shown.
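The link between the visible map area and the sensors shown can be sketched as a simple bounding-box filter. The station names, the approximate coordinates and the `BBox` type are assumptions for this example; a real store would more likely apply the spatial filter inside the query itself.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Station:
    name: str
    lat: float
    lon: float
    measured_property: str


@dataclass(frozen=True)
class BBox:
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def visible_stations(stations: Iterable[Station], extent: BBox) -> List[Station]:
    """Stations whose position falls inside the current map extent."""
    return [s for s in stations if extent.contains(s.lat, s.lon)]


# Made-up stations with approximate coordinates.
stations = [
    Station("sheffield-centre", 53.38, -1.47, "NO2"),
    Station("sheffield-east", 53.40, -1.40, "NO2"),
    Station("barnsley-centre", 53.55, -1.48, "NO2"),
]

city_view = BBox(south=53.30, west=-1.60, north=53.45, east=-1.35)
wider_view = BBox(south=53.30, west=-1.60, north=53.60, east=-1.35)

shown_in_city = [s.name for s in visible_stations(stations, city_view)]
shown_in_wider = [s.name for s in visible_stations(stations, wider_view)]
```

Widening the extent from `city_view` to `wider_view` brings the third station into view with no other change, which is the behaviour the display relies on when the map is moved or resized.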


Ian Ibbotson

Open source developer and contributor working in libraries, local government, culture and learning.