Sensor Analysis Framework
When dealing with sensor data, especially from low cost sensors, a great part of the effort needs to be dedicated to data analysis. After careful data collection, this stage of our experiments is fundamental to extract meaningful conclusions and prepare reports from them. For this reason, we have developed a data analysis framework that we call the Sensor Analysis Framework. In this section, we detail how this framework is built, how to install it, and how to make the most of it.
We care for open science
The framework is written in Python and can be run using Jupyter Notebooks or Jupyter Lab. It is intended to provide a state-of-the-art data analysis environment, adapted for use within the Smart Citizen Project, but easily expandable to other use cases. The ultimate purpose of the framework is to allow for reproducible research by providing a set of tools that are replicable and expandable among researchers and users alike, contributing to FAIR data principles.
The framework integrates with the Smart Citizen API and helps analyse large amounts of data efficiently. It also includes functionality to generate reports in HTML or PDF format, and to publish datasets and documents to Zenodo.
More familiar with R?
How we use it
The main purpose of the framework is to make our lives easier when dealing with various sources of data. Let's see different use cases:
Get sensor data and visualise it
This is probably the most common use case: exploring data in a visual way. The framework allows downloading data from the Smart Citizen API or other sources, as well as loading local CSV files. Several data exploration options are then readily available, and you are not limited to them thanks to the great visualisation tools available in Python. Finally, you can generate HTML or PDF reports to share the results.
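As a rough illustration of the "load local CSV files" step, the sketch below reads a small CSV export and averages it per hour before any plotting. It uses only the Python standard library and does not call the framework itself; the column names (`time`, `temperature`), the sample values and the helper names `load_readings` and `hourly_mean` are assumptions made for this example.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical CSV export: column names and values are made up for illustration.
CSV_TEXT = """time,temperature
2021-01-01T00:00:00,20.0
2021-01-01T00:30:00,22.0
2021-01-01T01:00:00,21.0
2021-01-01T01:30:00,23.0
"""


def load_readings(handle, column):
    """Return a list of (timestamp, value) pairs read from a CSV file handle."""
    reader = csv.DictReader(handle)
    return [(datetime.fromisoformat(row["time"]), float(row[column])) for row in reader]


def hourly_mean(readings):
    """Average all readings that fall within the same hour."""
    buckets = defaultdict(list)
    for timestamp, value in readings:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


# io.StringIO stands in for open("my_export.csv", newline="") on a real file.
readings = load_readings(io.StringIO(CSV_TEXT), "temperature")
hourly = hourly_mean(readings)
```

The resulting `hourly` dictionary can then be handed to whichever plotting library you prefer.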
Organise your data in tests
Keeping many different sensors organised and traceable can be difficult at times. For this, we created the concept of a test, which groups a set of devices, potentially from various sources. This is convenient since metadata can be added to the test instance describing, for instance, what was done, the calibration data for the device, any necessary preprocessing of the data, etc. The test can later be loaded in a separate analysis session, modified or expanded, keeping all the data findable.
Some example metadata that can be stored would be:
- Test Location, date and author
- Kit type and reference
- Sensor calibration data or reference
- Availability of reference equipment measurement and type
A brief schema of the test structure is specified below:
Follow the guide to organize your data in tests.
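To make the idea of a test more concrete, here is a minimal sketch of how a group of devices and its metadata could be described and saved for a later session. It is not the framework's actual data model: the class names `SensorTest` and `Device`, all field names and the example values are hypothetical, chosen only to mirror the metadata listed above.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Device:
    """One device in a test (hypothetical fields)."""
    name: str
    source: str                      # where the data comes from, e.g. "api" or "csv"
    kit_type: str = ""
    calibration: dict = field(default_factory=dict)


@dataclass
class SensorTest:
    """A group of devices plus descriptive metadata (hypothetical fields)."""
    name: str
    location: str
    date: str
    author: str
    has_reference: bool = False      # whether reference equipment data is available
    devices: list = field(default_factory=list)

    def to_json(self):
        """Serialise the test, including its devices, to a JSON string."""
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text):
        """Rebuild a test from the JSON produced by to_json."""
        data = json.loads(text)
        data["devices"] = [Device(**device) for device in data["devices"]]
        return cls(**data)


# Made-up example values.
test = SensorTest(
    name="rooftop_comparison",
    location="Barcelona",
    date="2021-01-01",
    author="Jane Doe",
    has_reference=True,
    devices=[Device(name="kit_01", source="api", kit_type="example kit",
                    calibration={"slope": 2.0, "intercept": 5.0})],
)
restored = SensorTest.from_json(test.to_json())
```

Storing the description as plain JSON keeps it readable and easy to reload, modify or extend in a separate session.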
Clean sensor data
Sensor data never comes clean and tidy in the real world. For this reason, data can be cleaned with algorithms ranging from simple to fairly complex before further processing. Several functions are already implemented (filtering with convolution, Kalman filters, anomaly detection, ...), and more can be added in the source files.
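Two of the techniques mentioned above can be sketched in a few lines of plain Python: smoothing by convolution with a uniform kernel (a moving average), and a simple anomaly check based on the median absolute deviation. This is an illustrative stand-in, not the framework's own implementation; the function names, the 3.5 threshold and the sample readings are assumptions.

```python
from statistics import median


def moving_average(values, window=3):
    """Smooth values by convolving them with a uniform kernel.

    Only full windows are used, so the output has len(values) - window + 1 points.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    kernel = [1.0 / window] * window
    return [
        sum(k * v for k, v in zip(kernel, values[i:i + window]))
        for i in range(len(values) - window + 1)
    ]


def flag_anomalies(values, threshold=3.5):
    """Flag points whose modified z-score exceeds the threshold.

    The score is based on the median absolute deviation (MAD), which is less
    sensitive to the outliers themselves than a mean and standard deviation.
    """
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        return [False] * len(values)
    return [0.6745 * abs(v - centre) / mad > threshold for v in values]


# Made-up temperature readings with one obvious spike.
raw = [20.1, 20.3, 20.2, 85.0, 20.4, 20.2, 20.3]
flags = flag_anomalies(raw)
cleaned = [v for v, bad in zip(raw, flags) if not bad]
smoothed = moving_average(cleaned, window=3)
```

Removing flagged points before smoothing avoids smearing a spike across its neighbours.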
Model sensor data
Low cost sensor data needs calibration, using regression algorithms of varying complexity. Sometimes a simple linear regression is enough, but this is not always the case. Sensors generally present non-linearities, and linear models might not be the best at handling the data robustly. For this, a set of models is already implemented, using the power of common statistics and machine learning frameworks such as scikit-learn, TensorFlow, Keras, and statsmodels.
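The simplest case, a linear calibration against reference equipment, can be sketched as an ordinary least squares fit. The example below is written by hand with the standard library rather than with the framework's models or the libraries named above, and the raw and reference values are invented for illustration.

```python
from math import sqrt
from statistics import mean


def fit_linear(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_mean, y_mean = mean(x), mean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def rmse(predicted, actual):
    """Root mean squared error between two equally long sequences."""
    return sqrt(mean((p - a) ** 2 for p, a in zip(predicted, actual)))


# Made-up data: raw low cost sensor output and co-located reference readings.
raw = [10.0, 20.0, 30.0, 40.0, 50.0]
reference = [24.0, 46.0, 65.0, 84.0, 106.0]

slope, intercept = fit_linear(raw, reference)
calibrated = [slope * r + intercept for r in raw]
error = rmse(calibrated, reference)
```

If the error stays large or the residuals show a clear pattern, that is a hint that the sensor response is non-linear and a more flexible model is worth trying.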
Guidelines on sensor development
Check our guidelines on sensor deployment to see why this is important in some cases.
Automating all these tools can be very handy at times, since we want to spend less time programming analysis tools than actually doing analysis. Tasks can be programmed in batches to be processed automatically by the framework. For instance, some interesting use cases could be:
- Downloading data from many devices, processing it (for example, cleaning it) and exporting it to .csv
- Downloading data, generating plots, extracting metrics and generating reports for many devices
- Testing calibration models with different hyperparameters, modeling approaches and datasets
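The first use case in the list (get data for many devices, clean it, export it to .csv) can be sketched as a list of steps applied to each device in turn. This does not show how the framework defines its batch tasks: the device names, the in-memory readings standing in for downloaded data, and the helpers `drop_missing`, `to_csv` and `run_batch` are all hypothetical.

```python
import csv
import io

# Hypothetical in-memory readings standing in for data downloaded per device.
DEVICES = {
    "device_a": [20.1, None, 20.3, 20.2],
    "device_b": [18.0, 18.4, None, None],
}


def drop_missing(values):
    """Cleaning step: discard missing readings."""
    return [v for v in values if v is not None]


def to_csv(values):
    """Export step: render the readings as CSV text with an index column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["index", "value"])
    writer.writerows(enumerate(values))
    return buffer.getvalue()


def run_batch(devices, steps):
    """Apply every step, in order, to each device and collect the results."""
    results = {}
    for name, values in devices.items():
        for step in steps:
            values = step(values)
        results[name] = values
    return results


exports = run_batch(DEVICES, steps=[drop_missing, to_csv])
```

Because each step is just a function, adding a new one (a filter, a metric, a plot) means appending it to the `steps` list rather than rewriting the loop.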
Check the guides
One important aspect of our research is sharing the data so that others can work on it, build on top of our results, validate the conclusions or simply disseminate the work done. For this, integration with Zenodo is provided to share datasets and reports: