29–31 May 2019
Sykia, Peloponnese, Greece
Europe/Athens timezone

Sharing data with ODI

29 May 2019, 16:00
15m
Sykia, Peloponnese, Greece

Xylokastro, Corinthia, Greece 20400

Speaker

Dr Daniel Heynderickx

Description

ESA's Open Data Interface (ODI) is a data management system built on a MySQL (or MariaDB) database server. The system consists of two components: a server part that handles the downloading and storage of time series data files, performs pre- and/or post-processing, and adds geomagnetic coordinates on request; and a collection of client applications that retrieve data from the database in several programming languages (PHP, Python, IDL, MATLAB) using a common API syntax. In addition, two Python REST clients are available: one uses the HAPI (Heliophysics API) specification (now endorsed by a COSPAR Resolution), the other allows more tailored queries using WHERE and GROUP BY SQL clauses. The ODI server and client packages can be downloaded from the ESA European Space Software Repository (https://essr.esa.int/).
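
A minimal sketch of the common-API access pattern, assuming a generic HAPI endpoint: the /hapi/info and /hapi/data requests and their parameters follow the current HAPI specification (earlier versions use id, time.min and time.max instead), but the server URL and dataset identifier below are placeholders, and this is not the ODI client code itself.

    # Query a (hypothetical) HAPI server for a dataset description and a
    # slice of time series data; only standard HAPI request parameters are used.
    import requests

    SERVER = "https://example.org/hapi"    # placeholder HAPI endpoint
    DATASET = "example/dataset"            # placeholder dataset identifier

    # Dataset description: parameter names, units, available time range.
    info = requests.get(f"{SERVER}/info", params={"dataset": DATASET}).json()
    print([p["name"] for p in info["parameters"]])

    # The data themselves, as CSV, for one day in May 2019.
    resp = requests.get(
        f"{SERVER}/data",
        params={
            "dataset": DATASET,
            "start": "2019-05-01T00:00:00Z",
            "stop": "2019-05-02T00:00:00Z",
            "format": "csv",
        },
    )
    for line in resp.text.splitlines()[:5]:
        print(line)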

Data sharing is becoming ever more challenging due to growing data volumes, the use of a variety of file formats (e.g. CSV, CDF, netCDF, PDS, FITS), and non-strict implementation of metadata standards (e.g. ISTP/CDF, SPASE). Data users have to download potentially large data volumes, interpret and/or complement the metadata, and write data file access routines for each individual dataset.

For many applications, data access through standardised APIs to remote data stores would be sufficient, bypassing the need to build local file downloading, interpretation and processing. In addition, all datasets behind an API can be accessed in the same way, which greatly simplifies building local application software. On the data server side, using SQL databases (as ODI does) simplifies data processing, selection and retrieval, but other solutions can be used as well, since the remote user is exposed only to the API.

Additional aspects of data sharing include:

  • making available to the community processed datasets, e.g. after applying de-spiking, cleaning or calibration algorithms;
  • selection and processing of data prior to downloading, e.g. data averages or data binned in geomagnetic coordinates, thus shifting the processing burden to the remote service and vastly reducing download volumes (see the GROUP BY sketch after this list);
  • strict adherence to metadata standards;
  • facilitation of implementation of mirror sites.
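The server-side binning and averaging mentioned above can be expressed with the same WHERE/GROUP BY machinery that the second ODI REST client exposes. The sketch below is self-contained and uses Python's built-in sqlite3 module purely for illustration; the table and column names are hypothetical, and a real deployment would run the equivalent query on the remote MySQL/MariaDB store so that only the binned rows are downloaded.

    import sqlite3

    # Toy in-memory table standing in for a full-resolution time series.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE flux_ts (epoch TEXT, flux REAL)")
    con.executemany(
        "INSERT INTO flux_ts VALUES (?, ?)",
        [("2019-05-01T00:10:00", 1.0),
         ("2019-05-01T00:50:00", 3.0),
         ("2019-05-01T01:20:00", 5.0)],
    )

    # Hourly averages via GROUP BY: a remote service evaluating this query
    # returns a handful of binned rows instead of the full time series.
    rows = con.execute(
        """
        SELECT substr(epoch, 1, 13) AS hour_bin,
               AVG(flux)            AS mean_flux,
               COUNT(*)             AS n_samples
        FROM flux_ts
        GROUP BY hour_bin
        ORDER BY hour_bin
        """
    ).fetchall()
    for hour_bin, mean_flux, n_samples in rows:
        print(hour_bin, mean_flux, n_samples)
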

We propose the following topics for the discussion session:

  • the HAPI standard does not allow for qualified queries (for instance, WHERE or GROUP BY statements), so it also does not allow pre-processing to be requested; should the standard be extended, or should existing (e.g. ODI) or future non-HAPI APIs use standardised pre-processing keywords (see the sketch after this list)?
  • the emerging de facto metadata standard is SPASE (now endorsed by a COSPAR Resolution); however, the metadata dictionary needs to be extended (e.g. for radiation effects quantities), and the ontology currently does not allow easy definitions of non-scalar quantities (e.g. energy spectra);
  • releasing post-processed datasets after peer review, and/or inclusion of processing algorithms in data store software.
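
As a purely hypothetical illustration of the first discussion point, the sketch below shows what a standardised pre-processing keyword might look like if a HAPI-style data request were extended; the "average" parameter, server URL and dataset identifier do not exist in the current HAPI specification and are not part of ODI's API.

    import requests

    # Hypothetical extended request: everything except "average" follows the
    # standard HAPI /data request pattern; "average" is the imagined
    # pre-processing keyword asking the server for hourly means.
    resp = requests.get(
        "https://example.org/hapi/data",       # placeholder endpoint
        params={
            "dataset": "example/dataset",      # placeholder dataset identifier
            "start": "2019-05-01T00:00:00Z",
            "stop": "2019-05-02T00:00:00Z",
            "average": "PT1H",                 # hypothetical keyword: 1-hour server-side averaging
            "format": "csv",
        },
    )
    print(resp.url)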

Primary authors

Dr Daniel Heynderickx
Hugh Evans