Towards Capturing Data Curation Provenance using Frictionless Data Package Pipelines

Main Authors: Shepherd, Adam, Rauch, Shannon, Schloer, Conrad, Kinkade, Danie, Ake, Hannah, Biddle, Mathew, Copley, Nancy, Saito, Mak, Wiebe, Peter, York, Amber
Format: Proceeding poster
Published: 2018
Online Access: https://zenodo.org/record/1451679
Contents:
  • Abstract: At domain-specific data repositories, curation that strives to follow the FAIR principles often entails transforming data submissions to improve their understanding and reuse. The Biological and Chemical Oceanography Data Management Office (BCO-DMO, https://www.bco-dmo.org) has been adopting the data containerization specification of the Frictionless Data project (https://frictionlessdata.io) to make its data curation process more efficient. In doing so, BCO-DMO has been using the Frictionless Data Package Pipelines library (https://github.com/frictionlessdata/datapackage-pipelines) to define the processing steps that transform original submissions into final data products. Because these pipelines are defined in a declarative language, they can be serialized into formal provenance data structures using the Provenance Ontology (PROV-O, https://www.w3.org/TR/prov-o/). While some curation steps may still resist easy automation, this method is a step towards reproducible transforms that link the original data submission to its published state in machine-actionable ways, benefiting the research community through transparency in the data curation process.
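The abstract does not show a pipeline or its PROV-O serialization, so the following is only a minimal sketch of the idea under stated assumptions. It assumes a pipeline spec has already been parsed into a dict shaped like a `pipeline-spec.yaml` entry (a `pipeline` list of steps, each with a `run` processor name and optional `parameters`) and maps each step to a `prov:Activity` that uses the previous data state (a `prov:Entity`) and generates the next one. The base IRI `http://example.org/curation/`, the example step names and parameters, and storing parameters as a JSON string in `rdfs:comment` are all hypothetical choices for illustration, not BCO-DMO's actual provenance model.

```python
import json
from urllib.parse import quote

PROV = "http://www.w3.org/ns/prov#"
# Hypothetical base IRI for curation records; not a real BCO-DMO namespace.
BASE = "http://example.org/curation/"


def literal(value: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def pipeline_to_prov(pipeline_id: str, spec: dict) -> str:
    """Serialize a declarative pipeline spec as PROV-O triples in Turtle.

    Each step becomes a prov:Activity that uses the current data state and
    generates a new prov:Entity derived from it; consecutive activities are
    linked with prov:wasInformedBy.
    """
    root = BASE + quote(pipeline_id, safe="") + "/"
    triples = []
    current = f"<{root}submission>"  # the original data submission
    triples.append(f"{current} a prov:Entity .")
    previous_activity = None
    for index, step in enumerate(spec["pipeline"]):
        activity = f"<{root}step/{index}>"
        produced = f"<{root}state/{index}>"
        # Simplification (assumption): keep step parameters as a JSON string.
        params = json.dumps(step.get("parameters", {}), sort_keys=True)
        triples.append(f"{activity} a prov:Activity ;")
        triples.append(f"    rdfs:label {literal(step['run'])} ;")
        triples.append(f"    rdfs:comment {literal(params)} ;")
        triples.append(f"    prov:used {current} .")
        triples.append(f"{produced} a prov:Entity ;")
        triples.append(f"    prov:wasGeneratedBy {activity} ;")
        triples.append(f"    prov:wasDerivedFrom {current} .")
        if previous_activity is not None:
            triples.append(f"{activity} prov:wasInformedBy {previous_activity} .")
        current, previous_activity = produced, activity
    header = [
        f"@prefix prov: <{PROV}> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    return "\n".join(header + triples) + "\n"


if __name__ == "__main__":
    # Illustrative spec; the second step name is hypothetical.
    spec = {
        "pipeline": [
            {"run": "load", "parameters": {"from": "submission.csv", "name": "cruise_data"}},
            {"run": "hypothetical_rename_fields", "parameters": {"fields": {"temp": "temperature"}}},
            {"run": "dump_to_path", "parameters": {"out-path": "final"}},
        ]
    }
    print(pipeline_to_prov("example-dataset", spec))
```

Chaining `prov:used` and `prov:wasGeneratedBy` through intermediate entities lets a reader walk from the published product back to the original submission one step at a time. A production serializer would more likely build the graph with an RDF library than with string formatting.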