Towards Capturing Data Curation Provenance using Frictionless Data Package Pipelines
Main Authors: | Shepherd, Adam; Rauch, Shannon; Schloer, Conrad; Kinkade, Danie; Ake, Hannah; Biddle, Mathew; Copley, Nancy; Saito, Mak; Wiebe, Peter; York, Amber |
---|---|
Format: | Proceeding poster |
Published: | 2018 |
Subjects: | |
Online Access: | https://zenodo.org/record/1451679 |
Table of Contents:
- Abstract: At domain-specific data repositories, curation that strives for FAIR principles often entails transforming data submissions to improve understanding and reuse. The Biological and Chemical Oceanography Data Management Office (BCO-DMO, https://www.bco-dmo.org) has been adopting the data containerization specification of the Frictionless Data project (https://frictionlessdata.io) to improve the efficiency of its data curation process. As part of this effort, BCO-DMO has been using the Frictionless Data Package Pipelines library (https://github.com/frictionlessdata/datapackage-pipelines) to define the processing steps that transform original submissions into final data products. Because these pipelines are defined in a declarative language, they can be serialized into formal provenance data structures using the Provenance Ontology (PROV-O, https://www.w3.org/TR/prov-o/). Although some curation steps may still resist automation, this method is a step towards reproducible transforms that link the original data submission to its published state in machine-actionable ways, benefiting the research community by making the data curation process transparent.
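The idea of serializing a declarative pipeline into PROV-O can be illustrated with a minimal sketch. It is not BCO-DMO's implementation: the pipeline identifier, processor names, parameters, the `urn:example:` namespace, the `ex:parameters` property, and the helper `pipeline_to_prov` are all assumptions made for illustration. The sketch assumes only that a pipeline is an ordered list of steps, each with a `run` name and optional `parameters`, and maps every step to a `prov:Activity` that uses the previous data state and generates the next one, expressed as a JSON-LD-style dictionary.

```python
import json

PROV_NS = "http://www.w3.org/ns/prov#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical pipeline definition: names and parameters are illustrative only.
pipeline_spec = {
    "example-dataset": {
        "pipeline": [
            {"run": "load", "parameters": {"from": "submission.csv"}},
            {"run": "rename_fields", "parameters": {"fields": {"lat": "latitude"}}},
            {"run": "dump_to_path", "parameters": {"out-path": "published"}},
        ]
    }
}


def pipeline_to_prov(pipeline_id, steps, base="urn:example:"):
    """Map an ordered list of pipeline steps to a PROV-O graph (JSON-LD style).

    Each step becomes a prov:Activity; each intermediate data state becomes
    a prov:Entity derived from the state before it.
    """
    submission = f"{base}{pipeline_id}/submission"
    graph = [{"@id": submission, "@type": "prov:Entity"}]

    previous_entity = submission
    previous_activity = None
    for index, step in enumerate(steps):
        activity_id = f"{base}{pipeline_id}/step/{index}"
        output_id = f"{base}{pipeline_id}/state/{index}"

        activity = {
            "@id": activity_id,
            "@type": "prov:Activity",
            "rdfs:label": step["run"],
            "prov:used": {"@id": previous_entity},
            # ex:parameters is an assumed, non-standard property that keeps
            # the step configuration alongside the activity.
            "ex:parameters": json.dumps(step.get("parameters", {}), sort_keys=True),
        }
        if previous_activity is not None:
            activity["prov:wasInformedBy"] = {"@id": previous_activity}
        graph.append(activity)

        graph.append({
            "@id": output_id,
            "@type": "prov:Entity",
            "prov:wasGeneratedBy": {"@id": activity_id},
            "prov:wasDerivedFrom": {"@id": previous_entity},
        })
        previous_entity, previous_activity = output_id, activity_id

    return {
        "@context": {"prov": PROV_NS, "rdfs": RDFS_NS, "ex": base},
        "@graph": graph,
    }


if __name__ == "__main__":
    steps = pipeline_spec["example-dataset"]["pipeline"]
    print(json.dumps(pipeline_to_prov("example-dataset", steps), indent=2))
```

Because the chain is built from the step order alone, the final entity can be traced back through `prov:wasDerivedFrom` links to the original submission. Curation steps performed outside the pipeline would need to be recorded separately.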