Towards a Data Generation Tool for NoSQL Data Stores
Main Author: | Hasan Mahmud |
---|---|
Format: | Other |
Published: | 2018 |
Subjects: | |
Online Access: | https://zenodo.org/record/1477538 |
Table of Contents:
- In both industry and academia, datasets with different characteristics are needed for experimental purposes. However, real data may be difficult to obtain because it is scarce or restricted by privacy policies. Moreover, researchers often cannot use real data because it is available only in incompatible formats or in insufficient volume (developers frequently need large volumes of data, e.g. for benchmarking or machine learning). The work presented in this thesis addresses the generation of test data for different data store formats and at different volumes, as a replacement for real data (or at least as data that is as realistic as possible with respect to real-world application needs). We present a data generation approach in which the data schema is extracted from sample datasets supplied by the user. The system then uses the extracted schema to generate new data that closely follows the key and value patterns of the real data, for different data stores. In this way, the end user gains the flexibility to create data in different data formats from a single sample JSON-formatted dataset. In the long term, we envision this data generation approach being used by the research community in various domains (such as benchmarking) to gain more flexibility in data manipulation and to test a wider set of scenarios in their applications.
- Master's thesis
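
The schema-extraction idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `infer_schema`, `random_value`, and `generate`, the sample record, and the value ranges are all hypothetical, and the sketch assumes flat JSON objects with only scalar values.

```python
import json
import random
import string


def infer_schema(records):
    """Map each key to the set of Python type names seen in the sample records."""
    schema = {}
    for record in records:
        for key, value in record.items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return schema


def random_value(type_name, rng):
    """Produce a random value of the given type (ranges are arbitrary assumptions)."""
    if type_name == "bool":
        return rng.random() < 0.5
    if type_name == "int":
        return rng.randint(0, 1000)
    if type_name == "float":
        return round(rng.uniform(0, 1000), 2)
    if type_name == "str":
        return "".join(rng.choices(string.ascii_lowercase, k=8))
    return None  # nested objects, arrays, and nulls are not handled in this sketch


def generate(schema, count, seed=0):
    """Generate `count` records whose keys and value types follow the schema."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    return [
        {key: random_value(rng.choice(sorted(types)), rng)
         for key, types in schema.items()}
        for _ in range(count)
    ]


# Hypothetical sample dataset standing in for the user's JSON input.
sample = json.loads('[{"id": 1, "name": "alice", "score": 9.5, "active": true}]')
schema = infer_schema(sample)
generated = generate(schema, 3)
print(json.dumps(generated, indent=2))
```

The generated records share the keys and value types of the sample, so they could in principle be serialized for a different target data store; a realistic tool would also need to model nested structures and value distributions.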