uiHRDC
Main Author: | Martínez-Prieto, Miguel A. |
---|---|
Other Authors: | Fariña, Antonio, Claude, Francisco, Navarro, Gonzalo |
Format: | Dataset |
Published: | Mendeley, 2018 |
Subjects: | |
Online Access: | https://data.mendeley.com/datasets/xxntkjvtxw |
Contents:
- uiHRDC (universal indexes for Highly Repetitive Document Collections) is a replication framework licensed under the GNU Lesser General Public License v2.1 (GNU LGPL). It includes every element required to reproduce the main experiments of the paper [1]: datasets, query patterns, source code, and scripts. The uiHRDC repository is organized as follows: i) a benchmark directory, which contains a LaTeX-formatted report and a script that collects all the data files produced by the experiments and generates a PDF report with the most relevant figures; ii) a data directory, which contains the text collections (7z-compressed) and the query patterns; iii) the indexes and self-indexes directories, which contain the source code of each indexing alternative together with scripts that run all the experiments for each technique. Each run first constructs the compressed index of interest (using a builder program) and then performs both locate and extract operations over that index (using the corresponding searcher program), and each experiment writes its relevant data to a results-data file; and iv) a doAll.sh script that drives the whole process: decompressing the source collections, compiling the sources of each index and running its experiments, and finally generating the report.
  [1] F. Claude, A. Fariña, M. A. Martínez-Prieto, and G. Navarro. Universal Indexes for Highly Repetitive Document Collections. Information Systems, 61:1–23, 2016.
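The experiments measure two operations over each index. The first is locate, which reports every position where a query pattern occurs. The second is extract, which recovers a span of the original text. The sketch below shows what these operations return. It assumes a plain, uncompressed suffix array rather than any of the compressed indexes shipped with uiHRDC. The class name ToySuffixArrayIndex is hypothetical, and the code is not the framework's builder or searcher.

```python
from bisect import bisect_left, bisect_right


class ToySuffixArrayIndex:
    """Hypothetical, uncompressed toy index (not part of uiHRDC).

    Supports only the two operations the uiHRDC experiments measure:
    locate (all occurrence positions of a pattern) and extract (a text span).
    """

    def __init__(self, text: str):
        self.text = text
        # Naive suffix array: sort suffix start positions by suffix content.
        # Fine for tiny inputs; real indexes use far more efficient builders.
        self.sa = sorted(range(len(text)), key=lambda i: text[i:])

    def locate(self, pattern: str) -> list[int]:
        """Return all starting positions of `pattern`, in ascending order."""
        m = len(pattern)
        # Length-m prefixes of lexicographically sorted suffixes are themselves
        # in non-decreasing order, so the matching suffixes form one contiguous
        # range that binary search can find. Materializing the prefixes costs
        # O(n * m); it keeps the sketch short, not fast.
        prefixes = [self.text[i:i + m] for i in self.sa]
        lo = bisect_left(prefixes, pattern)
        hi = bisect_right(prefixes, pattern)
        return sorted(self.sa[lo:hi])

    def extract(self, start: int, length: int) -> str:
        """Return the substring of the original text starting at `start`."""
        return self.text[start:start + length]


if __name__ == "__main__":
    idx = ToySuffixArrayIndex("abracadabra")
    print(idx.locate("abra"))   # [0, 7]
    print(idx.extract(7, 4))    # abra
```

A compressed index answers the same two queries without storing the text and suffix array explicitly. That space saving is what pays off on highly repetitive collections.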