Replication package for Makar
Main Author: | Pooja |
---|---|
Format: | info software Journal |
Language: | eng |
Published: | 2020 |
Subjects: | |
Online Access: | https://zenodo.org/record/4434822 |
Table of Contents:
# RP-Makar-tool

Replication Package for the tool "Makar: A Framework for Multi-source Studies based on Unstructured Data"

## Structure

```
Data/
    stackoverfow_questions_with_answers_by_tags.csv
    apache_mailing_list.csv
    mailing_lists_ASF_@dev_@users_1.csv
    mailing_lists_ASF_@dev_@users_2.csv
    quora.csv
    sample_stackoverfow_questions_with_answers_by_tags.csv
Schemas/
    apache_mailing_lists.json
    quora.json
    stackoverfow_questions_answers_by_tag.json
    stackoverfow_tag_count.json
LDA-analysis
    LDA_input/
        stackoverfow_raw_dataset.csv
    LDA_output/
        Mallet/
            output_csv/
                docs-in-topics.csv
                topic-words.csv
                topics-in-docs.csv
                topics-metadata.csv
            output_html/
                all_topics.html
                Docs/
                Topics/
Background-Study.pdf
Similar-Tools.md
```

## Contents of the Replication Package

Contains the data processed using the tool for the study.

- **Data/**
  - `stackoverfow_questions_with_answers_by_tags.csv` - all StackOverflow questions used in the study as stored in Makar
  - `apache_mailing_list.csv` - statistically significant sample of `mailing_lists_ASF_@dev_@users_1.csv` and `mailing_lists_ASF_@dev_@users_2.csv` used in the study
  - `mailing_lists_ASF_@dev_@users_1.csv` - mailing list data used in the study as stored in Makar (part 1)
  - `mailing_lists_ASF_@dev_@users_2.csv` - mailing list data used in the study as stored in Makar (part 2)
  - `quora.csv` - all Quora questions used in the study as stored in Makar
  - `sample_stackoverfow_questions_with_answers_by_tags.csv` - statistically significant sample of `stackoverfow_questions_with_answers_by_tags.csv` used in the study
- **Schemas/**
  - `apache_mailing_lists.json` - data schema used in Makar to store mailing list data
  - `quora.json` - data schema used in Makar to store Quora data
  - `stackoverfow_questions_answers_by_tag.json` - data schema used in Makar to store StackOverflow questions data
  - `stackoverfow_tag_count.json` - data schema used in Makar to look up the number of questions per tag available in StackOverflow
- **LDA_input/** - input data used for LDA analysis
  - `stackoverfow_raw_dataset.csv` - StackOverflow questions used to perform LDA analysis
- **LDA_output/**
  - **Mallet/** - contains the LDA output generated by the MALLET tool
    - **output_csv/**
      - `docs-in-topics.csv` - documents per topic
      - `topic-words.csv` - most relevant topic words
      - `topics-in-docs.csv` - topic probability per document
      - `topics-metadata.csv` - metadata per document and topic probability
    - **output_html/** - browsable results of the MALLET output
      - `all_topics.html`
      - `Docs/`
      - `Topics/`
- **Background-Study.pdf** - Literature survey of the challenges researchers face in mining studies that investigate developer information needs during program comprehension tasks.
- **Similar-Tools.md** - Links to the similar state-of-the-art tools that were compared.
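
Because the mailing list data is split into two parts, anyone reusing it will likely need to read both files as one dataset. The following is a minimal sketch of how that could be done with the Python standard library. It is not part of the replication package: the helper name `load_parts` is hypothetical, and the sketch assumes that both parts are UTF-8 encoded and begin with the same header row, which the package description does not state.

```python
import csv
from pathlib import Path


def load_parts(paths):
    """Read several CSV files that share one header row.

    Returns a (header, rows) tuple with the rows of all parts concatenated
    in the order the paths were given. Assumes (not confirmed by the
    package) that every part starts with the same header row.
    """
    header = None
    rows = []
    for path in paths:
        with Path(path).open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            part_header = next(reader, None)
            if part_header is None:
                continue  # skip a completely empty file
            if header is None:
                header = part_header
            elif part_header != header:
                raise ValueError(f"{path} has a different header than the first part")
            rows.extend(reader)
    return header, rows


if __name__ == "__main__":
    # File names as listed under Data/ in the package structure.
    parts = [
        "Data/mailing_lists_ASF_@dev_@users_1.csv",
        "Data/mailing_lists_ASF_@dev_@users_2.csv",
    ]
    header, rows = load_parts(parts)
    print(f"{len(rows)} rows, {len(header)} columns")
```

If the two parts turn out to have different columns, the sketch raises a `ValueError` rather than merging them silently, so the mismatch can be inspected before any analysis.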