Detecting Code Clones in Code Snippets Using AI Techniques (Replication Package)
Main Author: | Anonymous |
---|---|
Format: | info software Journal |
Published: | 2022 |
Subjects: | |
Online Access: | https://zenodo.org/record/6385614 |
Table of Contents:
- This dataset includes Stack Overflow code snippets that are referenced in GitHub projects. We created this code pair dataset by expanding the GHCodeSnippetHistory dataset, which contains GitHub commits with Stack Overflow post references. The corresponding Stack Overflow code snippets for those GitHub commits were extracted from SOTorrent with the help of BigQuery. The dataset consists of 61,253 code pairs in Java, JavaScript, PHP, and Python.

  How to Use:
  1. Install all the necessary dependencies: pip3 install -r Requirements.txt
  2. Download the CodePairDataset.csv dataset and save it in the project directory. This file contains the corresponding Stack Overflow and GitHub code snippets after preprocessing.
  3. Run the ExtractFeatures.ipynb notebook to extract features, then run RunMLModels.ipynb to run the models.
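
Before running the notebooks, it may help to confirm that the downloaded CodePairDataset.csv loads cleanly. The following is a minimal sketch, not part of the replication package. It makes no assumption about the column names, which the description does not list, and only reports what it finds. The helper name summarize_code_pairs and the UTF-8 encoding are assumptions.

```python
import csv
from pathlib import Path

# Code snippets can be long; raise the per-field limit above the 128 KiB default.
csv.field_size_limit(10**9)


def summarize_code_pairs(csv_path):
    """Return (column_names, row_count) for a code pair CSV file.

    Hypothetical helper: the column names are read from the header row
    rather than assumed, because the package description does not list them.
    """
    # newline="" lets the csv module handle line breaks inside quoted snippets.
    # UTF-8 is an assumption about how the file was saved.
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        row_count = sum(1 for _ in reader)
        columns = list(reader.fieldnames or [])
    return columns, row_count


if __name__ == "__main__":
    dataset = Path("CodePairDataset.csv")
    if dataset.is_file():
        columns, row_count = summarize_code_pairs(dataset)
        print("columns:", columns)
        # The description reports 61,253 code pairs; compare against this count.
        print("rows:", row_count)
    else:
        print("CodePairDataset.csv not found; download it first (step 2).")
```

If the reported row count differs from the 61,253 pairs stated above, the download may be incomplete or the file may use a different quoting convention than this sketch expects.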