{"url": "http://mondego.ics.uci.edu/projects/SourcererCC/", "content": "\n\n\nCode Clone Detection \n\n\n\n\n\t\t\n\t\t\n\t\t\t\n\t\t\t\t\t\t\n\n\t\t\t\n\t\t\t\t\n \n \n\t\t
\n\t\t\t

SourcererCC: Scaling Type-3 Clone Detection to Large Software Repositories

Team @UC Irvine: Hitesh Sajnani, Vaibhav Saini, Cristina Lopes

Team @University of Saskatchewan: Jeff Svajlenko, Chanchal Roy

Project Description

Given the availability of large-scale source-code repositories, there have been many applications for clone detection. Unfortunately, despite a decade of active research, there is a marked lack of clone detectors that scale to large software repositories, in particular for detecting near-miss clones, where significant editing activity may take place in the cloned code.
We present SourcererCC, a token-based clone detector that targets the first three clone types and exploits an index to achieve scalability to large inter-project repositories using a standard workstation. SourcererCC uses an optimized inverted index to quickly query the potential clones of a given code block. Filtering heuristics based on token ordering are used to significantly reduce the size of the index, the number of code-block comparisons needed to detect the clones, and the number of token comparisons needed to judge a potential clone.
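The index-and-filter scheme just described can be sketched as follows. This is a minimal illustration, not the tool itself: SourcererCC is written in Java, works on token bags (multisets) rather than the plain sets below, and every name here is invented for the example.

```python
import math
from collections import Counter, defaultdict

def prefix_length(size, theta):
    # A clone pair at threshold theta must share ceil(theta * size) tokens,
    # so two blocks in a fixed global token order must share at least one
    # token within their first size - ceil(theta * size) + 1 tokens; only
    # that sub-block ever needs to be indexed.
    return size - math.ceil(theta * size) + 1

def detect_clones(blocks, theta=0.8):
    # blocks: {block_id: set of tokens}. Sets keep the sketch short; the
    # real tool uses token bags. Returns unordered clone pairs.
    freq = Counter(t for toks in blocks.values() for t in toks)
    rare_first = lambda toks: sorted(toks, key=lambda t: (freq[t], t))
    index = defaultdict(set)   # token -> ids whose sub-block contains it
    pairs = set()
    for bid in sorted(blocks, key=lambda b: len(blocks[b])):
        toks = rare_first(blocks[bid])
        candidates = set()
        for t in toks[:prefix_length(len(toks), theta)]:
            candidates |= index[t]   # query the inverted index ...
            index[t].add(bid)        # ... then register this block
        for cid in candidates:       # verify each candidate exactly
            need = math.ceil(theta * max(len(blocks[bid]), len(blocks[cid])))
            if len(blocks[bid] & blocks[cid]) >= need:
                pairs.add(frozenset((bid, cid)))
    return pairs
```

With theta = 0.8, two five-token blocks qualify as clones only if they share at least four tokens, and only the two rarest tokens of each block ever enter the index, which is what shrinks both the index and the number of candidate comparisons.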
We evaluate the scalability, execution time, recall, and precision of SourcererCC, and compare it to four publicly available state-of-the-art tools. To measure recall, we use two recent benchmarks: (1) an exhaustive benchmark of real clones, BigCloneBench, and (2) a Mutation/Injection-based framework of thousands of fine-grained artificial clones. We find that SourcererCC has both high recall and precision, and is able to scale to a large inter-project repository (250 MLOC) using a standard workstation.

Tool Download and Usage

To run the tool, follow the steps below:

A. Generating the input file for the project in which you want to detect clones

  1. Click here to download the input generator for the clone detector (ast.zip).
  2. Unzip ast.zip and import the project ast into your Eclipse workspace.
  3. Run it as an "Eclipse Application". This opens another Eclipse instance, into which you import the projects for which you want to generate the input file.
  4. After importing the projects into the workspace of the new Eclipse instance, click "Sample Menu" in the top menu bar, then click "Sample command" to run. This generates the output (the desired input file) in the path specified by the variable "outputdirPath".
  5. Note that you will have to change the output directory on line 61 of SampleHandler.java, i.e. this.outputdirPath = "/Users/vaibhavsaini/Documents/codetime/repo/ast/output/";, to your desired output directory.
  6. The generated input file name has the format <ProjectName>-clone-INPUT.txt. For example, if your project name is jython, the generated input file will be jython-clone-INPUT.txt.
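The naming convention in the last step can be captured in a one-line helper (purely illustrative; the Eclipse plug-in derives this name itself):

```python
def input_file_name(project_name):
    # Convention from step 6: <ProjectName>-clone-INPUT.txt
    return project_name + "-clone-INPUT.txt"

print(input_file_name("jython"))  # jython-clone-INPUT.txt
```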
B. Running the clone detection tool on the generated input file

  1. Click here to download the CloneDetector (tool.zip).
  2. Unzip tool.zip and navigate to tool/ in a terminal.
  3. Copy the input file generated above (<ProjectName>-clone-INPUT.txt) into the input/dataset directory.
  4. Open cd.sh and assign <ProjectName> as the value of the variable arrayname (line #5). For example, if your generated input file is jython-clone-INPUT.txt, line #5 should read arrayname=(jython).
  5. Execute the command ./cd.sh
C. Generated output

  1. The generated output will be in the ./output folder.
  2. Files with the .txt extension contain the computed clones; files with the .csv extension contain the time taken to detect the clones.
D. Source Code

The source code of SourcererCC can be found here on GitHub.

E. SourcererCC-I

SourcererCC-I is an interactive version of the tool, integrated with the Eclipse IDE, that helps developers instantly find clones during software development and maintenance.

A short video of SourcererCC-I in action can be found here, and a link to install the Eclipse plug-in is available here.

Precision data as reported in the paper

We randomly selected 390 clone pairs detected by SourcererCC for manual inspection. This is a statistically significant sample with a 95% confidence level and a +/- 5% confidence interval. We split the validation effort across three clone experts, which prevents any one judge's personal subjectivity from influencing the entire measurement. The judges found 355 pairs to be true positives and 35 to be false positives, for a precision of 91%.

Reviewer | True Positives | False Positives
Judge 1  | TP-1           | FP-1
Judge 2  | TP-2           | FP-2
Judge 3  | TP-3           | FP-3
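The sample size and precision figures above can be sanity-checked with the usual normal-approximation sample-size formula (an assumption on our part; the page does not state which formula was used):

```python
import math

def required_sample(z=1.96, margin=0.05, p=0.5):
    # n = z^2 * p * (1 - p) / margin^2, with worst-case proportion p = 0.5;
    # z = 1.96 corresponds to a 95% confidence level.
    return math.ceil(z * z * p * (1 - p) / margin ** 2)

print(required_sample())      # 385, so inspecting 390 pairs suffices
print(round(355 / 390, 2))    # 0.91, the reported 91% precision
```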

Effectiveness of Filtering Heuristics (Figure 1 in paper)

The effectiveness of the filtering heuristics in eliminating candidate comparisons is demonstrated on 35 open-source Apache Java projects. These projects vary in size and span various domains, including search and database systems, server systems, distributed systems, machine learning and natural language processing libraries, network systems, etc. Most of these subject systems are highly popular in their respective domains. Subject systems exhibiting such variety in size and domain help counter a potential bias of our study towards any specific kind of software system.

The details of the projects, including project name, size, and number of methods, are reported in Table II below. Column 3 (# Methods) shows the total number of methods (total), the number of methods remaining after removing those smaller than 25 tokens (>25 tokens), and the number of methods that are not exact duplicates (unique). Column 5 (Time Taken), Column 6 (# Candidates), and Column 7 (Terms Compared) show the time taken to detect clones, the number of candidates compared, and the total number of tokens compared for:
(i) Naive - no filtering heuristics;
(ii) Prefix - the Sub-block filtering heuristic; and
(iii) Pos - both the Sub-block and Token Position filtering heuristics together.
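A hedged sketch of how token-position information can cut a candidate comparison short (names invented; the inputs are token lists sorted under one shared total order, duplicates removed for brevity):

```python
import math

def verify(x, y, theta=0.8):
    # x, y: token lists sorted under the same global order.
    # Returns the overlap if the pair meets the threshold, else 0.
    need = math.ceil(theta * max(len(x), len(y)))
    i = j = matched = 0
    while i < len(x) and j < len(y):
        # Position filter: tokens already passed can never match later, so
        # the final overlap is at most matched + min(remaining lengths).
        if matched + min(len(x) - i, len(y) - j) < need:
            return 0               # threshold unreachable; abort early
        if x[i] == y[j]:
            matched, i, j = matched + 1, i + 1, j + 1
        elif x[i] < y[j]:
            i += 1
        else:
            j += 1
    return matched if matched >= need else 0
```

In the worst case the loop still scans both lists, but a typical non-clone is rejected after a handful of token comparisons, which is the effect the Pos variant above is meant to capture.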

The tabulated data is also charted below. The horizontal axis shows the 35 subject systems sorted by the number of methods they contain (smallest on the left). The vertical axis shows the performance metric value. The black circles, red triangles, and green plus marks show the performance metric values when no filtering is applied, when only sub-block filtering is applied, and when both sub-block and token-position filtering are applied, respectively.
\n\n", "encoding": "ascii"}