{"url": "http://mondego.ics.uci.edu/projects/dejavu/", "content": "
Code cloning is serious and ubiquitous. Are you affected?
This work analyzes a corpus of 4.5 million non-fork projects hosted on GitHub, representing over 428 million files written in Java, C++, Python, and JavaScript.
We found that this corpus has a mere 85 million unique files. In other words, 70% of the code on GitHub consists of clones of previously created files.
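To make the notion of "unique files" concrete, here is a minimal sketch of exact file-level deduplication: two files count as the same if their raw bytes produce the same hash digest. The use of MD5, the in-memory dictionary standing in for a corpus, and the helper names `content_hash` and `duplication_stats` are illustrative assumptions. This is not the DéjàVu pipeline, whose actual software is linked further down this page.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return the MD5 hex digest of a file's raw bytes (exact-match identity)."""
    return hashlib.md5(data).hexdigest()


def duplication_stats(files: dict[str, bytes]) -> tuple[int, int, float]:
    """Count total files, distinct contents, and the share of files whose
    content duplicates another file in the corpus (hypothetical helper)."""
    total = len(files)
    unique = len({content_hash(data) for data in files.values()})
    duplicate_share = (total - unique) / total if total else 0.0
    return total, unique, duplicate_share


if __name__ == "__main__":
    # A toy corpus: path -> file contents. Three paths hold the same bytes.
    corpus = {
        "projA/util.js": b"function add(a, b) { return a + b; }\n",
        "projB/lib/util.js": b"function add(a, b) { return a + b; }\n",
        "projC/vendor/util.js": b"function add(a, b) { return a + b; }\n",
        "projC/main.js": b"console.log(add(1, 2));\n",
    }
    total, unique, share = duplication_stats(corpus)
    print(f"{total} files, {unique} unique, {share:.0%} duplicates")
```

Running it prints `4 files, 2 unique, 50% duplicates`. Hashing raw bytes only catches exact copies: a file that differs by a single space gets a different digest, so finding near-identical clones requires normalizing the content (for example, comparing tokens) before hashing.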
We have created a mapping between file clones in four languages: Java, C++, JavaScript, and Python. This mapping is useful for anyone building systems on top of open-source software, as well as for researchers interested in analyzing large code bases.
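A clone mapping can be pictured as a lookup from each file to the other files it duplicates. The sketch below builds such a lookup for exact (byte-identical) clones only. The function name `build_clone_mapping`, the choice of MD5, and the in-memory input are hypothetical choices for illustration; the published DéjàVu data is not necessarily organized this way.

```python
import hashlib
from collections import defaultdict


def build_clone_mapping(files: dict[str, bytes]) -> dict[str, list[str]]:
    """Map every file path to the sorted list of other paths with identical
    content (hypothetical helper; exact clones only)."""
    # Bucket paths by the digest of their contents.
    by_hash: dict[str, list[str]] = defaultdict(list)
    for path, data in files.items():
        by_hash[hashlib.md5(data).hexdigest()].append(path)

    # Each path points at the rest of its bucket; files with no clones get [].
    mapping: dict[str, list[str]] = {}
    for paths in by_hash.values():
        for path in paths:
            mapping[path] = sorted(p for p in paths if p != path)
    return mapping


if __name__ == "__main__":
    files = {
        "projA/util.py": b"def add(a, b):\n    return a + b\n",
        "projB/helpers/util.py": b"def add(a, b):\n    return a + b\n",
        "projC/main.py": b"print('hello')\n",
    }
    mapping = build_clone_mapping(files)
    print(mapping["projA/util.py"])  # ['projB/helpers/util.py']
    print(mapping["projC/main.py"])  # []
```

Keying the result by path keeps each per-file query to a single dictionary lookup. At the scale of hundreds of millions of files, a database table indexed on the content hash would plausibly play the same role.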
On this website you can find out how to access the code clone mapping (through a web service or direct access to a database), how to download it, and how to access the source code used to create it.

DéjàVu Web App
We provide a web service for retrieving clone information and for easily analyzing source code, projects, and datasets.
This service is a work in progress and is shaped by community feedback. We are happy to implement the functionality you need.

Access to the Code Clone Mapping
You can directly download the data for each language individually:

If you would like to obtain the dumps some other way, we will do our best to accommodate you (come visit us and bring a hard drive!). Contact us; we like to talk.

Software Used to Create the Clone Mapping
The software used to create this mapping can be found on GitHub here and here.
We also created an artifact in the form of a VirtualBox virtual machine (8.7 GB), which provides quick access to the pipeline through a guided tutorial; it can be found here. The password is p.

You can find the accepted paper here.
This website supports a research project about code cloning on GitHub, accepted for publication at OOPSLA'17 (Distinguished Award at OOPSLA).

Today, “The Morning Paper” looks at “Déjàvu: A Map of Code Duplicates on Github,” from OOPSLA ’17, which analyzes “482 million files written in Java, C++, Python, and JavaScript.”
Read in “The Morning Paper”: https://t.co/VG1lWDVt8D
Read the paper: https://t.co/4GCauHzvmG
— Official ACM (@TheOfficialACM) November 20, 2017