Search_Engine/test/mondego_ics_uci_edu/8cf924a8684ebbb31a509f0359dcd92435131d814ed0ffe5f2b334dd72646d21.json
2022-05-27 06:29:48 -07:00

{"url": "http://mondego.ics.uci.edu/projects/dejavu/", "content": "<!DOCTYPE html>\n<html lang=\"en\">\n\n<head>\n <!-- Global site tag (gtag.js) - Google Analytics -->\n <script async src=\"https://www.googletagmanager.com/gtag/js?id=UA-109926351-1\"></script>\n <script>\n window.dataLayer = window.dataLayer || [];\n function gtag(){dataLayer.push(arguments);}\n gtag('js', new Date());\n\n gtag('config', 'UA-109926351-1');\n </script>\n\n <meta charset=\"utf-8\">\n <meta http-equiv=\"X-UA-Compatible\" content=\"IE=edge\">\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">\n <meta name=\"description\" content=\"\">\n <meta name=\"author\" content=\"\">\n\n <title>D\u00e9j\u00e0Vu</title>\n\n <!-- Bootstrap Core CSS -->\n <link href=\"css/bootstrap.min.css\" rel=\"stylesheet\">\n\n <!-- Custom CSS -->\n <link href=\"css/blog-post.css\" rel=\"stylesheet\">\n\n</head>\n\n<body>\n\n <!-- Navigation -->\n <nav class=\"navbar navbar-inverse navbar-fixed-top\" role=\"navigation\">\n <div class=\"container\">\n <!-- Brand and toggle get grouped for better mobile display\n <div class=\"navbar-header\">\n <button type=\"button\" class=\"navbar-toggle\" data-toggle=\"collapse\" data-target=\"#bs-example-navbar-collapse-1\">\n <span class=\"sr-only\">Toggle navigation</span>\n <span class=\"icon-bar\"></span>\n <span class=\"icon-bar\"></span>\n <span class=\"icon-bar\"></span>\n </button>\n <a class=\"navbar-brand\" href=\"#\">Start Bootstrap</a>\n </div> -->\n <!-- Collect the nav links, forms, and other content for toggling\n <div class=\"collapse navbar-collapse\" id=\"bs-example-navbar-collapse-1\">\n <ul class=\"nav navbar-nav\">\n <li>\n <a href=\"#\">About</a>\n </li>\n <li>\n <a href=\"#\">Services</a>\n </li>\n <li>\n <a href=\"#\">Contact</a>\n </li>\n </ul>\n </div> -->\n <!-- /.navbar-collapse -->\n </div>\n <!-- /.container -->\n </nav>\n\n <!-- Page Content -->\n <div class=\"container\">\n\n <div class=\"row\">\n\n <!-- Blog Post Content 
Column -->\n <div class=\"col-lg-8\">\n\n <!-- Blog Post -->\n\n <!-- Title -->\n <h1>D\u00e9j\u00e0Vu: A Map of Code Duplicates on GitHub</h1>\n\n <!-- Author \n <p class=\"lead\">\n by <a href=\"#\">A group of good people</a>\n </p>-->\n\n <!-- <hr> -->\n\n <!-- Date/Time \n <p><span class=\"glyphicon glyphicon-time\"></span> Posted on August 24, 2013 at 9:00 PM</p>\n -->\n <hr>\n\n <!-- Preview Image -->\n <img class=\"img-responsive\" src=\"image.jpg\" alt=\"Software\" style=\"width:900px;height:300px;\">\n\n <hr>\n\n <!-- Post Content -->\n <p class=\"lead\">Code cloning is serious and ubiquitous. Are you affected?</p>\n <p>This work analyzes a corpus of 4.5 million non-fork projects hosted on GitHub, representing over 428 million files written in Java, C++, Python, and JavaScript.</p>\n <p>We found that this corpus has a mere 85 million unique files. In other words, 70% of the code on GitHub consists of clones of previously created files.</p>\n <p>We have created a mapping between file clones in four languages: Java, C++, JavaScript, and Python. This is useful for systems built on open source software as well as for researchers interested in analyzing large code bases.</p>\n <p>On this website you can find out how to access the code clone mapping, either through a web service or through direct access to a database, how to download the clone mapping, and how to access the source code used to create it.</p>\n\n <hr>\n\n <p class=\"lead\">D\u00e9j\u00e0Vu Web App</p>\n <p>We provide a <a href=\"http://dejavu.ics.uci.edu\" target=\"_blank\">web service</a> for retrieving clone information and for easy analysis of source code, projects, and datasets.</p>\n <p>This service is a work in progress and depends on community feedback. 
We are happy to implement any functionality you need.</p>\n\n <hr>\n\n <p class=\"lead\">Access to the Code Clone Mapping</p>\n <p>You can directly download the data for each language individually:</p>\n\n <ul>\n <li> <a href=\"raw_data/java_db_clones_dump.tgz\"> Java download </a> <strong><font color=\"red\"> 6.3Gb </font></strong></li>\n <li> <a href=\"raw_data/js_db_clones_dump.tgz\"> JavaScript download </a> <strong><font color=\"red\"> 54Gb </font></strong></li>\n <li> <a href=\"raw_data/python_db_clones_dump.tgz\"> Python download </a> <strong><font color=\"red\"> 2.1Gb </font></strong></li>\n <li> <a href=\"raw_data/cpp_db_clones_dump.tgz\"> C++ download </a> <strong><font color=\"red\"> 3.7Gb </font></strong></li>\n </ul>\n\n <p>If you want access to the dumps through a different process, we will do our best to suit your needs (come visit us and bring a hard drive!). Contact us; we like to talk.</p>\n\n <hr>\n\n <!-- Blog Comments -->\n <p class=\"lead\">Software used to create the Clone Mapping</p>\n <p>The software used to create this mapping can be found on GitHub <a href=\"https://github.com/Mondego/SourcererCC\" target=\"_blank\">here</a> and <a href=\"https://github.com/PRL-PRG/dejavu-artifact\" target=\"_blank\">here</a>.</p>\n <p>We also created an artifact in the form of a <a href=\"https://www.virtualbox.org\" target=\"_blank\">VirtualBox</a> virtual machine, which provides quick access to the pipeline through a guided tutorial, and can be found <a href=\"OOPSLA_final.ova\" download>here</a>. The password is <strong>p</strong>. 
<strong><font color=\"red\"> 8.7Gb </font></strong></p>\n\n <hr>\n\n </div>\n\n <!-- Blog Sidebar Widgets Column -->\n <div class=\"col-md-4\">\n\n\n <!-- Blog Categories Well -->\n <div class=\"well\">\n <h4>Teams</h4>\n <div class=\"row\">\n <div class=\"col-lg-12\">\n <ul class=\"list-unstyled\">\n <li><a href=\"http://mondego.ics.uci.edu/\" target=\"_blank\">Mondego Group @ UC Irvine</a>\n </li>\n <li><a href=\"https://prl-prg.github.io/\" target=\"_blank\">PRL-PRG @ CTU in Prague</a>\n </li>\n <!--\n <li><a href=\"#\">Category Name</a>\n </li>\n <li><a href=\"#\">Category Name</a>\n </li>\n -->\n </ul>\n </div>\n\n </div>\n <!-- /.row -->\n </div>\n\n <!-- Side Widget Well -->\n <div class=\"well\">\n <h4>Quick Information</h4>\n <p>You can find the accepted paper <a href=\"https://dl.acm.org/citation.cfm?id=3133908\" target=\"_blank\">here</a>.</p>\n <p>This website supports a research project about code cloning on GitHub, accepted for publication at <a href=\"https://2017.splashcon.org/event/splash-2017-oopsla-d-j-vu-a-map-of-code-duplicates-on-github\" target=\"_blank\">OOPSLA'17</a> (Distinguished Award at OOPSLA).</p>\n </div>\n\n <div class=\"well\">\n\t\t\t\t\t<iframe src=\"https://www.youtube.com/embed/4M-ASEpVOaY\" frameborder=\"0\" allowfullscreen></iframe>\n </div>\n\n\t\t\t\t<div class=\"well\">\n <h4>As seen in the press:</h4>\n\t\t\t\t\t<ul>\n \t\t\t\t\t\t<li><a href=\"https://blog.acolyer.org/2017/11/20/dejavu-a-map-of-code-duplicates-on-github/\" target=\"_blank\">the morning paper</a></li>\n \t\t\t\t\t\t<li><a href=\"https://www.bleepingcomputer.com/news/software/82-percent-of-the-code-on-github-consists-of-clones-of-previously-created-files/\" target=\"_blank\">BLEEPINGCOMPUTER</a></li>\n \t\t\t\t\t\t<li><a href=\"https://www.theregister.co.uk/2017/11/21/github_duplicate_code/\" target=\"_blank\">The Register</a></li>\n\t\t\t\t\t\t<li><a 
href=\"https://developers.slashdot.org/story/17/11/23/2352233/more-than-half-of-github-is-duplicate-code-researchers-find/\" target=\"_blank\">Slashdot</a></li>\n\t\t\t\t\t\t<li><a href=\"https://www.developpez.com/actu/175363/GitHub-des-chercheurs-estiment-que-plus-de-la-moitie-des-codes-ecrits-en-Java-Python-C-Cplusplus-et-JavaScript-sont-dupliques/\" target=\"_blank\">Developpez</a> (in French)</li>\n\t\t\t\t\t\t<li><a href=\"https://www.opennet.ru/opennews/art.shtml?num=47596\" target=\"_blank\">OpenNET</a> (in Russian)</li>\n\t\t\t\t\t\t<li><a href=\"https://www.toutiao.com/a6491879685222302221/\" target=\"_blank\">Toutiao</a> (in Chinese)</li>\n\t\t\t\t\t\t<li><a href=\"http://www.sohu.com/a/206363660_114760\" target=\"_blank\">Sohu</a> (in Chinese)</li>\n\t\t\t\t\t</ul>\n </div>\n\n\t\t\t\t<div class=\"well\">\n\t\t\t\t\t<blockquote class=\"twitter-tweet\" data-lang=\"pt\"><p lang=\"en\" dir=\"ltr\">Today, \u201cThe Morning Paper\u201d looks at \u201cD\u00e9j\u00e0vu: A Map of Code Duplicates on Github,\u201d from OOPSLA \u201917, which analyzes &quot;482 million files written in Java, C++, Python, and JavaScript. W.\u201d<br><br>Read in \u201cThe Morning Paper\u201d: <a href=\"https://t.co/VG1lWDVt8D\">https://t.co/VG1lWDVt8D</a><br><br>Read the paper: <a href=\"https://t.co/4GCauHzvmG\">https://t.co/4GCauHzvmG</a> <a href=\"https://t.co/Quk6LCmVqX\">pic.twitter.com/Quk6LCmVqX</a></p>&mdash; Official ACM (@TheOfficialACM) <a href=\"https://twitter.com/TheOfficialACM/status/932666397804593152?ref_src=twsrc%5Etfw\">20 de novembro de 2017</a></blockquote>\n\t\t\t\t\t<script async src=\"https://platform.twitter.com/widgets.js\" charset=\"utf-8\"></script>\n </div>\n\n </div>\n\n </div>\n <!-- /.row -->\n\n <!-- Footer -->\n <footer>\n <div class=\"row\">\n <div class=\"col-lg-12\">\n <p>Copyright &copy; UCI, PRL-PRG, 2017</p>\n </div>\n </div>\n <!-- /.row -->\n </footer>\n\n </div>\n\n</body>\n\n</html>\n\n", "encoding": "utf-8"}