Abstract
Motivated by the amount of code that goes unidentified on the web, we introduce a practical method for algorithmically identifying the programming language of source code. Our approach is based on supervised learning with carefully chosen statistical features. We also explored, but abandoned, a grammatical approach. In testing, our implementation greatly outperforms an existing tool that relies on a Bayesian classifier. The code is written in Python and is available under an MIT license.
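To make the idea of supervised learning over statistical features concrete, here is a minimal sketch. The abstract does not specify the features or the classifier, so everything below is an assumption made for illustration: the toy `TRAINING` snippets, the token-frequency features, and the nearest-centroid rule with cosine similarity are hypothetical stand-ins, not the method described in the paper.

```python
import math
import re
from collections import Counter

# Hypothetical toy training set: (language label, code snippet) pairs.
TRAINING = [
    ("python", "def add(a, b):\n    return a + b\n"),
    ("python", "import sys\nfor line in sys.stdin:\n    print(line.strip())\n"),
    ("c", "#include <stdio.h>\nint main(void) { printf(\"hi\\n\"); return 0; }\n"),
    ("c", "int add(int a, int b) { return a + b; }\n"),
]

# Assumed tokenizer: runs of letters/underscores, or single punctuation marks.
TOKEN = re.compile(r"[A-Za-z_]+|[^\sA-Za-z_0-9]")


def features(code):
    """Map a snippet to relative token frequencies (an assumed feature set)."""
    counts = Counter(TOKEN.findall(code))
    total = sum(counts.values()) or 1
    return {tok: n / total for tok, n in counts.items()}


def train(samples):
    """Average the feature vectors of each language into one centroid."""
    sums = {}
    seen = Counter()
    for lang, code in samples:
        seen[lang] += 1
        acc = sums.setdefault(lang, Counter())
        for tok, value in features(code).items():
            acc[tok] += value
    return {
        lang: {tok: value / seen[lang] for tok, value in acc.items()}
        for lang, acc in sums.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(tok, 0.0) for tok, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(model, code):
    """Return the language whose centroid is most similar to the snippet."""
    vec = features(code)
    return max(model, key=lambda lang: cosine(vec, model[lang]))


MODEL = train(TRAINING)
print(identify(MODEL, "def greet(name):\n    print(name)\n"))
```

A real system would need far more training data and richer features than four snippets and raw token counts; the sketch only shows the overall shape of a train-then-classify pipeline.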