SonarCloud Vulnerable Code Prospector for C (SVCP4C) is a tool that collects vulnerable C source code from open-source repositories linked to SonarCloud, using the SonarCloud REST API. Its output is a set of tagged files suitable for feature extraction and for building training datasets for machine learning algorithms.
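This README does not show how SVCP4C queries SonarCloud. As a rough sketch, assuming the public Web API endpoint api/issues/search and its componentKeys, types, languages, p and ps parameters (check the current SonarCloud API documentation before relying on them), one page of C vulnerabilities for a project could be requested and the textRange of each issue grouped per file as follows. The helper names issues_search_url, fetch_json and vulnerable_ranges are hypothetical and are not taken from SVCP4C.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of SonarCloud's Web API (assumption: public projects need no token).
SONARCLOUD_API = "https://sonarcloud.io/api"

Range = tuple[int, int, int, int]  # (start_line, start_offset, end_line, end_offset)


def issues_search_url(project_key: str, page: int = 1, page_size: int = 500) -> str:
    """Build a hypothetical query for one page of C vulnerabilities in a project."""
    params = {
        "componentKeys": project_key,
        "types": "VULNERABILITY",
        "languages": "c",
        "p": page,        # 1-based page index
        "ps": page_size,  # page size (500 is the API maximum)
    }
    return f"{SONARCLOUD_API}/issues/search?{urlencode(params)}"


def fetch_json(url: str) -> dict:
    """Download and decode one API response (performs a network request)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def vulnerable_ranges(response: dict) -> dict[str, list[Range]]:
    """Group the textRange of every issue by the file (component) it belongs to."""
    ranges: dict[str, list[Range]] = {}
    for issue in response.get("issues", []):
        tr = issue.get("textRange")
        if tr is None:
            continue  # issues reported at file level carry no location
        ranges.setdefault(issue["component"], []).append(
            (tr["startLine"], tr["startOffset"], tr["endLine"], tr["endOffset"])
        )
    return ranges
```

A call such as vulnerable_ranges(fetch_json(issues_search_url("my-org_my-project"))), where the project key is a placeholder, would return one list of ranges per file. A real collector would also loop over p until every page has been read; this sketch leaves that loop out.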
Vulnerabilities are listed in comments appended at the end of each file. The list starts with a /// ###BEGIN_VULNERABLE_LINES### marker line, followed by one comment line per vulnerability in the format /// starting_line,starting_offset;ending_line,ending_offset, where the offset is the column. For example:
/// ###BEGIN_VULNERABLE_LINES###
/// 1126,3;1126,9
/// 1153,9;1153,15
/// 1341,9;1341,15
/// 1734,6;1734,12
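To use these files for dataset generation, the appended comment block has to be read back. The following is a minimal sketch, assuming exactly one tag per comment line as in the example above and 1-based line numbers; it writes a tag block, parses it, and derives a per-line vulnerable (1) or non-vulnerable (0) label. The names render_tags, parse_tags and line_labels are hypothetical helpers, not part of SVCP4C.

```python
import re

MARKER = "###BEGIN_VULNERABLE_LINES###"
# One tag per comment line: "/// start_line,start_offset;end_line,end_offset"
TAG_RE = re.compile(r"^///\s*(\d+),(\d+);(\d+),(\d+)\s*$")

Range = tuple[int, int, int, int]


def is_marker(line: str) -> bool:
    """True for the comment line that opens the tag block."""
    stripped = line.strip()
    return stripped.startswith("///") and MARKER in stripped


def render_tags(ranges: list[Range]) -> str:
    """Render a tag block in the documented format, ready to append to a C file."""
    out = ["/// " + MARKER]
    out += [f"/// {sl},{so};{el},{eo}" for sl, so, el, eo in ranges]
    return "\n".join(out) + "\n"


def parse_tags(source: str) -> list[Range]:
    """Return every range listed after the marker line."""
    ranges: list[Range] = []
    in_block = False
    for line in source.splitlines():
        if not in_block:
            in_block = is_marker(line)
            continue
        m = TAG_RE.match(line.strip())
        if m:
            sl, so, el, eo = (int(g) for g in m.groups())
            ranges.append((sl, so, el, eo))
    return ranges


def line_labels(source: str) -> list[int]:
    """Label each code line 1 if it falls inside a tagged range, else 0 (1-based lines assumed)."""
    lines = source.splitlines()
    # The code ends where the appended tag block begins.
    n = next((i for i, line in enumerate(lines) if is_marker(line)), len(lines))
    labels = [0] * n
    for sl, _so, el, _eo in parse_tags(source):
        for ln in range(sl, el + 1):
            if 1 <= ln <= n:
                labels[ln - 1] = 1
    return labels
```

Line-level labels are only one possible granularity; the offsets returned by parse_tags would also allow token-level labelling.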
For an example of its usage, see the SVCP4CDataset repository (https://github.com/uleroboticsgroup/SVCP4CDataset).
Reference
To cite this work, please use the following BibTeX entry:
@ARTICLE{Raducu2020,
  Title = {Collecting Vulnerable Source Code from Open-Source Repositories for Dataset Generation},
  Author = {Raducu, Razvan and Esteban, Gonzalo and Rodr{\'i}guez Lera, Francisco Javier and Fern{\'a}ndez, Camino},
  Journal = {Applied Sciences},
  Volume = {10},
  Number = {4},
  Pages = {1270},
  Year = {2020},
  Publisher = {Multidisciplinary Digital Publishing Institute},
  Doi = {10.3390/app10041270},
}
License
This project is licensed under GNU GPLv3.
External links
- SonarCloud Vulnerable Code Prospector for C (SVCP4C), (2020), GitHub repository, https://github.com/uleroboticsgroup/SVCP4C
- Vulnerable Source Code Collected from Open Source Repositories for Dataset Generation, (2020), GitHub repository, https://github.com/uleroboticsgroup/SVCP4CDataset