Description
In this work, we present a Large Language Model (LLM) for Automatic Program Repair (APR) that uses both source code and analysis results from the SonarQube static code analyzer. Llama 3 (8B) was selected as the base model and fine-tuned on multiple datasets: the CommitPack dataset, a SonarQube-generated dataset, and a synthetic dataset created with Llama 3 (70B). Fine-tuning used techniques such as QLoRA and NEFTune to make training more efficient and reduce overfitting. The model was also quantized with the EETQ method to reduce memory requirements and improve inference efficiency. In addition, a targeted context-based refinement approach was applied: the model receives precise context for each detected error, so that it can address specific SonarQube rules efficiently. The result is a framework for automatic code repair that can correct a wide range of the code errors detected by SonarQube.
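As an illustration of the context-based refinement idea, the following minimal Python sketch builds a repair prompt from a single static-analysis finding. It is not the prompt format used in this work: the `Issue` fields, the rule key `c:EXAMPLE`, the window size, and the prompt wording are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    """One static-analysis finding (hypothetical, simplified shape)."""
    rule: str     # rule key, e.g. "c:EXAMPLE" (placeholder, not a real rule)
    message: str  # human-readable description of the finding
    line: int     # 1-based line number of the flagged line


def build_repair_prompt(source: str, issue: Issue, window: int = 3) -> str:
    """Return a prompt holding the rule, the finding, and nearby code only."""
    lines = source.splitlines()
    # Keep `window` lines before and after the flagged line, clamped to the file.
    start = max(issue.line - 1 - window, 0)
    end = min(issue.line + window, len(lines))
    snippet = "\n".join(
        f"{'>>' if n == issue.line else '  '} {n:4d} | {text}"
        for n, text in enumerate(lines[start:end], start=start + 1)
    )
    return (
        f"Rule: {issue.rule}\n"
        f"Finding: {issue.message}\n"
        f"Code (flagged line marked with >>):\n{snippet}\n"
        "Rewrite the flagged code so that it no longer violates the rule."
    )
```

Limiting the prompt to the rule, the message, and a small window around the flagged line keeps the input short and focused on one finding at a time. A real pipeline would likely need more context, such as the enclosing function or relevant type declarations.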
The model was trained on datasets built from real-world C projects that follow the MISRA C:2012 standard, which is central to safety and quality in C software development. To generate these datasets, SonarQube was used to perform static code analysis and identify specific errors in the projects. The manual corrections made by the projects' developers were then incorporated, allowing the model to learn how to apply MISRA rules more accurately and effectively. This training process was applied to significant projects such as CO2M, a key part of the COPERNICUS mission, with a particular focus on the ICU HDSW product and the boot software, both critical to system operation. Detailed manual evaluation of the results indicated that the model not only corrected errors automatically but was also adaptable and reliable across various real-world scenarios, improving overall software quality and compliance with coding best practices. A GitLab CI/CD extension was created to run the repair pipelines in our CI/CD environment, producing a code report for new code changes.
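The dataset construction described above pairs each SonarQube finding with the developers' manual correction. A minimal Python sketch of that pairing is shown below. The record schema (`instruction`, `input`, `output`), the helper names, and the rule key and C snippets in the usage example are hypothetical and are not taken from the actual datasets.

```python
import json


def make_training_record(rule: str, message: str, before: str, after: str) -> dict:
    """Pair flagged code with its manual fix as one supervised example."""
    return {
        "instruction": f"Fix the violation of rule {rule}: {message}",
        "input": before,   # code as flagged by the analyzer
        "output": after,   # code after the developer's manual correction
    }


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(record) for record in records)


# Usage with made-up data; "c:EXAMPLE" is a placeholder rule key.
example = make_training_record(
    rule="c:EXAMPLE",
    message="operand should have an unsigned type",
    before="int mask = flags & 0x0F;",
    after="unsigned int mask = flags & 0x0FU;",
)
jsonl_text = to_jsonl([example])
```

Storing one finding per record keeps each training example tied to a single rule, which matches the per-error context that the abstract describes.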