18th ESA Workshop on Avionics, Data, Control and Software Systems ~ ADCSS2024

Name: 18th ESA Workshop on Avionics, Data, Control and Software Systems ~ ADCSS2024
Start: 2024-10-22T08:30:00+02:00
End: 2024-10-24T18:00:00+02:00
Location: ESA/ESTEC

Europe/Amsterdam timezone

For support please contact:

asg@esa.int

Artificial Intelligence applied to code repair after code static analysis verification

23 Oct 2024, 10:40

20m

Newton Conference Center (ESA/ESTEC)

Software

Ainhoa López (TAS), David de Fitero (UAH)

In this work, we present a Large Language Model (LLM) for Automatic Program Repair (APR) that takes as input both the source code and the analysis results of the SonarQube static code analyzer. Llama 3 (8B) was selected as the base model and fine-tuned on several datasets: the CommitPack dataset, a SonarQube-generated dataset, and a synthetic dataset created with Llama 3 (70B). Fine-tuning used techniques such as QLoRA and NEFTune to make training more efficient and to reduce overfitting. The model was then quantized with the EETQ method to reduce memory requirements and improve inference efficiency. In addition, a targeted context-based refinement approach was applied: for each detected error, the model is given precise context, which lets it address specific SonarQube rules efficiently. Together, these steps provide a robust framework for automatic code repair that can correct a wide range of code errors detected by SonarQube.
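The idea of giving the model precise context for each detected error can be illustrated with a minimal sketch. It is not the authors' implementation: the `Finding` structure, the `extract_context` and `build_repair_prompt` helpers, the prompt wording, and the `rule_hint` text are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One static-analysis finding (hypothetical, simplified structure)."""
    rule: str      # rule identifier reported by the analyzer (placeholder values here)
    message: str   # analyzer message for this finding
    line: int      # 1-based line number of the flagged statement


def extract_context(source: str, line: int, radius: int = 2) -> str:
    """Return the flagged line plus `radius` lines on each side, numbered.

    The flagged line is marked with '>>' so the model can locate it.
    """
    lines = source.splitlines()
    start = max(line - 1 - radius, 0)
    end = min(line + radius, len(lines))
    return "\n".join(
        f"{'>>' if i + 1 == line else '  '} {i + 1:4d} | {lines[i]}"
        for i in range(start, end)
    )


def build_repair_prompt(source: str, finding: Finding, rule_hint: str) -> str:
    """Assemble a repair prompt from the finding, a rule hint, and local code.

    The prompt layout is an assumption, not a format taken from the talk.
    """
    return (
        f"Rule: {finding.rule}\n"
        f"Analyzer message: {finding.message}\n"
        f"Rule guidance: {rule_hint}\n"
        f"Code around line {finding.line}:\n"
        f"{extract_context(source, finding.line)}\n"
        "Rewrite the flagged code so that the rule is satisfied."
    )


if __name__ == "__main__":
    c_source = "int f(void)\n{\n    int x;\n    return x;\n}\n"
    finding = Finding(rule="demo:uninit",
                      message="Variable 'x' is read before being set",
                      line=4)
    print(build_repair_prompt(c_source, finding,
                              "Initialise variables before use."))
```

Restricting the prompt to a small window around the flagged line keeps it short, which matters for an 8B model with limited context; the window size (`radius`) is an arbitrary choice here.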

The model was trained on datasets built from real-world C projects that adhere to the MISRA C:2012 standard, which is central to ensuring safety and quality in C software development. To generate these datasets, SonarQube was used to perform static code analysis and identify specific errors in the projects. The manual corrections made by the projects' developers were then incorporated, allowing the model to learn how to apply MISRA rules more accurately and effectively. This training process was applied to significant projects such as CO2M, a key mission of the Copernicus programme, with a particular focus on the ICU HDSW product and the boot software, both critical to system operation. Detailed manual evaluation of the results indicated that the model not only corrected errors automatically but was also adaptable and reliable across a variety of real-world scenarios, improving overall software quality and compliance with best coding practices. A GitLab CI/CD extension was created to run the repair pipelines in our CI/CD environment, yielding a code report for new code changes.
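Pairing analyzer findings with the developers' manual corrections can be sketched as follows. This is only an assumed record layout for a fine-tuning dataset: the field names, the `make_training_record` and `to_jsonl` helpers, and the JSONL output format are hypothetical and are not described in the abstract.

```python
import difflib
import json


def make_training_record(rule: str, message: str, before: str, after: str) -> dict:
    """Pair a flagged snippet with its manually corrected version.

    `before` is the code as flagged by the analyzer, `after` is the
    developer's fix. The unified diff is stored for later manual review.
    """
    diff = "\n".join(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="before.c",
            tofile="after.c",
            lineterm="",
        )
    )
    return {
        "rule": rule,
        "message": message,
        "input": before,
        "output": after,
        "diff": diff,
    }


def to_jsonl(records: list) -> str:
    """Serialise records as JSON Lines, skipping pairs with no actual change."""
    kept = [r for r in records if r["input"] != r["output"]]
    return "\n".join(json.dumps(r, sort_keys=True) for r in kept)


if __name__ == "__main__":
    before = "int f(void)\n{\n    int x;\n    return x;\n}"
    after = "int f(void)\n{\n    int x = 0;\n    return x;\n}"
    record = make_training_record(
        "demo:uninit", "Variable 'x' is read before being set", before, after
    )
    print(record["diff"])
    print(to_jsonl([record]))
```

Dropping pairs where the code did not change avoids teaching the model to echo its input; whether the original pipeline applied such a filter is not stated.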

5. 20241021-ADCSS TASE UAH AI for code repair.pdf
