RepairMaster: Enhancing LLM-Based Automated Vulnerability Repair through Cross-Fragment Information Fusion, Structure-Aware Fine-Tuning, and Bimodal Semantic Retrieval

Yang Li; Qin Luo; Zhen Zhang; Hao Wu; Mei Chen

doi:10.63646/jiic.2025.050301


Published 2025-09-30

Yang Li

School of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China

Qin Luo*

School of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China
qinluo@cqut.edu.cn

Zhen Zhang

School of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China

Hao Wu

School of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China

Mei Chen

Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China

Abstract

Software vulnerabilities in C/C++ codebases pose critical security threats, and the manual effort required to remediate them is a productivity bottleneck in secure software development lifecycles. Large Language Model (LLM)-based Automated Program Repair (APR) holds substantial promise for accelerating vulnerability remediation, but existing approaches face two fundamental limitations that restrict their effectiveness in open-world deployment: (1) real-world vulnerability logic is complex, involving interdependent fragments across function boundaries and intricate control flow that exceed the contextual reasoning capacity of approaches that simply feed the vulnerable code to the model; and (2) the historical patch knowledge accumulated in vulnerability databases is underused, even though it contains directly relevant repair strategies that could substantially guide LLM generation, because accessing it effectively requires sophisticated retrieval. To address these challenges, this paper proposes RepairMaster, an LLM-based vulnerability repair framework that integrates three complementary innovations. The Cross-Fragment Information Fusion (CFIF) module enables the LLM to reason across multiple related code fragments, such as callee functions, global variable definitions, and type declarations, that provide the context needed to understand the root cause of a vulnerability. The Structure-Aware Fine-Tuning (S-AST) mechanism incorporates simplified Abstract Syntax Tree (AST), Control Flow Graph (CFG), and Program Dependence Graph (PDG) representations into the fine-tuning objective, so that the model learns repair patterns at the level of code structure rather than from token sequences alone.
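As a rough illustration of the kind of context CFIF supplies, the sketch below gathers the fragments that a vulnerable function references and places them, ahead of the function itself, in a single repair prompt. It is not the authors' implementation: the Fragment record, the identifier-matching heuristic, the declaration-first ordering, and the character budget are all assumptions made for illustration, and a real system would resolve references with a C/C++ parser rather than a regular expression.

```python
import re
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical record for one related code fragment."""
    name: str
    kind: str  # "type", "global", or "callee" (the kinds named in the abstract)
    code: str


# Assumed ordering: declarations before their uses.
KIND_ORDER = {"type": 0, "global": 1, "callee": 2}
IDENT = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*\b")


def referenced_fragments(func_code, index):
    """Return fragments from `index` whose names occur as identifiers in func_code."""
    names = set(IDENT.findall(func_code))
    hits = [frag for name, frag in index.items() if name in names]
    return sorted(hits, key=lambda f: (KIND_ORDER[f.kind], f.name))


def build_repair_prompt(func_code, index, budget_chars=4000):
    """Fuse related fragments and the vulnerable function into one prompt string."""
    parts, used = [], 0
    for frag in referenced_fragments(func_code, index):
        block = f"// [{frag.kind}] {frag.name}\n{frag.code}\n"
        if used + len(block) > budget_chars:
            break  # stop adding context once the (assumed) budget would be exceeded
        parts.append(block)
        used += len(block)
    parts.append("// [vulnerable function]\n" + func_code + "\n")
    parts.append("// Task: output the repaired version of the vulnerable function.\n")
    return "".join(parts)


# Toy repository index (hypothetical C snippets).
index = {
    "buf_t": Fragment("buf_t", "type", "typedef struct { char *data; size_t len; } buf_t;"),
    "MAX_LEN": Fragment("MAX_LEN", "global", "static const size_t MAX_LEN = 256;"),
    "copy_bytes": Fragment("copy_bytes", "callee", "void copy_bytes(char *d, const char *s, size_t n);"),
    "unused_fn": Fragment("unused_fn", "callee", "int unused_fn(void);"),
}
vuln = "void fill(buf_t *b, const char *src, size_t n) { copy_bytes(b->data, src, n); }"
prompt = build_repair_prompt(vuln, index)
```

Here only the type buf_t and the callee copy_bytes are pulled in, because MAX_LEN and unused_fn are never referenced by the vulnerable function.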
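The abstract does not specify how the simplified AST, CFG, and PDG are serialized for fine-tuning. One common way to expose tree structure to a sequence model is structure-based traversal (SBT), which wraps every subtree in bracket tokens labeled with its node type. The sketch below applies SBT to a hand-built AST for the C statement "if (len > MAX) return -1;". SBT is a swapped-in technique chosen for illustration, the Node class and the "<sbt>" separator are hypothetical, and the CFG and PDG views are not covered.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical simplified AST node."""
    kind: str
    value: str = ""
    children: list = field(default_factory=list)


def sbt(node):
    """Structure-based traversal: emit '(' label, the children, then ')' label."""
    label = f"{node.kind}_{node.value}" if node.value else node.kind
    tokens = ["(", label]
    for child in node.children:
        tokens.extend(sbt(child))
    tokens.extend([")", label])
    return tokens


def structure_augmented_input(source, tree):
    """Append the linearized tree to the source text as one fine-tuning input."""
    return source + "\n<sbt> " + " ".join(sbt(tree))


# Hand-built simplified AST for: if (len > MAX) return -1;
tree = Node("IfStmt", children=[
    Node("BinaryOp", ">", [Node("Ident", "len"), Node("Ident", "MAX")]),
    Node("Return", children=[Node("Literal", "-1")]),
])
tokens = sbt(tree)
sample = structure_augmented_input("if (len > MAX) return -1;", tree)
```

Each node contributes four tokens, so the sequence grows linearly with tree size; that is one plausible reason to simplify the tree before serializing it.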
The Bimodal Semantic Retrieval Enhancement (BSRE) module retrieves relevant historical patches using the joint similarity of code embeddings and natural-language descriptions, providing the LLM with contextually matched repair examples from a database of more than 5,800 vulnerable C/C++ functions drawn from 1,700 real-world projects. On the benchmark dataset, RepairMaster improves exact match (EM) from 20.00% to 31.76%, BLEU from 25.70% to 29.12%, and CodeBLEU from 39.40% to 43.68% relative to the best prior methods. On real CVE vulnerabilities it achieves a CodeBLEU of 28.74%, confirming its practical applicability.
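A minimal sketch of bimodal ranking follows. It assumes that each database entry already stores one embedding of its vulnerable code and one of its natural-language description, scores entries by a weighted sum of the two cosine similarities, and returns the top k as repair examples. The PatchRecord fields, the linear fusion, and the weight alpha = 0.5 are assumptions; the abstract states only that code and description similarity are used jointly. The toy three-dimensional vectors stand in for real encoder outputs.

```python
import math
from dataclasses import dataclass


@dataclass
class PatchRecord:
    """Hypothetical database entry with precomputed embeddings."""
    rec_id: str
    code_vec: list  # embedding of the vulnerable code
    desc_vec: list  # embedding of the natural-language description
    patch: str


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def bimodal_top_k(q_code, q_desc, db, k=2, alpha=0.5):
    """Rank records by alpha * code similarity + (1 - alpha) * description similarity."""
    scored = [
        (alpha * cosine(q_code, r.code_vec) + (1 - alpha) * cosine(q_desc, r.desc_vec), r)
        for r in db
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return scored[:k]


db = [
    PatchRecord("rec-1", [1.0, 0.0, 0.0], [1.0, 0.0, 0.0], "patch A"),
    PatchRecord("rec-2", [1.0, 0.0, 0.0], [0.6, 0.8, 0.0], "patch B"),
    PatchRecord("rec-3", [0.0, 1.0, 0.0], [1.0, 0.0, 0.0], "patch C"),
]
q_code, q_desc = [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]
top = bimodal_top_k(q_code, q_desc, db, k=2)
```

In the toy data, rec-3 matches the query description exactly but not its code, so the fused score (0.5) places it behind rec-2 (0.8). A description-only ranking (alpha = 0.0) would put rec-3 ahead of rec-2.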
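The reported improvements can also be restated as absolute gains in percentage points and as relative gains over the best prior method. The short script below does this arithmetic using only the figures given in the abstract.

```python
# Figures reported in the abstract: (best prior method, RepairMaster), in percent.
REPORTED = {
    "EM": (20.00, 31.76),
    "BLEU": (25.70, 29.12),
    "CodeBLEU": (39.40, 43.68),
}


def gains(before, after):
    """Absolute gain in percentage points and relative gain in percent."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return round(absolute, 2), round(relative, 1)


for metric, (before, after) in REPORTED.items():
    abs_pp, rel_pct = gains(before, after)
    print(f"{metric}: +{abs_pp:.2f} pp ({rel_pct:.1f}% relative)")
# EM: +11.76 pp (58.8% relative)
# BLEU: +3.42 pp (13.3% relative)
# CodeBLEU: +4.28 pp (10.9% relative)
```

The EM gain is the largest in both absolute and relative terms.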

Keywords: automated program repair; vulnerability repair; large language models; structure-aware fine-tuning; bimodal retrieval; code security

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Li, Y., Luo, Q., Zhang, Z., Wu, H., & Chen, M. (2025). RepairMaster: Enhancing LLM-Based Automated Vulnerability Repair through Cross-Fragment Information Fusion, Structure-Aware Fine-Tuning, and Bimodal Semantic Retrieval. Journal of Intelligent Industrial Convergence, 5(3), 1-10. https://doi.org/10.63646/jiic.2025.050301
