Precise Learn-to-Rank Fault Localization Using Dynamic and Static Features of Target Programs
Finding the root cause of a bug requires a significant effort from developers. Automated fault localization techniques seek to reduce this cost by computing the suspiciousness scores (i.e., the likelihood of program entities being faulty). Existing techniques have been developed by utilizing input features of specific types for the computation of suspiciousness scores, such as program spectrum or mutation analysis results. This article presents a novel learn-to-rank fault localization technique called PRecise machINe-learning-based fault loCalization tEchnique (PRINCE). PRINCE uses genetic programming (GP) to combine multiple sets of localization input features that have been studied separately until now. For dynamic features, PRINCE encompasses both Spectrum Based Fault Localization (SBFL) and Mutation Based Fault Localization (MBFL) techniques. It also uses static features, such as dependency information and structural complexity of program entities. All such information is used by GP to train a ranking model for fault localization. The empirical evaluation on 65 real-world faults from CoREBench, 84 artificial faults from SIR, and 310 real-world faults from Defects4J shows that PRINCE outperforms the state-of-the-art SBFL, MBFL, and learn-to-rank techniques significantly. PRINCE localizes a fault after reviewing 2.4% of the executed statements on average (4.2 and 3.0 times more precise than the best of the compared SBFL and MBFL techniques, respectively). Also, PRINCE ranks 52.9% of the target faults within the top ten suspicious statements.