Software defect prediction is an active research area in software engineering. Accurate prediction of software defects helps software engineers direct quality assurance activities so that they make the best use of testing resources, reduce maintenance costs, and deliver quality software products. In machine learning research, ensemble learning has been shown to improve prediction performance over individual machine learning models. Many boosting ensembles have recently been proposed in the literature, but their prediction capabilities have not been investigated in defect prediction. In this paper, we empirically investigate the prediction performance of the following tree-based ensembles in defect prediction: AdaBoost, Random Forest, Extra Trees, Gradient Boosting, Hist Gradient Boosting, XGBoost, and CatBoost. The study used 11 publicly available NASA MDP software defect datasets. Empirical results indicate the superiority of the Random Forest and Extra Trees ensembles over the other ensembles. However, none of the boosting ensembles performed significantly worse than individual decision trees. Finally, AdaBoost was the worst-performing ensemble of those studied.
Thu 5 Nov. Displayed time zone: (UTC) Coordinated Universal Time
16:00 - 16:40
16:00 | 20m | Talk | Software Defect Prediction using Tree-Based Ensembles | PROMISE 2020
16:20 | 20m | Talk | Improving Real-World Vulnerability Characterization with Vulnerable Slices | PROMISE 2020 | Solmaz Salimi (Sharif University of Technology), Maryam Ebrahimzadeh (Sharif University of Technology), Mehdi Kharrazi (Sharif University of Technology)