
Detecting False-passing Products and Mitigating their Impact on Variability Fault Localization in Software Product Lines

In a Software Product Line (SPL) system, variability bugs can cause failures in certain products (buggy products) but not in others. In practice, variability bugs are not always exposed: a buggy product can still pass all of its tests because its test suite is ineffective (a so-called false-passing product). The misleading test results of these false-passing products can degrade variability fault localization performance. In this paper, we introduce CLAP, a novel approach for detecting false-passing products in SPL systems that fail due to variability bugs. Our key idea is to collect failure indications from the failing products based on their implementation and test quality. For each passing product, we evaluate these indications: the stronger the indications, the more likely the product is false-passing. Specifically, the likelihood that a passing product is false-passing is assessed based on whether it is implemented by a large number of statements that are highly suspicious in the failing products, and whether its test suite is of lower quality than the failing products' test suites. We conducted several experiments to evaluate our false-passing product detection approach on a large benchmark of 14,191 false-passing products and 22,555 true-passing products across 823 buggy versions of existing SPL systems. The experimental results show that CLAP effectively distinguishes false-passing from true-passing products with an average accuracy of more than 90%. Notably, the precision of false-passing product detection by CLAP reaches 96%: among 10 products predicted as false-passing, more than 9 are detected correctly. Furthermore, we propose two simple and effective methods to mitigate the negative impact of false-passing products on variability fault localization. These methods improve the performance of state-of-the-art variability fault localization techniques by up to 34%.
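To make the intuition concrete, here is a minimal sketch of how a passing product could be scored. The `false_passing_score` function, the suspiciousness threshold, and the additive combination are illustrative assumptions, not CLAP's exact formulation:

```python
from dataclasses import dataclass

@dataclass
class Product:
    statements: set      # ids of the statements implementing this product
    test_quality: float  # quality of its test suite, e.g. in [0, 1]

def false_passing_score(passing, failing_products, suspiciousness,
                        threshold=0.5):
    """Score how likely a passing product is false-passing.

    `suspiciousness` maps a statement id to its suspiciousness computed
    from the failing products. The two indications below follow the
    paper's intuition; the threshold and the sum are illustrative choices.
    """
    # Indication 1: the product is implemented by many statements that are
    # highly suspicious in the failing products.
    hot = {s for s, score in suspiciousness.items() if score >= threshold}
    impl = len(passing.statements & hot) / max(len(passing.statements), 1)

    # Indication 2: its test suite is weaker than the failing products'.
    avg_fail = sum(p.test_quality for p in failing_products) / len(failing_products)
    weak_tests = max(0.0, avg_fail - passing.test_quality)

    # The stronger the combined indications, the more likely false-passing.
    return impl + weak_tests
```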

Dataset overview

| System | #Versions | #Fs | #FPs | #TPs |
| --- | --- | --- | --- | --- |
| BankAccountTP | 187 | 2,055 | 2,328 | 1,975 |
| Elevator | 41 | 217 | 326 | 195 |
| Email | 69 | 553 | 587 | 723 |
| ExamDB | 77 | 201 | 127 | 288 |
| GPL | 355 | 6,612 | 9,995 | 18,538 |
| ZipMe | 94 | 686 | 828 | 836 |
| **Total** | **823** | **10,433** | **14,191** | **22,555** |

Note that:

- #Versions: the number of buggy versions
- #Fs: the number of failing products
- #FPs: the number of false-passing products
- #TPs: the number of true-passing products

The dataset can be found here

Empirical results

1. Accuracy of the false-passing product detection model

| Classifier | Label | Precision | Recall | F1-Score | Accuracy |
| --- | --- | --- | --- | --- | --- |
| SVM | TP | 88.16% | 97.09% | 92.41% | 90.04% |
| SVM | FP | 94.19% | 78.36% | 85.55% | |
| KNN | TP | 90.41% | 93.97% | 92.16% | 90.02% |
| KNN | FP | 89.30% | 83.46% | 86.28% | |
| Naive Bayes | TP | 88.36% | 95.25% | 91.68% | 89.21% |
| Naive Bayes | FP | 90.95% | 79.18% | 84.66% | |
| Logistic Regression | TP | 88.75% | 95.99% | 92.23% | 89.91% |
| Logistic Regression | FP | 92.30% | 79.81% | 85.60% | |
| Decision Tree | TP | 90.03% | 96.26% | 93.04% | 91.01% |
| Decision Tree | FP | 92.99% | 82.30% | 87.32% | |

(Accuracy is computed per classifier, so it is listed once on each classifier's TP row.)
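These are off-the-shelf classifiers, so a comparison like the one above can be reproduced with a few lines of scikit-learn. In the sketch below, the synthetic feature matrix stands in for CLAP's per-product attributes, and the random split is illustrative; the paper's actual features, splits, and hyperparameters may differ:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# Stand-in for per-product feature vectors labeled true-/false-passing.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

classifiers = {
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(),
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    # Per-label precision/recall/F1 and overall accuracy, as in the table.
    print(name)
    print(classification_report(y_test, clf.predict(X_test)))
```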
2. Mitigating the false-passing products' negative impact on fault localization performance (lower is better)

| Ranking metric | VARCOP: Original | VARCOP: Removing FPs | VARCOP: Adding tests for FPs | SBFL: Original | SBFL: Removing FPs | SBFL: Adding tests for FPs |
| --- | --- | --- | --- | --- | --- | --- |
| Tarantula | 3.35 | 2.52 | 2.22 | 5.10 | 4.75 | 4.53 |
| Ochiai | 2.39 | 2.23 | 2.28 | 3.00 | 2.77 | 2.86 |
| Op2 | 4.31 | 4.18 | 4.33 | 7.04 | 6.84 | 6.96 |
| Barinel | 3.69 | 2.83 | 2.91 | 5.10 | 4.74 | 4.53 |
| Dstar | 2.55 | 2.14 | 2.19 | 3.06 | 2.91 | 2.98 |
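Both mitigation methods act on the inputs of the fault localization step. As a sketch of the "Removing FPs" method, the Tarantula-based ranking below simply drops products predicted as false-passing from the passing side before counting coverage (the `Prod` record and the `is_false_passing` predicate are hypothetical names, not part of the tools above):

```python
from collections import namedtuple

# failed: bool; coverage: set of statement ids executed by the product's tests
Prod = namedtuple("Prod", "failed coverage")

def tarantula(n_cf, n_cp, n_f, n_p):
    """Tarantula suspiciousness from product-level pass/fail counts."""
    if n_f == 0 or n_cf == 0:
        return 0.0
    f = n_cf / n_f
    p = n_cp / n_p if n_p else 0.0
    return f / (f + p)

def rank_statements(statements, products, is_false_passing):
    # "Removing FPs": keep all failing products, but drop passing products
    # that the detection model predicts to be false-passing.
    kept = [pr for pr in products if pr.failed or not is_false_passing(pr)]
    n_f = sum(1 for pr in kept if pr.failed)
    n_p = len(kept) - n_f
    scores = {}
    for s in statements:
        n_cf = sum(1 for pr in kept if pr.failed and s in pr.coverage)
        n_cp = sum(1 for pr in kept if not pr.failed and s in pr.coverage)
        scores[s] = tarantula(n_cf, n_cp, n_f, n_p)
    # Most suspicious statements first.
    return sorted(statements, key=scores.get, reverse=True)
```

The alternative method, "Adding tests for FPs", instead strengthens the test suites of the detected false-passing products so that their now-meaningful results can be kept in the localization.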
3. Impact of different experimental scenarios

| Scenario | Label | Precision | Recall | F1-Score | Accuracy |
| --- | --- | --- | --- | --- | --- |
| System-based | TP | 87.51% | 92.16% | 89.15% | 88.44% |
| System-based | FP | 89.42% | 85.83% | 86.83% | |
| Version-based | TP | 88.16% | 97.09% | 92.41% | 90.04% |
| Version-based | FP | 94.19% | 78.36% | 85.55% | |
| Product-based | TP | 87.53% | 96.97% | 92.01% | 89.70% |
| Product-based | FP | 94.27% | 78.26% | 85.52% | |
| Within-system | TP | 88.73% | 96.29% | 92.21% | 92.29% |
| Within-system | FP | 96.12% | 87.02% | 91.16% | |
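The scenarios differ in what may be shared between training and test data. The sketch below shows one way to realize such splits with scikit-learn group-based splitters, assuming each product sample carries system and version identifiers; this is our reading of the scenario names, not the paper's exact protocol (within-system would restrict both sides to products of a single system):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit, train_test_split

def scenario_split(X, y, systems, versions, scenario, seed=0):
    """Return (train_idx, test_idx) under a given evaluation scenario."""
    if scenario == "system-based":
        # Test products come from systems never seen during training.
        gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=seed)
        return next(gss.split(X, y, groups=systems))
    if scenario == "version-based":
        # Products of one buggy version never straddle train and test.
        gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=seed)
        return next(gss.split(X, y, groups=versions))
    if scenario == "product-based":
        # Plain random split over individual products.
        idx = np.arange(len(y))
        train_idx, test_idx = train_test_split(
            idx, test_size=0.2, stratify=y, random_state=seed)
        return train_idx, test_idx
    raise ValueError(f"unknown scenario: {scenario}")
```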
4. Impact of different training data sizes (the number of systems)

| #Systems | Label | Precision | Recall | F1-Score | Accuracy |
| --- | --- | --- | --- | --- | --- |
| 1 | TP | 92.02% | 81.88% | 86.65% | 82.60% |
| 1 | FP | 74.90% | 93.19% | 83.05% | |
| 2 | TP | 96.82% | 63.07% | 76.38% | 78.47% |
| 2 | FP | 68.18% | 97.44% | 80.23% | |
| 3 | TP | 95.37% | 76.90% | 85.14% | 85.19% |
| 3 | FP | 77.03% | 95.40% | 85.24% | |
| 4 | TP | 90.18% | 81.33% | 85.53% | 84.81% |
| 4 | FP | 79.48% | 89.10% | 84.02% | |
| 5 | TP | 91.19% | 84.51% | 87.72% | 86.95% |
| 5 | FP | 82.50% | 89.95% | 86.06% | |
5. Impact of CLAP's attributes on the false-passing product detection performance

| Attributes | Label | Precision | Recall | F1-Score | Accuracy |
| --- | --- | --- | --- | --- | --- |
| Product Implementation | TP | 84.69% | 87.71% | 86.17% | 81.52% |
| Product Implementation | FP | 74.80% | 69.69% | 72.15% | |
| Test Adequacy | TP | 80.45% | 99.74% | 89.06% | 83.92% |
| Test Adequacy | FP | 99.07% | 53.69% | 69.64% | |
| Test Effectiveness | TP | 78.74% | 96.59% | 86.76% | 80.64% |
| Test Effectiveness | FP | 88.50% | 50.18% | 64.05% | |
| All | TP | 87.47% | 94.66% | 90.02% | 87.71% |
| All | FP | 88.29% | 74.82% | 81.00% | |
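This ablation keeps the classifier fixed and varies which attribute groups it sees. A minimal sketch of such a loop follows; the column slices, group sizes, and synthetic data are placeholders for however CLAP's attributes are actually laid out:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data: 12 features split into three hypothetical attribute groups.
X, y = make_classification(n_samples=2000, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

FEATURE_GROUPS = {
    "Product Implementation": slice(0, 4),
    "Test Adequacy": slice(4, 8),
    "Test Effectiveness": slice(8, 12),
    "All": slice(0, 12),
}

# Train the same classifier on each attribute group in isolation, then on
# all attributes together, mirroring the ablation in the table above.
for name, cols in FEATURE_GROUPS.items():
    clf = SVC().fit(X_train[:, cols], y_train)
    print(f"{name}: accuracy = {clf.score(X_test[:, cols], y_test):.2%}")
```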