Detecting False-passing Products and Mitigating their Impact on Variability Fault Localization in Software Product Lines
In a Software Product Line (SPL) system, variability bugs can cause failures in certain products (buggy products) but not in others. In practice, variability bugs are not always exposed, and buggy products can still pass all of their tests due to ineffective test suites; such products are called false-passing products. The misleading test results of these false-passing products can negatively impact variability fault localization performance. In this paper, we introduce CLAP, a novel approach to detect false-passing products in SPL systems failed by variability bugs. Our key idea is to collect failure indications from the failing products based on their implementation and test quality, and to evaluate these indications for each passing product: the stronger the indications, the more likely the product is false-passing. Specifically, a passing product is considered more likely to be false-passing if it is implemented by a large number of statements that are highly suspicious in the failing products, and if its test suite is of lower quality than the failing products' test suites. We conducted several experiments to evaluate our false-passing product detection approach on a large benchmark of 14,191 false-passing products and 22,555 true-passing products in 823 buggy versions of existing SPL systems. The experimental results show that CLAP effectively detects false-passing and true-passing products, with an average accuracy of more than 90%. Notably, the precision of false-passing product detection by CLAP is up to 96%: among every 10 products predicted as false-passing, more than 9 are correctly detected. Furthermore, we propose two simple and effective methods to mitigate the negative impact of false-passing products on variability fault localization: removing the detected false-passing products and adding tests for them. These methods improve the performance of state-of-the-art variability fault localization techniques by up to 34%.
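To make the idea concrete, the following Python sketch illustrates the two kinds of failure indications described above. It is a minimal, hypothetical illustration rather than CLAP's implementation: the `Product` container, the `false_passing_features` helper, the 0.5 suspiciousness threshold, and the coverage-based notion of test quality are all assumptions made for this example.

```python
# Minimal sketch of CLAP's idea (illustrative, not the authors' code).
# A passing product looks false-passing when (1) many of its statements are
# highly suspicious in the failing products and (2) its test suite is of
# lower quality than the failing products' test suites.
from dataclasses import dataclass

@dataclass
class Product:               # hypothetical container for this example
    statements: set          # ids of the statements composing the product
    test_quality: float      # e.g., statement coverage of its test suite

def false_passing_features(passing, failing, suspiciousness, threshold=0.5):
    """Two failure indications for one passing product; the stronger they
    are, the more likely the product is false-passing. `suspiciousness`
    maps a statement id to its SBFL score measured on the failing products."""
    hot = {s for s, score in suspiciousness.items() if score >= threshold}
    overlap = len(passing.statements & hot) / max(len(passing.statements), 1)
    avg_failing_quality = sum(p.test_quality for p in failing) / len(failing)
    quality_gap = avg_failing_quality - passing.test_quality
    return [overlap, quality_gap]
```

In practice, indications like these form a feature vector per passing product that is fed to a binary classifier, as evaluated in the tables below.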
Dataset overview
| System | #Versions | #Fs | #FPs | #TPs |
|---|---|---|---|---|
| BankAccountTP | 187 | 2,055 | 2,328 | 1,975 |
| Elevator | 41 | 217 | 326 | 195 |
| Email | 69 | 553 | 587 | 723 |
| ExamDB | 77 | 201 | 127 | 288 |
| GPL | 355 | 6,612 | 9,995 | 18,538 |
| ZipMe | 94 | 686 | 828 | 836 |
| Total | 823 | 10,433 | 14,191 | 22,555 |
Note that:
- #Versions: the number of buggy versions
- #Fs: the number of failing products
- #FPs: the number of false-passing products
- #TPs: the number of true-passing products
The dataset can be found here
Empirical results
- Accuracy of the false-passing product detection model (TP = true-passing label, FP = false-passing label); a hedged training sketch follows the table

| Metric | SVM | KNN | Naive Bayes | Logistic Regression | Decision Tree |
|---|---|---|---|---|---|
| Precision (TP / FP) | 88.16% / 94.19% | 90.41% / 89.30% | 88.36% / 90.95% | 88.75% / 92.30% | 90.03% / 92.99% |
| Recall (TP / FP) | 97.09% / 78.36% | 93.97% / 83.46% | 95.25% / 79.18% | 95.99% / 79.81% | 96.26% / 82.30% |
| F1-Score (TP / FP) | 92.41% / 85.55% | 92.16% / 86.28% | 91.68% / 84.66% | 92.23% / 85.60% | 93.04% / 87.32% |
| Accuracy | 90.04% | 90.02% | 89.21% | 89.91% | 91.01% |
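The sketch below shows how such a comparison could be run with scikit-learn; it is a hedged illustration on placeholder data, not the paper's experimental pipeline. The feature matrix `X`, labels `y`, 10-fold cross-validation, and default hyperparameters are all assumptions.

```python
# Hedged sketch: comparing the five classifiers from the table above with
# scikit-learn. Placeholder data and default hyperparameters are assumptions.
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
X = rng.random((200, 4))        # placeholder per-product attribute vectors
y = rng.integers(0, 2, 200)     # placeholder labels: 0 = FP, 1 = TP

classifiers = {
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

for name, clf in classifiers.items():
    pred = cross_val_predict(clf, X, y, cv=10)  # 10-fold CV is an assumption
    print(name)
    print(classification_report(y, pred, target_names=["FP", "TP"]))
```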
- Mitigating the false-passing products' negative impact on fault localization performance. Each cell reports the average rank of the buggy statements (lower is better); a sketch of the ranking metrics and the "Removing FPs" mitigation follows the table

| Ranking metric | VARCOP (Original) | VARCOP (Removing FPs) | VARCOP (Adding tests for FPs) | SBFL (Original) | SBFL (Removing FPs) | SBFL (Adding tests for FPs) |
|---|---|---|---|---|---|---|
| Tarantula | 3.35 | 2.52 | 2.22 | 5.10 | 4.75 | 4.53 |
| Ochiai | 2.39 | 2.23 | 2.28 | 3.00 | 2.77 | 2.86 |
| Op2 | 4.31 | 4.18 | 4.33 | 7.04 | 6.84 | 6.96 |
| Barinel | 3.69 | 2.83 | 2.91 | 5.10 | 4.74 | 4.53 |
| Dstar | 2.55 | 2.14 | 2.19 | 3.06 | 2.91 | 2.98 |
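For reference, the five ranking metrics in the table are standard SBFL formulas that can be computed from each statement's execution spectrum. The sketch below implements them, plus the simpler of the two mitigations ("Removing FPs") as a plain filter; the `is_false_passing` predicate is assumed to come from a detection model such as CLAP.

```python
# Standard SBFL formulas used in the table, computed from a statement's
# spectrum: ef/ep = failing/passing tests that execute the statement,
# nf/np_ = failing/passing tests that do not.
import math

def tarantula(ef, ep, nf, np_):
    f, p = ef + nf, ep + np_        # total failing / passing tests
    if ef == 0 or f == 0:
        return 0.0
    pass_ratio = ep / p if p else 0.0
    return (ef / f) / (ef / f + pass_ratio)

def ochiai(ef, ep, nf, np_):
    d = math.sqrt((ef + nf) * (ef + ep))
    return ef / d if d else 0.0

def op2(ef, ep, nf, np_):
    return ef - ep / (ep + np_ + 1)

def barinel(ef, ep, nf, np_):
    return 1 - ep / (ep + ef) if (ep + ef) else 0.0

def dstar(ef, ep, nf, np_, star=2):
    d = ep + nf
    return float("inf") if d == 0 else ef ** star / d

def remove_false_passing(passing_products, is_false_passing):
    """'Removing FPs' mitigation (sketch): drop products predicted as
    false-passing before localization, so their misleading passing test
    results no longer dilute the suspiciousness scores."""
    return [p for p in passing_products if not is_false_passing(p)]
```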
- Impact of different experimental scenarios
| Metric | System-based | Version-based | Product-based | Within-system |
|---|---|---|---|---|
| Precision (TP / FP) | 87.51% / 89.42% | 88.16% / 94.19% | 87.53% / 94.27% | 88.73% / 96.12% |
| Recall (TP / FP) | 92.16% / 85.83% | 97.09% / 78.36% | 96.97% / 78.26% | 96.29% / 87.02% |
| F1-Score (TP / FP) | 89.15% / 86.83% | 92.41% / 85.55% | 92.01% / 85.52% | 92.21% / 91.16% |
| Accuracy | 88.44% | 90.04% | 89.70% | 92.29% |
- Impact of different training data sizes (the number of systems)
| Metric | 1 system | 2 systems | 3 systems | 4 systems | 5 systems |
|---|---|---|---|---|---|
| Precision (TP / FP) | 92.02% / 74.90% | 96.82% / 68.18% | 95.37% / 77.03% | 90.18% / 79.48% | 91.19% / 82.50% |
| Recall (TP / FP) | 81.88% / 93.19% | 63.07% / 97.44% | 76.90% / 95.40% | 81.33% / 89.10% | 84.51% / 89.95% |
| F1-Score (TP / FP) | 86.65% / 83.05% | 76.38% / 80.23% | 85.14% / 85.24% | 85.53% / 84.02% | 87.72% / 86.06% |
| Accuracy | 82.60% | 78.47% | 85.19% | 84.81% | 86.95% |
- Impact of CLAP's attributes on the false-passing product detection performance; a hedged ablation sketch follows the table

| Metric | Product Implementation | Test Adequacy | Test Effectiveness | All |
|---|---|---|---|---|
| Precision (TP / FP) | 84.69% / 74.80% | 80.45% / 99.07% | 78.74% / 88.50% | 87.47% / 88.29% |
| Recall (TP / FP) | 87.71% / 69.69% | 99.74% / 53.69% | 96.59% / 50.18% | 94.66% / 74.82% |
| F1-Score (TP / FP) | 86.17% / 72.15% | 89.06% / 69.64% | 86.76% / 64.05% | 90.02% / 81.00% |
| Accuracy | 81.52% | 83.92% | 80.64% | 87.71% |
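A hedged sketch of such an ablation is shown below: each attribute group is evaluated alone and then all groups combined. The column-to-group mapping, the placeholder data, and the Decision Tree choice are illustrative assumptions, not CLAP's exact attribute extraction.

```python
# Hedged ablation sketch matching the table above: evaluate the detector on
# each attribute group alone and on all groups combined.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((200, 4))        # placeholder per-product attributes
y = rng.integers(0, 2, 200)     # placeholder labels: 0 = FP, 1 = TP

feature_groups = {
    "Product Implementation": [0, 1],  # e.g., overlap with suspicious statements
    "Test Adequacy": [2],              # e.g., coverage-based attributes
    "Test Effectiveness": [3],         # e.g., mutation-score-based attributes
}
feature_groups["All"] = [0, 1, 2, 3]

for name, cols in feature_groups.items():
    acc = cross_val_score(DecisionTreeClassifier(random_state=0),
                          X[:, cols], y, cv=10).mean()
    print(f"{name}: accuracy = {acc:.2%}")
```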