ARist: An Effective API Argument Recommendation Approach
Learning and remembering how to use APIs is difficult. Several techniques have been proposed to assist developers in using APIs, but most of them focus on recommending the right API methods to call, while very few focus on recommending API arguments. In this paper, we propose ARist, a novel automated argument recommendation approach which suggests arguments by predicting developers’ expectations when they define and use API methods. To implement this idea, ARist combines program analysis (PA), language models (LMs), and several features specialized for the recommendation task which consider the functionality of formal parameters and the positional information of code elements (e.g., variables or method calls) in the given context. In ARist, the LMs and the recommendation features are used to rank the promising candidates identified by PA. Meanwhile, PA restricts the LMs and the features to the set of valid candidates which satisfy the syntax, accessibility, and type-compatibility constraints defined by the programming language in use. Our empirical evaluation on a large dataset of real-world projects shows that ARist improves the state-of-the-art approach by 19% and 18% in top-1 precision and recall for recommending arguments of frequently used libraries. For the general argument recommendation task, i.e., recommending arguments for every method call, ARist outperforms the baseline approaches by up to 125% in top-1 accuracy. Moreover, for newly encountered projects, ARist achieves more than 60% top-3 accuracy when evaluated on a larger dataset. For projects under active development or maintenance, with a personalized LM capturing developers’ coding practice, ARist ranks the expected arguments at the top-1 position in 7 out of 10 requests.
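As a rough illustration of the pipeline described above (PA first builds the set of syntactically valid, accessible, type-compatible candidates for the argument slot, which the specialized features and LMs then rank), here is a minimal sketch. Every type, method, and score in it is a hypothetical placeholder, not ARist’s actual implementation.

```java
import java.util.Comparator;
import java.util.List;

// Sketch of the recommendation pipeline from the abstract:
//   1. program analysis (PA) keeps only valid candidates for the argument slot;
//   2. lightweight features and language models rank those candidates.
// CandidateExpr, ArgumentSlot, and the scoring below are hypothetical placeholders.
public class ArgumentRecommenderSketch {

    record CandidateExpr(String text, Class<?> staticType, boolean accessible) {}

    record ArgumentSlot(Class<?> parameterType, String parameterName, String codeContext) {}

    // Step 1: PA-style filtering by accessibility and type compatibility.
    static List<CandidateExpr> validCandidates(List<CandidateExpr> inScope, ArgumentSlot slot) {
        return inScope.stream()
                .filter(CandidateExpr::accessible)
                .filter(c -> slot.parameterType().isAssignableFrom(c.staticType()))
                .toList();
    }

    // Step 2: ranking by a combined score (stand-in for ARist's features + LMs).
    static List<CandidateExpr> recommend(List<CandidateExpr> inScope, ArgumentSlot slot) {
        return validCandidates(inScope, slot).stream()
                .sorted(Comparator.comparingDouble(
                        (CandidateExpr c) -> combinedScore(c, slot)).reversed())
                .toList();
    }

    // Placeholder score: exact-name match with the formal parameter plus a trivial
    // stand-in for a language-model probability.
    static double combinedScore(CandidateExpr c, ArgumentSlot slot) {
        double nameFeature = c.text().equalsIgnoreCase(slot.parameterName()) ? 1.0 : 0.0;
        double lmStandIn = 1.0 / (1 + c.text().length());
        return nameFeature + lmStandIn;
    }
}
```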
Data.
- List of the 1,000 most-starred projects used in the empirical study (here)
- Small corpus: Eclipse and Netbeans
- Large corpus: 9,271 projects
Source code.
Experimental results.
- Statistics of the dataset
Statistic | Small corpus | Large corpus
---|---|---
#Projects | Eclipse & Netbeans | 9,271
#Files | 53,787 | 961,493
#LOCs | 7,218,637 | 84,236,829
#AR requests | 700,696 | 913,175
- Accuracy Comparison (RQ1)
2.1. Performance of the AR approaches for methods in frequently-used libraries
Project | Metric | ARist Precision | ARist Recall | PARC Precision | PARC Recall | GPT-2 Precision | GPT-2 Recall | SLP Precision | SLP Recall
---|---|---|---|---|---|---|---|---|---
Netbeans | Top-1 | 52.92% | 51.67% | 46.46% | 44.86% | 47.72% | 46.63% | 36.04% | 36.04%
Netbeans | Top-3 | 70.18% | 68.28% | 66.20% | 66.75% | 55.15% | 53.90% | 49.52% | 49.52%
Netbeans | Top-10 | 78.36% | 76.15% | 72.06% | 69.57% | 55.94% | 54.67% | 64.52% | 64.52%
Eclipse | Top-1 | 56.66% | 55.04% | 47.65% | 46.65% | 61.37% | 58.87% | 26.24% | 26.24%
Eclipse | Top-3 | 67.88% | 65.63% | 65.05% | 63.68% | 68.85% | 66.03% | 37.00% | 37.00%
Eclipse | Top-10 | 73.14% | 70.76% | 72.26% | 70.73% | 69.75% | 66.85% | 54.39% | 54.39%
2.2. Comparison on the general AR task
Project | Metric | ARist | GPT-2 | CodeT5 | SLP
---|---|---|---|---|---
Netbeans | Top-1 | 65.15% | 52.63% | 59.97% | 34.91%
Netbeans | Top-3 | 78.16% | 57.69% | 67.16% | 48.10%
Netbeans | Top-5 | 81.10% | 57.87% | 67.57% | 55.02%
Netbeans | Top-10 | 83.53% | 57.88% | 67.60% | 67.20%
Netbeans | MRR | 0.72 | 0.55 | 0.63 | 0.44
Eclipse | Top-1 | 64.19% | 56.53% | 61.20% | 28.52%
Eclipse | Top-3 | 76.29% | 61.89% | 67.21% | 41.60%
Eclipse | Top-5 | 79.23% | 62.09% | 67.53% | 49.46%
Eclipse | Top-10 | 81.65% | 62.10% | 67.54% | 62.67%
Eclipse | MRR | 0.70 | 0.59 | 0.64 | 0.38
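For reference, MRR in the tables above is presumably the standard mean reciprocal rank averaged over the AR requests (an assumption about the exact convention; a request whose expected argument never appears in the ranked list contributes 0):

```latex
% Q: the set of argument-recommendation requests
% rank_i: position of the expected argument in the ranked list for request i
\mathrm{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\mathrm{rank}_i}
```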
- Sensitivity analysis
3.1. Top-k accuracy of ARist in different scenarios
Metric | New project | Working project | Maintenance project
---|---|---|---
Top-1 | 53.42% | 69.96% | 74.49% |
Top-3 | 61.50% | 81.14% | 83.23% |
Top-5 | 64.21% | 83.74% | 85.38% |
Top-10 | 67.96% | 85.88% | 87.38% |
MRR | 0.58 | 0.76 | 0.79 |
3.2. ARist’s performance by the expression types of expected arguments
Expression type | Distribution (%) | Top-1 (%) |
---|---|---|
Simple Name | 48.14 | 83.66 |
Method Invocation | 15.19 | 45.51 |
Field Access | 6.09 | 31.01 |
Array Access | 0.74 | 53.26 |
Cast Expr | 0.99 | 18.46 |
String Literal | 10.03 | 98.14 |
Number Literal | 5.06 | 95.66 |
Character Literal | 0.47 | 87.93 |
Type Literal | 0.90 | 81.92 |
Bool Literal | 1.50 | 78.43 |
Null Literal | 0.79 | 84.45 |
Object Creation | 2.09 | 51.96 |
Array Creation | 0.29 | 43.14 |
This Expr | 1.06 | 91.05 |
Super Expr | 0.00 | 0.00 |
Compound Expr | 5.65 | 3.69 |
Lambda Expr | 0.73 | 78.83
Method Reference | 0.28 | 0.56 |
Total | 100.00 | 69.96 |
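The expression-type categories above refer to the syntactic form of the argument at the call site. As an illustration only (the call and variable names are hypothetical, not drawn from the dataset), the following Java snippet shows one argument of most of the categories:

```java
import java.util.List;
import java.util.function.Supplier;

// Illustrative only: each call passes one argument whose expression type
// matches a category from the table above.
public class ArgumentKinds {
    static void use(Object value) { /* sink for the examples */ }

    void examples(List<String> names, int[] counts, StringBuilder sb) {
        use(names);                               // Simple Name
        use(names.size());                        // Method Invocation
        use(System.out);                          // Field Access
        use(counts[0]);                           // Array Access
        use((CharSequence) sb);                   // Cast Expr
        use("hello");                             // String Literal
        use(42);                                  // Number Literal
        use('c');                                 // Character Literal
        use(String.class);                        // Type Literal
        use(true);                                // Bool Literal
        use(null);                                // Null Literal
        use(new StringBuilder());                 // Object Creation
        use(new int[] {1, 2, 3});                 // Array Creation
        use(this);                                // This Expr
        use(names.size() + counts[0]);            // Compound Expr
        use((Supplier<String>) () -> "x");        // Lambda Expr
        use((Supplier<String>) names::toString);  // Method Reference
    }
}
```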
3.3. Impact of Context Length on Performance
Context length | l1 | l2 | l3 | l4 | l5
---|---|---|---|---|---
Top-1 (%) | 62.05 | 65.86 | 66.14 | 67.00 | 67.83 |
MRR | 0.70 | 0.72 | 0.72 | 0.73 | 0.74 |
Run. time (s) | 0.33 | 0.39 | 0.42 | 0.51 | 0.56 |
- Intrinsic Evaluation Results
4.1. Impact of Valid Candidate Identification
Valid Candidate Identification | Top-1 (%) | MRR | Run. time (s)
---|---|---|---
ON | 69.96 | 0.76 | 0.444 |
OFF | 47.50 | 0.51 | 0.809 |
4.2. Impact of Candidate Reduction
Candidate Reduction | Top-1 (%) | MRR | Run. time (s)
---|---|---|---
ON | 69.96 | 0.76 | 0.444 |
OFF | 61.98 | 0.69 | 2.424 |
4.3. Impact of the reduction threshold, RT (see the sketch after the table)
RT | 10 | 20 | 30 | 40 | 50 |
---|---|---|---|---|---|
Top-1 (%) | 63.77 | 64.67 | 65.10 | 65.34 | 65.49 |
Run. time (s) | 0.342 | 0.406 | 0.418 | 0.464 | 0.508 |
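Sections 4.2 and 4.3 suggest a two-stage ranking in which a lightweight scorer first prunes the PA-identified valid candidates down to the top RT before the heavier LM-based ranking. A minimal sketch of that idea follows; Candidate, lightweightScore, and heavyScore are hypothetical placeholders, not ARist’s actual code.

```java
import java.util.Comparator;
import java.util.List;

// Sketch of two-stage candidate ranking with a reduction threshold RT.
public class CandidateReduction {

    record Candidate(String expression) {}

    // Cheap, feature-based score (e.g., lexical similarity to the parameter name).
    static double lightweightScore(Candidate c, String parameterName) {
        return c.expression().toLowerCase().contains(parameterName.toLowerCase()) ? 1.0 : 0.0;
    }

    // Expensive score, standing in for a language-model probability of the candidate in context.
    static double heavyScore(Candidate c, String context) {
        return 1.0 / (1 + Math.abs(c.expression().length() - context.length())); // placeholder
    }

    static List<Candidate> rank(List<Candidate> validCandidates, String parameterName,
                                String context, int reductionThresholdRT) {
        // Stage 1: keep only the top-RT candidates by the lightweight score.
        List<Candidate> reduced = validCandidates.stream()
                .sorted(Comparator.comparingDouble(
                        (Candidate c) -> lightweightScore(c, parameterName)).reversed())
                .limit(reductionThresholdRT)
                .toList();
        // Stage 2: re-rank the reduced set with the heavy (LM-based) scorer.
        return reduced.stream()
                .sorted(Comparator.comparingDouble(
                        (Candidate c) -> heavyScore(c, context)).reversed())
                .toList();
    }
}
```

A larger RT trades running time for accuracy, which matches the trend in the table above.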
4.4. Impact of the heavy-ranking stage
Heavy-ranking model (P_hr) | Top-1 (%) | MRR | Run. time (s)
---|---|---|---|
OFF | 65.37 | 0.72 | 0.125 |
GPT-2 | 70.71 | 0.76 | 0.732 |
CodeT5 | 68.59 | 0.74 | 0.186 |
LSTM | 49.26 | 0.61 | 0.198 |
n-gram | 36.89 | 0.51 | 0.137 |
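The rows above compare different language models plugged in as the heavy ranker P_hr. A minimal sketch of how such a pluggable ranking stage could be wired up; the interface, class names, and scores are hypothetical, not the actual model integrations.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Sketch of a pluggable heavy-ranking stage: any scorer implementing HeavyRanker
// (e.g., a wrapper around GPT-2, CodeT5, an LSTM, or an n-gram model) re-ranks
// the pre-ranked candidates. All names here are hypothetical.
public class HeavyRankingStage {

    interface HeavyRanker {
        // Returns a score for placing `candidate` as the argument in `callContext`.
        double score(String callContext, String candidate);
    }

    // Placeholder ranker: scores candidates from a fixed lookup table.
    static HeavyRanker tableBasedRanker(Map<String, Double> scores) {
        return (context, candidate) -> scores.getOrDefault(candidate, 0.0);
    }

    static List<String> rerank(List<String> preRanked, String callContext, HeavyRanker ranker) {
        return preRanked.stream()
                .sorted(Comparator.comparingDouble(
                        (String c) -> ranker.score(callContext, c)).reversed())
                .toList();
    }

    public static void main(String[] args) {
        HeavyRanker ranker = tableBasedRanker(Map.of("fileName", 0.9, "this", 0.4));
        List<String> ranked = rerank(List.of("this", "fileName", "null"),
                                     "reader.open(", ranker);
        System.out.println(ranked); // [fileName, this, null]
    }
}
```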