BigCode


陈振宇
Zhenyu Chen
Professor

王兴亚
Xingya Wang
Associate Professor

何铁科
Tieke He
Research Assistant

房春荣
Chunrong Fang
Research Assistant

冯洋
Yang Feng
Research Assistant

虞圣呈
Shengcheng Yu
Ph.D. student

张犬俊
Quanjun Zhang
Ph.D. student

葛修婷
Xiuting Ge
Ph.D. student

孙伟松
Weisong Sun
Ph.D. student

赵源
Yuan Zhao
Ph.D. student

郝蕊
Rui Hao
Ph.D. student

钟怡
Yi Zhong
Ph.D. student

李玉莹
Yuying Li
Ph.D. student

曹智豪
Zhihao Cao
M.Sc. Student

潘中颢
Zhonghao Pan
M.Sc. Student

石孟雨
Mengyu Shi
M.Sc. Student

曾鹏程
Pengcheng Zeng
M.Sc. Student

于亚东
Yatung Yu
M.Sc. Student

刘凡
Fan Liu
M.Sc. Student

黄志斌
Zhibin Huang
M.Sc. Student

陈骁
Xiao Chen
M.Sc. Student

张鹤元
Heyuan Zhang
M.Sc. Student

钱瑞祥
Ruixiang Qian
M.Sc. Student

吴青衡
Qingheng Wu
M.Sc. Student

李彤宇
Tongyu Li
M.Sc. Student

钱雨波
Yubo Qian
M.Sc. Student

刘子夕
Zixi Liu
M.Sc. Student

王旭
Xu Wang
M.Sc. Student

葛宇
Yu Ge
M.Sc. Student

曹振飞
Zhenfei Cao
M.Sc. Student

恽叶霄
Yexiao Yun
M.Sc. Student

刘智彪
Zhibiao Liu
M.Sc. Student

唐昊杰
Haojie Tang
M.Sc. Student

钱美缘
Meiyuan Qian
M.Sc. Student

张晶
Jing Zhang
M.Sc. Student

朱晨乾
Chenqian Zhu
M.Sc. Student
FuRong builds a bug model with complete context information, such as screenshots, execution events, and logs from multiple devices, all of which matter to developers. It then induces a classification rule for bugs, which is the foundation for bug classification and deduplication. FuRong classifies bugs, removes redundant bug information, and recommends a possible fix for each type of bug. An empirical study ran automated testing of 8 open-source Android applications on 20 devices. The preliminary results show that FuRong is effective, with an average accuracy of 93%.
MAF is a plagiarism detection technology for test code that relies on a constant similarity threshold to decide whether two pieces of test code are plagiarized. However, finding an appropriate threshold is never easy, and a constant threshold cannot suit every circumstance. To address this issue and make MAF more usable, we developed MAF-2, which applies a stable and reliable classifier based on the Support Vector Machine classification algorithm. Experiments on three test code data sets show that MAF-2 detects plagiarism effectively. A video presentation of MAF-2 is available on YouTube, and the source code can be downloaded from GitHub.
The system introduces the Isolation Forest algorithm to implement data annotation preprocessing, thereby reducing the workload of data labeling for operation and maintenance personnel. Furthermore, the accuracy of anomaly detection and trend prediction is improved by iteratively updating data labels and resetting models. The system is mainly divided into four modules. The monitoring module is responsible for the collection and storage of monitoring data.
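To make the pre-labeling idea concrete, here is a minimal sketch of Isolation Forest scoring on one-dimensional monitoring values, written in plain Python. It is an illustration under simplifying assumptions, not the system's implementation: every tree is built on the full data set (no subsampling), and the function names, the `top_k` parameter, and the `"anomaly?"` / `"normal?"` labels are all hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(values, depth, max_depth, rng):
    """Recursively split 1-D values at uniformly random cut points."""
    if depth >= max_depth or len(values) <= 1 or min(values) == max(values):
        return {"size": len(values)}  # leaf
    cut = rng.uniform(min(values), max(values))
    return {
        "cut": cut,
        "left": build_tree([v for v in values if v < cut], depth + 1, max_depth, rng),
        "right": build_tree([v for v in values if v >= cut], depth + 1, max_depth, rng),
    }


def path_length(tree, value, depth=0):
    """Depth at which `value` lands, plus a correction for unsplit leaves."""
    if "cut" not in tree:
        return depth + average_path_length(tree["size"])
    branch = "left" if value < tree["cut"] else "right"
    return path_length(tree[branch], value, depth + 1)


def anomaly_scores(values, n_trees=100, seed=0):
    """Score in (0, 1]; values that are isolated quickly score closer to 1."""
    if len(values) < 2:
        return [0.0] * len(values)
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(len(values)))
    trees = [build_tree(values, 0, max_depth, rng) for _ in range(n_trees)]
    norm = average_path_length(len(values))
    scores = []
    for v in values:
        mean_depth = sum(path_length(t, v) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / norm))
    return scores


def prelabel(values, top_k=1):
    """Hypothetical pre-labeling step: flag the top_k highest-scoring points for review."""
    scores = anomaly_scores(values)
    ranked = sorted(range(len(values)), key=lambda i: scores[i], reverse=True)
    suspects = set(ranked[:top_k])
    return ["anomaly?" if i in suspects else "normal?" for i in range(len(values))]


# Toy latency series with one obvious spike at the end.
latencies = [10.1, 10.3, 9.8, 10.0, 10.2, 9.9, 10.4, 10.1, 9.7, 10.0, 55.0]
print(prelabel(latencies, top_k=1))
```

In this reading, operators only confirm or correct the suggested labels instead of labeling every point from scratch, and the corrected labels can feed the next round of model updates.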
This system is built on mainstream frameworks, with Angular2 as the front-end framework, Spring Boot as the back-end framework, Redis as the query cache, and MongoDB as the database. Similar report recommendation is implemented with Word2Vec and the WMD algorithm. Audit task recommendation uses a model-based collaborative filtering approach. Test page recommendation uses the multi-source shortest path method based on users' histories.
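The test page recommendation step can be sketched as follows. This is one possible reading of "multi-source shortest path", assuming the pages a user has already visited act as the sources and that edge weights come from transitions seen in user histories; an all-pairs algorithm such as Floyd-Warshall would be another reading. The graph, page names, and function names are hypothetical.

```python
import heapq


def multi_source_distances(graph, sources):
    """Dijkstra started from several sources at once: distance to the nearest source."""
    dist = {s: 0 for s in sources}
    heap = [(0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist.get(page, float("inf")):
            continue  # stale heap entry
        for nxt, weight in graph.get(page, {}).items():
            nd = d + weight
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist


def recommend_pages(graph, visited, k=2):
    """Recommend the k unvisited pages closest to anything the user has visited."""
    dist = multi_source_distances(graph, visited)
    candidates = [(d, page) for page, d in dist.items() if page not in visited]
    return [page for d, page in sorted(candidates)[:k]]


# Hypothetical page-transition graph; weights are assumed transition costs.
graph = {
    "login": {"home": 1},
    "home": {"search": 1, "cart": 2},
    "search": {"detail": 1},
    "detail": {"cart": 1},
    "cart": {"checkout": 1},
}
print(recommend_pages(graph, {"login", "home"}, k=2))  # ['search', 'cart']
```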

Intelligent Code Recommendation System Based on Structure Embedding Analysis
This tool builds on and optimizes the Mooctest WebIDE system to implement an intelligent code
recommendation system based on structure embedding analysis. When users take programming
exercises or exams, they access the system through a browser and use an intelligent editor
for online programming. The tool monitors the user's programming behavior, captures code
information for analysis, and recommends code fragments to users based on the analysis
results. Code recommendation relies on a large amount of source code data, so a corpus must
first be built from a large body of preprocessed source code. Word embeddings are trained on
this corpus, and sentence embedding then gives each candidate code fragment a vector
representation so that highly similar fragments can be found. At present, the tool has replaced
the original Mooctest WebIDE system for online use. Experiments show that the system
improves the programming experience of 82.72% of users on average.
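The retrieval step described above can be illustrated with a small sketch: token vectors are averaged into one vector per fragment, and fragments are ranked by cosine similarity to the code being edited. The three-dimensional `TOKEN_VECTORS` table is a hand-made stand-in for trained word embeddings, averaging is only one simple way to obtain a sentence embedding, and all names here are hypothetical.

```python
import math

# Hypothetical toy embeddings; a real system would train these on the code corpus.
TOKEN_VECTORS = {
    "for": [1.0, 0.0, 0.0],
    "while": [0.9, 0.1, 0.0],
    "sum": [0.0, 1.0, 0.0],
    "total": [0.1, 0.9, 0.0],
    "print": [0.0, 0.0, 1.0],
    "open": [0.0, 0.2, 0.8],
}


def embed(tokens):
    """Sentence embedding by averaging the vectors of known tokens."""
    known = [TOKEN_VECTORS[t] for t in tokens if t in TOKEN_VECTORS]
    if not known:
        return [0.0, 0.0, 0.0]
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(a, b):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(context_tokens, corpus, k=1):
    """Return the names of the k corpus fragments most similar to the user's code."""
    query = embed(context_tokens)
    ranked = sorted(corpus, key=lambda name: cosine(query, embed(corpus[name])), reverse=True)
    return ranked[:k]


corpus = {
    "loop_sum": ["for", "sum"],
    "file_print": ["open", "print"],
}
print(recommend(["while", "total"], corpus))  # ['loop_sum']
```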

Test Recommendation System Based on Slicing Coverage Filtering
This tool builds on Mooctest WebIDE to design and implement a test code snippet recommendation
system based on slicing coverage filtering. It uses Wala as the program slicing tool. AST-based
program analysis merges the code snippets with the project template. OpenClover analyzes the
test coverage of each code snippet, and the result is stored in the corpus. While the user is
learning to test, the system analyzes the user's test coverage in real time and computes the
Jaccard similarity between the user's coverage vector and the coverage vectors in the corpus
to select the relevant code snippets for recommendation. The tool has built a recommendation
corpus containing 11 original questions and more than 2,200 test code snippets.
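As a rough sketch of the coverage-based filter, the snippet below computes Jaccard similarity over binary coverage vectors (1 means the line or branch is covered) and keeps corpus snippets above a cutoff. The `threshold` value and the choice to keep the most similar snippets are assumptions for illustration; the description above does not say how the tool ranks or cuts off candidates.

```python
def jaccard(a, b):
    """Jaccard similarity of two equal-length binary coverage vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def filter_snippets(user_coverage, corpus, threshold=0.5):
    """Keep snippets whose coverage is similar enough, most similar first."""
    scored = [(jaccard(user_coverage, cov), name) for name, cov in corpus.items()]
    return [name for score, name in sorted(scored, reverse=True) if score >= threshold]


user_coverage = [1, 1, 0, 0, 1]
corpus = {
    "s1": [1, 1, 0, 0, 0],  # Jaccard 2/3
    "s2": [0, 0, 1, 1, 0],  # Jaccard 0
    "s3": [1, 1, 0, 1, 1],  # Jaccard 3/4
}
print(filter_snippets(user_coverage, corpus))  # ['s3', 's1']
```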

WebTester
This tool builds on Mooctest WebIDE to design and implement an online test development system
based on homologous code matching. The Language Server Protocol (LSP) is used to add
multi-language intelligent code hints. The tool also proposes a scheme for automated test
generation. WebTester collects historical data from open source websites and examination
platforms to build a corpus of test code. It extracts the structure and text information of
the code under test and combines string matching, spelling correction, synonym search, program
similarity analysis, and other means to measure homology. The test cases of the homologous
method in the corpus are retrieved and modified to generate concise, usable test code for the
method under test. For projects not covered by the corpus, the system integrates an optimized
EvoSuite tool to provide users with basic test cases for the target method.
Statistics show that WebTester reduces the average processing time of a single project from
65s to 20s, which greatly improves throughput.
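A minimal sketch of homology matching between a method under test and methods in a test corpus is shown below. It covers only two of the signals listed above, typo-tolerant name matching (via `difflib` from the Python standard library) and a crude signature comparison; the weights, the `threshold`, and the dictionary layout are hypothetical. Returning `None` stands for the case where no homologous method is found and a generator such as EvoSuite would take over.

```python
import difflib
import re


def words(identifier):
    """Split camelCase or snake_case identifiers into lowercase words."""
    return [w.lower() for w in re.findall(r"[A-Za-z][a-z0-9]*", identifier)]


def name_similarity(a, b):
    """Fuzzy string similarity in [0, 1]; tolerant of small spelling errors."""
    return difflib.SequenceMatcher(None, " ".join(words(a)), " ".join(words(b))).ratio()


def signature_similarity(params_a, params_b):
    """Jaccard overlap of parameter types as a rough structural signal."""
    if not params_a and not params_b:
        return 1.0
    set_a, set_b = set(params_a), set(params_b)
    return len(set_a & set_b) / len(set_a | set_b)


def homology(target, candidate, name_weight=0.7):
    """Weighted mix of name and signature similarity (weights are assumptions)."""
    return (name_weight * name_similarity(target["name"], candidate["name"])
            + (1 - name_weight) * signature_similarity(target["params"], candidate["params"]))


def find_homologous(target, corpus, threshold=0.8):
    """Best-matching corpus entry, or None if nothing is similar enough."""
    best = max(corpus, key=lambda candidate: homology(target, candidate))
    return best if homology(target, best) >= threshold else None


corpus = [
    {"name": "calculate_total", "params": ["List", "int"], "test": "testCalculateTotal"},
    {"name": "parseDate", "params": ["String"], "test": "testParseDate"},
]
target = {"name": "calcuateTotal", "params": ["List", "int"]}  # misspelled on purpose
match = find_homologous(target, corpus)
print(match["test"] if match else "no homologous method; fall back to generated tests")
```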

MAF
We analyze in depth the test code written under the unit test framework and the production
code, explore the potential differences between them, and design and implement a test code
plagiarism detection system based on program slicing. To achieve high-quality test code
plagiarism detection, the system proposes a static two-way program slicing technique that
extracts effective test fragments from non-standard test code based on the method under test,
calculates the similarity between test fragments, and carries out plagiarism analysis based
on that similarity. After a series of strict system tests, the system meets its functional and
non-functional requirements and the expected goals. Analysis and verification on a large
number of real datasets show that the system can effectively detect test code plagiarism,
and performance analysis shows that it is robust.
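The idea of slicing in two directions from the method under test can be sketched on a toy example. The code below is not MAF's slicer: it handles only straight-line code and data dependences, and the variables each statement defines and uses are written by hand rather than extracted from Java source. The statement format and function name are hypothetical.

```python
def two_way_slice(statements, anchor):
    """Extract a test fragment around the call to the method under test.

    statements: list of (text, defs, uses) in program order.
    anchor: index of the statement that calls the method under test.
    """
    keep = {anchor}

    # Backward pass: keep statements that define values the anchor depends on.
    needed = set(statements[anchor][2])
    for i in range(anchor - 1, -1, -1):
        _, defs, uses = statements[i]
        if needed & set(defs):
            keep.add(i)
            needed = (needed - set(defs)) | set(uses)

    # Forward pass: keep statements that consume values produced by the anchor.
    tainted = set(statements[anchor][1])
    for i in range(anchor + 1, len(statements)):
        _, defs, uses = statements[i]
        if tainted & set(uses):
            keep.add(i)
            tainted |= set(defs)

    return [statements[i][0] for i in sorted(keep)]


# A JUnit-style test body with unrelated logging mixed in.
test_body = [
    ("Calculator calc = new Calculator();", ["calc"], []),
    ("Logger log = new Logger();", ["log"], []),
    ("int a = 2;", ["a"], []),
    ('log.info("start");', ["log"], ["log"]),
    ("int result = calc.add(a, 3);", ["result"], ["calc", "a"]),  # anchor
    ('log.info("done");', ["log"], ["log"]),
    ("assertEquals(5, result);", [], ["result"]),
]
for line in two_way_slice(test_body, anchor=4):
    print(line)
```

On this input the fragment keeps the object setup, the argument, the call, and the assertion, and drops the logging statements. Fragments extracted this way can then be compared for similarity.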
- National Key R&D Program of China: Intelligent real-time quality improvement method and technology based on collaborative programming field (2018YFB1003901), 2018-2021
- Project for Huawei: Automated precision test project (YBN2016120004), 2016-2017
- National Natural Science Foundation of China (General Program): Software maintenance technology based on developer social network (61472176), 2015-2018
- National Natural Science Foundation of China: Test case selection techniques based on clustering analysis of software behaviors (61003024), 2011-2013
- National Natural Science Foundation of China: Software testing optimization techniques based on slicing (60803007), 2009-2011
2022
- Weisong Sun, Chunrong Fang, Yuchen Chen, Guanhong Tao, Tingxu Han, Quanjun Zhang. Code Search based on Context-aware Code Translation. ICSE 2022.
- Rui Hao, Yuying Li, Yang Feng, Zhenyu Chen. Are Duplicates Really Harmful? An Empirical Study on Bug Report Summarization Techniques. Journal of Software: Evolution and Process.
2021
- Li Y, Feng Y, Hao R, et al. Classifying crowdsourced mobile test reports with image features: An empirical study [J]. Journal of Systems and Software, 2022, 184: 111121.
- Liu Z, Feng Y, Chen Z. DialTest: Automated Testing for Recurrent-Neural-Network-Driven Dialogue Systems [C]//Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis. 2021: 115-126.
- Luo W, Chai D, Run X, et al. Graph-based Fuzz Testing for Deep Learning Inference Engines [C]//2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). IEEE, 2021: 288-299.
- Yu S, Fang C, Cao Z, et al. Prioritize Crowdsourced Test Reports via Deep Screenshot Understanding [C]//2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). IEEE, 2021: 946-956.
2020
- Sun W, Xu G, Yang Z, et al. Early Detection of Smart Ponzi Scheme Contracts Based on Behavior Forest Similarity [C]//2020 IEEE 20th International Conference on Software Quality, Reliability and Security (QRS). IEEE, 2020: 297-309.
- Zhu C, Sun W, Liu Q, et al. HomoTR: Online Test Recommendation System Based on Homologous Code Matching [C]//2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2020: 1302-1306.
- Qian R, Zhao Y, Men D, et al. Test Recommendation System Based on Slicing Coverage Filtering [C]//Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis. 2020: 573-576.
- Guo C, He T, Yuan W, et al. Crowdsourced Requirements Generation for Automatic Testing via Knowledge Graph [C]//Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis. 2020: 545-548.
- Tian Y, Yu S, Fang C, et al. FuRong: Fusing Report of Automated Android Testing on Multi-Devices [C]//Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Companion Proceedings. 2020: 49-52.
2019
- Yu S, Fang C, Feng Y, et al. LIRAT: Layout and image recognition driving automated mobile testing of cross-platform [C]//2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2019: 1066-1069.
- Wang X, Wu H, Sun W, et al. Towards generating cost-effective Test-Suite for ethereum smart contract [C]//2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 2019: 549-553.
- Sun W, Wang X, Wu H, et al. MAF: Method-anchored test fragmentation for test code plagiarism detection [C]//2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering Education and Training (ICSE-SEET). IEEE, 2019: 110-120.
2017 and earlier
- Gao R, Wong W E, Chen Z, et al. Effective software fault localization using predicted execution results [J]. Software Quality Journal, 2017, 25(1): 131-169.
- Yang Y, Huang X, Hao X, et al. An industrial study of natural language processing based test case prioritization [C]//2017 IEEE International Conference on Software Testing, Verification and Validation (ICST). IEEE, 2017: 548-549.
- Feng Y, Jones J A, Chen Z, et al. Multi-objective test report prioritization using image understanding [C]//2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2016: 202-213.
- Shi Q, Chen Z, Fang C, et al. Measuring the diversity of a test set with distance entropy [J]. IEEE Transactions on Reliability, 2015, 65(1): 19-27.
- Yang R, Chen Z, Zhang Z, et al. Efsm-based test case generation: Sequence, data, and oracle [J]. International Journal of Software Engineering and Knowledge Engineering, 2015, 25(04): 633-667.
- YANG R, CHEN Z Y, ZHANG Z Y, et al. A novel approach of automated test case generation on extended finite state machine [J]. SCIENTIA SINICA Informationis, 2014, 44(5): 588-609.
- Yang W, Chen Z, Gao Z, et al. GUI testing assisted by human knowledge: random vs. functional [J]. Journal of Systems and Software, 2014, 89: 76-86.
- Fang C, Chen Z, Wu K, et al. Similarity-based test case prioritization using ordered sequences of program entities [J]. Software Quality Journal, 2014, 22(2): 335-361.
- Miao Y, Chen Z, Li S, et al. A clustering-based strategy to identify coincidental correctness in fault localization [J]. International Journal of Software Engineering and Knowledge Engineering, 2013, 23(05): 721-741.
- Yang R, Chen Z, Xu B, et al. Improve the effectiveness of test case generation on EFSM via automatic path feasibility analysis [C]//2011 IEEE 13th International Symposium on High-Assurance Systems Engineering. IEEE, 2011: 17-24.
- Chen Z, Zhang J, Luo B. Teaching software testing methods based on diversity principles [C]//2011 24th IEEE-CS Conference on Software Engineering Education and Training (CSEE&T). IEEE, 2011: 391-395.
- Chen S, Chen Z, Zhao Z, et al. Using semi-supervised clustering to improve regression test selection techniques [C]//2011 Fourth IEEE International Conference on Software Testing, Verification and Validation. IEEE, 2011: 1-10.
- Zhang C, Chen Z, Zhao Z, et al. An improved regression test selection technique by clustering execution profiles [C]//2010 10th International Conference on Quality Software. IEEE, 2010: 171-179.
- Yan S, Chen Z, Zhao Z, et al. A dynamic test cluster sampling strategy by leveraging execution spectra information [C]//2010 Third International Conference on Software Testing, Verification and Validation. IEEE, 2010: 147-154.