Intelligent Software Engineering

Intelligent Software Engineering Laboratory, Nanjing University


Title: Joint Seminar with Dean Zhu's Group from the School of Information Management
Speaker: Zhenyu Chen, Tieke He, et al.
Location: Room 820, Feiyimin Building
Time: 14:00-18:00
Content: A joint seminar with Prof. Qinghua Zhu's group from the School of Information Management, led by Dean Zhu. Dr. Tieke He will introduce his research proposal on knowledge graphs; Prof. Zhenyu Chen will present his research plans on crowdsourced intelligent software testing and on the quality of judicial data; Yilin Yang and Yuying Li will report on their paper plans and progress. We hope to establish collaboration in these areas.
Title: Which Factor Impacts GUI Traversal-Based Test Case Generation Technique Most
Speaker: Yuanhan Tian
Location: Feiyimin Building
Time: 13:00-14:30
Content: None
Title: Opening of the Collective Intelligence Group, Fall 2018
Speaker: Jia Liu
Location: Room 820, Feiyimin Building
Time: 15:30-18:00
Content: Meet-and-greet session with Prof. Jia Liu.
Title: KG: Test Coverage, Test Generation, and Test Diversity
Speaker: Tieke He
Location: Room 820, Feiyimin Building
Time: 14:00-18:00
Content: Drawing on engineering ideas from software engineering, we will discuss how engineering metrics such as test coverage, test generation, and test diversity can be applied to knowledge graphs.
Title: Progress of KG Group
Speaker: Tieke He, et al.
Location: Room 817, Feiyimin Building
Time: 14:00-16:30
Content: Members: Tieke He (何铁科) // Yaming Gu (顾亚明), Yu Gu (顾宇), Xuekai Jiang (蒋学垲), Yu Li (黎宇), Li Qiao (乔力), Siyuan Shen (沈思媛), Zicong Xie (谢子聪), Ge Yan (严格), Zhipeng Zou (邹智鹏)
Title: Brief Introduction of Restricted Boltzmann Machine
Speaker: Li Qiao
Location: Room 820, Feiyimin Building
Time: 14:00-16:00
Content: The Restricted Boltzmann Machine (RBM) is a deep learning model. Structurally, it is a particular type of Markov random field with a two-layer architecture. The introduction of its fast learning algorithm, Contrastive Divergence (CD), sparked a wave of research into the theory and applications of RBMs and CD. RBMs have been successfully applied to a variety of machine learning problems, such as classification, regression, dimensionality reduction, high-dimensional time series modeling, sparse overcomplete representations, image transformations, and collaborative filtering.
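To make the Contrastive Divergence idea concrete, here is a minimal sketch of a single-step (CD-1) update for a tiny binary RBM in plain Python. It is not material from the talk: the class name `TinyRBM`, the learning rate, the toy patterns, and the number of training steps are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyRBM:
    """Binary RBM: a visible layer and a hidden layer, no within-layer links."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        # W[i][j] connects visible unit i to hidden unit j (small random init).
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        # P(h_j = 1 | v) for every hidden unit j.
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j]
                                        for i in range(len(self.b))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        # P(v_i = 1 | h) for every visible unit i.
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j]
                                        for j in range(len(self.c))))
                for i in range(len(self.b))]

    def sample(self, probs):
        return [1 if self.rng.random() < p else 0 for p in probs]

    def cd1_step(self, v0, lr=0.1):
        # Positive phase: hidden activations driven by the data vector v0.
        ph0 = self.hidden_probs(v0)
        h0 = self.sample(ph0)
        # Negative phase: one Gibbs step gives a reconstruction v1.
        v1 = self.sample(self.visible_probs(h0))
        ph1 = self.hidden_probs(v1)
        # Update: data statistics minus reconstruction statistics.
        for i in range(len(self.b)):
            for j in range(len(self.c)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(self.c)):
            self.c[j] += lr * (ph0[j] - ph1[j])


# Toy usage: two complementary 4-bit patterns (assumed data).
rbm = TinyRBM(n_visible=4, n_hidden=2)
patterns = [[1, 1, 0, 0], [0, 0, 1, 1]]
for step in range(1000):
    rbm.cd1_step(patterns[step % 2])
reconstruction = rbm.visible_probs(rbm.hidden_probs(patterns[0]))
```

The speed of CD comes from replacing a long Markov chain with one Gibbs step per update, so each step costs only a few passes over the weight matrix. The sketch updates one example at a time; practical implementations typically use mini-batches and vectorized arithmetic.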
Title: Systematic Error Analysis of the Stanford Question Answering Dataset
Speaker: Yu Gu
Location: Room 820, Feiyimin Building
Time: 14:00-16:00
Content: We analyzed the outputs of multiple question answering (QA) models applied to the Stanford Question Answering Dataset (SQuAD) to identify the core challenges for QA systems on this dataset. Through an iterative process, challenging aspects were hypothesized from a qualitative analysis of common error cases. A classifier was then constructed to predict whether SQuAD test examples were likely to be difficult for systems to answer, based on features associated with the hypothesized aspects. The classifier's performance was used to accept or reject each aspect as an indicator of difficulty. With this approach, we ensured that our hypotheses were systematically tested rather than simply accepted on the basis of our pre-existing biases or of human evaluation of individual examples. This process also enabled us to identify the primary QA strategy learned by the models: systems determined the acceptable answer type for a question and then selected the answer span of that type whose local vicinity in the passage contained the highest density of words present in the question.
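The strategy described above can be illustrated with a small sketch. It is a simplification, not code from the paper or the talk: the answer-type step is assumed to have already produced `candidate_spans`, and the function names, the whitespace tokenization, and the window size of 5 are all hypothetical choices.

```python
def question_word_density(tokens, span, question_words, window=5):
    """Fraction of tokens near a candidate span that also occur in the question."""
    start, end = span  # token indices, end exclusive
    context = tokens[max(0, start - window):start] + tokens[end:end + window]
    if not context:
        return 0.0
    hits = sum(1 for tok in context if tok.lower() in question_words)
    return hits / len(context)


def pick_answer(tokens, candidate_spans, question, window=5):
    """Choose the candidate span with the densest question-word neighborhood."""
    question_words = {w.lower().strip("?.,!") for w in question.split()}
    return max(candidate_spans,
               key=lambda s: question_word_density(tokens, s, question_words, window))


# Toy usage with an invented passage; both candidates are of the type "year".
passage = ("Marie Curie won the Nobel Prize in Physics in 1903 . "
           "Her daughter was born in 1897 .")
tokens = passage.split()
candidates = [(9, 10), (16, 17)]  # the spans covering "1903" and "1897"
question = "When did Marie Curie win the Nobel Prize in Physics?"
best = pick_answer(tokens, candidates, question)
answer = " ".join(tokens[best[0]:best[1]])
```

In this toy passage the span around "1903" wins because half of its neighboring tokens also appear in the question, while "1897" shares only the word "in". A heuristic of this kind needs no deeper understanding of the passage, which is why it is informative about what such models actually learn.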
Title: GraphWave: An Unsupervised Node Clustering Algorithm
Speaker: Xuekai Jiang
Location: Room 820, Feiyimin Building
Time: 14:00-16:00
Content: This paper proposes a method for embedding the nodes of a large-scale graph topology into a low-dimensional space. Thanks to approximate computation, its time complexity can be linear in the number of nodes. The method produces a low-dimensional dense vector representation for each node in the network, which can then be used in follow-up tasks such as clustering or similarity computation. Its advantages include low complexity, few parameters to tune, no need for supervision, good scalability, and good mathematical interpretability, and it performs well among similar algorithms.
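As a rough sketch of how such an embedding can be computed, the code below follows the way GraphWave is commonly described: diffuse a heat kernel over the graph, then summarize each node's diffusion pattern by sampling its empirical characteristic function. This is an assumption-laden toy, not the method as presented in the talk: it uses dense matrices and a truncated Taylor series for the matrix exponential instead of the fast approximation that gives linear complexity, and the function names, the scale `s`, and the sample points are illustrative choices.

```python
import cmath


def laplacian(adj):
    """Unnormalized graph Laplacian L = D - A of a dense adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def heat_kernel(lap, s=1.0, terms=30):
    """Approximate exp(-s * L) with a truncated Taylor series (small graphs only)."""
    n = len(lap)
    scaled = [[-s * x for x in row] for row in lap]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term now holds (-s L)^k / k!
        term = [[x / k for x in row] for row in matmul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def embed_nodes(adj, s=1.0, sample_points=(1.0, 2.0, 3.0)):
    """Embed each node via the characteristic function of its heat diffusion."""
    psi = heat_kernel(laplacian(adj), s)
    n = len(adj)
    embeddings = []
    for a in range(n):
        column = [psi[m][a] for m in range(n)]  # heat sent from node a to each node
        vec = []
        for t in sample_points:
            phi = sum(cmath.exp(1j * t * c) for c in column) / n
            vec.extend([phi.real, phi.imag])
        embeddings.append(vec)
    return embeddings


def distance(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5


# Toy usage: a path graph 0-1-2-3-4.
path = [[0, 1, 0, 0, 0],
        [1, 0, 1, 0, 0],
        [0, 1, 0, 1, 0],
        [0, 0, 1, 0, 1],
        [0, 0, 0, 1, 0]]
emb = embed_nodes(path)
same_role = distance(emb[0], emb[4])  # the two end nodes
diff_role = distance(emb[0], emb[2])  # an end node versus the middle node
```

On the path graph the two end nodes play the same structural role, so their embeddings coincide up to rounding, while the middle node lands elsewhere. No labels are involved, which is what makes the embedding a natural input for clustering.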
Title: CROWD-IN-THE-LOOP: A Hybrid Approach for Annotating Semantic Roles
Speaker: Ge Yan
Location: Room 820, Feiyimin Building
Time: 14:00-16:00
Content: Crowdsourcing has proven to be an effective method for generating labeled data for a range of NLP tasks. However, multiple recent attempts at using crowdsourcing to generate gold-labeled training data for semantic role labeling (SRL) reported only modest results, indicating that SRL is perhaps too difficult a task to be effectively crowdsourced. In this paper, we postulate that while producing SRL annotation does require expert involvement in general, a large subset of SRL labeling tasks is in fact appropriate for the crowd. We present a novel workflow in which we employ a classifier to identify difficult annotation tasks and route each task either to experts or to crowd workers according to its difficulty. Our experimental evaluation shows that the proposed approach reduces the workload for experts by over two-thirds, and thus significantly reduces the cost of producing SRL annotation at little loss in quality.
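The routing step of this workflow can be sketched as follows. This is only an illustration under stated assumptions: `toy_difficulty` is a hypothetical stand-in for the trained classifier (it merely treats longer sentences as harder), and the threshold of 0.5 and the example sentences are invented.

```python
def route_tasks(tasks, difficulty, threshold=0.5):
    """Send tasks scored at or above the threshold to experts, the rest to the crowd."""
    routed = {"expert": [], "crowd": []}
    for task in tasks:
        pool = "expert" if difficulty(task) >= threshold else "crowd"
        routed[pool].append(task)
    return routed


def toy_difficulty(sentence):
    # Hypothetical stand-in for a learned difficulty classifier:
    # sentence length in tokens, scaled so that 20 or more tokens score 1.0.
    return min(len(sentence.split()) / 20.0, 1.0)


# Toy usage with two invented annotation tasks.
tasks = [
    "The cat sat .",
    "The committee that the board had appointed last year rejected the proposal .",
]
routed = route_tasks(tasks, toy_difficulty)
expert_share = len(routed["expert"]) / len(tasks)
```

The threshold controls the trade-off reported in the abstract: raising it sends more tasks to the crowd and lowers expert workload, at the risk of more annotation errors on tasks the classifier underestimates.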
Title: A Security Detection Model for Blockchain
Speaker: Yi Zhong
Location: Feiyimin Building
Time: 13:00-14:30
Content: None