Construction method of extraction dataset of Al-Si alloy entity relationship
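Cite this article: Ying-li LIU, Rui-gang WU, Chang-hui YAO, Tao SHEN. Construction method of extraction dataset of Al-Si alloy entity relationship [J]. Journal of Zhejiang University (Engineering Science), 2022, 56(2): 245-253.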
Authors: Ying-li LIU, Rui-gang WU, Chang-hui YAO, Tao SHEN
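Affiliations: 1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500; 2. Yunnan Key Laboratory of Computer Technology Application, Kunming University of Science and Technology, Kunming, Yunnan 650500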
Funding: National Natural Science Foundation of China (52061020, 61971208, 51864027); Open Fund of the Yunnan Key Laboratory of Computer Technology Application (2020103)
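Abstract: The materials field has no public dataset suited to research on entity relationship extraction for materials. To address this, a construction method for an aluminum-silicon (Al-Si) alloy entity relationship extraction dataset was proposed, based on a study of the literature on spray deposition of high-silicon aluminum alloys. Construction standards for the dataset were formulated under the guidance of materials-science experts, and the collected data were annotated with entities and relationships according to those standards. After annotation, the Al-Si alloy entity relationship extraction dataset was generated through data preprocessing. Experiments with an entity-relationship joint extraction model verified that the dataset can be used for entity relationship extraction tasks. Compared with public datasets, the sentences in the materials dataset are semantically and syntactically more complex and include more long sentences, so the joint extraction model performs slightly worse on it. To address this, a self-attention mechanism was added to the joint extraction model, which raised the model's overall F1 score by about 5.8%. The construction method is general and can be used to build materials-domain datasets.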
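The abstract does not give the annotation file format or the exact output of the preprocessing step. Purely as an illustration, the sketch below assumes a hypothetical token-level annotation (entity spans plus relations that point at entity indices) and converts it into a record of the kind commonly fed to joint extraction models: one BIO tag per token for the entities, plus (head, relation, tail) triples. The entity types `MAT` and `PROC`, the relation type `prepared_by`, the example sentence, and the helper names are all invented for this example; they are not the dataset's actual construction standard.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)
    label: str  # hypothetical entity type, e.g. "MAT" or "PROC"


@dataclass
class Relation:
    head: int   # position of the head entity in the entity list
    tail: int   # position of the tail entity in the entity list
    label: str  # hypothetical relation type, e.g. "prepared_by"


def to_bio(n_tokens, entities):
    """Turn entity spans into one BIO tag per token; reject bad or overlapping spans."""
    tags = ["O"] * n_tokens
    for ent in entities:
        if not 0 <= ent.start < ent.end <= n_tokens:
            raise ValueError(f"span out of range: {ent}")
        if any(tag != "O" for tag in tags[ent.start:ent.end]):
            raise ValueError(f"overlapping span: {ent}")
        tags[ent.start] = "B-" + ent.label
        for i in range(ent.start + 1, ent.end):
            tags[i] = "I-" + ent.label
    return tags


def to_sample(tokens, entities, relations):
    """Build one training record: tokens, BIO tags, and (head, relation, tail) triples."""
    def span_text(ent):
        return " ".join(tokens[ent.start:ent.end])

    triples = [
        (span_text(entities[r.head]), r.label, span_text(entities[r.tail]))
        for r in relations
    ]
    return {"tokens": tokens, "tags": to_bio(len(tokens), entities), "triples": triples}


if __name__ == "__main__":
    tokens = ["Al-20Si", "alloy", "was", "prepared", "by", "spray", "deposition"]
    entities = [Entity(0, 2, "MAT"), Entity(5, 7, "PROC")]
    relations = [Relation(0, 1, "prepared_by")]
    sample = to_sample(tokens, entities, relations)
    print(sample["tags"])
    print(sample["triples"])
```

Running it prints the tags `['B-MAT', 'I-MAT', 'O', 'O', 'O', 'B-PROC', 'I-PROC']` and the triples `[('Al-20Si alloy', 'prepared_by', 'spray deposition')]`. Plain BIO tagging cannot represent nested or overlapping entities, which is why the sketch rejects them instead of silently overwriting tags. For Chinese text, tokens would often be single characters, in which case span text would be joined without spaces.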

Keywords: dataset; construction standard; data annotation; entity relationship joint extraction model; self-attention mechanism
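The abstract states only that a self-attention mechanism was added to the joint extraction model; it does not give the layer's position, dimensions, or number of heads. As a generic illustration, the sketch below implements single-head scaled dot-product self-attention, softmax(QK^T / sqrt(d_k))V, over one sentence's token vectors in plain Python. Applying it to encoder hidden states, and the function names `matmul`, `softmax`, and `self_attention`, are assumptions made for this example rather than details taken from the paper.

```python
import math


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (both given as lists of rows)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(h, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention, softmax(Q K^T / sqrt(d_k)) V.

    h: n x d token representations (one row per token, e.g. encoder outputs).
    w_q, w_k: d x d_k projection matrices; w_v: d x d_v projection matrix.
    Returns the n x d_v attended representations and the n x n attention weights.
    """
    q, k, v = matmul(h, w_q), matmul(h, w_k), matmul(h, w_v)
    scale = math.sqrt(len(k[0]))
    scores = [[sum(a * b for a, b in zip(q_row, k_row)) / scale for k_row in k] for q_row in q]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v), weights


if __name__ == "__main__":
    # Three toy 2-dimensional token vectors; identity projections keep the example readable.
    h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    eye = [[1.0, 0.0], [0.0, 1.0]]
    out, weights = self_attention(h, eye, eye, eye)
    for row in weights:
        print([round(w, 3) for w in row])
```

Each output row is a weighted combination of every token in the sentence, so two related entities interact directly however far apart they are. This is a common reason for adding self-attention when long sentences hurt performance. In a trained model, `w_q`, `w_k`, and `w_v` would be learned parameters. The abstract does not say how the authors combined the attended vectors with the rest of their model.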
