SINA: Semantic interpretation of user queries for question answering on interlinked data
Affiliation: 1. Speech Technology Group, ETSI Telecomunicación Universidad Politécnica de Madrid, Ciudad Universitaria, Av. Complutense, 30, 28040 Madrid, Spain; 2. School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore; 3. MERL, Mitsubishi Electric Research Laboratories, 201 Broadway, 8th Floor, Cambridge, MA 02139-1955, USA; 4. Department of Electrical and Computer Engineering, National University of Singapore, Block E4, 4 Engineering Drive 3, Singapore 117583, Singapore; 2. Institute for Biological Interfaces, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany; 1. College of Computer Science and Software Engineering, Shenzhen University, 518060, China; 2. Department of Computer Science, The University of Texas at San Antonio, TX 78249, USA; 3. College of Computer Science, Changchun University of Science and Technology, 130022, China
Abstract: The architectural choices underlying Linked Data have led to a compendium of data sources which contain both duplicated and fragmented information on a large number of domains. One way to enable non-expert users to access this data compendium is to provide keyword search frameworks that can capitalize on the inherent characteristics of Linked Data. Developing such systems is challenging for three main reasons. First, resources across different datasets or even within the same dataset can be homonyms. Second, different datasets employ heterogeneous schemas, and each one may contain only a part of the answer for a given user query. Finally, constructing a federated formal query from keywords across different datasets requires exploiting links between the datasets on both the schema and instance levels. We present Sina, a scalable keyword search system that can answer user queries by transforming user-supplied keywords or natural-language queries into conjunctive SPARQL queries over a set of interlinked data sources. Sina uses a hidden Markov model to determine the most suitable resources for a user-supplied query from different datasets. Moreover, our framework is able to construct federated queries by using the disambiguated resources and leveraging the link structure underlying the datasets to be queried. We evaluate Sina over three different datasets. We can answer 25 queries from QALD-1 correctly. Moreover, we perform as well as the best question answering system from the QALD-3 competition by answering 32 questions correctly while also being able to answer queries on distributed sources. We study the runtime of Sina in its mono-core and parallel implementations and draw preliminary conclusions on the scalability of keyword search on Linked Data.
Keywords: Keyword search; Question answering; Hidden Markov model; SPARQL; RDF; Disambiguation
This article is indexed in ScienceDirect and other databases.
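
The abstract states that a hidden Markov model selects the most suitable resources for the query keywords. The sketch below illustrates that general idea with a standard Viterbi decoder; it is not the paper's actual model. The resource identifiers (the `ex:` names), the keywords, and every probability are hypothetical values chosen for illustration only.

```python
from typing import Dict, List, Tuple


def viterbi(
    observations: List[str],
    states: List[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> Tuple[List[str], float]:
    """Return the most likely state sequence and its probability."""
    # best[t][s] = (probability of the best path ending in s at step t, predecessor)
    best = [{s: (start_p[s] * emit_p[s].get(observations[0], 0.0), None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path probability into s.
            p, back = max(
                ((prev[r][0] * trans_p[r].get(s, 0.0), r) for r in states),
                key=lambda pair: pair[0],
            )
            column[s] = (p * emit_p[s].get(obs, 0.0), back)
        best.append(column)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Hypothetical candidate resources (hidden states) for two keywords.
states = ["ex:Paris_City", "ex:Paris_Person", "ex:population"]
start_p = {"ex:Paris_City": 0.4, "ex:Paris_Person": 0.4, "ex:population": 0.2}
# Hypothetical assumption: a city resource is more likely to be followed
# by a population property than a person resource is.
trans_p = {
    "ex:Paris_City": {"ex:Paris_City": 0.1, "ex:Paris_Person": 0.1, "ex:population": 0.8},
    "ex:Paris_Person": {"ex:Paris_City": 0.1, "ex:Paris_Person": 0.8, "ex:population": 0.1},
    "ex:population": {"ex:Paris_City": 0.4, "ex:Paris_Person": 0.2, "ex:population": 0.4},
}
# Each resource emits only the keyword that matches its label.
emit_p = {
    "ex:Paris_City": {"paris": 1.0},
    "ex:Paris_Person": {"paris": 1.0},
    "ex:population": {"population": 1.0},
}

path, prob = viterbi(["paris", "population"], states, start_p, trans_p, emit_p)
print(path, prob)
```

In this toy setup the keyword "paris" is a homonym for two resources, and the transition probabilities are what break the tie once the second keyword is taken into account.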