20 similar documents found; search took 15 ms.
1.
Yi Zhang, Jie Lu, Feng Liu, Qian Liu, Alan Porter, Hongshu Chen, Guangquan Zhang. Journal of Informetrics, 2018, 12(4): 1099-1117
Topic extraction presents challenges for the bibliometric community, and its performance still depends on human intervention and on its application areas. This paper proposes a novel kernel k-means clustering method incorporating a word embedding model to effectively extract topics from bibliometric data. Experimental comparisons of this method with four clustering baselines (i.e., k-means, fuzzy c-means, principal component analysis, and topic models) on two bibliometric datasets demonstrate its effectiveness both across a relatively broad range of disciplines and within a given domain. An empirical study of bibliometric topic extraction from articles published in three top-tier bibliometric journals between 2000 and 2017, supported by expert knowledge-based evaluations, provides further evidence of the method's topic-extraction ability. Additionally, this empirical analysis reveals both overlapping and diverse research interests among the three journals, insights that would benefit journal publishers, editorial boards, and research communities.
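As a rough illustration of the kernel k-means step alone (not the authors' full method), the sketch below clusters small vectors that stand in for word-embedding representations of documents. The Gaussian (RBF) kernel, the `gamma` value, the round-robin initialisation, and the toy data are all assumptions; the abstract does not specify them. The key idea is that the squared distance from point i to the centroid of cluster c in feature space can be computed from kernel values only: K[i][i] - (2/|c|) * sum_{j in c} K[i][j] + (1/|c|^2) * sum_{j,l in c} K[j][l].

```python
import math
from typing import List, Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_kmeans(points: List[Sequence[float]], k: int,
                  gamma: float = 1.0, max_iter: int = 100) -> List[int]:
    """Cluster points with kernel k-means; returns one cluster label per point."""
    n = len(points)
    K = [[rbf_kernel(points[i], points[j], gamma) for j in range(n)] for i in range(n)]
    # Deterministic round-robin initialisation (an assumption; real runs often use random restarts)
    labels = [i % k for i in range(n)]
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Within-cluster term: (1/|c|^2) * sum_{j,l in c} K[j][l]
        within = [sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
                  for m in members]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], math.inf
            for c, m in enumerate(members):
                if not m:
                    continue  # skip clusters that have become empty
                # Squared feature-space distance from point i to the centroid of cluster c
                d = K[i][i] - 2 * sum(K[i][j] for j in m) / len(m) + within[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
    return labels


if __name__ == "__main__":
    # Toy 2-D vectors standing in for document embeddings (hypothetical data)
    docs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    print(kernel_kmeans(docs, k=2))
```

Because only the kernel matrix is used, swapping in a different similarity over embeddings (for example, a cosine-based kernel) would not change the clustering loop.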
2.
Detecting emerging research topics can benefit research foundations and policy-makers. Given both long-standing and recent interest in detecting emerging research topics, various approaches have been proposed in the literature. However, well-established links between a clear conceptual definition of emerging research topics and the indicators proposed to operationalize it are still lacking. This work follows the definition by Wang (2018) and combines several machine learning models to detect and forecast emerging research topics. Experimental results on a gene editing dataset identify three emerging research topics, showing that it is feasible to identify emerging research topics with the proposed framework.
3.
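Research on Machine Learning-Based Automatic Text Classification Models  Total citations: 2, self-citations: 0, citations by others: 2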
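Machine learning-based methods are a very important class of approaches to automatic text classification. This paper first gives a formal definition and proposes a process model for automatic text classification, then analyzes the Support Vector Machine (SVM) algorithm as a typical example, and finally evaluates the algorithm's performance through a Chinese text classification experiment.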
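To make the SVM step concrete, here is a minimal sketch of a linear SVM over bag-of-words vectors. It is not the paper's experimental setup: it assumes whitespace tokenisation (Chinese text would first need word segmentation), binary labels of +1/-1, no bias term, and training by the Pegasos stochastic sub-gradient method, which is a swapped-in training algorithm; the original work may have used a standard SVM solver. The names `train_linear_svm` and `predict` are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def bag_of_words(text: str) -> Vector:
    """Term-frequency vector; whitespace tokenisation (Chinese text would need segmentation first)."""
    return {term: float(count) for term, count in Counter(text.lower().split()).items()}


def dot(w: Vector, x: Vector) -> float:
    return sum(w.get(term, 0.0) * value for term, value in x.items())


def train_linear_svm(data: List[Tuple[Vector, int]], lam: float = 0.01,
                     epochs: int = 50, seed: int = 0) -> Vector:
    """Pegasos-style stochastic sub-gradient training of a linear SVM (no bias term).

    Labels must be +1 or -1. Minimises lam/2 * ||w||^2 + mean hinge loss.
    """
    rng = random.Random(seed)
    w: Vector = {}
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            x, y = data[i]
            eta = 1.0 / (lam * t)           # decreasing step size
            violated = y * dot(w, x) < 1    # margin check uses the current weights
            for term in w:                  # shrink weights (gradient of the L2 regulariser)
                w[term] *= 1.0 - eta * lam
            if violated:                    # hinge-loss sub-gradient step
                for term, value in x.items():
                    w[term] = w.get(term, 0.0) + eta * y * value
    return w


def predict(w: Vector, text: str) -> int:
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if dot(w, bag_of_words(text)) >= 0 else -1
```

A multi-class setup, as in most text classification experiments, would typically train one such classifier per category (one-vs-rest) and pick the category with the highest score.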
4.
Journal of Informetrics, 2014, 8(3): 776-790
This study proposes a temporal analysis method that uses heterogeneous resources such as papers, patents, and web news articles in an integrated manner. We analyzed the time-gap phenomena among the three resources in two academic areas by conducting text mining-based content analysis. To this end, a topic modeling technique, Latent Dirichlet Allocation (LDA), was used to estimate the optimal time gaps among the three resources (papers, patents, and web news articles) in two research domains. The contributions of this study are as follows. First, we propose a new temporal analysis method to understand the content characteristics and trends of multiple heterogeneous resources in an integrated manner, and we applied it to measure the exact time intervals between academic areas by examining the time-gap phenomena. The results of the temporal analysis showed that resources in the medical field were more up to date than those in the computer field, and thus were disclosed to the public more promptly. Second, we adopted a power-law exponent measurement and content analysis to evaluate the proposed method. With the proposed method, we demonstrate how to analyze heterogeneous resources more precisely and comprehensively.
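The abstract mentions a power-law exponent measurement without saying which estimator was used. As one common option, the sketch below computes the continuous maximum-likelihood estimate alpha = 1 + n / sum(ln(x_i / x_min)) over the observations x_i >= x_min, with standard error (alpha - 1) / sqrt(n). The choice of estimator and of `x_min` (supplied by the caller here) are assumptions, and `power_law_alpha` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple


def power_law_alpha(values: Sequence[float], x_min: float) -> Tuple[float, float]:
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Returns (alpha_hat, standard_error), using only observations >= x_min.
    """
    tail = [x for x in values if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail observations equal x_min; exponent undefined")
    n = len(tail)
    alpha = 1.0 + n / log_sum
    return alpha, (alpha - 1.0) / math.sqrt(n)
```

For integer-valued counts (for example, topic frequencies), a discrete estimator or a data-driven choice of `x_min` may be more appropriate; this sketch keeps the simplest continuous form.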
5.
6.
Massih R. Amini, Anastasios Tombros, Nicolas Usunier, Mounia Lalmas. Information Retrieval, 2007, 10(3): 233-255
Documents formatted in eXtensible Markup Language (XML) are available in collections of various document types. In this paper,
we present an approach for the summarisation of XML documents. The novelty of this approach is that it uses features
not only from the content of documents but also from their logical structure. We follow a machine learning, sentence extraction-based
summarisation technique. To find which features are more effective for producing summaries, this approach views sentence extraction
as an ordering task. We evaluated our summarisation model using the INEX and SUMMAC datasets. The results demonstrate that
the inclusion of features from the logical structure of documents increases the effectiveness of the summariser, and that
the learnable system is also effective and well-suited to the task of summarisation in the context of XML documents. Our approach
is generic, and is therefore applicable, apart from entire documents, to elements of varying granularity within the XML tree.
We view these results as a step towards the intelligent summarisation of XML documents.
7.
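A Review of the State of Research on Web Text Classification Technology  Total citations: 1, self-citations: 0, citations by others: 1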
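Based on an analysis of the state of research on Web text classification methods in China and abroad, this paper examines in depth several recently emerging methods, including swarm-based classification methods, fuzzy-rough set text classification models, multi-classifier fusion methods, RBF network-based text classification models, and latent semantic classification models, as well as new developments in the k-nearest neighbor algorithm and support vector machines. It also analyzes several key techniques in the Web text classification process: text preprocessing, text representation, feature dimensionality reduction, training methods, and classification algorithms. Finally, it summarizes several development trends in Web text classification technology, namely the continual emergence of new classification methods, the further development of traditional classification methods, and the fusion of text, speech, and image classification technologies, along with remaining shortcomings in research, such as the word segmentation problem and the fact that no "best" feature selection method has yet been found.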
8.
Yoojin Kwon, Michelle Lemieux, Jill McTavish, Nadine Wathen. Journal of the Medical Library Association, 2015, 103(4): 184-188
Objective
The purpose of this study was to compare the effectiveness of different options for de-duplicating records retrieved from systematic review searches.
Methods
Using the records from a published systematic review, five de-duplication options were compared. The time taken to de-duplicate with each option and the number of false positives (records that were deleted but should not have been) and false negatives (records that should have been deleted but were not) were recorded.
Results
The time required for each option varied. The number of true and false duplicates returned by each option also varied greatly.
Conclusion
The authors recommend different de-duplication options depending on the searcher's skill level and the purpose of the de-duplication effort.
9.
Wenli Gao. Behavioral & Social Sciences Librarian, 2017, 36(1): 36-47
Text analysis has been widely used to identify research trends in many disciplines. However, little has been written about using this method to discover a department's research trends and faculty research interests in the library setting. This study examined 107 faculty publications from the School of Communication at the University of Houston, analyzing word and phrase frequencies in titles and abstracts with Voyant Tools. The analysis was performed at both the department level and the individual faculty level. This article demonstrates a new way for librarians to understand faculty research; subject librarians can replicate the method to analyze their own departments.
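The study used Voyant Tools, a web application; the sketch below only approximates the same kind of word and phrase frequency count in plain Python, as an assumption about what such an analysis computes rather than a reproduction of Voyant's behaviour. The tiny `STOP_WORDS` list is hypothetical (Voyant ships its own lists), tokens are lowercase letter runs only, and "phrases" are taken to be adjacent word pairs that contain no stop word.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Illustrative stop-word list (hypothetical); real tools use much longer lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "with", "is", "are"}


def tokenize(text: str) -> List[str]:
    """Lowercase letter runs; digits and punctuation are dropped."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(texts: Iterable[str], top_n: int = 10
                     ) -> Tuple[List[Tuple[str, int]], List[Tuple[str, int]]]:
    """Return the most common words and two-word phrases across all texts."""
    words: Counter = Counter()
    bigrams: Counter = Counter()
    for text in texts:
        tokens = tokenize(text)
        words.update(t for t in tokens if t not in STOP_WORDS)
        # Count adjacent word pairs, skipping any pair that contains a stop word
        bigrams.update(f"{a} {b}" for a, b in zip(tokens, tokens[1:])
                       if a not in STOP_WORDS and b not in STOP_WORDS)
    return words.most_common(top_n), bigrams.most_common(top_n)
```

Running the same function per faculty member and on the pooled department corpus mirrors the two levels of analysis described in the abstract.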
10.
11.
12.
User Models and Their Learning Methods  Total citations: 13, self-citations: 2, citations by others: 13
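By analyzing user relevance, which affects individual users' satisfaction in retrieval, this paper points out that user models can be used to learn and infer users' retrieval behavior, information needs, preferences, and so on. It then focuses on the characteristics of user needs in the information retrieval process, approaches to user modeling, and learning methods. Finally, it briefly reviews the application prospects of user models in retrieval and the difficulties that remain.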
13.
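Owing to the complexity of natural language, sentiment mining still faces several unresolved problems, such as the domain dependence of sentiment words, implicit feature identification, the handling of co-referential features, and feature polarity calculation. To address these problems, a semantics-based sentiment mining method is proposed. Guided by a topic map, the method identifies features and sentiment words and computes sentiment polarity strength, making full use of the semantic relations among features and between features and sentiment words, and can improve the accuracy of opinion mining to a certain extent.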
14.
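Text and data mining (TDM) has brought about a shift from human reading to machine reading and provides new methods and tools for innovation, so there is strong demand for it in the research community; however, its development has been constrained by uncertainty under copyright law. Practitioners have tried to resolve TDM's copyright issues through policy statements, individual court cases, and legislative procedures. After clarifying the copyright issues facing TDM, this study draws on practical responses and China's legal environment and recommends several measures, including establishing a TDM copyright exception, promoting the judicial application of the "transformative use" doctrine, strengthening rights through subscription agreements, and developing open access, to address these copyright issues and provide legal and policy support for the development and application of TDM technology. 2 figures, 26 references.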
15.
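The Development of Information Science in China as Seen Through Funded Papers  Total citations: 2, self-citations: 0, citations by others: 2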
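This paper collects and organizes funded information science papers published in China from 1989 to 2007 and analyzes them from multiple perspectives, including the number of funded papers, funding level, paper topics, authors, and collaboration among authors, in order to reveal the current state of information science research in China and explore its trends.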
16.
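An Analysis of the Canadian Federal Government's Electronic Records Management Strategy  Total citations: 1, self-citations: 0, citations by others: 1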
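The article first introduces the responsibilities of Library and Archives Canada (LAC) in managing federal government electronic records and its relationship with the government's information management authorities. Then, from two successive perspectives, the custody of electronic records within government institutions and LAC's acquisition and preservation of electronic records, it analyzes the difficulties and challenges LAC faces in fulfilling its role as the permanent repository for government institutions' electronic records, as well as the approaches and countermeasures it has adopted.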
17.
Journal of Informetrics, 2020, 14(2): 101008
The publication indicator of the Finnish research funding system is based on a manual ranking of scholarly publication channels. These ranks, which represent the evaluated quality of the channels, are continuously kept up to date and thoroughly reevaluated every four years by groups of nominated scholars belonging to different disciplinary panels. This expert-based decision-making process is informed by available citation-based metrics and other relevant metadata characterizing the publication channels. The purpose of this paper is to introduce various approaches that can explain the basis and evolution of the quality of publication channels, i.e., ranks. This is important for the academic community, whose research work is being governed using the system. Data-based models that, with sufficient accuracy, explain the level of or changes in ranks provide assistance to the panels in their multi-objective decision making, thus suggesting and supporting the need to use more cost-effective, automated ranking mechanisms. The analysis relies on novel advances in machine learning systems for classification and predictive analysis, with special emphasis on local and global feature importance techniques.
18.
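Drawing on text mining theory, this paper proposes a method for comparative analysis of product reviews across different shopping websites and examines whether reviews of the same product on different shopping websites are consistent. It first compares reviews of individual product features and then extends the comparison to the product's overall features. The study finds that reviews of the same product on different shopping websites are not entirely consistent, and that the inconsistency lies mainly in product features, indicating that product reviews differ depending on the shopping website.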
19.
Lars Asker, Atelach Alemu Argaw, Björn Gambäck, Samuel Eyassu Asfeha, Lemma Nigussie Habte. Information Retrieval, 2009, 12(3): 416-435
We present work aimed at compiling an Amharic corpus from the Web and automatically categorizing the texts. Amharic is the
second most spoken Semitic language in the world (after Arabic) and is used for countrywide communication in Ethiopia. It is
highly inflectional and quite dialectally diversified. We discuss the issues of compiling and annotating a corpus of Amharic
news articles from the Web. This corpus was then used in three sets of text classification experiments. Working with a less-researched
language highlights a number of practical issues that might otherwise receive less attention or go unnoticed. The purpose
of the experiments has not primarily been to develop a cutting-edge text classification system for Amharic, but rather to
put the spotlight on some of these issues. The first two sets of experiments investigated the use of Self-Organizing Maps
(SOMs) for document classification. Testing on small datasets, we first looked at classifying unseen data into 10 predefined
categories of news items, and then at clustering it around query content, when taking 16 queries as class labels. The third
set of experiments investigated the effect of operations such as stemming and part-of-speech tagging on text classification
performance. We compared three representations while constructing classification models based on bagging of decision trees
for the 10 predefined news categories. The best accuracy was achieved using the full text as representation. A representation
using only the nouns performed almost equally well, confirming the assumption that most of the information required for distinguishing
between various categories is actually contained in the nouns, while stemming did not have much effect on the performance
of the classifier.
20.
The primary goal of the present study is to discover new drug treatments through topological analysis of drug associations and their therapeutic group network. To this end, we collected from PubMed 19,869 papers on autism treatment dated from 1946 to 2015. We extracted 145 drugs based on MeSH terms and their synonyms (6,624 in total) within the same ATC classification hierarchy and used them to find drug associations in the collected datasets. We introduced a new topology-driven method that incorporates various network analyses, including co-word network, clique percolation, weak component, pathfinding-based analysis of therapeutic groups, and detection of important drug interactions within a clique. The in-depth analysis of drug relationships extracted from the literature-based network sheds new light on drug discovery research, and the results also suggest that certain drugs could be repurposed for autism treatment in the future. In particular, the results indicated that four discovered drugs, Tocilizumab, Tacrolimus, Prednisone, and Sulfisoxazole, are worthy of further study in laboratory experiments with formal assessment of possible effects on symptoms, which may provide psychologists, physicians, and researchers with data-based scientific hypotheses in autism-drug discovery.
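Two of the listed network analyses can be sketched briefly: building a co-word (co-occurrence) network from the drugs mentioned in each paper, and splitting it into components. For an undirected co-occurrence graph, weak components reduce to ordinary connected components. This is a minimal sketch under stated assumptions: each paper is already reduced to the set of drug names it mentions (the study's MeSH and synonym matching is not shown), the `min_weight` edge threshold is a hypothetical parameter, terms that never co-occur with another term do not appear as nodes, and clique percolation and pathfinding are left out. The drug names in the tests are placeholders, not data from the study.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set


def co_word_network(docs: Iterable[Iterable[str]]) -> Dict[FrozenSet[str], int]:
    """Edge weight = number of documents in which both terms occur."""
    edges: Dict[FrozenSet[str], int] = defaultdict(int)
    for terms in docs:
        for a, b in combinations(sorted(set(terms)), 2):
            edges[frozenset((a, b))] += 1
    return dict(edges)


def connected_components(edges: Dict[FrozenSet[str], int],
                         min_weight: int = 1) -> List[Set[str]]:
    """Components of the undirected graph formed by edges with weight >= min_weight."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for pair, weight in edges.items():
        if weight >= min_weight:
            a, b = tuple(pair)
            adj[a].add(b)
            adj[b].add(a)
    seen: Set[str] = set()
    comps: List[Set[str]] = []
    for start in sorted(adj):  # sorted for a deterministic component order
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            comp.add(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        comps.append(comp)
    return comps
```

Raising `min_weight` keeps only associations seen in several papers, which tends to split a dense literature network into smaller, more interpretable groups.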