1.

Does deep learning help topic extraction? A kernel k-means clustering method with word embedding

Yi Zhang Jie Lu Feng Liu Qian Liu Alan Porter Hongshu Chen Guangquan Zhang 《Journal of Informetrics》2018,12(4):1099-1117

Topic extraction presents challenges for the bibliometric community, and its performance still depends on human intervention and its practical areas. This paper proposes a novel kernel k-means clustering method incorporated with a word embedding model to create a solution that effectively extracts topics from bibliometric data. The experimental results of a comparison of this method with four clustering baselines (i.e., k-means, fuzzy c-means, principal component analysis, and topic models) on two bibliometric datasets demonstrate its effectiveness across either a relatively broad range of disciplines or a given domain. An empirical study on bibliometric topic extraction from articles published by three top-tier bibliometric journals between 2000 and 2017, supported by expert knowledge-based evaluations, provides supplemental evidence of the method’s ability to extract topics. Additionally, this empirical analysis reveals insights into both overlapping and diverse research interests among the three journals that would benefit journal publishers, editorial boards, and research communities.
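
As a rough illustration of the kernel k-means idea named in this abstract, the sketch below clusters a handful of toy vectors that stand in for word-embedding document vectors. It is not the authors' implementation: the RBF kernel, the `gamma` value, the toy data, the fixed initial labels, and the function names `rbf_kernel` and `kernel_kmeans` are all assumptions made for the example.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two equal-length vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_kmeans(points, init_labels, k, gamma=0.5, max_iter=50):
    """Cluster points with kernel k-means, starting from the given labels."""
    n = len(points)
    # Precompute the kernel (Gram) matrix once.
    K = [[rbf_kernel(points[i], points[j], gamma) for j in range(n)] for i in range(n)]
    labels = list(init_labels)
    for _ in range(max_iter):
        clusters = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Mean pairwise kernel value inside each cluster (last term of the distance).
        within = [
            sum(K[a][b] for a in members for b in members) / len(members) ** 2
            if members else 0.0
            for members in clusters
        ]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c, members in enumerate(clusters):
                if not members:
                    continue  # skip empty clusters
                cross = sum(K[i][j] for j in members) / len(members)
                # Squared distance to the cluster mean in feature space.
                dist = K[i][i] - 2.0 * cross + within[c]
                if dist < best_d:
                    best_c, best_d = c, dist
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels


# Toy "document vectors": two well-separated groups (hypothetical data).
docs = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (3.0, 3.0), (3.1, 2.9), (2.9, 3.2)]
labels = kernel_kmeans(docs, init_labels=[0, 1, 0, 1, 0, 1], k=2)
print(labels)
```

The distance to a cluster mean is computed entirely from kernel values, which is what lets the method separate groups that are not linearly separable in the input space; in practice the input vectors would come from a trained embedding model rather than hand-written coordinates.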

2.

Emerging research topics detection with multiple machine learning models

Shuo Xu Liyuan Hao Xin An Guancan Yang Feifei Wang 《Journal of Informetrics》2019,13(4):100983

Emerging research topic detection can benefit research foundations and policy-makers. Given the long-term and recent interest in detecting emerging research topics, various approaches have been proposed in the literature. However, there is still a lack of well-established linkages between a clear conceptual definition of emerging research topics and the indicators proposed for operationalization. This work follows the definition by Wang (2018) and uses several machine learning models together to detect and forecast emerging research topics. Finally, experimental results on a gene editing dataset reveal three emerging research topics, which indicates that it is feasible to identify emerging research topics with our framework.

3.

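Research on an automatic text classification model based on machine learning (total citations: 2; self-citations: 0; citations by others: 2)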

陈立孚 周宁 李丹 《现代图书情报技术》2005,21(10):23-27

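Machine learning-based methods are a very important class of approaches to automatic text classification. This paper first gives a formal definition and proposes a process model for automatic text classification, then analyzes the support vector machine (SVM) algorithm as a typical example, and finally evaluates the algorithm's performance through a Chinese text classification experiment.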

4.

《Journal of Informetrics》2014,8(3):776-790

This study proposes a temporal analysis method that uses heterogeneous resources such as papers, patents, and web news articles in an integrated manner. We analyzed the time gap phenomena between three resources and two academic areas by conducting text mining-based content analysis. To this end, a topic modeling technique, Latent Dirichlet Allocation (LDA), was used to estimate the optimal time gaps among the three resources (papers, patents, and web news articles) in two research domains. The contributions of this study are as follows. First, we propose a new temporal analysis method to understand the content characteristics and trends of multiple heterogeneous resources in an integrated manner. We applied it to measure the exact time intervals between academic areas by understanding the time gap phenomena. The results of the temporal analysis showed that resources in the medical field were more up to date than those in the computer field and were thus disclosed to the public more promptly. Second, we adopted a power-law exponent measurement and content analysis to evaluate the proposed method. With the proposed method, we demonstrate how to analyze heterogeneous resources more precisely and comprehensively.

5.

Technology and application of the government disclosure catalog system

孙丽华 曹辉 施水才 《数字图书馆论坛》2008,(5):38-42

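This article systematically sets out the feasibility of a government information disclosure catalog system. It describes in detail the overall TRS implementation scheme at four layers (resource, cataloging, channel, and support) and introduces the TRS implementation technologies for each layer. The result is a rigorous, open government information disclosure platform that provides comprehensive and advanced technical support for the Regulations on Open Government Information (《政府信息公开条例》).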

6.

Learning-based summarisation of XML documents

Massih R. Amini Anastasios Tombros Nicolas Usunier Mounia Lalmas 《Information Retrieval》2007,10(3):233-255

Documents formatted in eXtensible Markup Language (XML) are available in collections of various document types. In this paper, we present an approach for the summarisation of XML documents. The novelty of this approach lies in that it is based on features not only from the content of documents, but also from their logical structure. We follow a machine learning, sentence extraction-based summarisation technique. To find which features are more effective for producing summaries, this approach views sentence extraction as an ordering task. We evaluated our summarisation model using the INEX and SUMMAC datasets. The results demonstrate that the inclusion of features from the logical structure of documents increases the effectiveness of the summariser, and that the learnable system is also effective and well-suited to the task of summarisation in the context of XML documents. Our approach is generic, and is therefore applicable, apart from entire documents, to elements of varying granularity within the XML tree. We view these results as a step towards the intelligent summarisation of XML documents.

7.

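A review of the state of research on Web text classification techniques (total citations: 1; self-citations: 0; citations by others: 1)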

高淑琴 《图书情报知识》2008,(3):81-86

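Based on an analysis of the state of research on Web text classification methods in China and abroad, this paper discusses in depth several recently emerged methods, including swarm-based classification methods, a text classification model based on fuzzy-rough sets, multi-classifier fusion, a text classification model based on RBF networks, and a latent semantic classification model, as well as new developments in the k-nearest neighbor algorithm and support vector machines. It also analyzes several key techniques in the Web text classification process: text preprocessing, text representation, feature dimensionality reduction, training methods, and classification algorithms. Finally, it summarizes the development trends in Web text classification, namely the continual emergence of new classification methods, the further development of traditional methods, and the convergence of text, speech, and image classification techniques. It also notes shortcomings of current research, such as the word segmentation problem and the fact that no "best" feature selection method has yet been found.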

8.

Identifying and removing duplicate records from systematic review searches

Yoojin Kwon Michelle Lemieux Jill McTavish Nadine Wathen 《Journal of the Medical Library Association》2015,103(4):184-188

Objective

The purpose of this study was to compare the effectiveness of different options for de-duplicating records retrieved from systematic review searches.

Methods

Using the records from a published systematic review, five de-duplication options were compared. The time taken to de-duplicate in each option and the number of false positives (records that were deleted but should not have been) and false negatives (records that should have been deleted but were not) were recorded.

Results

The time for each option varied. The number of positive and false duplicates returned from each option also varied greatly.

Conclusion

The authors recommend different de-duplication options based on the skill level of the searcher and the purpose of de-duplication efforts.
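
The false-positive and false-negative counts described in the Methods of this entry can be illustrated with a small sketch. It assumes each record has an identifier and that a manually verified set of true duplicates is available; the function name `dedup_errors` and the record IDs are hypothetical and do not come from the study.

```python
def dedup_errors(removed_ids, true_duplicate_ids):
    """Compare what a de-duplication option removed against the verified duplicates."""
    removed = set(removed_ids)
    truth = set(true_duplicate_ids)
    false_positives = removed - truth  # deleted but should not have been
    false_negatives = truth - removed  # should have been deleted but were not
    return false_positives, false_negatives


# Hypothetical output of one de-duplication option and a hand-checked reference set.
removed_by_tool = ["r2", "r5", "r7"]
verified_duplicates = ["r2", "r5", "r9"]

fp, fn = dedup_errors(removed_by_tool, verified_duplicates)
print(sorted(fp), sorted(fn))
```

Running the same comparison for each option gives per-option error counts that can be set against the time each option takes.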

9.

Wenli Gao 《Behavioral & Social Sciences Librarian》2017,36(1):36-47

Text analysis has been widely used to identify research trends in many disciplines. However, little has been written about using this method to discover a department’s research trends and faculty research interests in the library setting. This study examined 107 faculty publications from the School of Communication at the University of Houston. It analyzed word and phrase frequencies from titles and abstracts using Voyant Tools. The analysis was performed at both the department level and the individual faculty level. This article demonstrates a new way for librarians to understand faculty research. The method can be replicated by subject librarians to analyze their own departments.
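
The study above used Voyant tools; the sketch below only illustrates the underlying idea of counting word and phrase frequencies in titles, using Python's standard library instead. The stopword list, the sample titles, and the function names `tokenize` and `term_frequencies` are assumptions made for the example.

```python
import re
from collections import Counter

# A deliberately tiny stopword list (assumption; real analyses use larger lists).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "with"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def term_frequencies(texts, n=1):
    """Count n-word phrases across texts (n=1 gives single-word counts)."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # Phrases are built after stopword removal, so they may span a dropped word.
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts


# Hypothetical publication titles for one department.
titles = [
    "Social media and political communication",
    "Health communication in social media",
    "Political communication on television",
]

word_counts = term_frequencies(titles)
phrase_counts = term_frequencies(titles, n=2)
print(word_counts.most_common(3))
print(phrase_counts.most_common(2))
```

Grouping the titles by author before counting would give the individual-level view, and pooling all titles gives the department-level view.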

10.

Simple export of journal citation data to Excel using any reference manager

David Brennan 《Journal of the Medical Library Association》2016,104(1):72-75

11.

Application of the edit distance algorithm to the analysis of research fund name data

赵胜钢 李军莲 陈颖 《数字图书馆论坛》2014,(5):53-58

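Based on an analysis of the characteristics of research fund name data and of text clustering methods, this paper proposes and implements a method for analyzing research fund name data based on the edit distance (Levenshtein distance) algorithm. The algorithm first clusters the fund names into groups using a set similarity threshold, then performs a second clustering pass on each group that computes the sum of similarities within the group and uses it to determine the cluster center. The method has been successfully applied to processing fund data for medical literature at the Institute of Medical Information, Chinese Academy of Medical Sciences.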
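
The two-pass procedure in this abstract (threshold-based grouping, then choosing a center by summed similarity) can be sketched as follows. This is one possible reading of the abstract, not the authors' code: the normalization of edit distance into a similarity score, the 0.8 threshold, the rule of comparing each name with the first member of a group, and the sample fund names are all assumptions.

```python
def levenshtein(a, b):
    """Edit distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Map edit distance to a score in [0, 1] (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def group_names(names, threshold=0.8):
    """First pass: put each name in the first group whose seed is similar enough."""
    groups = []
    for name in names:
        for group in groups:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])  # no group matched, so start a new one
    return groups


def group_center(group):
    """Second pass: the member with the largest summed similarity to its group."""
    return max(group, key=lambda cand: sum(similarity(cand, other) for other in group))


# Hypothetical fund-name variants.
names = ["National Science Fund", "National Science Fnd",
         "National Science Funds", "Health Research Grant"]
groups = group_names(names)
centers = [group_center(g) for g in groups]
print(groups)
print(centers)
```

Comparing against only the first member of each group keeps the first pass cheap but makes the result depend on input order; comparing against every member or against the current center are alternative choices.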

12.

User models and their learning methods (total citations: 13; self-citations: 2; citations by others: 13)

李广建 黄崑 《现代图书情报技术》2002,18(6):24-27

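By analyzing user relevance, which affects individual users' satisfaction in retrieval, this paper points out that user models can be used to learn and infer users' retrieval behavior, information needs, and preferences. It then focuses on the characteristics of user needs in the information retrieval process and on the ways of modeling users and the methods of learning. Finally, it briefly reviews the application prospects of user models in retrieval and the difficulties involved.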

13.

Design and implementation of a semantics-based sentiment mining system

李纲 王忠义 《现代图书情报技术》2011,(Z1):97-103

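Because of the complexity of natural language, sentiment mining still faces unresolved problems, such as the domain dependence of sentiment words, implicit feature identification, the handling of coreferent features, and feature polarity computation. To address these problems, a semantics-based sentiment mining method is proposed. Guided by topic maps, the method identifies features and sentiment words and computes sentiment polarity strength, making full use of the semantic relationships among features and between features and sentiment words. This can improve the accuracy of opinion mining to some extent.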

14.

Copyright law and policy recommendations in support of text and data mining

罗娇 张晓林 《中国图书馆学报》2018,44(3):21-34

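Text and data mining (TDM) has brought about a shift from human reading to machine reading and provides new methods and tools for innovation, so there is strong demand for it in the research community. Its development has nevertheless been constrained by uncertainty in copyright law, and practitioners have tried to resolve the copyright problems facing TDM through policy statements, individual court cases, and legislative procedures. After clarifying the copyright issues of TDM, and taking into account the responses adopted in practice and China's legal environment, this study recommends several measures to address those issues: establishing a copyright exception for TDM, promoting the judicial application of the "transformative use" doctrine, strengthening rights through subscription agreements, and developing open access. Together these measures would provide legal and policy support for the development and application of TDM technology. 2 figures. 26 references.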

15.

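The development of information science in China as seen from funded papers (total citations: 2; self-citations: 0; citations by others: 2)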

常李艳 华薇娜 《新世纪图书馆》2009,(1)

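This paper collects and organizes funded papers in information science published in China from 1989 to 2007 and analyzes them from multiple angles, including the number of funded papers, funding level, paper topics, authors, and collaboration among authors, in order to reveal the current state of information science research in China and explore its trends.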

16.

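An analysis of the Canadian federal government's electronic records management strategy (total citations: 1; self-citations: 0; citations by others: 1)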

马林青 《档案学研究》2010,(6)

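The article first introduces the responsibilities of Library and Archives Canada (LAC) in federal government electronic records management and its relationship with the government's information authorities. Then, from two connected perspectives, namely the keeping of electronic records within government institutions and LAC's acquisition and preservation of electronic records, it analyzes the difficulties and challenges LAC has encountered in fulfilling its role as the permanent repository for government institutions' electronic records, along with the approaches and measures it has adopted in response.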

17.

Can we automate expert-based journal rankings? Analysis of the Finnish publication indicator

《Journal of Informetrics》2020,14(2):101008

The publication indicator of the Finnish research funding system is based on a manual ranking of scholarly publication channels. These ranks, which represent the evaluated quality of the channels, are continuously kept up to date and thoroughly reevaluated every four years by groups of nominated scholars belonging to different disciplinary panels. This expert-based decision-making process is informed by available citation-based metrics and other relevant metadata characterizing the publication channels. The purpose of this paper is to introduce various approaches that can explain the basis and evolution of the quality of publication channels, i.e., ranks. This is important for the academic community, whose research work is being governed using the system. Data-based models that, with sufficient accuracy, explain the level of or changes in ranks provide assistance to the panels in their multi-objective decision making, thus suggesting and supporting the need to use more cost-effective, automated ranking mechanisms. The analysis relies on novel advances in machine learning systems for classification and predictive analysis, with special emphasis on local and global feature importance techniques.

18.

A text mining-based study of the consistency of product reviews across different shopping websites

施国良 石桥峰 《现代图书情报技术》2011,(12):64-68

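Drawing on text mining theory, this paper proposes a method for the comparative analysis of product reviews across shopping websites and examines whether reviews of the same product on different sites are consistent. It first compares reviews of individual product features and then extends the comparison to the product's features as a whole. The study finds that reviews of the same product on different shopping websites are not fully consistent and that the inconsistency shows up mainly at the level of product features, which indicates that product reviews vary with the shopping website.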

19.

Classifying Amharic webnews

Lars Asker Atelach Alemu Argaw Björn Gambäck Samuel Eyassu Asfeha Lemma Nigussie Habte 《Information Retrieval》2009,12(3):416-435

We present work aimed at compiling an Amharic corpus from the Web and automatically categorizing the texts. Amharic is the second most spoken Semitic language in the world (after Arabic) and is used for countrywide communication in Ethiopia. It is highly inflectional and quite dialectally diversified. We discuss the issues of compiling and annotating a corpus of Amharic news articles from the Web. This corpus was then used in three sets of text classification experiments. Working with a less-researched language highlights a number of practical issues that might otherwise receive less attention or go unnoticed. The purpose of the experiments has not primarily been to develop a cutting-edge text classification system for Amharic, but rather to put the spotlight on some of these issues. The first two sets of experiments investigated the use of Self-Organizing Maps (SOMs) for document classification. Testing on small datasets, we first looked at classifying unseen data into 10 predefined categories of news items, and then at clustering it around query content, taking 16 queries as class labels. The third set of experiments investigated the effect of operations such as stemming and part-of-speech tagging on text classification performance. We compared three representations while constructing classification models based on bagging of decision trees for the 10 predefined news categories. The best accuracy was achieved using the full text as representation. A representation using only the nouns performed almost equally well, confirming the assumption that most of the information required for distinguishing between various categories is actually contained in the nouns, while stemming did not have much effect on the performance of the classifier.

20.

Topology-driven trend analysis for drug discovery

Yanhua Lv Ying Ding Min Song Zhiguang Duan 《Journal of Informetrics》2018,12(3):893-905

The primary goal of the present study is to discover new drug treatments by topology analysis of drug associations and their therapeutic group network. To this end, we collected 19,869 papers dated from 1946 to 2015 that are related to autism treatment from PubMed. We extracted 145 drugs based on MeSH terms and their synonyms (the total number is 6624) within the same ATC classification hierarchy and used them to find drug associations in the collected datasets. We introduced a new topology-driven method that incorporates various network analyses, including co-word network, clique percolation, weak component, pathfinding-based analysis of therapeutic groups, and detection of important drug interactions within a clique. The present study showed that in-depth analysis of the drug relationships extracted from the literature-based network sheds new light on drug discovery research. The results also suggested that certain drugs could be repurposed for autism treatment in the future. In particular, the results indicated that four of the discovered drugs (Tocilizumab, Tacrolimus, Prednisone, and Sulfisoxazole) are worthy of further study in laboratory experiments with formal assessment of possible effects on symptoms, which may provide psychologists, physicians, and researchers with data-based scientific hypotheses in autism-drug discovery.