Similar Literature
Found 20 similar articles (search time: 78 ms)
1.
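Chinese word segmentation is a basic task in natural language processing. However, statistical Chinese word segmentation methods require large-scale training samples and adapt poorly across domains. Legal documents, moreover, span many domains, and annotating large corpora demands substantial labor and material resources. To address this problem, this paper proposes a cross-domain Chinese word segmentation method based on joint learning, which uses a large number of source-domain samples to assist segmentation in the target domain and thereby improves segmentation performance. Experimental results show that, when few annotated samples are available in the target domain, the proposed method clearly outperforms traditional methods.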

2.
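Character tagging is currently one of the more effective approaches to Chinese word segmentation. However, because Chinese characters themselves carry semantic information, and a character's meaning and function vary with context, the word-formation patterns of individual characters differ. To address this problem, a multi-model Chinese word segmentation method based on character clusters is proposed: a model is first built for each character, the learned model parameters are then clustered to form character clusters, and finally the model parameters are retrained on the basis of these clusters. Experimental results show that the method effectively discovers clusters of characters with identical or similar word-formation patterns and clearly distinguishes how strongly the same type of feature affects different characters.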

3.
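Context is the resource and foundation on which statistical linguistics must rely to acquire linguistic knowledge and to solve many practical problems in natural language processing. In recent years, character-based word-position tagging has greatly improved the performance of Chinese word segmentation. This approach recasts segmentation as tagging each character with its position in a word, and the tag of the current character is determined with the help of that character's context. To go beyond guesses based on subjective experience alone, this study adopts a four-tag word-position set and uses a conditional random field model to examine how much the preceding and the following context each contribute to segmentation performance. Closed tests were carried out on the PKU and MSRA corpora of the international Chinese word segmentation evaluation Bakeoff2005, and comparative experiments used feature template sets representing the preceding context and the following context separately. The results show that the following context contributes more than 13 percentage points more to segmentation performance than the preceding context.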
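
The word-position tagging formulation in this abstract can be illustrated with a minimal Python sketch. It assumes the common B/M/E/S labels (begin, middle, end, single) for the four-tag set, which the abstract does not name, and the `context_features` helper and its template names (`C-1`, `C+1`, and so on) are hypothetical stand-ins for the preceding-context and following-context feature templates, not the authors' actual templates.

```python
def words_to_tags(words):
    """Map a segmented sentence to per-character B/M/E/S word-position tags."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # first character B, last character E, everything between M
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and their B/M/E/S tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word closes on E or S
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends without E/S
        words.append(current)
    return words


def context_features(chars, i, use_left=True, use_right=True, window=2):
    """Hypothetical feature template: the current character plus up to
    `window` characters of preceding and/or following context."""
    feats = {"C0": chars[i]}
    for k in range(1, window + 1):
        if use_left:
            feats[f"C-{k}"] = chars[i - k] if i - k >= 0 else "<BOS>"
        if use_right:
            feats[f"C+{k}"] = chars[i + k] if i + k < len(chars) else "<EOS>"
    return feats
```

Calling `context_features` with `use_right=False` or `use_left=False` gives feature sets that see only one side of the context, which is the kind of contrast the comparative experiment describes; a sequence model such as a CRF would then be trained on each set.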

4.
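In word segmentation for specialized domains, the performance of statistical methods is limited by the lack of annotated domain corpora, while dictionary-based methods still need improvement in handling new words and ambiguous words. In view of the particular nature of specialized-domain segmentation, a method combining statistics with dictionaries is proposed; it refines the process for building a domain dictionary and designs a second-pass method for resolving segmentation ambiguity based on rules and a character table. Segmentation experiments on a corpus from the engineering law domain yielded a precision of 92.08%, a recall of 94.26%, and an F-value of 93.16%. The method can also be combined with techniques such as new word discovery to improve the handling of out-of-vocabulary words.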
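
The three figures reported in this abstract can be checked against each other with a short Python sketch. It assumes the F-value is the balanced F-measure (harmonic mean of precision and recall); the abstract does not state the formula, so this is an assumption, and `f_measure` is a name introduced here for illustration.

```python
def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (92.08% and 94.26%).
f_value = f_measure(0.9208, 0.9426)
print(f"F = {f_value * 100:.2f}%")
```

Under that assumption the computed value rounds to the 93.16% given in the abstract, so the three reported numbers are mutually consistent.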

5.
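Word segmentation and part-of-speech (POS) tagging are important techniques in Chinese language processing and are widely used in semantic understanding, machine translation, information retrieval, and other fields. Based on a collection and review of current research and applications in word segmentation and POS tagging, the basic methods for Chinese word segmentation and POS tagging are classified and discussed. First, for word segmentation, dictionary-based and statistical methods are described in detail, and the results of three word segmentation competitions are listed. Second, for POS tagging, rule-based and statistical methods are each described. Next, methods related to joint models of Chinese word segmentation and POS tagging are introduced. In addition, the strengths and weaknesses of the various segmentation and POS tagging methods are analyzed, and on this basis suggestions are offered for the further development of Chinese word segmentation and POS tagging.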

6.
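Liang Xitao, Gu Lei. 《微机发展》 (Microcomputer Development), 2015, (2): 175-180
Word segmentation and part-of-speech (POS) tagging are important techniques in Chinese language processing and are widely used in semantic understanding, machine translation, information retrieval, and other fields. Based on a collection and review of current research and applications in word segmentation and POS tagging, the basic methods for Chinese word segmentation and POS tagging are classified and discussed. First, for word segmentation, dictionary-based and statistical methods are described in detail, and the results of three word segmentation competitions are listed. Second, for POS tagging, rule-based and statistical methods are each described. Next, methods related to joint models of Chinese word segmentation and POS tagging are introduced. In addition, the strengths and weaknesses of the various segmentation and POS tagging methods are analyzed, and on this basis suggestions are offered for the further development of Chinese word segmentation and POS tagging.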

7.
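Chinese word segmentation is a key foundational technology in Chinese information processing. As applications of Chinese information processing develop, demand for word segmentation in specialized domains keeps growing. However, most annotated corpora available for training come from the general (or news) domain, which makes cross-domain porting a difficulty for statistical Chinese word segmentation systems. In cross-domain segmentation, the word-formation rules and feature distributions of the text to be segmented differ considerably from those of the training text, so fully supervised statistical learning struggles to achieve good results. This paper introduces a minimum-entropy regularization framework into the fully supervised CRF and proposes a semi-supervised CRF segmentation model that combines supervised training on annotated general-domain text with unsupervised training on unlabeled target-domain text. In addition, to draw on the strengths of different segmentation methods, the paper combines adding a dictionary, adding annotated corpora, and the semi-supervised CRF model to improve the domain adaptability of the segmentation system. Experiments show that, compared with the fully supervised CRF, the semi-supervised CRF raises OOV recall by 3.2 percentage points and F-value by 1.1 percentage points; compared with only adding annotated corpora to the CRF model, the system that mixes several methods raises OOV recall by 2.9 percentage points and F-value by 2.5 percentage points.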

8.
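In recent years, neural word segmentation models have achieved very high performance on closed-domain text. In domain-porting scenarios, however, where the test data differ substantially in domain from the training data, segmentation performance drops markedly. This paper attempts to use automatically obtained weakly annotated data to improve segmentation performance in such scenarios. First, the BiLSTM-CRF segmentation model, currently the best-performing model, is extended with a loss function suited to weakly annotated data. Next, a simple and effective data selection method is proposed to pick, from massive weakly annotated data, the data most relevant to the domain at hand. Finally, the paper finds that both data preprocessing and introducing traditional features into the neural network effectively improve segmentation performance. Experiments on the SIGHAN Bakeoff 2010 and ZhuXian annotated test sets show that the proposed method effectively improves cross-domain Chinese word segmentation, raising the average F-value by 3.6%.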

9.
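When the three subtasks of Chinese lexical analysis, namely word segmentation, POS tagging, and named entity recognition, are handled step by step, the different kinds of information are hard to integrate and errors propagate upward and are amplified. To address these shortcomings, this paper proposes a three-in-one character tagging method for Chinese lexical analysis. The method treats lexical analysis as tagging a sequence of characters: the word-position, POS, and named-entity information of each character is merged into that character's tag, and a maximum entropy model accomplishes all three tasks in a single tagging pass. Closed tests were carried out on the PKU corpus of Bakeoff2007, with extensive experiments comparing the method against traditional step-by-step segmentation, POS tagging, and named entity recognition. The results show that the three-in-one character tagging method improves the performance of all three tasks to varying degrees: the F-value of word segmentation reaches 96.4%, the tagging accuracy of POS tagging reaches 95.3%, and the F-value of named entity recognition reaches 90.3%, which indicates that three-in-one character tagging delivers better Chinese lexical analysis performance.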

10.
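An Integrated Word Segmentation and POS Tagging Algorithm for Text Proofreading   Total citations: 1, self-citations: 0, citations by others: 1
Word segmentation and POS tagging are a basic step in Chinese text processing, and their quality largely determines the effectiveness of that processing. Traditionally, dictionary-based mechanical segmentation has been used; in text proofreading, however, errors in the text degrade the results of this approach, so that the subsequent error detection and correction rest on an incorrect foundation. This paper explores a word segmentation and POS tagging algorithm suited to text proofreading and proposes the ideas of full segmentation and integrated tagging. Experiments show that, besides achieving high precision and recall, the algorithm effectively limits the impact of text errors on word segmentation and POS tagging.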
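
The abstract gives no details of its full-segmentation step, so the following Python sketch shows only one common reading of the idea: enumerate every way of splitting a string into lexicon words, always allowing single characters as a fallback. The function name, the `max_len` limit, and the toy lexicon in the usage note are assumptions for illustration, not the paper's algorithm.

```python
def all_segmentations(text, lexicon, max_len=4):
    """Return every split of `text` into lexicon words.

    Single characters are always accepted, so unknown or erroneous
    characters never block a segmentation. The number of results can
    grow exponentially with the length of `text`.
    """
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, min(max_len, len(text)) + 1):
        piece = text[:end]
        if end == 1 or piece in lexicon:
            for rest in all_segmentations(text[end:], lexicon, max_len):
                results.append([piece] + rest)
    return results
```

With a toy lexicon such as `{"研究", "研究生", "生命"}`, the input `"研究生命"` yields both `["研究", "生命"]` and `["研究生", "命"]` among its candidates. Keeping all candidates and choosing among them later, for example jointly with POS tags, is what distinguishes this from greedy dictionary matching, which commits to one path.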

11.
Reflecting on a feasibility study into archiving social media, this article traces how “events” are defined in various domains and contexts, and employs case studies to analyze key relationships between hashtags and events to provide a critical analysis of how archival events can be constructed out of social events. It provides an overview of the archival and curatorial considerations involved in defining and preserving a social media event, and outlines the technologies developed for the process of collecting, annotating, and preserving social media events. Overall, the article endeavors to reveal how pragmatic considerations, computational approaches and curatorial perspectives shape digital archives and historical narratives.

12.
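Characteristics of Information Propagation Patterns in BBS   Total citations: 2, self-citations: 1, citations by others: 1
By comparing the transmission mechanism of infectious diseases with that of information, a model of information propagation in BBS is proposed. By modeling how the number of posts changes over time, the characteristics of information propagation patterns in BBS are analyzed, and real data are used to illustrate these patterns. Experiments show that a BBS can attract large numbers of users, but users are interested in and join the discussion of only some topics; for the vast majority of topics (94.9%), the growth rate of the post count first rises and then falls to 0, whereas for a small number of topics (5.1%) the growth rate falls directly to 0. These conclusions help in understanding the information propagation mechanism of BBS and offer guidance for controlling and managing it.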

13.
The use of social media to share information, enhance learning, and connect with an online community has grown rapidly over the past 10 years. As social media becomes a more common tool in both formal and informal education, it is imperative to understand how it is used by individuals with disabilities. Through a systematic study of the literature, 215 articles on social media used by individuals with disabilities were selected and 29 selected for in-depth thematic analysis. Six major themes were identified: community, cyberbullying, self-esteem, self-determination, access to technology, and accessibility. To confirm these six categories, we expanded our search, yielding an additional 30 articles, for a total of 59 articles reviewed in-depth. Interactions between individuals with disabilities within online communities often had the goal of acquiring knowledge or learning new information. A communities of practice theoretical framework is used to discuss interactions among the elements of social media design, learning, and the building of community by individuals with disabilities.

14.
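Network real-name registration was proposed to solve the problems caused by online anonymity, yet it is criticized for leaking real-name information. The root cause of such leaks is that real-name authentication depends on real-name information. The network identity model based on social authentication relies instead on social relationships for identity authentication: it uses the social relationships of OSN nodes to build network identities, supporting network supervision while avoiding leaks of real-name information. The model first selects root nodes in the OSN according to a given strategy; it then performs social authentication by means of vouching; finally, without relying on real-name information, it constructs a unique network identity for each node, called SANI. A SANI identity contains the node's social authentication information and supports both identity authentication and behavior tracing.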

15.
This paper explores the affordances of social technologies for supporting the construction of a shareable artefact by a group of learners. A qualitative study that captures the use of five different types of social technologies (Facebook, blogs, wikis, Google Documents and Dropbox) in three different classroom settings sheds light on the potentials and challenges of these tools for supporting material exploration, artefact construction and evaluation. Qualitative content analysis of instructors’ field notes, students’ and instructors’ reflections, interviews and focus groups sheds light on the potential of social technologies to transform the activity of learning across a new culture of computational tools. The affordances of social technologies are discussed as well as design principles that need to be followed in these new arenas.

16.
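The measurement of influence in social networks has received a lot of attention in the data mining community. Influence maximization refers to the process of finding influential users who maximize the adoption of information or products. In real settings, the influence of a user in a social network can be modeled by the set of actions (e.g., shares, retweets, comments) performed by other users of the network after the user's publications. To the best of our knowledge, all models proposed in the literature treat these actions equally. However, it is obvious that a like of a publication signifies less influence than a share of the same publication. This suggests that each action has its own level of influence (or importance). In this paper, we propose a model (called Social Action-Based Influence Maximization Model, SAIM) for influence maximization in social networks. In SAIM, actions are not considered equally in measuring an individual's influence power, and the model consists of two major steps. In the first step, we compute the influence power of each individual in the social network. This influence power is computed from user actions using PageRank. At the end of this step, we obtain a weighted social network in which each node is labeled with its influence power. In the second step of SAIM, we compute an optimal set of influential nodes using a new concept named the influence-BFS tree. Experiments conducted on large-scale real-world and synthetic social networks show the good performance of our model SAIM in computing, on acceptable time scales, a minimal set of influential nodes that allows the maximum spreading of information.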
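
The first SAIM step, computing influence power from user actions with PageRank, might look roughly like the Python sketch below. The action weights, the edge direction (from the user who acts to the author of the publication), the damping factor, and the function name are all assumptions made for illustration; the abstract does not give them. The second step, the influence-BFS tree, is not defined in the abstract and is not sketched here.

```python
# Assumed relative importance of each action type (hypothetical values).
ACTION_WEIGHTS = {"like": 1.0, "comment": 2.0, "share": 3.0}


def influence_scores(actions, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration.

    `actions` is an iterable of (actor, author, action_type) triples:
    `actor` performed `action_type` on a publication by `author`, which
    adds weight to the edge actor -> author.
    """
    weights, nodes = {}, set()
    for actor, author, kind in actions:
        nodes.update((actor, author))
        out = weights.setdefault(actor, {})
        out[author] = out.get(author, 0.0) + ACTION_WEIGHTS[kind]

    nodes = sorted(nodes)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            out = weights.get(u, {})
            total = sum(out.values())
            if total == 0:
                # Dangling node: spread its score uniformly over all nodes.
                for v in nodes:
                    nxt[v] += damping * score[u] / n
            else:
                # Pass score along edges in proportion to action weight.
                for v, w in out.items():
                    nxt[v] += damping * score[u] * w / total
        score = nxt
    return score
```

Because each user's outgoing score is divided in proportion to action weight, a share moves more influence toward an author than a like does, which reflects the abstract's point that actions should not be treated equally.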

17.
This study explores the relationship between perceived bridging social capital and specific Facebook‐enabled communication behaviors using survey data from a sample of U.S. adults (N=614). We explore the role of a specific set of Facebook behaviors that support relationship maintenance and assess the extent to which demographic variables, time on site, total and “actual” Facebook Friends, and this new measure (Facebook Relationship Maintenance Behaviors) predict bridging social capital. Drawing upon scholarship on social capital and relationship maintenance, we discuss the role of social grooming and attention‐signaling activities in shaping perceived access to resources in one's network as measured by bridging social capital.

18.
Posting behaviour on social networking sites (SNS) has become a method enabling unsatisfied users to vent emotions. Based on social cognition theory (SCT), personal outcome expectations and self-efficacy affect posting behaviour for venting emotions on SNS. However, perceived social support (PSS) may alter the relationships within the SCT model. Thus, this study aimed to explore the moderating effect of PSS on the relationships between variables in the SCT model for venting emotions on SNS. In total, 310 unsatisfied customers in Taiwan were investigated, and structural equation modelling was performed to test the hypotheses. The results indicated that personal outcome expectations and self-efficacy were positively associated with posting behaviour which, in turn, increased venting emotions on SNS. Moreover, PSS moderated the relationships between variables in the SCT model.

19.
Nowadays, more and more users share real-time news and information in micro-blogging communities such as Twitter, Tumblr or Plurk. In these sites, information is shared via a followers/followees social network structure in which a follower will receive all the micro-blogs from the users he/she follows, named followees. With the increasing number of registered users in this kind of sites, finding relevant and reliable sources of information becomes essential. The reduced number of characters present in micro-posts along with the informal language commonly used in these sites make it difficult to apply standard content-based approaches to the problem of user recommendation. To address this problem, we propose an algorithm for recommending relevant users that explores the topology of the network considering different factors that allow us to identify users that can be considered good information sources. Experimental evaluation conducted with a group of users is reported, demonstrating the potential of the approach.

20.
Increasing interactions and engagements in social networks through monetary and material incentives is not always feasible. Some social networks, specifically those that are built on the basis of fairness, cannot incentivize members using tangible things and thus require an intangible way to do so. In such networks, a personalized recommender could provide an incentive for members to interact with other members in the community. Behavior‐based trust models that generally compute social trust values using the interactions of a member with other members in the community have proven to be good for this. These models, however, largely ignore the interactions of those members with whom a member has interacted, referred to as “friendship effects.” Results from social studies and behavioral science show that friends have a significant influence on the behavior of the members in the community. Following the famous Spanish proverb on friendship “Tell Me Your Friends and I Will Tell You Who You Are,” we extend our behavior‐based trust model by incorporating the “friendship effect” with the aim of improving the accuracy of the recommender system. In this article, we describe a trust propagation model based on associations that combines the behavior of both individual members and their friends. The propagation of trust in our model depends on three key factors: the density of interactions, the degree of separation, and the decay of friendship effect. We evaluate our model using a real data set and make observations on what happens in a social network with and without trust propagation to understand the expected impact of trust propagation on the ranking of the members in the recommended list. We present the model and the results of its evaluation. This work is in the context of moderated networks for which participation is by invitation only and in which members are anonymous and do not know each other outside the community. Copyright © 2014 John Wiley & Sons, Ltd.
