Similar literature
 Found 20 similar documents (search time: 46 ms)
1.
Abstract: Although data mining and knowledge discovery techniques have recently been used to diagnose human disease, little research has been conducted on disease diagnostic modelling using human gene information. Furthermore, to our knowledge, no study has reported on diagnosis models using single nucleotide polymorphism (SNP) information. A disease diagnosis model using data mining techniques and SNP information should prove promising from a practical perspective as more information on human genes becomes available. Data mining and knowledge discovery techniques can be put to practical use in detecting human disease, since haplotype analysis using high-density SNP markers has gained great attention for evaluating human genes related to various human diseases. This paper explores how data mining and knowledge discovery can be applied to medical informatics using human gene information. As an example, we applied case-based reasoning to a cancer detection problem using human gene information and SNP analysis, because case-based reasoning has been applied in medicine less often than other data mining techniques. We propose a modified case-based reasoning method, appropriate for associated categorical variables, for use in detecting gastric cancer.

2.
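A survey of text mining research on electronic medical records   Total citations: 1 (self-citations: 0, citations by others: 1)
Electronic medical records (EMRs) are a product of hospital informatization. They contain rich medical information and clinical knowledge and are an important resource for supporting clinical decision-making, drug mining, and related tasks. How to efficiently mine the information in large volumes of EMR data is therefore an important research topic. In recent years, the rapid development of computing technology, especially machine learning and deep learning, has raised the requirements for mining data in this specialized domain. This survey analyzes the current state of EMR research in order to guide the future development of EMR text mining. Specifically, it first introduces the characteristics of EMR data and common methods for EMR data preprocessing; it then summarizes four typical EMR data mining tasks (medical named entity recognition, relation extraction, text classification, and intelligent medical inquiry) and, for each task, introduces the commonly used basic models and some of the explorations researchers have made; finally, taking two specific disease categories, diabetes and cardiovascular and cerebrovascular diseases, it briefly introduces existing application scenarios of EMRs.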

3.
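Because the Shanghai regional healthcare platform integrates the electronic medical records of 38 tertiary hospitals, the diversity and ambiguity with which different hospitals express the same clinical laboratory indicator has seriously affected medical record mining research. Existing terminologies, however, are highly theoretical and rarely cover actual clinical usage, so a terminology base of clinical laboratory indicators that merges the 38 hospitals is needed. To address this problem, a semi-automatic terminology construction scheme is proposed, built on four steps: schema graph definition, knowledge extraction, knowledge fusion, and knowledge verification. Taking the medical insurance terms defined by the Shanghai Municipal Health Commission as the standard, a sub-base of standard indicator terms is built first; a BERT-based clinical laboratory indicator alignment model is then used to assign the indicators of the 38 hospitals to the standard terms as synonyms. The resulting indicator terminology base contains 23,495 entities and 47,746 fact triples and can be used for applications such as medical record cleaning and medical record querying. Experiments show that the alignment model reaches an F1-score of 95.78%, and that using the terminology base in a colorectal cancer mining project increases the number of retrieved records by up to 94%. In addition, a disease-specific terminology base of colorectal cancer indicators has been made public at dcakb.ecustnlplab.com.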

4.
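This work studies the Microsoft clustering algorithm, one of the data mining algorithms, and its application in finance. Uncovering latent information in massive data is the main job of data mining. By filtering and mining customer transaction information, an example commercial data mining system is built that helps a bank provide better intelligent decisions and recommendations. The client side of the system is developed with Visual Studio.NET 2008 and uses ADOMD.NET objects and Web controls to display the model's results. By entering a customer's personal attributes and basic information about the business handled, users can view information of interest such as creditworthiness, trends in the business customers take up, and trends for new services the bank might launch. The mining process of the clustering model is analyzed in detail throughout the construction of the example system, which promotes the practical application of data mining.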

5.
This paper discusses several data mining algorithms and techniques that we have developed at the University of Arizona Artificial Intelligence Lab. We have implemented these algorithms and techniques into several prototypes, one of which focuses on medical information developed in cooperation with the National Cancer Institute (NCI) and the University of Illinois at Urbana-Champaign. We propose an architecture for medical knowledge information systems that will permit data mining across several medical information sources and discuss a suite of data mining tools that we are developing to assist NCI in improving public access to and use of their existing vast cancer information collections.

6.
We present a decision support system that lets medical doctors analyze important clinical data, such as patients' medical history, diagnosis, or therapy, in order to detect common patterns of knowledge useful in the diagnosis process. The underlying approach mainly exploits case-based reasoning (CBR), which is useful for extracting knowledge from previously experienced cases. In particular, we used sequence data mining to detect common patterns in patients' histories and to highlight the effects of medical practices, based on evidence. We also exploited data warehousing techniques, such as OLAP queries that let medical doctors analyze diagnoses along several measures, and recent visual data integration approaches and tools to effectively support the complex task of integrating and reconciling data from different medical data sources. In addition, due to the massive presence of textual information within the clinical records of many hospitals, text mining techniques have been devised. In particular, we performed lexical analysis of free text in order to extract discriminatory terms and to derive encoded information. Finally, the system provides user-friendly mechanisms to manage the protection of confidential medical data. System validation has been performed, mainly focusing on usability issues, by running experiments based on a large database from a primary public hospital.

7.
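The open-source MIMIC database, an intensive care information collection, contains a large amount of medical data and has been favored by many researchers since its release. Inefficient mining methods, however, can hardly uncover the implicit information inside it, so the MIMIC database is not put to good use and resources are wasted. Exploring emerging mining methods for knowledge discovery is therefore especially important. This paper surveys the mining methods built around the MIMIC database, focusing on newly emerged machine learning and deep learning methods. It also compares traditional statistical models with the newer artificial intelligence techniques, including machine learning and deep learning. The comparison finds that machine learning and deep learning generally outperform traditional statistical models in predicting early patient mortality, identifying factors that influence disease, and similar tasks, which helps improve the quality of care, supports physicians in diagnosis, and to some extent reduces patients' medical costs.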

8.
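By analyzing the needs arising from the deepening informatization of medical insurance management, this paper proposes, from a technical perspective, an overall solution for data integration and data mining in medical insurance information systems. It gives an outline analysis and discussion of the data warehouse design, the data integration scheme, and the data mining techniques and applications, and uses an association rule mining algorithm to examine empirically the feasibility and necessity of mining medical insurance information. Using encoding and decoding techniques together with SQL aggregate functions, it implements an SQL-based FP-Growth algorithm, thereby overcoming the limit that machine memory places on data mining efficiency and achieving efficient mining of massive data.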
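
As a rough illustration of how an SQL aggregate can take over work that would otherwise need the data in memory, the sketch below runs only the first pass that FP-Growth starts from: counting the support of each single item and keeping the frequent ones. It does not build an FP-tree and is not the paper's algorithm. The `claims` table, its columns, and the sample rows are assumptions made up for this example, and SQLite stands in for whatever database the authors used.

```python
import sqlite3

# Hypothetical schema: one row per (claim, billed item).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (claim_id INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO claims VALUES (?, ?)",
    [
        (1, "drug_a"), (1, "test_x"),
        (2, "drug_a"), (2, "test_x"), (2, "drug_b"),
        (3, "drug_a"),
    ],
)

min_support = 2  # assumed threshold: item must appear in at least 2 claims

# The database does the counting; only the short result list reaches Python.
frequent_items = conn.execute(
    "SELECT item, COUNT(DISTINCT claim_id) AS support "
    "FROM claims "
    "GROUP BY item "
    "HAVING COUNT(DISTINCT claim_id) >= ? "
    "ORDER BY support DESC, item",
    (min_support,),
).fetchall()

print(frequent_items)
```

FP-Growth would next reorder the items of each claim by descending support before compressing them into a tree; the point here is only that the support counts can be produced by `GROUP BY` and `COUNT` without loading every claim into application memory.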

9.
Data mining is a powerful method for extracting knowledge from data. Raw data poses various challenges that make traditional methods unsuitable for knowledge extraction. Data mining is supposed to be able to handle various data types in all formats. The relevance of this paper is underlined by the fact that data mining is an object of research in many different areas. In this paper, we review previous works in the context of knowledge extraction from medical data. The main idea of this paper is to describe key papers and provide some guidelines to help medical practitioners. Medical data mining is a multidisciplinary field drawing on both medicine and data mining. For this reason, previous works should be classified so as to cover the requirements of users from the various fields. We have therefore studied papers, published between 1999 and 2013, that aim at extracting knowledge from structural medical data. We clarify medical data mining and its main goals. Each paper is studied in terms of six medical tasks: screening, diagnosis, treatment, prognosis, monitoring and management. Within each task, five data mining approaches are considered: classification, regression, clustering, association and hybrid. At the end of each task, a brief summary and discussion are given. A standard framework according to CRISP-DM is additionally adapted to manage all activities. As a discussion, current issues and future trends are mentioned. The number of works published in this scope is substantial, and it is impossible to discuss all of them in a single work. We hope this paper will make it possible to explore previous works and identify interesting areas for future research.

10.
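Spatial data mining and a framework for its integration with intelligent systems   Total citations: 4 (self-citations: 1, citations by others: 4)
Spatial data mining refers to extracting implicit knowledge, spatial relationships, and meaningful features or patterns that are not explicitly stored in spatial databases. It has broad application prospects in remote sensing, geographic information systems, medical imaging, information fusion systems, and other fields, and is therefore receiving growing attention. From the perspective of the intersection of knowledge discovery, cognitive science, and intelligent systems, this paper proposes a spatial data mining model based on a cooperative mechanism between a database and a knowledge base, systematically introduces the types of knowledge that can be discovered in spatial databases and the corresponding mining methods, then proposes an overall framework for a new type of intelligent system based on spatial data mining along with basic principles for system development, and finally discusses directions for the development of spatial data mining.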

11.
Nowadays, many healthcare organizations are generating and collecting a huge amount of medical data. Due to the difficulty of analyzing this massive volume of data using traditional methods, medical data mining on Electronic Health Records (EHRs) has become a major concern in medical research. It is therefore necessary to assess EHR architectures by their capability to extract useful medical knowledge from huge EHR databases. In this paper, we develop a bi-level interactive decision support framework to identify data mining-oriented EHR architectures. The contribution of this bi-level framework is fourfold: (1) it extends the Interactive Simple Additive Weighting (ISAW) model from an individual single-level environment to a group bi-level environment; (2) it utilizes decision makers' preferences gradually in the course of interactions to reach a consensus on a data mining-oriented EHR architecture; (3) it considers fuzzy logic and fuzzy sets to represent ambiguous, uncertain or imprecise information; and (4) it synthesizes a representative outcome based on qualitative and quantitative indicators in the EHR assessment process. A case study demonstrates the applicability of the proposed bi-level interactive framework for benchmarking a national data mining-oriented EHR.
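
For readers unfamiliar with the scoring step underneath ISAW, the following is a minimal sketch of plain, non-interactive Simple Additive Weighting: each criterion is normalized to the range (0, 1] and the normalized values are combined as a weighted sum. The interactive, bi-level, and fuzzy parts of the framework are not modelled. The criteria, weights, and the two candidate architectures are hypothetical values chosen only for illustration.

```python
def saw_scores(matrix, weights, benefit):
    """Score alternatives with plain Simple Additive Weighting.

    matrix  : one row of positive criterion values per alternative
    weights : one weight per criterion (assumed to sum to 1)
    benefit : True if larger is better for that criterion, False if it is a cost
    """
    columns = list(zip(*matrix))
    scores = []
    for row in matrix:
        total = 0.0
        for j, value in enumerate(row):
            if benefit[j]:
                normalized = value / max(columns[j])   # best value maps to 1
            else:
                normalized = min(columns[j]) / value   # cheapest value maps to 1
            total += weights[j] * normalized
        scores.append(total)
    return scores


# Hypothetical EHR architectures rated on (query coverage, cost, latency).
architectures = {
    "A": [0.8, 100.0, 2.0],
    "B": [1.0, 200.0, 4.0],
}
scores = saw_scores(
    list(architectures.values()),
    weights=[0.5, 0.25, 0.25],
    benefit=[True, False, False],
)
ranking = sorted(zip(architectures, scores), key=lambda pair: pair[1], reverse=True)
print(ranking)
```

With these made-up numbers, architecture A ranks first because its lower cost and latency outweigh B's better coverage; changing the weights is how decision makers' preferences would enter the calculation.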

12.
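How to effectively mine and use the large volume of medical information resources produced in the course of medical activity, so that they serve medical research and clinical diagnosis, is the main problem facing medical data mining. This paper first introduces the characteristics and current state of medical data mining research and how it differs from general data mining methods; it then explores the application of rough set theory to mining mammography (breast X-ray) data; finally, it analyzes and summarizes some key problems encountered in the experiments, and, regarding the future...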

13.
This paper examines the problem of prioritizing actions under uncertainty. Our motivating applications come from the domain of data mining. Data mining problems present the user with a huge collection of individual items (e.g., abstracts, medical histories, and computer users' command histories) and require that these items be prioritized according to which should be pursued thoroughly. More precisely, each data item is assumed to be generated by one of two processes: a large majority of the data comes from a common, mundane process and a very small fraction comes from a rare, phenomenon process. The problem is to rank the information so as to optimally direct the user in his or her pursuit of the data items that were generated by the phenomenon process. Our previous work has developed the theoretical foundations of the information prioritization problem. The current paper summarizes these foundations, derives new theoretical results, and details initial experimental results of a prioritization system based on the theory. We focus here on feature selection techniques and the method of model surrogates, each tailored to the classes of prioritization applications of greatest current interest. Our results demonstrate the effectiveness of the techniques and motivate further research to improve the existing system.

14.
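Research and application of rough set-based medical data mining   Total citations: 1 (self-citations: 0, citations by others: 1)
Medical data mining can automatically analyze the data in existing medical record databases and provide valuable medical knowledge. Clinical medical record databases contain many duplicate samples and redundant attributes, which impair the accuracy and speed of medical diagnosis. To address this problem, a relationship is established between the information-theoretic rough set model and the SQL language, and an SQL-based attribute reduction algorithm using conditional information entropy is proposed; data cleaning, core computation, and attribute reduction are carried out in the database query language. Experimental results show that the algorithm is simple to implement and runs efficiently, offering a way to apply rough set theory more widely to concrete medical data mining.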
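
To make the idea of entropy-based attribute reduction concrete, here is a minimal sketch written in Python rather than in SQL as the paper does. It computes the conditional entropy H(D | C) of a decision attribute D given a set of condition attributes C, then greedily drops every attribute whose removal leaves that entropy unchanged. The greedy backward elimination is one simple strategy assumed for illustration, not necessarily the authors' procedure, and the attribute names and records are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(rows, cond_attrs, decision):
    """Return H(decision | cond_attrs) for a list of dict records."""
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[a] for a in cond_attrs)
        groups[key][row[decision]] += 1
    n = len(rows)
    entropy = 0.0
    for counts in groups.values():
        size = sum(counts.values())
        for count in counts.values():
            entropy -= (size / n) * (count / size) * log2(count / size)
    return entropy


def reduce_attributes(rows, cond_attrs, decision, tol=1e-12):
    """Greedily drop attributes whose removal does not change H(D | C)."""
    target = conditional_entropy(rows, cond_attrs, decision)
    kept = list(cond_attrs)
    for attr in cond_attrs:
        trial = [a for a in kept if a != attr]
        if abs(conditional_entropy(rows, trial, decision) - target) <= tol:
            kept = trial
    return kept


# Hypothetical records: "ward" is constant, so it carries no information.
records = [
    {"fever": "yes", "cough": "yes", "ward": "A", "dx": "flu"},
    {"fever": "yes", "cough": "no",  "ward": "A", "dx": "cold"},
    {"fever": "no",  "cough": "yes", "ward": "A", "dx": "cold"},
    {"fever": "no",  "cough": "no",  "ward": "A", "dx": "healthy"},
]
reduct = reduce_attributes(records, ["fever", "cough", "ward"], "dx")
print(reduct)
```

The grouping inside `conditional_entropy` is the kind of operation a `GROUP BY` query expresses directly, which is presumably what makes an SQL formulation natural.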

15.
The availability of a large amount of medical data leads to the need for intelligent disease prediction and analysis tools to extract hidden information. A large number of data mining and statistical analysis tools are used for disease prediction. Single data-mining techniques show an acceptable level of accuracy for heart disease diagnosis. This article focuses on prediction and analysis of heart disease using a weighted vote-based classifier ensemble technique. The proposed ensemble model overcomes the limitations of conventional data-mining techniques by employing an ensemble of five heterogeneous classifiers: naive Bayes, decision tree based on Gini index, decision tree based on information gain, instance-based learner, and support vector machines. We have used five benchmark heart disease data sets taken from the UCI repository. Each data set contains a different feature space that ultimately leads to the prediction of heart disease. The effectiveness of the proposed ensemble classifier is investigated by comparing its performance with other researchers' techniques. Tenfold cross-validation is used to handle the class imbalance problem. Moreover, confusion matrices and analysis of variance statistics are used to show the prediction results of all classifiers. The experimental results verify that the proposed ensemble classifier can deal with all types of attributes and that it has achieved a high diagnosis accuracy of 87.37%, sensitivity of 93.75%, specificity of 92.86%, and F-measure of 82.17%. The F-ratio higher than the F-critical and p-value less than 0.01 for a 95% confidence interval indicate that the results are statistically significant for all the data sets.
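
The combining step of a weighted vote-based ensemble can be sketched in a few lines: each classifier casts its predicted label with a weight, and the label with the largest total weight wins. The abstract does not say how the weights are obtained, so the values below are hypothetical (for instance, they could be validation accuracies), and the base classifiers themselves are not trained here.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest summed weight."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Hypothetical outputs of three base classifiers for one patient.
predictions = ["disease", "healthy", "healthy"]
weights = [0.9, 0.4, 0.4]  # assumed per-classifier weights

decision = weighted_vote(predictions, weights)
print(decision)
```

In this made-up case a simple majority would say "healthy", while the weighted vote follows the single more trusted classifier; that difference is the reason to weight votes at all.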

16.
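Applications of data mining in medical information systems   Total citations: 2 (self-citations: 0, citations by others: 2)
After introducing the definition, functions, and methods of data mining, and taking the characteristics of medical data into account, this paper summarizes the applications of data mining in medical information systems and offers ideas for building intelligent information systems.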

17.
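Targeting the characteristics of epidemiological research, this paper proposes an architecture for a computer-aided medical data mining system and, using diabetic complications as a case study, discusses redundancy elimination, normalized storage, knowledge induction, and visual presentation of medical data. Taking survey data on 3,022 cases from Tianjin General Hospital as the study object, it attempts to achieve quantitative data mining and knowledge discovery by computer on qualitative data such as diabetic complications. Mining the qualitative data on 43 complications yields 18 knowledge rules showing clear tendencies to co-occur, involving conditions such as hyperlipidemia, coronary heart disease, hypertension, and cerebrovascular disease. Knowledge trees, decision trees, and similar methods are used to visualize the rules. A computer-aided medical data mining system based on data mining and knowledge discovery can automatically analyze data in existing medical record databases and provide valuable medical knowledge; it is particularly suited to epidemiological analysis and population-wide health assessment, so integration with community healthcare and hospital HIS systems is a very realistic direction for future development.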

18.
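Electronic patient record mining (EPRM) refers to extracting useful medical information from electronic medical record databases and mining the medical diagnostic rules and patterns hidden in them, so as to provide scientific and accurate decision support for disease diagnosis and treatment. Building on the basic theory of rough sets and concept lattices, and taking into account the characteristics of medical data in electronic medical record databases, a design method for an electronic medical record mining model based on rough concept lattices is proposed. The model uses conditional entropy to reduce the large number of attributes in medical records, together with a construction algorithm for rough decision rule lattices (EPRM). Experiments show that the model performs well in decision rule mining efficiency, running speed, and adaptability.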

19.
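Current research on anomaly detection in clinical cases mainly uses methods such as symptom association, cost control, and clinical sequential pattern mining, which still have limitations for clinical data that lack symptom information or complete timestamps of clinical actions. Based on the characteristics of this kind of clinical data, a pattern recognition-based CC-FR model is proposed. The model uses frequent pattern mining to determine a membership function for each single disease, and obtains the detection result by matching the frequent patterns in the membership function against the clinical case under test. Experimental results show that the model can effectively detect anomalous clinical cases and can serve a supervisory and warning role in clinical care.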

20.
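Electronic medical record systems are widely used in hospital management systems. Given the nature of their profession, physicians cannot enter records as quickly as professional typists. This paper studies methods to help physicians enter medical record information quickly and, based on text mining techniques, proposes a writing-assistance system for electronic medical records. The system uses data mining to mine frequently used information in medical records, builds separate lexicons for different types of records, and speeds up entry by letting pinyin initial abbreviations stand in for Chinese character input.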
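
The lookup behind pinyin-initial input can be sketched as an index from abbreviation to candidate terms, queried by prefix as the physician types. The lexicon below is a hypothetical hand-written sample for a single record type; in the system described above it would be mined from existing records, and the initials would be derived automatically rather than typed in by hand.

```python
from collections import defaultdict

# Hypothetical lexicon for one record type: (term, pinyin initials).
CARDIOLOGY_LEXICON = [
    ("高血压", "gxy"),  # gao xue ya, hypertension
    ("冠心病", "gxb"),  # guan xin bing, coronary heart disease
    ("高血脂", "gxz"),  # gao xue zhi, hyperlipidemia
    ("糖尿病", "tnb"),  # tang niao bing, diabetes
]


def build_index(lexicon):
    """Map each pinyin-initial string to the terms that share it."""
    index = defaultdict(list)
    for term, initials in lexicon:
        index[initials].append(term)
    return index


def suggest(index, typed):
    """Return every term whose initials start with the typed letters."""
    return [
        term
        for initials in sorted(index)
        if initials.startswith(typed)
        for term in index[initials]
    ]


index = build_index(CARDIOLOGY_LEXICON)
candidates = suggest(index, "gx")
print(candidates)
```

Keeping one lexicon per record type, as the abstract describes, keeps each candidate list short; a practical version would probably also rank candidates by how often each term occurs in past records.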
