Similar Documents
Found 20 similar documents (search time: 0 ms)
1.
Objectives: Extracting data from publication reports is a standard step in systematic review (SR) development. However, the data extraction process still relies heavily on manual effort, which is slow, costly, and subject to human error. In this study, we developed a text summarization system aimed at enhancing productivity and reducing errors in the traditional data extraction process.

Methods: We developed a computer system that used machine learning and natural language processing approaches to automatically generate summaries of full-text scientific publications. The summaries were evaluated, at the sentence and fragment levels, on their coverage of common clinical SR data elements such as sample size, group size, and PICO values. We compared the computer-generated summaries with human-written summaries (title and abstract) in terms of the presence of the information needed for data extraction, as presented in the study characteristics tables of Cochrane reviews.

Results: At the sentence level, the computer-generated summaries covered more of the information needed for systematic reviews than the human-written summaries (recall 91.2% vs. 83.8%, p < 0.001) and had a higher density of relevant sentences (precision 59% vs. 39%, p < 0.001). At the fragment level, an ensemble approach combining rule-based, concept-mapping, and dictionary-based methods outperformed each individual method alone, achieving an F-measure of 84.7%.

Conclusion: Computer-generated summaries are a potential alternative information source for data extraction in systematic review development. Machine learning and natural language processing are promising approaches for developing such an extractive summarization system.
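As a rough, hypothetical illustration of the extractive-summarization idea, the sketch below scores sentences against a few hand-picked cue patterns for SR data elements (sample size, randomization, population). The cue list, scoring, and example text are invented for illustration; the paper's system used trained models, not a fixed cue list.

```python
import re

# Hypothetical cue patterns for common SR data elements (sample/group size, PICO hints).
CUES = [
    r"\b[nN]\s*=\s*\d+",          # explicit sample or group size
    r"\brandomi[sz]ed\b",
    r"\bplacebo\b",
    r"\bparticipants?\b",
    r"\boutcomes?\b",
]

def score_sentence(sentence):
    """Count how many data-element cues a sentence matches."""
    return sum(bool(re.search(p, sentence)) for p in CUES)

def extract_summary(text, k=2):
    """Pick the k highest-scoring sentences, preserving document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    ranked = sorted(sentences, key=score_sentence, reverse=True)[:k]
    return [s for s in sentences if s in ranked]

text = ("We enrolled participants in a randomized trial. "
        "The intervention group (n = 120) received placebo. "
        "Funding was provided by the university.")
summary = extract_summary(text, k=2)
print(summary)
```

Recall and precision of such a summary can then be measured exactly as in the abstract: what fraction of needed data elements appear in the selected sentences, and what fraction of selected sentences are relevant.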

2.

3.
Background and aims: Machine learning techniques for text mining of cancer-related clinical documents have not been sufficiently explored. Here, some techniques are presented for pre-processing free-text breast cancer pathology reports, with the aim of facilitating the extraction of information relevant to cancer staging.

Materials and methods: The first technique, implemented with the freely available software RapidMiner, classifies reports according to their general layout: 'semi-structured' or 'unstructured'. The second, developed with the open-source language engineering framework GATE, predicts which chunks of report text contain information on cancer morphology, tumour size, hormone receptor status, and the number of positive nodes. The classifiers were trained and tested on sets of 635 and 163 manually classified or annotated reports, respectively, from the Northern Ireland Cancer Registry.

Results: The best result, 99.4% accuracy with only one semi-structured report predicted as unstructured, was produced by the layout classifier with the k-nearest-neighbour algorithm, using binary term-occurrence word vectors with stopword filtering and pruning. For chunk recognition, the best results were obtained with the PAUM algorithm using the same parameters for all cases except the prediction of chunks containing cancer morphology. For semi-structured reports, precision ranged from 0.97 to 0.94 and recall from 0.92 to 0.83; for unstructured reports, precision ranged from 0.91 to 0.64 and recall from 0.68 to 0.41. Results were poor when the classifier was trained on semi-structured reports but tested on unstructured ones.

Conclusions: These results show that it is possible and beneficial to predict the layout of reports, and that the accuracy with which report segments containing particular information can be identified is sensitive to the report layout and to the type of information sought.
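The layout-classification step can be sketched with a toy k-nearest-neighbour classifier over binary term-occurrence vectors, similar in spirit to the RapidMiner setup described above. The tiny training reports and labels below are invented; the real pipeline also applied stopword filtering and pruning.

```python
import math

# Toy training reports: 'semi' reports use field-style headers, 'free' are narrative.
TRAIN = [
    ("tumour size: 22mm receptor status: positive nodes: 3", "semi"),
    ("morphology: ductal tumour size: 10mm nodes: 0", "semi"),
    ("the specimen shows a ductal carcinoma measuring about 22mm", "free"),
    ("sections reveal invasive carcinoma with three positive nodes", "free"),
]

VOCAB = sorted({w for text, _ in TRAIN for w in text.split()})

def binary_vector(text):
    """Binary term-occurrence vector over the training vocabulary."""
    tokens = set(text.split())
    return [1 if w in tokens else 0 for w in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def knn_layout(report, k=3):
    """Predict report layout by majority vote of the k most similar training reports."""
    q = binary_vector(report)
    neighbours = sorted(TRAIN, key=lambda t: cosine(q, binary_vector(t[0])),
                        reverse=True)[:k]
    labels = [lab for _, lab in neighbours]
    return max(set(labels), key=labels.count)

pred = knn_layout("tumour size: 15mm nodes: 1 receptor status: negative")
print(pred)
```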

4.
Background: Bodyweight-related measures (weight, height, BMI, abdominal circumference) are extremely important for clinical care, research, and quality improvement. These and other vital-sign data are frequently missing from the structured tables of electronic health records; however, they are often recorded as text within clinical notes. In this project we sought to develop and validate a learning algorithm to extract bodyweight-related measures from clinical notes in the Veterans Administration (VA) electronic health record, to complement the structured data used in clinical research.

Methods: We developed the Regular Expression Discovery Extractor (REDEx), a supervised learning algorithm that generates regular expressions from a training set. The regular expressions generated by REDEx were then used to extract the numerical values of interest. To train the algorithm we created a corpus of 268 outpatient primary care notes that were annotated by two annotators. This annotation served to develop the annotation process and to identify terms associated with bodyweight-related measures for training the supervised learning algorithm. Snippets from an additional 300 outpatient primary care notes were subsequently annotated independently by two reviewers to complete the training set, and inter-annotator agreement was calculated. REDEx was then applied to a separate test set of 3561 notes to generate a dataset of weights extracted from text. We estimated the number of unique individuals who would otherwise not have bodyweight-related measures recorded in the CDW, and the number of additional bodyweight-related measures that would be captured.

Results: REDEx's performance was: accuracy = 98.3%, precision = 98.8%, recall = 98.3%, F-measure = 98.5%. In the dataset of weights from the 3561 notes, 7.7% of notes contained bodyweight-related measures that were not available as structured data. In addition, two extra bodyweight-related measures were identified per individual per year.

Conclusion: Bodyweight-related measures are frequently stored as text in clinical notes, and a supervised learning algorithm can be used to extract these data. Implications for clinical care, epidemiology, and quality improvement efforts are discussed.
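The extraction step can be illustrated with hand-written regular expressions of the kind REDEx induces automatically from annotated snippets. The two patterns and the sample note below are hypothetical, not REDEx output.

```python
import re

# Hand-written stand-ins for the regular expressions REDEx would learn from training data.
PATTERNS = [
    re.compile(r"\b(?:wt|weight)[:\s]+(\d{2,3}(?:\.\d+)?)\s*(lbs?|kg)\b", re.I),
    re.compile(r"\bbmi[:\s]+(\d{2}(?:\.\d+)?)\b", re.I),
]

def extract_measures(note):
    """Return (value, unit) pairs found in a clinical note; BMI is unitless."""
    hits = []
    for pat in PATTERNS:
        for m in pat.finditer(note):
            groups = m.groups()
            unit = groups[1].lower() if len(groups) > 1 else "bmi"
            hits.append((float(groups[0]), unit))
    return hits

note = "Vitals today: Wt: 184 lbs, BMI: 27.9, BP 120/80."
print(extract_measures(note))
```

The appeal of learning such patterns rather than writing them by hand is coverage: clinicians record the same measure in many surface forms, and a supervised discovery step finds variants a human pattern author would miss.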

5.
6.
PURPOSE: We assessed the current state of commercial natural language processing (NLP) engines for their ability to extract medication information from textual clinical documents. METHODS: Two thousand de-identified discharge summaries and family practice notes were submitted to four commercial NLP engines with the request to extract all medication information. The four sets of returned results were combined to create a comparison standard which was validated against a manual, physician-derived gold standard created from a subset of 100 reports. Once validated, the individual vendor results for medication names, strengths, route, and frequency were compared against this automated standard with precision, recall, and F measures calculated. RESULTS: Compared with the manual, physician-derived gold standard, the automated standard was successful at accurately capturing medication names (F measure=93.2%), but performed less well with strength (85.3%) and route (80.3%), and relatively poorly with dosing frequency (48.3%). Moderate variability was seen in the strengths of the four vendors. The vendors performed better with the structured discharge summaries than with the clinic notes in an analysis comparing the two document types. CONCLUSION: Although automated extraction may serve as the foundation for a manual review process, it is not ready to automate medication lists without human intervention.
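The precision, recall, and F measures used in these vendor comparisons reduce to set overlap between extracted items and a gold standard. A minimal sketch, with invented medication tuples:

```python
def prf(predicted, gold):
    """Precision, recall, and F-measure of an extracted set against a gold standard."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# Hypothetical (name, strength) pairs: one strength error by the extractor.
gold = {("metformin", "500 mg"), ("lisinopril", "10 mg"), ("aspirin", "81 mg")}
pred = {("metformin", "500 mg"), ("lisinopril", "20 mg"), ("aspirin", "81 mg")}
p, r, f = prf(pred, gold)
print(round(p, 2), round(r, 2), round(f, 2))
```

Scoring names alone versus (name, strength) pairs explains why the abstract's F-measure drops from 93.2% for names to 85.3% for strengths: each attribute added to the tuple gives the extractor another way to miss.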

7.
Radiological reporting generates a large amount of free-text clinical narrative, a potentially valuable source of information for improving clinical care and supporting research. Automatic techniques for analyzing such reports are necessary to make their content effectively available to radiologists in aggregated form. In this paper we focus on the classification of chest computed tomography reports according to a classification schema proposed for this task by radiologists of the Italian hospital ASST Spedali Civili di Brescia. The proposed system is built by exploiting a training data set containing reports annotated by radiologists: each report is classified according to the schema developed by the radiologists, and supporting textual evidence is marked in the report. The annotations are then used to train different machine-learning-based classifiers. We present a method based on a cascade of classifiers that make use of a set of syntactic and semantic features. The result is a novel hierarchical classification system for the given task, which we have experimentally evaluated.

8.
Objective: To compare linear and Laplacian SVMs on a clinical text classification task, and to evaluate the effect of unlabeled training data on Laplacian SVM performance.

Background: The development of machine-learning-based clinical text classifiers requires labeled training data, obtained via manual review by clinicians. Because of the effort and expense involved in labeling data, training sets in the clinical domain are of limited size. In contrast, electronic medical record (EMR) systems contain hundreds of thousands of unlabeled notes that are not used by supervised machine learning approaches. Semi-supervised learning algorithms use both labeled and unlabeled data to train classifiers, and can outperform their supervised counterparts.

Methods: We trained support vector machines (SVMs) and Laplacian SVMs on a training reference standard of 820 abdominal CT, MRI, and ultrasound reports labeled for the presence of potentially malignant liver lesions requiring follow-up (positive class prevalence 77%). The Laplacian SVM used 19,845 randomly sampled unlabeled notes in addition to the training reference standard. We evaluated both models on a test set of 520 labeled reports.

Results: The Laplacian SVM trained on labeled and unlabeled radiology reports significantly outperformed the supervised SVM (macro-F1 0.773 vs. 0.741, sensitivity 0.943 vs. 0.911, positive predictive value 0.877 vs. 0.883). Performance improved with the number of labeled and unlabeled notes used to train the Laplacian SVM (Pearson's ρ = 0.529 for the correlation between the number of unlabeled notes and macro-F1 score). These results suggest that practical semi-supervised methods such as the Laplacian SVM can leverage the large unlabeled corpora that reside within EMRs to improve clinical text classification.

9.
Most patient care questions raised by clinicians can be answered by online clinical knowledge resources; however, important barriers still challenge the use of these resources at the point of care.

Objective: To design and assess a method for extracting, from synthesized online clinical resources, the sentences that represent the most clinically useful information for directly answering clinicians' information needs.

Materials and methods: We developed a kernel-based Bayesian network classification model based on different domain-specific feature types extracted from sentences in a gold standard composed of 18 UpToDate documents. These features included UMLS concepts and their semantic groups, semantic predications extracted by SemRep, patient populations identified by a pattern-based natural language processing (NLP) algorithm, and cue words extracted by a feature selection technique. Algorithm performance was measured in terms of precision, recall, and F-measure.

Results: The feature-rich approach yielded an F-measure of 74%, versus 37% for a feature co-occurrence method (p < 0.001). Excluding predication, population, semantic concept, or text-based features reduced the F-measure to 62%, 66%, 58%, and 69%, respectively (p < 0.01). Applied to Medline sentences, the classifier reached an F-measure of 73%, equivalent to its performance on UpToDate sentences (p = 0.62).

Conclusions: The feature-rich approach significantly outperformed both general baseline methods and classifiers based on any single feature type; each type of semantic feature made a unique contribution to overall classification performance. The classifier's model and features, developed on UpToDate, generalized well to Medline abstracts.

10.
Identification of co-referent entity mentions inside text has significant importance for other natural language processing (NLP) tasks (e.g. event linking). However, this task, known as co-reference resolution, remains a complex problem, partly because of the confusion over different evaluation metrics and partly because the well-researched existing methodologies do not perform well on new domains such as clinical records. This paper presents a variant of the influential mention-pair model for co-reference resolution. Using a series of linguistically and semantically motivated constraints, the proposed approach controls generation of less-informative/sub-optimal training and test instances. Additionally, the approach also introduces some aggressive greedy strategies in chain clustering. The proposed approach has been tested on the official test corpus of the recently held i2b2/VA 2011 challenge. It achieves an unweighted average F1 score of 0.895, calculated from multiple evaluation metrics (MUC, B3 and CEAF scores). These results are comparable to the best systems of the challenge. What makes our proposed system distinct is that it also achieves high average F1 scores for each individual chain type (Test: 0.897, Person: 0.852, Problem: 0.855, Treatment: 0.884). Unlike other works, it obtains good scores for each of the individual metrics rather than being biased towards a particular metric.

11.
Objective: Death certificates are an invaluable source of cancer mortality statistics. However, this value can only be realised if accurate, quantitative data can be extracted from certificates, an aim hampered by both the volume and the variable quality of certificates written in natural language. This paper proposes an automatic classification system for identifying all cancer-related causes of death from death certificates.

Methods: Detailed features, including terms, n-grams, and SNOMED CT concepts, were extracted from a collection of 447,336 death certificates. The features were used as input to two different classification sub-systems: a machine learning sub-system using Support Vector Machines (SVMs) and a rule-based sub-system. A fusion sub-system then combines the results from the SVMs and the rules into a single final classification. A held-out test set was used to evaluate the effectiveness of the classifiers in terms of precision, recall, and F-measure.

Results: The system was highly effective at determining the type of cancer for both common cancers (F-measure 0.85) and rare cancers (F-measure 0.7). In general, the rules performed better than the SVMs; however, the fusion method combining the two was the most effective.

Conclusion: The system proposed in this study provides automatic identification and characterisation of cancers from large collections of free-text death certificates. This allows organisations such as cancer registries to monitor and report on cancer mortality in a timely and accurate manner. In addition, the methods and findings are generally applicable beyond cancer classification, and to other sources of medical text besides death certificates.
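One common way to fuse a rule-based and a statistical classifier is to let high-precision rules take precedence and fall back to the statistical model otherwise. The sketch below illustrates that idea only; the rule phrases, ICD-style codes, and the stub standing in for the SVM sub-system are all invented, not the paper's actual fusion logic.

```python
# Hypothetical high-precision phrase-to-code rules (codes in the style of ICD-10).
RULES = {
    "mesothelioma": "C45",
    "carcinoma of the lung": "C34",
}

def rule_classify(text):
    """Return a code if any high-precision rule phrase fires, else None."""
    for phrase, code in RULES.items():
        if phrase in text.lower():
            return code
    return None

def ml_classify(text):
    # Stand-in for the SVM sub-system; a real model scores term/n-gram/SNOMED features.
    return "C34" if "lung" in text.lower() else "C80"  # C80: malignancy, unspecified site

def fused_classify(text):
    """Rules take precedence; fall back to the statistical classifier."""
    return rule_classify(text) or ml_classify(text)

print(fused_classify("Malignant mesothelioma of pleura"))  # rule fires
print(fused_classify("metastatic lung cancer"))            # falls back to ML
```

This precedence scheme matches the abstract's finding: rules alone beat the SVMs, but combining both covers the certificates the rules miss.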

12.
Background and significance: Sparsity is often a desirable property of statistical models, and various feature selection methods exist to yield sparser, more interpretable models. However, their application to biomedical text classification, particularly to mortality risk stratification among intensive care unit (ICU) patients, has not been thoroughly studied.

Objective: To develop and characterize sparse classifiers based on the free text of nursing notes in order to predict ICU mortality risk, and to discover the text features most strongly associated with mortality.

Methods: We selected nursing notes from the first 24 h of ICU admission for 25,826 adult ICU patients from the MIMIC-II database. We then developed a pair of stochastic gradient descent-based classifiers with elastic-net regularization, and studied the performance-sparsity tradeoffs of both classifiers as their regularization parameters were varied.

Results: The best-performing classifier achieved a 10-fold cross-validated AUC of 0.897 under the log loss function with full L2 regularization, while full L1 regularization used just 0.00025% of the candidate input features and resulted in an AUC of 0.889. The log loss (AUC range 0.889-0.897) yielded better performance than the hinge loss (0.850-0.876), but the latter yielded even sparser models.

Discussion: Most features selected by both classifiers appear clinically relevant and correspond to predictors already present in existing ICU mortality models. The sparser classifiers were also able to discover a number of informative, albeit nonclinical, features.

Conclusion: The elastic-net-regularized classifiers perform reasonably well and can reduce the number of features required by over a thousandfold, with only a modest impact on performance.
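A minimal sketch of SGD logistic regression (log loss) with an elastic-net penalty on bag-of-words features, in the spirit of the classifiers above. The tiny note snippets and labels are invented, and the update uses a crude L1 subgradient; this is not the MIMIC-II pipeline.

```python
import math
import random

# Hypothetical nursing-note snippets: label 1 = died, 0 = survived.
DOCS = [("unresponsive intubated critical", 1),
        ("comfortable stable ambulating", 0),
        ("critical sedated unresponsive", 1),
        ("stable tolerating diet", 0)] * 10

VOCAB = sorted({w for d, _ in DOCS for w in d.split()})

def features(doc):
    toks = doc.split()
    return [toks.count(w) for w in VOCAB]

def train(docs, epochs=50, lr=0.1, alpha=0.01, l1_ratio=0.5):
    """SGD on log loss with an elastic-net (L1 + L2) penalty."""
    w = [0.0] * len(VOCAB)
    rng = random.Random(0)
    data = list(docs)
    for _ in range(epochs):
        rng.shuffle(data)
        for doc, y in data:
            x = features(doc)
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(len(w)):
                grad = (p - y) * x[j]
                sign = 1 if w[j] > 0 else -1 if w[j] < 0 else 0
                grad += alpha * (l1_ratio * sign + (1 - l1_ratio) * w[j])
                w[j] -= lr * grad
    return w

w = train(DOCS)

def predict(doc):
    z = sum(wi * xi for wi, xi in zip(w, features(doc)))
    return 1 if z > 0 else 0

print(predict("patient unresponsive and critical"))
print(predict("stable and comfortable"))
```

Raising `l1_ratio` toward 1 pushes more weights to exactly zero, which is the performance-sparsity tradeoff the paper characterizes.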

13.
The sheer volume of textual information that must be reviewed and analyzed in many clinical settings requires the automated retrieval of key clinical and temporal information. Existing natural language processing systems are often challenged by the low quality of clinical texts and do not achieve the required performance. In this study, we focus on medical product safety report narratives and investigate the association of clinical events with appropriate time information. We developed a novel algorithm for tagging and extracting temporal information from the narratives and associating it with related events. The proposed algorithm minimizes its dependency on text quality by relying only on shallow syntactic information and primitive properties of the extracted event and time entities. We demonstrated the effectiveness of the proposed algorithm by evaluating its tagging and time-assignment capabilities on 140 randomly selected reports from the US Vaccine Adverse Event Reporting System (VAERS) and the FDA (Food and Drug Administration) Adverse Event Reporting System (FAERS). We compared the performance of our tagger with the SUTime and HeidelTime taggers, and our algorithm's event-time associations with the Temporal Awareness and Reasoning Systems for Question Interpretation (TARSQI). We further evaluated the ability of our algorithm to correctly identify the time information for events in the 2012 Informatics for Integrating Biology and the Bedside (i2b2) Challenge corpus. For the time tagging task, our algorithm outperformed both SUTime and HeidelTime (F-measure on VAERS and FAERS, respectively: our algorithm 0.86 and 0.88, SUTime 0.77 and 0.74, HeidelTime 0.75 and 0.42). In the event-time association task, our algorithm assigned an inappropriate timestamp to 25% of the events, while the TARSQI toolkit performed considerably worse, assigning inappropriate timestamps to 61.5% of the same events. Our algorithm also supported the correct calculation of 69% of the event relations to the section time in the i2b2 test set.
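The shallow-tagging idea can be sketched as two regex passes plus a proximity heuristic that pairs each event with its nearest time mention. The patterns, the nearest-span heuristic, and the sample narrative are simplifications invented for illustration, not the authors' algorithm.

```python
import re

# Toy shallow taggers for time expressions and adverse-event mentions.
TIME_PAT = re.compile(
    r"\b(?:\d{1,2}/\d{1,2}/\d{2,4}|\d+\s+(?:hours?|days?)\s+later|(?:next|same)\s+day)\b",
    re.I)
EVENT_PAT = re.compile(r"\b(vaccinated|developed\s+\w+|hospitalized|recovered)\b", re.I)

def associate(narrative):
    """Pair each event with the time mention closest to the end of the event span."""
    times = [(m.start(), m.group()) for m in TIME_PAT.finditer(narrative)]
    pairs = []
    for ev in EVENT_PAT.finditer(narrative):
        nearest = min(times, key=lambda t: abs(t[0] - ev.end()), default=(None, None))
        pairs.append((ev.group(), nearest[1]))
    return pairs

text = "Patient was vaccinated on 3/14/2021 and developed fever 2 days later."
print(associate(text))
```

Because this approach uses only surface positions and simple entity properties, it degrades gracefully on ungrammatical narratives, which is the robustness property the abstract emphasizes.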

14.
15.
Efforts to improve the treatment of congestive heart failure, a common and serious medical condition, include the use of quality measures to assess guideline-concordant care. The goal of this study is to identify left ventricular ejection fraction (LVEF) information from various types of clinical notes, and to then use this information for heart failure quality measurement. We analyzed the annotation differences between a new corpus of clinical notes from the Echocardiography, Radiology, and Text Integrated Utility package and other corpora annotated for natural language processing (NLP) research in the Department of Veterans Affairs. These reports contain varying degrees of structure. To examine whether existing LVEF extraction modules we developed in prior research improve the accuracy of LVEF information extraction from the new corpus, we created two sequence-tagging NLP modules trained with a new data set, with or without predictions from the existing LVEF extraction modules. We also conducted a set of experiments to examine the impact of training data size on information extraction accuracy. We found that less training data is needed when reports are highly structured, and that combining predictions from existing LVEF extraction modules improves information extraction when reports have less structured formats and a rich set of vocabulary.
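A small regex illustrates the kind of surface patterns an LVEF extractor must handle, including single values and ranges. The pattern and the example note are hypothetical; the study used trained sequence taggers precisely because real notes are far more varied than this.

```python
import re

# Hypothetical LVEF pattern: matches "LVEF 55%" and "ejection fraction ... 40-45%".
LVEF_PAT = re.compile(
    r"\b(?:LVEF|ejection fraction)\b[^0-9]{0,20}?(\d{1,2})\s*(?:-|to)?\s*(\d{1,2})?\s*%",
    re.I)

def extract_lvef(note):
    """Return (low, high) percentage pairs; single values become (v, v)."""
    out = []
    for m in LVEF_PAT.finditer(note):
        low = int(m.group(1))
        high = int(m.group(2)) if m.group(2) else low
        out.append((low, high))
    return out

note = "Echo today: LVEF 55%. Prior ejection fraction estimated at 40-45%."
print(extract_lvef(note))
```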

16.
17.
Because online communication is related to the adoption of new beliefs, attitudes, and, ultimately, behaviors, analyzing it is of great importance for medicine. Multiple health-care and academic communities, from information seeking and dissemination to persuasive technologies, acknowledge this need. However, obtaining such understanding requires a relevant way to model online communication for the study of behavior. In this paper, we propose an automatic method to reveal process models of interrelated speech intentions from conversations. Specifically, we adopt a domain-independent taxonomy of speech intentions, release an annotated corpus of Reddit conversations, train and assess supervised classifiers for predicting speech intentions from utterances using 10-fold cross-validation (in multi-class, one-versus-all, and multi-label setups), and design an approach to transform conversations into the well-defined, representative logs of verbal behavior needed by process mining techniques. The experimental results show that: (1) automatic classification of intentions is feasible (with kappa scores varying between 0.52 and 1); (2) predicting pairs of intentions (adjacency pairs), or including more utterances from other, heterogeneous corpora, can improve predictions for some classes; and (3) the classifiers in their current state are robust enough to be used on other corpora, although the results are poorer and suggest that the input corpus may not sufficiently capture the varied ways of expressing certain speech intentions. The extracted process models of interrelated speech intentions open new views on grasping the formation of beliefs and behavioral intentions in and from speech, but in-depth evaluation of these conversational models is still required.

18.
Objectives: Named entity recognition (NER), a sequential labeling task, is one of the fundamental tasks in building clinical natural language processing (NLP) systems. Machine learning (ML) based approaches can achieve good performance, but they often require large amounts of annotated samples, which are expensive to produce because annotation requires domain experts. Active learning (AL), a sample selection approach integrated with supervised ML, aims to minimize annotation cost while maximizing the performance of ML-based models. In this study, our goal was to develop and evaluate both existing and new AL methods for a clinical NER task: identifying concepts of medical problems, treatments, and lab tests in clinical notes.

Methods: Using the annotated NER corpus from the 2010 i2b2/VA NLP challenge, which contained 349 clinical documents with 20,423 unique sentences, we simulated AL experiments with a number of existing and novel algorithms in three categories: uncertainty-based, diversity-based, and baseline sampling strategies. These were compared with passive learning, which uses random sampling. Learning curves plotting the performance of the NER model against the estimated annotation cost (based on the number of sentences or words in the training set) were generated to compare the active and passive learning methods, and the area under the learning curve (ALC) score was computed.

Results: Based on the learning curves of F-measure vs. number of sentences, uncertainty sampling algorithms outperformed all other methods in ALC, and most diversity-based methods also performed better than random sampling. To achieve an F-measure of 0.80, the best uncertainty-sampling method saved 66% of the annotation effort in sentences compared with random sampling. For the learning curves of F-measure vs. number of words, uncertainty sampling methods again outperformed all other methods in ALC; to reach an F-measure of 0.80, the best uncertainty-based method saved 42% of the annotation effort in words, whereas the best diversity-based method saved only 7%.

Conclusion: In this simulated setting, AL methods, particularly uncertainty-sampling-based approaches, substantially reduced the annotation cost of the clinical NER task. The actual benefit of active learning for clinical NER should be further evaluated in a real-time setting.
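The core of uncertainty sampling is the query-selection step: ask annotators to label the sentences the current model is least sure about. A minimal least-confidence sketch, where the stub posteriors below stand in for a real NER model's predictions (the sentences and probabilities are invented):

```python
def least_confident(pool, predict_proba, batch=2):
    """Return the `batch` pool items whose top predicted probability is lowest."""
    scored = [(max(predict_proba(x)), x) for x in pool]
    scored.sort(key=lambda t: t[0])
    return [x for _, x in scored[:batch]]

# Stub posteriors over {problem, treatment, test} for each unlabeled sentence.
POSTERIORS = {
    "bp was 120/80": [0.10, 0.10, 0.80],
    "started on metformin": [0.05, 0.90, 0.05],
    "possible early sepsis vs dehydration": [0.40, 0.35, 0.25],
    "labs pending, plan unclear": [0.36, 0.33, 0.31],
}

queried = least_confident(POSTERIORS, lambda s: POSTERIORS[s], batch=2)
print(queried)
```

In an AL loop, the queried items are labeled, added to the training set, and the model is retrained before the next query round; the ALC score in the abstract summarizes how quickly this loop climbs the learning curve.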

19.
Objective: Clinicians pose complex clinical questions when seeing patients, and identifying the answers to those questions in a timely manner helps improve the quality of patient care. We report here on two natural language processing models, automatic topic assignment and keyword identification, that together automatically and effectively extract information needs from ad hoc clinical questions. Our study is motivated by the development of the larger clinical question answering system AskHERMES (Help clinicians to Extract and aRticulate Multimedia information for answering clinical quEstionS).

Design and measurements: We developed supervised machine-learning systems to automatically assign predefined general categories (e.g. etiology, procedure, and diagnosis) to a question. We also explored both supervised and unsupervised systems to automatically identify keywords that capture the main content of the question.

Results: We evaluated our systems on 4654 annotated clinical questions collected in practice, achieving an F1 score of 76.0% for general topic classification and 58.0% for keyword extraction. Our systems have been implemented in the larger question answering system AskHERMES. Our error analyses suggested that inconsistent annotation in the training data hurt both question analysis tasks.

Conclusion: Our systems, available at http://www.askhermes.org, can automatically extract information needs from both short (fewer than 20 word tokens) and long (more than 20 word tokens) questions, and from both well-structured and ill-formed questions. We speculate that the performance of general topic classification and keyword extraction can be further improved if consistently annotated data are made available.

20.
Rapid, automated mapping of free-text phrases to predefined concepts could assist in the annotation of clinical notes and increase the speed of natural language processing systems. The aim of this study was to design and evaluate a token-order-specific naïve-Bayes-based machine learning system (RapTAT) for predicting associations between phrases and concepts. Performance was assessed using a reference standard generated from 2860 VA discharge summaries containing 567,520 phrases that had been mapped to 12,056 distinct Systematized Nomenclature of Medicine – Clinical Terms (SNOMED CT) concepts by the MCVS natural language processing system; the system was also assessed on the manually annotated 2010 i2b2 challenge data. Performance was established with regard to precision, recall, and F-measure for each of the concepts within the VA documents using bootstrapping. Within that corpus, the concepts identified by MCVS were broadly distributed throughout SNOMED CT, and the token-order-specific language model achieved better precision, recall, and F-measure (0.95 ± 0.15, 0.96 ± 0.16, and 0.95 ± 0.16, respectively; mean ± SD) than the bag-of-words-based naïve Bayes model (0.64 ± 0.45, 0.61 ± 0.46, and 0.60 ± 0.45, respectively) that has previously been used for concept mapping. On the i2b2 test set, the token-order-specific model achieved precision, recall, and F-measure of 92.9%, 85.9%, and 89.2%, respectively. RapTAT required just 7.2 ms to map all phrases within a single discharge summary, and the mapping rate did not decrease as the number of processed documents increased. The high performance of the tool in terms of both accuracy and speed was encouraging, and the mapping rate should be sufficient to support near-real-time, interactive annotation of medical narratives. These results demonstrate the feasibility of rapidly and accurately mapping phrases to a wide range of medical concepts with a token-order-specific naïve Bayes model.
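The token-order-specific idea can be sketched as a naïve Bayes model that conditions each token on its position in the phrase, rather than pooling all tokens as a bag of words. The training phrases, concept codes, and crude add-one smoothing below are invented for illustration and are not the RapTAT implementation.

```python
import math
from collections import defaultdict

# Hypothetical phrase-to-concept training pairs.
TRAIN = [
    ("chest pain", "C-PAIN"),
    ("chest pain", "C-PAIN"),
    ("pain chest", "C-PAIN"),
    ("chest x ray", "C-XRAY"),
    ("chest x ray", "C-XRAY"),
]

prior = defaultdict(int)
pos_counts = defaultdict(lambda: defaultdict(int))  # (concept, position) -> token counts
for phrase, concept in TRAIN:
    prior[concept] += 1
    for i, tok in enumerate(phrase.split()):
        pos_counts[(concept, i)][tok] += 1

def map_phrase(phrase):
    """Pick argmax over concepts of P(concept) * prod_i P(token_i | concept, position i)."""
    best, best_lp = None, -math.inf
    for concept, n in prior.items():
        lp = math.log(n / len(TRAIN))
        for i, tok in enumerate(phrase.split()):
            counts = pos_counts[(concept, i)]
            # Add-one smoothing with an assumed vocabulary size of 10.
            lp += math.log((counts[tok] + 1) / (sum(counts.values()) + 10))
        if lp > best_lp:
            best, best_lp = concept, lp
    return best

print(map_phrase("chest pain"))
print(map_phrase("chest x ray"))
```

Conditioning on position lets "chest" as the first token of "chest x ray" carry different evidence than "chest" in "chest pain", which is what lifts performance over the plain bag-of-words model in the abstract.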
