1.

Off-line isolated handwritten Thai OCR using island-based projection with n-gram model and hidden Markov models

《Information processing & management》2005,41(1):139-160

Many traditional works on off-line Thai handwritten character recognition used a set of local features including circles, concavity, endpoints and lines to recognize hand-printed characters. However, in natural handwriting, these local features are often missing due to rough or quick writing, resulting in dramatic reduction of recognition accuracy. Instead of using such local features, this paper presents a method called multi-directional island-based projection to extract global features from handwritten characters. As the recognition model, two statistical approaches, namely interpolated n-gram model (n-gram) and hidden Markov model (HMM), are proposed. The experimental results indicate that the proposed scheme achieves high accuracy in the recognition of naturally-written Thai characters with numerous variations, compared to some common previous feature extraction techniques. Another experiment with English characters also displays quite promising results.
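
The abstract names an interpolated n-gram model but gives no formulas. As a rough illustration of the general idea only, the sketch below linearly interpolates bigram and unigram estimates over sequences of discrete feature symbols. The function name, the weight `lam`, and the toy sequences are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def train_interpolated_bigram(sequences, lam=0.7):
    """Return prob(cur, prev) = lam * P_bigram + (1 - lam) * P_unigram.

    `lam` is a hypothetical interpolation weight; real systems usually
    tune it on held-out data.
    """
    unigrams = Counter()
    bigrams = Counter()
    for seq in sequences:
        unigrams.update(seq)
        bigrams.update(zip(seq, seq[1:]))
    total = sum(unigrams.values())

    # How often each symbol appears as the left-hand context of a bigram.
    contexts = Counter()
    for (prev, _), count in bigrams.items():
        contexts[prev] += count

    def prob(cur, prev):
        p_uni = unigrams[cur] / total if total else 0.0
        p_bi = bigrams[(prev, cur)] / contexts[prev] if contexts[prev] else 0.0
        return lam * p_bi + (1.0 - lam) * p_uni

    return prob


# Toy feature-symbol sequences (hypothetical, not data from the paper).
prob = train_interpolated_bigram([list("abab"), list("abb")], lam=0.7)
```

The unigram term keeps unseen transitions such as `a` followed by `a` from getting zero probability, which is the usual motivation for interpolation.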

2.

Data filtering-based multi-innovation forgetting gradient algorithms for input nonlinear FIR-MA systems with piecewise-linear characteristics

《Journal of The Franklin Institute》2021,358(18):9818-9840

Piecewise-linear characteristics often appear in nonlinear systems that operate in different ways in different input regions. This paper studies the identification of a class of block-oriented systems with piecewise-linear characteristics. The asymmetric piecewise-linear nonlinearity is expressed as a linear parametric representation by introducing an appropriate switching function, and the identification model of the system is then derived using the key term separation technique. On the basis of this model, a multi-innovation forgetting gradient algorithm is presented to estimate the unknown parameters. To further enhance the identification accuracy, the filtering identification model of the system is derived by changing the structure of the system without changing the relationship between the input and output. Further, a data filtering-based multi-innovation forgetting gradient algorithm is proposed through the use of the data filtering technique. A simulation example illustrates that the proposed approaches are effective for parameter estimation and that the data filtering-based multi-innovation forgetting gradient algorithm has better estimation performance.
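
The abstract does not state the update equations. The sketch below shows one common textbook form of a multi-innovation gradient update with a forgetting factor, applied to a plain noise-free linear regression y(t) = phi(t)^T theta. It is only an assumption-laden illustration: the paper's switching function, key term separation, MA noise model, and data filtering are all omitted, and the names `mifg_estimate`, `p`, and `lam` are hypothetical.

```python
import random


def mifg_estimate(phis, ys, p=3, lam=0.95):
    """Multi-innovation gradient estimate with forgetting factor `lam`.

    phis: list of regressor vectors phi(t); ys: list of scalar outputs y(t).
    p:    innovation length (number of recent samples stacked per update).
    """
    n = len(phis[0])
    theta = [0.0] * n
    r = 1.0
    for t in range(len(ys)):
        # Indices of the p most recent samples (fewer at the very start).
        window = range(max(0, t - p + 1), t + 1)
        # Innovation vector E = Y - Phi^T theta, one entry per stacked sample.
        innov = [ys[k] - sum(a * b for a, b in zip(phis[k], theta))
                 for k in window]
        # r(t) = lam * r(t-1) + ||Phi||_F^2
        r = lam * r + sum(x * x for k in window for x in phis[k])
        # theta(t) = theta(t-1) + Phi E / r(t)
        for i in range(n):
            theta[i] += sum(phis[k][i] * e for k, e in zip(window, innov)) / r
    return theta


# Hypothetical two-tap FIR system driven by a random input.
rng = random.Random(0)
true_theta = [0.8, -0.5]
u = [rng.uniform(-1, 1) for _ in range(2001)]
phis = [[u[t], u[t - 1]] for t in range(1, 2001)]
ys = [sum(a * b for a, b in zip(phi, true_theta)) for phi in phis]
est = mifg_estimate(phis, ys)
```

Because r(t) always exceeds the squared norm of the stacked regressor matrix, the step size stays below the stability limit, so in this noise-free setting the parameter error cannot grow from one step to the next.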

3.

Love the Teacher, Trust the Teaching: A Preliminary Study of Handwriting Instruction in Secondary Vocational Schools

孙日新《科教文汇》2013,(20):123-124

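The particular nature of the student intake at secondary vocational schools, together with the psychological characteristics of students at this age, poses difficulties and challenges for handwriting instruction, which already lacks teaching methods. Over nearly a year of teaching, the author felt considerable pressure from the classroom, the teaching materials, the students, and a lack of practical experience, and under that pressure kept reflecting, experimenting, and summarizing in the hope of finding practical methods and measures to improve teaching and raise students' handwriting level. This paper reflects on teaching in light of students' current handwriting, focuses on teaching strategy and instructional design in handwriting instruction, and offers some proposals intended to better promote handwriting instruction.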

4.

Application of Simulated Crime Scenes in Undergraduate Practical Teaching of Forensic Biological Evidence

王启燕 李明超 任峥 黄江 张红玲 《科教文汇》2021,(1)

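Forensic biological evidence is an important applied branch of forensic medicine. With social progress and the development of professional technology, graduates trained through traditional forensic biological evidence teaching can no longer meet the needs of public security, procuratorial, and judicial organs or of private forensic institutions. Training forensic professionals with a solid theoretical foundation and a stronger ability to analyse and solve problems comprehensively has become the focus of current forensic education. Experimental teaching is the core of building practical ability in forensic students. In cooperation with public security organs, the forensic biological evidence teaching and research section of our university uses simulated criminal case scenes in practical teaching and has built a comprehensive experimental teaching system in which students apply their professional knowledge to complete the scene investigation; the discovery, collection, packaging, preservation, and submission of biological samples from the scene; laboratory testing; and the issuing of an expert report. These reforms, which break down the boundaries between specialised courses and strengthen the learning of related disciplines, have achieved good teaching results.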

5.

Counterfactual can be strong in medical question and answering

《Information processing & management》2023,60(4):103408

Medical question and answering is a crucial aspect of medical artificial intelligence, as it aims to enhance the efficiency of clinical diagnosis and improve treatment outcomes. Despite the numerous methods available for medical question and answering, they tend to overlook the data generation mechanism’s imbalance and the pseudo-correlation caused by the task’s text characteristics. This pseudo-correlation is due to the fact that many words in the question and answering task are irrelevant to the answer but carry significant weight. These words can affect the feature representation and establish a false correlation with the final answer. Furthermore, the data imbalance mechanism can cause the model to blindly follow a large number of classes, leading to bias in the final answer. Confounding factors, including the data imbalance mechanism, bias due to textual characteristics, and other unknown factors, may also mislead the model and limit its performance. In this study, we propose a new counterfactual-based approach that includes a feature encoder and a counterfactual decoder. The feature encoder utilizes ChatGPT and label resetting techniques to create counterfactual data, compensating for distributional differences in the dataset and alleviating data imbalance issues. Moreover, the sampling prior to label resetting also helps alleviate the data imbalance issue. Subsequently, label resetting can yield better and more balanced counterfactual data. Additionally, the construction of counterfactual data aids the subsequent counterfactual classifier in better learning causal features. The counterfactual decoder compares counterfactual data with real data to optimize the model and help it acquire the causal characteristics that genuinely influence the label to generate the final answer. The proposed method was tested on PubMedQA, a medical dataset, using machine learning and deep learning models. The comprehensive experiments demonstrate that this method achieves state-of-the-art results and effectively reduces the false correlation caused by confounders.

6.

PecidRL: Petition expectation correction and identification based on deep reinforcement learning

《Information processing & management》2023,60(3):103285

Identifying petition expectation for government response plays an important role in government administrative service. Although some petition platforms allow citizens to label the petition expectation when they submit e-petitions, the misunderstanding and misselection of petition labels still necessitate manual classification. Automatic petition expectation identification faces challenges from poor context information, heavy noise, and the casual syntactic structure of petition text. In this paper we propose a novel deep reinforcement learning based method, named PecidRL, for the correction and identification of petition expectation (citizens’ demands for the level of government response). We collect a dataset from Message Board for Leaders, the largest official petition platform in China, containing 237,042 petitions. Firstly, we introduce a deep reinforcement learning framework to automatically correct the mislabeled and ambiguous labels of the petitions. Then, multi-view textual features, including word-level and document-level semantic features, sentiment features, and different textual graph representations, are extracted and integrated to provide more auxiliary information. Furthermore, based on the corrected petitions, 19 novel petition expectation identification models are constructed by extending 11 popular machine learning models for petition expectation detection. Finally, comprehensive comparison and evaluation are conducted to select the final petition expectation identification model with the best performance. After correction by PecidRL, each metric on all extended petition expectation identification models improves by an average of 8.3%, with the highest increase reaching 14.2%. The optimal model is Peti-SVM-bert, with the highest accuracy of 93.66%. We also analyze the petition expectation label variation of the dataset by using PecidRL. We find that 16.9% of e-petitioners tend to exaggerate the urgency of their petitions to make the government pay closer attention to their appeals, and that the urgency of 4.4% of the petitions is underestimated. This study has substantial academic and practical value in improving government efficiency. Additionally, a web server has been developed for government administrators and other researchers, which can be accessed at http://www.csbg-jlu.info/PecidRL/.

7.

Question categorization and classification using grammar based approach

Alaa Mohasseb Mohamed Bader-El-Den Mihaela Cocea 《Information processing & management》2018,54(6):1228-1243

Question-answering has become one of the most popular information retrieval applications. Although most question-answering systems try to improve the user experience and the technology used in finding relevant results, many difficulties are still faced because of the continuous increase in the amount of web content. Question Classification (QC) plays an important role in question-answering systems, and one of the major tasks in enhancing the classification process is the identification of question types. A broad range of QC approaches has been proposed with the aim of solving the classification problem; most of these are based on bag-of-words or dictionaries. In this research, we present an analysis of the different types of questions based on their grammatical structure. We identify different patterns and use machine learning algorithms to classify them. A framework is proposed for question classification using a grammar-based approach (GQCC) which exploits the structure of the questions. Our findings indicate that using syntactic categories related to different domain-specific types of Common Nouns, Numeral Numbers and Proper Nouns enables the machine learning algorithms to better differentiate between different question types. The paper presents a wide range of experiments; the results show that the GQCC using the J48 classifier outperformed other classification methods with 90.1% accuracy.

8.

Textual keyword extraction and summarization: State-of-the-art

Zara Nasar Syed Waqar Jaffry Muhammad Kamran Malik 《Information processing & management》2019,56(6):102088

With the advent of Web 2.0, there exist many online platforms that result in massive textual data production, such as social networks, online blogs, and magazines. This textual data carries information that can be used for the betterment of humanity. Hence, there is a dire need to extract potential information out of it. This study aims to present an overview of approaches that can be applied to extract and later present these valuable information nuggets residing within text in a brief, clear and concise way. In this regard, two major tasks, automatic keyword extraction and text summarization, are reviewed. To compile the literature, scientific articles were collected from major digital computing research repositories. In light of the acquired literature, the survey covers early approaches through to recent advancements using machine learning solutions. Survey findings conclude that annotated benchmark datasets for various textual data-generators such as Twitter and social forms are not available. This scarcity of datasets has resulted in relatively little progress in many domains. Also, applications of deep learning techniques to automatic keyword extraction are relatively unaddressed. Hence, the impact of various deep architectures stands as an open research direction. For the text summarization task, deep learning techniques have been applied since the advent of word vectors, and currently govern the state of the art for abstractive summarization. Currently, one of the major challenges in these tasks is semantic-aware evaluation of generated results.

9.

Question classification using limited labelled data

《Information processing & management》2022,59(6):103094

Question classification (QC) involves classifying a given question based on the expected answer type and is an important task in a Question Answering (QA) system. Existing approaches for question classification use the full training dataset to fine-tune the models. Developing large labelled datasets is expensive and time-consuming. Hence, there is a need to develop approaches that can achieve comparable or state-of-the-art performance using limited training instances. In this paper, we propose an approach that uses data augmentation as a tool to generate additional training instances. We evaluate our proposed approach on two question classification datasets, namely the TREC and ICHI datasets. Experimental results show that our proposed approach reduces the requirement of labelled instances (a) by up to 81.7% and achieves new state-of-the-art accuracy of 98.11 on the TREC dataset and (b) by up to 75% and achieves 67.9 on the ICHI dataset.

10.

Problemistic Search and New Product Innovation Performance of Start-ups: Evidence from Apple's App Store Platform

《科学学研究》2021,39(9):1697-1705

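How start-ups should adjust their innovation strategy after a past product failure has long been a focus of academic attention. Drawing on problemistic search theory, this paper examines how the search boundary and search distance of future product development affect new product innovation performance when a platform-based start-up's past product has failed. It tests these mechanisms empirically, using text analysis on a new dataset collected from Apple's App Store covering May 2018 to February 2019. The results show that, compared with start-ups whose early development succeeded, start-ups whose early development failed benefit more, in the innovation performance of their next new product, from crossing their previous search boundary and developing products in a different market category. In addition, for start-ups whose early development failed, non-local search within the chosen market category is more conducive to improving new product innovation performance. The conclusions are intended to enrich problemistic search theory for start-ups in platform markets and have important implications for start-ups' choice of new product development strategy on app platforms.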

11.

PHQ-aware depressive symptoms identification with similarity contrastive learning on social media

《Information processing & management》2023,60(5):103417

12.

An architecture for Malay Tweet normalization

Mohammad Arshi Saloot Norisma Idris Rohana Mahmud 《Information processing & management》2014

Research in natural language processing has increasingly focused on normalizing Twitter messages. Currently, while different well-defined approaches have been proposed for the English language, the problem remains far from being solved for other languages, such as Malay. Thus, in this paper, we propose an approach to normalize Malay Twitter messages based on corpus-driven analysis. An architecture for Malay Tweet normalization is presented, which comprises seven main modules: (1) enhanced tokenization, (2) In-Vocabulary (IV) detection, (3) specialized dictionary query, (4) repeated letter elimination, (5) abbreviation adjusting, (6) English word translation, and (7) de-tokenization. A parallel Tweet dataset, consisting of 9000 Malay Tweets, is used in the development and testing stages. To measure the performance of the system, an evaluation is carried out. The result is promising: the system scores 0.83 in BLEU against a baseline BLEU score of 0.46. To compare the accuracy of the architecture with other statistical approaches, an SMT-like normalization system is implemented, trained, and evaluated with an identical parallel dataset. The experimental results demonstrate that the normalization system, which is designed based on the features of Malay Tweets, achieves higher accuracy than the SMT-like system.
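
The abstract lists repeated letter elimination as module (4) without describing its rule. A minimal sketch of one plausible rule follows: any run of three or more identical letters is collapsed to a single letter, while doubled letters are left alone so that words with a genuine double vowel survive. The function name, the threshold of three, and the sample tokens are assumptions for illustration, not the paper's specification.

```python
import re

# A letter followed by at least two more copies of itself (a run of 3+).
_LONG_RUN = re.compile(r"([A-Za-z])\1{2,}")


def squeeze_repeats(token):
    """Collapse each run of 3+ identical letters down to one letter."""
    return _LONG_RUN.sub(r"\1", token)


# Hypothetical noisy tokens of the kind found in informal tweets.
examples = ["sayaaaang", "bestttt", "maaf", "1000"]
cleaned = [squeeze_repeats(tok) for tok in examples]
```

Restricting the pattern to letters keeps numbers such as `1000` untouched. A fuller system would presumably check each candidate against a dictionary, since a fixed threshold cannot tell an elongated word from a legitimate spelling in every case.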

13.

Back to common sense: Oxford dictionary descriptive knowledge augmentation for aspect-based sentiment analysis

《Information processing & management》2023,60(3):103260

14.

Arabic abstractive text summarization using RNN-based and transformer-based architectures

《Information processing & management》2023,60(2):103227

Recently, the Transformer model architecture and pre-trained Transformer-based language models have shown impressive performance on both natural language understanding and text generation tasks. Nevertheless, little research has been done on using these models for text generation in Arabic. This research aims at leveraging and comparing the performance of different model architectures, including RNN-based and Transformer-based ones, and different pre-trained language models, including mBERT, AraBERT, AraGPT2, and AraT5, for Arabic abstractive summarization. We first built an Arabic summarization dataset of 84,764 high-quality text-summary pairs. To use mBERT and AraBERT in the context of text summarization, we employed a BERT2BERT-based encoder-decoder model where we initialized both the encoder and decoder with the respective model weights. The proposed models have been tested using ROUGE metrics and manual human evaluation. We also compared their performance on out-of-domain data. Our pre-trained Transformer-based models give a large improvement in performance with ~79% less data. We found that AraT5 scores ~3 ROUGE higher than a BERT2BERT-based model that is initialized with AraBERT, indicating that an encoder-decoder pre-trained Transformer is more suitable for summarizing Arabic text. Also, both of these models perform better than AraGPT2 by a clear margin; we found that AraGPT2 produces summaries with high readability but relatively lower quality. On the other hand, we found that both AraT5 and AraGPT2 are better at summarizing out-of-domain text. We released our models and dataset publicly.

15.

A Contrastive learning-based Task Adaptation model for few-shot intent recognition

《Information processing & management》2022,59(3):102863

Few-shot intent recognition aims to identify a user’s intent from an utterance with limited training data. A considerable number of existing methods mainly rely on the generic knowledge acquired on the base classes to identify the novel classes. Such methods typically ignore the characteristics of each meta task itself, resulting in the inability to make full use of limited given samples when classifying unseen classes. To deal with such issues, we propose a Contrastive learning-based Task Adaptation model (CTA) for few-shot intent recognition. In detail, we leverage contrastive learning to help achieve task adaptation and make full use of the limited samples of novel classes. First, a self-attention layer is employed in the task adaptation module, which aims to establish interactions between samples of different categories so that new representations are task-specific rather than relying entirely on the base classes. Then, the contrastive-based loss functions and the semantics of the label name are respectively used for reducing the similarity between sample representations in different categories while increasing it in the same categories. Experimental results on the public dataset OOS verify the effectiveness of our proposal, which beats the competitive baselines in terms of accuracy. In addition, we conduct cross-domain experiments on three datasets, i.e., OOS, SNIPS and ATIS. We find that CTA gains clear improvements in accuracy in all cross-domain experiments, indicating that it generalizes better than other competitive baselines in both cross-domain and single-domain settings.

16.

A methodology for character recognition and revision of the linear equations solving procedure

《Information processing & management》2023,60(1):103088

Linear equations are valuable for modeling real-world phenomena involving at least one variable. However, verifying whether the procedure followed by a human for solving a linear equation was done correctly is still a complicated matter. In this paper, we propose a methodology for the automatic character recognition and revision of the solving procedure of linear equations with one unknown. First, a camera is used to acquire an image of the handwritten solving procedure. Then, the image is pre-processed, and each character and equation line is segmented. Subsequently, a convolutional neural network (CNN) is used to conduct the character recognition stage. Finally, a comparison rule is applied to revise the solving procedure. The character recognition was verified on a 2800-image data set (2100 for training and 700 for testing), including the ten digits and four symbols: ×, +, -, /. The revision procedure was tested on a data set with 140 handwritten equations (125 for training and 15 for testing). The results revealed that we recognized handwritten characters with an accuracy of 99%, which is similar to the state of the art. Moreover, our proposal revised the solving procedure with an efficiency of 86.66%.

17.

Modified multi-innovation stochastic gradient algorithm for Wiener–Hammerstein systems with backlash

Linwei Li Xuemei Ren Fumin Guo 《Journal of The Franklin Institute》2018,355(9):4050-4075

In this paper, the identification of Wiener–Hammerstein systems with backlash and linear subsystems of unknown order is investigated by using a modified multi-innovation stochastic gradient identification algorithm. In this scheme, in order to facilitate subsequent parameter identification, the orders of the linear subsystems are first determined by using the determinant ratio approach. To address the multi-innovation length problem in the conventional multi-innovation least squares algorithm, the innovation updating is decomposed into sub-innovation updating through the use of a multi-step updating technique. In the identification procedure, by reframing two auxiliary models, the unknown internal variables are replaced by the outputs of the corresponding auxiliary model. Furthermore, the convergence analysis of the proposed algorithm shows that the parameter estimation error can converge to zero. Simulation examples are provided to validate the efficiency of the proposed algorithm.

18.

Exploring key indicators of social identity in the #MeToo era: Using discourse analysis in UGC

《International Journal of Information Management》2020

Recent years have been characterized by the ubiquitous use of social networks as a means of expressing self and social identity, which offers new opportunities for qualitative and quantitative research in the social sciences. The dynamics of interactions on social platforms such as Twitter promote the development of social movements around hashtags, such as #MeToo. According to previous research, this movement has marked the beginning of an era. The present study aims to determine the key indicators of social identity in the #MeToo movement on Twitter using textual analysis and sentiment analysis of user-generated content (UGC). To this end, we use a cognitive pragmatics point of view to study a corpus of 31,305 tweets. Using the methodological approaches of corpus linguistics (CL) and discourse analysis (DA), we identify keywords, topics, frequency, and n-grams or collocations to understand the social identity of the #MeToo movement. The key indicators of social identity in the #MeToo era are validated using the statistical association measures of Log-Likelihood and Mutual Information (MI). Our results reveal a polarization of sentiments in which UGC is associated with both negative and positive topics. The social identity is particularly strongly correlated with women and the workplace. Finally, regardless of the industry or area, these results present a holistic approach to the social identity of #MeToo.
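
The abstract mentions Mutual Information (MI) as an association measure for collocations but does not give the formula used. The sketch below applies the form commonly used in corpus linguistics, MI = log2(observed / expected) with expected = f(w1) * f(w2) / N, to adjacent word pairs. The function name, the `min_count` cutoff, and the toy token list are assumptions for illustration and do not come from the study's corpus.

```python
import math
from collections import Counter


def mi_scores(tokens, min_count=2):
    """Score adjacent word pairs by MI = log2(observed / expected)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (w1, w2), observed in bigrams.items():
        if observed < min_count:
            # MI is unstable for rare pairs, so skip them.
            continue
        expected = unigrams[w1] * unigrams[w2] / n
        scores[(w1, w2)] = math.log2(observed / expected)
    return scores


# Toy token stream (hypothetical, not drawn from the study's tweets).
tokens = "me too me too we stand me too".split()
scores = mi_scores(tokens)
```

A positive score means the pair co-occurs more often than chance would predict. Because MI tends to favour rare pairs, it is often reported alongside a frequency-sensitive measure such as log-likelihood, which matches the pairing of measures named in the abstract.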

19.

Patent-based technology opportunity identification: A case study in the deep learning field

杨辰 王楚涵 陶琬莹 《科技管理研究》2021,41(12):172-176

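To identify potential technology opportunities in a timely and effective way, this paper proposes a patent-text-based method for technology opportunity identification that uses text mining and outlier detection. First, the Doc2vec text representation model is used to model patent abstracts so as to capture deeper semantic information. Then a density-based outlier detection algorithm is used to identify patent directions with potential technology opportunities. Finally, taking the identification of potential technologies in the deep learning field as an example, a patent search query is constructed and 458 patent documents are collected as the dataset. The empirical results yield 10 potential technology opportunities across 4 topic categories, validating the effectiveness of the patent-based method, which can serve as a reference for enterprises' technology application, R&D, and innovation.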
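
The abstract says a density-based outlier detection algorithm is applied to document vectors but does not name it. The sketch below uses the Local Outlier Factor (LOF), one well-known density-based method, purely as an assumed stand-in. The 2-D points play the role of Doc2vec embeddings and are hypothetical, as are the function name and the choice of `k`.

```python
import math


def lof_scores(points, k=2):
    """Local Outlier Factor for each point; values well above 1 are outliers.

    Assumes all points are distinct and that len(points) > k.
    """
    n = len(points)
    dist = [[math.dist(a, b) for b in points] for a in points]
    # Indices of the k nearest neighbours of each point, excluding itself.
    neigh = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # Distance from each point to its k-th nearest neighbour.
    k_dist = [dist[i][neigh[i][-1]] for i in range(n)]

    def local_reach_density(i):
        reach = [max(k_dist[j], dist[i][j]) for j in neigh[i]]
        return len(reach) / sum(reach)

    dens = [local_reach_density(i) for i in range(n)]
    # LOF: average neighbour density divided by the point's own density.
    return [
        sum(dens[j] for j in neigh[i]) / (len(neigh[i]) * dens[i])
        for i in range(n)
    ]


# Four clustered "documents" and one isolated one (toy embeddings).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(points, k=2)
```

In a setting like the one described, a patent whose vector scores well above 1 lies in a sparse region of the embedding space, which is the kind of signal the abstract associates with a potential technology opportunity.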

20.

An empirical study of scientific production: A cross country analysis, 1981-2002

Gustavo A. Crespi Aldo Geuna 《Research Policy》2008,37(4):565-579

This paper presents the results of an econometric approach to examine the determinants of scientific production at cross-country level. The paper aims not to provide accurate and robust estimates of investment elasticities (a doubtful task given the poor quality of the data sources and the modelling problems), but to develop and critically assess the validity of an empirical approach for characterising the production of science and its impact, from a comparative perspective. We employ and discuss the limitations of a production function approach to relate investment inputs to scientific outputs using a sample of 14 countries for which we have information on higher education research and development (HERD). The outputs are taken from the Thomson ISI® national science indicators (2002) database on published papers and citations. The inputs and outputs for this sample of countries have been recorded for a period of 21 years (1981-2002). A thorough discussion of the data shortcomings is provided. On the basis of this panel dataset we investigate the profile of the time lag between investment in HERD and research output and returns to national investment in science. We devote particular attention to analysing the presence of cross-country spillovers. We show their relevance and underline the international effect of the US system.