Similar Documents
20 similar documents found.
1.
Attention is drawn to a method of implementing data structures in core memory by means of associative links instead of pointers. The properties of associative links are discussed and the way in which they may be exploited in a program for formal differentiation is illustrated. There is a section on microprogramming support for the associative search operations involved.
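A toy sketch of the contrast the abstract draws: instead of a node holding a direct pointer to its neighbor, links are recovered by an associative search over (source, relation) keys. The dictionary representation and the expression-tree example are illustrative assumptions; the paper's core-memory layout and microprogrammed search are not modeled.

```python
links = {}  # associative store: (source, relation) -> target

def link(src, rel, dst):
    # Record an associative link rather than storing a pointer in src.
    links[(src, rel)] = dst

def follow(src, rel):
    # Associative search replaces pointer dereference.
    return links.get((src, rel))

# A fragment of an expression tree such as a formal-differentiation
# program might build (hypothetical node names).
link("times", "left", "x")
link("times", "right", "sin_x")
print(follow("times", "left"))  # -> 'x'
```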

2.
Spreadsheets, comma separated value files and other tabular data representations are in wide use today. However, writing, maintaining and identifying good formulas for tabular data and spreadsheets can be time-consuming and error-prone. We investigate the automatic learning of constraints (formulas and relations) in raw tabular data in an unsupervised way. We represent common spreadsheet formulas and relations through predicates and expressions whose arguments must satisfy the inherent properties of the constraint. The challenge is to automatically infer the set of constraints present in the data, without labeled examples or user feedback. We propose a two-stage generate-and-test method whose first stage uses constraint solving techniques to efficiently reduce the number of candidates, based on the predicate signatures. Our approach takes inspiration from inductive logic programming, constraint learning and constraint satisfaction. We show that we are able to accurately discover constraints in spreadsheets from various sources.
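A minimal generate-and-test sketch of unsupervised constraint discovery in tabular data, loosely in the spirit of the approach above. The candidate predicates and the length-based pruning are illustrative assumptions standing in for the paper's signature-based constraint solving.

```python
import itertools

def discover_constraints(table):
    """table: dict mapping column name -> list of numeric values."""
    constraints = []
    cols = list(table)
    # Stage 1 (generate): enumerate candidate argument tuples; equal column
    # lengths stands in here for the real signature-based pruning.
    for a, b, c in itertools.permutations(cols, 3):
        if len({len(table[a]), len(table[b]), len(table[c])}) != 1:
            continue
        # Stage 2 (test): keep a candidate only if it holds on every row.
        if all(x == y + z for x, y, z in zip(table[a], table[b], table[c])):
            constraints.append(f"{a} = {b} + {c}")
    for a, b in itertools.permutations(cols, 2):
        if all(x <= y for x, y in zip(table[a], table[b])):
            constraints.append(f"{a} <= {b}")
    return constraints

rows = {"total": [30, 70], "net": [20, 50], "tax": [10, 20]}
print(discover_constraints(rows))  # e.g. ['total = net + tax', ...]
```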

3.
National Statistical Agencies routinely disseminate large amounts of data. Prior to dissemination these data have to be protected to avoid releasing confidential information. Controlled tabular adjustment (CTA) is one of the available methods for this purpose. CTA formulates an optimization problem that looks for the safe table which is closest to the original one. The standard CTA approach results in a mixed integer linear optimization (MILO) problem, which is very challenging for current technology. In this work we present a much less costly variant of CTA that formulates a multiobjective linear optimization (LO) problem, where binary variables are pre-fixed, and the resulting continuous problem is solved by lexicographic optimization. Extensive computational results are reported using both commercial (CPLEX and XPRESS) and open source (Clp) solvers, with either simplex or interior-point methods, on a set of real instances. Most instances were successfully solved with the LO-CTA variant in less than one hour, while many of them are computationally very expensive with the MILO-CTA formulation. The interior-point method outperformed the simplex method in this particular application.
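A toy sketch of lexicographic linear optimization, the solution strategy the LO-CTA variant applies once the binary variables are pre-fixed: solve for the primary objective, then re-solve for the secondary objective with the primary pinned to its optimum. The problem data are made up, and SciPy stands in for CPLEX/XPRESS/Clp.

```python
from scipy.optimize import linprog

# Stage 1: minimize c1.x subject to A_ub x <= b_ub.
c1 = [1.0, 2.0]
A_ub = [[-1.0, -1.0]]     # encodes x0 + x1 >= 4
b_ub = [-4.0]
res1 = linprog(c1, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2, method="highs")

# Stage 2: minimize the secondary objective c2.x while constraining the
# primary objective to its stage-1 optimum (up to a small tolerance).
c2 = [0.0, 1.0]
A2 = A_ub + [c1]          # extra row: c1.x <= f1*
b2 = b_ub + [res1.fun + 1e-9]
res2 = linprog(c2, A_ub=A2, b_ub=b2, bounds=[(0, None)] * 2, method="highs")
print(res1.fun, res2.x)
```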

4.
Fuzzy rule-based systems are effective tools for acquiring knowledge from data and representing it in a linguistically interpretable form. To achieve interpretability, input features are granulated in fuzzy partitions. A critical design decision is the selection of the granularity level for each input feature. This paper presents an approach, called DC* (Double Clustering with A*), for automatically designing interpretable fuzzy partitions with optimal granularity. DC* is specific to classification problems and is mainly based on a two-stage process: the first stage identifies clusters of multidimensional samples in order to derive class-labeled prototypes; in the second stage the one-dimensional projections of these prototypes are further clustered along each dimension simultaneously, thus minimizing the number of clusters for each feature. Moreover, the resulting one-dimensional clusters provide information to define fuzzy partitions that satisfy a number of interpretability constraints and exhibit variable granularity levels. The fuzzy sets in each partition can be labeled with meaningful linguistic terms and used to represent knowledge in a natural language form. Experimental results on both synthetic and real data show that the derived fuzzy partitions can be exploited to define very compact fuzzy rule-based systems that exhibit high linguistic interpretability and good classification accuracy.
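A minimal sketch of the two-stage "double clustering" idea: stage 1 clusters the multidimensional samples into prototypes; stage 2 re-clusters each feature's one-dimensional projection of those prototypes, and the resulting cut points seed the fuzzy partition. Cluster counts are assumptions, and plain k-means stands in for DC*'s A*-guided search.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(200, 2)                       # toy 2-D samples

# Stage 1: multidimensional clustering -> prototypes.
proto = KMeans(n_clusters=6, n_init=10).fit(X).cluster_centers_

# Stage 2: cluster each feature's projection of the prototypes; midpoints
# between 1-D centers become cut points for that feature's fuzzy partition.
for d in range(X.shape[1]):
    km = KMeans(n_clusters=3, n_init=10).fit(proto[:, d].reshape(-1, 1))
    centers = sorted(km.cluster_centers_.ravel())
    cuts = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
    print(f"feature {d}: cut points {cuts}")
```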

5.
Most methods for mining association rules from tabular data mine simple rules which only use the equality operator “=” in their items. For quantitative attributes, approaches tend to discretize domain values by partitioning them into intervals. Limiting rules to the “=” operator means that many interesting frequent patterns may go undetected. Clearly, where there is an order between objects, operators such as greater-than or less-than a given value are as important as the equality operator. This motivates us to extend association rules from the simple equality operator to a more general set of operators. We address the problem of mining general association rules in tabular data, where rules can use any of the operators {≤, >, ≠, =} in their antecedent part. The proposed algorithm, mining general rules (MGR), is applicable to datasets with discrete ordered attributes and to quantitative discretized attributes. The proposed algorithm stores candidate general itemsets in a tree structure in such a way that supports of complex itemsets can be recursively computed from supports of simpler itemsets. The algorithm is shown to have benefits in terms of time complexity and memory management, and has good potential for parallelization.
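A small sketch of counting support for "general" items that use operators beyond equality, matching the operator set above. Direct per-row evaluation is used here as a simplification; the paper's tree structure with recursive support computation is not reproduced.

```python
import operator

OPS = {"<=": operator.le, ">": operator.gt, "!=": operator.ne, "==": operator.eq}

def support(rows, itemset):
    """itemset: list of (attribute, op, value) triples; rows: list of dicts."""
    matches = sum(
        all(OPS[op](row[attr], val) for attr, op, val in itemset)
        for row in rows
    )
    return matches / len(rows)

data = [{"age": 25, "grade": 3}, {"age": 40, "grade": 1}, {"age": 31, "grade": 2}]
print(support(data, [("age", ">", 30), ("grade", "<=", 2)]))  # 2/3
```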

6.
To improve the quality of synthetic tabular data, a simple method is proposed that generates data for each class separately, using a metric loss to control the generation of each class of structured data; the method is named SCGAN. The paper applies it to binary classification problems: generative adversarial networks are trained on three real data sets with three different metric losses, each class of data is synthesized in turn, classifier models are trained on the synthetic data, and gmean is used to evaluate model performance. The results...
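A minimal sketch of the evaluation step described above: train a classifier on (synthetic) data and score it with the geometric mean of per-class recall (gmean), a standard metric for imbalanced binary problems. The data and the classifier choice are placeholders, not SCGAN's actual setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)
X_syn = rng.normal(size=(200, 4))                 # stand-in for synthetic data
y_syn = (X_syn[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(int)
X_test, y_test = X_syn[:50], y_syn[:50]           # stand-in for real test data

clf = LogisticRegression().fit(X_syn, y_syn)
y_pred = clf.predict(X_test)
sens = recall_score(y_test, y_pred, pos_label=1)  # recall on the positive class
spec = recall_score(y_test, y_pred, pos_label=0)  # recall on the negative class
print("gmean =", np.sqrt(sens * spec))
```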

7.
8.
Tsuda, Takao; Sato, Takashi. Acta Informatica (1983) 19(1):13-33
Tabular data structures of a relational database, placed in a paged virtual space or in some two-level storage, are discussed. To rearrange, or transpose, those data stored...

9.
Knowledge tracing models a user's exercise-answering sequence to track their cognitive state and predict their response at the next time step, enabling intelligent assessment of how well the user has mastered the material. Most current knowledge-tracing methods model knowledge concepts only, neglecting exercise-level information and personalized user representations, and their predictions lack interpretability. To address these problems, an interpretable deep knowledge tracing framework is proposed. First, contextual information about exercises is introduced to mine the implicit relations between exercises and knowledge concepts, yielding more expressive exercise and concept representations and mitigating data sparsity. Next, the user's answering sequence is modeled to obtain their current knowledge state, from which personalized attention is learned, giving a personalized representation of the current exercise conditioned on the user's knowledge state. Finally, for each prediction, a reasoning path is selected according to the personalized attention and serves as its explanation. Compared with existing methods, the proposed model not only achieves better prediction results but also provides reasoning-path-level explanations for them, demonstrating its advantages.
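A toy sketch of the personalized-attention step: attention weights over past exercise embeddings are conditioned on the user's current knowledge state, and the weighted sum gives a personalized representation of the current exercise. The dimensions and the additive scoring rule are assumptions, not the framework's actual architecture.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

d = 8
history = np.random.rand(5, d)       # embeddings of 5 past exercises
knowledge_state = np.random.rand(d)  # user's current knowledge state
current_exercise = np.random.rand(d)

# Score each past exercise by relevance to both the current exercise and the
# user's state; the weights double as a candidate reasoning path.
scores = history @ (current_exercise + knowledge_state)
weights = softmax(scores)
personalized = weights @ history     # personalized exercise representation
print("attention weights:", np.round(weights, 3))
print("top of reasoning path: exercise", int(weights.argmax()))
```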

10.
11.
In the paper, a domain-specific language of executable specifications is proposed. This language makes it possible to describe models of formalized subject domains in graphical form, formulate computational problems on these models, and synthesize programs for solving these problems (including parallel ones) based on deductive inference in a special class of propositional calculus.
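A tiny forward-chaining sketch of deductive inference over Horn clauses in propositional calculus, the kind of inference the abstract says drives program synthesis. The rule base is a made-up example; the paper's "special class" of calculus and its graphical front end are not modeled.

```python
rules = [
    ({"table_loaded", "schema_known"}, "query_plan"),
    ({"query_plan"}, "program"),
]
facts = {"table_loaded", "schema_known"}

# Repeatedly fire any rule whose body is already derivable.
changed = True
while changed:
    changed = False
    for body, head in rules:
        if body <= facts and head not in facts:
            facts.add(head)
            changed = True

print("derived:", sorted(facts))
```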

12.
The optimal-hypersphere support vector machine (SSLM) is a typical black-box model: it operates without examining the internal structure or mechanism of the object under study, learning its function and behavior from input-output data alone. This gives it advantages such as fast response and strong real-time performance, but also leaves it lacking interpretability and transparency. This paper therefore studies how to inject prior knowledge at the input side of the SSLM black-box model to enhance its interpretability. A data-driven algorithm for mining nonlinear circular knowledge is developed, together with an algorithm for discretizing that knowledge; the discretized points include not only the original data points from which the knowledge was generated but also newly added points. By integrating the mined circular knowledge into the SSLM model in the form of inequality constraints, an interpretable SSLM model (i-SSLM) is constructed. During training, the model must classify the knowledge-constrained data points correctly, so its results are known in advance to some degree, which makes the model interpretable; at the same time, because discretizing the knowledge adds new data information, the model can achieve higher accuracy. The effectiveness of the i-SSLM model is verified on 10 public data sets and 2 real blast-furnace data sets.
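A small sketch of the knowledge-discretization idea: a mined piece of circular knowledge (all points inside a circle share one class) is turned into concrete labeled points that can then be imposed on the classifier as constraints. The sampling pattern is an assumption; the paper's mining algorithm and constraint integration are not reproduced here.

```python
import numpy as np

def discretize_circle(cx, cy, r, label, n_ring=8):
    """Return labeled points at the center and on the boundary of a knowledge circle."""
    pts = [(cx, cy)]
    for t in np.linspace(0, 2 * np.pi, n_ring, endpoint=False):
        pts.append((cx + r * np.cos(t), cy + r * np.sin(t)))
    return [(x, y, label) for x, y in pts]

# Example: the region within radius 0.5 of (1.0, 2.0) is known to be class +1.
constraints = discretize_circle(1.0, 2.0, 0.5, label=+1)
print(len(constraints), "constraint points, e.g.", constraints[:2])
```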

13.
Current employment recommendation methods for university graduates based on collaborative filtering and latent-factor models consider only the student's one-way preference for an employer, which easily leads to "ability mismatch"; moreover, because each user has only a single employment record, the sampled negative examples are highly unreliable, hurting recommendation performance, and the need for explainable recommendations is ignored. To address this, following a multi-task learning approach, an interpretable employment recommendation method with reciprocity constraints is designed and built. An attention mechanism and a fuzzy gating mechanism are introduced to extract and adaptively aggregate the two-way preferences and requirements of students and employers, alleviating the ability-mismatch problem; a recommendation explanation method oriented to employment intention and employment features is proposed to meet the interpretability requirement; and a similarity-based random negative sampling method is proposed to overcome the unreliable-negative-sample problem. Experiments on a real data set covering five graduating classes of a university show that the proposed method improves AUC by more than 6% over several classic and contemporary recommendation methods, and ablation experiments verify the effectiveness of each module.
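A sketch of similarity-based negative sampling as described above: instead of sampling negatives uniformly, employers least similar to a student's actual employer are preferred, making the negatives more trustworthy. The cosine similarity and the bottom-5 sampling rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
employers = rng.normal(size=(20, 16))          # employer feature vectors
positive = 7                                   # index of the student's actual employer

# Cosine similarity of every employer to the positive one.
sim = employers @ employers[positive]
sim = sim / (np.linalg.norm(employers, axis=1) * np.linalg.norm(employers[positive]))

candidates = np.argsort(sim)[:5]               # the 5 least-similar employers
negative = rng.choice(candidates)              # randomly pick one as the negative
print("sampled negative employer:", int(negative))
```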

14.
Statistical agencies collect data from individuals and businesses, and deliver information to society based on these data. A fundamental feature to consider when releasing information is the "protection" of sensitive values, since too many details could disseminate private information from respondents and therefore violate their rights. Another feature to consider when releasing information is the "utility" to a data user, as a scientist may need this information for research or a politician for making decisions. Clearly the more details there are in the output, the more useful it is, but it is also less protected. This paper discusses a new technique called Enhanced Controlled Tabular Adjustment (ECTA) to ensure that an output is both protected and useful. This technique has been motivated by another approach in the literature of the last decade, and both are compared and evaluated on a set of benchmark instances.

15.
We provide a formalization of data structures used in tabular databases. A tabular computable function is defined. A finite base of the algebra of computable tabular functions is constructed, using the operations multiply, branch, loop, and parallel execution. Translated from Kibernetika, No. 2, pp. 24–29, March–April, 1989.

16.
Tool-path generation from measured data (total citations: 4; self-citations: 0; citations by others: 4)
Presented in the paper is a procedure through which 3-axis NC tool-paths (for roughing and finishing) can be directly generated from measured data (a set of point sequence curves). The rough machining is performed by machining volumes of material in a slice-by-slice manner. To generate the roughing tool-path, it is essential to extract the machining regions (contour curves and their inclusion relationships) from each slice. For the machining region extraction, we employ the boundary extraction algorithm suggested by Park and Choi (Comput.-Aided Des. 33 (2001) 571). By making use of the boundary extraction algorithm, it is possible to extract the machining regions with O(n) time complexity, where n is the number of runs. The finishing tool-path can be obtained by defining a series of curves on the CL (cutter location) surface. However, calculating the CL-surface of the measured data involves time-consuming computations, such as swept volume modeling of an inverse tool and Boolean operations between polygonal volumes. To avoid these computational difficulties, we develop an algorithm to calculate the finishing tool-path based on well-known 2D geometric algorithms, such as 2D curve offsetting and polygonal chain intersection algorithms.
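A brief sketch of the 2D geometric core of tool-path generation: offsetting a closed contour (one slice's machining-region boundary) inward by the tool radius, so the tool center stays clear of the boundary. Shapely's buffer operation stands in for the paper's dedicated offsetting and chain-intersection algorithms, and the square contour is a toy example.

```python
from shapely.geometry import Polygon

contour = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])  # machining region
tool_radius = 1.0

# Negative buffer = inward offset of the contour by the tool radius.
path_region = contour.buffer(-tool_radius)
print(list(path_region.exterior.coords)[:3])
```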

17.
A novel technique for automatically generating test data is presented. The technique is based on mutation analysis and creates test data that approximate relative adequacy. It is a fault-based technique that uses algebraic constraints to describe test cases designed to find particular types of faults. A set of tools (collectively called Godzilla) that automatically generates constraints and solves them to create test cases for unit and module testing has been implemented. Godzilla has been integrated with the Mothra testing system and has been used as an effective way to generate test data that kill program mutants. The authors present an initial list of constraints and discuss some of the problems that have been solved to develop the complete implementation of the technique.
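A toy sketch of constraint-based test-data generation in the spirit described above: a mutant-killing condition is expressed as an algebraic constraint on the inputs, and satisfying inputs are found here by simple random search (the real system uses a dedicated constraint solver; the program and mutant are made up).

```python
import random

def mutant_killing_constraint(x, y):
    # Test data that distinguishes a program `x + y` from a mutant `x - y`
    # must satisfy x + y != x - y, i.e. y != 0, plus domain bounds.
    return y != 0 and 0 <= x <= 100 and 0 <= y <= 100

def generate(constraint, tries=10_000):
    for _ in range(tries):
        x, y = random.randint(0, 100), random.randint(0, 100)
        if constraint(x, y):
            return x, y
    raise RuntimeError("no satisfying test data found")

print(generate(mutant_killing_constraint))
```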

18.
Automated software test data generation (total citations: 3; self-citations: 0; citations by others: 3)
An alternative approach to test-data generation based on actual execution of the program under test, function-minimization methods and dynamic data-flow analysis is presented. Test data are developed for the program using actual values of input variables. When the program is executed, the program execution flow is monitored. If during program execution an undesirable execution flow is observed, then function-minimization search algorithms are used to automatically locate the values of input variables for which the selected path is traversed. In addition, dynamic data-flow analysis is used to determine those input variables responsible for the undesirable program behavior, significantly increasing the speed of the search process. The approach to generating test data is then extended to programs with dynamic data structures, and a search method based on dynamic data-flow analysis and backtracking is presented. In the approach described, values of array indexes and pointers are known at each step of program execution; this information is used to overcome difficulties of array and pointer handling.
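A minimal sketch of the execution-based idea: to force a desired branch, its condition is turned into a distance function that is zero exactly when the branch is taken, and that function is minimized over the input by a crude local search. The program under test and the search scheme are illustrative assumptions.

```python
def branch_distance(x):
    # Zero exactly when the desired branch `x*x - 4 == 0` is taken.
    return abs(x * x - 4)

def local_search(start, steps=1000):
    x, best = start, branch_distance(start)
    for _ in range(steps):
        # Try the two integer neighbors and move downhill if possible.
        for nxt in (x - 1, x + 1):
            d = branch_distance(nxt)
            if d < best:
                x, best = nxt, d
        if best == 0:
            break
    return x

print(local_search(start=37))  # converges to x = 2
```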

19.
Tabular data often refers to data that is organized in a table with rows and columns. We observe that this data format is widely used on the Web and within enterprise data repositories. Tables potentially contain rich semantic information that still needs to be interpreted. The process of extracting meaningful information out of tabular data with respect to a semantic artefact, such as an ontology or a knowledge graph, is often referred to as Semantic Table Interpretation (STI) or Semantic Table Annotation. In this survey paper, we aim to provide a comprehensive and up-to-date state-of-the-art review of the different tasks and methods that have been proposed so far to perform STI. First, we propose a new categorization that reflects the heterogeneity of table types that one can encounter, revealing different challenges that need to be addressed. Next, we define five major sub-tasks that STI deals with, even though the literature has mostly focused on three sub-tasks so far. We review and group the many approaches that have been proposed into three macro families, and we discuss their performance and limitations with respect to the various datasets and benchmarks proposed by the community. Finally, we detail the remaining scientific barriers to truly automatic interpretation of any type of table that can be found in the wild on the Web.

20.
To address the multi-view and interpretability problems in clustering, an interpretable clustering algorithm based on a multi-view generative model (interpretable clustering with multi-view generative model, ICMG) is proposed. ICMG produces clustering partitions under multiple views and explains the clustering results qualitatively and quantitatively through the views' semantic information. First, a multi-view generative model (MGM) is built, which generates multiple views using Bayesian program learning (BPL) and a multi-view Bayesian case model (MBCM) that embeds multi-view factors. Next, clustering based on the degree of match between views yields multiple clustering schemes. Finally, the semantic information carried by each view's prototypes and subspaces is used to explain the clustering results qualitatively and quantitatively. Experimental results show that ICMG obtains multiple interpretable clustering results and has clear advantages over traditional multi-view clustering algorithms.
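A generic sketch of prototype-and-subspace cluster explanation, the style of explanation ICMG attaches to each view: each cluster is summarized by its prototype (the member closest to the centroid) and by the feature on which it deviates most from the global mean. ICMG's BPL/MBCM machinery itself is not reproduced; k-means and random data are stand-ins.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(150, 4)
km = KMeans(n_clusters=3, n_init=10).fit(X)

for k in range(3):
    members = X[km.labels_ == k]
    centroid = km.cluster_centers_[k]
    # Prototype: the real sample nearest to the cluster centroid.
    proto = members[np.linalg.norm(members - centroid, axis=1).argmin()]
    # Salient feature: where this cluster deviates most from the global mean.
    salient = np.abs(centroid - X.mean(axis=0)).argmax()
    print(f"cluster {k}: prototype {np.round(proto, 2)}, salient feature {salient}")
```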
