Introduction of a generally applicable method to estimate retrieval of active molecules for similarity searching using fingerprints
Authors:Martin Vogt, Jürgen Bajorath
Affiliation:Department of Life Science Informatics, Bonn-Aachen International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität Bonn, Dahlmannstr. 2, 53113 Bonn, Germany, Fax: (+49) 228-2699-341
Abstract:Fingerprints are bit string representations of molecular structure and properties and are among the most widely used computational tools for similarity searching and database screening. Various fingerprint designs are available, and their search performance generally depends strongly on the compound classes under study and the chemical characteristics of the screening databases. Currently, it is not possible to predict the probability of identifying novel hits through fingerprint searching. For practical applications, however, such estimates would be very useful: one could, for example, prioritize fingerprints and compound selection strategies, or decide whether a similarity search campaign with subsequent experimental evaluation of candidate compounds would be promising at all. We have developed a method that makes it possible to predict the outcome of similarity search calculations using any type of keyed fingerprint. The methodology incorporates bit frequency distributions of reference molecules and the screening database into an information-theoretic function and determines the recall of active compounds that is achievable in principle within selection sets of varying size. We calibrate the function on diverse compound classes and accurately predict compound recovery in retrospective virtual screening trials. Furthermore, we correctly predict fingerprint search performance on two experimental high-throughput screening (HTS) data sets. Our findings indicate that, given a set of reference molecules, a fingerprint, and a screening database, we can readily estimate how likely it is that active compounds will be retrieved, without knowledge of the distribution of potential hits in the database.
Keywords:fingerprints  molecular similarity  probabilistic modeling  screening data  virtual screening
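
The abstract states that bit frequency distributions of the reference molecules and the screening database are combined in an information-theoretic function, but it does not name that function or describe its calibration. The sketch below is therefore only an illustration under stated assumptions: it treats each bit of a keyed fingerprint as an independent Bernoulli variable and uses the Kullback-Leibler divergence between the reference and database bit frequencies as one plausible choice of function. The names `bit_frequencies` and `bernoulli_kl`, the pseudocount smoothing, and the toy fingerprints are hypothetical and not taken from the paper; the calibration step that maps such a value to expected recall is not reproduced.

```python
import math

def bit_frequencies(fingerprints, n_bits, pseudocount=1.0):
    """Smoothed relative frequency with which each bit is set.

    Each fingerprint is given as a set of on-bit indices. The pseudocount
    (an assumption of this sketch) keeps every frequency strictly
    between 0 and 1 so that the logarithms below are defined.
    """
    counts = [0] * n_bits
    for fp in fingerprints:
        for i in fp:
            counts[i] += 1
    n = len(fingerprints)
    return [(c + pseudocount) / (n + 2 * pseudocount) for c in counts]

def bernoulli_kl(p, q):
    """KL divergence in bits between two sets of per-bit Bernoulli
    distributions, summed over bit positions (bits assumed independent)."""
    total = 0.0
    for pi, qi in zip(p, q):
        total += pi * math.log2(pi / qi)
        total += (1 - pi) * math.log2((1 - pi) / (1 - qi))
    return total

# Toy data: 4-bit keyed fingerprints as sets of on-bit indices.
reference = [{0, 1}, {0, 1, 2}, {0, 2}]
database = [{3}, {1, 3}, {2}, {0, 3}, {3}, {1}]

p_ref = bit_frequencies(reference, 4)
p_db = bit_frequencies(database, 4)
divergence = bernoulli_kl(p_ref, p_db)
print(f"KL(reference || database) = {divergence:.3f} bits")
```

Under these assumptions, a larger divergence means the reference set's bit settings differ more strongly from the database background, which is the kind of signal a fingerprint search can exploit. Turning the value into a recall estimate would require a calibration on known compound classes, as the abstract describes.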
This article is indexed in PubMed and other databases.