Probabilistic outlier detection for sparse multivariate geotechnical site investigation data using Bayesian learning
Authors: Shuo Zheng, Yu-Xin Zhu, Dian-Qing Li, Zi-Jun Cao, Qin-Xuan Deng, Kok-Kwang Phoon
Affiliations: State Key Laboratory of Water Resources and Hydropower Engineering Science, Institute of Engineering Risk and Disaster Prevention, Wuhan University, 299 Bayi Road, Wuhan 430072, China; Department of Civil and Environmental Engineering, National University of Singapore, Blk E1A, #07-03, 1 Engineering Drive 2, Singapore 117576, Singapore
Funding: This work was supported by the National Key R&D Program of China (Project No. 2016YFC0800200), the NRF-NSFC 3rd Joint Research Grant (Earth Science) (Project No. 41861144022), the National Natural Science Foundation of China (Project Nos. 51679174 and 51779189), and the Shenzhen Key Technology R&D Program (Project No. 20170324). The financial support is gratefully acknowledged.
Abstract: Various uncertainties arising during the acquisition of geoscience data may result in anomalous data instances (i.e., outliers) that do not conform with the expected pattern of regular data instances. With sparse multivariate data obtained from geotechnical site investigation, it is impossible to identify outliers with certainty, because outliers distort the statistics of geotechnical parameters and because data sparsity adds statistical uncertainty to those statistics. This paper develops a probabilistic outlier detection method for sparse multivariate data obtained from geotechnical site investigation. The proposed approach quantifies the outlying probability of each data instance based on Mahalanobis distance and identifies as outliers those data instances with outlying probabilities greater than 0.5. It tackles the distortion of statistics estimated from a dataset containing outliers by a re-sampling technique and accounts, rationally, for the statistical uncertainty by Bayesian machine learning. Moreover, the proposed approach also suggests an exclusive method to determine the outlying components of each outlier. The proposed approach is illustrated and verified using simulated and real-life datasets. The results show that the proposed approach properly identifies outliers among sparse multivariate data, and their corresponding outlying components, in a probabilistic manner. It can significantly reduce the masking effect (i.e., missing some actual outliers due to the distortion of statistics by the outliers and statistical uncertainty). It is also found that outliers among sparse multivariate data instances significantly affect the construction of the multivariate distribution of geotechnical parameters for uncertainty quantification. This emphasizes the necessity of a data cleaning process (e.g., outlier detection) for uncertainty quantification based on geoscience data.
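The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It repeatedly draws a random half of the data without replacement, estimates the mean and covariance from that half, and scores every instance by its squared Mahalanobis distance; the fraction of resamples in which an instance exceeds a cut-off is used as a rough, frequentist stand-in for its outlying probability. The Bayesian learning step and the identification of outlying components are not reproduced. The sketch assumes bivariate data, and the function names, the chi-square cut-off level of 0.975, and the number of resamples are all illustrative assumptions.

```python
import math
import random


def mean_cov_2d(points):
    """Sample mean and unbiased covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def outlying_probabilities(points, n_resamples=2000, level=0.975, seed=0):
    """Fraction of half-samples in which each point exceeds the cut-off.

    Assumption: the cut-off is the chi-square quantile with 2 degrees of
    freedom, which has the closed form -2 * ln(1 - level).
    """
    rng = random.Random(seed)
    threshold = -2.0 * math.log(1.0 - level)
    half = len(points) // 2
    counts = [0] * len(points)
    for _ in range(n_resamples):
        subset = rng.sample(points, half)  # half-sample, no replacement
        mean, cov = mean_cov_2d(subset)
        for i, p in enumerate(points):
            if mahalanobis_sq(p, mean, cov) > threshold:
                counts[i] += 1
    return [c / n_resamples for c in counts]


# Synthetic illustration (not data from the paper): 30 correlated
# instances plus one instance that breaks the correlation pattern.
points = [(float(i), 2.0 * i + ((i * 7) % 5) - 2.0) for i in range(30)]
points.append((10.0, 45.0))
probabilities = outlying_probabilities(points)
flagged = [i for i, p in enumerate(probabilities) if p > 0.5]
print(flagged)
```

In this synthetic example the appended instance lies within the range of each variable taken separately and departs only from the joint pattern, which is the situation in which a Mahalanobis-based score is more informative than univariate checks. The 0.5 decision rule is taken from the abstract; everything else is an assumption of the sketch.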

Keywords: Outlier detection; Site investigation; Sparse multivariate data; Mahalanobis distance; Resampling by half-means; Bayesian machine learning
Received: 1 October 2019

Citation: Shuo Zheng, Yu-Xin Zhu, Dian-Qing Li, Zi-Jun Cao, Qin-Xuan Deng, Kok-Kwang Phoon. Probabilistic outlier detection for sparse multivariate geotechnical site investigation data using Bayesian learning [J]. Geoscience Frontiers, 2021, 12(1): 425-439.
This article is indexed in databases including VIP, Wanfang Data, and ScienceDirect.