
Latent Factor Prediction Model Combining L_1 and L_2 Regularization Constraints

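Cite this article:	WANG Dexian, HE Xianbo, HE Chunlin, ZHOU Kun, CHEN Minzhi. Latent Factor Prediction Model Combining L_1 and L_2 Regularization Constraints[J]. Computer Engineering and Applications, 2019, 55(19): 121-127.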

Authors:	WANG Dexian, HE Xianbo, HE Chunlin, ZHOU Kun, CHEN Minzhi

Affiliation:	School of Computer Science, China West Normal University, Nanchong, Sichuan 637000, China

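Abstract:	In big data applications, missing entries of a high-dimensional sparse matrix are usually predicted with a latent factor (LF) model trained by stochastic gradient descent (SGD). Regularization terms are often added during SGD training to improve model performance. Because the L_1 regularization term is non-differentiable, existing LF models mainly rely on the L_2 regularization term alone (SGD_LF). However, the L_1 term increases model sparsity and strengthens the model's solving ability, so this paper proposes a latent factor model constrained by both L_1 and L_2 regularization (SPGD_LF). Both terms are introduced when the objective function is constructed. Since the objective function satisfies the Lipschitz condition, it is approximated by a second-order Taylor expansion to build a stochastic gradient descent solver, and soft thresholding handles the boundary optimization problem arising from the L_1 term during training. This scheme better expresses the features of the known entries of the target matrix in the latent factor space and their corresponding community membership, which improves the generalization ability of the model. Experiments on large industrial datasets show that SPGD_LF significantly improves prediction accuracy, sparsity, and convergence speed.
Keywords:	big data application; high-dimensional sparse matrix; latent factor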
Latent Factor Prediction Model Combining L_1 and L_2 Regularization Constraints

WANG Dexian, HE Xianbo, HE Chunlin, ZHOU Kun, CHEN Minzhi. Latent Factor Prediction Model Combining L_1 and L_2 Regularization Constraints[J]. Computer Engineering and Applications, 2019, 55(19): 121-127.

Authors:	WANG Dexian, HE Xianbo, HE Chunlin, ZHOU Kun, CHEN Minzhi

Affiliation:	School of Computer Science, China West Normal University, Nanchong, Sichuan 637000, China

Abstract:	A latent factor (LF) model is usually built with stochastic gradient descent (SGD) and used to predict the missing data of a high-dimensional sparse matrix in the big data field. An LF model needs regularization terms to improve its performance. Because the L_1 regularization term is non-differentiable, normally only the L_2 regularization term is integrated into an LF model. However, the L_1 regularization term can improve the sparsity and solving ability of an LF model. To address this issue, this paper proposes an SPGD_LF model that integrates both L_1 and L_2 regularization terms into an LF model. Since the objective function satisfies the Lipschitz condition, it is approximated by a second-order Taylor expansion, and a stochastic gradient descent solver is constructed. During stochastic gradient descent, soft thresholding handles the boundary optimization problem corresponding to the L_1 regularization term while the latent factor model is solved. With this optimization scheme, the characteristics of the known data of the target matrix in the latent factor space and the corresponding community relationships can be better expressed, and the generalization ability of the model is improved. Empirical studies on two datasets from industrial applications show that the prediction accuracy, sparsity, and convergence rate of the SPGD_LF model are improved significantly.

Keywords:	big data application; high-dimensional and sparse matrix; latent factor
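
The abstract describes handling the non-differentiable L_1 term with soft thresholding inside a stochastic gradient descent loop. The following is a minimal sketch of that general idea, not the authors' SPGD_LF implementation: it assumes a squared-error objective with L_1 and L_2 penalties on both factor matrices, takes a plain gradient step on the smooth part, and then applies the soft-threshold (proximal) operator. The names `soft_threshold`, `train_lf`, `rmse`, and all hyperparameter values are hypothetical choices made for illustration.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1: shrink every entry toward zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def rmse(entries, P, Q):
    """Root mean squared error over the known (row, col, value) entries."""
    errs = [r - P[u] @ Q[i] for u, i, r in entries]
    return float(np.sqrt(np.mean(np.square(errs))))

def train_lf(entries, n_rows, n_cols, rank=4, lr=0.02,
             lam1=0.01, lam2=0.01, epochs=200, seed=0):
    """Proximal SGD for a latent factor model (assumed objective, see lead-in).

    entries: list of (row, col, value) tuples for the known cells only.
    """
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_rows, rank))  # row factors
    Q = rng.normal(scale=0.1, size=(n_cols, rank))  # column factors
    for _ in range(epochs):
        for idx in rng.permutation(len(entries)):
            u, i, r = entries[idx]
            pu, qi = P[u].copy(), Q[i].copy()
            err = r - pu @ qi
            # Gradient step on the smooth part (squared error + L_2 penalty).
            pu_step = pu + lr * (err * qi - lam2 * pu)
            qi_step = qi + lr * (err * pu - lam2 * qi)
            # Soft thresholding handles the L_1 penalty and can zero out entries.
            P[u] = soft_threshold(pu_step, lr * lam1)
            Q[i] = soft_threshold(qi_step, lr * lam1)
    return P, Q

if __name__ == "__main__":
    # Tiny synthetic example: known cells of a 3 x 2 rank-1 matrix.
    data = [(u, i, float((u + 1) * (i + 1))) for u in range(3) for i in range(2)]
    P, Q = train_lf(data, n_rows=3, n_cols=2)
    print("training RMSE:", rmse(data, P, Q))
    print("fraction of zero factor entries:", float(np.mean(P == 0.0)))
```

Setting `lam1=0` in this sketch reduces it to an L_2-only SGD update, which is the kind of baseline the abstract refers to as SGD_LF; larger `lam1` values push more factor entries to exactly zero.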
