
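Graph representation learning based on a graph partition sampling algorithm
Citation: Xia Xin, Gao Pin, Chen Kang, Jiang Jinlei. Graph representation learning based on a graph partition sampling algorithm[J]. Application Research of Computers, 2020, 37(9): 2586-2590, 2599.
Authors: Xia Xin, Gao Pin, Chen Kang, Jiang Jinlei
Affiliations: Department of Computer Science and Technology, Tsinghua University, Beijing 100084; WeChat Business Group, Tencent, Shenzhen, Guangdong 518057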
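Abstract: In neural-network-based graph representation algorithms, when node attributes are high-dimensional and the graph is large, data transfer from host memory to GPU memory becomes the bottleneck of training performance. To address this, the proposed method applies graph partitioning to graph representation learning, reducing the I/O overhead of memory access. It partitions the graph into blocks according to node degree and keeps several feature-matrix blocks in a buffer pool in GPU memory. Each training round uses the feature-matrix blocks held in the pool, which reduces data copies from host memory to GPU memory. Building on this idea, the method uses a partition-based sampling algorithm, designs a GPU buffer pool to reduce host memory accesses, and applies a multi-level negative sampling algorithm to lower the time complexity of negative sampling during training. On several datasets, comparison with existing methods shows that its downstream machine learning accuracy is essentially on par with the original algorithm, while training efficiency improves by 2 to 7 times. The experimental results show that partition-based graph representation learning trains models efficiently while preserving the test performance of the node representation vectors. Future work could use rigorous theoretical proof to characterize the theoretical error between the partition-based model and the original model.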

Keywords: graph partition; graph representation learning; graph sampling; graph neural network
Received: 2019/3/14
Revised: 2020/7/28
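
The degree-based partitioning and GPU buffer pool described in the abstract can be pictured with a small sketch. This is not the authors' implementation: the abstract does not say how block boundaries are chosen or how the pool evicts blocks, so the contiguous split over degree-sorted nodes, the LRU eviction policy, and the names `partition_by_degree`, `BlockCache`, and `load_block` are all assumptions made for illustration. A plain Python dictionary stands in for GPU memory, and `transfers` counts the simulated host-to-device copies.

```python
from collections import OrderedDict


def partition_by_degree(degrees, num_blocks):
    """Sort node ids by descending degree and cut them into contiguous blocks.

    Assumption: the paper only says blocks follow node degree; the
    contiguous split used here is one simple way to do that.
    """
    order = sorted(range(len(degrees)), key=lambda v: (-degrees[v], v))
    size = max(1, -(-len(order) // num_blocks))  # ceiling division
    return [order[i:i + size] for i in range(0, len(order), size)]


class BlockCache:
    """Fixed-capacity pool of feature blocks, standing in for GPU memory.

    Assumption: least-recently-used eviction; the abstract does not
    name an eviction policy.
    """

    def __init__(self, capacity, load_block):
        self.capacity = capacity
        self.load_block = load_block  # hypothetical host-to-device copy
        self.pool = OrderedDict()     # block id -> cached feature block
        self.transfers = 0            # number of simulated copies

    def get(self, block_id):
        if block_id in self.pool:
            # Cache hit: no copy needed, just mark the block as recent.
            self.pool.move_to_end(block_id)
            return self.pool[block_id]
        if len(self.pool) >= self.capacity:
            # Pool is full: drop the least recently used block.
            self.pool.popitem(last=False)
        self.pool[block_id] = self.load_block(block_id)
        self.transfers += 1
        return self.pool[block_id]


# Toy data: six nodes with two-dimensional features.
degrees = [5, 1, 3, 8, 2, 7]
features = [[float(v), float(v)] for v in range(len(degrees))]
blocks = partition_by_degree(degrees, num_blocks=3)

cache = BlockCache(
    capacity=2,
    load_block=lambda b: [features[v] for v in blocks[b]],
)
for block_id in [0, 1, 0, 1, 2]:
    cache.get(block_id)
print(blocks, cache.transfers)
```

In this toy run, five block requests trigger only three simulated copies, because the two repeated requests are served from the pool. That is the effect the abstract attributes to training on blocks that are already cached.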

Graph representation learning based on graph partition sampling algorithm
Citation: Xia Xin, Gao Pin, Chen Kang and Jiang Jinlei. Graph representation learning based on graph partition sampling algorithm[J]. Application Research of Computers, 2020, 37(9): 2586-2590, 2599.
Authors: Xia Xin, Gao Pin, Chen Kang and Jiang Jinlei
Affiliation: Department of Computer Science and Technology, Tsinghua University
Abstract: When graph embeddings are trained with neural networks, high-dimensional feature vectors and large-scale graphs make data transfer from host memory to the GPU a bottleneck. To solve this problem, this paper proposes graph representation learning based on graph partitioning. The method splits graph nodes into blocks according to their degree and stores several node feature matrices in a buffer pool on the GPU. In every epoch, it trains representations on the blocks that fit in the buffer pool, which reduces the data transferred from host memory to the GPU. Building on the block split, the method uses a block-based sampling algorithm, caches block feature matrices in the GPU buffer pool to reduce memory reads, and builds a hierarchical negative sampling table that samples nodes in constant time. Compared with related work on several real-world datasets, the method achieves competitive accuracy on downstream machine learning tasks and a 2 to 7 times speedup in training. The experiments show that partition-based graph representation learning trains models efficiently and generates accurate embedding vectors. Future work should characterize, in theory, the deviation between the partition-based method and the original method.
Keywords:graph partition  graph representation learning  graph sampling  graph neural network
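
The abstract mentions a hierarchical negative sampling table that draws nodes in constant time but does not describe its layout. One way such a table could work is sketched below, under stated assumptions: a first-level table picks a block, a second-level table picks a node inside that block, and both levels use the alias method (a standard constant-time sampling technique, swapped in here for illustration). The names `build_alias`, `alias_draw`, and `TwoLevelNegativeSampler` are hypothetical. Weighting nodes by degree raised to the power 0.75 follows the word2vec convention and is likewise an assumption, not a detail taken from the paper.

```python
import random


def build_alias(weights):
    """Build alias-method tables (Vose's algorithm) for positive weights."""
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]
    prob = [1.0] * n
    alias = list(range(n))
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]
        alias[s] = l
        # The large entry donates the mass that the small entry lacks.
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    return prob, alias


def alias_draw(prob, alias, rng):
    """Draw one index in O(1): pick a slot, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


class TwoLevelNegativeSampler:
    """Sample a block first, then a node inside it (assumed layout).

    Assumes every node has a positive degree, so all weights are positive.
    """

    def __init__(self, blocks, degrees, power=0.75):
        self.blocks = blocks
        node_weights = [[degrees[v] ** power for v in block] for block in blocks]
        # Level 1: one entry per block, weighted by the block's total weight.
        self.block_table = build_alias([sum(w) for w in node_weights])
        # Level 2: one table per block over the nodes it contains.
        self.node_tables = [build_alias(w) for w in node_weights]

    def sample(self, rng):
        b = alias_draw(*self.block_table, rng)
        i = alias_draw(*self.node_tables[b], rng)
        return self.blocks[b][i]


# Toy usage: two blocks of two nodes each.
rng = random.Random(0)
sampler = TwoLevelNegativeSampler([[0, 1], [2, 3]], [4, 4, 1, 1])
negatives = [sampler.sample(rng) for _ in range(5)]
print(negatives)
```

Because the block is chosen in proportion to its total weight and the node in proportion to its weight within the block, each node is still drawn with probability proportional to its own weight. Each draw costs two constant-time table lookups regardless of graph size, and the second-level lookup can be restricted to blocks that are currently cached.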