Paper title
RecTen: A Recursive Hierarchical Low Rank Tensor Factorization Method to Discover Hierarchical Patterns in Multi-modal Data
Paper authors
Abstract
How can we extend tensor decomposition to reveal the hierarchical structure of multi-modal data in a self-adaptive way? Current tensor decomposition methods provide only a single layer of clusters. We argue that, given today's abundance of multi-modal data and time-evolving networks, the ability to identify emerging hierarchies is important. To this end, we propose RecTen, a recursive hierarchical soft clustering approach based on tensor decomposition. Our approach enables us to: (a) recursively decompose the clusters identified in the previous step, and (b) identify the right conditions for terminating this process. In the absence of proper ground truth, we evaluate our approach on synthetic data and test its sensitivity to different parameters. We also apply RecTen to five real datasets that capture user activity on online discussion platforms, such as security forums. This analysis reveals clusters of users with interesting behaviors. Among other things, it enables early detection of real events such as ransomware outbreaks, the emergence of a black market for decryption tools, and romance scams. To maximize the usefulness of our approach, we develop a tool that helps data analysts and the broader research community identify hierarchical structures. RecTen is an unsupervised approach that can be used to take the pulse of large multi-modal datasets, letting the data reveal its own hidden structure.
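The abstract describes the recursion only at a high level. The sketch below shows one way such a scheme could look. It is a minimal illustration, not RecTen's actual algorithm, and it rests on several assumptions:

- The input is a 3-way tensor whose first mode indexes users.
- Each level runs a plain CP decomposition fitted by alternating least squares.
- Entities are soft-assigned to every component where their loading is large.
- Recursion stops under hypothetical rules: a maximum depth, a minimum group size, or a component that fails to split the current group.

The names `cp_als`, `recursive_split`, `threshold`, `min_rows`, and `max_depth` are illustrative.

```python
import numpy as np


def khatri_rao(a, b):
    """Column-wise Kronecker product: row j*K + k holds a[j, r] * b[k, r]."""
    return np.einsum("jr,kr->jkr", a, b).reshape(-1, a.shape[1])


def cp_als(x, rank, n_iter=100, seed=0):
    """Rank-`rank` CP decomposition of a 3-way tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    i, j, k = x.shape
    a, b, c = (rng.random((n, rank)) for n in (i, j, k))
    # Mode-n unfoldings whose column order matches khatri_rao above.
    x0 = x.reshape(i, j * k)
    x1 = np.moveaxis(x, 1, 0).reshape(j, i * k)
    x2 = np.moveaxis(x, 2, 0).reshape(k, i * j)
    for _ in range(n_iter):
        # Each update is the least-squares solution with the other two factors fixed.
        a = x0 @ khatri_rao(b, c) @ np.linalg.pinv((b.T @ b) * (c.T @ c))
        b = x1 @ khatri_rao(a, c) @ np.linalg.pinv((a.T @ a) * (c.T @ c))
        c = x2 @ khatri_rao(a, b) @ np.linalg.pinv((a.T @ a) * (b.T @ b))
    return a, b, c


def rel_error(x, a, b, c):
    """Relative Frobenius reconstruction error of a CP model."""
    approx = np.einsum("ir,jr,kr->ijk", a, b, c)
    return np.linalg.norm(x - approx) / np.linalg.norm(x)


def recursive_split(x, rows, rank=2, depth=0, max_depth=3, min_rows=4, threshold=0.5):
    """Hypothetical recursive soft clustering over mode 0 (e.g. users).

    The termination rules are illustrative assumptions: stop at max_depth,
    when a group has fewer than min_rows members, or when a component does
    not produce a proper, non-empty subgroup.
    """
    node = {"rows": rows, "depth": depth, "children": []}
    if depth >= max_depth or len(rows) < min_rows:
        return node
    a, _, _ = cp_als(x, rank)
    for r in range(rank):
        w = np.abs(a[:, r])
        # Soft membership: an entity joins every component where its loading is
        # at least `threshold` times that component's largest loading, so it can
        # belong to several clusters at the same level.
        members = np.flatnonzero(w >= threshold * w.max())
        if 0 < len(members) < len(rows):
            node["children"].append(
                recursive_split(x[members], rows[members], rank, depth + 1,
                                max_depth, min_rows, threshold))
    return node


if __name__ == "__main__":
    # Synthetic tensor (users x features x time) with two planted activity blocks.
    x = np.zeros((8, 6, 5))
    x[:4, :3, :2] = 1.0  # users 0-3 active on features 0-2 at time steps 0-1
    x[4:, 3:, 2:] = 1.0  # users 4-7 active on features 3-5 at time steps 2-4
    tree = recursive_split(x, np.arange(8))
    print([list(child["rows"]) for child in tree["children"]])
```

Each child group is a strict subset of its parent, and a depth cap bounds the recursion. Together these guarantee termination, whatever criterion the real method uses.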