Paper Title

A Novel Multi-Step Finite-State Automaton for Arbitrarily Deterministic Tsetlin Machine Learning

Authors

Abeyrathna, K. Darshana, Granmo, Ole-Christoffer, Shafik, Rishad, Yakovlev, Alex, Wheeldon, Adrian, Lei, Jie, Goodwin, Morten

Abstract

Due to the high energy consumption and scalability challenges of deep learning, there is a critical need to shift research focus towards dealing with energy consumption constraints. Tsetlin Machines (TMs) are a recent machine learning approach that has demonstrated significantly reduced energy usage compared to neural networks, while performing competitively in accuracy on several benchmarks. However, TMs rely heavily on energy-costly random number generation to stochastically guide a team of Tsetlin Automata to a Nash equilibrium of the TM game. In this paper, we propose a novel finite-state learning automaton that can replace the Tsetlin Automata in TM learning, for increased determinism. The new automaton uses multi-step deterministic state jumps to reinforce sub-patterns. Simultaneously, flipping a coin to skip every $d$'th state update ensures diversification by randomization. The $d$-parameter thus allows the degree of randomization to be finely controlled. For example, $d=1$ makes every update random, while $d=\infty$ makes the automaton completely deterministic. Our empirical results show that, overall, only substantial degrees of determinism reduce accuracy. Energy-wise, random number generation constitutes the switching energy consumption of the TM; eliminating it saves up to 11 mW of power for larger datasets with high $d$ values. We can thus use the new $d$-parameter to trade off accuracy against energy consumption, facilitating low-energy machine learning.
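The mechanism the abstract describes, deterministic multi-step state jumps with a coin flip gating every $d$'th update, can be sketched as a single learning automaton. The following is a minimal illustration under stated assumptions, not the paper's implementation: the class name, state layout, and jump size are hypothetical, and the full TM uses a team of such automata.

```python
import random


class MultiStepAutomaton:
    """Sketch of a finite-state learning automaton with d-controlled
    randomization, simplified from the idea in the abstract: updates are
    deterministic multi-step jumps, except that every d-th update is
    gated by a coin flip (skipped with probability 1/2).
    """

    def __init__(self, n_states=100, jump=3, d=10, seed=0):
        self.n_states = n_states   # states per action side (hypothetical size)
        self.jump = jump           # multi-step jump length (hypothetical)
        self.d = d                 # randomize every d-th update; d=1 gates all
        self.state = n_states      # start at the action boundary
        self.updates = 0
        self.rng = random.Random(seed)

    def action(self):
        # Two actions: 0 in states 1..n_states, 1 in n_states+1..2*n_states.
        return int(self.state > self.n_states)

    def update(self, reward):
        self.updates += 1
        # Every d-th update: flip a coin and possibly skip the update.
        # With d = float("inf") this branch never fires, so learning is
        # completely deterministic; with d = 1 every update is coin-gated.
        if self.updates % self.d == 0 and self.rng.random() < 0.5:
            return
        # Deterministic multi-step jump: a reward pushes deeper into the
        # current action's states, a penalty pushes toward the other action.
        direction = self.jump if self.action() == 1 else -self.jump
        if not reward:
            direction = -direction
        self.state = min(max(self.state + direction, 1), 2 * self.n_states)
```

With a very large `d` (or `d = float("inf")`), two automata with different random seeds evolve identically under the same feedback, matching the abstract's claim that high `d` removes the dependence on random number generation.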
