MapReduce for Counting Word Frequencies with MPI and GPUs

Paper title

MapReduce for Counting Word Frequencies with MPI and GPUs

Author

Kavi, Nithin

Abstract

The goal of this project was to use the Julia programming language and parallelization to write a fast MapReduce algorithm for counting word frequencies across large numbers of documents. We first implement the word frequency counter on a CPU using two processes with MPI. We then create a second implementation on a GPU using the Julia CUDA library, without relying on the built-in MapReduce algorithm in FoldsCUDA.jl. We apply our CPU and GPU algorithms to count word frequencies in speeches given by Presidents George W. Bush, Barack H. Obama, Donald J. Trump, and Joseph R. Biden, with the aim of finding patterns in word choice that could be used to uniquely identify each President. We find that each President does use certain words distinctly more often than his fellow Presidents, and that these words are not surprising given the political climate at the time. More broadly, the project aimed to create MapReduce algorithms in Julia, on both the CPU and the GPU, that are faster than those previously written. We present some simple cases of mapping functions where our GPU algorithm outperforms the FoldsCUDA implementation. We also discuss ideas for further optimization, both for counting word frequencies in documents and for these specific mapping functions.
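
The map and reduce steps behind word-frequency counting can be illustrated with a minimal sketch. This is not the author's implementation: it is written in Python rather than Julia, uses plain function calls instead of MPI or CUDA, and the names `map_count`, `reduce_counts`, and `word_frequencies` are hypothetical. The two partitions only mimic the two-process split mentioned in the abstract, and the sample sentences are made up for illustration.

```python
from collections import Counter
from functools import reduce
import re


def map_count(document: str) -> Counter:
    """Map step: turn one document into a table of lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", document.lower()))


def reduce_counts(a: Counter, b: Counter) -> Counter:
    """Reduce step: merge two partial frequency tables by summing counts."""
    return a + b


def word_frequencies(documents, n_workers=2):
    """Count words across documents using n_workers simulated workers.

    Assumption: each partition stands in for one process; a real MPI
    version would run the inner reduction on separate ranks and then
    combine the partial tables on a root rank.
    """
    # Round-robin split of the documents across the simulated workers.
    partitions = [documents[i::n_workers] for i in range(n_workers)]
    # Each worker maps its documents and reduces them to one partial table.
    partials = [
        reduce(reduce_counts, map(map_count, part), Counter())
        for part in partitions
    ]
    # Final reduction merges the per-worker tables into the global result.
    return reduce(reduce_counts, partials, Counter())


# Made-up sample documents, used only to exercise the sketch.
docs = [
    "The economy is strong",
    "The economy and the people",
    "People want jobs",
]
totals = word_frequencies(docs)
```

Because merging count tables is associative and commutative, the partial tables can be combined in any order, which is what makes the counting step straightforward to parallelize.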
