A Python Library for Exploratory Data Analysis on Twitter Data based on Tokens and Aggregated Origin-Destination Information

Paper title

A Python Library for Exploratory Data Analysis on Twitter Data based on Tokens and Aggregated Origin-Destination Information

Authors

Mario Graff, Daniela Moctezuma, Sabino Miranda-Jiménez, Eric S. Tellez

Abstract

Twitter is perhaps the social media platform most amenable to research. Obtaining information takes only a few steps, and plenty of libraries can help in this regard. Nonetheless, knowing whether a particular event is expressed on Twitter is a challenging task that requires a considerable collection of tweets. This proposal aims to ease the process of mining events on Twitter for interested researchers by opening a collection of processed information taken from Twitter since December 2015. The events could be related to natural disasters, health issues, and people's mobility, among other topics that can be studied with the proposed library. Different applications are presented in this contribution to illustrate the library's capabilities: an exploratory analysis of the topics discovered in tweets, a study of the similarity among dialects of the Spanish language, and a mobility report on different countries. In summary, the Python library presented is applied to different domains and retrieves a wealth of information: daily frequencies of words and word bi-grams for the Arabic, English, Spanish, and Russian languages, as well as mobility information on the number of trips among locations for more than 200 countries or territories.
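
To make "daily frequencies of words and word bi-grams" concrete, the following is a minimal sketch that uses only the Python standard library. It does not use or reproduce the API of the library described in the abstract; the function name `daily_token_counts`, the `~` separator that joins the two words of a bi-gram, and the toy tweets are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def daily_token_counts(tweets):
    """Map each day to a Counter over words and word bi-grams.

    `tweets` is an iterable of (day, text) pairs. Tokenization here is a
    plain lowercase whitespace split, which is an illustrative assumption.
    """
    counts = defaultdict(Counter)
    for day, text in tweets:
        words = text.lower().split()
        # Count single words.
        counts[day].update(words)
        # Count adjacent word pairs, joined with "~" (hypothetical convention).
        counts[day].update(f"{a}~{b}" for a, b in zip(words, words[1:]))
    return counts


# Toy data, invented for this example.
tweets = [
    ("2020-03-01", "stay home stay safe"),
    ("2020-03-01", "stay home"),
    ("2020-03-02", "earthquake near the coast"),
]
counts = daily_token_counts(tweets)
print(counts["2020-03-01"].most_common(3))
```

Aggregates of this kind, one frequency table per day, would let a researcher check whether a term spikes around an event without storing the tweets themselves.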
