Paper title
Data Collection and Analysis of French Dialects
Paper authors
Abstract
This paper describes the creation and analysis of a new dataset for data mining and text analytics research, contributing to a joint Leeds University research project for the Corpus of National Dialects. It investigates machine learning classifiers for classifying samples of French dialect text from various French-speaking countries. Following the steps of the CRISP-DM methodology, it covers the data collection process, data quality issues and data conversion for text analysis. Finally, after suitable data mining techniques have been applied, it discusses the evaluation methods, the best overall features and classifiers, and the conclusions.