Paper Title

Machine Learning and Data Science approach towards trend and predictors analysis of CDC Mortality Data for the USA

Authors

Nadeem, Yasir, Ahmed, Awais

Abstract

Mortality research is an active area of study for any country, with conclusions driven by the available data and conditions. Domain knowledge is important but not strictly mandatory (though some is still required) for deriving data-driven conclusions using machine learning and data science practices. The purpose of this project was to derive conclusions from the statistics of the provided dataset and to predict the dataset's label(s) using supervised or unsupervised learning algorithms. Based on a sample, the study estimated life expectancy regardless of gender, along with its central tendencies; marital status also affected how frequently deaths occurred within each group. The study also found that, because the data are largely categorical and numerical and some class labels may be far more frequent than others, anomaly detection or under-sampling could be a viable solution. The study shows that machine learning predictions are not as viable for this data as they might appear.
