Paper title

Towards an automated repository for indexing, analysis and characterization of municipal e-government websites in Mexico

Authors

Coria, Sergio R., Marcos-Santiago, Leonardo, Cruz-Melendez, Christian A., Jimenez-Canseco, Juan M.

Abstract

This article addresses a problem in the electronic government discipline that is of special interest in Mexico: the need for a centralized and up-to-date source of information about municipal e-government websites. One reason for this need is the lack of a complete and current database containing the electronic addresses (web domain names) of the municipal governments that have a website. For diverse reasons, not all Mexican municipalities have one, and a number of those that do present information corresponding not to the current government but to previous ones. The few official lists of municipal websites are not updated with sufficient frequency, and manually determining which municipalities have an operating and valid website at a given moment is a time-consuming process. Moreover, website contents do not always comply with legal requirements and are considerably heterogeneous. In turn, the level of evolutionary development of municipal websites is valuable information that can be harnessed for diverse theoretical and practical purposes in the public administration field. Obtaining all of this information requires website content analysis. Therefore, this article investigates the need for, and the feasibility of, automating the implementation and updating of a digital repository to perform diverse analyses of these websites. Its technological feasibility is addressed by means of a literature review on web scraping and by proposing a preliminary manual methodology. The methodology takes into account known, proven techniques and software tools for web crawling and scraping. No new techniques for crawling or scraping are proposed because the existing ones satisfy the current needs. Finally, software requirements are specified in order to automate the creation, updating, indexing, and analysis of the repository.
