Paper by Alex Luscombe, Kevin Dick & Kevin Walby: “Web scraping, defined as the automated extraction of information online, is an increasingly important means of producing data in the social sciences. We contribute to emerging social science literature on computational methods by elaborating on web scraping as a means of automated access to information. We begin by situating the practice of web scraping in context, providing an overview of how it works and how it compares to other methods in the social sciences. Next, we assess the benefits and challenges of scraping as a technique of information production. In terms of benefits, we highlight how scraping can help researchers answer new questions, supersede limits in official data, overcome access hurdles, and reinvigorate the values of sharing, openness, and trust in the social sciences. In terms of challenges, we discuss three: technical, legal, and ethical. By adopting ‘algorithmic thinking in the public interest’ as a way of navigating these hurdles, researchers can improve the state of access to information on the Internet while also contributing to scholarly discussions about the legality and ethics of web scraping. Example software accompanying this article are available within the supplementary materials.”
Algorithmic thinking in the public interest: navigating technical, legal, and ethical hurdles to web scraping in the social sciences