Collection
Share:

Mind the (Language) Gap: Mapping the Challenges of LLM Development in Low-Resource Language Contexts

White Paper by the Stanford Institute for Human-Centered AI (HAI), the Asia Foundation and the University of Pretoria: “…maps the LLM development landscape for low-resource languages, highlighting challenges, trade-offs, and strategies to increase investment; prioritize cross-disciplinary, community-driven development; and ensure fair data ownership…

  • Large language model (LLM) development suffers from a digital divide: Most major LLMs underperform for non-English—and especially low-resource—languages; are not attuned to relevant cultural contexts; and are not accessible in parts of the Global South.
  • Low-resource languages (such as Swahili or Burmese) face two crucial limitations: a scarcity of labeled and unlabeled language data and poor quality data that is not sufficiently representative of the languages and their sociocultural contexts.
  • To bridge these gaps, researchers and developers are exploring different technical approaches to developing LLMs that better perform for and represent low-resource languages but come with different trade-offs:
    • Massively multilingual models, developed primarily by large U.S.-based firms, aim to improve performance for more languages by including a wider range of (100-plus) languages in their training datasets.
    • Regional multilingual models, developed by academics, governments, and nonprofits in the Global South, use smaller training datasets made up of 10-20 low-resource languages to better cater to and represent a smaller group of languages and cultures.
    • Monolingual or monocultural models, developed by a variety of public and private actors, are trained on or fine-tuned for a single low-resource language and thus tailored to perform well for that language…(More)”
Share
How to contribute:

Did you come across – or create – a compelling project/report/book/app at the leading edge of innovation in governance?

Share it with us at info@thelivinglib.org so that we can add it to the Collection!

About the author

Get the latest news right in you inbox

Subscribe to curated findings and actionable knowledge from The Living Library, delivered to your inbox every Friday

Related articles