Wikidata is a powerful tool to connect and open up the world of data to all communities. Wikidata’s Lexemes also offer a unique opportunity to strengthen languages on the internet. In a recent project, the Wikidata Community Communication team conducted interviews and research to build awareness of Wikidata’s ability to generate translations from lexicographical data.
Language and technology on the African continent
Language is central to storing and sharing information, whether through wikis, social media, or voice assistants. However, few African languages are represented on the internet to the extent that they are spoken. Currently, none of Big Tech’s voice assistants supports a single native African language. Most of the language data currently used to store information, develop algorithms and document the diversity of cultures is held by a handful of major companies. This exacerbates the digital divide between English speakers and the rest of the world.
Wikidata helps to strengthen linguistic diversity and therefore knowledge sharing on the African continent. Because Wikidata’s strategy is to increase participation from communities in the Global South, the Community Communications team collaborated with Wikimedia affiliates on the African continent to strengthen their participation in the Lexeme namespace.
What are Lexemes?
Multilingualism is at the core of Wikidata. From the outset, any element relating to an object of knowledge and any property can have a name in each of the supported languages. Since 2018, Wikidata also stores a new type of data: words, described in many languages. This information is lexicographical data. Lexemes are the concrete data points in this lexicographical data. With all the language combinations that exist in Wikimedia projects, completely new possibilities open up: translation from one language to another becomes possible even when no printed dictionary exists for that language pair. Such a dictionary can be generated from structured data about languages. You can learn more about the data model on the documentation page and read more about lexicographical data in this blog post.
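To make the idea of generating a translation from structured data more concrete, here is a minimal sketch in Python. It is not Wikidata's actual data model or API: the `Lexeme` and `Sense` classes, the `translate` function and the `L-demo-…` IDs are all made up for illustration, and the sample data is entered by hand rather than loaded from Wikidata. The only assumption carried over from Wikidata is the general principle that a word's sense can point to a language-independent item (for example Q144, the item for "dog"), so that words in two languages that point to the same item can be paired up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sense:
    # ID of the language-independent item (concept) this sense refers to.
    item_id: str


@dataclass
class Lexeme:
    lexeme_id: str   # hypothetical placeholder ID, not a real Lexeme ID
    language: str    # language code of the word
    lemma: str       # base form of the word
    senses: List[Sense] = field(default_factory=list)


def translate(lemma, source_lang, target_lang, lexemes):
    """Return target-language lemmas that share a sense item with the source word."""
    # Step 1: collect every concept the source word can refer to.
    concepts = {
        sense.item_id
        for lx in lexemes
        if lx.language == source_lang and lx.lemma == lemma
        for sense in lx.senses
    }
    # Step 2: find target-language words that refer to one of those concepts.
    return sorted({
        lx.lemma
        for lx in lexemes
        if lx.language == target_lang
        and any(sense.item_id in concepts for sense in lx.senses)
    })


# Hand-written sample data for illustration only.
LEXEMES = [
    Lexeme("L-demo-1", "en", "dog", [Sense("Q144")]),
    Lexeme("L-demo-2", "ha", "kare", [Sense("Q144")]),   # Hausa
    Lexeme("L-demo-3", "sw", "mbwa", [Sense("Q144")]),   # Swahili
    Lexeme("L-demo-4", "sw", "paka", [Sense("Q146")]),   # Swahili "cat"
]

print(translate("kare", "ha", "sw", LEXEMES))  # prints ['mbwa']
```

Note that the Hausa and Swahili words are never linked to each other directly. Each is only linked to the shared concept, which is why adding Lexemes in one language can make that language translatable to and from every other language that has described the same concepts.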
Workshops with six African communities
Wikimedia Deutschland conducted workshops with six communities. We thank these communities for their participation and contributions: Wikimedians of Tamazight User Group, Hausa Wikimedians User Group, Dagbani Wikimedians User Group, Igbo Wikimedians User Group, Yoruba Wikimedians User Group, and Jenga Wikipedia ya Kiswahili.
We are happy that we were able to support these communities in planning how to use Lexemes for their specific needs, and we are particularly excited to already see the first substantial contributions. We will stay in close contact and foster connections with resources and communities that can provide support.
African languages at Abstract Wikipedia
One byproduct of adding Lexemes that was emphasized to the communities is that Abstract Wikipedia, a project by the Wikimedia Foundation, will be able to connect and support small languages. The communities were encouraged to apply to become seed languages for the development of Abstract Wikipedia. We are happy that all three groups that applied (Hausa, Dagbani and Igbo) were chosen.
Reimagining Wikidata from the margins at WikidataCon
Finally, Wikidata’s infrastructure, content and community can be used to support a better understanding and representation of the diversity of human knowledge. Reimagining Wikidata from the margins is a project that aims to reflect on and understand, together, how to shift and decenter Wikidata’s knowledge. The project will also be presented at this fall’s WikidataCon 2021, where the Wikidata community and partners will discuss the opportunities for sustainable development of Wikidata’s community and technology.