Many knowledge institutions use external identifiers to link their catalogues with Wikidata. Not many people are aware of the effort volunteers put into maintaining Wikidata’s data quality. In this interview, user Epìdosis shares some insights from his eleven years of editing and reconciling libraries’ catalogues on Wikidata.
Hi Camillo/Epìdosis! Thank you for agreeing to share your thoughts and experiences from your years of catalogue work with Wikidata. Can you briefly give an overview of your efforts and contributions to Wikidata, which I believe are all voluntary?
My activity on Wikidata, which has been entirely voluntary with just one exception in the last decade, has many different focuses: patrolling my watchlist daily (more than 100,000 items); matching existing items and creating new items through Mix’n’Match, mainly for library authority files and biographical dictionaries; resolving duplications and conflations of items about humans, starting from the constraint violations of the IDs of library authority files; reconciling Italian controlled vocabularies with Wikidata; periodically revising inconsistencies in reconciliation between the Italian national authority file (SBN) and Wikidata; supervising university projects and datathons on Wikidata; teaching about Wikidata, mainly for Italian librarians and university students, but also high school students in a few cases; and writing scientific articles about Wikidata and its relationship with data produced by libraries.
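To illustrate how constraint violations on authority-file IDs can point to duplications and conflations, here is a minimal sketch in Python. It assumes the statements for one identifier property are already available as (item, external ID) pairs; the function name and the sample IDs are hypothetical, and this is not the tooling Epìdosis uses. An external ID shared by several items resembles a violation of Wikidata's distinct-values constraint and may indicate duplicate items; an item carrying several IDs resembles a single-value violation and may indicate a conflated item. Either pattern can also stem from a mistake in the external database, so the output is only a list of candidates for human review.

```python
from collections import defaultdict


def find_constraint_violations(statements):
    """Group (item_id, external_id) pairs into two candidate lists.

    Returns a tuple (possible_duplicates, possible_conflations):
    - possible_duplicates maps an external ID to the items sharing it;
    - possible_conflations maps an item to the external IDs it carries.
    """
    items_by_id = defaultdict(set)
    ids_by_item = defaultdict(set)
    for item_id, external_id in statements:
        items_by_id[external_id].add(item_id)
        ids_by_item[item_id].add(external_id)

    # Same external ID on more than one item: candidates for merging.
    possible_duplicates = {
        ext: sorted(items) for ext, items in items_by_id.items() if len(items) > 1
    }
    # More than one external ID on one item: candidates for splitting.
    possible_conflations = {
        item: sorted(ids) for item, ids in ids_by_item.items() if len(ids) > 1
    }
    return possible_duplicates, possible_conflations


# Made-up sample data, for illustration only.
sample = [("Q1", "A1"), ("Q2", "A1"), ("Q3", "A2"), ("Q3", "A3")]
duplicates, conflations = find_constraint_violations(sample)
print(duplicates)
print(conflations)
```

Each candidate still needs a manual check against the sources before any merge or split.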
You have made 7.1 million edits so far on Wikidata. Wow! About what percentage of these edits are manual edits, and how much time per day do you usually spend editing Wikidata?
According to NavelGazer, I made 1.9 million of my edits without using any external tool; all of these can be considered manual in a certain sense, although many of them were made easier by gadgets. I usually spend 1-3 hours each day on Wikidata.
About how many libraries or institutions have you reached out to over your past 10 years with Wikidata? What are their usual reactions when you first introduce yourself to them?
I’ve had direct contact with six VIAF members: ICCU, the institution managing the Italian national authority file; NLG, the National Library of Greece; PTBNP, the National Library of Portugal; SUDOC, the network of French academic libraries; SKMASNL, the Slovak National Library; J9U, the National Library of Israel. I’ve had indirect contact with seven others: BNE for Spain, BNC for Catalonia, DNB for Germany, NKC for the Czech Republic, BAV for the Vatican, PLWABN and NUKAT for Poland.
I’ve also had contact with a few dozen smaller libraries in Italy and a few in Greece and Cyprus. Librarians are usually happy to see that the data they produce is appreciated by Wikidata users, who reconcile it with Wikidata and use it as a reference; in most cases, they also appreciate mistake reports. The most frequent issue I encounter is that libraries are often severely understaffed. Only in a very few cases have I succeeded in persuading libraries to introduce editing Wikidata items into their cataloguing routine.
In your years of reconciling libraries’ catalogues with Wikidata, what would you say are the top three issues and challenges faced by editors like yourself and by libraries?
For libraries, I have already mentioned the most relevant one: too few librarians are involved in managing the authority file.
Another issue for libraries: once librarians have added good data on Wikidata, an integrated library system (ILS) usually doesn’t offer any display option which shows data taken from Wikidata to the readers of the catalogue. Only a few libraries have managed to create their own ways of displaying data from Wikidata.
On the Wikidata users’ side, surely the most relevant problem is failures in data round-tripping: contacting libraries to report mistakes and getting them corrected. Many libraries don’t show any contact option; others are very slow in solving issues because they are understaffed. Inefficient data round-tripping also severely damages Wikidata data curation: constraint violation reports are flooded by mistakes in external databases that Wikidata users cannot fix, and are thus much less efficient at showing the solvable mistakes in Wikidata itself.
What advice would you give to Wikidata editors and librarians to overcome some of these challenges?
Having too few librarians dedicated to authority files is often caused by conceiving of authority files as internal materials used only by cataloguers, instead of as public materials potentially useful for readers of the catalogues. Good-quality authority records are crucial to putting the authority files into the network of Linked Open Data. Spreading this new conception of authority files is probably the best way to increase the number of librarians dedicated to their improvement, which would then speed up the workflow concerning mistake reports coming from Wikidata users.
To convince librarians to work directly on Wikidata items, we have to offer them good ways to display in their catalogues the data they add to Wikidata. Such display options are currently very rare and not standardized: we should convince ILS producers to offer each library the option of displaying selected statements from Wikidata in the public interface of authority records, once an authority record is connected to a Wikidata item (see https://www.wikidata.org/wiki/User:Bargioni/AuthorityBox_SBN.js as an example of a Wikidata-generated AuthorityBox).
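To give a rough idea of what "displaying selected statements" could involve on the catalogue side, here is a minimal sketch in Python. It assumes the catalogue has already retrieved the entity in Wikidata's standard JSON layout (claims grouped by property ID, each with a `mainsnak` and a `rank`); the function name, the choice of properties and the sample values are hypothetical, and this is not how the AuthorityBox linked above is implemented. Fetching the entity and resolving item IDs to labels are left out.

```python
def select_statements(entity, wanted_properties):
    """Return {property_id: [values]} for the chosen properties only.

    Deprecated statements and "no value" / "unknown value" snaks are skipped.
    """
    selected = {}
    claims = entity.get("claims", {})
    for prop in wanted_properties:
        values = []
        for statement in claims.get(prop, []):
            if statement.get("rank") == "deprecated":
                continue
            snak = statement.get("mainsnak", {})
            if snak.get("snaktype") != "value":
                continue
            value = snak["datavalue"]["value"]
            if isinstance(value, dict):
                # Items carry "id", dates carry "time", monolingual text
                # carries "text"; other structured values are kept as-is.
                value = value.get("id") or value.get("time") or value.get("text") or value
            values.append(value)
        if values:
            selected[prop] = values
    return selected


# Made-up entity, for illustration only (P214 = VIAF ID, P569 = date of birth).
sample_entity = {
    "claims": {
        "P214": [{"rank": "normal", "mainsnak": {
            "snaktype": "value",
            "datavalue": {"type": "string", "value": "12345"}}}],
        "P569": [{"rank": "normal", "mainsnak": {
            "snaktype": "value",
            "datavalue": {"type": "time",
                          "value": {"time": "+1900-01-01T00:00:00Z"}}}}],
    }
}
print(select_statements(sample_entity, ["P214", "P569", "P106"]))
```

Letting each library pass its own list of properties matches the idea of a per-library option rather than a fixed display.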
Let’s say I’m reconciling a library’s catalogue with Wikidata via Mix’n’Match, and I discover issues from the library’s catalogue. What is the best way to reach out to them? Do I tell them, “Dear National Library X, there is a problem with entity 123 in your catalogue. Please fix it!”? 🙂
In fact, I usually write messages like this! But of course, writing a message presupposes that the library has made contact information available. (Often this is not the case.) If an answer comes relatively soon, and the librarian seems helpful, I reply explaining that I noticed the mistake(s) during my Wikidata activity and asking if they are interested in establishing some sort of deeper collaboration to reconcile their authority file with Wikidata. This approach has been effective on many occasions.
Lastly, would you like to say anything to the community of Wikidata editors like yourself, as a motivation to everyone, when the going gets tough?
Continue insisting and someone will listen to you, sooner or later!
Thank you so much, Epìdosis, for your contributions and inspiring insights.