“Users are talking about a game changer”: Lydia Pintscher on the Wikidata Query Builder

Lydia Pintscher, Product Manager for Wikidata, on a new way to access the free and open knowledge database.

The Query Builder has been called the new “superpower” in the world of Open Data. What are the forces behind it?

Huge amounts of data are available in Wikidata: the population of Berlin, for example, the name of the capital of Paraguay, or the winner of the “Oscar” for best sound editing. The catch is that this data is not very meaningful on its own. What matters more is the knowledge that can be gleaned from it. One question might be: How many people from Asia have won an “Oscar” compared to people from Europe or the U.S.? To answer it, you need to know who has won an “Oscar”, where each winner was born, and on which continent that place lies. The point is to establish links between the data. To do this, you have to run queries against the data in Wikidata, and that is what the Query Builder makes possible.
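The Oscar question can be written as a single SPARQL query that follows the chain from winner to birthplace, country and continent. The sketch below stores it as a Python string, so it can be pasted into the Query Service or sent programmatically. The Wikidata IDs are assumptions and do not come from the interview: P166 (award received), P19 (place of birth), P17 (country), P30 (continent) and Q19020 (Academy Awards). The query also assumes that individual Oscar categories reach Q19020 through "instance of" and "subclass of" links.

```python
# Sketch of the "Oscar winners by continent" question as SPARQL.
# Assumed (not from the interview): all Wikidata IDs below, and that each
# Oscar category links to "Academy Awards" (Q19020) via P31/P279 statements.
OSCARS_BY_CONTINENT = """
SELECT ?continentLabel (COUNT(DISTINCT ?person) AS ?winners) WHERE {
  ?person wdt:P166 ?award .                # award received
  ?award wdt:P31/wdt:P279* wd:Q19020 .     # award is an Oscar category (assumed modeling)
  ?person wdt:P19 ?birthplace .            # place of birth
  ?birthplace wdt:P17 ?country .           # country of that place
  ?country wdt:P30 ?continent .            # continent of that country
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?continentLabel
ORDER BY DESC(?winners)
"""

if __name__ == "__main__":
    print(OSCARS_BY_CONTINENT)
```

A winner born in a country that Wikidata places on two continents is counted once for each continent. The totals can therefore add up to more than the number of winners.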

How were these queries possible before?

Before, you had to master a query language called SPARQL, which is used to query knowledge databases like Wikidata but is also very complex to learn. Anyone working with the data in Wikidata, for example to build an app on top of it, had to learn SPARQL or know someone who knew it. That’s costly and excludes people, which is exactly what we don’t want. We want to give everyone access to knowledge, and that includes the data in Wikidata. The Query Builder is meant to lower this barrier significantly.

How does it work in practice?

The Query Builder offers an interface in which a query can be clicked together. In the background, it translates that query into SPARQL. For example, you can search for data objects in Wikidata that carry a certain statement. The data object for Berlin contains, among other things, the statement “Country: Germany”. In the Query Builder, you could query all data objects that also have the statement “Country: Germany”. Or you could query all people who have won an “Oscar” and were born in this country or on that continent. These conditions can be entered and combined one after the other. The Query Builder cannot yet cover the full potential of SPARQL, but it delivers results that can be used for further work.
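The translation step can be sketched in a few lines. The following Python is a hypothetical illustration and not the Query Builder's actual code. `build_query` and `run_query` are invented names. Each (property, item) condition becomes one triple pattern, and all patterns must match at the same time. The endpoint is the public Wikidata Query Service. P17 (country) and Q183 (Germany) are assumed to encode the “Country: Germany” statement from the Berlin example.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(conditions, limit=100):
    """Turn (property, item) pairs such as ("P17", "Q183") into SPARQL.

    Hypothetical helper: it only mimics the idea of clicking conditions
    together; every condition becomes one triple pattern on ?item.
    """
    if not conditions:
        raise ValueError("at least one condition is required")
    patterns = "\n".join(
        f"  ?item wdt:{prop} wd:{item} ." for prop, item in conditions
    )
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f"{patterns}\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {limit}"
    )


def run_query(sparql):
    """Send the query to the Query Service and return the result bindings."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": sparql, "format": "json"}
    )
    # Wikimedia services ask clients to send a descriptive User-Agent.
    request = urllib.request.Request(
        url, headers={"User-Agent": "query-builder-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    # "Country: Germany", as in the Berlin example (assumed IDs).
    print(build_query([("P17", "Q183")]))
```

Adding a second pair, such as a place-of-birth condition, simply appends another triple pattern. That mirrors how conditions are combined one after the other in the interface.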

In which areas is the Query Builder used and who should use it?

In its first version, the Query Builder is mainly aimed at people who edit Wikidata. A common editing task, for instance, is writing a query that searches specifically for missing data or incorrect records in Wikidata, such as people born in the future. Of course, this could be a fictional character. Or it could be an error. Either way, the query makes it possible to track down such cases without having to search through millions of data objects by hand.
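A maintenance query of this kind might look like the following sketch. It assumes that P569 is Wikidata's "date of birth" property and compares that value with the current time. The function name is hypothetical. The query deliberately does not restrict results to humans, so fictional characters with future birth dates appear alongside genuine errors.

```python
def future_births_query(limit=50):
    """SPARQL for data objects whose date of birth lies in the future.

    Hypothetical helper; assumes P569 = date of birth. Results mix
    fictional characters with records that are simply wrong.
    """
    return f"""SELECT ?item ?itemLabel ?birth WHERE {{
  ?item wdt:P569 ?birth .
  FILTER(?birth > NOW())
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}"""


if __name__ == "__main__":
    print(future_births_query())
```

This query looks at every data object with a date of birth, so it may run into the Query Service's time limit. Adding further conditions narrows it down.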

Is it also possible to use it in scientific contexts?

Wikidata contains many data objects about scientific papers and publications. Many of them are tagged with the topic they cover and the names of their authors. Scientists could use the Query Builder to ask, for example: What publications exist in my field that I may not know about? Or: Which colleagues should I take note of?
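The first of those questions could be expressed roughly as follows. This is a sketch that relies on a few assumptions. P921 ("main subject") is taken to hold the topic, and P50 ("author") is taken to link to author items. `papers_on_topic_query` and its `topic_qid` parameter are hypothetical; `topic_qid` is where you would put the item ID of your own field.

```python
def papers_on_topic_query(topic_qid, limit=100):
    """SPARQL for works whose main subject is topic_qid, with their authors.

    Hypothetical helper; assumes P921 = main subject and P50 = author.
    """
    # Accept only plain item IDs such as "Q123" to keep the query well-formed.
    if not (topic_qid.startswith("Q") and topic_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {topic_qid!r}")
    return f"""SELECT ?paper ?paperLabel ?author ?authorLabel WHERE {{
  ?paper wdt:P921 wd:{topic_qid} ;
         wdt:P50 ?author .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}"""
```

Some publications record their authors only as plain name strings rather than as linked author items. Those publications will not match the `wdt:P50` pattern.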

What is the history of the Query Builder?

The idea has been around for as long as the Wikidata Query Service has existed: ever since it became possible to write SPARQL queries for Wikidata and it became clear how complex that is. In 2015, we started working with a university, designing initial concepts as part of Bachelor’s theses and later a prototype. We learned a lot from this work. What must a user interface look like for it to work for its users? How can certain interaction concepts be made understandable? Two years ago, we started developing the Query Builder, which we were able to publish in 2021 and which is now live on Wikidata.

What have been the reactions so far?

Users call the Query Builder a game changer. Many editors had the problem that they had perhaps painstakingly learned SPARQL themselves but could not encourage others to join in, because SPARQL sounded far too complex to them. With the Query Builder, we have lowered this barrier considerably.

Which extensions are still conceivable?

We will work on implementing even more parts of the SPARQL standard to enable more queries. More visualizations for query results are also conceivable. In the SPARQL interface we already have, you can display results on a map, as a bar chart, or even as an image gallery when searching for images. The Query Builder does not yet fully offer these visualizations.

Which search queries could be possible in the future?

That definitely depends on what the editors ask for. Our development work is largely based on the feedback we get from users. One example I like to use is the query “female heads of state”. It is very complex because you have to look into many different data sets. But it could become possible!
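One source of the complexity is that people are typically linked to concrete offices rather than to the general concept of a head of state. The following sketch shows how a SPARQL property path can climb the subclass hierarchy from an office to that concept. It assumes these IDs: P21 (sex or gender) with Q6581072 (female), P39 (position held), P279 (subclass of) and Q48352 (head of state). The function name is hypothetical, and the real data may be modeled differently.

```python
def female_heads_of_state_query(limit=200):
    """SPARQL sketch for women who held a head-of-state position.

    Hypothetical helper with assumed IDs. The path wdt:P279* follows
    zero or more "subclass of" links, because a concrete office may be
    connected to "head of state" only through several steps.
    """
    return f"""SELECT DISTINCT ?person ?personLabel ?positionLabel WHERE {{
  ?person wdt:P21 wd:Q6581072 ;
          wdt:P39 ?position .
  ?position wdt:P279* wd:Q48352 .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}"""
```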