Category: Interviews

Wikidata and the Next Generation of the Web – Goran Milovanović, Data Scientist for Wikidata @Wikimedia Deutschland

The world without Wikipedia would be a much sadder place. Everyone knows about it and everyone uses it – daily. The dreamlike promise of a great internet encyclopedia, accessible to anyone, anywhere, at any time – for free – has become a reality. Wikidata is a lesser-known member of the Wikimedia family: it serves as the data backend of all Wikimedia projects and fuels Apple’s Siri, Google Assistant, and Amazon’s Alexa, among many other popular and widely used applications and systems.

Wikipedia is one of the most popular websites in the world. It represents everything glorious about the open web, where people share knowledge freely, generating exponential benefits for humanity. Its economic impact can’t be calculated; used by hundreds of millions, if not billions, of people worldwide, it fuels everything from the work of academics to business development.

Wikipedia is far more than just a free encyclopedia we all love. It’s part of the Wikimedia family, which is, in their own words: “a global movement whose mission is to bring free educational content to the world.” To summarize its vision: “Imagine a world in which every single human being can freely share in the sum of all knowledge.”

Not that many people know about Wikidata, which acts as central storage for the structured data of its Wikimedia sister projects, including Wikipedia, Wikivoyage, Wiktionary, Wikisource, and others.

Goran Milovanović is one of the most knowledgeable people I know. I invited him to lecture about the potential of Data Science in public administration at a policy conference I organized five years ago. We remained friends, and I enjoy talking with him about everything that has ever popped into the back of my head. Interested in early Internet development and the role of RAND in immediate postwar America? No worries, he’ll speak about it for 15 minutes in one breath.

Goran earned a Ph.D. in Psychology (a 500+ page dissertation on the topic of Rationality in Cognitive Psychology) from the University of Belgrade in 2013, following two years as a graduate student in Cognition and Perception at NYU in the United States. He spent many years doing research in Online Behaviour, Information Society Development, and Internet Governance, co-authoring five books about the internet in Serbia. He has provided consultancy services in Data Science for Wikidata to Wikimedia Deutschland since 2017, where he is responsible for the full-stack development and maintenance of several analytical systems and reports on this complex knowledge base, and he runs his own Data Science boutique consultancy, DataKolektiv, from Belgrade.

He’s a perfect mixture of humanities, math, and engineering, constantly contemplating the world from a unique perspective that takes so many different angles into account. 

We focused our chat on Wikidata and the future of the web but, as always, touched on many different phenomena and trends.

Before we jump to Wikidata: What is Computational Cognitive Psychology and why were you so fascinated with it?

Computational Cognitive Psychology is a theoretical approach to the study of the mind. We assume that human cognitive processes – processes involved in the generation of knowledge: perception, reasoning, judgment, decision making, language, etc. – can essentially be viewed as computational processes, algorithms running not in silico but on the biologically evolved, physiological hardware of our brains. My journey into this field began when I entered the Department of Psychology in Belgrade, in 1993, following more than ten years of computer programming since the 80s and a short stay at the Faculty of Mathematics. In the beginning, I was fascinated by the power of the very idea, by the potential that I saw in the possible crossover of computer science and psychology. Nowadays, I do not think that all human cognitive processes are computational, and the research program of Computational Cognitive Psychology has a different meaning for me. I would like to see all of its potential fully explored, to know the limits of the approach, and then try to understand, to describe somehow, even if only intuitively, what was left unexplained. The residuum of that explanatory process might represent the most interesting, most significant aspect of being human at all. The part that remains irreducible to the most general scientific concept that we have ever discovered, the concept of computation – that part I believe to be very important. It would tell us something about the direction that the next scientific revolution, the next paradigm change, needs to take. For me, it is a question of philosophical anthropology: what is it to be human? – only driven by an exact methodology. If we ever invent true, general AI in the process, we should treat it as a by-product, as much as the invention of the computer was a by-product of Turing’s readiness to challenge some of the most important questions in the philosophy of mathematics in his work on computable numbers.
For me, Computational Cognitive Psychology, and Cognitive Science in general, do not pose a goal in themselves: they are tools to help us learn something of a higher value than how to produce technology and mimic human thinking.

What is Wikidata? How does it work? What’s the vision?

Wikidata is an open knowledge base, initially developed by parsing the already existing structured data from Wikipedia, then improved by community edits and massive imports of structured data from other databases. It is now the fastest-growing Wikimedia project, recently surpassing one billion edits. It represents knowledge as a graph in which nodes stand for items and values, and the links between them for properties. Such knowledge representations are RDF compliant, where RDF stands for Resource Description Framework, a W3C standard for structured data. All knowledge in systems like Wikidata takes the form of a collection of triples, basic sentences that describe knowledge about things – anything, indeed – at the “atomic” level of granularity. For example, “Tim Berners-Lee is a human” in Wikidata translates to a sentence in which Q80 (the Wikidata identifier for “Tim Berners-Lee”) is P31 (the Wikidata identifier for the “instance of” property of things) of Q5 (the Wikidata identifier for the class of items that are “humans”). So, Q80 – P31 – Q5 is one semantic triple that codifies some knowledge about Sir Timothy John Berners-Lee, who is the creator of the World Wide Web through the invention of the Hypertext Transfer Protocol (HTTP) and the 2016 recipient of the Turing Award. All such additional facts about literally anything can be codified as semantic triples and composed to describe complex knowledge structures: in Wikidata, HTTP is Q8777, WWW is Q466, discoverer or inventor is P61, etc. All triples take the same, simple form: Subject-Predicate-Object. The RDF standard defines, in a rather abstract way, the syntax, the grammar, the set of rules that any such description of knowledge must follow in order to ensure that it will always be possible to exchange knowledge in an unambiguous way, irrespective of whether the exchange takes place between people or computers.
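To make the triple structure concrete, here is a toy sketch in Python. It is purely illustrative, not anything Wikidata itself ships: the identifiers (Q80, Q5, Q8777, Q466, P31, P61) are the real ones mentioned above, but the `match` helper is a made-up miniature that only imitates how a pattern with wildcards works in a query language like SPARQL.

```python
# A toy triple store: each fact is a (subject, predicate, object) tuple.
# Identifiers from the text: Q80 = Tim Berners-Lee, Q5 = human,
# Q8777 = HTTP, Q466 = WWW, P31 = instance of, P61 = discoverer or inventor.
triples = {
    ("Q80",   "P31", "Q5"),    # Tim Berners-Lee is an instance of human
    ("Q8777", "P61", "Q80"),   # HTTP's discoverer or inventor is Tim Berners-Lee
    ("Q466",  "P61", "Q80"),   # the WWW's discoverer or inventor is Tim Berners-Lee
}

def match(s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a wildcard,
    much like a variable in a SPARQL query."""
    return {t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}

# "What did Q80 invent?" -- everything whose P61 points at Q80
inventions = {s for (s, p, o) in match(p="P61", o="Q80")}
```

The same Subject-Predicate-Object pattern, with variables standing in for unknown constituents, is what an actual SPARQL query against Wikidata expresses.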

Wikidata began as a project to support structured data for Wikipedia and other Wikimedia projects, and today represents the data backbone of the whole Wikimedia system. Thanks to Wikidata, knowledge that would otherwise be repeated across Wikipedia and other places can now be served to our readers from a central repository. However, the significance of Wikidata goes way beyond what it means for Wikipedia and its sisters. The younger sister now represents knowledge on almost one hundred million things – called items in Wikidata – and keeps growing. Many APIs on the internet rely on it. Contemporary, popular AI systems like virtual assistants (Google Assistant, Siri, Amazon Alexa) make use of it. Just take a look at the number of research papers published on Wikidata, or using its data to address fundamental questions in AI. By means of the so-called external identifiers – references from our items to their representations in other databases – it represents a powerful structured data hub. I believe Wikidata nowadays has the full potential to evolve into a central node in the network of knowledge repositories online.

Wikidata External Identifiers: a network of Wikidata external identifiers based on their overlap across tens of millions of items in Wikidata, produced by Goran S. Milovanovic, Data Scientist for Wikidata @WMDE, and presented at the WikidataCon 2019, Berlin

What’s your role in this mega system? 

I take care of the development and maintenance of the analytical systems that help us understand how Wikidata is used in Wikimedia websites, what the structure of Wikidata usage is, how human editors and bots approach editing Wikidata, how the use of different languages in Wikidata develops, whether it exhibits any systematic biases that we might wish to correct for, what the structure of the linkage is between Wikidata and other online knowledge systems connected to it by means of external identifiers, how many pageviews we receive across the Wikidata entities, and much more. I am also developing a system that tracks Wikidata edits in real time and informs our community if there is any online news relevant to the items that are currently undergoing many revisions. It is the type of position known as a generalist in the Data Science lingo; in order to be able to do all these things for Wikidata I need to stretch myself quite a bit across different technologies, models, and algorithms, and be able to keep them all working together, consistently, in a non-trivial technological infrastructure. It is also a full-stack Data Science position where most of the time I implement the code in all development phases: from the back end, where data acquisition (the so-called ETL) takes place in Hadoop, Apache Spark, and SPARQL, through machine learning, where various, mostly unsupervised, learning algorithms are used, to the front-end development where we finally serve our results in interactive dashboards and reports, and finally production in virtualized environments. I am a passionate R developer and I tend to use the R programming language consistently across all the projects that I manage; however, it ends up being pretty much a zoo in which R co-exists with Python, SQL, SPARQL, HiveQL, XML, JSON, and other interesting beings as well.
It would be impossible for a single developer to take control of the whole process if there were no support from my colleagues in Wikimedia Deutschland and the Data Engineers from the Wikimedia Foundation’s Analytics Engineering team. My work on any new project feels like solving a puzzle; I face the “I don’t know how to do this” situation every now and then; I learn constantly, and the challenge is so motivating that I truly doubt there are many Data Science positions as interesting as this one. It is a very difficult position, but also a professionally very rewarding one.

If you were to explain Wikidata as a technological architecture, how would you do it in a few sentences? 

Strictly speaking, Wikidata is a dataset. Nothing more, nothing less: a collection of data represented so as to follow important standards that make it interoperable – usable in any imaginable context where it makes sense to codify knowledge in an exact way. Then there is Wikibase, a powerful extension of the MediaWiki software that runs Wikipedia as well as many other websites. Wikibase is where Wikidata lives, and where it is served from whenever anything else – a Wikipedia page, for example – needs it. But Wikibase can run any other dataset that complies with the same standards as Wikidata, of course, and Wikidata can inhabit other systems as well. If by the technological architecture you mean the collection of data centers, software, and standards that lets Wikidata join in wherever Wikipedia and other Wikimedia projects need it – well, I assure you that it is a huge and rather complicated architecture underlying that infrastructure. If you imagine all possible uses of Wikidata, external to the Wikimedia universe, run in Wikibase or otherwise… then… it is the sum of all technology relying on one common architecture of knowledge representation, not of the technologies themselves.

How does Wikidata overcome different language constraints and barriers? It should be language-agnostic, right?

Wikidata is and is not language-agnostic at the same time. It would be best to say that it is aware of many different languages in parallel. At the very bottom of its toy box full of knowledge, we find abstract identifiers for things: Q identifiers for items, P identifiers for properties, L for lexemes, S for senses, F for forms… but those are just identifiers, and yes, they are language-agnostic. But the things represented in Wikidata do not only have identifiers: they have labels, aliases, and descriptions in many different languages too. Moreover, we have tons of such terms in Wikidata currently: take a look at my Wikidata Languages Landscape system for a study and an overview of the essential statistics.
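The separation between abstract identifiers and language-specific terms can be sketched as follows. This is a simplified illustration of the idea, not Wikidata's actual data model or API; the label strings are examples of the kind of terms attached to the real identifiers Q80 and Q5.

```python
# Language-agnostic identifiers with language-specific terms attached.
# The structure is a sketch: each abstract Q identifier carries a map
# of labels keyed by language code (illustrative values).
labels = {
    "Q80": {
        "en": "Tim Berners-Lee",
        "de": "Tim Berners-Lee",
    },
    "Q5": {
        "en": "human",
        "de": "Mensch",
    },
}

def label(entity_id, language, fallback="en"):
    """Render an abstract identifier in a given language, falling back
    to English, and to the bare identifier if no label exists at all."""
    terms = labels.get(entity_id, {})
    return terms.get(language, terms.get(fallback, entity_id))
```

The knowledge itself (the triples over Q identifiers) never changes when you switch languages; only the rendering layer does, which is exactly what makes the base language-agnostic and language-aware at once.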

What are knowledge graphs and why are they important for the next generation of the web?

They are important for this generation of the web too. To put it in a nutshell: graphs allow us to represent knowledge in the most abstract and most general way. They are simply very suitable for describing things and the relations between them in a way that is general, unambiguous, and in a form that can quickly evolve into new, different, alternative forms of representation that are necessary for computers to process it consistently. By following common standards in graph-based knowledge representation, like RDF, we can achieve at least two super important things. First, we can potentially relate all pieces of our knowledge, connect anything that we know at all, so that we can develop automated reasoning across vast collections of information and potentially infer new, previously undiscovered knowledge from them. Second, interoperability: if we all follow the same standards of knowledge representation, and program the APIs that interact over the internet so as to follow that standard, then anything online can easily enter any form of cooperation. All knowledge that can be codified in an exact way thus becomes exchangeable across the entirety of our information processing systems. It is a dream, a vision, and we find ourselves quite far away from it at the present moment, but one well worth pursuing. Knowledge graphs just turn out to be the most suitable way of expressing knowledge in a form suited to achieving these goals. I have mentioned semantic triples, sentences of the Subject-Predicate-Object form, that we use to represent the atomic pieces of knowledge in this paradigm. Well, knowledge graphs are just sets of connected constituents of such sentences. When you have one sentence, you have a miniature graph: a Subject points to an Object. Now imagine having millions, billions of sentences that can share some of their constituents, and a serious graph begins to emerge.
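That last point – sentences sharing constituents fusing into one large graph – can be sketched in a few lines of Python. This is a toy model of my own construction, reusing the identifiers mentioned earlier, not how Wikidata actually stores its graph.

```python
from collections import defaultdict

# Three "sentences" (triples) that share constituents: Q80 appears in
# all of them, so the three miniature graphs merge into one.
# Q80 = Tim Berners-Lee, Q5 = human, Q8777 = HTTP, Q466 = WWW;
# P31 = instance of, P61 = discoverer or inventor.
triples = [
    ("Q80",   "P31", "Q5"),
    ("Q8777", "P61", "Q80"),
    ("Q466",  "P61", "Q80"),
]

# Each triple becomes a labeled, directed edge in an adjacency map.
graph = defaultdict(list)
for s, p, o in triples:
    graph[s].append((p, o))

def reachable(start):
    """Everything connected to `start` by following edges forward -
    a minimal form of traversing a knowledge graph."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(o for _, o in graph[node])
    return seen
```

Starting from Q8777 (HTTP), a traversal reaches Q80 and then Q5: two unrelated-looking sentences have already composed into a small chain of knowledge, which is the mechanism that, at the scale of billions of triples, produces a serious graph.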

A part of the knowledge graph for English (Q1860 in Wikidata)

Where do you think the internet will go in the future? What’s Wikidata’s role in that transformation?

The future is, for many reasons, always a tricky question. Let’s give it a try. On the role of Wikidata, I think we have clarified that in my previous responses: it will begin to act as a central hub of Linked Open Data sooner or later. On the future of the Internet in general, talking from the perspective of the current discussion solely: I do not think that semantic web standards like RDF will ever reach universal acceptance, and I do not think that needs to happen for us to enter the stage of internet evolution where complex knowledge interacts almost seamlessly all over the place. It is desirable, but not necessary, in my opinion. Look at the de facto situation: instead of evolving towards one single, common standard of knowledge and data representation, we have a connected network of APIs and data providers exchanging information by following similar enough, easily learnable standards – enough not to make software engineers and data scientists cry. Access to knowledge and data will become easier, and governments and institutions will increasingly begin to share more open data, improving data quality along the way. It will become a matter of good manners and prestige to do so. Data openness and interoperability will become one of the most important development indicators, tightly coupled with questions of fundamental human rights and freedoms. To have your institution’s open data served via an API and offered in different serializations that comply with the formal standards will become as expected as publishing periodicals on your work is now. Finally, the market: more data, more ideas to play with.

A lot of your work relies on technologies used in natural language processing (NLP), typically handling language at scale. What are your impressions of OpenAI’s GPT-3, which has generated quite a buzz recently?

It is fascinating, except that it works better at language production that can fool someone than at language production that exhibits anything like the traces of human-like thinking. Contemporary systems like GPT-3 make me wonder whether the Turing test was ever a plausible test to detect intelligence in something – I always knew there was something I didn’t like about it. Take a look, for example, at what Gary Marcus and Ernest Davis did to GPT-3 recently: GPT-3, Bloviator: OpenAI’s language generator has no idea what it’s talking about. It is a clear example of a system that does everything to language except understand it. Its computational power and the level up to which it can mimic the superficial characteristics of the language spoken by a natural speaker are fascinating. But it suffers – quite expectedly, I have to add – from a lack of understanding of the underlying narrative structure of the events and processes it needs to describe, and of the complex interaction of language semantics and pragmatics that human speakers face no problems with. The contemporary models in NLP are all essentially based on an attempt to learn the structure of correlations between linguistic constituents of words, sentences, and documents, and that similarity-based approach has very well known limits. It was Noam Chomsky who in the late 50s – yes, the 50s – tried to explain to the famous psychologist B. F. Skinner that just observing statistical data on the co-occurrences of various constituents of language will never provide a representational structure powerful enough to represent and process natural language. Skinner didn’t seem to care back then, and neither do the fans of the contemporary Deep Learning paradigm, which is essentially doing exactly that, just in a way orders of magnitude more elaborate than anyone ever tried. I think we are beginning to face the limits of that approach with GPT-3 and similar systems.
Personally, I am more worried about the possible misuse of such models to produce fake news and fool people into silly interpretations and decisions based on false, simulated information than about whether GPT-3 will ever grow up to become a philosopher, because it certainly will not. It can simulate language, but only its manifest characteristics; it is not a sense-generating machine. It does not think. For that, you need some strong symbolic, not connectionist, representation, engaged in the control of associative processes. Associations and statistics alone will not do.

Do the humanities have a future in the algorithmic world? How do you see the future of the humanities in a fully data-driven world?

First, a question: is the world ever going to be fully data-driven, and what does that mean at all? Is a data-driven world one in which all human activity is passivized and all our decisions transferred to algorithms? It is questionable whether something like that is possible at all, and I think that we all already agree that it is certainly not desirable. While the contemporary developments in Data Science, Machine Learning, AI, and other related fields are really fascinating, and while our society is becoming more and more dependent upon the products of such developments, we should not forget that we are light years away from anything comparable to true AI, sometimes termed AGI (Artificial General Intelligence). And I imagine only true AI would be powerful enough to run the place so that we can take a permanent vacation. But then comes the ethical question, one of immediate and essential importance: would such systems, if they ever come into existence, be able to judge human action and act upon society in a way we as their makers would accept as moral? And only then comes the question of whether we want something like that in the first place: wouldn’t it be a bit boring to have nothing to do, and to go meet your date because something smarter than us has invented a new congruence score and started matching people while following an experimental design for further evaluation and improvement?

Optimization is important, but it is not necessarily beautiful. Many interesting and nice things in our lives and societies are deeply related to the fact that there are real risks that we need to take into account, because irreducible randomness is present in our environment. Would an AI system in full control prevent us from trying to conquer Mars because it is dangerous? Wait: discovering the Americas, the radioactive elements, and the flight to the Moon were dangerous too! I can imagine that humanity would begin to invent expertise in de-optimizing things and processes in our environments if some fully data-driven and AI-based world were ever to come into existence. Any AI in a data-driven world that we can imagine nowadays can be no more than our counselor, unless the edge case in which it develops true consciousness turns out to be a realistic scenario. If that happens, we would have to seriously approach the question of how to handle our relationship to AI and its power ethically.

I do not see how the humanities could possibly be jeopardized in this world that is increasingly dependent on information technologies, data, and automation. To understand and discover new ways to interpret the sense-generating narratives of our lives and societies was always a very deep, very essential human need. Does industrialization, including our own fourth age of data and automation, necessarily conflict with a world aware of the Shakespearean tragedy, as it does in Huxley’s “Brave New World”? I don’t think it is necessarily so. I enjoy the dystopian discourse very much because I find that it offers so many opportunities to reflect upon our possible futures, but I do not see us living in a dystopian society anytime soon. It is just that somehow the poetics of the dystopian discourse are well aligned, correlated with the current technological developments; but if there ever was a correlation from which we should not infer any fatalistic causation, that is the one.

AI and the Government, UK Experience – Sébastien Krier (UK)

Automation is not just replacing (repetitive) jobs; it’s after the government as well, and computer algorithms are already widely deployed across many public sector operations, in many cases without the awareness and understanding of the general population. What’s happening with AI in government, and how does government affect and interact with the development of technologies like artificial intelligence? We wanted to dig deeper into this subject through a conversation with an AI Policy expert who had a notable first-row seat in all of this while working for the Office for AI, a special UK government agency responsible for overseeing the implementation of the United Kingdom’s AI strategy.

The role of government is to serve its citizens. In doing so it uses the latest technologies, becoming smarter and more accessible, even interactive, through its technological updates. Not so long ago, it was the government and government-funded organizations that were inventing new technologies. While it seems that R&D power is shifting to so-called “big tech” companies, the relationship between government (and politics) and technology is of both existential and practical importance for all. The government is there to – at least – initiate, oversee, and regulate. One of the most interesting and important technological developments is artificial intelligence. There are really high hopes around it, with many believing that the AI revolution will be bigger than the Agricultural, Industrial, and Digital revolutions. What is the role of government in that process?

The world is changing fast, and so is the government. With the digital and computer revolution and the advent of the internet and the world wide web, the government is going through one of the most significant transformations in its history. “Software is eating the world” (Marc Andreessen); it is eating the government too. The $400 billion #govtech market (Gartner) emerged with startups and other companies building technologies for the government, improving many aspects of it. Some estimates say it will hit a trillion dollars by 2025. From a historical perspective, it’s probably just a start. The future, digital-first government will probably look totally different from what it used to be and what it is now.

New realities create new fields of study and practice that did not exist before. One of those fields is AI Policy, the field primarily interested in the intersection of AI and government. The United Kingdom is leading the way in all of this in many respects. In the case of AI technologies, it’s the birthplace of some of the most important AI research companies in the world, DeepMind being just one of them. Its higher education system, traditionally among the best in the world, produces scientific leaders and researchers. And if you want to seriously study the long-term effects of a technology like artificial intelligence on society and humanity at large, you’ve already stumbled upon Oxford’s Future of Humanity Institute. How is the UK government approaching artificial intelligence?

Sébastien is an AI Policy expert. After graduating from UCL and spending quite some time in law, Sébastien joined the British government’s Office for Artificial Intelligence – a joint government unit responsible for designing and overseeing the implementation of the United Kingdom’s AI strategy – as an Adviser in 2018. He now helps public and private organizations design strategies and policies that maximize the benefits of AI while minimizing potential costs and risks.

Sébastien’s former role involved designing national policies to address novel issues such as the oversight of automated decision-making systems and the responsible design of machine learning solutions. He led the first comprehensive review of AI in the public sector and has advised foreign delegations, companies, regulators, and third sector organizations. Sébastien has also represented the UK at various panels and multilateral organizations such as the D9 and EU Commission. 

He is the perfect person to talk to about all things AI and government. We had a chat with him about AI and government in the UK.

You spent quite some time working at the Office for AI, the UK government. Can you tell us more about the purpose and the work of that government agency? How does the UK government approach artificial intelligence? 

The Office for AI is a joint-unit between two ministries – BEIS and DCMS – and is responsible for overseeing the implementation of the £1bn AI Sector Deal. The AI Sector Deal was the Government’s response to an independent review carried out by Professor Dame Wendy Hall and Jérôme Pesenti. Our approach was therefore shaped by these commitments and consisted of the following workstreams:

Leadership: the aim here was to create a stronger dialogue between industry, academia, and the public sector through the Government establishing the AI Council.

Adoption: the aim of this work was to drive public and private sector adoption of AI and Data technologies that are good for society. This included a number of activities, such as the publication of A Guide to Using AI in the Public Sector, which includes a comprehensive chapter on safety and ethics drafted by Dr. David Leslie at the Alan Turing Institute. 

Skills: given the gap between demand and supply of AI talent, we worked on supporting the creation of 16 new AI Centres for Doctoral Training at universities across the country, delivering 1,000 new PhDs over the next five years. We also funded AI fellowships with the Alan Turing Institute to attract and retain the top AI talent, as well as an industry-funded program for AI Masters.

Data: we worked with the Open Data Institute to explore how the Government could facilitate legal, fair, ethical, and safe data sharing that is scalable and portable to stimulate innovation. You can read about the results of the first pilot programs here.

International: our work sought to identify global opportunities to collaborate across jurisdictional boundaries on questions of AI and data governance, and to formulate governance measures that have international traction and credibility. For example, I helped draft the UK-Canada Joint-Submission on AI and Inclusion for the G7 in 2018.

Note that the Government also launched the Centre for Data Ethics and Innovation, which was tasked by the Government to research, review, and recommend the right governance regime for data-driven technologies. They’re great and I recommend checking out some of their recent outputs, but I do think they’d benefit from being truly independent of Government.

How would you define “AI Policy“? It’s a relatively new field and many still don’t properly understand what the government has to do with AI. 

I like 80,000 Hours’ definition: AI policy is the analysis and practice of societal decision-making about AI. It’s a broad term, but that’s what I like about it: it touches on different aspects of governance and isn’t necessarily limited to the central government. There are many different areas a Government can look at in this space – for example:

  • How do you ensure regulators are adequately equipped and resourced to properly scrutinize the development and deployment of AI?
  • To what extent should the Government regulate the field and incentivize certain behaviors? See, for example, the EC’s recent White Paper on AI.
  • What institutional mechanisms can ensure long-term safety risks are mitigated? 
  • How do you enable the adoption and use of AI? For example, what laws and regulations are needed to ensure self-driving cars are safe and can be rolled out? 
  • How do you deal with facial recognition technologies and algorithmic surveillance more generally? 
  • How do you ensure the Government’s own use of AI, for example, to process visa applications, is fair and equitable?

I recommend checking out this page by the Future of Life Institute, which touches on a lot more than I have time to do here!

The UK is home to some of the most advanced AI companies in the world. How does the government include them in the policy-making processes? How exactly does the government try to utilize their work and expertise?

The Hall-Pesenti Review mentioned earlier is an example of how the Government commissioned leaders in the field to provide recommendations to the Government. Dr. Demis Hassabis was also appointed as an adviser to the Office for AI.

CognitionX co-founder Tabitha Goldstaub was asked last year to chair the new AI Council and become an AI Business Champion. The AI Council is a great way to ensure the industry’s expertise and insights reach the Government. It’s an expert committee drawn from the private, public, and academic sectors, advising the Office for AI and government. You can find out more about them here.

The government decides to implement a set of complex algorithms in the public sector, in whatever field. How does it happen in most cases? Do companies pitch their solutions first, or does the government explicitly seek solutions to well-defined problems? How does AI in the public sector happen?

That’s a good question. It really depends on the team, the department, the expertise available, and the resources available. To be honest, people overestimate how ready Government departments are to actually develop and use AI. Frequently they’ll buy products off the shelf (which comes with a host of issues around IP and data rights).

Back in 2019 I helped lead a review of hundreds of AI use cases in the UK Government and found that while there are some very high-impact use cases, there are also a lot of limitations and barriers. AI Watch recently published its first report on the use and impact of AI in public services in Europe. They found limited empirical evidence that the use of AI in government is successfully achieving the intended results.

The procurement system is also quite dated and not particularly effective in bringing in solutions from SMEs and start-ups, which is why the Government Digital Service launched a more nimble technology innovation marketplace, Spark. The Office for AI also worked with the WEF to develop Guidelines for AI Procurement.

Part of your work was focused on educating others working in the government and the public sector about AI and its potential and challenges. How does the UK government approach the capacity building of its public officials and government employees? 

There are various initiatives that seek to upskill and ensure there’s the right amount of expertise in Government. As part of the 2018 Budget, the Data Science Campus at the ONS and the GDS were asked to conduct an audit of data science capability across the public sector, to “make sure the UK public sector can realize the maximum benefits from data”. There are also specific skills frameworks for data science-focused professions. Ultimately though I think a lot more should be done. I think a minimum level of data literacy will be increasingly necessary for policymakers to properly understand the implications new technologies will have on their policy areas.

The recently published National Data Strategy also finds that “The lack of a mature data culture across government and the wider public sector stems from the fragmentation of leadership and a lack of depth in data skills at all levels. The resulting overemphasis on the challenges and risks of misusing data has driven chronic underuse of data and a woeful lack of understanding of its value.”

What is your favorite AI in the public sector use case in the UK (or anywhere) – and why?

One of my favorite use cases is how DFID used satellite images to estimate population levels in developing countries: this was the result of close collaboration between the Government, academia, and international organizations. And this is exactly how AI should be developed: through a multidisciplinary team.

Outside of the UK, I was briefly in touch with researchers at Stanford University who collaborated with the Swiss State Secretariat for Migration to use AI to better integrate asylum seekers. The algorithm assigns asylum seekers to the cantons that best fit their skills profiles, rather than allocating them randomly, as under the current system. That’s an impactful example of AI being used by a Government, and in fact, I think Estonia is trialing similar use cases.

On the nerdier side, Anna Scaife (who was one of the first Turing AI Fellows) published a fascinating paper where a random forest classifier identified a new catalog of 49.7 million galaxies, 2.4 million quasars, and 59.2 million stars!
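The classification approach in work like this is, at its core, a supervised learning problem: train on sources with known labels, then predict classes for the unlabeled catalog. Here is a minimal scikit-learn sketch of that idea using synthetic stand-in data – the feature values, class centers, and counts are invented for illustration and are not from Scaife’s paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for photometric features of three source classes:
# 0 = galaxy, 1 = quasar, 2 = star (centers are arbitrary, illustrative only).
centers = np.array([[0.8, 1.2], [0.2, 0.4], [1.5, 0.3]])
X = np.vstack([centers[k] + 0.15 * rng.standard_normal((500, 2)) for k in range(3)])
y = np.repeat([0, 1, 2], 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A random forest votes across many decision trees, each fit on a bootstrap
# sample of the training data with random feature subsets.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
```

In the real survey setting, the labeled training set typically comes from sources with spectroscopic confirmations, and the trained model is then applied to the millions of photometric-only sources.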

What are the hottest AI regulation dilemmas and issues in the UK at this moment?

Until recently, the key one was how to govern/oversee the use of facial recognition. Lord Clement-Jones, the chair of the House of Lords Artificial Intelligence Committee, recently proposed a bill that would place a moratorium on the use of facial recognition in public places. That won’t pass but it’s a strong signal that the Government should consider this issue in more detail – and indeed the CDEI is looking into this.

But with the A-levels scandal in the UK, I think there is a growing acknowledgment that there should be more oversight and accountability on how public authorities use algorithms.

You’ve spent quite some time collaborating with European institutions. Can you tell us more about AI policy approaches and strategies on the European level? What is happening there, what’s the agenda? 

The European approach is a lot more interventionist so far. There are some good proposals, and others I’m less keen on. For example, I think a dualistic approach to low-risk and high-risk AI is naïve. Defining risk (or AI) will be a challenge, and a technology-neutral approach is unlikely to be effective (as the European Parliament’s JURI committee also notes).

It’s better to focus on particular use cases and sectors, like affect recognition in hiring or facial recognition in public spaces. I also think that it’s dangerous to have inflexible rules for a technology that is very complex and changes rapidly.  

Still, I think it’s encouraging they’re at least exploring this area and soliciting views from industry, academia, and the wider public. 

As for their general approach, it’s worth having a look at the white papers on AI and data they published back in February.

What happens when something goes wrong, for example, there is major harm, or even death, when the AI system is used for government purposes? Who is responsible? How should governments approach the accountability challenge?

It’s very difficult to say without details on the use case, context, algorithm, human involvement, and so on. And I think that illustrates the problem with talking about AI in a vacuum: the details and context matter just as much as the algorithm itself. 

In principle, the Government remains liable, of course. Just because a system learns over time, doesn’t require human involvement, or cannot be scrutinized because of black-box issues doesn’t mean the usual product liability and safety rules don’t apply.

Outside the public sector context, the European Commission is seeking views on whether and to what extent it may be needed to mitigate the consequences of complexity by alleviating/reversing the burden of proof. Given the complexity of a supply chain and the algorithms used, it could be argued that additional requirements could help clarify faults and protect consumers. 

The European Parliament’s JURI committee’s report on liability is actually well worth reading, with interesting discussions on electronic personhood and why trying to define AI for regulatory purposes is doomed to fail. They also find that product liability legislation needs to be amended for five key reasons:

  1. The scope of application of the directive does not clearly cover damage caused by software, or damage caused by services.
  2. The victim is required to prove the damage suffered, the defect, and the causal nexus between the two, without any duty on the producer to disclose relevant information. This is even harder for AI technologies.
  3. Reference to the standard of “reasonableness” in the notion of defect makes it difficult to assess the right threshold for new technologies with little precedent or societal agreement. What is to be deemed reasonable when technologies and use cases evolve faster than they are understood?
  4. Damages recoverable are limited, and the €500 threshold means a lot of potential claims are excluded.
  5. Technologies pose different risks depending on use: e.g. FRT for mass surveillance or FRT for smartphone face unlocks. Therefore, there is a need for a “sector-specific approach that does not prioritize the technology, but focuses on its application within a given domain”.

How would you define “AI Ethics” and “AI Safety”? And how are governments shaping the development and deployment of AI systems in ways that are safe and ethical? Which policy instruments are used for that?

The definition we used in our safety & ethics guidance with the Alan Turing Institute defines ethics as “a set of values, principles, and techniques that employ widely accepted standards of right and wrong to guide moral conduct in the development and use of AI technologies”.

It’s tricky to define comprehensively, since it could relate to so many aspects: for example, the use case itself – e.g. is it ethical for a bank to offer more favorable terms to people with hats if the data shows they’re less likely to default? There are also questions about mathematical definitions of fairness and which ones we value in a particular context: see for example this short explanation.
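To make those competing fairness definitions concrete, here is a toy calculation of two common metrics – the demographic parity gap and the (true-positive-rate part of the) equalized odds gap – for a hypothetical binary classifier. All the group labels, outcomes, and predictions are invented for illustration:

```python
import numpy as np

# Hypothetical data: protected-group membership, true outcome, model prediction.
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 1, 1, 0, 0, 0])

def selection_rate(g):
    """Fraction of group g that receives a positive prediction."""
    return y_pred[group == g].mean()

def true_positive_rate(g):
    """Fraction of group g's actual positives that the model catches."""
    mask = (group == g) & (y_true == 1)
    return y_pred[mask].mean()

# Demographic parity: do both groups get positive predictions at the same rate?
dp_gap = abs(selection_rate(0) - selection_rate(1))

# Equalized odds (TPR component): are actual positives detected equally often?
tpr_gap = abs(true_positive_rate(0) - true_positive_rate(1))

print(f"demographic parity gap: {dp_gap:.2f}")   # 0.50 on this toy data
print(f"true positive rate gap: {tpr_gap:.2f}")  # 0.50 on this toy data
```

The well-known tension is that, outside of degenerate cases, a classifier generally cannot satisfy all such metrics at once, so choosing which gap to minimize is itself a value judgment about the context.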

Safety to me relates more to questions of accuracy, reliability, security, and robustness. For example, some adversarial attacks on machine learning models maliciously modify input data – how do you protect against that? 
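For intuition on how small those malicious modifications can be: in the fast-gradient-sign attack (Goodfellow et al.), each input feature is nudged by a small ε in the direction of the loss gradient’s sign. For a linear model the gradient of the score is just the weight vector, so the attack has a closed form. A minimal numpy sketch – the weights and inputs are invented for illustration:

```python
import numpy as np

# A toy linear classifier: predict positive if w.x + b > 0.
w = np.array([0.6, -0.8, 0.3])
b = -0.1

def predict(x):
    return int(w @ x + b > 0)

x = np.array([0.5, 0.2, 0.4])  # score = 0.16, so classified positive

# FGSM against a linear model: the score's gradient w.r.t. x is just w,
# so stepping each feature by -eps * sign(w) maximally lowers the score
# for a given per-feature perturbation budget eps.
eps = 0.2
x_adv = x - eps * np.sign(w)

print(predict(x), predict(x_adv))  # the small perturbation flips the prediction
```

Defenses like adversarial training (augmenting the training set with such perturbed examples) exist, but robustness remains an open research area, which is exactly why I group it under safety rather than ethics.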

Do you ever think about the role of AI in the long-term future of government, when technology improvement potentially accelerates exponentially? What do you contemplate when lost in thinking about decades to come? 

Definitely. In fact, the book that initially got me into AI was Nick Bostrom’s Superintelligence. To me, this is an important part of AI safety: preparing for low-probability but high-impact developments. Rapid acceleration can come with a number of dilemmas and problems, like an intelligence explosion leading to superintelligence and the well-documented control problem, where we get treated by machines the same way we treat ants: not with any malign intent, but without really thinking about their interests if they don’t align with our objectives (like a clean floor). On this, I highly recommend Human Compatible by Stuart Russell. On superintelligence, I actually found Eric Drexler’s framing of the problem a lot more intuitive than Bostrom’s (see Reframing Superintelligence).

Horizon scanning and forecasting are two useful tools for Governments to monitor the state of AI R&D and AI capabilities – but sadly this type of risk is rarely on Government’s radar. And yet it should be – precisely because there are fewer private-sector incentives to get this right. But there are things Governments are doing that are still helpful in tackling long-term problems, even though this isn’t necessarily the primary aim. 

There was a recent Twitter spat between AI giants at Facebook and Tesla on this actually. I don’t really buy Jerome Pesenti’s arguments: no one claims we’re near human-level intelligence, and allocating some resources to these types of risks doesn’t necessarily mean ignoring other societal concerns around fairness, bias, and so on. Musk on the other hand is too bullish.

What can and should governments in Serbia and the Balkans region learn from the UK’s AI policy experience? Can you share three simple recommendations?

That’s a good but difficult question, particularly as I have limited knowledge of the state of affairs on the ground!

I think firstly there is a need for technology-specific governance structures – a point one of my favorite academics, Dr. Acemoglu, emphasized during a recent webinar. A team like the Office for AI can be a very helpful central resource, but only if it is sufficiently funded, has the right skill set, and is empowered to consider these issues effectively.

Second, there should be some analysis to identify key gaps in the AI ecosystem and how to fix them. This should be done in close partnership with academic institutions, the private sector, and civil society. In the UK, very early on the focus was essentially on skills and data sharing. But there are so many other facets to AI policy: funding long-term R&D, or implementing open data policies (see e.g. how TfL opening up its data led to lots of innovations here, like CityMapper).

Lastly, I really liked Estonia’s market-friendly AI strategy, and I think a lot of it can be replicated in Serbia and neighboring countries. One particular aspect I think is very important is supporting measures for the digitalization of companies in the private sector. It’s important for markets to not only be adequately equipped from a technological point of view but also to fully understand AI’s return on investment. 

Careers in AI policy are relatively new. Can you recommend the most important readings for those interested in learning more?

The 80,000 Hours Guide on this is excellent, and so is this EA post on AI policy careers in the EU. I was recently appointed as an Expert to a fantastic new resource, AI Policy Exchange – I think it’s super promising and highly recommend it. Lastly, definitely check these amazing newsletters, in no particular order: