Interview: Wonders and Challenges of Synthetic Futures (We’re Already In) / Henry Ajder

Many dystopian visions of the future share “the reality problem”, where digital beings and realities are everywhere, indistinguishable from “the real world”. Advanced technologies built today bring many of these fantasies (and nightmares) closer, if they are not already here. Are digital, synthetic human-like creatures going to conquer our social lives? What is synthetic media, why is it revolutionary, and are so-called deepfakes a threat to national security and to our sanity and wellbeing? How, and who, is “protecting us”? To find out more, I spoke with Henry Ajder, one of the leading researchers on synthetic media, the magical tech that is transforming both our understanding of ourselves and our social lives.

AI technologies are just kicking in. And we’re still far from the digital-first world

The AI field itself is just several decades old and has gone through different cycles over time, made of hypes, winters, high hopes, new winters, then all over again, with many new approaches and techniques emerging along the way. It is constantly evolving, and one technique in particular, so-called deep learning, is most responsible for recent breakthroughs. One of the most thought-provoking, but also consequential, applications of AI tech is generative media, where machines and algorithms create media and “things” that don’t exist but seem and act real. A simple face swap today, the banal use case, can become true madness just around the corner. Things are getting pretty messy even today.

Many extremely smart people suspect that the invention of super-advanced AI will be the single most important event in human history, so much so that some go even further, proclaiming it the number one existential risk for humanity. Something so strong and intelligent beyond human comprehension, with its own will, might not be controllable by us mortals, and we might lose and suffer, in silly ways. Utopians see AGI, artificial general intelligence, the point when machine intelligence surpasses ours in all areas, as the most beneficial discovery ever: that’s when flourishing will be unstoppable, and unimaginable possibilities will be popping up everywhere. Solving the biggest problems and creating wealth for all, so the world can leap forward exponentially. No one knows how and when, but we’re moving towards it, fast.

And then, rather unexpectedly it feels, Facebook rebrands to Meta, trying to lead the deeper transformation in which we collectively start living our lives in a setting that is digital-first. They call it the metaverse, trying to brand what might become the next computational paradigm, although no one has a perfect definition yet. Instead of us using our fingers and staring at pocket devices, the experience will be so much more immersive. Black Mirror stuff. How exactly? Well, no one knows.

We are still very early in all of this. Software is running our contemporary lives, but most of the software is still dumb in many ways and instances, and so many of our life experiences are not digital-first. There are too many frictions. And the whole metaverse pitch is also confusing. There is this atypical understanding of it though, which puts the focus on time. Maybe we are there already? Same as cyborgs, more or less. Can you imagine living without a phone – and Google –  as your mind extension? It’s just that all of this will be thousands of times better in the next 50 years.

Wait: what is real?

“What is real?” is one of the questions philosophers have racked their brains over most often since the dawn of philosophy as the discipline of thinking about the world, us, and nature. Up until recently, our eyes, for example, were enough to tell the difference.

These people, for example, are not real. They look nice, don’t they? Their faces look cute, even familiar, definitely real, but are made by computers, algorithms. Now imagine having a Zoom call with some of them, if they could speak intelligently. Could you fall in love, or make a deal and wire 1000 bucks? Have you seen A, B, or C? We’re not there yet, but we are getting closer. If it’s possible to literally create people, many strange things can happen. 

Disinformation is the first to come to mind. Just the ability to create content that looks real, whether video, photo, or audio, and the power to impersonate someone saying or doing something that hasn’t happened, creates threats that can lead to deaths, chaos, or even wars. Imagine a fake video in which Trump says something really nasty about an Arab leader, or a video of someone pressing the nuclear red button. Wicked.  

The threat is real. The possibilities are miraculous!


What’s actually happening? Where is this generative media tech applied today, and what are folks building? How is it regulated, if there’s a need? How can we be sure that what we see on YouTube isn’t false?

Henry Ajder is one of the leading researchers and thinkers in this exciting field, regularly contributing to global media outlets and consulting the most important stakeholders in the world, including governments, businesses, and international organizations. He’s currently at Metaphysic, where he leads Strategy and Partnerships. Previously, he was Head of Research Analysis at the world’s first deepfake detection company and an Emerging Technologies Researcher at the London-based innovation think tank Nesta. The topic is massive, so we’ve tried to focus on just some of the most interesting aspects, primarily to inspire further attention, thinking and research.


The obvious question first: what is synthetic media? I mean, deepfakes sound so frightening. Can we start with a few definitions?

I’d say deepfakes are a form of synthetic media, but not all synthetic media is necessarily a deepfake. The phrase deepfake emerged in a very natural way, in the community, online. As such, it wasn’t coined in a scientific or academic context, so its meaning is still quite fluid. It was initially used exclusively to refer to the swapping of celebrity faces into pornographic footage, but now people use it to refer to different kinds of face manipulation in video. They use it for non-pornographic videos. They refer to voice audio as deepfakes. The term doesn’t really have a fixed meaning. I typically use it to describe malicious uses of synthetic media, which are intentionally designed to harm or deceive. But that is by no means a universally agreed definition. Some people don’t use it in that way. So the term that I try to use, where possible, is synthetic media, and synthetic media refers to a broad range of forms of media which are generated entirely or partially by using artificial intelligence, specifically types of deep learning and neural networks. That could be things like voice audio, that could be music, that could be swapping entire faces, or things like lip movements. That could be photos of non-existent people or even non-existent entities. That could be areas such as interactive media technologies, things akin to metaverse technologies, like AR and VR. And then it could also technically be something like GPT-3 or large language models. It’s a wide range of tech. Obviously, things like Jurassic Park, or early forms of VFX, are still, in some sense, all synthetic media. If you want to be really specific, you can talk about AI-generated synthetic media, but typically most people just say synthetic media.

How did you end up in all of this, what’s your personal story and why do you think it’s so important that you’ve decided to dedicate your career to it?

So I started off, in my academic background, as a philosophy student, where I was focused primarily on metaphysics and the philosophy of perception. That’s a really interesting area for me because you see this collision of philosophical traditions in the philosophy of perception around things like phenomenology, the nature of conscious experience, and trying to fit that into models of understanding how certain experiences impact the way you see the world, yourself, and other people. I was also very interested in the issues around AI and things like superintelligence. I saw a lot of emerging technologies, in a sense, bankrupting existing moral frameworks to an extent and requiring new ways of thinking about new problems to account sufficiently for harms and benefits. And so, with those two focus areas, and with deepfakes first emerging in late 2017, I came across the subreddit post in a group about futurism, about future technologies. And this technology immediately struck me. I saw it as an entirely new category, an entirely new form of creation and also perception. How this will change the way that we interact with each other, both organically and synthetically, was really quite profound to me, looking both at the interesting creative uses around art and things like this, and also at the quite wide range of malicious uses. I came into this by researching the topic in think tanks and then working for the world’s first deepfake detection company as a researcher looking into the different kinds of deepfakes out there. That’s the rough sketch of my journey.

The company you’re working with now, Metaphysic, is creating software and synthetic media solutions for others, as a service. Can you tell me more about it? What are some of the most interesting applications that are already delivered, or in production?

At Metaphysic, it’s about developing the technologies to make hyper-real synthetic media better and, to an extent, more accessible, but with the caveat of doing that in a way that is responsible and ethically driven. The company was co-founded by Chris Ume, the world-leading deepfake artist behind the deep Tom Cruise project and many other really impressive hyper-real deepfakes. The idea is that the company is going to try to develop the tools, so that more people don’t necessarily need to have the very specific, highly rarefied expertise that Chris has and can still use the technology in creative and exciting ways. We’re not going to put this in the hands of just anyone, but it could be that, for example, someone in the VFX space who has no background in deepfakes can now use this to create certain effects for films, for adverts, or for pieces of art. The basic premise behind it is to provide the engine for the hyper-real synthetic media of the future. The Tom Cruise project really got people to pay attention. What’s really interesting that I’ve noticed is that people recognize that this Tom Cruise is a character, which is not the same as Tom Cruise. A lot of people love the character that is built around him and the funny situations he is put in. I think that’s really interesting, and we’re seeing deepfake satire and parody becoming a really prominent area, but also film, entertainment, and, more generally, the speculative area of the metaverse. The idea is that as the technology becomes more accessible and more embedded in day-to-day life, we can all have hyper-real avatars of ourselves that we could then use. You can play with a synthetic version of yourself in the many different applications which will soon be a part of everyday life in that world of the metaverse and the Web3 era of communication. We recognize that it needs to be done right. There’s a lot of the classic Spider-Man: great power, great responsibility. We’ve got to make sure that the technology we’re developing doesn’t fall into the wrong hands.

I believe you can share some data and insights? The number of deepfakes that are online, the number of registered misuses, categories of misuse, adoption rates, as in people creating synthetic media content with their phones: numbers that can tell us what’s happening and how it is progressing?

The most important piece of research I did was the world’s first mapping of the deepfake landscape back in 2019, in a report called The State of Deepfakes. At the time, we found that the number of deepfake videos online was just under 15,000, and that represented an almost doubling in just under a year. Fast forward to today, and the landscape has changed dramatically. The biggest driver of that, again, is synthetic media or deepfake applications becoming increasingly accessible to the everyday person. It’s no longer something for which you need proficiency in programming, expensive computer hardware, or the time to learn. Face swapping is super accessible through friendly apps on the iPhone App Store or on the Android store, and you typically need just one image, and it takes seconds to generate. One of the biggest apps in the space for that kind of usage would be Reface. I think it was in July that they announced they’d had over three billion face-swap videos generated on their app. Another app working in a similar kind of space is Wombo, a novelty app for lip-synching media, which is very user-friendly. They recently announced that they’d passed 1 billion “wombos” being generated. You can see that the emergence of these apps has fundamentally changed the nature of the landscape. The malicious uses are still very much prominent, particularly in the gendered image abuse context, commonly referred to as deepfake pornography. It’s a phrase I try not to use, but that is a space where, again, as these other apps have become more accessible, so have the malicious tools. I’ve done research on these tools becoming increasingly user-friendly over the last couple of years, identifying new tools that give people more powerful capabilities, from synthetically stripping still images to face swapping, again with one click, into a library of adult material. These spaces are growing massively. Some of these forums, the biggest ones for this content, have over a million members, and so do some of the most popular tools for generating this content. Although the landscape may have shifted away from there being a small number of videos, the vast majority of them pornographic, in terms of malicious deepfakes the vast majority remain this form of image abuse targeting women. We are starting to see cyber security threats, in particular fraud and impersonation via audio, increasing in frequency, and we’re also seeing the concept of deepfakes alone destabilizing democratic processes. People think deepfakes are the thing causing the problem for democracies and causing people not to trust what they see. But is it actually the idea itself, which now allows people to dismiss real things as fake as well? That has caused, in Gabon in Africa and in Myanmar, those two countries in particular, serious democratic issues where people believed that real videos were fake, and that’s had serious consequences. That’s a rough lay of the landscape right now, at least on the malicious side. There are plenty of creative and commercial uses which are super interesting, around film, new forms of communication, particularly for younger generations, digital fashion, and avatars. The malicious side of things is certainly equally as important, if not more so.

You worked for a company that does deepfake detection. How hard is it to detect a deepfake? How does the process work? Is it possible for something not to be recognized as a deepfake?  

It depends on what you mean by deepfake detection. If you’re talking about the automated process, yes. If you’re talking about training machine learning systems or algorithms to identify deepfakes with the aim of being more reliable than the human eye, the short answer is that it’s incredibly, incredibly difficult, because the models for generating deepfakes are being updated all the time. The dynamic is adversarial, between people who are looking to fool detection systems and people building them. The people who fool the systems are always going to be on the front foot because they get to fool the system first, at which point the people building it have to scramble to fix that. Facebook did a deepfake detection challenge, and the results were published back in January last year. The best accuracy level was 65 percent. Imagine, let’s say, there’s a law case or a trial where someone is being accused of murder, and the video is the defining piece of evidence. Would you trust a model with 65 percent accuracy for that? You probably wouldn’t, nor in many other critical contexts. The current accuracy and reliability of these tools are just not good enough to be deployed and, based on that adversarial dynamic, I think it’s highly unlikely that we will ever see deepfake detection alone being used to decide whether something is real or not. Or if we do, that, to me, is a sign of a pretty shady system for authentication.

What do you do then? With the other thirty-five percent?

I mean, I think the thing is, it’s not the other 35 percent, it’s all of it. If it’s only 60 percent, that’s slightly better than a guess. Would you trust it? I don’t think I would. And this is a real challenge. This is a really significant issue. Detection may get really good. It is possible that, if there are a lot of resources going into it, it’s constantly being invested in, and there are some breakthrough techniques, detection really could do the job, or maybe it could do a really good job on the vast majority of content. Maybe it’s one layer of the process. Ultimately, it’s a real challenge if synthetic media gets to the point, which I think it will, where hyper-realistic content is very pervasive. Definitively proving whether something is real or not can be very difficult to do after the fact. What I think is probably the most promising approach is going for authentication at the source. There are several initiatives and technology approaches going on here, which are kinds of controlled capture or content authentication protocols. Basically, where an image is captured or a video is recorded, on the chip in the device, you’re getting encrypted metadata, which is then attributed, and that ledger is then filed on the blockchain. That more bottom-up approach is much easier to deal with over the lifespan of a piece of media than trying to authenticate things from a more top-down perspective. Obviously, it comes with its own issues, such as the labeling. Maybe you can’t afford the new cutting-edge technologies that have this software. Maybe people using it in a human rights context, for example, don’t want to give away their metadata or reveal their identity. And so there are issues around creating media hierarchies. I think, on the whole, that is a much more promising approach to authenticating media in a critical context than detection. Even if I think detection still has some role to play, it will be more limited than some people think.

On the other side, let’s think of tech for good applications for a second. For example, loneliness or mental health issues, extremely big ones. I can fantasize about education as well, where every kid has a personal mentor of some sort, a virtual person. Are there any startups already building any of this stuff? 

I think that there are lots of really interesting and exciting uses of synthetic media, as you said, for things like bringing education to life, like having a historical figure tell you about their life in an engaging way, or having that person as your mentor. A lot of work is being done looking at whether synthetic media could help people recover from addiction. Could synthetically recreating someone’s voice or their likeness help someone process the loss of that person? On the flip side, it could also ruin that process. This is why it needs to be carefully developed and studied. And I think one of the areas that is worth mentioning here, which I’m excited about, is accessibility. One often cited project is called Project Revoice, which is about synthetically recreating the voices of people who have lost the ability to speak. If you think of Stephen Hawking, who was given the robotic voice, it was just that there was no technology to clone the real one. Chances are he would get a really nice, personalized voice to speak with now, which I think is hugely valuable to people who rightly see the voice as an extension of their identity. And then also in a space like gaming, it could help transgender gamers speak with a voice that better reflects how they identify, or help people use voice masking if they don’t want to share their own voice online. There are loads of uses like that, and this is just scratching the surface. We at Metaphysic are also working on some really interesting projects, which I can’t say much about, but we’re working on some really cool stuff that is really trying to showcase the kind of good that synthetic media can do. And I see a lot of avatars in education. There are also some, like Replika, for example, which are about creating chatbots and avatars that could be your friends. But there are some questions about whether that is the right way to fix loneliness in society. Can a chatbot or a synthetic personality ever replace a real person? And at the same time, with a new generation comes a new relationship with technology. Maybe that is the future. There are lots of startups really interested in using it in a positive way.

That’s absolutely mind-blowing.

The tech really is astonishing. If you’re interested, look up Codec Avatars by Facebook. Those are embodied VR chat avatars, and they are unbelievable. They are unbelievably realistic. Imagine that within a decade that’s going to be in the hands of everyone, and we’ll be able to have a conversation like this, but embodied in VR with synthetic versions of ourselves. It’s going to be an incredibly disruptive technology, in both positive and negative ways.

To cite the US Congress: “potential to be used to undermine national security, erode public trust in our democracy and other nefarious reasons”. If you could summarize what regulators and governments around the world are doing now, how can governments approach any of this?

The way that governments have been approaching it so far has been quite polarized. On one side, you have a lot of people who are really parroting quite sensational lines around it: “It’s going to cause World War III by someone playing a video of Trump pressing the button”, back when Trump was president, or “this is going to mean we can’t trust anything we see anymore”. There’s a lack of nuance, and understandably so to an extent. The media coverage is typically fixated on the malicious uses, and those malicious uses are serious and need to be taken seriously. One thing that I find slightly difficult sometimes is people suggesting criminalizing the usage of tech based on its malicious or deceptive potential. How do you define deceptive? When is something intentionally deceptive? Does it matter or not? Is something like a Snapchat filter that you’re using on your Tinder profile deceptive to the person you’re about to go on a date with? By that logic, computational photography and the chips in all of our smartphones are going to be banned, because they effectively process all of the images you take on that phone. Governments are working on certain fronts in ways I think are really positive. For example, explicit laws banning intimate image abuse using deepfakes are, I think, good. What I don’t like is the more sticky stuff. For example, banning deepfakes 30 days before an election, unless clearly fake or satire. How do you define what satire is? And 30 days seems quite arbitrary. Why not 14, or 45? Why 30 days? Governments around the world, from South Korea to Taiwan to India to the US and UK, are actively looking to legislate, and I think that’s good on certain fronts. But legislating without recognizing the nuances that are required with such a broad technology might cause issues that I’m keen to avoid.

You already mentioned Facebook. Big Tech and big social media platforms play an important role here because that’s the distribution channel and how content becomes viral. Besides the challenge you already mentioned organized by Facebook, what else is happening? Especially because “algorithmic moderation” isn’t perfect – it’s not working. Do these companies have dedicated teams that are looking into this? 

I guess there’s a difference, I’d say, between the big social media platforms, in a sense, and Big Tech. Facebook’s whole statement of intent with becoming Meta is quite clear. Microsoft is working on a lot of metaverse and synthetic media technologies, as are Amazon, Google, Tencent and Baidu. All of these big technology companies have research labs that are focusing on synthetic media and also on its applications. Also Twitter and Facebook, all those platforms. You’re right, algorithmic moderation is not very robust, but ultimately it is the best they can do without literally bankrupting their business model. They would never be able to hire the moderators required to stop using algorithmic moderation. Having said that, they have got policies coming in around synthetic media that have varying efficacy. For example, Facebook is banning deepfakes, meaning AI-generated or manipulated deceptive stuff, but they haven’t banned more crude forms of media manipulation, such as what we call shallow fakes. Twitter has a policy explicitly forbidding, again, malicious or deceptive uses of synthetic media or manipulated media in general. It comes back to the question of when the hypothetical causes this problem. They can’t stop it being uploaded to the platform, at least not right now. And it will take time for it to be blocked or banned. At that point, chances are the damage has already been done. Big Tech is investing a lot of money into this. The social media platforms in particular are very much buying into this vision of the future of communication. But the problems that come with deepfakes are the same kinds of problems that we’ve seen with other forms of racial abuse and media manipulation. It’s not an entirely new problem, it’s just a new way of expressing existing issues, and it’ll be very interesting to see how the platforms deal with it.

I have two more questions. One is from the conference you’ve organized recently, and the second one is more sci-fi-ish. I really liked the idea of “media provenance”, where you as a reader or viewer can see the history of how certain content was made. How would that look in practice? Would it be tags, or something on the side of the viewer, for example, on YouTube or when you’re scrolling the News Feed? Are there any initiatives that are already in place?

People are dating “robots” even today. If we can hypothetically create virtual, digital beings that are indistinguishable from reality, do you see a future where our life companions might be virtual? Living “happily ever after”, in the metaverse?

On the first one, on provenance, the best place to look right now is Adobe’s Content Authenticity Initiative, and specifically the C2PA standard, which is one of those open standards for content provenance, akin to, say, PDF. It’s an open standard that is used by and accessible to anyone. It has Twitter signed up to it, the New York Times signed up to it, the BBC, Intel, Arm. It’s got that whole journey, right from the chip on the phone to the social media platforms, to the news platforms. Metaphysic is a member of the Content Authenticity Initiative as well. The process is that you try to get that metadata from the image: where it was taken, when it was taken. The kind of rearrangement of the pixels fundamentally is right. And then you have a button, maybe, to press. Maybe a watermark, or some kind of accessible metadata for certain applications where you can then access it. For example, right now on Twitter, if something’s misleading, they have the little icon saying “this is misleading”. Maybe in the future they have something that says “to see image metadata, press here”. If it’s been edited, that tells you when it’s been edited and how. On other applications, I imagine it would be a very similar story. For news organizations, it is in their interest to be very transparent with this stuff. Adobe, again, obviously evangelizes their own technologies very heavily in terms of embedding this into their own products and their own tools. It’s still not 100 percent solidified, and it will vary from platform to platform, but I think that the ideal end goal is that we reflexively look for metadata in the same way that maybe we look for a corresponding headline from the BBC or the New York Times about a news story that seems suspicious. So you’re really trying to build in that reflexive attitude towards it.

On the second question, the Brave New World aspect of this. I mean, if you think about Alexa as a tool, when that was released, a lot of people, including myself, felt uncomfortable with that in the house. It felt weird. It felt kind of alien, at first. And now it’s an accepted part of most people’s lives, or a lot of Western people’s lives in particular, and they reflexively ask, “What’s the weather like today?”. I think you’ll see a similar shift coming with synthetic media and more sophisticated and realistic forms around avatars and things like this. I mean, we’re already seeing virtual influencers being some of the popular influencers on Instagram, getting probably million-dollar deals to advertise.

Hypothetically, is it possible to have, like, an immortal synthetic version of your parents, so they can live forever? If there are virtual beings that have their movements, their voice, and everything in between?

It’s almost like some imprint of them. It’s just getting very sci-fi, but like being able to import your identity entirely in some respect, basically uploading your brain into a synthetic avatar. And then there are questions about whether there is such a thing as static identity, identity over time. You know, you’re always learning. Are you in a state of flux? And is that like an extra capturing of your final moment, your final stage before you die? Does that mean that you’re already like an identity if you’re not constantly in flux and learning and growing? Or could you continue to evolve? I mean, again, these are super abstract, hypothetical questions. What I do think is going to become more frequent and more likely is that people form attractions of many different forms to synthetic versions of themselves, real people, or non-existent people. We already see that with people fancying animated characters. Or people using Snapchat filters and wanting to get surgery to look like the filters. We are all at this weird point of reality imitating fiction, in a sense. I think there are a lot of interesting philosophical and ethical questions that surround that, but no doubt what seems alien and weird today will, in a decade, be much more a part of day-to-day life. It will bring many questions that we need to think about now, so that we’re not being reactive to problems that emerge, but proactive in trying to figure out how to best use and implement these technologies.

Wikidata and the Next Generation of the Web – Goran Milovanović, Data Scientist for Wikidata @Wikimedia Deutschland

The world without Wikipedia would be a much sadder place. Everyone knows about it and everyone uses it, daily. The dreamlike promise of a great internet encyclopedia, accessible to anyone, anywhere, all the time, for free, has become a reality. Wikidata, a lesser-known part of the Wikimedia family, serves as the data backend of all Wikimedia projects and fuels Apple’s Siri, Google Assistant, and Amazon’s Alexa, among many other popular and widely used applications and systems.

Wikipedia is one of the most popular websites in the world. It represents everything glorious about the open web, where people share knowledge freely, generating exponential benefits for humanity. Its economic impact can’t be calculated; being used by hundreds of millions, if not billions of people worldwide, it fuels everything from the work of academics to business development.

Wikipedia is far more than just a free encyclopedia we all love. It’s part of the Wikimedia family, which is, in their own words: “a global movement whose mission is to bring free educational content to the world.” To summarize its vision: “Imagine a world in which every single human being can freely share in the sum of all knowledge.”

Not that many people know enough about Wikidata, which acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wiktionary, Wikisource, and others.

Goran Milovanović is one of the most knowledgeable people I know. I invited him to lecture about Data Science potential in the public administration at the policy conference that I organized five years ago. We remained friends and I enjoy talking with him about everything that has ever popped into the back of my head. Interested in early Internet development and the role of RAND in immediate postwar America? No worries, he’ll speak for 15 minutes about it in one breath.

Goran earned a Ph.D. in Psychology (a 500+ page one, on the topic of Rationality in Cognitive Psychology) from the University of Belgrade in 2013, following two years as a graduate student in Cognition and Perception at NYU, United States. He spent many years doing research in Online Behaviour, Information Society Development, and Internet Governance, co-authoring 5 books about the internet in Serbia. He has provided consultancy services in Data Science for Wikidata to Wikimedia Deutschland since 2017, where he is responsible for the full-stack development and maintenance of several analytical systems and reports on this complex knowledge base, and he runs his own Data Science boutique consultancy, DataKolektiv, from Belgrade.

He’s a perfect mixture of humanities, math, and engineering, constantly contemplating the world from a unique perspective that takes so many different angles into account. 

We focused our chat on Wikidata and the future of the web, but, as always, touched on many different phenomena and trends.

Before we jump to Wikidata: What is Computational Cognitive Psychology and why were you so fascinated with it?

Computational Cognitive Psychology is a theoretical approach to the study of the mind. We assume that human cognitive processes – processes involved in the generation of knowledge: perception, reasoning, judgment, decision making, language, etc. – can essentially be viewed as computational processes, algorithms running not in silico but on the biologically evolved, physiological hardware of our brains. My journey into this field began when I entered the Department of Psychology in Belgrade, in 1993, following more than ten years of computer programming since the 80s and a short stay at the Faculty of Mathematics. In the beginning, I was fascinated by the power of the very idea, by the potential that I saw in the possible crossover of computer science and psychology. Nowadays, I do not think that all human cognitive processes are computational, and the research program of Computational Cognitive Psychology has a different meaning for me. I would like to see all of its potential fully explored, to know the limits of the approach, and then try to understand, to describe somehow, even if only intuitively, what was left unexplained. The residuum of that explanatory process might represent the most interesting, significant aspect of being human at all. The part that remains irreducible to the most general scientific concept that we have ever discovered, the concept of computation, that part I believe to be very important. That would tell us something about the direction that the next scientific revolution, the next paradigm change, needs to take. For me, it is a question of philosophical anthropology: what is it to be human? – only driven by an exact methodology. If we ever invent true, general AI in the process, we should treat it as a by-product, as much as the invention of the computer was a by-product of Turing’s readiness to challenge some of the most important questions in the philosophy of mathematics in his work on computable numbers.
For me, Computational Cognitive Psychology, and Cognitive Science in general, do not pose a goal in themselves: they are tools to help us learn something of a higher value than how to produce technology and mimic human thinking.

What is Wikidata? How does it work? What’s the vision?

Wikidata is an open knowledge base, initially developed by parsing the already existing structured data from Wikipedia, then improved by community edits and massive imports of structured data from other databases. It is now the fastest-growing Wikimedia project, recently surpassing one billion edits. It represents knowledge as a graph in which nodes stand for items and values, and the links between them stand for properties. Such knowledge representations are RDF compliant, where RDF stands for Resource Description Framework, a W3C standard for structured data. All knowledge in systems like Wikidata takes the form of a collection of triples, or basic sentences that describe knowledge about things – anything, indeed – at the “atomic” level of granularity. For example, “Tim Berners-Lee is a human” in Wikidata translates to a sentence in which Q80 (the Wikidata identifier for “Tim Berners-Lee”) is P31 (the Wikidata identifier for the “instance of” property of things) of Q5 (the Wikidata identifier for a class of items that are “humans”). So, Q80 – P31 – Q5 is one semantic triple that codifies some knowledge on Sir Timothy John Berners-Lee, who is the creator of the World Wide Web by the invention of the Hypertext Transfer Protocol (HTTP) and the 2016 recipient of the Turing Award. All such additional facts about literally anything can be codified as semantic triples and composed to describe complex knowledge structures: in Wikidata, HTTP is Q8777, WWW is Q466, discoverer or inventor is P61, etc. All triples take the same, simple form: Subject-Predicate-Object. The RDF standard defines, in a rather abstract way, the syntax, the grammar, the set of rules that any such description of knowledge must follow in order to ensure that it will always be possible to exchange knowledge in an unambiguous way, irrespective of whether the exchange takes place between people or computers.
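To make the Subject-Predicate-Object form concrete, here is a minimal Python sketch of the triples mentioned above. It is only an illustration: the `Triple` type, the `LABELS` table, and the helper functions are hypothetical names, not part of Wikidata or any RDF library, and the second triple assumes that P61 is attached to the invented thing and points to its inventor.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One atomic sentence: Subject - Predicate - Object."""
    subject: str
    predicate: str
    obj: str


# Human-readable names for the identifiers quoted in the text.
LABELS = {
    "Q80": "Tim Berners-Lee",
    "Q5": "human",
    "Q466": "World Wide Web",
    "Q8777": "Hypertext Transfer Protocol",
    "P31": "instance of",
    "P61": "discoverer or inventor",
}

TRIPLES = [
    # "Tim Berners-Lee is a human", exactly as spelled out in the text.
    Triple("Q80", "P31", "Q5"),
    # Illustrative assumption: P61 sits on the invention and points to the inventor.
    Triple("Q466", "P61", "Q80"),
]


def render(triple: Triple) -> str:
    """Turn identifiers back into a readable sentence."""
    return " - ".join(LABELS.get(part, part) for part in triple)


def objects_of(subject: str, predicate: str, triples=TRIPLES) -> list:
    """Return every object linked to `subject` through `predicate`."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


if __name__ == "__main__":
    for t in TRIPLES:
        print(render(t))
```

Because every fact has the same three-part shape, a question such as "what is Q80 an instance of?" reduces to filtering on the first two positions, which is what `objects_of("Q80", "P31")` does.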

Wikidata began as a project to support structured data for Wikipedia and other Wikimedia projects, and today represents the data backbone of the whole Wikimedia system. Thanks to Wikidata, much information that would otherwise be repeated across Wikipedia and other projects can now be served to our readers from a central repository. However, the significance of Wikidata goes way beyond what it means for Wikipedia and its sisters. The younger sister now represents knowledge on almost one hundred million things – called items in Wikidata – and keeps growing. Many APIs on the internet rely on it. Contemporary, popular AI systems like virtual assistants (Google Assistant, Siri, Amazon Alexa) make use of it. Just take a look at the number of research papers published on Wikidata, or using its data to address fundamental questions in AI. By means of the so-called external identifiers – references from our items to their representations in other databases – it represents a powerful structured data hub. I believe Wikidata nowadays has the full potential to evolve into a central node in the network of knowledge repositories online.

Wikidata External Identifiers: a network of Wikidata external identifiers based on their overlap across tens of millions of items in Wikidata, produced by Goran S. Milovanovic, Data Scientist for Wikidata @WMDE, and presented at the WikidataCon 2019, Berlin

What’s your role in this mega system? 

I take care of the development and maintenance of analytical systems that help us understand how Wikidata is used in Wikimedia websites, what the structure of Wikidata usage is, how human editors and bots approach editing Wikidata, how the use of different languages in Wikidata develops, whether it exhibits any systematic biases that we might wish to correct for, what the structure of the linkage of other online knowledge systems connected with Wikidata by means of external identifiers is, how many pageviews we receive across the Wikidata entities, and much more. I am also developing a system that tracks the Wikidata edits in real-time and informs our community if there is any online news relevant to the items that are currently undergoing many revisions. It is a type of position which is known as a generalist in the Data Science lingo; in order to be able to do all these things for Wikidata I need to stretch myself quite a bit across different technologies, models and algorithms, and be able to keep them all working together and consistently in a non-trivial technological infrastructure. It is also a full-stack Data Science position where most of the time I implement the code in all development phases, from the back-end where data acquisition (the so-called ETL) takes place in Hadoop, Apache Spark, SPARQL, through machine learning where various, mostly unsupervised learning algorithms are used, towards the front-end development where we finally serve our results in interactive dashboards and reports, and finally production in virtualized environments. I am a passionate R developer and I tend to make use of the R programming language consistently across all the projects that I manage; however, it ends up being pretty much a zoo in which R co-exists with Python, SQL, SPARQL, HiveQL, XML, JSON, and other interesting beings as well.
It would be impossible for a single developer to take control of the whole process if there were no support from my colleagues in Wikimedia Deutschland and the Data Engineers from the Wikimedia Foundation’s Analytics Engineering team. My work on any new project feels like solving a puzzle; I face the “I don’t know how to do this” situation every now and then; I learn constantly and the challenge is so motivating that I truly doubt there can be many similarly interesting Data Science positions like this one. It is a very difficult position, but also one that is professionally very rewarding.

If you were to explain Wikidata as a technological architecture, how would you do it in a few sentences? 

Strictly speaking, Wikidata is a dataset. Nothing more, nothing less: a collection of data represented so as to follow important standards that make it interoperable, usable in any imaginable context where it makes sense to codify knowledge in an exact way. Then there is Wikibase, a powerful extension of the MediaWiki software that runs Wikipedia as well as many other websites. Wikibase is where Wikidata lives, and where it is served from whenever anything else – a Wikipedia page, for example – needs it. But Wikibase can run any other dataset that complies with the same standards as Wikidata, of course, and Wikidata can inhabit other systems as well. If by the technological architecture you mean the collection of data centers, software, and standards that make Wikidata join in wherever Wikipedia and other Wikimedia projects need it – well, I assure you that it is a huge and rather complicated architecture underlying that infrastructure. If you imagine all possible uses of Wikidata, external to the Wikimedia universe, run in Wikibase or otherwise… then… it is the sum of all technology relying on one common architecture of knowledge representation, not of the technologies themselves.

How does Wikidata overcome different language constraints and barriers? It should be language-agnostic, right?

Wikidata is and is not language-agnostic at the same time. It would be best to say that it is aware of many different languages in parallel. At the very bottom of its toy box full of knowledge, we find abstract identifiers for things: Q identifiers for items, P identifiers for properties, L for lexemes, S for senses, F for forms… but those are just identifiers, and yes they are language agnostic. But things represented in Wikidata do not have only identifiers, but labels, aliases, and descriptions in many different languages too. Moreover, we have tons of such terms in Wikidata currently: take a look at my Wikidata Languages Landscape system for a study and an overview of the essential statistics.
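The split between language-agnostic identifiers and language-specific terms can be sketched in a few lines of Python. This is a hypothetical, simplified structure for illustration only: the `TERMS` dictionary, its sample labels and aliases, and the `label_for` helper are assumptions and do not reproduce Wikidata's actual data model or API.

```python
# Language-agnostic identifier -> language-specific terms.
# Sample values are illustrative, not copied from Wikidata.
TERMS = {
    "Q5": {
        "labels": {"en": "human", "de": "Mensch"},
        "aliases": {"en": ["person", "human being"]},
    },
}


def label_for(entity_id: str, lang: str, fallback: str = "en", terms=TERMS) -> str:
    """Return the label in `lang`, else in `fallback`, else the bare identifier."""
    labels = terms.get(entity_id, {}).get("labels", {})
    return labels.get(lang) or labels.get(fallback) or entity_id


if __name__ == "__main__":
    print(label_for("Q5", "de"))   # label exists in German
    print(label_for("Q5", "sr"))   # no Serbian label here, falls back to English
    print(label_for("Q999", "en")) # unknown item, falls back to the identifier
```

The identifier stays the same whatever language the reader uses; only the lookup at the very end depends on the language.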

What are knowledge graphs and why are they important for the next generation of the web?

They are important for this generation of the web too. To put it in a nutshell: graphs allow us to represent knowledge in the most abstract and most general way. They are simply very suitable to describe things and relations between them in a way that is general, unambiguous, and in a form that can quickly evolve into new, different, alternative forms of representation that are necessary for computers to process it consistently. By following common standards in graph-based knowledge representation, like RDF, we can achieve at least two super important things. First, we can potentially relate all pieces of our knowledge, connect anything that we know at all so that we can develop automated reasoning across vast collections of information and potentially infer new, previously undiscovered knowledge from them. Second, interoperability: if we all follow the same standards of knowledge representation, and program our APIs that interact over the internet so as to follow that standard, then anything online can easily enter any form of cooperation. All knowledge that can be codified in an exact way thus becomes exchangeable across the entirety of our information processing systems. It is a dream, a vision, and we find ourselves quite far away from it at the present moment, but one well worth pursuing. Knowledge graphs just turn out to be the most suitable way of expressing knowledge in a way desirable to achieve these goals. I have mentioned semantic triples, sentences of the Subject-Predicate-Object form, that we use to represent the atomic pieces of knowledge in this paradigm. Well, knowledge graphs are just sets of connected constituents of such sentences. When you have one sentence, you have a miniature graph: a Subject points to the Object. Now imagine having millions, billions of sentences that can share some of the constituents, and a serious graph begins to emerge.
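The idea that a graph emerges once sentences share constituents can be sketched in Python. This is a toy illustration under stated assumptions: the three sample sentences use plain names rather than Wikidata identifiers, and `build_graph` and `reachable` are hypothetical helpers, not functions from any graph or RDF library.

```python
from collections import defaultdict, deque

# Three sentences that share one constituent ("Tim Berners-Lee").
SENTENCES = [
    ("WWW", "invented by", "Tim Berners-Lee"),
    ("HTTP", "invented by", "Tim Berners-Lee"),
    ("Tim Berners-Lee", "instance of", "human"),
]


def build_graph(triples):
    """Each sentence becomes one edge: the Subject points to the Object."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def reachable(graph, start):
    """Breadth-first walk: everything connected to `start` through any chain of edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for _predicate, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    seen.discard(start)
    return seen


if __name__ == "__main__":
    graph = build_graph(SENTENCES)
    print(reachable(graph, "WWW"))
```

No single sentence connects "WWW" to "human", yet the walk finds the link through the shared node. This is the simplest form of relating separate pieces of knowledge; real automated reasoning over RDF data is considerably richer than a reachability check.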

A part of the knowledge graph for English (Q1860 in Wikidata)

Where do you think the internet will go in the future? What’s Wikidata’s role in that transformation?

The future is, for many reasons, always a tricky question. Let’s give it a try: on the role of Wikidata, I think we have clarified that in my previous responses: it will begin to act as a central hub of Linked Open Data sooner or later. On the future of the Internet in general, talking from the perspective of the current discussion solely: I do not think that the semantic web standards like RDF will ever reach universal acceptance, and I do not think it is even necessary for that to happen in order to enter the stage of internet evolution where complex knowledge is almost seamlessly interacting all over the place. It is desirable, but not necessary in my opinion. Look at the de facto situation: instead of evolving towards one single, common standard of knowledge and data representation, we have a connected network of APIs and data providers exchanging information by following similar enough, easily learnable standards – enough to not make software engineers and data scientists cry. Access to knowledge and data will become easier, and governments and institutions will increasingly begin to share more open data, increasing the data quality along the way. It will become a thing of good manners and prestige to do so. Data openness and interoperability will become one of the most important development indicators, tightly coupled with questions of fundamental human rights and freedoms. To have your institution’s open data served via an API and offered in different serializations that comply with the formal standards will become as expected as publishing periodicals on your work is now. Finally, the market: more data, more ideas to play with.

A lot of your work relies on technologies used in natural language processing (NLP), typically handling language at scale. What are your impressions of OpenAI’s GPT-3, which has created quite a buzz recently?

It is fascinating, except that it works better at producing language that can fool someone than at producing language that exhibits anything like the traces of human-like thinking. Contemporary systems like GPT-3 make me wonder whether the Turing test was ever a plausible test to detect intelligence in something – I always knew there was something I didn’t like about it. Take a look, for example, at what Gary Marcus and Ernest Davis did to GPT-3 recently: GPT-3, Bloviator: OpenAI’s language generator has no idea what it’s talking about. It is a clear example of a system that does everything to language except understand it. Its computational power and the level up to which it can mimic the superficial characteristics of the language spoken by a natural speaker are fascinating. But it suffers – and quite expectedly, I have to add – from a lack of understanding of the underlying narrative structure of the events, the processes it needs to describe, the complex interaction of language semantics and pragmatics that human speakers face no problems with. The contemporary models in NLP are all essentially based on an attempt to learn the structure of correlations between linguistic constituents of words, sentences, and documents, and that similarity-based approach has very well-known limits. It was Noam Chomsky who in the late 50s – yes, 50s – tried to explain to the famous psychologist B. F. Skinner that just observing statistical data on the co-occurrences of various constituents of language will never provide for a representational structure powerful enough to represent and process natural language. Skinner didn’t seem to care back then, and neither do the fans of the contemporary Deep Learning paradigm, which is essentially doing exactly that, just in a way orders of magnitude more elaborate than anyone ever tried. I think we are beginning to face the limits of that approach with GPT-3 and similar systems.
Personally, I am more worried about the possible misuse of such models to produce fake news and fool people into silly interpretations and decisions based on false, simulated information, than about whether GPT-3 will ever grow up to become a philosopher, because it certainly will not. It can simulate language, but only the manifest characteristics of it; it is not a sense-generating machine. It does not think. For that, you need some strong symbolic, not connectionist representation, engaged in the control of associative processes. Associations and statistics alone will not do.

Do the humanities have a future in the algorithmic world? How do you see the future of the humanities in the fully data-driven world?

First, a question: is the world ever going to be fully data-driven, and what does that mean at all? Is a data-driven world one in which all human activity is passivized and all our decisions transferred to algorithms? It is questionable if something like that is possible at all, and I think that we all already agree that it is certainly not desirable. While the contemporary developments in Data Science, Machine Learning, AI, and other related fields are really fascinating, and while our society is becoming more and more dependent upon the products of such developments, we should not forget that we are light years away from anything comparable to true AI, sometimes termed AGI (Artificial General Intelligence). And I imagine only true AI would be powerful enough to run the place so that we can take a permanent vacation? But then comes the ethical question, one of immediate and essential importance: would such systems, if they ever come into existence, be able to judge human action and act upon society in a way we as their makers would accept as moral? And only then comes the question of whether we want something like that in the first place: wouldn’t it be a bit boring to have nothing to do and go meet your date because something smarter than us has invented a new congruence score and started matching people while following an experimental design for further evaluation and improvement?

Optimization is important, but it is not necessarily beautiful. Many interesting and nice things in our lives and societies are deeply related to the fact that there are real risks that we need to take into account because irreducible randomness is present in our environment. Would an AI system in full control prevent us from trying to conquer Mars because it is dangerous? Wait, discovering the Americas, the radioactive elements, and the flight to the Moon were dangerous too! I can imagine that humanity would begin to invent expertise in de-optimizing things and processes in our environments if some fully data-driven and AI-based world ever came into existence. Any AI in a data-driven world that we imagine nowadays can be no more than our counselor, unless the edge case in which it develops true consciousness turns out to be a realistic scenario. If that happens, we would have to seriously approach the question of how to handle our relationship to AI and its power ethically.

I do not see how the humanities could possibly be jeopardized in this world that is increasingly dependent on information technologies, data, and automation. To understand and discover new ways to interpret the sense-generating narratives of our lives and societies was always a very deep, a very essential human need. Does industrialization, including our own Fourth age of data and automation, necessarily conflict with a world aware of the Shakespearean tragedy, as it does in Huxley’s “Brave New World”? I don’t think it is necessarily so. I enjoy the dystopian discourse very much because I find that it offers so many opportunities to reflect upon our possible futures, but I do not see us living in a dystopian society anytime soon. It is just that somehow the poetics of the dystopian discourse are well-aligned, correlated with the current technological developments, but if there ever was a correlation from which we should not infer any fatalistic causation, that is the one.

AI and the Government, UK Experience – Sébastien Krier (UK)

Automation is not just replacing (repetitive) jobs, it’s after the government as well, and computer algorithms are already widely deployed across many public sector operations, in many cases without the awareness and understanding of the general population. What’s happening with AI in government and how does government affect and interact with the development of technologies like artificial intelligence? We wanted to dig deeper into this subject through a conversation with an AI Policy expert who had a notable front-row seat in all of this while working for the Office for AI, a special UK government agency responsible for overseeing the implementation of the United Kingdom’s AI strategy.

The role of government is to serve its citizens. In doing so it uses the latest technologies, becoming smarter, more accessible, and even interactive through its technological updates. Not so long ago, it was the government and government-funded organizations who were inventing new technologies. While it seems that R&D power is shifting to so-called “big tech” companies, the relationship between government (and politics) and technology is of both existential and practical importance for all. The government is there to – at least – initiate, oversee, and regulate. One of the most interesting and important technological developments is artificial intelligence. There are really high hopes around it, with many believing that the AI revolution will be bigger than the Agricultural, Industrial, and Digital revolutions. What is the role of government in that process?

The world is changing fast and so is the government. With the digital and computer revolution and the advent of the internet and the world wide web, the government is going through one of the most significant transformations in its history. “Software is eating the world” (Marc Andreessen); it is eating the government too. The $400 billion #govtech market (Gartner) emerged with startups and other companies building technologies for the government, improving many aspects of it. Some estimates say it will hit a trillion dollars by 2025. From a historical perspective, it’s probably just a start. The future, digital-first government will probably look totally different from what it used to be and what it is now.

New realities create new fields of study and practices that did not exist. One of those fields is AI Policy, the field primarily interested in the intersection of AI and government. The United Kingdom is leading the way in all of this in many respects. In the case of AI technologies, it is the birthplace of some of the most important AI research companies in the world, DeepMind being just one of them. Its higher education, traditionally among the best in the world, produces scientific leaders and researchers. If you want to seriously study long-term effects of technology like artificial intelligence on society and humanity at large, you’ve already stumbled upon Oxford’s Future of Humanity Institute. How is the UK government approaching artificial intelligence?

Sébastien is an AI Policy expert. After graduating from UCL and spending quite some time in law, Sébastien joined the British government’s Office for Artificial Intelligence, a joint unit responsible for designing and overseeing the implementation of the United Kingdom’s AI strategy, as an Adviser in 2018. He now helps public and private organizations design strategies and policies that maximize the benefits of AI, while minimizing potential costs and risks.

Sébastien’s former role involved designing national policies to address novel issues such as the oversight of automated decision-making systems and the responsible design of machine learning solutions. He led the first comprehensive review of AI in the public sector and has advised foreign delegations, companies, regulators, and third sector organizations. Sébastien has also represented the UK at various panels and multilateral organizations such as the D9 and EU Commission. 

He is the perfect person to talk to about all things AI and government. We had a chat with him about AI and government in the UK.

You spent quite some time working at the Office for AI in the UK government. Can you tell us more about the purpose and the work of that government agency? How does the UK government approach artificial intelligence?

The Office for AI is a joint-unit between two ministries – BEIS and DCMS – and is responsible for overseeing the implementation of the £1bn AI Sector Deal. The AI Sector Deal was the Government’s response to an independent review carried out by Professor Dame Wendy Hall and Jérôme Pesenti. Our approach was therefore shaped by these commitments and consisted of the following workstreams:

Leadership: the aim here was to create a stronger dialogue between industry, academia, and the public sector through the Government establishing the AI Council.

Adoption: the aim of this work was to drive public and private sector adoption of AI and Data technologies that are good for society. This included a number of activities, such as the publication of A Guide to Using AI in the Public Sector, which includes a comprehensive chapter on safety and ethics drafted by Dr. David Leslie at the Alan Turing Institute. 

Skills: given the gap between demand and supply of AI talent, we worked on supporting the creation of 16 new AI Centres for Doctoral Training at universities across the country, delivering 1,000 new PhDs over the next five years. We also funded AI fellowships with the Alan Turing Institute to attract and retain the top AI talent, as well as an industry-funded program for AI Masters.

Data: we worked with the Open Data Institute to explore how the Government could facilitate legal, fair, ethical, and safe data sharing that is scalable and portable to stimulate innovation. You can read about the results of the first pilot programs here.

International: our work sought to identify global opportunities to collaborate across jurisdictional boundaries on questions of AI and data governance, and to formulate governance measures that have international traction and credibility. For example, I helped draft the UK-Canada Joint-Submission on AI and Inclusion for the G7 in 2018.

Note that the Government also launched the Centre for Data Ethics and Innovation, which was tasked by the Government to research, review, and recommend the right governance regime for data-driven technologies. They’re great and I recommend checking out some of their recent outputs, but I do think they’d benefit from being truly independent of Government.

How would you define “AI Policy“? It’s a relatively new field and many still don’t properly understand what the government has to do with AI. 

I like 80,000 Hours’ definition: AI policy is the analysis and practice of societal decision-making about AI. It’s a broad term but that’s what I like about it: it touches on different aspects of governance, and isn’t necessarily limited to the central government. There are many different areas a Government can look at in this space – for example:

  • How do you ensure regulators are adequately equipped and resourced to properly scrutinize the development and deployment of AI?
  • To what extent should the Government regulate the field and incentivize certain behaviors? See for example the EC’s recent White Paper on AI.
  • What institutional mechanisms can ensure long-term safety risks are mitigated? 
  • How do you enable the adoption and use of AI? For example, what laws and regulations are needed to ensure self-driving cars are safe and can be rolled out? 
  • How do you deal with facial recognition technologies and algorithmic surveillance more generally? 
  • How do you ensure the Government’s own use of AI, for example, to process visa applications, is fair and equitable?

I recommend checking out this page by the Future of Life Institute, which touches on a lot more than I have time to do here!

The UK is home to some of the most advanced AI companies in the world. How does the government include them in the policy-making processes? How exactly does the government try to utilize their work and expertise?

The Pesenti-Hall Review mentioned earlier is an example of how the Government commissioned leaders in the field to provide recommendations to the Government. Dr. Demis Hassabis was also appointed as an adviser to the Office for AI.

CognitionX co-founder Tabitha Goldstaub was asked last year to chair the new AI Council and become an AI Business Champion. The AI Council is a great way to ensure the industry’s expertise and insights reach the Government. It’s an expert committee drawn from private, public, and academic sectors advising the Office for AI and government. You can find out more about them here.

The government decides to implement a set of complex algorithms in the public sector in whatever field. How does it happen in most cases? Are companies pitching their solutions first, or does the government explicitly want solutions for previously very well-defined problems? How does AI in the public sector happen?

That’s a good question. It really depends on the team, the department, the expertise available, and the resources available. To be honest, people overestimate how ready Government departments are to actually develop and use AI. Frequently they’ll buy products off the shelf (which comes with a host of issues like IP and data rights).

Back in 2019 I helped lead a review of hundreds of AI use cases in the UK Government and found that while there are some very high-impact use cases, there are also a lot of limitations and barriers. AI Watch recently published its first report on the use and impact of AI in public services in Europe. They found limited empirical evidence that the use of AI in government is achieving its intended results.

The procurement system is also quite dated and not particularly effective in bringing in solutions from SMEs and start-ups, which is why the Government Digital Service launched a more nimble technology innovation marketplace, Spark. The Office for AI also worked with the WEF to develop Guidelines for AI Procurement.

Part of your work was focused on educating others working in the government and the public sector about AI and its potential and challenges. How does the UK government approach the capacity building of its public officials and government employees? 

There are various initiatives that seek to upskill and ensure there’s the right amount of expertise in Government. As part of the 2018 Budget, the Data Science Campus at the ONS and the GDS were asked to conduct an audit of data science capability across the public sector, to “make sure the UK public sector can realize the maximum benefits from data”. There are also specific skills frameworks for data science-focused professions. Ultimately though I think a lot more should be done. I think a minimum level of data literacy will be increasingly necessary for policymakers to properly understand the implications new technologies will have on their policy areas.

The recently published National Data Strategy also finds that “The lack of a mature data culture across government and the wider public sector stems from the fragmentation of leadership and a lack of depth in data skills at all levels. The resulting overemphasis on the challenges and risks of misusing data has driven chronic underuse of data and a woeful lack of understanding of its value.”

What is your favorite AI in the public sector use case in the UK (or anywhere) – and why?

One of my favorite use cases is how DFID used satellite images to estimate population levels in developing countries: this was the result of close collaboration between the Government, academia, and international organizations. And this is exactly how AI should be developed: through a multidisciplinary team.

Outside of the UK, I was briefly in touch with researchers at Stanford University who collaborated with the Swiss State Secretariat for Migration to use AI to better integrate asylum seekers. The algorithm assigns asylum seekers to the cantons that best fit their skills profiles, rather than allocating them randomly, as under the current system. That’s an impactful example of AI being used by a Government, and in fact I think Estonia is trialing similar use cases.
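To make the idea of matching versus random allocation concrete, here is a minimal sketch. It is not the Stanford or Swiss system: the score table is invented, and the assumption that each canton takes exactly one person is a simplification made only for illustration. It simply searches every one-to-one allocation and keeps the one with the highest total predicted score.

```python
from itertools import permutations

# Hypothetical predicted integration scores (made-up numbers).
# Rows are people, columns are cantons.
scores = [
    [0.30, 0.50, 0.20],
    [0.60, 0.55, 0.10],
    [0.25, 0.45, 0.40],
]

def best_assignment(scores):
    """Return the one-to-one allocation with the highest total score.

    Brute force over all permutations, so only suitable for tiny examples.
    """
    n = len(scores)
    best_total, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        # perm[person] is the canton index given to that person.
        total = sum(scores[person][canton] for person, canton in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm
    return list(best_perm), best_total

assignment, total = best_assignment(scores)
print(assignment)  # [1, 0, 2]
```

A real system would need capacity limits per canton and a far more efficient solver; the brute-force search here only shows what "best fit" means compared with picking an allocation at random.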

On the nerdier side, Anna Scaife (who was one of the first Turing AI Fellows) published a fascinating paper where a random forest classifier identified a new catalog of 49.7 million galaxies, 2.4 million quasars, and 59.2 million stars!

What are the hottest AI regulation dilemmas and issues in the UK at this moment?

Until recently, the key one was how to govern/oversee the use of facial recognition. Lord Clement-Jones, the chair of the House of Lords Artificial Intelligence Committee, recently proposed a bill that would place a moratorium on the use of facial recognition in public places. That won’t pass but it’s a strong signal that the Government should consider this issue in more detail – and indeed the CDEI is looking into this.

But with the A-levels scandal in the UK, I think there is a growing acknowledgment that there should be more oversight and accountability on how public authorities use algorithms.

You’ve spent quite some time collaborating with European institutions. Can you tell us more about AI policy approaches and strategies on the European level? What is happening there, what’s the agenda? 

The European approach is a lot more interventionist so far. There are some good proposals, and others I’m less keen on. For example, I think a dualistic approach to low-risk and high-risk AI is naïve. Defining risk (or AI) will be a challenge, and a technology-neutral approach is unlikely to be effective (as the European Parliament’s JURI committee also notes).

It’s better to focus on particular use cases and sectors, like affect recognition in hiring or facial recognition in public spaces. I also think that it’s dangerous to have inflexible rules for a technology that is very complex and changes rapidly.  

Still, I think it’s encouraging they’re at least exploring this area and soliciting views from industry, academia, and the wider public. 

As for their general approach, it’s worth having a look at the white papers on AI and data they published back in February.

What happens when something goes wrong, for example when an AI system used for government purposes causes major harm or even death? Who is responsible? How should governments approach the accountability challenge?

It’s very difficult to say without details on the use case, context, algorithm, human involvement, and so on. And I think that illustrates the problem with talking about AI in a vacuum: the details and context matter just as much as the algorithm itself. 

In principle, the Government remains liable of course. Just because you deploy systems that learn over time, don’t require human involvement, or cannot be scrutinized because of black-box issues doesn’t mean the usual product liability and safety rules don’t apply.

Outside the public sector context, the European Commission is seeking views on whether, and to what extent, it may be necessary to mitigate the consequences of complexity by alleviating or reversing the burden of proof. Given the complexity of a supply chain and the algorithms used, it could be argued that additional requirements could help clarify faults and protect consumers.

The European Parliament’s JURI committee’s report on liability is actually very interesting and includes thoughtful discussions on electronic personhood and on why trying to define AI for regulatory purposes is doomed to fail. They also find that product liability legislation needs to be amended for five key reasons:

  1. The scope of application of the directive does not clearly cover damage caused by software, or damage caused by services.
  2. The victim is required to prove the damage suffered, the defect, and the causal nexus between the two, without any duty on the producer to disclose relevant information. This is even harder for AI technologies.
  3. Reference to the standard of “reasonableness” in the notion of defect makes it difficult to assess the right threshold for new technologies with little precedent or societal agreement. What is to be deemed reasonable when technologies and use cases evolve faster than they are understood?
  4. Damages recoverable are limited, and the €500 threshold means a lot of potential claims are not allowed.
  5. Technologies pose different risks depending on use: e.g. FRT for mass surveillance or FRT for smartphone face unlocks. Therefore, there is a need for a “sector-specific approach that does not prioritize the technology, but focuses on its application within a given domain”.

How would you define “AI Ethics” and “AI Safety”? And how are governments shaping the development and deployment of AI systems in ways that are safe and ethical? Which policy instruments are used for that?

The definition we used in our safety & ethics guidance with the Alan Turing Institute defines ethics as “a set of values, principles, and techniques that employ widely accepted standards of right and wrong to guide moral conduct in the development and use of AI technologies”.

It’s tricky to define comprehensively since it could relate to so many aspects: for example, the use case itself – e.g. is it ethical for a bank to offer more favorable terms to people with hats if the data shows they’re less likely to default? There are also questions about mathematical definitions of fairness and which ones we value in a particular context: see for example this short explanation.
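As a rough illustration of why the choice of mathematical definition matters, the sketch below compares two common definitions on the same toy lending decisions: demographic parity (equal approval rates across groups) and equal opportunity (equal approval rates among people who would actually repay). The groups, data, and function names are all invented for illustration.

```python
def selection_rate(preds):
    """Share of people in the group who were approved."""
    return sum(preds) / len(preds)

def true_positive_rate(preds, labels):
    """Share of people who would repay (label 1) who were approved."""
    approved_among_repayers = [p for p, y in zip(preds, labels) if y == 1]
    return sum(approved_among_repayers) / len(approved_among_repayers)

# Hypothetical toy data: preds 1 = approved, labels 1 = would repay.
group_a = {"preds": [1, 1, 0, 0], "labels": [1, 1, 1, 0]}
group_b = {"preds": [1, 1, 0, 0], "labels": [1, 1, 0, 0]}

# Demographic parity gap: difference in approval rates.
parity_gap = selection_rate(group_a["preds"]) - selection_rate(group_b["preds"])

# Equal opportunity gap: difference in approval rates among repayers.
opportunity_gap = (
    true_positive_rate(group_a["preds"], group_a["labels"])
    - true_positive_rate(group_b["preds"], group_b["labels"])
)

print(parity_gap)       # 0.0: both groups are approved at the same rate
print(opportunity_gap)  # about -0.33: repayers in group A are approved less often
```

The same decisions look fair under one definition and unfair under the other, which is why the context has to determine which definition is valued.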

Safety to me relates more to questions of accuracy, reliability, security, and robustness. For example, some adversarial attacks on machine learning models maliciously modify input data – how do you protect against that? 
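For readers unfamiliar with adversarial attacks, here is a minimal sketch of one on a toy linear classifier, in the spirit of the fast gradient sign method (a technique chosen here for illustration, not one named in the interview). The weights, input, and perturbation budget are made up. It shows only the attack, not a defense.

```python
def sign(value):
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)

# Hypothetical linear classifier: predicts 1 when the score is positive.
weights = [2.0, -1.0, 0.5]
bias = -0.5

def score(x):
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def predict(x):
    return 1 if score(x) > 0 else 0

x = [0.5, 0.2, 0.4]  # original input, classified as 1
epsilon = 0.2        # maximum change allowed per feature

# For a linear model, the gradient of the score with respect to the input
# is the weight vector, so stepping against its sign lowers the score
# as much as possible within the epsilon budget.
x_adv = [xi - epsilon * sign(w) for xi, w in zip(x, weights)]

print(predict(x))      # 1
print(predict(x_adv))  # 0
```

A small, bounded change to each feature is enough to flip the prediction, which is the kind of fragility that robustness work tries to measure and reduce.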

Do you ever think about the role of AI in the long-term future of government, when technology improvement potentially accelerates exponentially? What do you contemplate when lost in thinking about decades to come? 

Definitely. In fact, the book that initially got me into AI was Nick Bostrom’s Superintelligence. To me, this is an important part of AI safety: preparing for low-probability but high-impact developments. Rapid acceleration can come with a number of dilemmas and problems, like a superintelligence explosion leading to the well-documented control problem, where we get treated by machines the same way we treat ants: not with any malign intent, but without really thinking about their interests if they don’t align with our objectives (like a clean floor). On this, I highly recommend Human Compatible by Stuart Russell. On superintelligence, I actually found Eric Drexler’s framing of the problem a lot more intuitive than Bostrom’s (see Reframing Superintelligence). 

Horizon scanning and forecasting are two useful tools for Governments to monitor the state of AI R&D and AI capabilities – but sadly this type of risk is rarely on Government’s radar. And yet it should be – precisely because there are fewer private-sector incentives to get this right. But there are things Governments are doing that are still helpful in tackling long-term problems, even though this isn’t necessarily the primary aim. 

There was a recent Twitter spat between AI giants at Facebook and Tesla on this actually. I don’t really buy Jerome Pesenti’s arguments: no one claims we’re near human-level intelligence, and allocating some resources to these types of risks doesn’t necessarily mean ignoring other societal concerns around fairness, bias, and so on. Musk on the other hand is too bullish.

What can and should governments in Serbia and the Balkans region learn from the UK AI policy experience? Can you share three simple recommendations?

That’s a good but difficult question, particularly as I have limited knowledge of the state of affairs on the ground!

I think firstly there is a need for technology-specific governance structures – a point one of my favorite academics, Dr. Acemoglu, emphasized during a recent webinar. A team like the Office for AI can be a very helpful central resource, but only if it is sufficiently funded, has the right skill set, and is empowered to consider these issues effectively.

Second, there should be some analysis to identify key gaps in the AI ecosystem and how to fix them. This should be done in close partnership with academic institutions, the private sector, and civil society. In the UK, very early on the focus was essentially on skills and data sharing. But there are so many other facets to AI policy: funding long-term R&D, or implementing open data policies (see e.g. how TfL opening up its data led to lots of innovations here, like CityMapper).

Lastly, I really liked Estonia’s market-friendly AI strategy, and I think a lot of it can be replicated in Serbia and neighboring countries. One particular aspect I think is very important is supporting measures for the digitalization of companies in the private sector. It’s important for markets to not only be adequately equipped from a technological point of view but also to fully understand AI’s return on investment. 

Careers in AI policy are relatively new. Can you recommend the most important readings for those interested in learning more?

The 80,000 Hours Guide on this is excellent, and so is this EA post on AI policy careers in the EU. I was recently appointed as an Expert to a fantastic new resource, AI Policy Exchange – I think it’s super promising and highly recommend it. Lastly, definitely check these amazing newsletters, in no particular order: