Many dystopian visions of the future share “the reality problem”, where digital beings and realities are everywhere, indistinguishable from “the real world”. Advanced technologies built today bring many of these fantasies (and nightmares) closer, if they are not already here. Are digital, synthetic human-like creatures going to conquer our social lives? What is synthetic media, why is it revolutionary, and are so-called deepfakes a threat to national security and to our sanity and wellbeing? How, and who, is “protecting us”? To find out more, I spoke with Henry Ajder, one of the leading researchers on synthetic media, the magical tech that is transforming both our understanding of ourselves and our social lives.
AI technologies are just kicking in. And we’re still far from the digital-first world
The AI field itself is just several decades old and has gone through repeated cycles of hype, winters, high hopes, and new winters, with many new approaches and techniques emerging along the way. It is constantly evolving, and one technique in particular, so-called deep learning, is most responsible for recent breakthroughs. One of the most thought-provoking, but also consequential, applications of AI tech is generative media, where machines and algorithms create media and “things” that don’t exist but seem and act real. A simple face swap today, the banal use case, can become true madness just around the corner. Things are getting pretty messy even today.
Many extremely smart people suspect that the invention of super-advanced AI will be the single most important event in human history, so much so that some go even further and proclaim it the number one existential risk for humanity. Something so strong and intelligent beyond human comprehension, with its own will, might not be controlled by us mortals, and we might lose and suffer, in silly ways. Utopians see AGI, artificial general intelligence, the point at which machine intelligence surpasses ours in all areas, as the most beneficial discovery ever; that’s when flourishing will be unstoppable, and unimaginable possibilities will be popping up everywhere. It would mean solving the biggest problems and creating wealth for all, so the world can leap forward exponentially. No one knows how and when, but we’re moving towards it, fast.
And then, rather unexpectedly it feels, Facebook rebrands to Meta, trying to lead the deeper transformation in which we collectively start living our lives in a setting that is digital-first. They call it the metaverse, although no one has the perfect definition yet, in an attempt to brand what might become the next computational paradigm. Instead of using our fingers and staring at pocket devices, it will be so much more immersive. Black Mirror stuff. How exactly? Well, no one knows.
We are still very early in all of this. Software is running our contemporary lives, but most of the software is still dumb in many ways and instances, and so many of our life experiences are not digital-first. There are too many frictions. And the whole metaverse pitch is also confusing. There is this atypical understanding of it though, which puts the focus on time. Maybe we are there already? Same as cyborgs, more or less. Can you imagine living without a phone – and Google – as your mind extension? It’s just that all of this will be thousands of times better in the next 50 years.
Wait: what is real?
“What is real?” is one of the questions philosophers have most often racked their brains over since the dawn of philosophy as the discipline of thinking about the world, us, and nature. Up until recently, our eyes, for example, were enough to tell the difference at a basic level.
These people, for example, are not real. They look nice, don’t they? Their faces look cute, even familiar, definitely real, but are made by computers, algorithms. Now imagine having a Zoom call with some of them, if they could speak intelligently. Could you fall in love, or make a deal and wire 1000 bucks? Have you seen A, B, or C? We’re not there yet, but we are getting closer. If it’s possible to literally create people, many strange things can happen.
Disinformation is the first to come to mind. Just the ability to create content that looks real, whether video, photo, or audio, and the power to impersonate someone saying or doing something that hasn’t happened, creates threats that can lead to deaths, chaos, or even wars. Imagine a fake video in which Trump says something really nasty about an Arab leader, or a video of someone pressing the nuclear red button. Wicked.
The threat is real. The possibilities are miraculous!
What’s actually happening? Where is this generative media tech applied today, and what are folks building? How is it regulated, if there’s a need? How can we be sure that what we see on YouTube isn’t false?
Henry Ajder is one of the leading researchers and thinkers in this exciting field, regularly contributing to global media outlets and consulting the most important stakeholders in the world, including governments, businesses, and international organizations. He’s currently part of Metaphysic, where he leads Strategy and Partnerships. Previously, he was Head of Research Analysis at the world’s first deepfake detection company and an Emerging Technologies Researcher at the London-based innovation think tank Nesta. The topic is massive, so we’ve tried to focus just on some of the most interesting aspects, primarily to inspire further attention, thinking and research.
The obvious question first: what is synthetic media? I mean, deepfakes sound so frightening. Can we start with a few definitions?
I’d say deepfakes are a form of synthetic media, but not all synthetic media is necessarily a deepfake. The phrase deepfake emerged in a very natural way, in the community, online. As such, it wasn’t coined in a scientific or academic context, and so its meaning is still quite fluid. It was initially used exclusively to refer to the swapping of celebrity faces into pornographic footage, but now people use it to refer to different kinds of face manipulation and video. They refer to non-pornographic videos. They refer to voice audio as deepfakes, as the term doesn’t really have a fixed meaning. I typically use it to describe malicious uses of synthetic media, which are intentionally designed to harm or deceive. But that is by no means a universally agreed definition. Some people don’t use it in that way. So the term that I try to use, where possible, is synthetic media, and synthetic media refers to a broad range of forms of media which are generated entirely or partially by using artificial intelligence, specifically types of deep learning and neural networks. That could be things like voice audio, that could be music, that could be swapping entire faces, or things like lip movements. That could be photos of non-existent people or even non-existent entities. That could be interactive media technologies, things akin to metaverse technologies, like AR and VR. And then it could also technically be something like GPT-3 or large language models. It’s a wide range of tech; things like the effects in Jurassic Park, or early forms of VFX, are still, in some sense, all synthetic media. If you want to be really specific, you can talk about AI-generated synthetic media, but typically most people just say synthetic media.
How did you end up in all of this, what’s your personal story and why do you think it’s so important that you’ve decided to dedicate your career to it?
So I started off, in my academic background, as a philosopher, or philosophy student, where I was focused primarily on metaphysics and the philosophy of perception. That’s a really interesting area for me because you see this collision of philosophical traditions in the philosophy of perception around things like phenomenology and the nature of conscious experience, and trying to fit that into models of understanding how certain experiences impact the way you see the world, yourself, and other people. I was also very interested in the issues around AI and things like Superintelligence. I saw a lot of emerging technologies, in a sense, bankrupting existing moral frameworks to an extent and requiring new ways of thinking about new problems to account sufficiently for harms and benefits. And so with those two focus areas, with deepfakes first emerging in late 2017, I came across a subreddit post in a group about futurism, about future technologies. And this technology immediately struck me. I saw it as an entirely new category, an entirely new form of creation and also perception. How this will change the way that we interact with each other, both organically and synthetically, was really quite profound to me, looking both at the interesting creative uses around art and things like this, and at the quite wide range of malicious uses, too. I came into this by researching the topic in think tanks and then working for the world’s first deepfake detection company as a researcher looking into the different kinds of deepfakes out there. That’s the rough sketch of my journey.
The company you’re working with now, Metaphysic, is creating software and synthetic media solutions for others, as a service. Can you tell me more about it? What are some of the most interesting applications that are already delivered, or in production?
At Metaphysic it’s about developing the technologies to make hyper-real synthetic media better and, to an extent, more accessible, but with the caveat of doing that in a way that is responsible and ethically driven. The company was co-founded by Chris Ume, the world-leading deepfake artist behind the deep Tom Cruise project and many other really impressive hyper-real deepfakes. The idea is that the company is going to be trying to develop the tools. That means more people don’t necessarily need to have the very specific, highly rarefied expertise that Chris has, and can still use the technology in creative and exciting ways. We’re not going to put this in the hands of just anyone, but it could be that, for example, someone in the VFX space who has no background in deepfakes can now use this to create certain effects for films, for adverts, or for pieces of art. The basic premise behind it is to provide the engine for the hyper-real synthetic media of the future. The Tom Cruise project really got people to pay attention. What’s really interesting that I’ve noticed is that people recognize that this Tom Cruise is a character, which is not the same as Tom Cruise. A lot of people love the character that you build around him and the funny situations you put him in. I think that’s really interesting, and we’re seeing deepfake satire and parody becoming a really prominent area, but also film, entertainment, and more generally, the speculative area of the metaverse. The idea is that as the technology becomes more accessible and more embedded in day-to-day life, we can all have hyper-real avatars of ourselves that we could then use. You can play with your synthetic version of yourself in the many different applications which will soon be a part of everyday life in that world of the metaverse and the Web3 era of communication. We recognize that it needs to be done right. There’s a lot of the classic Spider-Man here: great power, great responsibility. We’ve got to make sure that the technology we’re developing doesn’t fall into the wrong hands.
I believe you can share some data and insights? The number of deepfakes that are online, the number of registered misuses, categories of misuse, adoption rates, as in people creating synthetic media content with their phones, numbers that can tell us what’s happening and how it is progressing?
The most important piece of research I did was the world’s first mapping of the deepfake landscape back in 2019, in a report called The State of Deepfakes. At the time, we found that the number of deepfake videos online was just under 15,000, and that represented an almost doubling in just under a year. Fast forward to today, and the landscape has changed dramatically. The biggest driver of that, again, is synthetic media or deepfake applications becoming increasingly accessible to the everyday person. It no longer requires proficiency in programming, expensive computer hardware, or the time to learn. Face swapping is super accessible through friendly apps on the iPhone App Store or on the Android store, and you typically need just one image, and it takes seconds to generate. One of the biggest apps in the space for that kind of usage is Reface. I think it was in July that they announced that they’d had over three billion face-swap videos generated on their app. Another app working in a similar kind of space is Wombo, a very friendly novelty app for lip-synching media. They recently announced that they’d passed 1 billion “wombos” being generated. You can see that the emergence of these apps has fundamentally changed the nature of the landscape. The malicious uses are still very much prominent, particularly in the gendered image abuse context, commonly referred to as deepfake pornography. It’s a phrase I try not to use, but in that space, again, as these other apps have become more accessible, so have the malicious tools. I’ve done research on these tools becoming increasingly user-friendly over the last couple of years, identifying new tools that are emerging that give people more powerful capabilities, ranging from synthetically stripping still images to face swapping, again with one click, into a library of adult material. These spaces are growing massively. The biggest forums for this content have over a million members, and so do some of the most popular tools for generating it. Although the landscape may have shifted away from there being a small number of videos, the vast majority of them pornographic, in terms of malicious deepfakes the vast majority remain this form of image abuse targeting women. We are starting to see cyber security threats, in particular fraud and impersonation via audio, increasing in frequency, and we’re also seeing the concept of deepfakes alone destabilizing democratic processes. People think deepfakes themselves are the thing causing the problem for democracies, causing people not to trust what they see. But the idea itself now also allows people to dismiss real things as fake. In Gabon, in Africa, and in Myanmar, those two countries in particular, that has caused serious, serious democratic issues, where people believed that real videos were fake, and that’s had serious consequences. That’s a rough lay of the landscape right now, at least on the malicious side. There are plenty of creative and commercial uses, which are super interesting, around film, new forms of communication, particularly for younger generations, around digital fashion, and around avatars. The malicious side of things is certainly equally as important, if not more so.
You worked for a company that does deepfake detection. How hard is it to detect a deepfake? How does the process work? Is it possible for something not to be recognized as a deepfake?
It depends on what you mean by deepfake detection. If you’re talking about the automated process, yes. If you’re talking about training machine learning systems or algorithms to identify deepfakes, with the aim of being more reliable than the human eye, the short answer is that it’s incredibly, incredibly difficult, because the models for generating deepfakes are being updated all the time. The dynamic is adversarial, between people who are looking to fool detection systems and people building them. The people who fool the systems are always going to be on the front foot, because they get to fool the system first, at which point the people building it have to scramble to fix that. Facebook ran a deepfake detection challenge, and the results were published back in January last year. The best accuracy level was 65 percent. Imagine, let’s say, there’s a law case or a trial where someone is being accused of murder, and the video is the defining piece of evidence. Would you trust a model with 65 percent accuracy for that? You probably wouldn’t, and not in many other critical contexts either. The current accuracy and reliability of these tools are just not good enough for them to be deployed, and based on that adversarial dynamic, I think it’s highly unlikely that we will ever see deepfake detection alone being used to decide whether something is real or not. If we do, that, to me, is a sign of a pretty shady system for authentication.
What do you do then? The other thirty-five percent?
I mean, I think the thing is, it’s not the other 35 percent, it’s all of it. If it’s only 60 percent, that’s slightly better than a guess. Would you trust it? I don’t think I would. And this is a real challenge. This is a really significant issue. Detection may get really good. It is possible, if a lot of resources go into it, it’s constantly being invested in, and there are some breakthrough techniques, that detection really could do the job, or maybe it could do a really good job on the vast majority of content. Maybe it’s one layer of the process. Ultimately, it’s a real challenge if synthetic media gets to the point, which I think it will, where hyper-realistic content is very pervasive. Definitively proving whether something is real or not can be very difficult to do after the fact. What I think is probably the most promising approach is going for authentication at the source. There are several initiatives and technology approaches going on here, which are controlled capture or content authentication protocols. Basically, at the point where an image is captured or a video is recorded, on the chip in the device, you’re getting encrypted metadata, which is then attributed to that media, and that record is then filed on a ledger such as the blockchain. That more bottom-up approach is much easier to deal with over the lifespan of a piece of media than trying to authenticate things from a more top-down perspective. Obviously, it comes with its own issues, like the labeling. Maybe you can’t afford the new cutting-edge technologies that have this software. Maybe people using it in a human rights context, for example, don’t want to give away their metadata or reveal their identity. And so there are issues around creating media hierarchies. I think, on the whole, that is a much more promising approach to authenticating media in a critical context than detection. Even if I think detection still has some role to play, it will be more limited than some people think.
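To make the idea of authentication at the source a little more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any real controlled-capture product works: the function names `sign_capture` and `verify_capture` and the `device_key` are hypothetical, and a symmetric HMAC stands in for the hardware-backed public-key signatures a real device would more plausibly use. It also signs the metadata rather than encrypting it, and leaves out the ledger step entirely.

```python
import hashlib
import hmac
import json


def sign_capture(media: bytes, metadata: dict, device_key: bytes) -> dict:
    """Bind capture metadata to the media bytes at the moment of capture.

    device_key is a hypothetical secret held by the capturing device.
    """
    content_hash = hashlib.sha256(media).hexdigest()
    # Serialize deterministically so signer and verifier hash the same bytes.
    payload = json.dumps(
        {"hash": content_hash, "metadata": metadata}, sort_keys=True
    ).encode()
    signature = hmac.new(device_key, payload, hashlib.sha256).hexdigest()
    return {"hash": content_hash, "metadata": metadata, "signature": signature}


def verify_capture(media: bytes, record: dict, device_key: bytes) -> bool:
    """Return True only if neither the media nor its metadata has changed."""
    if hashlib.sha256(media).hexdigest() != record["hash"]:
        return False  # the pixels (or audio) were altered after capture
    payload = json.dumps(
        {"hash": record["hash"], "metadata": record["metadata"]}, sort_keys=True
    ).encode()
    expected = hmac.new(device_key, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    return hmac.compare_digest(expected, record["signature"])
```

The point of the sketch is the asymmetry Ajder describes: checking a record made at capture time is a cheap, exact comparison, whereas detection after the fact is a statistical guess.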
On the other side, let’s think of tech for good applications for a second. For example, loneliness or mental health issues, extremely big ones. I can fantasize about education as well, where every kid has a personal mentor of some sort, a virtual person. Are there any startups already building any of this stuff?
I think that there are lots of really interesting and exciting uses of synthetic media, as you said, for things like bringing education to life, like having a historical figure tell you about their life in an engaging way, or having that person as your mentor. A lot of work is being done looking at whether synthetic media could help people recover from addiction. Could synthetically recreating someone’s voice or their likeness help someone process the loss of that person? On the flip side, it could also ruin that process. This is why it needs to be carefully developed and studied. And I think one of the areas that is worth mentioning here, which I’m excited about, is accessibility. One often cited project is called Project Revoice, which is about synthetically recreating the voices of people who have lost the ability to speak. If you think of Stephen Hawking, who was given the robotic voice, it was just that there was no technology to clone the real one. Chances are he would get a really nice, personalized voice to speak with now, which I think is hugely valuable to people who rightly see the voice as an extension of their identity. And then also in a space like gaming, it could help transgender gamers speak with a voice that better reflects how they identify, or help people use voice masking if they don’t want to share their own voice online. There are loads of uses like that, and this is just scratching the surface. We at Metaphysic are also working on some really interesting projects, which I can’t say much about, but we’re working on some really cool stuff that is really trying to showcase the kind of good that synthetic media can do. And I see a lot around avatars and education. And there are some, like Replika, for example, which are about creating chatbots and friend-like avatars that could be your friends. But there are some questions about whether that is the right way to fix loneliness in society. Can a chatbot or a synthetic personality ever replace a real person? At the same time, with a new generation and a new relationship with technology, maybe that is the future. There are lots of startups really interested in using it in a positive way.
That’s absolutely mind-blowing.
The tech really is astonishing. If you’re interested, look up Codec Avatars by Facebook. Those are embodied VR chat avatars, and they are unbelievable. They are unbelievably realistic. Imagine that within a decade, that’s going to be in the hands of everyone, and we’ll be able to have a conversation like this, but embodied in VR with synthetic versions of ourselves. It’s going to be an incredibly disruptive technology, in both positive and negative ways.
To cite US Congress: “potential to be used to undermine national security, erode public trust in our democracy and other nefarious reasons”. If you could summarize what regulators and governments around the world are doing now, how can government approach any of this?
The way that governments have been approaching it so far has been quite polarized. On one side, you have a lot of people who are really parroting quite sensational lines around it. It’s going to cause World War III by someone playing a video of Trump pressing the button, back when Trump was president, or this is going to mean we can’t trust anything we see anymore. There’s a lack of nuance, and understandably so to an extent. The media coverage is typically fixated on the malicious uses, and those malicious uses are serious and need to be taken seriously. One thing that I sometimes find slightly difficult is people suggesting criminalizing the usage of tech based on its malicious or deceptive potential. How do you define deceptive, and when is something intentionally deceptive? Does it matter or not? Is something like a Snapchat filter that you’re using on your Tinder profile deceptive to the person you’re about to go on a date with? By that logic, computational photography and the chips in all of our smartphones are going to be banned, because that effectively processes all of the images you take on that phone. Governments are working on certain fronts in ways I think are really positive. For example, explicit laws banning intimate image abuse using deepfakes, I think, are good. What I don’t like is the more sticky stuff. For example, banning deepfakes 30 days before an election, unless clearly fake or satire. How do you define what satire is? And 30 days seems quite arbitrary. Why not 14, or 45? Why 30 days? Governments around the world, from South Korea to Taiwan to India to the US and UK, are actively looking to legislate, and I think that’s good on certain fronts. But legislating without recognizing the nuances that are required with such a broad technology might cause issues that I’m keen to avoid.
You already mentioned Facebook. Big Tech and big social media platforms play an important role here because that’s the distribution channel and how content becomes viral. Besides the challenge you already mentioned organized by Facebook, what else is happening? Especially because “algorithmic moderation” isn’t perfect – it’s not working. Do these companies have dedicated teams that are looking into this?
I guess there’s a difference, I’d say, between the big social media platforms, in a sense, and Big Tech. That’s quite clearly Facebook’s whole statement of intent with becoming Meta. Microsoft is working on a lot of metaverse synthetic media technologies, as are Amazon, Google, Tencent and Baidu. All of these big technology companies have research labs that are focusing on synthetic media and also on its applications. The same goes for Twitter and Facebook, all those platforms. You’re right, algorithmic moderation is not very robust, but ultimately it is the best they can do without literally bankrupting their business model. They would never be able to hire the moderators required to not use algorithmic moderation. Having said that, they have got policies coming in around synthetic media that have varying efficacy. For example, Facebook is banning deepfakes, meaning AI-generated or manipulated deceptive stuff, but they haven’t banned more crude forms of media manipulation, such as what we call shallow fakes. Twitter has a policy explicitly forbidding malicious or deceptive uses of synthetic media or manipulated media in general. It comes back to the question of when the hypothetical deepfake causes the problem. They can’t stop it being uploaded to the platform, at least not right now. And it will take time for it to be blocked or banned. At that point, chances are the damage has already been done. Big Tech is investing a lot of money into this. The social media platforms in particular are very much buying into this vision of the future of communication. But the problems that come with deepfakes are the same kinds of problems that we’ve seen with other forms of racial abuse and media manipulation. It’s not an entirely new problem, it’s just a new way of expressing existing issues, and it’ll be very interesting to see how the platforms deal with it.
I have two more questions. One is from the conference you’ve organized recently, and the second one is more sci-fi-ish. I really liked the idea of “media provenance”, where you as a reader or viewer can see the history of how certain content was made. How would that look in practice? Would it be like tags, would it be something on the side of a viewer, for example, on YouTube or when you’re scrolling the News Feed? Are there any initiatives there that are already in place?
People are dating “robots” even today. If we can hypothetically create virtual, digital beings that are indistinguishable from reality, do you see a future where our life companions might be virtual? Living “happily ever after”, in the metaverse?
On the first one, on provenance, the best place to look right now is Adobe’s Content Authenticity Initiative, and specifically the C2PA standard, which is one of those open standards for content provenance, akin to, say, PDF. It’s an open standard that is used and is accessible to anyone. It has Twitter signed up to it, the New York Times signed up to it, the BBC, Intel, Arm. It’s got that whole journey right from the chip on the phone to the social media platforms, to the news platforms. Metaphysic is a member of the Content Authenticity Initiative as well. The process is that you try and get that metadata from the image: where it was taken, when it was taken, and whether the arrangement of the pixels is fundamentally intact. And then you have a button, maybe, to press. Maybe a watermark, or some kind of accessible metadata for certain applications where you can then access it. For example, right now on Twitter, if something’s misleading, they have the little icon saying “this is misleading”. Maybe in the future they have something that says “to see image metadata, press here”. If it’s been edited, that tells you when it’s been edited and how. On other applications, I imagine it would be a very similar story, in that it’s in the interest of news organizations to be very transparent with this stuff. Adobe, again, obviously evangelizes their own technologies very heavily in terms of embedding this into their own products and their own tools. It’s still not 100 percent solidified, and it will vary from platform to platform, but I think that the ideal end goal is that we reflexively look for metadata in the same way that maybe we look for a corresponding headline from the BBC or the New York Times about a news story that seems suspicious. So you’re really trying to build in that reflexive attitude towards it.
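As a rough picture of what “if it’s been edited, that tells you when it’s been edited and how” could mean underneath the button, here is a toy Python sketch of an edit history in which each step records a hash of the step before it. This is only an illustration of the general idea: it does not reproduce the C2PA format, which defines its own signed manifests, and the names `append_step` and `history_is_consistent` are made up for this example.

```python
import hashlib
import json


def _entry_hash(entry: dict) -> str:
    """Hash one history entry using a deterministic serialization."""
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()


def append_step(history: list, action: str, media: bytes) -> list:
    """Return a new history with one more step (capture, crop, colour fix...)."""
    prev = _entry_hash(history[-1]) if history else None
    entry = {
        "action": action,
        "media_hash": hashlib.sha256(media).hexdigest(),
        "prev": prev,  # links this step to the one before it
    }
    return history + [entry]


def history_is_consistent(history: list, current_media: bytes) -> bool:
    """Check that no step was rewritten and the last step matches the file."""
    prev = None
    for entry in history:
        if entry["prev"] != prev:
            return False  # an earlier step was changed or removed
        prev = _entry_hash(entry)
    if not history:
        return False
    return history[-1]["media_hash"] == hashlib.sha256(current_media).hexdigest()
```

A viewer-facing label could then simply list the `action` values in order, while the hashes make it evident if someone quietly rewrites an earlier step. In a real system each step would also be signed, which this sketch leaves out.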
On the second question, the Brave New World aspect of this. I mean, if you think about Alexa as a tool, when that was released, a lot of people, including myself, felt uncomfortable with that in the house. It felt weird. It felt kind of alien, at first. And now it’s an accepted part of life for most people, or a lot of Western people in particular, and they reflexively ask, “What’s the weather like today?”. I think you’ll see a similar shift coming with synthetic media and more sophisticated and realistic forms around avatars and things like this. I mean, we’re already seeing virtual influencers being some of the popular influencers on Instagram, getting probably million-dollar deals to advertise.
Hypothetically, would it be possible to have immortal, synthetic versions of, say, your parents, so they can live forever? Virtual beings that have their movements, their voice, and everything in between?
It’s almost like some imprint of them. It’s getting very sci-fi, but it’s like being able to import your identity entirely in some respect, basically uploading your brain into a synthetic avatar. And then there are questions about whether there is such a thing as static identity, identity over time. You know, you’re always learning. Are you in a state of flux? And is that like a capturing of your final moment, your final stage before you die? Does that still count as an identity if you’re not constantly in flux and learning and growing? Or could you continue to evolve? I mean, again, these are super abstract, hypothetical questions. What I do think is going to become more frequent and more likely is that people form attractions of many different forms to synthetic versions of themselves, real people, or non-existent people. We already see that with people fancying animated characters. Or people using Snapchat filters and wanting to get surgery to look like the filters. We are all at this weird point of reality imitating fiction, in a sense. I think there are a lot of interesting philosophical and ethical questions that surround that, but no doubt what seems alien and weird today will, in a decade, be much more a part of day-to-day life. It will bring many questions that we need to think about now, so that we’re not being reactive to problems that emerge, but proactive in trying to figure out how to best use and implement these technologies.