In this episode, Senior Fellow Anja Kaspersen speaks with Sam Gregory, executive director of WITNESS and a leading figure in human rights and citizen journalism. Their discussion delves into the challenges and opportunities posed by synthetic data, AI-generated media, and deepfakes. Gregory discusses his pioneering “Prepare, Don’t Panic” campaign and shares insights from his TED Talk, “When AI can fake reality, who can you trust?” He stresses the importance of watermarking for data provenance and explores the role of authenticity in today’s digital environment.
The conversation also touches on the urgent need for global standards on AI governance and the rise of digital authoritarianism. Gregory’s reflections on recent trends and his outlook for 2024 offer a compelling call to action for engaging responsibly with human rights in an increasingly digital world.
ANJA KASPERSEN: It is my great pleasure to welcome Sam Gregory, executive director of WITNESS, the global human rights and citizen journalism network that helps people use video and technology to defend human rights. He is an internationally recognized, award-winning human rights advocate and technologist, and an expert on smartphone witnessing, deepfakes, media authenticity, and generative artificial intelligence (AI).
Sam, a very warm welcome. It is a pleasure to speak with you today.
SAM GREGORY: I am delighted to be here talking with you.
ANJA KASPERSEN: You have devoted your career to studying how emerging technologies affect grassroots media, mainstream media, and democratic accountability, and to empowering people to act and protect human rights in digital realities. To help our listeners get to know you better and understand what has driven this deep engagement over the years, could you share how and when this journey began for you?
SAM GREGORY: For nearly 25 years I have worked in the broad field of how human rights defenders and journalists use technology, and how human rights and the needs of these frontline information workers shape technology. There are some recurring constants in that work: how you think about questions of trust, which we are now very focused on; how you think about safety, security, and consent; and how you think about the effectiveness of individual journalists and human rights defenders trying to share critical information, as well as of the information ecosystem.
I have done most of that work in the context of the human rights network WITNESS, which is, as you said, a global human rights network that works to think both at a very grassroots level how it is that a human rights defender in Myanmar or a journalist in Brazil literally takes out their camera, films, and is trusted when they share that, but also about the infrastructure questions that shape how they can do that.
That is what brought me and the organization a little over five years ago to start thinking about the issues around deepfakes and what we now think of as generative AI, which was like, you have this new evolution of emerging technologies that are not dehistoricized from the past—they are not dehistoricized from AI in the past nor are they dehistoricized from media issues in the past or from these questions of trust, authenticity, and safety, but they exacerbate some of those dimensions.
We started an early effort five years ago to, as we described it, “Prepare, Don’t Panic” around what these emerging technologies would do and how do we ensure that they are centered on the needs of the most vulnerable, thought through globally and based in human rights standards?
ANJA KASPERSEN: This campaign, Prepare, Don’t Panic—which is a fantastic name, by the way—is easier said than done because there is a lot of panicking going on and awfully little preparation happening in conjunction with that. This is a global effort basically to address the challenges posed by synthetic data, AI-generated images, audio, video—you mentioned some of them—and what is commonly known as “deepfakes.”
Previously—and I think we have both been working on the security angle on this for some time—creating deepfakes was technically cumbersome and quite compute-intensive. That prevented it from becoming a widespread phenomenon and a democratized phenomenon; few people had access to the technologies to actually be able to do this. How does this link up with the Prepare, Don’t Panic campaign, and what are you seeing in this field? We should panic just a little bit, right?
SAM GREGORY: I heard a phrase recently from I think Katie Harbath, who used to work at Meta and works in a number of other political contexts. She used the phrase, “Panic Responsibly,” recently, just sort of a funny variant. When we first talked about Prepare, Don’t Panic, we had a very particular sort of thinking in line with it, which is we were saying this in 2018, when we were one of the first organizations globally to say we need to look at this but without the hyperbole that was shaping it in that year.
In 2018 there were headlines, like there are headlines now in 2024, saying, “Deepfakes will shape the elections” or “Deepfakes will disrupt society,” and we wanted to push back on that because at the time, as you say, it did not seem that technically feasible. Also, we saw deepfake panic being used to undermine democratic processes and the individual accounts of journalists, so in 2018 we were saying Prepare, Don’t Panic as an explicit pushback on this overhyping of the threat, this way in which the threat was being weaponized against true voices, this idea that we had.
We see now—and I hope we talk about it—this idea of the “liar’s dividend,” of plausible deniability, where you can say something real was made with AI and get it dismissed as somehow illegitimate.
In 2018 we were doing that and we were also saying prepare because we wanted to avoid dehistoricizing this and decontextualizing a technical issue from a technosocial context or even a social context. What I mean by that is that in 2018 we were looking around and saying: “Look, there are journalists and human rights defenders who are used to having their stories dismissed as faked, are used to having to try to verify content, and have grappled with the social media platforms and their inadequately created and enforced policies, so we do not need to start from zero. We need to prepare but we need to build on the foundations of experience from people who have experienced this before. Of course we could extrapolate that into other fields like AI, where people had been thinking about responsible AI for a longer period than just the current wave of discussion.”
I think Prepare, Don’t Panic has somewhat evolved for me a little bit because people ask me, “Are you still preparing and not panicking?” I think that Prepare takes on an urgency now that is about how we get regulation, technology development, and platform policy right, and how do we resource and support civil society, journalism, and frontline defenders in a better way. Now we really need to prepare, and we need to do it in a way that is not anticipating a threat two or three years down the line but a threat that is now with us.
In 2018 we were saying, “We’ve got to invest in starting to build out the infrastructure for detection, designing these tools to make it easier to understand media provenance in a way that reflects human rights, and we have got to start educating people about the potential threat from deepfakes without overhyping it.” Now when I look at this I look at it as: “We know now what we need in fact potentially in terms of access to detection, measures around provenance and authenticity, and regulation that takes into account human rights. It is now about whether we implement those in a meaningful and consultative way in the coming years and coming year, so there is an urgency to prepare now.”
I still think that Panic is important, or the not panicking, and I say that because one of the projects we run globally is to provide a Rapid Response Force for claims of deepfakes or claims of use of generative AI in elections and other political and human rights contexts. What we have seen there—in maybe the majority of the cases we have received on that helpline, which is shared with media forensics experts we work with—are people using the fear of AI, the mystery around it, the idea that you can fake almost anything, and the knowledge that it is quite hard for the average journalist or maybe even the average government to quickly make discernments and explain them, to basically dismiss real footage as faked. So the ability to create this idea that AI is everywhere and that it can shape everything I think is still a worry that we need to push back on. AI is not magic pixie dust; it is something that has defined capabilities that are evolving in certain ways.
As an example, if someone comes to me and says, “I’m worried that this piece of fake audio could be made with AI,” I believe them because I know how easy it is now to make synthetic audio that sounds like a real person. If someone comes to me and says, “Look, this video of this riot or protest that took place yesterday has substituted in the face of that politician,” as of now I am going to say that is unlikely because the tools cannot do that in that sort of way. We still need this nuanced approach to our understanding of these that evolves in time to say, “Yes, that’s possible and you should worry about that,” and “No, that’s not possible; don’t worry about it,” and “Your worry is going to be exploited by people who want to exploit fear and the gray zones.”
ANJA KASPERSEN: That is a great overview of the breadth and depth of the work your organization is doing and of what you have been involved in.
To rewind a little for our listeners, is there a difference between audio and image-generated deepfake technologies, and what is this thing about synthetic data? Can you explain a little bit to give some more meat to these concepts and these technological tools essentially and programs?
SAM GREGORY: Sure. “Deepfakes” was the term first used to describe the ways people were manipulating people’s faces or substituting faces onto others’ faces. Of course, the first set of widespread use cases targeted women, primarily with nonconsensual sexual images. Around 2017 and 2018 the word “deepfake” started to be popularized. For many people it does summon up this idea of the “face swap” because those were built using a particular tool called generative adversarial networks: basically two neural networks, a type of AI generation system where you have two systems competing with each other, one trying to create these incredible fakes of someone’s face and the other trying to detect them, and they kept improving.
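As an illustrative aside, here is a minimal sketch in PyTorch of the two competing networks Sam describes. The toy data, network sizes, and training settings are assumptions for illustration; real face-synthesis systems are vastly larger and more complex.

```python
# A minimal sketch, assuming PyTorch, of a generative adversarial network:
# a generator learns to produce samples a discriminator cannot tell apart
# from "real" data, while the discriminator keeps improving alongside it.
# Toy 2-D points stand in for images of faces.
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
discriminator = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = torch.randn(64, 2) * 0.5 + 2.0   # stand-in for real media samples
    fake = generator(torch.randn(64, 8))    # samples synthesized from noise

    # Discriminator: label real samples 1 and generated samples 0.
    d_loss = loss_fn(discriminator(real), torch.ones(64, 1)) + \
             loss_fn(discriminator(fake.detach()), torch.zeros(64, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator: try to make the discriminator label its fakes as real.
    g_loss = loss_fn(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```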
What we have seen over the last five years—and people still use the word “deepfake,” so I do—is a diversity of the ways you can make people appear to say something they never did, appear to do something they never did, create an event that never happened, and an image that never happened.
ANJA KASPERSEN: Like the image of the pope or the video of Biden.
SAM GREGORY: Exactly, so we have AI audio, like an audio of Trump, Biden, or Obama, we have the pope in a puffer jacket as an image example, and we do have those deepfake videos that exist. What has happened is that you have a diversification of the technologies, so you have a range of ways people are generating realistic representations of both real people or events that look realistic that never happened.
We have seen an explosion of technologies. Some work well for different things. Diffusion models are behind a lot of the image generation; you have variational auto-encoders that do something well; we have these generative adversarial networks.
I think for most people it is not important to be able to distinguish which technology is which because often they are intersecting, so some of those are now intersecting with the large language models that sit underneath ChatGPT. What is important to understand is what they are giving us in terms of the flexibility of what you can do with them and who has access to that. If we were to look at the state of play now, we would start with this category of deepfakes and AI-generated media—“synthetic media” is another term people use that is rather neutral about it; “synthetic media” does not have the negative connotations of deepfakes, it simply means synthesized versus real. You can do AI-generated audio, and that has really improved in the last year to 18 months, to where it requires a minute of someone’s audio or less to start generating a fairly realistic simulation, and you can do that with a free or very low-cost tool, so synthetic audio has become more accessible and commoditized as cost barriers have fallen.
We all know AI image generation. Everyone has played around with Midjourney or DALL-E or something built into a Bing or an Adobe product. That has gotten commoditized very rapidly and continues to get more realistic and lifelike.
On the video side, it is still quite hard to create a fully synthetic, realistic-looking video on command like with a text sentence. We have not gotten there, though the progress is fast. What has gotten a lot easier is doing things like, for example, matching someone’s lips to an audio track, and substituting someone’s face, even in live scenarios. So video is improving, but it is still not at the stage where, as with an image, you can just write a sentence and summon up a video that looks pretty realistic. We are not there yet, and it is not at the stage of accessibility and reduced cost of audio, but we have seen all of these improve because of this intersection of technologies, computing power, and experimentation and now commercialization.
ANJA KASPERSEN: That was very helpful.
I want to stay on this track for a little bit because you also delivered a TED Talk not too long ago, which I would highly recommend to anyone listening to this podcast, and its apt title—“When AI can fake reality, who can you trust?”—captured my attention. It builds on some of the things you just said. Could you talk to us about the significance of this statement? I know TED will do their creative interventions in terms of making the title something very attention-grabbing, but I have a feeling that you weighed in on this because this is something you feel very strongly about. Am I right?
SAM GREGORY: Yes. TED does an amazing job in thinking about how to reach an audience with a title. I need to know the folks who do that; they have got it down to a fine art.
At the heart—and this comes out of the work I have been describing from WITNESS—of effective ecosystems of trust in information and of democracies are the people who are on the frontlines of gathering and sharing both journalistic information and, of course in many contexts, human rights information, and they are also on the frontlines of the trust in them being challenged.
This is not an AI-specific problem. I am always very careful not to blame AI for things that are much broader. We exist in a polarized society, we exist in a world where there is distrust in authorities, and we exist in a world where there are authoritarian powers trying to take control through a variety of means, so the challenges facing journalists, human rights defenders, communities, and democracies globally are not because of AI. But when you start to see how AI undermines our confidence in those stories, those accounts in journalism and human rights content, then we need to think about what we do in response to that. So the title is almost a provocation to what I am talking about in the TED Talk, which is what we have established from the work of WITNESS over the last five years in Prepare, Don’t Panic.
I should say the way we have done our work is always very much talking, listening, informing, and hearing what is needed by communities of journalists and human rights defenders, social movement leaders, and election officials globally. So we ground it not in hypothetical thinking but in coming to people and saying, for example: “This is what has changed in the synthetic media landscape. This is what is now possible. You have a deep understanding of the threat landscape you exist in and of the way your government operates, the way the media operates in your context. What do you need as solutions? What do you want?”
I think this is critical because there are times when the AI governance conversation gets very parochial to particular jurisdictions. There are times when the social media conversation becomes about the U.S. elections but not the Indonesian elections, the South African elections, etc., so it has always been important to us to start from what a broad diversity of global voices who are on the frontlines of this need.
That is what led us in the TED Talk to ask: “What are the key things we should be doing now to respond to the way in which our information ecosystems are being undermined by both AI actually being used but also the threat of AI, and what can we do about it as very concrete measures?” You might adopt the catchphrase, “prepare and ‘act’” rather than “panic” around what is happening now.
ANJA KASPERSEN: I think Prepare and Act is extremely important. Personally I am deeply worried that we are nowhere near where we need to be in terms of our awareness.
You mentioned elections. This year alone we have 40 elections coming up. What is your take? What is your concern? To the point you made in the TED Talk, how do we know whom to trust, what to trust, and when?
SAM GREGORY: I don’t think we are particularly well prepared. I wish that some of the steps that we have been recommending for years had been taken. At the same time, I think there are some promising possibilities that if we keep putting the accelerator pedal down we could move toward.
The reason I do not think we are prepared is just looking at the evidence of the world around us. In December I was in Singapore and met with a lot of fact checkers and journalists from across Asia who are dealing with some of the upcoming elections. We just had the Bangladesh elections last week, we have the Pakistan elections in February, we have Indonesian elections, just to name the first quarter of the year. I think I have those right, but we have a lot of elections coming up, as you say, in the start of the year.
What I heard journalists and fact checkers saying is, “We don’t feel like we have access to the tools for detection, we don’t have the skills or the resourcing, and we are seeing the usage of this.” They were often talking about audio and images, not so much video, but we are seeing this. We saw it last week in Bangladesh. There was faked audio of a political candidate that was part of the discussion. We saw it in Europe last year, in Slovakia. In the United Kingdom we have seen faked audio happening.
Firstly the reason I am worried is because we are seeing it. It is not hypothetical. As I said, in 2018, 2020, and even 2022 I was the person who kept saying: “I am not worried about the elections. We shouldn’t put the emphasis there. Stop hyping this up.” When I look at it now, there is genuine reason to think that this is a contributory factor.
I still think AI is a factor alongside the other fake news, misinformation, disinformation, and the bad behavior of politicians. A lot of the people who say the worst and the most lying things are the people in power. They do not need a deepfake to do that; they just say it. I think there is real substantive reason from a technical point of view to worry about this year.
I think there are steps we could be taking, as I was saying. I had the opportunity to testify in Congress on the House and Senate sides in the United States last year, and I saw congresspeople trying to grapple with this. It is interesting. It is one of those issues where—particularly with synthetic media and deepfakes; this may not be true of the whole of the AI spectrum—it does feel very personal. Politicians feel that they are threatened by this because, quite rightly, they are the targets of this, so you certainly see attention to how this could move forward.
In the TED Talk I talked about three steps that I think are important and timely in this election year. One of them we talk about is around access to the detection tools. Detection is controversial around synthetic media because a lot of people say: “This is a losing game. You are never going to be able to detect everything.” It is an adversarial contest, meaning that you are constantly trying to work against someone who is developing better and better fakes, and you are trying to make sure you have the right way to detect it.
Detection does not work at scale. I think a lot of the time people are asking, “Well, couldn’t we just have a tool so everyone can detect whether something is fake or real?” I don’t think that is viable. What I do think is a gap that we should be deeply investing in from a research perspective, a foundation perspective, and a journalistic perspective, is access to detection and the skills for the people who need it most, the journalists, the human rights defenders, the election officials, and the civil society leaders on the frontlines of this because at the moment they do not have it.
It is imperfect because in the societies we live in not everyone trusts journalists, fact checkers, and frontline human rights defenders, and it is imperfect—I think most people would like to say, “I would like to be able to detect it and use my own discretion to work out if something is faked,” but what we know is that when you make detection tools available to everyone they get broken so quickly.
But there is this gap, and we talk about it at WITNESS, this detection equity gap, and a lot of that is resourcing. Simply, who is putting up the money to support human rights defenders, journalists, and election officials to do this well? In some sense I see that as one of the easiest challenges because it is a money and practical challenge.
The second area we point to is around—and you mentioned it earlier—the synthetic media and synthetic data provenance question. I spent many years working in the human rights field helping people be able to prove that a piece of video was real, starting in the years of, say, the Syrian Civil War: How do you film a video and be able to show that it was shot in a certain time and place and was not tampered with? We wrote a report in 2019 that is still a very seminal report about the challenges and how you create what we described as “authenticity infrastructure,” that basically just explains where something comes from in a way that you as a consumer can say, “Oh, I understand how this was made,” how it was adapted, and how it was edited.
That whole area has exploded over the last few years, and it is a big part of the legislative efforts that came up in everything from the EU AI Act to the White House Executive Order to pending legislation on the Hill to the Hiroshima Process out of the G7, this idea of provenance: How do you know when something was made with AI? How do you show that to consumers? How might you understand a whole bunch of things about the recipe?
I do not think we are going to get that widely adopted by most of the elections this year. It is a systemic infrastructure question of how to do that well, but that is one of the systemic things we need to do. We need to have this much more enhanced transparency about the role of AI across the media, the content we consume in a complex world.
I am always pushing back, and I see it in some of the bills that I have seen on the Hill, where it often becomes very binary. It is like, “Is it made with AI or not?” and that feels so short-sighted because almost all of our content is infused with AI. Increasingly it is not going to be “Is it AI or not?” It is going to be, “Where was AI, to what extent, was it done maliciously, was it done in ways that are deceptive” rather than, “Is it AI or not?”
That is an area that I think we need to be investing in. Sadly I do not think it is going to help us in these elections. It is an infrastructure question that we need to build, and we need to build it right. We need to build it in a way that is privacy protecting and does not demand too much information from people.
The third area which I named in the talk is about platforms and responsibility across the AI chain. I do not think any of this, detection or these provenance approaches, works unless you have engaged the AI models, the deployers, the platforms, and the media entities. It is a version of what we have talked about in my work around platform responsibility. There is such frustration in many of the communities I work in around the failures of the social media platforms to be built and run in a way that protects people and supports a global-majority world. I think we need to keep placing our finger on that responsibility for those platforms but also for the broader global AI ecosystem to do all of these things that actually support our information ecosystem.
That is going to be challenging this year because they have cut a lot of the funding for some of the people who would also be on the frontlines of that, the trust and safety teams, the people who are trying to think about this. As you can hear, I think there are things we can be doing. I do not think any of them is going to be more than partial, and they require different things—resourcing, legislation, and policymaking.
ANJA KASPERSEN: Indeed. In many decision-making and legislative bodies around the world, watermarking is currently being spoken about, I would say, as a bit of a panacea for all the ills we have in terms of proving the provenance of data and the authenticity of data, and many would say that the technologies we currently have in place to do this are simply not mature enough and not good enough, so setting up a system or creating regulation on that assumption will produce flawed regulation that cannot really be implemented. Could you talk a bit more about this? And there is a second part: I thought it was interesting when you got to the issue of intent.
SAM GREGORY: You are right. “Watermarking” is this word that gets thrown around a lot, as you say, as a panacea. It is sort of a silver bullet. I have seen it in the legislation we have engaged with, and it has come up in the policymaking context where we have testified and provided educational materials to support that.
I think there are a couple of challenges with watermarking. The first is that I do not think much of the time legislators know what they mean by it, so it is a definitional question of what do we mean by watermarking?
There was actually a great project that we have been part of, run by the Partnership on AI—they are in the middle of it, and WITNESS has been a core member of this process—to come up with essentially a taxonomy and a glossary around some of these terms to help people have a better understanding. They call these “synthetic media transparency methods,” and I like that frame because I think when we come to thinking about this in the bigger picture of where AI will be everywhere the transparency then becomes one of our access points to thinking about intent and thinking about what legislation we might use if we can see something being used maliciously.
In this sort of discussion a lot of people are using watermarking to cover a whole range of ways in which you might show the provenance of data, so watermarking can mean in a stricter sense embedding an imperceptible mark in a piece of media that enables you to trace data related to it, which I think is useful when you are trying to find out if something was made with a particular model, did it have a particular set of training data, almost foundational data about how something was made with AI. I think that is important. I think we need to have a better understanding of what everything is built on.
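As an illustrative aside, here is a minimal sketch, assuming Python and the Pillow library, of watermarking in the narrower sense Sam describes: hiding an imperceptible mark inside the media itself. This toy least-significant-bit scheme is for illustration only; production AI-provenance watermarks are designed to survive compression and editing in ways this one would not.

```python
# A minimal sketch of an imperceptible watermark: a short text mark is
# written into the least significant bits of an image's pixels and read
# back later. This toy scheme assumes a lossless format and an image with
# enough pixels; it is not a robust, production watermarking method.
from PIL import Image

def embed_mark(in_path: str, out_path: str, mark: str) -> None:
    img = Image.open(in_path).convert("L")
    pixels = list(img.getdata())
    bits = [int(b) for ch in mark for b in format(ord(ch), "08b")]
    for i, bit in enumerate(bits):
        pixels[i] = (pixels[i] & ~1) | bit    # overwrite the lowest bit only
    img.putdata(pixels)
    img.save(out_path, format="PNG")          # lossless, so the bits survive

def read_mark(path: str, length: int) -> str:
    pixels = list(Image.open(path).convert("L").getdata())
    bits = [p & 1 for p in pixels[: length * 8]]
    return "".join(
        chr(int("".join(str(b) for b in bits[i : i + 8]), 2))
        for i in range(0, len(bits), 8)
    )
```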
Often when they are talking about watermarking they are also talking about ways in which you do things that are known as “content provenance” or use metadata, and that is the idea that in fact any piece of content evolves over time, and you are trying to show its evolution. For example, I create an image in an AI generator. Maybe it is watermarked, but then I edit it, I redact it, I incorporate it into another video, I add a voiceover, and I drop in a piece of ChatGPT text. To understand that piece of media you need to understand it as an evolving process, and often the way people have been trying to do that is with metadata, which is very different from watermarking because it is like writing a set of information that you attach to a piece of media or provide a way to do it.
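As a further illustrative aside, here is a minimal sketch of the metadata-based content-provenance idea: a record that travels alongside a piece of media and lists each step in its life. The field names are hypothetical, chosen for illustration; they are not the actual C2PA manifest schema.

```python
# A minimal sketch of provenance metadata: each edit step is appended as a
# structured claim bound to the exact file version via its hash. Field
# names are hypothetical illustrations, not the real C2PA schema.
import datetime
import hashlib
import json

def file_hash(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def add_step(manifest: list, path: str, action: str, tool: str) -> None:
    manifest.append({
        "action": action,                   # e.g. "generated", "edited", "voiceover_added"
        "tool": tool,                       # e.g. an AI image generator or a video editor
        "content_sha256": file_hash(path),  # binds the claim to this file version
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })

manifest: list = []
# Hypothetical usage, assuming these files exist locally:
# add_step(manifest, "image_v1.png", "generated", "hypothetical_image_generator")
# add_step(manifest, "image_v2.png", "edited", "hypothetical_photo_editor")
# print(json.dumps(manifest, indent=2))
```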
Then, of course, other people are thinking about what we know about from other contexts of trying to deal with hateful media or terrorist content, which is like fingerprinting. You can take a hash of a piece of media and track down if a duplicate emerges or something like that.
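And one more illustrative aside: a minimal sketch, assuming Python and Pillow, of the fingerprinting idea, using a perceptual average hash to check whether a newly surfaced image is a near duplicate of one already known. The 8x8 size and the distance threshold are assumptions, not a production matching system.

```python
# A minimal sketch of perceptual fingerprinting: an "average hash" that
# stays similar across re-encoding or minor edits, so near-duplicates of a
# known piece of media can be flagged by comparing hash bits.
from PIL import Image

def average_hash(path: str, size: int = 8) -> int:
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    avg = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > avg else 0)
    return bits

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

# Hypothetical usage: hashes differing by only a few bits suggest a duplicate.
# if hamming(average_hash("known_item.jpg"), average_hash("new_upload.jpg")) <= 5:
#     print("likely a near-duplicate")
```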
Lawmakers I think are struggling—and I am starting to see that improving—to know what they are exactly saying. Why do they want this and what do they want it for? Do they want it so they can understand that an AI model was used in a singular piece of content? Are they trying to do it in a way that helps consumer transparency, in which case you need to explain the process? Otherwise, it is not all that helpful just to say that a little bit of media was made with AI but not help people understand how it was incorporated, edited, and mixed with real media.
I am encouraged that lawmakers are trying to engage with it. I think it is a critical part of it, but I also know that we have a terminology problem that is about why we are doing it and for whom that is not easily resolved once we step out of niche scenarios. You can have a niche scenario, like elections, and we have been seeing legislation on that in the United States, where it is like, “We require you to visibly label and put a watermark and add provenance metadata to anything that is an official campaign communication,” or something like that. That is kind of easy to do because it is a niche scenario, and you are just saying these are like three requirements, but if we are looking at this more broadly as we are going to have AI throughout our information ecosystem we need to get clear on the terms and then work out who this helps.
I think this goes to your question about intent. If you go on TikTok—I don’t know what percentage of content has an AI element already integrated, an AI filter, a soundtrack that was maybe made with AI. We are entering a world where AI is going to be part of our day-to-day communication, most of which is personal, playful, and non-malicious; 99 percent of AI is non-malicious is my assumption.
Tracking intent is going to be hard. That is the problem we have always had unless you are tracking actors. It is a lot easier, and this is what people in misinformation, disinformation, and influence campaign work do. They look at actors rather than content because of course you may see someone repeatedly sharing content or repeatedly using a particularly abusive model—this could be the case in the AI context—one that has no guardrails on it and constantly produces a type of content that breaks the law, so we need to track actors and potentially the tools that are used there.
I think for the broader range of public usage I come back to the word “transparency.” I think this is a place where a lot of the public is moving. It’s like, “Show me the information about how this media was made; show me what this is.” In most cases I am going to be totally fine to know that someone used an AI voice, an AI filter, or made it with AI. In fact I may be happy they did that, but I want to know, and then that enables us to know where we may set legal bounds that say, “No, that type of content is not acceptable to have or share,” as we do obviously around AI-generated child sexual abuse material. That is illegal, and then obviously we have to set within democratic processes limits around particular types of content where maybe their scale, scope, and reach or their personalization is enhanced with AI but are probably existing problems, like how we address terrorist content or how we address certain forms of disinforming, misinforming, or deceptive content.
ANJA KASPERSEN: There seems to be general agreement about what the issues are, that something needs to be done, and that there needs to be legislative action and regulation in the area of synthetic data, deepfakes, and watermarking issues, the data provenance side of it.
There are hardly any standards, certainly no global standards. Is there potential for a global standard?
SAM GREGORY: That is a great question. I consider myself somewhat new to the world of standards. I think I am probably not as deep in it as you are with the role you have been playing, but about four or five years ago we started to engage with one of the nascent technical standards around authenticity and provenance, which was the standard created by a group called the Coalition for Content Provenance and Authenticity (C2PA), which includes companies, civil society, and some academia and has been working on standards there. That was my first exposure to thinking through a standards lens about how you approach a problem. In some senses I am a convert because, as someone who has also done a lot of work at the front end of how the world plays out, with individuals literally trying to film in horrendous contexts, I have come to realize that so much of the infrastructure is set completely out of view from them, and so much of that is actually about the standards that underlie the technologies they can use.
I think there is a necessity for global standards on this. We are not going to do well without a shared approach to how to do this that has shared norms. I have been relatively excited to see how the C2PA has developed their approach. It was my first exposure to it.
That was not done within one of the broader standards organizations—it is the group that sits under The Linux Foundation. The key thing that strikes me from that, and where I particularly want to see any standards work in this area focusing or paying attention, comes out of what we heard and said. In 2019 we wrote this report, “Ticks or it didn’t happen: Confronting key dilemmas in authenticity infrastructure for multimedia,” which was all about key dilemmas in building this type of provenance infrastructure, and a lot of that grew out of the work of talking to people on the frontlines of the risk and benefit of this.
As we look at standards in this area, some of the things we have emphasized in the standards process we have been in—we led, for example, the threats and harms taskforce within this coalition—involved constantly saying, “This is the risk out of this standard if it goes wrong,” how a standard could actually create harms in and of itself if it is not done right. I think that is an important stage that I appreciated we were trying to engage with, and some of the things we were pushing on there were trying to understand questions around privacy, access, nondiscrimination, and also potential weaponization, because any system that is trying to create, for example, a way to better understand media provenance can also start to move into areas around identity.
It can become a proxy for identity, and certainly as someone who works with citizen journalists and human rights defenders in a range of repressive and authoritarian contexts as well as in countries that seem to have a trajectory going the wrong way in Europe, the United States, and other places, I am certainly worried about creating technical infrastructure and technical standards that work well for a big company or in an idealized democratic society, but our underlying technical infrastructure then could be misused or weaponized by legislation or by technology in other contexts.
One big frame that we came into as we looked at these technical standards is that we were looking at privacy, nondiscrimination, and access, and we were particularly thinking about weaponization given the wave of so-called “fake news laws” globally, which often ask people to prove they are a journalist or provide additional information about themselves if they share something in a journalistic or social media context. Again, that is the lens that I hope we will see. I hope we will see more globalized standards work, and maybe that will build on things like C2PA and similar things.
The lens that I hope will be brought to that is to incorporate this global civil society and human rights perspective, particularly in this area, where there are a lot of risks when we are talking about fundamentally how we are going to communicate in the future, that we do this right and build a technical standard that underlies infrastructure and underlies legislation and regulation that will actually not backfire on our most vulnerable.
ANJA KASPERSEN: Based on my personal experience with the Institute of Electrical and Electronics Engineers (IEEE)—keeping that in mind because that was a big recognition for us as a standards organization—we call it “ethically aligned design.” The idea is that there are these standards, what we call the “7000 series,” which are freely available, the only standards of their kind that are actually free for people to access, around everything from privacy issues to how you do the actual systems engineering while keeping certain ethical principles in mind, etc. There is also this confusion that ethics means “non-harmful,” but it does not; it is a way of assessing the tradeoffs and tension points that you have to grapple with in any form of decision-making process. That is what the ethical framing should really be about, that you are not entering into blind zones or making decisions where you have not thought through all the possible fallouts. So whatever you do has to be by design.
To your point, you create something, you create a technical standard, and then it might produce or generate a completely different fallout somewhere else. Having that in mind—hence we call it “ethically aligned” design—means that you hold all these thoughts in your head at the same time, even if that then manifests itself in different technical standards that are complementary to each other. So there are many different approaches, but that is a path that we have attempted to follow. We would love to have WITNESS and others involved in continuing that work.
Let us stay on regulation for a little bit. You mentioned quite a few issues already. When I was reading WITNESS’s organizational mandate, what I liked connects to something Gillian Tett of the Financial Times discusses in her latest book about anthropological perspectives on how certain narratives cement themselves in our society. I think when it comes to issues of technology and AI, you were speaking about intent and mal-intent. There are certainly a lot of intentional uses of certain narratives to drive the discussions, and she talks a lot about what are called “social silences,” an old term from the sociologist Pierre Bourdieu in the 1960s, where certain things are just not spoken about, and it is those social silences that one should be paying attention to to understand what is really going on.
To some extent your job, Sam, for many years now has been to listen to those social silences, to perceive and maybe uncover what the rest of us might not have the technical knowledge to see or have been blinded to because certain narratives have come to dominate. From your perspective, what are we currently witnessing in this area? Are we making progress? What are we not seeing? To paraphrase Shakespeare, is it “much ado about nothing,” and how concerned should we be about the regulatory capture that we are witnessing and its impact on civic and human rights?
SAM GREGORY: I love that, “social silences.” I also think that there are a lot of people in power who sit in soundproof rooms that they deliberately soundproof so others cannot hear, so there is a silence outside because they soundproof the room.
ANJA KASPERSEN: Douglas Rushkoff has done really funny editorials about it, the sort of bunker mentality. Again it was the Financial Times talking about how we live in an era of cognitive dissonance.
SAM GREGORY: I think it is part of our role at WITNESS and part of my role in the organization to try to make sure that we are trying to push the dialogue around these emerging technologies and bring a greater range of people into the room and also make sure that they are equally empowered to be part of that conversation. I think often emerging-technology conversations imply that if you are not an expert you cannot be part of the conversation, including very early on.
What we have tried to do, for example, in the synthetic media/deepfakes work is to say: “You are an expert on almost everything contextual to this issue. You just do not happen to know in depth how a generative adversarial network works, and that is probably the least important part of what you need to know to be able to say that you have prior knowledge that should inform this or existing needs that should inform the way we develop ‘solutions’ around this.” I think that is an important part of our work around this.
As I look at the landscape now, as I said I am worried about the way the technology is now taking off. I think there is a genuine technology takeoff versus the hyperbole in the past—and I worry as I look at a lot of the major players in this that the AI companies and the platforms have simultaneously cut the people who cared about harms globally, cared about harms to human rights defenders, and it does seem—this is from just watching the world—they are hiring a lot of public policy people. That suggests to me that we do need to worry about what “regulatory capture” means here.
I think it is complicated, though, because when I look at this I actually think, for example, provenance is an area that a lot of the companies are pushing in, and I think that is a pretty good area to support. It is just how we do it, bearing in mind privacy, nondiscrimination, and these global issues.
Then there is the open-source versus closed-source one. Obviously ideologically, as someone who has worked in the human rights field I have been very closely aligned with open-source projects, purposes, and things like that. At the same time it is very visible that a lot of the safeguards and the ways we might have media transparency are getting much harder to do in open source.
We are in a moment where we need to be putting the pressure on and recognizing where people have not put their money where their mouth is—they have cut resources, they are doing things that are completely antithetical to their rhetoric—and simultaneously recognizing where people are pushing things that are probably aligned but need to be done right, so we need to avoid regulatory capture there, and also be a little bit careful that we do not just make default assumptions that the way we thought about it before is the way we should think about it now.
I will give another example of that. For many people in my space, the human rights space, who have worked on social media, we have put a lot of emphasis, I think very correctly, on platform accountability over the last years, everything from Facebook’s accountability in the genocide in Myanmar to the ways in which other platforms have under-resourced different election processes or neglected their content moderation. I think as we look ahead we have to think much more broadly about the AI responsibility chain that is not just the visible social media part.
In some ways Facebook and Meta are probably not going to be on their own able to show us and detect what is synthetic and not if we do not have the participation and accountability that goes back to the AI models and the deployers, the other parts of the conglomerate that is Meta. Again, I want us to be clear about when we are taking for granted something that we thought before, like “open source good, closed source bad,” but actually weigh it up against what we are trying to achieve and what we are hearing in terms of potential harms and benefits.
ANJA KASPERSEN: Adding “open” to your company name is a little like what we have seen with countries having “democratic” as part of their name—it in no way indicates that it is open.
SAM GREGORY: Exactly, and “open” means very different things to different people. I think it is incumbent on us in the technology space to be critical and to know who we are representing and who we are trying to speak alongside.
It is interesting. Even this year, with so many competing priorities, we made sure we went back and convened our stakeholders—journalists and social movement leaders in Nairobi and Bogotá and we will do that again in São Paulo in the coming months—because there is a danger that it is not just regulatory capture but also that there is only a certain set of perspectives that are shaping how regulation is done, and those need to be global and they need to reflect those most vulnerable. As I say, particularly when it comes to information issues, we need to think about those frontline information actors like human rights defenders, journalists, and civil society.
ANJA KASPERSEN: That is a great segue to discussing the who and the whom in this space that carry forward some of these narratives. Which of the actors concerns you more, and what behaviors should we as consumers, who often end up becoming the very product that they are selling, be aware of? You had this very interesting metaphor at the beginning of this interview, where you were referring to “pixie dust”—“AI is not magical pixie dust.” Certainly from where I am sitting I am seeing an increasing number of Tinker Bells spreading their pixie dust all over and maybe inventing some of it, some synthetic pixie dust to go along with it. Given your role in having observed these patterns for a long time, what do you see?
SAM GREGORY: Again, I am always going to push back on hyperbole because I think hyperbole often gives also false urgency: “We can’t do anything about this because everything is moving too fast. We can’t do anything about it because it is competitive. We can’t do anything about it because country X is developing this.” The magical pixie dust cuts across all the sectors.
When I think about actors, civil society actors are not present enough in the room. I think that is just a given. People from civil society and the media, particularly globally, are not a sufficient part of the conversation to say: “Hey, wait a second, why are you doing that? Are you thinking about its impact on my society? Are you thinking about the way you are building that solution?” I think it relates to that speed and hyperbole.
For the big business actors I think there are shared problems. There is the underlying question of the incorporation of all this data that is completely unresolved for most of them. How we understand whose data it is and how it can be used is a complete commonality.
The second commonality is many of the big platforms repeating the mistakes they made with social media—exclusion and launching things without testing them properly. The number of times I see a news article or am engaged with someone who launched an AI effect and it does something like create hate speech, create offensive caricatures, or get misused—this is all the “move fast and break things” perspective that I thought we had moved on from, because we saw what got broken, and it was usually human beings globally in situations where they were vulnerable. Those are commonalities.
I think there are particular bad actors in the space because they ignore even established norms. I find it frustrating when I hear companies like Stability AI, who are like: “Human rights is whatever. We don’t need to think about this. It is all going to sort itself out.” No.
That feels like casting haze in the air, not pixie dust, around someone just trying to get somewhere with their business plan. When someone deliberately allows, for example, their tools to create hateful images and completely ignores established guardrails that we know we can build and develop both technically and from a policy perspective, that is what I find offensive, but I think it is a shared problem.
With governments it is like we can look at the range of regulation and see the ones that are doing it well and ones that are trying badly. China has been out front on deepfakes and generative AI regulation, but it is completely neglectful of the human rights implications of not permitting satire, not permitting speech that is offensive to the government, and not permitting anonymity rights. First out of the gate does not mean good; it just means that you have the ability to pass legislation.
I know India has been contemplating deepfake legislation. I am not sure if they have yet passed it, but it was a very fast move at the end of last year. We need to be very careful about assessing motivations of governments and how they are misusing it against human rights. That is my lens when I look at a governmental approach or a regulatory approach: Is it taking into account the range of human rights issues that are occurring here, and the ones that have been first out of the gate I do not think are great.
The media is suffering because of this. They do not know how to deal with falsified content. They are trying to work out how to use it in a way that can also affect their business practices at a time when they are struggling. I tend to look at the media through the lens of whether they are reporting on this responsibly, in a way that looks through the haze when a company is basically not doing anything responsible, and tamping down the hyperbole.
It can also be specific. I often critique the types of articles that come out with something like “Five tips to spot something that was made with AI,” because those tips reflect the current algorithmic failings of a tool, and they go away very quickly. I think it teaches the public the wrong way to think about this, which is as something that is going to be pervasive and that needs infrastructure and bigger-picture responses rather than placing all the pressure on you or me to look very closely at that image and see if we can spot the sixth finger. I think the media has a real responsibility in helping the public navigate this and also in holding power to account, which are surely their core responsibilities on any news issue, but I think they are really important here.
ANJA KASPERSEN: Just to build on what you were saying about the moving-fast analogy, there is also moving fast in the regulatory space. We know it does not work from an innovation perspective, but moving fast in a regulatory space does not necessarily set us up for good regulations or regulations that keep in mind civic, political, and human rights.
I am seeing more and more this notion of “digital authoritarianism” being referred to. I have been asked about it a few times, and I tend to add the word “accidental” to it, “accidental digital autocrats or authoritarianism,” because I have certainly observed that the use of highly invasive technologies without scrutiny rarely starts with bad intent. We spoke about intent before.
The road to surveillance societies often moves through what one could argue are well-intentioned yet deeply misinformed decisions about safety. What I find worse, and this is the classic corporate narrative on this, are the promises of efficiency and optimization. We talked about safety, we talked about democracy, and others, but there is also this sales pitch that you, as a government, as a structure, or as an organization, are going to become that much more efficient, you are going to optimize everything, yet there is very little thinking about the tradeoffs and possible impacts that have not been accounted for, which then deeply affect civic and human rights.
What are your observations in this field? Are we in danger of a type of accidental digital authoritarianism, or is it even more dystopian than that?
SAM GREGORY: I like the phrase, “accidental digital authoritarianism,” because I think when we look at a range of technologies perhaps adopted for potentially legitimate reasons—in my work I think about facial recognition, that has some potential usages but generally I think is a pretty terrible idea in many contexts. Digital identity is another one that is complicated in many contexts.
ANJA KASPERSEN: Digital wallets.
SAM GREGORY: Yes. In the case of technologies that are about understanding identity and tracking how people speak in their society, I am worried about intentional linkages like these fake news laws plus technology, though the fake news law may come first and then they may say, “Oh, that technology really helps us implement the approach underlying the fake news law,” which is to suppress speech.
I think that drive toward efficiency and optimization is there and is a part of the dialogue that I am hearing in these contexts. I think a lot of that is hearing it from business lobbyists who are saying, “We need to do this as well in order to remain competitive.”
ANJA KASPERSEN: To unlock the value.
SAM GREGORY: For commercial advantage, and I am not well enough informed to say they are lying. That is not my specialty. I think the question is the balancing question: Are they listening to that and at the same time also asking, “How are we incorporating the equities that are critical about how this impacts marginalized populations? How is it going to support privacy and access? How are we going to make sure we learn from things we know already?”
As we know in the AI space there are lots of questions we have already talked about pre-generative AI around bias, discrimination, and the way in which data sets are created and all of that that we do not need to recreate. We know they are there, so as we have a conversation around generative AI how are we making sure we are not reinventing the wheel on the flip side of efficiency and optimization, and I think it is critical now how we do that.
I think it is also on civil society to try to make sure we support the champions who are trying to push on that. For example, I had a conversation with one of the senators in the U.S. Congress who is really trying to think about how we learn from the experience of how marginalized populations in the United States have experienced social media or have experienced telcos. I think that is a valuable way to say that that experience is going to be replicated if we are not careful, and how do you ground a legislative approach in something like that?
That is the balancing act. There are probably going to be a thousand arguments on efficiency and optimization. Instead, how do we make sure there is an equally strong voice that is around the human rights, the ethical standards, and the learning from what we know already about AI?
ANJA KASPERSEN: And clear on the purpose.
SAM GREGORY: And clear on the purpose, and there are no-go areas, which is a place where I appreciated the EU AI Act, which was attempting to actually say some of these things. It is in a mess now and it is blurry where it is going to end up, but some of these things are not acceptable, and some of these are spaces where we will not allow it. I do think there is a role for legislators also to say, “Actually this is not a place to automate; this is not a place to do it,” and of course we have similar approaches in the United States, where it is like, “If you are going to automate it, you have to have the right of human appeal, the ability to understand a process,” et cetera, those kinds of things we have already thought about around AI more broadly and are now applying in this synthetic media or generative AI space.
ANJA KASPERSEN: Before we get to my final question, I thought this might also be a good opportunity to discuss your role on the Technology Advisory Board of the International Criminal Court (ICC), which is an important one. Without a doubt, technology and new applications of AI are changing criminal behavior, changing the dynamics of criminal syndicates and criminal organizations, but also including what constitutes war crimes. Could you elaborate on your work with the ICC? You probably will not be able to speak about the work itself, but maybe you can share some of your insights based on that work?
SAM GREGORY: The Technology Advisory Board of the ICC is somewhat inactive at this point, I should say. The International Criminal Court is obviously a key place where we need to think about trust in the information and the evidence that is presented there.
Rather than that—because the board has been somewhat inactive for the last year; I know the ICC itself is of course moving forward on this, but the Advisory Board has not been particularly active—I will use as an example another program we have at WITNESS that looks at this pipeline of accountability based on the images, the videos, the audio, and the open-source information that emerge in conflict zones, and how we ensure that it turns into accountability.
I think there we have a real simultaneity of challenges. That actually is true of all of WITNESS’s work. On the one hand, there is this surfeit of information, of videos, of people trying to gather evidence and trying to show what is wrong, and at the same time you have so many technical and societal ways to undermine that, to say, “That’s a lie, that’s false.” A lot of our work in that space is around how you prove something is real, how you gather potential evidence, and how you show it, to balance that. How do you make sure—and I think we are going to do this more in the next year—that the frontline documenters have the ability to document in a way that holds up against someone claiming it has been faked with AI? We describe that as “fortifying the truth.”
I wrote an article last year called “Fortify the Truth,” which was all about how we actually make it harder for someone to claim something has been faked with AI. I see that as a huge problem because it is just going to get easier and easier for people to say, “Oh, that was made with AI, it could have been made with AI, it could have been faked.” So we have this idea that we work with called “fortifying the truth,” which is all about proactive strategies rather than defensive strategies: How do we have proactive strategies for everything from how we film, archive, preserve, and present these types of videos and content in a way that enables us to show what is real and rebut claims that something is falsified?
I think there is a proactivity side that is important in conflicts from Ukraine to Gaza and beyond, and then there is also a recognition that AI could help with the workflows of this. AI can help you sort through a ton of different images to work out if one shows you something. That is an important part of it, but it is all against this backdrop that we know authoritarian leaders and publics are going to keep saying, “No, no, no, this is faked, this is not real,” and they are going to start throwing the claims of AI in our face.
That is what we have been seeing, and I think the key, which again goes back to the role of civil society, is proactivity: Are we being proactive both in advocating at the infrastructure level and in changing our practices, in the same way that the media has to change its practices for a world where AI is more prominent? My colleagues and I as human rights defenders also need to ask: “How are we going to make sure that the fundamental truths we gather and document can remain trusted as that, despite the way in which AI is reshaping our information environment?”
ANJA KASPERSEN: Do you foresee that we will need the same detection tools you referred to earlier even to detect truth, to prove to people that these are indeed facts? There is a saying that when you generate the facts as you go along it becomes increasingly difficult to demonstrate what the truth is.
SAM GREGORY: I started earlier with this example from the Rapid Response Force we run. It was an example I cited in the TED Talk: We had three cases of supposed AI-created audio that came to this Rapid Response Force of media forensics experts. In one case we were not able to conclusively prove whether it was real rather than AI; in another it was definitely real and not AI, as had been claimed; and in the third it was at least partially, and almost certainly, real and not AI. What you were seeing there is people simply throwing out the claim that it was made with AI in order to dismiss real footage. I think that is going to be one of our largest challenges.
It was actually a fundamental challenge I put in front of this group of fact checkers and journalists last year when I talked to them, and one I have been talking to platforms about, because so much of the work of fact checkers, journalists, and indeed platforms is to say, “No, that’s false,” or, “No, that’s misinformation,” when in fact a lot of the time now it may mean using these tools to say: “No, actually this is real. Everyone is telling you this is made with AI, but actually we want to tell you this is real.”
That is a bit of a flip-around, and it is quite hard to do reliably because the tools are not 100 percent reliable, so I do not know quite how to do that, but I think it is a fundamental challenge. More and more you are obviously going to have fact checkers, journalists, and perhaps platforms having to say, “Hey, look, this was made with AI, and we will explain it,” and they need the tools and resources to do that, which they do not have at the moment. That is the detection gap I was describing. But they are also going to have to think about how sometimes they are saying, “Actually, I need folks to know that this is real and not faked, despite everyone just throwing sand in your eyes and claiming it is being falsified with AI.”
ANJA KASPERSEN: I like this notion of “cognitive resilience,” that we need to have greater investments in people’s cognitive resilience.
This leads me to my last question, which builds on what you just said: So far 2024 has been quite a dramatic year in the field of AI and technology, and certainly in the emergence of more and more applications of synthetic media, synthetic data, and deepfakes. What trends do you foresee as being impactful looking ahead, and how do we develop (maybe that is the most important answer for those listening to this) the skillful means needed to be on this journey together and, as you say, to “panic responsibly” together? How do we panic responsibly in your view, and what should we be mindful of in the year ahead?
SAM GREGORY: There are the technical trends: It is going to get easier to falsify what looks real and what sounds real, and eventually what looks real in a video or in a 3D environment. That is going to happen. The trajectory is clear, and video is going to improve this year in the way that, for example, images and audio did before, so we should prepare ourselves for that shift.
I think we are going to see it used in elections. I don’t think it will be quite as dramatic as the hype suggests. I think we are going to see examples, but the bigger problems are going to be about our societies and our elections, not about AI. When we look back, though, I think we will know that it has been a problem.
I am hopeful that there are trends on the regulation side where we will at least start to land some clarity about media transparency, at the very least about how a piece of media is made. I don’t think it is going to impact the elections, but I think it is the first building block in transparency and accountability across the pipeline.
Developing the skillful means to panic responsibly (that was Katie Harbath’s phrase; I don’t know if she was riffing deliberately on Prepare, Don’t Panic, but I like it as well) is in large part about how public figures speak to this. I think public figures have a real responsibility to bring people into the tent on this, and by that I mean legislators bringing in civil society and politicians talking about this responsibly and not using it as an excuse themselves. Those examples I was giving of politicians dismissing real footage as faked were all high-level politicians saying, “No, no, no, that’s not real.” That responsibility sits with them, and it sits with the media.
I think the media has to navigate this line. I do not want this to turn into the year when AI is the thing that everyone blames for the world’s ills. I think that would be a diversion from a lot of other things. At the same time, there are very clear steps we could be taking now that would make things better for us in 2024 and 2025: as I mentioned earlier, everything from basic resourcing and skills, to investment in this sort of infrastructure, to placing responsibility on the platforms and the AI chain. All of us could be doing that, with civil society demanding it, legislators following through on it, and companies actually moving beyond paying lip service and doing it.
ANJA KASPERSEN: Great. Sam, our conversation has been incredibly insightful and fun. My deepest thanks to you for taking the time to share your insights and expertise with our listeners. Thank you.
SAM GREGORY: Thank you. I really liked the conversation.
ANJA KASPERSEN: To our listeners, thank you for joining us, and a special shout-out to the dedicated team at the Carnegie Council for making this podcast possible. For more on ethics and international affairs, connect with us on social media @CarnegieCouncil. I am Anja Kaspersen, hoping this conversation has been worth your time and has left you with something to ponder. It certainly has left me pondering. Thank you very much.