Deepfakes are Getting Better
You’ve probably seen the infamous video of Nicolas Cage face-swapped onto Amy Adams in 2013’s Man of Steel. Or maybe you have seen one of the many videos floating around the internet that show various world leaders, past and present, giving speeches they never made.
These image-processing wonders are the end result of a process that starts with Generative Adversarial Networks (GANs): two machine-learning neural networks that are trained against each other in an escalating arms race. One, the generator network, does nothing but produce images. The other, the discriminator network, is fed a mix of images produced by the generator along with real images. It then attempts to determine which are real and which are fake. Success and failure scores are then fed back into both networks. When they succeed, they continue doing what worked. When they fail, they try something else. The two networks train off of each other. As soon as one is able to beat its partner, that simply pushes the partner to become better itself.
How Deepfakes Work
Will it be possible in the future to detect deepfakes? For a little while, yes. Eventually the answer will be no, and it’s absolutely crucial that you understand this. There will come a time when any random person on the internet will be able to create completely undetectable media of any kind they want. Neither Google nor Facebook nor any government in the entire world will be able to tell you what is real and what is fake. The sooner you come to terms with that, the better off you’ll be. “Oh, but manipulated images are never perfect,” you think. “Maybe fakes can be good enough to fool humans, but better software will be made to detect them!”
Deepfakes Are Easy . . . Seriously
On the left we have a stock photo of Vladimir Putin, compliments of Wikipedia. On the right, we have Putin with a mustache, compliments of me spending ten minutes with Windows Paintbrush and GIMP. For a ten-minute job, it’s reasonably passable. Somebody might plausibly see the image on the right and think it was real. But now zoom in, and the tampering becomes evident from the inconsistent color blending, the harsh contrast under the nose, and the way one side of the mustache hovers over his lips.
Notice that while the image on the left is real, you can nevertheless see pixelation in it. Pixelation alone does not indicate a fake. All digital images are composed of pixels, and those flat edges you see in the real photo are simply the result of real-life curves having detail finer than a single pixel.
So while it might be easy to detect a Photoshop job, what about deepfakes? Yes, for now it’s still possible to detect them, but remember that every GAN by definition contains a “fake detector,” because that’s what GANs are in the first place: a generator network plus a detector (discriminator) network.
The detector is the means by which the generator is trained and the generator is the means by which the detector is trained. The better your detector, the better you can train up your generator. The better your generator, the better you can train up your detector.
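To make that feedback loop concrete, here is a deliberately tiny sketch. It is not a real GAN and contains no neural networks. It assumes "images" are single numbers, that real ones cluster around 4.0 and fakes start out around 0.0, and that the detector is nothing more than a threshold halfway between the two. Every name in it (`arms_race`, `detector_accuracy`, `step`) is made up for illustration.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def detector_accuracy(real_mean: float, fake_mean: float) -> float:
    """Accuracy of the best single-threshold detector.

    Assumption: real and fake samples are unit-variance Gaussians seen in
    equal numbers, so the best threshold sits midway between the two means.
    """
    gap = abs(real_mean - fake_mean)
    return normal_cdf(gap / 2.0)


def arms_race(real_mean=4.0, fake_mean=0.0, step=0.5, rounds=20):
    """Alternate detector and generator turns; return accuracy per round."""
    history = []
    for _ in range(rounds):
        # Detector's turn: refit the threshold against the current fakes.
        threshold = (real_mean + fake_mean) / 2.0
        history.append(detector_accuracy(real_mean, fake_mean))
        # Generator's turn: it only sees the detector, so it nudges its
        # output toward the side of the threshold that gets called "real".
        fake_mean += step * (threshold - fake_mean)
    return history


if __name__ == "__main__":
    for round_number, accuracy in enumerate(arms_race()):
        if round_number % 5 == 0:
            print(f"round {round_number:2d}: detector accuracy {accuracy:.3f}")
```

In this toy setup the detector starts out nearly always right and drifts toward a coin flip. The generator never has to see a real sample to get there. Each refit of the detector tells it which direction counts as "more real."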
Deepfakes vs Big Tech & Social Media
So let’s imagine that Google or Facebook or some watchdog organization releases a deepfake detector. You upload or point to an image online, and it tells you whether it’s real or fake. Congratulations, all you’ve done is create a better detector which can then be used to train a better generator, BECAUSE THAT’S HOW GANS WORK. The arms race simply continues.
At that point all they have to do is simply train up a better detector, right? Well, yes and no. In the long run, this is an arms race that generators inevitably must win.
A digital image is just a collection of numbers. There’s nothing fundamental about numbers that identifies them as real or fake. Detector networks detect fakes essentially through pattern recognition. All a generator has to do to “win the war” is produce patterns that are functionally identical to patterns produced by cameras. That’s a task of finite difficulty. Remember the pixelation even in the real Vladimir Putin picture?
As a simple example, let’s say you have a physical six-sided die on your table right now. Imagine rolling it to produce a real number from 1 to 6. And now head over to random.org and produce a “fake” number from 1 to 6. Here you go:
Which of the numbers above is a real number produced by a die roll, and which is a fake number produced by a random number generator?
There’s no way to know, because the generator at random.org is able to produce a range of results that are equivalent to the results produced by a real-life physical die roll. It doesn’t even have to be perfect. It only has to be good enough to produce equivalent results sometimes, and at that point those results become indistinguishable from reality. For example, if you go to random.org and generate a number from 1-12 instead of 1-6, it will still produce results that are good enough half of the time: whenever the number happens to land between 1 and 6.
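If you'd rather check that claim than take it on faith, here is a small simulation sketch. It assumes Python's built-in pseudo-random generator can stand in for both the physical die and random.org, which is obviously a simplification. The function names and the trial count are invented for this example.

```python
import random
from collections import Counter


def physical_die(rng: random.Random) -> int:
    """Stand-in for a real die roll (assumption: a fair six-sided die)."""
    return rng.randint(1, 6)


def sloppy_generator(rng: random.Random) -> int:
    """An imperfect 'fake' source that draws from 1-12 instead of 1-6."""
    return rng.randint(1, 12)


def face_frequencies(rolls):
    """Fraction of rolls showing each face from 1 to 6."""
    counts = Counter(rolls)
    return {face: counts.get(face, 0) / len(rolls) for face in range(1, 7)}


def simulate(trials: int = 60000, seed: int = 0):
    rng = random.Random(seed)
    fakes = [sloppy_generator(rng) for _ in range(trials)]
    # Only fakes that land on a face a real die could show are "passable".
    passable = [n for n in fakes if 1 <= n <= 6]
    reals = [physical_die(rng) for _ in range(len(passable))]
    return {
        "passable_fraction": len(passable) / trials,
        "fake_faces": face_frequencies(passable),
        "real_faces": face_frequencies(reals),
    }


if __name__ == "__main__":
    result = simulate()
    print("fraction of fakes that could pass:", result["passable_fraction"])
    print("face frequencies among passable fakes:", result["fake_faces"])
    print("face frequencies among real rolls:", result["real_faces"])
```

Roughly half of the sloppy generator's output lands in the 1 to 6 range, and within that half each face turns up about as often as it does for the real die. The numbers themselves carry no trace of which source produced them.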
So now imagine that you ask a million different people to each take a picture of an orange. Every single one of those pictures would have different pixel values. There is no one true set of numbers that constitutes a picture of an orange. The set of possible pictures that a real life camera might actually produce is very, very large.
All a generator has to do to “win the war” is train enough that it can sometimes produce data that’s equivalent to something that might actually be produced by a camera. Once it can, there’s no way to look at the picture itself and distinguish it from reality, any more than you can distinguish between the results of a real-life die roll and a random number generator.
And this capability won’t be confined to billion-dollar corporations or clandestine government organizations. Give it a couple of years and your neighbor’s twelve-year-old will be making these things on her phone.