The (Mostly) Newbie’s Guide to Automatically Swapping Faces in Video

Last weekend I was inspired by a great blog post from Matthew Earl, where he showed how to do face swapping in Python. It immediately got me intrigued, and I ended up quickly using it to make this video:

Adapting his code to output video automatically was a trivial change, and hardly worthy of a blog post, but I think it’s worthwhile to take a step back and go through the thought process for someone who’d like to do the same thing but might not know where to start.

So here it is, the mostly newbie guide to automatically swapping faces in video.

If you read Matthew’s blog post, you’ll see his code takes in two images: a source face, and a second face to merge it with. It outputs a third image, called output.jpg, that contains the magically shifted and merged result.

Now, where do we begin?

A lot of people ask me about adapting code, or what my process looks like, so I figured I’d walk through the mostly hidden creative process of adapting someone else’s code. In this case, the very first problem is getting the libraries installed so that the code can run at all.

I work mostly in Mac OS X, so all instructions that follow will assume that you’re running the same.

Getting dlib and its Python Bindings Installed

First things first, we need to download and build the library that Matthew’s code runs on. In this case, it’s dlib, and I’m going to assume you already have python installed.

wget   # Download dlib from the site
tar xjf dlib-18.16.tar.bz2                      # Extract into the dlib-18.16 directory
cd dlib-18.16/examples
mkdir build                                     # Create cmake build directory
cd build
cmake ..
cmake --build . --config Release                # Make the release build
cd ../python_examples
make                                            # Make the Python library

At the end of this, you should have the compiled dlib Python module in your python_examples directory. Copy it into a directory on your PYTHONPATH.

If you don’t know what your PYTHONPATH is set to:
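One way to check is from Python itself. This is a minimal sketch; the helper name `python_search_paths` is my own and not part of any library. It prints both the entries that come from the PYTHONPATH environment variable and the full list of directories Python searches for imports:

```python
import os
import sys


def python_search_paths():
    """Return (entries from PYTHONPATH, full import search path)."""
    raw = os.environ.get("PYTHONPATH", "")
    # PYTHONPATH is a list of directories joined by os.pathsep (":" on OS X).
    env_entries = [p for p in raw.split(os.pathsep) if p]
    return env_entries, list(sys.path)


env_entries, search_path = python_search_paths()
print("PYTHONPATH entries:", env_entries)
print("Full import search path:", search_path)
```

If the first list is empty, PYTHONPATH is not set in your current shell.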


You will certainly have a different output from me. In my case, I’ve set both in my .bashrc file. This is just a text file in my home folder. If I open it up and look at it, this is what’s in it:

export PYTHONPATH=/Users/kirkkaiser/caffe/python:/Users/kirkkaiser/pythonlibs:$PYTHONPATH

This tells Python where to look for libraries, in addition to the system directories. In my case, I copied the compiled module over to my pythonlibs directory. Once you’ve created (or modified) your .bashrc, be sure to reload it using the following:

source ~/.bashrc

Getting the Code Running

Finally, we can check out the code from GitHub. In my case, I did the following:

git clone
cd faceswap

It’s always a good idea to view source code before you run it, to at least try and understand what’s going on before running something. I want to say I did this too, but I’m not sure that I can. In the very first comments of Matthew’s code, he lets it be known that we’re going to need to get a file that is a shape predictor in order to get his code to work:

bunzip2 shape_predictor_68_face_landmarks.dat.bz2

Finally, we need two images with faces in them. One thing you’ll learn quickly when dealing with facial recognition systems is that they never seem to work with what you first try. In my case, I needed to go through a few images before I found two that worked.

Understanding What’s Happening

Once you’ve gotten a piece of code running, it’s now a great time to take a step back, and see how it’s running.

In the case of our faceswap code, it mostly happens at the bottom of our file, here:

# loads and reads the images, and looks for a single face, throws an
# error if there's more than one or none.
# it returns the loaded image, and a set of landmarks of the one face it's found
im1, landmarks1 = read_im_and_landmarks(sys.argv[1])     # sys.argv[1] is first image filename
im2, landmarks2 = read_im_and_landmarks(sys.argv[2])     # sys.argv[2] is second image filename
# builds the transformation matrix to make sure both heads align once copied
M = transformation_from_points(landmarks1[ALIGN_POINTS],
                               landmarks2[ALIGN_POINTS])
# build the mask of the second image's face
mask = get_face_mask(im2, landmarks2)
# warp that mask into the first image's coordinates
warped_mask = warp_im(mask, M, im1.shape)
# combine it with the first face's mask by taking the element-wise maximum
combined_mask = numpy.max([get_face_mask(im1, landmarks1), warped_mask],
                          axis=0)
# warp the second image onto the first, then match its colours to the first
warped_im2 = warp_im(im2, M, im1.shape)
warped_corrected_im2 = correct_colours(im1, warped_im2, landmarks1)
# blend the two images
output_im = im1 * (1.0 - combined_mask) + warped_corrected_im2 * combined_mask
# save it out
cv2.imwrite('output.jpg', output_im)

That’s a lot happening, but it doesn’t seem too confusing. Basically, we build two masks, and then combine the images with two masks.
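To make the blending step concrete, here is a toy sketch using tiny NumPy arrays in place of real photos. The `blend` helper and the array values are my own illustration, not part of Matthew’s code; only the formula and the `numpy.max(..., axis=0)` combination mirror the listing above.

```python
import numpy as np


def blend(base, overlay, mask):
    """Per-pixel blend: mask 0.0 keeps base, mask 1.0 takes overlay."""
    base = base.astype(np.float64)
    overlay = overlay.astype(np.float64)
    return base * (1.0 - mask) + overlay * mask


# Two flat 2x2 "images" with 3 colour channels (stand-ins for im1 and the warped im2).
base = np.full((2, 2, 3), 100, dtype=np.uint8)
overlay = np.full((2, 2, 3), 200, dtype=np.uint8)

# Two masks covering different pixels, like the two face masks.
mask_a = np.zeros((2, 2, 3))
mask_a[0, 1] = 0.5          # soft edge on the top-right pixel
mask_b = np.zeros((2, 2, 3))
mask_b[1, :] = 1.0          # fully covers the bottom row

# The element-wise maximum keeps whichever mask is stronger at each pixel.
combined = np.max([mask_a, mask_b], axis=0)

result = blend(base, overlay, combined)
print(result[:, :, 0])
```

Pixels where the combined mask is 0 keep the base value, pixels where it is 1 take the overlay, and the 0.5 pixel lands halfway between the two.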

Getting Started with Detecting And Swapping Two Faces in One Image

First off, can we successfully detect two faces in a single image? In my case, I found a photo with two faces in it, both of which seemed perfect for facial recognition (i.e., straight on, both people looking directly at the camera).

Running this image through the existing code will obviously run into an error. The code from the post, as it exists on GitHub, expects exactly one face per image. So let’s take another look at the function that reads and returns landmarks:

# this is the function to get our landmarks
def get_landmarks(im):
    rects = detector(im, 1)
    if len(rects) > 1: # if there's more than one face detected
        raise TooManyFaces # freak out
    if len(rects) == 0:
        raise NoFaces
    return numpy.matrix([[p.x, p.y] for p in predictor(im, rects[0]).parts()]) # return the matrix of x y coordinates of landmarks

def read_im_and_landmarks(fname):
    im = cv2.imread(fname, cv2.IMREAD_COLOR)  # load the image 
    im = cv2.resize(im, (im.shape[1] * SCALE_FACTOR, # resize, scale factor is set to 1 by default, so nothing happens
                         im.shape[0] * SCALE_FACTOR))
    s = get_landmarks(im) # return landmarks from above
    return im, s

Now, to begin, we don’t really need the read_im_and_landmarks function anymore. We’re just loading up one image, so we might as well get rid of it. The same goes for the get_landmarks function, because that only calls our detector.

Instead of the calls to these functions, let’s just load the image passed to the command line:

im = cv2.imread(sys.argv[1], cv2.IMREAD_COLOR)
im = cv2.resize(im, (im.shape[1] * SCALE_FACTOR,
                     im.shape[0] * SCALE_FACTOR))
rects = detector(im, 1)
if len(rects) < 2:
    print('Error, fewer than two faces detected')
print(len(rects))

If you run the above, and you get 2, then it’s successful. Now, let’s get our faces to swap with each other in the most generic way possible:

im1, landmarks1 = (im, numpy.matrix([[p.x, p.y] for p in predictor(im, rects[0]).parts()])) # first detected face
im2, landmarks2 = (im, numpy.matrix([[p.x, p.y] for p in predictor(im, rects[1]).parts()])) # second detected face
M = transformation_from_points(landmarks1[ALIGN_POINTS], # First transformation
                               landmarks2[ALIGN_POINTS])
M1 = transformation_from_points(landmarks2[ALIGN_POINTS], # Second transformation
                                landmarks1[ALIGN_POINTS])
mask = get_face_mask(im2, landmarks2) # First mask
mask1 = get_face_mask(im1, landmarks1) # Second mask
warped_mask = warp_im(mask, M, im1.shape) # First warp
warped_mask1 = warp_im(mask1, M1, im2.shape) # Second warp
combined_mask = numpy.max([get_face_mask(im1, landmarks1), warped_mask],
                          axis=0)
combined_mask1 = numpy.max([get_face_mask(im2, landmarks2), warped_mask1],
                           axis=0)
warped_im2 = warp_im(im2, M, im1.shape) # second face moved onto the first
warped_im3 = warp_im(im1, M1, im2.shape) # first face moved onto the second
warped_corrected_im2 = correct_colours(im1, warped_im2, landmarks1)
warped_corrected_im3 = correct_colours(im2, warped_im3, landmarks2)
output_im = im1 * (1.0 - combined_mask) + warped_corrected_im2 * combined_mask # apply first mask
output_im = output_im * (1.0 - combined_mask1) + warped_corrected_im3 * combined_mask1 # apply second face mask
cv2.imwrite('output.jpg', output_im)

This is super inefficient, but it doesn’t really matter. It’s creating four layers to move our two faces, and combining all of them. But it works. And we’ve successfully got faces being swapped in one image.

Adapting Our Script to Video

There are two great command line tools for working with video. The first is youtube-dl, and the second is ffmpeg. To install them, install Homebrew, and then run the following on the command line:

brew install ffmpeg
brew install youtube-dl

FFmpeg lets us break videos down into images, rescale them, modify them, and then put them back together. youtube-dl lets us use the entire internet’s worth of videos to download and remix. In my case, I already had a video I’d shot, but if you don’t, pick one from YouTube, and use youtube-dl to download an mp4 of it.


Now, I have a mostly standardized way I like to work with video. In general, I’ll extract all the frames of a video using ffmpeg, create an output directory, glob every image in the working directory, and write each processed frame to the output directory under the same filename.

From the command line, let’s extract our video image frames and then create our output directory:

ffmpeg -i yourmovie.mp4 output%05d.jpg
mkdir output

Alright. Our working directory should now be filled with images, one for each frame of our video. Let’s now process all of those images one by one in Python using glob.

import glob
import shutil

# detector, predictor, SCALE_FACTOR and the helper functions all come from the faceswap script
for filename in sorted(glob.glob('*.jpg')):
    im = cv2.imread(filename, cv2.IMREAD_COLOR)                 # open the current frame
    im = cv2.resize(im, (im.shape[1] * SCALE_FACTOR,
                         im.shape[0] * SCALE_FACTOR))
    rects = detector(im, 1)
    if len(rects) < 2:
        print(filename + " is missing two faces. skipping.")   # copy and skip a frame if it's missing two faces
        shutil.copyfile(filename, 'output/' + filename)
        continue                                              # without this, rects[1] below would fail
    if rects[0].left() < rects[1].left():                     # here's a tricky bit. make sure and keep the faces in the same place
        im1, landmarks1 = (im, numpy.matrix([[p.x, p.y] for p in predictor(im, rects[0]).parts()]))
        im2, landmarks2 = (im, numpy.matrix([[p.x, p.y] for p in predictor(im, rects[1]).parts()]))
    else:                                                     # detector returned the right-hand face first, so swap
        im1, landmarks1 = (im, numpy.matrix([[p.x, p.y] for p in predictor(im, rects[1]).parts()]))
        im2, landmarks2 = (im, numpy.matrix([[p.x, p.y] for p in predictor(im, rects[0]).parts()]))
    M = transformation_from_points(landmarks1[ALIGN_POINTS],
                                   landmarks2[ALIGN_POINTS])
    M1 = transformation_from_points(landmarks2[ALIGN_POINTS],
                                    landmarks1[ALIGN_POINTS])
    mask = get_face_mask(im2, landmarks2)
    mask1 = get_face_mask(im1, landmarks1)
    warped_mask = warp_im(mask, M, im1.shape)
    warped_mask1 = warp_im(mask1, M1, im2.shape)
    combined_mask = numpy.max([get_face_mask(im1, landmarks1), warped_mask],
                              axis=0)
    combined_mask1 = numpy.max([get_face_mask(im2, landmarks2), warped_mask1],
                               axis=0)
    warped_im2 = warp_im(im2, M, im1.shape)
    warped_im3 = warp_im(im1, M1, im2.shape)
    warped_corrected_im2 = correct_colours(im1, warped_im2, landmarks1)
    warped_corrected_im3 = correct_colours(im2, warped_im3, landmarks2)
    output_im = im1 * (1.0 - combined_mask) + warped_corrected_im2 * combined_mask
    output_im = output_im * (1.0 - combined_mask1) + warped_corrected_im3 * combined_mask1
    cv2.imwrite('output/' + filename, output_im) # write same filename to output directory
    print(filename + " finished, adding.")
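
The left/right check in that loop is there so that the same person stays “face one” from frame to frame, even if the detector returns the faces in a different order. A small sketch of the same idea, written so it would also cope with more than two faces: sort the detections by their left edge. `Box` is a hypothetical stand-in for a dlib rectangle (which exposes `left()` as a method) so the example runs without dlib, and `order_left_to_right` is a name I made up.

```python
class Box(object):
    """Hypothetical stand-in for a dlib rectangle, which exposes left() as a method."""

    def __init__(self, left, top, right, bottom):
        self._left = left
        self._top = top
        self._right = right
        self._bottom = bottom

    def left(self):
        return self._left


def order_left_to_right(rects):
    """Return detections sorted by their left edge, leftmost first."""
    return sorted(rects, key=lambda r: r.left())


# The detector happened to return the right-hand face first in this frame.
detections = [Box(400, 80, 520, 200), Box(60, 90, 180, 210)]
ordered = order_left_to_right(detections)
print([r.left() for r in ordered])
```

With real dlib output you would pass `rects` straight in and use `ordered[0]` and `ordered[1]` where the loop uses `rects[0]` and `rects[1]`. This only helps if the two people do not cross paths in the video.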

Finally, we can cd into our output directory, and get back out our finished video:

cd output
ffmpeg -i output%05d.jpg out.mp4
open out.mp4
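
If you would rather drive ffmpeg from the same Python script, the two commands can be built as argument lists. This is a sketch under assumptions: the function names are my own, and the optional `-framerate` input option is something I added because, as far as I know, ffmpeg assumes 25 frames per second for image sequences unless told otherwise, which can make the result play at the wrong speed if your source video used a different rate.

```python
import subprocess


def extract_frames_cmd(video, pattern="output%05d.jpg"):
    """Build the ffmpeg command that splits a video into numbered JPEG frames."""
    return ["ffmpeg", "-i", video, pattern]


def assemble_video_cmd(pattern="output%05d.jpg", out="out.mp4", fps=None):
    """Build the ffmpeg command that joins numbered frames back into a video."""
    cmd = ["ffmpeg"]
    if fps is not None:
        # -framerate is an input option, so it has to come before -i.
        cmd += ["-framerate", str(fps)]
    cmd += ["-i", pattern, out]
    return cmd


def run(cmd):
    """Run a command and raise if ffmpeg exits with an error."""
    subprocess.check_call(cmd)


print(extract_frames_cmd("yourmovie.mp4"))
print(assemble_video_cmd(fps=30))
```

Nothing is executed until you call `run(...)` on one of the lists, so you can inspect the commands first.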

Get The Code

As always, the code is at GitHub.

Defeating Facebook’s DeepFace with Deep Dreams

Glitched Faces

AI Is Already Here, And We’ve Barely Noticed

Facebook DeepFace Autotagging Me

AI is infiltrating our lives, in much the same way mobile did before it. It’s being fueled by the massive amounts of data we humans are generating from our phones, and it’s begun to radically change the way we interact with our machines.

For instance, when you upload a photo to Facebook, it runs through DeepFace, Facebook’s technology to be able to recognize faces. It looks into your photo for any faces it may recognize, using its knowledge of previously tagged uploads to tell people apart.

In my case, DeepFace knew the black and white photo I uploaded was me. I only have a little over a hundred tagged photos of myself on Facebook, but that’s enough for DeepFace to recognize me.

Microsoft Hello, and Apple’s TouchID

Hello Terry

Windows Hello is Microsoft’s latest “security” feature, which allows you to use an infrared, Kinect-like web camera to log into your computer securely. It uses a 3D model and images of what you look like, to ensure that it’s not just a photo of you in front of your computer, but the real you.

Every iPhone now has the “security” of Apple’s TouchID, a capacitive fingerprint reader and recognizer. Apple has said multiple times that the thumbprint is stored securely on your phone, with no remote access, but all iPhones were just remotely rebootable via a text message.

Why do the largest companies in the world think we want to put all of our biometric data onto their platforms?

What happens when we have our first big biometric data breach, and everyone’s thumbprint, retina scan, and face patterns get leaked? How do we replace all newly insecure biometrics overnight when that happens?

Seeing the Machine’s Mind

A month ago, Google released a piece of software called Deep Dream. It allowed people to see what the machine learning algorithms were looking for when they recognized things like dogs or faces in images.

If you haven’t heard of it or seen any of the images, I wrote a guide walking through how it works.

The machine learns from lots of input data. It needed thousands of images, each labeled as things like dogs, squids, bicycles, etc., to learn what these things look like.

So platforms like Google, Facebook, and Microsoft are all in unique positions to exploit and collect as much data as possible, to look for possibly novel uses for their massive amounts of data later.

But more interestingly, the recent release of DeepDream gives us an opportunity to subvert the machine’s process of discovery, by feeding it images that are exactly what it’s looking for, and creating noise which gives us an opportunity to untrain the machinery from knowing who we are.

Generating Machine Noise

Originally, I tried generating raw noise, and having the noise be Deep Dreamed by a face trained neural network. (Specifically, pool5 of the Age Net from the Caffe model zoo.) This didn’t work at all. I used OpenCV and a Haar Cascade trained on faces to see when I’d generated a face from the background noise, and got a few images where there were multiple faces, but Facebook simply didn’t see the same faces as the Haar Cascade.


So I changed tacks, and just did a simple copy and paste job. I used the Haar Cascade on a few photos of myself, and copied and pasted multiple versions of my face into the image.

Unfortunately, this didn’t give me the results I was looking for. Instead, most of my faces were being missed by Facebook’s face detection. So I’d dream an entire image, filled with maybe 30 or 40 copies of my face, and I’d only get one or two faces recognized by Facebook.

Automatically Generating Noisy Faces

Perplexed, I started tiling faces, and doing multiple levels of dreams. Eventually, I found the optimal response for tricking Facebook’s DeepFace to come from 2 Deep Dreams of pool5 of the Age Net, and to use 1 other non face background image square as a filler. I stumbled on to this when a Haar Cascade mistook one of the trees in the background of my photo as a face.

from PIL import Image
import random
import cv2
import numpy as np

cascPath = './haarcascade_frontalface_default.xml' # our face classifier from OpenCV
classy = cv2.CascadeClassifier(cascPath)
image = cv2.imread('face.jpg', 1) # supply image with a face for opencv
img = Image.open('face.jpg') # the same image opened with PIL, used for cropping
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) # convert to grey for haar cascade
faces = classy.detectMultiScale(  # play with these numbers if your face isn't recognized
    gray,
    minSize=(30, 30)
)  # the other arguments, including flags, are missing from this listing, so OpenCV's defaults apply
print(len(faces)) # number of faces in the image
facesImages = []
for (x, y, w, h) in faces:
    facesImages.append(img.crop((x-10, y-10, x+w+10, y+h+10))) # make an array of all faces with a bit of room around them
x = 0
y = 0
i = 0
angles = [0, 45, 90, 180, 270] # not used now, but you could rotate, cycle through each angle
blankIMG = Image.new("RGB", (1280, 720), "white") # optimal resolution for image for me
while y < blankIMG.height:
    # for random placement instead of a grid, you could set x and y here with
    # random.randint(0, blankIMG.width) and random.randint(0, blankIMG.height),
    # but then the loop needs a different stopping condition
    if x > blankIMG.width: # end of the row, move down one face height
        y = y + facesImages[0].height
        x = 0
    face = facesImages[random.randint(0,len(facesImages)-1)] # for making face selection random
    if (i % 2) == 0:
        blankIMG.paste(face, (x,y))
    else:
        blankIMG.paste(face.transpose(Image.FLIP_LEFT_RIGHT), (x,y)) # flip image horizontally
    x = x + facesImages[0].width
    i = i + 1
blankIMG.save('presuccess.jpg') # image filled with tessellated faces
# deepdream() and net come from Google's Deep Dream notebook, with the Age Net model loaded
imgnum = np.float32(blankIMG)
frame = deepdream(net, imgnum, end='pool5')
frame = deepdream(net, frame, end='pool5')
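
The tiling part of that script is easier to reason about when it is pulled out on its own. Here is a minimal sketch of the grid logic as a pure function, with no images involved. The name `tile_plan` and the tuple layout are my own, and it uses a plain row-by-row grid rather than exactly reproducing the loop above.

```python
def tile_plan(canvas_w, canvas_h, tile_w, tile_h):
    """List (x, y, flipped) paste positions that cover the canvas row by row.

    Every second tile is marked as flipped, mirroring the i % 2 check
    used when pasting faces.
    """
    if tile_w <= 0 or tile_h <= 0:
        raise ValueError("tile dimensions must be positive")
    plan = []
    i = 0
    y = 0
    while y < canvas_h:
        x = 0
        while x < canvas_w:
            plan.append((x, y, i % 2 == 1))
            x += tile_w
            i += 1
        y += tile_h
    return plan


# A tiny 6x4 canvas tiled with 3x2 tiles needs two rows of two tiles.
for x, y, flipped in tile_plan(6, 4, 3, 2):
    print(x, y, "flipped" if flipped else "normal")
```

To use it with the script above, you would loop over `tile_plan(blankIMG.width, blankIMG.height, facesImages[0].width, facesImages[0].height)` and paste a randomly chosen face at each position.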

Finally, I stumbled on the perfect amount of glitch for Facebook to still think a Deep Dreamed version of myself was still me, the photo you see at the top of this post. When I uploaded it to Facebook, this is what I got:

Deep Dreamed with Age Net

The idea here is that we can start to steer the AI in a direction of our choosing. Maybe we want the right to be forgotten by Facebook’s machines, or maybe we want to loosen what gets seen as us. Either way, this is the beginning of a tool to steer the conversation of what the machines know about us.

I could see this sort of noise generation being used to throw AI and Big Data off of our personal trails. We may in the future have AIs covering our tracks for us online, generating our own signal to noise to be able to regain a piece of our anonymity.

Make Your Own Deep Graffiti

I’ve posted the code for this article over at GitHub, and I encourage any and all pull requests / ideas. I think using neural networks to trick one another is just beginning, and the AI arms race is about to get very interesting.

Can’t wait to see what you come up with!