The (Mostly) Newbie’s Guide to Automatically Swapping Faces in Video

Last weekend I was inspired by a great blog post from Matthew Earl, where he showed how to do face swapping in Python. I was immediately intrigued, and quickly used it to make this video:


Adapting his code to automatically output video was a trivial change, and hardly worthy of a blog post on its own, but I think it’s worthwhile to take a step back and go through the thought process for someone who’d like to do the same thing, but might not know where to start with something like this.

So here it is, the mostly newbie guide to automatically swapping faces in video.

If you read Matthew’s blog post, you’ll see his code takes in two images: a source face, and a second face to merge it with. It outputs a third image, called output.jpg, that contains the magically shifted and merged result.

Now, where do we begin?

A lot of people ask me about adapting code, or what the process looks like, so I figured I’d walk through the mostly hidden creative process of adapting someone else’s code. In this case, the very first problem is getting the libraries installed before you can get the code to run.

I work mostly in Mac OS X, so all instructions that follow will assume that you’re running the same.

Getting dlib and its Python Bindings Installed

First things first, we need to download and build the library that Matthew’s code runs on. In this case, it’s dlib, and I’m going to assume you already have python installed.

wget http://dlib.net/files/dlib-18.16.tar.bz2   # Download dlib from the site
tar xjf dlib-18.16.tar.bz2                      # Extract into a directory
cd dlib-18.16/examples
mkdir build                                     # Create cmake build directory
cd build
cmake ..
cmake --build . --config Release                # Make the release build
cd ../python_examples
make                                            # Make the Python library

At the end of this, you should now have a file called dlib.so in your python_examples directory. Copy this into your PYTHONPATH.

If you don’t know what your PYTHONPATH is set to:

$ echo $PYTHONPATH
/Users/kirkkaiser/caffe/python:/Users/kirkkaiser/pythonlibs:

You will certainly have a different output from me. In my case, I’ve set mine in my .bashrc file. This is just a text file in my home folder. If I open it up and look at it, this is what’s in it:

export PYTHONPATH=/Users/kirkkaiser/caffe/python:/Users/kirkkaiser/pythonlibs:$PYTHONPATH

This tells Python where to look for libraries, in addition to the system directories. In my case, I copied dlib.so over to my pythonlibs directory. Once you’ve created (or modified) this file, be sure to reload it with the following:

source ~/.bashrc
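If you’re unsure whether your PYTHONPATH is being read the way you expect, a quick sanity check is to split it the same way Python does. This is just a sketch; `pythonpath_dirs` is a helper I’m making up here, and the paths are the ones from my .bashrc above:

```python
import os

def pythonpath_dirs(value):
    """Split a PYTHONPATH-style string into its non-empty directories."""
    return [d for d in value.split(os.pathsep) if d]

# The value from my .bashrc; note the trailing ':' contributes nothing.
print(pythonpath_dirs("/Users/kirkkaiser/caffe/python:/Users/kirkkaiser/pythonlibs:"))
```

Any directory in that list is a valid place to drop dlib.so.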

Getting the Code Running

Finally, we can check out the code from Github. In my case, I did the following:

git clone https://github.com/matthewearl/faceswap
cd faceswap
emacs faceswap.py

It’s always a good idea to read source code before you run it, to at least try to understand what’s going on. I’d like to say I did this too, but I’m not sure that I can. In the very first comments of Matthew’s code, he lets it be known that we’ll need to download a shape predictor file in order to get his code to work:

wget http://sourceforge.net/projects/dclib/files/dlib/v18.10/shape_predictor_68_face_landmarks.dat.bz2
bunzip2 shape_predictor_68_face_landmarks.dat.bz2

Finally, we need two images with faces in them. One thing you’ll learn quickly when dealing with facial recognition systems is that they never seem to work with what you first try. In my case, I needed to go through a few images before I found two that worked.

Understanding What’s Happening

Once you’ve gotten a piece of code running, it’s now a great time to take a step back, and see how it’s running.

In the case of our faceswap code, it mostly happens at the bottom of our file, here:

# loads and reads the images, and looks for a single face, throws an
# error if there's more than one or none. 
# it returns the loaded image, and a set of landmarks of the one face it's found
 
im1, landmarks1 = read_im_and_landmarks(sys.argv[1])     # sys.argv[1] is first image filename
im2, landmarks2 = read_im_and_landmarks(sys.argv[2])     # sys.argv[2] is second image filename
 
 
# builds the transformation matrix to make sure both heads align once copied
M = transformation_from_points(landmarks1[ALIGN_POINTS],
                               landmarks2[ALIGN_POINTS])
 
# build the mask of the second image's face
mask = get_face_mask(im2, landmarks2)
# warp that mask into the first image's coordinate space
warped_mask = warp_im(mask, M, im1.shape)
combined_mask = numpy.max([get_face_mask(im1, landmarks1), warped_mask],
                          axis=0)
# warp the second image itself so its face lines up with the first
warped_im2 = warp_im(im2, M, im1.shape)
warped_corrected_im2 = correct_colours(im1, warped_im2, landmarks1)
 
# blend the two images
output_im = im1 * (1.0 - combined_mask) + warped_corrected_im2 * combined_mask
 
# save it out
cv2.imwrite('output.jpg', output_im)

That’s a lot happening, but it isn’t too confusing. Basically, we build a mask for each face, merge them into one combined mask, warp the second face onto the first, and blend the two images using that mask.
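The key step is that final blend: each output pixel is a mask-weighted mix of the two images. Here’s the same formula on tiny toy arrays (made-up pixel values, just to show the mechanics):

```python
import numpy

im1 = numpy.array([[10.0, 10.0], [10.0, 10.0]])             # "background" image
warped_im2 = numpy.array([[200.0, 200.0], [200.0, 200.0]])  # warped, colour-corrected face
mask = numpy.array([[0.0, 0.5], [1.0, 1.0]])                # 0 = keep im1, 1 = take the face

output = im1 * (1.0 - mask) + warped_im2 * mask
print(output)
# where mask is 0 the pixel stays 10, where it's 1 it becomes 200,
# and the 0.5 pixel lands halfway at 105
```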

Getting Started with Detecting And Swapping Two Faces in One Image

First off, can we successfully detect two faces in a single image? In my case, I found a photo with two faces in it, both of which seemed perfect for facial recognition (i.e., straight on, both people looking directly at the camera).

Running this image through the existing code will obviously run into an error. As the code exists on GitHub from the post, it expects only one face per image. So let’s take another look at the function that reads and returns landmarks:

 
# this is the function to get our landmarks
def get_landmarks(im):
    rects = detector(im, 1)
 
    if len(rects) > 1: # if there's more than one face detected
        raise TooManyFaces # freak out
    if len(rects) == 0:
        raise NoFaces
 
    return numpy.matrix([[p.x, p.y] for p in predictor(im, rects[0]).parts()]) # return the matrix of x y coordinates of landmarks 
 
def read_im_and_landmarks(fname):
    im = cv2.imread(fname, cv2.IMREAD_COLOR)  # load the image 
    im = cv2.resize(im, (im.shape[1] * SCALE_FACTOR, # resize, scale factor is set to 1 by default, so nothing happens
                         im.shape[0] * SCALE_FACTOR))
    s = get_landmarks(im) # return landmarks from above
 
    return im, s
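
For context, what does `s` actually look like? dlib’s predictor returns 68 landmark points per face, which the code stacks into a 68x2 matrix of (x, y) coordinates, later sliced with index lists like ALIGN_POINTS. A toy illustration of the shape, with fake coordinates since we’re not running the detector here:

```python
import numpy

# predictor(im, rect).parts() yields 68 facial landmark points; stacking
# the (x, y) pairs gives a 68x2 matrix. These coordinates are placeholders.
points = [(100 + i, 200 + i) for i in range(68)]
landmarks = numpy.matrix(points)
print(landmarks.shape)
```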

Now, to begin, we don’t really need the read_im_and_landmarks function anymore. We’re just loading up one image, so we might as well get rid of it. The same goes for the get_landmarks function, because that only calls our detector.

Instead of the calls to these functions, let’s just load the image passed to the command line:

im = cv2.imread(sys.argv[1], cv2.IMREAD_COLOR)
im = cv2.resize(im, (im.shape[1] * SCALE_FACTOR,
                     im.shape[0] * SCALE_FACTOR))
rects = detector(im, 1)
if len(rects) < 2:
    print 'Error, less than two faces detected'
    sys.exit(1)   # bail out, the code below needs two faces
print len(rects)

If you run the above, and you get 2, then it’s successful. Now, let’s get our faces to swap with each other in the most generic way possible:

im1, landmarks1 = (im, numpy.matrix([[p.x, p.y] for p in predictor(im, rects[0]).parts()])) # first detected face
im2, landmarks2 = (im, numpy.matrix([[p.x, p.y] for p in predictor(im, rects[1]).parts()])) # second detected face
 
M = transformation_from_points(landmarks1[ALIGN_POINTS], # First transformation
                               landmarks2[ALIGN_POINTS])
 
M1 = transformation_from_points(landmarks2[ALIGN_POINTS], # Second transformation
                                landmarks1[ALIGN_POINTS])
 
mask = get_face_mask(im2, landmarks2) # First mask
mask1 = get_face_mask(im1, landmarks1) # Second mask
 
warped_mask = warp_im(mask, M, im1.shape) # First warp
warped_mask1 = warp_im(mask1, M1, im2.shape) # Second warp
 
combined_mask = numpy.max([get_face_mask(im1, landmarks1), warped_mask],
                          axis=0)
combined_mask1 = numpy.max([get_face_mask(im2, landmarks2), warped_mask1],
                          axis=0)
 
warped_im2 = warp_im(im2, M, im1.shape) # warp each face onto the other
warped_im3 = warp_im(im1, M1, im2.shape)
 
warped_corrected_im2 = correct_colours(im1, warped_im2, landmarks1)
warped_corrected_im3 = correct_colours(im2, warped_im3, landmarks2)
 
output_im = im1 * (1.0 - combined_mask) + warped_corrected_im2 * combined_mask # apply first mask
output_im = output_im * (1.0 - combined_mask1) + warped_corrected_im3 * combined_mask1 # apply second face mask
 
cv2.imwrite('output.jpg', output_im)

This is super inefficient, but it doesn’t really matter. It’s creating four layers to move our two faces, and combining all of them. But it works. And we’ve successfully got faces being swapped in one image.
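To see why applying the two masks in sequence swaps both faces, here’s the same composition on a four-pixel toy image (invented values; pixels 1-2 stand in for face A’s region, pixels 3-4 for face B’s):

```python
import numpy

im = numpy.array([1.0, 2.0, 9.0, 8.0])               # face A pixels, then face B pixels
a_warped_onto_b = numpy.array([0.0, 0.0, 1.0, 2.0])  # face A moved into B's region
b_warped_onto_a = numpy.array([9.0, 8.0, 0.0, 0.0])  # face B moved into A's region
mask_b = numpy.array([0.0, 0.0, 1.0, 1.0])           # region where B's face lives
mask_a = numpy.array([1.0, 1.0, 0.0, 0.0])           # region where A's face lives

out = im * (1.0 - mask_b) + a_warped_onto_b * mask_b   # paste A's face over B
out = out * (1.0 - mask_a) + b_warped_onto_a * mask_a  # paste B's face over A
print(out)
# the two face regions have traded pixel values
```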

Adapting Our Script to Video

There are two great command line tools for working with video. The first is youtube-dl, and the second is ffmpeg. To install them, first install Homebrew, and then on the command line:

brew install ffmpeg
brew install youtube-dl

FFmpeg lets us break videos down into images, rescale them, modify them, and then put them back together. Youtube-dl lets us use the entire internet’s worth of videos to download and remix. In my case, I already had a video I’d shot, but if you don’t, pick one from Youtube, and use youtube-dl to download an mp4 of it.

youtube-dl https://www.youtube.com/watch?v=dQw4w9WgXcQ

Now, I have a mostly standardized way I like to work with video. In general, I’ll extract all the frames of a video using ffmpeg, create an output directory, then glob every image in the working directory and process each one into the output directory.

From the command line, let’s extract our video image frames and then create our output directory:

ffmpeg -i yourmovie.mp4 output%05d.jpg
mkdir output

Alright. Our working directory should now be filled with images, one for each frame of our video. Let’s now process all of those images one by one in Python using glob.

import glob
import shutil   # used below to copy frames we skip
 
for filename in glob.glob('*.jpg'):
    im = cv2.imread(filename, cv2.IMREAD_COLOR)                 # open the current frame
    im = cv2.resize(im, (im.shape[1] * SCALE_FACTOR,
                         im.shape[0] * SCALE_FACTOR))
    rects = detector(im, 1)
    if len(rects) < 2:
        print filename + " is missing two faces. skipping."    # copy and skip a frame if it's missing two faces
        shutil.copyfile(filename, 'output/' + filename)
        continue
    if rects[0].left() < rects[1].left():                     # tricky bit: keep the leftmost face as face 1, so faces stay consistent between frames
        im1, landmarks1 = (im, numpy.matrix([[p.x, p.y] for p in predictor(im, rects[0]).parts()]))
        im2, landmarks2 = (im, numpy.matrix([[p.x, p.y] for p in predictor(im, rects[1]).parts()]))
    else:
        im1, landmarks1 = (im, numpy.matrix([[p.x, p.y] for p in predictor(im, rects[1]).parts()]))
        im2, landmarks2 = (im, numpy.matrix([[p.x, p.y] for p in predictor(im, rects[0]).parts()]))
 
    M = transformation_from_points(landmarks1[ALIGN_POINTS],
                               landmarks2[ALIGN_POINTS])
 
    M1 = transformation_from_points(landmarks2[ALIGN_POINTS],
                               landmarks1[ALIGN_POINTS])
 
    mask = get_face_mask(im2, landmarks2)
    mask1 = get_face_mask(im1, landmarks1)
 
    warped_mask = warp_im(mask, M, im1.shape)
    warped_mask1 = warp_im(mask1, M1, im2.shape)
 
    combined_mask = numpy.max([get_face_mask(im1, landmarks1), warped_mask],
                          axis=0)
    combined_mask1 = numpy.max([get_face_mask(im2, landmarks2), warped_mask1],
                          axis=0)
 
    warped_im2 = warp_im(im2, M, im1.shape)
    warped_im3 = warp_im(im1, M1, im2.shape)
 
    warped_corrected_im2 = correct_colours(im1, warped_im2, landmarks1)
    warped_corrected_im3 = correct_colours(im2, warped_im3, landmarks2)
 
    output_im = im1 * (1.0 - combined_mask) + warped_corrected_im2 * combined_mask
    output_im = output_im * (1.0 - combined_mask1) + warped_corrected_im3 * combined_mask1
 
    cv2.imwrite('output/' + filename, output_im) # write same filename to output directory
    print filename + " finished, adding."
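
That left() comparison in the loop is doing something important: dlib doesn’t guarantee detection order from frame to frame, so without pinning each face to a side of the frame, the swap can flicker back and forth between frames. The same idea with stand-in rectangles (hypothetical objects, not real dlib rects):

```python
# Stand-ins for dlib rectangles: just objects with a left() method.
class Rect(object):
    def __init__(self, left):
        self._left = left
    def left(self):
        return self._left

# The detector might return faces in either order on any given frame,
# so pin "face 1" to whichever face is leftmost.
rects = [Rect(340), Rect(120)]
rects = sorted(rects, key=lambda r: r.left())
print([r.left() for r in rects])
```

Sorting generalizes the same trick if you ever process more than two faces.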

Finally, we can cd into our output directory, and get back out our finished video:

cd output
ffmpeg -i output%05d.jpg out.mp4
open out.mp4

Get The Code

As always, the code is on GitHub.
