An Iterative Approach to Building an API – Part 3: Writing Chef Scripts for Amazon EC2 Deployment

If you’re just joining us, over the past two days we’ve written a RESTful API in Python with Tornado and ElasticSearch as our data backend. We started by stubbing out the API data in YAML files, then switched over to a complete, scalable data store with ElasticSearch.

Today, we’ll write the Chef scripts that will build out the server our API will be hosted on. In this example, we’ll walk through creating a new Amazon Web Services account, adding our private key to SSH, and writing and deploying a complete server environment with Chef.

Creating Your Amazon Web Services SSH Key

First, you’ll need to log into your Amazon Web Services account, or create one if you don’t have one already. Then, click the My Account / Console button to open the dropdown and pick AWS Management Console.

Select EC2 to get access to Amazon’s server deployment frontend. We’re going to build our own servers from the command line, but first we’ll need to get a private key from Amazon so we don’t have to use passwords on our server. Instead, we’ll authenticate with a cryptographic key pair.

Once at the EC2 console, look for Network & Security in the left column. Underneath it is the Key Pairs link. Click it to bring up the key pair management console.

Click Create Key Pair, and you’ll be prompted to enter a name for your key pair. In this example, we’ll use ElasticAPI. After doing so, a file named ElasticAPI.pem will automatically download. We now need to chmod this private key so other users can’t read it, then add it to ssh’s list of keys.

$ mv ~/Downloads/ElasticAPI.pem ~/.ssh/
$ cd ~/.ssh
$ chmod 600 ElasticAPI.pem
$ ssh-add ElasticAPI.pem
Identity added: ElasticAPI.pem (ElasticAPI.pem)

Great! Now when we create servers with Amazon, we’ll be able to ssh into them right away, without being prompted for passwords. This means we can write scripts to manage our servers automatically, without typing long passwords every time something happens.
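You can confirm the key is loaded into your ssh agent at any time — the fingerprint below is just a placeholder:

$ ssh-add -l
2048 ab:cd:ef:12:34:56:78:90:ab:cd:ef:12:34:56:78:90 ElasticAPI.pem (RSA)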

Installing knife-solo and Adding Our Amazon Access Keys

So now we need the ability to create Amazon instances from the command line. Chef ships with a utility named knife, which can deploy servers to a few different cloud providers. In this example, however, we’re going to use an extension of knife called knife-solo.

Knife-solo adds utilities to bootstrap a repository of Chef scripts, and to install Chef onto a brand new server. This means we can get by with a bare minimum of Chef overhead, which can honestly get pretty damn confusing.

Installing knife-solo begins with making sure you’ve got a working installation of Ruby and RubyGems, and then running the following commands to get a new Chef repository working:

$ gem install knife-solo
$ knife kitchen elasticChef
$ cd elasticChef
$ git init
$ git add *
$ git commit
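At this point the kitchen holds a skeleton of everything knife-solo needs. The exact contents vary a little between knife-solo versions, so treat this as a rough sketch:

elasticChef/
  cookbooks/        # community cookbooks we clone
  data_bags/
  nodes/            # one JSON file per server we manage
  roles/
  site-cookbooks/   # cookbooks we write ourselves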

Great! We’ve got an empty Chef repository now, and very soon we can begin adding cookbooks. Next, we can open up ~/.chef/knife.rb and add our Amazon Web Services API information:

log_level          :info
log_location       STDOUT
ssl_verify_mode    :verify_none
#chef_server_url    "http://y.t.b.d:4000"
#file_cache_path    "/var/cache/chef"
#pid_file           "/var/run/chef/client.pid"
cache_options({ :path => "/var/cache/chef/checksums", :skip_expires => true})
signing_ca_user "chef"
Mixlib::Log::Formatter.show_time = true
validation_client_name "chef-validator"
knife[:aws_ssh_key_id] = "ElasticAPI"
knife[:aws_access_key_id]     = 'YOURACCESSKEYIDHERE'
knife[:aws_secret_access_key] = 'YOURSECRETACCESSKEYHERE'

Your Amazon Access Key ID and Secret Access Key were created when you set up your Amazon account, and can be found under Security Credentials in the My Account / Console menu.

Verify knife-solo Works By Creating an Ubuntu Instance

Now we can verify that our knife setup works by invoking the command to bring up an Amazon t1.micro instance. (The ec2 subcommands come from the separate knife-ec2 plugin, so if knife complains that it doesn’t know about ec2, install it with gem install knife-ec2.)

$ knife ec2 server create -I ami-137bcf7a -x ubuntu -f t1.micro
Instance ID: i-00a8957d
Flavor: t1.micro
Image: ami-137bcf7a
Region: us-east-1
Availability Zone: us-east-1b
Security Groups: default
Tags: {"Name"=>"i-00a8957d"}
SSH Key: ElasticAPI
 
Waiting for server..................
Public DNS Name: ec2-XX-XXX-XXX-XX.compute-1.amazonaws.com
Public IP Address: XX.XXX.XXX.XXX
Private DNS Name: domU-XX-XXX-XXX-XX.compute-1.internal
Private IP Address: XX.XXX.XXX.XXX
 
Waiting for sshd.done
Bootstrapping Chef on ec2-XX-XXX-XXX.compute-1.amazonaws.com

You might get an error after the last line, and that’s perfectly alright — we’ll be installing Chef on the server ourselves with knife-solo in a moment. Verify that the instance was created and your ssh key works by sshing to the public IP address from above, using the ubuntu username:

$ ssh -i ~/.ssh/ElasticAPI.pem ubuntu@XX.XXX.XX.XX
Welcome to Ubuntu 12.04.1 LTS (GNU/Linux 3.2.0-29-virtual x86_64)
 
 * Documentation:  https://help.ubuntu.com/
 
  System information as of Sun Oct 14 21:23:54 UTC 2012
 
  System load:  0.08              Processes:           67
  Usage of /:   21.4% of 7.97GB   Users logged in:     0
  Memory usage: 46%               IP address for eth0: 10.210.218.255
  Swap usage:   0%
 
  Graph this data and manage this system at https://landscape.canonical.com/
 
42 packages can be updated.
22 updates are security updates.
 
Get cloud support with Ubuntu Advantage Cloud Guest
  http://www.ubuntu.com/business/services/cloud
ubuntu@domU-XX-XX-XX-XX-XX-XX:~$ exit

Finally! Now we can begin writing the actual Chef code, which should go fairly quickly now that we have a way to test whether or not the scripts are written correctly.

Adding Our Dependencies to Chef

Chef scripts are all written in Ruby. You download cookbooks, which contain recipes: simple Ruby scripts that describe how to build pieces of your server.
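A recipe is just a list of resources that Chef brings into the state you declare. The snippet below isn’t one of the recipes we’ll use — it’s only a hypothetical illustration of the shape:

# Hypothetical example, not part of our cookbook.
package "curl" do
  action :install
end

service "cron" do
  action [:enable, :start]
end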

The great people at Opscode, who created Chef, have already built the majority of the cookbooks you’ll need. So all we end up doing is cloning their repos for what we need, then writing our own recipe which describes how to use the standard recipes.

Because we’re going the simplified route, we’re just going to clone the repos by hand. So let’s begin:

$ cd elasticChef/cookbooks
$ git clone git://github.com/opscode-cookbooks/apt.git
$ git clone git://github.com/opscode-cookbooks/build-essential.git
$ git clone git://github.com/opscode-cookbooks/java.git
$ git clone git://github.com/opscode-cookbooks/python.git
$ git clone git://github.com/opscode-cookbooks/ark.git
$ git clone git://github.com/opscode-cookbooks/sudo.git
$ git clone git://github.com/opscode-cookbooks/gunicorn.git

All those commands downloaded the most basic set of cookbooks to get our server up and running. We can now create our own cookbook with a recipe that invokes the proper installation of software.

$ mkdir elasticServer
$ touch elasticServer/metadata.rb
$ mkdir elasticServer/recipes
$ touch elasticServer/recipes/default.rb

Now, open elasticServer/metadata.rb, and add our dependencies to it:

depends "apt"
depends "git"
depends "build-essential"
depends "python"
depends "gunicorn"

Pretty straightforward file, right? Now let’s actually write the bit of code that builds our server. This goes in elasticServer/recipes/default.rb. Note that it also includes an elasticsearch recipe, so make sure an elasticsearch cookbook is present in your cookbooks/ directory alongside the ones we just cloned, or chef-solo won’t be able to resolve the include:

include_recipe "build-essential"
include_recipe "python::default"
include_recipe "gunicorn::default"                                                                                                                                                                          
include_recipe "elasticsearch::default"
 
%w{emacs git-core rlwrap openjdk-6-jdk tmux curl tree unzip nginx python-setuptools python-dev build-essential supervisor}.each do |pkg|
  package pkg do
    action :install
  end
end
 
service "nginx" do
  enabled true
  running true
  supports :status => true, :restart => true, :reload => true
  action [:start, :enable]
end
 
python_virtualenv "/home/ubuntu/elasticEnv" do
    interpreter "python2.7"
    owner "ubuntu"
    group "ubuntu"
    action :create
end

Save this file, and we’re done with all the Chef scripts we’ll need to write. Now we just need to create a node file that describes our server.

Adding and Building Our Node

In our elasticChef directory, you may have noticed a nodes/ directory. This is where we describe how individual nodes get built. The final step before we can push a server configuration is creating a file in nodes/ named after your server’s IP address, with a .json extension.

So, if your public Amazon IP address from before was 11.22.33.44, your file would be in nodes/ and called 11.22.33.44.json. And that file would consist of the following:

{                                                                                                                                                                                                           
    "run_list": [ "recipe[elasticServer]" ]
}
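That’s all knife-solo needs for now. The same file can also carry node attributes for your recipes to read; we don’t use any yet, so the attribute below is purely hypothetical:

{
    "run_list": [ "recipe[elasticServer]" ],
    "elasticServer": {
        "app_user": "ubuntu"
    }
}

Anything defined this way ends up on the node object, so a recipe could read it with node['elasticServer']['app_user'].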

Save it, and we can now deploy our API server!

$ knife prepare ubuntu@11.22.33.44
$ knife cook ubuntu@11.22.33.44

The first command installs Ruby and Chef on the server. The second uploads our kitchen (cookbooks and node file) to the server and runs chef-solo, which installs all the software we need.

If everything went well, we should be able to ssh to the server, run curl localhost, and see the default nginx page confirming the web server was installed.

Opening the Firewall for the HTTP Server

Amazon manages the server’s firewall with Security Groups, found under the Network & Security section of your EC2 console. If you click the default group here, you will see its details pop up in the pane below.

Click the Inbound tab, and you’ll see the options to create a new rule. Under port range enter 80, the default HTTP port. Leave the source as 0.0.0.0/0 to allow the entire internet to reach your server. Then click Add Rule, followed by Apply Rule Changes. You should now be able to go to http://11.22.33.44 in your web browser and see the server you created.

Tomorrow, We Deploy!

Great! Now we’ve configured the web server, and tomorrow we’ll write the Fabric scripts to automatically deploy our code to the server. If you want to review again, here are the links to what we’ve accomplished so far:

Part One – Building the API skeleton with YAML
Part Two – From YAML to ElasticSearch
Part Three – Writing Chef Scripts for Amazon Deployment
Part Four – Writing Fabric Scripts for Code Deployment

Feel free to leave any questions or comments below.

As before, the code for everything is already finished and available on GitHub, and so are the Chef scripts.

An Iterative Approach to Building an API – Part 2: Adding ElasticSearch

So, yesterday we built the very first iteration of our Python and Tornado API. We fleshed out all the data with YAML files, and got proper HTTP response codes for calls.

Today, we’ll take our existing data store and migrate over to ElasticSearch. We’ll take the same approach as yesterday, going step by step, just as you would in the real world.

Part One – Building the API skeleton with YAML
Part Two – From YAML to ElasticSearch
Part Three – Writing Chef Scripts for Amazon Deployment
Part Four – Writing Fabric Scripts for Code Deployment

Why ElasticSearch?

ElasticSearch is a painless, RESTful frontend to the mature, robust search engine Apache Lucene. It scales by being distributed, and its simple installation lets us get up and running very quickly.

Here’s an example of just how easy it is to store data in ElasticSearch. Once it’s up and running, adding a document is as simple as:

curl -XPUT http://localhost:9200/twitter/user/kimchy -d '{
    "name" : "Shay Banon"
}'

This command makes a PUT request that adds a document for the twitter user kimchy under the twitter index.
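Reading the document back is just as easy — this assumes the same local server and the document we just added:

curl -XGET http://localhost:9200/twitter/user/kimchy

The response contains the stored JSON along with metadata such as the index, type, and version.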

ElasticSearch runs on port 9200 by default and can begin taking and saving data right away. There’s no need to build a schema. (But we will.) Let’s get started.

Installing and Testing elasticsearch

As of this writing, the latest elasticsearch release is 0.19.10. You can check for the latest version on the elasticsearch download page.

wget https://github.com/downloads/elasticsearch/elasticsearch/elasticsearch-0.19.10.tar.gz
tar zxvf elasticsearch-0.19.10.tar.gz
cd elasticsearch-0.19.10/bin
./elasticsearch

With that, we should have elasticsearch up and running on our machine. Note that when you run elasticsearch, nothing appears at the command line by default; it simply daemonizes into the background (pass -f if you want it to stay in the foreground and print its logs). You can verify it’s working with curl:

$ curl http://localhost:9200
{
  "ok" : true,
  "status" : 200,
  "name" : "Power, Jack",
  "version" : {
    "number" : "0.19.10",
    "snapshot_build" : false
  },
  "tagline" : "You Know, for Search"

Nice! One of the easiest installs I’ve ever done. Way to go elasticsearch team!

Finding A Library, Poking Around in iPython

It’s always a good idea to see what your library options are. Initially, when I was building this integration, I looked at pyes, a very well written library, but the code to use it seemed a bit ugly for my tastes.

Luckily, after a bit more searching, I found elasticutils, which is, in my opinion, a much cleaner interface to the very simple elasticsearch server. It always pays to take a few minutes to read the introduction and example code before deciding on a library. Elasticutils actually uses pyes under the covers.

I find it helps to have an example program open while interactively coding in iPython.

So let’s install those dependencies, and fire up an iPython session:

$ pip install elasticutils
$ ipython
Python 2.7.3 (default, Apr 20 2012, 22:39:59) 
Type "copyright", "credits" or "license" for more information.
 
IPython 0.13 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
 
In [1]: from elasticutils import get_es, S  # get_es gets a connection to elasticsearch, S creates a searcher.
 
In [2]: es = get_es(hosts='localhost:9200', default_indexes=['testindex'])   # Connect to the local server, using 'testindex' as the default index
 
In [3]: mapping = {'companies': {'properties': {'company_name': {'type': 'string'}, 'active': {'type': 'string'}, 'inactive': {'type': 'string'},}}} # Build a doctype named 'companies' with the properties we defined in our data
 
In [4]: es.create_index('testindex', settings={'mappings': mapping}) # Create an index with the data
Out[4]: {u'acknowledged': True, u'ok': True}  
 
In [5]: from blah import getData  # Import our YAML loader
 
In [6]: a = getData()  # Call it and assign it to a
 
In [7]: a  # Verify a has the right value
Out[7]: 
{'homedepot': {'active': [{'25% percent off': 'ZYZZ',
    'Buy One Get One': 'REDDIT'}],
  'company_name': 'homedepot',
  'inactive': [{'0% off': 'DIVIDEBYZERO',
    'Buy None Get A Ton': 'FREEONETONTRUCK'}]},
 'lowes': {'active': [{'1 Free Like': 'INSTAGRAM',
    '50% off fun': 'YCOMBINATOR'}],
  'company_name': 'lowes',
  'inactive': [{'100 Free Likes': 'GTL'}]}}
 
In [8]: for company in a: # Go through each company and index!
  ....:        es.index(a[company], 'testindex', 'companies') 
 
In [9]: es.refresh('testindex') # Commit to the index
Out[9]: {u'_shards': {u'failed': 0, u'successful': 5, u'total': 10}, u'ok': True}
 
In [10]: basic_s = S().indexes('testindex').doctypes('companies').values_dict() # Create our searcher on the index and doctype
 
In [11]: basic_s.filter(company_name='lowes') # Try a filter, and.... success!
Out[11]: [{u'active': [{u'1 Free Like': u'INSTAGRAM', u'50% off fun': u'YCOMBINATOR'}], u'inactive': [{u'100 Free Likes': u'GTL'}], u'company_name': u'lowes'}]

Perfect! Now we’ve seen how to add and query our data. We can copy the majority of this code into a new file, and build everything out with a few minor changes.

From iPython to Production

Let’s put everything we just learned into a new file that we can import from webapp.py. We’ll make it so we can open an iPython session on this file and build and load new data. We’ll also pull the getData() function out of webapp.py. Create a new file named schema.py:

import glob
import yaml
 
from elasticutils import get_es, S
 
mapping = {'companies': {'properties': {'company_name': {'type': 'string'}, 'active': {'type': 'string'}, 'inactive': {'type': 'string'},}}}
 
es = get_es(hosts='localhost:9200', default_indexes=['dealsindex'])
 
def getData():
    data = {}
    a = glob.iglob("data/*.yaml") # Load all the yaml files in the data directory
    for file in a:
        b = open(file)
        c = yaml.load(b)
        data.update({c['company_name']: c}) # Takes the company_name and uses it as the key for lookups in dictionary
        b.close()
    return data
 
def create_and_insert():
    es.delete_index_if_exists('dealsindex')
    es.create_index('dealsindex', settings={'mappings': mapping})
 
    companies = getData()
 
    for company in companies:
        es.index(companies[company],'dealsindex','companies')
 
    es.refresh('dealsindex')
 
def get_company(companyname):
    basic_s = S().indexes('dealsindex').doctypes('companies').values_dict() # Build the searcher
 
    return basic_s.filter(company_name=companyname) # Filter by company name

Now we can load up another iPython session, import schema, and verify everything works.
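A quick sanity check in that session might look something like this — output abbreviated, and assuming elasticsearch is still running locally with the YAML files from Part One in data/:

In [1]: import schema

In [2]: schema.create_and_insert()

In [3]: schema.get_company('lowes')
Out[3]: [{u'active': [{u'1 Free Like': u'INSTAGRAM', u'50% off fun': u'YCOMBINATOR'}], u'inactive': [{u'100 Free Likes': u'GTL'}], u'company_name': u'lowes'}]

Finally, it’s time to hook our new engine into our webapp.py file.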

Incorporating ElasticSearch in our Webapp

If you run an iPython shell and try searching for a non-existent company, you’ll see that elasticsearch returns an empty result. We can use this to verify that companies actually exist.
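For example (the company name here is made up):

In [4]: schema.get_company('doesnotexist')
Out[4]: []

With that check in hand, our final code with the new schema.py backend looks like the following: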

# Run with:
#   $ gunicorn -k egg:gunicorn#tornado webapp:app 
import yaml

import schema
from tornado.web import Application, RequestHandler, HTTPError
 
def getAPIDescription():
    a = open("APIDescription.yaml")
    return yaml.load(a)
 
apiDescription = getAPIDescription()
allowableOptions = apiDescription['merchantapi']['options']
 
class MainHandler(RequestHandler):
    def get(self):
        self.write("Hello, world")
 
class APIHandler(RequestHandler):                                                                                                                                                                           
    def get(self):                                                                                                                                                                                          
        self.write(apiDescription)                                                                                                                                                                          
 
class DealsHandler(RequestHandler):
    def get_key_or_error(self, arguments, key):
        if (key in arguments.keys()) and (arguments[key][0] in allowableOptions):
            return arguments[key][0]
        raise HTTPError(400)
 
    def get(self, merchant_name):
        status = self.get_key_or_error(self.request.arguments, 'status')
        merchant = schema.get_company(merchant_name)
 
        if merchant:
            self.write(unicode(merchant[0][status]))
        else:
            raise HTTPError(404)
 
    def post(self, merchant_name):
        raise HTTPError(403)
 
    def delete(self, merchant_name):
        raise HTTPError(403)
 
    def put(self, merchant_name):
        raise HTTPError(403)
 
app = Application([
    (r"/", MainHandler),
    (r"/v1/", APIHandler),
    (r"/v1/(.*)/deals", DealsHandler)
])

Verify Changes with nosetests, and commit

Save this file, then verify our API works by running nosetests. If everything goes well, add schema.py to git and commit!

$ nosetests
$ git add schema.py
$ git commit
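If your tests from Part One don’t yet cover the new backend, here’s a minimal sketch of what one could look like. It assumes elasticsearch is running locally, schema.create_and_insert() has already populated the index, and that 'active' is one of the allowable status options from Part One:

# test_deals.py -- hypothetical sketch, not part of the repository.
from tornado.testing import AsyncHTTPTestCase

import webapp


class DealsHandlerTest(AsyncHTTPTestCase):
    def get_app(self):
        return webapp.app

    def test_existing_company_returns_200(self):
        response = self.fetch('/v1/lowes/deals?status=active')
        self.assertEqual(response.code, 200)

    def test_missing_company_returns_404(self):
        response = self.fetch('/v1/doesnotexist/deals?status=active')
        self.assertEqual(response.code, 404)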

Alright! We’ve built a complete (basic) API from scratch, incorporating an elasticsearch backend! That’s quite the accomplishment! Next up, we’ll write the scripts to deploy our API to the cloud, using Chef and Amazon EC2. Again, if you want to skip ahead, the code is available on GitHub, and so are the Chef scripts.

If you need to go back, or didn’t understand something, let me know below, or read the previous post.

Part One – Building the API skeleton with YAML
Part Two – From YAML to ElasticSearch
Part Three – Writing Chef Scripts for Amazon Deployment
Part Four – Writing Fabric Scripts for Code Deployment