An Iterative Approach to Building an API – Part 2: Adding ElasticSearch

So, yesterday we built the very first iteration of our Python and Tornado API. We stored all the data in YAML files and returned proper HTTP response codes for each call.

Today, we’ll take our existing data store and migrate it over to ElasticSearch. We’ll take the same approach as yesterday, moving step by step, just as you would in the real world.

Part One – Building the API skeleton with YAML
Part Two – From YAML to ElasticSearch
Part Three – Writing Chef Scripts for Amazon Deployment
Part Four – Writing Fabric Scripts for Code Deployment

Why ElasticSearch?

Elasticsearch is a painless, RESTful frontend to Apache Lucene, a mature and robust search engine. It scales by being distributed, and installation is quick enough that we can be up and running in minutes.

Here’s an example of just how easy it is to store data in ElasticSearch. Once the server is up and running, adding a document is as simple as the following:

curl -XPUT http://localhost:9200/twitter/user/kimchy -d '{
    "name" : "Shay Banon"
}'

This command sends a PUT request that creates a document for the twitter user kimchy under the twitter index.
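
To read that document back, a GET against the same URL returns what was stored — a quick sanity check you can run right after the PUT:

curl -XGET http://localhost:9200/twitter/user/kimchy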

ElasticSearch runs on port 9200 by default and can begin taking and saving data right away. There’s no need to build a schema. (But we will.) Let’s get started.

Installing and Testing elasticsearch

As of this writing, the latest elasticsearch release is 0.19.10. You can check for a newer version on the elasticsearch download page.

wget https://github.com/downloads/elasticsearch/elasticsearch/elasticsearch-0.19.10.tar.gz
tar zxvf elasticsearch-0.19.10.tar.gz
cd elasticsearch-0.19.10/bin
./elasticsearch

With that, we should have elasticsearch up and running on our machine. Just as a note: when you start elasticsearch, nothing appears at the command line by default — the process daemonizes and runs in the background. You can verify it’s working with curl:

$ curl http://localhost:9200
{
  "ok" : true,
  "status" : 200,
  "name" : "Power, Jack",
  "version" : {
    "number" : "0.19.10",
    "snapshot_build" : false
  },
  "tagline" : "You Know, for Search"
}

Nice! One of the easiest installs I’ve ever done. Way to go elasticsearch team!
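
One tip: if you’d rather keep elasticsearch in the foreground so you can watch its log output, the 0.19.x releases accept an -f flag (worth confirming against your version’s help output):

./elasticsearch -f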

Finding A Library, Poking Around in iPython

It’s always a good idea to see what your options are in a library. When I first built this integration I looked at pyes, a very well written library, but the code required to use it seemed a bit ugly for my tastes.

Luckily, after a bit more searching, I found elasticutils, which is, in my opinion, a much cleaner interface to the very simple elasticsearch server. (It actually uses pyes under the covers.) It always pays to take a few minutes to read a library’s introduction and example code before deciding on it.

I find it helps to have an example program open while interactively coding in iPython.

So let’s install those dependencies, and fire up an iPython session:

$ pip install elasticutils
$ ipython
Python 2.7.3 (default, Apr 20 2012, 22:39:59) 
Type "copyright", "credits" or "license" for more information.
 
IPython 0.13 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
 
In [1]: from elasticutils import get_es, S  # get_es gets a connection to elasticsearch, S creates a searcher.
 
In [2]: es = get_es(hosts='localhost:9200', default_indexes=['testindex'])   # Default connection, pointed at a throwaway index named 'testindex'
 
In [3]: mapping = {'companies': {'properties': {'company_name': {'type': 'string'}, 'active': {'type': 'string'}, 'inactive': {'type': 'string'},}}} # Build a doctype named 'companies' with the properties we defined in our data
 
In [4]: es.create_index('testindex', settings={'mappings': mapping}) # Create an index with the data
Out[4]: {u'acknowledged': True, u'ok': True}  
 
In [5]: from webapp import getData  # Import our YAML loader from Part 1's webapp.py
 
In [6]: a = getData()  # Call it and assign it to a
 
In [7]: a  # Verify a has the right value
Out[7]: 
{'homedepot': {'active': [{'25% percent off': 'ZYZZ',
    'Buy One Get One': 'REDDIT'}],
  'company_name': 'homedepot',
  'inactive': [{'0% off': 'DIVIDEBYZERO',
    'Buy None Get A Ton': 'FREEONETONTRUCK'}]},
 'lowes': {'active': [{'1 Free Like': 'INSTAGRAM',
    '50% off fun': 'YCOMBINATOR'}],
  'company_name': 'lowes',
  'inactive': [{'100 Free Likes': 'GTL'}]}}
 
In [8]: for company in a: # Go through each company and index!
  ....:        es.index(a[company], 'testindex', 'companies') 
 
In [9]: es.refresh('testindex') # Commit to the index
Out[9]: {u'_shards': {u'failed': 0, u'successful': 5, u'total': 10}, u'ok': True}
 
In [10]: basic_s = S().indexes('testindex').doctypes('companies').values_dict() # Create our searcher on the index and doctype
 
In [11]: basic_s.filter(company_name='lowes') # Try a filter, and.... success!
Out[11]: [{u'active': [{u'1 Free Like': u'INSTAGRAM', u'50% off fun': u'YCOMBINATOR'}], u'inactive': [{u'100 Free Likes': u'GTL'}], u'company_name': u'lowes'}]

Perfect! Now we’ve seen how to add and query our data. We can copy the majority of this code into a new file, and build everything out with a few minor changes.

From iPython to Production

Let’s put everything we just learned into a new file that we can import from webapp.py. We’ll structure it so we can open it in an iPython session to build the index and load new data, and we’ll move the getData() function out of webapp.py while we’re at it. Create a new file named schema.py:

import glob
import yaml
 
from elasticutils import get_es, S
 
mapping = {'companies': {'properties': {'company_name': {'type': 'string'}, 'active': {'type': 'string'}, 'inactive': {'type': 'string'},}}}
 
es = get_es(hosts='localhost:9200', default_indexes=['dealsindex'])
 
def getData():
    data = {}
    a = glob.iglob("data/*.yaml") # Loads all the yaml files in the data directory
    for file in a:
        b = open(file)
        c = yaml.load(b)
        data.update({c['company_name']: c}) # Takes the company_name and uses it as the key for lookups in dictionary
        b.close()
    return data
 
def create_and_insert():
    es.delete_index_if_exists('dealsindex')
    es.create_index('dealsindex', settings={'mappings': mapping})
 
    companies = getData()
 
    for company in companies:
        es.index(companies[company],'dealsindex','companies')
 
    es.refresh('dealsindex')
 
def get_company(companyname):
    basic_s = S().indexes('dealsindex').doctypes('companies').values_dict() # Build the searcher
 
    return basic_s.filter(company_name=companyname) # Filter by company name

Now we can load up another iPython session, import schema, run create_and_insert() to build the dealsindex and load the YAML data, and then verify lookups work. Finally, it’s time to hook our new engine into our webapp.py file.
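
A quick check might look something like this (output trimmed — the exact documents depend on the YAML files in your data directory):

In [1]: import schema

In [2]: schema.create_and_insert()   # Rebuild the index from the YAML files

In [3]: schema.get_company('lowes')  # Should return the lowes document
Out[3]: [{u'active': [...], u'inactive': [...], u'company_name': u'lowes'}]

In [4]: schema.get_company('doesnotexist')  # Unknown companies come back as an empty list
Out[4]: []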

Incorporating ElasticSearch in our Webapp

If you run an iPython shell and search for a non-existent company, you’ll see that elasticsearch returns an empty list. We can use this to verify that a company actually exists before responding. Our final code, with the new schema.py backend, looks like the following:

# Run with:                                                                                                                                                                                                                                                                                                                                                                                                            
#   $ gunicorn -k egg:gunicorn#tornado webapp:app 
import yaml
import schema
from tornado.web import Application, RequestHandler, HTTPError
 
def getAPIDescription():
    a = open("APIDescription.yaml")
    return yaml.load(a)
 
apiDescription = getAPIDescription()
allowableOptions = apiDescription['merchantapi']['options']
 
class MainHandler(RequestHandler):
    def get(self):
        self.write("Hello, world")
 
class APIHandler(RequestHandler):
    def get(self):
        self.write(apiDescription)
 
class DealsHandler(RequestHandler):
    def get_key_or_error(self, arguments, key):
        if (key in arguments.keys()) and (arguments[key][0] in allowableOptions):
            return arguments[key][0]
        raise HTTPError(400)
 
    def get(self, merchant_name):
        status = self.get_key_or_error(self.request.arguments, 'status')
        merchant = schema.get_company(merchant_name)
 
        if merchant:
            self.write(unicode(merchant[0][status]))
        else:
            raise HTTPError(404)
 
    def post(self, merchant_name):
        raise HTTPError(403)
 
    def delete(self, merchant_name):
        raise HTTPError(403)
 
    def put(self, merchant_name):
        raise HTTPError(403)
 
app = Application([
    (r"/", MainHandler),
    (r"/v1/", APIHandler),
    (r"/v1/(.*)/deals", DealsHandler)
])
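
With the data indexed via schema.create_and_insert() and the app running under gunicorn (default port 8000), a few curl calls exercise the new backend — the response bodies depend on your YAML data, but the status codes should match:

$ curl -i "http://localhost:8000/v1/lowes/deals?status=active"           # 200, active deals for lowes
$ curl -i "http://localhost:8000/v1/lowes/deals"                         # 400, missing status parameter
$ curl -i "http://localhost:8000/v1/doesnotexist/deals?status=active"    # 404, unknown merchant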

Verify Changes with nosetests, and commit
Save this file, then verify the API still works by running nosetests. If everything passes, add schema.py to git and commit!

$ nosetests
$ git add schema.py
$ git commit
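
If you’d like a test that exercises the new backend directly, a minimal nose-style check might look like this — a sketch, assuming a local elasticsearch with the sample YAML data, and a hypothetical file name of test_schema.py:

# test_schema.py -- hypothetical test for the elasticsearch-backed lookups
import schema

def setup_module():
    # Rebuild the index so the tests run against known data
    schema.create_and_insert()

def test_known_company_is_found():
    results = list(schema.get_company('lowes'))
    assert len(results) == 1
    assert results[0]['company_name'] == 'lowes'

def test_unknown_company_returns_empty():
    assert len(list(schema.get_company('doesnotexist'))) == 0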

Alright! We’ve built a complete (basic) API from scratch, incorporating an elasticsearch backend! That’s quite the accomplishment! Next up, we’ll write the scripts to deploy our API to the cloud, using Chef and Amazon EC2. Again, if you want to skip ahead, code is available at github, and so are the Chef scripts.

If you need to go back, or didn’t understand something, let me know below, or read the previous post.

Part One – Building the API skeleton with YAML
Part Two – From YAML to ElasticSearch
Part Three – Writing Chef Scripts for Amazon Deployment
Part Four – Writing Fabric Scripts for Code Deployment
