So, yesterday we built the very first iteration of our Python and Tornado API. We fleshed out the data with YAML files and returned proper HTTP response codes for our calls.
Today, we’ll take our existing data store and migrate it over to Elasticsearch. We’ll take the same approach as yesterday, working step by step, just as you would in the real world.
Part One – Building the API skeleton with YAML
Part Two – From YAML to ElasticSearch
Part Three – Writing Chef Scripts for Amazon Deployment
Part Four – Writing Fabric Scripts for Code Deployment
Why ElasticSearch?
Elasticsearch is a painless, RESTful frontend to Apache Lucene, a mature and robust search engine. It scales because it is distributed by design, and its simple installation lets us get up and running very quickly.
Here’s an example of just how easy it is to store data in Elasticsearch. Once the server is up and running, adding a document is a single request:
curl -XPUT http://localhost:9200/twitter/user/kimchy -d '{ "name" : "Shay Banon" }'
This command issues a PUT request to index a document for the Twitter user kimchy.
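Reading that document back is just as simple. Here’s a quick sketch in Python (the series’ language) using only the standard library; the curl equivalent is a plain GET to the same URL. It assumes elasticsearch is reachable on localhost:9200, which we set up in the next section:

# Fetch the document we just PUT -- a rough sketch, not part of the series' code.
# Assumes elasticsearch is already running on localhost:9200 (installed below).
import urllib2

response = urllib2.urlopen('http://localhost:9200/twitter/user/kimchy')
print response.read()  # JSON envelope with our document under "_source"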
Elasticsearch runs on port 9200 by default and can begin accepting and saving data right away. There’s no need to build a schema first. (But we will.) Let’s get started.
Installing and Testing elasticsearch
As of this writing, the latest elasticsearch release is 0.19.10. You can verify that you’ve got the latest version at the elasticsearch download page.
wget https://github.com/downloads/elasticsearch/elasticsearch/elasticsearch-0.19.10.tar.gz
tar zxvf elasticsearch-0.19.10.tar.gz
cd elasticsearch-0.19.10/bin
./elasticsearch
With that, we should have elasticsearch up and running on our machine. Note that when you run elasticsearch, nothing appears at the command line by default; the process daemonizes and goes straight into the background. You can verify it’s working with curl:
$ curl http://localhost:9200
{
  "ok" : true,
  "status" : 200,
  "name" : "Power, Jack",
  "version" : {
    "number" : "0.19.10",
    "snapshot_build" : false
  },
  "tagline" : "You Know, for Search"
}
Nice! One of the easiest installs I’ve ever done. Way to go elasticsearch team!
Finding A Library, Poking Around in iPython
It’s always a good idea to see what your library options are. When I first built this integration I found pyes, a very well written library, but the code to use it seemed a bit ugly for my taste.
Luckily, after a bit more searching, I found elasticutils, which is, in my opinion, a much cleaner interface to the very simple elasticsearch server. It always pays to take a few minutes to read a library's introduction and example code before deciding on it. Elasticutils actually uses pyes under the covers.
I find it helps to have an example program open while interactively coding in iPython.
So let’s install those dependencies, and fire up an iPython session:
$ pip install elasticutils
$ ipython
Python 2.7.3 (default, Apr 20 2012, 22:39:59)
Type "copyright", "credits" or "license" for more information.

IPython 0.13 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from elasticutils import get_es, S  # get_es gets a connection to elasticsearch, S creates a searcher

In [2]: es = get_es(hosts='localhost:9200', default_indexes=['testindex'])  # Default connection, random index name

In [3]: mapping = {'companies': {'properties': {'company_name': {'type': 'string'}, 'active': {'type': 'string'}, 'inactive': {'type': 'string'}}}}  # Build a doctype named 'companies' with the properties we defined in our data

In [4]: es.create_index('testindex', settings={'mappings': mapping})  # Create an index with the mapping
Out[4]: {u'acknowledged': True, u'ok': True}

In [5]: from blah import getData  # Import our YAML loader

In [6]: a = getData()  # Call it and assign it to a

In [7]: a  # Verify a has the right value
Out[7]:
{'homedepot': {'active': [{'25% percent off': 'ZYZZ',
                           'Buy One Get One': 'REDDIT'}],
               'company_name': 'homedepot',
               'inactive': [{'0% off': 'DIVIDEBYZERO',
                             'Buy None Get A Ton': 'FREEONETONTRUCK'}]},
 'lowes': {'active': [{'1 Free Like': 'INSTAGRAM',
                       '50% off fun': 'YCOMBINATOR'}],
           'company_name': 'lowes',
           'inactive': [{'100 Free Likes': 'GTL'}]}}

In [8]: for company in a:  # Go through each company and index!
   ...:     es.index(a[company], 'testindex', 'companies')
   ...:

In [9]: es.refresh('testindex')  # Commit to the index
Out[9]: {u'_shards': {u'failed': 0, u'successful': 5, u'total': 10}, u'ok': True}

In [10]: basic_s = S().indexes('testindex').doctypes('companies').values_dict()  # Create our searcher on the index and doctype

In [11]: basic_s.filter(company_name='lowes')  # Try a filter, and... success!
Out[11]: [{u'active': [{u'1 Free Like': u'INSTAGRAM', u'50% off fun': u'YCOMBINATOR'}],
           u'inactive': [{u'100 Free Likes': u'GTL'}],
           u'company_name': u'lowes'}]
Perfect! Now we’ve seen how to add and query our data. We can copy the majority of this code into a new file, and build everything out with a few minor changes.
From iPython to Production
Let’s put everything we just learned into a new file that we can import from webapp.py. We’ll make it so we can open an iPython session on this file to build and load new data, and we’ll also pull the getData() function out of webapp.py. Create a new file named schema.py:
import glob

import yaml
from elasticutils import get_es, S

mapping = {'companies': {'properties': {'company_name': {'type': 'string'},
                                        'active': {'type': 'string'},
                                        'inactive': {'type': 'string'}}}}

es = get_es(hosts='localhost:9200', default_indexes=['dealsindex'])


def getData():
    data = {}
    a = glob.iglob("data/*.yaml")  # Loads all the yaml files in the data directory
    for file in a:
        b = open(file)
        c = yaml.load(b)
        data.update({c['company_name']: c})  # Use the company_name as the key for dictionary lookups
        b.close()
    return data


def create_and_insert():
    es.delete_index_if_exists('dealsindex')
    es.create_index('dealsindex', settings={'mappings': mapping})
    companies = getData()
    for company in companies:
        es.index(companies[company], 'dealsindex', 'companies')
    es.refresh('dealsindex')


def get_company(companyname):
    basic_s = S().indexes('dealsindex').doctypes('companies').values_dict()  # Build the searcher
    return basic_s.filter(company_name=companyname)  # Filter by company name
Now we can load up another iPython session, import schema, and verify that everything works.
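Roughly, that session looks like the following. (A sketch only: it assumes elasticsearch is still running and that the data/ YAML files from Part One are present, so your output may differ.)

In [1]: import schema

In [2]: schema.create_and_insert()  # Drop, recreate, and repopulate the dealsindex

In [3]: schema.get_company('lowes')  # The lowes document should come back
Out[3]: [{u'active': [{u'1 Free Like': u'INSTAGRAM', u'50% off fun': u'YCOMBINATOR'}],
          u'inactive': [{u'100 Free Likes': u'GTL'}],
          u'company_name': u'lowes'}]

In [4]: schema.get_company('doesnotexist')  # Unknown companies come back empty
Out[4]: []

With that verified, it’s time to hook our new engine into our webapp.py file.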
Incorporating ElasticSearch in our Webapp
If you run an iPython shell and try searching for a non-existent company, you’ll see that elasticsearch returns an empty list. We can use this to verify that companies actually exist. Our final code, with the new schema.py backend, looks like the following:
# Run with:
# $ gunicorn -k egg:gunicorn#tornado webapp:app

import yaml

import schema
from tornado.web import Application, RequestHandler, HTTPError


def getAPIDescription():
    a = open("APIDescription.yaml")
    return yaml.load(a)

apiDescription = getAPIDescription()
allowableOptions = apiDescription['merchantapi']['options']


class MainHandler(RequestHandler):
    def get(self):
        self.write("Hello, world")


class APIHandler(RequestHandler):
    def get(self):
        self.write(apiDescription)


class DealsHandler(RequestHandler):
    def get_key_or_error(self, arguments, key):
        # Make sure the query argument exists and holds an allowed value
        if (key in arguments.keys()) and (arguments[key][0] in allowableOptions):
            return arguments[key][0]
        raise HTTPError(400)

    def get(self, merchant_name):
        status = self.get_key_or_error(self.request.arguments, 'status')
        merchant = schema.get_company(merchant_name)
        if merchant:  # An unknown company comes back as an empty result
            self.write(unicode(merchant[0][status]))
        else:
            raise HTTPError(404)

    def post(self, merchant_name):
        raise HTTPError(403)

    def delete(self, merchant_name):
        raise HTTPError(403)

    def put(self, merchant_name):
        raise HTTPError(403)


app = Application([
    (r"/", MainHandler),
    (r"/v1/", APIHandler),
    (r"/v1/(.*)/deals", DealsHandler)
])
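Before we lean on the test suite, you can also poke at the running app by hand. This is a hypothetical check, assuming gunicorn is serving on its default port 8000 and that 'active' is one of the allowable options defined in your APIDescription.yaml:

# A hypothetical smoke test against the running app. Adjust the host and port
# to match your gunicorn settings (127.0.0.1:8000 is gunicorn's default bind).
import urllib2

url = 'http://localhost:8000/v1/lowes/deals?status=active'
print urllib2.urlopen(url).read()  # The active deals we indexed for lowes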
Verify Changes with nosetests, and commit
Save this file, then verify that our API still works by running nosetests. If everything passes, add schema.py (along with the updated webapp.py) to git, and commit!
$ nosetests
$ git add schema.py webapp.py
$ git commit
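If you’re curious what nosetests is actually exercising here, a minimal test along these lines gets picked up automatically. This is a hypothetical sketch using Tornado’s AsyncHTTPTestCase rather than the exact tests from Part One, and it assumes elasticsearch is running, the dealsindex has been populated with schema.create_and_insert(), and 'active' is an allowed status option:

# tests.py -- a hypothetical example of the kind of test nose collects.
# Assumes elasticsearch is up and schema.create_and_insert() has been run.
from tornado.testing import AsyncHTTPTestCase

import webapp


class DealsHandlerTest(AsyncHTTPTestCase):
    def get_app(self):
        return webapp.app

    def test_known_merchant_returns_active_deals(self):
        response = self.fetch('/v1/lowes/deals?status=active')
        self.assertEqual(response.code, 200)

    def test_unknown_merchant_returns_404(self):
        response = self.fetch('/v1/doesnotexist/deals?status=active')
        self.assertEqual(response.code, 404)

    def test_missing_status_returns_400(self):
        response = self.fetch('/v1/lowes/deals')
        self.assertEqual(response.code, 400)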
Alright! We’ve built a complete (basic) API from scratch, incorporating an elasticsearch backend! That’s quite the accomplishment! Next up, we’ll write the scripts to deploy our API to the cloud, using Chef and Amazon EC2. Again, if you want to skip ahead, code is available at github, and so are the Chef scripts.
If you need to go back, or didn’t understand something, let me know below, or read the previous post.
Part One – Building the API skeleton with YAML
Part Two – From YAML to ElasticSearch
Part Three – Writing Chef Scripts for Amazon Deployment
Part Four – Writing Fabric Scripts for Code Deployment