

A Crash Course in MongoDB


PyCon US 2013

Hi. I'm Andy Dirnberger (@dirnonline), Engineering @ CBS Local

So what is MongoDB?

MongoDB is...

Document-oriented
JSON-like (BSON)
Dynamic schema*
Scalable
Open Source (GNU AGPL v3.0)**

*not the same thing as schemaless
**drivers use the Apache license

MongoDB can be used for...

Metrics Logging* Messaging Queues Blog Content Management Anything you want
*Capped collections behave as fixed-size FIFO queues
*TTL collections have a special index that will automatically remove old data
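
As a rough analogy only (plain Python, not MongoDB itself), the fixed-size FIFO behaviour of a capped collection resembles a bounded deque: once it is full, each new entry pushes out the oldest one. The names `log` and `max_entries` below are made up for illustration.

```python
from collections import deque

# Hypothetical stand-in for a capped collection that holds at most 3 documents.
max_entries = 3
log = deque(maxlen=max_entries)

for i in range(5):
    # Appending to a full deque silently discards the oldest entry.
    log.append({'seq': i})

# Only the most recent entries are left, oldest first.
remaining = [doc['seq'] for doc in log]
print(remaining)
```

A real capped collection is sized in bytes (optionally with a document limit) rather than by a simple entry count, so this only illustrates the eviction order.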

To run MongoDB...

Download it, or install it:

$ sudo apt-get install mongodb
$ brew install mongodb

Run it:

$ mongod
$ mongod --dbpath /var/lib/mongodb/
$ mongod --fork    # --fork also needs --logpath or --syslog

Using MongoDB with PyMongo


The driver...

Install it:
$ pip install pymongo

It provides three packages: pymongo, bson, gridfs

BSON supports...

int, float, basestring, list, dict, datetime.datetime

Object IDs are made of...


4-byte timestamp (50d4dce7)
3-byte machine identifier (0ea5fa)
2-byte process ID (e6fb)
3-byte counter (84e44b)
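
To make the layout above concrete, here is a minimal sketch that splits a 24-character hex string into those four parts using only the standard library. `split_object_id` and `ObjectIdParts` are hypothetical names for this example, not part of PyMongo or bson, and the sketch assumes the 4/3/2/3-byte layout listed on this slide.

```python
from collections import namedtuple

# Hypothetical container for the four parts listed above.
ObjectIdParts = namedtuple('ObjectIdParts', 'timestamp machine pid counter')


def split_object_id(hex_id):
    """Split a 24-character ObjectId hex string using the 4/3/2/3-byte layout."""
    if len(hex_id) != 24:
        raise ValueError('expected 24 hex characters')
    return ObjectIdParts(
        timestamp=int(hex_id[0:8], 16),   # 4 bytes: seconds since the Unix epoch
        machine=hex_id[8:14],             # 3 bytes: machine identifier, kept as hex
        pid=int(hex_id[14:18], 16),       # 2 bytes: process ID
        counter=int(hex_id[18:24], 16),   # 3 bytes: counter
    )


# The example ID is assembled from the four hex fragments shown above.
parts = split_object_id('50d4dce70ea5fae6fb84e44b')
print(parts)
```

Because the timestamp comes first, ObjectIds generated later sort after earlier ones, at least to one-second resolution.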

Connect with MongoClient

>>> from pymongo import MongoClient
>>>
>>> MongoClient(host='localhost', port=27017)
MongoClient('localhost', 27017)
>>>
>>> MongoClient(host='mongodb://localhost:27017')
MongoClient('localhost', 27017)
>>>
>>> MongoClient('mongodb://localhost:27017').pycon
Database(MongoClient('localhost', 27017), u'pycon')


Documents can be retrieved with find_one()...

>>> db = MongoClient('mongodb://localhost:27017').pycon
>>> coll = db.talks
>>> coll.find_one({'name': 'A Crash Course in MongoDB'})
{
    u'track': 2,
    u'_id': ObjectId('5145e5380ea5fa321fa97064'),
    u'speaker': u'Andy Dirnberger',
    u'name': u'A Crash Course in MongoDB',
    u'language': u'python',
    u'time': datetime.datetime(2013, 3, 17, 14, 30)
}

Documents can be retrieved with find()...

>>> from datetime import datetime
>>> cursor = coll.find(
...     {'track': 2,
...      'time': {'$gte': datetime(2013, 3, 17),
...               '$lt': datetime(2013, 3, 18)}},
...     {'name': 1})
>>> cursor
<pymongo.cursor.Cursor object at 0x10da4ed90>

What's in the cursor?

>>> for doc in cursor:
...     print doc
...
{u'_id': ObjectId('5145e4f00ea5fa321fa97062'), u'name': u'Elasticsearch (Part 2)'}
{u'_id': ObjectId('5145e5200ea5fa321fa97063'), u'name': u'Going beyond the Django ORM'}
{u'_id': ObjectId('5145e5380ea5fa321fa97064'), u'name': u'A Crash Course in MongoDB'}


Documents can be removed with remove()...

>>> coll.remove({'language': 'ruby'})
{u'connectionId': 8, u'ok': 1.0, u'err': None, u'n': 0}

>>> coll.remove({'language': {'$in': ['php', 'node.js']}})
{u'connectionId': 8, u'ok': 1.0, u'err': None, u'n': 0}

>>> coll.remove({'language': {'$ne': 'python'}})
{u'connectionId': 8, u'ok': 1.0, u'err': None, u'n': 0}

Documents can be inserted with insert()...

>>> db.tracks.insert({'number': 2, 'room': 'Grand Ballroom CD'})
ObjectId('5145eb4e0ea5fa321fa97065')

Documents can be inserted with an upsert...

>>> db.sessions.update(
...     {'track': 2},
...     {'track': 2,
...      'date': datetime(2013, 3, 17),
...      'order': 1,
...      'chair': 'Megan Speir',
...      'runner': 'Erik Bray'},
...     upsert=True)
{
    ...
    u'upserted': ObjectId('5145ecfd3f69a773554253e8'),
    u'n': 1,
    u'updatedExisting': False
}

A couple of other methods...

save()
Works like update(..., upsert=True) if _id is specified, insert() if it's not

find_and_modify()
Modifies the document in the database; returns the original by default, the updated document with new=True

A note about update()

>>> db.sessions.update(
...     {'_id': ObjectId('5145ecfd3f69a773554253e8')},
...     {'num_talks': 3})
{...}
>>>
>>> # The document has been replaced
>>> db.sessions.find_one({'_id': ObjectId('5145ecfd3f69a773554253e8')})
{
    u'_id': ObjectId('5145ecfd3f69a773554253e8'),
    u'num_talks': 3
}

Using update operators to target specific fields...

>>> db.sessions.update(
...     {'_id': ObjectId('5145ecfd3f69a773554253e8')},
...     {'$set': {'num_talks': 3}})
{
    u'updatedExisting': True,
    u'connectionId': 8,
    u'ok': 1.0,
    u'err': None,
    u'n': 1
}
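
The difference between a plain update document and one that uses `$set` can be modelled with ordinary dicts. This is a toy sketch of the semantics only, not how PyMongo or the server implements it; `apply_update` and the sample `session` document are invented for the example.

```python
def apply_update(document, update):
    """Toy model of update(): replace the whole document, or apply $set to chosen fields."""
    if '$set' in update:
        # Operator form: copy the document and overwrite only the named fields.
        changed = dict(document)
        changed.update(update['$set'])
        return changed
    # Plain form: everything except _id is replaced by the update document.
    replaced = dict(update)
    replaced['_id'] = document['_id']
    return replaced


# Hypothetical document, loosely based on the sessions example.
session = {'_id': 1, 'track': 2, 'chair': 'Megan Speir'}

replaced = apply_update(session, {'num_talks': 3})
patched = apply_update(session, {'$set': {'num_talks': 3}})
print(replaced)
print(patched)
```

In this model the plain form loses `track` and `chair`, while the `$set` form keeps them and adds `num_talks`.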

Write concern...

w
The number of servers that must acknowledge the write, including the primary

wtimeout
The timeout for the write; without it the write could block forever

Write concern is turned on by default in MongoClient


You can create an index with...

create_index()
Unconditionally creates an index on one or more fields

ensure_index()
Works like create_index() except the driver will remember that the index was already made


Indexes are directional

>>> db.sessions.ensure_index([
...     ('date', pymongo.ASCENDING),
...     ('order', pymongo.DESCENDING)])
u'date_1_order_-1'

Indexes can be sparse
Only documents containing the indexed field(s) will be included in the index

Explain plans...

{
    'cursor': '<Cursor Type and Index>',
    'n': <num (documents matching query)>,
    'nscanned': <num (documents scanned)>,
    'scanAndOrder': <boolean>,
}

You want n and nscanned to be as close together as possible
If scanAndOrder is True, the index can't be used for sorting
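
One way to read the "n and nscanned as close together as possible" advice is as a ratio. The helper below is a hypothetical convenience, not a PyMongo function, and both sample plans are made-up numbers shaped like the explain output above.

```python
def scan_ratio(plan):
    """Return n / nscanned for an explain-style dict; 1.0 means no wasted scanning."""
    if plan['nscanned'] == 0:
        # Nothing was scanned, so nothing was wasted.
        return 1.0
    return float(plan['n']) / plan['nscanned']


# Illustrative plans only; real values come from cursor.explain().
indexed = {'cursor': 'BtreeCursor date_1_order_-1',
           'n': 3, 'nscanned': 3, 'scanAndOrder': False}
unindexed = {'cursor': 'BasicCursor',
             'n': 3, 'nscanned': 300, 'scanAndOrder': True}

print(scan_ratio(indexed))
print(scan_ratio(unindexed))
```

A ratio well below 1.0 suggests the query scans many documents it does not return, which is usually a hint that an index is missing or not being used.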


Storing files with GridFS...

Files are stored in chunks
4MB of RAM
Replication and Sharding

To use GridFS...

>>> import gridfs
>>> fs = gridfs.GridFS(db)
>>> file_id = fs.put('PyCon 2013', city='Santa Clara', state='CA')
>>> file = fs.get(file_id)
>>> file.read()
'PyCon 2013'
>>> file.upload_date
datetime.datetime(2013, 3, 17, 21, 30, 0, 0)
>>> file.city, file.state
(u'Santa Clara', u'CA')

GridFS is versioned...

get_last_version()
Gets the most recent file matching the query

get_version()
Works like get_last_version() except it can request specific versions of a file


Create a geospatial index...

>>> # Use $set so the rest of the track document is kept
>>> db.tracks.update(
...     {'_id': ObjectId('5145eb4e0ea5fa321fa97065')},
...     {'$set': {'loc': [37.3542, 121.9542]}})
{...}
>>> db.tracks.ensure_index([('loc', pymongo.GEO2D)])
u'loc_2d'

Query, query, query...

>>> db.tracks.find({'loc': [37.3542, 121.9542]})
<pymongo.cursor.Cursor object at 0x10e14eb90>
>>> db.tracks.find({'loc': {'$near': [37.3542, 121.9542]}})
<pymongo.cursor.Cursor object at 0x10e14edd0>

You can query $within shapes...

{'$center': [center, radius]}
{'$box': [[x1, y1], [x2, y2]]}
{'$polygon': [[x1, y1], [x2, y2], [x3, y3]]}
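
To show what the `$center` and `$box` shapes mean, here is a flat-plane sketch in plain Python. It models the geometry only and says nothing about how the server evaluates these queries; the function names are invented for the example.

```python
import math


def within_center(point, center, radius):
    """True if point lies inside or on the circle described by [center, radius]."""
    return math.hypot(point[0] - center[0], point[1] - center[1]) <= radius


def within_box(point, corner_a, corner_b):
    """True if point lies inside or on the rectangle spanned by two opposite corners."""
    (x1, y1), (x2, y2) = corner_a, corner_b
    return (min(x1, x2) <= point[0] <= max(x1, x2) and
            min(y1, y2) <= point[1] <= max(y1, y2))


print(within_center([1, 1], [0, 0], 2))      # distance is about 1.41, inside
print(within_center([3, 0], [0, 0], 2))      # distance is 3, outside
print(within_box([1, 1], [0, 0], [2, 2]))    # inside the box
print(within_box([3, 1], [0, 0], [2, 2]))    # x is beyond the box
```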

Anything else...

Aggregation Framework
Helps with simple map-reduce queries, but results are subject to the same 16MB size limit as documents


Thank you!