
A Crash Course in MongoDB

PyCon US 2013

Hi. I'm Andy Dirnberger

github.com/dirn
@dirnonline
Engineering @ CBS Local
dirn@dirnonline.com

So what is MongoDB?

http://mongodb.org

MongoDB is...

Document-oriented
JSON-like (BSON)
Dynamic schema*
Scalable
Open Source (GNU AGPL v3.0)**

*not the same thing as schemaless
**drivers use the Apache license

MongoDB can be used for...

Metrics Logging* Messaging Queues Blog Content Management Anything you want
*Capped collections behave as fixed-size FIFO queues
*TTL collections have a special index that will automatically remove old data
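
As a rough analogy for the capped-collection footnote, Python's `collections.deque` with `maxlen` also discards its oldest entry once it is full. This is only a sketch of the FIFO behaviour, written for Python 3: a real capped collection is bounded by its size in bytes, and the log lines below are made up for illustration.

```python
from collections import deque

# A deque with maxlen drops its oldest item when a new one arrives,
# which mirrors how a capped collection overwrites its oldest documents.
recent_logs = deque(maxlen=3)

# Hypothetical log entries, appended in arrival order.
for line in ["boot", "login", "query", "logout"]:
    recent_logs.append(line)

print(list(recent_logs))  # ['login', 'query', 'logout']
```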

To run MongoDB...

Download it:
$ http://mongodb.org/downloads

or install it:
$ sudo apt-get install mongodb
$ brew install mongodb

Run it:
$ mongod
$ mongod --dbpath /var/lib/mongodb/
$ mongod --fork

http://docs.mongodb.org/manual/tutorial/manage-mongodb-processes/

Using MongoDB with Python

PyMongo

https://github.com/mongodb/mongo-python-driver

The driver...

Install it:
$ pip install pymongo

Packages:
pymongo bson gridfs

http://api.mongodb.org/python/current/

BSON supports...

int float basestring list dict datetime.datetime

http://bsonspec.org/

Object IDs are made of...

50d4dce70ea5fae6fb84e44b

4-byte timestamp (50d4dce7)
3-byte machine identifier (0ea5fa)
2-byte process ID (e6fb)
3-byte counter (84e44b)
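
The layout above can be checked by hand with the standard library alone. The sketch below, written for Python 3, splits the example ID into the four fields listed on the slide; `split_object_id` is a hypothetical helper, not part of PyMongo, and it assumes the 2013-era layout shown here (later versions of the ObjectId specification replaced the machine and process fields with a random value).

```python
from datetime import datetime, timezone

def split_object_id(hex_id):
    """Split a 24-character ObjectId hex string into the slide's four fields."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    # The first four bytes are a big-endian count of seconds since the epoch.
    seconds = int.from_bytes(raw[0:4], "big")
    return {
        "timestamp": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "machine": raw[4:7].hex(),
        "pid": int.from_bytes(raw[7:9], "big"),
        "counter": int.from_bytes(raw[9:12], "big"),
    }

parts = split_object_id("50d4dce70ea5fae6fb84e44b")
print(parts["timestamp"].isoformat())
print(parts["machine"], parts["pid"], parts["counter"])
```

In real code the `bson` package already exposes the timestamp as `ObjectId.generation_time`, so a helper like this is only useful for understanding the format.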

Connect with MongoClient

>>> from pymongo import MongoClient
>>>
>>> MongoClient(host='localhost', port=27017)
MongoClient('localhost', 27017)
>>>
>>> MongoClient(host='mongodb://localhost:27017')
MongoClient('localhost', 27017)
>>>
>>> MongoClient('mongodb://localhost:27017').pycon
Database(MongoClient('localhost', 27017), u'pycon')

Querying

Documents can be retrieved with...

>>> coll = db.talks
>>> coll.find_one({
...     'name': 'A Crash Course in MongoDB'})
{
    u'track': 2,
    u'_id': ObjectId('5145e5380ea5fa321fa97064'),
    u'speaker': u'Andy Dirnberger',
    u'name': u'A Crash Course in MongoDB',
    u'language': u'python',
    u'time': datetime.datetime(2013, 3, 17, 14, 30)
}

Documents can be retrieved with...

>>> from datetime import datetime
>>> coll.find({
...     'track': 2,
...     'time': {'$gte': datetime(2013, 3, 17),
...              '$lt': datetime(2013, 3, 18)}},
...     {'name': 1})
<pymongo.cursor.Cursor object at 0x10da4ed90>

http://docs.mongodb.org/manual/reference/operators/#query-selectors

What's in the cursor?

>>> for doc in cursor:
...     print doc
...
{u'_id': ObjectId('5145e4f00ea5fa321fa97062'), u'name': u'Elasticsearch (Part 2)'}
{u'_id': ObjectId('5145e5200ea5fa321fa97063'), u'name': u'Going beyond the Django ORM'}
{u'_id': ObjectId('5145e5380ea5fa321fa97064'), u'name': u'A Crash Course in MongoDB'}

http://api.mongodb.org/python/current/api/pymongo/cursor.html
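
To make the query document's meaning concrete, here is a toy matcher in plain Python 3 that applies the same kind of spec to in-memory dicts. It is a sketch of the semantics only, not how MongoDB evaluates queries: it handles just equality, `$gte`, and `$lt`, and the second and third talks are hypothetical entries added for contrast.

```python
from datetime import datetime

# Comparison operators covered by this sketch; MongoDB has many more.
OPERATORS = {
    "$gte": lambda value, bound: value >= bound,
    "$lt": lambda value, bound: value < bound,
}

def matches(doc, spec):
    """Return True if doc satisfies every condition in spec."""
    for field, condition in spec.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(condition, dict):
            # Operator form, e.g. {'$gte': a, '$lt': b}: all must hold.
            if not all(OPERATORS[op](value, bound)
                       for op, bound in condition.items()):
                return False
        elif value != condition:
            # Plain value: the field must be equal to it.
            return False
    return True

talks = [
    {"name": "A Crash Course in MongoDB", "track": 2,
     "time": datetime(2013, 3, 17, 14, 30)},
    {"name": "Hypothetical Saturday talk", "track": 2,
     "time": datetime(2013, 3, 16, 11, 0)},
    {"name": "Hypothetical track 1 talk", "track": 1,
     "time": datetime(2013, 3, 17, 14, 30)},
]
spec = {"track": 2,
        "time": {"$gte": datetime(2013, 3, 17), "$lt": datetime(2013, 3, 18)}}

found = [talk["name"] for talk in talks if matches(talk, spec)]
print(found)  # ['A Crash Course in MongoDB']
```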

Updating

Documents can be removed with...

>>> coll.remove({'language': 'ruby'})
{
    u'connectionId': 8,
    u'ok': 1.0,
    u'err': None,
    u'n': 0
}

Documents can be removed with...

>>> coll.remove({
...     'language': {'$in': ['php', 'node.js']}})
{
    u'connectionId': 8,
    u'ok': 1.0,
    u'err': None,
    u'n': 0
}

Documents can be removed with...

>>> coll.remove({'language': {'$ne': 'python'}})
{
    u'connectionId': 8,
    u'ok': 1.0,
    u'err': None,
    u'n': 0
}

Documents can be inserted with...

>>> db.tracks.insert({
...     'number': 2,
...     'room': 'Grand Ballroom CD'})
ObjectId('5145eb4e0ea5fa321fa97065')

Documents can be inserted with...

>>> db.sessions.update(
...     {'track': 2},
...     {'track': 2,
...      'date': datetime(2013, 3, 17),
...      'order': 1,
...      'chair': 'Megan Speir',
...      'runner': 'Erik Bray'},
...     upsert=True)
{
    ...
    u'upserted': ObjectId('5145ecfd3f69a773554253e8'),
    u'n': 1,
    u'updatedExisting': False
}

A couple of other methods...

save()
Works like update(..., upsert=True) if _id is specified, insert() if it's not

find_and_modify()
Modifies the document in the database; returns the original by default, or the updated document with new=True

A note about update()

>>> db.sessions.update(
...     {'_id': ObjectId('5145ecfd3f69a773554253e8')},
...     {'num_talks': 3})
{...}
>>>
>>> # The document has been replaced
>>> db.sessions.find_one({
...     '_id': ObjectId('5145ecfd3f69a773554253e8')})
{
    u'_id': ObjectId('5145ecfd3f69a773554253e8'),
    u'num_talks': 3
}

Using update operators to target specific fields...

>>> db.sessions.update(
...     {'_id': ObjectId('5145ecfd3f69a773554253e8')},
...     {'$set': {'num_talks': 3}})
{
    u'updatedExisting': True,
    u'connectionId': 8,
    u'ok': 1.0,
    u'err': None,
    u'n': 1
}

http://docs.mongodb.org/manual/reference/operators/#update
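
The difference between the two update() calls can be modelled on a plain dict. This Python 3 sketch is an illustration of the behaviour shown on the slides, not the server's logic: `apply_update` is a hypothetical helper, only `$set` is modelled, and the integer `_id` stands in for a real ObjectId.

```python
def apply_update(doc, update):
    """Return the document that results from applying update to doc."""
    if any(key.startswith("$") for key in update):
        # Operator update: only '$set' is modelled in this sketch.
        result = dict(doc)
        result.update(update.get("$set", {}))
        return result
    # Plain document: every field except _id is replaced.
    replaced = {"_id": doc["_id"]}
    replaced.update(update)
    return replaced

session = {"_id": 1, "track": 2, "chair": "Megan Speir"}

print(apply_update(session, {"num_talks": 3}))
# {'_id': 1, 'num_talks': 3}
print(apply_update(session, {"$set": {"num_talks": 3}}))
# {'_id': 1, 'track': 2, 'chair': 'Megan Speir', 'num_talks': 3}
```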

Write concern...

w
The number of servers that must acknowledge the write, including the primary

wtimeout
The timeout for the write; without it, the write could block forever

http://docs.mongodb.org/manual/core/write-operations/#write-concern
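
One place these two settings can be supplied is the connection string, where the corresponding options are named `w` and `wtimeoutMS`. The Python 3 sketch below only builds such a URI; `mongo_uri` is a hypothetical helper, the values are examples, and the driver documentation for your version is the place to confirm how the options are interpreted.

```python
from urllib.parse import urlencode

def mongo_uri(host, port, **options):
    """Build a mongodb:// URI, appending any options as a query string."""
    uri = "mongodb://%s:%d/" % (host, port)
    if options:
        uri += "?" + urlencode(options)
    return uri

# Ask for acknowledgement from two servers, giving up after five seconds.
uri = mongo_uri("localhost", 27017, w=2, wtimeoutMS=5000)
print(uri)  # mongodb://localhost:27017/?w=2&wtimeoutMS=5000
```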

Write concern...

is turned on by default in MongoClient

Indexes

You can create an index with...

create_index()
Unconditionally creates an index on one or more fields

ensure_index()
Works like create_index() except the driver will remember that the index was already made

Indexes...

Are directional
>>> db.sessions.ensure_index([
...     ('date', pymongo.ASCENDING),
...     ('order', pymongo.DESCENDING)])
u'date_1_order_-1'

Can be sparse
Only documents containing all fields in the index will be included in the index
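
What "directional" means for the compound index above can be seen by sorting a few documents the same way: by date ascending, then by order descending. This is a Python 3 sketch over made-up session documents, not a model of how the index is stored.

```python
from datetime import datetime

# Hypothetical session documents.
sessions = [
    {"track": 1, "date": datetime(2013, 3, 17), "order": 1},
    {"track": 2, "date": datetime(2013, 3, 16), "order": 2},
    {"track": 3, "date": datetime(2013, 3, 17), "order": 2},
]

# Same ordering as [('date', ASCENDING), ('order', DESCENDING)]:
# negate the numeric field to reverse its direction within each date.
ordered = sorted(sessions, key=lambda s: (s["date"], -s["order"]))

print([s["track"] for s in ordered])  # [2, 3, 1]
```

A query that sorts in this order (or its exact reverse) can walk the index directly, which is why the direction of each key matters for compound indexes.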

Explain plans...

{
    'cursor': '<Cursor Type and Index>',
    'n': <num (documents matching query)>,
    'nscanned': <num (documents scanned)>,
    'scanAndOrder': <boolean>,
}

You want n and nscanned to be as close together as possible
If scanAndOrder is True, the index can't be used for sorting
http://docs.mongodb.org/manual/reference/explain/
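
The two rules of thumb above are easy to turn into a small check. This Python 3 sketch assumes the explain output has the four keys shown on the slide; `review_explain` is a hypothetical helper and both sample plans are invented for illustration.

```python
def review_explain(plan):
    """Return a list of warnings for an explain() result shaped like the slide's."""
    warnings = []
    if plan["cursor"] == "BasicCursor":
        # BasicCursor means a full collection scan.
        warnings.append("no index used")
    if plan["nscanned"] > plan["n"]:
        warnings.append("scanned %d documents to return %d"
                        % (plan["nscanned"], plan["n"]))
    if plan.get("scanAndOrder"):
        warnings.append("results sorted in memory")
    return warnings

# Invented examples: one unindexed query and one served by an index.
slow = {"cursor": "BasicCursor", "n": 3, "nscanned": 120, "scanAndOrder": True}
fast = {"cursor": "BtreeCursor date_1_order_-1", "n": 3, "nscanned": 3,
        "scanAndOrder": False}

print(review_explain(slow))
print(review_explain(fast))  # []
```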

GridFS

Storing files with GridFS...

Files are stored in chunks
4MB of RAM
Replication and Sharding

http://docs.mongodb.org/manual/applications/gridfs/
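
Storing a file in chunks amounts to slicing its bytes into numbered pieces that can be reassembled in order. The Python 3 sketch below shows only that idea: `split_into_chunks` is a hypothetical helper, and the 4-byte chunk size is chosen to keep the output readable, not because GridFS uses it.

```python
def split_into_chunks(data, chunk_size):
    """Yield (n, chunk) pairs, numbering the chunks from 0."""
    for n, start in enumerate(range(0, len(data), chunk_size)):
        yield n, data[start:start + chunk_size]

chunks = list(split_into_chunks(b"PyCon 2013", chunk_size=4))
print(chunks)  # [(0, b'PyCo'), (1, b'n 20'), (2, b'13')]

# Joining the chunks in order of n gives back the original bytes.
restored = b"".join(chunk for n, chunk in sorted(chunks))
print(restored)  # b'PyCon 2013'
```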

To use GridFS...

>>> import gridfs
>>> fs = gridfs.GridFS(db)
>>> file_id = fs.put('PyCon 2013',
...                  city='Santa Clara',
...                  state='CA')
>>> file = fs.get(file_id)
>>> file.read()
'PyCon 2013'
>>> file.upload_date
datetime.datetime(2013, 3, 17, 21, 30, 0, 0)
>>> file.city, file.state
(u'Santa Clara', u'CA')

GridFS is versioned...

get_last_version()
Gets the most recent file matching the query

get_version()
Works like get_last_version() except it can request specific versions of a file
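
Version numbers in get_version() behave much like Python list indexes over the uploads sorted by date: 0 is the oldest, -1 the most recent. The Python 3 sketch below imitates that with plain dicts; `pick_version` and the three uploads are hypothetical, and the real method raises a GridFS error rather than `IndexError` when nothing matches.

```python
from datetime import datetime

# Hypothetical uploads of the same filename.
uploads = [
    {"filename": "schedule.txt", "uploadDate": datetime(2013, 3, 16), "body": "revised"},
    {"filename": "schedule.txt", "uploadDate": datetime(2013, 3, 15), "body": "draft"},
    {"filename": "schedule.txt", "uploadDate": datetime(2013, 3, 17), "body": "final"},
]

def pick_version(files, filename, version=-1):
    """Pick one upload of filename; 0 is the oldest, -1 the newest."""
    matching = sorted(
        (f for f in files if f["filename"] == filename),
        key=lambda f: f["uploadDate"],
    )
    return matching[version]

print(pick_version(uploads, "schedule.txt")["body"])             # final
print(pick_version(uploads, "schedule.txt", version=0)["body"])  # draft
```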

Geospatial

Create an index...

>>> # $set adds loc without replacing the rest of the document
>>> db.tracks.update(
...     {'_id': ObjectId('5145eb4e0ea5fa321fa97065')},
...     {'$set': {'loc': [37.3542, 121.9542]}})
{...}
>>> db.tracks.ensure_index([
...     ('loc', pymongo.GEO2D)])
u'loc_2d'

http://docs.mongodb.org/manual/applications/geospatial-indexes/

Query, query, query...

>>> db.tracks.find({'loc': [37.3542, 121.9542]})
<pymongo.cursor.Cursor object at 0x10e14eb90>
>>> db.tracks.find({
...     'loc': {'$near': [37.3542, 121.9542]}})
<pymongo.cursor.Cursor object at 0x10e14edd0>

You can query $within shapes...

{'$center': [center, radius]}
{'$box': [[x1, y1], [x2, y2]]}
{'$polygon': [[x1, y1], [x2, y2], [x3, y3]]}
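
For the circle and box shapes, the containment test is ordinary flat geometry, which is also how a 2d index treats coordinates. The Python 3 sketch below checks the point used earlier against one shape of each kind; both functions are hypothetical helpers, and the centers, radius, and corners are example values.

```python
import math

def within_center(point, center, radius):
    """True if point lies inside or on the circle (flat geometry)."""
    return math.hypot(point[0] - center[0], point[1] - center[1]) <= radius

def within_box(point, corner_a, corner_b):
    """True if point lies inside or on the axis-aligned box."""
    (x1, y1), (x2, y2) = corner_a, corner_b
    return (min(x1, x2) <= point[0] <= max(x1, x2)
            and min(y1, y2) <= point[1] <= max(y1, y2))

loc = [37.3542, 121.9542]

print(within_center(loc, [37.0, 122.0], 0.5))         # True
print(within_box(loc, [37.0, 121.0], [38.0, 122.0]))  # True
print(within_center(loc, [0.0, 0.0], 1.0))            # False
```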

Anything else...

Aggregation Framework
Helps with simple map-reduce queries, but its results are subject to the same 16MB size limit as documents

Libraries
http://api.mongodb.org/python/current/tools.html

Thank you!
dirn.it/PyCon2013

Questions?