alvin@10gen.com
Topics
Overview
Document Design
Modeling the “real world”
Replication & Sharding
Developing with MongoDB
Deployment
Drinking from the fire hose
Part One
MongoDB Overview
MongoDB is the leading database
for cloud deployment
3 Reasons
- Performance
- Large number of readers / writers
- Large data volume
- Agility (ease of development)
NoSQL Really Means:
non-relational, next-generation operational datastores and databases
past : one-size-fits-all
RDBMS (Oracle, MySQL)
future
RDBMS (Oracle, MySQL)
New Gen. OLAP (Vertica, Aster, Greenplum)
we claim nosql segment will be:
* large
* not fragmented
* ‘platformitize-able’
Philosophy: maximize features - up to the “knee” in the curve, then stop
• memcached
• RDBMS
Horizontally Scalable Architectures
no joins + no complex transactions
http://www.flickr.com/photos/42304632@N00/493639870/
A brief history of normalization
• 1970 E. F. Codd introduces 1st Normal Form (1NF)
• 1971 E. F. Codd introduces 2nd and 3rd Normal Form (2NF, 3NF)
• 1974 Codd & Boyce define Boyce/Codd Normal Form (BCNF)
• 2002 Date, Darwen, Lorentzos define 6th Normal Form (6NF)
Goals:
• Avoid anomalies when inserting, updating or deleting
• Minimize redesign when extending the schema
• Make the model informative to users
• Avoid bias towards a particular style of query
* source : wikipedia
The real benefit of relational
• Before relational
• Data and Logic combined
• After relational
• Separation of concerns
• Data modeled independent of logic
• Logic freed from concerns of data design
Terminology
RDBMS           MongoDB
Table           Collection
Row(s)          JSON Document
Index           Index
Join            Embedding & Linking
Partition       Shard
Partition Key   Shard Key
Create a document
Design documents that simply map to
your application
post = {author: "Hergé",
        date: new Date(),
        text: "Destination Moon",
        tags: ["comic", "adventure"]}
>db.posts.save(post)
Add an index, find via index
Secondary index for “author”
>db.posts.ensureIndex({author: 1})
>db.posts.find({author: 'Hergé'})
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author : "Hergé",
... }
Explain a query plan
>db.posts.find({author: 'Hergé'}).explain()
{
  "cursor" : "BtreeCursor author_1",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 5,
  "indexBounds" : {
    "author" : [
      [
        "Hergé",
        "Hergé"
      ]
    ]
  }
}
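The "nscanned" value in the explain output is easier to read with a comparison in mind. The following is a minimal in-memory sketch in plain JavaScript, not MongoDB's implementation: a `Map` stands in for the B-tree index, and `findByScan`, `findByIndex` and the sample `posts` array are hypothetical names introduced only for illustration.

```javascript
// Sample documents standing in for the posts collection (illustrative data).
const posts = [
  { author: "Hergé", text: "Destination Moon" },
  { author: "Hugo", text: "Les Misérables" },
  { author: "Tolkien", text: "The Hobbit" },
];

// Without an index: every document is examined.
function findByScan(author) {
  let nscanned = 0;
  const results = posts.filter((p) => {
    nscanned += 1;
    return p.author === author;
  });
  return { results, nscanned };
}

// Stand-in for a secondary index on "author": value -> positions of documents.
// (Assumption: a Map replaces the B-tree that the real index uses.)
const authorIndex = new Map();
posts.forEach((p, i) => {
  if (!authorIndex.has(p.author)) authorIndex.set(p.author, []);
  authorIndex.get(p.author).push(i);
});

// With the index: only the matching entries are touched.
function findByIndex(author) {
  const positions = authorIndex.get(author) || [];
  return { results: positions.map((i) => posts[i]), nscanned: positions.length };
}

const scan = findByScan("Hergé");
const indexed = findByIndex("Hergé");
```

Both calls return the same document; the scan examines all three documents while the indexed lookup examines one, which is the situation the explain output describes.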
Query operators
Conditional operators:
$ne, $in, $nin, $mod, $all, $size, $exists, $type, ...
$lt, $lte, $gt, $gte
Regular expressions:
// posts where author starts with h
>db.posts.find({author: /^h/i})
Counting:
// number of posts written by Hergé
>db.posts.find({author: "Hergé"}).count()
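To make the operator semantics concrete, here is a minimal in-memory sketch in plain JavaScript. It is not how MongoDB evaluates queries: `matches`, the `operators` table and the sample `posts` array are hypothetical, only scalar fields are handled, and only a handful of the operators listed above are covered.

```javascript
// Hypothetical operator table; illustrates semantics only, scalar fields only.
const operators = {
  $gt: (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $lt: (value, arg) => value < arg,
  $lte: (value, arg) => value <= arg,
  $ne: (value, arg) => value !== arg,
  $in: (value, arg) => arg.includes(value),
  $nin: (value, arg) => !arg.includes(value),
  $exists: (value, arg) => (value !== undefined) === arg,
};

function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    if (condition instanceof RegExp) {
      return typeof value === "string" && condition.test(value);
    }
    if (condition !== null && typeof condition === "object") {
      // Operator document such as {$gt: 4, $lte: 7}: every operator must hold.
      return Object.entries(condition).every(([op, arg]) => {
        if (!(op in operators)) throw new Error(`unsupported operator: ${op}`);
        return operators[op](value, arg);
      });
    }
    return value === condition; // plain equality
  });
}

const posts = [
  { author: "Hergé", text: "Destination Moon", votes: 3 },
  { author: "Hugo", text: "Les Misérables", votes: 7 },
  { author: "Tolkien", text: "The Hobbit", votes: 5 },
];

const startsWithH = posts.filter((p) => matches(p, { author: /^h/i }));
const popular = posts.filter((p) => matches(p, { votes: { $gt: 4, $lte: 7 } }));
const hergeCount = posts.filter((p) => matches(p, { author: "Hergé" })).length;
```

The `votes` field is an assumption added so the range operators have something to compare.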
Part Three
Modeling the “real world”
Inheritance
Single Table Inheritance - RDBMS
shapes table
id  type    area  radius  d  length  width
1   circle  3.14  1
2   square  4             2
3   rect    10               5       2
Single Table Inheritance
>db.shapes.find()
{ _id: ObjectId("..."), type: "circle", area: 3.14, radius: 1}
{ _id: ObjectId("..."), type: "square", area: 4, d: 2}
{ _id: ObjectId("..."), type: "rect", area: 10, length: 5, width: 2}
// create index
>db.shapes.ensureIndex({radius: 1})
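The point of the shapes example is that each document carries only the fields its type needs, so no columns sit empty. A small plain-JavaScript sketch of working with such mixed documents follows; the array stands in for the `shapes` collection, and `areaOf` is a hypothetical helper, not part of any MongoDB API.

```javascript
// Stand-in for the documents returned by db.shapes.find() (without _id).
const shapes = [
  { type: "circle", area: 3.14, radius: 1 },
  { type: "square", area: 4, d: 2 },
  { type: "rect", area: 10, length: 5, width: 2 },
];

// Hypothetical helper: recompute the area from the type-specific fields.
function areaOf(shape) {
  switch (shape.type) {
    case "circle":
      return Math.PI * shape.radius ** 2;
    case "square":
      return shape.d ** 2;
    case "rect":
      return shape.length * shape.width;
    default:
      throw new Error(`unknown shape type: ${shape.type}`);
  }
}

// Only circles carry a radius, so a filter on that field skips the other types.
const withRadius = shapes.filter((s) => "radius" in s);
```

Application code dispatches on `type`, which plays the role of the discriminator column in the relational table.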
One to Many
One to Many relationships can specify
• degree of association between objects
• containment
• life-cycle
One to Many
- Embedded Array / Array Keys
  - $slice operator to return subset of array
  - some queries hard,
    e.g. find latest comments across all documents
- Embedded tree
  - Single document
  - Natural
  - Hard to query
- Normalized (2 collections)
  - most flexible
  - more queries
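The trade-off between the embedded and the normalized pattern can be sketched with plain JavaScript arrays standing in for collections. The post and comment data, the field names (`comments`, `post_id`, `by`, `at`) and the helper functions are all hypothetical, chosen only to illustrate the "latest comments across all documents" case.

```javascript
// Pattern 1: embedded array. One read returns the post with its comments.
const embeddedPosts = [
  { _id: 1, title: "Destination Moon",
    comments: [{ by: "ann", at: 10 }, { by: "bob", at: 30 }] },
  { _id: 2, title: "Explorers on the Moon",
    comments: [{ by: "cy", at: 20 }] },
];

// Pattern 2: normalized into two collections linked by post_id.
const posts = [
  { _id: 1, title: "Destination Moon" },
  { _id: 2, title: "Explorers on the Moon" },
];
const comments = [
  { post_id: 1, by: "ann", at: 10 },
  { post_id: 1, by: "bob", at: 30 },
  { post_id: 2, by: "cy", at: 20 },
];

// Latest comments across all documents, embedded version:
// every post has to be unpacked before sorting.
function latestEmbedded(allPosts, n) {
  return allPosts
    .flatMap((p) => p.comments.map((c) => ({ post_id: p._id, ...c })))
    .sort((a, b) => b.at - a.at)
    .slice(0, n);
}

// The same question against the normalized comments: a single sort.
function latestNormalized(allComments, n) {
  return [...allComments].sort((a, b) => b.at - a.at).slice(0, n);
}

// The price of normalizing: a post with its comments takes two lookups.
function postWithComments(id) {
  const post = posts.find((p) => p._id === id);
  return { ...post, comments: comments.filter((c) => c.post_id === id) };
}
```

Embedding favors reading one post at a time; normalizing favors queries that cut across posts, at the cost of the extra lookup.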
One to Many - patterns
What is scaling?
Well - hopefully for everyone here.
Read Scalability : Replication
[Figure: ReplicaSet 1 (Primary, Secondary, Secondary) with read and write paths]
Basics
• MongoDB replication is a bit like RDBMS replication
  Asynchronous master/slave at its core
• Variations:
  Master / slave
  Replica Sets
Replica Sets
• A cluster of N servers
• Any (one) node can be primary
• Consensus election of primary
• Automatic failover
• Automatic recovery
• All writes to primary
• Reads can be to primary (default) or a secondary
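Election and automatic failover can be pictured with a deliberately simplified model in plain JavaScript. The sketch assumes that a primary can only be chosen while a majority of the members is up, and it simply picks the first healthy member. The real election takes more into account than this, and `electPrimary` and the member objects are hypothetical.

```javascript
// Simplified model: a member is { name, up }. Names are illustrative.
function electPrimary(members) {
  const healthy = members.filter((m) => m.up);
  const majority = Math.floor(members.length / 2) + 1;
  if (healthy.length < majority) return null; // no majority, no primary
  return healthy[0].name; // assumption: any healthy member may win; take the first
}

const set = [
  { name: "Member 1", up: true },
  { name: "Member 2", up: true },
  { name: "Member 3", up: true },
];

const before = electPrimary(set);        // all three members up
set[0].up = false;                       // the current primary fails
const afterFailover = electPrimary(set); // two of three still form a majority
set[1].up = false;                       // a second member fails
const noMajority = electPrimary(set);    // one of three cannot elect a primary
```

With three members the set survives the loss of one node; with only one member left there is no majority, so the sketch reports no primary.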
Replica Sets – Design Concepts
[Figure: Member 1, Member 2, Member 3]
Replica Set: Electing primary
[Figure: Member 1, Member 3, Member 2 (PRIMARY)]
Replica Set: Failure of master
[Figure: Member 2 (DOWN); Member 1 and Member 3 negotiate new master; Member 3 (PRIMARY)]
Replica Set: Reconfiguring
[Figure: Member 1, Member 3 (PRIMARY), Member 2 (DOWN)]
Replica Set: Member recovers
[Figure: Member 1, Member 3 (PRIMARY), Member 2 (RECOVERING)]
Replica Set: Active
[Figure: Member 1, Member 3 (PRIMARY), Member 2]
Write Scalability: Sharding
[Figure: reads and writes routed by key range: 0..30, 31..60, 61..100]
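Routing by key range, using the ranges 0..30, 31..60 and 61..100 from this slide, can be sketched in a few lines of plain JavaScript. The shard names and the functions `shardFor` and `shardsForRange` are hypothetical; this shows the idea of range-based routing, not MongoDB's routing code.

```javascript
// Key ranges from the slide; shard names are made up for illustration.
const chunks = [
  { min: 0, max: 30, shard: "shard1" },
  { min: 31, max: 60, shard: "shard2" },
  { min: 61, max: 100, shard: "shard3" },
];

// A write for a given key goes to the one shard whose range covers it.
function shardFor(key) {
  const chunk = chunks.find((c) => key >= c.min && key <= c.max);
  if (!chunk) throw new RangeError(`no chunk covers key ${key}`);
  return chunk.shard;
}

// A range read only needs the shards whose ranges overlap the query.
function shardsForRange(lo, hi) {
  return chunks.filter((c) => c.max >= lo && c.min <= hi).map((c) => c.shard);
}
```

For example, `shardFor(45)` lands on the middle range, and `shardsForRange(25, 40)` touches only the first two ranges, so writes spread across shards while targeted reads avoid shards that cannot hold matching keys.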
Sharding
[Figure: client and two mongod processes; writes]
• Ease of development is a surprisingly big benefit: faster to code, faster to change, avoids upgrades and scheduled downtime
• More predictable performance
• Fast single-server performance -> developers spend less time manually coding around the database
• Bottom line: developers usually like it much better after trying it
MongoDB features
• Durability
• Replication
• Sharding
• Connection options
Durability
What failures do you need to recover from?
• Loss of a single database node?
• Loss of a group of nodes?
Durability - Master only
• Write acknowledged when in memory on master only
Durability - Master + Slaves
• Write acknowledged when in memory on master + slave
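The difference between the two acknowledgement levels shows up when the master is lost. Below is a minimal simulation in plain JavaScript, where each node is just an in-memory array. The names `makeCluster`, `write`, `survivesMasterLoss` and the `w` argument (the number of nodes that must hold the write before it is acknowledged) belong to this sketch, not to a driver API.

```javascript
// Each node is modeled as an in-memory array of writes.
function makeCluster(slaveCount) {
  return { master: [], slaves: Array.from({ length: slaveCount }, () => []) };
}

// Apply the write to the master, then to slaves until `w` nodes hold it.
// Returns the number of nodes holding the write when it is acknowledged.
function write(cluster, doc, w) {
  if (w < 1 || w > 1 + cluster.slaves.length) throw new RangeError("w out of range");
  cluster.master.push(doc);
  let holding = 1;
  for (const slave of cluster.slaves) {
    if (holding >= w) break; // remaining slaves would catch up asynchronously
    slave.push(doc);
    holding += 1;
  }
  return holding;
}

// Does an acknowledged write still exist somewhere if the master is lost?
function survivesMasterLoss(cluster, doc) {
  return cluster.slaves.some((s) => s.includes(doc));
}

const cluster = makeCluster(2);
const a = { n: 1 };
const b = { n: 2 };
write(cluster, a, 1); // acknowledged when in memory on master only
write(cluster, b, 2); // acknowledged when in memory on master + one slave
const masterOnlySurvives = survivesMasterLoss(cluster, a);
const masterPlusSlaveSurvives = survivesMasterLoss(cluster, b);
```

In this model the first write is lost with the master if the failure happens right after the acknowledgement, while the second is already on a slave.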
slaveOk()
- tells the driver to send read requests to Secondaries
- the driver will always send writes to the Primary
Can be set on
- DB.slaveOk()
- Collection.slaveOk()
- find(q).addOption(Bytes.QUERYOPTION_SLAVEOK);
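The routing rule behind slaveOk() can be sketched as a tiny router in plain JavaScript. This is an illustration of the rule stated above (writes to the Primary, reads to Secondaries only when allowed), not driver code. `makeRouter`, the node names and the round-robin choice among secondaries are assumptions of the sketch.

```javascript
// Hypothetical router; node names and the slaveOk flag are illustrative.
function makeRouter(primary, secondaries) {
  let next = 0;
  return {
    // Writes always go to the primary.
    routeWrite: () => primary,
    // Reads go to the primary unless slaveOk is set and a secondary exists.
    routeRead: (slaveOk = false) => {
      if (!slaveOk || secondaries.length === 0) return primary;
      const node = secondaries[next % secondaries.length]; // assumed round-robin
      next += 1;
      return node;
    },
  };
}

const router = makeRouter("primary", ["secondaryA", "secondaryB"]);
const defaultRead = router.routeRead();
const slaveReads = [router.routeRead(true), router.routeRead(true), router.routeRead(true)];
const writeTarget = router.routeWrite();
```

Reads served by a secondary may lag behind the primary, since replication is asynchronous.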
Using sharding
Before sharding
// requires: import com.mongodb.BasicDBObjectBuilder; import java.util.Date;
coll.save(
    BasicDBObjectBuilder.start("author", "Hergé")
        .append("text", "Destination Moon")
        .append("date", new Date())
        .get());
After sharding
• Performance tuning
• Sizing
• O/S Tuning / File System layout
• Backup
Backup
• Typically backups are driven from a slave
• Eliminates impact to client / application traffic to master
Slave delay
• RAM - lots of it
• Filesystem
  • EXT4 / XFS
  • Better file allocation & performance
• I/O
  • The more disks the better
  • Consider RAID10 or other RAID configs
Monitoring
Primary function:
• Measure stats over time
• Tells you what is going on with your system
• Alerts when threshold reached
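The three monitoring functions above (stats over time, a view of the system, threshold alerts) fit into a very small sketch in plain JavaScript. `makeMonitor`, its methods and the sample values are hypothetical; a real deployment would rely on a monitoring tool rather than code like this.

```javascript
// Hypothetical monitor: keeps samples over time and flags threshold breaches.
function makeMonitor(threshold) {
  const samples = [];
  return {
    // Store one measurement; returns true when an alert should be raised.
    record(timestamp, value) {
      samples.push({ timestamp, value });
      return value > threshold;
    },
    // Average of everything recorded so far (0 when nothing was recorded).
    average() {
      if (samples.length === 0) return 0;
      return samples.reduce((sum, s) => sum + s.value, 0) / samples.length;
    },
    // Copy of the recorded history, oldest first.
    history: () => samples.slice(),
  };
}

const monitor = makeMonitor(100);      // e.g. alert above 100 connections
const quiet = monitor.record(1, 50);   // below the threshold
const alert = monitor.record(2, 150);  // above the threshold
const mean = monitor.average();
```

The history is what makes trends visible over time; the boolean returned by `record` is the alert signal.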
Remember me?
Summary
We’re Hiring!
alvin@10gen.com