Scaling For Humongous Amounts of Data With MongoDB

Scaling for Humongous amounts of data with MongoDB
Alvin Richards
Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com
Getting from here to there...
http://bit.ly/OT71M4
...probably using one of these
http://bit.ly/QDUIUF
Why NoSQL at all?
Growth? Scaling? Cost? Flexibility?

Indexed Pages
30,000 25,000 20,000 15,000 10,000 5,000 0
Pages Index (Million) 1998 2000 2008 2012
http://bit.ly/VDkDN2 http://bit.ly/108jTHN http://bit.ly/Wt3fl7 http://bit.ly/Qmg8YD
Need a Database that...

Build a database for scaleout
Run on clusters of 100s of commodity machines
that enables agile development and is usable for a broad variety of applications
Is Scaleout Mission Impossible?

Partitioning of Data
Hashes (Dynamo) vs Ranges (Big Table) Physical vs Logical segments
Consistency
Eventually Multi Master updates, resolve conflicts later
Immediately
Single Master updates, always consistent
NoSQL and MongoDB
Tradeoff: Scale vs Functionality

scalability & performance
memcached key/value
RDBMS
depth of functionality
9
What MongoDB solves

Agility
Applications store complex data that is easier to
model as documents Schemaless DB enables faster development cycles
Flexibility
Relaxed transactional semantics enable easy

scale out Auto Sharding for scale down and scale up
Cost
Cost effective operationalize abundant data

(clickstreams, logs, tweets, ...)
10
How does MongoDB shape up?

11
Data Distribution across nodes Sharding

Purpose:
Aggregate system resources horizontally Scaling writes Scaling consistent reads
Goals: Data location transparent to your code

Data distribution is automatic No code changes required
12
Sharding - Range distribution

sh.shardCollection("test.tweets",3{_id:31}3,3false)3
shard01
shard02
shard03
13
Sharding - Range distribution
shard01
shard02
shard03
a-i
j-r
s-z
14
Sharding - Splits
shard01
shard02
shard03
a-i
ja-jz k-r
s-z
15
Sharding - Splits
shard01
shard02
shard03
a-i
ja-ji ji-js js-jw jz-r

16
s-z
Sharding - Auto Balancing
shard01
shard02
shard03
a-i js-jw
ja-ji ji-js js-jw jz-r

17
s-z
jz-r
Sharding - Goal Equilibrium
shard01
shard02
shard03
a-i js-jw
ja-ji ji-js
s-z
jz-r
18
Sharding - Find by Key

find({_id:3"alvin"})3
shard01
shard02
shard03
a-i js-jw
ja-ji ji-js
s-z
jz-r
19
Sharding - Find by Key

find({_id:3"alvin"})3
shard01
shard02
shard03
a-i js-jw
ja-ji ji-js
s-z
jz-r
20
Sharding - Find by Attribute

find({email:3"alvin@10gen.com"})3
shard01
shard02
shard03
a-i js-jw
ja-ji ji-js
s-z
jz-r
21
Sharding - Find by Attribute

find({email:3"alvin@10gen.com"})3
shard01
shard02
shard03
a-i js-jw
ja-ji ji-js
s-z
jz-r
22
Sharding - Caching
96 GB Mem 3:1 Data/Mem
shard01
300 GB Data
a-i j-r s-z

300 GB
23
Aggregate Horizontal Resources

96 GB Mem 1:1 Data/Mem 96 GB Mem 1:1 Data/Mem 96 GB Mem 1:1 Data/Mem
shard01
300 GB Data
shard02
shard03
a-i j-r s-z

100 GB
j-r
s-z
100 GB
24
100 GB
Sharding
Partitions data across many nodes
Scales Read & Writes
What happens if a node fails?

Data in that partition is lost
Must have copies of partition across

Nodes Data Centers Geographic regions
25
Replica Sets
App
Write Read
Primary
Asynchronous Replication
Secondary
Secondary
26
Replica Sets
App
Write Read
Primary
Secondary
Secondary
27
Replica Sets
App
Primary
Write Read Automatic Election of new Primary
Primary
Secondary
28
Replica Sets
App
Recovering
Write Read New primary serves data
Primary
Secondary
29
Replica Sets
App
Secondary
Write Read
Primary
Secondary
30
Scale Eventually Consistent Reads
App
Read Write Read Read
Secondary
Primary
Secondary
31
Eventual Consistency Using Replicas for Read Scaling

Read Preferences
PRIMARY, PRIMARY PREFERRED! SECONDARY, SECONDARY PREFERRED! NEAREST
Java example
ReadPreference pref = ReadPreference.primaryPreferred();! DBCursor cur = new DBCursor(collection, query, ! null, pref);!
32
Immediate Consistency
Thread #1 Insert Read Update Read Primary
v1
"
v2
"
33
Eventual Consistency
Thread #1 Insert Read Primary
v1
Secondary Thread #2
v1 does not exist
"
"
v1
reads v1
"
34
Eventual Consistency
Thread #1 Insert Read Update Read Primary
v1
Secondary Thread #2
v1 does not exist
"
v2
"
v1
reads v1
"
" "
v2
reads v1
"
35
reads v2
Tunable Data Durability

Memory
RDBMS fire & forget w=1 w=1 j=true w="majority" w=n w="myTag" Less
36
Journal
Secondary Other Data Center
async
sync
More
Other MongoDB features

Capped Collections
Limit data by size, acts as a circular buffer / FIFO Use cases: Audit, history, logs
Time To Live (TTL) collections

Expire data based on timestamp Use cases: Archiving, purging, sessions
Text Search
Search by word, phrase, stemming, stop words Use cases: Consistent text search
37

38
Data Model
Why JSON?
Simple, well understood encapsulation of data Maps simply to objects in your OO language Linking & Embedding to describe relationships
39
Why Mess with the Data Model?
40
Mapping Objects to RDBMS

select * ! from posts p, ! authors a, ! comments c! where p.author_id = a.id! and p.id = c.post_id! and p.id = 123! ! ! ! !
comments
start transaction! insert into comments (...)! update posts ! set comment_count = comment_count + 1! where post_id = 123! commit!
!
posts authors
41
Mapping Objects to Distributed RDBMS

select * ! from posts p, ! authors a, ! comments c! where p.author_id = a.id! and p.id = c.post_id! and p.id = 123! ! ! ! !
comments
start transaction! insert into comments (...)! update posts ! set comment_count = comment_count + 1! where post_id = 123! commit!
!
posts authors
server a
server b
server c
42
Same Schema in MongoDB

linking embedding
43
Mapping Object with MongoDB

db.posts.find({_id:3123})3 !
posts author comments comments comments
db.posts.update(3 33{_id:3123},3 33{3"$push":3{comments:3new_comment},3 3333"$inc":33{comments_count:31}3}3 )3
server a
server b
server c
44
Schemas in MongoDB
Design documents that simply map to your application
post3=3{author:3"Herg", 33333333date:3new3Date(), 33333333text:3"Destination3Moon", 33333333tags:3["comic",3"adventure"]} 3 >3db.posts.save(post)

45
Examples
// Find the object! > db.blogs.find( { text: "Destination Moon" } )! ! // Find posts with tags! > db.blogs.find( { tags: { $exists: true } } )! ! // Regular expressions: posts where author starts with h! > db.blogs.find( { author: /^h/i } )! ! // Counting: number of posts written by Herg! > db.blogs.find( { author: "Herg" } ).count() !
46
Data Manipulation
Conditional Query Operators
Scalar: $ne, $mod, $exists, $type, $lt, $lte, $gt, $gte, $ne Vector: $in, $nin, $all, $size
Atomic Update Operators

Scalar: $inc, $set, $unset Vector: $push, $pop, $pull, $pushAll, $pullAll, $addToSet
47
Extending the schema

>3db.blogs.update(3 33333333333{3text:3"Destination3Moon"3},3 33333333333{3"$push":3{3comments:3new_comment3},3 3333333333333"$inc":33{3comments_count:313}3}3)3 3 33{3_id:3ObjectId("4c4ba5c0672c685e5e8aabf3"),33 3333text:3"Destination3Moon",3 3333comments:3[3 333{3 3 3author:3"Kyle",3 3 3date:3ISODate("2011Z09Z19T09:56:06.298Z"),3 3 3text:3"great3book"3 333}3 3333],3 3333comment_count:313 33}3
48

49
Big Data = MongoDB = Solved

Big Data Content Mgmt & Delivery Mobile & Social
User Data Management
Data Hub
50

51
10gen is the organization behind MongoDB
190+ employees
500+ customers
Over $81 million in funding
Offices in New York, Palo Alto, Washington DC, London, Dublin, Barcelona and Sydney
52
10gen Products and Services

Subscriptions
Professional Support, Subscriber Edition and Commercial License
Consulting
Expert Resources for All Phases of MongoDB Implementations
Training
Online and In-Person for Developers and Administrators
MongoDB Monitoring Service (MMS)

Free, Cloud-Based Service for Monitoring and Alerts
53
MongoDB is the Leading NoSQL Database

Google Search LinkedIn Job Skills
MongoDB Competitor 1 Competitor 2 Competitor 3 Competitor 4 Competitor 5 MongoDB Competitor 1 Competitor 2 Competitor 3 Competitor 4 All Others
Jaspersoft Big Data Index

Direct Real-Time Downloads MongoDB Competitor 1 Competitor 2 Competitor 3
Indeed.com Trends
Top Job Trends 1. HTML 5 2. MongoDB 3. iOS 4. Android 5. Mobile Apps 6. Puppet 7. Hadoop 8. jQuery 9. PaaS 10. Social Media
54
The Evolution of MongoDB

1.8 March 11
Journaling Sharding and Replica set enhancements Spherical geo search
2.0 Sept 11
2.2 Aug 12
2.4 March 13
Kerberos/SASL Hash Shard Key V8 Intersecting polygons Aggregation enhancements Text Search
Index enhancements Aggregation to improve size and Framework performance Multi-Data Center Authentication with Deployments sharded clusters Improved Replica Set Enhancements Concurrency improvements
55
Performance and Concurrency
download at mongodb.org
Drop(by(on(the(5th(oor(and(meet(an(Engineer( and( Get(a(discount(code(for(MongoDB(London(April(9th(
http://bit.ly/mongoC((
Facebook((((((((((|(((((((((Twitter(((((((((|(((((((((LinkedIn(
@mongodb(
http://linkd.in/joinmongo(

Scaling For Humongous Amounts of Data With MongoDB

Cargado por

Información del documento

Título original

Derechos de autor

Formatos disponibles

Compartir este documento

Compartir o incrustar documentos

Opciones para compartir

¿Le pareció útil este documento?

¿Este contenido es inapropiado?

Copyright:

Formatos disponibles

Scaling For Humongous Amounts of Data With MongoDB

Cargado por

Copyright:

Formatos disponibles

Scaling for Humongous amounts of data with MongoDB

Getting from here to there...

...probably using one of these

Why NoSQL at all?

Growth? Scaling? Cost? Flexibility?

Pages Index (Million) 1998 2000 2008 2012

http://bit.ly/VDkDN2 http://bit.ly/108jTHN http://bit.ly/Wt3fl7 http://bit.ly/Qmg8YD

Need a Database that...

Is Scaleout Mission Impossible?

NoSQL and MongoDB

Tradeoff: Scale vs Functionality

What MongoDB solves

Relaxed transactional semantics enable easy

Cost effective operationalize abundant data

How does MongoDB shape up?

Data Distribution across nodes Sharding

Goals: Data location transparent to your code

Sharding - Range distribution

Sharding - Range distribution

ja-ji ji-js js-jw jz-r

Sharding - Auto Balancing

ja-ji ji-js js-jw jz-r

Sharding - Goal Equilibrium

Sharding - Find by Key

Sharding - Find by Key

Sharding - Find by Attribute

Sharding - Find by Attribute

a-i j-r s-z

Aggregate Horizontal Resources

a-i j-r s-z

What happens if a node fails?

Must have copies of partition across

Scale Eventually Consistent Reads

Eventual Consistency Using Replicas for Read Scaling

Tunable Data Durability

Secondary Other Data Center

Other MongoDB features

Time To Live (TTL) collections

How does MongoDB shape up?

Why Mess with the Data Model?

Mapping Objects to RDBMS

Mapping Objects to Distributed RDBMS

Same Schema in MongoDB

Mapping Object with MongoDB

db.posts.update(3 33{_id:3123},3 33{3"$push":3{comments:3new_comment},3 3333"$inc":33{comments_count:31}3}3 )3

Design documents that simply map to your application

post3=3{author:3"Herg", 33333333date:3new3Date(), 33333333text:3"Destination3Moon", 33333333tags:3["comic",3"adventure"]} 3 >3db.posts.save(post)

Atomic Update Operators

Extending the schema

How does MongoDB shape up?

Big Data = MongoDB = Solved

User Data Management

How does MongoDB shape up?

10gen is the organization behind MongoDB

Over $81 million in funding

10gen Products and Services

MongoDB Monitoring Service (MMS)

MongoDB is the Leading NoSQL Database

Jaspersoft Big Data Index

The Evolution of MongoDB

Performance and Concurrency

Drop(by(on(the(5th(oor(and(meet(an(Engineer( and( Get(a(discount(code(for(MongoDB(London(April(9th(

También podría gustarte