Storm Blueprints: Patterns for Distributed Realtime Computation

Ebook, 686 pages, 4 hours

By P. Taylor Goetz and Brian O'Neill

Rating: 4 out of 5 stars

About this ebook

A blueprints book with 10 different projects built in 10 different chapters, which demonstrate the various use cases of Storm for both beginner and intermediate users, grounded in real-world example applications.

Although the book focuses primarily on Java development with Storm, the patterns are more broadly applicable, and the tips, techniques, and approaches described in the book apply to architects, developers, and operations teams.

Additionally, the book should provoke and inspire applications of distributed computing to other industries and domains. Hadoop enthusiasts will also find this book a good introduction to Storm, providing a potential migration path from batch processing to the world of realtime analytics.

Language: English

Publisher: Packt Publishing

Release date: Mar 26, 2014

ISBN: 9781782168300

Author

P. Taylor Goetz

Reviews for Storm Blueprints

Rating: 4 out of 5 stars (1 rating, 0 reviews)

Book preview

Storm Blueprints - P. Taylor Goetz

Storm Blueprints: Patterns for Distributed Real-time Computation

Credits

About the Authors

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers and more

Why Subscribe?

Free Access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Distributed Word Count

Introducing elements of a Storm topology – streams, spouts, and bolts

Streams

Spouts

Bolts

Introducing the word count topology data flow

Sentence spout

Introducing the split sentence bolt

Introducing the word count bolt

Introducing the report bolt

Implementing the word count topology

Setting up a development environment

Implementing the sentence spout

Implementing the split sentence bolt

Implementing the word count bolt

Implementing the report bolt

Implementing the word count topology

Introducing parallelism in Storm

WordCountTopology parallelism

Adding workers to a topology

Configuring executors and tasks

Understanding stream groupings

Guaranteed processing

Reliability in spouts

Reliability in bolts

Reliable word count

Summary

2. Configuring Storm Clusters

Introducing the anatomy of a Storm cluster

Understanding the nimbus daemon

Working with the supervisor daemon

Introducing Apache ZooKeeper

Working with Storm's DRPC server

Introducing the Storm UI

Introducing the Storm technology stack

Java and Clojure

Python

Installing Storm on Linux

Installing the base operating system

Installing Java

ZooKeeper installation

Storm installation

Running the Storm daemons

Configuring Storm

Mandatory settings

Optional settings

The Storm executable

Setting up the Storm executable on a workstation

The daemon commands

Nimbus

Supervisor

DRPC

The management commands

Jar

Kill

Deactivate

Activate

Rebalance

Remoteconfvalue

Local debug/development commands

REPL

Classpath

Localconfvalue

Submitting topologies to a Storm cluster

Automating the cluster configuration

A rapid introduction to Puppet

Puppet manifests

Puppet classes and modules

Puppet templates

Managing environments with Puppet Hiera

Introducing Hiera

Summary

3. Trident Topologies and Sensor Data

Examining our use case

Introducing Trident topologies

Introducing Trident spouts

Introducing Trident operations – filters and functions

Introducing Trident filters

Introducing Trident functions

Introducing Trident aggregators – Combiners and Reducers

CombinerAggregator

ReducerAggregator

Aggregator

Introducing the Trident state

The Repeat Transactional state

The Opaque state

Executing the topology

Summary

4. Real-time Trend Analysis

Use case

Architecture

The source application

The logback Kafka appender

Apache Kafka

Kafka spout

The XMPP server

Installing the required software

Installing Kafka

Installing OpenFire

Introducing the sample application

Sending log messages to Kafka

Introducing the log analysis topology

Kafka spout

The JSON project function

Calculating a moving average

Adding a sliding window

Implementing the moving average function

Filtering on thresholds

Sending notifications with XMPP

The final topology

Running the log analysis topology

Summary

5. Real-time Graph Analysis

Use case

Architecture

The Twitter client

Kafka spout

The Titan distributed graph database

A brief introduction to graph databases

Accessing the graph – the TinkerPop stack

Manipulating the graph with the Blueprints API

Manipulating the graph with the Gremlin shell

Software installation

Titan installation

Setting up Titan to use the Cassandra storage backend

Installing Cassandra

Starting Titan with the Cassandra backend

Graph data model

Connecting to the Twitter stream

Setting up the Twitter4J client

The OAuth configuration

The TwitterStreamConsumer class

The TwitterStatusListener class

Twitter graph topology

The JSONProjectFunction class

Implementing GraphState

GraphFactory

GraphTupleProcessor

GraphStateFactory

GraphState

GraphUpdater

Implementing GraphFactory

Implementing GraphTupleProcessor

Putting it all together – the TwitterGraphTopology class

The TwitterGraphTopology class

Querying the graph with Gremlin

Summary

6. Artificial Intelligence

Designing for our use case

Establishing the architecture

Examining the design challenges

Implementing the recursion

Accessing the function's return values

Immutable tuple field values

Upfront field declaration

Tuple acknowledgement in recursion

Output to multiple streams

Read-before-write

Solving the challenges

Implementing the architecture

The data model

Examining the recursive topology

The queue interaction

Functions and filters

Examining the Scoring Topology

Addressing read-before-write

Distributed locking

Retry when stale

Executing the topology

Enumerating the game tree

Distributed Remote Procedure Call (DRPC)

Remote deployment

Summary

7. Integrating Druid for Financial Analytics

Use case

Integrating a non-transactional system

The topology

The spout

The filter

The state design

Implementing the architecture

DruidState

Implementing the StormFirehose object

Implementing the partition status in ZooKeeper

Executing the implementation

Examining the analytics

Summary

8. Natural Language Processing

Motivating a Lambda architecture

Examining our use case

Realizing a Lambda architecture

Designing the topology for our use case

Implementing the design

TwitterSpout/TweetEmitter

Functions

TweetSplitterFunction

WordFrequencyFunction

PersistenceFunction

Examining the analytics

Batch processing / historical analysis

Hadoop

An overview of MapReduce

The Druid setup

HadoopDruidIndexer

Summary

9. Deploying Storm on Hadoop for Advertising Analysis

Examining the use case

Establishing the architecture

Examining HDFS

Examining YARN

Configuring the infrastructure

The Hadoop infrastructure

Configuring HDFS

Configuring the NameNode

Configuring the DataNode

Configuring YARN

Configuring the ResourceManager

Configuring the NodeManager

Deploying the analytics

Performing a batch analysis with the Pig infrastructure

Performing a real-time analysis with the Storm-YARN infrastructure

Performing the analytics

Executing the batch analysis

Executing real-time analysis

Deploying the topology

Executing the topology

Summary

10. Storm in the Cloud

Introducing Amazon Elastic Compute Cloud (EC2)

Setting up an AWS account

The AWS Management Console

Creating an SSH key pair

Launching an EC2 instance manually

Logging in to the EC2 instance

Introducing Apache Whirr

Installing Whirr

Configuring a Storm cluster with Whirr

Launching the cluster

Introducing Whirr Storm

Setting up Whirr Storm

Cluster configuration

Customizing Storm's configuration

Customizing firewall rules

Introducing Vagrant

Installing Vagrant

Launching your first virtual machine

The Vagrantfile and shared filesystem

Vagrant provisioning

Configuring multimachine clusters with Vagrant

Creating Storm-provisioning scripts

ZooKeeper

Storm

Supervisord

The Storm Vagrantfile

Launching the Storm cluster

Summary

Index

Storm Blueprints: Patterns for Distributed Real-time Computation

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: March 2014

Production Reference: 1200314

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78216-829-4

www.packtpub.com

Cover Image by Prashant Timappa Shetty (<sparkling.spectrum.123@gmail.com>)

Credits

Authors

P. Taylor Goetz

Brian O'Neill

Reviewers

Vincent Gijsen

Sonal Raj

James Xu

Acquisition Editors

Usha Iyer

James Jones

Lead Technical Editor

Arun Nadar

Technical Editors

Kapil Hemnani

Monica John

Edwin Moses

Copy Editors

Roshni Banerjee

Sarang Chari

Brandt D'Mello

Mradula Hegde

Gladson Monteiro

Project Coordinator

Mary Alex

Proofreaders

Simran Bhogal

Maria Gould

Graphics

Ronak Dhruv

Valentina Dsilva

Disha Haria

Yuvraj Mannari

Abhinash Sahu

Indexer

Tejal Soni

Production Coordinator

Conidon Miranda

Cover Work

Conidon Miranda

About the Authors

P. Taylor Goetz is an Apache Storm committer and release manager and has been involved with the usage and development of Storm since it was first released as open source in October of 2011. As an active contributor to the Storm user community, Taylor leads a number of open source projects that enable enterprises to integrate Storm into heterogeneous infrastructure.

Presently, he works at Hortonworks where he leads the integration of Storm into Hortonworks Data Platform (HDP). Prior to joining Hortonworks, he worked at Health Market Science where he led the integration of Storm into HMS' next generation Master Data Management platform with technologies including Cassandra, Kafka, Elastic Search, and the Titan graph database.

I would like to thank my amazing wife, children, family, and friends whose love, support, and sacrifices made this book possible. I owe you all a debt of gratitude.

Brian O'Neill is a husband, hacker, hiker, and kayaker. He is a fisherman and father as well as a big data believer, innovator, and distributed computing dreamer.

He has been a technology leader for over 15 years and is recognized as an authority on big data. He has experience as an architect in a wide variety of settings, from start-ups to Fortune 500 companies. He believes in open source and contributes to numerous projects. He leads projects that extend Cassandra and integrate the database with indexing engines, distributed processing frameworks, and analytics engines. He won InfoWorld's Technology Leadership award in 2013. He authored the Dzone reference card on Cassandra and was selected as a Datastax Cassandra MVP in 2012 and 2013.

In the past, he has contributed to expert groups within the Java Community Process (JCP) and has patents in artificial intelligence and context-based discovery. He is proud to hold a B.S. in Computer Science from Brown University.

Presently, Brian is Chief Technology Officer for Health Market Science (HMS), where he heads the development of their big data platform focused on data management and analysis for the healthcare space. The platform is powered by Storm and Cassandra and delivers real-time data management and analytics as a service.

For my family...To my wife Lisa, We put our faith in the wind. And our mast has carried us to the clouds. Rooted to the earth by our children, and fastened to the bedrock of those that have gone before us, our hands are ever entwined by the fabric of our family. Without all of you, this ink would never have met this page.

About the Reviewers

Vincent Gijsen is essentially a people person, and he is passionate about anything related to technology. His background and areas of interest lie broadly in Embedded Systems Engineering and Information Science. He started his career at a marketing-research company as an IT Manager. After that, he started his own company and specialized in VOIP communications. Currently, he works at ScienceRockstars, a start-up that is all about persuasive profiling and large data. In his spare time, he likes to get his hands dirty with lasers, quad-copters, eBay purchases, hacking stuff, and beers.

Sonal Raj is a geek, a Pythonista, and a technology enthusiast. He is the founder and Executive Head at Enfoss. He holds a bachelor's degree in Computer Science and Engineering from National Institute of Technology, Jamshedpur. He was a Research Fellow at SERC, IISc Bangalore, and he pursued projects on distributed computing and real-time operations. He also worked as an intern at HCL Infosystems, Delhi.

He has given talks at PyCon India on Storm and Neo4J and has published articles and research papers in leading magazines and international journals.

James Xu is a committer of Apache Storm and a Java/Clojure programmer working in e-commerce. He is passionate about new technologies such as Storm and Clojure. He works at Alibaba Group, which is the leading e-commerce platform in China.

www.PacktPub.com

Support files, eBooks, discount offers and more

You might want to visit www.PacktPub.com for support files and downloads related to your book.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

http://PacktLib.PacktPub.com

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books.

Why Subscribe?

Fully searchable across every book published by Packt

Copy and paste, print and bookmark content

On demand and accessible via web browser

Free Access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.

Preface

The demand for timely, actionable information is pushing software systems to process an increasing amount of data in a decreasing amount of time. Additionally, as the number of connected devices increases and as these devices are applied to a broadening spectrum of industries, that demand is becoming increasingly pervasive. Traditional enterprise operational systems are being forced to operate on scales of data that were originally associated only with Internet-scale companies. This monumental shift is forcing the collapse of more traditional architectures and approaches that separated online transactional systems and offline analysis. Instead, people are reimagining what it means to extract information from data. Frameworks and infrastructure are likewise evolving to accommodate this new vision.

Specifically, data generation is now viewed as a series of discrete events. Those event streams are associated with data flows, some operational and some analytical, but processed by a common framework and infrastructure.

Storm is the most popular framework for real-time stream processing. It provides the fundamental primitives and guarantees required for fault-tolerant distributed computing in high-volume, mission-critical applications. It is both an integration technology and a data flow and control mechanism. Many large companies are using Storm as the backbone of their big data platforms.

Using design patterns from this book, you will learn to develop, deploy, and operate data processing flows capable of processing billions of transactions per hour/day.

Storm Blueprints: Patterns for Distributed Real-time Computation covers a broad range of distributed computing topics, including not only design and integration patterns but also domains and applications to which the technology is immediately useful and commonly applied. This book introduces the reader to Storm using real-world examples, beginning with simple Storm topologies. The examples increase in complexity, introducing advanced Storm concepts as well as more sophisticated approaches to deployment and operational concerns.

What this book covers

Chapter 1, Distributed Word Count, introduces the core concepts of distributed stream processing with Storm. The distributed word count example demonstrates many of the structures, techniques, and patterns required for more complex computations. In this chapter, we will gain a basic understanding of the structure of Storm computations. We will set up a development environment and understand the techniques used to debug and develop Storm applications.

Chapter 2, Configuring Storm Clusters, provides a deeper look into the Storm technology stack and the process of setting up and deploying to a Storm cluster. In this chapter, we will automate the installation and configuration of a multi-node cluster using the Puppet provisioning tool.

Chapter 3, Trident Topologies and Sensor Data, covers Trident topologies. Trident provides a higher-level abstraction on top of Storm that abstracts away the details of transactional processing and state management. In this chapter, we will apply the Trident framework to process, aggregate, and filter sensor data to detect a disease outbreak.

Chapter 4, Real-time Trend Analysis, introduces trend analysis techniques using Storm and Trident. Real-time trend analysis involves identifying patterns in data streams. In this chapter, you will integrate with Apache Kafka and will implement a sliding window to compute moving averages.

Chapter 5, Real-time Graph Analysis, covers graph analysis using Storm to persist data to a graph database and query that data to discover relationships. Graph databases are databases that store data as graph structures with vertices, edges, and properties and focus primarily on relationships between entities. In this chapter, you will integrate Storm with Titan, a popular graph database, using Twitter as a data source.

Chapter 6, Artificial Intelligence, applies Storm to an artificial intelligence algorithm typically implemented using recursion. We expose some of the limitations of Storm, and examine patterns to accommodate those limitations. In this chapter, using Distributed Remote Procedure Call (DRPC), you will implement a Storm topology capable of servicing synchronous queries to determine the next best move in tic-tac-toe.

Chapter 7, Integrating Druid for Financial Analytics, demonstrates the complexities of integrating Storm with non-transactional systems. To support such integrations, the chapter presents a pattern that leverages ZooKeeper to manage the distributed state. In this chapter, you will integrate Storm with Druid, which is an open source infrastructure for exploratory analytics, to deliver a configurable real-time system for analysis of financial events.

Chapter 8, Natural Language Processing, introduces the concept of Lambda architecture, pairing real-time and batch processing to create a resilient system for analytics. Building on Chapter 7, Integrating Druid for Financial Analytics, you will incorporate the Hadoop infrastructure and examine a MapReduce job to backfill analytics in Druid in the event of a host failure.

Chapter 9, Deploying Storm on Hadoop for Advertising Analysis, demonstrates converting an existing batch process, written as a Pig script running on Hadoop, into a real-time Storm topology. To do this, you will use Storm-YARN, which allows users to leverage YARN to deploy and run Storm clusters. Running Storm on Hadoop allows enterprises to consolidate operations and utilize the same infrastructure for both real-time and batch processing.

Chapter 10, Storm in the Cloud, covers best practices for running and deploying Storm in a cloud-provider hosted environment. Specifically, you will leverage Apache Whirr, a set of libraries for cloud services, to deploy and configure Storm and its supporting technologies to infrastructure provisioned via Amazon Web Services (AWS) Elastic Compute Cloud (EC2). Additionally, you will leverage Vagrant to create clustered environments for development and testing.
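As a rough illustration of the computation that Chapter 1 describes (a running word count over a continuous stream of sentences), the following is a minimal sketch in plain Java. It does not use the Storm API; the class name RunningWordCount and its methods are hypothetical names chosen for this illustration, and the sample sentences are made up. It only mirrors the split, count, and report steps listed for the word count topology, inside a single process.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical single-process sketch; not Storm code and not the book's implementation.
public class RunningWordCount {
    // Running totals keyed by word (the state a counting step would hold).
    private final Map<String, Long> counts = new LinkedHashMap<>();

    // Split step: break one sentence into lowercase, whitespace-separated words.
    public static String[] split(String sentence) {
        return sentence.trim().toLowerCase().split("\\s+");
    }

    // Count step: update the running total for every word in a newly arrived sentence.
    public void accept(String sentence) {
        for (String word : split(sentence)) {
            if (!word.isEmpty()) {
                counts.merge(word, 1L, Long::sum);
            }
        }
    }

    // Report step: return a copy of the current totals.
    public Map<String, Long> snapshot() {
        return new LinkedHashMap<>(counts);
    }

    public static void main(String[] args) {
        // Made-up sample input standing in for an unbounded stream of sentences.
        String[] sentences = {
            "my dog has fleas",
            "the dog ate my homework"
        };
        RunningWordCount counter = new RunningWordCount();
        for (String sentence : sentences) {
            counter.accept(sentence);
            // Print the running totals after each sentence arrives.
            System.out.println(counter.snapshot());
        }
    }
}
```

In a distributed setting, each of these steps would run as a separate component, possibly on different machines, which is the part this sketch deliberately leaves out.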

What you need for this book

The following is a list of software used in this book:

Who this book is for

Storm Blueprints: Patterns for Distributed Real-time Computation benefits both beginner and advanced users, by describing broadly applicable distributed computing patterns grounded in real-world example applications. The book presents the core primitives in Storm and Trident alongside the crucial techniques required for successful deployment and operation.

Although the book focuses primarily on Java development with Storm, the patterns are applicable to other languages, and the tips, techniques, and approaches described in the book apply to architects, developers, systems, and business operations.

Hadoop enthusiasts will also find this book a good introduction to Storm. The book demonstrates how the two systems complement each other and provides potential migration paths from batch processing to the world of real-time analytics.

The book provides examples that apply Storm to a broad range of problems and industries, which should translate to other domains faced with problems associated with processing large datasets under tight time constraints. As such, solution architects and business analysts will benefit from the high-level system architectures and technologies introduced in these chapters.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: All the Hadoop configuration files are located in $HADOOP_CONF_DIR. The three key configuration files for this example are: core-site.xml, yarn-site.xml, and hdfs-site.xml.

A block of code is set as follows:

<property>

    <name>fs.default.name</name>

    <value>hdfs://master:8020</value>

</property>

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

13/10/09 21:40:10 INFO yarn.StormAMRMClient: Use NMClient to launch supervisors in container.

13/10/09 21:40:10 INFO impl.ContainerManagementProtocolProxy: Opening proxy : slave05:35847

13/10/09 21:40:12 INFO yarn.StormAMRMClient: Supervisor log:

http://slave05:8042/node/containerlogs/container_1381197763696_0004_01_000002/boneill/supervisor.log

13/10/09 21:40:14 INFO yarn.MasterServer: HB: Received allocated containers (1)

13/10/09 21:40:14 INFO yarn.MasterServer: HB: Supervisors are to run, so queueing (1) containers...

13/10/09 21:40:14 INFO yarn.MasterServer: LAUNCHER: Taking container with id (container_1381197763696_0004_01_000004) from the queue.

13/10/09 21:40:14 INFO yarn.MasterServer: LAUNCHER: Supervisors are to run, so launching container id (container_1381197763696_0004_01_000004)

13/10/09 21:40:16 INFO yarn.StormAMRMClient: Use NMClient to launch supervisors in container.

13/10/09 21:40:16 INFO impl.ContainerManagementProtocolProxy: Opening proxy : dlwolfpack02.hmsonline.com:35125

13/10/09 21:40:16 INFO yarn.StormAMRMClient: Supervisor log:

http://slave02:8042/node/containerlogs/container_1381197763696_0004_01_000004/boneill/supervisor.log

Any command-line input or output is written as follows:

hadoop fs -mkdir /user/bone/lib/

hadoop fs -copyFromLocal ./lib/storm-0.9.0-wip21.zip /user/bone/lib/

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: From the Filter drop-down menu at the top of the page select Public images.

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to <feedback@packtpub.com>, and mention the book title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <copyright@packtpub.com> with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

You can contact us at <questions@packtpub.com> if you are having a problem with any aspect of the book, and we will do our best to address it.

Chapter 1. Distributed Word Count

In this chapter, we will introduce you to the core concepts involved in creating distributed stream processing applications with Storm. We do this by building a simple application that calculates a running word count from a continuous stream of sentences. The word count example involves many of the structures, techniques, and patterns required for more complex computation, yet it is simple and easy to
