Storm Tutorial
Apache Storm becomes Top-Level Project
Storm System
Process streams of data
- Distributed
- Reliable
- Fault-tolerant
Streams and Tuples
- core abstraction in storm
- unbounded sequence of tuples
- tuple is a named list of values
- a field in a tuple can be an object of any type
- stream transformation done using spouts and bolts
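To make "a named list of values" concrete, here is a minimal plain-Java sketch. It does not use Storm; the class SimpleTuple and its methods are hypothetical stand-ins chosen for illustration, assuming only that a tuple pairs an ordered list of field names with an equally long list of values of any type.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a tuple (not Storm's own class):
// an ordered list of field names paired with a list of values.
class SimpleTuple {
    private final List<String> fields;
    private final List<Object> values;

    SimpleTuple(List<String> fields, List<Object> values) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException("each field needs exactly one value");
        }
        this.fields = fields;
        this.values = values;
    }

    // Look up a value by the name of its field.
    Object getValueByField(String field) {
        int index = fields.indexOf(field);
        if (index < 0) {
            throw new IllegalArgumentException("unknown field: " + field);
        }
        return values.get(index);
    }

    // Look up a value by its position in the list.
    Object getValue(int index) {
        return values.get(index);
    }

    public static void main(String[] args) {
        // Values may be of any type: here a String and an Integer.
        SimpleTuple tuple = new SimpleTuple(
                Arrays.asList("word", "count"),
                Arrays.<Object>asList("storm", 3));
        System.out.println(tuple.getValueByField("word"));
        System.out.println(tuple.getValue(1));
    }
}
```

A stream can then be pictured as an unbounded sequence of such objects that all share the same field names.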
Storm Basics
- Spout : source of streams
- Bolt : consumes input streams and can process and emit new streams
Example Code
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
- The spout (id "spout") is the source of the stream
- The bolt (id "split") subscribes to the spout's stream and processes its tuples
Do not worry about the syntax; it is explained later
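The roles of a spout and a bolt can be sketched in plain Java without the Storm library. Everything below is a simplified assumption for illustration: SentenceSource plays the part of a spout and splitSentence plays the part of a bolt that turns one input into several outputs. These names are hypothetical and are not the classes used in the slide code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins for the spout and bolt concepts (not the Storm API).
class SpoutBoltSketch {

    // Plays the role of a spout: a source that keeps handing out sentences.
    static class SentenceSource {
        private final List<String> sentences;
        private int next = 0;

        SentenceSource(List<String> sentences) {
            this.sentences = sentences;
        }

        // Cycles through the sentences without end, like an unbounded stream.
        String nextSentence() {
            String sentence = sentences.get(next);
            next = (next + 1) % sentences.size();
            return sentence;
        }
    }

    // Plays the role of a bolt: consumes one input and emits zero or more outputs.
    static List<String> splitSentence(String sentence) {
        List<String> words = new ArrayList<>();
        for (String word : sentence.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        SentenceSource spout = new SentenceSource(
                Arrays.asList("the cow jumped", "an apple a day"));
        // Pull three sentences from the source and pass each through the bolt.
        for (int i = 0; i < 3; i++) {
            System.out.println(splitSentence(spout.nextSentence()));
        }
    }
}
```

In Storm itself the framework moves tuples from spout to bolt; the loop in main only imitates that hand-off.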
Topology
Graph of computation
- To do computation on Storm, you create and run topologies
- A topology processes messages forever (or until it is killed)
Two types of nodes
Master and Worker Node
- Master node runs daemon called Nimbus
- Worker node runs daemon called Supervisor
Master : Nimbus
- responsible for distributing code around cluster
- assigns tasks to machines
- does monitoring for failures
Worker : Supervisor
- listens for work assigned
- starts and stops worker processes
- executes subset of topology
Zookeeper
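Maintains state and coordinates Nimbus and the Supervisors
- Nimbus and Supervisor daemons are stateless; state is kept in Zookeeper or on local disk, which makes a Storm cluster very stable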
Storm UI
- provides detailed information about cluster and topology
- detailed spout and bolt component information
Components to run Storm
- Zookeeper
- Nimbus
- Supervisor
- UI (to see detailed information)
- Storm (apache project)
- Jar file
- Topology
To run storm on local machine download storm and zookeeper
- Run Zookeeper in Zookeeper directory (eg: zookeeper-3.4.6) using
.\bin\zkServer.cmd
- Run Storm Nimbus, Supervisor and UI from the Storm home directory, each in its own command prompt
storm nimbus
storm supervisor
storm ui
- Zookeeper must be running before Nimbus, Supervisor and UI are started
- Running Apache Storm on Windows
- Works on Windows 8.1
Code from Storm Starter Word Count Topology
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
- creates a TopologyBuilder and registers the spout and the two bolts
- we will analyze this code in the coming slides
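Before analyzing the API calls, it may help to see what the three components compute together. The following is a plain-Java walk-through of the data flow only, assuming the spout emits sentences, "split" breaks them into words, and "count" keeps a running count per word. The class WordCountFlow is hypothetical and runs in a single thread, with none of Storm's distribution.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Single-threaded walk-through of the data flow: sentences -> words -> counts.
// Hypothetical illustration; it does not use Storm.
class WordCountFlow {

    static Map<String, Integer> countWords(List<String> sentences) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String sentence : sentences) {                      // what the "spout" emits
            for (String word : sentence.trim().split("\\s+")) {  // the "split" step
                if (word.isEmpty()) {
                    continue;
                }
                counts.merge(word, 1, Integer::sum);              // the "count" step
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(
                Arrays.asList("the cow jumped over the moon", "the moon"));
        System.out.println(counts);
    }
}
```

In the real topology each step runs as many parallel tasks, which is why the way tuples are routed between tasks matters.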
Topology Builder
- exposes the Java API for specifying a topology for Storm to execute
- eases the process of creating topologies
TopologyBuilder builder = new TopologyBuilder();
- creates a topology builder object
Set Spout method
- defines a new spout in the topology
builder.setSpout("spout", new RandomSentenceSpout(), 5);
- first argument is the id of the component
- second argument is the actual spout
- third argument is the parallelism hint: the number of tasks assigned to execute the spout
Set Bolt method
- defines a new bolt in the topology
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
- first argument is the id of the component
- second argument is the actual bolt
- third argument is the parallelism hint: the number of tasks assigned to execute the bolt
- the arguments mirror those of setSpout
Stream Grouping
- tells storm how to send tuples between tasks
- task is an instance of spout or bolt
- Shuffle Grouping : Tuples are randomly distributed across the bolt’s tasks in such a way that each task is guaranteed to get an equal number of tuples
- Fields Grouping : The stream is partitioned by the named field; tuples with the same value in that field (for example the same "word") always go to the same task
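The difference between the two groupings can be sketched in plain Java. This is a simplified model and not Storm's implementation: it assumes fields grouping picks a task from the hash of the field value, and it approximates shuffle grouping with round-robin, which gives the same even spread without randomness. The class GroupingSketch and its members are hypothetical names.

```java
import java.util.Arrays;

// Simplified model of how tuples might be routed to a bolt's tasks.
// Hypothetical illustration; not Storm's actual routing code.
class GroupingSketch {

    // Fields grouping (assumed hash-based): equal field values map to the same task.
    static int fieldsGroupingTask(Object fieldValue, int numTasks) {
        // floorMod keeps the result in [0, numTasks) even for negative hash codes.
        return Math.floorMod(fieldValue.hashCode(), numTasks);
    }

    // Shuffle grouping, approximated here by round-robin so every task
    // receives an equal share of the tuples.
    static class ShuffleChooser {
        private final int numTasks;
        private int next = 0;

        ShuffleChooser(int numTasks) {
            this.numTasks = numTasks;
        }

        int nextTask() {
            int task = next;
            next = (next + 1) % numTasks;
            return task;
        }
    }

    public static void main(String[] args) {
        int countTasks = 12; // matches the parallelism hint of the "count" bolt
        for (String word : Arrays.asList("storm", "bolt", "storm")) {
            System.out.println(word + " -> task " + fieldsGroupingTask(word, countTasks));
        }

        ShuffleChooser shuffle = new ShuffleChooser(8); // the "split" bolt has 8 tasks
        for (int i = 0; i < 4; i++) {
            System.out.println("sentence " + i + " -> task " + shuffle.nextTask());
        }
    }
}
```

This shows why the "count" bolt uses fields grouping on "word": every occurrence of a word must reach the same task for its count to be complete, while sentences can go to any "split" task.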