SEQ2com.cloudera.flume.handlers.hdfs.WriteableEventKey/com.cloudera.flume.handlers.hdfs.WriteableEventP~mp/&jd&jd;a blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypebeg AckChecksum&jdP~mp/&jdGetting Started with Flume&jd9 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumFP~mp/&jd==========================&jd;o blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum|TTP~mp/&jd!Jonathan Hsieh &jd;qK blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum QP~mp/&jdv0.2.0, 1/25/10&jd;rƄ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum_hP~mp/&jd9(c) Copyright (2009), Cloudera Inc. All rights reserved.&jd;t6 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumTUTP~mp/&jd&jd;u blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd*// Don't know how to get multiple authors.&jd;w blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumz/ P~mp/&jd// Don't know how to get a TOC&jd;y# blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum~@P~mp/&jd&jd;zlA blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd[[Introduction]]&jd;{ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumsvP~mp/&jd== Introduction&jd;}X blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd&jd;~ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jdCFlume is a distributed, reliable, available service for efficiently&jd;y blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum fP~mp/&jdDmoving large amounts of data as it is produced. We're concentrating&jd;Y3 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumVyC(P~mp/&jdBon reliable logging, but really it's a high-performance conduit to&jd;; blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumo1P~mp/&jdship data around a cluster. &jd;O blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum?7P~mp/&jd&jd;% blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jdEThe primary use case is as a logging system that tails a bunch of log&jd; blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumm_P~mp/&jdWe have a simple, but flexible, data model. Events - which are&jd;ؗ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumMP~mp/&jd@unstructured blobs - can be annotated with attributes, which are&jd;3 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumVP~mp/&jdEkey-value pairs. We can serialize this in pretty much any format you&jd; blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum6P~mp/&jd like when the data leaves Flume.&jd;1 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd&jd;| blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jdEFlume's architecture is simple and robust. A typical deployment might&jd;[ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumIP~mp/&jdFhave three-tiers, the first being a Flume agent tier (leaf nodes) that&jd;B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd@forward to a second tier of Flume collector nodes, which in turn&jd;r blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jdFaggregate the data and then forward to a storage tier such as Hadoop's&jd; blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum["P~mp/&jd@hdfs. Flume agents are typically installed on the machines that&jd;} blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumzD3P~mp/&jd>generate the data and are Flume's initial point of contact for&jd; blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumE+P~mp/&jdDdata. Once data gets acked by a Flume agent node, we guarantee that,&jd; blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumǐP~mp/&jdEas long as the node remains live, the data will eventually get to its&jd; blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum.P~mp/&jdEfinal destination. As Flume forwards the data to upstream nodes, some&jd;* blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum`rP~mp/&jdFintermediate processing can be forwarded before data is forward again.&jd; blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum!P P~mp/&jd&jd; blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd@One nice feature of Flume is that the configuration is centrally&jd;# blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksummP~mp/&jdAmanaged. Nodes can be reconfigured dynamically without having to&jd;K blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jdFactually ssh'ing to a machine, tweaking some conf files and restarting&jd;t blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumtP~mp/&jdDa daemon. This gives the system flexibility to dynamically tolerate&jd;xn blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumAJoP~mp/&jdFnetwork failures, and flexibility to change deployment architecture to&jd; blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum QP~mp/&jd best suit your network topology.&jd;Fc blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd&jd;9 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jdCThis document describes using and setting up Flume in three stages.&jd;g blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumƯIP~mp/&jdAThe first configures a Flume node and gets it working on a single&jd;@ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumipP~mp/&jdCmachine. The second explains how to configure a pseudo-distributed&jd;S blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumkQP~mp/&jd?multi-node topology on the localhost. Finally, the third stage&jd>; blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum>*P~mp/&jd>addresses distributed deployment and introduces mechanisms for&jd>X blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumPP~mp/&jd0deploying Flume onto other nodes in the cluster.&jd> blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumRP~mp/&jd&jd>1 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd[[Quickstart]]&jd> blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum۝P~mp/&jd == Quickstart&jd>M blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumKP~mp/&jd&jd>5 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jdDIn this section we get a single Flume node up and running and verify&jd> blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum"NP~mp/&jdEthat the basic prerequisites for Flume are installed and working. We&jd>C blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumvgP~mp/&jdGassume that you have installed Flume (by untarring it) to the `~/flume`&jd> blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum\RP~mp/&jd directory.&jd>!'D blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumhP~mp/&jd&jd>"^ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd.Prerequisites&jd># blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumXP~mp/&jd****&jd>%X blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumoVP~mp/&jd* Flume&jd>& blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumsNP~mp/&jdP* A Linux system (tested on Centos 5.3 and Ubuntu 9.04, should work on Mac OS X)&jd>(r blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd)* Java 1.6 (only tested with Sun JRE 1.6)&jd>*FP blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum0];P~mp/&jd****&jd>+U blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumoVP~mp/&jd&jd>, blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jdAWe will start start by seeing if we can get a Flume node running.&jd>. blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumUeP~mp/&jdBAfter this we will introduce a handful of data *sources*, and then&jd>0M blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumz)P~mp/&jd-describe how to arbitrarily configure a node.&jd>2Z blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumjt͢P~mp/&jd&jd>3 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd"=== Sources and the `dump` command&jd>5Q blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd&jd>6 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jdEFirst lets get a Flume node up that dumps data written to the console&jd>8 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum|P~mp/&jdCstdin back out to the console stdout. We can do this by running the&jd>:B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumI=P~mp/&jdfollowing command.&jd>=l blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd----&jd>> blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&jd$ bin/flume dump console&jd>@@ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum]THP~mp/&jd----&jd>Ad blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&jd&jd>Bu blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jdFTIP: The Flume program has the general form `bin/flume [args&jd>DK blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksume]P~mp/&jd...]`. &jd>E\ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumLfsP~mp/&jd&jd>F blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd@TIP: The command above is using the `dump` command and `console`&jd>H& blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum) P~mp/&jd@is its argument. Its general form is `bin/flume dump `.&jd>JS blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumYP~mp/&jd&jd>K blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jdFWe have started a Flume node where `console` is the source of incoming&jd>MO0 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum; -P~mp/&jdEdata. When you run it you should see some logging messages displayed&jd>O' blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum-GP~mp/&jd@to the console. For now, you can ignore messages about masters,&jd>PN blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumꙡ8P~mp/&jdFback-off and failed connections (we'll touch on this later). When you&jd>Rv blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum\P~mp/&jdFtype at the console and hit a new line, you should see a new log entry&jd>Tt blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumԺP~mp/&jdAline with the data that you typed appear. If I entered `This is a&jd>V: blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumcP~mp/&jd%test` it should look similar to this:&jd>W blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumA8P~mp/&jd&jd>X blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd----&jd>Z9 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&jd;hostname [INFO Thu Nov 19 08:37:13 PST 2009] This is a test&jd>[ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum1#zP~mp/&jd----&jd>]: blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&jd&jd>^c\ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd#To exit the program, simply hit ^C.&jd>_ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum_P~mp/&jd&jd>a blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd@// TODO there are actually too many irrelevant events right now.&jd>bύ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumQ|P~mp/&jd8// Need to turn of heartbeat when using one shot option.&jd>dy blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum ]P~mp/&jd&jd>e blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd ==== `text`&jd>f blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumO"LP~mp/&jd&jd>h" blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd@We can also specify other sources of events. For example, if we&jd>i' blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum։P~mp/&jdCwanted a text file where each line represents a new event, we could&jd>k blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum6P~mp/&jdrun the following command. &jd>m" blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumZwP~mp/&jd&jd>n` blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd----&jd>o/ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&jd&$ bin/flume dump 'text("README.html")'&jd>q) blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumN%hP~mp/&jd----&jd>r~ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&jd&jd>s blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jdGThis command reads the file, and then outputs each line as a new event.&jd>u{ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum"EP~mp/&jd&jd>v blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jdDNOTE: You can try this with other files such as `/var/log/messages`,&jd>xc blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum2P~mp/&jdB`/var/log/syslog`, or `/var/log/hadoop/hadoop.log` also. However,&jd>z* blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumȟP~mp/&jd>Flume must run with appropriate permissions to read the files!&jd>{ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum\P~mp/&jd&jd>} blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd ==== `tail`&jd>~k blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum_P~mp/&jd&jd>q blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd?If we wanted to tail a file instead of just reading it, we just&jd>H blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumO8P~mp/&jdDspecify another source -- this time we use `tail` instead of `text`.&jd> blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum]P~mp/&jd&jd>; blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd5// TODO (jon) replace with echo >> temp file example.&jd>) blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumQ8P~mp/&jd&jd>q blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd----&jd>V blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&jd#$ bin/flume dump 'tail("testfile")'&jd>R blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum\+P~mp/&jd----&jd>9 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&jd&jd>_) blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jdEThis pipes data from the file into Flume and then out to the console.&jd>+g blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumPP~mp/&jd&jd>V blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jdiThere will be a message that says "File 'testfile' does not currently exist, waiting for file to appear".&jd>] blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum|P~mp/&jd&jd> blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jdCIn another terminal, we can create and write data to the file. New&jd>U blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum.P~mp/&jddata will appear.&jd> blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum)P~mp/&jd&jd>S blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd----&jd>)W blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&jd$ echo Hello world! >> testfile&jd> blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumOOP~mp/&jd----&jd>6 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&jd &jd>+h blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumlEP~mp/&jdIf we delete the file,&jd> blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumw&P~mp/&jd&jd>, blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd----&jd> blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&jd $ rm testfile&jd>Y blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumv P~mp/&jd----&jd> blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&jd&jd> blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jdFthe `tail` sink will detect this. If we then recreate the file again,&jd> blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd9the `tail` source will detect the new file and follow it.&jd>j- blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum*P~mp/&jd&jd> blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd----&jd> blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&jd$ echo Hello again! >> testfile&jd>\& blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumPP~mp/&jd----&jd> blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&jd&jd>, blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jdAYou should see your new message appear in the Flume node console.&jd>; blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum\`P~mp/&jd&jd>T blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd==== Synthetic sources, `synth`&jd>2 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum? P~mp/&jd&jd>Y blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jdDHere's one more example where we use the `synth` sources to generate&jd>) blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum~P~mp/&jdevents.&jd>u blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum͛P~mp/&jd&jd>u blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jd----&jd> blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&jd $ bin/flume dump 'synth(100,10)'&jd>k blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumC P~mp/&jd----&jd>Y blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je&je? blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je5You should get 100 events, each with 10 random bytes.&je? blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum"2P~mp/&je&je?U blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je{P~mp/&je&je?+ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je=== Section summary&je?J blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumRYLP~mp/&je&je?{ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeEWe talked through some basic sources, but Flume include several other&je?u blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum^ 54P~mp/&je7event sources. The table below summarizes some of them.&je?) blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je&je?P blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeB// These tables are kinda gross, had to use asciidoc v8.2.7 style.&je?b blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksuma%P~mp/&je.Flume Event Sources&je?{ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumj6ÛP~mp/&je [grid="all"]&je? blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumX P~mp/&jet`--------------------------------`----------------------------------------------------------------------------------&je? B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumKNI P~mp/&je,name description&je? blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je`------------------------------------------------------------------------------------------------&je?6 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum%P~mp/&je/`console` Stdin console &je?dX blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum$DP~mp/&jeS`text("filename")` One shot text file source. One line is one event &je?O blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum,paiP~mp/2&je`tail("filename")` Similar to Unix's `tail -F`. One line is one event. Stays open for more data and follows filename if file rotated. &je?л blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumWQP~mp/&jeY`seqfile("filename")` Serialized Flume events in Hadoop sequence file format. &je?׶ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumL pP~mp/&jeY`syslogUdp(port)` Syslog over udp port, port. This is syslog compatible. &je? blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum֠P~mp/&je[`syslogTcp(port)` Syslog over tcp port, port. This is syslog-ng compatible. &je? blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumlozP~mp/ &jer`synth(msg_count,msg_size)` A source that synthetically generates msg_count random messages of size msg_size &je?O blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumZP~mp/C&je`nonlsynth(msg_count,msg_size)` A source that synthetically generates msg_count random messages of size msg_size. This converts all `\'\n\'` chars into `\' \'` chars.&je?  blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je`------------------------------------------------------------------------------------------------&je? blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum%P~mp/&je&je?A blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je== Pseudo-Distributed Mode&je? blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumuMP~mp/&je&je? blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeBFlume is intended to be run as a distributed system with processes&je?f blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum3P~mp/&jeAspread out across _many_ machines. It can also be run as several&je?e blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum}P~mp/&jeEprocesses on a _single_ machine. We call this ``pseudo-distributed''&je?5 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumZP~mp/&jeFmode. This is useful for debugging Flume configurations and getting a&je? blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum*P~mp/&je-better idea of how Flume components interact.&je? blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum^{P~mp/&je&je?\ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeFIn the previous section, we had a single Flume node and introduced the&je? blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum4XP~mp/&je>concept of Flume sources. Now, we introduce some new concepts&je? blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumk-P~mp/&je9required for a distributed setup. This include the Flume&je?M blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumDDP~mp/&je?*configuration master* server, the configuration of sources and&je?A blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum.=P~mp/&je/*sinks*, and connecting multiple Flume *nodes*.&je?d blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum>zP~mp/&je&je? blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je-=== Starting pseudo-distributed Flume daemons&je? blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum$(P~mp/&je&je? blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeEThere are two kinds of process in the system: the Flume configuration&je?z> blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumlEP~mp/&jeF*master* and the Flume *node*. The Flume configuration master (master&je?A blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum)uP~mp/&jeP~mp/&je &je @1 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je ----&je @3H blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je $ bin/flume master&je @4] blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum#\P~mp/&je ----&je @6b blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je &je @7F blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je ?Once it is started, one can access the master by pointing a web&je @9#v blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum]P~mp/&je #browser to http://localhost:35871/.&je @;k blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum+0P~mp/&je &je @E blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumTP~mp/&je &je @?2 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je =This webpage displays the status of the Flume nodes that have&je @AM blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&^( P~mp/&je Bcontacted the master, and shows each node's current configuration.&je @DI blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumzP~mp/&je EWhen we start this up without and Flume nodes running, the status and&je @Ft blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&P~mp/&je configs tables will be empty.&je @V blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je &je @XI blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je DTo start a Flume node, we can just invoking the following command in&je @Z+ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum搬P~mp/&je another terminal.&je @r) blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum%P~mp/&je &je @uS blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je ----&je @v7 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je $ bin/flume node &je @x/ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumǖ'P~mp/&je ----&je @y blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je&je@z¾ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeCWe can see if the Flume node us up by is by pointing the browser to&je@| blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum P~mp/&jeFthe node's status server, http://localhost:35862/. It should by bring&je@~ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeFup a simple web page with a configuration version (sometime in 1969 or&je@n blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumyP~mp/&jeD1970 depending on you time zone), a source config, and a sink config&je@\ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum3MP~mp/&je+field. The status field should be "HELLO".&je@  blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumu_UP~mp/&je&je@M blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je(// TODO It should go from HELLO to IDLE.&je@, blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum~P~mp/&je&je@ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeAWe should also check to make sure that the node has contacted the&je@L blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumuNx P~mp/&jeDmaster by checking the config master page and seeing if the node has&je@/ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum-LP~mp/&jeDshown up in the tables. The node column should have the name of the&je@ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum*)QP~mp/&jeFnode. This should be the same as running `hostname` from Unix prompt.&je@׻ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum7P~mp/&jeAIn the rest of this documentation I will assume the node is named&je@ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumrP~mp/&je `_example_`.&je@\ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum?%P~mp/&je&je@7 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je%=== Configuring a node via the master&je@ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je&je@Ds blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeDHaving nodes content the master to get their configuration allows us&je@ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumWP~mp/&jeBto dynamically change the configuration of nodes without having to&je@z blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeBlogin to the remote machine to restart the daemon. We can quickly&je@ŏ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum"8P~mp/&je?change the node's previous configuration to a new one. In this&je@P blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum4VP~mp/&je@subsection we show how to configure nodes using the master's web&je@ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumt0P~mp/&je interface.&je@T blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je&je@̞ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeFIf we go to the master's web page and click on the config link, we are&je@| blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeApresented with two forms. These are web interfaces for setting a&je@h blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum0:P~mp/&jeBconfiguration on the master. When Flume nodes contact the master,&je@I blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumaP~mp/&jeP~mp/&jeSink: `console` &je@A9 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumR9eP~mp/&je----------------------------&je@ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumL9P~mp/&je&je@[ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeBIf you go back to the master page you will notice that the version&je@# blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumWD1P~mp/&jeDstamp has change to a current time, and that the src and sink fields&je@uK blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeBof the configs has been updated. Status will eventually change to&je@4q blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksuma]P~mp/&jeEbecome "ACTIVE". When it is, it is ready to receive console traffic.&je@; blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum0{P~mp/&je&je@1& blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeFIf you go to the terminal where your Flume node is running, you should&je@ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumCLP~mp/&jeEbe able to type a few lines and then get output back showing your new&je@Ƣ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum"A6P~mp/&je log message.&je@+5 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je&je@_ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je [grid="all"]&je@ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumX P~mp/&je`----------`----------------&je@' blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumYP~mp/&jeNode name: `_example_`&je@ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumWP~mp/&je Source: `text("README.html")`&je@4Y blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum* P~mp/&jeSink: `console` &je@1 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumR9eP~mp/&je----------------------------&je@ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumL9P~mp/&je&je@Ek blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeENOTE: You may need to hit enter in the Flume node console. (This is a&jeA L blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumb yP~mp/&jebug that will get fixed.)&jeA blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum>0+P~mp/&je&jeA blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je"Or these new values to tail a file&jeA0_ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum^0P~mp/&je&jeAd blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je [grid="all"]&jeAh blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumX P~mp/&je`----------`----------------&jeA , blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumYP~mp/&jeNode name: `_example_`&jeA blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumWP~mp/&je Source: `tail("README.html")`&jeA  blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeSink: `console` &jeA blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumR9eP~mp/&je----------------------------&jeAK blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumL9P~mp/&je&jeA@ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeDGreat! We can now change the configuration of different nodes in the&jeA" blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumNnDP~mp/&jeDsystem to gather data from a variety of sources by going through the&jeA[ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum44=P~mp/&jemaster!&jeAV blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je&jeA blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je=== Introducing sinks&jeA blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumU P~mp/&je&jeA= blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je?Thus far, we have seen that Flume has a variety of sources that&jeAے blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumMP~mp/&jeDgenerate or accept new events that are fed into the system. We have&jeAr blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum:"сP~mp/&jeElimited out where these messages go so far to the `console` sink. As&jeAk blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumhP~mp/&jeEyou would expect, Flume also provides a wide variety of event *sinks*&jeA % blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksummˠP~mp/&je#-- destinations for all the events.&jeA"Dx blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum0RkP~mp/&je&jeA#o5 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeFThere are many possible destination for events -- to disk, to hdfs, to&jeA%7[ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumncZP~mp/&je?the console, or forwarding across the network. We use the sink&jeA&< blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum!P~mp/&je?abstractions an interface for forwarding events to any of these&jeA( blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumŅP~mp/&je destinations.&jeA*h blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum B P~mp/&je&jeA+>$ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je?We now can just connect different sources to different sinks by&jeA- blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumxHP~mp/&jeFspecifying the new configuration and submitting it to the master. For&jeA. blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum4k>2P~mp/&jeCexample would essentially make a copy of `README.html` (modulo some&jeA0 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum;P~mp/&jeformatting changes).&jeA2 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum'iFyP~mp/&je&jeA3d blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je [grid="all"]&jeA4 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumX P~mp/&je`----------`----------------&jeA6Gp blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumYP~mp/&jeNode name: `_example_`&jeA7z blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumWP~mp/&je Source: `text("README.html")`&jeA9\' blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum* P~mp/&je!Sink: `text("README.copy")` &jeA:a blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumnؒP~mp/&je----------------------------&jeA<~= blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumL9P~mp/&je&jeA= blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeBWARNING: The `text`, `seqfile` and `dfs` sinks overwrite if a file&jeA?Q blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumXeP~mp/&jepreviously exists.&jeAA blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je&jeAB6 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeDWe actually support several sinks that you can choose from. Here is&jeAD_ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumzo;P~mp/&jeGan abridged listing of sinks to try. (There are more in the appendix!)&jeAFf blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumgΝ P~mp/&je&jeAG. blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeCWARNING: All `_host_` arguments currently use the ip or dns name of&jeAI blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum.^P~mp/&jeEthe node as opposed to the Flume node name (set by `-n name` option).&jeAKn blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum[)nP~mp/&jeEUnless overridden by command line, the default the Flume node name is&jeAMR blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumPP~mp/&jethe host name.&jeANǃ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&P~mp/&je&jeAO blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je.Flume Event Sinks&jeAQy2 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum~P~mp/&je [grid="all"]&jeARg blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumX P~mp/&jex`------------------------------------`----------------------------------------------------------------------------------&jeAU% blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum P~mp/&je0name description&jeAVI blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumvHP~mp/&je`------------------------------------------------------------------------------------------------&jeAX blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum%P~mp/&jeC`null` Null sink. Events are dropped.&jeAZ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum-e\5P~mp/&jeP`console` Console sink. Display to console's stdout.&jeA\› blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum P~mp/&jeS`text("_txtfile_")` Textfile sink. Write to text file `_txtfile_`&jeA^ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum>~P~mp/.&je`seqfile("_seqfile_")` Seqfile sink. Write serialized Flume events to local file system file `_seqfile_` in Hadoop's seqfile format.&jeAa9p blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumPP~mp/L&je`dfs("_dfsfile_")` DFS seqfile sink. Write serialized Flume events to a dfs path such as `hdfs://namenode/file` or `file:///file` in Hadoop's seqfile format. &jeAc7 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksuml,P~mp/-&je`syslogTcp("_host_",_port_)` Syslog tcp sink. Forward to events to `host` on tcp port `port` in syslog wire format (syslog-ng compatible)&jeAf blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumE40P~mp/&je`------------------------------------------------------------------------------------------------&jeAh blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum%P~mp/&je&jeAj blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeCWARNING: Using `dfs` has some restrictions. See the more details in&jeAk blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum#P~mp/&jethe Trouble shooting section.&jeAm} blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumoZuP~mp/&je&jeAnh blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je=== Aggregated Configurations&jeApk blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumnP~mp/&je&jeAq` blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeCUsing the the simple for configuring a handful of machines is quite&jeAs blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeEmanageable, but when we get to tens or hundreds of machines, we would&jeAu blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum\6P~mp/&jeElike maintain or auto generate the configuration for all the machines&jeAwqi blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumuP~mp/&je@in a single file. To do this we provide a mechanism for setting&jeAyQ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum7zNP~mp/&jeDconfigurations of many machines in a single aggregated configuration&jeA{@0 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum$ P~mp/&je submission.&jeA|} blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum*DP~mp/&je&jeA}S blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je>Let's start using configurations from the previous subsection.&jeA֪ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum?bP~mp/&jeAInstead of the method we used from the "Configuring nodes via the&jeA blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumh&P~mp/&je?master", we could put the following configuration line into the&jeA blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumAh}P~mp/&je'"Configure nodes" form and then submit:&jeAK~ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumͺFP~mp/&je&jeA blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je----&jeA blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je)example : text("README.html") | console ;&jeA; blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum|ņ!P~mp/&je----&jeA  blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je &je A] blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je Or &je A{ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum(SP~mp/&je &je A blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je ----&je A_ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je 3example: text("README.html") | text("README.copy");&je A0 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksume#P~mp/&je ----&je A blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je &je Aҷ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je AThe general format only adds a little syntax and looks like this:&je AŲ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum<P~mp/&je &je A blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je ----&je Afw blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je  : | ;&je A blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum%4P~mp/&je! : | ;&je!A blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je! : | ;&je!A)$ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum-;P~mp/&je!...&je!Aq blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumCP~mp/&je!----&je!A blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je!&je!A, blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je!=From here on out, we will use this format to configure nodes.&je!A / blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je!&je!AP blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je!.=== Tiering Flume nodes: Agents and Collectors&je!Aq blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum3 P~mp/&je!&je!Ae blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je!AA simple network connection is abstractly just another sink. One&je!A? blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum5j}P~mp/&je">would hope that sending events over the network would be easy,&je"A blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je" blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum*>P~mp/&je(C(`-r`). These two extra ports would cause contention problems with&je(B # blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumTHP~mp/&je(Ethe `_example_` node already running on the machine. (We'll describe&je(B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum3P~mp/&je(,these options more in the advanced section.)&je(B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum,P~mp/&je(&je(B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je(----&je(BA blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je(#$ bin/flume node -n collector -r -s&je(B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum*UP~mp/&je(----&je(BL blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je)&je)B* blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je)@When we look at the master's again, we should eventually see two&je)BS blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum죣P~mp/&je)#nodes: `_example_` and `collector`.&je)B1 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&P~mp/&je)&je)Bi blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je)CLet's configure `collector` to take on the role of a collector, and&je)BKs blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je)Flets setup `_example_` to send data from the console to the collector.&je)B* blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumpXP~mp/&je)EWe'll use the aggregated multiple configuration form. The agent uses&je)B! blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je)@the `agentSink`, a high reliability network sink. The collector&je)B" blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumұcP~mp/&je)Fnode's source is configured to be a `collectorSource`, and its sink is&je)B$ޣ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum꜄P~mp/&je*'configured to be the `console` for now.&je*B&~ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum^$mP~mp/&je*&je*B' blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je*----&je*B) blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je*example : console | agentSink ;&je*B* blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum:P~mp/&je*.collector : collectorSource(35863) | console ;&je*B,X blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumoGuP~mp/&je*----&je*B-d blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je*&je*B/ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je*FNow when you type lines in `_example_`'s console, events are forwarded&je*B0L blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum P~mp/&je*Cto the collector. Currently, there is a bit of latency (15s or so)&je*B2 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je+>before the forwarded message shows up on `collector`. This is&je+B7 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum P~mp/&je+Dactually a configurable setting whose default is set to a value that&je+B9) blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum`83P~mp/&je+Eis amenable for high event throughputs. We'll talk about how to tune&je+B;lF blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumPܺP~mp/&je+Flume in subsequent sections.&je+B=.l blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je+&je+B> blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je+BGreat! You have made a event flow from an agent to the collector.&je+B@ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumĞ=P~mp/&je+&je+BBbQ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je,BA more interesting setup would have the agent tailing a local file&je,BDe blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumH/P~mp/&je,C(using the `tail` source) or listening for local syslog data (using&je,BF1 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumKv3P~mp/&je,?the `syslogTcp` or `syslogUdp` sources and modifying the syslog&je,BG blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumtVP~mp/&je,>daemon's configuration). Instead of writing to a console, the&je,BI blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumSP~mp/&je,?collector would write to a `collectorSink`, a smarter sink that&je,BK blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum0P~mp/&je,7periodically rotates files and writes to disk or hdfs. &je,BM= blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumsMP~mp/&je,&je,BNj' blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je,ABelow would be the configuration for a agent listening for syslog&je,BP3e blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum %P~mp/&je,Fmessages that forwards to a collector writing files to local directory&je,BR+ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum{P~mp/&je-`/tmp/flume/collected/`.&je-BS blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumaC(P~mp/&je-&je-BT blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je-----&je-BVU blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je-&example : syslogTcp(514) | agentSink ;&je-BW blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum;P~mp/&je-Kcollector : collectorSource | collectorSink("file:///tmp/flume/collected");&je-BY blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumXP~mp/&je-----&je-B[ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je-&je-B\P= blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je-BHere's a slightly modified one that would write to an hdfs cluster&je-B^< blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum 3:P~mp/&je-2(assuming the hdfs namenode is called `namenode`):&je-B` blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum؍P~mp/&je-&je-Ba? blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je.----&je.Bb blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je.&example : syslogTcp(514) | agentSink ;&je.Bd. blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum;P~mp/&je.Tcollector : collectorSource | collectorSink("hdfs://namenode/user/flume/collected");&je.BfC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumaP~mp/&je.----&je.Bg blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je.&je.Bh blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je.=== Section summary&je.BjK blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumRYLP~mp/&je.&je.Bks blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je.=We have now walked through starting a master, some nodes, and&je.BmV blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumɼXP~mp/&je.Fconfiguring the nodes to forward message from one node to another. We&je.Bo' blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumgy.P~mp/&je.@also have used some roles for sources and sinks to connect nodes&je.BpZ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP|`4P~mp/&je/Ftogether. You now have an understanding of the basics of setting up a&je/Br> blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum!7:P~mp/&je/Fset of Flume nodes. Here's the new sources and sinks we introduced in&je/Bt blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumkqP~mp/&je/this subsection.&je/Bv9 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum P~mp/&je/&je/Bw` blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je/B// These tables are kinda gross, had to use asciidoc v8.2.7 style.&je/ByP blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksuma%P~mp/&je/.Flume's Tiered Event Sources&je/Bz* blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum{P~mp/&je/ [grid="all"]&je/B{ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumX P~mp/&je/x`------------------------------------`----------------------------------------------------------------------------------&je/B~ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum P~mp/&je/0name description&je/BR blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumvHP~mp/&je0`------------------------------------------------------------------------------------------------&je0B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum%P~mp/V&je0`collectorSource[(_port_)]` Collector source. Listens for data from agentSinks forwarding to port `_port_`. If port is not specified, the node default collector tcp port, 35863.&je0B1 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumMzP~mp/&je0`------------------------------------------------------------------------------------------------&je0B! blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum%P~mp/&je0&je0BVn blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je0.Flume's Tiered Event Sinks&je0B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumwP~mp/&je0 [grid="all"]&je0B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumX P~mp/&je0x`------------------------------------`----------------------------------------------------------------------------------&je0B84 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum P~mp/&je00name description&je0B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumvHP~mp/&je1`------------------------------------------------------------------------------------------------&je1B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum%P~mp/\&je1`agentSink(["_collector_"[, port]])` Reliable agent sink. Forward events to `collector` over tcp port `port`. If collector or port are not specified, they will default to localhost and 35863.&je1By blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumͿ6P~mp/!&je1`collectorSink("_dfsdir_")` Collector sink. Rotating file write to dfs path such as `hdfs://namenode/dir` or `file:///dir` &je1B4 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumxE.0P~mp/&je1`------------------------------------------------------------------------------------------------&je1B1 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum%P~mp/&je1&je1B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je1== Fully-Distributed Mode&je1B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je1&je1B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je1=The main goal for Flume is to collect logs and data from many&je1BUz blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumZtP~mp/&je1?different hosts and to scale and intelligently handle different&je1B7 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum{ 6P~mp/&je2 cluster and network topologies. &je2Bf blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum9P~mp/&je2&je2B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je2CTo install Flume on your cluster, the following steps must be done.&je2B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumr,P~mp/&je2&je2B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je2).Steps to deploy Flume across the cluster&je2Bq blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumVuP~mp/&je23* the Flume files need to be copied to each machine&je2Bc blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumEhP~mp/&je2 * select a node to be the master&je2B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum zFP~mp/&je2D* modify a static configuration file to use site specific properties&je2BN blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumbP~mp/&je2/* start the Flume master node on *one* machine &je2B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumo {5P~mp/&je2)* start the Flume node on *each* machine.&je2Bf blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum;P~mp/&je3&je3B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je3DWe will start by explaining how to manually configure the properties&je3Bi blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum{ TP~mp/&je3Ffile. Then we describe how to deploy Flume using a helper script that&je3Bz blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum|dP~mp/&je3Fwe provide (`deploy.py`) to help you deploy Flume across nodes on your&je3B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je3network.&je3B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumcbP~mp/&je3&je3B9 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je3=== Static configuration files&je3B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je3&je3B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je3BIn the previous sections, we used Flume out-of-the-box on a single&je3B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumgaP~mp/&je3Fmachine. The defaults are setup so that nodes to automatically search&je3BgG blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumqP~mp/&je4Dfor their master on `localhost` on a known port. For Flume nodes to&je4Bn- blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je4Dfind the master in a fully distributed setup, we need a place to set&je4BK blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumQ`P~mp/&je4>site specific static configuration settings. (This is why the&je4BM blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum2P~mp/&je4.quickstart had some at the time odd messages).&je4B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumpƫP~mp/&je4&je4B7z blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je4FSite specific options for Flume nodes and masters are configured by by&je4B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je4Eproperties set in the `conf/flume-site.xml` file. If this file is not&je4B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumr&je5Ba blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum'[%&je5B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumwgMP~mp/&je5&je5B0A blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je6&je6Bܒ_ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumK%P~mp/&je6 &je6B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum=P~mp/&je6) flume.config.master.addr&je6Bf( blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum)P~mp/&je6 master&je6B0 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumLhP~mp/&je6P This is the address or dns name for the config servers / status&je6B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum 3P~mp/&je6 server (http)&je6B| blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum-SP~mp/&je6 &je6B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum$P~mp/&je6 &je6Bd blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumNP~mp/&je7&je7Bt blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumNP~mp/&je7----&je7BY blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je7&je7B3 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je7EWhen we are using agent/collector roles, we can use the configuration&je7B blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum]RP~mp/&je7~to setup the default hosts used as collector. We can do this by adding the following properties to our flume-site.xml file. &je7B&% blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je8&je8BS blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je8----&je8B\ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je8...&je8B` blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumCP~mp/&je8 &je8B7 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum=P~mp/&je8* flume.colletor.event.host&je8CB blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum/P~mp/&je8 collector&je8C blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum[P~mp/&je9> This is the host name of the default "remote"&je9C u blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumLP~mp/&je9 collector. &je9C blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum'P~mp/&je9 &je9C ,6 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum$P~mp/&je9 &je9Ck blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumNP~mp/&je9&je9C blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je9 &je9C blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum=P~mp/&je9% flume.collector.port&je9C blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum)P~mp/&je9 35863&je9C/ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumhJoP~mp/&je9D This default tcp port that the collector listens to&je9C blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumѸ18P~mp/&je90 in order to receive events it is collecting.&je9C blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum4|GP~mp/&je: &je:C&" blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum$P~mp/&je: &je:C blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumNP~mp/&je:&je:C blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je:...&je:CGM blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumCP~mp/&je:----&je:CT blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je:&je:C blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je:AThis will make the `agentSink` with no arguments default to using&je:C!N blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum7P~mp/&je:@`flume.collector.event.host` and `flume.colletor.port` for their&je:C#L- blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum,˾SP~mp/&je:default target and port.&je:C$ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumTQP~mp/&je:&je:C% blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je:=== Automated Deployment&je:C'e) blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumdPP~mp/&je;&je;C( blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je;&je;C)F blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je;:There are two ways you can currently install Flume on your&je;C+x blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumv0+P~mp/&je;Ccluster. The first is to download the flume archive to each machine&je;C-;a blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumLP~mp/&je;Cindividually, manually configure various directories and then start&je;C.A blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumZbP~mp/&je;Flume from the command line.&je;C0 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumAzP~mp/&je;&je;C1 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je;>Alternately, you can set use our deployment script to install,&je;C3jD blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum:iP~mp/&je;=configure and start Flume nodes in your cluster. This script&je;C5 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumnP~mp/&je;!introduces some new dependencies.&je;C6D blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksump2P~mp/&je<&je<C7 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je<#.Automated Deployment Prerequisites&je<C9U9 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumOP~mp/&je<****&je<C:= blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumoVP~mp/&je<"* Java 1.6.x (tested on JRE 1.6.x)&je<C< blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je<* Python 2.4.x &je<C=~ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je<R* ssh access to target machines (preferably "password-less" ssh-agent based login)&je<C?S blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumKP~mp/&je<****&je<C@ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumoVP~mp/&je<&je<CA blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je<@`deploy.py` is a Python driver for the `scp` / configure / start&je<CC}r blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumK͚P~mp/&je<Cprocess required to get Flume onto the machines in your cluster. In&je<CEP blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumjP~mp/&je=Dorder to use it successfully you must have ssh credentials for every&je=CGG blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum>3̟P~mp/&je=Cmachine you wish to deploy on. Installation will be much easier if&je=CI` blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum-YP~mp/&je=7this is configured to use password-less authentication.&je=CJ@ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumFP~mp/&je=&je=CL blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je=ETo use `deploy.py`, you can either specify a single deployment target&je=CMS blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumSnP~mp/&je=Afrom the command line, or you can create a comma-separated-values&je=COJ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum7P~mp/&je=F(CSV) file which contains details of multiple targets. The script then&je=CQd blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumW'VP~mp/&je=Ccopies a Flume tarball to all the deployment targets, followed by a&je=CS blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum:XP~mp/&je=Ccopy of `flume-conf.xml` which contains site-specific configuration&je=CT blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum^ʵ?P~mp/&je>Binformation. Finally, it copies `flume-site.xml` which tells Flume&je>CV blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumEXP~mp/&je>where to find the master node.&je>CX& blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum oP~mp/&je>&je>CYL blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je>>`deploy.py` requires several pieces of information per target:&je>C[X blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum3P~mp/&je>&je>C\[ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je>:* *Host* is the name of the node, used as the node for ssh&je>C^ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumnڱP~mp/&je>j* *Directory* is the parent directory into which Flume is copied and uncompressed (defaults to /tmp/flume)&je>C`9 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumRP~mp/&je>S* *User* is the username used to connect to the node (defaults to the current user)&je>Ca־ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum'P~mp/7&je>* *Nodetype* is a comma-separated list of the kind of Flume nodes to run - these can be 'master' or 'node' (this defaults to 'node' - more about this later)&je>Cd.e blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumUP~mp/&je?&je?CeVg blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je?&je?Cf blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je?/////&je?Cgʳ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum>IP~mp/&je?&je?Ci blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je?#// hide this complication for now.&je?Cj blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum[:rpP~mp/&je?] [-w ] [-m ] [-u user] [-j flume-archive] [-c config-file]&je?Co blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumώIcP~mp/&je?&je?Cp blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je?VCommand line options are defaults that can be overridden by settings in the targetfile&je?Cr blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum۲P~mp/&je?&je?Ct blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je@Options:&je@Cu[ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumtwP~mp/&je@7 -h, --help show this help message and exit&je@Cws blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumPhP~mp/&je@ -t TARGETS, --targets=TARGETS&je@Cx blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum`P~mp/&je@@ CSV file to read deployment targets from&je@CzbF blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumQP~mp/&je@2 -w HOST, --host=HOST A single host to deploy to&je@C| blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumJpP~mp/&je@ -m MASTER, --master=MASTER&je@C}F blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je@3 The name of the master node&je@C@ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumm"=P~mp/&je@9 -x, --nostart If set, Flume will not be started&je@C1 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumoղP~mp/&je@6 -n NAME, --name=NAME A text identifier for the node&je@CF blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum2LP~mp/&jeA" -p NODETYPE, --nodetype=NODETYPE&jeAC|O blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumȫdP~mp/&jeAH Comma-separated list of nodetypes to run on this&jeACuL blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeA9 target. Options are node, master.&jeAC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum,P~mp/&jeA% -d DIRECTORY, --directory=DIRECTORY&jeAC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum^{aP~mp/&jeA7 Directory to install Flume into&jeACU blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum6P~mp/&jeA: -u USER, --user=USER Which user to connect to as&jeAC| blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumZ P~mp/&jeA -j ARCHIVE, --archive=ARCHIVE&jeAC@ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumIGP~mp/&jeAA The location of the Flume tarball to copy&jeAC5 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumFXP~mp/&jeAE -c CONF, --conf=CONF The location of your Flume configuration file&jeAC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum][P~mp/&jeB= -y, --nocopy If set, don't copy the archive across&jeBC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumJP~mp/&jeB&jeBCV blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeB&jeBC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeB----&jeBCU blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&jeB////&jeBCa blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumSdP~mp/&jeB&jeBC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeB&jeBC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeB>To run `deploy.py`, make sure that it is executable (`chmod +x&jeBC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum'P~mp/&jeBEdeploy.py`), or run it with your system's Python interpreter (`python&jeBCb blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumhiLP~mp/&jeB deploy.py`).&jeBC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumO#I>P~mp/&jeB&jeBC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeC!==== Deploying to a single target&jeCC! blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum(NP~mp/&jeC&jeCCj blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeC?You can deploy Flume to a single machine using only the command&jeCCi blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumcP~mp/&jeCCline. Here's an example that start a Flume node and Flume master on&jeCC* blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum[P~mp/&jeC/target, installed in dir `/tmp/flume-install` :&jeCC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum74P~mp/&jeC&jeCC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeC----&jeCC: blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&jeCZpython deploy.py -w target -p node,master -d /tmp/flume-install/ -j /path/to/flume.tar.gz &jeCCL blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum:5P~mp/&jeC----&jeCC^P blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&jeC&jeCC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeC////&jeCCo blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumSdP~mp/&jeD0// just have the options assuming defaults work.&jeDCjz blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumizP~mp/&jeD----&jeDC9 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&jeDhpython deploy.py -w hostname -u username -p node,master -d /tmp/flume-install/ -j /path/to/flume.tar.gz &jeDC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum;P~mp/&jeD----&jeDC; blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&jeD/////&jeDCN blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum>IP~mp/&jeD"==== Deploying to multiple targets&jeDC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumcP~mp/&jeD&jeDCz blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeDETo deploy Flume to several machines at once, it's easiest to create a&jeDC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum`5P~mp/&jeDHtargetfile (specified by `-t __` to `deploy.py`) containing an&jeDC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumA P~mp/&jeD(entry for each machine Flume can run on.&jeDC.u blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumRʱP~mp/&jeE&jeECX blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeEWEach deployment target is represented by a single line in the target file. For example:&jeECB blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum?eP~mp/&jeE&jeEC| blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeE----&jeEC4 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&jeE)host=localhost,user=henry,nodetype=master&jeECT? blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum,P~mp/&jeE----&jeECș blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&jeE&jeEC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeEDspecifies a master deployment target of henry@localhost. Unspecified&jeECˊ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumS#>P~mp/&jeE&jeECi blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeE?These defaults can all be overridden from the command line. For&jeEC+ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumnQ#P~mp/&jeFBexample, specifying `-d /home/henry/flume` will change the default&jeFCڗ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum;FHP~mp/&jeFFdirectory for every target node to /home/henry/flume. The defaults can&jeFCӓ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumeP~mp/&jeF*also be overwritten on a per-target basis.&jeFC+S blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum.`P~mp/&jeF&jeFCW blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeF ==== Verifying your installation&jeFCͻ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumaP~mp/&jeF&jeFC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeF3All being well, each Flume node should have started&jeFCگn blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumssP~mp/&jeFCsuccessfully. `deploy.py` will report any problems that it detects,&jeFCg blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum'*RP~mp/&jeF4but it doesn't watch the Flume processes themselves.&jeFC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum\P~mp/&jeF&jeFCAE blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeG4In order to check up on your deployment, navigate to&jeGC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumVP~mp/&jeGG`http://_mastername_:35871/flumemaster.jsp` where `_mastername_` is the&jeGC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum0P~mp/&jeG9name of the node that you designated as the master during&jeGC\W blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&P~mp/&jeGBdeployment. You should see a simple web page with a table of nodes&jeGC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumVXP~mp/&jeGthat the master has heard from.&jeGCm blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum7P~mp/&jeG&jeGCň blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeG=== Section summary&jeGC' blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumRYLP~mp/&jeG&jeGCP3 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeG=Congrats! We have stepped through installing, deploying, and&jeGCp blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumv\"P~mp/&jeHDconfiguring a set of Flume nodes in a fully distributed setting. You&jeHCO blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum}kP~mp/&jeH5should be able to collect streams of logs with Flume!&jeHCp. blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeH&jeHCG blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeHBThe subsequent sections talk about some troubleshooting issues and&jeHCV blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumIOP~mp/&jeHBalso goes into more detail about the Flume configuration language.&jeHC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeH=The language exposes many expressive lower-level features and&jeHC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum!P~mp/&jeH*experimental features of the Flume system.&jeHC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumx]P~mp/&jeH&jeHC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeH== Troubleshooting&jeHC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumg+P~mp/&jeH&jeHC8s blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeH=== Writing to Hadoop's HDFS&jeHC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum P~mp/&jeI&jeIC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeIBCurrently, there are constraints when we want to write to HDFS. A&jeID blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum̜P~mp/&jeIFFlume node can only write to one version of Hadoop. Although Hadoop's&jeID\ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumۊXP~mp/&jeIDHDFS API has been fairly stable, HDFS clients are only guaranteed to&jeID blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumHP~mp/&jeI?be wire compatible with the same major version of HDFS. In our&jeIDC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum} TP~mp/&jeI@testing used a Hadoop hdfs 0.20.x and hdfs 0.18.x. They are API&jeID blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum0g0P~mp/&jeIEcompatible so all that is necessary to switch versions is to swap out&jeID Ow blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumxP~mp/&jeI@the Hadoop jar and restart the node that will write to the other&jeID  blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum@(P~mp/&jeIHadoop version.&jeID b blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumONP~mp/&jeJ&jeJD blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeJDWe include a 0.20.x Hadoop jar in the Flume distribution. One still&jeJDC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumaN'P~mp/&jeJBneeds to configure this instance of Hadoop so that it talks to the&jeJD5 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum-P~mp/&jeJAcorrect hdfs namenode. One configures the Hadoop client settings&jeJD blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeJ?(such as pointers to the name node) the same way as with Hadoop&jeJDk blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeJAdatanodes or worker nodes -- modify and use `conf/core-site.xml`.&jeJDG blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumT81P~mp/&jeJ&jeJD blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeJ+=== Node failure due to out of file handles&jeJD$ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumhP~mp/&je&jeHFދ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeAThere are two limits in Linux -- the max number of allowable open&jeHI blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumBP~mp/&jeAfiles (328838), and max number of allowable open files for a user&jeHJ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumTgMP~mp/&jeF(default 1024). Sockets are file handles so this limits the number of&jeHL blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeopen tcp connections available.&jeHN  blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum/P~mp/&je&jeHO2 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je hard nofile 10000&jeHT blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum'P~mp/&je----&jeHV blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je&jeHW( blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeFThe user should also have the following line to a `~/.bash_profile` to&jeHX? blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumFxP~mp/&je"raise the limit to the hard limit.&jeHZ^ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksuma&P~mp/&je&jeH[7 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je----&jeH\i blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&jeulimit -n 10000&jeH^NX blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum"9P~mp/&je----&jeH_ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je&jeH` blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je<=== Failures due when using Disk Failover or Write Ahead Log&jeHbZ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum!cCP~mp/&je&jeHc blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeEFlume currently relies on the file system for our logging mechanisms.&jeHe2 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum\P~mp/&jeAYou must make sure that the user running flume has permissions to&jeHf blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumBP~mp/&je)write to the specified logging directory.&jeHhqb blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumd `nP~mp/&je&jeHj5 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je?=== Sending configurations to the master via CLI (experimental)&jeHk blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumM6KP~mp/&je&jeHm blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je .Prequisites&jeHnZ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumPP~mp/&je****&jeHo[ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumoVP~mp/&je* `curl`&jeHp blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum!3P~mp/&je****&jeHrb blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumoVP~mp/&je&jeHs@ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeDTo send configuration information to the master, one can use the web&jeHtZ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumSP~mp/&jeDinterface, or send them to the interface via the command line. This&jeHv: blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumoP~mp/&je*example requires curl to to work properly.&jeHxJt blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumg=P~mp/&je&jeHyn blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je----&jeHz blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je?$ bin/master-client spec 'node : src | agentSink("collector");'&jeH|^ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumr&>P~mp/&je----&jeH} blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je &jeH~_ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumlEP~mp/&je== Advanced usage&jeH/6 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je&jeHU blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeDUsing the Flume node roles (collector, agent) is the simplest method&jeHF blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumxFP~mp/&jeFto get up and running with Flume. Under the covers, these sources and&jeH< blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumIP~mp/&jeDsinks (collectorSink, collectorSource, and agentSource) are actually&jeH blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumqiP~mp/&je?composed of primitive sinks that have been augmented with *role&jeH* blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumU*"P~mp/&jeCdefaults*, *special sinks* and *sink decorators*. These components&jeHe blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumtP~mp/&jeBmake configuration more flexible, but also make configuration more&jeH blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum*P~mp/&jeFcomplicated. The combination of special sinks and decorators expose a&jeHU blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum^P~mp/&jeBlot of details of the underlying mechanisms but are a powerful and&jeHI blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumLeP~mp/&je'expressive way to encode rich behavior.&jeH; blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumܩP~mp/&je&jeH blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je=== Role Defaults&jeH- blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je&jeH= blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je8Agents and collectors roles have defaults baked into the&jeHj blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumE=P~mp/&jeC`conf/flume-site.xml` file. Look at the `conf/flume-conf.xml` file&jeH blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeFfor properties prefixed with `flume.agent.\*` and `flume.collector.\*`&jeH] blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumxGP~mp/&je.for descriptions of the configuration options.&jeH blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum6fP~mp/&je&jeH c blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je(=== Special sinks: Fan out and Fail over&jeH blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumDtP~mp/&je&jeH blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeBTwo special sinks are FanOutSinks and FailoverSinks. Fanout sinks&jeHi| blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum=T P~mp/&je@send any incoming events to all of the sinks specified to be its&jeH/ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeCchildren. These can be used for data replication or for processing&jeHl blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumUaQP~mp/&je-data off of the main reliable data flow path.&jeH blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum,P~mp/&je&jeH blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je*Currently the syntax for a FanoutSink is :&jeH. blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumߵP~mp/&je&jeH`m blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je----&jeH blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je[ console, collectorSink ]&jeH blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum(AP~mp/&je----&jeHG blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je&jeH8 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je&jeHl blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumIGP~mp/&je----&jeH+ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je&jeH, blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeESo, we could configure node "agent1" to have a failover to collector2&jeH: blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumKjP~mp/&jeCif collector1 fails (for example, if the connection to `collector1`&jeH blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumWP~mp/&je2goes down or if `collector1`'s hdfs becomes full):&jeH"S blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum|)zP~mp/&je&jeHF= blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je----&jeHƌB blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&jeIagent1 : source | < agentSink("collector1") ? agentSink("collector2") > ;&jeHD blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum_UP~mp/&je----&jeHɐ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je&jeHО blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeFThese can be composed to have even richer behavior. For example, this&jeH̆} blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum_sP~mp/&je>sink outputs to the console and has a failover collector node.&jeH:t blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumxP~mp/&je&jeH` blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je----&jeHЛ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&jeJ[ console, < collectorSink("collector1") ? collectorSink("collector2") > ]&jeHc blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum?2P~mp/&je----&jeHӡ] blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je&jeHļ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je=== Introducing Sink decorators&jeH= blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum P~mp/&je&jeH` blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeFFan out and failover affect where messages go in the system but do not&jeHG blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum-CVP~mp/&jeDmodify the messages themselves. To augment or filter events as they&jeH blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum-A~P~mp/&jestreams that pass through them. For example, they are used to&jeM blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum9KP~mp/&je>increase reliability via write ahead logging, increase network&jeM blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumEP~mp/&jeEthroughput via batching/compression, sampling, benchmarking, and even&jeMb blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumCPP~mp/&jelightweight analytics.&jeM# blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum#/?P~mp/&je&jeM blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je:Let's start with a simple sampling example. Here we use a&jeM blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum[lP~mp/&jeDintervalSampler and configure to send every 10th element from source&jeMc blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum/jfQP~mp/&je"source" to the sink "sink":&jeM\ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum^cP~mp/&je&jeM blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je----&jeM( blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je4flumenode: source | { intervalSampler(10) -> sink };&jeM blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumOdP~mp/&je----&jeM blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je&jeMC blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je9Here's an example that batches every 100 events together.&jeM blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum7E*P~mp/&je&jeM! blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je----&jeM#% blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je+flumenode: source | { batch(100) -> sink };&jeM$ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumBP~mp/&je----&jeM% blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je&jeM' blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jeDLike fanout and failover, decorators are also composable. Here is an&jeM(ع blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum:oAP~mp/&jeCexample that creates batches of 100 events and then compresses them&jeM* blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum|kP~mp/&je%before shipping them off to the sink:&jeM, blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumcP~mp/&je&jeM-8 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je----&jeM.hI blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je8flumenode: source | { batch(100) -> { gzip -> sink } };&jeM0 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumQP~mp/&je----&jeM15 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum&|{P~mp/&je&jeM2T blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je=== Flume Source Catalog&jeM3 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumR:P~mp/&je&jeM41 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je&jeM5 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je.Flume's Tiered Event Sources&jeM7I blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum{P~mp/&je [grid="all"]&jeM8< blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumX P~mp/&jex`------------------------------------`----------------------------------------------------------------------------------&jeM: blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum P~mp/&je0name description&jeM<& blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumvHP~mp/&je`------------------------------------------------------------------------------------------------&jeM= blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum%P~mp/V&je`collectorSource[(_port_)]` Collector source. Listens for data from agentSinks forwarding to port `_port_`. If port is not specified, the node default collector tcp port, 35863.&jeM@ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumMzP~mp/&je`------------------------------------------------------------------------------------------------&jeMBm blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum%P~mp/&je&jeMC+ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je.Flume's Basic Sources&jeMD blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumB:P~mp/&je [grid="all"]&jeMF>e blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumX P~mp/&jex`------------------------------------`----------------------------------------------------------------------------------&jeMH?J blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum P~mp/&je0name description&jeMI blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumvHP~mp/&je`------------------------------------------------------------------------------------------------&jeMK blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum%P~mp/&je_`console` Stdin console (doesn't work because of watchdog right now)&jeMMJ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumHވP~mp/&jeM`tsource(port)` Thrift RPC base source on tcp port port.&jeMO~s blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumisP~mp/&jeV`text("filename")` One shot text file source. One line is one event&jeMQF blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum\RP~mp/O&je`tail("filename")` Like text but like unix's tail utility, One line is one event. Stays open for more data, and follows filename (as opposed to file if rotated &jeMS blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum`AP~mp/&jeI`seqfile("filename")` Hadoop sequence file formatted file.&jeMU blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumoǷP~mp/&je\`syslogUdp(port)` Syslog over udp port, port. This is syslog compatible.&jeMWi6 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumEOP~mp/&je^`syslogTcp(port)` Syslog over tcp port, port. This is syslog-ng compatible.&jeMYH blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksuml+yP~mp/&jeu`synth(msg_count,msg_size)` A source that synthetically generates msg_count random messages of size msg_size&jeM[E blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum|P~mp/G&je`nonlsynth(msg_count,msg_size)` A source that synthetically generates msg_count random messages of size msg_size. This converts all `\'\n\'` chars into `\' \'` chars.&jeM]` blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum1P~mp/&je&jeM^ʾ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je`------------------------------------------------------------------------------------------------&jeM`[ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum%P~mp/&je&jeMa- blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je////&jeMb blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumSdP~mp/&je// Hide these for now. &jeMdk blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumo:!P~mp/&je4.Flume's Experimental/internal sources (unsupported)&jeMe blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum30P~mp/&je [grid="all"]&jeMgBG blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumX P~mp/&jex`------------------------------------`----------------------------------------------------------------------------------&jeMi@ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum P~mp/&je0name description&jeMjÓ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumvHP~mp/&je`------------------------------------------------------------------------------------------------&jeMl blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum%P~mp/=&je`tpriosource(port)` Thrift RPC base source on tcp port, port, prioritized by priority (higher priority first) and then age (older messages first)&jeMn blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumq blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumRJP~mp/&je&jeMG blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je.Flume's Sink Decorators&jeM✡ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum P~mp/&je [grid="all"]&jeMՍ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumX P~mp/&jex`------------------------------------`----------------------------------------------------------------------------------&jeM blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum P~mp/&je0name description&jeMcf blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumvHP~mp/&je`------------------------------------------------------------------------------------------------&jeM: blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum%P~mp/|&je`writeAhead(...)` Write ahead decorator. Provides durability by writing events to disk before sending them. This can be used as a buffering mechanism -- receive and send are decoupled in different threads&jeMm blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum=LP~mp/&jeU`intervalSampler(N)` Interval sampler. Every Nth event gets forwarded&jeMg blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum"P~mp/&jes`probSampler(p)` Probability sampler. Every event has a probability p chance of being forwarded&jeM blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum9P~mp/M&je`reservoirSampler(K)` Reservoir sampler. When flushed, exactly K elements are forwarded. All events that pass through have the same probability of being selected&jeMe blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumBaP~mp/b&je`insistentOpen` Insistent Open. If open fails, this exponentially backs of and then retries the open. Note that this synchronously blocks the open call until the open succeeds.&jeMt blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum6P~mp/$&je`stubbornAppend` Stubborn Append. If append fails, this closes the target, opens it, and then tries to append again.&jeM] blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum4 P~mp/&jeW`batch(n)` buffers n events and then send one aggregate event&jeM blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum[P~mp/<&je`unbatch` takes a aggregate event, splits and then forwards its original events. If an event is not an aggregate it is just forwarded&jeM9Y blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum7P~mp/&je@`gzip` gzips a serialized event. &jeMO blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumVϞ%P~mp/&jev`gunzip` gunzip's a gzip'ed event. If the event is not a gzip event, it is just forwarded&jeM blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum}P~mp/&je`------------------------------------------------------------------------------------------------&jeN. blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum%P~mp/&je&jeNأ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je=.Flume's Experimental/Debugging Sink Decorators (unsupported)&jeNo" blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum_ڼP~mp/&je [grid="all"]&jeNl blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumX P~mp/&jex`------------------------------------`----------------------------------------------------------------------------------&jeN blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum P~mp/&je0name description&jeND blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumvHP~mp/&je`------------------------------------------------------------------------------------------------&jeN ,) blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum%P~mp/&jej`flakeyAppend(p)` each append has a probability p of failing and throwing an exception.&jeN  blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum0LP~mp/&jeW`intervalFlakeyAppend(n)` every nth element fails by throwing an exception &jeN blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumxP~mp/&jeu`inmem` buffers events in memory and then flushes all on close. Good for benchmarking. &jeNx blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumTQ4P~mp/&jev`benchinject` injects an extra benchmark begin event on open and a benchmark end event on close&jeNu blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumsP~mp/&je`benchreport("name")` starts a benchmark when it receives a benchmark begin event and stops a benchmark when it receives a corresponding benchmark end event. Benchmark events are not forwarded, but all other events are&jeN blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum8P~mp/&jeL`mult(N)` takes each message and sends it N times&jeNd blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumƽp5P~mp/&jeG`delay(ms)` adds a delay to the event pipeline&jeN8 blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumX- BP~mp/&jeM`latch(...)` adds a latch that blocks until triggered&jeNP blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum GqP~mp/&je`------------------------------------------------------------------------------------------------&jeNĦ blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksum%P~mp/&je&jeN blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&je&jeN blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypemsg AckChecksumP~mp/&jf.&jf.Q blitzwingAckTag/writeahead.00000000.20100204-015814430-0800.seqAckTypeend AckChecksum&;P~mp/