////////////////////
Licensed to Cloudera, Inc. under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  Cloudera, Inc. licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
////////////////////

== Trying out Flume sources and sinks

First this section will describe some basic tips for testing sources,
sinks and logical nodes.

=== Testing sources

Testing sources is a straightforward process.  The +flume+ script has
a +dump+ command that allows you test data source configuration by
displaying data at the console.

----
flume dump source
----

NOTE: +source+ needs to be a single command line argument, so you may
need to add 'quotes' or "quotes" to the argument if it has quotes or
parenthesis in them.  Using single quotes allow you to use unescaped
double quotes in the configuration argument. (ex: `'text("/tmp/foo")'`
or `"text(\"/tmp/foo\")"`).

Here are some simple examples to try:

----
$ flume dump console
$ flume dump 'text("/path/to/file")'
$ flume dump 'file("/path/to/file")'
$ flume dump syslogTcp
$ flume dump 'syslogUdp(5140)'
$ flume dump 'tail("/path/to/file")'
$ flume dump 'tailDir("path/to/dir", "fileregex")'
$ flume dump 'rpcSource(12346)'
----

Under the covers, this dump command is actually running the +flume
node_nowatch+ command with some extra command line parameters.

----
flume node_nowatch -1 -s -n dump -c "dump: $1 | console;"
----

Here's a summary of what the options mean.

[horizontal]
+-1+ :: one shot execution.  This makes the node instance not use the
heartbeating mechanism to get a config.

+-s+ :: starts the Flume node without starting the node's http status
web server.  If the status web server is started, a Flume node's
status server will keep the process alive even if in one-shot mode.
If the -s flag is specified along with one-shot mode (-1), the Flume
node will exit after all logical nodes complete.

+-c "node:src|snk;"+ :: Starts the node with the given configuration
 definition.  NOTE: If not using -1, this will be invalidated
 upon the first heartbeat to the master.
+-n node+ :: gives the node the physical name node.

You can get info on all of the Flume node commands by using this command:

----
$ flume node -h
----

=== Testing sinks and decorators

Now that you can test sources, there is only one more step necessary
to test arbitrary sinks and decorators from the command line.  A sink
requires data to consume so some common sources used to generate test
data include synthetic datasets (+asciisynth+), the console
(+console+), or files (+text+).

For example, you can use a synthetic source to generate 1000 events,
each of 100 "random" bytes data to the console. You could use a text
source to read a file like /etc/services, or you could use the console
as a source and interactively enter lines of text as events:

----
$ flume node_nowatch -1 -s -n dump -c 'dump: asciisynth(1000,100) | console;'
$ flume node_nowatch -1 -s -n dump -c 'dump: text("/etc/services") | console;'
$ flume node_nowatch -1 -s -n dump -c 'dump: console | console;'
----

You can also use decorators on the sinks you specify.  For example,
you could rate limit the amount of data that pushed from the source to
sink by inserting a delay decorator.  In this case, the 'delay'
decorator waits 100ms before sending each synthesized event to the
console.

----
$ flume node_nowatch -1 -s -n dump -c 'dump: asciisynth(1000,100) | { delay(100) => console};'
----

Using the command line, you can send events via the direct best-effort
(BE) or disk-failover (DFO) agents.  The example below uses the
+console+ source so you can interactively generate data to send in BE
and DFO mode to collectors.

NOTE: Flume node_nowatch must be used when piping data in to a Flume
node's console source.  The watchdog program does not forward stdin.

----
$ flume node_nowatch -1 -s -n dump -c 'dump: console | agentBESink("collectorHost");'
$ flume node_nowatch -1 -s -n dump -c 'dump: console | agentDFOSink("collectorHost");'
----

WARNING: Since these nodes are executed with configurations entered at
the command line and never contact the master, they cannot use the
automatic chains or logical node translations.  Currently, the
acknowledgements used in E2E mode go through the master piggy-backed
on node-to-master heartbeats.  Since this mode does not heartbeat, E2E
mode should not be used.

Console sources are useful because we can pipe data into Flume
directly.  The next example pipes data from a program into Flume which
then delivers it.

----
$ <external process> | flume node_nowatch -1 -s -n foo -c 'foo:console|agentBESink("collector");'
----

Ideally, you could write data to a named pipe and just have Flume read
data from a named pipe using +text+ or +tail+.  Unfortunately, this
version of Flume's +text+ and +tail+ are not currently compatible with
named pipes in a Linux environment.  However, you could pipe data to a
Flume node listening on the stdin console:

----
$ tail -f namedpipe | flume node_nowatch -1 -s -n foo -c 'foo:console|agentBESink;'
----

Or you can use the exec source to get its output data:

----
$ flume node_nowatch -1 -s -n bar -c 'bar:exec("cat pipe")|agentBESink;' 
----

=== Monitoring nodes

While outputting data to a console or to a logfile is an effective way
to verify data transmission, Flume nodes provide a way to monitor the
state of sources and sinks.  This can be done by looking at the node's
report page.  By default, you can navigate your web browser to the
nodes TCP 35862 port.  (http://node:35862/)

This page shows counter information about all of the logical nodes on
the physical node as well as some basic machine metric information.

TIP: If you have multiple physical nodes on a single machine, there
may be a port conflict.  If you have the auto-find port option on
(+flume.node.http.autofindport+), the physical node will increment the
port number until it finds a free port it can bind to.