Operationalizing Falcon

Overview

Apache Falcon provides several tools for operating a Falcon deployment: alerts for unrecoverable errors, audits of user actions, metrics, and notifications. Each is described below.

Monitoring

Falcon monitors various events by capturing a metric for each of them. These metrics can then be used to track the performance and health of the Falcon system and of the processing pipelines it manages.

Users can view the logged events in the metric.log file. By default, this file is created under the ${user.dir}/logs/ directory. Users may also extend the Falcon monitoring framework to send events to systems like Mondemand/lwes by implementing the org.apache.falcon.plugin.MonitoringPlugin interface.

The following events are captured by Falcon for logging the metrics:

New cluster definition posted to Falcon (success & failures)
New feed definition posted to Falcon (success & failures)
New process definition posted to Falcon (success & failures)
Process update events (success & failures)
Feed update events (success & failures)
Cluster update events (success & failures)
Process suspend events (success & failures)
Feed suspend events (success & failures)
Process resume events (success & failures)
Feed resume events (success & failures)
Process remove events (success & failures)
Feed remove events (success & failures)
Cluster remove events (success & failures)
Process instance kill events (success & failures)
Process instance re-run events (success & failures)
Process instance generation events
Process instance failure events
Process instance auto-retry events
Process instance retry exhaust events
Feed instance deletion event
Feed instance deletion failure event (no retries)
Feed instance replication event
Feed instance replication failure event
Feed instance replication auto-retry event
Feed instance replication retry exhaust event
Feed instance late arrival event
Feed instance post cut-off arrival event
Process re-run due to late feed event
Transaction rollback failed event

The metric logged for an event has the following properties:

Action - Name of the event.
Dimensions - A list of name/value pairs of various attributes for a given action.
Status - Status of the action: FAILED or SUCCEEDED.
Time-taken - Time taken in nanoseconds for a given action.

An example of the metric logged when a new process definition is submitted:

2012-05-04 12:23:34,026 {Action:submit, Dimensions:{entityType=process}, Status: SUCCEEDED, Time-taken:97087000 ns}

Users may parse metric.log, or capture these events in a custom monitoring framework, and then plot graphs or send alerts according to their requirements.
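As a minimal sketch of parsing metric.log, the Java class below extracts the four properties from a single line. It assumes every line follows the layout of the example above; MetricLogParser and Metric are hypothetical names and are not part of Falcon, and dimension values are assumed not to contain commas.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Falcon) that parses one metric.log line. */
final class MetricLogParser {

    /** Parsed fields of a single metric entry. */
    static final class Metric {
        final String timestamp;
        final String action;
        final Map<String, String> dimensions;
        final String status;
        final long timeTakenNanos;

        Metric(String timestamp, String action, Map<String, String> dimensions,
               String status, long timeTakenNanos) {
            this.timestamp = timestamp;
            this.action = action;
            this.dimensions = dimensions;
            this.status = status;
            this.timeTakenNanos = timeTakenNanos;
        }
    }

    // Layout assumed from the sample line:
    // <date> <time> {Action:<name>, Dimensions:{k=v, ...}, Status: <status>, Time-taken:<n> ns}
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+ \\S+) \\{Action:([^,]+), Dimensions:\\{(.*)\\}, "
            + "Status:\\s*(\\w+), Time-taken:\\s*(\\d+) ns\\}$");

    /** Returns the parsed metric, or null when the line does not match the assumed layout. */
    static Metric parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        // Split "k1=v1, k2=v2" into name/value pairs; entries without '=' are skipped.
        Map<String, String> dimensions = new LinkedHashMap<>();
        for (String pair : m.group(3).split(",\\s*")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                dimensions.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return new Metric(m.group(1), m.group(2).trim(), dimensions,
            m.group(4), Long.parseLong(m.group(5)));
    }

    public static void main(String[] args) {
        Metric metric = parse("2012-05-04 12:23:34,026 {Action:submit, "
            + "Dimensions:{entityType=process}, Status: SUCCEEDED, Time-taken:97087000 ns}");
        // Convert nanoseconds to whole milliseconds for display.
        System.out.println(metric.action + " " + metric.status + " "
            + metric.timeTakenNanos / 1_000_000 + " ms");   // prints "submit SUCCEEDED 97 ms"
    }
}
```

Returning null for unmatched lines lets a caller skip anything that does not look like a metric entry. A stricter tool might log or count such lines instead.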

Notifications

Falcon creates a JMS topic for every process and feed that is scheduled in Falcon. The implementation class and the broker URL of the JMS engine are read from the definition of the cluster the entity depends on. Users may register consumers on the relevant topic to check the availability or status of feed instances.

For a given process that is scheduled, the name of the topic is the same as the process name. Falcon sends a JMS MapMessage to the topic for every feed produced by an instance of the process. The MapMessage has the following properties: entityName, feedNames, feedInstancePath, workflowId, runId, nominalTime, timeStamp, brokerUrl, brokerImplClass, entityType, operation, logFile, topicName, status, brokerTTL.

For a given feed that is scheduled, the name of the topic is the same as the feed name. Falcon sends a JMS MapMessage for every feed instance that is deleted, archived, or replicated, depending on the retention policy set in the feed definition. The MapMessage has the following properties: entityName, feedNames, feedInstancePath, workflowId, runId, nominalTime, timeStamp, brokerUrl, brokerImplClass, entityType, operation, logFile, topicName, status, brokerTTL.
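A consumer typically copies these properties out of the message and then decides what to do with them. The sketch below illustrates only that second step, using a plain java.util.Map so it runs without a JMS broker; in a real consumer the values would be read from the javax.jms.MapMessage, for example with getString(name). NotificationInspector is a hypothetical class, not part of Falcon. The status value "SUCCEEDED", the sample entity names, and the idea that feedNames holds a single feed name are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Falcon) that inspects the fields of one notification. */
final class NotificationInspector {

    // Property names as listed in the text above.
    static final List<String> DOCUMENTED_KEYS = Arrays.asList(
        "entityName", "feedNames", "feedInstancePath", "workflowId", "runId",
        "nominalTime", "timeStamp", "brokerUrl", "brokerImplClass", "entityType",
        "operation", "logFile", "topicName", "status", "brokerTTL");

    /** Returns the documented property names that are absent from the message. */
    static List<String> missingKeys(Map<String, String> message) {
        List<String> missing = new ArrayList<>();
        for (String key : DOCUMENTED_KEYS) {
            if (!message.containsKey(key)) {
                missing.add(key);
            }
        }
        return missing;
    }

    /**
     * True when the message names the given feed and reports success.
     * Assumption: feedNames carries one feed name and status is "SUCCEEDED" on success.
     */
    static boolean reportsSuccessFor(Map<String, String> message, String feedName) {
        return feedName.equals(message.get("feedNames"))
            && "SUCCEEDED".equals(message.get("status"));
    }

    public static void main(String[] args) {
        // Sample values are invented for illustration only.
        Map<String, String> message = new HashMap<>();
        message.put("entityName", "sample-process");
        message.put("feedNames", "sample-feed");
        message.put("status", "SUCCEEDED");

        System.out.println("success for sample-feed: "
            + reportsSuccessFor(message, "sample-feed"));
        System.out.println("documented properties not set in this sample: "
            + missingKeys(message));
    }
}
```

Keeping the decision logic separate from the JMS plumbing makes it straightforward to test without a running broker.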

JMS messages are automatically purged after a certain period (3 days by default) by the Falcon JMS housekeeping service. The TTL (time-to-live) for JMS messages can be configured in Falcon's startup.properties file.

Alerts

By default, Falcon writes alerts for unrecoverable errors to a log file. Users can view these alerts in the alerts.log file, which by default is created under the ${user.dir}/logs/ directory.

Users may also extend Falcon alerting to send events to systems like Nagios by implementing the org.apache.falcon.plugin.AlertingPlugin interface.

Audits

By default, Falcon audits all user activity and records it in a log file. Users can view these audits in the audit.log file, which by default is created under the ${user.dir}/logs/ directory.

Users may also extend Falcon auditing to send audits to systems like Apache Argus by implementing the org.apache.falcon.plugin.AuditingPlugin interface.