FalconCLI is an interface between the user and Falcon. It is a command line utility provided by Falcon. FalconCLI supports Entity Management, Instance Management and Admin operations. There is a set of web services that are used by FalconCLI to interact with Falcon.
The optional -url option specifies the URL of the Falcon system to run the command against. If it is not provided, the URL is picked up from the system environment variable FALCON_URL. If FALCON_URL is not set, it is picked up from the client.properties file. If the option is not provided and the URL is not set in client.properties either, Falcon CLI will fail.
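The lookup order described above can be sketched as a small shell function. This only illustrates the precedence and is not how Falcon itself is implemented; the function name resolve_falcon_url and the falcon.url property key are assumptions made for the example.

```shell
# Sketch of the URL lookup order: -url value, then FALCON_URL, then client.properties.
# resolve_falcon_url and the falcon.url property key are assumptions for illustration.
resolve_falcon_url() {
  cli_url="$1"      # value passed with -url, may be empty
  props_file="$2"   # path to a client.properties file, may not exist

  if [ -n "$cli_url" ]; then
    printf '%s\n' "$cli_url"
  elif [ -n "${FALCON_URL:-}" ]; then
    printf '%s\n' "$FALCON_URL"
  elif [ -f "$props_file" ] && grep -q '^falcon\.url=' "$props_file"; then
    # Print the value of the first falcon.url line.
    sed -n 's/^falcon\.url=//p' "$props_file" | head -n 1
  else
    echo "no Falcon URL configured" >&2
    return 1
  fi
}
```

Calling the function with an empty first argument and an unset FALCON_URL falls through to the properties file, and a non-zero return mirrors the case in which the CLI fails.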
The -doAs option allows the current user to impersonate other users when interacting with the Falcon system. The current user must be configured as a proxyuser in the Falcon system. The proxyuser configuration may restrict from which hosts a user may impersonate users, as well as users of which groups can be impersonated.
If you export FALCON_DEBUG=true then the Falcon CLI will output the Web Services API details used by any commands you execute. This is useful for debugging or for seeing how the Falcon CLI works with the WS API. Alternatively, you can specify '-debug' through the CLI arguments to get the debug statements.
Example: $FALCON_HOME/bin/falcon entity -submit -type cluster -file /cluster/definition.xml -debug
Submit option is used to set up entity definition.
Usage: $FALCON_HOME/bin/falcon entity -submit -type [cluster|datasource|feed|process] -file <entity-definition.xml>
Example: $FALCON_HOME/bin/falcon entity -submit -type cluster -file /cluster/definition.xml
Note: The url option in the above and all subsequent commands is optional. If not mentioned, it will be picked up from the FALCON_URL environment variable or, failing that, from the client.properties file. If the option is not provided and also not set in client.properties, Falcon CLI will fail.
Once submitted, an entity can be scheduled using the schedule option. Only process and feed entities can be scheduled.
Usage: $FALCON_HOME/bin/falcon entity -type [process|feed] -name <<name>> -schedule
Optional Arg : -skipDryRun. When this argument is specified, Falcon skips oozie dryrun.
Example: $FALCON_HOME/bin/falcon entity -type process -name sampleProcess -schedule
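The submit and schedule steps are usually run back to back. A minimal dry-run sketch of that sequence follows; it only prints the two commands shown above rather than executing them. The names FALCON_BIN and submit_and_schedule, and the /opt/falcon fallback path, are hypothetical.

```shell
# Dry-run sketch: print the submit and schedule commands for an entity definition.
# FALCON_BIN, submit_and_schedule and the /opt/falcon fallback are hypothetical.
FALCON_BIN="${FALCON_HOME:-/opt/falcon}/bin/falcon"

submit_and_schedule() {
  etype="$1"; name="$2"; file="$3"
  case "$etype" in
    process|feed) ;;
    *) echo "only process and feed entities can be scheduled" >&2; return 1 ;;
  esac
  # Drop the leading "echo" on the next two lines to actually run the commands.
  echo "$FALCON_BIN" entity -submit -type "$etype" -file "$file"
  echo "$FALCON_BIN" entity -type "$etype" -name "$name" -schedule
}

submit_and_schedule process sampleProcess /process/definition.xml
```

The type check up front reflects the rule that clusters and datasources can be submitted but not scheduled.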
Suspend on an entity results in suspension of the oozie bundle that was scheduled earlier through the schedule function. No further instances are executed on a suspended entity. Only schedulable entities (process/feed) can be suspended.
Usage: $FALCON_HOME/bin/falcon entity -type [feed|process] -name <<name>> -suspend
Resume puts a suspended process/feed back to active, which in turn resumes the applicable oozie bundle.
Usage: $FALCON_HOME/bin/falcon entity -type [feed|process] -name <<name>> -resume
Delete removes the submitted entity definition for the specified entity and puts it into the archive.
Usage: $FALCON_HOME/bin/falcon entity -type [cluster|datasource|feed|process] -name <<name>> -delete
Entities of a particular type can be listed with the list sub-command.
Usage: $FALCON_HOME/bin/falcon entity -list
Optional Args : -fields <<field1,field2>> -type <<[cluster|datasource|feed|process],[cluster|datasource|feed|process]>> -nameseq <<namesubsequence>> -tagkeys <<tagkeyword1,tagkeyword2>> -filterBy <<field1:value1,field2:value2>> -tags <<tagkey=tagvalue,tagkey=tagvalue>> -orderBy <<field>> -sortOrder <<sortOrder>> -offset 0 -numResults 10
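When many entities are defined, the -offset and -numResults options can be combined to page through the list. The sketch below only prints the command for a given zero-based page; list_page is a hypothetical helper, and it assumes that -offset counts the number of results to skip.

```shell
# Dry-run sketch: print the list command for a given zero-based page.
# list_page is a hypothetical helper; it assumes -offset is the number of results to skip.
list_page() {
  etype="$1"; page="$2"; size="${3:-10}"
  offset=$((page * size))
  echo "\$FALCON_HOME/bin/falcon entity -list -type $etype -offset $offset -numResults $size"
}

list_page feed 0
list_page feed 2
```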
Summary lists the entities of a particular type on a cluster. Each entity summary includes the N most recent instances of the entity.
Usage: $FALCON_HOME/bin/falcon entity -type [feed|process] -summary
Optional Args : -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'" -fields <<field1,field2>> -filterBy <<field1:value1,field2:value2>> -tags <<tagkey=tagvalue,tagkey=tagvalue>> -orderBy <<field>> -sortOrder <<sortOrder>> -offset 0 -numResults 10 -numInstances 7
Update operation allows an already submitted/scheduled entity to be updated. Cluster and datasource updates are currently not allowed.
Usage: $FALCON_HOME/bin/falcon entity -type [feed|process] -name <<name>> -update -file <<path_to_file>>
Optional Arg : -skipDryRun. When this argument is specified, Falcon skips oozie dryrun.
Example: $FALCON_HOME/bin/falcon entity -type process -name HourlyReportsGenerator -update -file /process/definition.xml
Force Update operation allows an already submitted/scheduled entity to be updated.
Usage: $FALCON_HOME/bin/falcon entity -type [feed|process] -name <<name>> -touch
Optional Arg : -skipDryRun. When this argument is specified, Falcon skips oozie dryrun.
Status returns the current status of the entity.
Usage: $FALCON_HOME/bin/falcon entity -type [cluster|datasource|feed|process] -name <<name>> -status
With the dependency option, we can list all the entities on which the specified entity depends. For example, for a feed, dependency returns the cluster name, and for a process it returns all the input feeds, output feeds and cluster names.
Usage: $FALCON_HOME/bin/falcon entity -type [cluster|datasource|feed|process] -name <<name>> -dependency
Definition option returns the entity definition submitted earlier during submit step.
Usage: $FALCON_HOME/bin/falcon entity -type [cluster|datasource|feed|process] -name <<name>> -definition
The lookup option tells you which feed a given path belongs to. This can be useful in several scenarios. For example, you would generally want a single definition for common feeds, such as metadata, that share the same location; otherwise problems can arise (different retention durations can result in surprises for one team). To check whether there are multiple definitions of the same metadata, pick an instance of it and run it through the lookup command as shown below.
Usage: $FALCON_HOME/bin/falcon entity -type feed -lookup -path /data/projects/my-hourly/2014/10/10/23/
If you have multiple feeds with location as /data/projects/my-hourly/${YEAR}/${MONTH}/${DAY}/${HOUR} then this command will return all of them.
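To make the idea of a path belonging to a feed location concrete, here is a rough sketch that checks a concrete path against a location template. It is an illustration only and not how Falcon performs the lookup; the helper names are hypothetical, and it assumes ${YEAR} is four digits and ${MONTH}, ${DAY} and ${HOUR} are two digits each.

```shell
# Illustration only: test whether a concrete path fits a feed location template.
# template_to_regex and path_matches are hypothetical helpers, not part of Falcon.
template_to_regex() {
  printf '%s\n' "$1" | sed \
    -e 's/\./\\./g' \
    -e 's/\${YEAR}/[0-9]{4}/g' \
    -e 's/\${MONTH}/[0-9]{2}/g' \
    -e 's/\${DAY}/[0-9]{2}/g' \
    -e 's/\${HOUR}/[0-9]{2}/g'
}

path_matches() {
  template="${1%/}"   # ignore a trailing slash on either side
  path="${2%/}"
  printf '%s\n' "$path" | grep -Eq "^$(template_to_regex "$template")\$"
}

if path_matches '/data/projects/my-hourly/${YEAR}/${MONTH}/${DAY}/${HOUR}' \
                /data/projects/my-hourly/2014/10/10/23/; then
  echo "path fits the template"
fi
```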
Since: 0.8
This command lists all the feed instances which have missed their SLA and are still not available. If a feed instance missed its SLA but is now available, it will not be reported in the results. The purpose of this API is alerting, so it does not return feed instances which missed their SLA but are available, as they don't require any action.
* Currently SLA monitoring is supported only for feeds.
* Option end is optional and defaults to the current time if missing.
* Option name is optional; if provided, only instances of that feed are considered.
Usage:
Example 1
$FALCON_HOME/bin/falcon entity -type feed -start 2014-09-05T00:00Z -slaAlert -end 2016-05-03T00:00Z -colo local
name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T11:59Z, tags: Missed SLA High
name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:00Z, tags: Missed SLA High
name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:01Z, tags: Missed SLA High
name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:02Z, tags: Missed SLA High
name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:03Z, tags: Missed SLA High
name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:04Z, tags: Missed SLA High
name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:05Z, tags: Missed SLA High
name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:06Z, tags: Missed SLA High
name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:07Z, tags: Missed SLA High
name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:08Z, tags: Missed SLA Low
Response: default/Success!
Request Id: default/216978070@qtp-830047511-4 - f5a6c129-ab42-4feb-a2bf-c3baed356248
Example 2
$FALCON_HOME/bin/falcon entity -type feed -start 2014-09-05T00:00Z -slaAlert -end 2016-05-03T00:00Z -colo local -name in
name: in, type: FEED, cluster: local, instanceTime: 2015-09-26T06:00Z, tags: Missed SLA High
Response: default/Success!
Request Id: default/1580107885@qtp-830047511-7 - f16cbc51-5070-4551-ad25-28f75e5e4cf2
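Since this command is meant for alerting, its output is often post-processed. The following sketch counts the reported instances per tag. It assumes each record is printed on its own line in the "name: ..., tags: ..." form shown in the examples; count_sla_tags is a hypothetical helper.

```shell
# Sketch: count missed-SLA records per tag from -slaAlert output read on stdin.
# Assumes one "name: ..., tags: ..." record per line; count_sla_tags is hypothetical.
count_sla_tags() {
  awk -F', ' '
    /^name: / {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^tags: /) {
          tag = $i
          sub(/^tags: /, "", tag)
          count[tag]++
        }
      }
    }
    END { for (tag in count) print tag ": " count[tag] }
  ' | sort
}

count_sla_tags <<'EOF'
name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:06Z, tags: Missed SLA High
name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:07Z, tags: Missed SLA High
name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:08Z, tags: Missed SLA Low
Response: default/Success!
EOF
```

Lines that do not start with "name: ", such as the Response and Request Id lines, are ignored.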
Kill sub-command is used to kill all the instances of the specified process whose nominal time is between the given start time and end time.
Note: 1. The start time and end time need to be specified in TZ format. Example: 01 Jan 2012 01:00 => 2012-01-01T01:00Z
2. Process name is a compulsory parameter for each instance management command.
Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -kill -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
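Because every instance command takes times in the yyyy-MM-dd'T'HH:mm'Z' form, it can help to check the shape of a value before passing it to -start or -end. The sketch below does a format check only; it does not verify that the date exists on the calendar, and is_falcon_time is a hypothetical helper.

```shell
# Sketch: check that a value has the yyyy-MM-dd'T'HH:mm'Z' shape used by -start and -end.
# Format check only (no calendar validation); is_falcon_time is a hypothetical helper.
is_falcon_time() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}Z$'
}

for t in "2012-01-01T01:00Z" "01 Jan 2012 01:00"; do
  if is_falcon_time "$t"; then
    echo "ok: $t"
  else
    echo "bad: $t"
  fi
done
```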
Suspend is used to suspend an instance or instances of the given process. This option pauses the parent workflow in the state it was in when this command was executed.
Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -suspend -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
Continue option is used to continue the failed workflow instance. This option is valid only for process instances in terminal state, i.e. KILLED or FAILED.
Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -continue -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
Rerun option is used to rerun instances of a given process. On issuing a rerun, by default the execution resumes from the last failed node in the workflow. This option is valid only for process instances in a terminal state, i.e. SUCCEEDED, KILLED or FAILED. To forcefully rerun the entire workflow, pass -force along with -rerun. Additionally, you can specify properties to override via a properties file; these take priority over the force option in case of contradiction.
Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -rerun -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'" [-force] [-file <<properties file>>]
Resume option is used to resume any instance that is in suspended state.
Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -resume -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
Status option via CLI can be used to get the status of a single instance or multiple instances. If the instance is not yet materialized but is within the process validity range, WAITING is returned as the state. Along with the status, the instance time is also returned. Log location gives the oozie workflow url. If the instance is in WAITING state, missing dependencies are listed. The job urls are populated for all actions of the user workflow and non-succeeded actions of the main workflow, so the user need not go to the underlying scheduler to get the job urls when debugging an issue in the job.
Example : Suppose a process has 3 instances: one has succeeded, one is in running state and the other is waiting. The expected output is:
{"status":"SUCCEEDED","message":"getStatus is successful","instances":[{"instance":"2012-05-07T05:02Z","status":"SUCCEEDED","logFile":"http://oozie-dashboard-url"},{"instance":"2012-05-07T05:07Z","status":"RUNNING","logFile":"http://oozie-dashboard-url"}, {"instance":"2010-01-02T11:05Z","status":"WAITING"}]}
Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -status
Optional Args : -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'" -colo <<colo>> -filterBy <<field1:value1,field2:value2>> -lifecycle <<lifecycles>> -orderBy field -sortOrder <<sortOrder>> -offset 0 -numResults 10
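For a quick look at per-instance states, the JSON shown in the example can be reduced to one "instance state" pair per line. This is a rough text-based sketch that relies on the exact field order of the sample output above; a real JSON parser would be more robust. list_instance_states is a hypothetical helper.

```shell
# Rough sketch: print "<instance> <status>" pairs from instance status JSON on stdin.
# Relies on "instance" being immediately followed by "status", as in the sample output.
# list_instance_states is a hypothetical helper.
list_instance_states() {
  grep -o '{"instance":"[^}]*}' |
    sed -E 's/.*"instance":"([^"]*)","status":"([^"]*)".*/\1 \2/'
}

list_instance_states <<'EOF'
{"status":"SUCCEEDED","message":"getStatus is successful","instances":[{"instance":"2012-05-07T05:02Z","status":"SUCCEEDED","logFile":"http://oozie-dashboard-url"},{"instance":"2012-05-07T05:07Z","status":"RUNNING","logFile":"http://oozie-dashboard-url"}, {"instance":"2010-01-02T11:05Z","status":"WAITING"}]}
EOF
```

The top-level "status" field describes the request itself, not an instance, which is why only objects that begin with "instance" are extracted.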
List option via CLI can be used to get a single instance or multiple instances. If the instance is not yet materialized but is within the process validity range, WAITING is returned as the state. Instance time is also returned. Log location gives the oozie workflow url. If the instance is in WAITING state, missing dependencies are listed.
Example : Suppose a process has 3 instances: one has succeeded, one is in running state and the other is waiting. The expected output is:
{"status":"SUCCEEDED","message":"getStatus is successful","instances":[{"instance":"2012-05-07T05:02Z","status":"SUCCEEDED","logFile":"http://oozie-dashboard-url"},{"instance":"2012-05-07T05:07Z","status":"RUNNING","logFile":"http://oozie-dashboard-url"}, {"instance":"2010-01-02T11:05Z","status":"WAITING"}]}
Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -list
Optional Args : -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'" -colo <<colo>> -lifecycle <<lifecycles>> -filterBy <<field1:value1,field2:value2>> -orderBy field -sortOrder <<sortOrder>> -offset 0 -numResults 10
Summary option via CLI can be used to get the consolidated status of the instances between the specified time period. Each status along with the corresponding instance count are listed for each of the applicable colos. The unscheduled instances between the specified time period are included as UNSCHEDULED in the output to provide more clarity.
Example : Suppose a process has 3 instances: one has succeeded, one is in running state and the other is waiting. The expected output is:
{"status":"SUCCEEDED","message":"getSummary is successful","instancesSummary":[{"cluster": <<name>>, "map":[{"SUCCEEDED":"1"}, {"WAITING":"1"}, {"RUNNING":"1"}]}]}
Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -summary
Optional Args : -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'" -colo <<colo>> -filterBy <<field1:value1,field2:value2>> -lifecycle <<lifecycles>> -orderBy field -sortOrder <<sortOrder>>
Running option provides all the running instances of the mentioned process.
Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -running
Optional Args : -colo <<colo>> -lifecycle <<lifecycles>> -filterBy <<field1:value1,field2:value2>> -orderBy <<field>> -sortOrder <<sortOrder>> -offset 0 -numResults 10
Get falcon feed instance availability.
Usage: $FALCON_HOME/bin/falcon instance -type feed -name <<name>> -listing
Optional Args : -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'" -colo <<colo>>
Get logs for instance actions
Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -logs
Optional Args : -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'" -runid <<runid>> -colo <<colo>> -lifecycle <<lifecycles>> -filterBy <<field1:value1,field2:value2>> -orderBy field -sortOrder <<sortOrder>> -offset 0 -numResults 10
Describes the list of life cycles of an entity: for a feed it can be replication/retention, and for a process it can be execution. This can be used with instance management options. Default values are replication for feed and execution for process.
Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -status -lifecycle <<lifecycletype>> -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
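The default life cycle per entity type can be made explicit with a small helper. The sketch below prints a status command with -lifecycle always filled in; default_lifecycle and status_command are hypothetical names, and the command is only printed, not executed.

```shell
# Sketch: make the default lifecycle explicit when building a status command.
# default_lifecycle and status_command are hypothetical helpers; nothing is executed.
default_lifecycle() {
  case "$1" in
    feed) echo replication ;;
    process) echo execution ;;
    *) echo "unknown entity type: $1" >&2; return 1 ;;
  esac
}

status_command() {
  etype="$1"; name="$2"; start="$3"; end="$4"
  lifecycle="${5:-}"
  if [ -z "$lifecycle" ]; then
    lifecycle=$(default_lifecycle "$etype") || return 1
  fi
  echo "\$FALCON_HOME/bin/falcon instance -type $etype -name $name -status -lifecycle $lifecycle -start $start -end $end"
}

status_command feed in 2014-09-05T00:00Z 2014-09-06T00:00Z
status_command feed in 2014-09-05T00:00Z 2014-09-06T00:00Z retention
```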
Given a feed/process instance, this command traces its ancestors to find which of them have failed. It is useful when a lot of instances are failing in a pipeline, as it finds the root cause of the pipeline being stuck.
Usage: $FALCON_HOME/bin/falcon instance -triage -type <<feed/process>> -name <<name>> -start "yyyy-MM-dd'T'HH:mm'Z'"
Displays the workflow params of a given instance. The start time is treated as the nominal time of that instance; the end time is not considered.
Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -params -start "yyyy-MM-dd'T'HH:mm'Z'"
Displays the instances which are dependent on the given instance. For example, for a given process instance it will list all the input feed instances (if any) and the output feed instances (if any).
An example use case of this command is as follows: Suppose you find out that the data in a feed instance was incorrect and you need to figure out which all process instances consumed this feed instance so that you can reprocess them after correcting the feed instance. You can give the feed instance and it will tell you which process instance produced this feed and which all process instances consumed this feed.
NOTE: 1. instanceTime must be a valid instanceTime, e.g. the instanceTime of a feed should be in its validity range on applicable clusters, and it should be in the range of instances produced by the producer process (if any).
2. For processes with inputs like latest() which vary with time, the results are not guaranteed to be correct.
Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -dependency -instanceTime "yyyy-MM-dd'T'HH:mm'Z'"
For example:
$FALCON_HOME/bin/falcon instance -dependency -type feed -name out -instanceTime 2014-12-15T00:00Z
name: producer, type: PROCESS, cluster: local, instanceTime: 2014-12-15T00:00Z, tags: Output
name: consumer, type: PROCESS, cluster: local, instanceTime: 2014-12-15T00:03Z, tags: Input
name: consumer, type: PROCESS, cluster: local, instanceTime: 2014-12-15T00:04Z, tags: Input
name: consumer, type: PROCESS, cluster: local, instanceTime: 2014-12-15T00:02Z, tags: Input
name: consumer, type: PROCESS, cluster: local, instanceTime: 2014-12-15T00:05Z, tags: Input
Response: default/Success!
Request Id: default/1125035965@qtp-503156953-7 - 447be0ad-1d38-4dce-b438-20f3de69b172
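For the reprocessing use case described above, the process instances that consumed the feed instance are the records tagged Input. A minimal sketch that extracts them, sorted by instance time, is shown below. It assumes one record per line with the five comma-separated fields seen in the example; consumers_of is a hypothetical helper.

```shell
# Sketch: list the process instances that consumed a feed instance (records tagged Input).
# Assumes one record per line in the five-field form shown in the example output.
# consumers_of is a hypothetical helper.
consumers_of() {
  awk -F', ' '
    /^name: / && $2 == "type: PROCESS" && $NF == "tags: Input" {
      name = $1; sub(/^name: /, "", name)
      ts = $4;   sub(/^instanceTime: /, "", ts)
      print name, ts
    }
  ' | sort
}

consumers_of <<'EOF'
name: producer, type: PROCESS, cluster: local, instanceTime: 2014-12-15T00:00Z, tags: Output
name: consumer, type: PROCESS, cluster: local, instanceTime: 2014-12-15T00:03Z, tags: Input
name: consumer, type: PROCESS, cluster: local, instanceTime: 2014-12-15T00:04Z, tags: Input
name: consumer, type: PROCESS, cluster: local, instanceTime: 2014-12-15T00:02Z, tags: Input
name: consumer, type: PROCESS, cluster: local, instanceTime: 2014-12-15T00:05Z, tags: Input
EOF
```

The producer record is tagged Output and is therefore skipped; each remaining line names a process and the instance time that would need to be rerun.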
Lineage returns the relationships between processes and feeds in a given pipeline in dot format. You can use the output to view a graphical representation of the DAG using an online graphviz viewer.
Usage:
$FALCON_HOME/bin/falcon metadata -lineage -pipeline my-pipeline
pipeline is a mandatory option.
Get the vertex with the specified id.
Usage: $FALCON_HOME/bin/falcon metadata -vertex -id <<id>>
Example: $FALCON_HOME/bin/falcon metadata -vertex -id 4
Get all vertices for a key index given the specified value.
Usage: $FALCON_HOME/bin/falcon metadata -vertices -key <<key>> -value <<value>>
Example: $FALCON_HOME/bin/falcon metadata -vertices -key type -value feed-instance
Get the adjacent vertices or edges of the vertex with the specified direction.
Usage: $FALCON_HOME/bin/falcon metadata -edges -id <<vertex-id>> -direction <<direction>>
Example: $FALCON_HOME/bin/falcon metadata -edges -id 4 -direction both
$FALCON_HOME/bin/falcon metadata -edges -id 4 -direction inE
Get the edge with the specified id.
Usage: $FALCON_HOME/bin/falcon metadata -edge -id <<id>>
Example: $FALCON_HOME/bin/falcon metadata -edge -id Q9n-Q-5g
Lists all dimensions of the given type. If the user provides the optional param cluster, only the dimensions related to the cluster are listed.
Usage: $FALCON_HOME/bin/falcon metadata -list -type [cluster_entity|datasource_entity|feed_entity|process_entity|user|colo|tags|groups|pipelines|replication_metrics]
Optional Args : -cluster <<cluster name>>
Example: $FALCON_HOME/bin/falcon metadata -list -type process_entity -cluster primary-cluster
$FALCON_HOME/bin/falcon metadata -list -type tags
Displays replication metrics from a recipe-based replication process or from feed replication.
Usage: $FALCON_HOME/bin/falcon metadata -list -type replication_metrics -process/-feed <entity name>
Optional Args : -numResults <<value>>
Example: $FALCON_HOME/bin/falcon metadata -list -type replication_metrics -process hdfs-replication
$FALCON_HOME/bin/falcon metadata -list -type replication_metrics -feed fs-replication
Lists all dimensions related to the specified dimension, identified by dimension-type and dimension-name.
Usage: $FALCON_HOME/bin/falcon metadata -relations -type [cluster_entity|feed_entity|process_entity|user|colo|tags|groups|pipelines] -name <<Dimension Name>>
Example: $FALCON_HOME/bin/falcon metadata -relations -type process_entity -name sample-process
Version returns the current version of Falcon installed.
Usage: $FALCON_HOME/bin/falcon admin -version
Status returns the current state of Falcon (running or stopped).
Usage: $FALCON_HOME/bin/falcon admin -status
Submit the specified recipe.
Usage: $FALCON_HOME/bin/falcon recipe -name <name>
Name of the recipe. The user should have defined <name>-template.xml and <name>.properties in the path specified by falcon.recipe.path in the client.properties file. The falcon.home path is used if falcon.recipe.path is not specified in client.properties. If it is not specified there and the files also cannot be found at falcon.home, Falcon CLI will fail.
Optional Args : -tool <recipeToolClassName>
Falcon provides a base tool that recipes can override. If this option is not specified, the default recipe tool, RecipeTool, is used. This option is required if the user defines their own recipe tool class.
Example: $FALCON_HOME/bin/falcon recipe -name hdfs-replication