tags.
Because the inputs that the process takes can span a wide range, the limits are specified by giving "start" and "end" instances for each input. The output is only one location, so only a single instance is given.
The timeout specifies how long a given instance should wait for input data before being terminated by the workflow engine.
Coming back to the instance start time: since an instance starts every 30 minutes beginning at 2010-01-02T01:00Z, the time at which it is scheduled to start is called its instance time. For example, the first few instance times for the above example are:
| *Instance Number* | *Instance Start Time* |
| 1 | 2010-01-02T01:00Z |
| 2 | 2010-01-02T01:30Z |
| 3 | 2010-01-02T02:00Z |
| 4 | 2010-01-02T02:30Z |
| ... | ... |
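The schedule above can be reproduced with a minimal Python sketch. The helper name =instance_times= is hypothetical and is not part of Falcon; it only illustrates how instance times follow from the start time and the 30 minute frequency.

```python
from datetime import datetime, timedelta, timezone


def instance_times(start, frequency_minutes, count):
    """Return the first `count` scheduled instance start times."""
    step = timedelta(minutes=frequency_minutes)
    return [start + i * step for i in range(count)]


# Start time and frequency taken from the example above.
start = datetime(2010, 1, 2, 1, 0, tzinfo=timezone.utc)
for number, when in enumerate(instance_times(start, 30, 4), start=1):
    print(number, when.strftime("%Y-%m-%dT%H:%MZ"))
```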
Now let's look at how to use the expression language. The only thing to keep in mind is that all EL evaluations are done based on the start time of that instance, and every instance will have different inputs / outputs based on the feed instances given in the process definition.
The parameters in the various ELs can be positive, zero or negative. Positive values indicate that many units in the future, zero means the base time the EL has been resolved to, and negative values indicate the corresponding units in the past.
__Note: if no instance is created at the resolved time, then the instance immediately before it is considered.__
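As a rough illustration of the note above, the following Python sketch snaps a resolved time down to the feed instance at or immediately before it. It assumes the feed produces instances at a fixed frequency from a known start time; the function name and its parameters are hypothetical and do not reflect Falcon's internal implementation.

```python
from datetime import datetime, timedelta, timezone


def resolve_feed_instance(resolved, feed_start, feed_frequency_minutes):
    """Return the feed instance at or immediately before `resolved`.

    Assumes feed instances exist at feed_start + k * frequency for k >= 0.
    """
    step = timedelta(minutes=feed_frequency_minutes)
    elapsed = resolved - feed_start
    if elapsed < timedelta(0):
        raise ValueError("resolved time is before the first feed instance")
    # Integer division of two timedeltas gives the number of whole steps.
    return feed_start + (elapsed // step) * step


# Hypothetical hourly feed starting at 2010-01-01T00:00Z.
feed_start = datetime(2010, 1, 1, 0, 0, tzinfo=timezone.utc)
resolved = datetime(2010, 1, 2, 0, 10, tzinfo=timezone.utc)
print(resolve_feed_instance(resolved, feed_start, 60))
```

Here no feed instance exists at 00:10, so the instance at 00:00 on the same day is the one considered.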
Falcon currently supports the following ELs:
* 1. *now(hours,minutes)*: now refers to the instance start time. The hours and minutes given are relative to the start time of the instance. For example, now(-2,40) corresponds to the feed instance at -2 hours and +40 minutes, i.e. the feed instance 80 minutes before the instance start time. If the user had given now(0,-80), it would correspond to the same instance.
* 2. *today(hours,minutes)*: The hours and minutes given in this EL are relative to the start of the day of the instance start time. For example, if the instance start is 2010-01-02T01:30Z, then today(-3,-20) means the feed instance at 2010-01-01T20:40Z and today(3,20) corresponds to 2010-01-02T03:20Z.
* 3. *yesterday(hours,minutes)*: As the name suggests, the yesterday EL picks up feed instances with respect to the start of yesterday. Hours and minutes are added to 00:00 of yesterday. For example, yesterday(24,30) actually corresponds to 00:30 today; for 2010-01-02T01:30Z this means the feed instance at 2010-01-02T00:30Z.
* 4. *currentMonth(day,hour,minute)*: currentMonth takes its reference from the start of the month of the instance start time. One thing to keep in mind is that day is added to the first day of the month, so the value of day is the number of days you want to add to the first day of the month. For example, for instance start time 2010-01-12T01:30Z, currentMonth(3,2,40) corresponds to the feed instance at 2010-01-04T02:40Z and currentMonth(0,0,0) means 2010-01-01T00:00Z.
* 5. *lastMonth(day,hour,minute)*: The parameters for lastMonth are the same as for currentMonth; the only difference is that the reference is shifted one month back. For instance start time 2010-01-12T01:30Z, lastMonth(2,3,30) corresponds to the feed instance at 2009-12-03T03:30Z.
* 6. *currentYear(month,day,hour,minute)*: The month, day, hour and minute parameters are added with reference to the start of the year of the instance start time. For our example start time 2010-01-02T00:30Z, the reference goes back to 2010-01-01T00:00Z. As with days, months are added to the first month, i.e. January. So currentYear(0,2,2,20) means 2010-01-03T02:20Z while currentYear(11,2,2,20) means 2010-12-03T02:20Z.
* 7. *lastYear(month,day,hour,minute)*: This is used in exactly the same way as currentYear; the only difference is that the reference is the start of the previous year. For example, lastYear(4,2,2,20) corresponds to the feed instance at 2009-05-03T02:20Z and lastYear(12,2,2,20) corresponds to the feed instance at 2010-01-03T02:20Z.
* 8. *latest(number of latest instance)*: This makes your input consider one of the latest available instances of the feed, counted back from the newest by the given parameter. For example, latest(0) considers the last available instance of the feed, whereas latest(-1) considers the second-last available instance and latest(-3) considers the fourth-last available instance.
* 9. *currentWeek(weekDayName,hour,minute)*: This is similar to currentMonth in the sense that it returns a relative time with respect to the instance start time, considering the day name provided as input as the start of the week. The day names can be one of SUN, MON, TUE, WED, THU, FRI, SAT.
* 10. *lastWeek(weekDayName,hour,minute)*: This is typically 7 days earlier than what currentWeek returns for the same parameters.
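The arithmetic behind the time-based ELs above can be sketched in Python. This is only an illustration of the rules as described in this section, not Falcon's implementation: the function names are hypothetical, each function takes the instance start time =t= as an explicit first argument, and =latest= assumes a list of available instances sorted from oldest to newest. currentWeek and lastWeek are left out.

```python
from datetime import datetime, timedelta, timezone


def _add_months(month_start, months):
    """Shift a first-of-month datetime by a whole number of months."""
    index = month_start.year * 12 + (month_start.month - 1) + months
    return month_start.replace(year=index // 12, month=index % 12 + 1)


def _day_start(t):
    return t.replace(hour=0, minute=0, second=0, microsecond=0)


def now(t, hours, minutes):
    return t + timedelta(hours=hours, minutes=minutes)


def today(t, hours, minutes):
    return _day_start(t) + timedelta(hours=hours, minutes=minutes)


def yesterday(t, hours, minutes):
    return today(t, hours, minutes) - timedelta(days=1)


def current_month(t, day, hour, minute):
    base = _day_start(t).replace(day=1)
    return base + timedelta(days=day, hours=hour, minutes=minute)


def last_month(t, day, hour, minute):
    base = _add_months(_day_start(t).replace(day=1), -1)
    return base + timedelta(days=day, hours=hour, minutes=minute)


def current_year(t, month, day, hour, minute):
    base = _add_months(_day_start(t).replace(month=1, day=1), month)
    return base + timedelta(days=day, hours=hour, minutes=minute)


def last_year(t, month, day, hour, minute):
    # Same as current_year, with the reference moved back 12 months.
    base = _add_months(_day_start(t).replace(month=1, day=1), month - 12)
    return base + timedelta(days=day, hours=hour, minutes=minute)


def latest(available, n):
    """Pick from `available` (oldest first); n is zero or negative."""
    if n > 0:
        raise ValueError("n must be zero or negative")
    return available[n - 1]


t = datetime(2010, 1, 2, 1, 30, tzinfo=timezone.utc)
print(now(t, -2, 40) == now(t, 0, -80))   # True: both are 80 minutes back
print(today(t, -3, -20))                  # 2010-01-01 20:40:00+00:00
print(yesterday(t, 24, 30))               # 2010-01-02 00:30:00+00:00
print(last_year(t, 12, 2, 2, 20))         # 2010-01-03 02:20:00+00:00
```

Months are handled separately from days, hours and minutes because a month is not a fixed number of days; the remaining offsets are plain durations added to the shifted reference.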
---++ Lineage
Falcon adds the ability to capture lineage for both entities and their associated instances. It
also captures the metadata tags associated with each of the entities as relationships. The
following relationships are captured:
* owner of entities - User
* data classification tags
* groups defined in feeds
* Relationships between entities
* Clusters associated with Feed and Process entity
* Input and Output feeds for a Process
* Instances refer to corresponding entities
Lineage is exposed in 3 ways:
* REST API
* CLI
* Dashboard - Interactive lineage for Process instances
This feature is enabled by default but can be disabled by removing the following value from the configuration named below:
config name: *.application.services
config value: org.apache.falcon.metadata.MetadataMappingService
Lineage is only captured for Process executions. A future release will capture lineage for
lifecycle policies such as replication and retention.
---++ Security
Security is detailed in [[Security][Security]].
---++ Recipes
Recipes are detailed in [[Recipes][Recipes]].
---++ Monitoring
Monitoring and Operationalizing Falcon is detailed in [[Operability][Operability]].
---++ Email Notification
Notification for instance completion in Falcon is defined in [[FalconEmailNotification][Falcon Email Notification]].
---++ Backwards Compatibility
Backwards compatibility instructions are [[Compatibility][detailed here.]]
---++ Proxyuser support
Falcon supports impersonation or proxyuser functionality (identical to Hadoop proxyuser capabilities and conceptually
similar to Unix 'sudo').
Proxyuser enables Falcon clients to submit entities on behalf of other users. Falcon will utilize Hadoop core's hadoop-auth
module to implement this functionality.
Because proxyuser is a powerful capability, Falcon provides the following restriction capabilities (similar to Hadoop):
* Proxyuser must be configured explicitly, on a per proxyuser user basis.
* A proxyuser user can be restricted to impersonate other users from a set of hosts.
* A proxyuser user can be restricted to impersonate users belonging to a set of groups.
There are 2 configuration properties needed in runtime properties to set up a proxyuser:
* falcon.service.ProxyUserService.proxyuser.#USER#.hosts: hosts from where the user #USER# can impersonate other users.
* falcon.service.ProxyUserService.proxyuser.#USER#.groups: groups the users being impersonated by user #USER# must belong to.
If these configurations are not present, impersonation will not be allowed and the connection will fail. If more lax security is preferred,
the wildcard value * may be used to allow impersonation from any host or of any user, although this is recommended only for testing/development.
To enable impersonation, use the -doAs option in the CLI, or append the doAs query parameter when using the API.
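The restriction rules above can be illustrated with a small Python sketch. It is not Falcon's implementation: the function names are hypothetical, the runtime properties are modelled as a plain dictionary, and each property value is assumed to be a comma-separated list. Only the two property names and the wildcard behaviour come from the text above.

```python
def _allows(configured, actual):
    """True if the configured list is the wildcard or overlaps `actual`."""
    allowed = {value.strip() for value in configured.split(",")}
    return "*" in allowed or bool(allowed & set(actual))


def may_impersonate(props, proxy_user, client_host, target_user_groups):
    """Check whether proxy_user may impersonate a user from client_host."""
    prefix = "falcon.service.ProxyUserService.proxyuser." + proxy_user
    hosts = props.get(prefix + ".hosts")
    groups = props.get(prefix + ".groups")
    if hosts is None or groups is None:
        # Missing configuration means impersonation is not allowed.
        return False
    return _allows(hosts, [client_host]) and _allows(groups, target_user_groups)


# Hypothetical runtime properties for a proxyuser named "oozie".
props = {
    "falcon.service.ProxyUserService.proxyuser.oozie.hosts": "host1, host2",
    "falcon.service.ProxyUserService.proxyuser.oozie.groups": "*",
}
print(may_impersonate(props, "oozie", "host1", ["analysts"]))
print(may_impersonate(props, "oozie", "host9", ["analysts"]))
print(may_impersonate(props, "hue", "host1", ["analysts"]))
```

With these properties only the first call is permitted: the second comes from a host that is not listed, and the third uses a proxyuser that has no configuration at all.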