Falcon - Feed management and data processing platform

Falcon is a feed processing and feed management system aimed at making it easier for end consumers to onboard their feed processing and feed management onto Hadoop clusters.

Why?

  • Establishes relationships between the various data and processing elements in a Hadoop environment

  • Provides feed management services such as feed retention, replication across clusters, and archival

  • Makes it easy to onboard new workflows/pipelines, with support for late data handling and retry policies

  • Integrates with a metastore/catalog such as Hive/HCatalog

  • Provides notifications to end customers based on the availability of feed groups (logical groups of related feeds that are likely to be used together)

  • Enables use cases for local processing in a colo and global aggregations

  • Captures lineage information for feeds and processes

Getting Started

Start with these simple steps to install a Falcon instance: Simple setup. Also refer to the Falcon architecture and documentation in Documentation. On boarding describes the steps to on-board a pipeline to Falcon and gives a sample pipeline for reference. Entity Specification gives complete details of all Falcon entities.
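
To give a feel for what gets on-boarded, the sketch below shows the general shape of a feed entity in XML; the names, paths, and dates are hypothetical placeholders, and Entity Specification remains the authoritative reference for the schema.

    <feed name="sampleFeed" description="Hourly input feed" xmlns="uri:falcon:feed:0.1">
        <!-- A new instance of this feed materializes every hour -->
        <frequency>hours(1)</frequency>
        <timezone>UTC</timezone>

        <clusters>
            <cluster name="primaryCluster" type="source">
                <validity start="2015-01-01T00:00Z" end="2016-01-01T00:00Z"/>
                <!-- Retention policy: evict instances older than 90 days -->
                <retention limit="days(90)" action="delete"/>
            </cluster>
        </clusters>

        <locations>
            <!-- HDFS path templated on the feed instance time -->
            <location type="data" path="/data/sample/${YEAR}/${MONTH}/${DAY}/${HOUR}"/>
            <location type="stats" path="/none"/>
            <location type="meta" path="/none"/>
        </locations>

        <ACL owner="falcon-user" group="users" permission="0755"/>
        <schema location="/none" provider="none"/>
    </feed>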

Falcon CLI covers the command-line utility provided by Falcon, which implements Falcon's RESTful API, and describes its various options.
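
As a rough illustration (entity and file names are placeholders; Falcon CLI documents the full option set), a typical on-boarding sequence from the command line looks like this:

    # Submit the cluster, feed and process definitions
    falcon entity -type cluster -submit -file primaryCluster.xml
    falcon entity -type feed -submit -file sampleFeed.xml
    falcon entity -type process -submit -file sampleProcess.xml

    # Schedule the feed and process; Falcon then applies the declared
    # retention/replication policies and runs the pipeline
    falcon entity -type feed -schedule -name sampleFeed
    falcon entity -type process -schedule -name sampleProcess

    # Check the status of process instances in a time window
    falcon instance -type process -name sampleProcess -status \
        -start "2015-01-01T00:00Z" -end "2015-01-02T00:00Z"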

Falcon provides out-of-the-box lifecycle management for tables in Hive (HCatalog), such as table replication for BCP and table eviction. Falcon also enforces security on protected resources and enables SSL.
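
For example, a feed backed by a Hive/HCatalog table rather than HDFS locations is what brings a table under Falcon's lifecycle management; the sketch below (with hypothetical database, table, and cluster names) shows a source/target cluster pair driving replication and a retention limit driving eviction.

    <feed name="summaryTableFeed" description="Daily summary table" xmlns="uri:falcon:feed:0.1">
        <frequency>days(1)</frequency>
        <timezone>UTC</timezone>

        <clusters>
            <cluster name="primaryCluster" type="source">
                <validity start="2015-01-01T00:00Z" end="2016-01-01T00:00Z"/>
                <retention limit="months(6)" action="delete"/>  <!-- table eviction -->
            </cluster>
            <cluster name="backupCluster" type="target">  <!-- replication target for BCP -->
                <validity start="2015-01-01T00:00Z" end="2016-01-01T00:00Z"/>
                <retention limit="months(6)" action="delete"/>
            </cluster>
        </clusters>

        <!-- Hive/HCatalog table partitioned by date, in place of HDFS locations -->
        <table uri="catalog:falcon_db:summary_table#ds=${YEAR}-${MONTH}-${DAY}"/>

        <ACL owner="falcon-user" group="users" permission="0755"/>
        <schema location="/none" provider="none"/>
    </feed>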

Licensing Information

Falcon is distributed under Apache License 2.0.