PutHiveQL

Description:

Executes a HiveQL DDL/DML command (e.g. UPDATE, INSERT). The content of an incoming FlowFile is expected to be the HiveQL command to execute. The HiveQL command may use ? as a placeholder for parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention hiveql.args.N.type and hiveql.args.N.value, where N is a positive integer. The hiveql.args.N.type attribute is expected to be a number indicating the JDBC Type. The content of the FlowFile is expected to be in UTF-8 format.
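To make the attribute convention concrete: a FlowFile whose content is INSERT INTO users VALUES (?, ?) could carry hiveql.args.1.type=4 (java.sql.Types.INTEGER), hiveql.args.1.value=42, hiveql.args.2.type=12 (java.sql.Types.VARCHAR) and hiveql.args.2.value=alice. The sketch below shows one way such attributes could be gathered into ordered parameters before binding them to a JDBC PreparedStatement. It is an illustration under stated assumptions, not the processor's own code: the class HiveQlArgs, the record Param and the method collectParams are hypothetical names.

```java
import java.sql.Types;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of NiFi: illustrates the hiveql.args.N.* convention.
class HiveQlArgs {

    /** One positional parameter: 1-based index N, JDBC type code, and raw string value. */
    record Param(int index, int jdbcType, String value) {}

    // Matches attribute names such as "hiveql.args.3.type" and captures N.
    private static final Pattern TYPE_ATTR = Pattern.compile("hiveql\\.args\\.(\\d+)\\.type");

    /** Collects hiveql.args.N.type / hiveql.args.N.value pairs, ordered by N. */
    static List<Param> collectParams(Map<String, String> attributes) {
        TreeMap<Integer, Param> byIndex = new TreeMap<>(); // sorted by N
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            Matcher m = TYPE_ATTR.matcher(e.getKey());
            if (!m.matches()) {
                continue; // not a type attribute; ignore
            }
            int n = Integer.parseInt(m.group(1));
            int jdbcType = Integer.parseInt(e.getValue().trim());
            // A missing value attribute yields null here (assumption: bind as SQL NULL).
            String value = attributes.get("hiveql.args." + n + ".value");
            byIndex.put(n, new Param(n, jdbcType, value));
        }
        return new ArrayList<>(byIndex.values());
    }

    public static void main(String[] args) {
        Map<String, String> attrs = Map.of(
                "hiveql.args.1.type", String.valueOf(Types.INTEGER),
                "hiveql.args.1.value", "42",
                "hiveql.args.2.type", String.valueOf(Types.VARCHAR),
                "hiveql.args.2.value", "alice");
        // Each Param would map to a PreparedStatement setter for position N.
        collectParams(attrs).forEach(System.out::println);
    }
}
```

Sorting by N with a TreeMap keeps the parameters aligned with the order of the ? placeholders, regardless of the order in which attributes are stored.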

Tags:

sql, hive, put, database, update, insert

Properties:

In the table below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.

Name | Default Value | Allowable Values | Description
Hive Database Connection Pooling Service | | Controller Service API: HiveDBCPService; Implementation: HiveConnectionPool | The Hive Controller Service that is used to obtain connection(s) to the Hive database
Batch Size | 100 | | The preferred number of FlowFiles to put to the database in a single transaction
Character Set | UTF-8 | | Specifies the character set of the record data.
Statement Delimiter | ; | | The delimiter used to separate HiveQL statements in a multi-statement script
Rollback On Failure | false | true, false | Specifies how to handle errors. By default (false), if an error occurs while processing a FlowFile, the FlowFile is routed to the 'failure' or 'retry' relationship depending on the error type, and the processor continues with the next FlowFile. If you instead want to roll back the FlowFiles currently being processed and stop further processing immediately, enable this property. When it is enabled, failed FlowFiles stay in the input relationship without being penalized and are processed repeatedly until they succeed or are removed by other means. Set an adequate 'Yield Duration' to avoid retrying too frequently.
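The Statement Delimiter property implies that a single FlowFile may hold several statements that are executed one after another. The following is a minimal sketch of splitting such a script, assuming a plain split on the delimiter with blank pieces discarded. The class HiveQlScript and method splitStatements are hypothetical, and the processor's actual splitting rules (for example, how it treats a delimiter inside a quoted string literal) may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of NiFi: naive multi-statement splitting.
class HiveQlScript {

    /**
     * Splits a script on the configured delimiter, trims each piece and drops
     * blank pieces (such as the empty text after a trailing delimiter).
     * Simplification: a delimiter inside a string literal is still treated as a split point.
     */
    static List<String> splitStatements(String script, String delimiter) {
        // Pattern.quote makes multi-character or regex-special delimiters literal.
        String[] pieces = script.split(Pattern.quote(delimiter));
        List<String> statements = new ArrayList<>();
        for (String piece : pieces) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                statements.add(trimmed);
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        String script = "USE sales;\nINSERT INTO orders VALUES (1, 'a');\n";
        splitStatements(script, ";").forEach(System.out::println);
    }
}
```

Quoting the delimiter keeps a custom value such as || from being interpreted as a regular expression.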

Relationships:

Name | Description
retry | A FlowFile is routed to this relationship if the database cannot be updated but attempting the operation again may succeed
success | A FlowFile is routed to this relationship after the database is successfully updated
failure | A FlowFile is routed to this relationship if the database cannot be updated and retrying the operation will also fail, such as an invalid query or an integrity constraint violation
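The split between 'retry' and 'failure', together with the Rollback On Failure property, amounts to a small routing decision. The sketch below models it under an explicit assumption: errors that are instances of java.sql.SQLNonTransientException (which includes syntax errors and integrity constraint violations in the JDBC exception hierarchy) are treated as permanent, and any other SQLException is treated as retryable. This is an illustration only; the processor's real classification may be different, and a given Hive JDBC driver may not throw these subclasses at all. HiveQlRouting, Outcome and route are hypothetical names.

```java
import java.sql.SQLException;
import java.sql.SQLNonTransientException;
import java.sql.SQLSyntaxErrorException;
import java.sql.SQLTransientConnectionException;

// Hypothetical model of the routing rules, not NiFi's implementation.
class HiveQlRouting {

    enum Outcome { SUCCESS, RETRY, FAILURE, ROLLBACK }

    static Outcome route(SQLException error, boolean rollbackOnFailure) {
        if (error == null) {
            return Outcome.SUCCESS;              // database updated
        }
        if (rollbackOnFailure) {
            return Outcome.ROLLBACK;             // FlowFile stays in the input queue, unpenalized
        }
        if (error instanceof SQLNonTransientException) {
            return Outcome.FAILURE;              // assumed permanent: retrying will also fail
        }
        return Outcome.RETRY;                    // assumed transient: a later attempt may succeed
    }

    public static void main(String[] args) {
        System.out.println(route(new SQLSyntaxErrorException("bad query"), false));
        System.out.println(route(new SQLTransientConnectionException("host down"), false));
        System.out.println(route(new SQLSyntaxErrorException("bad query"), true));
    }
}
```

Checking rollbackOnFailure first mirrors the property description: once it is enabled, the error type no longer decides between 'failure' and 'retry'.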

Reads Attributes:

Name | Description
hiveql.args.N.type | Incoming FlowFiles are expected to be parameterized HiveQL statements. The type of each parameter is specified as an integer that represents the JDBC Type of the parameter.
hiveql.args.N.value | Incoming FlowFiles are expected to be parameterized HiveQL statements. The values of the parameters are specified as hiveql.args.1.value, hiveql.args.2.value, hiveql.args.3.value, and so on. The type of the hiveql.args.1.value parameter is specified by the hiveql.args.1.type attribute.

Writes Attributes:

Name | Description
query.input.tables | This attribute is written on FlowFiles routed to the 'success' relationship and contains the input table names (if any) in comma-delimited 'databaseName.tableName' format.
query.output.tables | This attribute is written on FlowFiles routed to the 'success' relationship and contains the target table names in 'databaseName.tableName' format.
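A downstream processor or script may need to turn these attributes back into database and table names. The following is a minimal sketch, assuming entries are separated by commas and each entry splits at its first dot into database and table. The class QueryTables, the record TableRef and the method parse are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical downstream helper, not part of NiFi.
class QueryTables {

    /** A table reference parsed from one 'databaseName.tableName' entry. */
    record TableRef(String database, String table) {}

    /**
     * Parses the value of query.input.tables or query.output.tables.
     * Returns an empty list when the attribute is absent or blank
     * (e.g. a statement without input tables).
     */
    static List<TableRef> parse(String attributeValue) {
        List<TableRef> refs = new ArrayList<>();
        if (attributeValue == null || attributeValue.isBlank()) {
            return refs;
        }
        for (String entry : attributeValue.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas
            }
            int dot = trimmed.indexOf('.');
            if (dot < 0) {
                refs.add(new TableRef(null, trimmed)); // no database qualifier present
            } else {
                refs.add(new TableRef(trimmed.substring(0, dot), trimmed.substring(dot + 1)));
            }
        }
        return refs;
    }

    public static void main(String[] args) {
        parse("sales.orders,sales.customers").forEach(System.out::println);
    }
}
```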

State management:

This component does not store state.

Restricted:

This component is not restricted.

Input requirement:

This component requires an incoming relationship.

System Resource Considerations:

None specified.

See Also:

SelectHiveQL