UpdateHiveTable

Description:

This processor uses a Hive JDBC connection and incoming records to generate any Hive 1.2 table changes needed to support the incoming records.
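For an existing table, the "table changes" mentioned above amount to adding any record fields the table does not yet have. The Python sketch below compares a record schema against the table's current columns and builds a HiveQL `ALTER TABLE ... ADD COLUMNS` statement for the missing fields. The helper name `build_add_columns_ddl` is hypothetical. The input shapes are also assumptions: a list of existing column names, and a dict mapping each field name to an already-resolved Hive type. This is an illustration, not the processor's actual implementation. Because Hive column names are case-insensitive, the comparison ignores case.

```python
from typing import Dict, List, Optional


def build_add_columns_ddl(
    table: str, existing_columns: List[str], record_fields: Dict[str, str]
) -> Optional[str]:
    """Hypothetical helper: return an ALTER TABLE statement for missing fields.

    `record_fields` maps each record field name to a Hive type string
    (e.g. "INT", "STRING"); the type mapping is assumed to be done already.
    """
    known = {column.lower() for column in existing_columns}
    missing = [
        f"`{name}` {hive_type}"
        for name, hive_type in record_fields.items()
        if name.lower() not in known  # Hive column names ignore case
    ]
    if not missing:
        return None  # the table already supports every field; no change needed
    return f"ALTER TABLE `{table}` ADD COLUMNS ({', '.join(missing)})"


print(build_add_columns_ddl("users", ["id", "name"], {"ID": "INT", "Name": "STRING", "email": "STRING"}))
# -> ALTER TABLE `users` ADD COLUMNS (`email` STRING)
```

Returning `None` when nothing is missing keeps the "no change needed" case explicit, so a caller can skip issuing DDL entirely.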

Tags:

hive, metadata, jdbc, database, table

Properties:

In the table below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values and whether a property supports the NiFi Expression Language.

Name	Default Value	Allowable Values	Description
Record Reader		Controller Service API: RecordReaderFactory; Implementations: JsonPathReader, AvroReader, XMLReader, WindowsEventLogReader, ReaderLookup, Syslog5424Reader, GrokReader, ScriptedReader, CSVReader, SyslogReader, ParquetReader, JsonTreeReader, CEFReader	The service for reading incoming FlowFiles. The reader is used only to determine the schema of the records; the records themselves are not processed.
Hive Database Connection Pooling Service		Controller Service API: HiveDBCPService; Implementation: HiveConnectionPool	The Hive Controller Service that is used to obtain connection(s) to the Hive database.
Table Name			The name of the database table to update. If the table does not exist, it is either created or an error is thrown, depending on the value of the Create Table Strategy property. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)
Partition Clause			Specifies a comma-separated list of attribute names, each optionally followed by a data type, corresponding to the partition columns of the target table. If the table is partitioned (or is to be created with partitions), each partition column name must be an attribute on the FlowFile and must be listed in this property. This assumes that all incoming records belong to the same partition and that the partition columns are not fields in the record. For example, if PartitionRecord is upstream and two partition columns are used, 'name' (of type string) and 'age' (of type integer), this property can be set to 'name string, age int'. The data types are optional; if partitions are to be created, any partition column without a data type defaults to string. For non-string primitive types, specifying the data type of existing partition columns helps in interpreting the partition value(s). If the table exists, the data types need not be specified (and are ignored in that case). This property must be set if the table is partitioned, and there must be an attribute for each partition column in the table. The attribute values are used as the partition values, and the resulting output.path attribute value reflects the location of the partition in the filesystem (for use downstream in processors such as PutHDFS). Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)
Create Table Strategy	Fail If Not Exists	Create If Not Exists, Fail If Not Exists	Specifies how to handle the target table when it does not exist: either create it or fail.
Create Table Management Strategy	Managed	Managed, External, Use 'hive.table.management.strategy' Attribute	Specifies (when a table is to be created) whether the table is a managed table or an external table. Note that when External is specified, the 'External Table Location' property must be specified. If "Use 'hive.table.management.strategy' Attribute" is selected, 'External Table Location' must still be specified, but it can contain Expression Language or be set to the empty string, and it is ignored when the attribute evaluates to 'Managed'. This Property is only considered if the <Create Table Strategy> Property has a value of "Create If Not Exists".
External Table Location			Specifies (when an external table is to be created) the file path (in HDFS, for example) in which to store table data. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry). This Property is only considered if the <Create Table Management Strategy> Property is set to one of the following values: "Use 'hive.table.management.strategy' Attribute", "External"
Create Table Storage Format	TEXTFILE	TEXTFILE, SEQUENCEFILE, ORC, PARQUET, AVRO, RCFILE	If a table is to be created, the specified storage format will be used. This Property is only considered if the <Create Table Strategy> Property has a value of "Create If Not Exists".
Update Field Names	false	true, false	This property indicates whether to update the output schema so that the field names are set to the exact column names from the specified table. This should be used if the incoming record field names may differ from the table's column names in upper- and lower-case. For example, this property should be set to true if the output FlowFile (and target table storage) is in Avro format, as Hive/Impala expect the field names to match the column names exactly.
Record Writer		Controller Service API: RecordSetWriterFactory; Implementations: AvroRecordSetWriter, ScriptedRecordSetWriter, JsonRecordSetWriter, ParquetRecordSetWriter, RecordSetWriterLookup, FreeFormTextRecordSetWriter, XMLRecordSetWriter, CSVRecordSetWriter	Specifies the Controller Service to use for writing results to a FlowFile. The Record Writer should use Inherit Schema to emulate the inferred schema behavior; that is, no explicit schema needs to be defined in the writer, because it will be supplied by the same logic used to infer the schema from the column types. If Create Table Strategy is set to 'Create If Not Exists', the Record Writer's output format must match the Record Reader's format in order for the data to be placed in the created table location. Note that this property is only used if 'Update Field Names' is set to true and the field names do not all match the column names exactly. If no field names need to be updated (or 'Update Field Names' is false), the Record Writer is not used, and the input FlowFile is instead routed to success or failure without modification. This Property is only considered if the <Update Field Names> Property has a value of "true".
Query Timeout	0		Sets the number of seconds the driver will wait for a query to execute. A value of 0 means no timeout. NOTE: Non-zero values may not be supported by the driver. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)
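The Partition Clause format and the create-table properties above can be illustrated together. The Python sketch below parses a Partition Clause value such as 'name string, age int', defaulting untyped columns to string as described. It then assembles a HiveQL `CREATE TABLE` statement from the storage format, the managed/external choice, and the location. The helper names are hypothetical, and the record field types are assumed to be already mapped to Hive types. The exact statement layout is also an assumption, not the processor's own code.

```python
from typing import Dict, List, Optional, Tuple

# Allowable values of the Create Table Storage Format property.
STORAGE_FORMATS = {"TEXTFILE", "SEQUENCEFILE", "ORC", "PARQUET", "AVRO", "RCFILE"}


def parse_partition_clause(clause: str) -> List[Tuple[str, str]]:
    """Split e.g. "name string, age int" into (column, type) pairs.

    A column listed without a type defaults to "string".
    """
    columns = []
    for entry in clause.split(","):
        parts = entry.split()
        if not parts:
            continue  # skip empty entries (e.g. an unset property)
        if len(parts) > 2:
            raise ValueError(f"Invalid partition entry: {entry.strip()!r}")
        columns.append((parts[0], parts[1] if len(parts) == 2 else "string"))
    return columns


def build_create_table_ddl(
    table: str,
    record_fields: Dict[str, str],
    partition_clause: str = "",
    storage_format: str = "TEXTFILE",
    management: str = "MANAGED",
    location: Optional[str] = None,
) -> str:
    """Hypothetical helper: build a HiveQL CREATE TABLE statement."""
    fmt = storage_format.upper()
    if fmt not in STORAGE_FORMATS:
        raise ValueError(f"Unsupported storage format: {storage_format!r}")
    if management.upper() not in ("MANAGED", "EXTERNAL"):
        raise ValueError(f"Unknown management strategy: {management!r}")
    external = management.upper() == "EXTERNAL"
    if external and not location:
        raise ValueError("External tables require a table location")
    cols = ", ".join(f"`{name}` {hive_type}" for name, hive_type in record_fields.items())
    ddl = f"CREATE {'EXTERNAL ' if external else ''}TABLE IF NOT EXISTS `{table}` ({cols})"
    partitions = parse_partition_clause(partition_clause)
    if partitions:
        part_cols = ", ".join(f"`{name}` {col_type}" for name, col_type in partitions)
        ddl += f" PARTITIONED BY ({part_cols})"  # partition columns are not record fields
    ddl += f" STORED AS {fmt}"
    if external:
        ddl += f" LOCATION '{location}'"
    return ddl


print(build_create_table_ddl("people", {"id": "INT", "city": "STRING"}, "name string, age int", "orc"))
# -> CREATE TABLE IF NOT EXISTS `people` (`id` INT, `city` STRING) PARTITIONED BY (`name` string, `age` int) STORED AS ORC
```

Keeping the partition columns out of the main column list mirrors the assumption in the Partition Clause description that partition columns are not fields in the record.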

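The case-insensitive matching behind the Update Field Names property can be sketched as follows. The hypothetical `map_field_names` helper returns only the fields whose spelling differs from the matching table column. An empty result corresponds to the case described for the Record Writer property, where no field names need updating and the input FlowFile passes through unmodified. The function and its inputs are assumptions for illustration only.

```python
from typing import Dict, List


def map_field_names(record_fields: List[str], table_columns: List[str]) -> Dict[str, str]:
    """Hypothetical helper: map record fields to table columns, ignoring case.

    Only fields that need renaming are returned; fields with no matching
    column are left out.
    """
    by_lower = {column.lower(): column for column in table_columns}
    renames = {}
    for field in record_fields:
        column = by_lower.get(field.lower())
        if column is not None and column != field:
            renames[field] = column
    return renames


print(map_field_names(["ID", "name", "Email"], ["id", "name", "email"]))
# -> {'ID': 'id', 'Email': 'email'}
```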
Relationships:

Name	Description
success	A FlowFile is routed to this relationship after the target table has been successfully verified and, if needed, created or updated to support its records.
failure	A FlowFile is routed to this relationship if the target table could not be verified, created, or updated to support its records.

Reads Attributes:

Name	Description
hive.table.management.strategy	This attribute is read if the 'Create Table Management Strategy' property is configured to use the value of this attribute. The value of this attribute should correspond (ignoring case) to a valid option of the 'Create Table Management Strategy' property.
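The attribute handling described above, combined with the rules of the 'Create Table Management Strategy' and 'External Table Location' properties, can be sketched as follows. The attribute value is matched ignoring case. A 'Managed' result discards the location, and an 'External' result requires one. The function name and the upper-cased return values are hypothetical choices for this illustration.

```python
from typing import Optional, Tuple

VALID_STRATEGIES = ("MANAGED", "EXTERNAL")


def resolve_table_management(
    attribute_value: str, location: Optional[str]
) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: interpret hive.table.management.strategy.

    Returns (strategy, location). The location is ignored for managed
    tables and required for external ones.
    """
    strategy = attribute_value.upper()  # comparison ignores case
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"Invalid table management strategy: {attribute_value!r}")
    if strategy == "MANAGED":
        return strategy, None
    if not location:
        raise ValueError("External Table Location is required for external tables")
    return strategy, location


print(resolve_table_management("external", "/data/people"))
# -> ('EXTERNAL', '/data/people')
```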

Writes Attributes:

Name	Description
output.table	This attribute is written on the FlowFiles routed to the 'success' and 'failure' relationships, and contains the target table name.
output.path	This attribute is written on the FlowFiles routed to the 'success' and 'failure' relationships, and contains the path on the file system to the table (or the partition location if the table is partitioned).
mime.type	Sets the mime.type attribute to the MIME Type specified by the Record Writer, only if a Record Writer is specified and Update Field Names is 'true'.
record.count	Sets the number of records in the FlowFile, only if a Record Writer is specified and Update Field Names is 'true'.
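For a partitioned table, the output.path value points at the partition directory rather than the table root. Hive lays out partitions as nested `column=value` directories under the table location. The Python sketch below builds such a path from a table location that is assumed to be known already, together with the partition column/value pairs taken from the FlowFile attributes. The helper name is hypothetical. The sketch also deliberately skips the escaping that Hive applies to special characters (such as '/' or '=') in partition values.

```python
from typing import List, Tuple


def partition_output_path(table_location: str, partitions: List[Tuple[str, str]]) -> str:
    """Hypothetical helper: append Hive-style column=value directories.

    Simplification: partition values are not escaped here, unlike Hive,
    which escapes special characters in partition directory names.
    """
    path = table_location.rstrip("/")
    for column, value in partitions:
        path += f"/{column}={value}"
    return path


print(partition_output_path("hdfs://nn:8020/warehouse/people", [("name", "alice"), ("age", "30")]))
# -> hdfs://nn:8020/warehouse/people/name=alice/age=30
```

With no partition columns, the function simply returns the table location, matching the unpartitioned case described for output.path.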

State management:

This component does not store state.

Restricted:

This component is not restricted.

Input requirement:

This component requires an incoming relationship.

System Resource Considerations:

None specified.