UpdateHiveTable

Description:

This processor uses a Hive JDBC connection and incoming records to generate any Hive 1.2 table changes needed to support the incoming records.

Tags:

hive, metadata, jdbc, database, table

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

Name | Default Value | Allowable Values | Description
Record Reader
Controller Service API:
RecordReaderFactory
Implementations: JsonPathReader
AvroReader
XMLReader
WindowsEventLogReader
ReaderLookup
Syslog5424Reader
GrokReader
ScriptedReader
CSVReader
SyslogReader
ParquetReader
JsonTreeReader
CEFReader
The service for reading incoming flow files. The reader is only used to determine the schema of the records; the actual records will not be processed.
Hive Database Connection Pooling Service
Controller Service API:
HiveDBCPService
Implementation: HiveConnectionPool
The Hive Controller Service used to obtain connections to the Hive database.
Table Name
The name of the database table to update. If the table does not exist, it is either created or an error is thrown, depending on the value of the Create Table Strategy property.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)
Partition Clause
Specifies a comma-separated list of attribute names and optional data types corresponding to the partition columns of the target table. If the table is partitioned or is to be created with partitions, each partition column name should be an attribute on the FlowFile and be listed in this property. This assumes that all incoming records belong to the same partition and that the partition columns are not fields in the record. For example, if PartitionRecord is upstream and two partition columns 'name' (of type string) and 'age' (of type integer) are used, this property can be set to 'name string, age int'. The data types are optional; if partitions are to be created and no type is specified, the column defaults to string type. For non-string primitive types, specifying the data type for existing partition columns helps in interpreting the partition values. If the table exists, the data types need not be specified (and are ignored in that case). This property must be set if the table is partitioned, and there must be an attribute for each partition column in the table. The values of the attributes are used as the partition values, and the resulting output.path attribute value reflects the location of the partition in the filesystem (for use downstream in processors such as PutHDFS).
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)
Create Table Strategy
Default Value: Fail If Not Exists
  • Create If Not Exists: Create a table with the given schema if it does not already exist
  • Fail If Not Exists: If the target does not already exist, log an error and route the flowfile to failure
Specifies how to process the target table when it does not exist (e.g., create it or fail).
Create Table Management Strategy
Default Value: Managed
  • Managed: Any tables created by this processor will be managed tables (see Hive documentation for details).
  • External: Any tables created by this processor will be external tables located at the `External Table Location` property value.
  • Use 'hive.table.management.strategy' Attribute: Inspects the 'hive.table.management.strategy' FlowFile attribute to determine the table management strategy. The value of this attribute must be a case-insensitive match to one of the other allowable values (e.g., Managed or External).
Specifies (when a table is to be created) whether the table is a managed table or an external table. Note that when External is specified, the 'External Table Location' property must be specified. If the "Use 'hive.table.management.strategy' Attribute" option is selected, 'External Table Location' must still be specified, but it can contain Expression Language or be set to the empty string, and it is ignored when the attribute evaluates to 'Managed'.

This Property is only considered if the <Create Table Strategy> Property has a value of "Create If Not Exists".
External Table Location
Specifies (when an external table is to be created) the file path (e.g., in HDFS) where the table data is stored.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)

This Property is only considered if the <Create Table Management Strategy> Property is set to one of the following values: "Use 'hive.table.management.strategy' Attribute", "External"
Create Table Storage Format
Default Value: TEXTFILE
  • TEXTFILE: Stored as plain text files. TEXTFILE is the default file format, unless the configuration parameter hive.default.fileformat has a different setting.
  • SEQUENCEFILE: Stored as compressed Sequence Files.
  • ORC: Stored as ORC file format. Supports ACID Transactions & Cost-based Optimizer (CBO). Stores column-level metadata.
  • PARQUET: Stored in the Parquet columnar storage format.
  • AVRO: Stored as Avro format.
  • RCFILE: Stored as Record Columnar File format.
If a table is to be created, the specified storage format will be used.

This Property is only considered if the <Create Table Strategy> Property has a value of "Create If Not Exists".
Update Field Names
Default Value: false
  • true
  • false
This property indicates whether to update the output schema so that the field names are set to the exact column names from the specified table. This should be used if the incoming record field names may differ from the table's column names in upper- and lower-case. For example, this property should be set to true if the output FlowFile (and target table storage) is in Avro format, because Hive and Impala expect the field names to match the column names exactly.
Record Writer
Controller Service API:
RecordSetWriterFactory
Implementations: AvroRecordSetWriter
ScriptedRecordSetWriter
JsonRecordSetWriter
ParquetRecordSetWriter
RecordSetWriterLookup
FreeFormTextRecordSetWriter
XMLRecordSetWriter
CSVRecordSetWriter
Specifies the Controller Service to use for writing results to a FlowFile. The Record Writer should use Inherit Schema to emulate the inferred schema behavior, i.e. an explicit schema need not be defined in the writer, and will be supplied by the same logic used to infer the schema from the column types. If Create Table Strategy is set to 'Create If Not Exists', the Record Writer's output format must match the Record Reader's format in order for the data to be placed in the created table location. Note that this property is only used if 'Update Field Names' is set to true and the field names do not all match the column names exactly. If no update is needed for any field names (or 'Update Field Names' is false), the Record Writer is not used and the input FlowFile is instead routed to success or failure without modification.

This Property is only considered if the <Update Field Names> Property has a value of "true".
Query Timeout
Default Value: 0
Sets the number of seconds the driver will wait for a query to execute. A value of 0 means no timeout. NOTE: Non-zero values may not be supported by the driver.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)
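
To make the interaction between Table Name, Partition Clause, Create Table Management Strategy, External Table Location and Create Table Storage Format more concrete, the following Python sketch parses a partition clause and assembles a HiveQL CREATE TABLE statement. It is only an illustration under stated assumptions: the function names `parse_partition_clause` and `build_create_table` are hypothetical, the statement the processor actually issues may differ, and the parser assumes simple primitive types (a type containing a comma, such as a decimal with precision and scale, is not handled).

```python
from typing import List, Optional, Tuple


def parse_partition_clause(clause: str) -> List[Tuple[str, str]]:
    """Parse a clause such as 'name string, age int' into (column, type) pairs.

    A column listed without a type defaults to 'string', as described
    for the Partition Clause property.
    """
    columns = []
    for entry in clause.split(","):
        parts = entry.split()
        if not parts:
            continue  # ignore empty entries, e.g. from a trailing comma
        name = parts[0]
        data_type = parts[1] if len(parts) > 1 else "string"
        columns.append((name, data_type))
    return columns


def build_create_table(
    table: str,
    fields: List[Tuple[str, str]],
    partition_clause: str = "",
    storage_format: str = "TEXTFILE",
    external_location: Optional[str] = None,
) -> str:
    """Assemble a CREATE TABLE statement (hypothetical helper, not NiFi API).

    Passing external_location produces an external table; otherwise the
    table is managed.
    """
    column_list = ", ".join(f"`{n}` {t}" for n, t in fields)
    external = "EXTERNAL " if external_location else ""
    ddl = f"CREATE {external}TABLE IF NOT EXISTS `{table}` ({column_list})"

    partitions = parse_partition_clause(partition_clause)
    if partitions:
        part_list = ", ".join(f"`{n}` {t}" for n, t in partitions)
        ddl += f" PARTITIONED BY ({part_list})"

    ddl += f" STORED AS {storage_format}"
    if external_location:
        ddl += f" LOCATION '{external_location}'"
    return ddl


if __name__ == "__main__":
    print(build_create_table(
        "users",
        [("id", "int"), ("email", "string")],
        partition_clause="name string, age int",
        storage_format="ORC",
    ))
```

The partition columns are kept out of the regular column list, which mirrors the requirement above that partition columns are not fields in the record.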

Relationships:

Name | Description
success | A FlowFile containing records routed to this relationship after the record has been successfully transmitted to Hive.
failure | A FlowFile containing records routed to this relationship if the record could not be transmitted to Hive.

Reads Attributes:

Name | Description
hive.table.management.strategy | This attribute is read if the 'Create Table Management Strategy' property is configured to use the value of this attribute. The value of this attribute should correspond (ignoring case) to a valid option of the 'Create Table Management Strategy' property.
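
The case-insensitive matching described for this attribute could be sketched as follows. This is a minimal Python illustration, not the processor's code: the function name `resolve_management_strategy` is hypothetical, and raising `ValueError` for a missing or unrecognized attribute is an assumption made for the sketch (how the processor handles that case is not stated here).

```python
from typing import Mapping

# The two concrete strategies named in the property description.
CONCRETE_STRATEGIES = ("Managed", "External")
ATTRIBUTE_NAME = "hive.table.management.strategy"


def resolve_management_strategy(configured: str, attributes: Mapping[str, str]) -> str:
    """Return 'Managed' or 'External' for a given FlowFile.

    If the property is set to a concrete strategy, that value wins.
    Otherwise the FlowFile attribute is inspected and matched against
    the concrete strategies, ignoring case.
    """
    if configured in CONCRETE_STRATEGIES:
        return configured

    raw = attributes.get(ATTRIBUTE_NAME)
    if raw is None:
        raise ValueError(f"attribute '{ATTRIBUTE_NAME}' is not set")

    for option in CONCRETE_STRATEGIES:
        if raw.lower() == option.lower():
            return option
    raise ValueError(f"'{raw}' is not a valid table management strategy")


if __name__ == "__main__":
    flowfile_attributes = {ATTRIBUTE_NAME: "eXTERNAL"}
    print(resolve_management_strategy(
        "Use 'hive.table.management.strategy' Attribute", flowfile_attributes))
```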

Writes Attributes:

Name | Description
output.table | This attribute is written on the flow files routed to the 'success' and 'failure' relationships, and contains the target table name.
output.path | This attribute is written on the flow files routed to the 'success' and 'failure' relationships, and contains the path on the file system to the table (or partition location if the table is partitioned).
mime.type | Sets the mime.type attribute to the MIME Type specified by the Record Writer, only if a Record Writer is specified and Update Field Names is 'true'.
record.count | Sets the number of records in the FlowFile, only if a Record Writer is specified and Update Field Names is 'true'.
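
As a rough illustration of what a partition location in output.path can look like, the sketch below builds a path using Hive's default directory layout, in which each partition column becomes a `column=value` subdirectory under the table location. The function name `partition_location` and the example paths are hypothetical, and the processor may well obtain the location from Hive's metadata rather than computing it; partitions with custom locations or values containing special characters are not covered.

```python
from typing import Iterable, Tuple


def partition_location(table_location: str,
                       partition_values: Iterable[Tuple[str, str]]) -> str:
    """Join the table location with one 'column=value' segment per partition column.

    The order of partition_values must follow the order of the partition
    columns in the table definition.
    """
    segments = [f"{column}={value}" for column, value in partition_values]
    return "/".join([table_location.rstrip("/")] + segments)


if __name__ == "__main__":
    # Partition values would come from the FlowFile attributes named in
    # the Partition Clause property, e.g. 'name' and 'age'.
    print(partition_location("/warehouse/users", [("name", "alice"), ("age", "30")]))
```

A downstream processor such as PutHDFS could then use the resulting attribute value as its target directory.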

State management:

This component does not store state.

Restricted:

This component is not restricted.

Input requirement:

This component requires an incoming relationship.

System Resource Considerations:

None specified.