public abstract class AbstractRecordWriter extends Object implements RecordWriter
Modifier and Type | Class and Description |
---|---|
protected static class | AbstractRecordWriter.OrcMemoryPressureMonitor |
Constructor and Description |
---|
AbstractRecordWriter(String lineDelimiter) |
Modifier and Type | Method and Description |
---|---|
protected void | checkAutoFlush() |
void | close() - Close the RecordUpdater. |
protected RecordUpdater | createRecordUpdater(org.apache.hadoop.fs.Path partitionPath, int bucketId, Long minWriteId, Long maxWriteID) |
abstract AbstractSerDe | createSerde() - Create SerDe for the record writer. |
abstract Object | encode(byte[] record) - Encode a record as an Object that Hive can read with the ObjectInspector associated with the serde returned by createSerde(). |
void | flush() - Flush records from buffer. |
protected int | getBucket(Object row) |
protected List<Integer> | getBucketColIDs(List<String> bucketCols, List<FieldSchema> cols) |
protected Object[] | getBucketFields(Object row) |
protected static ObjectInspector[] | getObjectInspectorsForBucketedCols(List<Integer> bucketIds, StructObjectInspector recordObjInspector) |
protected Object[] | getPartitionFields(Object row) |
Set<String> | getPartitions() - Get the set of partitions that were added by the record writer. |
protected List<String> | getPartitionValues(Object row) |
protected RecordUpdater | getRecordUpdater(List<String> partitionValues, int bucketId) |
protected String | getWatermark(String partition) - Used to tag error messages with some breadcrumbs. |
void | init(StreamingConnection conn, long minWriteId, long maxWriteId) - Initialize record writer. |
protected List<RecordUpdater> | initializeBuckets() |
protected void | logStats(String prefix) |
protected void | prepareBucketingFields() |
protected void | preparePartitioningFields() |
protected void | setupMemoryMonitoring() |
void | write(long writeId, byte[] record) - Writes using a Hive RecordUpdater. |
void | write(long writeId, InputStream inputStream) - Writes using a Hive RecordUpdater. |
protected HiveConf conf
protected StreamingConnection conn
protected Table table
protected String fullyQualifiedTableName
protected Map<String,List<RecordUpdater>> updaters
protected StructObjectInspector inputRowObjectInspector
protected ObjectInspector outputRowObjectInspector
protected ObjectInspector[] partitionObjInspectors
protected StructField[] partitionStructFields
protected Object[] partitionFieldData
protected ObjectInspector[] bucketObjInspectors
protected StructField[] bucketStructFields
protected Object[] bucketFieldData
protected int totalBuckets
protected String defaultPartitionName
protected boolean isBucketed
protected AcidOutputFormat<?,?> acidOutputFormat
protected Long curBatchMinWriteId
protected Long curBatchMaxWriteId
protected final String lineDelimiter
protected HeapMemoryMonitor heapMemoryMonitor
protected AtomicBoolean lowMemoryCanary
protected long ingestSizeBytes
protected boolean autoFlush
protected float memoryUsageThreshold
protected long ingestSizeThreshold
protected org.apache.hadoop.fs.FileSystem fs
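Concrete writers extend this class by implementing createSerde() and encode(); the protected conf, table and fullyQualifiedTableName fields above are available to them. Below is a minimal, illustrative sketch of such a subclass. The class name SimpleDelimitedWriter, the choice of LazySimpleSerDe, and the two-argument serde.initialize(conf, props) call (Hive 3.x style) are assumptions made for the example, not part of this API.

```java
import java.util.Properties;

import org.apache.hadoop.hive.serde2.AbstractSerDe;
import org.apache.hadoop.hive.serde2.SerDeException;
import org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hive.streaming.AbstractRecordWriter;
import org.apache.hive.streaming.SerializationError;

// Hypothetical subclass, shown only to illustrate the two abstract extension points.
public class SimpleDelimitedWriter extends AbstractRecordWriter {

  private AbstractSerDe serde;

  public SimpleDelimitedWriter(String lineDelimiter) {
    super(lineDelimiter);
  }

  @Override
  public AbstractSerDe createSerde() throws SerializationError {
    try {
      // 'conf' and 'table' are protected fields inherited from AbstractRecordWriter.
      Properties tableProps = table.getMetadata();
      LazySimpleSerDe lazySerde = new LazySimpleSerDe();
      lazySerde.initialize(conf, tableProps); // assumes the Hive 3.x two-argument initialize
      this.serde = lazySerde;
      return lazySerde;
    } catch (SerDeException e) {
      throw new SerializationError("Failed to create serde for " + fullyQualifiedTableName, e);
    }
  }

  @Override
  public Object encode(byte[] record) throws SerializationError {
    try {
      // Deserialize the raw bytes into an Object readable by the serde's ObjectInspector.
      return serde.deserialize(new BytesWritable(record));
    } catch (SerDeException e) {
      throw new SerializationError("Unable to convert byte[] record into Object", e);
    }
  }
}
```

The writers shipped with the streaming package (for example StrictDelimitedInputWriter and StrictJsonWriter) follow this same pattern.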
public AbstractRecordWriter(String lineDelimiter)
public void init(StreamingConnection conn, long minWriteId, long maxWriteId) throws StreamingException
Initialize record writer.
Specified by: init in interface RecordWriter
Parameters:
conn - streaming connection
minWriteId - min write id
maxWriteId - max write id
Throws:
StreamingException - thrown when initialization failed

protected void setupMemoryMonitoring()
protected void prepareBucketingFields()
protected void preparePartitioningFields()
protected String getWatermark(String partition)
protected List<Integer> getBucketColIDs(List<String> bucketCols, List<FieldSchema> cols)
public abstract AbstractSerDe createSerde() throws SerializationError
Create SerDe for the record writer.
Throws:
SerializationError - if serde cannot be created.

public abstract Object encode(byte[] record) throws SerializationError
Encode a record as an Object that Hive can read with the ObjectInspector associated with the serde returned by createSerde(). This is public so that test frameworks can use it.
Parameters:
record - record to be deserialized
Throws:
SerializationError - any error during serialization or deserialization of record

protected int getBucket(Object row)
public void flush() throws StreamingIOFailure
Flush records from buffer.
Specified by: flush in interface RecordWriter
Throws:
StreamingIOFailure
public void close() throws StreamingIOFailure
Close the RecordUpdater.
Specified by: close in interface RecordWriter
Throws:
StreamingIOFailure
protected static ObjectInspector[] getObjectInspectorsForBucketedCols(List<Integer> bucketIds, StructObjectInspector recordObjInspector)
public void write(long writeId, InputStream inputStream) throws StreamingException
Writes using a Hive RecordUpdater.
Specified by: write in interface RecordWriter
Parameters:
writeId - the write ID of the table mapping to Txn in which the write occurs
inputStream - the record to be written
Throws:
StreamingException - thrown when write fails

public void write(long writeId, byte[] record) throws StreamingException
Writes using a Hive RecordUpdater.
Specified by: write in interface RecordWriter
Parameters:
writeId - the write ID of the table mapping to Txn in which the write occurs
record - the record to be written
Throws:
StreamingException - thrown when write fails

protected void checkAutoFlush() throws StreamingIOFailure
Throws:
StreamingIOFailure
public Set<String> getPartitions()
Get the set of partitions that were added by the record writer.
Specified by: getPartitions in interface RecordWriter
protected RecordUpdater createRecordUpdater(org.apache.hadoop.fs.Path partitionPath, int bucketId, Long minWriteId, Long maxWriteID) throws IOException
Throws:
IOException
protected RecordUpdater getRecordUpdater(List<String> partitionValues, int bucketId) throws StreamingIOFailure
Throws:
StreamingIOFailure
protected List<RecordUpdater> initializeBuckets()
protected void logStats(String prefix)
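For context on how the lifecycle methods above are driven, here is a minimal usage sketch that hands one of the shipped writers to a streaming connection; the database name, table name, and agent string are placeholder assumptions. The connection, not user code, invokes init(), write(), flush(), and close() on the writer.

```java
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hive.streaming.HiveStreamingConnection;
import org.apache.hive.streaming.StreamingConnection;
import org.apache.hive.streaming.StrictDelimitedInputWriter;

public class StreamingWriteExample {
  public static void main(String[] args) throws Exception {
    HiveConf conf = new HiveConf();

    // StrictDelimitedInputWriter is one of the AbstractRecordWriter subclasses shipped with Hive.
    StrictDelimitedInputWriter writer = StrictDelimitedInputWriter.newBuilder()
        .withFieldDelimiter(',')
        .build();

    StreamingConnection connection = HiveStreamingConnection.newBuilder()
        .withDatabase("default")        // placeholder database
        .withTable("alerts")            // placeholder transactional table
        .withAgentInfo("example-agent") // placeholder agent info
        .withRecordWriter(writer)       // the connection drives the writer's lifecycle
        .withHiveConf(conf)
        .connect();

    connection.beginTransaction();
    connection.write("1,hello".getBytes());
    connection.write("2,world".getBytes());
    connection.commitTransaction();
    connection.close();
  }
}
```

Each connection.write(...) call is routed to the writer's write(writeId, ...) methods documented above, using the write id of the current transaction.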
Copyright © 2022 The Apache Software Foundation. All rights reserved.