public class VectorizedParquetRecordReader extends ParquetRecordReaderBase implements org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.NullWritable,VectorizedRowBatch>, RowPositionAwareVectorizedRecordReader
Modifier and Type | Field and Description |
---|---|
protected org.apache.parquet.schema.MessageType |
fileSchema |
static org.slf4j.Logger |
LOG |
protected org.apache.parquet.schema.MessageType |
requestedSchema |
protected long |
totalRowCount
The total number of rows this RecordReader will eventually read. |
Fields inherited from class ParquetRecordReaderBase: filePath, fileSplit, filteredBlocks, jobConf, legacyConversionEnabled, parquetInputSplit, parquetMetadata, projectionPusher, reader, schemaSize, serDeStats, skipProlepticConversion, skipTimestampConversion
Constructor and Description |
---|
VectorizedParquetRecordReader(org.apache.hadoop.mapred.InputSplit oldInputSplit,
org.apache.hadoop.mapred.JobConf conf) |
VectorizedParquetRecordReader(org.apache.hadoop.mapred.InputSplit oldInputSplit,
org.apache.hadoop.mapred.JobConf conf,
FileMetadataCache metadataCache,
DataCache dataCache,
org.apache.hadoop.conf.Configuration cacheConf) |
VectorizedParquetRecordReader(org.apache.hadoop.mapred.InputSplit oldInputSplit,
org.apache.hadoop.mapred.JobConf conf,
FileMetadataCache metadataCache,
DataCache dataCache,
org.apache.hadoop.conf.Configuration cacheConf,
org.apache.parquet.hadoop.metadata.ParquetMetadata parquetMetadata) |
Modifier and Type | Method and Description |
---|---|
static CacheTag |
cacheTagOfParquetFile(org.apache.hadoop.fs.Path path,
org.apache.hadoop.conf.Configuration cacheConf,
org.apache.hadoop.mapred.JobConf jobConf) |
void |
close() |
org.apache.hadoop.io.NullWritable |
createKey() |
VectorizedRowBatch |
createValue() |
protected org.apache.parquet.hadoop.metadata.ParquetMetadata |
getParquetMetadata(org.apache.hadoop.fs.Path path,
org.apache.hadoop.mapred.JobConf conf) |
long |
getPos() |
float |
getProgress() |
long |
getRowNumber()
Returns the row position (in the file) of the first row in the last returned batch. |
void |
initialize(org.apache.parquet.hadoop.ParquetInputSplit split,
org.apache.hadoop.mapred.JobConf configuration) |
boolean |
next(org.apache.hadoop.io.NullWritable nullWritable,
VectorizedRowBatch vectorizedRowBatch) |
Methods inherited from class ParquetRecordReaderBase: getFilteredBlocks, getSplit, getStats, setFilter, setupMetadataAndParquetSplit
public static final org.slf4j.Logger LOG
protected org.apache.parquet.schema.MessageType fileSchema
protected org.apache.parquet.schema.MessageType requestedSchema
protected long totalRowCount
public VectorizedParquetRecordReader(org.apache.hadoop.mapred.InputSplit oldInputSplit, org.apache.hadoop.mapred.JobConf conf) throws IOException
IOException
public VectorizedParquetRecordReader(org.apache.hadoop.mapred.InputSplit oldInputSplit, org.apache.hadoop.mapred.JobConf conf, FileMetadataCache metadataCache, DataCache dataCache, org.apache.hadoop.conf.Configuration cacheConf, org.apache.parquet.hadoop.metadata.ParquetMetadata parquetMetadata) throws IOException
IOException
public VectorizedParquetRecordReader(org.apache.hadoop.mapred.InputSplit oldInputSplit, org.apache.hadoop.mapred.JobConf conf, FileMetadataCache metadataCache, DataCache dataCache, org.apache.hadoop.conf.Configuration cacheConf) throws IOException
IOException
protected org.apache.parquet.hadoop.metadata.ParquetMetadata getParquetMetadata(org.apache.hadoop.fs.Path path, org.apache.hadoop.mapred.JobConf conf) throws IOException
Overrides: getParquetMetadata
in class ParquetRecordReaderBase
Throws: IOException
public void initialize(org.apache.parquet.hadoop.ParquetInputSplit split, org.apache.hadoop.mapred.JobConf configuration) throws IOException, InterruptedException, HiveException
public static CacheTag cacheTagOfParquetFile(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration cacheConf, org.apache.hadoop.mapred.JobConf jobConf)
public boolean next(org.apache.hadoop.io.NullWritable nullWritable, VectorizedRowBatch vectorizedRowBatch) throws IOException
next
in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.NullWritable,VectorizedRowBatch>
IOException
public org.apache.hadoop.io.NullWritable createKey()
createKey
in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.NullWritable,VectorizedRowBatch>
public VectorizedRowBatch createValue()
createValue
in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.NullWritable,VectorizedRowBatch>
public long getPos() throws IOException
getPos
in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.NullWritable,VectorizedRowBatch>
IOException
public void close() throws IOException
close
in interface Closeable
close
in interface AutoCloseable
close
in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.NullWritable,VectorizedRowBatch>
IOException
public float getProgress() throws IOException
getProgress
in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.NullWritable,VectorizedRowBatch>
IOException
public long getRowNumber() throws IOException
Description copied from interface: RowPositionAwareVectorizedRecordReader
getRowNumber
in interface RowPositionAwareVectorizedRecordReader
IOException
Copyright © 2023 The Apache Software Foundation. All rights reserved.