public class DataWritableReadSupport
extends org.apache.parquet.hadoop.api.ReadSupport<org.apache.hadoop.io.ArrayWritable>
Modifier and Type | Field and Description
---|---
static String | HIVE_TABLE_AS_PARQUET_SCHEMA
static String | PARQUET_COLUMN_INDEX_ACCESS
Constructor and Description
---
DataWritableReadSupport()
Modifier and Type | Method and Description
---|---
static List<String> | getColumnNames(String columns): parses a comma-separated string of column names (including Hive columns) and returns them as a list of strings.
static List<TypeInfo> | getColumnTypes(String types): returns a list of TypeInfo objects parsed from a comma-separated string of column type names.
static org.apache.parquet.schema.MessageType | getProjectedSchema(org.apache.parquet.schema.MessageType schema, List<String> colNames, List<Integer> colIndexes, Set<String> nestedColumnPaths): generates the projected schema from colIndexes and nested column paths.
static org.apache.parquet.schema.MessageType | getSchemaByIndex(org.apache.parquet.schema.MessageType schema, List<String> colNames, List<Integer> colIndexes): searches column names by index on a given Parquet file schema and returns the corresponding Parquet schema types.
static org.apache.parquet.schema.MessageType | getSchemaByName(org.apache.parquet.schema.MessageType schema, List<String> colNames, List<TypeInfo> colTypes): searches columns by name on a given Parquet message schema and returns the projected Parquet schema types.
org.apache.parquet.hadoop.api.ReadSupport.ReadContext | init(org.apache.parquet.hadoop.api.InitContext context): creates the read context for the Parquet side with the requested schema during the init phase.
org.apache.parquet.io.api.RecordMaterializer<org.apache.hadoop.io.ArrayWritable> | prepareForRead(org.apache.hadoop.conf.Configuration configuration, Map<String,String> keyValueMetaData, org.apache.parquet.schema.MessageType fileSchema, org.apache.parquet.hadoop.api.ReadSupport.ReadContext readContext): creates the Hive read support used to interpret data from Parquet to Hive.
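The two string-parsing helpers above, getColumnNames and getColumnTypes, consume Hive's comma-separated column metadata strings. The following plain-Java sketch illustrates that contract only; it is a hypothetical stand-in, not Hive's actual implementation, and the class and method names are invented for the example:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of a comma-separated column-name parser,
// in the spirit of getColumnNames(String). Not the real Hive code.
public class ColumnNameParsing {
    static List<String> parseColumnNames(String columns) {
        // An empty or missing configuration value yields no columns.
        if (columns == null || columns.isEmpty()) {
            return Collections.emptyList();
        }
        // Each comma-separated token is one column name.
        return Arrays.asList(columns.split(","));
    }

    public static void main(String[] args) {
        System.out.println(parseColumnNames("id,name,ts"));
        // [id, name, ts]
    }
}
```

The real methods additionally map type strings to TypeInfo objects; this sketch shows only the tokenization step.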
public static final String HIVE_TABLE_AS_PARQUET_SCHEMA
public static final String PARQUET_COLUMN_INDEX_ACCESS
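The index-based lookup described for getSchemaByIndex selects fields of the file schema by position. The sketch below illustrates that selection on a plain list of field names; it is an assumption-laden illustration (the real method operates on org.apache.parquet.schema.MessageType, and the skip-out-of-range behavior is assumed, not confirmed by the source):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of projecting schema fields by column index,
// in the spirit of getSchemaByIndex(schema, colNames, colIndexes).
// Plain strings stand in for Parquet schema types here.
public class IndexProjection {
    static List<String> projectByIndex(List<String> fields, List<Integer> colIndexes) {
        List<String> projected = new ArrayList<>();
        for (int idx : colIndexes) {
            // Assumption: indexes outside the file schema are skipped,
            // i.e. such columns are treated as absent from the file.
            if (idx >= 0 && idx < fields.size()) {
                projected.add(fields.get(idx));
            }
        }
        return projected;
    }

    public static void main(String[] args) {
        List<String> fileFields = Arrays.asList("id", "name", "ts", "payload");
        System.out.println(projectByIndex(fileFields, Arrays.asList(0, 2)));
        // [id, ts]
    }
}
```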
public static List<String> getColumnNames(String columns)
Parameters:
columns - comma-separated list of columns

public static List<TypeInfo> getColumnTypes(String types)
Parameters:
types - comma-separated list of types

public static org.apache.parquet.schema.MessageType getSchemaByName(org.apache.parquet.schema.MessageType schema, List<String> colNames, List<TypeInfo> colTypes)
Parameters:
schema - message type schema in which to search for column names
colNames - list of column names
colTypes - list of column types

public static org.apache.parquet.schema.MessageType getSchemaByIndex(org.apache.parquet.schema.MessageType schema, List<String> colNames, List<Integer> colIndexes)
Parameters:
schema - message schema in which to search for column names
colNames - list of column names
colIndexes - list of column indexes

public static org.apache.parquet.schema.MessageType getProjectedSchema(org.apache.parquet.schema.MessageType schema, List<String> colNames, List<Integer> colIndexes, Set<String> nestedColumnPaths)
Parameters:
schema - original schema
colNames -
colIndexes - the indexes of the needed columns
nestedColumnPaths - the paths for nested columns

public org.apache.parquet.hadoop.api.ReadSupport.ReadContext init(org.apache.parquet.hadoop.api.InitContext context)
init in class org.apache.parquet.hadoop.api.ReadSupport<org.apache.hadoop.io.ArrayWritable>
Parameters:
context -

public org.apache.parquet.io.api.RecordMaterializer<org.apache.hadoop.io.ArrayWritable> prepareForRead(org.apache.hadoop.conf.Configuration configuration, Map<String,String> keyValueMetaData, org.apache.parquet.schema.MessageType fileSchema, org.apache.parquet.hadoop.api.ReadSupport.ReadContext readContext)
prepareForRead in class org.apache.parquet.hadoop.api.ReadSupport<org.apache.hadoop.io.ArrayWritable>
Parameters:
configuration - unused
keyValueMetaData -
fileSchema - unused
readContext - contains the requested schema and the schema of the Hive table

Copyright © 2021 The Apache Software Foundation. All rights reserved.