public class DataWritableReadSupport
extends org.apache.parquet.hadoop.api.ReadSupport<org.apache.hadoop.io.ArrayWritable>
Modifier and Type | Field and Description
---|---
static String | HIVE_TABLE_AS_PARQUET_SCHEMA
static String | PARQUET_COLUMN_INDEX_ACCESS
Constructor and Description
---
DataWritableReadSupport()
Modifier and Type | Method and Description
---|---
static List<String> | getColumnNames(String columns): parses a comma-separated string of column names (including Hive columns) and returns them as a list of strings.
static List<TypeInfo> | getColumnTypes(String types): returns a list of TypeInfo objects parsed from a comma-separated string of column type names.
static org.apache.parquet.schema.MessageType | getProjectedSchema(org.apache.parquet.schema.MessageType schema, List<String> colNames, List<Integer> colIndexes, Set<String> nestedColumnPaths): generates the projected schema from colIndexes and nested column paths.
static org.apache.parquet.schema.MessageType | getSchemaByIndex(org.apache.parquet.schema.MessageType schema, List<String> colNames, List<Integer> colIndexes): searches column names by index on a given Parquet file schema and returns the corresponding Parquet schema types.
static org.apache.parquet.schema.MessageType | getSchemaByName(org.apache.parquet.schema.MessageType schema, List<String> colNames, List<TypeInfo> colTypes): searches columns by name on a given Parquet message schema and returns the projected Parquet schema types.
org.apache.parquet.hadoop.api.ReadSupport.ReadContext | init(org.apache.parquet.hadoop.api.InitContext context): creates the read context for the Parquet side with the requested schema during the init phase.
org.apache.parquet.io.api.RecordMaterializer<org.apache.hadoop.io.ArrayWritable> | prepareForRead(org.apache.hadoop.conf.Configuration configuration, Map<String,String> keyValueMetaData, org.apache.parquet.schema.MessageType fileSchema, org.apache.parquet.hadoop.api.ReadSupport.ReadContext readContext): creates the Hive read support used to interpret data from Parquet to Hive.
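The two string-parsing helpers above, getColumnNames and getColumnTypes, consume Hive's comma-separated column metadata strings. The following plain-Java sketch illustrates that contract only; it is a hypothetical stand-in, not Hive's actual implementation, and the class and method names are invented for the example:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of a comma-separated column-name parser,
// in the spirit of getColumnNames(String). Not the real Hive code.
public class ColumnNameParsing {
    static List<String> parseColumnNames(String columns) {
        // An empty or missing configuration value yields no columns.
        if (columns == null || columns.isEmpty()) {
            return Collections.emptyList();
        }
        // Each comma-separated token is one column name.
        return Arrays.asList(columns.split(","));
    }

    public static void main(String[] args) {
        System.out.println(parseColumnNames("id,name,ts"));
        // [id, name, ts]
    }
}
```

The real methods additionally map type strings to TypeInfo objects; this sketch shows only the tokenization step.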
public static final String HIVE_TABLE_AS_PARQUET_SCHEMA
public static final String PARQUET_COLUMN_INDEX_ACCESS
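The index-based lookup described for getSchemaByIndex selects fields of the file schema by position. The sketch below illustrates that selection on a plain list of field names; it is an assumption-laden illustration (the real method operates on org.apache.parquet.schema.MessageType, and the skip-out-of-range behavior is assumed, not confirmed by the source):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of projecting schema fields by column index,
// in the spirit of getSchemaByIndex(schema, colNames, colIndexes).
// Plain strings stand in for Parquet schema types here.
public class IndexProjection {
    static List<String> projectByIndex(List<String> fields, List<Integer> colIndexes) {
        List<String> projected = new ArrayList<>();
        for (int idx : colIndexes) {
            // Assumption: indexes outside the file schema are skipped,
            // i.e. such columns are treated as absent from the file.
            if (idx >= 0 && idx < fields.size()) {
                projected.add(fields.get(idx));
            }
        }
        return projected;
    }

    public static void main(String[] args) {
        List<String> fileFields = Arrays.asList("id", "name", "ts", "payload");
        System.out.println(projectByIndex(fileFields, Arrays.asList(0, 2)));
        // [id, ts]
    }
}
```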
public static List<String> getColumnNames(String columns)
Parameters:
columns - comma-separated list of columns

public static List<TypeInfo> getColumnTypes(String types)
Parameters:
types - comma-separated list of types

public static org.apache.parquet.schema.MessageType getSchemaByName(org.apache.parquet.schema.MessageType schema, List<String> colNames, List<TypeInfo> colTypes)
Parameters:
schema - message type schema in which to search for column names
colNames - list of column names
colTypes - list of column types

public static org.apache.parquet.schema.MessageType getSchemaByIndex(org.apache.parquet.schema.MessageType schema, List<String> colNames, List<Integer> colIndexes)
Parameters:
schema - message schema in which to search for column names
colNames - list of column names
colIndexes - list of column indexes

public static org.apache.parquet.schema.MessageType getProjectedSchema(org.apache.parquet.schema.MessageType schema, List<String> colNames, List<Integer> colIndexes, Set<String> nestedColumnPaths)
Parameters:
schema - original schema
colNames -
colIndexes - the indexes of the needed columns
nestedColumnPaths - the paths for nested columns

public org.apache.parquet.hadoop.api.ReadSupport.ReadContext init(org.apache.parquet.hadoop.api.InitContext context)
init in class org.apache.parquet.hadoop.api.ReadSupport<org.apache.hadoop.io.ArrayWritable>
Parameters:
context -

public org.apache.parquet.io.api.RecordMaterializer<org.apache.hadoop.io.ArrayWritable> prepareForRead(org.apache.hadoop.conf.Configuration configuration, Map<String,String> keyValueMetaData, org.apache.parquet.schema.MessageType fileSchema, org.apache.parquet.hadoop.api.ReadSupport.ReadContext readContext)
prepareForRead in class org.apache.parquet.hadoop.api.ReadSupport<org.apache.hadoop.io.ArrayWritable>
Parameters:
configuration - unused
keyValueMetaData -
fileSchema - unused
readContext - contains the requested schema and the schema of the Hive table

Copyright © 2021 The Apache Software Foundation. All rights reserved.