BucketCodec (Hive 3.1.3 API)

java.lang.Object
- java.lang.Enum<BucketCodec>
- - org.apache.hadoop.hive.ql.io.BucketCodec

All Implemented Interfaces:

Serializable, Comparable<BucketCodec>
```
public enum BucketCodec
extends Enum<BucketCodec>
```
This class interprets RecordIdentifier.getBucketProperty(). Up until ASF Hive 3.0 this field was simply the bucket ID. Since 3.0 it uses bit packing to store several values: the top 3 bits hold a version code describing the format (so at most 8 versions can exist), and the layout of the remaining bits is version specific; see below.
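As an illustration of the documented layout, the version code can be read by shifting away the 29 low-order bits. The class and method names below are hypothetical and are not part of the Hive API; this is a minimal sketch assuming only that the version occupies the top 3 bits of a 32-bit int.

```java
/** Hypothetical helper (not Hive API): reads the 3-bit version from a bucket property. */
final class BucketVersionSketch {
    private BucketVersionSketch() {}

    /** Number of low-order bits below the 3-bit version field (32 - 3). */
    static final int VERSION_SHIFT = 29;

    /** Returns the version code (0..7) stored in the top 3 bits. */
    static int versionOf(int bucketProperty) {
        // Unsigned shift so that codes 4..7 (sign bit set) still come out non-negative.
        return bucketProperty >>> VERSION_SHIFT;
    }
}
```

An unsigned shift (`>>>`) is used rather than `>>` so the result stays in the range 0..7 even when the sign bit is set.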

Enum Constant Summary

Enum Constants
Enum Constant and Description
`V0` This is the "legacy" version.
`V1` Represents the format of the "bucket" property in Hive 3.0.

Method Summary

Modifier and Type	Method and Description
`abstract int`	`decodeStatementId(int bucketProperty)`
`abstract int`	`decodeWriterId(int bucketProperty)` For bucketed tables this is the bucket ID; otherwise it is the writer ID.
`static BucketCodec`	`determineVersion(int bucket)`
`abstract int`	`encode(AcidOutputFormat.Options options)`
`static BucketCodec`	`getCodec(int version)`
`int`	`getVersion()`
`static BucketCodec`	`valueOf(String name)` Returns the enum constant of this type with the specified name.
`static BucketCodec[]`	`values()` Returns an array containing the constants of this enum type, in the order they are declared.

Methods inherited from class java.lang.Enum
clone, compareTo, equals, finalize, getDeclaringClass, hashCode, name, ordinal, toString, valueOf

Methods inherited from class java.lang.Object
getClass, notify, notifyAll, wait, wait, wait

- Enum Constant Detail
  - V0
```
public static final BucketCodec V0
```
    This is the "legacy" version. The whole bucket value is just the bucket ID. The numeric code for this version is 0. This assumes the bucket ID fits in fewer than 29 bits, which implies that the top 3 bits are 000, so data written before Hive 3.0 remains readable with this scheme.
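    The compatibility argument above can be checked directly: a pre-3.0 bucket ID is readable as V0 exactly when its top 3 bits are 000. The class and method names below are hypothetical, not Hive API; this is a minimal sketch of the V0 rule as described.

```java
/** Hypothetical sketch (not Hive API) of the V0 ("legacy") interpretation. */
final class LegacyBucketSketch {
    private LegacyBucketSketch() {}

    /** V0 stores the bucket ID unchanged, so decoding is the identity. */
    static int decodeBucketIdV0(int bucketProperty) {
        return bucketProperty;
    }

    /**
     * True if a pre-3.0 bucket ID carries version code 0, i.e. its top 3 bits are 000.
     * Negative values have the sign bit set and are therefore rejected as well.
     */
    static boolean readableAsV0(int bucketId) {
        return (bucketId >>> 29) == 0;
    }
}
```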
  - V1
```
public static final BucketCodec V1
```
    Represents the format of the "bucket" property in Hive 3.0. The 32 bits are laid out as follows, from most to least significant:
    - top 3 bits: the version code
    - next 1 bit: reserved for future use
    - next 12 bits: the bucket ID
    - next 4 bits: reserved for future use
    - remaining 12 bits: the statement ID, a 0-based numbering of all statements within a transaction. Each leg of a multi-insert statement gets a separate statement ID.
    The reserved bits align the fields so that the value is easier to interpret in hex.

    Constructs like Merge and Multi-Insert may have multiple tasks writing data that belongs to the same physical bucket file. For example, consider a Merge statement with update and insert clauses (and split update enabled, which should be the default in 3.0). A task on behalf of the insert clause may be writing a row into bucket 0 while another task in the update branch writes an insert event into bucket 0. Each of these tasks writes to a different delta directory, distinguished by statement ID. Including both the bucket ID and the statement ID in RecordIdentifier ensures that RecordIdentifier is unique.

    The intent is that sorting rows by RecordIdentifier places rows from the same physical bucket next to each other. For any row created by a given version of Hive, the top 3 bits are constant. The next most significant bits are the bucket ID, then the statement ID. This ensures that SortedDynPartitionOptimizer works; it is designed so that each task only needs to keep 1 writer open at a time. It could be configured such that a single writer sees data for multiple buckets, so it must "group" data by bucket ID (and then sort within each bucket as required). This is achieved by sorting by RecordIdentifier, which includes RecordIdentifier.getBucketProperty(), which has the actual bucket ID in the high-order bits. This scheme also ensures that FileSinkOperator.process(Object, int) works when numBuckets > numReducers. (The latter could be fixed by changing how writers are initialized in "if (fpaths.acidLastBucket != bucketNum) {")
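    The V1 layout can be decoded with one shift and one mask per field. This is a minimal sketch derived only from the bit layout above; the class and method names are hypothetical and the real Hive implementation may differ.

```java
/**
 * Hypothetical sketch (not Hive API) of the documented V1 layout, most significant bit first:
 * 3 bits version | 1 reserved | 12 bits bucket ID | 4 reserved | 12 bits statement ID.
 */
final class BucketV1LayoutSketch {
    private BucketV1LayoutSketch() {}

    static final int STATEMENT_ID_BITS = 12;
    static final int BUCKET_ID_BITS = 12;
    // The bucket ID sits above the 12-bit statement ID and the 4 reserved bits.
    static final int BUCKET_ID_SHIFT = STATEMENT_ID_BITS + 4;           // 16
    static final int BUCKET_ID_MASK = (1 << BUCKET_ID_BITS) - 1;        // 0xFFF
    static final int STATEMENT_ID_MASK = (1 << STATEMENT_ID_BITS) - 1;  // 0xFFF

    /** Extracts bits 16..27. */
    static int decodeBucketId(int bucketProperty) {
        return (bucketProperty >>> BUCKET_ID_SHIFT) & BUCKET_ID_MASK;
    }

    /** Extracts bits 0..11. */
    static int decodeStatementId(int bucketProperty) {
        return bucketProperty & STATEMENT_ID_MASK;
    }
}
```

    Because the reserved bits pad each field to a nibble boundary, a value such as `0x2005_0003` reads directly in hex as version 1 (top nibble `0x2` = `001` plus the reserved `0`), bucket `005`, reserved `0`, statement `003`. Sorting such ints in ascending order groups them by bucket ID first and statement ID second, which is the grouping property described above.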
- Method Detail
  - values
```
public static BucketCodec[] values()
```
    Returns an array containing the constants of this enum type, in the order they are declared. This method may be used to iterate over the constants as follows:
```
for (BucketCodec c : BucketCodec.values())
    System.out.println(c);
```
    Returns:
    
    an array containing the constants of this enum type, in the order they are declared
  - valueOf
```
public static BucketCodec valueOf(String name)
```
    Returns the enum constant of this type with the specified name. The string must match exactly an identifier used to declare an enum constant in this type. (Extraneous whitespace characters are not permitted.)
    
    Parameters:
    
    name - the name of the enum constant to be returned.
    
    Returns:
    
    the enum constant with the specified name
    
    Throws:
    
    IllegalArgumentException - if this enum type has no constant with the specified name
    
    NullPointerException - if the argument is null
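    Because `valueOf` requires an exact match and throws on unknown or null names, callers that parse user input often wrap it. The sketch below uses a hypothetical stand-in enum with the same constant names so it runs without Hive on the classpath; the lookup behavior shown is the standard `Enum.valueOf` contract described above.

```java
/** Hypothetical stand-in with the same constant names as BucketCodec (not Hive API). */
enum CodecNameSketch { V0, V1 }

final class ValueOfDemo {
    private ValueOfDemo() {}

    /** Looks up a constant by exact name, returning null instead of throwing. */
    static CodecNameSketch lookupOrNull(String name) {
        if (name == null) {
            return null; // valueOf(null) would throw NullPointerException
        }
        try {
            return CodecNameSketch.valueOf(name);
        } catch (IllegalArgumentException e) {
            return null; // no constant with exactly this name (case and whitespace matter)
        }
    }
}
```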
  - determineVersion
```
public static BucketCodec determineVersion(int bucket)
```
  - getCodec
```
public static BucketCodec getCodec(int version)
```
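    Neither `determineVersion` nor `getCodec` is described on this page. Going by the class description, a plausible reading is that `getCodec` maps a numeric version code to a constant and `determineVersion` reads that code from the top 3 bits. The sketch below assumes this; it also assumes V1's code is 1 (only V0's code, 0, is documented) and that unknown codes raise `IllegalArgumentException`. All names are hypothetical stand-ins.

```java
/** Hypothetical sketch (not Hive API) of version dispatch for a bucket property. */
enum CodecDispatchSketch {
    V0(0), // documented numeric code 0
    V1(1); // assumed numeric code 1

    private final int version;

    CodecDispatchSketch(int version) {
        this.version = version;
    }

    int getVersion() {
        return version;
    }

    /** Maps a numeric version code to its constant; unknown codes are rejected. */
    static CodecDispatchSketch getCodec(int version) {
        for (CodecDispatchSketch c : values()) {
            if (c.version == version) {
                return c;
            }
        }
        throw new IllegalArgumentException("Unsupported bucket format version: " + version);
    }

    /** Reads the version code from the top 3 bits and dispatches to getCodec. */
    static CodecDispatchSketch determineVersion(int bucketProperty) {
        return getCodec(bucketProperty >>> 29); // >>> keeps the code in 0..7
    }
}
```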
  - decodeWriterId
```
public abstract int decodeWriterId(int bucketProperty)
```
    For bucketed tables this is the bucket ID; otherwise it is the writer ID.
  - decodeStatementId
```
public abstract int decodeStatementId(int bucketProperty)
```
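    The summary lists `decodeWriterId` and `decodeStatementId` as abstract, which in a Java enum means each constant supplies its own body. The sketch below shows that pattern with a hypothetical stand-in enum. The V1 bodies follow the documented bit layout. The V0 statement ID of 0 is an assumption, because the V0 format stores no statement ID.

```java
/** Hypothetical sketch (not Hive API): per-constant bodies for abstract enum methods. */
enum DecoderSketch {
    V0 {
        @Override
        int decodeWriterId(int bucketProperty) {
            return bucketProperty; // legacy: the whole value is the bucket ID
        }

        @Override
        int decodeStatementId(int bucketProperty) {
            return 0; // assumption: V0 carries no statement ID, so report 0
        }
    },
    V1 {
        @Override
        int decodeWriterId(int bucketProperty) {
            return (bucketProperty >>> 16) & 0xFFF; // bits 16..27
        }

        @Override
        int decodeStatementId(int bucketProperty) {
            return bucketProperty & 0xFFF; // bits 0..11
        }
    };

    abstract int decodeWriterId(int bucketProperty);

    abstract int decodeStatementId(int bucketProperty);
}
```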
  - encode
```
public abstract int encode(AcidOutputFormat.Options options)
```
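    `encode` is documented only by its signature. The sketch below packs a bucket ID and statement ID into the V1 layout. The two ints it takes stand in for the values that `AcidOutputFormat.Options` would supply. Several details are assumptions, not documented behavior: V1's numeric code of 1, the range checks, and the `IllegalArgumentException` they throw. The class and method names are hypothetical.

```java
/** Hypothetical sketch (not Hive API) of packing values into the V1 layout. */
final class BucketV1EncoderSketch {
    private BucketV1EncoderSketch() {}

    static final int VERSION_V1 = 1;           // assumed numeric code for V1
    static final int MAX_BUCKET_ID = 0xFFF;    // 12 bits
    static final int MAX_STATEMENT_ID = 0xFFF; // 12 bits

    static int encodeV1(int bucketId, int statementId) {
        if (bucketId < 0 || bucketId > MAX_BUCKET_ID) {
            throw new IllegalArgumentException("bucketId out of range: " + bucketId);
        }
        if (statementId < 0 || statementId > MAX_STATEMENT_ID) {
            throw new IllegalArgumentException("statementId out of range: " + statementId);
        }
        // Version in bits 29..31, bucket ID in bits 16..27, statement ID in bits 0..11;
        // bit 28 and bits 12..15 stay 0 (reserved).
        return (VERSION_V1 << 29) | (bucketId << 16) | statementId;
    }
}
```

    For example, `encodeV1(5, 3)` yields `0x2005_0003`, the same value used in the V1 decoding example.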
  - getVersion
```
public int getVersion()
```