HybridHashTableContainer (Hive 2.1.1 API)

java.lang.Object
- org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer

All Implemented Interfaces:

MapJoinTableContainer, MapJoinTableContainerDirectAccess
```
public class HybridHashTableContainer
extends Object
implements MapJoinTableContainer, MapJoinTableContainerDirectAccess
```
Hash table container that can have many partitions. Each partition has its own hashmap, as well as a row container for the small table and one for the big table. Rows are distributed across multiple partitions so that, when the entire small table cannot fit into memory, the hash join can still be performed by processing the partitions recursively. Partitions that fit in memory are processed first; then each spilled partition is restored and processed one by one.
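The partition-then-reload idea can be illustrated with a small, self-contained model. This is a hypothetical sketch, not Hive's implementation. `ToyHybridJoin`, the power-of-two partition count, the "spill the largest partition" policy and the row-count memory budget are all assumptions made for illustration. Spilled data is kept in plain lists instead of being written to disk, and reloaded partitions are assumed to fit in memory, so the recursive repartitioning step is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of a hybrid hash join; not Hive's implementation.
// Keys and values are ints, and the memory budget is counted in rows, not bytes.
class ToyHybridJoin {
    static final class Partition {
        Map<Integer, List<Integer>> hashMap = new HashMap<>(); // null while "on disk"
        final List<int[]> spilledSmall = new ArrayList<>();    // stands in for a spill file
        final List<int[]> deferredBig = new ArrayList<>();     // big-table rows waiting for this partition
    }

    private final Partition[] parts;
    private final int budgetRows;
    private int inMemRows;

    ToyHybridJoin(int numPartitions, int budgetRows) {
        // numPartitions is assumed to be a power of two so a bit mask can select the partition.
        this.parts = new Partition[numPartitions];
        for (int i = 0; i < numPartitions; i++) parts[i] = new Partition();
        this.budgetRows = budgetRows;
    }

    private int partitionOf(int key) {
        return key & (parts.length - 1);
    }

    boolean isOnDisk(int partitionId) {
        return parts[partitionId].hashMap == null;
    }

    /** Build phase: add a small-table row, spilling the largest in-memory partition when over budget. */
    void putSmall(int key, int value) {
        Partition p = parts[partitionOf(key)];
        if (p.hashMap == null) {
            p.spilledSmall.add(new int[] {key, value}); // partition already spilled
            return;
        }
        p.hashMap.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        inMemRows++;
        while (inMemRows > budgetRows) {
            if (!spillLargest()) break;
        }
    }

    private boolean spillLargest() {
        int victim = -1;
        int maxRows = 0;
        for (int i = 0; i < parts.length; i++) {
            int n = rowsInMemory(parts[i]);
            if (n > maxRows) { maxRows = n; victim = i; }
        }
        if (victim < 0) return false;
        Partition p = parts[victim];
        p.hashMap.forEach((k, vs) -> vs.forEach(v -> p.spilledSmall.add(new int[] {k, v})));
        p.hashMap = null;
        inMemRows -= maxRows;
        return true;
    }

    private static int rowsInMemory(Partition p) {
        if (p.hashMap == null) return 0;
        int n = 0;
        for (List<Integer> vs : p.hashMap.values()) n += vs.size();
        return n;
    }

    /** Probe phase: join immediately if the partition is in memory, otherwise defer the row. */
    void probe(int key, int bigValue, List<String> out) {
        Partition p = parts[partitionOf(key)];
        if (p.hashMap == null) {
            p.deferredBig.add(new int[] {key, bigValue});
            return;
        }
        for (int small : p.hashMap.getOrDefault(key, List.of())) out.add(key + ":" + small + ":" + bigValue);
    }

    /** After the big table is exhausted, reload each spilled partition in turn and finish its join. */
    void finish(List<String> out) {
        for (Partition p : parts) {
            if (p.hashMap != null) continue;
            Map<Integer, List<Integer>> reloaded = new HashMap<>();
            for (int[] r : p.spilledSmall) reloaded.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r[1]);
            for (int[] b : p.deferredBig) {
                for (int small : reloaded.getOrDefault(b[0], List.of())) out.add(b[0] + ":" + small + ":" + b[1]);
            }
        }
    }
}
```

With 4 partitions and a budget of 3 rows, loading keys 0 to 7 leaves only one partition in memory; probing joins those keys immediately, and `finish` produces the remaining matches from the reloaded partitions.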

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`HybridHashTableContainer.HashPartition` This class encapsulates the triplet together, since its members are closely related to each other. The triplet: hashmap (either in memory or on disk), small table container, big table container.
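To make the triplet concrete, here is a hypothetical Java sketch. It is not Hive's `HashPartition`: the class name `PartitionTriplet`, the `String` keys and values, and the use of plain `Map`/`List` objects in place of Hive's row containers are assumptions. The hashmap reference is null while the partition's small-table rows live in the small-table container; the big-table container collects probe rows that must wait for the partition to be restored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a partition "triplet"; not Hive's HashPartition.
class PartitionTriplet {
    // 1) Hashmap for small-table rows; null while the partition is on disk.
    Map<String, List<String>> hashMap = new HashMap<>();
    // 2) Small-table container: holds the rows while the hashmap is spilled (stands in for a file).
    final List<String[]> smallTable = new ArrayList<>();
    // 3) Big-table container: big-table rows that must wait until this partition is restored.
    final List<String[]> bigTable = new ArrayList<>();

    boolean isInMemory() {
        return hashMap != null;
    }

    /** Moves every small-table row out of the hashmap and drops the map. */
    void spill() {
        if (hashMap == null) return;
        hashMap.forEach((k, vs) -> vs.forEach(v -> smallTable.add(new String[] {k, v})));
        hashMap = null;
    }

    /** Rebuilds the hashmap from the small-table container. */
    void restore() {
        if (hashMap != null) return;
        Map<String, List<String>> rebuilt = new HashMap<>();
        for (String[] row : smallTable) rebuilt.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        smallTable.clear();
        hashMap = rebuilt;
    }
}
```

Keeping the three members in one object means that spilling, restoring and probing a partition always operate on a consistent set of containers.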

Nested classes/interfaces inherited from interface org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainer
MapJoinTableContainer.ReusableGetAdaptor

Constructor Summary

Constructors
Constructor and Description
`HybridHashTableContainer(org.apache.hadoop.conf.Configuration hconf, long keyCount, long memoryAvailable, long estimatedTableSize, HybridHashTableConf nwayConf)`

Method Summary

Modifier and Type	Method and Description
`static int`	`calcNumPartitions(long memoryThreshold, long dataSize, int minNumParts, int minWbSize)` Calculate how many partitions are needed.
`void`	`clear()` Clears the contents of the table.
`MapJoinTableContainer.ReusableGetAdaptor`	`createGetter(MapJoinKey keyTypeFromLoader)` Creates a reusable get adaptor that can be used to retrieve rows from the table based on either vectorized or non-vectorized input rows to MapJoinOperator.
`void`	`dumpMetrics()`
`void`	`dumpStats()`
`MapJoinKey`	`getAnyKey()`
`HybridHashTableContainer.HashPartition[]`	`getHashPartitions()`
`LazyBinaryStructObjectInspector`	`getInternalValueOi()`
`long`	`getMemoryThreshold()`
`byte[]`	`getNotNullMarkers()`
`byte[]`	`getNullMarkers()`
`int`	`getNumPartitions()`
`boolean[]`	`getSortableSortOrders()`
`long`	`getTableRowSize()`
`int`	`getToSpillPartitionId()` Gets the partition Id into which to spill the big table row.
`int`	`getTotalInMemRowCount()`
`MapJoinBytesTableContainer.KeyValueHelper`	`getWriteHelper()`
`boolean`	`hasSpill()` Checks if the container has spilled any data onto disk.
`boolean`	`isHashMapSpilledOnCreation(int partitionId)` Check if the hash table of a specified partition has been "spilled" to disk when it was created.
`boolean`	`isOnDisk(int partitionId)` Check if the hash table of a specified partition is on disk (or was "spilled" on creation).
`void`	`put(org.apache.hadoop.io.Writable currentKey, org.apache.hadoop.io.Writable currentValue)`
`MapJoinKey`	`putRow(org.apache.hadoop.io.Writable currentKey, org.apache.hadoop.io.Writable currentValue)` Adds row from input to the table.
`void`	`seal()` Indicates to the container that the puts have ended; the table is now read-only.
`void`	`setSerde(MapJoinObjectSerDeContext keyCtx, MapJoinObjectSerDeContext valCtx)`
`void`	`setSpill(boolean isSpilled)`
`void`	`setTotalInMemRowCount(int totalInMemRowCount)`
`int`	`size()` Returns the size of the hash table.
`long`	`spillPartition(int partitionId)` Moves the hashtable of a specified partition from memory into the local file system.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - HybridHashTableContainer
```
public HybridHashTableContainer(org.apache.hadoop.conf.Configuration hconf,
                                long keyCount,
                                long memoryAvailable,
                                long estimatedTableSize,
                                HybridHashTableConf nwayConf)
                         throws SerDeException,
                                IOException
```
    Throws:
    
    SerDeException
    
    IOException
- Method Detail
  - getWriteHelper
```
public MapJoinBytesTableContainer.KeyValueHelper getWriteHelper()
```
  - getHashPartitions
```
public HybridHashTableContainer.HashPartition[] getHashPartitions()
```
  - getMemoryThreshold
```
public long getMemoryThreshold()
```
  - getInternalValueOi
```
public LazyBinaryStructObjectInspector getInternalValueOi()
```
  - getSortableSortOrders
```
public boolean[] getSortableSortOrders()
```
  - getNullMarkers
```
public byte[] getNullMarkers()
```
  - getNotNullMarkers
```
public byte[] getNotNullMarkers()
```
  - putRow
```
public MapJoinKey putRow(org.apache.hadoop.io.Writable currentKey,
                         org.apache.hadoop.io.Writable currentValue)
                  throws SerDeException,
                         HiveException,
                         IOException
```
    Description copied from interface: MapJoinTableContainer
    
    Adds row from input to the table.
    
    Specified by:
    
    putRow in interface MapJoinTableContainer
    
    Throws:
    
    SerDeException
    
    HiveException
    
    IOException
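When a row is added, the hash of its key decides which partition receives it. A minimal sketch, assuming the partition count is a power of two so that the index can be taken with a bit mask; the helper `partitionFor` is hypothetical and not a Hive method.

```java
// Hypothetical helper: maps a key hash to a partition index.
final class PartitionRouting {
    private PartitionRouting() {}

    static int partitionFor(int keyHash, int numPartitions) {
        // Requires a positive power of two, e.g. 16 gives the mask 0b1111.
        if (numPartitions <= 0 || Integer.bitCount(numPartitions) != 1) {
            throw new IllegalArgumentException("numPartitions must be a power of two: " + numPartitions);
        }
        // Masking keeps the low bits, so negative hashes still map into [0, numPartitions).
        return keyHash & (numPartitions - 1);
    }
}
```

For a power of two `n`, `keyHash & (n - 1)` equals `Math.floorMod(keyHash, n)`, which avoids the negative indices that `%` yields for negative hashes.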
  - isOnDisk
```
public boolean isOnDisk(int partitionId)
```
    Checks whether the hash table of the specified partition is on disk (or was "spilled" on creation).
    
    Parameters:
    
    partitionId - partition number
    
    Returns:
    
    true if on disk, false if in memory
  - isHashMapSpilledOnCreation
```
public boolean isHashMapSpilledOnCreation(int partitionId)
```
    Checks whether the hash table of the specified partition was "spilled" to disk when it was created; in other words, whether the hashmap was never created at all.
    
    Parameters:
    
    partitionId - partition number (hashMap ID)
    
    Returns:
    
    true if the hashmap was never created, false if a hash table exists for the partition
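The two predicates overlap: a partition whose hashmap was never created counts as on disk, while a partition spilled later is on disk without having been spilled on creation. A hypothetical three-state model makes the relationship explicit; the enum `HashMapState` and the class `PartitionStateModel` are illustrative names, not Hive types.

```java
// Hypothetical model of the states behind isOnDisk and isHashMapSpilledOnCreation; not Hive code.
enum HashMapState { IN_MEMORY, SPILLED, SPILLED_ON_CREATION }

class PartitionStateModel {
    private final HashMapState[] states;

    PartitionStateModel(HashMapState... states) {
        this.states = states.clone();
    }

    /** True if the hashmap is not in memory, whether it was spilled later or never created. */
    boolean isOnDisk(int partitionId) {
        return states[partitionId] != HashMapState.IN_MEMORY;
    }

    /** True only if no hashmap was ever created for this partition. */
    boolean isHashMapSpilledOnCreation(int partitionId) {
        return states[partitionId] == HashMapState.SPILLED_ON_CREATION;
    }
}
```

In this model, `isHashMapSpilledOnCreation(i)` implies `isOnDisk(i)`, but not the other way round.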
  - spillPartition
```
public long spillPartition(int partitionId)
                    throws IOException
```
    Moves the hashtable of the specified partition from memory into the local file system.
    
    Parameters:
    
    partitionId - the partition whose hashtable is to be moved
    
    Returns:
    
    amount of memory freed
    
    Throws:
    
    IOException
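Because the method reports how much memory it freed, a caller can spill repeatedly until usage drops below a threshold. The following is a hypothetical sketch of that bookkeeping. `SpillLedger`, the per-partition byte estimates and the "spill the largest partition first" policy are assumptions, not Hive's code, and the actual write to the local file system is omitted.

```java
import java.util.Arrays;

// Hypothetical memory bookkeeping around a spillPartition-style call; no real I/O happens here.
class SpillLedger {
    private final long[] partitionBytes; // estimated in-memory size per partition; 0 once spilled
    private final boolean[] onDisk;
    private long usedBytes;

    SpillLedger(long... partitionBytes) {
        this.partitionBytes = partitionBytes.clone();
        this.onDisk = new boolean[partitionBytes.length];
        this.usedBytes = Arrays.stream(partitionBytes).sum();
    }

    long usedBytes() {
        return usedBytes;
    }

    boolean isOnDisk(int partitionId) {
        return onDisk[partitionId];
    }

    /** Marks a partition as spilled and returns the memory it freed. */
    long spillPartition(int partitionId) {
        long freed = partitionBytes[partitionId];
        partitionBytes[partitionId] = 0;
        onDisk[partitionId] = true;
        usedBytes -= freed;
        return freed;
    }

    /** Spills the largest in-memory partition until usage is at or below the threshold. */
    long spillUntilBelow(long memoryThreshold) {
        long totalFreed = 0;
        while (usedBytes > memoryThreshold) {
            int victim = -1;
            for (int i = 0; i < partitionBytes.length; i++) {
                if (!onDisk[i] && (victim < 0 || partitionBytes[i] > partitionBytes[victim])) victim = i;
            }
            if (victim < 0) break; // nothing left to spill
            totalFreed += spillPartition(victim);
        }
        return totalFreed;
    }
}
```

With partitions of 100, 400, 250 and 50 bytes and a 300-byte threshold, the sketch spills the 400-byte and then the 250-byte partition, freeing 650 bytes and leaving 150 in use.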
  - calcNumPartitions
```
public static int calcNumPartitions(long memoryThreshold,
                                    long dataSize,
                                    int minNumParts,
                                    int minWbSize)
                             throws IOException
```
    Calculate how many partitions are needed. For an n-way join, this calculation is done only once, in the HashTableLoader, for the biggest small table. The other small tables use the same number of partitions; they may need to adjust (usually reduce) their individual write buffer sizes so as not to exceed the memory threshold.
    
    Parameters:
    
    memoryThreshold - memory threshold for the given table
    
    dataSize - total data size for the table
    
    minNumParts - minimum required number of partitions
    
    minWbSize - minimum required write buffer size
    
    Returns:
    
    number of partitions needed
    
    Throws:
    
    IOException
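One plausible reading of this calculation is to start from `minNumParts` and keep doubling until a single partition's share of the data fits within the memory threshold. The sketch below assumes exactly that; it is a hypothetical helper, not Hive's source, and it leaves out the `minWbSize` parameter.

```java
// Hypothetical partition-count heuristic: double until one partition's share fits in memory.
final class PartitionCount {
    private PartitionCount() {}

    static int calcNumPartitions(long memoryThreshold, long dataSize, int minNumParts) {
        if (memoryThreshold <= 0 || minNumParts <= 0) {
            throw new IllegalArgumentException("memoryThreshold and minNumParts must be positive");
        }
        int numPartitions = minNumParts;
        // The second condition guards against int overflow when the data vastly exceeds the threshold.
        while (dataSize / numPartitions > memoryThreshold && numPartitions <= Integer.MAX_VALUE / 2) {
            numPartitions *= 2;
        }
        return numPartitions;
    }
}
```

If `minNumParts` is a power of two, doubling keeps the result a power of two, which suits bit-mask partition selection.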
  - getNumPartitions
```
public int getNumPartitions()
```
  - getTotalInMemRowCount
```
public int getTotalInMemRowCount()
```
  - setTotalInMemRowCount
```
public void setTotalInMemRowCount(int totalInMemRowCount)
```
  - getTableRowSize
```
public long getTableRowSize()
```
  - hasSpill
```
public boolean hasSpill()
```
    Description copied from interface: MapJoinTableContainer
    
    Checks if the container has spilled any data onto disk. This is only applicable for HybridHashTableContainer.
    
    Specified by:
    
    hasSpill in interface MapJoinTableContainer
  - setSpill
```
public void setSpill(boolean isSpilled)
```
  - getToSpillPartitionId
```
public int getToSpillPartitionId()
```
    Gets the partition Id into which to spill the big table row.
    
    Returns:
    
    partition Id
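During the probe phase, a big-table row whose key hashes to a spilled partition cannot be joined yet, so the container remembers that partition and the caller can park the row in the matching big-table container. A hypothetical sketch of that handshake follows. `ProbeOutcome`, `SpillAwareProbe`, `spillBigTableRow`, and the idea that the id is recorded as a side effect of the lookup are illustrative assumptions rather than Hive's exact control flow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical probe result: the row matched, did not match, or must be spilled.
enum ProbeOutcome { MATCH, NO_MATCH, SPILL }

// Hypothetical probe side of a hybrid hash join; not Hive's API.
class SpillAwareProbe {
    private final boolean[] onDisk;           // per partition; length assumed to be a power of two
    private final Set<Integer> inMemoryKeys;  // small-table keys held by in-memory partitions
    private final List<List<Integer>> bigTableContainers = new ArrayList<>();
    private int toSpillPartitionId = -1;

    SpillAwareProbe(boolean[] onDisk, Set<Integer> inMemoryKeys) {
        this.onDisk = onDisk.clone();
        this.inMemoryKeys = Set.copyOf(inMemoryKeys);
        for (int i = 0; i < onDisk.length; i++) bigTableContainers.add(new ArrayList<>());
    }

    ProbeOutcome probe(int key) {
        int partitionId = key & (onDisk.length - 1);
        if (onDisk[partitionId]) {
            toSpillPartitionId = partitionId; // remembered for the caller
            return ProbeOutcome.SPILL;
        }
        return inMemoryKeys.contains(key) ? ProbeOutcome.MATCH : ProbeOutcome.NO_MATCH;
    }

    int getToSpillPartitionId() {
        return toSpillPartitionId;
    }

    /** Caller side: park a big-table row in the container of the partition reported by the last probe. */
    void spillBigTableRow(int key) {
        bigTableContainers.get(getToSpillPartitionId()).add(key);
    }

    List<Integer> bigTableRows(int partitionId) {
        return bigTableContainers.get(partitionId);
    }
}
```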
  - clear
```
public void clear()
```
    Description copied from interface: MapJoinTableContainer
    
    Clears the contents of the table.
    
    Specified by:
    
    clear in interface MapJoinTableContainer
  - getAnyKey
```
public MapJoinKey getAnyKey()
```
    Specified by:
    
    getAnyKey in interface MapJoinTableContainer
  - createGetter
```
public MapJoinTableContainer.ReusableGetAdaptor createGetter(MapJoinKey keyTypeFromLoader)
```
    Description copied from interface: MapJoinTableContainer
    
    Creates a reusable get adaptor that can be used to retrieve rows from the table based on either vectorized or non-vectorized input rows to MapJoinOperator.
    
    Specified by:
    
    createGetter in interface MapJoinTableContainer
    
    Parameters:
    
    keyTypeFromLoader - the last key from the hash table loader, used to determine the key type used when loading the hashtable (if it can vary).
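The point of a reusable get adaptor is that the operator creates one lookup object up front and re-points it at each input row, instead of allocating a new result holder per probe. A hypothetical sketch of that pattern follows; `ReusableGetter` and its `lookup`/`currentRows` methods are illustrative names, and Hive's `ReusableGetAdaptor` has its own interface.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a reusable lookup object; not Hive's ReusableGetAdaptor.
class ReusableGetter {
    private final Map<String, List<String>> table;
    private final List<String> current = new ArrayList<>(); // one result list, reused across lookups

    ReusableGetter(Map<String, List<String>> table) {
        this.table = table;
    }

    /** Re-points this getter at a new key; the result list is overwritten, not reallocated. */
    boolean lookup(String key) {
        current.clear();
        List<String> rows = table.get(key);
        if (rows == null) return false;
        current.addAll(rows);
        return true;
    }

    /** Read-only view of the rows for the most recent key; it changes on the next lookup call. */
    List<String> currentRows() {
        return Collections.unmodifiableList(current);
    }
}
```

Because the view is backed by the reused list, a caller must consume the rows before the next lookup, which mirrors the usual contract of reusable per-row objects.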
  - seal
```
public void seal()
```
    Description copied from interface: MapJoinTableContainer
    
    Indicates to the container that the puts have ended; the table is now read-only.
    
    Specified by:
    
    seal in interface MapJoinTableContainer
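A minimal sketch of the put-then-seal contract, assuming that a put after `seal()` should fail fast. This hypothetical `SealableTable` is illustrative only; the description above does not say how Hive reacts to late puts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical container illustrating "puts have ended; table is now read-only"; not Hive code.
class SealableTable {
    private final Map<String, List<String>> rows = new HashMap<>();
    private boolean sealed;

    void put(String key, String value) {
        if (sealed) throw new IllegalStateException("table is sealed (read-only)");
        rows.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    /** Ends the load phase; from here on only reads are allowed. */
    void seal() {
        sealed = true;
    }

    /** Returns an immutable copy of the values stored for the key (empty if absent). */
    List<String> get(String key) {
        return List.copyOf(rows.getOrDefault(key, List.of()));
    }
}
```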
  - put
```
public void put(org.apache.hadoop.io.Writable currentKey,
                org.apache.hadoop.io.Writable currentValue)
         throws SerDeException,
                IOException
```
    Specified by:
    
    put in interface MapJoinTableContainerDirectAccess
    
    Throws:
    
    SerDeException
    
    IOException
  - dumpMetrics
```
public void dumpMetrics()
```
    Specified by:
    
    dumpMetrics in interface MapJoinTableContainer
  - dumpStats
```
public void dumpStats()
```
  - size
```
public int size()
```
    Description copied from interface: MapJoinTableContainer
    
    Returns the size of the hash table.
    
    Specified by:
    
    size in interface MapJoinTableContainer
  - setSerde
```
public void setSerde(MapJoinObjectSerDeContext keyCtx,
                     MapJoinObjectSerDeContext valCtx)
              throws SerDeException
```
    Specified by:
    
    setSerde in interface MapJoinTableContainer
    
    Throws:
    
    SerDeException
