public class HybridHashTableContainer extends Object implements MapJoinTableContainer, MapJoinTableContainerDirectAccess
Modifier and Type | Class and Description |
---|---|
static class | HybridHashTableContainer.HashPartition Encapsulates a triplet of closely related objects: the hash map (either in memory or on disk), the small-table container, and the big-table container. |
Nested classes/interfaces inherited from interface MapJoinTableContainer: MapJoinTableContainer.ReusableGetAdaptor
Constructor and Description |
---|
HybridHashTableContainer(org.apache.hadoop.conf.Configuration hconf, long keyCount, long memoryAvailable, long estimatedTableSize, HybridHashTableConf nwayConf) |
Modifier and Type | Method and Description |
---|---|
static int | calcNumPartitions(long memoryThreshold, long dataSize, int minNumParts, int minWbSize) Calculates how many partitions are needed. |
void | clear() Clears the contents of the table. |
MapJoinTableContainer.ReusableGetAdaptor | createGetter(MapJoinKey keyTypeFromLoader) Creates a reusable get adaptor that can be used to retrieve rows from the table based on either vectorized or non-vectorized input rows to MapJoinOperator. |
void | dumpMetrics() |
void | dumpStats() |
MapJoinKey | getAnyKey() |
HybridHashTableContainer.HashPartition[] | getHashPartitions() |
LazyBinaryStructObjectInspector | getInternalValueOi() |
long | getMemoryThreshold() |
byte[] | getNotNullMarkers() |
byte[] | getNullMarkers() |
int | getNumPartitions() |
boolean[] | getSortableSortOrders() |
long | getTableRowSize() |
int | getToSpillPartitionId() Gets the ID of the partition into which the big-table row is to be spilled. |
int | getTotalInMemRowCount() |
MapJoinBytesTableContainer.KeyValueHelper | getWriteHelper() |
boolean | hasSpill() Checks if the container has spilled any data onto disk. |
boolean | isHashMapSpilledOnCreation(int partitionId) Checks if the hash table of a specified partition was "spilled" to disk when it was created. |
boolean | isOnDisk(int partitionId) Checks if the hash table of a specified partition is on disk (or was "spilled" on creation). |
void | put(org.apache.hadoop.io.Writable currentKey, org.apache.hadoop.io.Writable currentValue) |
MapJoinKey | putRow(org.apache.hadoop.io.Writable currentKey, org.apache.hadoop.io.Writable currentValue) Adds a row from the input to the table. |
void | seal() Indicates to the container that the puts have ended; the table is now read-only. |
void | setSerde(MapJoinObjectSerDeContext keyCtx, MapJoinObjectSerDeContext valCtx) |
void | setSpill(boolean isSpilled) |
void | setTotalInMemRowCount(int totalInMemRowCount) |
int | size() Returns the size of the hash table. |
long | spillPartition(int partitionId) Moves the hash table of a specified partition from memory to the local file system. |
public HybridHashTableContainer(org.apache.hadoop.conf.Configuration hconf, long keyCount, long memoryAvailable, long estimatedTableSize, HybridHashTableConf nwayConf) throws SerDeException, IOException
Throws: SerDeException, IOException
public MapJoinBytesTableContainer.KeyValueHelper getWriteHelper()
public HybridHashTableContainer.HashPartition[] getHashPartitions()
public long getMemoryThreshold()
public LazyBinaryStructObjectInspector getInternalValueOi()
public boolean[] getSortableSortOrders()
public byte[] getNullMarkers()
public byte[] getNotNullMarkers()
public MapJoinKey putRow(org.apache.hadoop.io.Writable currentKey, org.apache.hadoop.io.Writable currentValue) throws SerDeException, HiveException, IOException
Description copied from interface: MapJoinTableContainer
Adds a row from the input to the table.
Specified by: putRow in interface MapJoinTableContainer
Throws: SerDeException, HiveException, IOException
public boolean isOnDisk(int partitionId)
Parameters: partitionId - partition number
public boolean isHashMapSpilledOnCreation(int partitionId)
Parameters: partitionId - hashMap ID
public long spillPartition(int partitionId) throws IOException
Parameters: partitionId - the hashtable to be moved
Throws: IOException
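The relationship between isOnDisk, hasSpill, getTotalInMemRowCount and spillPartition can be illustrated with a toy model. The following is a minimal sketch under stated assumptions, not Hive's implementation: the class HybridSpillSketch, the String keys, the hash-based routing in partitionOf, the "largest partition first" policy in biggestInMemoryPartition, and the choice to return a row count from spillPartition are all hypothetical. The "disk" is only a flag; nothing is written to a file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model: method names mirror the documented ones, but the
// logic is an illustrative assumption and does not reproduce Hive's code.
final class HybridSpillSketch {
    private final List<Map<String, String>> partitions = new ArrayList<>();
    private final boolean[] onDisk;  // true = partition is treated as spilled
    private boolean spilled;

    HybridSpillSketch(int numPartitions) {
        onDisk = new boolean[numPartitions];
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new HashMap<>());
        }
    }

    // Assumed routing rule: non-negative hash modulo the partition count.
    int partitionOf(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    void put(String key, String value) {
        partitions.get(partitionOf(key)).put(key, value);
    }

    boolean isOnDisk(int partitionId) {
        return onDisk[partitionId];
    }

    boolean hasSpill() {
        return spilled;
    }

    // Counts only the rows held by partitions that are still in memory.
    int getTotalInMemRowCount() {
        int rows = 0;
        for (int i = 0; i < partitions.size(); i++) {
            if (!onDisk[i]) {
                rows += partitions.get(i).size();
            }
        }
        return rows;
    }

    // Assumed victim policy: the in-memory partition with the most rows, or -1.
    int biggestInMemoryPartition() {
        int best = -1;
        for (int i = 0; i < partitions.size(); i++) {
            if (!onDisk[i]
                    && (best < 0 || partitions.get(i).size() > partitions.get(best).size())) {
                best = i;
            }
        }
        return best;
    }

    // Marks the partition as spilled; returns the number of rows moved
    // (an assumption, the documented return value is not described).
    long spillPartition(int partitionId) {
        if (onDisk[partitionId]) {
            return 0L;
        }
        onDisk[partitionId] = true;
        spilled = true;
        return partitions.get(partitionId).size();
    }

    public static void main(String[] args) {
        HybridSpillSketch table = new HybridSpillSketch(4);
        for (String key : new String[] {"a", "b", "c", "d", "e"}) {
            table.put(key, "row-" + key);
        }
        int victim = table.biggestInMemoryPartition();
        System.out.println("spilling partition " + victim
                + ", rows moved: " + table.spillPartition(victim));
        System.out.println("rows left in memory: " + table.getTotalInMemRowCount());
    }
}
```

In this sketch a spilled partition keeps accepting rows, on the assumption that they would be appended to its spill file; only the in-memory row count stops reflecting them.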
public static int calcNumPartitions(long memoryThreshold, long dataSize, int minNumParts, int minWbSize) throws IOException
Parameters: memoryThreshold - memory threshold for the given table
dataSize - total data size for the table
minNumParts - minimum required number of partitions
minWbSize - minimum required write buffer size
Throws: IOException
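To make the roles of the four parameters concrete, here is a minimal sketch of one plausible way to derive a partition count from them. It is an assumption for illustration only: this page does not describe the algorithm, and the doubling strategy, the class name PartitionCountSketch, and the helper writeBuffersFit are hypothetical. Unlike the documented method, the sketch does not declare IOException.

```java
// Hypothetical sketch: NOT the Hive implementation of calcNumPartitions.
final class PartitionCountSketch {

    /**
     * Assumed strategy: start at minNumParts and double the count until one
     * partition's share of dataSize no longer exceeds memoryThreshold.
     */
    static int calcNumPartitions(long memoryThreshold, long dataSize,
                                 int minNumParts, int minWbSize) {
        if (memoryThreshold <= 0 || minNumParts <= 0 || minWbSize <= 0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        int numPartitions = minNumParts;
        // Stop doubling before the int would overflow.
        while (dataSize / numPartitions > memoryThreshold
                && numPartitions <= Integer.MAX_VALUE / 2) {
            numPartitions *= 2;
        }
        return numPartitions;
    }

    /** True when every partition can hold at least one write buffer of minWbSize bytes. */
    static boolean writeBuffersFit(long memoryThreshold, int numPartitions, int minWbSize) {
        return (long) numPartitions * minWbSize <= memoryThreshold;
    }

    public static void main(String[] args) {
        long memoryThreshold = 64L << 20;  // 64 MiB, illustrative value
        long dataSize = 4L << 30;          // 4 GiB, illustrative value
        int minWbSize = 512 << 10;         // 512 KiB, illustrative value
        int parts = calcNumPartitions(memoryThreshold, dataSize, 16, minWbSize);
        System.out.println("partitions: " + parts);
        System.out.println("write buffers fit: "
                + writeBuffersFit(memoryThreshold, parts, minWbSize));
    }
}
```

The separate writeBuffersFit check reflects the assumption that minWbSize constrains how many partitions the memory budget can support, since each partition needs at least one write buffer.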
public int getNumPartitions()
public int getTotalInMemRowCount()
public void setTotalInMemRowCount(int totalInMemRowCount)
public long getTableRowSize()
public boolean hasSpill()
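Description copied from interface: MapJoinTableContainer
Checks if the container has spilled any data onto disk.
Specified by: hasSpill in interface MapJoinTableContainer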
public void setSpill(boolean isSpilled)
public int getToSpillPartitionId()
public void clear()
Description copied from interface: MapJoinTableContainer
Clears the contents of the table.
Specified by: clear in interface MapJoinTableContainer
public MapJoinKey getAnyKey()
Specified by: getAnyKey in interface MapJoinTableContainer
public MapJoinTableContainer.ReusableGetAdaptor createGetter(MapJoinKey keyTypeFromLoader)
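Description copied from interface: MapJoinTableContainer
Creates a reusable get adaptor that can be used to retrieve rows from the table based on either vectorized or non-vectorized input rows to MapJoinOperator.
Specified by: createGetter in interface MapJoinTableContainer
Parameters: keyTypeFromLoader - Last key from hash table loader, to determine key type used when loading hashtable (if it can vary).
public void seal()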
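Description copied from interface: MapJoinTableContainer
Indicates to the container that the puts have ended; the table is now read-only.
Specified by: seal in interface MapJoinTableContainer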
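The build-then-seal lifecycle implied by putRow and seal can be sketched as follows. This is a toy illustration under assumptions, not the Hive class: SealableTableSketch, its String keys and values, and the choice to throw IllegalStateException on a put after seal are hypothetical. This page states only that the table is read-only after seal and does not say how later puts are handled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a load phase followed by a read-only probe phase.
final class SealableTableSketch {
    private final Map<String, String> rows = new HashMap<>();
    private boolean sealed;

    // Adds a row while the table is still being loaded; returns the key.
    String putRow(String key, String value) {
        if (sealed) {
            // Assumed behavior: reject writes once the table is sealed.
            throw new IllegalStateException("table is sealed; no further puts allowed");
        }
        rows.put(key, value);
        return key;
    }

    // Ends the load phase; from here on the table is treated as read-only.
    void seal() {
        sealed = true;
    }

    String get(String key) {
        return rows.get(key);
    }

    int size() {
        return rows.size();
    }

    public static void main(String[] args) {
        SealableTableSketch table = new SealableTableSketch();
        table.putRow("k1", "v1");
        table.putRow("k2", "v2");
        table.seal();
        System.out.println("rows after seal: " + table.size());
        System.out.println("lookup k1: " + table.get("k1"));
    }
}
```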
public void put(org.apache.hadoop.io.Writable currentKey, org.apache.hadoop.io.Writable currentValue) throws SerDeException, IOException
Specified by: put in interface MapJoinTableContainerDirectAccess
Throws: SerDeException, IOException
public void dumpMetrics()
Specified by: dumpMetrics in interface MapJoinTableContainer
public void dumpStats()
public int size()
Description copied from interface: MapJoinTableContainer
Returns the size of the hash table.
Specified by: size in interface MapJoinTableContainer
public void setSerde(MapJoinObjectSerDeContext keyCtx, MapJoinObjectSerDeContext valCtx) throws SerDeException
Specified by: setSerde in interface MapJoinTableContainer
Throws: SerDeException
Copyright © 2016 The Apache Software Foundation. All rights reserved.