public class HybridHashTableContainer extends Object implements MapJoinTableContainer, MapJoinTableContainerDirectAccess
Modifier and Type | Class and Description |
---|---|
static class | HybridHashTableContainer.HashPartition: Groups three closely related objects into one unit: the hash map (either in memory or on disk), the small-table container, and the big-table container. |
Nested classes/interfaces inherited from interface MapJoinTableContainer: MapJoinTableContainer.ReusableGetAdaptor
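The triplet grouped by `HashPartition` can be pictured with the minimal model below. It is a hypothetical sketch, not Hive's class: rows are plain strings, the names `HashPartitionDemo`, `putSmallTableRow` and `markSpilled` are invented for illustration, and "spilling" is reduced to dropping the in-memory map so that later small-table rows are deferred to the small-table container.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the triplet grouped by HybridHashTableContainer.HashPartition.
// Rows are plain strings here; this is not Hive's implementation.
public class HashPartitionDemo {
    static final class HashPartition {
        // 1. Hash map of small-table rows; null once the map has been moved to disk.
        Map<String, List<String>> hashMap = new HashMap<>();
        // 2. Small-table rows that arrive after the hash map was spilled.
        final List<String[]> smallTableContainer = new ArrayList<>();
        // 3. Big-table rows whose keys fall into this partition while it is spilled.
        final List<String[]> bigTableContainer = new ArrayList<>();

        boolean isHashMapOnDisk() {
            return hashMap == null;
        }

        void putSmallTableRow(String key, String value) {
            if (isHashMapOnDisk()) {
                // The spilled map cannot be updated in place, so defer the row.
                smallTableContainer.add(new String[] {key, value});
            } else {
                hashMap.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
            }
        }

        // Placeholder for a spill: a real implementation would first write the map to a file.
        void markSpilled() {
            hashMap = null;
        }
    }

    public static void main(String[] args) {
        HashPartition p = new HashPartition();
        p.putSmallTableRow("k1", "a");               // goes into the in-memory map
        p.markSpilled();
        p.putSmallTableRow("k2", "b");               // deferred to the small-table container
        p.bigTableContainer.add(new String[] {"k2", "probe"});
        System.out.println(p.isHashMapOnDisk() + " " + p.smallTableContainer.size()
                + " " + p.bigTableContainer.size()); // prints "true 1 1"
    }
}
```

Keeping the deferred small-table rows next to the big-table rows of the same partition means both sides of a spilled partition can later be joined together once memory is available again.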
Constructor and Description |
---|
HybridHashTableContainer(org.apache.hadoop.conf.Configuration hconf, long keyCount, long memoryAvailable, long estimatedTableSize, HybridHashTableConf nwayConf) |
Modifier and Type | Method and Description |
---|---|
static int | calcNumPartitions(long memoryThreshold, long dataSize, int minNumParts, int minWbSize, HybridHashTableConf nwayConf): Calculates how many partitions are needed. |
void | clear(): Clears the contents of the table. |
MapJoinTableContainer.ReusableGetAdaptor | createGetter(MapJoinKey keyTypeFromLoader): Creates a reusable get adaptor that can be used to retrieve rows from the table based on either vectorized or non-vectorized input rows to MapJoinOperator. |
void | dumpMetrics() |
void | dumpStats() |
MapJoinKey | getAnyKey() |
HybridHashTableContainer.HashPartition[] | getHashPartitions() |
LazyBinaryStructObjectInspector | getInternalValueOi() |
long | getMemoryThreshold() |
int | getNumPartitions() |
boolean[] | getSortableSortOrders() |
long | getTableRowSize() |
int | getToSpillPartitionId(): Gets the ID of the partition into which the big-table row is to be spilled. |
int | getTotalInMemRowCount() |
MapJoinBytesTableContainer.KeyValueHelper | getWriteHelper() |
boolean | hasSpill(): Checks whether the container has spilled any data to disk. |
boolean | isHashMapSpilledOnCreation(int partitionId): Checks whether the hash table of the specified partition was "spilled" to disk when it was created. |
boolean | isOnDisk(int partitionId): Checks whether the hash table of the specified partition is on disk (or was "spilled" on creation). |
void | put(org.apache.hadoop.io.Writable currentKey, org.apache.hadoop.io.Writable currentValue) |
MapJoinKey | putRow(MapJoinObjectSerDeContext keyContext, org.apache.hadoop.io.Writable currentKey, MapJoinObjectSerDeContext valueContext, org.apache.hadoop.io.Writable currentValue): Adds a row from the input to the table. |
long | refreshMemoryUsed(): Recalculates and returns the current memory usage. |
void | seal(): Indicates to the container that the puts have ended; the table is now read-only. |
void | setSpill(boolean isSpilled) |
void | setTotalInMemRowCount(int totalInMemRowCount) |
long | spillPartition(int partitionId): Moves the hash table of the specified partition from memory to the local file system. |
public HybridHashTableContainer(org.apache.hadoop.conf.Configuration hconf, long keyCount, long memoryAvailable, long estimatedTableSize, HybridHashTableConf nwayConf) throws SerDeException, IOException
Throws: SerDeException, IOException
public MapJoinBytesTableContainer.KeyValueHelper getWriteHelper()
public HybridHashTableContainer.HashPartition[] getHashPartitions()
public long getMemoryThreshold()
public long refreshMemoryUsed()
public LazyBinaryStructObjectInspector getInternalValueOi()
public boolean[] getSortableSortOrders()
public MapJoinKey putRow(MapJoinObjectSerDeContext keyContext, org.apache.hadoop.io.Writable currentKey, MapJoinObjectSerDeContext valueContext, org.apache.hadoop.io.Writable currentValue) throws SerDeException, HiveException, IOException
Description copied from interface: MapJoinTableContainer
Adds a row from the input to the table.
Specified by: putRow in interface MapJoinTableContainer
Throws: SerDeException, HiveException, IOException
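How `putRow` might decide where a row goes can be sketched as follows, assuming (not confirmed above) a power-of-two partition count and a key hash masked down to a partition index. Everything here is hypothetical: `PutRowSketch`, `partitionOf`, the `onDisk` flags and the `deferred` lists stand in for Hive's serialized keys, write buffers and spill containers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of routing small-table rows to partitions by key hash.
// Not Hive's putRow: keys and values are strings and "on disk" is just a flag.
public class PutRowSketch {
    final int numPartitions;
    final List<Map<String, List<String>>> inMemory = new ArrayList<>();
    final List<List<String[]>> deferred = new ArrayList<>(); // rows for spilled partitions
    final boolean[] onDisk;

    PutRowSketch(int numPartitions) {
        // Assumption: a power-of-two count, so a bit mask can replace modulo.
        if (numPartitions <= 0 || Integer.bitCount(numPartitions) != 1) {
            throw new IllegalArgumentException("numPartitions must be a power of two");
        }
        this.numPartitions = numPartitions;
        this.onDisk = new boolean[numPartitions];
        for (int i = 0; i < numPartitions; i++) {
            inMemory.add(new HashMap<>());
            deferred.add(new ArrayList<>());
        }
    }

    // Masking keeps the result in [0, numPartitions) even for negative hash codes.
    int partitionOf(String key) {
        return key.hashCode() & (numPartitions - 1);
    }

    void putRow(String key, String value) {
        int p = partitionOf(key);
        if (onDisk[p]) {
            deferred.get(p).add(new String[] {key, value});
        } else {
            inMemory.get(p).computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
    }

    public static void main(String[] args) {
        PutRowSketch t = new PutRowSketch(4);
        t.onDisk[2] = true;          // pretend partition 2 was spilled
        t.putRow("a", "x");          // "a".hashCode() == 97, 97 & 3 == 1 -> in memory
        t.putRow("b", "y");          // 98 & 3 == 2 -> deferred
        System.out.println(t.inMemory.get(1).keySet() + " " + t.deferred.get(2).size());
    }
}
```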
public boolean isOnDisk(int partitionId)
Parameters: partitionId - partition number
public boolean isHashMapSpilledOnCreation(int partitionId)
Parameters: partitionId - hash map ID
public long spillPartition(int partitionId) throws IOException
Parameters: partitionId - ID of the partition whose hash table is to be moved
Throws: IOException
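A minimal sketch of what moving one partition's hash table "from memory into local file system" involves, assuming a plain `HashMap` per partition. Java serialization and `File.createTempFile` are used only for brevity and are not what Hive uses; the class name `SpillSketch` and the choice to return the spill file's size in bytes are assumptions made for this illustration.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

// Hypothetical sketch of spilling one partition's hash map to the local file system.
// Java serialization is used only for brevity; Hive uses its own format.
public class SpillSketch {
    final List<HashMap<String, String>> partitions = new ArrayList<>(); // null entry = on disk
    final File[] spillFiles;

    SpillSketch(int numPartitions) {
        spillFiles = new File[numPartitions];
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new HashMap<>());
        }
    }

    boolean isOnDisk(int partitionId) {
        return partitions.get(partitionId) == null;
    }

    // Writes the map to a temporary file, drops the in-memory copy, and returns the
    // file size in bytes (an assumed return value chosen for this sketch).
    long spillPartition(int partitionId) throws IOException {
        if (isOnDisk(partitionId)) {
            throw new IllegalStateException("partition " + partitionId + " is already on disk");
        }
        File f = File.createTempFile("partition-" + partitionId + "-", ".spill");
        f.deleteOnExit();
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(f))) {
            out.writeObject(partitions.get(partitionId));
        }
        partitions.set(partitionId, null); // release memory; the data now lives on disk
        spillFiles[partitionId] = f;
        return f.length();
    }

    public static void main(String[] args) throws IOException {
        SpillSketch s = new SpillSketch(2);
        s.partitions.get(0).put("k", "v");
        long bytes = s.spillPartition(0);
        System.out.println(s.isOnDisk(0) + " " + s.isOnDisk(1) + " " + (bytes > 0));
    }
}
```

Clearing the in-memory reference only after the write succeeds means a failed spill (an `IOException`) leaves the partition usable in memory.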
public static int calcNumPartitions(long memoryThreshold, long dataSize, int minNumParts, int minWbSize, HybridHashTableConf nwayConf) throws IOException
Parameters:
memoryThreshold - memory threshold for the given table
dataSize - total data size for the table
minNumParts - minimum required number of partitions
minWbSize - minimum required write buffer size
nwayConf - the n-way join configuration
Throws: IOException
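One plausible way to turn `memoryThreshold`, `dataSize`, `minNumParts` and `minWbSize` into a partition count is sketched below. The rule is an assumption, not necessarily Hive's: starting from `minNumParts` (assumed to be a power of two), double the count while one partition's share of the data exceeds half the memory threshold, but stop before one `minWbSize` write buffer per partition would no longer fit in memory. The `nwayConf` parameter is left out because its effect is not described above.

```java
// Hypothetical partition-count heuristic; the doubling rule is an assumption.
public class PartitionCountSketch {
    static int calcNumPartitions(long memoryThreshold, long dataSize, int minNumParts, int minWbSize) {
        if (minNumParts <= 0 || Integer.bitCount(minNumParts) != 1 || minWbSize <= 0) {
            throw new IllegalArgumentException("minNumParts must be a power of two; minWbSize > 0");
        }
        int numPartitions = minNumParts;
        while (dataSize / numPartitions > memoryThreshold / 2             // a partition is too big
                && (long) numPartitions * 2 * minWbSize <= memoryThreshold // buffers still fit
                && numPartitions < (1 << 30)) {                            // avoid int overflow
            numPartitions *= 2;
        }
        return numPartitions;
    }

    public static void main(String[] args) {
        // 1 GiB of data, 256 MiB of memory, at least 4 partitions, 8 MiB write buffers.
        System.out.println(calcNumPartitions(256L << 20, 1L << 30, 4, 8 << 20)); // prints 8
    }
}
```

With 8 partitions each one holds 128 MiB, exactly half the 256 MiB threshold, so the doubling stops there.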
public int getNumPartitions()
public int getTotalInMemRowCount()
public void setTotalInMemRowCount(int totalInMemRowCount)
public long getTableRowSize()
public boolean hasSpill()
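Description copied from interface: MapJoinTableContainer
Checks whether the container has spilled any data to disk.
Specified by: hasSpill in interface MapJoinTableContainer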
public void setSpill(boolean isSpilled)
public int getToSpillPartitionId()
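On the probe side, `getToSpillPartitionId()` is described as returning the partition into which the current big-table row is to be spilled. The sketch below shows one way such state could be produced: a lookup that finds its partition on disk reports a spill and remembers the partition ID. `ProbeSketch`, its local `JoinResult` enum and the power-of-two masking are hypothetical stand-ins, not Hive's types.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical probe-side sketch: a lookup reports MATCH, NOMATCH or SPILL and, on SPILL,
// records the partition that the big-table row should be spilled into.
public class ProbeSketch {
    enum JoinResult { MATCH, NOMATCH, SPILL }

    final List<Map<String, String>> partitions = new ArrayList<>(); // null entry = on disk
    private int toSpillPartitionId = -1;

    ProbeSketch(int numPartitions) {
        // Assumed to be a power of two so that the bit mask below is a valid modulo.
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new HashMap<>());
        }
    }

    JoinResult lookup(String key) {
        int p = key.hashCode() & (partitions.size() - 1);
        Map<String, String> map = partitions.get(p);
        if (map == null) {
            toSpillPartitionId = p; // caller should write the big-table row to partition p
            return JoinResult.SPILL;
        }
        return map.containsKey(key) ? JoinResult.MATCH : JoinResult.NOMATCH;
    }

    int getToSpillPartitionId() {
        return toSpillPartitionId;
    }

    public static void main(String[] args) {
        ProbeSketch probe = new ProbeSketch(4);
        probe.partitions.get(1).put("a", "row-a"); // "a" hashes to partition 1
        probe.partitions.set(2, null);             // partition 2 is spilled
        System.out.println(probe.lookup("a") + " " + probe.lookup("b")
                + " " + probe.getToSpillPartitionId()); // prints "MATCH SPILL 2"
    }
}
```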
public void clear()
Description copied from interface: MapJoinTableContainer
Clears the contents of the table.
Specified by: clear in interface MapJoinTableContainer
public MapJoinKey getAnyKey()
Specified by: getAnyKey in interface MapJoinTableContainer
public MapJoinTableContainer.ReusableGetAdaptor createGetter(MapJoinKey keyTypeFromLoader)
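Description copied from interface: MapJoinTableContainer
Creates a reusable get adaptor that can be used to retrieve rows from the table based on either vectorized or non-vectorized input rows to MapJoinOperator.
Specified by: createGetter in interface MapJoinTableContainer
Parameters: keyTypeFromLoader - the last key from the hash table loader, used to determine the key type used when loading the hash table (if it can vary)
public void seal()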
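Description copied from interface: MapJoinTableContainer
Indicates to the container that the puts have ended; the table is now read-only.
Specified by: seal in interface MapJoinTableContainer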
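The load-then-probe lifecycle implied by `put`, `seal()` and `createGetter` can be sketched as below. This is a hypothetical illustration: `SealSketch`, `ReusableGetter` and `lookup` are invented names, and rejecting `createGetter` before `seal()` is a rule chosen for the sketch, not one stated above. The getter reuses one result list across lookups, which is the point of a "reusable" adaptor: no per-row allocation while probing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: rows are put, seal() makes the table read-only,
// and a reusable getter serves repeated lookups without per-row allocation.
public class SealSketch {
    private final Map<String, List<String>> table = new HashMap<>();
    private boolean sealed = false;

    void put(String key, String value) {
        if (sealed) {
            throw new IllegalStateException("table is sealed (read-only)");
        }
        table.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    void seal() {
        sealed = true;
    }

    // One getter per probing thread; the returned list is overwritten by the next lookup.
    final class ReusableGetter {
        private final List<String> result = new ArrayList<>();

        List<String> lookup(String key) {
            result.clear();
            result.addAll(table.getOrDefault(key, Collections.emptyList()));
            return result;
        }
    }

    ReusableGetter createGetter() {
        if (!sealed) {
            throw new IllegalStateException("seal() the table before probing");
        }
        return new ReusableGetter();
    }

    public static void main(String[] args) {
        SealSketch s = new SealSketch();
        s.put("k", "v1");
        s.put("k", "v2");
        s.seal();
        ReusableGetter g = s.createGetter();
        System.out.println(g.lookup("k") + " " + g.lookup("missing")); // prints "[v1, v2] []"
    }
}
```

Because the getter hands back the same list each time, a caller that needs to keep a result across lookups must copy it first.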
public void put(org.apache.hadoop.io.Writable currentKey, org.apache.hadoop.io.Writable currentValue) throws SerDeException, IOException
Specified by: put in interface MapJoinTableContainerDirectAccess
Throws: SerDeException, IOException
public void dumpMetrics()
Specified by: dumpMetrics in interface MapJoinTableContainer
public void dumpStats()
Copyright © 2017 The Apache Software Foundation. All rights reserved.