- All Implemented Interfaces:
- org.apache.hadoop.mapred.split.SplitLocationProvider
public class HostAffinitySplitLocationProvider
extends Object
implements org.apache.hadoop.mapred.split.SplitLocationProvider
Maps a split (path + offset) to a location index that is derived from the number of locations provided.
If locations do not change across jobs, the intention is to map the same split to the same node.
This breaks down when the set of nodes changes (nodes added, removed, or temporarily removed and re-added). Any such change
alters the number of locations and/or their positions, so most splits map to a different node and the cache is almost completely invalidated.
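The effect described above can be illustrated with a minimal sketch. It is not the code of this class: the class name SplitAffinitySketch, its methods, the sample paths, and the use of CRC32 as the hash are all assumptions made for illustration. It only shows the general idea of "hash the split, take it modulo the number of locations" and counts how many splits land on a different index when the location count changes.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical illustration only; not the actual HostAffinitySplitLocationProvider logic.
final class SplitAffinitySketch {

    // Hash the split identity (path + start offset). CRC32 is an arbitrary
    // choice here; getValue() is always non-negative (range [0, 2^32)).
    static long hashSplit(String path, long startOffset) {
        CRC32 crc = new CRC32();
        crc.update((path + ":" + startOffset).getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Map a split to an index in [0, numLocations) by taking the hash modulo
    // the number of locations.
    static int indexFor(String path, long startOffset, int numLocations) {
        if (numLocations <= 0) {
            throw new IllegalArgumentException("numLocations must be positive");
        }
        return (int) (hashSplit(path, startOffset) % numLocations);
    }

    // Count how many synthetic sample splits get a different index when the
    // number of locations changes from 'before' to 'after'.
    static int countMoved(int numSplits, int before, int after) {
        int moved = 0;
        for (int i = 0; i < numSplits; i++) {
            String path = "/warehouse/sample/file-" + i; // made-up sample path
            if (indexFor(path, 0L, before) != indexFor(path, 0L, after)) {
                moved++;
            }
        }
        return moved;
    }
}
```

With an unchanged location count, countMoved returns 0, which matches the intent that the same split keeps mapping to the same node. With a reasonably uniform hash, going from 10 to 11 locations would be expected to move most of the splits, since the two remainders agree for only a small fraction of hash values.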
TODO: Support for consistent hashing when combining the split location generator and the ServiceRegistry.
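For context on the TODO, the following is a rough sketch of what consistent hashing could look like in general. It is not a proposal for how this class or the ServiceRegistry integration should be implemented. The class ConsistentRingSketch, the virtual-node count, the host names, and the CRC32 hash are all hypothetical choices made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical illustration of a consistent-hash ring; not part of this class.
final class ConsistentRingSketch {

    // Ring position -> host. Each host is placed at several positions
    // ("virtual nodes") to spread its share of the ring.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    ConsistentRingSketch(List<String> hosts, int virtualNodesPerHost) {
        for (String host : hosts) {
            for (int v = 0; v < virtualNodesPerHost; v++) {
                ring.put(hash(host + "#" + v), host);
            }
        }
    }

    // CRC32 is used only to keep the sketch dependency-free; a real
    // implementation would likely pick a better-distributed hash.
    static long hash(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // A split is owned by the first ring position at or after its hash,
    // wrapping around to the first position on the ring if there is none.
    String locationFor(String path, long startOffset) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no locations available");
        }
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(path + ":" + startOffset));
        return (e != null ? e : ring.firstEntry()).getValue();
    }
}
```

The property of interest is that removing a host only remaps the splits that host owned. A split owned by any other host keeps its location, so the cache on the surviving nodes would remain valid.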