public class ExternalSampleSorter extends AbstractSampleConsumer
This SampleConsumer should be used to sort samples based on a SampleComparator.
Samples are sorted with the external sort algorithm. Thus, samples are not all stored in memory to be sorted. Instead, they are sorted by chunk in memory and then written to the disk before being merged at the end.
This sorter makes it possible to sort any number of samples with a fixed amount of memory. Disk is used instead of RAM, at the cost of performance.
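The chunk-then-merge idea described above can be illustrated with a small, self-contained sketch. This is not the code of ExternalSampleSorter: the class name `ExternalSortSketch`, its `sort` method, and the use of plain text lines instead of samples are all assumptions made for illustration. The input and output are kept as in-memory lists only to keep the example short; a real external sort would stream both.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative external merge sort over lines of text (hypothetical, not JMeter code). */
class ExternalSortSketch {

    /** One open chunk file plus the line currently at its head. */
    private static final class Cursor {
        final BufferedReader reader;
        String head;

        Cursor(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.head = reader.readLine();
        }

        void advance() throws IOException {
            head = reader.readLine();
        }
    }

    /** Sorts lines while holding at most chunkSize lines per in-memory sort. */
    static List<String> sort(List<String> lines, int chunkSize, Comparator<String> order) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Path> chunks = new ArrayList<>();
        try {
            // Phase 1: sort each chunk in memory, then spill it to a temporary file.
            for (int start = 0; start < lines.size(); start += chunkSize) {
                int end = Math.min(start + chunkSize, lines.size());
                List<String> chunk = new ArrayList<>(lines.subList(start, end));
                chunk.sort(order);
                Path file = Files.createTempFile("chunk", ".txt");
                chunks.add(file);
                Files.write(file, chunk, StandardCharsets.UTF_8);
            }
            // Phase 2: k-way merge, always emitting the smallest head line next.
            PriorityQueue<Cursor> heap =
                    new PriorityQueue<>((a, b) -> order.compare(a.head, b.head));
            for (Path file : chunks) {
                Cursor c = new Cursor(Files.newBufferedReader(file, StandardCharsets.UTF_8));
                if (c.head != null) heap.add(c); else c.reader.close();
            }
            List<String> merged = new ArrayList<>();
            while (!heap.isEmpty()) {
                Cursor c = heap.poll();
                merged.add(c.head);
                c.advance();
                if (c.head != null) heap.add(c); else c.reader.close();
            }
            return merged;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // Best-effort cleanup of the temporary chunk files.
            for (Path file : chunks) {
                try { Files.deleteIfExists(file); } catch (IOException ignored) { }
            }
        }
    }
}
```

During the merge only one line per chunk file is held in memory, which is why the memory needed for sorting depends on the chunk size rather than on the total number of items.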
When parallel mode is enabled and several CPUs are available to the JVM, this sorter uses multiple CPUs to reduce sort time.
Parallel mode can be disabled if a concurrency issue is encountered.
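As a rough illustration of how chunk sorting can be spread over several CPUs, the sketch below sorts independent chunks on a thread pool sized from `Runtime.availableProcessors()`, and falls back to a single thread when the flag is off. The class `ParallelChunkSortSketch` and its `sortChunks` method are hypothetical; how ExternalSampleSorter actually parallelizes its work is not described here, so treat this only as one possible approach.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative parallel sorting of independent chunks (hypothetical, not JMeter code). */
class ParallelChunkSortSketch {

    /** Returns sorted copies of the chunks, in the same order as the input. */
    static <T> List<List<T>> sortChunks(List<List<T>> chunks,
                                        Comparator<? super T> order,
                                        boolean parallelize) {
        int cpus = Runtime.getRuntime().availableProcessors();
        // Use several threads only when requested and when more than one CPU is visible.
        int threads = (parallelize && cpus > 1) ? cpus : 1;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<T>>> pending = new ArrayList<>();
            for (List<T> chunk : chunks) {
                Future<List<T>> task = pool.submit(() -> {
                    // Each task works on its own copy, so no state is shared between threads.
                    List<T> copy = new ArrayList<>(chunk);
                    copy.sort(order);
                    return copy;
                });
                pending.add(task);
            }
            List<List<T>> sorted = new ArrayList<>();
            for (Future<List<T>> task : pending) {
                sorted.add(task.get());
            }
            return sorted;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because every chunk is sorted independently, the result is the same with the flag on or off; only the elapsed time should differ.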
Finally, this SampleConsumer can also be used as a normal class through its various sort() methods.
It is important to set the chunkSize property according to the available memory, because the algorithm does not manage memory allocation itself (sample sizes are not predictable).
It is equally important to set a SampleComparator to define sample ordering.
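Since sample sizes are not predictable, any chunk size is necessarily an estimate. The sketch below shows one possible back-of-the-envelope calculation: divide a memory budget by an assumed average sample size, and never go below the documented minimum of 5000. The class `ChunkSizeSketch`, the method `estimateChunkSize`, and both input figures are assumptions for illustration; they are not part of this API.

```java
/** Illustrative chunk size estimate (hypothetical helper, not part of JMeter). */
class ChunkSizeSketch {

    /** Minimum chunk size documented for setChunkSize. */
    static final long MIN_CHUNK_SIZE = 5_000L;

    /**
     * Estimates how many samples fit in the given memory budget.
     *
     * @param memoryBudgetBytes     memory you are willing to spend on one chunk (a guess)
     * @param assumedBytesPerSample average in-memory size of one sample (a guess)
     */
    static long estimateChunkSize(long memoryBudgetBytes, long assumedBytesPerSample) {
        if (assumedBytesPerSample <= 0) {
            throw new IllegalArgumentException("assumedBytesPerSample must be positive");
        }
        long fit = memoryBudgetBytes / assumedBytesPerSample;
        // Values below the documented minimum would be raised to 5000 anyway.
        return Math.max(MIN_CHUNK_SIZE, fit);
    }
}
```

The resulting value could then be passed to setChunkSize; measuring real sample sizes on representative data is still the safer way to pick the inputs.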
Constructor and Description |
---|
ExternalSampleSorter() |
ExternalSampleSorter(SampleComparator comparator) |
Modifier and Type | Method and Description |
---|---|
void | consume(Sample s, int channel) Consumes the specified sample on the specified channel. |
boolean | isParallelize() |
boolean | isRevertedSort() |
void | mergeFiles(java.util.List<java.io.File> chunks, SampleMetadata metadata, SampleProducer producer) |
void | setChunkSize(long chunkSize) Sets the number of samples that will be stored in memory. |
void | setParallelize(boolean parallelize) Enables or disables parallel mode. |
void | setRevertedSort(boolean revertedSort) |
void | setSampleComparator(SampleComparator sampleComparator) Sets the sample comparator that will define sample ordering. |
void | sort(CsvFile inputFile, java.io.File outputFile, boolean writeHeader) Sorts an input CSV file to a sorted output CSV file. |
java.util.List<Sample> | sort(java.util.List<Sample> samples) |
void | sort(SampleMetadata sampleMetadata, java.io.File inputFile, java.io.File outputFile, boolean writeHeader) Sorts an input CSV file whose metadata structure is provided. |
void | startConsuming() Starts the sample consuming. |
void | stopConsuming() Stops the consuming process. |
addSampleConsumer, getConsumedChannelCount, getConsumedMetadata, getConsumer, getDataFromContext, getName, getWorkingDirectory, produce, removeSampleConsumer, setChannelAttribute, setConsumedMetadata, setDataToContext, setName, setProducedMetadata, setSampleConsumer, setSampleConsumers, setSampleContext, startProducing, stopProducing
getChannelAttribute, getSampleContext
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
public ExternalSampleSorter()
public ExternalSampleSorter(SampleComparator comparator)
public void setChunkSize(long chunkSize)
chunkSize - The number of samples sorted in memory before they are written to disk. 5000 is the minimum and will be used if given chunkSize is less than 5000
public final void setSampleComparator(SampleComparator sampleComparator)
sampleComparator - comparator to define the ordering
public void setParallelize(boolean parallelize)
parallelize - true to enable, false to disable
public boolean isParallelize()
true when parallel mode is enabled, false otherwise
public void sort(CsvFile inputFile, java.io.File outputFile, boolean writeHeader)
The input CSV must have a header; otherwise sorting will give unpredictable results.
inputFile - The CSV file to be sorted (must not be null)
outputFile - The sorted destination CSV file (must not be null)
writeHeader - Whether the CSV header should be written to the output CSV file
public void sort(SampleMetadata sampleMetadata, java.io.File inputFile, java.io.File outputFile, boolean writeHeader)
sampleMetadata - The CSV metadata: header information + separator (must not be null)
inputFile - The input file to be sorted (must not be null)
outputFile - The output sorted file (must not be null)
writeHeader - Whether output CSV header should be written (based on provided sample metadata)
public void startConsuming()
SampleConsumer
public void consume(Sample s, int channel)
SampleConsumer
s - The sample to be consumed
channel - The channel on which the sample is consumed
public void stopConsuming()
SampleConsumer
public void mergeFiles(java.util.List<java.io.File> chunks, SampleMetadata metadata, SampleProducer producer)
public final boolean isRevertedSort()
public final void setRevertedSort(boolean revertedSort)
revertedSort - Flag indicating whether the sort order should be reverted. false uses the order of the configured SampleComparator
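The effect of the revertedSort flag can be pictured with a plain `java.util.Comparator`: when the flag is set, the configured ordering is reversed; otherwise it is used as is. The sketch below uses the hypothetical class `RevertedSortSketch` and generic items rather than samples; whether ExternalSampleSorter implements the flag this way internally is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative handling of a "reverted sort" flag (hypothetical, not JMeter code). */
class RevertedSortSketch {

    /** Returns a sorted copy of items, reversing the comparator when revertedSort is true. */
    static <T> List<T> sort(List<T> items, Comparator<T> comparator, boolean revertedSort) {
        // false keeps the configured order; true flips it.
        Comparator<T> effective = revertedSort ? comparator.reversed() : comparator;
        List<T> copy = new ArrayList<>(items);
        copy.sort(effective);
        return copy;
    }
}
```

For example, with a natural-order comparator over integers, the input 2, 3, 1 comes back as 1, 2, 3 when the flag is false and as 3, 2, 1 when it is true.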
Copyright © 1998-2019 Apache Software Foundation. All Rights Reserved.