public class ArcRecordReader extends Object implements org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.BytesWritable>
The ArcRecordReader class provides a record reader that reads records from arc files.
Arc files are essentially tars of gzips. Each record in an arc file is
a separate gzip-compressed member. Multiple records are concatenated together to form a
complete arc. For more information on the arc file format see
http://www.archive.org/web/researcher/ArcFileFormat.php.
Arc files are used by the Internet Archive and Grub projects.
See http://www.archive.org/
See http://www.grub.org/
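The "concatenated gzip records" layout described above can be illustrated with a minimal sketch that uses only `java.util.zip`. The class and method names here (`ConcatenatedGzipSketch`, `gzip`, `gunzip`, `concat`) are hypothetical and are not part of `ArcRecordReader` or Hadoop, and the payloads are placeholder strings rather than real arc record headers and content.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Illustration only: hypothetical helper, not part of ArcRecordReader. */
final class ConcatenatedGzipSketch {

  /** Compresses text into one standalone gzip member. */
  static byte[] gzip(String text) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
      out.write(text.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // The gzip trailer is written when the stream closes, so read the buffer afterwards.
    return buffer.toByteArray();
  }

  /** Decompresses the single gzip member stored in data[offset, offset + length). */
  static String gunzip(byte[] data, int offset, int length) {
    ByteArrayOutputStream result = new ByteArrayOutputStream();
    try (GZIPInputStream in =
        new GZIPInputStream(new ByteArrayInputStream(data, offset, length))) {
      byte[] chunk = new byte[4096];
      int n;
      while ((n = in.read(chunk)) != -1) {
        result.write(chunk, 0, n);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new String(result.toByteArray(), StandardCharsets.UTF_8);
  }

  /** Joins two byte arrays end to end, the way records sit back to back in one file. */
  static byte[] concat(byte[] a, byte[] b) {
    byte[] joined = new byte[a.length + b.length];
    System.arraycopy(a, 0, joined, 0, a.length);
    System.arraycopy(b, 0, joined, a.length, b.length);
    return joined;
  }

  public static void main(String[] args) {
    byte[] first = gzip("record one");
    byte[] second = gzip("record two");
    byte[] archive = concat(first, second);

    // Each member can be decompressed on its own once its boundaries are known.
    System.out.println(gunzip(archive, 0, first.length));
    System.out.println(gunzip(archive, first.length, second.length));
  }
}
```

Because every record is an independent gzip member, a reader can in principle start decompressing at any record boundary without touching the records before it.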
Modifier and Type | Field and Description |
---|---|
protected org.apache.hadoop.conf.Configuration | conf |
protected long | fileLen |
protected org.apache.hadoop.fs.FSDataInputStream | in |
static org.slf4j.Logger | LOG |
protected long | pos |
protected long | splitEnd |
protected long | splitLen |
protected long | splitStart |
Constructor and Description |
---|
ArcRecordReader(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.mapred.FileSplit split) Constructor that sets the configuration and file split. |
Modifier and Type | Method and Description |
---|---|
void | close() Closes the record reader resources. |
org.apache.hadoop.io.Text | createKey() Creates a new instance of the Text object for the key. |
org.apache.hadoop.io.BytesWritable | createValue() Creates a new instance of the BytesWritable object for the value. |
long | getPos() Returns the current position in the file. |
float | getProgress() Returns the percentage of progress in processing the file. |
static boolean | isMagic(byte[] input) Returns true if the byte array passed matches the gzip header magic number. |
boolean | next(org.apache.hadoop.io.Text key, org.apache.hadoop.io.BytesWritable value) Returns true if the next record in the split is read into the key and value pair. |
public static final org.slf4j.Logger LOG
protected org.apache.hadoop.conf.Configuration conf
protected long splitStart
protected long pos
protected long splitEnd
protected long splitLen
protected long fileLen
protected org.apache.hadoop.fs.FSDataInputStream in
public ArcRecordReader(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.mapred.FileSplit split) throws IOException
Parameters:
conf - The job configuration.
split - The file split to read from.
Throws:
IOException - If an IO error occurs while initializing file split.
public static boolean isMagic(byte[] input)
Returns true if the byte array passed matches the gzip header magic number.
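As a rough illustration of such a check, the sketch below compares a byte array against the two-byte gzip magic number (`0x1f 0x8b`, as defined by the gzip format). The names `GzipMagicSketch`, `isGzipMagic`, and `indexOfMagic` are hypothetical. This page does not say whether the real `isMagic` requires an exact-length array or only a matching prefix, so the exact-length comparison here is an assumption. The scanning helper shows one plausible use, locating a candidate record start inside a buffer.

```java
import java.util.Arrays;

/** Illustration only: hypothetical helper, not the ArcRecordReader implementation. */
final class GzipMagicSketch {

  // The first two bytes of every gzip member.
  private static final byte[] GZIP_MAGIC = {(byte) 0x1f, (byte) 0x8b};

  /** Returns true if input is exactly the two-byte gzip magic number (assumed semantics). */
  static boolean isGzipMagic(byte[] input) {
    // Arrays.equals returns false for a null input or a length mismatch.
    return Arrays.equals(GZIP_MAGIC, input);
  }

  /** Returns the first index at or after from where the magic bytes occur, or -1 if none. */
  static int indexOfMagic(byte[] data, int from) {
    for (int i = Math.max(0, from); i + 1 < data.length; i++) {
      if (data[i] == GZIP_MAGIC[0] && data[i + 1] == GZIP_MAGIC[1]) {
        return i;
      }
    }
    return -1;
  }
}
```

The same two bytes can also occur by chance inside compressed data, so a match found by scanning is only a candidate boundary until the bytes that follow decompress successfully.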
Parameters:
input - The byte array to check.
public void close() throws IOException
Specified by:
close in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.BytesWritable>
Throws:
IOException
public org.apache.hadoop.io.Text createKey()
Creates a new instance of the Text object for the key.
Specified by:
createKey in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.BytesWritable>
public org.apache.hadoop.io.BytesWritable createValue()
Creates a new instance of the BytesWritable object for the value.
Specified by:
createValue in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.BytesWritable>
public long getPos() throws IOException
Specified by:
getPos in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.BytesWritable>
Throws:
IOException
public float getProgress() throws IOException
Specified by:
getProgress in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.BytesWritable>
Throws:
IOException
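How `getProgress()` derives its value is not documented on this page. The sketch below shows one plausible calculation from the `splitStart`, `splitLen`, and `pos` fields, assuming the result is a fraction between 0.0 and 1.0, which is the range the Hadoop `RecordReader` contract asks for. The class and method names are hypothetical, and the formula is an assumption rather than the actual implementation.

```java
/** Illustration only: hypothetical helper, not the ArcRecordReader implementation. */
final class SplitProgressSketch {

  /**
   * Returns the fraction of the split consumed so far, clamped to [0.0, 1.0].
   * Assumes pos is an absolute file offset and the split covers
   * [splitStart, splitStart + splitLen).
   */
  static float progress(long splitStart, long splitLen, long pos) {
    if (splitLen <= 0) {
      // An empty split has nothing to process; report no progress.
      return 0.0f;
    }
    float fraction = (pos - splitStart) / (float) splitLen;
    // A record may extend past the nominal split end, so clamp the result.
    return Math.max(0.0f, Math.min(1.0f, fraction));
  }
}
```

For example, with `splitStart = 100`, `splitLen = 200`, and `pos = 200`, this sketch reports `0.5`.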
public boolean next(org.apache.hadoop.io.Text key, org.apache.hadoop.io.BytesWritable value) throws IOException
Returns true if the next record in the split is read into the key and value pair. The key will be the arc record header and the value will be the raw content bytes of the arc record.
Specified by:
next in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.BytesWritable>
Parameters:
key - The record key
value - The record value
Throws:
IOException - If an error occurs while reading the record value.
Copyright © 2014 The Apache Software Foundation