ArcRecordReader (apache-nutch 1.8 API)

java.lang.Object
- org.apache.nutch.tools.arc.ArcRecordReader

All Implemented Interfaces:

org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.BytesWritable>
```
public class ArcRecordReader
extends Object
implements org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.BytesWritable>
```
The ArcRecordReader class provides a record reader which reads records from arc files.

Arc files are essentially tars of gzips. Each record in an arc file is an individually gzip-compressed stream, and multiple records are concatenated together to form a complete arc file. For more information on the arc file format see http://www.archive.org/web/researcher/ArcFileFormat.php .

Arc files are used by the Internet Archive and Grub projects.
See http://www.archive.org/ and http://www.grub.org/ .
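
The layout described above (independently compressed records placed back to back) can be illustrated with the JDK alone. The following is a minimal sketch, not Nutch code: the class `ConcatenatedGzipSketch` and its methods are hypothetical names, and the payloads are plain strings rather than real arc records. It relies on `java.util.zip.GZIPInputStream` reading consecutive gzip members from one stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class for illustration; it is not part of Nutch.
final class ConcatenatedGzipSketch {

    /** Compresses one payload into a standalone gzip member. */
    static byte[] gzipMember(String payload) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(payload.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Appends the members back to back, with nothing in between. */
    static byte[] concatenate(byte[]... members) {
        ByteArrayOutputStream joined = new ByteArrayOutputStream();
        for (byte[] member : members) {
            joined.write(member, 0, member.length);
        }
        return joined.toByteArray();
    }

    /** Decompresses every member in sequence and returns the combined text. */
    static String gunzipAll(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] first = gzipMember("first record\n");
        byte[] second = gzipMember("second record\n");
        // The second member begins exactly where the first one ends.
        byte[] joined = concatenate(first, second);
        System.out.print(gunzipAll(joined));
    }
}
```

Because each member starts with its own gzip header, a reader can locate record boundaries inside a file split by looking for that header, which is what `isMagic` is provided for.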

Field Summary

Fields
Modifier and Type	Field and Description
`protected org.apache.hadoop.conf.Configuration`	`conf`
`protected long`	`fileLen`
`protected org.apache.hadoop.fs.FSDataInputStream`	`in`
`static org.slf4j.Logger`	`LOG`
`protected long`	`pos`
`protected long`	`splitEnd`
`protected long`	`splitLen`
`protected long`	`splitStart`

Constructor Summary

Constructors
Constructor and Description
`ArcRecordReader(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.mapred.FileSplit split)` Constructor that sets the configuration and file split.

Method Summary

Methods
Modifier and Type	Method and Description
`void`	`close()` Closes the record reader resources.
`org.apache.hadoop.io.Text`	`createKey()` Creates a new instance of the `Text` object for the key.
`org.apache.hadoop.io.BytesWritable`	`createValue()` Creates a new instance of the `BytesWritable` object for the value.
`long`	`getPos()` Returns the current position in the file.
`float`	`getProgress()` Returns the percentage of progress in processing the file.
`static boolean`	`isMagic(byte[] input)` Returns true if the byte array passed matches the gzip header magic number.
`boolean`	`next(org.apache.hadoop.io.Text key, org.apache.hadoop.io.BytesWritable value)` Returns true if the next record in the split is read into the key and value pair.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - LOG
```
public static final org.slf4j.Logger LOG
```
  - conf
```
protected org.apache.hadoop.conf.Configuration conf
```
  - splitStart
```
protected long splitStart
```
  - pos
```
protected long pos
```
  - splitEnd
```
protected long splitEnd
```
  - splitLen
```
protected long splitLen
```
  - fileLen
```
protected long fileLen
```
  - in
```
protected org.apache.hadoop.fs.FSDataInputStream in
```
- Constructor Detail
  - ArcRecordReader
```
public ArcRecordReader(org.apache.hadoop.conf.Configuration conf,
               org.apache.hadoop.mapred.FileSplit split)
                throws IOException
```
    Constructor that sets the configuration and file split.
    
    Parameters:
    conf - The job configuration.
    split - The file split to read from.
    
    Throws:
    
    IOException - If an IO error occurs while initializing file split.
- Method Detail
  - isMagic
```
public static boolean isMagic(byte[] input)
```
    Returns true if the byte array passed matches the gzip header magic number.
    
    Parameters:
    input - The byte array to check.
    
    Returns:
    True if the byte array matches the gzip header magic number.
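    
    A gzip member begins with the two identification bytes 0x1f and 0x8b (RFC 1952). The following is a minimal sketch of such a check, not the Nutch source: the class and method names are hypothetical, and it assumes a prefix comparison, whereas this page does not state how many bytes `isMagic` expects or compares.

```java
// Hypothetical helper class for illustration; it is not the Nutch source.
final class GzipMagicSketch {

    // ID1 and ID2 of a gzip member header, as defined by RFC 1952.
    private static final byte[] GZIP_MAGIC = {(byte) 0x1f, (byte) 0x8b};

    /** Returns true if the array starts with the two gzip identification bytes. */
    static boolean startsWithGzipMagic(byte[] input) {
        // Assumption: null or too-short input is treated as "no match".
        if (input == null || input.length < GZIP_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < GZIP_MAGIC.length; i++) {
            if (input[i] != GZIP_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] gzipStart = {(byte) 0x1f, (byte) 0x8b, 0x08};
        byte[] plainText = {'h', 'i'};
        System.out.println(startsWithGzipMagic(gzipStart));
        System.out.println(startsWithGzipMagic(plainText));
    }
}
```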
  - close
```
public void close()
           throws IOException
```
    Closes the record reader resources.
    
    Specified by:
    
    close in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.BytesWritable>
    
    Throws:
    
    IOException
  - createKey
```
public org.apache.hadoop.io.Text createKey()
```
    Creates a new instance of the Text object for the key.
    
    Specified by:
    
    createKey in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.BytesWritable>
  - createValue
```
public org.apache.hadoop.io.BytesWritable createValue()
```
    Creates a new instance of the BytesWritable object for the value.
    
    Specified by:
    
    createValue in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.BytesWritable>
  - getPos
```
public long getPos()
            throws IOException
```
    Returns the current position in the file.
    
    Specified by:
    
    getPos in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.BytesWritable>
    
    Returns:
    The current position in the file, as a long.
    
    Throws:
    
    IOException
  - getProgress
```
public float getProgress()
                  throws IOException
```
    Returns the percentage of progress in processing the file. This will be represented as a float from 0 to 1 with 1 being 100% completed.
    
    Specified by:
    
    getProgress in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.BytesWritable>
    
    Returns:
    The percentage of progress as a float from 0 to 1.
    
    Throws:
    
    IOException
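    
    One plausible way to derive such a value from the `splitStart`, `splitLen` and `pos` fields is sketched below. This is an assumption for illustration only: the page does not state the formula the reader uses, and the class name, the clamping, and the handling of an empty split are all hypothetical choices.

```java
// Hypothetical helper class for illustration; the actual formula is not documented here.
final class SplitProgressSketch {

    /** Returns the fraction of the split consumed so far, clamped to the range 0 to 1. */
    static float progress(long splitStart, long splitLen, long pos) {
        // Assumption: an empty split reports no progress rather than dividing by zero.
        if (splitLen <= 0) {
            return 0.0f;
        }
        float fraction = (pos - splitStart) / (float) splitLen;
        // Clamp so that reading past the split end never reports more than 100%.
        return Math.max(0.0f, Math.min(1.0f, fraction));
    }

    public static void main(String[] args) {
        // A split covering bytes 100..300, with the reader positioned at byte 200.
        System.out.println(progress(100L, 200L, 200L));
    }
}
```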
  - next
```
public boolean next(org.apache.hadoop.io.Text key,
           org.apache.hadoop.io.BytesWritable value)
             throws IOException
```
    Returns true if the next record in the split is read into the key and value pair. The key will be the arc record header and the value will be the raw content bytes of the arc record.
    
    Specified by:
    
    next in interface org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.BytesWritable>
    
    Parameters:
    key - The record key
    value - The record value
    
    Returns:
    True if the next record is read.
    
    Throws:
    
    IOException - If an error occurs while reading the record value.
