org.apache.any23.extractor.csv
Class CSVExtractor

java.lang.Object
  extended by org.apache.any23.extractor.csv.CSVExtractor
All Implemented Interfaces:
Extractor<InputStream>, Extractor.ContentExtractor

public class CSVExtractor
extends Object
implements Extractor.ContentExtractor

This extractor produces RDF from a CSV file. It automatically detects the field delimiter; if detection fails, it falls back to the delimiter provided in the Any23 configuration.

Author:
Davide Palmisano ( dpalmisano@gmail.com )
See Also:
CSVReaderBuilder
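The class description says the delimiter is detected automatically, with a fallback to the configured one. Below is a minimal, hypothetical sketch of one way such detection could work. It is not Any23's actual algorithm, which lives in `CSVReaderBuilder`. The class name `DelimiterGuess`, the candidate list, and the "same non-zero count on every sampled line" rule are all assumptions made for illustration. The sketch also ignores quoted fields, which a real CSV reader must handle.

```java
import java.util.List;

/**
 * Hypothetical sketch (not Any23 code): choose the candidate delimiter that
 * appears the same non-zero number of times on every sampled line, otherwise
 * return the configured fallback.
 */
class DelimiterGuess {

    // Candidate delimiters, tried in this order (an assumption).
    static final char[] CANDIDATES = {',', ';', '\t', '|'};

    /** Counts occurrences of c in line; quoting is ignored for simplicity. */
    static int count(String line, char c) {
        int n = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == c) n++;
        }
        return n;
    }

    /** Returns the first consistent candidate, or fallback if none fits. */
    static char detect(List<String> sample, char fallback) {
        for (char c : CANDIDATES) {
            int expected = -1;
            boolean consistent = !sample.isEmpty();
            for (String line : sample) {
                int n = count(line, c);
                // Reject c if it is missing from a line or its count varies.
                if (n == 0 || (expected != -1 && n != expected)) {
                    consistent = false;
                    break;
                }
                expected = n;
            }
            if (consistent) return c;
        }
        return fallback; // detection failed: use the configured delimiter
    }

    public static void main(String[] args) {
        System.out.println(detect(List.of("a;b;c", "1;2;3"), ','));   // ;
        System.out.println(detect(List.of("no delimiter here"), ',')); // ,
    }
}
```

Requiring a consistent count across lines helps reject characters that only appear incidentally inside field values. A production reader would also need to respect quoting before it counts anything.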

Nested Class Summary
 
Nested classes/interfaces inherited from interface org.apache.any23.extractor.Extractor
Extractor.BlindExtractor, Extractor.ContentExtractor, Extractor.TagSoupDOMExtractor
 
Field Summary
static ExtractorFactory<CSVExtractor> factory
           
 
Constructor Summary
CSVExtractor()
           
 
Method Summary
 ExtractorDescription getDescription()
          Returns an ExtractorDescription of this extractor.
 void run(ExtractionParameters extractionParameters, ExtractionContext extractionContext, InputStream in, ExtractionResult out)
          Executes the extractor.
 void setStopAtFirstError(boolean f)
          If true, the extractor stops at the first parsing error; if false, the extractor attempts to ignore all parsing errors.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

factory

public static final ExtractorFactory<CSVExtractor> factory
Constructor Detail

CSVExtractor

public CSVExtractor()
Method Detail

setStopAtFirstError

public void setStopAtFirstError(boolean f)
If true, the extractor stops at the first parsing error; if false, the extractor attempts to ignore all parsing errors.

Specified by:
setStopAtFirstError in interface Extractor.ContentExtractor
Parameters:
f - tolerance flag.
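The following hypothetical sketch shows what a stop-at-first-error flag typically controls. It is not CSVExtractor's implementation. The class `TolerantParser` and its `errors` list are invented for illustration, and so are its "malformed row" rule (a wrong field count) and its default of strict mode. In strict mode, the first bad row aborts parsing. In tolerant mode, the row is skipped and recorded.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Any23 code) of a stop-at-first-error flag.
 */
class TolerantParser {

    private boolean stopAtFirstError = true; // default is an assumption
    final List<String> errors = new ArrayList<>();

    void setStopAtFirstError(boolean f) {
        this.stopAtFirstError = f;
    }

    /** Parses rows that must have exactly `width` comma-separated fields. */
    List<String[]> parse(List<String> lines, int width) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String[] fields = lines.get(i).split(",", -1);
            if (fields.length != width) {
                String msg = "line " + (i + 1) + ": expected " + width
                        + " fields, got " + fields.length;
                if (stopAtFirstError) {
                    throw new IllegalStateException(msg); // strict: abort
                }
                errors.add(msg); // tolerant: record the error and skip the row
                continue;
            }
            rows.add(fields);
        }
        return rows;
    }

    public static void main(String[] args) {
        TolerantParser p = new TolerantParser();
        p.setStopAtFirstError(false);
        List<String[]> rows = p.parse(List.of("a,b", "broken", "c,d"), 2);
        System.out.println(rows.size() + " rows, " + p.errors.size() + " error(s)");
    }
}
```

Tolerant mode trades completeness for robustness. The caller still gets the well-formed rows and can inspect what was dropped.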

run

public void run(ExtractionParameters extractionParameters,
                ExtractionContext extractionContext,
                InputStream in,
                ExtractionResult out)
         throws IOException,
                ExtractionException
Executes the extractor. This method is invoked only once per instance; extractors are not reusable.

Specified by:
run in interface Extractor<InputStream>
Parameters:
extractionParameters - the parameters to be applied during the extraction.
extractionContext - The document context.
in - The extractor input data.
out - the collector for the extracted data.
Throws:
IOException - On error while reading from the input stream.
ExtractionException - On other error, such as parse errors.
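Because `run` is invoked only once and extractors are not reusable, callers need a fresh instance for every document. The class also exposes a static `factory` field, which presumably serves that purpose, though this page does not say so. Below is a hypothetical, library-free sketch of that contract. `OneShotExtractor` and its `FACTORY` supplier are stand-ins invented here, not Any23 types. The reuse check is an illustrative guard; this page does not say whether CSVExtractor enforces it.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/**
 * Hypothetical sketch (not Any23 code): each extraction gets a fresh
 * instance from a factory, and an instance refuses a second run.
 */
class OneShotExtractor {

    private final AtomicBoolean used = new AtomicBoolean(false);

    /** Stand-in for run(...): processes its input exactly once. */
    String run(String input) {
        if (!used.compareAndSet(false, true)) {
            throw new IllegalStateException("extractor instances are not reusable");
        }
        return "extracted:" + input;
    }

    // Stand-in factory: every get() call returns a new, unused instance.
    static final Supplier<OneShotExtractor> FACTORY = OneShotExtractor::new;

    public static void main(String[] args) {
        for (String doc : new String[] {"doc1.csv", "doc2.csv"}) {
            System.out.println(FACTORY.get().run(doc)); // one instance per document
        }
    }
}
```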

getDescription

public ExtractorDescription getDescription()
Returns an ExtractorDescription of this extractor.

Specified by:
getDescription in interface Extractor<InputStream>
Returns:
the object representing the extractor description.


Copyright © 2010-2012 The Apache Software Foundation. All Rights Reserved.