Configuration

Configure the Core Module

The core module contains the main library code and the command-line implementation.

The main library configuration parameters are managed by the Configuration class. The default values are declared within the default-configuration.properties file. The following sections explain how to override the default configuration.

Override Default Configuration from Command-line

The default configuration can be overridden from the command line by passing the java command system properties with the same names as those declared in the configuration.

For example, to override the HTTP Max Client Connections parameter, add the following option to the java command-line invocation:

-Dany23.http.client.max.connections=10
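The override mechanism described above can be illustrated with a minimal sketch that uses only the Java standard library. It is not the Any23 Configuration implementation: the class name OverridableDefaults and its get() method are hypothetical, and the lookup order (system property first, declared default second) is an assumption made for illustration.

```java
import java.util.Properties;

// Illustrative sketch only: a hypothetical class, NOT the Any23 Configuration class.
final class OverridableDefaults {
    private final Properties defaults = new Properties();

    OverridableDefaults(Properties declaredDefaults) {
        // Copy the declared defaults so later changes by the caller have no effect.
        this.defaults.putAll(declaredDefaults);
    }

    /** Returns the system property if set (e.g. via -Dname=value), else the declared default. */
    String get(String name) {
        String override = System.getProperty(name);
        return override != null ? override : defaults.getProperty(name);
    }

    public static void main(String[] args) {
        Properties declared = new Properties();
        declared.setProperty("any23.http.client.max.connections", "5");
        OverridableDefaults conf = new OverridableDefaults(declared);
        // Prints the declared default unless the JVM was started with
        // -Dany23.http.client.max.connections=<value>.
        System.out.println(conf.get("any23.http.client.max.connections"));
    }
}
```

Reading the system property at lookup time, rather than once at startup, is a design choice of this sketch; it keeps the declared defaults untouched while letting a -D option take precedence.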

The any23, any23tools and any23server scripts accept the ANY23_OPTS variable to specify custom options. For example, to customize the HTTP Max Client Connections for the any23 script:

any23-core/bin/$ ANY23_OPTS="-Dany23.http.client.max.connections=10" any23 http://path/to/resource

Override Default Configuration Programmatically

The Configuration properties can be read by retrieving the configuration singleton instance.
This instance is immutable:

final Configuration immutableConf = DefaultConfiguration.singleton();
final String propertyValue = immutableConf.getProperty("propertyName", "default value");
...

To obtain a modifiable Configuration, use the copy() method instead.
One of the Any23 class constructors accepts a Configuration object, which customizes the behavior of the Any23 instance for its entire life cycle.

final ModifiableConfiguration modifiableConf = DefaultConfiguration.copy();
final String oldPropertyValue = modifiableConf.setProperty("propertyName", "new property value");
final Any23 any23 = new Any23(modifiableConf, "extractor1", ...);
...
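The split between a shared read-only instance and a modifiable copy can be sketched in plain Java as follows. The classes ReadOnlySettings and EditableSettings are hypothetical stand-ins, not Any23 classes; the sketch only assumes, as the snippet above suggests, that setProperty() returns the previous value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a read-only configuration; NOT an Any23 class.
class ReadOnlySettings {
    protected final Map<String, String> values;

    ReadOnlySettings(Map<String, String> values) {
        // Defensive copy: every instance owns its own map.
        this.values = new HashMap<>(values);
    }

    String getProperty(String name, String fallback) {
        return values.getOrDefault(name, fallback);
    }

    /** Returns an independent, modifiable copy; this instance is left untouched. */
    EditableSettings copy() {
        return new EditableSettings(values);
    }
}

// Hypothetical stand-in for a modifiable configuration; NOT an Any23 class.
final class EditableSettings extends ReadOnlySettings {
    EditableSettings(Map<String, String> values) {
        super(values);
    }

    /** Sets a property and returns the previous value, or null if there was none. */
    String setProperty(String name, String newValue) {
        return values.put(name, newValue);
    }

    public static void main(String[] args) {
        Map<String, String> declared = new HashMap<>();
        declared.put("propertyName", "default value");
        ReadOnlySettings shared = new ReadOnlySettings(declared);
        EditableSettings mine = shared.copy();
        mine.setProperty("propertyName", "new property value");
        // The shared instance still reports the original value.
        System.out.println(shared.getProperty("propertyName", null));
        System.out.println(mine.getProperty("propertyName", null));
    }
}
```

Because copy() duplicates the underlying map, changes made through the copy never leak back into the shared instance, which is what makes it safe to hand the same read-only object to every caller.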

Use of ExtractionParameters

It is possible to customize the behavior of a single data extraction by providing an ExtractionParameters instance to one of the Any23#extract() methods that accept it. ExtractionParameters allows any property or flag to be customized, in addition to the specific extraction options.
If no custom parameters are specified, the default configuration values are used.

final Any23 any23 = ...
final TripleHandler tripleHandler = ...
final ExtractionParameters extractionParameters = ExtractionParameters.getDefault();
extractionParameters.setFlag("any23.microdata.strict", true);
any23.extract(extractionParameters, "http://path/to/doc", tripleHandler);
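The fallback behavior described above, where a per-extraction value wins and the default configuration is used otherwise, can be sketched as follows. PerCallParameters is a hypothetical class written for illustration, not the Any23 ExtractionParameters class, and the on/off encoding of flags is an assumption borrowed from the default configuration values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-call overrides layered on shared defaults;
// NOT the Any23 ExtractionParameters class.
final class PerCallParameters {
    private final Map<String, String> defaults;
    private final Map<String, String> overrides = new HashMap<>();

    PerCallParameters(Map<String, String> defaults) {
        this.defaults = new HashMap<>(defaults);
    }

    /** Overrides a flag for this parameter set only. */
    void setFlag(String name, boolean value) {
        overrides.put(name, value ? "on" : "off");
    }

    /** Returns the overridden flag if present, otherwise the default one. */
    boolean getFlag(String name) {
        String raw = overrides.containsKey(name) ? overrides.get(name) : defaults.get(name);
        if (raw == null) {
            throw new IllegalArgumentException("Unknown flag: " + name);
        }
        return "on".equals(raw);
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new HashMap<>();
        defaults.put("any23.microdata.strict", "on");
        PerCallParameters params = new PerCallParameters(defaults);
        params.setFlag("any23.microdata.strict", false);
        // The override applies to this parameter set; the defaults map is unchanged.
        System.out.println(params.getFlag("any23.microdata.strict"));
        System.out.println(defaults.get("any23.microdata.strict"));
    }
}
```

Keeping overrides in a separate map means each extraction call can carry its own settings without mutating the configuration shared by other calls.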

Apache Any23 Core Module Default Configuration

| Property Name | Default Property Value | Description |
|---|---|---|
| any23.core.version | current any23 core version | String declaring the Apache Any23 Core module version. |
| any23.http.user.agent.default | Apache Any23-CLI | User Agent Name used for HTTP requests. |
| any23.http.client.timeout | 10000 (10 secs) | Timeout in milliseconds for an HTTP request. |
| any23.http.client.max.connections | 5 | Max number of concurrent HTTP connections allowed by the internal Apache Any23 HTTP client. |
| any23.rdfa.extractor.xslt | rdfa.xslt | XSLT Stylesheet to be used to perform HTML to RDF extraction of RDFa. |
| any23.extraction.metadata.timesize | off (possible values: on/off) | Activates/deactivates the generation of time and size metadata triples. |
| any23.extraction.metadata.nesting | on (possible values: on/off) | Activates/deactivates the generation of nesting triples for Microformat entities. |
| any23.extraction.metadata.domain.per.entity | on (possible values: on/off) | Activates/deactivates the generation of domain triple per entity. |
| any23.extraction.rdfa.programmatic | on (possible values: on/off) | Switches between the programmatic RDFa 1.1 Extractor and the RDFa 1.0 XSLT-based one. |
| any23.extraction.context.uri | ? (means current document URI) | Default value for extraction content URI. |
| any23.plugin.dirs | ./plugins | Directory containing Apache Any23 plugins. |
| any23.microdata.strict | on (possible values: on/off) | Activates/deactivates the microdata strict validation. |
| any23.microdata.ns.default | http://rdf.data-vocabulary.org/ | Microdata default namespace. |
| any23.extraction.head.meta | on (possible values: on/off) | Activates/deactivates the HTMLMetaExtractor. |
| any23.extraction.csv.field | , | CSVExtractor field separator. |
| any23.extraction.csv.comment | # | CSVExtractor line comment marker. |