This section describes the Apache Any23 plugins support.
Apache Any23 comes with a set of predefined plugins. These plugins are located under the any23-root/plugins directory.
A plugin is a standard Maven3 module containing one or more implementations of the ExtractorPlugin or Tool interfaces.
A plugin can be added to the Apache Any23 CLI by adding its JAR to the CLI classpath. One way to do this is the CLASSPATH_PREFIX environment variable:
export CLASSPATH_PREFIX=../../../plugins/basic-crawler/target/any23-basic-crawler-VERSION.jar
A plugin can be added to the Apache Any23 library API by using the Any23PluginManager#createInstance(Configuration configuration, File... pluginLocations) method.
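This page does not describe how the plugin manager loads code from the given locations. As a rough illustration of the general JVM technique, the sketch below wraps plugin JARs or directories in a URLClassLoader. It then uses java.util.ServiceLoader to instantiate the implementations declared under META-INF/services. PluginLoaderSketch and its methods are hypothetical names. That Any23PluginManager works along similar lines is an assumption, not something this page states.

```java
import java.io.File;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical helper, not Any23's Any23PluginManager: exposes plugin JARs
// through a class loader and discovers META-INF/services implementations.
final class PluginLoaderSketch {

    private PluginLoaderSketch() {}

    /** Builds a class loader that sees the given plugin JARs or directories. */
    static URLClassLoader classLoaderFor(File... pluginLocations) {
        URL[] urls = new URL[pluginLocations.length];
        for (int i = 0; i < pluginLocations.length; i++) {
            try {
                urls[i] = pluginLocations[i].toURI().toURL();
            } catch (MalformedURLException e) {
                throw new UncheckedIOException(e);
            }
        }
        // The host's class loader is the parent, so plugins can see host types.
        return new URLClassLoader(urls, PluginLoaderSketch.class.getClassLoader());
    }

    /** Instantiates every provider of the service that is visible to the loader. */
    static <T> List<T> discover(Class<T> service, ClassLoader loader) {
        List<T> found = new ArrayList<>();
        for (T provider : ServiceLoader.load(service, loader)) {
            found.add(provider);
        }
        return found;
    }
}
```

A caller would pass the plugin JAR files (for example the basic-crawler JAR above) to classLoaderFor. It would then pass the service interface of interest, together with that loader, to discover.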
TODO: plugin support in Apache Any23 Service
Any implementation of ExtractorPlugin is automatically registered in the ExtractorRegistry.
Any detected implementation of Tool is listed by the ToolRunner command-line tool, any23-root/bin/any23.
Apache Any23 tests and packages the plugins as part of its reactor POM build. A single plugin can always be rebuilt with the command:
<plugin-dir>$ mvn clean assembly:assembly
An Extractor Plugin is a class implementing the ExtractorPlugin interface.
An example Extractor Plugin is shown below. It is annotated with @Author and exposes @Init and @Shutdown lifecycle methods.
@Author(name="Michele Mostarda (mostarda@fbk.eu)")
public class HTMLScraperPlugin implements ExtractorPlugin {

    private static final Logger logger = LoggerFactory.getLogger(HTMLScraperPlugin.class);

    @Init
    public void init() {
        logger.info("Plugin initialization.");
    }

    @Shutdown
    public void shutdown() {
        logger.info("Plugin shutdown.");
    }

    public ExtractorFactory getExtractorFactory() {
        return HTMLScraperExtractor.factory;
    }
}
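The example relies on a host that calls the @Init and @Shutdown methods and registers the plugin's factory. The sketch below shows one minimal way to implement that lifecycle, assuming reflection-based invocation. InitHook, ShutdownHook, SimplePlugin and PluginHost are hypothetical stand-ins, not Any23 types. The factory is reduced to a name and the registry to a map.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for the @Init and @Shutdown annotations.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface InitHook {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ShutdownHook {}

// Hypothetical, simplified plugin contract: the factory is reduced to a name.
interface SimplePlugin {
    String getExtractorFactoryName();
}

// Hypothetical host: runs lifecycle hooks and keeps a name -> plugin registry.
final class PluginHost {
    final Map<String, SimplePlugin> registry = new LinkedHashMap<>();

    /** Invokes every public no-argument method carrying the given annotation. */
    static void invokeAnnotated(Object target, Class<? extends Annotation> marker) {
        for (Method m : target.getClass().getMethods()) {
            if (m.isAnnotationPresent(marker) && m.getParameterCount() == 0) {
                try {
                    m.invoke(target);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Hook failed: " + m.getName(), e);
                }
            }
        }
    }

    /** Runs the init hooks, then registers the plugin under its factory name. */
    void load(SimplePlugin plugin) {
        invokeAnnotated(plugin, InitHook.class);
        registry.put(plugin.getExtractorFactoryName(), plugin);
    }

    /** Unregisters the plugin, then runs its shutdown hooks. */
    void unload(SimplePlugin plugin) {
        registry.remove(plugin.getExtractorFactoryName());
        invokeAnnotated(plugin, ShutdownHook.class);
    }
}
```

Marking lifecycle methods with annotations lets the host find them by reflection, so a plugin does not need to follow fixed method names.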
A Tool Plugin is a Java class that implements the Tool interface.
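An example Tool Plugin is shown below. It is annotated with @MetaInfServices, which generates a META-INF/services entry for the class, and with the JCommander @Parameters annotation, which declares its command name.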
@MetaInfServices
@Parameters(commandNames = { "myexec" }, commandDescription = "Prints out XXX used by Any23.")
public class MyExecutableTool implements Tool {

    @Parameter(names = { "-u", "--urls" }, description = "URLs to process")
    private List<URL> urls;

    public void run() throws Exception {
        // Tool logic goes here, e.g. process each URL passed with -u/--urls.
    }
}
So when executing any23, the myexec command will be available in the commands list.
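To illustrate the idea of a commands list, here is a toy dispatcher that maps command names such as myexec to tool factories and runs the matching tool. CliTool and ToyToolRunner are hypothetical. The sketch does not reproduce how ToolRunner discovers tools or how it parses arguments with JCommander.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Supplier;

// Hypothetical minimal tool contract, mirroring run() in the example above.
interface CliTool {
    void run() throws Exception;
}

// Toy dispatcher, not Any23's ToolRunner: command name -> tool factory.
final class ToyToolRunner {
    private final Map<String, Supplier<CliTool>> commands = new TreeMap<>();

    /** Registers a command; rejects duplicate names. */
    void register(String commandName, Supplier<CliTool> factory) {
        if (commands.putIfAbsent(commandName, factory) != null) {
            throw new IllegalArgumentException("Duplicate command: " + commandName);
        }
    }

    /** The commands list, sorted alphabetically (TreeMap key order). */
    Set<String> commandNames() {
        return commands.keySet();
    }

    /** Runs a fresh instance of the named tool; returns false if the name is unknown. */
    boolean execute(String commandName) {
        Supplier<CliTool> factory = commands.get(commandName);
        if (factory == null) {
            return false;
        }
        try {
            factory.get().run();
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("Command failed: " + commandName, e);
        }
        return true;
    }
}
```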
The HTMLScraperPlugin is able to scrape plain text content from any HTML page and transform it into statement literals.
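This page does not describe the plugin's actual scraping algorithm. The sketch below only illustrates the general idea. It uses a crude, regex-based HTML-to-text conversion (not a real HTML parser) and then emits one N-Triples statement whose object is the scraped text as a literal. HtmlTextSketch and the predicate http://example.org/ns#pageText are hypothetical.

```java
import java.util.regex.Pattern;

// Rough HTML-to-text sketch; not the HTMLScraperPlugin's algorithm.
final class HtmlTextSketch {
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    /** Drops scripts, styles and tags, decodes a few entities, collapses whitespace. */
    static String plainText(String html) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        text = text.replace("&lt;", "<").replace("&gt;", ">")
                   .replace("&quot;", "\"").replace("&nbsp;", " ")
                   .replace("&amp;", "&"); // decoded last to avoid double-decoding
        return SPACES.matcher(text).replaceAll(" ").trim();
    }

    /** Quotes and escapes a string as an N-Triples literal. */
    static String ntriplesLiteral(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default:   sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** One statement: page URL, hypothetical predicate, scraped text literal. */
    static String statement(String pageUrl, String html) {
        return "<" + pageUrl + "> <http://example.org/ns#pageText> "
                + ntriplesLiteral(plainText(html)) + " .";
    }
}
```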
This plugin is documented here.
The Office Scraper Plugins extract semantic content from several Microsoft Office document formats.
These plugins are documented here.
The Crawler CLI Tool extends the Rover CLI Tool with basic site-crawling capabilities. More information about it can be found in the Getting Started - Crawler Tool section.
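As an illustration of what basic site crawling involves, the sketch below performs a breadth-first traversal from a seed URL. It follows only links on the seed's host and stops after a page limit. The linksOf function abstracts page fetching and link extraction, so the sketch runs without network access. CrawlSketch is hypothetical and is not the Crawler CLI Tool's code.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.function.Function;

// Breadth-first crawl sketch; not the Crawler CLI Tool's implementation.
final class CrawlSketch {
    private CrawlSketch() {}

    /**
     * Visits pages reachable from the seed on the same host, in breadth-first
     * order, up to maxPages. linksOf stands in for "fetch the page and
     * extract its links".
     */
    static List<String> crawl(String seed, int maxPages,
                              Function<String, List<String>> linksOf) {
        String host = URI.create(seed).getHost();
        Deque<String> frontier = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        List<String> visited = new ArrayList<>();
        frontier.add(seed);
        seen.add(seed);
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String page = frontier.poll();
            visited.add(page); // a real crawler would run the extractors here
            for (String link : linksOf.apply(page)) {
                // Follow only unseen links that stay on the seed's host.
                if (Objects.equals(host, URI.create(link).getHost()) && seen.add(link)) {
                    frontier.add(link);
                }
            }
        }
        return visited;
    }
}
```

Breadth-first order visits the pages closest to the seed first, so a small page limit keeps the crawl focused on the site's top-level pages.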