Apache Any23 Plugins

Introduction

This section describes the plugin support in Apache Any23.

Apache Any23 comes with a set of predefined plugins, located under the any23-root/plugins directory.

A plugin is a standard Maven 3 module containing any implementation of ExtractorPlugin or Tool.

How to Register a Plugin

A plugin can be added to the Apache Any23 CLI in any of the following ways:

  • adding its JAR to the Apache Any23 JVM classpath;
  • adding its JAR to the CLASSPATH_PREFIX environment variable as:
    export CLASSPATH_PREFIX=../../../plugins/basic-crawler/target/any23-basic-crawler-VERSION.jar
  • adding its JAR to the $HOME/.any23/plugins directory.

A plugin can be added to the Apache Any23 library API by using the Any23PluginManager#createInstance(Configuration configuration, File... pluginLocations) method.

TODO: plugin support in Apache Any23 Service

Any implementation of ExtractorPlugin will be automatically registered in the ExtractorRegistry.

Any detected implementation of Tool will be listed by the ToolRunner command-line tool in any23-root/bin/any23.
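
As a rough illustration of the third CLI option above, the sketch below shows how JARs dropped into a plugin directory such as $HOME/.any23/plugins could be collected and exposed through a class loader using only the JDK. PluginDirScanner and its methods are hypothetical names introduced for this example. This is not the code Any23 itself uses; real applications should rely on Any23PluginManager.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper for illustration only; not part of the Any23 API. */
final class PluginDirScanner {

    /** Returns true when the file name ends with ".jar" (case-insensitive). */
    static boolean isJarName(String fileName) {
        return fileName.toLowerCase(Locale.ROOT).endsWith(".jar");
    }

    /** Lists the JAR files directly inside dir, sorted; empty if dir is not a directory. */
    static List<Path> findJars(Path dir) {
        if (!Files.isDirectory(dir)) {
            return Collections.emptyList();
        }
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> isJarName(p.getFileName().toString()))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a class loader that exposes the given JARs on top of the current loader. */
    static URLClassLoader toClassLoader(List<Path> jars) {
        List<URL> urls = new ArrayList<>();
        for (Path jar : jars) {
            try {
                urls.add(jar.toUri().toURL());
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Not a valid JAR location: " + jar, e);
            }
        }
        return new URLClassLoader(urls.toArray(new URL[0]),
                PluginDirScanner.class.getClassLoader());
    }

    public static void main(String[] args) {
        // The directory name mirrors the one mentioned in the text above.
        Path pluginDir = Paths.get(System.getProperty("user.home"), ".any23", "plugins");
        System.out.println("Plugin JARs found: " + findJars(pluginDir));
    }
}
```

A class loader built this way could then be handed to a discovery mechanism such as java.util.ServiceLoader, which is the mechanism the Tool plugins described later rely on.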

How to Build a Plugin

Apache Any23 takes care of testing and packaging its plugins when they are built from its reactor POM. A single plugin can always be rebuilt on its own with the command:

<plugin-dir>$ mvn clean assembly:assembly

How to Write an Extractor Plugin

An Extractor Plugin is a class:

  • implementing the ExtractorPlugin interface;
  • packaged under org.apache.any23.plugin.

An example plugin is shown below.

    @Author(name="Michele Mostarda (mostarda@fbk.eu)")
    public class HTMLScraperPlugin implements ExtractorPlugin {
    
        private static final Logger logger = LoggerFactory.getLogger(HTMLScraperPlugin.class);
    
        @Init
        public void init() {
            logger.info("Plugin initialization.");
        }
    
        @Shutdown
        public void shutdown() {
            logger.info("Plugin shutdown.");
        }
    
        public ExtractorFactory getExtractorFactory() {
            return HTMLScraperExtractor.factory;
        }
    
    }

How to Write a Tool Plugin

A Tool Plugin is a Java class that:

  • implements the Tool interface;
  • declares its CLI parameters by annotating class members with JCommander annotations;
  • can be discovered through the Java ServiceLoader (the provider entry is usually generated with Kohsuke's MetaInfServices annotation processor).

An example plugin is shown below.

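    import java.net.URL;
    import java.util.List;

    import com.beust.jcommander.Parameter;
    import com.beust.jcommander.Parameters;
    import org.apache.any23.cli.Tool;
    import org.kohsuke.MetaInfServices;

    // @MetaInfServices generates the META-INF/services entry read by the ServiceLoader.
    @MetaInfServices
    @Parameters(commandNames = { "myexec" }, commandDescription = "Prints out XXX used by Any23.")
    public class MyExecutableTool implements Tool {

        @Parameter(names = { "-u", "--urls" }, description = "URLs to process")
        private List<URL> urls;

        @Override
        public void run() throws Exception {
            // Tool logic goes here.
        }

    }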

So when executing any23, the myexec command will be available in the commands list.
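
To make the idea of a "commands list" concrete, here is a minimal, self-contained sketch of a registry that maps command names to tools and lists them. DemoTool and DemoToolRegistry are hypothetical stand-ins written for this example; they are not Any23's Tool interface or its ToolRunner. In Any23 the command name comes from the JCommander @Parameters annotation rather than from a method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Stand-in for a CLI tool (hypothetical; not Any23's Tool interface). */
interface DemoTool {
    String commandName();

    void run() throws Exception;
}

/** Hypothetical registry that maps command names to tools. */
final class DemoToolRegistry {

    // TreeMap keeps the command names sorted for a stable listing.
    private final Map<String, DemoTool> tools = new TreeMap<>();

    /** Registers a tool under its command name; duplicate names are rejected. */
    void register(DemoTool tool) {
        if (tools.putIfAbsent(tool.commandName(), tool) != null) {
            throw new IllegalArgumentException("Duplicate command: " + tool.commandName());
        }
    }

    /** Returns the available command names in alphabetical order. */
    List<String> commandNames() {
        return new ArrayList<>(tools.keySet());
    }

    /** Runs the tool registered under the given command name. */
    void run(String command) throws Exception {
        DemoTool tool = tools.get(command);
        if (tool == null) {
            throw new IllegalArgumentException(
                    "Unknown command: " + command + ", available: " + commandNames());
        }
        tool.run();
    }
}
```

Once a tool is registered under the name myexec, commandNames() includes it, which mirrors the way a newly detected Tool shows up next to the built-in commands.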

Available Extractor Plugins

  • HTML Scraper Plugin

    The HTMLScraperPlugin is able to scrape plain text content from any HTML page and transform it into statement literals.

    This plugin is documented here.

  • Office Scraper Plugins

    The Office Scraper Plugins extract semantic content from several Microsoft Office document formats.

    These plugins are documented here.

Available CLI Tool Plugins