Getting started with <<Any23>>

    <<Any23>> can be used:

    * as a commandline tool from your preferred shell environment;

    * as a RESTful Webservice;

    * as a library.

* <<Any23>> Modules

    <<Any23>> is composed of the following modules:

    * <<<any23-core>>> The core library.

    * <<<any23-service>>> The REST service.

    * <<>> The core additional plugins.

* Use <<Any23>> as a commandline Tool

    The commandline tools are provided by the <<any23-core>> module.
    Once <<Any23>> has been correctly {{{./install.html}installed}}, you can use it
    as a commandline tool through the shell scripts in the <<"any23-core/bin">> directory.
    These are provided for both Windows and Unix (Linux/OSX).

    The main scripts are <<"any23">> and <<"any23tools">>.
    The <<any23>> command extracts metadata content from local and remote resources.
    The <<any23tools>> command provides testing, debugging and analysis utilities.

    Running ./any23 without options shows the default configuration properties and the usage options.
    The resource (URL or local file) is the only mandatory argument. It is also possible to specify
    the input format, the output format and other advanced options.

    <<any23>> tool

+-------------------------------------------
any23-core/bin$ ./any23
Mar 26, 2011 11:47:20 PM org.deri.any23.Configuration
INFO:
======================= Configuration Properties =======================
any23.http.client.max.connections=5
any23.plugin.dirs=./plugins
any23.http.user.agent.name=Any23-CLI
any23.core.version=0.4.2-SNAPSHOT
any23.http.client.timeout=10000
any23.rdfa.extractor.xslt=rdfa.xslt
========================================================================

usage: any23 [-e ] [-f ] [-l ] [-n] [-o ] [-p] [-s] [-t] [-v] {|}
 -e               comma-separated list of extractors, e.g.
                  rdf-xml,rdf-turtle
 -f,--format      Output format [turtle (default), ntriples, rdfxml, quad, uris]
 -l,--log         logging, please specify a file
 -n,--nesting     disable production of nesting triples
 -o,--output      ouput file (defaults to stdout)
 -p,--pedantic    validates and fixes HTML content detecting commons issues
 -s,--stats       print out statistics of Any23
 -t,--notrivial   filter trivial statements
 -v,--verbose     show progress and debug information
+-------------------------------------------

    Extract metadata from an HTML page:

+-----------------------------------------
any23-core/bin$ ./any23 http://yourdomain/yourfile
+-----------------------------------------

    Extract meta information from a local resource:

+--------------------------------------
any23-core/bin$ ./any23 /home/user/myFoaf.rdf
+--------------------------------------

    To specify the output format, use the option <<"-f">> or <<"--format">>:

    TURTLE - default configuration, no specific flag needed

+--------------------------------------
any23-core/bin$ ./any23 foaf.rdf
+--------------------------------------

    N-Triples - <<-f ntriples>>

+--------------------------------------
any23-core/bin$ ./any23 -f ntriples foaf.rdf
+--------------------------------------

    Quad - <<-f quad>> (please see further information about the
    {{{http://sw.deri.org/2008/07/n-quads/}"quad"}} format)

+--------------------------------------
any23-core/bin$ ./any23 -f quad foaf.rdf
+--------------------------------------

    Filtering trivial statements

    By default, <<Any23>> will extract meta information, such as links to or meta information
    like the author or the software used to create the . Hence, if the user is only interested
    in the structured content from the tag we offer a filter functionality, activated by the
    <<"-t">> command line argument.

+-------------------------
any23-core/bin$ ./any23 -t foaf.rdf
+-------------------------

* <<any23tools>> script

    This script detects the utilities available within the <<Any23>> classpath and allows you to run them.
    Such utilities are:

    * <<<Eval>>>: commandline utility for processing Any23 generated output logs.

    * <<<ExtractorDocumentation>>>: a utility for obtaining useful information about extractors.

    * <<<MicrodataParser>>>: commandline parser that extracts specific Microdata content from a web page
      (local or remote) and produces a JSON output compliant with the Microdata specification
      ({{{http://www.w3.org/TR/microdata/}http://www.w3.org/TR/microdata/}}).

    * <<<Rover>>>: an alias for the <<any23>> commandline tool.

    * <<<Version>>>: prints out useful information about the library version and configuration.

    * <<<VocabPrinter>>>: dumps all the <> vocabularies declared within Any23.

+-------------------------------------------
any23-core/bin$ ./any23tools
[...configuration data...]
Usage: ToolRunner [options...]
 where one of:
    Eval                      Utility for processing output log.
    ExtractorDocumentation    Utility for obtaining documentation about metadata extractors.
    MicrodataParser           Commandline Tool for extracting Microdata from file/HTTP source.
    PluginVerifier            Utility for plugin management verification.
    Rover                     Any23 Command Line Tool.
    Version                   Prints out the current library version and configuration information.
    VocabPrinter              Prints out the RDF Schema of the vocabularies used by Any23.
+-------------------------------------------

    [TODO: add other tools documentation]

** ExtractorDocumentation

*** Obtain the <<ExtractorDocumentation>> sub commands

+-------------------------------------------
any23-core/bin$ ./any23tools ExtractorDocumentation
[...configuration data...]
Usage:
  ExtractorDocumentation -list
      shows the names of all available extractors
  ExtractorDocumentation -i extractor-name
      shows example input for the given extractor
  ExtractorDocumentation -o extractor-name
      shows example output for the given extractor
  ExtractorDocumentation -all
      shows a report about all available extractors
+-------------------------------------------

*** Get all declared extractors

+--------------------------------------
any23-core/bin$ ./any23tools ExtractorDocumentation -list
[...configuration data...]
html-head-icbm
html-head-rdflinks
html-head-title
html-mf-adr
html-mf-geo
html-mf-hcalendar
html-mf-hcard
html-mf-hlisting
html-mf-hresume
html-mf-hreview
html-mf-license
html-mf-species
html-mf-xfn
html-rdfa
html-script-turtle
rdf-nq
rdf-nt
rdf-turtle
rdf-xml
+--------------------------------------

* Use <<Any23>> as a RESTful Web Service

    <<Any23>> provides a Web Service that can be used to extract metadata from Web documents.
    <<Any23>> services can be accessed through a {{{./service.html}RESTful API}}.

    Running the server

    The server command line tool is defined within the <<any23-service>> module.
    Run the <<"any23server">> script

+--------------------------
any23-service/bin$ ./any23server
+--------------------------

    from the command line to start the server, then go to {{{http://localhost:8080/}}}
    to access the web interface. A live demo of this service is running at {{{http://any23.org/}}}.

    You can also start the server from Java by running the
    {{{./xref/org/deri/any23/servlet/Servlet.html}Any23 Servlet}} class.
    Maven can be used to create a WAR file for deployment into an existing servlet container
    such as {{{http://tomcat.apache.org/}Apache Tomcat}}.

* Use <<Any23>> as a Library

    See our {{{./developers.html}Developers guide}} for more details.
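
    Returning to the commandline usage: the <<"-f">> and <<"-o">> options listed in the usage output
    can be combined in a small shell helper when several resources need to be processed in one go.
    The following is only a minimal sketch. The function name any23_batch and the ANY23_BIN and
    ANY23_DRY_RUN variables are hypothetical and not part of Any23, and the sketch assumes that the
    any23 script is reachable as ./any23 unless ANY23_BIN points elsewhere.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Any23): runs the any23 script once
# per input resource and writes one output file per resource.
#   ANY23_BIN      path to the any23 script (assumed default: ./any23)
#   ANY23_DRY_RUN  when set to 1, print the commands instead of running them

any23_batch() {
    format="$1"    # a value accepted by -f, e.g. turtle, ntriples, quad
    outdir="$2"    # directory that receives the output files
    shift 2
    for resource in "$@"; do
        # Name the output after the last path segment of the resource,
        # using the format name as the file extension.
        name=$(basename "$resource")
        out="$outdir/${name%.*}.$format"
        if [ "${ANY23_DRY_RUN:-0}" = "1" ]; then
            echo "${ANY23_BIN:-./any23}" -f "$format" -o "$out" "$resource"
        else
            # Stop at the first resource that any23 fails to process.
            "${ANY23_BIN:-./any23}" -f "$format" -o "$out" "$resource" || return 1
        fi
    done
}
```

    Once the function is defined in the current shell, a call such as
    any23_batch ntriples out foaf.rdf /home/user/myFoaf.rdf from the <<"any23-core/bin">> directory
    would invoke ./any23 twice, assuming the out directory already exists. Setting ANY23_DRY_RUN=1
    first prints the two command lines without running them, which is a cheap way to check the
    generated options.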