Apache Any23 REST Service

Apache Any23 provides REST Service module any23-service able to provide useful processing methods.

Compact API

HTTP GET requests can be made to IRIs of the shape:

http://<any23-service-host>/<output-format>/<input-uri>

Where input-uri is the input HTTP resource to be processed and output-format is the desired output format for the extracted RDF data.

Example requests:

http://any23.org/best/twitter.com/cygri
http://any23.org/rdfxml/http://data.gov
http://any23.org/ttl/http://www.w3.org/People/Berners-Lee/card
http://any23.org/?uri=http://dbpedia.org/resource/Berlin
http://any23.org/?format=nt&uri=http://dbpedia.org/resource/Berlin

Supported input and output formats are described here.

Form-style GET API

HTTP GET requests can be made to the IRI http://any23.org/ with the following query parameters:

uri         IRI of an input document
format      Desired output format; defaults to best

Direct POST API

HTTP POSTing a document body to http://any23.org/format will convert the document to the specified output format. The media type of the input has to be specified in the Content-Type HTTP header. Depending on the servlet container, a Content-Length header specifying the length of the input document in bytes might also be required. Typical media types for supported input formats are:

Input format        Media type
-------------------------------
HTML        text/html
RDF/XML     application/rdf+xml
Turtle      text/turtle
N-Triples   text/plain
N-Quads     text/plain

Example POST request:

POST /rdfxml HTTP/1.0
Host: any23.org
Content-Type: text/turtle
Content-Length: 174

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

[] a foaf:Person;
    foaf:name "John X. Foobar";
    foaf:mbox_sha1sum "cef817456278b70cee8e5a1611539ef9d928810e";
    .

Form-style POST API

A document body can also be converted by HTTP POSTing form data to http://any23.org/. The Content-Type HTTP header must be set to application/x-www-form-urlencoded. The following parameters are supported:

type Media type of the input, see the table above. If not present, auto-detection will be attempted.
body Document body to be converted.
format Desired output format; defaults to best.
validation The validation level to be applied, supported values: none (default), validate and validate-fix.

Output Formats

Supported input and output formats are described here.

Error reporting

Processing errors are indicated via HTTP status codes and brief text/plain error messages. The following status codes can be returned:

Code                        Reason
-------------------------------------------------------------------------------------------------------------
200 OK                      Success.
400 Bad                     Request     Missing or malformed input parameter.
404 Not                     Found       Malformed request IRI.
406 Not                     Acceptable  None of the media types specified in the Accept header are supported.
415 Unsupported Media Type      Document body with unsupported media type was POSTed.
501 Not Implemented             Extraction from input was successful, but yielded zero triples.
502 Bad Gateway             Input document from a remote server could not be fetched or parsed.

XML Report Format

report-format

The Apache Any23 Service can optionally return an XML report and attempt error fix if the flags fix and report are activated ( fix=on&report=on ). The following URL shows how to use these flags.

http://any23.org/any23-service/any23/?format=best&uri=http%3A%2F%2Fpath%2Fto%2Fresource&validation=none&report=on

The fix functionality is described here.

A report format example is listed below. In particular ath path response/extractors/extractor it is possible to find the list of all extractors activated during the page processing. The section response/report/message contains an eventual error message while the response/report/error section the error stack trace if available.

The result of validation is contained within the response/report/validationReport node. Within that node there is the list of the activated rules, the issues detected and the errors generated.

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <!-- List of activated extractors. -->
  <extractors>
    <extractor><!-- Extractor name. --></extractor>
    <!-- ... -->
  </extractors>
  <report>
    <message></message>
    <error></error>
    <!-- Validation specific report, contains all errors and issues detected within the document. -->
    <validationReport>
      <!-- List of errors found while validating the document. -->
      <errors>
      </errors>
      <!-- List of issues found while validating the document. -->
      <issues>
      </issues>
      <!-- List of rules activated to solve the detected issues. -->
      <ruleActivations>
      </ruleActivations>
    </validationReport>
  </report>
  <data>
  <![CDATA[
  -- Actual Data in the format specified as output. --
  ]]>
  </data>
</response>