Supported Formats in Apache Any23

Apache Any23 supports all the main standard formats introduced by the Semantic Web community.

Input Formats

The following list shows the accepted input formats and the level of support for each.

  • (X)HTML with RDFa 1.0, RDFa 1.1, Microdata and Microformats: Apache Any23 fully supports the (X)HTML5 input format and provides a set of extractors for processing embedded RDFa 1.0, RDFa 1.1, Microformats and Microdata.
  • Turtle: Apache Any23 fully supports the Turtle specification.
  • N-Triples: Apache Any23 fully supports the N-Triples specification.
  • N-Quads: Apache Any23 fully supports the N-Quads specification.
  • RDF/XML: Apache Any23 fully supports the RDF/XML specification.
  • CSV: Apache Any23 can represent CSV files that include a header row as RDF, using a specific algorithm.

Output Formats

The supported output formats are listed below.

  • Turtle: Apache Any23 is able to produce output in Turtle.
  • N-Triples: Apache Any23 is able to produce output in N-Triples.
  • N-Quads: Apache Any23 is able to produce output in N-Quads.
  • RDF/XML: Apache Any23 is able to produce output in RDF/XML.
  • JSON Statements: Apache Any23 is able to produce output in JSON. See the JSON Statements Format section below.
  • XML Report: Apache Any23 is able to produce a detailed report of the latest document extraction if required. See further details here.

JSON Statements Format

Apache Any23 is able to produce JSON output following the format described below.

Given the following example statements (expressed in N-Quads format):

_:bn1          <http://pred/1> <http://value/1>         <http://graph/1> .
<http://sub/2> <http://pred/2> "language literal"@en    <http://graph/2> .
<http://sub/3> <http://pred/3> "123"^^<http://datatype> <http://graph/3> .

these will be represented as:

{
    "quads" : [
        [
            {
                "type" : "bnode",
                "value" : "bn1"
            },
            "http://pred/1",
            {
                "type" : "uri",
                "value" : "http://value/1"
            },
            "http://graph/1"
        ],
        [
            {
                "type" : "uri",
                "value" : "http://sub/2"
            },
            "http://pred/2",
            {
                "type" : "literal",
                "value" : "language literal",
                "lang" : "en",
                "datatype" : null
            },
            "http://graph/2"
        ],
        [
            {
                "type" : "uri",
                "value" : "http://sub/3"
            },
            "http://pred/3",
            {
                "type" : "literal",
                "value" : "123",
                "lang" : null,
                "datatype" : "http://datatype"
            },
            "http://graph/3"
        ]
    ]
}
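
To illustrate how this representation maps back onto the N-Quads statements, the following is a minimal Python sketch that reads a JSON Statements document and rebuilds one N-Quads line per statement. The function names term_to_nquads and json_to_nquads are hypothetical and are not part of Apache Any23. The literal escaping is deliberately simplified and only handles backslashes and double quotes.

```python
import json


def term_to_nquads(term):
    """Render one subject or object dictionary as an N-Quads term."""
    kind, value = term["type"], term["value"]
    if kind == "uri":
        return f"<{value}>"
    if kind == "bnode":
        return f"_:{value}"
    if kind == "literal":
        # Simplified escaping (assumption): only backslash and double quote.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        if term.get("lang"):
            return f'"{escaped}"@{term["lang"]}'
        if term.get("datatype"):
            return f'"{escaped}"^^<{term["datatype"]}>'
        return f'"{escaped}"'
    raise ValueError(f"unknown term type: {kind!r}")


def json_to_nquads(document):
    """Turn a parsed JSON Statements document into a list of N-Quads lines."""
    lines = []
    for subject, predicate, obj, graph in document["quads"]:
        parts = [term_to_nquads(subject), f"<{predicate}>", term_to_nquads(obj)]
        if graph is not None:  # a null graph means no graph label is written
            parts.append(f"<{graph}>")
        lines.append(" ".join(parts) + " .")
    return lines


sample = json.loads("""
{"quads": [
  [{"type": "bnode", "value": "bn1"}, "http://pred/1",
   {"type": "uri", "value": "http://value/1"}, "http://graph/1"],
  [{"type": "uri", "value": "http://sub/2"}, "http://pred/2",
   {"type": "literal", "value": "language literal", "lang": "en", "datatype": null},
   "http://graph/2"]
]}
""")

for line in json_to_nquads(sample):
    print(line)
```

The sketch reads "lang" and "datatype" with dict.get, so it also accepts objects that omit those keys, as the URI object in the example above does.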

The JSON object structure is described by the following BNF rules, where quotes are omitted to improve readability:

<json-response> ::= { "quads" : <statements> }
<statements>    ::= [ <statement>+ ]
<statement>     ::= [ <subject> , <predicate> , <object> , <graph> ]
<subject>       ::= { "type" : <subject-type> , "value" : <value> }
<predicate>     ::= <uri>
<object>        ::= { "type" : <object-type> , "value" : <value> , "lang" : <lang> , "datatype" : <datatype> }
<graph>         ::= <uri> | null
<subject-type>  ::= "uri" | "bnode"
<object-type>   ::= "uri" | "bnode" | "literal"
<value>         ::= String
<lang>          ::= String | null
<datatype>      ::= <uri>  | null
<uri>           ::= String
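
The rules above can be turned into a structural check. The following Python sketch is one possible reading of the grammar, not the validation performed by Apache Any23, and the names check_term, check_statement and check_response are hypothetical. It treats "lang" and "datatype" as optional, on the assumption that they may be omitted because the URI object in the earlier example does not include them.

```python
def check_term(term, allowed_types, role):
    """Check a subject or object dictionary against its allowed types."""
    if not isinstance(term, dict):
        return [f"{role} must be an object"]
    problems = []
    if term.get("type") not in allowed_types:
        problems.append(f"{role} type must be one of {list(allowed_types)}")
    if not isinstance(term.get("value"), str):
        problems.append(f"{role} value must be a string")
    return problems


def check_statement(statement):
    """Return the problems found in one statement (empty list if none)."""
    if not isinstance(statement, list) or len(statement) != 4:
        return ["statement must be an array of four elements"]
    subject, predicate, obj, graph = statement
    problems = check_term(subject, ("uri", "bnode"), "subject")
    if not isinstance(predicate, str):
        problems.append("predicate must be a string")
    problems += check_term(obj, ("uri", "bnode", "literal"), "object")
    if isinstance(obj, dict):
        # Assumption: a missing key is treated like an explicit null.
        for key in ("lang", "datatype"):
            if obj.get(key) is not None and not isinstance(obj[key], str):
                problems.append(f"object {key} must be a string or null")
    if graph is not None and not isinstance(graph, str):
        problems.append("graph must be a string or null")
    return problems


def check_response(document):
    """Return the problems found in a whole JSON Statements document."""
    if not isinstance(document, dict) or not isinstance(document.get("quads"), list):
        return ['top-level value must be an object with a "quads" array']
    if not document["quads"]:
        return ['"quads" must contain at least one statement']
    problems = []
    for index, statement in enumerate(document["quads"]):
        problems += [f"statement {index}: {p}" for p in check_statement(statement)]
    return problems


good = {"quads": [[{"type": "bnode", "value": "bn1"}, "http://pred/1",
                   {"type": "uri", "value": "http://value/1"}, "http://graph/1"]]}
# A literal is not allowed in the subject position.
bad = {"quads": [[{"type": "literal", "value": "x"}, "http://pred/1",
                  {"type": "uri", "value": "http://value/1"}, None]]}

print(check_response(good))  # []
print(check_response(bad))
```

Returning a list of messages instead of raising on the first failure lets a caller report every violation in a document at once.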