Developers guide

  This section introduces some <<Any23>> fundamentals so that you can start working with the library quickly.
  It walks through two annotated code snippets that show how to use <<Any23>> programmatically
  to perform data <<conversion>> and <<extraction>>.

* Data Conversion

+----------------------------------------------------------------------------------------------
      import java.io.ByteArrayOutputStream;
      import org.deri.any23.Any23;
      import org.deri.any23.source.DocumentSource;
      import org.deri.any23.source.StringDocumentSource;
      import org.deri.any23.writer.NTriplesWriter;
      import org.deri.any23.writer.TripleHandler;

/*1*/ Any23 runner = new Any23();
/*2*/ final String content = "@prefix foo: . " +
                             "@prefix : ." +
                             "foo:bar foo: : . " +
                             ":bar : foo:bar . ";
      // The second argument of StringDocumentSource() must be a valid URI.
/*3*/ DocumentSource source = new StringDocumentSource(content, "http://host.com/service");
/*4*/ ByteArrayOutputStream out = new ByteArrayOutputStream();
/*5*/ TripleHandler handler = new NTriplesWriter(out);
/*6*/ runner.extract(source, handler);
/*7*/ String n3 = out.toString("UTF-8");
+----------------------------------------------------------------------------------------------

  This example demonstrates how to use <<Any23>> to perform data conversion.
  The code takes some input data expressed as Turtle and converts it to N-Triples.

  At <<line 1>> we create a new instance of the <<Any23>> facade, which provides all the methods
  needed for the transformation. The facade constructor accepts a list of extractor names; if one is specified,
  only those extractors are used. Otherwise the MIME type of the data is detected and all the compatible
  extractors declared in the {{{http://developers.any23.org/apidocs/org/deri/any23/extractor/ExtractorRegistry.html}ExtractorRegistry}} are applied.

  <<Line 2>> defines the input string containing some {{{http://www.w3.org/TeamSubmission/turtle/}Turtle}} data.

  At <<line 3>> we instantiate a {{{http://developers.any23.org/apidocs/org/deri/any23/source/StringDocumentSource.html}StringDocumentSource}},
  specifying the content and the source URI.
  The URI should identify the source of the content data, and it must be valid.
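
  Because the source URI must be valid, it can help to check it before building the document source.
  The following is a minimal sketch that uses only java.net.URI from the JDK. The class DocumentUriCheck
  and its method are hypothetical helpers, not part of Any23, and the sketch assumes that "valid" means
  "well-formed and absolute"; the library may apply different rules of its own.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Any23: a pre-check for the document URI.
final class DocumentUriCheck {

    private DocumentUriCheck() {}

    // Returns true if the string parses as an absolute URI (one with a scheme).
    // Assumption: "valid" is taken to mean "syntactically well-formed and absolute".
    static boolean isValidDocumentUri(String candidate) {
        if (candidate == null) {
            return false;
        }
        try {
            return new URI(candidate).isAbsolute();
        } catch (URISyntaxException e) {
            // Illegal characters (for example spaces) end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidDocumentUri("http://host.com/service"));
        System.out.println(isValidDocumentUri("not a uri"));
    }
}
```

  A check of this kind could run before the StringDocumentSource constructor is called, so that a malformed
  URI is reported by your own code rather than during extraction.
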
  Besides the {{{http://developers.any23.org/apidocs/org/deri/any23/source/StringDocumentSource.html}StringDocumentSource}},
  you can also provide input from other sources, such as HTTP requests and local files.
  See the classes in the sources {{{http://developers.any23.org/apidocs/org/deri/any23/source/package-summary.html}package}}.

  <<Line 4>> defines an in-memory output stream that stores the data produced by the
  writer declared at <<line 5>>.
  A writer stores the extracted triples in some destination. Here we use an
  {{{http://developers.any23.org/apidocs/org/deri/any23/writer/NTriplesWriter.html}NTriplesWriter}} that writes
  into a ByteArrayOutputStream. There are writers for a number of formats, and you can also store the triples
  directly into a Sesame repository to query them with SPARQL; see
  {{{http://developers.any23.org/apidocs/org/deri/any23/writer/RepositoryWriter.html}RepositoryWriter}}
  and the writer {{{http://developers.any23.org/apidocs/org/deri/any23/writer/package-summary.html}package}}.

  The extract method invoked at <<line 6>> performs the metadata extraction.
  It accepts a {{{http://developers.any23.org/apidocs/org/deri/any23/source/DocumentSource.html}DocumentSource}}
  as its first argument and a {{{http://developers.any23.org/apidocs/org/deri/any23/writer/TripleHandler.html}TripleHandler}}
  as its second; the handler receives the sequence of parsing events generated by the applied extractors.
  The extract method also has an overload that accepts a charset encoding for the input data;
  if it is null, the charset is detected automatically.

  The expected output, decoded as UTF-8 at <<line 7>>, is:

+----------------------------------------------------------------------------------------------
. .
+----------------------------------------------------------------------------------------------

* Data Extraction

+----------------------------------------------------------------------------------------------
      import java.io.ByteArrayOutputStream;
      import org.deri.any23.Any23;
      import org.deri.any23.http.HTTPClient;
      import org.deri.any23.source.DocumentSource;
      import org.deri.any23.source.HTTPDocumentSource;
      import org.deri.any23.writer.NTriplesWriter;
      import org.deri.any23.writer.TripleHandler;

/*1*/ Any23 runner = new Any23();
/*2*/ runner.setHTTPUserAgent("test-user-agent");
/*3*/ HTTPClient httpClient = runner.getHTTPClient();
/*4*/ DocumentSource source = new HTTPDocumentSource(
         httpClient,
         "http://www.rentalinrome.com/semanticloft/semanticloft.htm"
      );
/*5*/ ByteArrayOutputStream out = new ByteArrayOutputStream();
/*6*/ TripleHandler handler = new NTriplesWriter(out);
/*7*/ runner.extract(source, handler);
/*8*/ String n3 = out.toString("UTF-8");
+----------------------------------------------------------------------------------------------

  This second example demonstrates data extraction, which is the main purpose of the <<Any23>> library.

  At <<line 1>> we create the <<Any23>> facade instance. As described before, the constructor
  lets you restrict the extraction to specific extractors.

  <<Line 2>> sets the HTTP user agent, which identifies the client during data collection.

  At <<line 3>> we use the runner to obtain an instance of
  {{{http://developers.any23.org/apidocs/org/deri/any23/http/HTTPClient.html}HTTPClient}}, used by
  {{{http://developers.any23.org/apidocs/org/deri/any23/source/HTTPDocumentSource.html}HTTPDocumentSource}}
  for content fetching.

  <<Line 4>> instantiates an
  {{{http://developers.any23.org/apidocs/org/deri/any23/source/HTTPDocumentSource.html}HTTPDocumentSource}},
  specifying the {{{http://developers.any23.org/apidocs/org/deri/any23/http/HTTPClient.html}HTTPClient}}
  and the URL of the content to be processed.

  At <<line 5>> we define an in-memory output stream that stores the data produced by the
  {{{http://developers.any23.org/apidocs/org/deri/any23/writer/TripleHandler.html}TripleHandler}}
  defined at <<line 6>>.

  The extract method at <<line 7>> runs the metadata extraction.
  As discussed in the previous example, it needs at least a
  {{{http://developers.any23.org/apidocs/org/deri/any23/writer/TripleHandler.html}TripleHandler}} instance.

  The expected output, decoded as UTF-8 at <<line 8>>, is:

+----------------------------------------------------------------------------------------------
"Semantic Loft (beta) - Trastevere apartments | Rental in Rome - rentalinrome.com" . . . . . _:node14r93a8dex1 .
[The complete output is omitted for brevity.]
+----------------------------------------------------------------------------------------------
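
  Both examples end with the writer output decoded into a string. Since N-Triples is a line-based format,
  with one statement per line and comment lines starting with '#', that string can be split into individual
  statements with plain JDK code. The following is a minimal sketch: the class NTriplesLines is a hypothetical
  helper, not part of Any23, and the sample data is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Any23: splits N-Triples text into statements.
final class NTriplesLines {

    private NTriplesLines() {}

    // N-Triples is line based: one statement per line, '#' starts a comment line.
    static List<String> statements(String ntriples) {
        List<String> result = new ArrayList<>();
        for (String line : ntriples.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            result.add(trimmed);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the decoded writer output.
        String sample =
            "<http://example.org/s> <http://example.org/p> \"a title\" .\n" +
            "# a comment\n" +
            "\n" +
            "<http://example.org/s> <http://example.org/q> _:node1 .\n";
        for (String statement : statements(sample)) {
            System.out.println(statement);
        }
    }
}
```

  For anything beyond counting or listing statements, a writer that stores the triples in a repository,
  such as the RepositoryWriter mentioned above, is likely a better fit than string handling.
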