 ------
 Apache Any23 - CSV Extractor Algorithm
 ------
 The Apache Software Foundation
 ------
 2011-2012

~~ Licensed to the Apache Software Foundation (ASF) under one or more
~~ contributor license agreements. See the NOTICE file distributed with
~~ this work for additional information regarding copyright ownership.
~~ The ASF licenses this file to You under the Apache License, Version 2.0
~~ (the "License"); you may not use this file except in compliance with
~~ the License. You may obtain a copy of the License at
~~
~~ http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License.

CSV Extractor Algorithm

  The {{{./xref/org/apache/any23/extractor/csv/CSVExtractor.html}CSV Extractor}} produces an RDF representation
  of a CSV file that complies with {{{http://www.ietf.org/rfc/rfc4180.txt}RFC 4180}} and has a header row.
  The extractor relies on the header to use the named fields as RDF properties.
  The field delimiter can be guessed automatically or specified via the
  {{{./configuration.html}Apache Any23 Configuration}}.

  Given a document with URL <<url>>, the CSV Extractor uses the following algorithm to extract RDF:

  * it tries to guess the field delimiter and to detect the header;

  * for each header field <<f>>:

    * if <<f>> is a valid URI, it is kept as a URI, since it may be dereferenceable;

    * if <<f>> is not a valid URI, the associated RDF property URI is <<url>> concatenated with <<f>>,
      and a label statement <<\<f\>>> rdfs:label <<"f">> is added;

    * add a column index statement giving the position of <<f>> in the header, starting from 0;

  * for each <<row>>:

    * add an rdf:type statement typing the row resource as a CSV row, where the row resource URI
      is built from <<url>> and the row index number <<n>>;

    * for each cell value <<v>>:

      * write the statement <<\<row\>>> <<\<f\>>> <<v>>, where <<\<f\>>> is the property of the cell's column
        and <<v>> is a URI if the cell value is a URI, or otherwise a literal typed according to the actual cell value;
  * add RDF statements stating the number of rows and columns.

  For example, given this trivial CSV with a header and two data rows:

+---------------------------------------------------------------
first name; last name; http://xmlns.org/foaf/01/knows; age
Davide; Palmisano; http://michelemostarda.com; 30; value should not appear
Michele; Mostarda; http://g1o.net;
+---------------------------------------------------------------

  the following RDF (serialized in RDF/XML) is produced:

+---------------------------------------------------------------
first name 0
last name 1
2
age 3
Davide Palmisano 30 0
Michele Mostarda 1
2 4
+---------------------------------------------------------------
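  The extraction steps described above can be sketched in Java as a single function that turns delimited CSV text into N-Triples lines. This is a minimal, hypothetical sketch, not the CSVExtractor source: the class and method names, the placeholder vocabulary namespace http://example.org/csv#, the row URI shape <<url>>row/<<n>>, the camel-casing of header names (so "first name" becomes <<url>>FirstName) and the integer/double/string typing rule are all assumptions. The row position statements are included only because the example output shows the row indexes 0 and 1. Fields are split naively on the delimiter, so RFC 4180 quoted fields containing the delimiter are not handled.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the CSV-to-RDF steps, emitting N-Triples lines.
// Namespace, row URI shape and header normalization are assumptions, not Any23's.
final class CsvToTriplesSketch {
    static final String CSV = "http://example.org/csv#"; // placeholder vocabulary namespace
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label";
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";
    static final Pattern INTEGER = Pattern.compile("[-+]?\\d+");
    static final Pattern DECIMAL = Pattern.compile("[-+]?\\d*\\.\\d+([eE][-+]?\\d+)?");

    // A string counts as a URI if it parses as an absolute URI (i.e. it has a scheme).
    static boolean isAbsoluteUri(String s) {
        try {
            return new URI(s).isAbsolute();
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Header field -> property URI: absolute URIs are kept as-is, other names are
    // camel-cased and appended to the document URL (assumed normalization).
    static String propertyUri(String url, String field) {
        if (isAbsoluteUri(field)) return field;
        StringBuilder sb = new StringBuilder(url);
        for (String token : field.split("\\s+")) {
            if (!token.isEmpty()) sb.append(Character.toUpperCase(token.charAt(0))).append(token.substring(1));
        }
        return sb.toString();
    }

    static String quote(String v) {
        return "\"" + v.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String typed(String v, String xsdType) {
        return quote(v) + "^^<" + XSD + xsdType + ">";
    }

    // Cell value -> RDF object: a URI if the cell is one, otherwise a typed literal.
    static String cellObject(String cell) {
        if (isAbsoluteUri(cell)) return "<" + cell + ">";
        if (INTEGER.matcher(cell).matches()) return typed(cell, "integer");
        if (DECIMAL.matcher(cell).matches()) return typed(cell, "double");
        return typed(cell, "string");
    }

    static String triple(String s, String p, String o) {
        return "<" + s + "> <" + p + "> " + o + " .";
    }

    static List<String> extract(String url, String csv, char delimiter) {
        String sep = Pattern.quote(String.valueOf(delimiter));
        List<String[]> lines = new ArrayList<>();
        for (String line : csv.split("\\R")) {
            if (!line.trim().isEmpty()) lines.add(line.split(sep, -1)); // -1 keeps trailing empty cells
        }
        List<String> out = new ArrayList<>();
        String[] header = lines.get(0);
        String[] props = new String[header.length];
        for (int i = 0; i < header.length; i++) {
            String f = header[i].trim();
            props[i] = propertyUri(url, f);
            // Only non-URI fields get a label, matching the example output.
            if (!isAbsoluteUri(f)) out.add(triple(props[i], RDFS_LABEL, quote(f)));
            out.add(triple(props[i], CSV + "columnPosition", typed(String.valueOf(i), "integer")));
        }
        int rows = lines.size() - 1;
        for (int n = 0; n < rows; n++) {
            String row = url + "row/" + n; // hypothetical row URI shape
            String[] cells = lines.get(n + 1);
            out.add(triple(row, RDF_TYPE, "<" + CSV + "Row>"));
            // Cells beyond the header width, and empty cells, produce no statement.
            for (int i = 0; i < Math.min(cells.length, props.length); i++) {
                String cell = cells[i].trim();
                if (!cell.isEmpty()) out.add(triple(row, props[i], cellObject(cell)));
            }
            out.add(triple(row, CSV + "rowPosition", typed(String.valueOf(n), "integer")));
        }
        out.add(triple(url, CSV + "numberOfRows", typed(String.valueOf(rows), "integer")));
        out.add(triple(url, CSV + "numberOfColumns", typed(String.valueOf(props.length), "integer")));
        return out;
    }

    public static void main(String[] args) {
        String csv = "first name; last name; http://xmlns.org/foaf/01/knows; age\n"
                + "Davide; Palmisano; http://michelemostarda.com; 30; value should not appear\n"
                + "Michele; Mostarda; http://g1o.net;\n";
        extract("http://example.org/people.csv/", csv, ';').forEach(System.out::println);
    }
}
```

  Run on the example CSV, this sketch prints 20 triples: labels only for the three non-URI header fields, no statement for the empty age cell of the second row, and none for "value should not appear", which has no header column.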
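  The first step of the algorithm, guessing the field delimiter, could be approached with a simple consistency heuristic. The sketch below is a swapped-in technique for illustration, not the heuristic Any23 actually uses: the candidate set (comma, semicolon, tab, pipe), the scoring rule and the names DelimiterGuessSketch, countFields and guess are all hypothetical. It prefers the candidate for which the most data lines have the same number of fields as the header, breaks ties by the number of header fields, and ignores delimiters inside double-quoted fields as RFC 4180 quoting requires.

```java
import java.util.List;

// Hypothetical delimiter-guessing heuristic (not Any23's implementation).
final class DelimiterGuessSketch {
    // Candidate delimiters: an assumption of this sketch.
    static final char[] CANDIDATES = {',', ';', '\t', '|'};

    // Number of fields in a line, ignoring delimiters inside double-quoted fields.
    static int countFields(String line, char delimiter) {
        int fields = 1;
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes; // an escaped "" toggles twice, leaving the state unchanged
            } else if (c == delimiter && !inQuotes) {
                fields++;
            }
        }
        return fields;
    }

    // Picks the candidate for which the most data lines match the header's field count,
    // breaking ties by header width; falls back to ',' if no candidate splits the header.
    static char guess(List<String> sample) {
        String header = sample.get(0);
        char best = ',';
        int bestConsistent = -1;
        int bestWidth = 1;
        for (char d : CANDIDATES) {
            int width = countFields(header, d);
            if (width < 2) continue; // a one-field header gives no evidence for d
            int consistent = 0;
            for (String line : sample.subList(1, sample.size())) {
                if (countFields(line, d) == width) consistent++;
            }
            if (consistent > bestConsistent || (consistent == bestConsistent && width > bestWidth)) {
                best = d;
                bestConsistent = consistent;
                bestWidth = width;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "first name; last name; http://xmlns.org/foaf/01/knows; age",
                "Davide; Palmisano; http://michelemostarda.com; 30; value should not appear",
                "Michele; Mostarda; http://g1o.net;");
        System.out.println("guessed delimiter: '" + guess(sample) + "'"); // prints: guessed delimiter: ';'
    }
}
```

  On the three lines of the example CSV it picks the semicolon: the header and the last row both split into four fields, while no other candidate splits the header at all.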