    The Stanbol enhancer can detect \
    famous cities such as Paris and people such as Bob Marley.
    " \
    "http://localhost:8080/enhancer/chain/language?omitMetadata=true"

It is also possible to request both the extracted metadata and the plain text version. Please see the documentation of the RESTful API ([http://localhost:8080/enhancer](http://localhost:8080/enhancer) if Stanbol runs on localhost).

NOTE: Previous versions of this engine stored the plain text version directly in the metadata of the ContentItem, using the "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#plainTextContent" property. This is no longer supported.

### Vocabularies

Metaxa uses a set of vocabularies ("ontologies") for structured data representation.

#### Aperture Core Ontologies

These ontologies belong to the underlying Aperture subsystem and are contained in the package

    :::text
    org.semanticdesktop.aperture.vocabulary

The most important ones with respect to top-level document properties are

* NIE (Nepomuk Information Element):

        :::text
        http://www.semanticdesktop.org/ontologies/2007/01/19/nie#

* NFO (Nepomuk File Ontology):

        :::text
        http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#

Documentation of Aperture's core ontologies is provided in Aperture's Javadoc [http://aperture.sourceforge.net/doc/javadoc/1.5.0/index.html](http://aperture.sourceforge.net/doc/javadoc/1.5.0/index.html) for the packages in

    :::text
    org.semanticdesktop.aperture.vocabulary

#### HTML Microformat Extractors

The following table describes which vocabularies are used for representing microformat data in Metaxa:

| Microformat (MF) | Vocabulary (Namespace) |
|---|---|
| geo | wgs84 (http://www.w3.org/2003/01/geo/wgs84_pos#) |
| hAtom | atom (http://www.w3.org/2005/Atom#) |
| | tagging (http://aperture.sourceforge.net/ontologies/tagging#) |
| hCal | ical (http://www.w3.org/2002/12/cal/icaltzd#) |
| | vcard (http://www.w3.org/2006/vcard/ns#) |
| hCard | vcard (http://www.w3.org/2006/vcard/ns#) |
| hReview | review (http://www.purl.org/stuff/rev#) |
| | wgs84 (http://www.w3.org/2003/01/geo/wgs84_pos#) |
| | dc (http://purl.org/dc/elements/1.1/) |
| | dcterms (http://purl.org/dc/dcmitype/) |
| | foaf (http://xmlns.com/foaf/0.1/) |
| | vcard (http://www.w3.org/2006/vcard/ns#) |
| | tag (http://www.holygoat.co.uk/owl/redwood/0.1/tags/) |
| rel-license | dc (http://purl.org/dc/elements/1.1/) |
| rel-tag | tagging (http://aperture.sourceforge.net/ontologies/tagging#) |
| xFolk | nfo (http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#) |
| | dc (http://purl.org/dc/elements/1.1/) |
| | tagging (http://aperture.sourceforge.net/ontologies/tagging#) |

* org.apache.stanbol.enhancer.engines.metaxa.extractionregistry
* org.apache.stanbol.enhancer.engines.metaxa.htmlextractors

## Usage

Assuming that the Stanbol endpoint with the full launcher is running at

    :::text
    http://localhost:8080

and the engine is activated, commands like the following can be used from the command line to submit a file as a content item. The MIME type given in the Content-Type header must match the document type:

* stateless interface

        :::text
        curl -i -X POST -H "Content-Type:text/html" -T testpage.html http://localhost:8080/engines

* stateful interface

        :::text
        curl -i -X PUT -H "Content-Type:text/html" -T testpage.html http://localhost:8080/contenthub/content/someFileId

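
The two curl calls above can be wrapped in a small shell helper that picks the Content-Type from the file extension, which may help keep the MIME type in line with the document type. This is only a minimal sketch and not part of Stanbol: the function names `mime_for` and `stanbol_submit`, the `STANBOL_URL` variable, and the extension table are assumptions made for illustration, and the table says nothing about which document types Metaxa actually handles.

```shell
#!/bin/sh
# Sketch only: mime_for, stanbol_submit and STANBOL_URL are hypothetical names,
# and the extension-to-MIME table below is an illustrative assumption.

# Print the Content-Type to send for a given file name.
mime_for() {
  case "$1" in
    *.html|*.htm) echo "text/html" ;;
    *.pdf)        echo "application/pdf" ;;
    *.txt)        echo "text/plain" ;;
    *)            echo "application/octet-stream" ;;
  esac
}

# Submit a file to Stanbol.
#   $1 = file to upload
#   $2 = optional content item id; if given, the stateful interface is used,
#        otherwise the stateless one.
stanbol_submit() {
  file="$1"
  id="${2:-}"
  base="${STANBOL_URL:-http://localhost:8080}"
  type="$(mime_for "$file")"
  if [ -n "$id" ]; then
    # Stateful interface: same call as the PUT example above.
    curl -i -X PUT -H "Content-Type:$type" -T "$file" "$base/contenthub/content/$id"
  else
    # Stateless interface: same call as the POST example above.
    curl -i -X POST -H "Content-Type:$type" -T "$file" "$base/engines"
  fi
}
```

With these definitions loaded, `stanbol_submit testpage.html` would send the file to the stateless interface, while `stanbol_submit testpage.html someFileId` would send it to the stateful one.
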
Alternatively, the Stanbol web interface can be used for submitting documents and viewing the metadata at

    :::text
    http://localhost:8080/contenthub