Title: Using Apache Stanbol for enhancing textual content

To enhance content, post plain text to the enhancement engines; the response contains the enhancement data. The enhancement process is stateless, so neither your content item nor the enhancements are stored.

You can test this via the [web interface of the engines][stan-engines] or from the console via

    curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" \
         --data "The Stanbol enhancer can detect famous cities such as Paris and people such as Bob Marley." \
         http://localhost:8080/engines

or by using the text examples delivered with Stanbol:

    for file in enhancer/data/text-examples/*.txt; do
        curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" \
             -T "$file" http://localhost:8080/engines
    done

Content items in formats other than plain text can be tested via the [web interface of the contenthub][stan-contenthub] or via the console by attaching files. (The Metaxa Engine needs to be activated.)

## Using the enhancement engines

Apache Stanbol starts with a number of active enhancement engines by default. You can activate or deactivate engines, and configure them to your needs, via the [OSGi administration console][stan-admin].

The enhancement process follows a workflow with the stages pre-processing, content-extraction, extraction-enhancement, default and post-processing.

The following pre-processing engines are available:

- The __Language Identification Engine__ detects the language of the content items you want to process; several European languages are supported.
- The __Metaxa Engine__ extracts embedded metadata and textual content from a large variety of document types and formats.

For content extraction / natural language processing one engine is available:

- The __Named Entity Extraction Enhancement Engine__ leverages the sentence detector and name finder tools of the OpenNLP project, bundled with statistical models trained to detect occurrences of names of persons, places and organizations.

The extracted items are then enhanced by a dedicated engine:

- The __Named Entity Tagging Engine__ provides matching suggestions from dbpedia (default) and other referenced sites for the entities extracted by the NER engine.

Specific additional enhancement engines are:

- The __Location Enhancement Engine__ takes its suggestions from geonames.org only.
- The __OpenCalais Enhancement Engine__ uses services from Open Calais. (Note: You need to provide a key in order to use this engine.)
- The __Zemanta Enhancement Engine__ uses the Zemanta services. (Note: You need to provide a key in order to use this engine.)

For post-processing the results of the enhancement engines, one engine is available:

- The __CachingDereferencerEngine__ fetches files, such as images for locations, from external sites. The web UI uses it to present the enhancement results.

## Using an index of linked open data locally

To use the pre-configured indexes you can download them from [here][stan-download]. You will get two files for each index:

* org.apache.stanbol.data.site.{name}-{version}.jar
* {name}.solrindex.zip

If you copy the zip archive into the "/sling/datafiles" folder before installing the bundle, the data will be used automatically during the installation of the bundle. If you provide the file after installing the bundle, you will need to restart the SolrYard installed by the bundle.

The jar can be installed in any OSGi environment running the Apache Stanbol Entityhub.
When started it will create and configure:

- a "ReferencedSite" accessible at "http://{host}/{root}/entityhub/site/{name}",
- a "Cache" used to connect the ReferencedSite with your data, and
- a "SolrYard" that manages the data indexed by this utility.

This bundle does not contain the indexed data, only the configuration for the Solr index. If you have not copied the archive beforehand, the Apache Stanbol Data File Provider will request the ZIP archive after the bundle is installed. To install the data you need to copy this file to the "/sling/datafiles" folder within the working directory of your Stanbol server.

_Note: {name} denotes the value you configured for the "name" property within the "indexing.properties" file._

## Enhancement Example

The text "The Stanbol enhancer can detect famous cities such as Paris and people such as Bob Marley.", processed with the default configuration of enhancement engines and a local index of dbpedia entities, results in an output graph containing several __Entity Annotations__ and __Text Annotations__. Two of the relevant fragments for "Paris" are listed below in Turtle syntax:

### Example for Text Annotation

    a , ;
    "0.9322403510215739"^^ ;
    "59"^^ ;
    ;
    "Paris"^^ ;
    "The Stanbol enhancer can detect famous cities such as Paris and people such as Bob Marley."^^ ;
    "54"^^ ;
    "2012-02-29T11:18:36.282Z"^^ ;
    "org.apache.stanbol.enhancer.engines.opennlp.impl.NEREngineCore"^^ ;
    .

### Example for Entity Annotation

    a , ;
    "1323049.5"^^ ;
    "Paris"@en ;
    ;
    , , , , ;
    ;
    "2012-02-29T11:18:36.320Z"^^ ;
    "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine"^^ ;
    .
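
To inspect a result like the fragments above from the console, the labels of the suggested entities can be filtered out of the Turtle response. The following is only a minimal sketch and not part of Stanbol: the helper names `enhance` and `entity_labels` and the `STANBOL_URL` variable are hypothetical, the endpoint is the one used in the curl examples earlier on this page, and the filter assumes that the serializer writes each statement on its own line, with a predicate name ending in `entity-label` and the label as the first quoted string on that line.

```shell
#!/bin/sh
# Assumption: Stanbol runs locally, as in the curl examples on this page.
STANBOL_URL="${STANBOL_URL:-http://localhost:8080/engines}"

# enhance TEXT: post plain text to the engines and print the Turtle response.
enhance() {
    curl -s -X POST \
         -H "Accept: text/turtle" \
         -H "Content-type: text/plain" \
         --data "$1" \
         "$STANBOL_URL"
}

# entity_labels: read Turtle on stdin and print each distinct entity label.
# Assumption: one statement per line; the first quoted string on a line
# that mentions "entity-label" is taken as the label.
entity_labels() {
    grep 'entity-label' \
        | sed -n 's/^[^"]*"\([^"]*\)".*$/\1/p' \
        | sort -u
}
```

With a running server, `enhance "The Stanbol enhancer can detect famous cities such as Paris and people such as Bob Marley." | entity_labels` would print one label per line. A line-based filter is fragile against other serializations; for anything beyond a quick check, a real RDF parser is the safer choice.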