The Stanbol Enhancement Structure (PROPOSAL)

Please NOTE: This is a proposal for the future version of the Enhancement Structure used by the Stanbol Enhancer. This DOES NOT describe the Enhancement Structure used by the current version of the Stanbol Enhancer!

This document describes the schema (ontology) used by the Apache Stanbol Enhancer to express features extracted from parsed content items. Its main purpose is to standardize the information created by EnhancementEngines, so that users can easily work with enhancement results and different enhancement engines can cooperate.

Overview

The Stanbol Enhancement Structure is built around the following main concepts. Each of these concepts covers a specific aspect of the content enhancement process.

The following list gives an overview of the concepts used by the Stanbol Enhancement Structure:

Overview about the Stanbol Enhancement Structure

ContentItem: This is the resource representing the parsed content. The URI of this resource depends on how the content was passed to the Stanbol Enhancer. If the request provides an absolute URI, then this URI is used. In all other cases the Stanbol Enhancer creates a URI based on the configured prefix or the URL of the service. The documentation of the RESTful service should provide more information about that.
sb:Content: Several content models distinguish between Content (the data) and the ContentItem (the interpretation of the data). The Enhancement Structure currently only defines ContentItem, because there is no need to describe the data for the purpose of the enhancement process. Other components (such as the /store endpoint) might need to formally describe the data. For such use cases the sioc:content property will be used to refer from the ContentItem to the Content. The URI representing the Content will be the same one that is used to retrieve its data via a RESTful service.
sb:Enhancement: This provides metadata about extractions created by EnhancementEngines or present within the content. This includes the creator (usually an EnhancementEngine), the creation time, as well as relations to other enhancements. Users of the Stanbol Enhancer will typically not care about such data, because from their perspective it represents meta-metadata (metadata about the metadata). Every feature, suggestion or other piece of information extracted by any EnhancementEngine needs to carry the metadata defined for this concept.
sb:Annotation: An annotation describes some piece of knowledge extracted from the parsed content and/or the metadata of the content. Information provided by Annotations includes the label, the type and the confidence. In addition, Annotations need to link to at least one Occurrence and may have one or more Suggestions. Annotations can also be related to or dependent on other Annotations. The Enhancement Structure defines only a small set of Annotation types. Implementors of EnhancementEngines that extract specific kinds of things (e.g. coreferences, events, …) may need to define their own Annotation types. Such extensions should be called "**Annotation" and be defined as rdfs:subClassOf any Annotation type defined by this Enhancement Structure.
sb:Suggestion: A suggestion describes a Resource (Entity, Topic, Category …) that an EnhancementEngine suggests as a possible match for an Annotation. Suggestions are typically created by engines that further process Annotations (semantic lifting). However, EnhancementEngines might also create both the Annotation and the Suggestions. Suggestions are always linked to a single Annotation (functional property). They define the label, the ID (typically the URI of the Resource), the type(s) of the suggested Resource and the confidence of the suggestion.
sb:Occurrence: An Occurrence describes the actual location of an extracted feature within the content. This location may be within the content or within parsed metadata. Occurrences are always linked to a single Annotation (functional property). Depending on the type of the content there will be different types of Occurrences. This Enhancement Structure currently focuses on two types of Occurrences: (1) TextOccurrence and (2) MetadataOccurrence. For details on the model of these Occurrence types see the corresponding sections. EnhancementEngines that support the extraction of features from content types that are not covered by this specification (e.g. pictures, sound, video) need to define their own Occurrence types. Such types should use the name "***Occurrence" and be defined as rdfs:subClassOf any of the Occurrence types defined in this specification.

Enhancements encoded based on this specification need to conform to the following rules:

sb:Annotation and sb:Suggestion MUST also be of type sb:Enhancement and include the required metadata defined by sb:Enhancement.
sb:Occurrence, sb:Annotation and sb:Suggestion instances MUST include rdf:type information for all parent types. For example, when adding an sb:TextOccurrence, the rdf:type values MUST include sb:TextOccurrence AND sb:Occurrence. Consumers are NOT expected to use any kind of reasoner; adding this additional information is therefore the only way to ensure that queries for occurrences, annotations or suggestions return the expected results.
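The second rule can be sketched in a few lines of Python. This is only an illustration: the SUBCLASS_OF table and the function names with_parent_types and type_statements are assumptions made for this sketch and are not part of any Stanbol API.

```python
# Hypothetical subclass table; it only lists relations named in this specification.
SUBCLASS_OF = {
    "sb:TextOccurrence": "sb:Occurrence",
    "sb:MetadataOccurrence": "sb:Occurrence",
}


def with_parent_types(rdf_type):
    """Return rdf_type followed by all of its parent types."""
    types = [rdf_type]
    # Walk up the subclass chain until a type without a parent is reached.
    while types[-1] in SUBCLASS_OF:
        types.append(SUBCLASS_OF[types[-1]])
    return types


def type_statements(subject, rdf_type):
    """Build the (subject, predicate, object) rdf:type statements to emit."""
    return [(subject, "rdf:type", t) for t in with_parent_types(rdf_type)]


for statement in type_statements("<o>", "sb:TextOccurrence"):
    print(*statement)
```

Writing out the parent types explicitly keeps consumers independent of an RDFS reasoner, at the cost of a few redundant statements.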

Specification

Namespaces and used Notations

While the Stanbol Enhancement Structure does define some Concepts and Properties it also uses a lot of existing things from other ontologies. To improve the readability of this specification namespace prefixes + local names are used instead of the full URLs by this specification.

All the namespace prefixes used within this specification are described by the following list:

sb: represents Stanbol and refers to all properties and concepts defined by the Stanbol enhancement structure. This URL is not yet final, but one of the options is "http://stanbol.apache.org/ontology/".
dc: the Dublin Core Terms (DCterms) ontology (http://dublincore.org/documents/dcmi-terms/)
rdf: the Resource Description Framework (http://www.w3.org/RDF/)
rdfs: the RDF schema (http://www.w3.org/TR/rdf-schema/)
sioc: SIOC (Semantically-Interlinked Online Communities) Core Ontology (http://rdfs.org/sioc/ns#)

Notations used by this specification:

<{code}> elements refer to an instance identified by the URI {code}. To improve readability, {codes} that refer to instances of concepts defined by the Stanbol enhancement structure use short forms (<ci> for a ContentItem instance, <a> for an Annotation instance ...).
{prefix}:{localname} is used as short form for <{namespace+localname}>. The namespace -> prefix mappings are defined in the above list.
{value}^^dataType: The (xsd) dataType required by the value, e.g. xsd:float, xsd:anyURI. The default is xsd:string.
?{var} represents a resource that is unknown to the Stanbol Enhancer. Usually this is a resource of the user's knowledge model that is not necessarily passed to Stanbol.
[{statement}] represents statements that are typically used in combination with the Stanbol Enhancement Structure but are neither required nor used by the enhancement process itself.

A special NOTE on the usage of <{code}> in comparison to {value}^^xsd:anyURI:

In both cases the value will be an URI
In case of <{code}> the URI identifies a resource that is created/defined by the enhancement results - meaning that the returned knowledge contains all information about that resource
{value}^^xsd:anyURI indicates that enhancement results will not provide additional knowledge about this resource. Consumers that need more information about such resources have to use other services to retrieve it, or pass special parameters that tell Stanbol to explicitly include such knowledge in the response.

ContentItem <ci>

The ContentItem <ci> represents a piece of content passed to the Stanbol Enhancer. It is the central resource used to link all the enhancements created by the EnhancementEngines.

<ci> rdf:type sb:ContentItem
[<ci> sb:embeds-knowledge {knowledgeGraphId}]
[<ci> sb:has-section sb:ContentItem]
[<ci> <{metadatafield}> {value(s)}]

The ContentItem itself defines only two fields:

sb:embeds-knowledge: Documents might contain explicit knowledge (e.g. MicroData, RDFa). If such information can be extracted, then it is stored in its own RDF graph. This property links to the ID of this RDF graph. Such knowledge is typically extracted during the pre-processing phase of the EnhancementProcess. Therefore EnhancementEngines have access to this information.
sb:has-section: A ContentItem may define different sections. The Stanbol Enhancer will create a separate ContentItem with its own ID for each such section. The Stanbol Enhancer will first enhance the main content item and then all the sections. This feature is mainly intended to split up huge documents into parts that are feasible to enhance.

In addition, metadata extracted from or passed along with the content (e.g. Dublin Core, EXIF, ID3 ...) can also be added directly to the ContentItem <ci>. EnhancementEngines may use such information during the EnhancementProcess.

Example: Embedded Knowledge

TODO: Move this to an own section about RDFa support!

This example shows how SIOC (Semantically-Interlinked Online Communities) and RDFa can be used to embed knowledge to tell Stanbol how to process parsed HTML markup.

<body about="http://www.examplenews.com/featuredNews"><table><tr>
    <td><!-- The menu: Not to be enhanced --> </td>
    <td><span property="sioc:content" about="http://www.examplenews.com/story123"> 
        This is the Content of this page to be enhanced by the Stanbol enhancer
    </span><span property="sioc:content" about="http://www.examplenews.com/interview456">
        And there may be even more than one Section within the document that needs to be enhanced
    </span></td>
    <td> <!-- Advertisements: Not to be enhanced --> </td>
</tr></table></body>

When this is passed as Content, the Stanbol Enhancer should create:

A sb:ContentItem for "http://www.examplenews.com/featuredNews" with two sections but an empty content.
- The knowledge as defined by the above RDFa markup is included in an own RDF graph and linked with the "sb:embeds-knowledge" property
A sb:ContentItem representing the section "http://www.examplenews.com/story123"
- the HTML fragment enclosed by the according span-tag is the content
A sb:ContentItem representing the section "http://www.examplenews.com/interview456"
- the HTML fragment enclosed by the according span-tag is the content

NOTE: This assumes the presence of

a Component for extracting RDFa
a Component that supports the creation of sb:ContentItems and fragments based on SIOC
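The statements expected for the example above can be sketched as plain (subject, predicate, object) tuples. The function name content_item_statements and the graph ID "urn:example:knowledge-graph-1" are hypothetical; how the Stanbol Enhancer actually names the RDF graph is not defined by this proposal.

```python
def content_item_statements(main_uri, section_uris, knowledge_graph_id=None):
    """Build the statements for a ContentItem and its sections."""
    statements = [(main_uri, "rdf:type", "sb:ContentItem")]
    if knowledge_graph_id is not None:
        # Link to the separate RDF graph holding the embedded knowledge.
        statements.append((main_uri, "sb:embeds-knowledge", knowledge_graph_id))
    for section in section_uris:
        # Every section is linked from the main item and is a ContentItem itself.
        statements.append((main_uri, "sb:has-section", section))
        statements.append((section, "rdf:type", "sb:ContentItem"))
    return statements


statements = content_item_statements(
    "http://www.examplenews.com/featuredNews",
    [
        "http://www.examplenews.com/story123",
        "http://www.examplenews.com/interview456",
    ],
    knowledge_graph_id="urn:example:knowledge-graph-1",  # assumed ID
)
for statement in statements:
    print(*statement)
```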

Enhancement

The concept "Enhancement" defines properties that allow Stanbol EnhancementEngines to formally describe information about the enhancement process. This information is crucial for EnhancementEngines to cooperate with each other, but typical Stanbol users will not need to bother with it. In some situations such knowledge can still be useful on the client side, e.g. if someone wants to ignore all enhancements created by a specific enhancement engine, or to calculate all enhancements affected by the removal of a part of the content.

The following code segment shows the knowledge typically described by using the Enhancement concept:

<e> rdf:type sb:Enhancement
<e> dc:creator enhancementEngine^^xsd:anyURI
<e> dc:contributor enhancementEngine^^xsd:anyURI
<e> dc:created date^^xsd:dateTime
<e> dc:modified date^^xsd:dateTime
[<e> sb:relatedTo <relatedEnhancement>]
[<e> sb:dependsOn <dependsOnEnhancement>]

The presence of the statement "<e> rdf:type sb:Enhancement" indicates that enhancement metadata are present for the resource <e>. This also means that if some configuration excludes such information, then all the above properties MUST be removed from the results of the enhancement process. The metadata defined by sb:Enhancement MUST be added for all sb:Annotation and sb:Suggestion instances created by an EnhancementEngine. This also includes any subclasses (rdfs:subClassOf) of those two concepts.

The following figure shows an example of an sb:Annotation and a sb:Suggestion for Paris with the according metadata as defined by the sb:Enhancement concept.

Example: sb:Annotation and sb:Suggestion including sb:Enhancement metadata

Note that sb:Annotation and sb:Suggestion are not sub-classes of sb:Enhancement. EnhancementEngines need to add sb:Enhancement as an additional rdf:type to sb:Annotation and sb:Suggestion instances.

Description of the properties defined/used by sb:Enhancement:

dc:creator and dc:contributor link to the EnhancementEngine(s) involved in creating the Enhancement.
dc:created and dc:modified are intended to help sort enhancements based on the activities performed during the enhancement process (something that might be useful especially if EnhancementEngines work asynchronously).
sb:relatedTo defines that an sb:Enhancement is related to another one. It also specifies that each of the two enhancements remains valid if the other one is deleted.
sb:dependsOn defines that an sb:Enhancement depends on another one. If the other Enhancement is deleted (or rejected by a user), then all dependent sb:Enhancements MUST also be removed/rejected. The above figure shows that sb:hasSuggestion as defined by sb:Annotation is an inverse relation to sb:dependsOn, because suggestions depend on the annotation they are suggested for.
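The removal rule for sb:dependsOn can be sketched as a small graph traversal. The function name enhancements_to_remove and the pair-based input format are assumptions made for this illustration. The sketch follows sb:dependsOn transitively, which is one possible reading of the rule, and it deliberately ignores sb:relatedTo because related enhancements stay valid.

```python
from collections import defaultdict


def enhancements_to_remove(rejected, depends_on):
    """Return the rejected enhancement plus everything that depends on it.

    depends_on is an iterable of (dependent, target) pairs, one pair per
    "<dependent> sb:dependsOn <target>" statement.
    """
    dependents = defaultdict(set)
    for dependent, target in depends_on:
        dependents[target].add(dependent)

    removed = set()
    stack = [rejected]
    while stack:
        current = stack.pop()
        if current in removed:
            continue  # already handled; also guards against cycles
        removed.add(current)
        stack.extend(dependents[current])
    return removed


pairs = [("<s1>", "<a1>"), ("<s2>", "<a1>"), ("<s3>", "<a>")]
print(sorted(enhancements_to_remove("<a1>", pairs)))
```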

In addition, EnhancementEngines might want or need to add additional metadata to the sb:Annotation and sb:Suggestion instances they create. Implementors of such EnhancementEngines are free to define their own Enhancement types. Such types MUST be defined as rdfs:subClassOf sb:Enhancement and SHOULD use **Enhancement in their concept name. EnhancementEngines MUST also add both the specific type AND sb:Enhancement as rdf:type values.

Sections below are not yet updated

Annotations

The concept "Annotation" provides metadata about the extracted feature. This information is important both for the enhancement process and for the users of the Stanbol Enhancer. The following code segment shows the knowledge typically provided by an Annotation <a>. A description of the properties is provided below:

<a> rdf:type sb:Annotation
[<a> rdf:type sb:Enhancement, sb:Occurrence]
<a> sb:extracted-from <ci>
<a> dc:title label  //TODO: maybe it is better to use rdfs:label
<a> dc:role annotationRole^^xsd:anyURI
<a> dc:type annotationType^^xsd:anyURI
<a> sb:confidence value^^xsd:float
<a> sb:entity entity^^xsd:anyURI
<a> sb:entity-type entityType^^xsd:anyURI
<a> sb:suggestion <a1>

The following properties are defined for Annotations <a>

rdf:type sb:Annotation: This states that someone can expect the resource to provide all the information as defined by this specification
sb:extracted-from: This links the annotation describing a feature with the content item this feature is extracted from.
dc:title: This is the human readable name - the label - of the extracted Feature
dc:role: If this Annotation is a Tag, Category, Suggestion ... There will be a controlled vocabulary describing the different roles used by the Stanbol Enhancer
dc:type: The type of the Feature described by this Annotation e.g. Person, Organization, Location ... There will be a controlled vocabulary with types used by the Stanbol Enhancer
sb:confidence: The value describes the confidence of the EnhancementEngine. Values are on an ordinal scale. TODO: In the current implementation values of different Enhancement Engines are not comparable, but that information might not be available/processed by users and therefore result in wrong interpretations (rwesten)
sb:entity: In case an annotation describes an Entity, this property provides the URI for the entity
sb:entity-type: In case an annotation describes an Entity, this property provides the rdf:types of the linked entity
sb:suggestion: Links to another annotation that provides a suggestion for this one. This indicates that the Stanbol Enhancer requests the client to decide between the provided options - e.g. by some user interaction.
sb:occurrence: Optionally links to one or more sb:Occurrence of this annotation within the parsed Content. Note that there are several types of Occurrences (TextOccurrence, ImageOccurrence, MetadataOccurrence …) defined. If this property is missing, then the Annotation is assumed to be about the whole content (as referred to by the sb:extracted-from property).

Annotation Types describe the type of the annotated feature based on a terminology standardized by Stanbol. Current types include

dbpedia-ont:Place
dbpedia-ont:Organisation
dbpedia-ont:Person
add some additional types describing Occurrents (Activities, Events), Conceptualizations

This list should only contain some types useful for grouping Annotations in user interfaces. The exact types of entities can be anyway added by using the sb:entity-type property.

TODO: We need to decide if we create an own controlled vocabulary within the Stanbol namespace or if we select some concepts defined in an external ontology (such as the dbpedia ontology that is currently used).

Annotation Roles describe the proposed role of the extracted feature in relation to the content. The following list shows the currently defined roles:

sb:Tag: The feature can be suggested as tag for the parsed content.
sb:Category: The feature provides a categorization for the parsed content.
sb:Keyword: The feature describes a keyword within the parsed content TODO: describe the difference between keywords and tags

NOTE: Such roles should make it easier to support additional Annotation roles as suggested by STANBOL-48 and STANBOL-12, which includes STANBOL-28 and STANBOL-29.

sb:Suggestion

Suggestions are used by the Stanbol Enhancer to suggest possible values for the resolution of features extracted from the parsed content. Currently there are two different use cases defined for Suggestions:

(1) Entity Resolution: Suggests entities for a feature extracted from the content. Typically such suggestions are calculated based on the name of the feature found within the content (e.g. the selected text of a sb:TextOccurrence).
(2) Field Value Suggestion: Suggests a value for a specific property. This kind of suggestion is useful if a relation between two extracted features is detected. A typical example would be a person "Steve Jobs" with the role "CEO" of the company "Apple Inc". Such relations can be detected by NLP tools. However, suggestions like this are also central for semantic lifting of RDFa annotations, as shown in the example below.

sb:Suggestion uses the following properties

sb:entity: The id of the suggested Entity
sb:entity-type: The type(s) of the suggested Entity
sb:confidence: Needed to sort in case of multiple suggestions
sb:field: Defines the property for which this suggestion should become the value if it is accepted by the user
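A consumer could use sb:confidence to order several suggestions, roughly as in the following sketch. The dictionary-based representation and the function name rank_suggestions are assumptions made for illustration. Because confidence values of different EnhancementEngines are not necessarily comparable, the ranking is only meaningful for suggestions created by the same engine.

```python
def rank_suggestions(suggestions):
    """Sort suggestions by descending sb:confidence.

    Each suggestion is a dict keyed by property name. Suggestions without
    a confidence value are placed last.
    """
    return sorted(
        suggestions,
        key=lambda s: s.get("sb:confidence", float("-inf")),
        reverse=True,
    )


candidates = [
    {"sb:entity": "http://www.example.com/entity/B", "sb:confidence": 12.34},
    {"sb:entity": "http://www.example.com/entity/A", "sb:confidence": 123.456},
    {"sb:entity": "http://www.example.com/entity/C"},
]
for suggestion in rank_suggestions(candidates):
    print(suggestion["sb:entity"], suggestion.get("sb:confidence"))
```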

In addition all sb:Suggestions are also of type sb:Enhancement to allow EnhancementEngine to provide enhancement metadata for them.

For details on how they are used, see the following example.

Example

As an example, let's assume that the following RDFa annotated content is passed to the Stanbol Enhancer

<span typeof="cal:Vevent">
    <h3 property="dc:title"> Stanbol Teleconference </h3>
    <span property="cal:summary">
        <p> Agenda: </p>
        <ul>
            <li> ... </li>
        </ul>
        <p> Participants: </p>
        <ul>
            <li typeof="foaf:Person" property="foaf:name">Rupert Westenthaler</li>
            <li typeof="foaf:Person" property="foaf:name">Olivier Grisel</li>
            <li> ... </li>
        </ul>
    </span>
</span>

(1) Suggest the Entities for Rupert and Olivier
(2) Suggest to link Rupert and Olivier as values for "cal:attendee"

Both for Rupert Westenthaler and Olivier Grisel an EntityAnnotation would be present - in that case created by the RDFa extractor, but in principle this could also work if the RDFa markup is missing. In such cases the EntityAnnotations could be created by an NLPEnhancementEngine.

<a1> rdf:type sb:EntityAnnotation
<a1> dc:title Rupert Westenthaler
<a1> sb:entity-type foaf:Person
<a1> sb:hasOccurrence <o1>
<a1> sb:hasSuggestion <s1>

<a2> rdf:type sb:EntityAnnotation
<a2> dc:title Olivier Grisel
<a2> sb:entity-type foaf:Person
<a2> sb:hasOccurrence <o2>
<a2> sb:hasSuggestion <s2>

Let's ignore the occurrences - because how to create Occurrences for RDFa markup is a whole different story that needs to be specified - and concentrate on the suggestions.

<s1> rdf:type sb:Suggestion
<s1> sb:entity <http://www.example.com/person/Rupert_Westenthaler>
<s1> sb:entity-type foaf:Person, vCard:vCard, dbpedia-ont:Person
<s1> sb:confidence 123.456

<s2> rdf:type sb:Suggestion
<s2> sb:entity <http://www.example.com/person/Olivier_Grisel>
<s2> sb:entity-type foaf:Person, vCard:vCard, dbpedia-ont:Person
<s2> sb:confidence 234.567

If the suggestion is accepted by the client the RDFa markup could be updated like this

<li about="http://www.example.com/person/Rupert_Westenthaler"
    typeof="foaf:Person" property="foaf:name">Rupert Westenthaler</li>
<li about="http://www.example.com/person/Olivier_Grisel"
    typeof="foaf:Person" property="foaf:name">Olivier Grisel</li>

Now let's have a detailed look at the suggestions to add Rupert and Olivier as a "cal:attendee" to the meeting. First we need an EntityAnnotation for the meeting, which would be created by the RDFa extractor

<a> rdf:type sb:EntityAnnotation
<a> dc:title "Stanbol Teleconference"
<a> sb:entity-type cal:Vevent
<a> sb:hasOccurrence <o>
<a> sb:hasSuggestion <s3>
<a> sb:hasSuggestion <s4>

Again let's skip the occurrence and look at the two suggestions. What I want to do here is to suggest using the Annotations for Rupert (<a1>) and Olivier (<a2>) as values for the property "cal:attendee".

It is important to suggest the annotations <a1> and <a2> as values here, and NOT the suggested entities (e.g. http://www.example.com/person/Rupert_Westenthaler in case of <a1>), because the Stanbol Enhancer cannot assume that the user will accept the suggestions for <a1> and for <a2>.

The following suggestions also use the sb:field property to tell the user that the suggestion is about values for the "cal:attendee" property.

<s3> rdf:type sb:Suggestion
<s3> sb:field cal:attendee
<s3> sb:entity <a1>
<s3> sb:entity-type sb:EntityAnnotation
<s3> sb:confidence 12.34

<s4> rdf:type sb:Suggestion
<s4> sb:field cal:attendee
<s4> sb:entity <a2>
<s4> sb:entity-type sb:EntityAnnotation
<s4> sb:confidence 12.34

NOTE:

I am not sure if it is a good idea to use "sb:entity" to link to an annotation created by the Stanbol Enhancer, because it might confuse users if the same property is used to link external and internal resources. However, introducing an additional property such as "sb:value" does not seem better either.

Here is the RDFa markup if the user accepts <s3> and <s4> but not <s1> and <s2>

<span typeof="cal:Vevent">
    [...]
    <p> Participants: </p>
    <ul property="cal:attendee">
        <li typeof="foaf:Person" property="foaf:name">Rupert Westenthaler</li>
        <li typeof="foaf:Person" property="foaf:name">Olivier Grisel</li>
        <li> ... </li>
    </ul>
</span>

and finally the RDFa markup if all suggestions are accepted on the client side

<span typeof="cal:Vevent">
    [...]
    <p> Participants: </p>
    <ul property="cal:attendee">
        <li about="http://www.example.com/person/Rupert_Westenthaler"
            typeof="foaf:Person" property="foaf:name">Rupert Westenthaler</li>
        <li about="http://www.example.com/person/Olivier_Grisel"
            typeof="foaf:Person" property="foaf:name">Olivier Grisel</li>
    </ul>
</span>

Occurrences

By default, detected features are considered to be extracted from the whole content. While this assumption is appropriate for things like categorizations and keywords, in a lot of cases it is possible to specify the exact occurrence of features within the content and/or the metadata of the content. In such cases the sb:Annotation will define one or more values for the sb:occurrence property.

Different Occurrence descriptions are needed to describe the position of a feature within different types of content or within the parsed metadata.

TextOccurrence:

Describes the occurrence of a feature within textual content.

<o> rdf:type sb:TextOccurrence
    sb:TextOccurrence rdfs:subClassOf sb:Occurrence
<o> rdf:type sb:Occurrence
<o> sb:selected-text selectedText
<o> sb:start startPosition^^xsd:long
<o> sb:end endPosition^^xsd:long
<o> sb:context selectionContext
<o> sb:occurrence-within-context count^^xsd:int

rdf:type sb:TextOccurrence, sb:Occurrence: It is required to add both types, to support queries for all Occurrences when no RDFS reasoner is present
sb:selected-text: The text selected by this Occurrence. Often the value of this property is the same as of the dc:title property defined by sb:Annotation. However this is no requirement. Enhancement Engines may decide to use different values if appropriate.
sb:start and sb:end: The start and end position of the selected text relative to the start of the content
sb:context: The context (e.g. the sentence) used to extract the selected text.
sb:occurrence-within-context: Defines the n-th occurrence of the selected text within the context. Together with sb:context this can be used to locate the selected text even if the sb:start/sb:end positions are no longer valid (e.g. when the original content was transformed to another format).
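How sb:context and sb:occurrence-within-context can be used to relocate a selection is sketched below. The function name locate_selected_text is hypothetical. The sketch assumes that the context still appears verbatim in the transformed content, and it returns 0-based offsets with an exclusive end (the Python slice convention), which is an assumption because this specification does not fix the offset convention of sb:start and sb:end.

```python
def locate_selected_text(content, selected_text, context, occurrence_within_context=1):
    """Return (start, end) of the selected text within content, or None."""
    if occurrence_within_context < 1:
        raise ValueError("occurrence_within_context is 1-based")
    # First locate the context (e.g. the sentence) in the current content.
    context_start = content.find(context)
    if context_start < 0:
        return None
    # Then step to the n-th occurrence of the selected text inside the context.
    position = -1
    for _ in range(occurrence_within_context):
        position = context.find(selected_text, position + 1)
        if position < 0:
            return None
    start = context_start + position
    return start, start + len(selected_text)


html = "<p>Next week I will travel to Paris</p>"
span = locate_selected_text(html, "Paris", "Next week I will travel to Paris")
print(span, html[span[0]:span[1]])
```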

MetadataOccurrence:

Describes the occurrence of a feature within the metadata of the parsed content. This is extremely useful to link entities for literal values provided by metadata standards, such as creator information in Dublin Core, Artist, Album, Label ... information provided by ID3, or Camera Model information as present in EXIF metadata. Geo-point to City, Region, Country enhancements could also be done by using this type of occurrence.

<o> rdf:type sb:MetadataOccurrence
    sb:MetadataOccurrence rdfs:subClassOf sb:Occurrence
<o> rdf:type sb:Occurrence
<o> sb:field metadataProperty^^xsd:anyURI
<o> sb:value value

rdf:type sb:MetadataOccurrence, sb:Occurrence: It is required to add both types, to support queries for all Occurrences when no RDFS reasoner is present
sb:field: The field of the metadata standard used. Multiple values describe that the feature occurs in several fields
sb:value: The value that hints at the described feature. The property is related to the properties dc:title - in case the value is a literal - and sb:entity - in case the value is a URI - of sb:Annotation.
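The following sketch shows how the sb:field values of a MetadataOccurrence could be collected for a given literal value. The function name metadata_occurrence and the dictionary layout are assumptions made for illustration; they do not describe an existing EnhancementEngine.

```python
def metadata_occurrence(metadata, value):
    """Describe where value occurs in metadata, or return None.

    metadata maps a field name (e.g. "dc:creator") to a list of literal values.
    """
    # Collect every field that carries the value; sorted for a stable result.
    fields = sorted(field for field, values in metadata.items() if value in values)
    if not fields:
        return None
    return {
        # Both types are listed because consumers do not use a reasoner.
        "rdf:type": ["sb:MetadataOccurrence", "sb:Occurrence"],
        "sb:field": fields,
        "sb:value": value,
    }


dublin_core = {
    "dc:creator": ["Richard Cypher", "Rachel Brandstone"],
    "dc:contributor": ["Richard Cypher"],
}
print(metadata_occurrence(dublin_core, "Richard Cypher"))
```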

Other Occurrence Types

TimeBasedMediaOccurrence: This would define a temporal section within a time based media (e.g. a Sound File)
VisualOccurrence: This would define a section within a media that can be presented on a screen
VideoOccurrence: Would be the combination of a time based and a visual occurrence

These kinds of occurrences are currently not defined, because there is no Stanbol EnhancementEngine that could make use of them.

Use Cases and Examples

This section describes use cases showing how the Stanbol Enhancement Structure is used to enhance documents. It also provides examples of how users can use/query for enhancements based on the returned knowledge

Simple Text Enhancement

A user types the text "Next week I will travel to Paris" and would like to have general enhancements like Tags, Keywords and Categories

Let's assume that Paris was detected to describe a location and travel to be a keyword. There are also two known Entities with the name "Paris" and the type Location. This would result in an enhancement graph as follows

# The content item 
<ci> rdf:type sb:ContentItem

# Paris as detected by the nlpEngine as location
<a1> rdf:type sb:Enhancement
<a1> rdf:type sb:Annotation
<a1> rdf:type sb:Occurrence
<a1> rdf:type sb:TextOccurrence
# Properties for Enhancement
<a1> sb:extracted-from <ci>
<a1> dc:creator urn:stanbol.engines:nlpEngine
<a1> dc:created "2011-02-28T12:13:14Z"
# Properties for Annotation
<a1> dc:title "Paris"
<a1> dc:role sb:Tag
<a1> dc:type dbpedia-ont:Place
<a1> sb:suggestion <a2>, <a3>
<a1> sb:confidence 0.85
# Properties for TextOccurrence
<a1> sb:selected-text "Paris"
<a1> sb:start 28
<a1> sb:end 32
<a1> sb:context "Next week I will travel to Paris"
<a1> sb:occurrence-within-context 1

# dbpedia:Paris as suggested Entity
<a2> rdf:type sb:Enhancement
<a2> rdf:type sb:Annotation
# Properties for Enhancement
<a2> sb:extracted-from <ci>
<a2> dc:requires <a1>
<a2> dc:creator urn:stanbol.engines:entityTaggingEngine
<a2> dc:created "2011-02-28T12:13:18Z"
# Properties for Annotation
<a2> dc:title "Paris"
<a2> dc:role sb:Suggestion
<a2> dc:type dbpedia-ont:Place
<a2> sb:entity http://dbpedia.org/resource/Paris
<a2> sb:entity-type dbpedia-ont:City, dbpedia-ont:Settlement, dbpedia-ont:PopulatedPlace, dbpedia-ont:Place
<a2> sb:confidence 123.456

# dbpedia:Paris,_Texas as suggested Entity
<a3> rdf:type sb:Enhancement
<a3> rdf:type sb:Annotation
# Properties for Enhancement
<a3> sb:extracted-from <ci>
<a3> dc:requires <a1>
<a3> dc:creator urn:stanbol.engines:entityTaggingEngine
<a3> dc:created "2011-02-28T12:13:19Z"
# Properties for Annotation
<a3> dc:title "Paris, Texas"
<a3> dc:role sb:Suggestion
<a3> dc:type dbpedia-ont:Place
<a3> sb:entity http://dbpedia.org/resource/Paris,_Texas
<a3> sb:entity-type dbpedia-ont:City, dbpedia-ont:Settlement, dbpedia-ont:PopulatedPlace, dbpedia-ont:Place
<a3> sb:confidence 12.34

# travel as detected keyword
<a4> rdf:type sb:Enhancement
<a4> rdf:type sb:Annotation
# Properties for Enhancement
<a4> sb:extracted-from <ci>
<a4> dc:creator urn:stanbol.engines:keywordExtractionEngine
<a4> dc:created "2011-02-28T12:13:22Z"
# Properties for Annotation
<a4> dc:title "travel"
<a4> dc:role sb:Keyword
<a4> dc:type dbpedia-ont:Activity //can we expect this to be available -> probably not

When consuming the results, the following queries could be used:

Getting all Tags: to get all Keywords/Categories replace sb:Tag with sb:Keyword/sb:Category

PREFIX dc: <http://purl.org/dc/terms/>
PREFIX sb: <http://stanbol.apache.org/ontology/1.0/>    
SELECT ?id ?title ?type
WHERE {
    ?id dc:role sb:Tag .
    ?id dc:title ?title .
    OPTIONAL { ?id dc:type ?type }
}
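For readers without a SPARQL endpoint at hand, the pattern of the query above can be mirrored over plain (subject, predicate, object) tuples. The function name select_by_role is hypothetical, and the sketch assumes at most one dc:title and one dc:type per resource; a SPARQL engine would return one row per combination of values.

```python
def select_by_role(triples, role="sb:Tag"):
    """Return (id, title, type) rows for all resources with the given dc:role.

    The type is None when no dc:type statement exists, which mirrors the
    OPTIONAL clause of the SPARQL query.
    """
    ids, titles, types = [], {}, {}
    for subject, predicate, obj in triples:
        if predicate == "dc:role" and obj == role:
            ids.append(subject)
        elif predicate == "dc:title":
            titles[subject] = obj
        elif predicate == "dc:type":
            types[subject] = obj
    # dc:title is required by the query, so resources without it are skipped.
    return [(i, titles[i], types.get(i)) for i in ids if i in titles]


graph = [
    ("<a1>", "dc:role", "sb:Tag"),
    ("<a1>", "dc:title", "Paris"),
    ("<a1>", "dc:type", "dbpedia-ont:Place"),
    ("<a4>", "dc:role", "sb:Keyword"),
    ("<a4>", "dc:title", "travel"),
]
print(select_by_role(graph))
print(select_by_role(graph, role="sb:Keyword"))
```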

Getting suggestions for a known Annotation (e.g. urn:annotation1)

PREFIX dc: <http://purl.org/dc/terms/>
PREFIX sb: <http://stanbol.apache.org/ontology/1.0/>    
SELECT ?entity ?title ?type ?score
WHERE {
    <urn:annotation1> sb:suggestion ?id .
    ?id dc:title ?title .
    ?id sb:entity ?entity .
    OPTIONAL { ?id sb:entity-type ?type } .
    OPTIONAL { ?id sb:confidence ?score }
}

Getting all selected Entities within the Text

PREFIX dc: <http://purl.org/dc/terms/>
PREFIX sb: <http://stanbol.apache.org/ontology/1.0/>    
SELECT ?id ?title ?start ?end ?type
WHERE {
    ?id dc:role sb:Tag .
    ?id dc:title ?title .
    ?id sb:start ?start .
    ?id sb:end ?end .
    OPTIONAL { ?id dc:type ?type }
}

Getting all Locations and optionally the occurrences within the text

PREFIX dc: <http://purl.org/dc/terms/>
PREFIX sb: <http://stanbol.apache.org/ontology/1.0/>    
PREFIX dbpedia-ont: <http://dbpedia.org/ontology/>  
SELECT ?id ?title ?start ?end
WHERE {
    ?id dc:type dbpedia-ont:Place .
    ?id dc:title ?title .
    OPTIONAL {
        ?id sb:start ?start .
        ?id sb:end ?end
    }
}

Enhancement of Metadata

This example shows how the Enhancement Structure allows creating enhancements based on parsed metadata.

Let's assume that a user passes a content item and an additional file providing Dublin Core metadata that includes (among others)

dc:creator "Richard Cypher"
dc:creator "Rachel Brandstone"
dc:contributor "Richard Cypher"

Further assume that both Richard and Rachel work for the company running the Stanbol Enhancer and that there is an EnhancementEngine that knows about company resources. This example uses the URIs "http://www.company.org/team/Richard_Cypher" and "http://www.company.org/team/Rachel_Brandstone" to identify the two example employees.

#The content item
<ci> rdf:type sb:ContentItem
<ci> dc:creator "Richard Cypher", "Rachel Brandstone"
<ci> dc:contributor "Richard Cypher"
<ci> {other Dublin Core metadata extracted from the parsed file}

# Annotation describing the "Richard Cypher"
# Assumed to be created by the dcAnnotationEngine with the help
# of the entityTaggingEngine.
<a1> rdf:type sb:Enhancement
<a1> rdf:type sb:Annotation
<a1> rdf:type sb:Occurrence
<a1> rdf:type sb:MetadataOccurrence
# Properties for Enhancement
<a1> sb:extracted-from <ci>
<a1> dc:creator urn:stanbol.engines:dcAnnotationEngine
<a1> dc:contributor urn:stanbol.engines:entityTaggingEngine
<a1> dc:created "2011-02-28T13:14:15Z"
# Properties for Annotation
<a1> dc:title "Richard Cypher"
<a1> dc:role sb:Tag
<a1> dc:type dbpedia-ont:Person
<a1> sb:confidence 1.0
<a1> sb:entity http://www.company.org/team/Richard_Cypher
<a1> sb:entity-type foaf:Agent, foaf:Person, vCard:Contact
# Properties for MetadataOccurrence
<a1> sb:field dc:creator, dc:contributor
<a1> sb:value "Richard Cypher"

# Annotation describing the "Rachel Brandstone"
<a2> rdf:type sb:Enhancement
<a2> rdf:type sb:Annotation
<a2> rdf:type sb:Occurrence
<a2> rdf:type sb:MetadataOccurrence
# Properties for Enhancement
<a2> sb:extracted-from <ci>
<a2> dc:creator urn:stanbol.engines:dcAnnotationEngine
<a2> dc:contributor urn:stanbol.engines:entityTaggingEngine
<a2> dc:created "2011-02-28T13:14:22Z"
# Properties for Annotation
<a2> dc:title "Rachel Brandstone"
<a2> dc:role sb:Tag
<a2> dc:type dbpedia-ont:Person
<a2> sb:confidence 1.0
<a2> sb:entity http://www.company.org/team/Rachel_Brandstone
<a2> sb:entity-type foaf:Agent, foaf:Person, vCard:Contact
# Properties for MetadataOccurrence
<a2> sb:field dc:creator
<a2> sb:value "Rachel Brandstone"
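
The annotation for Richard carries two sb:field values because the same literal appears under both dc:creator and dc:contributor. The following is a minimal Python sketch of that grouping step, not the behaviour of any actual engine. The (field, value) input list, the team lookup table and the group_occurrences helper are all assumptions made for illustration.

```python
from collections import OrderedDict

# Hypothetical input: (field, literal) pairs read from the parsed Dublin Core file.
metadata = [
    ("dc:creator", "Richard Cypher"),
    ("dc:creator", "Rachel Brandstone"),
    ("dc:contributor", "Richard Cypher"),
]

# Hypothetical lookup standing in for the engine's knowledge of company resources.
team = {
    "Richard Cypher": "http://www.company.org/team/Richard_Cypher",
    "Rachel Brandstone": "http://www.company.org/team/Rachel_Brandstone",
}


def group_occurrences(pairs, entities):
    """Build one annotation per distinct literal, collecting every field it occurs in."""
    annotations = OrderedDict()
    for field, value in pairs:
        ann = annotations.setdefault(
            value,
            {"sb:value": value, "sb:field": [], "sb:entity": entities.get(value)},
        )
        if field not in ann["sb:field"]:
            ann["sb:field"].append(field)
    return list(annotations.values())


annotations = group_occurrences(metadata, team)
for ann in annotations:
    print(ann["sb:value"], ann["sb:field"], ann["sb:entity"])
```

Grouping by literal keeps the number of annotations equal to the number of distinct values, so a query on sb:field dc:creator and a query on sb:field dc:contributor both find the same annotation for Richard.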

NOTE: One could also create two sb:Annotations each for Richard and Rachel, one Annotation describing the annotated value and a second suggesting the entity for the first, but that seems like unnecessary complexity as long as there is only one person with this name in the company. Nonetheless this decision needs to be reviewed. Therefore, here is the code for Richard when using this variant.

#Annotation describing "Richard Cypher" as extracted from the DC description
<a1> rdf:type sb:Enhancement
<a1> rdf:type sb:Annotation
<a1> rdf:type sb:Occurrence
<a1> rdf:type sb:MetadataOccurrence
# Properties for Enhancement
<a1> sb:extracted-from <ci>
<a1> dc:creator urn:stanbol.engines:dcAnnotationEngine
<a1> dc:created "2011-02-28T13:14:15Z"
# Properties for Annotation
<a1> dc:title "Richard Cypher"
<a1> dc:role sb:Tag
<a1> dc:type dbpedia-ont:Person
<a1> sb:confidence 1.0
<a1> sb:suggestion <a3>
# Properties for MetadataOccurrence
<a1> sb:field dc:creator, dc:contributor
<a1> sb:value "Richard Cypher"

# Annotation describing the employee Richard Cypher
<a3> rdf:type sb:Enhancement
<a3> rdf:type sb:Annotation
# Properties for Enhancement
<a3> sb:extracted-from <ci>
<a3> dc:requires <a1>
<a3> dc:creator urn:stanbol.engines:entityTaggingEngine
<a3> dc:created "2011-02-28T13:14:18Z"
# Properties for Annotation
<a3> dc:title "Richard Cypher"
<a3> dc:role sb:Suggestion
<a3> dc:type dbpedia-ont:Person
<a3> sb:entity http://www.company.org/team/Richard_Cypher
<a3> sb:entity-type foaf:Agent, foaf:Person, vCard:Contact
<a3> sb:confidence 8.76

When consuming these enhancements, the following queries would be used:

Getting all Annotations for the dc:creator field

Version based on variant 1:

PREFIX dc: <http://purl.org/dc/terms/>
PREFIX sb: <http://stanbol.apache.org/ontology/1.0/>    
SELECT ?id ?title ?creatorId
WHERE {
    ?id dc:title ?title .
    ?id sb:entity ?creatorId .
    ?id sb:field dc:creator .
}

Version for variant 2:

PREFIX dc: <http://purl.org/dc/terms/>
PREFIX sb: <http://stanbol.apache.org/ontology/1.0/>    
SELECT ?id ?title ?creatorId
WHERE {
    ?ma sb:field dc:creator .
    ?ma sb:suggestion ?id . 
    ?id dc:title ?title .
    ?id sb:entity ?creatorId .
}
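
To make the join in the variant 2 query concrete, the following Python sketch evaluates it by hand over the triples from the variant 2 listing, reduced to the properties the query touches. It uses plain tuples instead of an RDF library; the objects and creator_suggestions helpers are hypothetical names introduced only for this illustration.

```python
# Triples from the variant 2 listing, reduced to the properties the query touches.
triples = {
    ("a1", "sb:field", "dc:creator"),
    ("a1", "sb:field", "dc:contributor"),
    ("a1", "sb:suggestion", "a3"),
    ("a3", "dc:title", "Richard Cypher"),
    ("a3", "sb:entity", "http://www.company.org/team/Richard_Cypher"),
}


def objects(subject, predicate):
    """Return all objects of (subject, predicate, ?o), sorted for stable output."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def creator_suggestions():
    """Follow occurrence -> suggestion for every occurrence in the dc:creator field."""
    rows = []
    occurrences = sorted(
        s for s, p, o in triples if p == "sb:field" and o == "dc:creator"
    )
    for ma in occurrences:
        for sid in objects(ma, "sb:suggestion"):
            for title in objects(sid, "dc:title"):
                for entity in objects(sid, "sb:entity"):
                    rows.append((sid, title, entity))
    return rows


rows = creator_suggestions()
print(rows)
```

In this data the suggestion a3 carries no sb:field of its own; the field is matched on the occurrence that points to it, and the title and entity are then read from the suggestion.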

Getting all Annotations created for DC properties