Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
# LangDetect: Language Identification Enhancement Engine
The **LanguageDetection** engine determines the language of text.
## Technical Description
The provided engine is based on the [language detection library](http://code.google.com/p/language-detection/).
The text to be checked must be provided in plain text format by the content item.
The result of language identification is added as a TextAnnotation to the content item's metadata, as the string value of the property

    http://purl.org/dc/terms/language
This RDF snippet illustrates the output:
## Configuration options

* `org.apache.stanbol.enhancer.engines.langdetect.probe-length`:
  an integer specifying how many characters are used for
  identification. A value of 0 or below means that the complete
  text is used. Otherwise, only a substring of the specified length, taken from the
  middle of the text, is used. The default value is 400 characters.
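
The probe-length rule described above can be sketched in a few lines of Python. This is an illustration only, not the engine's actual code: the function name `select_probe` is hypothetical, the exact way the window is centered is an assumption, and so is the handling of texts shorter than the probe length (here they are used in full).

```python
def select_probe(text: str, probe_length: int = 400) -> str:
    """Return the part of `text` that would be used for identification."""
    # A value of 0 or below means: use the complete text.
    # Assumption: a text shorter than the probe length is also used in full.
    if probe_length <= 0 or len(text) <= probe_length:
        return text
    # Otherwise take `probe_length` characters centered on the middle.
    # Assumption: the window starts at (len - probe_length) // 2.
    start = (len(text) - probe_length) // 2
    return text[start:start + probe_length]
```

For a ten-character text and a probe length of 4, this sketch selects the four characters around the middle and skips the first and last three.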
## Usage
Assuming that the Stanbol endpoint with the full launcher is running at

    http://localhost:8080

and the engine is activated, commands like the following can be used from the
command line to submit a text file as a content item:
* stateless interface

        curl -i -X POST -H "Content-Type:text/plain" -T testfile.txt http://localhost:8080/engines

* stateful interface

        curl -i -X PUT -H "Content-Type:text/plain" -T testfile.txt http://localhost:8080/contenthub/content/someFileId
Alternatively, the Stanbol web interface can be used for submitting documents
and viewing the metadata at
http://localhost:8080/contenthub
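
The two curl requests above could also be issued from a script. The following is a minimal sketch using only Python's standard library. The helper names `build_request` and `send` are hypothetical, the sample sentence is made up, and sending and decoding the text as UTF-8 is an assumption. The URLs, HTTP methods and `Content-Type` header mirror the curl commands.

```python
import urllib.request

STANBOL_BASE = "http://localhost:8080"  # endpoint assumed in the usage text


def build_request(text, url, method="POST"):
    """Build (but do not send) a plain-text request for a Stanbol endpoint."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),  # assumption: the text is sent as UTF-8
        headers={"Content-Type": "text/plain"},
        method=method,
    )


def send(request):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request) as response:
        # Assumption: the response body can be decoded as UTF-8.
        return response.read().decode("utf-8")


# Stateless interface: POST the text to /engines.
stateless = build_request(
    "This is an English sentence.", STANBOL_BASE + "/engines"
)

# Stateful interface: PUT the text under a content item id
# ("someFileId" is the placeholder id from the curl example).
stateful = build_request(
    "This is an English sentence.",
    STANBOL_BASE + "/contenthub/content/someFileId",
    method="PUT",
)
```

Calling `send(stateless)` against a running instance should return the enhancement metadata produced by the active engines. Nothing is sent until `send` is called, so the requests can be inspected first.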