org.apache.ctakes.relationextractor.cr
Class GoldEntityAndRelationReader
java.lang.Object
org.apache.uima.analysis_component.AnalysisComponent_ImplBase
org.apache.uima.analysis_component.Annotator_ImplBase
org.apache.uima.analysis_component.JCasAnnotator_ImplBase
org.uimafit.component.JCasAnnotator_ImplBase
org.apache.ctakes.relationextractor.cr.GoldEntityAndRelationReader
- All Implemented Interfaces:
- org.apache.uima.analysis_component.AnalysisComponent
public class GoldEntityAndRelationReader
- extends org.uimafit.component.JCasAnnotator_ImplBase
Reads named entity annotations and the relations between them
from Knowtator XML files into the CAS.
Assumptions:
- A pair of entities can only have a single relation between them
- An entity can have only a single semantic type
For each relation instance in the gold standard, this reader will:
- Check whether both arguments of the relation instance can be extracted
automatically by cTAKES. If either of them cannot, the relation
instance and its entities are skipped.
- Check whether another relation between a pair of entities with the same
Knowtator mention ids has already been added to the CAS. If it has,
the reader does not add a new relation between these entities.
This reader also makes sure that each entity is added to the CAS only once.
For example, the CAS may already contain an entity because it participates in
another relation that has already been added, or because of an error in the gold
standard (i.e. the entity was annotated twice, which does happen).
TODO: Currently this reader does not normalize the roles of the arguments
across different corpora. It simply adds to the CAS whatever is in the data.
However, the roles were not annotated consistently across corpora
(e.g. Sharp and Share assign different roles to the modifiers and entity
mentions that participate in the degree_of relation). This issue needs to be addressed
so that models can be trained on data coming from different sources.
- Author:
- Dmitriy Dligach
Method Summary
void initialize(org.apache.uima.UimaContext aContext)
void process(org.apache.uima.jcas.JCas jCas)
Methods inherited from class org.uimafit.component.JCasAnnotator_ImplBase
getLogger
Methods inherited from class org.apache.uima.analysis_component.JCasAnnotator_ImplBase
getRequiredCasInterface, process
Methods inherited from class org.apache.uima.analysis_component.Annotator_ImplBase
getCasInstancesRequired, hasNext, next
Methods inherited from class org.apache.uima.analysis_component.AnalysisComponent_ImplBase
batchProcessComplete, collectionProcessComplete, destroy, getContext, getResultSpecification, reconfigure, setResultSpecification
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
PARAM_INPUTDIR
public static final java.lang.String PARAM_INPUTDIR
- See Also:
- Constant Field Values
inputDirectory
public static java.io.File inputDirectory
identifiedAnnotationId
public int identifiedAnnotationId
relationId
public int relationId
relationArgumentId
public int relationArgumentId
GoldEntityAndRelationReader
public GoldEntityAndRelationReader()
initialize
public void initialize(org.apache.uima.UimaContext aContext)
throws org.apache.uima.resource.ResourceInitializationException
- Specified by:
initialize
in interface org.apache.uima.analysis_component.AnalysisComponent
- Overrides:
initialize
in class org.uimafit.component.JCasAnnotator_ImplBase
- Throws:
org.apache.uima.resource.ResourceInitializationException
process
public void process(org.apache.uima.jcas.JCas jCas)
throws org.apache.uima.analysis_engine.AnalysisEngineProcessException
- Specified by:
process
in class org.apache.uima.analysis_component.JCasAnnotator_ImplBase
- Throws:
org.apache.uima.analysis_engine.AnalysisEngineProcessException