org.apache.ctakes.core.nlp.tokenizer
Class Token

java.lang.Object
  extended by org.apache.ctakes.core.nlp.tokenizer.Token

public class Token
extends Object

Represents a generic token. A token is tied back to the original text by a start offset and an end offset, both of which are character positions in that text. A token can be one of several types; see the Javadoc for the TYPE fields for a description of each.

Author:
Mayo Clinic

Field Summary
static byte CAPS_ALL
           
static byte CAPS_FIRST_ONLY
           
static byte CAPS_MIXED
           
static byte CAPS_NONE
           
static byte CAPS_UNKNOWN
           
static byte NUM_FIRST
           
static byte NUM_LAST
           
static byte NUM_MIDDLE
           
static byte NUM_NONE
           
static byte TYPE_CONTRACTION
          Contains contractions and possessives (since they cannot be differentiated without context).
static byte TYPE_EOL
          An EOL token is defined as a line feed or carriage return character.
static byte TYPE_NUMBER
          A number token is defined as a consecutive series of digits.
static byte TYPE_PUNCT
          A punctuation token is defined as a single character such as a period, double quote, single quote, question mark, exclamation point, or hyphen (if not surrounded by word characters).
static byte TYPE_SYMBOL
          Characters @!#$%^&*?
static byte TYPE_UNKNOWN
          The type is unknown.
static byte TYPE_WORD
          A word token is defined as a consecutive series of word characters.
 
Constructor Summary
Token(int startOffset, int endOffset)
          Creates a token with the given start and end offsets.
 
Method Summary
 byte getCaps()
          Gets the caps state of the token.
 int getEndOffset()
          Gets the end offset.
 byte getNumPosition()
          Gets the position of a number inside a Token.
 int getStartOffset()
          Gets the start offset.
 String getText()
           
 byte getType()
          Gets the type of the token.
 boolean isInteger()
           
 void setCaps(byte b)
          Sets the caps state of the token.
 void setEndOffset(int i)
          Sets the end offset.
 void setIsInteger(boolean isInteger)
           
 void setNumPosition(byte b)
          Sets the position of a number inside a Token.
 void setStartOffset(int i)
          Sets the start offset.
 void setText(String s)
           
 void setType(byte b)
          Sets the type of the token.
 String toString()
           
static String typeDescription(byte type)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

TYPE_UNKNOWN

public static final byte TYPE_UNKNOWN
The type is unknown.

See Also:
Constant Field Values

TYPE_WORD

public static final byte TYPE_WORD
A word token is defined as a consecutive series of word characters. Word characters are defined as A-Z and a-z. A word token may contain a hyphen if the hyphen has a word character on each side. A word token may contain an apostrophe if the apostrophe has a word character on each side.
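
The character rule above can be expressed as a small predicate. This is a minimal sketch of the rule as written, not the cTAKES tokenizer itself: the class WordTokenSketch and its methods are hypothetical names, and the sketch does not decide between TYPE_WORD and TYPE_CONTRACTION.

```java
final class WordTokenSketch {

    /** True for the word characters named in the description: A-Z and a-z only. */
    static boolean isWordChar(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    /**
     * True if s is a non-empty run of word characters in which every hyphen
     * or apostrophe has a word character directly on each side.
     */
    static boolean isWordToken(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isWordChar(c)) {
                continue;
            }
            boolean joiner = (c == '-' || c == '\'');
            boolean flanked = i > 0 && i < s.length() - 1
                    && isWordChar(s.charAt(i - 1))
                    && isWordChar(s.charAt(i + 1));
            if (!joiner || !flanked) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isWordToken("follow-up")); // true
        System.out.println(isWordToken("O'Brien"));   // true
        System.out.println(isWordToken("-up"));       // false: hyphen has no word character before it
        System.out.println(isWordToken("abc1"));      // false: digits are not word characters
    }
}
```

A leading, trailing, or doubled hyphen fails the check because at least one neighbor is not a word character.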

See Also:
Constant Field Values

TYPE_NUMBER

public static final byte TYPE_NUMBER
A number token is defined as a consecutive series of digits.

See Also:
Constant Field Values

TYPE_PUNCT

public static final byte TYPE_PUNCT
A punctuation token is defined as a single character such as a period, double quote, single quote, question mark, exclamation point, or hyphen (if not surrounded by word characters).

See Also:
Constant Field Values

TYPE_EOL

public static final byte TYPE_EOL
An EOL token is defined as a line feed or carriage return character.

See Also:
Constant Field Values

TYPE_CONTRACTION

public static final byte TYPE_CONTRACTION
Contains contractions and possessives (since they cannot be differentiated without context).

See Also:
Constant Field Values

TYPE_SYMBOL

public static final byte TYPE_SYMBOL
Characters @!#$%^&*?

See Also:
Constant Field Values

CAPS_UNKNOWN

public static final byte CAPS_UNKNOWN
See Also:
Constant Field Values

CAPS_NONE

public static final byte CAPS_NONE
See Also:
Constant Field Values

CAPS_MIXED

public static final byte CAPS_MIXED
See Also:
Constant Field Values

CAPS_FIRST_ONLY

public static final byte CAPS_FIRST_ONLY
See Also:
Constant Field Values

CAPS_ALL

public static final byte CAPS_ALL
See Also:
Constant Field Values

NUM_NONE

public static final byte NUM_NONE
See Also:
Constant Field Values

NUM_FIRST

public static final byte NUM_FIRST
See Also:
Constant Field Values

NUM_MIDDLE

public static final byte NUM_MIDDLE
See Also:
Constant Field Values

NUM_LAST

public static final byte NUM_LAST
See Also:
Constant Field Values
Constructor Detail

Token

public Token(int startOffset,
             int endOffset)
Creates a token with the given start and end offsets.

Parameters:
startOffset - The token's start offset.
endOffset - The token's end offset.
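
As a rough illustration, a token might be constructed from its offsets and then populated through the setters listed in the method summary. So that the sketch compiles without cTAKES on the classpath, it declares a stand-in Token class that mirrors only the documented members it uses; the constant values in the stand-in are placeholders, not the library's real values, and TokenUsageSketch is a hypothetical name.

```java
// Stand-in for org.apache.ctakes.core.nlp.tokenizer.Token, present only so this
// sketch compiles on its own. Signatures follow the documentation; the constant
// values are placeholders (assumptions), not the real constant field values.
final class Token {
    static final byte TYPE_WORD = 1;       // placeholder value
    static final byte CAPS_FIRST_ONLY = 3; // placeholder value

    private int startOffset;
    private int endOffset;
    private byte type;
    private byte caps;
    private String text;

    Token(int startOffset, int endOffset) {
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    int getStartOffset() { return startOffset; }
    int getEndOffset() { return endOffset; }
    byte getType() { return type; }
    void setType(byte b) { type = b; }
    byte getCaps() { return caps; }
    void setCaps(byte b) { caps = b; }
    String getText() { return text; }
    void setText(String s) { text = s; }
}

final class TokenUsageSketch {

    /** Builds a token for the first word of a fixed sample sentence. */
    static Token patientToken() {
        String documentText = "Patient denies fever.";
        // "Patient" occupies characters 0..6; the end offset is the position after the last one.
        Token token = new Token(0, 7);
        token.setText(documentText.substring(token.getStartOffset(), token.getEndOffset()));
        token.setType(Token.TYPE_WORD);
        token.setCaps(Token.CAPS_FIRST_ONLY);
        return token;
    }

    public static void main(String[] args) {
        Token token = patientToken();
        System.out.println(token.getText()); // Patient
    }
}
```

With cTAKES available, the stand-in class would be dropped in favor of the real class; the calls in patientToken() use only the constructor, setters, and constants listed on this page.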
Method Detail

getEndOffset

public int getEndOffset()
Gets the end offset. This is the position directly after the last character of the token.


setEndOffset

public void setEndOffset(int i)
Sets the end offset. This is the position directly after the last character of the token.


getStartOffset

public int getStartOffset()
Gets the start offset. This is the position of the first character of the token.


setStartOffset

public void setStartOffset(int i)
Sets the start offset. This is the position of the first character of the token.
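
Taken together, the start and end offsets describe a half-open range: the start position is included and the end position is not. That is the same convention as String.substring(int, int), so the covered text can be recovered directly. The sketch below assumes nothing about cTAKES beyond that convention; OffsetSketch and its methods are hypothetical helpers.

```java
final class OffsetSketch {

    /** Returns the text covered by the half-open range [startOffset, endOffset). */
    static String coveredText(String documentText, int startOffset, int endOffset) {
        return documentText.substring(startOffset, endOffset);
    }

    /** Number of characters covered by the range. */
    static int length(int startOffset, int endOffset) {
        return endOffset - startOffset;
    }

    public static void main(String[] args) {
        String documentText = "Patient denies fever.";
        // "denies" starts at index 8; its last character is at index 13,
        // so the end offset is 14.
        System.out.println(coveredText(documentText, 8, 14)); // denies
        System.out.println(length(8, 14));                    // 6
    }
}
```

Because the end offset is exclusive, the length of a token is simply the end offset minus the start offset.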


getType

public byte getType()
Gets the type of the token. Please see the javadoc for the TYPE fields.


setType

public void setType(byte b)
Sets the type of the token. Please see the javadoc for the TYPE fields.


getCaps

public byte getCaps()
Gets the caps state of the token.


setCaps

public void setCaps(byte b)
Sets the caps state of the token.
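
This page does not define the individual CAPS values. The sketch below shows one plausible reading of the constant names (CAPS_NONE, CAPS_FIRST_ONLY, CAPS_ALL, CAPS_MIXED, CAPS_UNKNOWN); the classification rules are assumptions, not the library's documented behavior, and an enum is used because the byte values are not given here. CapsSketch is a hypothetical name.

```java
final class CapsSketch {

    /** Mirrors the documented CAPS_* names; not the library's byte constants. */
    enum Caps { UNKNOWN, NONE, MIXED, FIRST_ONLY, ALL }

    /**
     * Classifies the capitalization of a token's text.
     * Assumed rules: no uppercase letters is NONE, all letters uppercase is ALL,
     * exactly one uppercase letter in first position is FIRST_ONLY,
     * anything else is MIXED, and missing text is UNKNOWN.
     */
    static Caps classify(String text) {
        if (text == null || text.isEmpty()) {
            return Caps.UNKNOWN;
        }
        int letters = 0;
        int upper = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                letters++;
                if (Character.isUpperCase(c)) {
                    upper++;
                }
            }
        }
        if (upper == 0) {
            return Caps.NONE;
        }
        if (upper == letters) {
            return Caps.ALL;
        }
        if (upper == 1 && Character.isUpperCase(text.charAt(0))) {
            return Caps.FIRST_ONLY;
        }
        return Caps.MIXED;
    }

    public static void main(String[] args) {
        System.out.println(classify("Patient")); // FIRST_ONLY
        System.out.println(classify("MRI"));     // ALL
        System.out.println(classify("fever"));   // NONE
        System.out.println(classify("mRNA"));    // MIXED
    }
}
```

Under these assumed rules a single uppercase letter such as "A" is reported as ALL, since the all-uppercase check runs first; a real tokenizer may resolve that case differently.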


getNumPosition

public byte getNumPosition()
Gets the position of a number inside a Token.


setNumPosition

public void setNumPosition(byte b)
Sets the position of a number inside a Token.


getText

public String getText()

setText

public void setText(String s)

isInteger

public boolean isInteger()

setIsInteger

public void setIsInteger(boolean isInteger)

toString

public String toString()
Overrides:
toString in class Object

typeDescription

public static String typeDescription(byte type)


Copyright © 2012-2013 The Apache Software Foundation. All Rights Reserved.