Tokenizer for bbcode documents
The tokenizer used for all bbcode documents should prepare a token array that the bbcode parser can use without requiring any bbcode-specific handling in the parser itself.
Token extraction
----------------
For the token extraction, the regular expressions in the $tokens property are used. The $tokens array can be created in the constructor; each entry pairs the class name of a token with a regular expression matching that token.
The array is evaluated in the given order until one of the regular expressions matches. Each regular expression should have at least one named match, (?P<value> ... ), with the name "value". Its contents are assigned as the content of the token, which is created from the given class name. By default, the matched contents are removed from the beginning of the string. Optionally, a second named match called "match" may be used inside the regular expression; if so, only the contents inside this match are removed from the beginning of the string. This enables you to perform a trivial lookahead inside the tokenizer.
If no expression matches, an exception will be thrown.
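The extraction rules described in this section can be sketched as a small loop. This is only an illustration under stated assumptions, not the component's actual implementation: the array keys 'class' and 'match', the function name sketchTokenize(), the stdClass token objects, the token class names, and the three patterns are all hypothetical.

```php
<?php
// Hypothetical token table. Each entry pairs a token class name with a
// regular expression; the keys 'class' and 'match' are assumed here.
$tokens = array(
    array(
        'class' => 'TagName',
        // Lookahead: the word must be followed by "]", but only the part
        // inside the named group "match" is consumed.
        'match' => '(\\A(?P<match>(?P<value>[a-z]+))\\])',
    ),
    array(
        'class' => 'Special',
        'match' => '(\\A(?P<value>[\\[\\]]))',
    ),
    array(
        'class' => 'Text',
        'match' => '(\\A(?P<value>[^\\[\\]\\r\\n]+))',
    ),
);

/**
 * Sketch of the matching loop: try the expressions in order, create a
 * token from the first match, and cut the consumed part off the string.
 */
function sketchTokenize(array $tokens, $string)
{
    $result = array();
    while ($string !== '') {
        foreach ($tokens as $definition) {
            if (!preg_match($definition['match'], $string, $matches)) {
                continue;
            }

            // The named group "value" becomes the token content.
            $token = new stdClass();
            $token->class = $definition['class'];
            $token->content = $matches['value'];
            $result[] = $token;

            // Remove only the "match" group if present, else the full match.
            $consumed = isset($matches['match']) ? $matches['match'] : $matches[0];
            if ($consumed === '') {
                throw new RuntimeException('Expression consumed nothing.');
            }
            $string = (string) substr($string, strlen($consumed));
            continue 2;
        }

        // No expression matched the beginning of the remaining string.
        throw new RuntimeException('No token matches: ' . substr($string, 0, 10));
    }
    return $result;
}
```

For the input `[b]hi`, this sketch yields a Special token, a TagName token with the content `b`, another Special token, and a Text token. The TagName entry has to come before the Text entry, because the array is evaluated in order.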
Source for this file: /Document/src/document/bbcode/tokenizer.php
Version: //autogen//
Constant | Value | Description |
---|---|---|
SPECIAL_CHARS | '\\[\\]' | Special characters, which have a special meaning and thus may not have been matched otherwise. |
TEXT_END_CHARS | '\\[\\]\\r\\n' | Characters ending a pure text section. |
WHITESPACE_CHARS | '[\\x20\\t]' | Common whitespace characters. The vertical tab is excluded, because it causes strange problems with PCRE. |
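The constant values are regular expression fragments, so they are presumably meant to be concatenated into the token expressions. The following sketch shows one way that could look. The variable names and both patterns are assumptions made for this example; only the two string values are taken from the constants listed here.

```php
<?php
// Values copied from the constants; the variable names are hypothetical.
$textEndChars    = '\\[\\]\\r\\n';   // content of a character class
$whitespaceChars = '[\\x20\\t]';     // already a complete character class

// Hypothetical pattern: pure text runs until a text end character.
$textPattern = '(\\A(?P<value>[^' . $textEndChars . ']+))';

// Hypothetical pattern: one or more common whitespace characters.
$whitespacePattern = '(\\A(?P<value>' . $whitespaceChars . '+))';

preg_match($textPattern, 'plain text[b]', $textMatch);
preg_match($whitespacePattern, " \t x", $whitespaceMatch);

// $textMatch['value'] holds the text in front of the opening bracket,
// $whitespaceMatch['value'] holds the leading spaces and tab.
```

Note the difference between the two fragments: TEXT_END_CHARS has to be wrapped in brackets by the expression using it, while WHITESPACE_CHARS already contains the brackets.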
Type | Property | Description |
---|---|---|
protected array | $tokens = array() | List with tokens and a regular expression matching the given token. The tokens are matched in the given order. |
Type | Method | Description |
---|---|---|
public void | __construct() | Construct tokenizer |
protected void | convertTabs( $token ) | Convert tabs to spaces |
public array | tokenizeFile( $file ) | Tokenize the given file |
public array | tokenizeString( $string ) | Tokenize the given string |
Construct tokenizer
Create the token array with the regular expressions matching the respective tokens.
Convert tabs to spaces
Convert all tabs to spaces, as defined in: http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#whitespace
Name | Type | Description |
---|---|---|
$token | ezcDocumentBBCodeToken | |
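The referenced reStructuredText specification converts tabs to spaces with tab stops at every 8th column. A minimal sketch of that rule follows. It works on a plain string rather than on a token object, and the function name expandTabs() and the $tabWidth parameter are hypothetical; how the method itself applies the conversion to the token is not shown here.

```php
<?php
/**
 * Replace each tab with the number of spaces needed to reach the next
 * tab stop. Columns are counted in bytes, so multi-byte characters
 * would need extra handling.
 */
function expandTabs($text, $tabWidth = 8)
{
    $result = '';
    $column = 0;
    $length = strlen($text);

    for ($i = 0; $i < $length; $i++) {
        $char = $text[$i];
        if ($char === "\t") {
            // Fill up to the next multiple of $tabWidth.
            $spaces = $tabWidth - ($column % $tabWidth);
            $result .= str_repeat(' ', $spaces);
            $column += $spaces;
        } elseif ($char === "\n") {
            // A new line starts again at column 0.
            $result .= $char;
            $column = 0;
        } else {
            $result .= $char;
            $column++;
        }
    }

    return $result;
}
```

With the default width, a tab after one character expands to seven spaces, and a tab at the start of a line expands to eight.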
Tokenize the given file
The method tries to tokenize the passed file and returns an array of ezcDocumentBBCodeToken structs on success, or throws an ezcDocumentTokenizerException if something could not be matched by any token.
Name | Type | Description |
---|---|---|
$file | string | |
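Since both public methods return the same kind of token array, file tokenizing can be pictured as reading the file and handing its contents to the string tokenizer. The sketch below illustrates that idea only; whether the component delegates in this way is an assumption, and the function name sketchTokenizeFile() and the callable parameter are hypothetical.

```php
<?php
/**
 * Read a file and pass its contents to a string tokenizer.
 *
 * $tokenizeString is any callable taking a string and returning an
 * array of tokens; it stands in for the real tokenizeString() method.
 */
function sketchTokenizeFile($file, callable $tokenizeString)
{
    if (!is_file($file) || !is_readable($file)) {
        throw new RuntimeException('Cannot read file: ' . $file);
    }

    return $tokenizeString(file_get_contents($file));
}
```

Any exception thrown by the string tokenizer for unmatched input simply propagates to the caller in this sketch.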
Tokenize the given string
The method tries to tokenize the passed string and returns an array of ezcDocumentBBCodeToken structs on success, or throws an ezcDocumentTokenizerException if something could not be matched by any token.
Name | Type | Description |
---|---|---|
$string | string | |