Tokenizer for Confluence wiki documents.
Confluence is a popular wiki and part of the Atlassian software stack. It was chosen because, in some places, its markup differs entirely from that of the other wiki markup languages. The markup is documented at:
http://confluence.atlassian.com/renderer/notationhelp.action?section=all
For the basic workings of the tokenizer, see the class-level documentation of the ezcDocumentWikiTokenizer class.
Source for this file: /Document/src/document/wiki/tokenizer/confluence.php
ezcDocumentWikiTokenizer | --ezcDocumentWikiConfluenceTokenizer
Version: | //autogen// |
SPECIAL_CHARS
= '/*^,#_~?+!\\\\\\[\\]{}|=-'
|
Special characters, which have a special meaning in the markup but may not have been matched otherwise. |
TEXT_END_CHARS
= '/*^,#_~?+!\\\\\\[\\]{}|=\\r\\n\\t\\x20-'
|
Characters ending a pure text section. |
WHITESPACE_CHARS
= '[\\x20\\t]'
|
Common whitespace characters. The vertical tab is excluded, because it causes strange problems with PCRE. |
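As a rough illustration of how such a character list might be used, the sketch below embeds a copy of the TEXT_END_CHARS value in a negated PCRE character class to match a run of plain text. The function name matchPlainText() and the pattern itself are assumptions made for this example, not the expressions the class actually uses.

```php
<?php
// Copy of the TEXT_END_CHARS value documented above.
$textEndChars = '/*^,#_~?+!\\\\\\[\\]{}|=\\r\\n\\t\\x20-';

// Hypothetical pattern: consume everything up to the next character that
// could start markup or whitespace. Parentheses serve as the pattern
// delimiters, so the "/" inside the class needs no escaping.
$pattern = '(\\G(?P<value>[^' . $textEndChars . ']+))';

// Hypothetical helper: return the plain text starting exactly at $offset,
// or null if the character at $offset ends a text section.
function matchPlainText(string $pattern, string $input, int $offset = 0): ?string
{
    if (preg_match($pattern, $input, $match, 0, $offset) === 1) {
        return $match['value'];
    }
    return null;
}

var_dump(matchPlainText($pattern, 'Hello *world*'));
```

Because the expression is anchored with \G, it only matches at the given offset, which is what a tokenizer walking through the input from left to right would need.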
protected array |
$imageAttributeMapping
= array(
Mapping of confluence image attribute names to image start token properties. |
From ezcDocumentWikiTokenizer | |
---|---|
protected |
ezcDocumentWikiTokenizer::$tokens
|
public void |
__construct(
)
Construct tokenizer |
protected array |
filterTokens(
$tokens
)
Filter tokens |
protected void |
parseImageDescriptor(
$token
, $descriptor
)
Parse confluence image descriptors |
protected void |
parsePluginContents(
$plugin
)
Parse plugin contents |
From ezcDocumentWikiTokenizer | |
---|---|
public abstract void |
ezcDocumentWikiTokenizer::__construct()
Construct tokenizer |
protected void |
ezcDocumentWikiTokenizer::convertTabs()
Convert tabs to spaces |
protected abstract array |
ezcDocumentWikiTokenizer::filterTokens()
Filter tokens |
public array |
ezcDocumentWikiTokenizer::tokenizeFile()
Tokenize the given file |
public array |
ezcDocumentWikiTokenizer::tokenizeString()
Tokenize the given string |
Construct tokenizer
Create the token array with regular expressions matching the respective tokens.
Method | Description |
---|---|
ezcDocumentWikiTokenizer::__construct() |
Construct tokenizer |
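A minimal sketch of the idea, assuming a simple list that pairs token names with regular expressions anchored at the current offset. The token names, the patterns, and the tokenizeSketch() function are hypothetical and far simpler than anything the real class defines.

```php
<?php
// Hypothetical, much-reduced token table: each entry pairs a token name
// with an expression anchored at the current offset via \G.
$tokens = array(
    array('name' => 'bold',       'match' => '(\\G\\*)'),
    array('name' => 'whitespace', 'match' => '(\\G[\\x20\\t]+)'),
    array('name' => 'newline',    'match' => '(\\G(?:\\r\\n|\\r|\\n))'),
    array('name' => 'text',       'match' => '(\\G[^*\\r\\n\\t\\x20]+)'),
);

// Walk through the input and, at each offset, use the first expression
// in the table that matches there.
function tokenizeSketch(array $tokens, string $input): array
{
    $result = array();
    $offset = 0;
    $length = strlen($input);

    while ($offset < $length) {
        foreach ($tokens as $token) {
            if (preg_match($token['match'], $input, $m, 0, $offset) === 1) {
                $result[] = array($token['name'], $m[0]);
                $offset += strlen($m[0]);
                continue 2;
            }
        }
        throw new RuntimeException("No token matches at offset $offset.");
    }

    return $result;
}

print_r(tokenizeSketch($tokens, 'a *b*'));
```

The order of the table matters in this sketch: more specific expressions come first, and the catch-all text expression comes last.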
Filter tokens
Filters the tokens after the input string has been tokenized. The filter should extract additional information that is not yet available on the tokens, such as the depth of a title, which depends on the title markup.
Name | Type | Description |
---|---|---|
$tokens |
array |
Method | Description |
---|---|
ezcDocumentWikiTokenizer::filterTokens() |
Filter tokens |
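The title-depth example can be sketched as follows, assuming Confluence headings of the form "h1." to "h6." and plain arrays standing in for the token objects. The function name filterTokensSketch() and the array keys are hypothetical.

```php
<?php
// Hypothetical token shape: plain arrays stand in for the token objects.
function filterTokensSketch(array $tokens): array
{
    foreach ($tokens as &$token) {
        // Assumed heading markup "h1." to "h6.": derive the depth from
        // the digit in the matched markup.
        if ($token['type'] === 'title'
            && preg_match('(^h([1-6])\\.)', $token['content'], $m) === 1) {
            $token['level'] = (int) $m[1];
        }
    }
    unset($token); // Break the reference left over from the loop.

    return $tokens;
}

print_r(filterTokensSketch(array(
    array('type' => 'title', 'content' => 'h3. '),
    array('type' => 'text',  'content' => 'Heading text'),
)));
```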
Parse confluence image descriptors
Parse Confluence image descriptors, which differ completely from those of other wiki languages and therefore cannot be handled by the default parser.
Name | Type | Description |
---|---|---|
$token |
ezcDocumentWikiImageStartToken | |
$descriptor |
mixed |
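A minimal sketch of such a descriptor parser, assuming descriptors of the form "file.png|align=right, width=100". The mapping below is hypothetical (the real $imageAttributeMapping is not reproduced here), and the sketch returns an array instead of setting properties on the image start token.

```php
<?php
// Hypothetical mapping of attribute names to property names.
$mapping = array(
    'align'  => 'alignment',
    'width'  => 'width',
    'height' => 'height',
);

function parseImageDescriptorSketch(string $descriptor, array $mapping): array
{
    // Everything before the first "|" is taken as the image resource.
    $parts = explode('|', $descriptor, 2);
    $image = array('resource' => trim($parts[0]));

    if (!isset($parts[1])) {
        return $image;
    }

    // The rest is read as a comma-separated list of name=value pairs;
    // attributes without a value or without a mapping are skipped.
    foreach (explode(',', $parts[1]) as $attribute) {
        $pair = explode('=', $attribute, 2);
        $name = strtolower(trim($pair[0]));
        if (isset($pair[1]) && isset($mapping[$name])) {
            $image[$mapping[$name]] = trim($pair[1]);
        }
    }

    return $image;
}

print_r(parseImageDescriptorSketch('logo.png|align=right, width=100', $mapping));
```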
Parse plugin contents
Plugins differ completely from wiki to wiki, and their contents should not be passed through the normal wiki parser. The contents are therefore fetched as a whole, and each tokenizer extracts names and parameters from the complete token itself.
Name | Type | Description |
---|---|---|
$plugin |
ezcDocumentWikiPluginToken |
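How names and parameters might be extracted from a complete plugin token can be sketched as follows, assuming contents of the form "{name:key=value|key=value}body{name}" with optional parameters, body, and closing tag. The function name, the pattern, and the returned array keys are assumptions for illustration; the sketch returns an array instead of modifying the plugin token.

```php
<?php
function parsePluginContentsSketch(string $contents): ?array
{
    // Assumed form: "{name:key=value|key=value}body{name}". The "s"
    // modifier lets the body span several lines.
    $pattern = '/^\\{(?P<name>[a-zA-Z]+)(?::(?P<params>[^}]*))?\\}'
             . '(?:(?P<body>.*)\\{(?P=name)\\})?$/s';

    if (preg_match($pattern, $contents, $m) !== 1) {
        return null;
    }

    $parameters = array();
    foreach (explode('|', $m['params'] ?? '') as $parameter) {
        if ($parameter === '') {
            continue;
        }
        // A parameter without "=" is stored as a boolean flag.
        $pair = explode('=', $parameter, 2);
        $parameters[trim($pair[0])] = isset($pair[1]) ? trim($pair[1]) : true;
    }

    return array(
        'type'       => $m['name'],
        'parameters' => $parameters,
        'text'       => $m['body'] ?? '',
    );
}

print_r(parsePluginContentsSketch("{code:title=Bar|lang=php}\necho 1;\n{code}"));
```

The body is returned untouched, in line with the point above that plugin contents should bypass the normal wiki parser.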