eZ Component: Document, Design
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Design description
==================

ezcDocument interface
---------------------

Interface that defines an abstract document. Classes that implement this
interface are named after the format they handle, for example
'ezcDocumentHtml'. These classes create document objects from a particular type
of data (text, DOM, XML, file) with static functions like 'createFromText()'.
Once the object is created, its content can be retrieved in any type of data
supported by the format, using functions like 'getText()' or 'getXML()'. If the
requested type of data is not supported by the format, an error is thrown.

ezcDocConverter interface
-------------------------

Interface that defines an abstract conversion class. Concrete conversion
classes implement the conversion of a given document from one format to
another. The names of these classes follow the pattern ezcDocFormat1ToFormat2,
where Format1 and Format2 are format names like 'Html' or 'Docbook'. For
example, 'ezcDocHtmlToDocbook' implements conversion from HTML to DocBook.

The main function of this interface, 'convert()', takes a document in one
format and returns it in another. Both the argument and the return value are
objects of classes that implement 'ezcDocument'.

ezcDocParser class
------------------

Contains methods for parsing text documents using a formal grammar. The exact
formal grammars and format-specific callback functions (if needed) are set in
the format handling classes.

ezcDocXSLTTransformer class
---------------------------

A class based on ezcDocConverterBase for transforming DOM documents using
special rules and element callback handlers. It contains only the methods for
document transformation. The exact rules and element handlers are set in
derived classes.

ezcDocOutput class
------------------

Outputs the given document tree as text using a simple internal templating
system.
It also handles text indentation to show the structure of the document. The
exact templates for element output and the helper formatting functions are set
in derived classes.

ezcDocOutputTemplate class
--------------------------

Implemented in the DocumentTemplateTieIn component. It extends the ezcDocOutput
class so that the Template component is used for element output.

ezcDocValidator
---------------

Validates a document or a single element against its schema. This class takes
a RelaxNG schema as input.

Algorithms
==========

Transforming XML
----------------

XML documents are transformed using XSLT stylesheets and the XSL extension for
PHP. Transformations are done with the ezcDocXSLTTransformer class.

Parsing text/XML
----------------

The ezcDocParser class parses the input text and presents it as a DOM tree.
This is not an implementation of a real context-free parser. The parser assumes
that the input language is XML-like, i.e. that it consists of elements that
have an opening part, a closing part and some content between them (which may
contain other elements). Sometimes it is hard or impossible to formalize the
input in these terms. In that case special algorithms or custom element
handlers are used.

Document output
---------------

The ezcDocOutput class outputs the given document tree as text using a simple
internal templating system. It also handles text indentation to show the
structure of the document. The exact templates for element output and the
helper formatting functions are set in derived classes. Templates are simple
strings in which some character or sequence is replaced with another string
using str_replace.

The ezcDocOutputTemplate class is implemented in the DocumentTemplateTieIn
component. It extends ezcDocOutput so that the Template component is used for
element output.

Validating documents
--------------------

ezcDocValidator is used to validate a document or a single element against its
schema.
This class takes a RelaxNG schema as input and transforms it into an internal
format for fast processing. The processed schema is cached in a .php file for
faster access in the future.

The idea behind fast validation is to use regular expressions and strings. Here
is an example::

    ...
    ...

This RelaxNG schema for the element's content can be represented by the
regexp '#(elem2)*elem3#'.

The children of the validated document element can also be represented as a
string, for instance 'elem2elem2elem3', which is then checked against this
regexp. A similar process is used for attributes.

Examples
========

Converting Format1 to Format2
-----------------------------

::

    // Wrap the source text in a document object of the input format.
    $docFormat1 = new ezcDocumentText( $text, 'format1' );

    // Convert from Format1 to the internal format.
    $converter1 = new ezcDocFormat1ToInternal( $parameters1 );
    $docInternal = $converter1->convert( $docFormat1 );

    // Convert from the internal format to Format2.
    $converter2 = new ezcDocInternalToFormat2( $parameters2 );
    $docFormat2 = $converter2->convert( $docInternal );

    $result = $docFormat2->getText();

    // The same with static functions:
    $docFormat1 = new ezcDocumentText( $text, 'format1' );
    $docInternal = ezcDocFormat1ToInternal::convert( $docFormat1, $parameters1 );
    $docFormat2 = ezcDocInternalToFormat2::convert( $docInternal, $parameters2 );
    $result = $docFormat2->getText();
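
Validating with a regular expression
------------------------------------

The regular-expression idea from "Validating documents" can be illustrated with
a minimal sketch. It is not the ezcDocValidator implementation. The helper
names 'childrenToString()' and 'matchesContentModel()' are hypothetical, and
the '^' and '$' anchors are an assumption added here so that the whole child
sequence has to match (the pattern quoted in the text is unanchored). Only
PHP's DOM extension and 'preg_match()' are used.

```php
<?php
// Hypothetical helper (not part of the component): joins the names of the
// child elements into one string, e.g. "elem2elem2elem3".
function childrenToString(DOMElement $element): string
{
    $names = '';
    foreach ($element->childNodes as $child) {
        // Skip text nodes, comments and other non-element children.
        if ($child instanceof DOMElement) {
            $names .= $child->localName;
        }
    }
    return $names;
}

// Hypothetical helper: checks the child sequence against a content model
// written as a regular expression. The anchors are an assumption of this
// sketch; they force the entire sequence to match.
function matchesContentModel(DOMElement $element, string $pattern): bool
{
    $regexp = '#^(?:' . $pattern . ')$#';
    return preg_match($regexp, childrenToString($element)) === 1;
}

$doc = new DOMDocument();

// Children "elem2elem2elem3" satisfy "(elem2)*elem3".
$doc->loadXML('<elem1><elem2/><elem2/><elem3/></elem1>');
$valid = matchesContentModel($doc->documentElement, '(elem2)*elem3');

// Children "elem3elem2" do not: elem2 may not follow elem3.
$doc->loadXML('<elem1><elem3/><elem2/></elem1>');
$invalid = matchesContentModel($doc->documentElement, '(elem2)*elem3');

var_dump($valid);   // bool(true)
var_dump($invalid); // bool(false)
```

Concatenating names without a separator can be ambiguous when one element name
is a prefix of another, so a real implementation would probably need a
delimiter between names or a mapping to fixed-width tokens.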