~~~~~~~~~~~~~~~~~~~~~~~~~~~~
eZ components: Yaml (Design)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
:Author:   Thomas Nunninger (thomas@nunninger.info)
:Author:   Josef Roth (webmaster@josef-roth.com)
:Revision: $Rev$
:Date:     $Date$
:Status:   Draft

.. contents::

Scope
=====

The scope of this document is to describe the design of a new component which
reads from and writes to YAML streams.

Overview
========

The component provides two possibilities to work with YAML:

- Low-level YAML parsing and generating. This covers our implementation of
  the YAML specification.
- High-level handling of YAML streams or plain text. We provide some
  convenient classes that help to work with YAML documents in YAML streams or
  plain text strings directly. At this level you can even change the backend
  YAML implementation.

Supported versions
------------------

The component will support versions 1.0, 1.1 and 1.2. It defaults to version
1.2.

Layers
------

::

    +-- High-Level API ------------------------------+   +-- Document ------+
    |                                                |   |                  |
    | StreamReader and StreamWriter                  |   | Document         |
    | StringParser and StringGenerator               |   |                  |
    |                                                |   | AbstractNode and |
    | +-- AbstractBackend--------------------------+ |   | extended classes |
    | |                                            | |   |                  |
    | | +-------------------+  +-------------+     | |   +------------------+
    | | | Backend           |  | SyckBackend | ... | |
    +-+-+-------------------+--+-------------+-----+-+
        +-- Low-Level API --+  +-------------+
        |                   |  | ext/syck    | ...
        | Parser Generator  |  +-------------+
        |                   |
        | TagHandler        |
        +-------------------+

(Probably the ``ezcYamlTagHandler`` is not only used by the low-level API but
also by the different backends to parse their results as well. We need to look
into it while developing.)

This document starts with some use cases that show how to use the API of the
component. After that, the main classes are described in detail, and at the
end there is a short glossary explaining the meaning of some YAML-specific
terms.


Use cases
=========

In this section you find some code examples that show how to work with the
component. First the low-level API is described, then the usage of the
high-level API. Finally, working with the ``ezcYamlDocument`` and
``ezcYamlNode``\s is explained.

Low-level API
-------------

Parse
~~~~~

If you want to parse a YAML stream via the low-level API, you need an
``ezcYamlParser`` object.

If you want to influence the behavior of the parser, you create an
``ezcYamlOptions`` object. Via the options you tell the parser to behave like
a parser of a specific YAML version. You can also define tag handlers if
needed. (Tag handlers are used to extend the set of available datatypes of the
nodes' values.)

Then you create an instance of the ``ezcYamlParser`` using the options to
configure the parser's behavior.

::

    $options = new ezcYamlOptions();
    $options->defaultVersion = $version;
    $options->tagHandler[$tagName] = $tagHandlerClassName;

    $parser = new ezcYamlParser( $options );

In the next step, a YAML stream (serialized data) is taken (in our example a
text string) and each line of it is sent to the parser.

::

    $lines = explode( "\n", $someYamlString );
    foreach ( $lines as $line )
    {
        $state = $parser->parseLine( $line );

        switch ( $state )
        {
            case ezcYamlParser::STATE_FINISHED:
            case ezcYamlParser::STATE_FINISHED_AND_STARTED:
                if ( ezcYamlParser::STATE_FINISHED == $state )
                {
                    echo "Document explicitly finished.\n";
                }
                else
                {
                    echo "Document implicitly finished by starting a new one.\n";
                }
                print_r( $parser->getFinishedDocument() );
                break;

            case ezcYamlParser::STATE_STARTED:
                echo "New document started.\n";
                break;

            case ezcYamlParser::STATE_STILL_FINISHED:
                echo "Probably some 'no-ops' in the stream to keep it alive.\n";
                break;

            case ezcYamlParser::STATE_WAITING:
                echo "Waiting for a multi-line value to finish.\n";
                break;

            case ezcYamlParser::STATE_OK:
                echo "Line parsed.\n";
                break;
        }
    }

When the stream has ended, the last document may not have been explicitly
finished.
Thus you need to check whether there is still a document in the parser.

::

    if ( $state !== ezcYamlParser::STATE_FINISHED
         && $state !== ezcYamlParser::STATE_STILL_FINISHED )
    {
        if ( ezcYamlParser::STATE_WAITING == $state )
        {
            // The stream ended in the middle of a multi-line value.
            throw new Exception( 'Document has a started multi-line value.' );
        }
        else
        {
            print_r( $parser->getCurrentDocument() );
        }
    }

Generate
~~~~~~~~

In contrast to parsing (which is processed line by line), generating always
processes a whole data structure that is to be serialized to a YAML document.

If you want to serialize a data structure, you use an ``ezcYamlGenerator``
object. Again you can use an ``ezcYamlOptions`` object to define the version
and some tag handlers.

::

    $options = new ezcYamlOptions();
    $options->defaultVersion = $version;
    $options->tagHandler[$tagName] = $tagHandlerClassName;

    $generator = new ezcYamlGenerator( $options );

Then you need an ``ezcYamlDocument`` that mainly holds the data structure to
be serialized. For example, you can set a specific YAML version for the
document.

::

    $document = new ezcYamlDocument( $dataStructure );
    $document->version = $version;

    $serializedString = $generator->generate( $document );

High-level API for streams and convenient handling
--------------------------------------------------

As the low-level API is somewhat complex (at least when parsing), a high-level
API is provided. That API mainly covers two use cases:

- Handling of YAML streams (which can contain multiple YAML documents in
  succession).
- Convenient handling of text strings.

There is an additional layer that allows exchanging the YAML backend and
unifies the different backend implementations.

Backends
~~~~~~~~

The first step of the high-level API is the backend layer. It allows
exchanging the backend YAML implementation. Mainly it unifies the usage and
returned data of the different backend libraries, and adds support for
line-by-line parsing and for working with multiple YAML documents in one YAML
stream.
The backend adapters need to implement the ``ezcYamlAbstractBackend``
interface. It is close to the APIs of the ``ezcYamlParser`` and
``ezcYamlGenerator``:

- ``__construct( ezcYamlOptions $options = null )``

- ``parseLine( $line )`` (returns a state)

  Works similarly to ``ezcYamlParser::parseLine()``. The returned constants
  shown in the example of the low-level API above are mapped to constants of
  this class. Additionally ``ezcYamlAbstractBackend::STATE_OK_OR_WAITING``
  could be returned by backends that are based on whole-document parsing and
  therefore can only indicate whether a document is finished or not.

  In our implementation this feeds the parser with the next line. Other
  backends might collect the data until the document has ended and send it to
  the backend implementation when that happens.

- ``getFinishedDocument()`` and ``getCurrentDocument()``

  They are analogous to the example shown above for the low-level API.

- ``parseStream( $string )`` (returns an array of ``ezcYamlDocument`` objects)

  This is analogous to the existing implementations of e.g. ext/syck or
  sfYaml. If a backend fails to load multiple YAML documents in one step, the
  backend adapter can handle that. The backend of our implementation just
  feeds ``self::parseLine()`` line by line.

- ``generateStream( $yamlDocumentOrArrayOfYamlDocuments )`` (returns a string)

  This accesses the backend implementation's functionality to generate a
  serialized YAML stream of one or multiple YAML documents.

YAML streams
~~~~~~~~~~~~

Reading from a stream
`````````````````````

As before, you can use an ``ezcYamlOptions`` object. Additionally you need a
stream handle to create an ``ezcYamlStreamReader`` object.

@TODO: We are not sure about the name "stream reader", as it may be confusing
because of existing stream-related terms in programming languages/PHP. The
``ezcYamlStream*`` classes are a workaround, because PHP's stream features do
not allow writing or reading arrays or objects to or from a stream via a
stream wrapper.


::

    $options = new ezcYamlOptions();
    $options->defaultVersion = $version;
    $options->tagHandler[$tagName] = $tagHandlerClassName;

    $stream = fopen( 'example.yml', 'r' );

    $streamReader = new ezcYamlStreamReader( $stream, $options );

Using the stream reader, you can read the stream and get instances of
``ezcYamlDocument`` (which we used previously for generating more complex data
structures).

::

    while ( $document = $streamReader->readDocument() )
    {
        print_r( $document );
    }

Writing to a stream
```````````````````

Creating a stream writer and generating YAML documents works analogously. You
need a stream and can use an ``ezcYamlOptions`` object to create an
``ezcYamlStreamWriter`` object. Then you can write ``ezcYamlDocument`` objects
to the stream.

::

    $options = new ezcYamlOptions();
    $options->defaultVersion = $version;
    $options->tagHandler[$tagName] = $tagHandlerClassName;

    $stream = fopen( 'example.yml', 'w' );

    $streamWriter = new ezcYamlStreamWriter( $stream, $options );

    foreach ( $arrayContainingEzcYamlDocuments as $yamlDocument )
    {
        $streamWriter->writeDocument( $yamlDocument );
    }

Plain string processing
~~~~~~~~~~~~~~~~~~~~~~~

@TODO: We are not sure about the ``ezcYamlString*`` classes. At the moment we
do not see a real advantage, as the relevant methods are almost one-liners
accessing the analogous backend method. So the only advantage is the
possibility to exchange the backend. But as you need to provide the backend to
the options (if you want to change it), there seems to be no difference from
calling that backend directly. So the question is: is this a convenient layer
or is it just over-engineered?

If you do not want to work with streams but have a YAML document as a string,
or need the serialized YAML document returned as a string when generating, use
the ``ezcYamlStringParser`` and ``ezcYamlStringGenerator`` classes.

The code works analogously to stream reading and writing and should be
self-explanatory.


::

    $options = new ezcYamlOptions();
    $options->defaultVersion = $version;
    $options->tagHandler[$tagName] = $tagHandlerClassName;

    $generator = new ezcYamlStringGenerator( $options );

    $yamlDocument = new ezcYamlDocument( $someYamlDocument );
    $yamlDocument->version = $version;

    $serializedData = $generator->generate( $yamlDocument );

    // or multiple documents at once
    $serializedData = $generator->generate( $arrayOfYamlDocuments );

    $parser = new ezcYamlStringParser( $options );
    $arrayOfYamlDocuments = $parser->parse( $serializedData );

    print_r( $arrayOfYamlDocuments );

Changing the YAML backend
~~~~~~~~~~~~~~~~~~~~~~~~~

The classes of the high-level API (``ezcYamlStreamReader``,
``ezcYamlStreamWriter``, ``ezcYamlStringParser`` and
``ezcYamlStringGenerator``) offer the possibility to use other YAML
implementations for parsing and dumping. This is done via an additional option
of the ``ezcYamlOptions``. Here is an example for ext/syck.

::

    $options = new ezcYamlOptions();
    $options->backend = new ezcYamlSyckBackend();

If you use the high-level API, the chosen backend is used.

Working with the ``ezcYamlDocument``
------------------------------------

In several places the example code refers to ``ezcYamlDocument`` objects. This
section clarifies what they are.

@TODO: Shall we rename ``ezcYamlDocument`` to ``ezcYamlStructure``? That would
perhaps fit better with the terms described in the glossary, as "YAML
document" refers to the serialized document and not to the data structure.

An ``ezcYamlDocument`` object holds the structure representing a YAML
document. The ``ezcYamlDocument`` itself has these optional attributes:

- ``version``
- ``tagHandlers``

The structure (and the ``ezcYamlDocument`` itself) is based on nodes that
inherit from ``ezcYamlAbstractNode``. A node mainly has these attributes:

- ``comment``
- ``isTagExplicitlySet``
- ``data``

Collection nodes (like maps and sequences) provide two ways to work with the
data: methods and array access.

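
The difference between the two access styles can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
planned implementation: ``SketchScalarNode`` and ``SketchCollectionNode`` are
hypothetical stand-ins for the node classes, the sketch relies on PHP's
built-in ``ArrayAccess`` interface, and the type declarations assume PHP 8.0
or later.

::

    <?php
    // Hypothetical stand-in for a scalar node (not part of the design).
    class SketchScalarNode
    {
        public $comment = null;
        public $data;

        public function __construct( $data )
        {
            $this->data = $data;
        }
    }

    // Hypothetical stand-in for a collection node (map or sequence).
    class SketchCollectionNode implements ArrayAccess
    {
        public $comment = null;
        private $nodes = array();

        // Method access: always returns the node object itself.
        public function getNode( $key )
        {
            return $this->nodes[$key];
        }

        // Array access: scalar nodes are unwrapped to their plain value.
        public function offsetGet( mixed $key ): mixed
        {
            $node = $this->nodes[$key];
            return $node instanceof SketchScalarNode ? $node->data : $node;
        }

        // Array access: plain values and arrays are wrapped into nodes.
        public function offsetSet( mixed $key, mixed $value ): void
        {
            if ( is_array( $value ) )
            {
                $node = new SketchCollectionNode();
                foreach ( $value as $k => $v )
                {
                    $node[$k] = $v;
                }
            }
            elseif ( $value instanceof SketchScalarNode
                     || $value instanceof SketchCollectionNode )
            {
                $node = $value;
            }
            else
            {
                $node = new SketchScalarNode( $value );
            }

            if ( $key === null )
            {
                $this->nodes[] = $node;
            }
            else
            {
                $this->nodes[$key] = $node;
            }
        }

        public function offsetExists( mixed $key ): bool
        {
            return isset( $this->nodes[$key] );
        }

        public function offsetUnset( mixed $key ): void
        {
            unset( $this->nodes[$key] );
        }
    }

    $map = new SketchCollectionNode();
    $map['name'] = 'yaml';
    $map['tags'] = array( 'a', 'b' );

    var_dump( $map['name'] );                  // plain value via array access
    var_dump( $map->getNode( 'name' )->data ); // same value via the node object
    var_dump( $map['tags'][1] );               // nested array access

In this sketch, comments stay reachable only through the node objects, which
is why the method-based access is kept alongside the array access.
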

Methods
~~~~~~~

These methods are provided on collection node objects:

- ``getNode( $key )``
- ``getNodes()``
- ``addNode( $node )`` or ``addNode( $key, $node )``
- ``setNodes( array $nodes )``

Array access
~~~~~~~~~~~~

Via array access, you can use the data structure as if it were an array.
(That is probably much nicer to work with, and it is the "API" that other
implementations provide.)

If you use the array access, you can read or set the data. You do not see
anything of the object-oriented structure, but can work with the data as if it
were provided as an array. But there are some important things to note:

- You can get or set the data only (no comments and similar things).
- If you get a node containing scalar data, the scalar data is returned
  instead of the node object. (For retrieving the node objects, the methods
  named above exist.) In this context even nodes like
  ``ezcYamlDateTimeNode``\s are scalars, and the ``DateTime`` object is
  returned instead of the ``ezcYamlDateTimeNode``.
- If you set a scalar value, the scalar is automatically turned into an
  appropriate ``ezcYaml*Node``.
- If you set an array (or array of arrays), an appropriate node structure of
  collection and scalar nodes is created.

Main Classes
============

General classes
---------------

ezcYamlOptions
~~~~~~~~~~~~~~

Used as optional configuration for many classes from the low-level to the
high-level API. Supports manipulation of the parser version,
application-specific tag handlers, and exchange of the backend YAML
implementation.

ezcYamlAbstractNode
~~~~~~~~~~~~~~~~~~~

Base class for representing YAML documents in a node tree. The node structure
provides array-like handling as well as methods to deal with the tree.

There are two basic classes extending the abstract node:
ezcYamlAbstractScalarNode and ezcYamlAbstractCollectionNode.

Example classes extending those abstract classes: ezcYamlDocument,
ezcYamlStringNode, ezcYamlDateTimeNode, ezcYamlSequenceNode,
ezcYamlMappingNode.
More information can be found in the section "Working with the
``ezcYamlDocument``" above.

Low-level classes
-----------------

ezcYamlParser and ezcYamlGenerator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These classes provide our implementation of the YAML specification.

ezcYamlTagHandlerInterface
~~~~~~~~~~~~~~~~~~~~~~~~~~

Defines the interface for tag handling classes. The tag handlers are mainly
used by our YAML implementation. Perhaps they are also used by the backends to
unify the data structure when parsing or generating.

We provide an ezcYamlDefaultTagHandler. Users can use the interface to write
application-specific tag handlers.

High-level classes
------------------

ezcYamlAbstractBackend
~~~~~~~~~~~~~~~~~~~~~~

The backend layer provides the possibility to exchange the YAML
implementation. It is an adapter that unifies the usage and returned data of
the different backend libraries, and adds support for line-by-line parsing and
for working with multiple YAML documents in one YAML stream.

Two classes extending this class will be provided: ezcYamlSyckBackend and
ezcYamlBackend.

ezcYamlStreamReader and ezcYamlStreamWriter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These classes provide a convenient API for reading from and writing to YAML
streams. Internally they use the chosen backend implementation. The reader
mainly uses the parseLine() method of the backend.

ezcYamlStringParser and ezcYamlStringGenerator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In contrast to the stream classes, these classes provide an API for parsing
strings and returning strings. The parser mainly uses the parseStream() method
of the backend.

Glossary
========

Data Structure
  The term "data structure" in this document refers to structures in PHP. In
  contrast, a YAML document is a serialized, plain text representation of that
  data structure.

YAML stream
  A YAML stream is a stream of zero or more YAML documents.

YAML document
  A YAML document is a serialized representation of a data structure.

TAG
  Tags are used to explicitly define the datatype of a node's value in a YAML
  document. YAML includes some basic datatypes but you can extend them by
  application specific tag handlers.



..
   Local Variables:
   mode: rst
   fill-column: 79
   End:
   vim: et syn=rst tw=79