/* * Licensed to the Apache Software Foundation (ASF) under one * or more contributor license agreements. See the NOTICE file * distributed with this work for additional information * regarding copyright ownership. The ASF licenses this file * to you under the Apache License, Version 2.0 (the * "License"); you may not use this file except in compliance * with the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ /*! \mainpage
Avro is a data serialization system. See http://avro.apache.org/docs/current/ for background information.
Avro C++ is a C++ library which implements parts of the Avro Specification. The library includes the following functionality:
Although Avro does not require code generation, using it is the easiest way to get started with the Avro C++ library. The code generator reads a schema and generates a C++ header file that defines one or more C++ structs to represent the data for the schema, along with functions to encode and decode those structs. Even if you wish to write custom code to encode and decode your objects using the core functionality of Avro C++, the generated code can serve as an example of how to use that core functionality.
Let's walk through an example using a simple schema that represents a complex number:
File: cpx.json
\includelineno cpx.json

Note: All the example code given here can be found under the examples directory of the distribution.

Assume this JSON representation of the schema is stored in a file called cpx.json. To generate the code, issue the command:

avrogencpp -i cpx.json -o cpx.hh -n c

The -i flag specifies the input schema file and the -o flag specifies the output header file to generate. The generated C++ code will be in the namespace specified with the -n flag.
Among other things, the generated file will contain the following:

...
namespace c {
...
struct cpx {
    double re;
    double im;
};
...
}

cpx is a C++ representation of the Avro schema cpx.

Now let's see how to use the generated code to encode data into Avro and decode it back.

File: generated.cc
\includelineno generated.cc

In line 9, we construct a memory output stream. This indicates that we want to send the encoded Avro data into memory. In line 10, we construct a binary encoder, meaning the output should be encoded using the Avro binary standard. In line 11, we attach the output stream to the encoder. At any given time an encoder can write to only one output stream.
In line 14, we write the contents of c1 into the output stream using the encoder. Now the output stream contains the binary representation of the object. The rest of the code verifies that the data is indeed in the stream.
In line 17, we construct a memory input stream from the contents of the output stream. Thus the input stream has the binary representation of the object. In lines 18 and 19, we construct a binary decoder and attach the input stream to it. Line 22 decodes the contents of the stream into another object c2. Now c1 and c2 should have identical contents, which one can readily verify from the output of the program, which should be:
(1, 2.13)

Now, if you want to encode the data using Avro JSON encoding, you should use avro::jsonEncoder() instead of avro::binaryEncoder() in line 10 and avro::jsonDecoder() instead of avro::binaryDecoder() in line 18.
On the other hand, if you want to write the contents to a file instead of memory, you should use avro::fileOutputStream() instead of avro::memoryOutputStream() in line 9 and avro::fileInputStream() instead of avro::memoryInputStream() in line 17.
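
To make the binary encoding step of the walkthrough concrete, here is a minimal, library-free sketch of what the encoded bytes for cpx look like. It relies on two rules of the Avro binary encoding: a record is the concatenation of its fields in schema order, and a double is written as 8 bytes of little-endian IEEE 754. The names Cpx, putDouble, getDouble, encodeCpx and decodeCpx are hypothetical helpers for illustration only; they are not part of Avro C++, where this work is done by the generated code together with the encoder and decoder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(double) == 8, "sketch assumes a 64-bit IEEE 754 double");

// Hypothetical stand-in for the generated struct c::cpx.
struct Cpx {
    double re;
    double im;
};

// Append one double as 8 little-endian bytes.
inline void putDouble(std::vector<std::uint8_t>& out, double v) {
    std::uint64_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    for (int i = 0; i < 8; ++i) {
        out.push_back(static_cast<std::uint8_t>(bits >> (8 * i)));
    }
}

// Read one little-endian double starting at byte offset pos.
inline double getDouble(const std::vector<std::uint8_t>& in, std::size_t pos) {
    assert(pos + 8 <= in.size());
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; ++i) {
        bits |= static_cast<std::uint64_t>(in[pos + i]) << (8 * i);
    }
    double v;
    std::memcpy(&v, &bits, sizeof v);
    return v;
}

// A record is its fields written back to back, with no names or tags.
inline std::vector<std::uint8_t> encodeCpx(const Cpx& c) {
    std::vector<std::uint8_t> out;
    putDouble(out, c.re);
    putDouble(out, c.im);
    return out;
}

inline Cpx decodeCpx(const std::vector<std::uint8_t>& buf) {
    assert(buf.size() == 16);
    return Cpx{getDouble(buf, 0), getDouble(buf, 8)};
}
```

Because the bytes carry no field names or type tags, the reader can only interpret them correctly if it knows the schema the writer used.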
The section above demonstrated most of what you need to know to get started reading and writing objects using the Avro C++ code generator. The following sections cover additional topics.
The library provides some utilities to read a schema that is stored in a JSON file:
File: schemaload.cc
\includelineno schemaload.cc

This reads the file and parses the JSON schema into an in-memory schema object of type avro::ValidSchema. If, for some reason, the schema is not valid, the cpxSchema object will not be set, and an exception will be thrown.

If you always use the Avro code generator, you don't really need the in-memory schema objects. But if you use custom objects and routines to encode or decode Avro data, you will need the schema objects. Other uses of schema objects are generic data objects and schema resolution, described in the following sections.
But wait, how does Avro know that complex
To ensure that you indeed use the correct type, you can use
the validating encoder and decoder. Here is how:
File: validating.cc
\includelineno validating.cc
Here, instead of using the plain binary encoder, you use a validating encoder
backed by a binary encoder. Similarly, instead of using the plain binary
decoder, you use a validating decoder backed by a binary decoder. Now,
if you use std::complex
You can use any encoder behind the validating encoder and any decoder
behind the validating decoder. But in practice, only the binary encoder
and the binary decoder have no knowledge of the underlying schema.
All other encoders (JSON encoder) and decoders (JSON decoder,
resolving decoder) do know about the schema and they validate internally. So,
fronting them with a validating encoder or validating decoder is wasteful.
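
The idea of fronting a schema-unaware encoder with a validating one can be sketched without the library. In this toy model the "schema" is reduced to the expected sequence of primitive types, and the wrapper rejects any write that does not match the next expected type before it reaches the underlying encoder. All names here (Kind, PlainEncoder, ValidatingEncoder) are hypothetical; the real validating encoder in Avro C++ works from an avro::ValidSchema and covers the full Avro type system.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum class Kind { Float, Double };

// A schema-unaware encoder: it writes whatever it is asked to write.
// To keep the sketch short it only counts the bytes it would emit.
class PlainEncoder {
public:
    void encodeFloat(float) { bytes_ += 4; }
    void encodeDouble(double) { bytes_ += 8; }
    std::size_t bytes() const { return bytes_; }

private:
    std::size_t bytes_ = 0;
};

// Checks each write against the expected type sequence, then forwards it.
class ValidatingEncoder {
public:
    ValidatingEncoder(std::vector<Kind> expected, PlainEncoder& base)
        : expected_(std::move(expected)), base_(base) {}

    void encodeFloat(float v) {
        check(Kind::Float, "float");
        base_.encodeFloat(v);
    }

    void encodeDouble(double v) {
        check(Kind::Double, "double");
        base_.encodeDouble(v);
    }

    // True once every expected value has been written.
    bool complete() const { return next_ == expected_.size(); }

private:
    void check(Kind actual, const char* name) {
        if (next_ >= expected_.size() || expected_[next_] != actual) {
            throw std::runtime_error(
                std::string("schema mismatch: unexpected ") + name);
        }
        ++next_;
    }

    std::vector<Kind> expected_;
    PlainEncoder& base_;
    std::size_t next_ = 0;
};
```

With the expected sequence {Double, Double}, two encodeDouble() calls pass through, while an encodeFloat() call throws before any bytes are written. An encoder that already tracks the schema has no need for this extra layer.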
For example, you have a reader which is interested only in the imaginary part
of a complex number while the writer writes both the real and imaginary parts.
It is possible to do automatic schema resolution between the writer's schema
and the reader's schema, as shown below.
File: imaginary.json
\includelineno imaginary.json
Please notice how the reading part of the example at line 42 reads as if
the stream contains the data corresponding to its schema. The schema resolution
is automatically done by the resolving decoder.
In this example, we have used a simple (somewhat artificial) projection (where the set of fields in
the reader's schema is a subset of set of fields in the writer's). But more
complex resolutions are allowed by the Avro specification.
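
The projection case can be illustrated with a small, library-free sketch. The data is laid out in the writer's field order, so a resolving reader walks the writer's fields, keeps those that its own schema names, and skips the rest. The sketch assumes every field is a double (8 little-endian bytes), and the helper names appendDouble, readDouble and resolveRecord are hypothetical; Avro C++ performs this work inside its resolving decoder, which supports many more cases than this.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Append one double as 8 little-endian bytes.
inline void appendDouble(std::vector<std::uint8_t>& out, double v) {
    std::uint64_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    for (int i = 0; i < 8; ++i) {
        out.push_back(static_cast<std::uint8_t>(bits >> (8 * i)));
    }
}

// Read one little-endian double starting at byte offset pos.
inline double readDouble(const std::vector<std::uint8_t>& buf, std::size_t pos) {
    if (pos + 8 > buf.size()) {
        throw std::out_of_range("truncated input");
    }
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; ++i) {
        bits |= static_cast<std::uint64_t>(buf[pos + i]) << (8 * i);
    }
    double v;
    std::memcpy(&v, &bits, sizeof v);
    return v;
}

// Walk the writer's fields in wire order; keep the ones the reader asks for.
inline std::map<std::string, double> resolveRecord(
    const std::vector<std::string>& writerFields,
    const std::vector<std::string>& readerFields,
    const std::vector<std::uint8_t>& buf) {
    std::map<std::string, double> result;
    std::size_t pos = 0;
    for (const std::string& name : writerFields) {
        const bool wanted = std::find(readerFields.begin(), readerFields.end(),
                                      name) != readerFields.end();
        if (wanted) {
            result[name] = readDouble(buf, pos);
        }
        pos += 8;  // every field in this sketch is a double
    }
    // This sketch has no default values, so every reader field must be found.
    if (result.size() != readerFields.size()) {
        throw std::runtime_error("reader field missing from writer schema");
    }
    return result;
}
```

Resolving a buffer written with fields {re, im} against a reader that lists only {im} yields a single entry for im; the bytes for re are stepped over without being interpreted.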
Generic data objects
A third way to encode and decode data is to use Avro's generic datum.
Avro's generic datum allows you to read arbitrary data corresponding to
an arbitrary schema into a generic object. You need not know anything
about the schema or data at compile time.
Here is an example of how to use the generic datum.
File: generic.cc
\includelineno generic.cc
In this example, we encode the data using generated code and decode it with
generic datum. Then we examine the contents of the generic datum and extract
them. Please see \ref avro::GenericDatum for more details on how to use it.
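
The core idea behind a generic datum, decoding by consulting a schema that is only known at run time, can be sketched without the library. The toy below supports two Avro primitive types, boolean (one byte, 0 or 1) and double (8 little-endian bytes), and describes a record schema as a list of named fields. Type, Field, GenericValue, GenericRecord, appendDouble and decodeGeneric are hypothetical names for this illustration; avro::GenericDatum offers a much richer interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum class Type { Boolean, Double };

// Run-time description of one record field.
struct Field {
    std::string name;
    Type type;
};

// A value whose type is only known at run time.
struct GenericValue {
    Type type;
    bool b;    // meaningful when type == Type::Boolean
    double d;  // meaningful when type == Type::Double
};

using GenericRecord = std::vector<std::pair<std::string, GenericValue>>;

// Append one double as 8 little-endian bytes (used to build sample input).
inline void appendDouble(std::vector<std::uint8_t>& out, double v) {
    std::uint64_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    for (int i = 0; i < 8; ++i) {
        out.push_back(static_cast<std::uint8_t>(bits >> (8 * i)));
    }
}

// Decode a record by dispatching on each field's run-time type.
inline GenericRecord decodeGeneric(const std::vector<Field>& schema,
                                   const std::vector<std::uint8_t>& buf) {
    GenericRecord rec;
    std::size_t pos = 0;
    for (const Field& f : schema) {
        GenericValue v{f.type, false, 0.0};
        const std::size_t width = (f.type == Type::Boolean) ? 1 : 8;
        if (pos + width > buf.size()) {
            throw std::out_of_range("truncated input");
        }
        if (f.type == Type::Boolean) {
            v.b = buf[pos] != 0;
        } else {
            std::uint64_t bits = 0;
            for (int i = 0; i < 8; ++i) {
                bits |= static_cast<std::uint64_t>(buf[pos + i]) << (8 * i);
            }
            std::memcpy(&v.d, &bits, sizeof v.d);
        }
        pos += width;
        rec.emplace_back(f.name, v);
    }
    return rec;
}
```

The caller inspects each value's type tag before reading it, which mirrors how the example above examines the generic datum before extracting its contents.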
Reading data with a schema different from that of the writer
It is possible to read data written according to one schema
using a different schema, provided the reader's schema and the writer's
schema are compatible according to Avro's schema resolution rules.
avrogencpp -i imaginary.json -o imaginary.hh -n i
File: resolving.cc
\includelineno resolving.cc
In this example, the writer and the reader deal with different schemas;
both are records with the same name, cpx. The writer's schema has two fields and
the reader's has just one. We generated code for the writer's schema in namespace
c and for the reader's in namespace i.
Using Avro data files
The Avro specification defines a format for data files, and Avro C++ implements
it. The code below demonstrates how to use an
Avro data file to store and retrieve a collection of objects
corresponding to a given schema.
File: datafile.cc
\includelineno datafile.cc
Please see DataFile.hh for more details.
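
For orientation, here is a small library-free sketch of two building blocks of the data file format as described in the Avro specification: the 4-byte magic at the start of a file ('O', 'b', 'j', 1), and the zig-zag variable-length encoding of the two longs (object count and byte size) that begin each data block. The helper names hasAvroMagic, encodeLong and blockPrefix are hypothetical and are not part of Avro C++; the classes in DataFile.hh handle the complete format, including the metadata and sync markers that this sketch omits.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// An Avro data file starts with the bytes 'O', 'b', 'j', 1.
inline bool hasAvroMagic(const std::vector<std::uint8_t>& bytes) {
    return bytes.size() >= 4 && bytes[0] == 'O' && bytes[1] == 'b' &&
           bytes[2] == 'j' && bytes[3] == 1;
}

// Encode a long with zig-zag mapping followed by base-128 varint bytes.
inline std::vector<std::uint8_t> encodeLong(std::int64_t n) {
    // Zig-zag: interleave negative and non-negative values so that
    // small magnitudes of either sign map to small unsigned numbers.
    const std::uint64_t sign = n < 0 ? ~std::uint64_t{0} : std::uint64_t{0};
    std::uint64_t z = (static_cast<std::uint64_t>(n) << 1) ^ sign;

    std::vector<std::uint8_t> out;
    while (z >= 0x80) {
        // Low 7 bits first; the high bit marks "more bytes follow".
        out.push_back(static_cast<std::uint8_t>((z & 0x7f) | 0x80));
        z >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(z));
    return out;
}

// The start of a data block: object count, then size of the objects in bytes.
inline std::vector<std::uint8_t> blockPrefix(std::int64_t objectCount,
                                             std::int64_t byteSize) {
    std::vector<std::uint8_t> out = encodeLong(objectCount);
    const std::vector<std::uint8_t> size = encodeLong(byteSize);
    out.insert(out.end(), size.begin(), size.end());
    return out;
}
```

For example, a block holding two uncompressed cpx records (16 bytes each) would begin with the encodings of 2 and 32, followed by the 32 bytes of record data.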
*/