Avro C ====== The current version of Avro is +{avro_version}+. The current version of +libavro+ is +{libavro_version}+. This document was created +{docdate}+. == Introduction to Avro Avro is a data serialization system. Avro provides: * Rich data structures. * A compact, fast, binary data format. * A container file, to store persistent data. * Remote procedure call (RPC). This document will focus on the C implementation of Avro. To learn more about Avro in general, http://hadoop.apache.org/avro/[visit the Avro website]. == Introduction to Avro C .... ___ ______ / |_ ___________ / ____/ / /| | | / / ___/ __ \ / / / ___ | |/ / / / /_/ / / /___ /_/ |_|___/_/ \____/ \____/ .... [quote,Waldi Ravens,(walra%moacs11 @ nl.net) 94/03/18] ____ A C program is like a fast dance on a newly waxed dance floor by people carrying razors. ____ The C implementation has been tested on +MacOSX+ and +Linux+ but, over time, the number of support OSes should grow. Please let us know if you're using +Avro C+ on other systems. There are no dependencies on external libraries. We embedded http://www.digip.org/jansson/[Jansson] into +Avro C+ for parsing JSON into schema structures. The C implementation supports: * binary encoding/decoding of all primitive and complex data types * storage to an Avro Object Container File * schema resolution, promotion and projection * validating and non-validating mode for writing Avro data The C implementation is lacking: * RPC To learn about the API, take a look at the examples and reference files later in this document. We're always looking for contributions so, if you're a C hacker, please feel free to http://hadoop.apache.org/avro/[submit patches to the project]. == Reference Counting +Avro C+ does reference counting for all schema and data objects. When the number of references drops to zero, the memory is freed. For example, to create and free a string, you would use: ---- avro_datum_t string = avro_string("This is my string"); ... avro_datum_decref(string); ---- Things get a little more complicated when you consider more elaborate schema and data structures. For example, let's say that you create a record with a single string field: ---- avro_datum_t example = avro_record("Example"); avro_datum_t solo_field = avro_string("Example field value"); avro_record_set(example, "solo", solo_field); ... avro_datum_decref(example); ---- In this example, the +solo_field+ datum would *not* be freed since it has two references: the original reference and a reference inside the +Example+ record. The +avro_datum_decref(example)+ call drops the number of reference to one. If you are finished with the +solo_field+ schema, then you need to +avro_schema_decref(solo_field)+ to completely dereference the +solo_field+ datum and free it. == Wrap It and Give It You'll notice that some datatypes can be "wrapped" and "given". This allows C programmers the freedom to decide who is responsible for the memory. Let's take strings for example. To create a string datum, you have three different methods: ---- avro_datum_t avro_string(const char *str); avro_datum_t avro_wrapstring(const char *str); avro_datum_t avro_givestring(const char *str); ---- If you use, +avro_string+ then +Avro C+ will make a copy of your string and free it when the datum is dereferenced. In some cases, especially when dealing with large amounts of data, you want to avoid this memory copy. That's where +avro_wrapstring+ and +avro_givestring+ can help. If you use, +avro_wrapstring+ then +Avro C+ will do no memory management at all. It will just save a pointer to your data and it's your responsibility to free the string. WARNING: When using +avro_wrapstring+, do not free the string before you dereference the string datum with +avro_datum_decref()+. Lastly, if you use +avro_givestring+ then +Avro C+ will free the string later when the datum is dereferenced. In a sense, you are "giving" responsibility for freeing the string to +Avro C+. [WARNING] =============================== Don't "give" +Avro C+ a string that you haven't allocated from the heap with e.g. +malloc+ or +strdup+. For example, *don't* do this: ---- avro_datum_t bad_idea = avro_givestring("This isn't allocated on the heap"); ---- =============================== == Schema Validation If you want to write a datum, you would use the following function [source,c] ---- int avro_write_data(avro_writer_t writer, avro_schema_t writers_schema, avro_datum_t datum); ---- If you pass in a +writers_schema+, then you +datum+ will be validated *before* it is sent to the +writer+. This check ensures that your data has the correct format. If you are certain your datum is correct, you can pass a +NULL+ value for +writers_schema+ and +Avro C+ will not validate before writing. NOTE: Data written to an Avro File Object Container is always validated. == Examples [quote,Dante Hicks] ____ I'm not even supposed to be here today! ____ Imagine you're a free-lance hacker in Leonardo, New Jersey and you've been approached by the owner of the local *Quick Stop Convenience* store. He wants you to create a contact database case he needs to call employees to work on their day off. You might build a simple contact system using Avro C like the following... [source,c] ---- include::../examples/quickstop.c[] ---- When you compile and run this program, you should get the following output ---- Successfully added Hicks, Dante id=1 Successfully added Graves, Randal id=2 Successfully added Loughran, Veronica id=3 Successfully added Bree, Caitlin id=4 Successfully added Silent, Bob id=5 Successfully added ???, Jay id=6 Avro is compact. Here is the data for all 6 people. | 02 0A 44 61 6E 74 65 0A | 48 69 63 6B 73 1C 28 35 | ..Dante.Hicks.(5 | 35 35 29 20 31 32 33 2D | 34 35 36 37 40 04 0C 52 | 55) 123-4567@..R | 61 6E 64 61 6C 0C 47 72 | 61 76 65 73 1C 28 35 35 | andal.Graves.(55 | 35 29 20 31 32 33 2D 35 | 36 37 38 3C 06 10 56 65 | 5) 123-5678<..Ve | 72 6F 6E 69 63 61 10 4C | 6F 75 67 68 72 61 6E 1C | ronica.Loughran. | 28 35 35 35 29 20 31 32 | 33 2D 30 39 38 37 38 08 | (555) 123-09878. | 0E 43 61 69 74 6C 69 6E | 08 42 72 65 65 1C 28 35 | .Caitlin.Bree.(5 | 35 35 29 20 31 32 33 2D | 32 33 32 33 36 0A 06 42 | 55) 123-23236..B | 6F 62 0C 53 69 6C 65 6E | 74 1C 28 35 35 35 29 20 | ob.Silent.(555) | 31 32 33 2D 36 34 32 32 | 3A 0C 06 4A 61 79 06 3F | 123-6422:..Jay.? | 3F 3F 1C 28 35 35 35 29 | 20 31 32 33 2D 39 31 38 | ??.(555) 123-918 | 32 34 .. .. .. .. .. .. | .. .. .. .. .. .. .. .. | 24.............. Now let's read all the records back out 1 | Dante | Hicks | (555) 123-4567 | 32 2 | Randal | Graves | (555) 123-5678 | 30 3 | Veronica | Loughran | (555) 123-0987 | 28 4 | Caitlin | Bree | (555) 123-2323 | 27 5 | Bob | Silent | (555) 123-6422 | 29 6 | Jay | ??? | (555) 123-9182 | 26 Use projection to print only the First name and phone numbers Dante | (555) 123-4567 | Randal | (555) 123-5678 | Veronica | (555) 123-0987 | Caitlin | (555) 123-2323 | Bob | (555) 123-6422 | Jay | (555) 123-9182 | ---- The *Quick Stop* owner was so pleased, he asked you to create a movie database for his *RST Video* store. == Reference files === avro.h The +avro.h+ header file contains the complete public API for +Avro C+. The documentation is rather sparse right now but we'll be adding more information soon. [source,c] ---- include::../src/avro.h[] ---- === test_avro_data.c Another good way to learn how to encode/decode data in +Avro C+ is to look at the +test_avro_data.c+ unit test. This simple unit test checks that all the avro types can be encoded/decoded correctly. [source,c] ---- include::../tests/test_avro_data.c[] ----