Blur is a table-based query system. Within a single shard cluster there can be many different tables, each with its own schema, shard size, analyzers, and so on. Each table contains Rows. A Row contains a row id (a Lucene StringField internally) and many Records. A Record has a record id (a Lucene StringField internally), a family (a Lucene StringField internally), and many Columns. A Column contains a name and a value; both are Strings in the API, but the value can be interpreted as different types. All base Lucene Field types are supported: Text, String, Long, Int, Double, and Float.

We start with the most basic structure and build on it.

Columns

Columns contain a name and a value. Both are Strings in the API, but the value can be interpreted as an Int, Float, Long, Double, String, or Text. Columns default to the Text type, which is analyzed during indexing.

Column {"name" => "value"}
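
As a rough illustration of how one String value from the API can be read as different types, the sketch below parses a raw value according to a type name. The class ColumnValueSketch, the method interpret, and the lowercase type names other than "text" and "string" are assumptions made for this example; this is not Blur's own conversion code.

```java
import java.util.Locale;

// Hypothetical helper, not part of Blur: shows how one String value from the
// API could be read as different types.
final class ColumnValueSketch {

  // Parses the raw String according to the (assumed) type name.
  static Object interpret(String typeName, String rawValue) {
    switch (typeName.toLowerCase(Locale.ROOT)) {
      case "int":
        return Integer.parseInt(rawValue);
      case "long":
        return Long.parseLong(rawValue);
      case "float":
        return Float.parseFloat(rawValue);
      case "double":
        return Double.parseDouble(rawValue);
      case "string": // kept as one untokenized value
      case "text":   // analyzed (tokenized) during indexing
        return rawValue;
      default:
        throw new IllegalArgumentException("Unknown type: " + typeName);
    }
  }
}
```

The value travels through the API as a String in every case; only the interpretation at indexing and query time differs.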

Records

A Record contains a record id, a family, and one or more Columns.

Record {
  "recordId" => "1234",
  "family" => "family1",
  "columns" => [
    Column {"column1" => "value1"},
    Column {"column2" => "value2"},
    Column {"column2" => "value3"},
    Column {"column3" => "value4"}
  ]
}

Quick Tip!

Column names do not have to be unique within a Record, so you can treat multiple Columns with the same name as an array of values. The order of the values is maintained.
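
The tip above can be sketched as follows. The nested Column class and the valuesOf and exampleColumns methods are hypothetical stand-ins written for this illustration; they are not the Blur API classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the Column and Record structures shown above;
// they are not the Blur API classes.
final class MultiValueSketch {

  static final class Column {
    final String name;
    final String value;

    Column(String name, String value) {
      this.name = name;
      this.value = value;
    }
  }

  // Collects, in their original order, every value stored under columnName.
  static List<String> valuesOf(List<Column> columns, String columnName) {
    List<String> values = new ArrayList<>();
    for (Column column : columns) {
      if (column.name.equals(columnName)) {
        values.add(column.value);
      }
    }
    return values;
  }

  // Builds the column list of the example Record above.
  static List<Column> exampleColumns() {
    List<Column> columns = new ArrayList<>();
    columns.add(new Column("column1", "value1"));
    columns.add(new Column("column2", "value2"));
    columns.add(new Column("column2", "value3"));
    columns.add(new Column("column3", "value4"));
    return columns;
  }
}
```

Calling valuesOf(exampleColumns(), "column2") yields the two values value2 and value3, in that order.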

Rows

Rows contain a row id and a list of Records.

Row {
  "id" => "r-5678",
  "records" => [
    Record {
      "recordId" => "1234",
      "family" => "family1",
      "columns" => [
        Column {"column1" => "value1"},
        Column {"column2" => "value2"},
        Column {"column2" => "value3"},
        Column {"column3" => "value4"}
      ]
    },
    Record {
      "recordId" => "9012",
      "family" => "family1",
      "columns" => [
        Column {"column1" => "value1"}
      ]
    },
    Record {
      "recordId" => "4321",
      "family" => "family2",
      "columns" => [
        Column {"column16" => "value1"}
      ]
    }
  ]
}

Custom types in Blur allow you to create your own types in Lucene and to plug them into the query parser so that queries can use your custom type.

Creating

You will need to extend the "org.apache.blur.analysis.FieldTypeDefinition" class found in the blur-query module. If you need an Analyzer other than the StandardAnalyzer used in the "text" type, extend "org.apache.blur.analysis.type.TextFieldTypeDefinition" and make the appropriate changes.

For types that require custom query parsing or custom "org.apache.lucene.index.IndexableField" manipulation without the use of an Analyzer, extend "org.apache.blur.analysis.type.CustomFieldTypeDefinition".

Example

Below is a simple type that is basically the same as the "string" type; however, it is implemented by extending "org.apache.blur.analysis.type.CustomFieldTypeDefinition".

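import java.util.Map;

import org.apache.blur.thrift.generated.Blur;
import org.apache.blur.thrift.generated.Column;
import org.apache.blur.thrift.generated.ColumnDefinition;
import org.apache.hadoop.conf.Configuration;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class ExampleType extends CustomFieldTypeDefinition {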

  private String _fieldNameForThisInstance;

  /**
   * Get the name of the type.
   * 
   * @return the name.
   */
  @Override
  public String getName() {
    return "example";
  }

  /**
   * Configures this instance for the type.
   * 
   * @param fieldNameForThisInstance
   *          the field name for this instance.
   * @param properties
   *          the properties passed into this type definition from the
   *          {@link Blur.Iface#addColumnDefinition(String, ColumnDefinition)}
   *          method.
   * @param configuration
   *          the configuration passed to this type definition.
   */
  @Override
  public void configure(String fieldNameForThisInstance, Map<String, String> properties, 
                        Configuration configuration) {
    _fieldNameForThisInstance = fieldNameForThisInstance;
  }

  /**
   * Create {@link Field}s for the index as well as for storing the original
   * data for retrieval.
   * 
   * @param family
   *          the family name.
   * @param column
   *          the column that holds the name and value.
   * 
   * @return the {@link Iterable} of {@link Field}s.
   */
  @Override
  public Iterable<? extends Field> getFieldsForColumn(String family, Column column) {
    String name = family + "." + column.getName();
    String value = column.getValue();
    return makeIterable(new StringField(name, value, Store.YES));
  }

  /**
   * Create {@link Field}s for the index. Do NOT store the data because this
   * is a sub column.
   * 
   * @param family
   *          the family name.
   * @param column
   *          the column that holds the name and value.
   * @param subName
   *          the sub column name.
   * 
   * @return the {@link Iterable} of {@link Field}s.
   */
  @Override
  public Iterable<? extends Field> getFieldsForSubColumn(String family, Column column, 
       String subName) {
    String name = family + "." + column.getName() + "." + subName;
    String value = column.getValue();
    return makeIterable(new StringField(name, value, Store.NO));
  }

  /**
   * Gets the query from the text provided by the query parser.
   * 
   * @param text
   *          the text provided by the query parser.
   * @return the {@link Query}.
   */
  @Override
  public Query getCustomQuery(String text) {
    return new TermQuery(new Term(_fieldNameForThisInstance, text));
  }

}
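
To make the field naming in the example type easier to see in isolation, here is a minimal sketch of the two name constructions it uses. The class FieldNameSketch and its method names are hypothetical and exist only for this illustration; the concatenation itself mirrors getFieldsForColumn and getFieldsForSubColumn above.

```java
// Hypothetical helper mirroring the name construction used in the example
// type above; it is not a Blur class.
final class FieldNameSketch {

  // Field name for a column: "<family>.<column>".
  static String columnFieldName(String family, String columnName) {
    return family + "." + columnName;
  }

  // Field name for a sub column: "<family>.<column>.<subName>".
  static String subColumnFieldName(String family, String columnName, String subName) {
    return columnFieldName(family, columnName) + "." + subName;
  }
}
```

For family "family1" and column "column1", the column field is named family1.column1. In the example type, that field is created with Store.YES, while sub column fields are created with Store.NO.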

Distributing

Once you have created and tested your custom type, copy the jar file containing it to every server in the cluster. The jar file must be located in the $BLUR_HOME/lib directory. All the servers must then be restarted so that the jar file is picked up on the classpath.

In a later version of Blur we hope to make this a dynamic operation that can be performed without restarting the cluster.

Using

You can add your custom type either to the entire cluster or to a single table.

Cluster Wide

For cluster-wide configuration, add the new field types to the blur-site.properties file on each server.

blur.fieldtype.customtype1=org.apache.blur.analysis.type.ExampleType1
blur.fieldtype.customtype2=org.apache.blur.analysis.type.ExampleType2
...
Please note that only the "blur.fieldtype." prefix of the property name is used, because the type gets its name from its "getName" method. However, the property names must be unique within the file.
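
The prefix rule can be sketched as follows. This is an illustration only, not Blur's actual loading code; the class FieldTypePropertySketch and the method fieldTypeClasses are hypothetical names chosen for the example.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical sketch, not Blur's actual loader: selects the properties that
// declare field types by their "blur.fieldtype." prefix.
final class FieldTypePropertySketch {

  static final String PREFIX = "blur.fieldtype.";

  // Returns property name -> class name for every key that starts with
  // PREFIX, sorted by property name. The text after the prefix is never
  // interpreted; it only keeps the keys distinct.
  static Map<String, String> fieldTypeClasses(Properties properties) {
    Map<String, String> result = new TreeMap<>();
    for (String key : properties.stringPropertyNames()) {
      if (key.startsWith(PREFIX)) {
        result.put(key, properties.getProperty(key));
      }
    }
    return result;
  }
}
```

Because the keys live in a map, two entries with the same property name would overwrite each other, which is why the names must be unique even though the suffix carries no meaning.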

Single Table

For a single-table configuration, add the new field types to the tableProperties map in the TableDescriptor when you define the table.

tableDescriptor.putToTableProperties("blur.fieldtype.customtype1", 
	"org.apache.blur.analysis.type.ExampleType1");
tableDescriptor.putToTableProperties("blur.fieldtype.customtype2", 
	"org.apache.blur.analysis.type.ExampleType2");
...
Please note that only the "blur.fieldtype." prefix of the property name is used, because the type gets its name from its "getName" method. However, the property names must be unique within the map.