RDF Generator (RDFGen) HOWTO

Objectives

RDFGen generates large volumes of synthetic RDF data for evaluating the performance of software systems. It can generate complex-structured RDF data while giving a high level of control over the characteristics of the generated data.

How does RDFGen work?

As shown in Figure 1, RDFGen reads the RDFGen specification file, the RDFGen vocabulary, and the application vocabularies, then parses the specification and stores it in a Jena model (called the specification model). It then uses the options defined in the specification model to generate RDF data for each application class. For each class, a Jena model is created to hold the generated data; once fully constructed, that model is written to an RDF file.

Summary of RDFGen vocabulary

| Keyword | Meaning |
| --- | --- |
| Uniform | Uniform distribution. |
| Zipf | Zipfian distribution. |
| Fixed | Fixed value. |
| BreadthFirst | Breadth-first tree traversal order. |
| DepthFirst | Depth-first tree traversal order. |
| True | Boolean value true. |
| False | Boolean value false. |

| Keyword | Meaning | Typical Value |
| --- | --- | --- |
| randomseed | The seed for the random number generator. | positive integer value |
| datasetname | The name of the dataset to be generated. | string value |
| numclassinst | The number of instances of the class. | positive integer value |
| classnamespace | The namespace of the class. | string value, e.g. "http://somewhere.com/" |
| propcarddist | The distribution of the property cardinality. | predefined distribution |
| propvaldist | The distribution of the property value. | predefined distribution |
| propnamespace | The namespace of the property. | string value |
| propvallang | The language encoding of the string, if the property value is a string. | string value, e.g. "UTF8" |
| propvalstrfromvoc | Flag indicating whether to generate fixed-length strings on the fly from an alphabet (False) or to select strings from a string vocabulary (True). | True or False |
| propvalstrlen | The length of the string, if the property value is a fixed-length string. | positive integer value |
| propvalvocfile | The name of the vocabulary file containing all the candidate strings, if "propvalstrfromvoc" is True. | string value |
| propvalsymfile | The name of the alphabet file, if "propvalstrfromvoc" is False. | string value |
| resvalnamespace | The namespace of the resource, if the property value is of resource type. | string value |
| propvalselftree | Flag indicating whether to organize the resources connected by the property as a tree, if the property value is of resource type. | True or False |
| resvaltreetraverse | The tree traversal order, if the property value is of resource type and "propvalselftree" is True. | BreadthFirst or DepthFirst |
| disttype | The type of the distribution. | Fixed, Uniform or Zipf |
| uniformmin | The minimum value of the uniform distribution. | double value |
| uniformmax | The maximum value of the uniform distribution. | double value |
| zipfmax | The maximum value of the Zipfian distribution. | positive long value |
| zipfskew | The skew of the Zipfian distribution. | double value within [0, 1] |
| constant | The constant value of the distribution, if "disttype" is "Fixed". | number value |
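To make the distribution options concrete, here is a minimal Python sketch of what drawing a value from each disttype could look like. It is an illustration only: this HOWTO does not describe how RDFGen samples internally, the function name make_sampler and the dict layout are hypothetical, and the Zipfian case assumes the common definition in which rank k has probability proportional to 1 / k^zipfskew for k from 1 to zipfmax. The parameter values are taken from dist1, dist2 and dist4 in the example specification later in this HOWTO.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def make_sampler(spec, rng):
    """Return a zero-argument function that draws one value from `spec`.

    `spec` is a plain dict keyed by the RDFGen option names. This dict form
    is an illustration, not RDFGen's internal representation.
    """
    kind = spec["disttype"]
    if kind == "Fixed":
        constant = spec["constant"]
        return lambda: constant
    if kind == "Uniform":
        low, high = spec["uniformmin"], spec["uniformmax"]
        return lambda: rng.uniform(low, high)
    if kind == "Zipf":
        n, skew = spec["zipfmax"], spec["zipfskew"]
        # Assumed definition: P(rank = k) is proportional to 1 / k**skew.
        weights = [1.0 / (k ** skew) for k in range(1, n + 1)]
        cumulative = list(accumulate(weights))
        total = cumulative[-1]

        def draw():
            # Inverse-CDF lookup over the cumulative weights.
            index = bisect_left(cumulative, rng.random() * total)
            return min(index, n - 1) + 1

        return draw
    raise ValueError(f"unknown disttype: {kind}")


rng = random.Random(1)  # plays the role of randomseed
dist1 = make_sampler({"disttype": "Zipf", "zipfmax": 100, "zipfskew": 0.5}, rng)
dist2 = make_sampler({"disttype": "Fixed", "constant": 10}, rng)
dist4 = make_sampler({"disttype": "Uniform", "uniformmin": 0, "uniformmax": 7}, rng)
samples = [dist1() for _ in range(5)]  # five Zipfian ranks between 1 and 100
```

Sharing one seeded generator across all samplers mirrors the single global randomseed option: the same seed reproduces the same sequence of draws.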

How to configure RDFGen?

The configuration of RDFGen is composed of the following steps:

1. Specify the global options, which hold for all classes: randomseed and datasetname.

2. For each individual class, specify numclassinst, classnamespace, and the options for all the properties belonging to that class.

3. For each property, first specify propcarddist, propvaldist, and propnamespace. Then, depending on the type of the property value, specify the options for the value to be generated for each property instance.

4. Every distribution referenced by propcarddist or propvaldist is expected to be predefined using disttype, uniformmin, uniformmax, zipfmax, zipfskew and constant.
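Step 4 is easy to get wrong in a hand-written specification, so a small pre-flight check can help. The Python sketch below is not part of RDFGen: the function name undefined_distributions is hypothetical, the embedded SPEC is a shortened stand-in for a real file, and the check only understands the rdf:Description style of RDF/XML used in this HOWTO. It reports every distribution URI that a property references but that no node defines with disttype.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
GEN = "http://invent.hpl.hp.com/datagen/"  # vocabulary namespace used in the example spec

# Shortened stand-in for a specification file; dist1 is deliberately left undefined.
SPEC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:g="http://invent.hpl.hp.com/datagen/">
  <rdf:Description rdf:about="http://www.library.org/distributions/dist2">
    <g:disttype rdf:resource="http://invent.hpl.hp.com/rdfgen/distributions/Fixed"/>
    <g:constant>10</g:constant>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.library.com/category/Science/book/#address">
    <g:propcarddist rdf:resource="http://www.library.org/distributions/dist2"/>
    <g:propvaldist rdf:resource="http://www.library.org/distributions/dist1"/>
  </rdf:Description>
</rdf:RDF>"""


def undefined_distributions(spec_xml):
    """Return sorted URIs that are referenced as distributions but never defined."""
    root = ET.fromstring(spec_xml)
    defined, referenced = set(), set()
    for node in root.findall(f"{{{RDF}}}Description"):
        # A node counts as a distribution if it carries a disttype option.
        if node.find(f"{{{GEN}}}disttype") is not None:
            defined.add(node.get(f"{{{RDF}}}about"))
        for option in ("propcarddist", "propvaldist"):
            for ref in node.findall(f"{{{GEN}}}{option}"):
                referenced.add(ref.get(f"{{{RDF}}}resource"))
    return sorted(referenced - defined)


missing = undefined_distributions(SPEC)
# missing holds the dist1 URI: it is referenced by propvaldist but has no disttype node.
```

An empty result means every propcarddist and propvaldist reference points at a predefined distribution.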

Example

Below is an example RDFGen specification, which is itself an RDF document. It predefines four distributions, dist1 through dist4: dist1 is a Zipfian distribution, dist2 and dist3 are fixed values, and dist4 is a uniform distribution. The specification defines two classes, science book and art book. Science book has one property, address, which is a fixed-length string. Art book has two properties, author and relation: author is a fixed-length string and relation is of resource type. The resources connected by the relation property form a tree structure in depth-first order.

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:j.0="http://invent.hpl.hp.com/datagen/" >
  <rdf:Description rdf:about="http://www.library.org/distributions/dist3">
    <j.0:disttype rdf:resource="http://invent.hpl.hp.com/rdfgen/distributions/Fixed"/>
    <j.0:constant>2</j.0:constant>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.library.org/distributions/dist1">
    <j.0:disttype rdf:resource="http://invent.hpl.hp.com/rdfgen/distributions/Zipf"/>
    <j.0:zipfmax>100</j.0:zipfmax>
    <j.0:zipfskew>0.5</j.0:zipfskew>
  </rdf:Description>
  <rdf:Description rdf:about="http://invent.hpl.hp.com/rdfgen/RDFGen">
    <j.0:randomseed>1</j.0:randomseed>
    <j.0:datasetname>Library</j.0:datasetname>
    <j.0:language>UTF8</j.0:language>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.library.com/category/Science/book">
    <j.0:numclassinst>8</j.0:numclassinst>
    <j.0:classnamespace>http://www.library.com/category/Science/</j.0:classnamespace>
    <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.library.com/category/Art/book">
    <j.0:numclassinst>8</j.0:numclassinst>
    <j.0:classnamespace>http://www.library.com/category/Art/</j.0:classnamespace>
    <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.library.org/distributions/dist2">
    <j.0:disttype rdf:resource="http://invent.hpl.hp.com/rdfgen/distributions/Fixed"/>
    <j.0:constant>10</j.0:constant>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.library.org/distributions/dist4">
    <j.0:disttype rdf:resource="http://invent.hpl.hp.com/rdfgen/distributions/Uniform"/>
    <j.0:uniformmax>7</j.0:uniformmax>
    <j.0:uniformmin>0</j.0:uniformmin>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.library.com/category/Science/book/#address">
    <j.0:propcarddist rdf:resource="http://www.library.org/distributions/dist2"/>
    <j.0:propvaldist rdf:resource="http://www.library.org/distributions/dist1"/>
    <j.0:propnamespace>http://www.library.com/category/Science/book/</j.0:propnamespace>
    <j.0:propvalstrfromvolc rdf:resource="http://invent.hpl.hp.com/rdfgen/boolean/False"/>
    <j.0:propvalsymfile>volc-en.txt</j.0:propvalsymfile>
    <j.0:propvalstrlen>2</j.0:propvalstrlen>
    <rdfs:domain rdf:resource="http://www.library.com/category/Science/book"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.library.com/category/Art/book/#author">
    <j.0:propcarddist rdf:resource="http://www.library.org/distributions/dist2"/>
    <j.0:propvaldist rdf:resource="http://www.library.org/distributions/dist1"/>
    <j.0:propnamespace>http://www.library.com/category/Art/book/</j.0:propnamespace>
    <j.0:propvalstrfromvolc rdf:resource="http://invent.hpl.hp.com/rdfgen/boolean/False"/>
    <j.0:propvalsymfile>volc-ch.txt</j.0:propvalsymfile>
    <j.0:propvalstrlen>2</j.0:propvalstrlen>
    <rdfs:domain rdf:resource="http://www.library.com/category/Art/book"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.library.com/category/Art/book/#relation">
    <j.0:propcarddist rdf:resource="http://www.library.org/distributions/dist3"/>
    <j.0:propvaldist rdf:resource="http://www.library.org/distributions/dist4"/>
    <j.0:propnamespace>http://www.library.com/category/Art/book/</j.0:propnamespace>
    <rdfs:domain rdf:resource="http://www.library.com/category/Art/book"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#anyURI"/>
    <j.0:propvalselftree rdf:resource="http://invent.hpl.hp.com/rdfgen/boolean/True"/>
    <j.0:resvalnamespace>http://www.library.com/category/Art/</j.0:resvalnamespace>
    <j.0:resvaltreetraverse rdf:resource="http://invent.hpl.hp.com/rdfgen/treetraverse/DepthFirst"/>
  </rdf:Description>
</rdf:RDF>
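The address and author properties above ask for strings of length 2 built from an alphabet file. As a rough Python illustration of that idea, and not RDFGen's own code, the sketch below assembles fixed-length strings from a list of symbols. The helper name fixed_length_string is hypothetical, the in-memory alphabet stands in for the contents of a file such as volc-en.txt (whose format this HOWTO does not describe), and symbols are picked uniformly because the HOWTO does not say how propvaldist drives the choice.

```python
import random


def fixed_length_string(alphabet, length, rng):
    """Build a string of `length` symbols drawn from `alphabet` with replacement."""
    if not alphabet:
        raise ValueError("alphabet must not be empty")
    return "".join(rng.choice(alphabet) for _ in range(length))


# Stand-in for the symbols that an alphabet file such as volc-en.txt might hold.
alphabet = list("abcdefghijklmnopqrstuvwxyz")
rng = random.Random(1)  # plays the role of randomseed

# dist2 fixes the cardinality at 10, so one class instance gets 10 values.
values = [fixed_length_string(alphabet, 2, rng) for _ in range(10)]
```

Keeping the alphabet as a list of symbols rather than a single string lets one symbol span several characters, which may matter for alphabets like the one suggested by volc-ch.txt.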