Intro / Tutorial:

RDF (Resource Description Framework) (n): A framework for describing resources.
A resource is pretty much anything that someone would like to say something about.

Essentially, RDF is a directed graph with labeled edges. Any two connected nodes in the graph form a subject-predicate-object triple: the two nodes are the subject and object, and the label on the edge is the predicate.
As in:

Cows eat grass.
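
As a minimal sketch of that graph model, the sentence above can be held as one (subject, predicate, object) tuple in a Python set. The plain-string names and the helper edges_from are placeholders invented for illustration, not part of any RDF library.

```python
# A graph is a set of (subject, predicate, object) triples.
# The plain strings are placeholders, not real identifiers.
graph = {
    ("cows", "eat", "grass"),
}

def edges_from(graph, node):
    """Return the (predicate, object) pairs for every edge leaving `node`."""
    return sorted((p, o) for (s, p, o) in graph if s == node)

print(edges_from(graph, "cows"))
```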

To do "linked data" things, URIs (Uniform Resource Identifiers) are used to identify these resources. URLs (Uniform Resource Locators) are a major subset of URIs. With a URL, you are generally assured a unique identifier, and you have a means of getting information directly from the address it specifies, e.g. fetching a URL that identifies Leif might return some information about Leif in the form of an RDF file.

Giving URIs to everything in the previous example sentence might yield something that looks like this: .

And with the addition of angle brackets around the URIs, that is the simplest RDF serialization standard: N-Triples.

<> <> <> .
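
To make the N-Triples shape concrete, here is a rough sketch that writes triples out one statement per line. The example.org URIs and the function to_ntriples are assumptions made up for this illustration, and the sketch only handles URI terms (real N-Triples also allows literals and blank nodes).

```python
# Hypothetical base URI, chosen only for illustration.
BASE = "http://example.org/"

def to_ntriples(triples):
    """Serialize triples of URI strings as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for (s, p, o) in triples)

triples = [(BASE + "things/cows", BASE + "do/eat", BASE + "things/grass")]
print(to_ntriples(triples))
```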

That gets a little verbose if you want to say a lot, though. An optimization is to factor out the common prefixes on the URLs:

@prefix things: <>.
things:cows <> things:grass.
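
Prefix factoring is just string substitution, which the following sketch tries to show. The namespace URIs in PREFIXES and the helper expand are hypothetical, chosen only for the example.

```python
# Hypothetical prefix table; the namespace URIs are assumptions.
PREFIXES = {
    "things": "http://example.org/things/",
    "do": "http://example.org/do/",
}

def expand(name, prefixes=PREFIXES):
    """Turn a prefixed name such as 'things:cows' into a full URI."""
    prefix, _, local = name.partition(":")
    return prefixes[prefix] + local

print(expand("things:cows"))
```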

To continue saying things about the same subject without repeating it, use a semicolon, and to continue with the same subject and predicate, use a comma.

@prefix things: <>.
@prefix do: <>.
things:cows do:eat things:grass, things:hay ;
            do:lick things:salt.

This says the same thing as the triples "Cows eat grass. Cows eat hay. Cows lick salt." (The newlines and indentation don't matter; they are only used for presentation here.)

This is the serialization format known as "Turtle"; it is a superset of N-Triples, and nearly the same as N3.
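
The semicolon and comma shorthand can be thought of as a nested grouping that flattens back into plain triples. A small sketch, assuming a hypothetical helper expand_shorthand and a dict layout invented for this example:

```python
def expand_shorthand(subject, predicate_objects):
    """Expand Turtle-style grouping into flat triples.

    `predicate_objects` maps each predicate to a list of objects:
    separate keys play the role of ';', multiple list items the role of ','.
    """
    return [
        (subject, predicate, obj)
        for predicate, objects in predicate_objects.items()
        for obj in objects
    ]

triples = expand_shorthand(
    "things:cows",
    {"do:eat": ["things:grass", "things:hay"], "do:lick": ["things:salt"]},
)
for triple in triples:
    print(triple)
```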

An aside: RDF data is most commonly found as XML, as in RSS 1.0, FOAF files, etc. In the XML format, the above "cows eat grass" would look something like this:

<rdf:Description rdf:about="things:cows">
  <do:eat rdf:resource="things:grass"/>
</rdf:Description>

A SPARQL query is just a graph described in Turtle with some holes in it, replaced by variables.

To make a SPARQL query, write some Turtle constraining / describing the info you want, with variables standing in for the unknown data, then surround it by 'SELECT * WHERE {turtle stuff here}' to make it look like SQL.
e.g. To ask "What eats grass?", just say:

SELECT * WHERE {?what <> things:grass .}

This should return bindings like ?what = cows, depending on what you have in your graph.
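
The "graph with holes" idea can be sketched without a real SPARQL engine: compare one pattern against every triple and record what each variable lines up with. The function match and the sample data are assumptions for illustration; the predicate do:eat is borrowed from the Turtle example earlier.

```python
def match(pattern, graph):
    """Yield variable bindings for one triple pattern.

    Terms starting with '?' are variables; everything else must match exactly.
    """
    for triple in graph:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if bindings.setdefault(term, value) != value:
                    break  # same variable would need two different values
            elif term != value:
                break  # a fixed term does not match this triple
        else:
            yield bindings

# Hypothetical sample graph using prefixed names as plain strings.
graph = {
    ("things:cows", "do:eat", "things:grass"),
    ("things:cows", "do:lick", "things:salt"),
}
print(list(match(("?what", "do:eat", "things:grass"), graph)))
```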

What wears a black coat and eats pencils?

?what wears ?x .
?x rdf:type <> ;
   :color <> .
?what do:eat <> .
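
A query with several patterns, like the one above, requires every pattern to hold with the same variable bindings. The sketch below does this with a naive nested-loop join; real triple stores use indexes and query planning. All ex: names, the sample data, and the functions match and query are hypothetical stand-ins, not the URIs from the original query.

```python
def match(pattern, triple, bindings):
    """Extend `bindings` so that `pattern` equals `triple`, or return None."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None  # fixed term differs
    return result

def query(patterns, graph):
    """Return all bindings that satisfy every pattern (naive join)."""
    solutions = [{}]
    for pattern in patterns:
        extended_solutions = []
        for bindings in solutions:
            for triple in graph:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    extended_solutions.append(extended)
        solutions = extended_solutions
    return solutions

# Hypothetical data: every name below is invented for the example.
graph = {
    ("ex:goat", "ex:wears", "ex:coat1"),
    ("ex:coat1", "rdf:type", "ex:Coat"),
    ("ex:coat1", "ex:color", "ex:black"),
    ("ex:goat", "do:eat", "things:pencils"),
    ("things:cows", "do:eat", "things:grass"),
}
patterns = [
    ("?what", "ex:wears", "?x"),
    ("?x", "rdf:type", "ex:Coat"),
    ("?x", "ex:color", "ex:black"),
    ("?what", "do:eat", "things:pencils"),
]
print(query(patterns, graph))
```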

Triple Stores


  • Sesame
  • Mulgara: Theoretically more scalable. Current, active development. Simple setup. Omniscient, immediate support. (higher-performance, leaner?)
  • Jena: Is said to be the most feature-complete. (can use Mulgara as a backend)


  • Redland: C, with bindings for several languages. Comes with command line utils for conversion between various formats (RDF/XML, Turtle, etc. via "rapper"), and running SPARQL queries ("roqet").
  • RDFLib: Python, can use Redland or SQL as backend.
  • 4store: C, optimized for shared-nothing clusters, can also be run on a single machine. Includes SPARQL HTTP endpoint.