Intro / Tutorial:

RDF (Resource Description Framework) (n): A framework for describing resources.
Here, a “resource” is pretty much anything that someone would like to say something about.

Essentially, RDF data is a directed graph with labeled edges. Any two connected nodes in the graph form a (subject, predicate, object) triple: the node the edge leaves is the subject, the node it points to is the object, and the label on the edge is the predicate.
As in:

Cows eat grass.
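
As a rough sketch of the same idea in plain Python (the names `Triple` and `graph` are illustrative only and not part of any RDF library), a graph can be modeled as a set of three-part statements:

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One labeled edge: subject --predicate--> object."""
    subject: str
    predicate: str
    object: str


# A graph is simply a set of triples; a set ignores duplicate statements.
graph = {
    Triple("cows", "eat", "grass"),
    Triple("cows", "eat", "grass"),  # same statement again, stored once
}

for t in graph:
    print(f"{t.subject} --{t.predicate}--> {t.object}")
```

Using a set rather than a list reflects that saying the same thing twice adds no new information to the graph.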

To do “linked data” things, URIs (Uniform Resource Identifiers) are used to identify these resources. URLs (Uniform Resource Locators) are a major subset of URIs. With a URL, you are generally assured a unique identifier, and you also have a means of getting information directly from the address it specifies, e.g. http://example.com/people/leif might return some information about Leif in the form of an RDF file.

Giving URIs to everything in the previous example sentence might yield something that looks like this:

http://example.com/things/cows http://example.com/predicates/eat http://example.com/things/grass .

Add angle brackets around the URIs, and you have the simplest standard RDF serialization, N-Triples:

<http://example.com/things/cows> <http://example.com/predicates/eat> <http://example.com/things/grass> .
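
To show how little there is to the format, here is a minimal sketch of an N-Triples writer. It assumes every term is a URI (no literals or blank nodes) and does no escaping; the function name `to_ntriples` is made up for this example.

```python
def to_ntriples(triples):
    """Render (subject, predicate, object) URI triples as N-Triples lines."""
    # Each statement is three <URI> terms followed by " ." on its own line.
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = [
    ("http://example.com/things/cows",
     "http://example.com/predicates/eat",
     "http://example.com/things/grass"),
]
print(to_ntriples(triples))
```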

N-Triples gets a little verbose if you want to say a lot, though. An optimization is to factor out the common prefixes of the URIs:

@prefix things: <http://example.com/things/>.
things:cows <http://example.com/predicates/eat> things:grass.
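
What a parser does with a prefixed name can be sketched in a few lines of Python. This is a simplification that assumes a term is either a full URI in angle brackets or a `prefix:local` name; `PREFIXES` and `expand` are hypothetical names for this example.

```python
PREFIXES = {"things": "http://example.com/things/"}


def expand(term, prefixes=PREFIXES):
    """Expand a prefixed name such as 'things:cows' into a full URI."""
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]  # already a full URI in angle brackets
    prefix, _, local = term.partition(":")
    return prefixes[prefix] + local


print(expand("things:cows"))
print(expand("<http://example.com/predicates/eat>"))
```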

To continue saying things about the same subject without repeating it, use a semicolon, and to continue with the same subject and predicate, use a comma.

@prefix things: <http://example.com/things/>.
@prefix do: <http://example.com/predicates/>.
things:cows do:eat things:grass, 
                   things:hay;
                do:lick things:salt.

…is the same thing as the triples, “Cows eat grass. Cows eat hay. Cows lick salt.” (The newlines and indentation don’t matter; they are used only for presentation here.)
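
One way to picture the semicolon and comma shorthand is as a nested structure that flattens back into plain triples. The sketch below assumes that mental model only; it is not a Turtle parser, and `shorthand` and `flatten` are names invented for the example.

```python
# Nested form mirroring the Turtle shorthand:
# subject -> predicate -> list of objects.
shorthand = {
    "things:cows": {
        "do:eat": ["things:grass", "things:hay"],  # ',' repeats subject + predicate
        "do:lick": ["things:salt"],                # ';' repeats the subject only
    }
}


def flatten(nested):
    """Expand the nested form into a flat list of (s, p, o) triples."""
    return [
        (s, p, o)
        for s, preds in nested.items()
        for p, objs in preds.items()
        for o in objs
    ]


for triple in flatten(shorthand):
    print(" ".join(triple), ".")
```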

The @prefix, semicolon and comma notation above is the serialization format known as “Turtle”; it is a superset of N-Triples, and nearly the same as N3 (strictly, a subset of it).

An aside: RDF data is most commonly found as XML, as in RSS 1.0, FOAF files, etc. In the XML format, the above “cows eat grass” would look something like this:

<rdf:Description rdf:about="http://example.com/things/cows"
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:do="http://example.com/predicates/">
  <do:eat rdf:resource="http://example.com/things/grass"/>
</rdf:Description>
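
To see that the XML form carries the same triple, here is a small sketch that pulls it back out with Python's standard `xml.etree.ElementTree`. It assumes a version of the snippet with the `rdf` namespace declared and full URIs in the attributes, and it handles only this one simple shape, not RDF/XML in general.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

doc = """
<rdf:Description rdf:about="http://example.com/things/cows"
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:do="http://example.com/predicates/">
  <do:eat rdf:resource="http://example.com/things/grass"/>
</rdf:Description>
"""

root = ET.fromstring(doc)
subject = root.get(f"{{{RDF_NS}}}about")

triples = []
for child in root:
    # ElementTree names look like "{namespace}local"; joining the two
    # parts gives the predicate URI.
    namespace, local = child.tag[1:].split("}")
    triples.append((subject, namespace + local, child.get(f"{{{RDF_NS}}}resource")))

print(triples)
```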

A SPARQL query is just a graph described in Turtle with some holes in it, replaced by variables.

To make a SPARQL query, write some Turtle constraining / describing the info you want, with variables standing in for the unknown data, then wrap it in ‘SELECT * WHERE {turtle stuff here}’ to make it look like SQL. Any prefixes you use are declared with PREFIX lines before the SELECT.
For example, to ask “What eats grass?”, just say:

PREFIX things: <http://example.com/things/>
SELECT * WHERE { ?what <http://example.com/predicates/eat> things:grass . }

This should return bindings like ?what = <http://example.com/things/cows>, depending on what you have in your graph.
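
The “graph with holes” idea can be sketched as plain pattern matching. This is a toy illustration, not how a real SPARQL engine is implemented; terms are plain strings, anything starting with `?` is treated as a variable, and the `things:sheep` data is invented for the example.

```python
def match(pattern, triple):
    """Return variable bindings if `triple` fits `pattern`, else None."""
    bindings = {}
    for want, have in zip(pattern, triple):
        if want.startswith("?"):
            # Variable: bind it, unless it is already bound to a different value.
            if bindings.setdefault(want, have) != have:
                return None
        elif want != have:
            return None  # constant term does not match
    return bindings


graph = [
    ("things:cows", "do:eat", "things:grass"),
    ("things:cows", "do:lick", "things:salt"),
    ("things:sheep", "do:eat", "things:grass"),
]

pattern = ("?what", "do:eat", "things:grass")
results = [b for t in graph if (b := match(pattern, t)) is not None]
print(results)
```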

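A query can combine several patterns that share variables, e.g. “What wears a black coat and eats pencils?”

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX do: <http://example.com/predicates/>
SELECT ?what WHERE {
  ?what do:wear ?x .
  ?x rdf:type <http://dbpedia.org/classes/Coat> ;
     do:color <http://dbpedia.org/stuff/black> .
  ?what do:eat <http://dbpedia.org/object/pencil> .
}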
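
A query with several patterns like the one above is answered by finding bindings that satisfy all the patterns at once. The sketch below shows the idea with a naive nested-loop join over an invented in-memory graph. The `ex:` names and the `query` function are assumptions for illustration, and real stores use indexes and query planning instead.

```python
def match(pattern, triple, bindings):
    """Extend `bindings` so that `pattern` equals `triple`, or return None."""
    new = dict(bindings)
    for want, have in zip(pattern, triple):
        if want.startswith("?"):
            if new.setdefault(want, have) != have:
                return None  # variable already bound to something else
        elif want != have:
            return None
    return new


def query(graph, patterns):
    """Find every set of bindings that satisfies all patterns at once."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in graph
            if (extended := match(pattern, triple, bindings)) is not None
        ]
    return solutions


graph = [
    ("ex:goat", "do:wear", "ex:coat1"),
    ("ex:coat1", "rdf:type", "ex:Coat"),
    ("ex:coat1", "ex:color", "ex:black"),
    ("ex:goat", "do:eat", "ex:pencil"),
    ("ex:cow", "do:eat", "ex:grass"),
]

patterns = [
    ("?what", "do:wear", "?x"),
    ("?x", "rdf:type", "ex:Coat"),
    ("?x", "ex:color", "ex:black"),
    ("?what", "do:eat", "ex:pencil"),
]

print([s["?what"] for s in query(graph, patterns)])
```

Because `?what` and `?x` each appear in more than one pattern, a candidate only survives if every pattern agrees on their values.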

Triple Stores

Java

  • Sesame
  • Mulgara: Theoretically more scalable. Current, active development. Simple setup. Omniscient, immediate support. (Higher-performance, leaner?)
  • Jena: Said to be the most feature-complete. (Can use Mulgara as a backend.)

Others

  • Redland: C, with bindings for several languages. Comes with command-line utils for conversion between various formats (RDF/XML, Turtle, etc. via “rapper”) and for running SPARQL queries (“roqet”).
  • RDFLib: Python, can use Redland or SQL as backend.
  • 4store: C, optimized for shared-nothing clusters; can also be run on a single machine. Includes a SPARQL HTTP endpoint.
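
For stores that expose a SPARQL HTTP endpoint, a query is typically sent as a `query` parameter on a GET request, per the SPARQL Protocol. The sketch below only builds such a request with the Python standard library and does not send it. The endpoint address is a hypothetical placeholder, and whether a given store honors the JSON results format is an assumption to check against its documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://localhost:8000/sparql/"  # hypothetical address; adjust to your store

QUERY = """
SELECT * WHERE {
  ?what <http://example.com/predicates/eat> <http://example.com/things/grass> .
}
"""


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request carrying the query in `?query=`."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON results format; not every store supports it.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.full_url)
# To actually send it: urllib.request.urlopen(request).read()
```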