Friday, May 7, 2010

How to: Publish your Linked Data with TopBraid Live SPARQL Endpoints

SPARQL endpoints are an increasingly popular way to expose linked data. Invoking SPARQL Endpoints from TopBraid Composer's SPARQL view was the subject of a previous TQ blog on SPARQL Endpoints. In this entry we will discuss how TopBraid Live can be used to implement a SPARQL Endpoint. SPARQL Endpoints are Web services that conform to the SPARQL protocol: a SPARQL query is passed to a URL, where a SPARQL service processes the query and returns the results in a defined XML format. A number of SPARQL Endpoints exist for Web data (see the W3C list of current SPARQL Endpoints) and have become important sources for linked data.

A SPARQL Endpoint service implementation is packaged with TopBraid Live and is available out of the box for both TopBraid Live Personal Server (TopBraid Composer-ME running on localhost:8083) and TopBraid Live Enterprise Server (for more information, see TBL Home page). Creating a SPARQL Endpoint for your data is therefore an easy three-step process:
  1. Load the model you wish to query into your TBL/TBC-ME workspace.
  2. Use the GRAPH SPARQL keyword to access any named graph in the workspace.
  3. Send a SPARQL query in the query string of a URL that accesses the TBL SPARQL endpoint.

For example, if you have TBC-ME running, the TopBraid Live Personal Server is automatically available. Open a browser window and enter the following URL:

http://localhost:8083/tbl/sparql?query=SELECT DISTINCT ?p WHERE {GRAPH <http://topbraid.org/countries> {?s ?p ?o} }

This URL passes a query string that is applied to the specified graph, the countries.owl example included in the TopBraid library. The query is passed to TopBraid Live and executed using TBL's SPARQL engine. The results are converted to the SPARQL Endpoint format and returned via HTTP. The above URL specifies the TBL Personal Server (via TBC-ME's localhost:8083) as the endpoint. If you have TopBraid Live Enterprise Edition running on a server, just substitute your Enterprise server's address for "localhost:8083".
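A browser will usually percent-encode the spaces and braces in such a URL for you, but a script has to do this itself. The following is a minimal Python sketch of building the same request URL; the function name build_query_url is ours, and the endpoint address is simply the Personal Server address used above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint address taken from the example above; replace the host and port
# with your own Live server address if needed.
ENDPOINT = "http://localhost:8083/tbl/sparql"

QUERY = "SELECT DISTINCT ?p WHERE {GRAPH <http://topbraid.org/countries> {?s ?p ?o} }"


def build_query_url(endpoint, query):
    """Return a GET URL with the SPARQL text percent-encoded in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

With TBC-ME running, the resulting URL could then be fetched with any HTTP client, for example urllib.request.urlopen(url).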

To further explore the ease of creating SPARQL Endpoints with TopBraid Live, click here to access a page that defines an HTML form that submits a query to the TBL Personal Server SPARQL Endpoint. Copy and paste the following queries that use some of the example models included in the TopBraid library.

This query finds all countries and their abbreviations from the countries model in TopBraid/Examples:

# Get all countries and abbreviations from countries model
PREFIX countries: <http://topbraid.org/countries#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?name ?abbrev
WHERE
{ GRAPH <http://topbraid.org/countries>
{ ?country a countries:Country .
?country rdfs:label ?name .
?country countries:abbreviation ?abbrev .
}
}

This query finds all children of Joseph Kennedy from the kennedys model in TopBraid/Examples:

# Find Joe Kennedy's children in kennedys model
PREFIX k: <http://topbraid.org/examples/kennedys#>
SELECT ?cname
WHERE
{ GRAPH <http://topbraid.org/examples/kennedys>
{ k:JosephKennedy k:child ?child .
?child k:name ?cname .
}
}

Again, substitute your Live server address for "localhost:8083" in the action attribute of the form in the HTML file to apply queries to your Live server.
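SELECT queries such as the two above return their variable bindings in the XML result format mentioned earlier. As a rough sketch of how a client might read that format, the Python below parses a document in the W3C SPARQL Query Results XML Format into a list of rows. The sample document is hand-written for illustration (the country name and abbreviation are made up) and is not actual output from TopBraid Live; the function name parse_select_results is also ours.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SPARQL Query Results XML Format.
SRX = "{http://www.w3.org/2005/sparql-results#}"

# Hand-written sample response, for illustration only (not real endpoint output).
SAMPLE = """<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="name"/>
    <variable name="abbrev"/>
  </head>
  <results>
    <result>
      <binding name="name"><literal>Exampleland</literal></binding>
      <binding name="abbrev"><literal>EX</literal></binding>
    </result>
  </results>
</sparql>"""


def parse_select_results(xml_text):
    """Return one dict per result row, mapping variable name to its string value."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.iter(SRX + "result"):
        row = {}
        for binding in result.findall(SRX + "binding"):
            value = binding[0]  # the <uri>, <literal>, or <bnode> child element
            row[binding.get("name")] = value.text
        rows.append(row)
    return rows


rows = parse_select_results(SAMPLE)
print(rows)
```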

Using SPIN functions in SPARQL Endpoints

TopBraid SPARQLMotion Functions and user-defined SPIN functions registered in a Live workspace can also be used in SPARQL Endpoint queries. For example, the following query uses the TopBraid SPARQLMotion Function smf:if() to compute, for each person in the example kennedys model, either the age at death or the current age. Instead of returning variable bindings via SELECT, this query returns an RDF graph via CONSTRUCT. Since the graph is in RDF/XML format, the file returned by the endpoint can easily be imported into existing RDF/OWL models.

# infer age at death or age as of 2010
PREFIX k: <http://topbraid.org/examples/kennedys#>
PREFIX smf: <http://topbraid.org/sparqlmotionfunctions#>
CONSTRUCT {?person k:age ?age}
WHERE
{ GRAPH <http://topbraid.org/examples/kennedys>
{ ?person k:birthYear ?byear .
OPTIONAL {?person k:deathYear ?dyear}
LET (?age := smf:if(bound(?dyear), ?dyear-?byear, 2010-?byear))
}
}

Note that the age computation is hardcoded for 2010. A SPARQL query that returns the current year can be defined with a few statements. An example is shown in the kennedysSPIN model in the TopBraid Library; see TopBraid/Examples/kennedysSPIN.rdf in the Composer workspace. If you look at the SPIN function getCurrentYear (defined as a subclass of spin:Functions, which is a subclass of spin:Modules), you will see that it calls the function afn:now(), which returns a value in xsd:dateTime format, and takes the first four characters of that value as the current year.
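To make the logic concrete outside of SPARQL, here is a small Python sketch of the same idea: take the first four characters of an xsd:dateTime string as the year, then compute the age at death if a death year is known, or the age as of the reference year otherwise. This only mirrors the behavior described above; it is not the SPIN definition from kennedysSPIN.rdf, and all function names and the years in the usage lines are illustrative.

```python
from datetime import datetime, timezone


def year_from_datetime(lexical):
    """Return the year encoded in the first four characters of an xsd:dateTime string."""
    return int(lexical[:4])


def current_year():
    """Stand-in for getCurrentYear: uses the system clock rather than afn:now()."""
    return year_from_datetime(datetime.now(timezone.utc).isoformat())


def age(birth_year, death_year=None, reference_year=None):
    """Age at death if death_year is known, otherwise age in reference_year."""
    if reference_year is None:
        reference_year = current_year()
    end_year = death_year if death_year is not None else reference_year
    return end_year - birth_year


# Illustrative values only, not taken from the kennedys model.
print(year_from_datetime("2010-05-07T09:30:00Z"))
print(age(1900, death_year=1950))
print(age(1900, reference_year=2010))
```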

Instead of copying this code into the query, let's register this as a SPIN function so it can be called by any model in the workspace, including SPARQL Endpoints. Do the following:

  1. Rename the file kennedysSPIN.rdf to kennedysSPIN.spin.rdf. Adding the .spin extension registers all of the SPIN functions in this model with the workspace, allowing SPIN functions to be called without importing or opening the files.

  2. From the TBC-ME menu, select Scripts > Refresh/Display SPARQLMotion functions... This will register the functions for the current session. When Live or Composer is started, the system will scan the files in the workspace for .spin files and register all functions. The extra step is needed here only if the file name was changed without stopping the Composer session. A Deploy (Export... Deploy in Composer) to a Live server will automatically refresh scripts.

Now try the same query with the following changes:

# infer age at death or age from current year
PREFIX k: <http://topbraid.org/examples/kennedys#>
PREFIX smf: <http://topbraid.org/sparqlmotionfunctions#>
PREFIX kspin: <http://topbraid.org/examples/kennedysSPIN#>
CONSTRUCT {?person k:age ?age}
WHERE
{ GRAPH <http://topbraid.org/examples/kennedys>
{ ?person k:birthYear ?byear .
OPTIONAL {?person k:deathYear ?dyear}
LET (?age := smf:if(bound(?dyear), ?dyear-?byear, kspin:getCurrentYear()-?byear))
}
}

Note the use of the user-defined SPIN function getCurrentYear(). This feature can be used to call any SPIN function including those that are defined by SPARQLMotion scripts. This raises the potential of using SPARQL endpoints for a wide range of processing capabilities, including importing models from outside of a Live workspace, processing triples before querying, applying queries to inference results, integrating models from different file types, and other kinds of SPARQL and RDFS/OWL processing. For example, a SPARQL Endpoint request could call a SPARQLMotion script that runs standard RDFS or OWL inferences before submitting the query, thus returning results from both inferred and asserted triples.

Advanced SPARQL Protocol: Federated SPARQL Queries

The SPARQL SERVICE keyword sends a query to a remote service endpoint. Since TopBraid Live supports the SERVICE keyword, SPARQL endpoint queries to TopBraid Live can call other SPARQL Endpoints! Try the following query in the example query form.


PREFIX k: <http://topbraid.org/examples/kennedys#>
PREFIX smf: <http://topbraid.org/sparqlmotionfunctions#>
CONSTRUCT {?child k:birthDate ?birthdate}
WHERE
{ GRAPH <http://topbraid.org/examples/kennedys>
{ k:RoseFitzgerald k:child ?child .
?child k:firstName ?fname .
?child k:lastName ?lname .
?child k:gender k:female .
?child k:spouse ?spouse .
?spouse k:lastName ?slname .
LET (?dbpRsc := smf:buildURI("http://dbpedia.org/resource/{?fname}_{?lname}_{?slname}"))

SERVICE <http://dbpedia.org/sparql>
{ ?dbpRsc <http://dbpedia.org/ontology/Person/birthDate> ?birthdate .
} .

}
}

This query is applied to the kennedys example model to find the married daughters of Rose Fitzgerald, and then sends a query to the DBpedia SPARQL Endpoint to find their birth dates. The buildURI() function combines each daughter's first name, last name, and spouse's last name into a URI that is known in DBpedia, such as <http://dbpedia.org/resource/Eunice_Kennedy_Shriver>. The results from DBpedia bind the birth date to ?birthdate, which is returned in the TopBraid Live SPARQL endpoint response. As long as DBpedia is up and running, the result federates data from two SPARQL Endpoints, realizing the potential of linked data sources.
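The URI template used in the query can be illustrated with a short Python sketch that fills each {?var} placeholder from a set of bindings. This only mimics the placeholder substitution visible in the query; the actual smf:buildURI() function may do more (for example, character escaping), and the helper name build_uri is ours.

```python
import re


def build_uri(template, bindings):
    """Replace each {?var} placeholder in template with the value bound to var."""
    def substitute(match):
        # match.group(1) is the variable name without the leading '?'
        return str(bindings[match.group(1)])
    return re.sub(r"\{\?(\w+)\}", substitute, template)


uri = build_uri(
    "http://dbpedia.org/resource/{?fname}_{?lname}_{?slname}",
    {"fname": "Eunice", "lname": "Kennedy", "slname": "Shriver"},
)
print(uri)  # http://dbpedia.org/resource/Eunice_Kennedy_Shriver
```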

Conclusions

SPARQL endpoints are a complement to TopBraid Live's ability to create RESTful Web services. While Web services are more flexible, allowing data to be returned in any text-based format, SPARQL endpoints can be used in a variety of applications expecting SPARQL result sets in an XML format. TopBraid Live significantly improves on existing SPARQL Endpoints with capabilities to federate queries and design functions and scripts that process data for external usage.

These examples demonstrate the power of TopBraid Live as an RDF back-end. Using a straightforward HTML form, one can access the full power of TopBraid Live and advanced SPARQL queries. These examples can be directly applied against the Personal Server version of TopBraid Live, packaged in TopBraid Composer-Maestro Edition (TBC-ME), which is freely available for a 30-day trial. TopBraid Live Enterprise Edition is deployed as a Tomcat servlet for Web-enabled access. For more information, see the TopBraid Live web page.

This is a blog by TopQuadrant, developers of the TopBraid Suite, created to support the pursuit of our ongoing mission - to explode strange semantic myths, to seek out new models that support a new generation of dynamic business applications, to boldly integrate data that no one has integrated before.