Thursday, January 14, 2010

How to: create and run a SPARQLMotion script

An updated version of this blog posting is now available as a chapter in the TopBraid Application Development Quickstart Guide (pdf).

SPARQLMotion is a visual scripting language that lets you tie together triples from diverse data sources (whether they're natively stored as triples or not), process them using SPARQL, and then save the results in a wide choice of output formats. One of these scripts can be run from TopBraid Composer, as a semantic web service, or as the back end to a user interface created with TopBraid Ensemble.

You create a SPARQLMotion script by dragging icons representing different kinds of modules from a palette onto a workspace and then filling out a dialog box for each to configure it. Here, we're going to look at the creation of a simple script, and in future entries on this blog we'll learn about more modules and what they can add to your application development: reading and writing spreadsheet data, reading and writing from large, disk-based triplestores, checking of data constraints, HTML and XML processing, and much more.

Our sample script will import the triples from the FOAF ("Friend of a Friend") files of some well-known semantic web advocates, extract the names and nicknames of their friends who are listed there, and save the results in a new RDF file. Much more complex scripts are possible, but the creation of those scripts will always follow the basic steps described here.

To create a SPARQLMotion file, create a new file in TopBraid Composer the same way you would create any other file. When you indicate that you want to create a file, you'll see that "SPARQLMotion File" is one of the options. Picking it creates a file that already imports the sparqlmotionfunctions and sparqlmotionlib libraries, which have what you need to add a script to your new file. Assign the file any Base URI and File name you like (for example, http://www.example.com/sm1 and sm1) and click the Finish button to indicate that you're done with the Create SPARQLMotion File dialog box.

Next, select "Create SPARQLMotion Script" from TopBraid Composer's Scripts menu. A Select initial module type dialog box will ask you about the first module to add to your script. Because the script will import RDF files from the web, start with an ImportRDFFromURL module and name it GetTimBLData:

After filling out the dialog box as shown and clicking the OK button, the script appears with its single module:

Double-click the icon representing the module to configure it. The only information you must add to an ImportRDFFromURL module is the URL of the RDF to import. To set the value of a property such as sml:url on one of these dialog boxes when there is no existing value, display its context menu by clicking the white triangle next to it and select Add Empty Row. To set the property value to retrieve Tim Berners-Lee's FOAF file, enter the value http://www.w3.org/People/Berners-Lee/card.rdf, as shown below. (I've also tweaked the rdfs:label value. Note from the "Ok" to the right of sml:url that the data entry of this field is not complete: TopBraid Composer will not know about this new value until you either press the Enter key with the cursor in that field or click on "Ok" to make it go away.)

We'll see shortly why you don't have to add a value for the sm:next property on this dialog box. Click the Close button to return to the script workspace.

At this point, your script only has one module, but you can still run it. Clicking a module icon selects it, and clicking the little bug icon ("Debug selected SPARQLMotion module") at the top of your workspace tells TopBraid Composer to run the script up to the selected module. Do this with GetTimBLData, and after it retrieves the triples from the designated URL, TopBraid Composer displays a message box offering to display the result triples in TopBraid Composer's SPARQLMotion Results view.

As with so many semantic web applications, this one becomes more interesting when you add more data sources and then select a subset of the combined data that meets your needs. If you don't see the module palette on the right side of the script workspace, click the small white triangle in the upper-right of the workspace to display it. To add a second module, click "Import from Remote" on the palette and click the small darker triangle to scroll through the choices there. When you see Import RDF from URL, add a new one of these modules to your workspace by dragging it there and name it GetJimHendlerData. Enter http://www.cs.umd.edu/~hendler/2003/foaf.rdf as the data's location. Then, add a third Import RDF from URL module named GetDanBrickleyData and set http://danbri.org/foaf.rdf as the sml:url value. (At any point in the creation and editing of your script modules, you can rearrange the icons by selecting and dragging them.)

Once the script retrieves data from these three sources, it will send that data to a module that only passes along the triples that identify the name and nickname values found in the source data. Drag an Apply Construct module from the RDF Processing section of the palette onto the workspace and name it ExtractData. After this module appears, double-click it and set the sml:constructQuery's value to the following:

CONSTRUCT {
?s a <http://xmlns.com/foaf/0.1/Person> .
?s <http://xmlns.com/foaf/0.1/name> ?name .
?s <http://xmlns.com/foaf/0.1/nick> ?nick .
}
WHERE {
?s <http://xmlns.com/foaf/0.1/name> ?name .
?s <http://xmlns.com/foaf/0.1/nick> ?nick .
}

You can use SPARQL CONSTRUCT queries to rearrange and cross-reference data to infer and create new information based on the input, but our simple query merely finds and passes along triples that match two patterns and declares that the subject that has these properties is a Person as defined by the FOAF vocabulary.

After entering this CONSTRUCT query, set the same dialog box's sml:replace value to True so that only the constructed triples get passed along to the output without the input data. Now you're finished with this dialog box, so click its Close button.
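To make the effect of the query and the sml:replace setting concrete, here is a minimal Python sketch that mimics them on a handful of in-memory triples. It is only an illustration, not how TopBraid Composer processes RDF: the triples are plain tuples, the function name apply_construct and the sample data are made up, and the replace=False branch reflects my reading that the input triples are passed along together with the constructed ones when sml:replace is not set to True.

```python
# Illustrative only: triples are plain (subject, predicate, object) tuples,
# not how TopBraid stores or processes RDF.
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def apply_construct(input_triples, replace=True):
    """Mimic the CONSTRUCT query above on a set of triple tuples (hypothetical helper)."""
    names = {}  # subject -> set of foaf:name values
    nicks = {}  # subject -> set of foaf:nick values
    for s, p, o in input_triples:
        if p == FOAF + "name":
            names.setdefault(s, set()).add(o)
        elif p == FOAF + "nick":
            nicks.setdefault(s, set()).add(o)

    constructed = set()
    # The WHERE clause matches only subjects that have BOTH a name and a nick.
    for s in names.keys() & nicks.keys():
        constructed.add((s, RDF_TYPE, FOAF + "Person"))
        for name in names[s]:
            constructed.add((s, FOAF + "name", name))
        for nick in nicks[s]:
            constructed.add((s, FOAF + "nick", nick))

    # replace=True: pass along only the constructed triples.
    # replace=False (assumed behavior): input triples plus the constructed ones.
    return constructed if replace else set(input_triples) | constructed


# Made-up sample data (hypothetical URIs and values).
sample = {
    ("http://example.com/alice", FOAF + "name", "Alice Example"),
    ("http://example.com/alice", FOAF + "nick", "ally"),
    ("http://example.com/alice", FOAF + "mbox", "mailto:alice@example.com"),
    ("http://example.com/bob", FOAF + "name", "Bob Example"),  # no nick
}

result = apply_construct(sample, replace=True)
```

In this sample, the second resource has a name but no nickname, so it matches neither pattern pair and contributes nothing to the result, and the mbox triple is dropped because the CONSTRUCT template never mentions it.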

We haven't identified the input of this CONSTRUCT module yet. Instead of writing code, we can do it by just pointing and clicking. To make your first connection, select the Add connection icon on the Palette, click on the GetTimBLData module, and then click the ExtractData module. An arrow will appear to show that data will flow from one to the other when the script is run:

Follow the same steps to connect the other two data retrieval modules to ExtractData.

Instead of waiting until the application is finished to test it, we can do more incremental testing here. Select the ExtractData module, click the debug icon at the top, and you should end up with a list of names, nicknames, and foaf:Person type declarations in the SPARQLMotion Results view.
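If it helps to picture what "run the script up to the selected module" means, the following toy Python model treats each module as a function and each connection as an edge, then runs everything upstream of the chosen module and merges the incoming triples. This is a sketch under my own assumptions, not TopBraid's implementation: the Module class, run_up_to, the stand-in transforms, and the sample triples are all hypothetical, and the ExtractData stand-in is a simplified filter rather than the real CONSTRUCT query.

```python
# Illustrative only: a toy model of modules and connections, with triples as
# plain tuples. All names and sample data here are hypothetical.

class Module:
    def __init__(self, name, transform):
        self.name = name
        self.transform = transform  # function: set of triples -> set of triples
        self.inputs = []            # modules connected into this one

    def connect_to(self, target):
        """Model an 'Add connection' arrow from this module to target."""
        target.inputs.append(self)


def run_up_to(module, trace):
    """Run every upstream module first, then the selected one."""
    merged = set()
    for upstream in module.inputs:
        merged |= run_up_to(upstream, trace)  # union of all incoming triples
    trace.append(module.name)
    return module.transform(merged)


def source(triples):
    """Stand-in for an import module: ignores input, returns fixed triples."""
    return lambda _incoming: set(triples)


def keep_names_and_nicks(triples):
    """Simplified stand-in for ExtractData: keep only name and nick triples."""
    return {t for t in triples if t[1] in ("name", "nick")}


tim = Module("GetTimBLData", source({("p1", "name", "Person One"), ("p1", "nick", "one")}))
jim = Module("GetJimHendlerData", source({("p2", "name", "Person Two"), ("p2", "mbox", "x")}))
dan = Module("GetDanBrickleyData", source({("p3", "nick", "three")}))
extract = Module("ExtractData", keep_names_and_nicks)

for importer in (tim, jim, dan):
    importer.connect_to(extract)

trace = []  # records the order in which modules ran
output = run_up_to(extract, trace)
```

Selecting ExtractData in this model runs the three import stand-ins first and then the filter, while selecting a single import module runs only that module, which mirrors the incremental testing described above.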

Our final step has two parts:

  1. Tell the script to save the results of the ExtractData module to an RDF file. From the Export to Local section of the Palette, drag an Export to RDF file module onto the workspace and call it SaveOutput. Double-click the new module and set the sml:baseURI property to http://mytest/ (or any URI you like; it will be used as the base URI of the saved file) and set sml:targetFilePath to sm1out.rdf. If you don't specify a path for this file, the SPARQLMotion script will save the file in the same directory as the script itself. After setting these two values, click the Close button.

  2. By including the FOAF ontology in your output, applications that read the file created by your script will have more context about the meaning of the data. You can import this ontology from the web, but the TopBraid Composer distribution includes many popular ontologies so that importing them will happen more quickly. From the Import from Local section of the SPARQLMotion palette, drag an Import RDF from workspace module onto the workspace and name it FOAFOntology. Double-click it, set its sml:sourceFilePath property to /TopBraid/Common/foaf.owl, press Enter, and then click the dialog box's Close button.

Use Add connection to connect the ExtractData and the FOAFOntology modules to the SaveOutput module. Your completed script should look like this:

After saving your application, test it by selecting the SaveOutput module and clicking the debug icon. As it runs, watch the Console view, which shows informational messages about your script. This and the Error Log view are valuable for debugging complex scripts.

The script should create the sm1out.rdf file described above. Open it up, and then in TopBraid Composer's Classes view, drill down from owl:Thing to foaf:Agent and select foaf:Person so that the instances you've created appear in TopBraid Composer's Instances view. Selecting any of them there will display details about that person in the Form view, where you'll see the name and nickname assigned to that person somewhere on that form. (You may need to scroll down a bit to see those particular properties.)

If you look through the many module choices on the SPARQLMotion palette you'll get some idea of the wide variety of possibilities you have for the kinds of data that a script can read and write and the sophisticated options for processing that data. We'll look at many of them in future entries here. Meanwhile, take a look at this short video, which shows you how to create another simple script that uses a few more interesting modules.


This is a blog by TopQuadrant, developers of the TopBraid Suite, created to support the pursuit of our ongoing mission - to explode strange semantic myths, to seek out new models that support a new generation of dynamic business applications, to boldly integrate data that no one has integrated before.