VOYAGES OF THE SEMANTIC ENTERPRISE: 2011

Friday, December 2, 2011

Publishing HTML created with SPARQL Web Pages

Last week we saw how to use SPARQL Web Pages (SWP) to render customized HTML of individual class instances and how to create a web page of all that class's instances with a title at the top. The fine-grained control that SWP gives us over the generated HTML let us take advantage of the jQuery Mobile libraries so that the sample TopBraid application generated web pages appropriate for a smartphone interface, with buttons that expand and collapse at your touch to display details about each class instance.

Testing this application meant choosing from two alternatives:

Running it on TopBraid Composer's built-in TopBraid Live Personal Server, which let us view the page from any web browser running on the same machine.
Uploading the application's project to a TopBraid Live Enterprise Server, where multiple devices, including phones, could access it.

Either way, because TopBraid Live generates these web pages dynamically, if the underlying data is changed, refreshed versions of the web page would reflect this, making TopBraid a great platform for interactive semantic web applications for any device.

You don't have to have a TopBraid Live Enterprise server to deliver pages generated by SWP, though. A simple SPARQLMotion script can save your formatted HTML in disk files that you can copy to a web server that may or may not have TopBraid Live installed. Using this technique, you can use the TopBraid platform to create semantic content publishing applications as well as interactive applications.

The following SPARQLMotion script, which is stored in the application file described last week, does this for the mobile Kennedys web application.

The first module is an sml:ImportRDFFromWorkspace module that reads the file that this script is stored in. That file has the Kennedys data and the SWP formatting markup so that this data can be fed to the next step in the process.

The second module, named mk:GenerateHTML, is an sml:CreateUISPINDocument SPARQLMotion module (from the Text Processing section of the SPARQLMotion palette) whose key setting is its sml:view property, which has the following:

<ui:resourceView
   ui:resource="&lt;http://topbraidlive.org/mobileKennedys&gt;"/>

It's a snippet of XML specifying that the module should create a resource view for the specified resource, which is identified here with a complete URI. (The URI's delimiting angle brackets are escaped because they're in an XML attribute.) The real work to make this happen was all described in the last blog entry, which showed how the SWP code to generate a complete web page was attached to the resource. The mk:GenerateHTML module in this script also specifies that this generated markup will be stored in a variable named doc.

The final mk:SaveFile module in the script is an sml:ExportToTextFile module that saves the contents of the doc variable (set in the module's sml:text property as the SPARQL expression ?doc) to a file called output.html. I also set sml:replace to true so that repeated execution of the script wouldn't append the output onto the result of previous runs.
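For readers who think in code, the three-module chain can be mimicked as a plain function pipeline. This is only a minimal Python sketch, not SPARQLMotion: `import_rdf`, `generate_html`, and `export_to_text_file` are hypothetical stand-ins for sml:ImportRDFFromWorkspace, sml:CreateUISPINDocument, and sml:ExportToTextFile, the sample triple is made up, and the `replace` flag imitates sml:replace by choosing between overwrite and append mode.

```python
import os
import tempfile
from html import escape


def import_rdf(triples):
    # Hypothetical stand-in for sml:ImportRDFFromWorkspace: pass the data along unchanged.
    return list(triples)


def generate_html(triples):
    # Hypothetical stand-in for sml:CreateUISPINDocument: one <li> per kennedys:name value.
    items = "".join(f"<li>{escape(o)}</li>" for s, p, o in triples if p == "kennedys:name")
    return f"<ul>{items}</ul>"


def export_to_text_file(path, text, replace):
    # Imitates sml:replace: mode "w" overwrites earlier output, mode "a" appends to it.
    with open(path, "w" if replace else "a", encoding="utf-8") as f:
        f.write(text)


def run_script(triples, path, replace=True):
    doc = generate_html(import_rdf(triples))  # plays the role of the ?doc variable
    export_to_text_file(path, doc, replace)
    return doc


if __name__ == "__main__":
    data = [("kennedys:JohnKennedy", "kennedys:name", "John Kennedy")]
    out = os.path.join(tempfile.mkdtemp(), "output.html")
    run_script(data, out)
    run_script(data, out)  # with replace=True the second run overwrites the first
    with open(out, encoding="utf-8") as f:
        print(f.read())  # <ul><li>John Kennedy</li></ul>
```

Running the script twice with `replace=True` leaves a single copy of the output, which is exactly why sml:replace is set in the real script.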

After you run this script you'll have a web page called output.html that looks like the display shown in the phone browsers in last week's blog entry, and you can copy this file to any web server you want.

This script is very simple. As you bring in other SPARQLMotion capabilities, such as inferencing and reading from all the data formats that TopBraid understands, you can make it much more sophisticated. You can also configure the script to save a collection of multiple files, letting you publish large collections of data in pieces that are digestible for typical browsers. (Phone browsers in particular can get sluggish; my Android LG Ally is not a recent model, and expanding and collapsing the information about each person in this app is not as quick on the Ally as I'd like it to be.)
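Publishing in pieces amounts to paging a sorted list and writing one file per page. The sketch below assumes a page size, a `kennedys-page-N.html` naming scheme, and a `write_pages` helper that are all hypothetical; inside TopBraid the same idea would be expressed with SPARQLMotion modules rather than Python.

```python
import os
import tempfile
from html import escape


def chunk(items, size):
    # Split a list into consecutive pages of at most `size` items.
    return [items[i:i + size] for i in range(0, len(items), size)]


def write_pages(names, out_dir, size=2):
    # Hypothetical helper: one small HTML file per page, each linking to the next.
    pages = chunk(sorted(names), size)
    paths = []
    for n, page in enumerate(pages, start=1):
        items = "".join(f"<li>{escape(name)}</li>" for name in page)
        nav = f'<a href="kennedys-page-{n + 1}.html">Next</a>' if n < len(pages) else ""
        path = os.path.join(out_dir, f"kennedys-page-{n}.html")
        with open(path, "w", encoding="utf-8") as f:
            f.write(f"<ul>{items}</ul>{nav}")
        paths.append(path)
    return paths


if __name__ == "__main__":
    names = ["Rose Kennedy", "Andrew Cuomo", "Edward Kennedy Jr.", "John Kennedy", "Joseph Kennedy"]
    for p in write_pages(names, tempfile.mkdtemp()):
        print(os.path.basename(p))
```

Small pages with a simple "Next" link keep each file light enough for an older phone browser, at the cost of more requests.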

So, use your imagination to add new features to this SPARQLMotion script, and you can create dynamic or static web pages for phones or any other kinds of browsers, with all the power of TopBraid behind your application development.

Tuesday, November 22, 2011

Creating a TopBraid mobile web app with SPARQL Web Pages

I've written here before about how SPARQL Web Pages (SWP) let you convert your RDF to HTML or XML by embedding SPARQL queries into the appropriate markup. In that very simple example, I showed how to create a web page for an address book entry and then display it both in TopBraid Composer and in a regular web browser.

Today I'm going to show how I did something similar: displaying a single Person instance from the Kennedys sample data included with TopBraid Composer, and then defining a page that shows all the people in that data model. You can download and try the project here. The fun part was displaying it so that it looks like a proper mobile web page on a phone's web browser, as shown here on an Android phone and on an iPhone turned sideways to test the re-orienting capability of the display.

Touching someone's name on the phone expands the display to show the remaining property names and values about that person underneath his or her name. In the picture, I've just touched Andrew Cuomo's name on the Android phone and Edward Kennedy Jr.'s name on the iPhone, displaying details about each of them below their names. Touching a name again hides that person's data.

In the picture, the two phone browsers are displaying the output of a TopBraid Live server running this application. As we'll see in the sequel to this blog entry, you can use the same SPARQL Web Page configuration to save HTML disk files with all of this formatting so that the phone browsers could view the static web pages stored on a server that didn't have TopBraid Live installed.

To enable proper mobile display, I used the jQuery Mobile library. jQuery is a set of JavaScript and CSS libraries designed to let you add sophisticated user interfaces to your web pages without worrying about cross-browser compatibility, and jQuery Mobile is a branch of this project specialized for mobile phones. You don't need to know any JavaScript or CSS to use these libraries; if you're happy with one of their display configurations, using them is usually just a matter of including the right file links in your HTML's head element and then setting certain attributes in your HTML elements to reference the libraries.

I began this application by creating an RDF/SPARQLMotion file in TopBraid Composer with a base URI of http://topbraidlive.org/mobileKennedys. I needed SPARQLMotion for the script that creates the static disk file version of the Kennedys display that we'll learn about next week. Next, I imported the kennedys.rdf model from the /TopBraid/Examples folder in the Navigator view. I also imported the SWP html.rdf and tui.rdf models from the Navigator's /TopBraid/UISPIN folder. (This all works the same when the files to import are Turtle ttl files instead of RDF/XML files.)

With the necessary files imported, the next step was to set up the display of data about a Person instance. Clicking on kennedys:Person under owl:Thing in the Class view shows that the presence of the SWP libraries has added a ui:instanceView property to the kennedys:Person class form. I could have put the HTML to display a person here, as I did with the address book display in the blog entry mentioned above, but for greater flexibility, I created a separate PersonView class to store this markup and pointed at this class from the Person class's ui:instanceView value.

I created this mk:PersonView class (I had assigned the prefix "mk:" to the URI http://topbraidlive.org/mobileKennedys#) as a child of the ui:Element class, which is a child of the ui:Node class added by the SWP libraries. The ui:prototype property on this class's form is the place for the formatting code and markup, but I did a few setup steps before setting it:

Because the app needs to pass a parameter to the code in ui:prototype specifying which person to display, I had to define that parameter. To do this, I created an sp:person child of the sp:arg property in the Properties view to represent the person argument value passed to the prototype. Next, I dragged the new property from the Properties view to the spin:constraint property name on the mk:PersonView form to indicate that it would store the argument passed to the code and markup used to display a single person. Doing so displays the "Create from SPIN template" wizard with all the values filled out the way I needed them, so I just clicked the OK button.
jQuery implements some of its magic with HTML extension attributes named data-collapsed and data-role. TopBraid Composer helps you assemble proper HTML by flagging any non-HTML markup, and it flags these because they're not declared as HTML 4 attributes. So, I declared them myself by making two clones of the html:class property (a subproperty of html:attributes) and renaming them html:data-collapsed and html:data-role. This way, TopBraid Composer wouldn't prevent me from saving HTML markup that used these properties as attributes.
When listing each person's property names and values (for example, Andrew Cuomo's year of birth and first name in the picture above), I certainly didn't want to list the full URI of each property name. Ideally, each property would have an rdfs:label value that I could display instead; if not, I thought it best to just show the local name of the property's URI. To make this easier, I created a new function called mk:bestName as a subclass of spin:Functions (itself a subclass of spin:Modules). I defined a spin:constraint of sp:arg1 for this function and then defined this spin:body for it:
```
SELECT ?label
WHERE {
    BIND (spif:name(?arg1) AS ?name) .
    BIND (IF(fn:contains(?name, ":"), afn:localname(?arg1), ?name) AS ?label) .
}
```
mk:bestName is a good general-purpose function. It calls the SPIN spif:name function, which gets a resource's skos:prefLabel value if available or an rdfs:label value as a second choice. If neither is available, spif:name returns a prefixed name (or the full URI) for the resource; because that result contains a colon, mk:bestName uses the local name of the URI instead.
Because members of the kennedys:Person class might have a kennedys:name value that I'd prefer the application to use if available, I declared a similar but more specialized function for the Kennedys data called mk:bestKennedyName. It is also a subclass of spin:Functions, and it has a spin:constraint of sp:arg1 and the following as a spin:body:
```
SELECT ?label
WHERE {
    OPTIONAL {
        ?arg1 kennedys:name ?kname .
    } .
    BIND (spif:name(?arg1) AS ?name) .
    BIND (COALESCE(?kname, IF(fn:contains(?name, ":"), afn:localname(?arg1), ?name)) AS ?label) .
}
```
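This function body takes advantage of SPARQL 1.1's new COALESCE() function, which returns the value of the first parameter passed to it that can be evaluated without an error. When the OPTIONAL pattern finds no kennedys:name value, ?kname is unbound, evaluating it is an error, and COALESCE() moves on to the fallback label.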
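To make the fallback order of the two functions concrete, here is a small Python model of them. It is a sketch, not SPIN: `spif_name` is a hypothetical simplification of spif:name as described above, the kennedys namespace URI and the property names in the usage example are assumptions, and `coalesce` imitates COALESCE() by skipping `None`, which stands in for an unbound variable.

```python
# ASSUMED namespace for the Kennedys sample data, for illustration only.
PREFIXES = {"http://topbraid.org/examples/kennedys#": "kennedys"}


def qname(uri):
    # Abbreviate a URI with a known prefix, e.g. kennedys:firstName; else return it whole.
    for ns, prefix in PREFIXES.items():
        if uri.startswith(ns):
            return f"{prefix}:{uri[len(ns):]}"
    return uri


def local_name(uri):
    # Rough equivalent of afn:localname: the text after the last '#' or '/'.
    return uri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def spif_name(uri, labels):
    # Simplified model of spif:name: skos:prefLabel, then rdfs:label, then a prefixed name.
    entry = labels.get(uri, {})
    return entry.get("skos:prefLabel") or entry.get("rdfs:label") or qname(uri)


def coalesce(*values):
    # Like SPARQL COALESCE(): the first argument that is not an "error" (None here).
    return next((v for v in values if v is not None), None)


def best_name(uri, labels):
    name = spif_name(uri, labels)
    return local_name(uri) if ":" in name else name


def best_kennedy_name(uri, labels, kennedy_names):
    # kennedy_names.get() returns None when the OPTIONAL pattern would not match.
    return coalesce(kennedy_names.get(uri), best_name(uri, labels))


if __name__ == "__main__":
    K = "http://topbraid.org/examples/kennedys#"
    labels = {K + "birthYear": {"rdfs:label": "birth year"}}
    print(best_name(K + "birthYear", labels))  # birth year
    print(best_name(K + "firstName", labels))  # firstName
    print(best_kennedy_name(K + "AndrewCuomo", labels, {K + "AndrewCuomo": "Andrew Cuomo"}))  # Andrew Cuomo
```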

With the functions, the HTML extensions, and the argument to pass to it all set up for the formatting markup in the mk:PersonView class, I was ready to add that markup and SPARQL code to the ui:prototype property of my new class. It's mostly HTML div elements with attributes set according to the models I saw in the source of the jQuery Mobile demos. The "collapsible" part means that initially only the kennedys:name value will display, as an h3 element, and that clicking on that name (or, on a phone, touching it) will toggle the display of the remaining property names and values about that person.

<div data-collapsed="true" data-role="collapsible">
   <h3>{= spl:object(?person, kennedys:name) }</h3>
   <div class="ui-grid-a">
       <ui:forEach ui:resultSet="{#
               SELECT ?propertyName ?bestValueLabel
               WHERE {
                   ?person ?property ?value .
                   BIND (mk:bestName(?property) AS ?propertyName) .
                   BIND (IF(isIRI(?value), mk:bestKennedyName(?value), ?value)
                      AS ?bestValueLabel) .
               }
               ORDER BY (?property) }">
           <div class="ui-block-a">
               <div class="ui-bar ui-bar-c">{= ?propertyName }</div>
           </div>
           <div class="ui-block-b">
               <div class="ui-bar ui-bar-c">{= ?bestValueLabel }</div>
           </div>
       </ui:forEach>
   </div>
</div>
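Stripped of the SWP syntax, the prototype selects one person's triples, labels each property and value, sorts by property, and emits one two-cell grid row per pair. The Python sketch below reproduces that shape under simplifying assumptions: triples are plain tuples, `prop_label` and `value_label` stand in for mk:bestName and mk:bestKennedyName, and only the jQuery Mobile class and attribute names are copied from the markup above.

```python
from html import escape


def render_person(person, triples, prop_label, value_label):
    # This person's statements, ordered by property as in ORDER BY (?property).
    rows = sorted(((p, o) for s, p, o in triples if s == person), key=lambda row: row[0])
    heading = next((o for p, o in rows if p == "kennedys:name"), "")
    cells = "".join(
        '<div class="ui-block-a"><div class="ui-bar ui-bar-c">'
        f"{escape(prop_label(p))}</div></div>"
        '<div class="ui-block-b"><div class="ui-bar ui-bar-c">'
        f"{escape(value_label(o))}</div></div>"
        for p, o in rows
    )
    return (
        '<div data-collapsed="true" data-role="collapsible">'
        f"<h3>{escape(heading)}</h3>"
        f'<div class="ui-grid-a">{cells}</div>'
        "</div>"
    )


if __name__ == "__main__":
    # Made-up sample data for illustration.
    data = [
        ("kennedys:AndrewCuomo", "kennedys:name", "Andrew Cuomo"),
        ("kennedys:AndrewCuomo", "kennedys:birthYear", "1957"),
    ]
    strip_prefix = lambda q: q.split(":", 1)[-1]
    print(render_person("kennedys:AndrewCuomo", data, strip_prefix, str))
```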

When you use SWP to define an HTML div element with the data and markup to display something, the SWP engine will create html, head, and body wrapper elements to ensure that a browser viewing the HTML gets a complete web page. The SWP ui:headIncludes property, which you'll see on the mk:PersonView class form with ui:prototype and the other properties there, lets you specify custom markup to add to the HTML head element when the SWP engine sends the web page to the requesting browser. I added the following to this property; it has the meta, link, and script elements necessary to make the resulting HTML a proper jQuery Mobile page:

<ui:group>
   <meta content="width=device-width, minimum-scale=1.0, maximum-scale=1.0"
         name="viewport"/>
   <link href="http://code.jquery.com/mobile/1.0/jquery.mobile-1.0.min.css"
         rel="stylesheet"/>
   <script src="http://code.jquery.com/jquery-1.6.4.min.js"/>
   <script src="http://code.jquery.com/mobile/1.0/jquery.mobile-1.0.min.js"/>
</ui:group>
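Conceptually, the SWP engine wraps the rendered fragment in a complete document and places the ui:headIncludes markup in its head. The function below illustrates that idea only; it is not SWP's serializer, the `title` value is hypothetical, and it writes each script element with an explicit closing tag, because HTML parsers do not treat `<script/>` as self-closing. Keeping jQuery ahead of jQuery Mobile in the list matters, since the mobile library depends on it.

```python
HEAD_INCLUDES = [
    '<meta content="width=device-width, minimum-scale=1.0, maximum-scale=1.0" name="viewport"/>',
    '<link href="http://code.jquery.com/mobile/1.0/jquery.mobile-1.0.min.css" rel="stylesheet"/>',
    '<script src="http://code.jquery.com/jquery-1.6.4.min.js"></script>',
    '<script src="http://code.jquery.com/mobile/1.0/jquery.mobile-1.0.min.js"></script>',
]


def wrap_page(body_fragment, head_includes=HEAD_INCLUDES, title="Kennedys"):
    # Add html/head/body scaffolding around a fragment; head includes keep their order.
    head = "\n".join(head_includes)
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        f"<title>{title}</title>\n{head}\n</head>\n"
        f"<body>\n{body_fragment}\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    print(wrap_page('<div data-role="collapsible"><h3>Andrew Cuomo</h3></div>'))
```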

Then, going back to the kennedys:Person class, I added this ui:instanceView value for it to point at the mk:PersonView class I had created:

<mk:PersonView sp:person="{= ?this }"/>

The ?this variable passes the Person instance currently being processed to be used as the ?person value in the SPARQL query in the mk:PersonView ui:prototype value.

This is all enough to display a single person, but I wanted to display all the Person instances in a sorted list. I attached this view's definition to the ontology resource itself by clicking on the little house icon at the top of TopBraid Composer and then adding this ui:view value to it (note that ui:view wasn't already part of the form, so I dragged it on there from TopBraid Composer's Properties view):

<div>
   <div data-role="header">
       <h1>Kennedys List</h1>
   </div>
   <div data-role="collapsible-set">
       <ui:forEach ui:resultSet="{#
               SELECT ?p
               WHERE {
                   ?p a kennedys:Person .
                   ?p kennedys:lastName ?lname .
               }
               ORDER BY (?lname) }">
           <ui:resourceView ui:resource="{= ?p }"/>
       </ui:forEach>
   </div>
</div>

As with the code to display each individual Person instance, this markup is mostly div elements with attribute settings based on the source of the jQuery Mobile demos I saw. The ui:resourceView element inside the ui:forEach element tells the SWP engine to display the resource according to whatever view was specified for it. In this case, the resource is a kennedys:Person instance, because that's what the SPARQL query here binds to the ?p variable, so it will use the view defined earlier.
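The lookup that ui:resourceView performs, finding the view attached to a resource's type, can be pictured as a small registry keyed by class. Everything in this Python sketch is hypothetical (the `instance_view` decorator, the dictionary-shaped resources, the fallback `<span>`); it models the idea of type-based dispatch plus the sorted ui:forEach, not SWP's actual resolution rules.

```python
from html import escape

VIEWS = {}  # hypothetical registry: class name -> rendering function


def instance_view(cls):
    # Decorator playing the role of a ui:instanceView attached to a class.
    def register(fn):
        VIEWS[cls] = fn
        return fn
    return register


@instance_view("kennedys:Person")
def person_view(res):
    return f'<div data-role="collapsible"><h3>{escape(res["name"])}</h3></div>'


def resource_view(res):
    # Use the view registered for the resource's type, else a plain fallback.
    view = VIEWS.get(res["type"])
    return view(res) if view else f"<span>{escape(res['id'])}</span>"


def kennedys_list(resources):
    # Mirrors the ui:forEach: persons that have a last name, ordered by last name.
    people = [r for r in resources if r["type"] == "kennedys:Person" and "lastName" in r]
    items = "".join(resource_view(p) for p in sorted(people, key=lambda r: r["lastName"]))
    return (
        '<div><div data-role="header"><h1>Kennedys List</h1></div>'
        f'<div data-role="collapsible-set">{items}</div></div>'
    )
```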

To test this, I sent a browser to the URL http://localhost:8083/tbl/uispin?_resource=http://topbraidlive.org/mobileKennedys. (URLs for SPARQL Web Page applications often include a &_base parameter to identify the graph of data to use—in this case, it would be &_base=http://topbraid.org/examples/kennedys—but that was unnecessary here because one of the first steps of creating the mobileKennedys model was dragging the Kennedys data onto its Include tab, so it already knew which data to use.) The _resource parameter tells it which resource to render, so I used my file's base URI here because that's where I attached the markup and SPARQL code to display the full web page. These and other parameters are described in the SWP documentation.
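Building these request URLs by hand gets error-prone because the parameter values contain their own `:` and `/` characters. A short sketch with Python's standard `urllib.parse` shows one way to assemble them; the host, port, and `/tbl/uispin` path are copied from the example above, and `swp_url` is a hypothetical helper that adds `_base` only when a data graph has to be named.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SERVER = "http://localhost:8083/tbl/uispin"


def swp_url(resource, base=None, server=SERVER):
    # _resource names what to render; _base (optional) names the data graph.
    params = {"_resource": resource}
    if base is not None:
        params["_base"] = base
    return f"{server}?{urlencode(params)}"


if __name__ == "__main__":
    url = swp_url("http://topbraidlive.org/mobileKennedys")
    print(url)
    # http://localhost:8083/tbl/uispin?_resource=http%3A%2F%2Ftopbraidlive.org%2FmobileKennedys
    print(parse_qs(urlsplit(url).query))
    # {'_resource': ['http://topbraidlive.org/mobileKennedys']}
```

The percent-encoded values decode back to the same URIs on the server side, so the result is equivalent to the hand-typed URL.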

This should work with any browser. (I recently discovered that picking User Agent from Safari's Develop menu lets you set Safari to emulate a variety of other browsers, including the mobile versions that run on the iPhone and iPad, which helped me debug some early problems I had with getting the jQuery Mobile code right.) Because you can't access TopBraid Composer's built-in copy of the TopBraid Live Personal edition from a different computer, there's no way for a phone's browser to access this application when running it on TopBraid Composer, so I uploaded the project storing this application to a copy of TopBraid Live to do the test shown in the photograph above.

Next week, I'll show how I extended this application to save a static HTML file of the mobile web display of Kennedys data as an alternative to the TopBraid Live server's dynamic display. I could then copy that file to a web server that doesn't necessarily have TopBraid Live installed on it. Then, any computer or phone web browser can display it. For a preview of how it looks, send your phone's browser to http://www.topquadrant.com/resources/blog/k/—or, if you want a shorter URL to type on your phone, http://bit.ly/topqkm.

Friday, September 30, 2011

Ontologies and Data Models – are they the same?

Yesterday a newcomer on the TopBraid Users Forum asked how ontologies may be different from logical data models. As is to be expected on the TopBraid Forum, by ontologies he specifically meant ontology models expressed in RDFS/OWL. Because we frequently hear this or similar questions in our trainings and workshops and in conversations with customers, I decided to respond in a blog post instead of writing an e-mail.

Data modeling was invented more than thirty years ago to help with the design of databases, specifically relational databases. As quoted below, the ANSI definition from 1975 differentiated between three data models: conceptual, logical and physical. Data modeling quickly became recognized as a tool for analyzing the semantics of an organization with respect to the structure and flow of the information used in carrying out the organization’s activities. Wikipedia offers the following definition of data modeling:

Data modeling is a method used to define and analyze data requirements needed to support the business processes of an organization. The data requirements are recorded as a conceptual data model with associated data definitions. Actual implementation of the conceptual model is called a logical data model.
<…>
In 1975 ANSI described three kinds of data-model instance:

Conceptual schema: describes the semantics of a domain (the scope of the model). For example, it may be a model of the interest area of an organization or of an industry. This consists of entity classes, representing kinds of things of significance in the domain, and relationships assertions about associations between pairs of entity classes. A conceptual schema specifies the kinds of facts or propositions that can be expressed using the model. In that sense, it defines the allowed expressions in an artificial "language" with a scope that is limited by the scope of the model.

Logical schema: describes the structure of some domain of information. This consists of descriptions of (for example) tables, columns, object-oriented classes, and XML tags.

Physical schema: describes the physical means used to store data. This is concerned with partitions, CPUs, tablespaces, and the like.

According to ANSI, this approach allows the three perspectives to be relatively independent of each other. Storage technology can change without affecting either the logical or the conceptual model. The table/column structure can change without (necessarily) affecting the conceptual model.

These definitions describe a clear progression from conceptual to logical to physical data models. Since they originated in the 1970s, they reflect certain technology assumptions that no longer hold true.

When information modeling is done to create a relational database, the conceptual model must be different from the logical model, because there is no place in a relational database structure to capture business rules, express subsumption relationships, or describe other key aspects of a conceptual model. The semantic information collected and documented during the initial modeling is left behind when modelers and designers move on to define a logical data model. Software developers then use these "left behind" parts as they encode business semantics directly into custom programs.

A logical data model is a subset of a conceptual model that can be expressed using a particular technology. However, there are always some performance considerations that require additional changes to the logical data model before it can be implemented in a relational database. Hence, some aspects of a logical model are left behind as it gets translated into a physical data model.

Since an ontology is a model of a domain that describes the objects inhabiting it, all three types of data models can be thought of as ontologies. They range from the most expressive one, which describes business concepts and processes (the conceptual model), to progressively less expressive models that move from describing business semantics to describing the physical structures of the data as it is stored in databases (the logical and physical data models). A physical model can be thought of as an ontology of a particular database. Wikipedia goes on to note:

Early phases of many software-development projects emphasize the design of a conceptual data model. Such a design can be detailed into a logical data model. In later stages, this model may be translated into physical data model. However, it is also possible to implement a conceptual model directly.

Semantic Web standards (governed by the W3C, the World Wide Web Consortium) make it possible to implement conceptual models directly. This is possible due to the layered architecture of the Semantic Web technology stack consisting of:

RDF – a canonical data model that is like the relational data model in its ability to connect related objects and unlike the relational data model in that the data objects (or resources in RDF-speak) are highly granular.

The smallest unit of information in RDF is not a table or a row in a table, but individual statements – a single fact about a resource.

These statements are called RDF triples. For example, “Atlantis decommission-date July, 2011” is a triple where Atlantis is the subject, decommission date is the predicate, and July, 2011 is the object. Atlantis and decommission date are RDF resources, and July, 2011 is a literal. Subjects and predicates of a triple are always RDF resources. An object can be either a resource or a literal value. Predicates that connect two resources are relationships or associations in data modeling speak. Predicates connecting a resource to a literal value are attributes. In OWL they are called object properties and datatype properties, respectively.
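Written down as data, the Atlantis example shows how the subject, predicate, and object roles, and the object-versus-datatype distinction, follow from the triples themselves. This is a plain-Python sketch with no RDF library; the URIs are illustrative, the `gYearMonth` datatype is one reasonable choice for "July, 2011", and the `operatedBy` statement is a hypothetical addition so that one predicate links two resources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    uri: str


@dataclass(frozen=True)
class Literal:
    value: str
    datatype: str = "http://www.w3.org/2001/XMLSchema#string"


NASA = "http://www.nasa.gov/"
XSD = "http://www.w3.org/2001/XMLSchema#"
atlantis = Resource(NASA + "shuttle/Atlantis")

triples = [
    # (subject, predicate, object): subject and predicate are always resources.
    (atlantis, Resource(NASA + "lifecycle#decommissionDate"), Literal("2011-07", XSD + "gYearMonth")),
    # Hypothetical statement whose object is a resource rather than a literal.
    (atlantis, Resource(NASA + "lifecycle#operatedBy"), Resource(NASA)),
]


def property_kinds(triples):
    # Resource-valued predicates act as object properties, literal-valued ones as datatype properties.
    kinds = {}
    for _, predicate, obj in triples:
        kinds[predicate.uri] = "object property" if isinstance(obj, Resource) else "datatype property"
    return kinds


if __name__ == "__main__":
    for uri, kind in property_kinds(triples).items():
        print(uri, "->", kind)
```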

Because the RDF model is highly canonical, RDF data is schema-less. There are no constraints that require it to fit into tables or hierarchies. RDF data is simply a network of connected triples. As such, it can be used to represent, if needed, both table structures and hierarchies. Standard mappings have been defined from relational tables and XML hierarchies into RDF.
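Because triples have no fixed shape, a relational row can simply be spread into one triple per column. The sketch below follows the general spirit of a direct mapping (the row key becomes the subject URI, column names become predicates); the base URI and the `table/key` and `table#column` naming patterns are assumptions for illustration, not the exact rules of any standard mapping.

```python
from urllib.parse import quote

BASE = "http://example.org/db/"  # hypothetical base URI


def row_to_triples(table, key_column, row):
    # Subject: one URI per row, built from the table name and primary key value.
    subject = f"{BASE}{table}/{quote(str(row[key_column]))}"
    triples = []
    for column, value in row.items():
        if value is None:
            continue  # a NULL produces no triple rather than an empty value
        triples.append((subject, f"{BASE}{table}#{column}", value))
    return triples


if __name__ == "__main__":
    row = {"id": 104, "name": "Atlantis", "decommissioned": "2011-07"}
    for t in row_to_triples("vehicle", "id", row):
        print(t)
```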

Another key differentiating factor of RDF is that it was “born on the web”. Each RDF resource has a globally unique identity, a URI (uniform resource identifier). For example, the URI for Atlantis may be http://www.nasa.gov/shuttle/Atlantis and the URI for a decommission date may be http://www.nasa.gov/lifecycle#decommissionDate . As a result, it is possible to link RDF data over the web in a way similar to how documents can be hyperlinked over the web. By web we mean all HTTP-based networks, including intranets and extranets.

RDF databases store and provide query access to RDF data. Just like there are standard languages for query of relational and XML data, there is a standard for querying RDF. It is called SPARQL. True to the web-native nature of RDF, SPARQL is not only a query language, but also a protocol that makes it possible to access RDF data over HTTP.
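Because SPARQL is also a protocol, a query can travel as an ordinary HTTP GET with the query text in a `query` parameter. The sketch only constructs the request with Python's standard library; the endpoint URL and the `lc:` prefix are hypothetical, and actually sending it (for example with `urllib.request.urlopen`) would need a live endpoint.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint

QUERY = """
PREFIX lc: <http://www.nasa.gov/lifecycle#>
SELECT ?vehicle ?date
WHERE { ?vehicle lc:decommissionDate ?date }
"""


def sparql_get(endpoint, query):
    # SPARQL Protocol query via GET: the query text goes in the "query" parameter.
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    req = sparql_get(ENDPOINT, QUERY)
    print(req.full_url)
    print(parse_qs(urlsplit(req.full_url).query)["query"][0] == QUERY)  # True
```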

RDFS (RDF Schema) and OWL (Web Ontology Language) – RDF-based languages for expressing business semantics.

Jointly, RDFS and OWL offer the ability to define classes, or groups of resources that share common characteristics, such as Vehicles and Space Shuttles. The richness of RDFS/OWL makes it possible to fully express the meaning of business concepts. Data models in RDFS/OWL are stored in the same way as the data, in RDF triples. For example, we can have triples stating that Space Shuttle is a class, that it is a subclass of the Vehicle class, and that a vehicle can have only one decommission date (cardinality = 1) whose value must be an xsd:date. And you can go beyond cardinality and use the Semantic Web standards to represent a variety of business rules.

Since the data and the schema are stored in the same way, it is possible to query schemas the same way data is queried and to combine search criteria about schemas with the search criteria about data. For example, we can create SPARQL queries to ask for all vehicles that have been decommissioned, all subclasses of a vehicle class, all relationships and attributes a vehicle should have and, when returning decommissioned vehicles, to provide only data values for the fields that have cardinality = 1.
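Since schema statements and data statements are both just triples, one query can walk rdfs:subClassOf and rdf:type together. The Python sketch below does this over an in-memory list; the class and vehicle names are made up, and the closure loop is a hand-written stand-in for the `rdf:type/rdfs:subClassOf*` property path that a SPARQL 1.1 query would use.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

triples = [
    # Schema statements
    ("ex:SpaceShuttle", SUBCLASS, "ex:Spacecraft"),
    ("ex:Spacecraft", SUBCLASS, "ex:Vehicle"),
    # Data statements
    ("ex:Atlantis", TYPE, "ex:SpaceShuttle"),
    ("ex:Atlantis", "ex:decommissionDate", "2011-07"),
    ("ex:Voyager1", TYPE, "ex:Spacecraft"),
]


def subclasses(cls, triples):
    # Reflexive-transitive closure of rdfs:subClassOf, like rdfs:subClassOf* in SPARQL 1.1.
    found, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == SUBCLASS and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def decommissioned(cls, triples):
    # Members of cls or any of its subclasses that have a decommission date.
    classes = subclasses(cls, triples)
    members = {s for s, p, o in triples if p == TYPE and o in classes}
    return {s: o for s, p, o in triples if p == "ex:decommissionDate" and s in members}


if __name__ == "__main__":
    print(sorted(subclasses("ex:Vehicle", triples)))
    print(decommissioned("ex:Vehicle", triples))  # {'ex:Atlantis': '2011-07'}
```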

The use of RDF means that the modeling constructs and definitions can be linked and connected. Organizations can refer to each other’s business definitions. Models can be modularized and re-used where appropriate. Differences between related, but not identical concepts can be described. All of this can now be done in a standards-compliant and interoperable way.

A growing number of standards bodies and communities of interest are publishing RDF/OWL data models for their particular domains. For example:

SKOS – provides a way to represent taxonomies and thesauri
ISO 15926 – offers a data model for sharing life-cycle data for process plants including oil and gas production facilities
Ontology for Media Resources - defines a core set of metadata properties for multimedia resources
SIOC - defines information about online communities
QUDT - provides models describing measurable quantities, units for measuring different kinds of quantities and the data types used to store and manipulate these objects in software
Provenance Vocabulary - defines provenance-related metadata

There is much more that can be added to this post including a discussion on the best practices for ontology modeling, ontology architecture, approaches for connecting and mapping models, using rules and constraints, publishing, versioning and governing models. Each of these topics, however, deserves an exploration in its own right.

I will end by pointing to a few relevant related blogs and web pages we have published before:

How to extend an ontology http://topquadrantblog.blogspot.com/2011/03/how-to-extend-ontology.html
Ontology Mapping with SPINMap http://topquadrantblog.blogspot.com/search/label/SPINMap
Training on RDF, OWL and ontology modeling http://www.topquadrant.com/training/training_overview.html
Transforming XML Schemas and XML into RDF/OWL http://topquadrantblog.blogspot.com/2011/09/living-in-xml-and-owl-world.html
Converting UML models to OWL http://topquadrantblog.blogspot.com/2011/02/converting-uml-models-to-owl-part-1.html

Wednesday, September 28, 2011

Living in the XML and OWL World - Comprehensive Transformations of XML Schemas and XML data to RDF/OWL

Many enterprise information models are expressed using XML Schemas. Data between applications is commonly exchanged in XML that complies with those schemas. Connecting XML data from different systems in a coherent, aggregated way is a challenge that confronts many organizations. The ability of RDF/OWL to describe the semantics of different data models and to aggregate disparate data makes it a natural fit for addressing these challenges.

For a number of years now, TopBraid Composer has included the ability to convert XSDs and associated XML files to RDF/OWL. However, for some XML Schemas our converter did not work as well as customers needed. For the upcoming TopBraid Composer 3.6.0 release, the converter has been significantly improved to provide more comprehensive coverage of XSD constructs and a more meaningful conversion to OWL. In parallel, we improved our XML data conversion to RDF so that transformations happen automatically based on the generated OWL models. We have also improved the performance of the transformations.

An overview of the approach is illustrated in the following figure:

Since the conversion occurs automatically, users do not have to write any rules for commonly needed mappings. However, users who need to make further transformations can use SPARQL Rules and SPARQLMotion to customize their generated OWL ontology or further transform the RDF triples representing the XML data.

The content of this blog is organized as follows:

XML Schemas converted as part of our tests
Some challenges in converting XML Schemas to OWL
Illustrative example of transformation rules
Another example of transformation rules
Complete table of supported transformations
A SPARQL Metric Query
Concluding remarks

XML Schemas converted as part of our tests

We tested the importer on a broad range of complicated and large-scale industry standard XSD files, and converted many XML instances with impressive results. The XSDs we have tested with the new importer include:

Banking
- FpML, the Financial products Markup Language
- ISO 20022, a standard for Universal financial industry message scheme

Energy and Utilities
- MultiSpeak, a de facto standard defining the data that software applications need to exchange in order to support the business processes commonly applied at utilities

Government
- DoDAF, the Department of Defense Architecture Framework
- NIEM, the U.S. National Information Exchange Model

Oil and Gas
- ISO 15926, a standard for integration of life-cycle data for process plants including oil and gas production facilities
- WITSML, Wellsite Information Transfer Standard Markup Language

Healthcare
- HL7

Electronics
- IP-XACT, the XML Schema for meta-data documenting Intellectual Property (IP) used in the development, implementation and verification of electronic systems

Other
- ATML, the Auto-Test Markup Language

Some of the converted schemas will be published at LinkedModels.org. To get early access to converted models, or for any other questions, contact us at TopQuadrant.

The examples we use in this blog are mainly from the Financial products Markup Language (FpML). All FpML 5.2 XSD and XML instance files were tested. One example is the FpML transparency standard, for which the transparency.ttl ontology was generated from 23 XSD files.

Some challenges in converting XML Schemas to OWL

Some of the challenges in converting XSD to OWL that were addressed are:

Transforming anonymous types

Converting complex types with simple contents

Resolving conflicting nested element and attribute names during OWL property generation

Deciding when and how to distinguish global elements from complex types with similar names during OWL class generation

Generating enumerations

Handling substitution groups at both the XSD and XML levels

Handling the overriding of an XSD type with xsi:type in XML

The example that follows shows the approaches that we have used for the transformation.

Illustrative example of transformation rules

The basic transform for a Complex Type in XSD follows these rules:

An OWL class is generated for a complex type.

The URI of the class is generated in three different ways. If the complex type is global and named, then the name attribute is used. If the complex type is local and named, then the name attribute of the owner element is used. If the complex type is anonymous, then the names of its owner element and its parent element are used.

The xsd:annotation and attribute annotations describing the complex type get generated as dc:description, rdfs:comment and/or skos:definition OWL annotations.

Nested or referenced child elements of the complex type become OWL allValuesFrom restrictions on the class. If the element has a simple type, then a restriction with an OWL datatype property and an XSD datatype range or a user-defined RDFS datatype range is generated. If the element has a complex type, then a restriction with an OWL object property and an OWL class range is generated. The minOccurs and maxOccurs values become OWL cardinality restrictions.

Element group and attribute group references are generated as superclasses.

Attributes become restrictions with an OWL datatype property and an XSD datatype range or a user-defined RDFS datatype range.
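
The three ways of naming the generated class can be sketched as a small Python function. This is only an illustration under stated assumptions: the namespace constant and the function name are hypothetical, and joining the parent and owner element names with "-" for anonymous types is our guess, since the post only says that both names are used.

```python
# Hypothetical target namespace; in practice it would come from the schema.
NS = "http://www.fpml.org/FpML-5/transparency#"


def complex_type_class_uri(scope, type_name=None, owner=None, parent=None):
    """Return the URI of the OWL class generated for an XSD complex type.

    scope is one of the three cases described above:
    "global" (global and named), "local" (local and named) or "anonymous".
    The "-" separator used for anonymous types is an assumption.
    """
    if scope == "global":
        return NS + type_name               # the type's own name attribute
    if scope == "local":
        return NS + owner                   # the owner element's name attribute
    if scope == "anonymous":
        return NS + parent + "-" + owner    # parent element name + owner element name
    raise ValueError(f"unknown scope: {scope!r}")


print(complex_type_class_uri("global", type_name="Trade"))
print(complex_type_class_uri("anonymous", owner="swapStream", parent="swap"))
```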

An example of a complex type, Trade, from fpml-doc-5-2.xsd in the transparency standard is shown below:

<xsd:complexType name="Trade">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
    A type defining an FpML trade.</xsd:documentation>
  </xsd:annotation>
  <xsd:sequence>
    <xsd:element name="tradeHeader" type="TradeHeader">
      <xsd:annotation>
        <xsd:documentation xml:lang="en">
         The information on the trade which is not 
         product specific, e.g. trade date.
         </xsd:documentation>
      </xsd:annotation>
    </xsd:element>
    <xsd:group ref="TradeEconomics.model">
      <xsd:annotation>
        <xsd:documentation xml:lang="en">
        The economics of the trade. In the case of an 
        OTC trade, this is the OTC derivative product.
        In the case of a trade of a security,
        it is the instrument trade economics.
        </xsd:documentation>
      </xsd:annotation>
    </xsd:group>
  </xsd:sequence>
  <xsd:attribute name="id" type="xsd:ID" />
</xsd:complexType>
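
To make the rules concrete, here is a minimal Python sketch that applies them to a shortened copy of the Trade type (annotations on child elements dropped, wrapped in xsd:schema so the prefix is declared). It is not the TopBraid converter. Triples are plain (subject, predicate, object) tuples with prefixed names. We assume the caller passes in the set of complex type names, because the fragment alone cannot tell us that TradeHeader is complex. We also pick rdfs:comment as the annotation property, although the converter lets users choose dc:description or skos:definition instead.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Shortened copy of the Trade type above, wrapped so the xsd prefix is declared.
TRADE_XSD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="Trade">
    <xsd:annotation>
      <xsd:documentation xml:lang="en">A type defining an FpML trade.</xsd:documentation>
    </xsd:annotation>
    <xsd:sequence>
      <xsd:element name="tradeHeader" type="TradeHeader"/>
      <xsd:group ref="TradeEconomics.model"/>
    </xsd:sequence>
    <xsd:attribute name="id" type="xsd:ID"/>
  </xsd:complexType>
</xsd:schema>"""


def convert_complex_type(ct, complex_types):
    """Return (subject, predicate, object) triples for one global complex type.

    complex_types names the types assumed to be complex: elements of those
    types get object properties, all others get datatype properties.
    """
    cls = ct.get("name")
    triples = [(cls, "rdf:type", "owl:Class")]
    count = 0

    def restrict(prop, facet, value):
        # Add an anonymous owl:Restriction on prop as a superclass of cls.
        nonlocal count
        count += 1
        node = f"_:r{count}"
        triples.extend([(cls, "rdfs:subClassOf", node),
                        (node, "rdf:type", "owl:Restriction"),
                        (node, "owl:onProperty", prop),
                        (node, facet, value)])

    # xsd:documentation becomes an annotation (rdfs:comment chosen here).
    for doc in ct.iterfind(f"{XSD}annotation/{XSD}documentation"):
        triples.append((cls, "rdfs:comment", " ".join(doc.text.split())))

    for child in ct.iterfind(f"{XSD}sequence/*"):
        if child.tag == f"{XSD}element":
            name, typ = child.get("name"), child.get("type")
            if typ in complex_types:
                prop, kind = name + "Ref", "owl:ObjectProperty"
            else:
                prop, kind = name, "owl:DatatypeProperty"
            triples.append((prop, "rdf:type", kind))
            restrict(prop, "owl:allValuesFrom", typ)
            # minOccurs and maxOccurs both default to 1 in XSD.
            restrict(prop, "owl:minCardinality", child.get("minOccurs", "1"))
            max_occurs = child.get("maxOccurs", "1")
            if max_occurs != "unbounded":
                restrict(prop, "owl:maxCardinality", max_occurs)
        elif child.tag == f"{XSD}group":
            # Element group references become superclasses.
            triples.append((cls, "rdfs:subClassOf", child.get("ref")))

    for attr in ct.iterfind(f"{XSD}attribute"):
        triples.append((attr.get("name"), "rdf:type", "owl:DatatypeProperty"))
        restrict(attr.get("name"), "owl:allValuesFrom", attr.get("type"))
    return triples


trade = ET.fromstring(TRADE_XSD).find(f"{XSD}complexType")
for triple in convert_complex_type(trade, complex_types={"TradeHeader"}):
    print(triple)
```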

The following is the graph of the OWL class generated for the Trade complex type, which shows the OWL class, its restrictions, annotations and superclass.

The following class diagram shows a more sophisticated view of Trade and its related classes downstream in the generated ontology.

The diagram highlights these advanced features in generation:

TradeEconomics.model, generated from an XSD element group, is a superclass of Trade, generated from an XSD complex type.

In the XSD, the Swap element belongs to the substitution group headed by the Product element. Thus, A_Global-Swap becomes a subclass of A_Global-Product. The A_Global- prefix is used to distinguish element-derived classes from similarly named classes derived from complex types.

dtype:value restrictions are generated to hold the simple content occurring in complex types. The complex content part of the type becomes other restrictions.

The generated object properties have a Ref suffix to distinguish them from datatype properties with the same names. Both kinds of property can appear in restrictions on different classes, because they may be generated from nested or referenced child elements under different complex types.
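
The naming conventions above can be sketched in a few lines of Python. The function names are hypothetical. We assume the schema elements are named `swap` and `product` and that their first letter is capitalised, which is what the A_Global-Swap example suggests but the post does not spell out.

```python
def element_class_name(element):
    """Class for a global XSD element: A_Global- prefix plus the capitalised
    element name (capitalisation is an assumption based on A_Global-Swap)."""
    return "A_Global-" + element[:1].upper() + element[1:]


def object_property_name(element):
    """Object properties get a Ref suffix; datatype properties keep the bare name."""
    return element + "Ref"


def substitution_subclass_triples(groups):
    """groups maps each substitution-group member element to its head element."""
    return [(element_class_name(member), "rdfs:subClassOf", element_class_name(head))
            for member, head in groups.items()]


print(substitution_subclass_triples({"swap": "product"}))
# [('A_Global-Swap', 'rdfs:subClassOf', 'A_Global-Product')]
```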

The instance file msg_ex001_new_trade.xml was imported into the transparency ontology. Here is a peek into that XML file:

...
<trade>
  <tradeHeader>
    <partyTradeIdentifier>
      <tradeId tradeIdScheme="http://fpml.org/universal_swap_id">123</tradeId>
      <tradeId tradeIdScheme="http://fpml.org/submitter_trade_id">456</tradeId>
    </partyTradeIdentifier>
    <tradeInformation>
      ...
      <cleared>true</cleared>
      <nonStandardTerms>false</nonStandardTerms>
      <offMarketPrice>false</offMarketPrice>
      <largeSizeTrade>false</largeSizeTrade>
      ...
    </tradeInformation>
    <tradeDate>2011-02-04</tradeDate>
  </tradeHeader>
  <swap>
    <productType>InterestRateSwap</productType>
    <assetClass>InterestRates</assetClass>
    <swapStream>
      ...
    </swapStream>
    <swapStream>
      ...
    </swapStream>
  </swap>
</trade>
...

The above XML constructs were mapped into the following RDF graph, where you can see how the instances, their relationships and their types are generated with respect to the Trade class diagram.
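
Here is a minimal Python sketch of the instance-level mapping, run on a shortened copy of the trade header above with the elided parts removed. It follows the conventions described earlier: one instance per complex element, Ref-suffixed object properties, datatype properties for simple elements and attributes, and dtype:value for simple content. The element-to-class lookup is hypothetical. The converter described here would consult the generated OWL model instead.

```python
import itertools
import xml.etree.ElementTree as ET

# A shortened copy of the trade header above (elided parts removed).
TRADE_XML = """<trade>
  <tradeHeader>
    <partyTradeIdentifier>
      <tradeId tradeIdScheme="http://fpml.org/universal_swap_id">123</tradeId>
    </partyTradeIdentifier>
    <tradeDate>2011-02-04</tradeDate>
  </tradeHeader>
</trade>"""

# Hypothetical element-to-class lookup standing in for the generated OWL model.
ELEMENT_CLASS = {"trade": "Trade", "tradeHeader": "TradeHeader",
                 "partyTradeIdentifier": "PartyTradeIdentifier", "tradeId": "TradeId"}


def convert_instance(elem, triples, ids):
    """Create an instance for elem, append its triples and return its node id."""
    node = f"_:{elem.tag}{next(ids)}"
    triples.append((node, "rdf:type", ELEMENT_CLASS[elem.tag]))
    for name, value in elem.attrib.items():  # attributes -> datatype properties
        triples.append((node, name, value))
    if len(elem) == 0 and elem.text and elem.text.strip():
        triples.append((node, "dtype:value", elem.text.strip()))  # simple content
    for child in elem:
        if child.tag in ELEMENT_CLASS:  # complex child -> Ref object property
            triples.append((node, child.tag + "Ref", convert_instance(child, triples, ids)))
        else:                           # simple child -> datatype property
            triples.append((node, child.tag, (child.text or "").strip()))
    return node


triples = []
convert_instance(ET.fromstring(TRADE_XML), triples, itertools.count(1))
for triple in triples:
    print(triple)
```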

Another example of transformation rules

The basic transform for an Enumeration in XSD follows these rules:

An OWL class is generated from an XSD simple type having XSD enumeration facets. The localname of the class has Enum suffix to distinguish it from classes generated with similar names.

This class becomes a subclass of EnumeratedValue in the same namespace as the OWL class, which itself becomes a subclass of dtype:EnumeratedValue.

Each XSD enumeration facet becomes an instance of the generated class. dtype:value holds the enumeration value. dtype:order is the order in which the enumeration facet occurs.

An Enumeration class in the same namespace as the OWL class is also generated. This class becomes a subclass of dtype:Enumeration. An instance of this class is generated as a container that refers to all the instances generated from the current simple type.

Enumerated value instance URIs are generated by concatenating an abbreviation made of the upper-case letters of the class's local name with the dtype:value literal.
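
The enumeration rules can be sketched in Python as follows. The simple type is hypothetical and is not taken from FpML. Several details are our own assumptions and are marked as such in the code: the Enum suffix is added only when missing, dtype:order counts from 1, the Enumeration container is named with a Values suffix, and hasMember is a placeholder property, because the post does not name the property the container uses. The owl:oneOf triple follows row 2 of the table below.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# A hypothetical simple type for illustration; it is not taken from FpML.
ENUM_XSD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:simpleType name="SettlementMethod">
    <xsd:restriction base="xsd:token">
      <xsd:enumeration value="Cash"/>
      <xsd:enumeration value="Physical"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>"""


def convert_enumeration(simple_type):
    """Return triples for an XSD simple type with enumeration facets."""
    name = simple_type.get("name")
    cls = name if name.endswith("Enum") else name + "Enum"  # Enum suffix (assumed rule)
    abbrev = "".join(c for c in cls if c.isupper())         # "SME" for SettlementMethodEnum
    container = cls + "Values"  # hypothetical name for the Enumeration instance
    triples = [(cls, "rdf:type", "owl:Class"),
               (cls, "rdfs:subClassOf", "EnumeratedValue"),
               ("EnumeratedValue", "rdfs:subClassOf", "dtype:EnumeratedValue"),
               ("Enumeration", "rdfs:subClassOf", "dtype:Enumeration"),
               (container, "rdf:type", "Enumeration")]
    members = []
    facets = simple_type.iterfind(f"{XSD}restriction/{XSD}enumeration")
    for order, facet in enumerate(facets, start=1):  # 1-based order is assumed
        value = facet.get("value")
        instance = abbrev + value  # abbreviation + dtype:value literal
        members.append(instance)
        triples += [(instance, "rdf:type", cls),
                    (instance, "dtype:value", value),
                    (instance, "dtype:order", order),
                    (container, "hasMember", instance)]  # placeholder property name
    triples.append((cls, "owl:oneOf", tuple(members)))
    return triples


st = ET.fromstring(ENUM_XSD).find(f"{XSD}simpleType")
for triple in convert_enumeration(st):
    print(triple)
```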

The following figure shows a graph for the PremiumQuoteBasisEnum class and the OWL constructs generated from the related XSD enumeration facets:

Complete table of supported transformations

For readers interested in more detail, a full overview of the mapping transformations is given in the following table:

*Table: Conversion from XSD Constructs to OWL Constructs*
#	XSD/XML Constructs	OWL Constructs
1	`xsd:simpleType`	`owl:Datatype`
2	`xsd:simpleType` with `xsd:enumeration`	Becomes an `owl:Class` as a subclass of `EnumeratedValue`. Instances are created for every enumerated value. An instance of `Enumeration`, referring to all the instances, is created as well as the `owl:oneOf` union over the instances.
3	`xsd:complexType` over `xsd:complexContent`	`owl:Class`
4	`xsd:complexType` over `xsd:simpleContent`	`owl:Class`
5	`xsd:element` (global) with complex type	`owl:Class` and subclass of the class generated from the referenced complex type
6	`xsd:element` (global) with simple type	`owl:Datatype`
7	`xsd:element` (local to a type)	`owl:DatatypeProperty` or `owl:ObjectProperty` depending on the element type. OWL Restrictions are built for the occurrence.
8	`xsd:group`	`owl:Class` and subclass of `A_AbstractElementGroup`
9	`xsd:attributeGroup`	`owl:Class` and subclass of `A_AbstractAttributeGroup`
10	`xsd:minOccurs` and `xsd:maxOccurs`	Cardinality specified in minimum cardinality, maximum cardinality and universal (allValuesFrom) OWL restrictions.
11	Anonymous Complex Type	As for Complex Type except a URI is constructed from the parent element and the nested element reference. Also, the class is defined as a subclass of `A_Anon`.
12	Anonymous Simple Type	As for Simple Type except a URI is constructed from the parent element and the nested element reference.
13	`xsd:default` on an attribute	Uses `dtype:defaultValue` to attach a value to the OWL restriction representing the associated property.
14	Substitution Groups	Subclass statements are generated for the members. Instance files resolve their types by consulting the OWL model at import-time.
15	Annotation attributes on elements	OWL Annotation properties are created and placed directly on the relevant class.
16	Annotations using `xsd:annotation`	Become, based on user selection, `dc:description`, `rdfs:comment` and/or `skos:definition` OWL annotations.
17	`xsi:type` on an XML element	Overrides the schema type with the specified type.
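
Rows 14 and 17 both concern how an instance element gets its type at import time. The following Python sketch shows one way to combine them. The precedence (xsi:type first, then substitution-group membership, then the declared type) is our assumption. The member map is hypothetical, and dropping the QName prefix from the xsi:type value is a simplification.

```python
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


def resolve_type(element_name, attributes, declared_class, member_classes):
    """Choose the OWL class for an XML element while importing an instance file.

    attributes: the element's attributes, keyed in ElementTree's {namespace}name form.
    declared_class: the class the schema declares for this position.
    member_classes: hypothetical map from substitution-group member elements
        to their A_Global- classes (read from the generated OWL model).
    """
    xsi_type = attributes.get(XSI_TYPE)
    if xsi_type:                        # row 17: xsi:type overrides the schema type
        return xsi_type.split(":")[-1]  # simplification: drop the QName prefix
    if element_name in member_classes:  # row 14: substitution-group member
        return member_classes[element_name]
    return declared_class


members = {"swap": "A_Global-Swap"}
print(resolve_type("swap", {}, "A_Global-Product", members))  # A_Global-Swap
```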

A SPARQL Metric Query

As a quick check on the generated OWL models, the following SPARQL query counts, for each OWL class in the transparency namespace, the distinct properties that appear in restrictions on that class.



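PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX afn:  <http://jena.hpl.hp.com/ARQ/function#>

# Count the distinct properties restricted on each class of the transparency ontology
SELECT ?class (COUNT(DISTINCT ?p) AS ?properties)
WHERE {
    ?class a owl:Class .
    # afn:namespace is a Jena ARQ extension function
    FILTER( afn:namespace( ?class ) =
       "http://www.fpml.org/FpML-5/transparency#" )
    # OPTIONAL keeps classes without restrictions (they get a count of 0)
    OPTIONAL {
        ?class rdfs:subClassOf ?r .
        ?r a owl:Restriction .
        ?r owl:onProperty ?p .
    }
}
GROUP BY ?class
ORDER BY DESC( ?properties )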

The classes for the transparency ontology have the distribution of properties shown in the following figure. For example, TradeInformation has 12 properties:

Concluding remarks

The new capability is easy to use. As before, a convenient import wizard will guide the user; the dialog has a number of new options. XML conversion will happen automatically when users open XML files in TBC or use XML import modules in SPARQLMotion. As long as an XML file is valid against the XSD it is based on, the XML will be transformed in accordance with the schema. Parts of the XML files that do not validate against a schema will continue to be converted using the default Semantic XML structure. There is also a new option to specify which OWL file to use as a schema when mapping a specific XML file to triples. This feature is also available for spreadsheets and will be covered in a separate blog.

We believe that the importance of this work goes beyond its value for harvesting XML Schemas. The ability to create triples automatically from XML instance files, directly in applications, is proving to be key for a number of customers. For example, TopQuadrant is currently using this approach in a project for the North Sea Oil and Gas industry.

The functionality we have described will be released in TopBraid Composer 3.6.0. This release entered internal beta this week. Please contact us if you want to try these capabilities before general availability, which is currently planned for November.

Thursday, July 21, 2011

Putting your drag-and-drop SPINMap vocabulary mappings into production

The "Composing the Semantic Web" blog entry SPINMap: SPARQL-based Ontology Mapping with a Graphical Notation describes TopBraid 3.5's new tool for mapping between vocabularies or ontologies. (It also points to a handy video that demonstrates both simple and sophisticated uses of SPINMap.) Once you've created a mapping, though, how do you use it to convert data? As it turns out, no new technology is necessary; SPINMap just creates SPIN rules that you can apply in a SPARQLMotion script.

Let's look at an example. Imagine that I'm a publisher who receives images and metadata about those images from ExampleCo every month, and I load these images and metadata into my company's Digital Asset Management system. ExampleCo uses their own vocabulary to describe the metadata, but I prefer to use the NEPOMUK vocabulary for describing image metadata, because I know that by taking advantage of a vocabulary used by other systems around the world, my data can more easily interoperate with other data and tools.

Following the steps described in the blog posting mentioned above, I create the mapping from ExampleCo's pd:Image class and its associated properties to the NEPOMUK equivalents. Because the NEPOMUK image vocabulary's nexif:Photo class has so many properties associated with it, the diagram of it doesn't all fit on the screen at once, but it was easy enough to scroll up and down as I mapped the pd:Image properties on the left to various NEPOMUK nexif:Photo properties.

Mapping from input to output with SPINMap

I saved the mapping in its own file, which I called ExampleCo2Nepomuk.ttl. At this point, I could convert a set of ExampleCo metadata by importing a file of that data and ExampleCo2Nepomuk.ttl into the same model and then picking Run Inferences from the Inference menu, assuming that Configure Inferencing on the same menu had TopSPIN configured as the inferencing engine.

I wanted this to be more automated, though, so I put it in a SPARQLMotion script that could be called as a web service or from a TopBraid Ensemble interface. This would make it easier to re-use this mapping every month on each new batch of ExampleCo image data as it comes in:

SPARQLMotion script that applies mappings

The script's first module prompts for the input filename, because it will be a new dataset each month. This module hands the filename to the "Get ExampleCo RDF" module, an Import RDF From Workspace module that reads in the ExampleCo data.

At the same time, another Import RDF From Workspace module named "Get mapping rules" reads in the ExampleCo2Nepomuk.ttl file storing the SPIN-based mapping rules. Both of these modules feed their triples to an Apply TopSPIN module named "Apply mapping rules," which has its sml:replace value set to true so that it only passes along the new triples that it creates and not the input triples. The script's last module saves the result in a disk file, but could easily send it off for addition to a triplestore in a Digital Asset Management system.
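
The data flow of this script can be imitated in a few lines of Python. This is a toy forward-chaining loop standing in for the Apply TopSPIN module, not TopSPIN itself. The mapping rule and the property names pd:pixelWidth and nexif:width are hypothetical, and we assume the mapped resource keeps its original URI. The `replace` flag mirrors sml:replace: when it is true, only the newly inferred triples are passed on.

```python
from typing import Callable, Iterable, Sequence, Set, Tuple

Triple = Tuple[str, str, str]
Rule = Callable[[Set[Triple]], Iterable[Triple]]


def image_to_photo(triples: Set[Triple]) -> Iterable[Triple]:
    """Hypothetical mapping rule: pd:Image -> nexif:Photo, pd:pixelWidth -> nexif:width."""
    images = {s for s, p, o in triples if p == "rdf:type" and o == "pd:Image"}
    for s, p, o in triples:
        if s not in images:
            continue
        if p == "rdf:type" and o == "pd:Image":
            yield (s, "rdf:type", "nexif:Photo")
        elif p == "pd:pixelWidth":
            yield (s, "nexif:width", o)


def apply_rules(triples: Set[Triple], rules: Sequence[Rule], replace: bool) -> Set[Triple]:
    """Fire rules until nothing new is inferred.

    replace=True mimics sml:replace: only the newly created triples are
    returned, not the input triples.
    """
    inferred: Set[Triple] = set()
    while True:
        current = triples | inferred
        new = {t for rule in rules for t in rule(current)} - current
        if not new:
            break
        inferred |= new
    return inferred if replace else triples | inferred


exampleco = {("ex:img1", "rdf:type", "pd:Image"), ("ex:img1", "pd:pixelWidth", "640")}
mapped = apply_rules(exampleco, [image_to_photo], replace=True)
print(sorted(mapped))
```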

There's nothing especially new or unusual in this script; what's new is that the rules that it applies to the data were created by a graphical drag-and-drop tool instead of being coded by hand. (Rest assured that the rules stored by the tool are still expressed using standard SPARQL.) With easy data aggregation being one of the great advantages of semantic web applications, it's nice to know that SPINMap lets you define data transformations with less trouble than ever before, making your application development (and application maintenance) even faster.

As an added bonus, because the mappings are stored as SPIN rules (also known as SPARQL Rules), they can easily be combined with other SPARQL Rules that you can run with the same script. These other rules might perform validation to ensure that the data being read conforms to certain data quality standards, or they could calculate new values based on a combination of the incoming data and existing stored data.

Sunday, June 26, 2011

Comparing SPIN with RIF

Since SPIN (SPARQL Inferencing Notation), aka SPARQL Rules, became a W3C Member Submission, we find ourselves responding to growing interest in it.

With this, a question some may ask is how SPIN is different from or similar to RIF - W3C's standard for rules interchange.

While I have heard this asked a couple of times, I was pleasantly surprised that it is not a very common question. Pleasantly, because a certain level of confusion is to be expected about new things, and both SPIN and RIF are relatively new. If so few people ask this question, then the SPIN specification did a good job of explaining and positioning SPIN, and people easily grasp the unique and important needs it serves. Still, I thought it was worthwhile to write up my thoughts on comparing SPIN with RIF.

The goal of RIF was to create an interchange format for use between rules engines. As such, unlike SPIN, RIF is not specifically aligned with RDF. This is why RIF was defined with an XML syntax (although there is now work on an RDF serialization). I am not pointing this out as a shortcoming of RIF, but rather to put in perspective the origin of and the reason for RIF. In its goals, RIF is similar to OMG's XMI, which also uses XML and was created to be an interchange format between different tools.

Given this similarity, XMI’s failure to become a reliable interchange format is relevant when considering RIF's future. Will RIF succeed in reaching its goal? One can easily argue that, with the variety of available rules languages and engines, RIF’s job is harder than what XMI needed to do to succeed.

As noted here, different rules languages exist because there are different algorithms and formalisms for rules. Furthermore, different rule products have different sets of capabilities. RIF dialects are intended to be the least common denominators for a given type of rule engine. This means that in order to effectively use the same set of RIF rules in ‘rules engine A’ and in ‘rules engine B’, the following needs to happen:

1. The RIF dialect used to express the rules needs to be supported by both rules engines.

Checking the implementation page, one will see that currently the overlap between any two engines is not that great. Some support BLD, some support PRD + Core, others support BLD partial or PRD minus something, etc.

2. The RIF dialect used to express the rules must be sufficient for the task at hand.

As mentioned above, RIF by design is somewhat of a least common denominator. This means that a user could always do more with a given rules engine than they can express in a dialect of RIF.

For example (as noted here), SPARQL is more expressive than what is possible with RIF. This is not unique to SPARQL; it is true for pretty much any rules technology.

3. The interchange must work

Given the well-known XMI issues, I am quite keen to see RIF test cases, as well as test results from the implementers.

The attitude of the major rule engine vendors towards RIF is currently lukewarm at best. For example, on the Oracle forum, support engineers recommend against attempting to interchange rules, saying:

“In a hybrid environment I'd recommend that rules authored in ILOG be executed in the ILOG engine, and that rules authored in OPA be executed in the OPA engine, rather than attempt to interchange rules between the two products. As long as there is a clear scope boundary between what the rule sets are used for, then there wouldn't be any duplication or interchange of rules.”

Having considered the design goals and challenges of RIF, it is easy to see that the design goals of SPIN are quite different. SPIN is not about capturing rules that can then be translated for execution by different types of rule engines. Rather it is about capturing rules that can be executed directly over RDF data and about having rules that are intimately connected to the Semantic Web models.

With these goals in mind, we identified the following three things as important principles in SPIN's design:

1. Rules can be expressed in a familiar language. People working with RDF must know SPARQL. Using SPARQL for rules means that they don’t need to learn another language.

2. Rules can be executed by any RDF database. Since they are in SPARQL, rules are portable: not across rules engines, but across RDF stores.

3. Evolution of the models does not unnecessarily break the rules. For example, let’s say we change the URI of a resource used in a rule. If rules are stored in some other format (such as XML) and are connected to the underlying RDF only as an opaque blob, it becomes hard to keep these two different sets of information in sync.

Finally, SPIN takes an object-oriented approach to rules. It is about programming and about associating behavior with classes, while RIF takes a model-theoretic view of how rules may relate to ontologies. This is a key difference, as noted in the W3C comments on the SPIN submission.

In short, SPIN and RIF address different needs and have different design goals. They can be considered complementary.

What about using SPIN and RIF together? Given the key role SPARQL plays in the architecture of Semantic Web solutions, I am certain that should RIF get traction in its adoption, someone will create a RIF profile for SPARQL and write a RIF to SPARQL translation.

Wednesday, March 30, 2011

How to: extend an ontology

When people work with ontologies, XML schemas and software development, it's almost a cliché to say that re-use of existing work is better than creating something new from scratch. Existing work, though, is not always a perfect fit for your needs, and when you're reusing XML schemas or software source code, the ease of customizing it often depends a lot on how the original work was designed. Customizing OWL ontologies and RDF schemas, on the other hand, is pretty simple nearly all of the time, especially when you use TopBraid Composer.

For example, let's say that you have a taxonomy of business terms to track, and the W3C's SKOS standard defines all the properties you need to maintain metadata about these terms, with two exceptions: SKOS has nothing about the last person to edit a term and it has no slot for the editor's department code, which is a special bit of metadata within your enterprise. Customizing SKOS to include these is just two steps:

Create an empty new ontology and import SKOS into it.
Define your two properties in this new ontology and give them a domain of skos:Concept so that SKOS tools such as TopBraid Enterprise Vocabulary Net (EVN) know that they're potential properties of your SKOS concepts.

To do this with TopBraid Composer, start by creating a new RDF/OWL file in one of your projects. On the wizard dialog box for creating this file, enter the Base URI of your new customized version of the SKOS ontology. If I worked for The Example Company, I might create a Base URI of http://example.com/ns/exskos.

On the same dialog box, click the checkbox next to SKOS under "Initial imports" and click Finish. (If you forget to click the checkbox first, you can always drag the skos-core.rdf file from the Navigator View's /TopBraid/SKOS folder to the Imports tab. In fact, to start creating a customized version of any ontology, drag a copy of it into your custom ontology's Imports tab.)

Once the file is created, having a namespace prefix associated with the base URI makes it easier to create the new properties, so on your new ontology's Overview tab add an ex prefix for the http://example.com/ns/exskos# namespace. Don't forget the pound sign; the prefix will be standing in for this URI, and you want ex:editor to represent http://example.com/ns/exskos#editor, not http://example.com/ns/exskoseditor.

Now we're ready to add the customizations. Instead of creating a new editor property from scratch, it's better to define it as a subproperty of the Dublin Core dc:creator property so that applications that don't know about our new property but do know about Dublin Core properties will have some clue what it's for.

Drag the dc1-1.rdf Dublin Core ontology file from the /TopBraid/Common folder in TopBraid Composer's Navigator view to the Imports tab to import the Dublin Core ontology. You'll see several new properties join the SKOS ones in the Properties view.
Right-click on dc:creator there and pick Create subproperty. In the Create subproperty wizard, replace the dc:Property_1 value that appears as the default Name of new instance value with ex:editor, which uses the ex prefix that you defined earlier.
Click the checkbox next to rdfs:label on the dialog box's Annotations Template so that an rdfs:label property gets automatically set for this property. (One nice thing about how the RDF data model lets you assign properties to properties is that you can associate human readable names to substitute for the actual property names on forms and reports.)
Click the OK button and you'll see a property form in the middle of your screen for your new property, which will be selected in the Properties view. (It's on the left of the screenshot below because the Classes view is not shown there.)
To show that your ontology is defining this new property for SKOS concepts, click the small white triangle next to the word domain on the ex:editor property form and pick Add existing to indicate that this property's domain will be an existing class. Click skos:Concept, which will be under owl:Thing on the wizard's class tree, and click OK.

You're finished defining your editor property. The department code one will be even quicker to create, because it won't be a subproperty of something else:

Click the Properties view's small white triangle to display its menu and pick Create rdf:Property. Name the new property ex:deptCode on the Create Property wizard dialog box. The Annotations Template's rdfs:label checkbox should already be checked, so click OK.
Set your newest property's domain to skos:Concept the same way you did for ex:editor and save your file.
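
For readers who prefer to see the result as data, the following Python sketch prints roughly the Turtle that the steps above should produce. It also shows why the pound sign matters, because a prefixed name is expanded by plain string concatenation. The ontology URIs of the imported SKOS and Dublin Core files are assumptions, and the Turtle that TopBraid Composer actually saves may differ in detail.

```python
DC = "http://purl.org/dc/elements/1.1/"
SKOS_ONTOLOGY = "http://www.w3.org/2004/02/skos/core"  # assumed ontology URI


def expand(qname, prefixes):
    """Expand a prefixed name by plain concatenation of namespace and local name."""
    prefix, local = qname.split(":", 1)
    return prefixes[prefix] + local


def customization_turtle(base_uri):
    """Return Turtle for the customized SKOS ontology built in the steps above."""
    return "\n".join([
        f"@prefix ex: <{base_uri}#> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"@prefix dc: <{DC}> .",
        "",
        f"<{base_uri}> a owl:Ontology ;",
        f"    owl:imports <{SKOS_ONTOLOGY}> , <{DC}> .",
        "",
        "ex:editor a rdf:Property ;",
        "    rdfs:subPropertyOf dc:creator ;",
        '    rdfs:label "editor" ;',
        "    rdfs:domain skos:Concept .",
        "",
        "ex:deptCode a rdf:Property ;",
        '    rdfs:label "dept code" ;',
        "    rdfs:domain skos:Concept .",
    ])


print(customization_turtle("http://example.com/ns/exskos"))
# Forgetting the "#" in the prefix runs namespace and local name together:
print(expand("ex:editor", {"ex": "http://example.com/ns/exskos"}))
# http://example.com/ns/exskoseditor
```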

You're done! You now have a customized version of the SKOS ontology. If you now use TopBraid Composer or EVN to create a new instance of the Concept class in this file, you will see editor and dept code fields on your new concept's resource form, along with has broader and all the other standard SKOS properties. (If you create concept instances with TopBraid Composer and the labels are toggled to show qnames instead of human-readable labels, they'll say ex:editor and ex:deptCode.) Instead of creating concept instances in this file, though, you would more likely create a new taxonomy file that imports your customized ontology the same way that your ontology imported the SKOS and Dublin Core ontologies, and then store your taxonomy's concepts in this new file.

The modularity of this approach brings another benefit that isn't as easy when customizing typical XML schemas and other software resource files: when a SKOS upgrade is released, you can simply delete the import of the current SKOS ontology in your customization of it and import the new one instead, and all of your applications that use your custom ontology should be able to go on using it the same way they did before.

It's nice to know that customization of a standard ontology that nearly meets your needs is so easy, and many organizations are doing this with the SKOS ontology to create a better fit with their vocabulary management requirements. This isn't limited to customizing SKOS, though; the same principle works with any OWL ontology or RDF schema. As an added benefit, if you create a customized version of a particular standard for your enterprise, you can follow these same steps to create customizations of your customization for individual departments within your enterprise.

Thursday, February 3, 2011

Converting UML Models to OWL - Part 1: The Approach

Convert UML to OWL: why would you ever want to do this? One reason suffices: many enterprise models that serve as standards or enterprise schemas are specified in UML. Increasingly, there is interest in having the content of UML models repurposed in RDF/OWL, and a need for RDF/OWL to interoperate with systems built from UML models.

UML models are notoriously hard to exchange between UML tools, let alone transform into OWL. The exchange format, XMI, is not only difficult to understand but also has vendor-specific extensions. The vagaries of MOF, CMOF and EMOF create their own challenges. Nonetheless, we have done transformations of UML to OWL. Using a model-based transformation approach based on SPARQL Rules, XMI serializations of UML models can be converted to OWL. UML class diagrams can be represented in OWL without information loss. The inverse, however, is not true and will require another blog series.

UML to OWL - Part 1 Contents

Part 1 of the series explains the basis of the approach. The complete series of blogs, as currently conceived, is as follows:

Converting UML Models to OWL - Part 1: The Approach
Converting UML Models to OWL - Part 2: Transforming UML Models to OWL Using SPARQL
Converting UML Models to OWL - Part 3: Examples of Industry UML Model Transformations

The content of this blog is organized as follows:

1. Goals, Objectives and Requirements
2. Backgrounder on XMI
3. Backgrounder on MOF
4. Solution Outline
5. Overview of Semantic XML
6. OCMOF - the OWL Representation of CMOF
7. How the Transformations from UML to OWL Work
8. Generation of UML Metaclasses
9. Generation of UML Classes
10. Generation of UML Class Superclass Relationships
11. Generation of UML Packages
12. Generation of UML Package Relationships
13. Performance
14. Concluding Remarks

Readers who are very interested in the detailed technical approach should read all sections of this blog in order. Those who just need an overview of the approach could skip sections 9 through 12. Those who have deep knowledge of XMI and MOF may want to skip sections 2 and 3, but I would welcome their feedback on the accuracy of my statements.


Goals, Objectives and Requirements

The OWL Models must faithfully represent packages and the logical models or class diagrams. Out of scope, currently, are all of the other UML models such as Interaction Diagrams and State Diagrams. The approach must be able to convert UML by processing XMI files from specific tools. This requires a strategy for converting from the XML structures of XMI to OWL models.

Backgrounder on XMI

XMI, the XML Metadata Interchange standard, is a serialization format for UML models. The main purpose of XMI is to define how the XML elements are organized within an XMI file. The XMI spec also defines a mechanism for how one XMI element references another, within and across XMI files. Such a mechanism is needed because a single UML model may legitimately be serialized to more than one XMI file.
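
Within a single file, such references are attribute values that name another element's xmi:id. The minimal Python sketch below resolves them. Its sample is a trimmed, adapted version of the CMOF excerpt shown later in Figure 1, with a PrimitiveTypes package added so that the reference has a target. References to other files, which XMI expresses with href attributes, are outside the scope of this sketch.

```python
import xml.etree.ElementTree as ET

XMI_ID = "{http://schema.omg.org/spec/XMI/2.1}id"

# Trimmed sample adapted from the CMOF excerpt in Figure 1.
SAMPLE = """<xmi:XMI xmlns:xmi="http://schema.omg.org/spec/XMI/2.1">
  <ownedMember xmi:id="Core-PrimitiveTypes" name="PrimitiveTypes"/>
  <ownedMember xmi:id="Core-Abstractions" name="Abstractions">
    <packageImport xmi:id="Core-Abstractions-_packageImport.0"
        importedPackage="Core-PrimitiveTypes"
        importingNamespace="Core-Abstractions"/>
  </ownedMember>
</xmi:XMI>"""


def index_ids(root):
    """Map every xmi:id in the document to the element that carries it."""
    return {e.get(XMI_ID): e for e in root.iter() if e.get(XMI_ID) is not None}


def resolve(elem, attr, ids):
    """Follow an attribute whose value is an xmi:id in the same file."""
    return ids[elem.get(attr)]


ids = index_ids(ET.fromstring(SAMPLE))
imp = ids["Core-Abstractions-_packageImport.0"]
print(resolve(imp, "importedPackage", ids).get("name"))  # PrimitiveTypes
```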


Backgrounder on MOF

MOF began at the time of CORBA and the need for IDL interfaces. With MOF 1.4, its mapping to Java was codified in the Java Community Process (JCP) as the Java Metadata Interface (JMI). MOF 2.0 was developed in tandem with UML 2.0. The separation of MOF into EMOF and CMOF was motivated by the influence of EMF's Ecore and of model-driven Java development; CMOF was driven more by the needs of metamodel developers. CMOF stands for Complete Meta Object Facility and is an OMG standard for UML 2 model interchange. More information can be found at this page on the OMG Website.

CMOF includes fully fledged associations, association generalization, property subsetting and redefinition, derived unions, and package merge. Typical XMI container structures look like the example below, from the CMOF UML Infrastructure Model. The basic idea is that a packagedElement (ownedMember in the sample below) owns other elements, and an xmi:type attribute specifies the type of that element.
Things get a little busy with how IDs are used for associations and their member ends; we can leave that complication for Part 2.

<?xml version="1.0" encoding="UTF-8"?>
  <xmi:XMI xmi:version="2.1" xmlns:xmi="http://schema.omg.org/spec/XMI/2.1"
   xmlns:cmof="http://schema.omg.org/spec/MOF/2.0/cmof.xml">
   <cmof:Package xmi:id="_0" name="InfrastructureLibrary">
    <ownedMember xmi:type="cmof:Package" xmi:id="Core" name="Core">
     <ownedMember xmi:type="cmof:Package"
      xmi:id="Core-Abstractions" name="Abstractions">
      <packageImport xmi:type="cmof:PackageImport"
      xmi:id="Core-Abstractions-_packageImport.0"
      importedPackage="Core-PrimitiveTypes"
      importingNamespace="Core-Abstractions"/>
      <ownedMember xmi:type="cmof:Package"
       xmi:id="Core-Abstractions-Ownerships" name="Ownerships">
       <packageImport xmi:type="cmof:PackageImport"
        xmi:id="Core-Abstractions-Ownerships-_packageImport.0"
        importedPackage="Core-Abstractions-Elements"
        importingNamespace="Core-Abstractions-Ownerships"/>
        <ownedMember xmi:type="cmof:Class"
         xmi:id="Core-Abstractions-Ownerships-Element" name="Element" isAbstract="true">
         <ownedComment xmi:type="cmof:Comment"
          xmi:id="Core-Abstractions-Ownerships-Element-_ownedComment.0"
          annotatedElement="Core-Abstractions-Ownerships-Element">
          <body>An element is a constituent of a model.
          As such, it has the capability of owning other elements.</body>
         </ownedComment>
        <ownedRule xmi:type="cmof:Constraint"
         xmi:id="Core-Abstractions-Ownerships-Element-not_own_self"
         name="not_own_self" constrainedElement="Core-Abstractions-Ownerships-Element"
         namespace="Core-Abstractions-Ownerships-Element">
         <ownedComment xmi:type="cmof:Comment"
          xmi:id="Core-Abstractions-Ownerships-Element-not_own_self-_ownedComment.0"
          annotatedElement="Core-Abstractions-Ownerships-Element-not_own_self">
          <body>An element may not directly or indirectly own itself.</body>
         </ownedComment>
         <specification xmi:type="cmof:OpaqueExpression"
          xmi:id="Core-Abstractions-Ownerships-Element-not_own_self-_specification">
          <language>OCL</language>
          <body>not self.allOwnedElements()->includes(self)</body>
         </specification>
        </ownedRule>
        ...
        <ownedAttribute xmi:type="cmof:Property"
         xmi:id="Core-Abstractions-Ownerships-Element-ownedElement"
         name="ownedElement" type="Core-Abstractions-Ownerships-Element"
         upper="*" lower="0" isReadOnly="true" isDerived="true"
         isDerivedUnion="true" isComposite="true"
         association="Core-Abstractions-Ownerships-A_ownedElement_owner">
         <ownedComment xmi:type="cmof:Comment"
          xmi:id="Core-Abstractions-Ownerships-Element-ownedElement-_ownedComment.0"
          annotatedElement="Core-Abstractions-Ownerships-Element-ownedElement">
          <body>The Elements owned by this element.</body>
         </ownedComment>
        </ownedAttribute>
        ...

Figure 1: A sample of XMI

For more background on the history of MOF, the following references may be of value: MOFLON and Wikipedia.

Solution Outline

Model-based transformation is the central idea of the approach. To implement it, we have developed a metamodel of CMOF in OWL. Our strategy is to get out of XML and into RDF triples as soon as possible. Using an ontology of XML, we convert XMI into a composite model of triples. XML is a simple enough structure for the composite object pattern: elements contain elements, and elements have attributes. The XML elements and attributes that make up the XMI file are transformed into OWL instances of the CMOF metamodel. Once we have the XMI in triples, we can map its constructs to the classes and properties of the CMOF metamodel. This model then serves as the generator for model-based transformations to an OWL model of the UML.

Once these instances are loaded as "raw" RDF, rules fire to perform the transformations. Rules are associated with classes so that the instances of those classes are processed in an execution sequence. With SPARQL Rules (SPIN), each rule is evaluated once for every instance of its class, with that instance bound to the variable ?this. SPARQL Rules are comparable to UML's Object Constraint Language (OCL) and to the Query/View/Transformation (QVT) approach to transformations.
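To make the ?this binding concrete, here is a minimal Python sketch of per-instance rule evaluation. It is not SPIN's engine: triples are plain tuples, rules are ordinary functions registered against a class name, inheritance of rules by subclasses is left out, and the instance names ex:e1 and ex:e2 are hypothetical. Each rule only looks at the instance it is handed, which mirrors the localized evaluation the approach relies on.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Rule = Callable[[str, Set[Triple]], Set[Triple]]


def instances_of(cls: str, graph: Set[Triple]) -> List[str]:
    """Subjects typed with the given class, sorted so runs are deterministic."""
    return sorted(s for (s, p, o) in graph if p == "rdf:type" and o == cls)


def run_rules(rules: Dict[str, List[Rule]], graph: Set[Triple]) -> Set[Triple]:
    """Evaluate each class's rules once per instance, passing the instance as `this`."""
    inferred: Set[Triple] = set()
    for cls, cls_rules in rules.items():
        for this in instances_of(cls, graph):
            for rule in cls_rules:
                inferred |= rule(this, graph)
    return inferred


# Hypothetical rule attached to ocmof:TypedThing: label each instance with its xmi:type.
def label_from_type(this: str, graph: Set[Triple]) -> Set[Triple]:
    return {(this, "rdfs:label", o) for (s, p, o) in graph if s == this and p == "xmi:type"}


graph = {
    ("ex:e1", "rdf:type", "ocmof:TypedThing"),
    ("ex:e1", "xmi:type", "cmof:Class"),
    ("ex:e2", "rdf:type", "ocmof:TypedThing"),
    ("ex:e2", "xmi:type", "cmof:Package"),
}
inferred = run_rules({"ocmof:TypedThing": [label_from_type]}, graph)
```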

The benefits of the OWL and SPARQL Rules model-based approach to transformation are:

Intimacy of the rules with RDF/OWL - triples are evaluated directly
Understandability - rules are smaller and expressed in the relevant contexts of the model
Enhanced Performance - evaluation of rules is localized to relevant instances
Customizability and Evolvability - transformations can be changed by modifying models and/or SPARQL rules
Ease of maintenance - rules are associated with the constructs they operate over

Overview of Semantic XML

XMI is imported into the CMOF metamodel using TopBraid Composer's Semantic XML as a mapping method. With Semantic XML, TopBraid can automatically generate an OWL/RDF ontology from any XML file. Each distinct XML element name is mapped into a class, and the elements themselves become instances of those classes. A datatype property is generated for each attribute. Nesting of XML elements is represented in OWL using a composite:child property - an object pattern in OWL that is described at this blog entry.

The key idea of Semantic XML is that each of the generated OWL classes and datatype properties is annotated with an annotation property, sxml:element and sxml:attribute, respectively. These properties relate the OWL concepts to the XML serialization. Note that these annotations are also used if an OWL model needs to be serialized back to XML format.

If you import an XML file into an ontology that already contains classes and properties with Semantic XML annotations, then the loader will reuse those. The mapping is bi-directional and loss-less so that files can be loaded, manipulated and saved without losing structural information.
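The Semantic XML conventions described above can be approximated in a few lines of Python with the standard library's xml.etree.ElementTree. This is a simplified sketch, not TopBraid's importer: triples are tuples, instances get generated blank-node style IDs, and the sx: prefix for newly minted classes and properties is hypothetical. The two dictionaries play the role of existing sxml:element and sxml:attribute annotations, so an XML name that is already mapped is reused rather than declared again.

```python
import itertools
import xml.etree.ElementTree as ET
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def import_xml(xml_text: str, class_for: Dict[str, str], prop_for: Dict[str, str]) -> Set[Triple]:
    """Turn an XML document into triples following the Semantic XML conventions.

    class_for / prop_for map XML names to existing OWL names (standing in for the
    sxml:element and sxml:attribute annotations); unknown names get a new,
    annotated class or datatype property.
    """
    triples: Set[Triple] = set()
    ids = itertools.count(1)

    def owl_class(tag: str) -> str:
        if tag not in class_for:  # no annotation yet: mint a class and annotate it
            class_for[tag] = f"sx:{tag}"
            triples.add((class_for[tag], "rdf:type", "owl:Class"))
            triples.add((class_for[tag], "sxml:element", tag))
        return class_for[tag]

    def owl_property(attr: str) -> str:
        if attr not in prop_for:  # no annotation yet: mint a datatype property
            prop_for[attr] = f"sx:{attr}"
            triples.add((prop_for[attr], "rdf:type", "owl:DatatypeProperty"))
            triples.add((prop_for[attr], "sxml:attribute", attr))
        return prop_for[attr]

    def visit(elem: ET.Element) -> str:
        node = f"_:n{next(ids)}"  # each element becomes an instance of its class
        triples.add((node, "rdf:type", owl_class(elem.tag)))
        for name, value in elem.attrib.items():  # each attribute becomes a literal value
            triples.add((node, owl_property(name), value))
        for child in elem:  # nesting becomes composite:child
            triples.add((node, "composite:child", visit(child)))
        return node

    visit(ET.fromstring(xml_text))
    return triples


sample = '<ownedMember name="Class"><ownedComment><body/></ownedComment></ownedMember>'
# Pretend the ontology already maps the ownedMember element to ocmof:ownedMember.
known_classes = {"ownedMember": "ocmof:ownedMember"}
triples = import_xml(sample, known_classes, {})
```

Element text (such as the content of a body element) and namespaced names like xmi:id, which ElementTree reports in {namespace-uri}local form, are not handled here; a real importer would need to deal with both.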

A video explaining how Semantic XML works is available at this link.

OCMOF - the OWL Representation of CMOF

The strategy for the transformation can be summarized as follows:

Use OWL classes to represent XMI Element Types
Use SPARQL Rules on those classes to generate CMOF Metaclasses
Use Metaclasses to make OWL Classes that represent the UML Model

An OWL metamodel of CMOF represents the kinds of containers, elements and attributes shown above. The metamodel was built by studying the UML 2.0 metamodel; the original motivation was to have an automated way of dealing with changes to UML. That remains a future consideration; for now, the metamodel has proven to be a valuable tool for verification and validation. The UML metamodel will be covered in Part 2 of this blog series. For Part 1, it is instructive to show a small piece of the XMI. Below is an example of a Basic Property in the XMI: the ownedAttribute property of Core-Basic-Class, taken from the UML model file infrastructure.cmof.xmi.

<ownedAttribute xmi:type="cmof:Property"
  xmi:id="Core-Basic-Class-ownedAttribute" name="ownedAttribute"
  type="Core-Basic-Property" isOrdered="true"
  upper="*" lower="0" isComposite="true"
  association="Core-Basic-A_ownedAttribute_class">
  <ownedComment xmi:type="cmof:Comment"
    xmi:id="Core-Basic-Class-ownedAttribute-_ownedComment.0"
    annotatedElement="Core-Basic-Class-ownedAttribute">
    <body>The attributes owned by a class.
    These do not include the inherited attributes.
    Attributes are represented by instances of Property.</body>
  </ownedComment>
 </ownedAttribute>

Figure 2: A fragment of the XMI for the UML metamodel

As an example of XMI element mappings, the sxml:element annotation maps the XMI element ownedAttribute to the class ocmof:ownedAttribute, as shown in the Turtle extract from the OWL model below.

ocmof:ownedAttribute
  a  owl:Class ;
  rdfs:label "Attribute"^^xsd:string ;
  rdfs:subClassOf ocmof:TypedThing , ocmof:NamedThing ;
  rdfs:subClassOf
    [ a owl:Restriction ;
      owl:maxCardinality "1"^^xsd:nonNegativeInteger ;
      owl:onProperty ocmof:isComposite
    ] ;
  rdfs:subClassOf
    [ a owl:Restriction ;
      owl:maxCardinality "1"^^xsd:nonNegativeInteger ;
      owl:onProperty ocmof:type
    ] ;
  rdfs:subClassOf
    [ a  owl:Restriction ;
      owl:maxCardinality "1"^^xsd:nonNegativeInteger ;
      owl:onProperty ocmof:isDerivedUnion
    ] ;
  rdfs:subClassOf
    [ a  owl:Restriction ;
      owl:maxCardinality "1"^^xsd:nonNegativeInteger ;
      owl:onProperty ocmof:isReadOnly
    ] ;
  rdfs:subClassOf
    [ a  owl:Restriction ;
      owl:maxCardinality "1"^^xsd:nonNegativeInteger ;
      owl:onProperty ocmof:default
    ] ;
  sxml:element "ownedAttribute"^^xsd:string .

Figure 3: ocmof:ownedAttribute in Turtle

The last line, sxml:element "ownedAttribute"^^xsd:string, is the mapping.
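Each owl:maxCardinality restriction in Figure 3 allows an ocmof:ownedAttribute at most one value for the named property. The Python sketch below reads those restrictions as closed-world checks, which is one plausible way such a model can support verification and validation. It is an illustration over tuples, and the instances ex:a1 and ex:a2 are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Properties restricted to owl:maxCardinality 1 on ocmof:ownedAttribute in Figure 3
MAX_ONE = {"ocmof:isComposite", "ocmof:type", "ocmof:isDerivedUnion",
           "ocmof:isReadOnly", "ocmof:default"}


def cardinality_violations(graph: Set[Triple]) -> Dict[str, List[str]]:
    """Map each ocmof:ownedAttribute instance to restricted properties it uses more than once."""
    members = {s for (s, p, o) in graph if p == "rdf:type" and o == "ocmof:ownedAttribute"}
    counts = Counter((s, p) for (s, p, o) in graph if s in members and p in MAX_ONE)
    report: Dict[str, List[str]] = {}
    for (s, p), n in sorted(counts.items()):
        if n > 1:
            report.setdefault(s, []).append(p)
    return report


graph = {
    ("ex:a1", "rdf:type", "ocmof:ownedAttribute"),
    ("ex:a1", "ocmof:isComposite", "true"),
    ("ex:a1", "ocmof:isComposite", "false"),  # second value breaks maxCardinality 1
    ("ex:a1", "ocmof:type", "Core-Basic-Property"),
    ("ex:a2", "rdf:type", "ocmof:ownedAttribute"),
    ("ex:a2", "ocmof:isReadOnly", "true"),
}
report = cardinality_violations(graph)
```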

As an example of XMI attribute mappings, the sxml:attribute annotation maps the XMI attribute isOrdered to the datatype property ocmof:isOrdered, as shown in the Turtle extract from the OWL model below.

ocmof:isOrdered
  a owl:DatatypeProperty ;
  rdfs:domain ocmof:ownedAttribute ,
  ocmof:ownedParameter ,
  ocmof:OwnedEnd ;
  rdfs:label "is ordered"^^xsd:string ;
  rdfs:range xsd:boolean ;
  sxml:attribute "isOrdered"^^xsd:string .

Figure 4: ocmof:isOrdered in Turtle

The last line, sxml:attribute "isOrdered"^^xsd:string, is the mapping.
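Because every generated class and datatype property records its XML name, writing an instance back out is essentially a reverse lookup on these annotations. The Python sketch below shows that idea for a single element with attributes only; child elements, text content and attribute order are ignored, and the instance ex:p1 is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def to_xml(node: str, graph: Set[Triple]) -> str:
    """Rebuild one XML element from an instance using sxml:element / sxml:attribute annotations."""
    element_name = {s: o for (s, p, o) in graph if p == "sxml:element"}
    attribute_name = {s: o for (s, p, o) in graph if p == "sxml:attribute"}
    # Pick the instance's class that carries an sxml:element annotation.
    cls = next(o for (s, p, o) in graph if s == node and p == "rdf:type" and o in element_name)
    elem = ET.Element(element_name[cls])
    for (s, p, o) in sorted(graph):
        if s == node and p in attribute_name:  # annotated property -> XML attribute
            elem.set(attribute_name[p], o)
    return ET.tostring(elem, encoding="unicode")


graph = {
    ("ocmof:ownedAttribute", "sxml:element", "ownedAttribute"),
    ("ocmof:isOrdered", "sxml:attribute", "isOrdered"),
    ("ex:p1", "rdf:type", "ocmof:ownedAttribute"),
    ("ex:p1", "ocmof:isOrdered", "true"),
}
xml_text = to_xml("ex:p1", graph)
```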

The transformation to OWL results in the following class for uml:Core-Basic-Property.

Figure 5: A Generated Metaclass Example - uml:Core-Basic-Property

The diagram shows how the datatype properties of the class uml:Core-Basic-Property correspond to the XMI attributes given in the fragment above. For example, isComposite becomes the property hasBooleanIsComposite. The prefix hasBoolean is customizable.
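The naming convention can be shown with a small Python function. Only the hasBoolean prefix comes from the text; keying the prefix on the attribute's XSD datatype, and the hasString and hasInteger entries, are assumptions made for illustration.

```python
from typing import Dict

# Hypothetical datatype-to-prefix table; only "hasBoolean" appears in the article.
PREFIXES: Dict[str, str] = {
    "xsd:boolean": "hasBoolean",
    "xsd:string": "hasString",
    "xsd:integer": "hasInteger",
}


def property_name(attribute: str, datatype: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Build a datatype property name such as hasBooleanIsComposite from an XMI attribute name."""
    prefix = prefixes[datatype]
    # Capitalize the first letter of the attribute and append it to the prefix.
    return prefix + attribute[:1].upper() + attribute[1:]
```

Passing a different prefix table is one simple way to make the prefix customizable.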

First, an OWL model of CMOF XML elements is used to generate instances of metaclasses in order to build OWL classes for XMI elements. The namespace prefix ocmof denotes all modeling constructs that make up the CMOF metamodel. The prefix cmof is the namespace for all constructs generated from the import of the XMI files.

In the diagram below, we show the main classes of the metamodel. Classes like NamedThing and TypedThing have been introduced to optimize the work of the transformation rules. Constructs in XMI are typically both named and typed, and this kind of multiple inheritance is no problem for the transformations. The diagram is a partial view only.

Figure 6: Some of the classes of the CMOF OWL model

As an alternate view, the diagram that follows is an HTML report of NamedThing in TopBraid Composer. It is generated automatically using SPARQL Web Pages (also known as UISPIN).

Figure 7: OCMOF NamedThing - an abstract class for the transformations

The diagram below shows more details of some ownedElements. Note how attributes of each of these classes relate to CMOF constructs.

Figure 8: Some "ownedElement" OWL Classes in the OCMOF model

These ocmof classes serve as the starting point for generating ocmof meta-classes and instances of these classes that become the UML model transformed into OWL. The figure below shows the main metaclasses that are generated by rules on the ocmof classes.

Figure 9: The key Meta-classes of the CMOF OWL model

How the Transformations from UML to OWL Work

Model-based transformations use rules associated with OWL classes. OWL metaclasses are built using a SPARQL rule for instances of TypedThing. The names of the metaclasses are determined from the value of the xmi:type attribute. A number of SPARQL Rules are defined on TypedThing. Their priorities are set by the alphabetical ordering of the first comment line of each rule. These rules look after the generation of:
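The ordering convention can be sketched as a plain string sort on each rule's first comment line. This Python illustration treats rules as SPARQL strings with empty bodies and is not TopBraid's scheduler.

```python
from typing import List


def first_comment_line(rule_text: str) -> str:
    """Return the first line starting with '#', or '' if the rule has no comment."""
    for line in rule_text.splitlines():
        if line.strip().startswith("#"):
            return line.strip()
    return ""


def execution_order(rules: List[str]) -> List[str]:
    """Order rules alphabetically by their first comment line."""
    return sorted(rules, key=first_comment_line)


rules = [
    "# STEP CMOF-SR-020 - make Packages\nCONSTRUCT { } WHERE { }",
    "# STEP CMOF-SR-001  make UML Metaclass from CMOF type\nCONSTRUCT { } WHERE { }",
    "# STEP CMOF-SR-005 - fixup the superclass of the root Classes\nCONSTRUCT { } WHERE { }",
    "# STEP CMOF-SR-002  make UML Classes from CMOF elements\nCONSTRUCT { } WHERE { }",
]
# Keep just the "# STEP CMOF-SR-nnn" part of each comment for readability.
ordered = [first_comment_line(r)[:18] for r in execution_order(rules)]
```

Because the sort is alphabetical rather than numeric, zero-padded step numbers such as 001 and 020 are what keep the steps in numeric order; an unpadded SR-10 would sort before SR-9.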

UML Metaclasses
UML Classes
UML Class Superclass Relationships
UML Packages
UML Package Relationships

Each rule will now be described.

Generation of UML Metaclasses

The first task is to create a metaclass and class for every type of element in the ingested XMI file. This is done using the SPARQL Rule below:


# STEP CMOF-SR-001  make UML Metaclass from CMOF type
CONSTRUCT {
  ?metaClassURI a rdfs:Class .
  ?metaClassURI rdfs:subClassOf cmof:MetaClass .
  ?metaClassURI rdfs:label ?metaClassLabel .
  ?typeURI a owl:Class .
  ?typeURI a ?metaClassURI .
  ?typeURI rdfs:subClassOf uml:Construct .
  ?typeURI rdfs:label ?classLabel .
 }
WHERE {
  ?this xmi:type ?type .
  FILTER (?type != "cmof:Property") .
  BIND (o2o:localNameOfQName(?type) AS ?name) .
  BIND (fn:concat("CMOF ", ?name) AS ?metaClassLabel) .
  BIND (fn:concat("UML ", ?name) AS ?classLabel) .
  BIND (xmi.common:makeUML-URI(?name) AS ?typeURI) .
  BIND (xmi.common:makeCMOF-URI(?name) AS ?metaClassURI) .
}

Figure 10: The SPARQL Rules that make the metaclasses in the OCMOF model

What is going on in this rule? First we explain the WHERE clause.

?this xmi:type ?type binds ?type to the xmi:type value of ?this, which is bound in turn to each instance of TypedThing. The rule is evaluated once for each instance.

FILTER (?type != "cmof:Property") blocks further evaluation of the rule if the instance is of type cmof:Property. The reason for this will be explained in Part 2.

BIND (o2o:localNameOfQName(?type) AS ?name) extracts the local name of the type from the QName, for example Class from cmof:Class.

BIND (fn:concat("CMOF ", ?name) AS ?metaClassLabel) builds a label for the metaclass. The function fn:concat is from the Jena SPARQL library. We use it here to prepend "CMOF " to the name we get from the type of the XMI element.

BIND (fn:concat("UML ", ?name) AS ?classLabel) likewise builds a label for the UML Class. We construct both a metaclass and a class from each XMI type. The metaclass describes what the class itself can have, such as properties beyond those that owl:Class provides. In other words, the generated OWL model is a three-level ontology.

BIND (xmi.common:makeUML-URI(?name) AS ?typeURI) builds a URI for the UML Class corresponding to the type. It calls the function xmi.common:makeUML-URI, whose job is to build the correct namespace path for a UML construct URI. Its implementation is shown below.

SELECT ?uri
WHERE {
  BIND (xmi.common:baseURI() AS ?baseURI) .
  BIND (smf:buildURI("{?baseURI}#{?arg1}") AS ?uri) .
}

where
smf:buildURI("{?baseURI}#{?arg1}") builds a URI for the name given in ?arg1, using the base URI supplied by the function xmi.common:baseURI().

BIND (xmi.common:makeCMOF-URI(?name) AS ?metaClassURI) builds a URI for the metaclass corresponding to the type. Likewise, it constructs the correct namespace path for CMOF constructs.
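The {?var} templates passed to smf:buildURI (and to smf:buildString in the later rules) can be mimicked with a regular-expression substitution. In this Python sketch the base URI http://example.org/uml and the make_uml_uri stand-in for xmi.common:makeUML-URI are hypothetical; the point is only how "{?baseURI}#{?arg1}" expands once the variables are bound.

```python
import re
from typing import Dict


def build_string(template: str, bindings: Dict[str, str]) -> str:
    """Replace every {?name} placeholder with the value bound to ?name."""
    return re.sub(r"\{\?(\w+)\}", lambda m: bindings[m.group(1)], template)


# Hypothetical value standing in for xmi.common:baseURI()
BASE_URI = "http://example.org/uml"


def make_uml_uri(name: str) -> str:
    """Hypothetical stand-in for xmi.common:makeUML-URI, with ?arg1 bound to the name."""
    return build_string("{?baseURI}#{?arg1}", {"baseURI": BASE_URI, "arg1": name})
```

An unbound variable raises a KeyError in this sketch; how the TopBraid function behaves in that case is not covered here.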

Next we explain what is happening in the head of the rule, the CONSTRUCT template. These triple patterns use the generated URIs to create instances of metaclasses and classes.

?metaClassURI a rdfs:Class
gives the metaClass its type.

?metaClassURI rdfs:subClassOf cmof:MetaClass
specifies that the metaclass is a sub-class of cmof:MetaClass - an abstract metaclass for all cmof classes.

?metaClassURI rdfs:label ?metaClassLabel
gives the metaclass a human label.

?typeURI a owl:Class
gives the UML Class a type.

?typeURI a ?metaClassURI
gives the UML Class a more specific type so that it can have more properties than owl:Class provides.

?typeURI rdfs:subClassOf uml:Construct
specifies that the UML Class is a subclass of the abstract OWL Class uml:Construct.

?typeURI rdfs:label ?classLabel
gives the UML Class a human label.
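Putting the WHERE clause and the CONSTRUCT template together, the rule behaves roughly like the Python function below for a single xmi:type value. It is a sketch: the cmof: and uml: prefixed names stand in for whatever xmi.common:makeCMOF-URI and xmi.common:makeUML-URI actually return, and local_name_of_qname is a hypothetical stand-in for o2o:localNameOfQName.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def local_name_of_qname(qname: str) -> str:
    """Hypothetical stand-in for o2o:localNameOfQName: the text after the last ':'."""
    return qname.rpartition(":")[2]


def make_metaclass(xmi_type: str) -> Set[Triple]:
    """Roughly what STEP CMOF-SR-001 constructs for one xmi:type value."""
    if xmi_type == "cmof:Property":  # mirrors FILTER (?type != "cmof:Property")
        return set()
    name = local_name_of_qname(xmi_type)
    meta_class_uri = f"cmof:{name}"  # stand-in for xmi.common:makeCMOF-URI(?name)
    type_uri = f"uml:{name}"         # stand-in for xmi.common:makeUML-URI(?name)
    return {
        (meta_class_uri, "rdf:type", "rdfs:Class"),
        (meta_class_uri, "rdfs:subClassOf", "cmof:MetaClass"),
        (meta_class_uri, "rdfs:label", f"CMOF {name}"),
        (type_uri, "rdf:type", "owl:Class"),
        (type_uri, "rdf:type", meta_class_uri),
        (type_uri, "rdfs:subClassOf", "uml:Construct"),
        (type_uri, "rdfs:label", f"UML {name}"),
    }


# Several TypedThing instances can share an xmi:type; a graph is a set, so repeats collapse.
result = make_metaclass("cmof:Class") | make_metaclass("cmof:Class") | make_metaclass("cmof:Property")
```

The union at the end shows why the rule can fire for every TypedThing instance without creating duplicate classes: constructing the same triple twice has no further effect.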

Generation of UML Classes

Once we have the necessary metaclasses, we can begin the work of creating instances of those metaclasses. These instances will, of course, be classes (the meta-world can get confusing). This work is done by the SPARQL Rule below.


# STEP CMOF-SR-002  make UML Classes from CMOF elements
CONSTRUCT {
  ?type a rdfs:Class .
  ?type rdfs:subClassOf cmof:MetaClass .
  ?class a ?type .
  ?class rdfs:label ?name .
  ?class ocmof:hasCMOFbasis ?this .
  ?superURI a owl:Class .
  ?superURI a cmof:CategoryClass .
  ?superURI rdfs:label ?super .
  ?subURI a owl:Class .
  ?subURI a cmof:CategoryClass .
  ?subURI rdfs:subClassOf ?superURI .
  ?subURI rdfs:label ?sub .
  ?class rdfs:subClassOf ?mySuperClass .
}
WHERE {
  ?this xmi:type "cmof:Class" .
  ?this xmi:id ?name .
  BIND (o2o:pathPart(?name, "-") AS ?path) .
  OPTIONAL {
    ?path o2o:pairHyphenIncrementally ( ?super ?sub ) .
    BIND (xmi.common:makeUML-URI(smf:buildString("CLASSES_{?super}")) AS ?superURI) .
    BIND (xmi.common:makeUML-URI(smf:buildString("CLASSES_{?sub}")) AS ?subURI) .
    BIND (xmi.common:makeCMOF-Resource("cmof:Class") AS ?type) .
    BIND (xmi.common:makeUML-URI(?name) AS ?class) .
  } .
  BIND (xmi.common:makeUML-URI(smf:buildString("CLASSES_{?path}")) AS ?mySuperClass) .
}

Figure 11: The SPARQL Rules that make UML Classes

More details of this transformation will be given in Part 2 of this blog series. An interesting aspect of this particular rule is how it builds deep inheritance structures by using Property Functions to recurse over hyphenated names (more on the use of Property Functions, also known as Magic Properties, with TopBraid Composer can be found at this blog entry). These hyphenated names occur throughout the XMI metamodel of UML. For example, Core-Basic-Class looks like this:

<ownedMember xmi:type="cmof:Class" xmi:id="Core-Basic-Class" name="Class" superClass="Core-Basic-Type">
  <ownedComment xmi:type="cmof:Comment" xmi:id="Core-Basic-Class-_ownedComment.0"
    annotatedElement="Core-Basic-Class">
    <body>A class is a type that has objects as its instances.</body>
  </ownedComment>
  <ownedAttribute xmi:type="cmof:Property" xmi:id="Core-Basic-Class-isAbstract"
    name="isAbstract" type="Core-PrimitiveTypes-Boolean" default="false">
    <ownedComment xmi:type="cmof:Comment" xmi:id="Core-Basic-Class-isAbstract-_ownedComment.0"
      annotatedElement="Core-Basic-Class-isAbstract">
      <body>True when a class is abstract.</body>
    </ownedComment>
  </ownedAttribute>

Figure 12: Example of Hyphenated Names in the UML Metamodel

How this is done in the SPARQL Rule is explained briefly below.

In the SPARQL Rule shown above in Figure 11, the statement in the tail, ?path o2o:pairHyphenIncrementally ( ?super ?sub ), is a Property Function that binds two variables, ?super and ?sub, for every hyphenated pair.
For each pair, the statement ?subURI rdfs:subClassOf ?superURI in the head of the rule builds a superclass relationship.
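The source of o2o:pathPart and o2o:pairHyphenIncrementally is not shown, so the Python sketch below is an assumption about their behaviour, inferred from how Figure 11 uses them: pathPart drops the last hyphen-separated segment of an xmi:id, and pairHyphenIncrementally yields each pair of successive prefixes as (super, sub).

```python
from typing import List, Tuple


def path_part(xmi_id: str, sep: str = "-") -> str:
    """Assumed behaviour of o2o:pathPart: drop the last separator-delimited segment."""
    return xmi_id.rsplit(sep, 1)[0]


def pair_hyphen_incrementally(path: str, sep: str = "-") -> List[Tuple[str, str]]:
    """Assumed behaviour of o2o:pairHyphenIncrementally: (prefix, next-longer prefix) pairs."""
    parts = path.split(sep)
    prefixes = [sep.join(parts[: i + 1]) for i in range(len(parts))]
    return list(zip(prefixes, prefixes[1:]))


def subclass_links(xmi_id: str) -> List[Tuple[str, str]]:
    """(sub, super) rdfs:subClassOf pairs between CLASSES_ category classes for one xmi:id."""
    return [(f"CLASSES_{sub}", f"CLASSES_{sup}")
            for sup, sub in pair_hyphen_incrementally(path_part(xmi_id))]
```

Under these assumptions, Core-Abstractions-Ownerships-Element yields the chain CLASSES_Core-Abstractions-Ownerships under CLASSES_Core-Abstractions under CLASSES_Core, and the ?mySuperClass binding then places the class itself under CLASSES_Core-Abstractions-Ownerships.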

Generation of UML Class Superclass Relationships

Once we have all of the UML Classes, the next rule completes the rdfs:subClassOf relationships by making each CLASSES_ category class that has no superclass yet a subclass of uml:Class.

# STEP CMOF-SR-005 - fixup the superclass of the root Classes
CONSTRUCT {
  ?class rdfs:subClassOf uml:Class .
}
WHERE {
  ?class a cmof:CategoryClass .
  BIND (afn:localname(?class) AS ?className) .
  FILTER fn:starts-with(?className, "CLASSES_") .
  FILTER NOT EXISTS {
    ?class rdfs:subClassOf ?superClass .
  } .
}

The result of executing the preceding UML Class rules is the UML Class Hierarchy shown in the diagram below.

Figure 13: Generated UML Metamodel Class Hierarchy
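The effect of STEP CMOF-SR-005 can be sketched as a set operation in Python: every CLASSES_ category class that is not yet the subject of an rdfs:subClassOf triple is attached to uml:Class. The two-class hierarchy below is hypothetical, and prefixed names stand in for full URIs.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def fix_root_classes(graph: Set[Triple]) -> Set[Triple]:
    """Return the triples CMOF-SR-005 would add: roots of the CLASSES_ hierarchy get uml:Class."""
    categories = {s for (s, p, o) in graph if p == "rdf:type" and o == "cmof:CategoryClass"}
    has_super = {s for (s, p, o) in graph if p == "rdfs:subClassOf"}  # the NOT EXISTS test
    return {(c, "rdfs:subClassOf", "uml:Class")
            for c in categories
            if c.rpartition(":")[2].startswith("CLASSES_") and c not in has_super}


graph = {
    ("uml:CLASSES_Core", "rdf:type", "cmof:CategoryClass"),
    ("uml:CLASSES_Core-Basic", "rdf:type", "cmof:CategoryClass"),
    ("uml:CLASSES_Core-Basic", "rdfs:subClassOf", "uml:CLASSES_Core"),
}
added = fix_root_classes(graph)
```

Running the function again on the enlarged graph adds nothing, since the root now has a superclass; the NOT EXISTS test is what makes the rule safe to re-run.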

Generation of UML Packages

# STEP CMOF-SR-020 - make Packages
CONSTRUCT {
  ?package rdfs:label ?name .
  ?package ocmof:hasCMOFbasis ?this .
  ?mySuperClass a owl:Class .
  ?mySuperClass rdfs:label ?path .
  ?superURI a owl:Class .
  ?superURI a cmof:CategoryClass .
  ?superURI rdfs:label ?super .
  ?subURI a owl:Class .
  ?subURI a cmof:CategoryClass .
  ?subURI rdfs:subClassOf ?superURI .
  ?subURI rdfs:label ?sub .
  ?package a ?mySuperClass .
  ?package a uml:Package .
}
WHERE {
  ?this xmi:type "cmof:Package" .
  ?this xmi:id ?name .
  BIND (xmi.common:makeUML-URI(?name) AS ?package) .
  BIND (o2o:pathPart(?name, "-") AS ?path) .
  OPTIONAL {
    ?path o2o:pairHyphenIncrementally ( ?super ?sub ) .
    BIND (xmi.common:makeUML-URI(smf:buildString("PACKAGES_{?super}")) AS ?superURI) .
    BIND (xmi.common:makeUML-URI(smf:buildString("PACKAGES_{?sub}")) AS ?subURI) .
  } .
  BIND (xmi.common:makeUML-URI(smf:buildString("PACKAGES_{?path}")) AS ?mySuperClass) .
}

Generation of UML Package Relationships

# STEP CMOF-SR-024 - fixup the superclass of the root Packages
CONSTRUCT {
  ?packageClass rdfs:subClassOf uml:Package .
}
WHERE {
  ?this xmi:type "cmof:Package" .
  ?package ocmof:hasCMOFbasis ?this .
  ?package a ?packageClass .
  FILTER NOT EXISTS {
    ?packageClass rdfs:subClassOf ?superClass .
  } .
}

The result of executing the preceding UML Package rules is the UML Package Hierarchy shown in the diagram below.

Figure 14: Generated UML Metamodel Package Hierarchy

Performance

To measure performance, we converted the UML Infrastructure XMI with TopBraid Composer release 3.4.0 on a Dell Studio XPS laptop with 4GB of memory, running Windows 7. The conversion took 38.611 seconds and generated 19,575 statements (RDF triples), which translates to an inference speed of about 507 triples per second (TPS).
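The throughput figure is the number of generated statements divided by the elapsed time, which a couple of lines of Python confirm:

```python
statements = 19_575  # RDF triples generated by the conversion
seconds = 38.611     # elapsed conversion time
tps = statements / seconds  # about 506.98, which rounds to 507
```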

Concluding Remarks

Part 1 of this blog series has introduced the power of model-based transformation using SPARQL Rules as a means to transform XMI to OWL. Our experience in doing this work confirms the extensibility and flexibility of this approach. The subject is a complex one, requiring a grounding in the intricacies of UML metamodeling and a knowledge of SPARQL and SPARQL Rules. We have attempted to provide that grounding briefly in this post, which is not an easy matter.

Part 2 of this blog series will discuss transforming UML Models to OWL Using SPARQL.