Showing posts with label OWL. Show all posts
Showing posts with label OWL. Show all posts

Friday, September 30, 2011

Ontologies and Data Models – are they the same?

Yesterday a question about how ontologies may be different from logical data models was asked by a newcomer on TopBraid Users Forum. As to be expected on the TopBraid Forum, by ontologies he meant specifically ontology models expressed in RDFS/OWL. Because we frequently hear this or similar questions in our trainings, workshops and in conversations with customers, I decided to respond in a blog post instead of writing an e-mail.

Data modeling was invented more than thirty years ago to help with the design of databases, specifically, relational databases. As quoted below, ANSI definition from 1975 differentiated between three data models – conceptual, logical and physical. Data modeling quickly became recognized as a tool for analyzing the semantics of an organization with the respect to the structure and flow of the information used in carrying out organization’s activities. Wikipedia offers the following definition of Data Modeling:

Data modeling is a method used to define and analyze data requirements needed to support the business processes of an organization. The data requirements are recorded as a conceptual data model with associated data definitions. Actual implementation of the conceptual model is called a logical data model.
<…>
In 1975 ANSI described three kinds of data-model instance:
  • Conceptual schema: describes the semantics of a domain (the scope of the model). For example, it may be a model of the interest area of an organization or of an industry. This consists of entity classes, representing kinds of things of significance in the domain, and relationships assertions about associations between pairs of entity classes. A conceptual schema specifies the kinds of facts or propositions that can be expressed using the model. In that sense, it defines the allowed expressions in an artificial "language" with a scope that is limited by the scope of the model.
  • Logical schema: describes the structure of some domain of information. This consists of descriptions of (for example) tables, columns, object-oriented classes, and XML tags.
  • Physical schema: describes the physical means used to store data. This is concerned with partitions, CPUs, tablespaces, and the like.
According to ANSI, this approach allows the three perspectives to be relatively independent of each other. Storage technology can change without affecting either the logical or the conceptual model. The table/column structure can change without (necessarily) affecting the conceptual model.
These definitions describe a clear progression from conceptual to logical to physical data models. SInce their origin is in the 70s, they reflect certain technology assumptions than no longer hold true.

When information modeling is done to create a relational database, conceptual model must be different from a logical model because there is no place in a relational database structure to capture, for example, business rules, create subsumtion relationships and describe other key aspects of a conceptual model. This semantic information collected and documented as part of the initial modeling is left behind when modelers and designers move on to define a logical data model. The "left behind" parts are used by software developers as they encode business semantics directly into custom programs.

Logical data model is a subset of a conceptual model that can be expressed using a particular technology. However, there are always some performance considerations that require additional changes to the logical data model before it can be implemented in a relational database. Hence, some of the aspects of a logical model are left behind as it gets translated into a physical data model.

Since an ontology is a model of a domain describing objects that inhabit it, all three types of data models can be thought of as ontologies. They range from the most expressive one that describes business concepts and processes (the conceptual model) to less expressive and progressively moving from describing business semantics to describing physical structures of the data as it is stored in the databases (the logical and physical data model). Physical model can be thought of as an ontology of a particular database. Wikipedia goes on to note
Early phases of many software-development projects emphasize the design of a conceptual data model. Such a design can be detailed into a logical data model. In later stages, this model may be translated into physical data model. However, it is also possible to implement a conceptual model directly.
Semantic Web standards (governed by the W3C, the World Wide Web Consortium) make it possible to implement conceptual models directly. This is possible due to the layered architecture of the Semantic Web technology stack consisting of:
  • RDF – a canonical data model that is like relational data model in its ability to connect related objects and unlike relational data model in that the data objects (or resources in RDF-speak) are highly granular.
The smallest unit of information in RDF is not a table or a row in a table, but individual statements – a single fact about a resource. 

These statements are called RDF triples. For example, “Atlantis decommission-date July, 2011” is a triple where Atlantis is a subject of a triple, decommission date is a predicate of a triple and July, 2011 is an object of a triple. Atlantis and decommission date are RDF resources and July, 2011 is XML literal. Subjects and predicates of a triple are always RDF resources. An object can be either a resource or a literal value. Predicates that connect two resources are relationships or associations in the data modeling speak. Predicates connecting a resource to a literal value are attributes. In RDF they are called respectively object and data properties.

Because RDF model is highly canonical, RDF data is schema-less. There are no constraints that require it to fit into tables or hierarchies. RDF data is simply a network of connected triples. As such, it can be used to represent, if needed, both - table structures and hierarchies. Standard mappings have been defined from relational tables and XML hierarchies into RDF.

Another key differentiating factor of RDF is that it was “born on the web”. Each RDF resource has a globally unique identity, a URI (uniform resource identifier). For example, the URI for Atlantis may be http://www.nasa.gov/shuttle/Atlantis and the URI for a decommission date may be http://www.nasa.gov/lifecycle#decommissionDate . As a result, it is possible to link RDF data over web in a way similar to how documents can be hyperlinked over the web. By web we mean all HTTP based networks including intranets and extranets.

RDF databases store and provide query access to RDF data. Just like there are standard languages for query of relational and XML data, there is a standard for querying RDF. It is called SPARQL. True to the web-native nature of RDF, SPARQL is not only a query language, but also a protocol that makes it possible to access RDF data over HTTP.
  • RDFS (RDF Schema) and OWL (Web Ontology Language) – RDF-based languages for expressing business semantics.
Jointly RDFS and OWL offer ability to define classes or groups of resources that share common characteristics such as Vehicles and Space Shuttles. The richness of RDFS/OWL makes it possible to fully express the meaning of the business concepts. Data models in RDFS/OWL are stored in the same way as the data, in RDF triples. For example, we can have triples stating that Space Shuttle is a Class and it is a sub class of a Vehicle class and that a vehicle can have only one decommission date (cardinality = 1) and its value must be xsd:date. And you can go beyond cardinality and use the Semantic Web standards to represent a variety of business rules.

Since the data and the schema are stored in the same way, it is possible to query schemas the same way data is queried and to combine search criteria about schemas with the search criteria about data. For example, we can create SPARQL queries to ask for all vehicles that have been decommissioned, all subclasses of a vehicle class, all relationships and attributes a vehicle should have and, when returning decommissioned vehicles, to provide only data values for the fields that have cardinality = 1.

The use of RDF means that the modeling constructs and definitions can be linked and connected. Organizations can refer to each other’s business definitions. Models can be modularized and re-used where appropriate. Differences between related, but not identical concepts can be described. All of this can now be done in a standard compliant and interoperable way.

A growing number of standards bodies and communities of interest are publishing RDF/OWL data models for their particular domains. For example:
  • SKOS – provides a way to represent taxonomies and thesauri
  • ISO 15926 – offers a data model for sharing life-cycle data for process plants including oil and gas production facilities
  • Ontology for Media Resources - defines a core set of metadata properties for multimedia resources
  • SIOC - defines information about online communities
  • QUDT - provides models describing measurable quantities, units for measuring different kinds of quantities and the data types used to store and manipulate these objects in software
  • Provenance Vocabulary - defines provenance-related metadata
There is much more that can be added to this post including a discussion on the best practices for ontology modeling, ontology architecture, approaches for connecting and mapping models, using rules and constraints, publishing, versioning and governing models. Each of these topics, however, deserves an exploration in its own right.

I will end by pointing to a few relevant related blogs and web pages we have published before:

Wednesday, September 28, 2011

Living in the XML and OWL World - Comprehensive Transformations of XML Schemas and XML data to RDF/OWL

Many enterprise information models are expressed using XML Schemas. Data between applications is commonly exchanged in XML, compliant with those schemas. Connecting XML data from different systems in a coherent aggregated way is a challenge that confronts many organizations. Capabilities of RDF/OWL to describe semantics of different data models and aggregate disparate data are a natural fit for addressing these challenges.

For a number of years now, TopBraid Composer included the ability to convert XSDs and associated XML files to RDF/OWL. However, for some XML Schemas our converter did not work as well as customers needed. For the upcoming TopBraid Composer 3.6.0 release, it was significantly improved to have a more comprehensive coverage of XSD constructs and more meaningful conversion to OWL. In parallel, we improved our XML data conversion to RDF so that transformations automatically happen based on the generated OWL models. And we have improved performance of the transformations.

An overview of the approach is illustrated in the following figure:

Approach

Since, the conversion occurs automatically, users do not have to worry about writing any rules for commonly needed mappings. However, those users that need to make further transformations can use SPARQL Rules and SPARQLMotion to customize their generated OWL ontology or further transform RDF triples representing the XML data.

The content of this blog is organized as follows:

XML Schemas converted as part of our tests

We tested the importer on a broad range of complicated and large-scale industry standard XSD files, and converted many XML instances with impressive results. The XSDs we have tested with the new importer include:


  1. Banking
    • FpML, the Financial products Markup Language
    • ISO 20022, a standard for Universal financial industry message scheme

  2. Energy and Utilities
    • MultiSpeak, de-facto standard for defining data needed to be exchanged between software applications in order to support the business processes commonly applied at utilities

  3. Government
    • DoDAF, the Department of Defense Architecture Framework
    • NIEM, the U.S. National Information Exchange Model

  4. Oil and Gas
    • ISO 15926, a standard for integration of life-cycle data for process plants including oil and gas production facilities
    • WITSML, Wellsite Information Transfer Standard Markup Language

  5. Healthcare

  6. Electronics
    • IP-XACT, the XML Schema for meta-data documenting Intellectual Property (IP) used in the development, implementation and verification of electronic systems

  7. Other
    • ATML, the Auto-Test Markup Language

Some of the converted schemas will be published at LinkedModels.org. To get an early access to converted models or for any other questions, contact us at TopQuadrant.

The examples we use in this blog are mainly from the Financial products Markup Language (FpML). All FpML 5.2 XSD and XML instance files were tested. An example is transparency standard under FpML, for which the transparency.ttl ontology was generated from 23 XSD files.


Some challenges in converting XML Schemas to OWL


Some of the challenges in converting XSD to OWL that were addressed are:

  1. Transforming of anonymous types
  2. Converting complex types with simple contents

  3. Resolving conflicting nested element and attribute names during OWL property generation

  4. When and how to distinquish global elements from complex types with similar names during OWL class generation

  5. Generating enumerations

  6. Handling substitution groups both at the XSD and XML levels

  7. Handling the overriding of an XSD type with xsi:type in XML

The example that follows shows the approaches that we have used for the transformation.

Illustrative example of transformation rules


The basic transform for a Complex Type in XSD follows these rules:


  1. An OWL class is generated for a complex type.

  2. The URI of the class is generated in three different ways. If the complex type is global and named, then the name attribute is used. If the complex type is local and named, then the name attribute of the owner element is used. If the complex type is anonymous, then the names of its owner element and its parent element are used.

  3. The xsd:annotation and attribute annotations describing the complex type get generated as dc:description, rdfs:comment and/or skos:definition OWL annotations.

  4. Nested or reference children elements of the complex type become OWL allValuesFrom restrictions on the class. If the element has a simple type, then a restriction with an OWL datatype property and an XSD datatype range or a user-defined RDFS datatype range is generated. If the element has a complex type, then a restriction with an OWL object property and an OWL class range is generated. The minOccurs and maxOccurs values become OWL cardinality restrictions.

  5. Element group and attribute group references are generated as super classes.

  6. Attributes become restrictions with an OWL datatype property and an XSD datatype range or a user-defined RDFS datatype range.

An example of a Complex Type, Trade, in fpml-doc-5-2.xsd of transparency standard is displayed below:


<xsd:complexType name="Trade">
<xsd:annotation>
<xsd:documentation xml:lang="en">
A type defining an FpML trade.</xsd:documentation>
</xsd:annotation>
<xsd:sequence>
<xsd:element name="tradeHeader" type="TradeHeader">
<xsd:annotation>
<xsd:documentation xml:lang="en">
The information on the trade which is not
product specific, e.g. trade date.
</xsd:documentation>
</xsd:annotation>
</xsd:element>
<xsd:group ref="TradeEconomics.model">
<xsd:annotation>
<xsd:documentation xml:lang="en">
The economics of the trade. In the case of an
OTC trade, this is the OTC derivative product.
In the case of a trade of a security,
it is the instrument trade economoics.
</xsd:documentation>
</xsd:annotation>
</xsd:group>
</xsd:sequence>
<xsd:attribute name="id" type="xsd:ID" />
</xsd:complexType>

The following is the graph of the OWL class generated for Trade complex type, which shows the OWL class, restrictions, annotations and superclass.


Trade Graph

The following class diagram shows a more sophisticated view of Trade and its related classes downstream in the generated ontology (click on the diagram to open a window with a bigger image).


Trade Class Diagram

The diagram highlights these advanced features in generation:



  1. A superclass relation exists between Trade, generated from an XSD complex type and TradeEconomics.model, generated from an XSD element group.

  2. In the XSD, Swap element has the substitutionGroup Product element. Thus, A_Global-Swap becomes a subclass of A_Global-Product. A_Global- prefix is used to distinguish the element-derived classes from similarly named complex-type-derived classes.

  3. dtype:value restrictions are generated to hold the simple contents occuring in complex types. The complex content part of the type become other restrictions.

  4. The generated object properties have a Ref suffix to distinguish them from datatype properties with same names. Both types of properties can be used in restrictions on different classes as they may be generated from nested or reference children elements under different complex types.

The instance file, "msg_ex001_new_trade.xml" was imported into the transparency ontology. Here is a peek into that XML file:


...
<trade>
<tradeHeader>
<partyTradeIdentifier>
<tradeId tradeIdScheme=
"http://fpml.org/universal_swap_id">123</tradeId>
<tradeId tradeIdScheme=]
"http://fpml.org/submitter_trade_id">456</tradeId>
</partyTradeIdentifier>
<tradeInformation>
...
<cleared>true</cleared>
<nonStandardTerms>false</nonStandardTerms>
<offMarketPrice>false</offMarketPrice>
<largeSizeTrade>false</largeSizeTrade>
...
</tradeInformation>
<tradeDate>2011-02-04</tradeDate>
</tradeHeader>
<swap>
<productType>InterestRateSwap</productType>
<assetClass>InterestRates</assetClass>
<swapStream>
...
</swapStream>
<swapStream>
...
</swapStream>
</swap>
</trade>
...

The above XML constructs were mapped into the following RDF graph, where you can see how the instances, their relationships and their types are generated with respect to the Trade class diagram (click on the graph to open up a window for a more detailed view).


Trade Instance Example

Another example of transformation rules

The basic transform for an Enumeration in XSD follows these rules:

  1. An OWL class is generated from an XSD simple type having XSD enumeration facets. The localname of the class has Enum suffix to distinguish it from classes generated with similar names.

  2. This class becomes a subclass of EnumeratedValue in the same namespace as the OWL class, which itself becomes a subclass of dtype:EnumeratedValue.

  3. Each XSD enumeration facet becomes an instance of the generated class. dtype:value holds the enumeration value. dtype:order is the order in which the enumeration facet occurs.

  4. An Enumeration class in the same namespace as the OWL class is also generated. This class becomes subclass of dtype:Enumeration. An instance of this class is generated as a container to refer to all the instances generated from the current simple type.

Enumerated value instance URIs are generated using a concatenation of the abbreviation of the class localname's upper case letters and the dtype:value literal.

The following figure shows a graph for PremiumQuoteBasisEnum class and the OWL constructs generated from the related XSD enumeration facets (click on the diagram to open a window with a bigger image):

Trade Instance Example

Complete table of supported transformations


For the reader interested in more details a full overview of the mapping transformations is given in the following table:


Table: Conversion from XSD Constructs to OWL Constructs
#XSD/XML ConstructsOWL Constructs
1xsd:simpleTypeowl:Datatype
2xsd:simpleType with xsd:enumerationBecomes an owl:Class as a subclass of EnumeratedValue. Instances are created for every enumerated value. An instance of Enumeration, referring to all the instances, is created as well as the owl:oneOf union over the instances.
3xsd:complexType over xsd:complexContentowl:Class
4xsd:complexType over xsd:simpleContentowl:Class
5xsd:element (global) with complex typeowl:Class and subclass of the class generated from the referenced complex type
6xsd:element (global) with simple typeowl:Datatype
7xsd:element (local to a type)owl:DatatypeProperty or owl:ObjectProperty depending on the element type. OWL Restrictions are built for the occurrence.
8xsd:groupowl:Class and subclass of A_AbstractElementGroup
9xsd:attributeGroupowl:Class and subclass of A_AbstractAttributeGroup
10xsd:minOccurs and xsd:maxOccursCardinality specified in minimum cardinality, maximum cardinality and universal (allValuesFrom) OWL restrictions.
11Anonymous Complex TypeAs for Complex Type except a URI is constructed from the parent element and the nested element reference. Also, the class is defined as a subclass of A_Anon.
12Anonymous Simple TypeAs for Simple Type except a URI is constructed from the parent element and the nested element reference.
13xsd:default on an attributeUses dtype:defaultValue to attach a value to the OWL restriction representing the associated property.
14Substitution GroupsSubclass statements are generated for the members. Instance files resolve their types by consulting the OWL model at import-time.
15Annotation attributes on elementsOWL Annotation properties are created and placed directly on the relevant class.
16Annotations using xsd:annotationBecome, based on user selection, dc:description, rdfs:comment and/or skos:definition OWL annotations.
17xsi:type on an XML elementOverrides the schema type with the specified type.

A SPARQL Metric Query

As a quick check on the generated OWL models, the following is a useful SPARQL query that counts the number of properties on each OWL class.




SELECT ?class (COUNT(DISTINCT ?p) AS ?properties)
WHERE {
?class a owl:Class .
FILTER( afn:namespace( ?class ) =
"http://www.fpml.org/FpML-5/transparency#") .
OPTIONAL {
?class rdfs:subClassOf ?r .
?r a owl:Restriction .
?r owl:onProperty ?p .
}
}
GROUP BY ?class
ORDER BY DESC( ?properties )

The classes for the transparency ontology have the distribution of properties shown in the following figure (click on the diagram to open a window with a bigger image). For example, TradeInformation has 12 properties:


FpML Example 2

Concluding remarks

The new capability is easy to use. As before, a convenient import wizard will guide the user. The dialog has a number of new options. XML conversion will happen automatically when users open XML files in TBC or use XML import modules in SPARQLMotion. As long as an XML file is valid against an XSD that it is based on, the XML will be transformed in accordance to the schema. Parts of the XML files that do not validate against a schema will continue to be converted using the default Semantic XML structure. There is also a new option to specify which OWL file to use as a schema when mapping a specific XML file to triples. This feature is also available for spreadsheets and will be covered in a separate blog.

We believe that the importance of this work is not only in its value to harvest XML Schemas. Ability to use the automatic creation of triples from XML instance files directly in applications is proving to be key to a number of customers. For example, TopQuadrant is currently using this approach in a project for the North Sea Oil and Gas industry.

The functionality we have described will be released in TopBraid Composer 3.6.0. This release entered internal beta this week. Please contact us if you want a try these capabilities before general availability, which is currently planned for November.


Thursday, February 3, 2011

Converting UML Models to OWL - Part 1: The Approach


Convert UML to OWL - why would you ever want to do this? One reason suffices: many enterprise models, that serve as either standards or enterprise schemas, are specified in UML. Increasingly, there is interest in having content of UML models re-purposed in RDF/OWL and the need for RDF/OWL to interoperate with systems built from UML Models.

UML Models are notoriously hard to exchange between UML tools, let alone be transformed into OWL. The exchange format XMI is not only is difficult to understand but also has vendor-specific extensions. The vagaries of MOF, CMOF and EMOF create their own challenges. Nonetheless we have done transformations of UML to OWL. Using a model-based transformation approach, based on SPARQL Rules, XMI models of UML models can be converted to OWL. UML class diagrams can be represented in OWL without information loss. The inverse, however, is not true and will require another blog series.

UML to OWL - Part 1 Contents

Part 1 of the series explains the basis of the approach. The complete series of blogs, as currently conceived, is as follows:
  • Converting UML Models to OWL - Part 1: The Approach
  • Converting UML Models to OWL - Part 2: Transforming UML Models to OWL Using SPARQL
  • Converting UML Models to OWL - Part 3: Examples of Industry UML Model Transformations
The content of this blog is organized as follows:
  1. Goals, Objectives and Requirements
  2. Backgrounder on XMI
  3. Backgrounder on MOF
  4. Solution Outline
  5. Overview of Semantic XML
  6. OCMOF - the OWL Representation of CMOF
  7. How the Transformations from UML to OWL Work
  8. Generation of UML Metaclasses
  9. Generation of UML Classes
  10. Generation of UML Class Superclass Relationships
  11. Generation of UML Packages
  12. Generation of UML Package Relationships
  13. Performance
  14. Concluding Remarks
Readers who are very interested in the detailed technical approach, should read all sections of this blog in order. Those who just need to have an overview of the approach could skip sections 9 through 12. Those who have deep knowledge of XMI and MOF may want to skip sections 2 and 3, but I would welcome their feedback on the accuracy of my statements.

Note that some diagrams may be too small to be viewed in the body of the document. Clicking on such a diagram will open a new window with a larger depiction of the diagram.

Goals, Objectives and Requirements

The OWL Models must faithfully represent packages and the logical models or class diagrams. Out of scope, currently, are all of the other UML models such as Interaction Diagrams and State Diagrams. The approach must be able to convert UML by processing XMI files from specific tools. This requires a strategy for converting from the XML structures of XMI to OWL models.

Backgrounder on XMI

XMI, the XML Metadata Interchange standard is a serialization format for UML Models. The main purpose of XMI is to define how the XML elements are organized within an XMI file. The XMI spec also defines a mechanism for how one XMI element references another, within and across XMI files. Such a mechanism is needed as it is a legal scenario for a single UML model to be serialized to more than one XMI file.

Backgrounder on MOF

MOF began at the time of CORBA and the need for IDL interfaces. MOF 1.4 resulted in its mapping to Java being codified in the Java Community Process (JCP) as the Java Metadata Initiative (JMI). MOF 2.0 was developed in tandem with UML 2.0. The separation of MOF into EMOF and CMOF was motivated by the influence of EMF's Ecore, and model-driven Java development. CMOF was more the motivation of meta model developers. CMOF stands for Complete Meta Object Facility and is an OMG standard for the UML 2 model interchange. More information can be found at this page on the OMG Website.

CMOF includes fully fledged associations, association generalization, property subsetting and redefinition, derived unions, and package merge. Typical XMI container structures look like the example below, from the CMOF UML Infrastructure Model. The basic idea is that a packagedElement owns other elements. A type attribute specifies the type of the packagedElement.
Things get a little busy with how IDs are used for associations and their member ends. That complication, we can leave for Part 2.


<?xml version="1.0" encoding="UTF-8"?>
<xmi:XMI xmi:version="2.1" xmlns:xmi="http://schema.omg.org/spec/XMI/2.1"
xmlns:cmof="http://schema.omg.org/spec/MOF/2.0/cmof.xml">
<cmof:Package xmi:id="_0" name="InfrastructureLibrary">
<ownedMember xmi:type="cmof:Package" xmi:id="Core" name="Core">
<ownedMember xmi:type="cmof:Package"
xmi:id="Core-Abstractions" name="Abstractions">
<packageImport xmi:type="cmof:packageImport"
xmi:id="Core-Abstractions-_packageImport.0"
importedPackage="Core-PrimitiveTypes"
importingNamespace="Core-Abstractions"/>
<ownedMember xmi:type="cmof:Package"
xmi:id="Core-Abstractions-Ownerships" name="Ownerships">
<packageImport xmi:type="cmof:packageImport"
xmi:id="Core-Abstractions-Ownerships-_packageImport.0"
importedPackage="Core-Abstractions-Elements"
importingNamespace="Core-Abstractions-Ownerships"/>
<ownedMember xmi:type="cmof:Class"
xmi:id="Core-Abstractions-Ownerships-Element" name="Element" isAbstract="true">
<ownedComment xmi:type="cmof:Comment"
xmi:id="Core-Abstractions-Ownerships-Element-_ownedComment.0"
annotatedElement="Core-Abstractions-Ownerships-Element">
<body>An element is a constituent of a model.
As such, it has the capability of owning other elements.</body>
</ownedComment>
<ownedRule xmi:type="cmof:Constraint"
xmi:id="Core-Abstractions-Ownerships-Element-not_own_self"
name="not_own_self" constrainedElement="Core-Abstractions-Ownerships-Element"
namespace="Core-Abstractions-Ownerships-Element">
<ownedComment xmi:type="cmof:Comment"
xmi:id="Core-Abstractions-Ownerships-Element-not_own_self-_ownedComment.0"
annotatedElement="Core-Abstractions-Ownerships-Element-not_own_self">
<body>An element may not directly or indirectly own itself.</body>
</ownedComment>
<specification xmi:type="cmof:OpaqueExpression"
xmi:id="Core-Abstractions-Ownerships-Element-not_own_self-_specification">
<language>OCL</language>
<body>not self.allownedElements()->includes(self)</body>
</specification>
</ownedRule>
...
<ownedAttribute xmi:type="cmof:Property"
xmi:id="Core-Abstractions-Ownerships-Element-ownedElement"
name="ownedElement" type="Core-Abstractions-Ownerships-Element"
upper="*" lower="0" isReadOnly="true" isDerived="true"
isDerivedUnion="true" isComposite="true"
association="Core-Abstractions-Ownerships-A_ownedElement_owner">
<ownedComment xmi:type="cmof:Comment"
xmi:id="Core-Abstractions-Ownerships-Element-ownedElement-_ownedComment.0"
annotatedElement="Core-Abstractions-Ownerships-Element-ownedElement">
<body>The Elements owned by this element.</body>
</ownedComment>
</ownedAttribute>
...

Figure 1: A sample of XMI


For more background on the history of MOF the following references may be of value: MOFLON, and Wikipedia.

Top

Solution Outline

Model-based transformation is the central idea of the approach. To implement it we have developed a metamodel of CMOF in OWL. Our strategy is to get out of XML into RDF Triples as soon as possible. Using an ontology of XML we convert XMI into a composite model of triples. XML is a simple enough structure for the composite object pattern - elements contain elements and elements have attributes. XML elements and attributes that make up the XMI file are transformed into OWL instances of the CMOF metamodel. Once we have the XMI in triples we can map constructs to classes and properties of a CMOF metamodel. This model then serves as the generator for model-based transformations to an OWL model of the UML.

Once these instances are loaded as "raw" RDF, rules fire to perform the transformations. Rules are associated with classes to ensure that instances of those classes are processed in an execution sequence. Using SPARQL Rules (SPIN), instances of a class are each processed through a binding mechanism specified by ?this variable. SPARQL Rules can be considered an approach that is similar to, or can be compared with, UML's Object Constraint Language (OCL) and the Query/View/Transformation (QVT) approach to transformations.

The benefits of the OWL and SPARQLRules model-based approach to transformation are:
  • Intimacy of the rules with RDF/OWL - triples are evaluated directly
  • Understandability - rules are smaller and expressed in the relevant contexts of the model
  • Enhanced Performance - evaluation of rules is localized to relevant instances
  • Customizability and Evolvability – transformations can be changed by modifying models and/or SPARQL rules
  • Ease of maintenance - rules are associated with the constructs they operate over
Top

Overview of Semantic XML

XMI is imported into the CMOF metamodel using TopBraid Composer's Semantic XML as a mapping method. With Semantic XML, TopBraid can automatically generate an OWL/RDF ontology from any XML file. Each distinct XML element name is mapped into a class, and the elements themselves become instances of those classes. A datatype property is generated for each attribute. Nesting of XML elements is represented in OWL using a composite:child property - an object pattern in OWL that is described at this blog entry.

The key idea of Semantic XML is that each of the generated OWL classes and datatype properties is annotated with an annotation property, sxml:element and sxml:attribute, respectively. These properties relate the OWL concepts to the XML serialization. Note that these annotations are also used if an OWL model needs to be serialized back to XML format.

If you import an XML file into an ontology that already contains classes and properties with Semantic XML annotations, then the loader will reuse those. The mapping is bi-directional and loss-less so that files can be loaded, manipulated and saved without losing structural information.

A video explaining how Semantic XML works is available at this link.

OCMOF - the OWL Representation of CMOF

The strategy for the transformation can be summarized as follows:
  • Use OWL classes to represent XMI Element Types
  • Use SPARQL Rules on those classes to generate CMOF Metaclasses
  • Use Metaclasses to make OWL Classes that represent the UML Model
An OWL metamodel of CMOF represents the kinds of containers, elements and attributes shown above. The metamodel was built by studying the UML Metamodel of UML 2.0 - the original motivation for this was to have an automated way of dealing with changes to UML. That will be a future consideration, for now this has proven to be a valuable way of doing verification and validation. The UML metamodel will be covered in Part 2 of this blog series, for Part 1, it is instructive perhaps to show a small piece of the XMI. Below is the XMI for Basic-Property from the UML model infrastructure.cmof.xmi.
<ownedAttribute xmi:type="cmof:Property"
xmi:id="Core-Basic-Class-ownedAttribute" name="ownedAttribute"
type="Core-Basic-Property" isOrdered="true"
upper="*" lower="0" isComposite="true"
association="Core-Basic-A_ownedAttribute_class">
<ownedComment xmi:type="cmof:Comment"
xmi:id="Core-Basic-Class-ownedAttribute-_ownedComment.0"
annotatedElement="Core-Basic-Class-ownedAttribute">
<body>The attributes owned by a class.
These do not include the inherited attributes.
Attributes are represented by instances of Property.</body>
</ownedComment>
</ownedAttribute>

Figure 2: A fragment of the XMI for the UML metamodel


As a example of XMI element mappings, the sxml:element maps the XMI element for ocmof:ownedAttribute as shown in the Turtle extract from the OWL model below.


ocmof:ownedAttribute
a owl:Class ;
rdfs:label "Attribute"^^xsd:string ;
rdfs:subClassOf ocmof:TypedThing , ocmof:NamedThing ;
rdfs:subClassOf
[ a owl:Restriction ;
owl:maxCardinality "1"^^xsd:nonNegativeInteger ;
owl:onProperty ocmof:isComposite
] ;
rdfs:subClassOf
[ a owl:Restriction ;
owl:maxCardinality "1"^^xsd:nonNegativeInteger ;
owl:onProperty ocmof:type
] ;
rdfs:subClassOf
[ a owl:Restriction ;
owl:maxCardinality "1"^^xsd:nonNegativeInteger ;
owl:onProperty ocmof:isDerivedUnion
] ;
rdfs:subClassOf
[ a owl:Restriction ;
owl:maxCardinality "1"^^xsd:nonNegativeInteger ;
owl:onProperty ocmof:isReadOnly
] ;
rdfs:subClassOf
[ a owl:Restriction ;
owl:maxCardinality "1"^^xsd:nonNegativeInteger ;
owl:onProperty ocmof:default
] ;
sxml:element "ownedAttribute"^^xsd:string .

Figure 3: ocmof:ownedAttribute in Turtle

The last line, sxml:element "ownedAttribute"^^xsd:string, is the mapping.

As a example of XMI attribute mappings, the sxml:attribute maps the XMI attribute for ocmof:isOrdered as shown in the Turtle extract from the OWL model below.


ocmof:isOrdered
a owl:DatatypeProperty ;
rdfs:domain ocmof:ownedAttribute ,
ocmof:ownedParameter ,
ocmof:OwnedEnd ;
rdfs:label "is ordered"^^xsd:string ;
rdfs:range xsd:boolean ;
sxml:attribute "isOrdered"^^xsd:string .

Figure 4: ocmof:isOrdered in Turtle

The last line, sxml:attribute "isOrdered"^^xsd:string, is the mapping.

The transformation to OWL results in the following class for uml:Core-Basic-Property.
Clicking on the image will open a larger image in a new window.

Core-Basic-Property Classes

Figure 5: A Generated Metaclass Example - uml:Core-Basic-Property

The diagram shows how the datatype properties of the class uml:Core-Basic-Property correspond to the XMI attributes given in the above fragment. For example isComposite becomes the property hasBooleanIsComposite. The prefix hasBoolean is customizable.

First an OWL model of CMOF XML Elements is used to generate instances of metaclasses to build OWL Classes for XMI Elements. The namespace prefix of ocmof has been used to denote all modeling constructs that makeup the CMOF metamodel. The prefix cmof is the namespace for all constructs generated from the import of the XMI files.

In the diagram below, we show the main classes of the metamodel. Classes like NamedThing and TypedThing have been introduced to optimize the work of the transformers. Constructs in XMI can typically be both named and typed. This kind of multiple inheritance is no problem for the transformations. The diagram is a partial view only. Clicking on the image will open a larger image in a new window.


OCMOF Classes

Figure 6: Some of the classes of the CMOF OWL model

As an alternate view, the diagram that follows is an HTML report of NamedThing in TopBraid Composer. This is automatically generated using SPARQL Web Pages (aka UISPIN)

NamedThing Summary generated by UISPIN

Figure 7: OCMOF NamedThing - an abstract class for the transformations

The diagram below shows more details of some ownedElements. Note how attributes of each of these classes relate to CMOF constructs.

Owned Elements

Figure 8: Some "ownedElement" OWL Classes in the OCMOF model

These ocmof classes serve as the starting point for generating ocmof meta-classes and instances of these classes that become the UML model transformed into OWL. The figure below shows the main metaclasses that are generated by rules on the ocmof classes.

CMOF Metaclasses

Figure 9: The key Meta-classes of the CMOF OWL model



How the Transformations from UML to OWL Work

Model-based transformations use rules associated with OWL Classes. OWL Metaclasses are built using a SPARQL rule for instances of TypedThing. The names of the metaclasses are determined from the value of the xmi:type attribute. A number of SPARQL Rules are defined on TypeThing. Priorities are set by the alphabetic ordering given by the first comment line of the rule. These rules look after the generation of:
  1. UML Metaclasses
  2. UML Classes
  3. UML Class Superclass Relationships
  4. UML Packages
  5. UML Package Relationships

Each rule will now be described.



Generation of UML Metaclasses

The first task is to create a metaclass and class for every type of element in the ingested XMI file. This is done using the SPARQL Rule below:

# STEP CMOF-SR-001 make UML Metaclass from CMOF type
CONSTRUCT {
?metaClassURI a rdfs:Class .
?metaClassURI rdfs:subClassOf cmof:MetaClass .
?metaClassURI rdfs:label ?metaClassLabel .
?typeURI a owl:Class .
?typeURI a ?metaClassURI .
?typeURI rdfs:subClassOf uml:Construct .
?typeURI rdfs:label ?classLabel .
}
WHERE {
?this xmi:type ?type .
FILTER (?type != "cmof:Property") .
BIND (o2o:localNameOfQName(?type) AS ?name) .
BIND (fn:concat("CMOF ", ?name) AS ?metaClassLabel) .
BIND (fn:concat("UML ", ?name) AS ?classLabel) .
BIND (xmi.common:makeUML-URI(?name) AS ?typeURI) .
BIND (xmi.common:makeCMOF-URI(?name) AS ?metaClassURI) .
}

Figure 10: The SPARQL Rules that make the metaclasses in the OCMOF model

What is going on in these rules? First we explain the "where" clause.

?this xmi:type ?type binds ?this to an instance of TypedThing. For each instance the rule is evaluated.

FILTER (?type != "cmof:Property") blocks further evaluation of the rule if the instance is of type cmof:Property. The reason for this will be explained in Part 2.

BIND (o2o:localNameOfQName(?type) AS ?name) extracts the name of the type from the QName.

BIND (fn:concat("CMOF ", ?name) AS ?metaClassLabel ) builds a label for the metaclass. The function fn:concat is from the JENA SPARQL Library. We use it here to prepend "CMOF" to the name we get from the type of the XMI Element.


BIND (fn:concat("UML ", ?name) AS ?classLabel) makes a class label from the name. We will be constructing both a metaclass and a class from the XMI type. We build a metaclass in order to say what kind of things can happen on the classes. In other words, the generated OWL model is a 3-level ontology. Likewise here we build a label for the UML Class.

BIND (xmi.common:makeUML-URI(?name) AS ?typeURI) builds a URI for the UML Class corresponding to type. This uses a function call to xmi.common:makeUML-URI whose job it is to build the correct namespace path for a UML construct URI. The implementation is shown below.

SELECT ?uri
WHERE {
BIND (xmi.common:baseURI() AS ?baseURI) .
BIND (smf:buildURI("{?baseURI}#{?arg1}") AS ?uri) .
}

where,
smf:buildURI("{?baseURI}#{?arg1}")) builds a URI for the name given in ?arg1 with a base URI supplied by the function xmi.common:baseURI().

BIND (xmi.common:makeCMOF-URI(?name) AS ?metaClassURI ) builds a URI for the metaclass corresponding to type. Likewise this constructs a namspace path for CMOF constructs.


Next we explain what is happening in the head of the rule with the Construct statements. These statements use the generated URIs to create instances of meta-classes and classes.


?metaClassURI a rdfs:Class
gives the metaClass its type.


?metaClassURI rdfs:subClassOf cmof:MetaClass
specifies that the metaclass is a sub-class of cmof:MetaClass - an abstract metaclass for all cmof classes.

?metaClassURI rdfs:label ?metaClassLabel
gives the metaclass a human label.

?typeURI a owl:Class
gives the UML Class a type

?typeURI a ?metaClassURI
gives the UML Class a more specific type so that it can have more properties than owl:Class provides.

?typeURI rdfs:subClassOf uml:Construct
specifies that the UML Class is a subclass of the abstract OWL Class uml:Construct/

?typeURI rdfs:label ?classLabel
gives the UML Class a human label.



Generation of UML Classes

Once we have the necessary metaclasses we can begin the work of creating instances of those classes. These instances will, of course, be classes (the meta-world can get confusing). This work is done the the SPARQL Rule below.

# STEP CMOF-SR-002 make UML Classes from CMOF elements
CONSTRUCT {
?type a rdfs:Class .
?type rdfs:subClassOf cmof:MetaClass .
?class a ?type .
?class rdfs:label ?name .
?class ocmof:hasCMOFbasis ?this .
?superURI a owl:Class .
?superURI a cmof:CategoryClass .
?superURI rdfs:label ?super .
?subURI a owl:Class .
?subURI a cmof:CategoryClass .
?subURI rdfs:subClassOf ?superURI .
?subURI rdfs:label ?sub .
?class rdfs:subClassOf ?mySuperClass .
}
WHERE {
?this xmi:type "cmof:Class" .
?this xmi:id ?name .
BIND (o2o:pathPart(?name, "-") AS ?path) .
OPTIONAL {
?path o2o:pairHyphenIncrementally ( ?super ?sub ) .
BIND (xmi.common:makeUML-URI(smf:buildString("CLASSES_{?super}")) AS ?superURI) .
BIND (xmi.common:makeUML-URI(smf:buildString("CLASSES_{?sub}")) AS ?subURI) .
BIND (xmi.common:makeCMOF-Resource("cmof:Class") AS ?type) .
BIND (xmi.common:makeUML-URI(?name) AS ?class) .
} .
BIND (xmi.common:makeUML-URI(smf:buildString("CLASSES_{?path}")) AS ?mySuperClass) .
}

Figure 11: The SPARQL Rules that make UML Classes

More details of this transformation will be given in Part 2 of this blog series. An interesting aspect of this particular rule to mention now is how it builds deep inheritance structures by using Property Functions to recurse over hyphenated names (more on the use of Property Functions, also known as Magic Properties, with TopBraid Composer can be found at this blog entry). These hyphenated names occur throughout the XMI metamodel of UML. For example Core-Basic-Class looks like:
<ownedMember xmi:type="cmof:Class" xmi:id="Core-Basic-Class" name="Class" superClass="Core-Basic-Type">
<ownedComment xmi:type="cmof:Comment" xmi:id="Core-Basic-Class-_ownedComment.0"
annotatedElement="Core-Basic-Class">
<body>A class is a type that has objects as its instances.</body>
</ownedComment>
<ownedAttribute xmi:type="cmof:Property" xmi:id="Core-Basic-Class-isAbstract"
name="isAbstract" type="Core-PrimitiveTypes-Boolean" default="false">
<ownedComment xmi:type="cmof:Comment" xmi:id="Core-Basic-Class-isAbstract-_ownedComment.0" annotatedElement="Core-Basic-Class-isAbstract">
<body>True when a class is abstract.</body>
</Attribute>

Figure 12: Example of Hyphenated Names in the UML Metamodel


How this is done in the SPARQL Rule is explained briefly below.


In the SPARQL Rule shown above in figure 11, the statement in the tail: ?path o2o:pairHyphenIncrementally ( ?super ?sub ) is a Property Function that returns two results: ?super and ?sub for every hypenated pair.
for each pair the statement: ?subURI rdfs:subClassOf ?superURI in the head of the rule builds superclass relationships.



Generation of UML Class Superclass Relationships

Once we have all of the UML Classes, the next rule can build the rdfs:subClassOf relationships.
# STEP CMOF-SR-005 - fixup the superclass of the root Classes
CONSTRUCT {
?class rdfs:subClassOf uml:Class .
}
WHERE {
?class a cmof:CategoryClass .
BIND (afn:localname(?class) AS ?className) .
FILTER fn:starts-with(?className, "CLASSES_") .
NOT EXISTS {
?class rdfs:subClassOf ?superClass .
} .
}
The result of executing the preceding UML Class rules is the UML Class Hierarchy shown in the diagram below.

UML Metamodel Class Hierarchy

Figure 13: Generated UML Metamodel Class Hierarchy

Top

Generation of UML Packages

# STEP CMOF-SR-020 - make Packages
CONSTRUCT {
?package rdfs:label ?name .
?package ocmof:hasCMOFbasis ?this .
?mySuperClass a owl:Class .
?mySuperClass rdfs:label ?path .
?superURI a owl:Class .
?superURI a cmof:CategoryClass .
?superURI rdfs:label ?super .
?subURI a owl:Class .
?subURI a cmof:CategoryClass .
?subURI rdfs:subClassOf ?superURI .
?subURI rdfs:label ?sub .
?package a ?mySuperClass .
?package a uml:Package .
}
WHERE {
?this xmi:type "cmof:Package" .
?this xmi:id ?name .
BIND (xmi.common:makeUML-URI(?name) AS ?package) .
BIND (o2o:pathPart(?name, "-") AS ?path) .
OPTIONAL {
?path o2o:pairHyphenIncrementally ( ?super ?sub ) .
BIND (xmi.common:makeUML-URI(smf:buildString("PACKAGES_{?super}")) AS ?superURI) .
BIND (xmi.common:makeUML-URI(smf:buildString("PACKAGES_{?sub}")) AS ?subURI) .
} .
BIND (xmi.common:makeUML-URI(smf:buildString("PACKAGES_{?path}")) AS ?mySuperClass) .
}


Generation of UML Package Relationships


# STEP CMOF-SR-024 - fixup the superclass of the root Packages
CONSTRUCT {
?packageClass rdfs:subClassOf uml:Package .
}
WHERE {
?this xmi:type "cmof:Package" .
?package ocmof:hasCMOFbasis ?this .
?package a ?packageClass .
NOT EXISTS {
?packageClass rdfs:subClassOf ?superClass .
} .
}
The result of executing the preceding UML Package rules is the UML Package Hierarchy shown in the diagram below.

UML Metamodel Packages Hierarchy

Figure 14: Generated UML Metamodel Package Hierarchy




Performance

As a measurement of the performance with the TopBraid Composer release 3.4.0, the conversion of the UML Infrastructure XMI took 38.611 seconds and generated 19,575 statements (RDF triples) on a DELL Studio XPS Laptop with 4GB of memory, running Windows 7. This translates to an inference speed of 507 TPS (Triples per second).

Concluding Remarks

Part 1 of this blog has introduced the power of model-based transformation using SPARQL Rules as a means to transform XMI to OWL. Our experience in doing this work confirms the extensibility and flexibility of this approach. The subject is a complex one requiring a grounding in the intricacies of UML Metamodeling, and a knowledge of SPARQL and SPARQL Rules. We have attempted to do that briefly in this blog - not an easy matter.

Part 2 of this blog series will discuss transforming UML Models to OWL Using SPARQL.


This is a blog by TopQuadrant, developers of the TopBraid Suite, created to support the pursuit of our ongoing mission - to explode strange semantic myths, to seek out new models that support a new generation of dynamic business applications, to boldly integrate data that no one has integrated before.