Wednesday, September 28, 2011

Living in the XML and OWL World - Comprehensive Transformations of XML Schemas and XML data to RDF/OWL

Many enterprise information models are expressed using XML Schemas. Data between applications is commonly exchanged in XML, compliant with those schemas. Connecting XML data from different systems in a coherent aggregated way is a challenge that confronts many organizations. Capabilities of RDF/OWL to describe semantics of different data models and aggregate disparate data are a natural fit for addressing these challenges.

For a number of years now, TopBraid Composer has included the ability to convert XSDs and associated XML files to RDF/OWL. However, for some XML Schemas our converter did not work as well as customers needed. For the upcoming TopBraid Composer 3.6.0 release, we significantly improved it to cover more XSD constructs and to produce a more meaningful conversion to OWL. In parallel, we improved our XML data conversion to RDF so that transformations happen automatically, based on the generated OWL models. We have also improved the performance of the transformations.

An overview of the approach is illustrated in the following figure:

Approach

Since the conversion occurs automatically, users do not have to write any rules for commonly needed mappings. However, users who need further transformations can use SPARQL Rules and SPARQLMotion to customize their generated OWL ontology or to further transform the RDF triples representing the XML data.

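The content of this blog is organized as follows:

  1. XML Schemas converted as part of our tests
  2. Some challenges in converting XML Schemas to OWL
  3. Illustrative example of transformation rules
  4. Another example of transformation rules
  5. Complete table of supported transformations
  6. A SPARQL Metric Query
  7. Concluding remarks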

XML Schemas converted as part of our tests

We tested the importer on a broad range of complicated and large-scale industry standard XSD files, and converted many XML instances with impressive results. The XSDs we have tested with the new importer include:


  1. Banking
    • FpML, the Financial products Markup Language
    • ISO 20022, the Universal financial industry message scheme standard

  2. Energy and Utilities
    • MultiSpeak, a de facto standard defining the data that software applications need to exchange to support the business processes commonly applied at utilities

  3. Government
    • DoDAF, the Department of Defense Architecture Framework
    • NIEM, the U.S. National Information Exchange Model

  4. Oil and Gas
    • ISO 15926, a standard for integration of life-cycle data for process plants including oil and gas production facilities
    • WITSML, Wellsite Information Transfer Standard Markup Language

  5. Healthcare

  6. Electronics
    • IP-XACT, the XML Schema for meta-data documenting Intellectual Property (IP) used in the development, implementation and verification of electronic systems

  7. Other
    • ATML, the Auto-Test Markup Language

Some of the converted schemas will be published at LinkedModels.org. To get early access to converted models, or for any other questions, contact us at TopQuadrant.

The examples we use in this blog are mainly from the Financial products Markup Language (FpML). All FpML 5.2 XSD and XML instance files were tested. One example is the transparency standard under FpML, for which the transparency.ttl ontology was generated from 23 XSD files.


Some challenges in converting XML Schemas to OWL


Some of the challenges in converting XSD to OWL that were addressed are:

  1. Transforming of anonymous types
  2. Converting complex types with simple contents

  3. Resolving conflicting nested element and attribute names during OWL property generation

  4. When and how to distinguish global elements from complex types with similar names during OWL class generation

  5. Generating enumerations

  6. Handling substitution groups both at the XSD and XML levels

  7. Handling the overriding of an XSD type with xsi:type in XML

The example that follows shows the approaches that we have used for the transformation.

Illustrative example of transformation rules


The basic transform for a Complex Type in XSD follows these rules:


  1. An OWL class is generated for a complex type.

  2. The URI of the class is generated in three different ways. If the complex type is global and named, then the name attribute is used. If the complex type is local and named, then the name attribute of the owner element is used. If the complex type is anonymous, then the names of its owner element and its parent element are used.

  3. The xsd:annotation and attribute annotations describing the complex type get generated as dc:description, rdfs:comment and/or skos:definition OWL annotations.

  4. Nested or referenced child elements of the complex type become OWL allValuesFrom restrictions on the class. If the element has a simple type, then a restriction with an OWL datatype property and an XSD datatype range or a user-defined RDFS datatype range is generated. If the element has a complex type, then a restriction with an OWL object property and an OWL class range is generated. The minOccurs and maxOccurs values become OWL cardinality restrictions.

  5. Element group and attribute group references are generated as super classes.

  6. Attributes become restrictions with an OWL datatype property and an XSD datatype range or a user-defined RDFS datatype range.
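To make rules 1, 4, 5 and 6 more concrete, here is a minimal Python sketch that walks a complex type and records the OWL constructs the rules imply. It is not the TopBraid implementation: the function names, the dictionary layout, and the shortcut of treating any `xsd:`-prefixed type as simple and everything else as complex are assumptions made for illustration. The sample schema is a trimmed copy of the FpML Trade type with annotations removed, and the Ref suffix on object properties follows the naming convention described later in this post.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"
MODEL_GROUPS = {XSD + "sequence", XSD + "choice", XSD + "all"}

# Trimmed copy of the Trade complex type (annotations removed).
SAMPLE = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="Trade">
    <xsd:sequence>
      <xsd:element name="tradeHeader" type="TradeHeader"/>
      <xsd:group ref="TradeEconomics.model"/>
    </xsd:sequence>
    <xsd:attribute name="id" type="xsd:ID"/>
  </xsd:complexType>
</xsd:schema>
"""


def occurs(value, default=1):
    """Convert a minOccurs/maxOccurs string; None stands for 'unbounded'."""
    if value is None:
        return default
    return None if value == "unbounded" else int(value)


def collect(node, summary):
    """Walk direct children and model groups, recording what each rule implies."""
    for child in node:
        if child.tag in MODEL_GROUPS:
            collect(child, summary)
        elif child.tag in (XSD + "group", XSD + "attributeGroup"):
            # Rule 5: group references become superclasses.
            summary["superclasses"].append(child.get("ref"))
        elif child.tag == XSD + "element":
            type_name = child.get("type")
            # Assumption: an "xsd:" prefix marks a built-in simple type and
            # anything else is complex (user-defined simple types are ignored).
            simple = type_name is not None and type_name.startswith("xsd:")
            name = child.get("name") or child.get("ref")
            # Rule 4: one restriction per child element, with cardinalities.
            summary["restrictions"].append({
                "property": name if simple else name + "Ref",
                "kind": "owl:DatatypeProperty" if simple else "owl:ObjectProperty",
                "allValuesFrom": type_name,
                "minCardinality": occurs(child.get("minOccurs")),
                "maxCardinality": occurs(child.get("maxOccurs")),
            })
        elif child.tag == XSD + "attribute":
            # Rule 6: attributes become datatype-property restrictions.
            summary["restrictions"].append({
                "property": child.get("name"),
                "kind": "owl:DatatypeProperty",
                "allValuesFrom": child.get("type"),
            })


def describe_complex_type(ctype):
    """Rule 1: one class per complex type, named here after a global type."""
    summary = {"class": ctype.get("name"), "superclasses": [], "restrictions": []}
    collect(ctype, summary)
    return summary


if __name__ == "__main__":
    trade = ET.fromstring(SAMPLE).find(XSD + "complexType")
    print(describe_complex_type(trade))
```

For the trimmed Trade type, the summary contains the class Trade, the superclass TradeEconomics.model, an object-property restriction for tradeHeader and a datatype-property restriction for id. The sketch does not descend into xsd:complexContent or xsd:simpleContent wrappers, and it does not cover the naming of local or anonymous types from rule 2.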

An example of a Complex Type, Trade, in fpml-doc-5-2.xsd of the transparency standard is displayed below:


<xsd:complexType name="Trade">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      A type defining an FpML trade.
    </xsd:documentation>
  </xsd:annotation>
  <xsd:sequence>
    <xsd:element name="tradeHeader" type="TradeHeader">
      <xsd:annotation>
        <xsd:documentation xml:lang="en">
          The information on the trade which is not
          product specific, e.g. trade date.
        </xsd:documentation>
      </xsd:annotation>
    </xsd:element>
    <xsd:group ref="TradeEconomics.model">
      <xsd:annotation>
        <xsd:documentation xml:lang="en">
          The economics of the trade. In the case of an
          OTC trade, this is the OTC derivative product.
          In the case of a trade of a security,
          it is the instrument trade economics.
        </xsd:documentation>
      </xsd:annotation>
    </xsd:group>
  </xsd:sequence>
  <xsd:attribute name="id" type="xsd:ID" />
</xsd:complexType>

The following is the graph of the OWL class generated for the Trade complex type, which shows the OWL class, restrictions, annotations and superclass.


Trade Graph

The following class diagram shows a more detailed view of Trade and its related classes downstream in the generated ontology.


Trade Class Diagram

The diagram highlights these advanced features in generation:



  1. A superclass relation exists between Trade, generated from an XSD complex type, and TradeEconomics.model, generated from an XSD element group.

  2. In the XSD, the Swap element names the Product element as its substitutionGroup. Thus, A_Global-Swap becomes a subclass of A_Global-Product. The A_Global- prefix is used to distinguish the element-derived classes from similarly named complex-type-derived classes.

  3. dtype:value restrictions are generated to hold the simple contents occurring in complex types. The complex content part of the type becomes other restrictions.

  4. The generated object properties have a Ref suffix to distinguish them from datatype properties with the same names. Both types of properties can be used in restrictions on different classes, as they may be generated from nested or referenced child elements under different complex types.
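The substitution-group handling in item 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-element schema is a simplified fragment rather than the real FpML declarations, the helper names are hypothetical, and capitalising the first letter of the element name is inferred from the class names A_Global-Swap and A_Global-Product rather than confirmed by the converter's documentation.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"
GLOBAL_PREFIX = "A_Global-"

# Simplified fragment: a head element and one substitution-group member.
SAMPLE = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="product" type="Product" abstract="true"/>
  <xsd:element name="swap" type="Swap" substitutionGroup="product"/>
</xsd:schema>
"""


def global_class_name(element_name):
    """Class name for a global element (capitalisation is an assumption)."""
    return GLOBAL_PREFIX + element_name[0].upper() + element_name[1:]


def substitution_subclasses(schema_root):
    """Return (subclass, superclass) pairs implied by substitutionGroup."""
    pairs = []
    for element in schema_root.findall(XSD + "element"):
        head = element.get("substitutionGroup")
        if head:
            # Drop any namespace prefix from the QName; a full converter
            # would resolve the prefix to its namespace instead.
            head_local = head.split(":")[-1]
            pairs.append((global_class_name(element.get("name")),
                          global_class_name(head_local)))
    return pairs


if __name__ == "__main__":
    for sub, sup in substitution_subclasses(ET.fromstring(SAMPLE)):
        print(sub, "rdfs:subClassOf", sup)
```

On the fragment above, the only pair produced is A_Global-Swap as a subclass of A_Global-Product, matching the relationship shown in the class diagram.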

The instance file "msg_ex001_new_trade.xml" was imported into the transparency ontology. Here is a peek into that XML file:


...
<trade>
  <tradeHeader>
    <partyTradeIdentifier>
      <tradeId tradeIdScheme="http://fpml.org/universal_swap_id">123</tradeId>
      <tradeId tradeIdScheme="http://fpml.org/submitter_trade_id">456</tradeId>
    </partyTradeIdentifier>
    <tradeInformation>
      ...
      <cleared>true</cleared>
      <nonStandardTerms>false</nonStandardTerms>
      <offMarketPrice>false</offMarketPrice>
      <largeSizeTrade>false</largeSizeTrade>
      ...
    </tradeInformation>
    <tradeDate>2011-02-04</tradeDate>
  </tradeHeader>
  <swap>
    <productType>InterestRateSwap</productType>
    <assetClass>InterestRates</assetClass>
    <swapStream>
      ...
    </swapStream>
    <swapStream>
      ...
    </swapStream>
  </swap>
</trade>
...

The above XML constructs were mapped into the following RDF graph, which shows how the instances, their relationships and their types are generated with respect to the Trade class diagram.


Trade Instance Example

Another example of transformation rules

The basic transform for an Enumeration in XSD follows these rules:

  1. An OWL class is generated from an XSD simple type having XSD enumeration facets. The localname of the class has an Enum suffix to distinguish it from classes generated with similar names.

  2. This class becomes a subclass of EnumeratedValue in the same namespace as the OWL class, which itself becomes a subclass of dtype:EnumeratedValue.

  3. Each XSD enumeration facet becomes an instance of the generated class. dtype:value holds the enumeration value. dtype:order is the order in which the enumeration facet occurs.

  4. An Enumeration class in the same namespace as the OWL class is also generated. This class becomes a subclass of dtype:Enumeration. An instance of this class is generated as a container to refer to all the instances generated from the current simple type.

Enumerated value instance URIs are generated using a concatenation of the abbreviation of the class localname's upper case letters and the dtype:value literal.
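The enumeration rules and the URI naming scheme can be illustrated with a short Python sketch. Several details are assumptions, not documented converter behaviour: the underscore separator between the abbreviation and the value, the order starting at 1, the choice to append Enum only when the type name does not already end with it, and the two facet values, which are included purely for illustration.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Illustrative simple type with two enumeration facets.
SAMPLE = """
<xsd:simpleType xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                name="PremiumQuoteBasisEnum">
  <xsd:restriction base="xsd:token">
    <xsd:enumeration value="PercentageOfCallCurrencyAmount"/>
    <xsd:enumeration value="Explicit"/>
  </xsd:restriction>
</xsd:simpleType>
"""


def abbreviation(local_name):
    """Abbreviate a class localname to its upper-case letters."""
    return "".join(ch for ch in local_name if ch.isupper())


def enum_class_name(type_name):
    """Class localname with an Enum suffix (not doubled; an assumption)."""
    return type_name if type_name.endswith("Enum") else type_name + "Enum"


def enum_instances(simple_type):
    """One instance per facet, carrying dtype:value and dtype:order."""
    class_name = enum_class_name(simple_type.get("name"))
    prefix = abbreviation(class_name)
    instances = []
    facets = simple_type.iter(XSD + "enumeration")
    for order, facet in enumerate(facets, start=1):  # start=1 is an assumption
        value = facet.get("value")
        instances.append({
            "uri": prefix + "_" + value,  # "_" separator is an assumption
            "rdf:type": class_name,
            "dtype:value": value,
            "dtype:order": order,
        })
    return instances


if __name__ == "__main__":
    for instance in enum_instances(ET.fromstring(SAMPLE)):
        print(instance)
```

With these assumptions, PremiumQuoteBasisEnum abbreviates to PQBE, so the two facets yield instances named PQBE_PercentageOfCallCurrencyAmount and PQBE_Explicit. The container instance of the Enumeration class from rule 4 is left out to keep the sketch short.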

The following figure shows a graph for the PremiumQuoteBasisEnum class and the OWL constructs generated from the related XSD enumeration facets:

Trade Instance Example

Complete table of supported transformations


For the reader interested in more details, a full overview of the mapping transformations is given in the following table:


Table: Conversion from XSD Constructs to OWL Constructs

| # | XSD/XML Constructs | OWL Constructs |
|---|---|---|
| 1 | xsd:simpleType | owl:Datatype |
| 2 | xsd:simpleType with xsd:enumeration | Becomes an owl:Class as a subclass of EnumeratedValue. Instances are created for every enumerated value. An instance of Enumeration, referring to all the instances, is created as well as the owl:oneOf union over the instances. |
| 3 | xsd:complexType over xsd:complexContent | owl:Class |
| 4 | xsd:complexType over xsd:simpleContent | owl:Class |
| 5 | xsd:element (global) with complex type | owl:Class and subclass of the class generated from the referenced complex type |
| 6 | xsd:element (global) with simple type | owl:Datatype |
| 7 | xsd:element (local to a type) | owl:DatatypeProperty or owl:ObjectProperty depending on the element type. OWL Restrictions are built for the occurrence. |
| 8 | xsd:group | owl:Class and subclass of A_AbstractElementGroup |
| 9 | xsd:attributeGroup | owl:Class and subclass of A_AbstractAttributeGroup |
| 10 | xsd:minOccurs and xsd:maxOccurs | Cardinality specified in minimum cardinality, maximum cardinality and universal (allValuesFrom) OWL restrictions. |
| 11 | Anonymous Complex Type | As for Complex Type except a URI is constructed from the parent element and the nested element reference. Also, the class is defined as a subclass of A_Anon. |
| 12 | Anonymous Simple Type | As for Simple Type except a URI is constructed from the parent element and the nested element reference. |
| 13 | xsd:default on an attribute | Uses dtype:defaultValue to attach a value to the OWL restriction representing the associated property. |
| 14 | Substitution Groups | Subclass statements are generated for the members. Instance files resolve their types by consulting the OWL model at import-time. |
| 15 | Annotation attributes on elements | OWL Annotation properties are created and placed directly on the relevant class. |
| 16 | Annotations using xsd:annotation | Become, based on user selection, dc:description, rdfs:comment and/or skos:definition OWL annotations. |
| 17 | xsi:type on an XML element | Overrides the schema type with the specified type. |
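Row 17 of the table describes a simple precedence rule: an xsi:type attribute on an XML element wins over the type declared in the schema. A minimal Python sketch of that rule follows. The function name, the instance fragment with a `product` element carrying `xsi:type="Swap"`, and the declared type names passed in are all hypothetical; the sketch also strips the QName prefix instead of resolving it to a namespace.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes namespaced attributes in {namespace}local form.
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Hypothetical instance fragment, not taken from an FpML example file.
SAMPLE = """
<trade xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <tradeHeader/>
  <product xsi:type="Swap"/>
</trade>
"""


def effective_type(element, declared_type):
    """Return the xsi:type override if present, else the schema type."""
    override = element.get(XSI_TYPE)
    if override is None:
        return declared_type
    # Keep only the local part of the QName (simplification).
    return override.split(":")[-1]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(effective_type(root.find("tradeHeader"), "TradeHeader"))
    print(effective_type(root.find("product"), "Product"))
```

In this fragment, tradeHeader keeps its declared type while product is typed as Swap because of the override.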

A SPARQL Metric Query

As a quick check on the generated OWL models, the following is a useful SPARQL query that counts the number of properties on each OWL class.




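PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# afn: is the Jena ARQ function namespace (newer Jena releases
# also use http://jena.apache.org/ARQ/function#).
PREFIX afn:  <http://jena.hpl.hp.com/ARQ/function#>

# Count the distinct restricted properties on each class in the
# transparency namespace; classes without restrictions report 0.
SELECT ?class (COUNT(DISTINCT ?p) AS ?properties)
WHERE {
  ?class a owl:Class .
  FILTER( afn:namespace( ?class ) =
          "http://www.fpml.org/FpML-5/transparency#" ) .
  OPTIONAL {
    ?class rdfs:subClassOf ?r .
    ?r a owl:Restriction .
    ?r owl:onProperty ?p .
  }
}
GROUP BY ?class
ORDER BY DESC( ?properties )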
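For readers who want to see what the query computes without running a SPARQL engine, the following Python sketch reproduces the same counting logic over a small in-memory list of triples. The triples are toy data invented for illustration (they do not reflect the real property counts of the transparency ontology), the `namespace_of` helper is a crude stand-in for afn:namespace, and ties are broken by class name, which the SPARQL query leaves unspecified.

```python
from collections import defaultdict

NS = "http://www.fpml.org/FpML-5/transparency#"

# Toy triples: restriction nodes _:r2 and _:r3 share one property to show
# the effect of DISTINCT; TradeHeader has no restrictions at all.
TRIPLES = [
    (NS + "Trade", "rdf:type", "owl:Class"),
    (NS + "TradeHeader", "rdf:type", "owl:Class"),
    (NS + "TradeInformation", "rdf:type", "owl:Class"),
    ("_:r1", "rdf:type", "owl:Restriction"),
    ("_:r1", "owl:onProperty", NS + "tradeHeaderRef"),
    (NS + "Trade", "rdfs:subClassOf", "_:r1"),
    ("_:r2", "rdf:type", "owl:Restriction"),
    ("_:r2", "owl:onProperty", NS + "cleared"),
    (NS + "TradeInformation", "rdfs:subClassOf", "_:r2"),
    ("_:r3", "rdf:type", "owl:Restriction"),
    ("_:r3", "owl:onProperty", NS + "cleared"),
    (NS + "TradeInformation", "rdfs:subClassOf", "_:r3"),
    ("_:r4", "rdf:type", "owl:Restriction"),
    ("_:r4", "owl:onProperty", NS + "nonStandardTerms"),
    (NS + "TradeInformation", "rdfs:subClassOf", "_:r4"),
]


def namespace_of(iri):
    """Crude stand-in for afn:namespace: text up to and including '#'."""
    return iri[: iri.find("#") + 1]


def count_properties(triples, namespace):
    """Count distinct restricted properties per class, highest first."""
    classes = {s for s, p, o in triples
               if p == "rdf:type" and o == "owl:Class"
               and namespace_of(s) == namespace}
    restrictions = {s for s, p, o in triples
                    if p == "rdf:type" and o == "owl:Restriction"}
    on_property = defaultdict(set)
    for s, p, o in triples:
        if p == "owl:onProperty":
            on_property[s].add(o)
    # Start every class with an empty set, mirroring the OPTIONAL block.
    found = {c: set() for c in classes}
    for s, p, o in triples:
        if p == "rdfs:subClassOf" and s in classes and o in restrictions:
            found[s] |= on_property[o]
    return sorted(((c, len(props)) for c, props in found.items()),
                  key=lambda row: (-row[1], row[0]))


if __name__ == "__main__":
    for cls, n in count_properties(TRIPLES, NS):
        print(cls, n)
```

On the toy data, TradeInformation is listed first with 2 properties, Trade follows with 1, and TradeHeader appears last with 0.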

The classes for the transparency ontology have the distribution of properties shown in the following figure. For example, TradeInformation has 12 properties:


FpML Example 2

Concluding remarks

The new capability is easy to use. As before, a convenient import wizard will guide the user. The dialog has a number of new options. XML conversion will happen automatically when users open XML files in TBC or use XML import modules in SPARQLMotion. As long as an XML file is valid against the XSD that it is based on, the XML will be transformed in accordance with the schema. Parts of the XML files that do not validate against a schema will continue to be converted using the default Semantic XML structure. There is also a new option to specify which OWL file to use as a schema when mapping a specific XML file to triples. This feature is also available for spreadsheets and will be covered in a separate blog.

We believe that the importance of this work lies not only in its value for harvesting XML Schemas. The ability to use the automatic creation of triples from XML instance files directly in applications is proving to be key for a number of customers. For example, TopQuadrant is currently using this approach in a project for the North Sea Oil and Gas industry.

The functionality we have described will be released in TopBraid Composer 3.6.0. This release entered internal beta this week. Please contact us if you want to try these capabilities before general availability, which is currently planned for November.



This is a blog by TopQuadrant, developers of the TopBraid Suite, created to support the pursuit of our ongoing mission - to explode strange semantic myths, to seek out new models that support a new generation of dynamic business applications, to boldly integrate data that no one has integrated before.