Many enterprise information models are expressed using XML Schemas, and data is commonly exchanged between applications as XML that complies with those schemas. Connecting XML data from different systems in a coherent, aggregated way is a challenge that confronts many organizations. The ability of RDF/OWL to describe the semantics of different data models and to aggregate disparate data makes it a natural fit for addressing this challenge.
For a number of years now, TopBraid Composer has included the ability to convert XSDs and associated XML files to RDF/OWL. However, for some XML Schemas our converter did not work as well as customers needed. For the upcoming TopBraid Composer 3.6.0 release, we have significantly improved the converter so that it covers XSD constructs more comprehensively and produces a more meaningful conversion to OWL. In parallel, we improved our XML data conversion to RDF so that transformations happen automatically based on the generated OWL models. We have also improved the performance of the transformations.
An overview of the approach is illustrated in the following figure:
Since the conversion occurs automatically, users do not have to write any rules for commonly needed mappings. However, users who need further transformations can use SPARQL Rules and SPARQLMotion to customize the generated OWL ontology or to further transform the RDF triples representing the XML data.
The content of this blog is organized as follows:
- XML Schemas converted as part of our tests
- Some challenges in converting XML Schemas to OWL
- Illustrative example of transformation rules
- Another example of transformation rules
- Complete table of supported transformations
- A SPARQL Metric Query
- Concluding remarks
We tested the importer on a broad range of complicated and large-scale industry standard XSD files, and converted many XML instances with impressive results. The XSDs we have tested with the new importer include:
- Energy and Utilities
- MultiSpeak, the de facto standard for defining the data exchanged between software applications to support the business processes commonly applied at utilities
- Oil and Gas
- IP-XACT, the XML Schema for meta-data documenting Intellectual Property (IP) used in the development, implementation and verification of electronic systems
- ATML, the Auto-Test Markup Language
Some of the converted schemas will be published at LinkedModels.org. To get early access to converted models, or for any other questions, contact us at TopQuadrant.
The examples we use in this blog are mainly from the Financial products Markup Language (FpML). All FpML 5.2 XSD and XML instance files were tested. One example is the transparency standard under FpML, for which the transparency.ttl ontology was generated from 23 XSD files.
Some of the challenges in converting XSD to OWL that were addressed are:
- Transforming anonymous types
- Converting complex types with simple contents
- Resolving conflicting nested element and attribute names during OWL property generation
- Deciding when and how to distinguish global elements from complex types with similar names during OWL class generation
- Generating enumerations
- Handling substitution groups both at the XSD and XML levels
- Handling the overriding of an XSD type with
The example that follows shows the approaches that we have used for the transformation.
The basic transform for a Complex Type in XSD follows these rules:
- An OWL class is generated for a complex type.
- The URI of the class is generated in one of three ways. If the complex type is global and named, then the name attribute is used. If the complex type is local and named, then the name attribute of the owner element is used. If the complex type is anonymous, then the names of its owner element and its parent element are used.
- xsd:annotation and attribute annotations describing the complex type get generated as ...
- Nested or reference children elements of the complex type become OWL allValuesFrom restrictions on the class. If the element has a simple type, then a restriction with an OWL datatype property and an XSD datatype range or a user-defined RDFS datatype range is generated. If the element has a complex type, then a restriction with an OWL object property and an OWL class range is generated. The maxOccurs values become OWL cardinality restrictions.
- Element group and attribute group references are generated as super classes.
- Attributes become restrictions with an OWL datatype property and an XSD datatype range or a user-defined RDFS datatype range.
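The rules above can be sketched roughly in Python. This is only an illustration of the mapping, not the converter's actual implementation: the describe_complex_type function and the dictionary keys are made up for this example, the note element is hypothetical (added only to show a maxOccurs value), and the check on the xsd: prefix is a simplifying assumption standing in for real type resolution.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Trimmed schema for illustration. The "note" element is hypothetical and is
# only here to show how a maxOccurs value is carried along.
SAMPLE_XSD = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="Trade">
    <xsd:sequence>
      <xsd:element name="tradeHeader" type="TradeHeader"/>
      <xsd:element name="note" type="xsd:string" maxOccurs="unbounded"/>
    </xsd:sequence>
    <xsd:attribute name="id" type="xsd:ID"/>
  </xsd:complexType>
</xsd:schema>
"""


def describe_complex_type(complex_type):
    """Return (class name, restriction descriptions) for a named complex type."""
    restrictions = []
    for element in complex_type.iter(XS + "element"):
        type_name = element.get("type", "")
        # Assumption: built-in simple types carry the "xsd:" prefix and anything
        # else is a complex type. A real converter would resolve the type.
        is_simple = type_name.startswith("xsd:")
        restrictions.append({
            "onProperty": element.get("name") or element.get("ref"),
            "propertyKind": "datatype" if is_simple else "object",
            "allValuesFrom": type_name,
            "maxOccurs": element.get("maxOccurs", "1"),  # XSD default is 1
        })
    for attribute in complex_type.iter(XS + "attribute"):
        restrictions.append({
            "onProperty": attribute.get("name"),
            "propertyKind": "datatype",  # attributes hold simple values
            "allValuesFrom": attribute.get("type", ""),
            "maxOccurs": "1",  # an attribute occurs at most once
        })
    return complex_type.get("name"), restrictions


schema = ET.fromstring(SAMPLE_XSD)
class_name, restrictions = describe_complex_type(schema.find(XS + "complexType"))
for restriction in restrictions:
    print(class_name, restriction)
```

In this sketch tradeHeader is reported as an object property restriction because its type is not a built-in XSD type, while note and id are reported as datatype property restrictions.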
An example of a Complex Type, Trade, in fpml-doc-5-2.xsd of the transparency standard is displayed below:
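<xsd:complexType name="Trade">
  <xsd:annotation>
    <xsd:documentation>A type defining an FpML trade.</xsd:documentation>
  </xsd:annotation>
  <xsd:sequence>
    <xsd:element name="tradeHeader" type="TradeHeader">
      <xsd:annotation>
        <xsd:documentation>The information on the trade which is not product specific, e.g. trade date.</xsd:documentation>
      </xsd:annotation>
    </xsd:element>
    <!-- ... -->
      <xsd:annotation>
        <xsd:documentation>The economics of the trade. In the case of an OTC trade, this is the OTC derivative product. In the case of a trade of a security, it is the instrument trade economics.</xsd:documentation>
      </xsd:annotation>
    <!-- ... -->
  </xsd:sequence>
  <xsd:attribute name="id" type="xsd:ID" />
</xsd:complexType>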
The following is the graph of the OWL class generated for the Trade complex type, which shows the OWL class, restrictions, annotations and superclass.
The following class diagram shows a more sophisticated view of Trade and its related classes downstream in the generated ontology.
The diagram highlights these advanced features in generation:
- A superclass relation exists between Trade, generated from an XSD complex type, and TradeEconomics.model, generated from an XSD element group.
- In the XSD, the Swap element has the substitutionGroup ...; A_Global-Swap becomes a subclass of ... The A_Global- prefix is used to distinguish the element-derived classes from similarly named complex-type-derived classes.
- dtype:value restrictions are generated to hold the simple content occurring in complex types. The complex content part of the type becomes other restrictions.
- The generated object properties have a Ref suffix to distinguish them from datatype properties with the same names. Both types of properties can be used in restrictions on different classes, as they may be generated from nested or reference children elements under different complex types.
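The two naming conventions in this list, the A_Global- prefix and the Ref suffix, can be illustrated with a small Python sketch. The helper names are hypothetical, and the sketch assumes the prefix and suffix are applied by plain string concatenation to every global element and every object property; the converter may apply them more selectively.

```python
def element_class_name(element_name):
    """Local name of the class derived from a global element (assumed rule)."""
    return "A_Global-" + element_name


def object_property_name(element_name):
    """Local name of the object property derived from an element (assumed rule)."""
    return element_name + "Ref"


def name_generated_terms(complex_types, global_elements):
    """Show how same-named complex types and global elements stay distinct."""
    names = {}
    for type_name in complex_types:
        names[("complexType", type_name)] = type_name
    for element_name in global_elements:
        names[("element", element_name)] = element_class_name(element_name)
    return names


names = name_generated_terms(complex_types=["Swap"], global_elements=["Swap"])
print(names[("complexType", "Swap")])       # Swap
print(names[("element", "Swap")])           # A_Global-Swap
print(object_property_name("tradeHeader"))  # tradeHeaderRef
```

The point of the sketch is that a complex type and a global element can share a name in the schema and still map to two different class names in the ontology.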
The instance file msg_ex001_new_trade.xml was imported into the transparency ontology. Here is a peek into that XML file:
The above XML constructs were mapped into the following RDF graph, where you can see how the instances, their relationships and their types are generated with respect to the Trade class diagram.
The basic transform for an Enumeration in XSD follows these rules:
- An OWL class is generated from an XSD simple type having XSD enumeration facets. The local name of the class has an Enum suffix to distinguish it from classes generated with similar names.
- This class becomes a subclass of EnumeratedValue in the same namespace as the OWL class, which itself becomes a subclass of ...
- Each XSD enumeration facet becomes an instance of the generated class. dtype:value holds the enumeration value. dtype:order is the order in which the enumeration facet occurs.
- An Enumeration class in the same namespace as the OWL class is also generated. This class becomes a subclass of dtype:Enumeration. An instance of this class is generated as a container to refer to all the instances generated from the current simple type.
- Enumerated value instance URIs are generated using a concatenation of the abbreviation of the upper case letters in the class local name and the ...
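As a rough illustration of the enumeration rules, the Python sketch below reads the facets of a simple type and builds one record per facet with its dtype:value and dtype:order. The function names are made up for this example, the facet values are shown for illustration only, and starting the order at 1 is an assumption. The sketch stops at the abbreviation of the class local name and does not attempt to build complete instance URIs.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Illustrative simple type with three enumeration facets.
SAMPLE_XSD = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:simpleType name="PremiumQuoteBasisEnum">
    <xsd:restriction base="xsd:token">
      <xsd:enumeration value="PercentageOfCallCurrencyAmount"/>
      <xsd:enumeration value="PercentageOfPutCurrencyAmount"/>
      <xsd:enumeration value="Explicit"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
"""


def abbreviate(local_name):
    """Collect the upper case letters of a class local name."""
    return "".join(ch for ch in local_name if ch.isupper())


def enumeration_instances(simple_type):
    """Build one record per enumeration facet, in document order."""
    class_name = simple_type.get("name")
    facets = simple_type.iter(XS + "enumeration")
    return [
        {
            "rdf:type": class_name,
            "dtype:value": facet.get("value"),
            "dtype:order": order,  # assumption: order is counted from 1
        }
        for order, facet in enumerate(facets, start=1)
    ]


schema = ET.fromstring(SAMPLE_XSD)
simple_type = schema.find(XS + "simpleType")
instances = enumeration_instances(simple_type)
prefix = abbreviate(simple_type.get("name"))
print(prefix)
for instance in instances:
    print(instance)
```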
The following figure shows a graph for the PremiumQuoteBasisEnum class and the OWL constructs generated from the related XSD enumeration facets:
For readers interested in more detail, a full overview of the mapping transformations is given in the following table:
|#||XSD/XML Constructs||OWL Constructs|
|2||...||Becomes an ...|
|10||...||Cardinality specified in minimum cardinality, maximum cardinality and universal (allValuesFrom) OWL restrictions.|
|11||Anonymous Complex Type||As for Complex Type, except that a URI is constructed from the parent element and the nested element reference. Also, the class is defined as a subclass of ...|
|12||Anonymous Simple Type||As for Simple Type, except that a URI is constructed from the parent element and the nested element reference.|
|14||Substitution Groups||Subclass statements are generated for the members. Instance files resolve their types by consulting the OWL model at import time.|
|15||Annotation attributes on elements||OWL annotation properties are created and placed directly on the relevant class.|
|16||Annotations using ...||Become, based on user selection, ...|
|17||...||Overrides the schema type with the specified type.|
As a quick check on the generated OWL models, the following is a useful SPARQL query that counts the number of properties on each OWL class.
SELECT ?class (COUNT(DISTINCT ?p) AS ?properties)
WHERE {
    ?class a owl:Class .
    # Compare against the namespace of the generated ontology
    FILTER( afn:namespace( ?class ) = "..." )
    ?class rdfs:subClassOf ?r .
    ?r a owl:Restriction .
    ?r owl:onProperty ?p .
}
GROUP BY ?class
ORDER BY DESC( ?properties )
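To make explicit what this metric counts, here is a small Python sketch of the same logic over a hand-written list of triples. It is not how TopBraid evaluates the query: the triples, the ex: names and the count_properties function are invented for this illustration, and the namespace filter is left out. As in the query, each property is counted once per class even when several restrictions refer to it, and classes without restrictions do not appear.

```python
from collections import defaultdict

# Hand-written triples for illustration; all ex: names are hypothetical.
TRIPLES = [
    ("ex:Trade", "rdf:type", "owl:Class"),
    ("ex:Trade", "rdfs:subClassOf", "_:r1"),
    ("_:r1", "rdf:type", "owl:Restriction"),
    ("_:r1", "owl:onProperty", "ex:tradeHeaderRef"),
    ("ex:Trade", "rdfs:subClassOf", "_:r2"),
    ("_:r2", "rdf:type", "owl:Restriction"),
    ("_:r2", "owl:onProperty", "ex:id"),
    # A second restriction on the same property, e.g. a cardinality restriction.
    ("ex:Trade", "rdfs:subClassOf", "_:r3"),
    ("_:r3", "rdf:type", "owl:Restriction"),
    ("_:r3", "owl:onProperty", "ex:id"),
    ("ex:TradeHeader", "rdf:type", "owl:Class"),
    ("ex:TradeHeader", "rdfs:subClassOf", "_:r4"),
    ("_:r4", "rdf:type", "owl:Restriction"),
    ("_:r4", "owl:onProperty", "ex:tradeDate"),
    ("ex:Empty", "rdf:type", "owl:Class"),
]


def count_properties(triples):
    """Count distinct restricted properties per class, highest count first."""
    classes = {s for s, p, o in triples if p == "rdf:type" and o == "owl:Class"}
    restrictions = {s for s, p, o in triples
                    if p == "rdf:type" and o == "owl:Restriction"}
    on_property = {s: o for s, p, o in triples if p == "owl:onProperty"}

    properties = defaultdict(set)
    for s, p, o in triples:
        if (p == "rdfs:subClassOf" and s in classes
                and o in restrictions and o in on_property):
            properties[s].add(on_property[o])

    counts = [(cls, len(props)) for cls, props in properties.items()]
    return sorted(counts, key=lambda pair: pair[1], reverse=True)


result = count_properties(TRIPLES)
for cls, count in result:
    print(cls, count)
```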
The classes for the transparency ontology have the distribution of properties shown in the following figure. For example, TradeInformation has 12 properties:
The new capability is easy to use. As before, a convenient import wizard guides the user, and the dialog has a number of new options. XML conversion happens automatically when users open XML files in TBC or use XML import modules in SPARQLMotion. As long as an XML file is valid against the XSD it is based on, the XML is transformed in accordance with the schema. Parts of an XML file that do not validate against a schema continue to be converted using the default Semantic XML structure. There is also a new option to specify which OWL file to use as a schema when mapping a specific XML file to triples. This feature is also available for spreadsheets and will be covered in a separate blog.
We believe that the importance of this work goes beyond harvesting XML Schemas. The ability to create triples automatically from XML instance files, directly in applications, is proving to be key for a number of customers. For example, TopQuadrant is currently using this approach in a project for the North Sea Oil and Gas industry.
The functionality we have described will be released in TopBraid Composer 3.6.0. This release entered internal beta this week. Please contact us if you want to try these capabilities before general availability, which is currently planned for November.