Many enterprise information models are expressed using XML Schemas. Data between applications is commonly exchanged in XML, compliant with those schemas. Connecting XML data from different systems in a coherent aggregated way is a challenge that confronts many organizations. Capabilities of RDF/OWL to describe semantics of different data models and aggregate disparate data are a natural fit for addressing these challenges.
For a number of years now, TopBraid Composer has included the ability to convert XSDs and associated XML files to RDF/OWL. However, for some XML Schemas our converter did not work as well as customers needed. For the upcoming TopBraid Composer 3.6.0 release, we have significantly improved it to cover XSD constructs more comprehensively and to produce a more meaningful conversion to OWL. In parallel, we improved our XML-to-RDF data conversion so that transformations happen automatically based on the generated OWL models. We have also improved the performance of the transformations.
An overview of the approach is illustrated in the following figure:
Since the conversion occurs automatically, users do not have to write any rules for commonly needed mappings. However, users who need further transformations can use SPARQL Rules and SPARQLMotion to customize the generated OWL ontology or to further transform the RDF triples representing the XML data.
The content of this blog is organized as follows:
- XML Schemas converted as part of our tests
- Some challenges in converting XML Schemas to OWL
- Illustrative example of transformation rules
- Another example of transformation rules
- Complete table of supported transformations
- A SPARQL Metric Query
- Concluding remarks
XML Schemas converted as part of our tests
We tested the importer on a broad range of complicated and large-scale industry standard XSD files, and converted many XML instances with impressive results. The XSDs we have tested with the new importer include:
- Banking
- Energy and Utilities
- MultiSpeak, a de facto standard for defining the data that needs to be exchanged between software applications to support the business processes commonly applied at utilities
- Government
- Oil and Gas
- Healthcare
- Electronics
- IP-XACT, the XML Schema for meta-data documenting Intellectual Property (IP) used in the development, implementation and verification of electronic systems
- Other
- ATML, the Auto-Test Markup Language
Some of the converted schemas will be published at LinkedModels.org. To get early access to converted models, or for any other questions, contact us at TopQuadrant.
The examples we use in this blog are mainly from the Financial products Markup Language (FpML). All FpML 5.2 XSD and XML instance files were tested. One example is the transparency standard under FpML, for which the transparency.ttl ontology was generated from 23 XSD files.
Some challenges in converting XML Schemas to OWL
Some of the challenges in converting XSD to OWL that were addressed are:
- Transforming anonymous types
- Converting complex types with simple contents
- Resolving conflicting nested element and attribute names during OWL property generation
- Deciding when and how to distinguish global elements from complex types with similar names during OWL class generation
- Generating enumerations
- Handling substitution groups both at the XSD and XML levels
- Handling the overriding of an XSD type with xsi:type in XML
The example that follows shows the approaches that we have used for the transformation.
Illustrative example of transformation rules
The basic transform for a Complex Type in XSD follows these rules:
- An OWL class is generated for a complex type.
- The URI of the class is generated in three different ways. If the complex type is global and named, then the name attribute is used. If the complex type is local and named, then the name attribute of the owner element is used. If the complex type is anonymous, then the names of its owner element and its parent element are used.
- The xsd:annotation and attribute annotations describing the complex type get generated as dc:description, rdfs:comment and/or skos:definition OWL annotations.
- Nested or reference children elements of the complex type become OWL allValuesFrom restrictions on the class. If the element has a simple type, then a restriction with an OWL datatype property and an XSD datatype range or a user-defined RDFS datatype range is generated. If the element has a complex type, then a restriction with an OWL object property and an OWL class range is generated. The minOccurs and maxOccurs values become OWL cardinality restrictions.
- Element group and attribute group references are generated as super classes.
- Attributes become restrictions with an OWL datatype property and an XSD datatype range or a user-defined RDFS datatype range.
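The naming and occurrence rules above can be pictured with a small Python sketch. It is not the TopBraid Composer implementation: the function names, the `scope` labels, the `_` separator and the parent-before-owner ordering for anonymous types are all assumptions made for illustration, and the choice to omit a restriction when no bound applies is likewise a guess. Only the XSD defaults (minOccurs and maxOccurs both default to 1) come from the XML Schema specification.

```python
from typing import Dict, Optional


def class_local_name(scope: str,
                     type_name: Optional[str] = None,
                     owner: Optional[str] = None,
                     parent: Optional[str] = None,
                     sep: str = "_") -> str:
    """Pick a local name for the OWL class of a complex type.

    scope is a hypothetical label: "global", "local" or "anonymous".
    The separator and the parent/owner ordering are assumptions.
    """
    if scope == "global":
        return type_name          # the type's own name attribute
    if scope == "local":
        return owner              # the name of the owner element
    if scope == "anonymous":
        return f"{parent}{sep}{owner}"  # parent and owner element names
    raise ValueError(f"unknown scope: {scope}")


def occurrence_to_restrictions(min_occurs: str = "1",
                               max_occurs: str = "1") -> Dict[str, int]:
    """Map minOccurs/maxOccurs to OWL cardinality restriction values."""
    restrictions: Dict[str, int] = {}
    minimum = int(min_occurs)
    if minimum > 0:
        restrictions["owl:minCardinality"] = minimum
    if max_occurs != "unbounded":
        restrictions["owl:maxCardinality"] = int(max_occurs)
    return restrictions
```

For instance, an optional, repeatable element (minOccurs="0", maxOccurs="unbounded") yields no cardinality restriction in this sketch, leaving only the allValuesFrom restriction on the class.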
An example of a Complex Type, Trade, in fpml-doc-5-2.xsd of the transparency standard is displayed below:
<xsd:complexType name="Trade">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      A type defining an FpML trade.</xsd:documentation>
  </xsd:annotation>
  <xsd:sequence>
    <xsd:element name="tradeHeader" type="TradeHeader">
      <xsd:annotation>
        <xsd:documentation xml:lang="en">
          The information on the trade which is not
          product specific, e.g. trade date.
        </xsd:documentation>
      </xsd:annotation>
    </xsd:element>
    <xsd:group ref="TradeEconomics.model">
      <xsd:annotation>
        <xsd:documentation xml:lang="en">
          The economics of the trade. In the case of an
          OTC trade, this is the OTC derivative product.
          In the case of a trade of a security,
          it is the instrument trade economoics.
        </xsd:documentation>
      </xsd:annotation>
    </xsd:group>
  </xsd:sequence>
  <xsd:attribute name="id" type="xsd:ID" />
</xsd:complexType>
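To make the rules concrete for this type, here is a rough Python sketch that reads a trimmed copy of the Trade definition (annotations removed) and sorts its parts into the OWL roles described above. It is only an illustration: the function name and the result keys are invented, and the test for "simple type" (a type name starting with `xsd:`) is a deliberate simplification that would miss user-defined simple types.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Trimmed copy of the Trade complex type, annotations removed.
TRADE_XSD = """
<xsd:complexType name="Trade" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:sequence>
    <xsd:element name="tradeHeader" type="TradeHeader"/>
    <xsd:group ref="TradeEconomics.model"/>
  </xsd:sequence>
  <xsd:attribute name="id" type="xsd:ID"/>
</xsd:complexType>
"""


def summarize_complex_type(xsd_text):
    """Classify the parts of one complex type by their expected OWL role."""
    root = ET.fromstring(xsd_text)

    def qname(tag):
        return f"{{{XSD_NS}}}{tag}"

    summary = {
        "class": root.get("name"),
        "superclasses": [],         # from element group references
        "object_properties": [],    # from elements with a complex type
        "datatype_properties": [],  # from simple-typed elements and attributes
    }
    for element in root.iter(qname("element")):
        # Simplification: only built-in xsd: types count as simple types.
        is_simple = element.get("type", "").startswith("xsd:")
        bucket = "datatype_properties" if is_simple else "object_properties"
        summary[bucket].append(element.get("name"))
    for group in root.iter(qname("group")):
        summary["superclasses"].append(group.get("ref"))
    for attribute in root.iter(qname("attribute")):
        summary["datatype_properties"].append(attribute.get("name"))
    return summary
```

Calling `summarize_complex_type(TRADE_XSD)` reports Trade as the class, TradeEconomics.model as a superclass, tradeHeader as an object property and id as a datatype property.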
The following is the graph of the OWL class generated for the Trade complex type, which shows the OWL class, restrictions, annotations and superclass.
The following class diagram shows a more sophisticated view of Trade and its related classes downstream in the generated ontology.
The diagram highlights these advanced features in generation:
- A superclass relation exists between Trade, generated from an XSD complex type, and TradeEconomics.model, generated from an XSD element group.
- In the XSD, the Swap element has the substitutionGroup Product element. Thus, A_Global-Swap becomes a subclass of A_Global-Product. The A_Global- prefix is used to distinguish the element-derived classes from similarly named complex-type-derived classes.
- dtype:value restrictions are generated to hold the simple contents occurring in complex types. The complex content part of the type becomes other restrictions.
- The generated object properties have a Ref suffix to distinguish them from datatype properties with the same names. Both types of properties can be used in restrictions on different classes, as they may be generated from nested or reference children elements under different complex types.
The instance file "msg_ex001_new_trade.xml" was imported into the transparency ontology. Here is a peek into that XML file:
...
<trade>
  <tradeHeader>
    <partyTradeIdentifier>
      <tradeId tradeIdScheme=
        "http://fpml.org/universal_swap_id">123</tradeId>
      <tradeId tradeIdScheme=
        "http://fpml.org/submitter_trade_id">456</tradeId>
    </partyTradeIdentifier>
    <tradeInformation>
      ...
      <cleared>true</cleared>
      <nonStandardTerms>false</nonStandardTerms>
      <offMarketPrice>false</offMarketPrice>
      <largeSizeTrade>false</largeSizeTrade>
      ...
    </tradeInformation>
    <tradeDate>2011-02-04</tradeDate>
  </tradeHeader>
  <swap>
    <productType>InterestRateSwap</productType>
    <assetClass>InterestRates</assetClass>
    <swapStream>
      ...
    </swapStream>
    <swapStream>
      ...
    </swapStream>
  </swap>
</trade>
...
The above XML constructs were mapped into the following RDF graph, where you can see how the instances, their relationships and their types are generated with respect to the Trade class diagram.
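As a rough picture of the kind of mapping involved, the Python sketch below walks a trimmed fragment of the instance file and emits plain (subject, predicate, object) tuples. It is a simplified stand-in, not the importer: the real conversion resolves types from the generated OWL model, whereas this sketch uses the element name as a placeholder type, and the blank-node labels and the function name are invented. The Ref suffix on links to nested nodes and the use of dtype:value for simple content follow the conventions described earlier.

```python
import xml.etree.ElementTree as ET
from itertools import count

# Trimmed fragment of the instance file shown above.
SAMPLE_XML = """
<trade>
  <tradeHeader>
    <tradeId tradeIdScheme="http://fpml.org/universal_swap_id">123</tradeId>
    <tradeDate>2011-02-04</tradeDate>
  </tradeHeader>
</trade>
"""


def xml_to_triples(xml_text):
    """Turn an XML fragment into a list of illustrative triples."""
    ids = count(1)
    triples = []

    def visit(element):
        node = f"_:{element.tag}{next(ids)}"  # invented blank-node label
        # Placeholder type: the importer would look this up in the OWL model.
        triples.append((node, "rdf:type", element.tag))
        for name, value in element.attrib.items():
            triples.append((node, name, value))
        text = (element.text or "").strip()
        if text:
            # Simple content of a structured element goes to dtype:value.
            triples.append((node, "dtype:value", text))
        for child in element:
            if len(child) or child.attrib:
                # Structured child: link to its own node with a Ref property.
                triples.append((node, f"{child.tag}Ref", visit(child)))
            else:
                # Plain leaf: a literal-valued (datatype) property.
                triples.append((node, child.tag, (child.text or "").strip()))
        return node

    visit(ET.fromstring(xml_text))
    return triples
```

In this sketch the tradeId element, which carries both an attribute and text, becomes its own node with a tradeIdScheme value and a dtype:value, while tradeDate becomes a literal directly on the tradeHeader node.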
Another example of transformation rules
The basic transform for an Enumeration in XSD follows these rules:
- An OWL class is generated from an XSD simple type having XSD enumeration facets. The local name of the class has an Enum suffix to distinguish it from classes generated with similar names.
- This class becomes a subclass of EnumeratedValue in the same namespace as the OWL class, which itself becomes a subclass of dtype:EnumeratedValue.
- Each XSD enumeration facet becomes an instance of the generated class. dtype:value holds the enumeration value. dtype:order is the order in which the enumeration facet occurs.
- An Enumeration class in the same namespace as the OWL class is also generated. This class becomes a subclass of dtype:Enumeration. An instance of this class is generated as a container to refer to all the instances generated from the current simple type.
Enumerated value instance URIs are generated by concatenating the abbreviation formed from the upper case letters of the class local name with the dtype:value literal.
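A minimal Python sketch of that naming rule follows. The function names, the `_` separator between the abbreviation and the value, the 1-based dtype:order and the sample facet value are assumptions for illustration; the source only states that the two parts are concatenated.

```python
def enum_instance_local_name(class_local_name, value, sep="_"):
    """Build a local name from the class's upper case letters and the value.

    The separator is an assumption; the exact format is not documented here.
    """
    abbreviation = "".join(ch for ch in class_local_name if ch.isupper())
    return f"{abbreviation}{sep}{value}"


def enum_instances(class_local_name, facet_values):
    """Describe one instance per enumeration facet, in document order."""
    return [
        {
            "id": enum_instance_local_name(class_local_name, value),
            "rdf:type": class_local_name,
            "dtype:value": value,
            "dtype:order": position,  # 1-based numbering is an assumption
        }
        for position, value in enumerate(facet_values, start=1)
    ]
```

With the class PremiumQuoteBasisEnum and a placeholder facet value Explicit, the abbreviation is PQBE and the sketch yields the local name PQBE_Explicit.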
The following figure shows a graph for the PremiumQuoteBasisEnum class and the OWL constructs generated from the related XSD enumeration facets:
Complete table of supported transformations
For readers interested in more detail, the following table gives a full overview of the mapping transformations:
# | XSD/XML Constructs | OWL Constructs |
---|---|---|
1 | xsd:simpleType | rdfs:Datatype |
2 | xsd:simpleType with xsd:enumeration | Becomes an owl:Class as a subclass of EnumeratedValue. Instances are created for every enumerated value. An instance of Enumeration, referring to all the instances, is created, as well as an owl:oneOf enumeration over the instances. |
3 | xsd:complexType over xsd:complexContent | owl:Class |
4 | xsd:complexType over xsd:simpleContent | owl:Class |
5 | xsd:element (global) with complex type | owl:Class and subclass of the class generated from the referenced complex type |
6 | xsd:element (global) with simple type | rdfs:Datatype |
7 | xsd:element (local to a type) | owl:DatatypeProperty or owl:ObjectProperty depending on the element type. OWL Restrictions are built for the occurrence. |
8 | xsd:group | owl:Class and subclass of A_AbstractElementGroup |
9 | xsd:attributeGroup | owl:Class and subclass of A_AbstractAttributeGroup |
10 | xsd:minOccurs and xsd:maxOccurs | Cardinality specified in minimum cardinality, maximum cardinality and universal (allValuesFrom) OWL restrictions. |
11 | Anonymous Complex Type | As for Complex Type except a URI is constructed from the parent element and the nested element reference. Also, the class is defined as a subclass of A_Anon. |
12 | Anonymous Simple Type | As for Simple Type except a URI is constructed from the parent element and the nested element reference. |
13 | xsd:default on an attribute | Uses dtype:defaultValue to attach a value to the OWL restriction representing the associated property. |
14 | Substitution Groups | Subclass statements are generated for the members. Instance files resolve their types by consulting the OWL model at import-time. |
15 | Annotation attributes on elements | OWL Annotation properties are created and placed directly on the relevant class. |
16 | Annotations using xsd:annotation | Become, based on user selection, dc:description, rdfs:comment and/or skos:definition OWL annotations. |
17 | xsi:type on an XML element | Overrides the schema type with the specified type. |
A SPARQL Metric Query
As a quick check on the generated OWL models, the following is a useful SPARQL query that counts the number of properties on each OWL class.
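PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# afn: is the Jena ARQ function library (namespace assumed here;
# TopBraid Composer may already declare these prefixes for you)
PREFIX afn:  <http://jena.hpl.hp.com/ARQ/function#>

# Count the distinct restricted properties on each class in the transparency namespace
SELECT ?class (COUNT(DISTINCT ?p) AS ?properties)
WHERE {
  ?class a owl:Class .
  FILTER( afn:namespace( ?class ) =
          "http://www.fpml.org/FpML-5/transparency#" ) .
  OPTIONAL {
    ?class rdfs:subClassOf ?r .
    ?r a owl:Restriction .
    ?r owl:onProperty ?p .
  }
}
GROUP BY ?class
ORDER BY DESC( ?properties )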
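For readers less familiar with SPARQL, the following Python sketch computes the same kind of metric over a small in-memory list of triples. It is only meant to show what the query counts: the triples are made-up placeholders rather than output of the importer, the namespace filter is left out, and ties are broken by class name, which the query itself does not specify.

```python
from collections import defaultdict

# Made-up placeholder triples: two restrictions on the same property count once.
EXAMPLE_TRIPLES = [
    ("Trade", "rdf:type", "owl:Class"),
    ("TradeHeader", "rdf:type", "owl:Class"),
    ("r1", "rdf:type", "owl:Restriction"),
    ("r1", "owl:onProperty", "tradeHeaderRef"),
    ("r2", "rdf:type", "owl:Restriction"),
    ("r2", "owl:onProperty", "tradeHeaderRef"),
    ("r3", "rdf:type", "owl:Restriction"),
    ("r3", "owl:onProperty", "id"),
    ("Trade", "rdfs:subClassOf", "r1"),
    ("Trade", "rdfs:subClassOf", "r2"),
    ("Trade", "rdfs:subClassOf", "r3"),
]


def properties_per_class(triples):
    """Count distinct restricted properties per class, highest count first."""
    triples = set(triples)
    classes = {s for s, p, o in triples if p == "rdf:type" and o == "owl:Class"}
    restrictions = {s for s, p, o in triples
                    if p == "rdf:type" and o == "owl:Restriction"}
    on_property = defaultdict(set)
    for s, p, o in triples:
        if p == "owl:onProperty" and s in restrictions:
            on_property[s].add(o)
    # Start every class at zero, mirroring the OPTIONAL block in the query.
    properties = {cls: set() for cls in classes}
    for s, p, o in triples:
        if p == "rdfs:subClassOf" and s in classes and o in restrictions:
            properties[s] |= on_property[o]
    return sorted(((cls, len(props)) for cls, props in properties.items()),
                  key=lambda row: (-row[1], row[0]))
```

Classes without any restriction still appear with a count of zero, just as the OPTIONAL pattern keeps them in the query results.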
The classes for the transparency ontology have the distribution of properties shown in the following figure. For example, TradeInformation has 12 properties:
Concluding remarks
The new capability is easy to use. As before, a convenient import wizard guides the user, and its dialog has a number of new options. XML conversion happens automatically when users open XML files in TBC or use XML import modules in SPARQLMotion. As long as an XML file is valid against the XSD it is based on, the XML will be transformed in accordance with the schema. Parts of XML files that do not validate against a schema will continue to be converted using the default Semantic XML structure. There is also a new option to specify which OWL file to use as a schema when mapping a specific XML file to triples. This feature is also available for spreadsheets and will be covered in a separate blog.
We believe that the importance of this work lies not only in its value for harvesting XML Schemas. The ability to use the automatic creation of triples from XML instance files directly in applications is proving to be key for a number of customers. For example, TopQuadrant is currently using this approach in a project for the North Sea Oil and Gas industry.
The functionality we have described will be released in TopBraid Composer 3.6.0. This release entered internal beta this week. Please contact us if you want to try these capabilities before general availability, which is currently planned for November.