VOYAGES OF THE SEMANTIC ENTERPRISE

TopQuadrant supports PhUSE Semantic technology Initiative creating RDF Representations of CDISC Standards for Clinical Trial Information.

2013-12-02T16:14:00.000-08:00

Recently, PhUSE and CDISC announced the completion of Phase I of the FDA/PhUSE Semantic Technology Working Group Project. The PhUSE Semantic Technology Working Group aims to investigate how formal semantic standards can support the clinical and non-clinical trial data life cycle from protocol to submission. This deliverable includes a draft set of existing CDISC standards represented in RDF.

In this stage of the project, the focus is on describing meta-models and models that describe and allow flexible use of the standards. This important first step was created out of an industry regulator collaboration initiated through PhUSE and supported by many CDISC volunteers who originally developed these foundational standards.

The ADaM representation of this project was co-led by Phil Ashworth, semantic solution architect at TopQuadrant, with Josephine Gough and contributions from Nate Freimark, Dave Jordan, Kirsten Langendorf and Mitra Rocca. ADaM is the last standard in the food chain when it comes to presenting the results of a clinical trial for review by the authorities. Designed to standardize the way that analysis results are stored and thus presented to authorities, ADaM provides standards for statistical analyses to reduce or eliminate any processing required by the reviewer.

According to Phil, the current diverse set of standards leaves the understanding of data and data relationships more open to interpretation. This initiative will create a more efficient, accurate way of representing and submitting data to the FDA, and will allow data to be analyzed based on a common model where the meaning is well understood. For more details on Phil’s involvement in this project, read this wiki on ADaM in RDF standards.

In the next stage of this project, PhUSE and CDISC will take a closer look at demonstrating the capture of information based on these initial models and then submitting to the FDA. Additional information about this project can be found on the PhUSE wiki.

Through our collaboration with PhUSE and CDISC, we hope to enable quicker decision-making for life science professionals. At TopQuadrant, we understand they need to simply and readily bring data together and view, explore and analyze it.

Event Recap: Robert Coyne presented at MANAGE VARIETY—create VALUE from Big Data

2013-11-22T08:11:00.000-08:00

Gartner named semantic technologies one of the 10 most important trends in 2013 and beyond. When it comes to Big Data, semantic technologies deliver on the variety aspect of the three Vs of Big Data (Volume, Velocity, and Variety) by enabling the right data to be processed in the correct way. During the Gartner Symposium ITxpo in Barcelona, TopQuadrant’s partner Computas held a satellite event, MANAGE VARIETY—create VALUE from Big Data, on November 10th to share how customers are getting value by managing the variety of their Big Data.

Robert Coyne, a co-founder of TopQuadrant, presented on the company’s vision for semantic information ecosystems and showcased several case studies where Semantic Web Technology is a core element in their solutions. As Robert told the MANAGE VARIETY audience, semantic ecosystem solutions are evolutionary, allowing organizations to build a modern information infrastructure incrementally with capabilities that let them adapt and capitalize on rapidly changing scenarios, have better access to information resources, discover new opportunities, and inform better decision making.

Attendees also learned more about TopQuadrant’s partnership with Computas, which is aimed at delivering efficient semantic solutions in Europe. Current projects include delivering IT services for data exchange and collaboration in the Oil and Gas industry and an ontology management and semantic enrichment infrastructure for the Organization for Economic Co-operation and Development (OECD), whose Knowledge Information Management (KIM) program aims to establish an integrated strategy and framework for managing and delivering information and knowledge.

TopQuadrant is working with Computas to fulfill the vision of the Exploration & Production Information Management Association (EPIM) to build a shared suite of knowledge-based applications for operators on the Norwegian Continental Shelf using semantic technology and industry-standard domain concepts. Computas and TopQuadrant are engaged with EPIM to develop EnvironmentHub, a solution using TopBraid capabilities and tools to provide a flexible semantic web standards-based platform for environmental reporting and data integration. TopQuadrant previously developed the ReportingHub system, a standards-based information-exchange solution that uses semantic web standards to store, query and analyze exploration and production data.

Want to learn more about these initiatives with EPIM and Computas or semantic information ecosystems? Leave us a comment and we’ll fill you in.

TopQuadrant’s Bob DuCharme speaking at Taxonomy Boot Camp 2013

2013-10-11T08:04:00.000-07:00

Bob DuCharme is speaking on the upcoming "Semantic Search" panel during Taxonomy Boot Camp on November 6th in Washington D.C. Bob’s topic, "Enhancing Searches With Taxonomies and Semantic Technology," will cover how taxonomies based on semantic technology can help with focusing and augmenting searches, correcting terms, disambiguation and using other vocabulary metadata sources to increase the quality of your search results. Attendees will learn more about how TopBraid EVN, acting as a vocabulary server, can improve a search engine's ability to get more accurate and relevant results, and the role of semantic technology standards in making this happen.

Taxonomy Boot Camp is the premier event for people who manage vocabularies. Bob has spoken at this conference before, noting “it's been interesting to see ideas about the value of semantic technology spread out from thought leaders in this field to a wider audience.”

In addition to speaking, Bob is looking forward to attending presentations at the conference to learn more about the latest techniques for managing controlled vocabularies, specifically how people use these vocabularies to get more value out of other information assets (for example, by aiding search engines). According to Bob, “talking with attendees also brings me up to date about the latest challenges facing taxonomists, and this provides great input for new TopBraid EVN features.”

Are you attending the show? If so, drop us a line and don’t miss out on Bob and the Semantic Search panel from 3:15 p.m. to 4:00 p.m. on November 6th!

Looking forward to Semtechbiz in San Francisco next week

2013-05-29T09:53:00.000-07:00

The Semantic Technology & Business series of conferences has held successful events in London, Berlin, New York City, and Washington D.C. in the last few years, but the annual one in San Francisco is the big one. We're a silver sponsor of next week's event, where we'll be looking forward to showing everyone what we've been up to and learning more about what everyone else has been doing with standards-based semantic technology.

In the exposition hall, we'll be at booth 116, right by the exhibit hall entrance. In our booth, we'll be showing TopBraid Life Sciences Insight, a Logical Data Warehouse aimed at people in the Life Sciences industry, and TopBraid Enterprise Vocabulary Net, our standards-based multi-user vocabulary management solution. We'll also be happy to talk about any of our other products and projects.

We'll be giving four talks at the conference:

TopQuadrant CTO Ralph Hodgson will present A Case Study of a Deployed Reporting Hub for the Norwegian Oil and Gas Industry, describing the EPIM Reporting Hub that we developed with Franz Inc. (another perennial presence at the semtech conferences). In particular, Ralph will discuss the business benefits that resulted from using semantic technologies in the project.
Peter Lawrence, filling in for Phil Ashworth, will talk about ways to create a standards-based federated search environment in Enabling Data Integration and Discovery Using the Semantic Web.
I'll discuss approaches to Enhancing Searches with Semantic Technology—the standards, the available datasets, and the tools that can be used to improve searches.
Peter will also give a talk titled A Logical Data Warehouse in Action, in which he demonstrates TopBraid Insight.

Stop by our booth or one of our talks to say hi and to tell us about what you've been doing with semantic technologies and what you hope to do. We love the opportunity that this conference gives us to learn the latest about what people are doing with semantic web standards.

Introducing TopBraid Life Sciences Insight—at Bio-IT World 2013

2013-04-05T09:28:00.001-07:00

We're looking forward very much to showing TopBraid Life Sciences Insight (TopBraid LSI) at the Bio-IT World conference and expo next week in Boston.

Through our research, we've found efforts in drug discovery, clinical trial research, and other data-intensive life sciences tasks are often hampered by the need for more efficient federated queries across silos, bad alignment of related data, and dependence on expensive, inflexible tools.

Working with experts in the field, we've developed TopBraid LSI as a Logical or Virtual Data Warehouse: coordinated views on multiple data sources that let you query and use these data sources without actually loading them into a single data warehouse. A web-based interface lets users identify alignments between different data sources without requiring them to know the standards used to store and leverage these alignments, and then, using an approach similar to Map-Reduce, queries can be efficiently distributed across the data sources to return federated answer sets. You can learn more about how TopBraid LSI works and how it can help life sciences professionals from our new white paper (PDF) on it.

TopBraid LSI has also been selected as a finalist for Bio-IT's Best of Show award in the Informatics Tools & Data Category, so we're also looking forward to showing it to the judges. And, we're taking part in the conference's New Product Showcase, which has a very interesting mix of cutting edge research and IT tools for the life sciences field. If you'll be at Bio-IT World, stop by and see us at Booth 323, where you can find out more about TopBraid Life Sciences Insight.

Who needs SKOS-XL? Maybe no one

2012-07-26T18:35:00.001-07:00

The SKOS-XL extension to the W3C’s SKOS standard for vocabulary management adds flexibility in how you track concept names, but it adds complexity and potential confusion that are rarely, if ever, worth it.

What is the appeal of SKOS-XL? Information modelers wanting to separate concepts (so-called conceptual ideas) from terms (the names people use for concepts) often base their thinking on the model of the Semiotic Triangle (http://en.wikipedia.org/wiki/Triangle_of_reference) or Peirce’s Triangle. Sometimes also called a triangle of meaning, this philosophy distinguishes a concept that exists in a human mind (a thought) from how it is referred to and from a symbol that evokes it.

A referent is understood as a word. A symbol is typically explained as a pictorial depiction. A key aspect of this theory is its focus on human cognition. It postulates that there can be no name or identity intrinsic to a concept as it only exists as a thought in a human mind.

The challenge of applying this thinking to information modeling is that, ultimately, in information modeling we must commit everything to paper, electronic or otherwise. Thus, every concept must have an identity and a name. As a result, a separate model for concepts and terms, where terms themselves have identity, names and relationships and are tracked separately from concepts, is typically an over-complication that does not deliver practical value. For one thing, even explaining the difference between a concept and a term to a business audience is not simple. Colloquially, these words are often used interchangeably. Once explained, distinguishing and keeping track of these on an ongoing basis, when both “concepts” and “terms” are more often than not named using the same words, can be mind-boggling.

SKOS takes a simpler and what we believe to be a more practical approach to information modeling. It provides a way to describe concepts by giving each one:

A globally unique identity
A preferred label, called skos:prefLabel, that is unique for a human language (such as English or German) within the scope of a particular “concept scheme”.
Any number of alternative labels, called skos:altLabel. A concept's alternative labels in a given language should not be the same as its preferred label in that language.
Whatever other properties (attributes and relationships) are deemed necessary:

SKOS supplies some standard relationships such as skos:broader, skos:related and skos:exactMatch and a number of annotations that are thought to be universally useful such as skos:definition and skos:editorialNote.
Users of SKOS are free to add properties specific to their domain. For example, when using SKOS to describe different companies, a user may want to add a stock ticker field.
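
A rough sketch of this model in Python may make the rules more concrete. It is only an illustration under stated assumptions: the Concept class, the example.org URI and the ex:tickerSymbol property are hypothetical names, not part of SKOS or of any TopQuadrant product, and the sketch ignores the concept scheme scope of preferred labels.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A SKOS-style concept: one URI, labels keyed by language, free-form properties."""
    uri: str                                          # globally unique identity
    pref_labels: dict = field(default_factory=dict)   # language -> skos:prefLabel string
    alt_labels: dict = field(default_factory=dict)    # language -> set of skos:altLabel strings
    properties: dict = field(default_factory=dict)    # any other property -> value

    def set_pref_label(self, text, lang):
        # Storing by language keeps at most one preferred label per language.
        if text in self.alt_labels.get(lang, set()):
            raise ValueError(f"{text!r}@{lang} is already an alternative label")
        self.pref_labels[lang] = text

    def add_alt_label(self, text, lang):
        # An alternative label must differ from the preferred label in that language.
        if self.pref_labels.get(lang) == text:
            raise ValueError(f"{text!r}@{lang} is already the preferred label")
        self.alt_labels.setdefault(lang, set()).add(text)


# Hypothetical company vocabulary entry with a domain-specific stock ticker field.
acme = Concept("http://example.org/company/acme")
acme.set_pref_label("Acme Corporation", "en")
acme.set_pref_label("Acme AG", "de")
acme.add_alt_label("Acme Corp.", "en")
acme.properties["ex:tickerSymbol"] = "ACME"
```

Note that the labels here are plain strings attached directly to the concept; nothing in the sketch needs a separate identity for a label.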

If needed, metadata about labels can be captured without giving them identity of their own. TopBraid EVN is a good example of a tool that offers this capability. Besides the language part, such metadata is typically not just about the label itself, but about its relationship to the concept—for example, who said that this is a preferred label for this concept and when. All the relationships are between concepts, not between the labels.

The W3C has published an optional extension to SKOS called SKOS-XL (SKOS eXtension for Labels) that accommodates those who want to give separate identity to concepts and terms. It does not use the word “term”—presumably, because informally terms are often understood as concepts and vice versa. Instead it introduces a class Label explained as a “lexical entity”. While the extension is small with only one new class and five new properties, its implications are far-reaching. As a result, providing tool support for SKOS-XL is considerably more complex than for SKOS proper.

Following the SKOS-XL model, labels are not strings as in SKOS proper, but RDF resources with their own identity. Each label can have only one literal form; this is where the actual text string (the name) goes. The literal form is not one per Label per language as with SKOS’s constraint for assigning preferred labels, but one per Label. So, to accommodate different languages, different label resources must be created. At the same time, there can be multiple Label resources with the same literal form (for example, two different Label resources with the literal form “Mouse”). Even a simple SKOS-XL vocabulary is considerably bulkier than its SKOS alternative. Because the SKOS-XL format takes far more space, storage, import/export, and search and query performance can become issues for larger vocabularies.

Concepts are connected to Labels by relationships that indicate preferred (skosxl:prefLabel) and alternative (skosxl:altLabel) Labels for a Concept. There are no cardinality restrictions on these relationships; that is, a Concept can be linked to multiple Labels using the skosxl:prefLabel link. Labels can be linked to each other using the skosxl:labelRelation relationship. These links are separate from the relationships between Concepts.

Direct use of SKOS properties that associate label strings with Concepts can be tricky when using SKOS-XL. According to SKOS-XL, label strings for Concepts are derived using rules such as:

The property chain (skosxl:prefLabel, skosxl:literalForm) is a sub-property of skos:prefLabel.

This means that if there is a Label ex:Label1 with literal form “love” and a Concept ex:Concept1 where ex:Concept1 connects to ex:Label1 using a skosxl:prefLabel relationship, we can conclude that ex:Concept1 has a skos:prefLabel value of “love”. Since simultaneously keeping the integrity of directly entered and inferred values is problematic, any tool supporting SKOS-XL must protect the user from directly entering label strings for Concepts. This makes it difficult to use the same tool to edit both SKOS and SKOS-XL vocabularies, especially if users want to intermix different vocabulary formats.
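
The derivation can be sketched in a few lines of Python. This is a minimal illustration, not how any particular tool implements it: triples are plain tuples, a literal is a (text, language) pair, and the function name derive_skos_labels is our own. The analogous chain for skosxl:altLabel comes from the SKOS-XL specification rather than from the rule quoted above.

```python
SKOSXL_PREF = "skosxl:prefLabel"
SKOSXL_ALT = "skosxl:altLabel"
LITERAL_FORM = "skosxl:literalForm"

# Each SKOS-XL linking property and the plain SKOS property its chain implies.
CHAINS = {SKOSXL_PREF: "skos:prefLabel", SKOSXL_ALT: "skos:altLabel"}


def derive_skos_labels(triples):
    """Return the plain SKOS label triples implied by the SKOS-XL property chains."""
    # A Label resource has exactly one literal form, so a dict is enough here.
    literal_forms = {s: o for s, p, o in triples if p == LITERAL_FORM}
    derived = set()
    for s, p, o in triples:
        if p in CHAINS and o in literal_forms:
            # Concept -> Label -> literal collapses to Concept -> literal.
            derived.add((s, CHAINS[p], literal_forms[o]))
    return derived


triples = {
    ("ex:Concept1", SKOSXL_PREF, "ex:Label1"),
    ("ex:Label1", LITERAL_FORM, ("love", "en")),
}
inferred = derive_skos_labels(triples)
# inferred now holds ("ex:Concept1", "skos:prefLabel", ("love", "en"))
```

Because the derived value is computed from the Label resource, a label string entered directly on the Concept would be a second, independent source of the same property, which is the integrity problem described above.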

Furthermore, a user will see the same text label for different entities. This will be not only because different Labels can have the same literal forms, but also because the Concept resources “inherit” string labels from the associated Label resources. This can easily lead to confusing results.

There are also various integrity clashes between SKOS and SKOS-XL. For example:

1. Two different preferred labels in the same language

ex:Concept1 skosxl:prefLabel ex:Label1 ; skosxl:prefLabel ex:Label2 .
ex:Label1 skosxl:literalForm "love"@en .
ex:Label2 skosxl:literalForm "adoration"@en .

This is not “wrong” according to SKOS-XL because a Concept can be connected to multiple Labels using the skosxl:prefLabel relationship. But it means that ex:Concept1 has skos:prefLabel values of both "love"@en and "adoration"@en. This is a violation of SKOS constraint S14, which prohibits a concept from having more than one preferred label string in a given language.

2. Clash between preferred and alternative labels

ex:Concept1 skosxl:prefLabel ex:Label1 ; skosxl:altLabel ex:Label2 .
ex:Label1 skosxl:literalForm "love"@en .
ex:Label2 skosxl:literalForm "love"@en .

Again, this is not “wrong” according to SKOS-XL because different Labels can have the same literal form, but it’s a problem for SKOS because it implies identical English language preferred and alternative label strings for ex:Concept1.
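
Both clashes can be detected mechanically once the label strings have been derived. The Python sketch below is one way to do it under simplifying assumptions: triples are tuples, literals are (text, language) pairs, and the function name skos_label_clashes and the wording of its reports are hypothetical.

```python
from collections import defaultdict

CHAINS = {"skosxl:prefLabel": "skos:prefLabel", "skosxl:altLabel": "skos:altLabel"}


def skos_label_clashes(triples):
    """Derive plain SKOS labels from SKOS-XL triples and report the clashes."""
    forms = {s: o for s, p, o in triples if p == "skosxl:literalForm"}
    pref = defaultdict(set)   # (concept, language) -> derived preferred label strings
    alt = defaultdict(set)    # (concept, language) -> derived alternative label strings
    for s, p, o in triples:
        if p in CHAINS and o in forms:
            text, lang = forms[o]
            target = pref if CHAINS[p] == "skos:prefLabel" else alt
            target[(s, lang)].add(text)

    problems = []
    for key, texts in pref.items():
        if len(texts) > 1:
            # More than one preferred label string in one language.
            problems.append(("multiple prefLabels", key, sorted(texts)))
        overlap = texts & alt.get(key, set())
        if overlap:
            # The same string is both preferred and alternative.
            problems.append(("prefLabel equals altLabel", key, sorted(overlap)))
    return problems


example1 = [
    ("ex:Concept1", "skosxl:prefLabel", "ex:Label1"),
    ("ex:Concept1", "skosxl:prefLabel", "ex:Label2"),
    ("ex:Label1", "skosxl:literalForm", ("love", "en")),
    ("ex:Label2", "skosxl:literalForm", ("adoration", "en")),
]
example2 = [
    ("ex:Concept1", "skosxl:prefLabel", "ex:Label1"),
    ("ex:Concept1", "skosxl:altLabel", "ex:Label2"),
    ("ex:Label1", "skosxl:literalForm", ("love", "en")),
    ("ex:Label2", "skosxl:literalForm", ("love", "en")),
]
```

Calling skos_label_clashes on example1 reports the two preferred labels, and on example2 it reports the shared string "love". A SKOS-XL editing tool would need checks of this kind on every change, which is part of the extra tooling cost.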

Without a doubt, these issues play a role in the fact that while the use and tool support for SKOS is growing, there are few if any tools for SKOS-XL or published SKOS-XL vocabularies. An even more important factor is the lack of compelling business value that would justify SKOS-XL complexity. Having talked to a wide range of users working on business vocabularies, we have yet to hear a use case that cannot be supported by SKOS alone.

Does SKOS-XL look like the only viable approach to your vocabulary management needs? Let’s discuss it—maybe we can help you find a simpler solution.

New white paper: Controlled vocabularies, taxonomies, and thesauruses (and ontologies)

2012-05-29T04:16:00.000-07:00

What's the difference between a controlled vocabulary, a taxonomy, a thesaurus, and an ontology? We've found people using some of these terms interchangeably, and it's difficult to find reliable official definitions for each.
We've come up with some unofficial working definitions that have served us well when discussing different kinds of controlled vocabularies and their associated metadata. Our new white paper Controlled vocabularies, taxonomies, and thesauruses (and ontologies) describes each of these and which TopQuadrant products give customers the control they need over the kinds of vocabularies and metadata that they're working with.
We hope that this short white paper gives you a better idea of why people build, use, and store controlled vocabularies and the advantages that standards bring to this work.

Data cathedrals versus information bazaars?

2012-05-13T08:03:00.003-07:00

Enterprises create data cathedrals with an enforced dogma to control data purity, causing much information to be outside their walls where informal information bazaars thrive. These information bazaars have suspect quality and uncertain provenance, yet are responsive to users’ needs. Metcalfe's law suggests that the benefit gained from integrated information grows geometrically¹ with the number of data communities that are integrated. How can we balance the dogma of the data cathedrals and the spontaneity of the information bazaar?

Enterprise database cathedrals reflect corporate dogma. Nothing gets changed without approval from on high. Change is very slow. New databases only get integrated after a considerably long time, and only on the assumption that the new data is 100% squeaky clean. So there are a lot of databases that are entirely outside the database cathedrals' walls. Badly behaved sources of data might even be excommunicated.

Where does the other data go? It is not as though this other data does not exist, although many would like to pretend it to be so. Instead they are all in the information bazaar. Anyone with any information can set up their own information stall, and store their own data in Excel, Access, anywhere they want. They only specialize in their own data for their own use. This data is pretty good because that is all they need for their business. They share well with others but on a barter basis. In fact the information bazaar is chaotic, but lively, always changing to users’ demands, and a fun place to be.

Why do we have the conflict between the database cathedral and the information bazaars?

The data cathedral offers security, quality, and good provenance. It provides the system of record for users who then should have complete confidence in their decision making. It does this using accurate relational models capturing enterprise information. But a relational model is designed by the cathedral hierarchy based on the closed model: only pure data can be entered into the database; impure data can lead to excommunication.

The information bazaar has few rules of entry. As demonstrated by the web, it allows anyone to say anything about anything (AAA). Even with this deficiency we will regularly search the web to help us with our decision making, not exploring sources that are suspect, and filtering information that we feel lacks accuracy until we end up with information to support our decision.

Can we resolve these conflicting objectives?

Can we expect the cathedral hierarchy to relax its admittance criteria to let in as much of the information bazaar as possible? Somewhat, but we cannot expect miracles.

Can we expect the information bazaar to become more sober and responsible so that it can securely provide information with guaranteed quality and provenance? Somewhat, but we cannot expect an evangelical conversion.

Really this is not optimal, because the benefit of having data integrated grows geometrically with the number of interconnected sources, yet the database cathedral cannot grow because the information bazaar does not meet their purity dogma.

So how can these conflicting objectives be redeemed?

One path to redemption is to unite the information bazaar through a common semantic model. This allows all information to be available within a universal graph (model). Of course some riff-raff will get in, but again that is an advantage for the semantic model as you can also declare rules that will verify the accuracy of the data even though it is already stored.

At the same time the data cathedral can continue to expand, hopefully at faster pace, by integrating those graphs that meet their criteria.

However, we can allow users to access both the data cathedral, from which they can obtain the system of record, and the information bazaar. We could even report results federated from the two data sources, annotating the information from the information bazaar with its provenance and hence its less certain data quality. Doing this in a standards-compliant way turns existing enterprise information resources into connectable, responsive and interoperable semantic assets.

Harmony

Using this approach we don’t need to force the data cathedral to relax its dogma, nor do we ask the information bazaar to shut down. Yet we can offer users access to 99% of the enterprise information, providing users the 'Metcalfe'¹ benefits of full integration. As semantic assets grow and connect, they enable a resilient semantic ecosystem of meaningful interactions between people, applications and data irrespective of the differences in structures, data schemas, governance and technologies. The dividing boundaries between the cathedral and the bazaar no longer need to be obstacles to information users. A semantic ecosystem seamlessly embraces and provides integrated access to data cathedrals and information bazaars alike.

¹ If I have 10 database systems running my business that are entirely disconnected, then the benefits are 10 * K, where K is some constant. If I integrate these databases in pairs (operations + accounting, accounting + payroll, etc.), then the benefits increase to 10 * K * 2. If I integrate in threes (operations + accounting + maintenance, accounting + payroll + receiving, etc.), then the benefits increase four-fold (a corollary of Metcalfe's law) to 10 * K * 4. For quad-wise integration my benefits would be 10 * K * 8 and so on. Now it might not be 8-fold, but the point is that there is a geometric, not linear, growth in benefits as I integrate all of my information across my organization.
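
The arithmetic in this footnote can be written out as a short Python sketch. The doubling per extra system in each integrated group is the footnote's illustrative assumption, not a measured result, and the function name integration_benefit and its parameters are hypothetical.

```python
def integration_benefit(systems, group_size, unit_benefit=1.0):
    """Benefit when `systems` databases are integrated in groups of `group_size`.

    Follows the footnote's assumption: 10 * K for disconnected systems,
    doubling each time one more system joins every integrated group.
    """
    return systems * unit_benefit * 2 ** (group_size - 1)


for group_size in range(1, 5):
    print(group_size, integration_benefit(10, group_size))
# prints: 1 10.0, 2 20.0, 3 40.0, 4 80.0 (one pair per line)
```

Linear growth would add the same amount at each step; here each step multiplies the previous benefit by two, which is what "geometric" means in this context.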

Can semantic technology melt process industry’s icebergs of information?

2012-04-30T15:47:00.001-07:00

Icebergs of information loiter throughout process manufacturing IT waiting to sink any information integration project. The impact of semantic technologies is being felt in medicine, life sciences, intelligence, and elsewhere but can it solve this problem in process manufacturing? The ability to federate information from multiple data-sources into a schema-less structure, and then deliver that federated information in any format and in accordance with any standard schema uniquely positions semantic technology. Is this a sweet spot for semantic technologies?

Process Manufacturing Application Focus over the Years

Over the years we have been solving problems within process manufacturing IT only to uncover more problems. Once the problem was that of measurement data in silos, which was solved by the introduction of real-time data historians. However, that created the problem of data visibility, solved by the introduction of graphical user interfaces. This introduced data overload, which was partially solved by the introduction of analytical tools to digest the information and produce diagnostics. Unfortunately these tools were difficult to deploy across all assets within an organization, so we have been trying to solve that problem with information models. The current problem is how to convert the diagnostics into actionable knowledge with the use of work-flow engines, and how to ensure the sustainability of applications as solutions increase in complexity.

Process Manufacturing Application Problems and Solutions over the Years
Period	1985-1995	1990-2000	1995-2005	2000-2010	2005-2015	2010-
Problem	Measurement data in silos	Data access and visualization	Analysis and business intelligence	Contextualized information	Consistent actioning	Sustainability
Industry Response	Real-time databases collecting measurements (proprietary)	Graphical user interfaces, trending and reporting tools (proprietary)	Analytical tools to digest data into information and diagnostics	Plant data models (ProdML, ISA-95, ISO15926, IEC 61970/61968, Proprietary)	ISO-9001	Outsourcing Standards
Consequence	Data but no user access	Data overload	Deployability of analysis to all assets	Interpretation limited to experts	Complexity, much more than RTDB, limiting sustainability	Improved ongoing application benefits

However it is not only the increased technological complexity that is causing problems. Business decisions now cross many more business boundaries. When measurement data was trapped in silos we were content with unit-wide or plant-wide data historians. Now a well performance problem might involve a maintenance engineer located in Houston accessing a Mimosa[1]-based maintenance management system, an operations engineer located in Aberdeen accessing an OPC-UA[2]-based data historian, a production engineer located in London accessing a custom system driven by WITSML[3]-based feeds, and a facilities engineer using an ISO-15926[4] facilities management model. Not only are the participants in different locations and business units, but they also rely on different systems using different models to support their decision making. However they all should be talking about the same well, measured by the same instruments, producing the same flows, and processed by the same equipment.

The problem is that these operational support systems are not simply data silos whose homogeneous data we need to merge into one to answer our questions. In fact these operational support systems are icebergs of information. Above the surface they publish a public perspective focused on the core operational function of the application. However this data needs context, so below the surface is much of the same information that is contained in other systems. This information provides the context to the operational data so that the operational system can perform its required functions. For example the historian needs to know something about the instruments that are the source of its measurements; maintenance management systems need to know not only about the equipment to be maintained but the location of that equipment, physically and organizationally.

Figure 1: Icebergs of Information

Icebergs of information are not limited to the operational data stores deployed in organizations. An essential practice in these days of interoperability requirements is the adoption of model standards. However, even these exhibit the same problems, as shown by the diagram below. This diagram maps each available standard to its focus within the hydrocarbon supply chain.

Figure 2: Multiple Overlapping Model Standards

Increasing regulatory and competitive demands on the business are forcing decision making to be more timely, and to be more integrated across the traditional business boundaries. However these icebergs are getting in the way of effective decision making.

One way to make any or all of this information available to consumers is to create the bigger iceberg. ‘Simply’ create the relational database schema that covers every past, current, and future business need, and build adapters to populate this database from the operational data stores. Unfortunately this mega-store can only get more complex as it has to keep up with an expanding scope of information required to support the decision making processes.

Figure 3: Integration using the Bigger Iceberg

Alternatively we can keep building data-marts every time someone has a different business query. However these do not provide the timeliness required to support operational decision making.

The Need for a Babel-Fish

We cannot meet the needs of the business, and solve its decision making needs, by having one mega-store because it will never keep up with the changing business requirements. Instead we need a babel-fish (with thanks to The Hitchhiker's Guide to the Galaxy).

This babel-fish can consume all of the different operational data in different standards, and translate them into any standard that the end-consumer wants. Thus the babel-fish will need to know that OPC UA's concept 'hasInstrument' has the same meaning as Mimosa's concept of 'Instrumented'. Similarly, 10FIC107 from an OPC UA provider is the same as 10-FIC-1-7 from Mimosa.

1. Information providers (operational data stores) within the business will want to provide information according to their capabilities, but preferably using the standards appropriate for their application. For example, measurements should use OPC UA and maintenance should use Mimosa.

2. Information consumers will want to consume information in the form of one or more standards appropriate for their application.

Figure 4: Integration babel-fish
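
As a minimal sketch of what the babel-fish has to know, the Python below translates a statement from one standard's vocabulary to another through a shared canonical name. Everything here is an assumption made for illustration: the table layout, the canonical names, the asset name "Well-A" and the function names are hypothetical, and real OPC UA and Mimosa models are far richer than a pair of lookup tables.

```python
# Hypothetical equivalence tables: (standard, local name) -> shared canonical name.
PROPERTIES = {
    ("opcua", "hasInstrument"): "instrument",
    ("mimosa", "Instrumented"): "instrument",
}
TAGS = {
    ("opcua", "10FIC107"): "FIC-107",
    ("mimosa", "10-FIC-1-7"): "FIC-107",
}


def to_local(table, standard, canonical):
    """Find the name a given standard uses for a canonical concept."""
    for (std, local), shared in table.items():
        if std == standard and shared == canonical:
            return local
    raise KeyError(f"{standard} has no name for {canonical}")


def translate(statement, source, target):
    """Translate an (asset, property, tag) statement between two standards."""
    asset, prop, tag = statement
    shared_prop = PROPERTIES[(source, prop)]   # source name -> canonical name
    shared_tag = TAGS[(source, tag)]
    return (asset,
            to_local(PROPERTIES, target, shared_prop),
            to_local(TAGS, target, shared_tag))


opcua_fact = ("Well-A", "hasInstrument", "10FIC107")
mimosa_fact = translate(opcua_fact, "opcua", "mimosa")
# mimosa_fact is ("Well-A", "Instrumented", "10-FIC-1-7")
```

The equivalences are stated explicitly rather than guessed from the spelling of the tags, because identifiers that refer to the same instrument need not follow any common naming pattern.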

The Semantic/RDF model comes to the rescue

First of all, a definition: a semantic model means organizing all data and knowledge as RDF triples {subject, property, object}. Thus {:Peter, :hasAge, 21^^:years} and {:Pump101, :manufacturedBy, :Rotek} are examples of RDF triples. RDF triples can be persisted in a variety of ways: SQL tables, custom organizations, NoSQL, XML files and many more. If we were designing a relational database to hold these RDF triples we would only have one ‘table’, so it may appear that we have no schema in the relational database design sense, where we have key relationships to enforce integrity and unique indices to enforce uniqueness. However, we can add other statements about the data such as {:Pump101, :type, :ReciprocatingPump} and {:ReciprocatingPump, :subClassOf, :Pump}[5]. Used in combination with a reasoner we can infer consequences from these asserted facts, such as :Pump101 is a type of :Pump, and Peter is not a :Pump, despite rumors to the contrary. These triples can be visualized as the links in a graph with the subject and object being the nodes of the graph, and the property the name of the edge linking these nodes:

Figure 5: RDF Triples as a graph
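To make the reasoner idea concrete, here is a minimal Python sketch, not how any particular RDF store or reasoner is implemented. It keeps the triples from the text as plain tuples and applies a single rule: an instance of a class is also an instance of that class's superclasses. The function name infer_types is my own, and the age is stored as a bare number for simplicity.

```python
# Triples are plain (subject, property, object) tuples, using the labels from the text.
TYPE, SUBCLASS_OF = ":type", ":subClassOf"

triples = {
    (":Peter", ":hasAge", 21),
    (":Pump101", ":manufacturedBy", ":Rotek"),
    (":Pump101", TYPE, ":ReciprocatingPump"),
    (":ReciprocatingPump", SUBCLASS_OF, ":Pump"),
}


def infer_types(graph):
    """Return the graph plus every type triple implied by the subclass hierarchy."""
    inferred = set(graph)
    changed = True
    while changed:  # repeat until no new triple can be added
        changed = False
        for s, p, o in list(inferred):
            if p != TYPE:
                continue
            for child, p2, parent in list(inferred):
                if p2 == SUBCLASS_OF and child == o and (s, TYPE, parent) not in inferred:
                    inferred.add((s, TYPE, parent))
                    changed = True
    return inferred


closure = infer_types(triples)
print((":Pump101", TYPE, ":Pump") in closure)  # True
print((":Peter", TYPE, ":Pump") in closure)    # False
```

Note that the second result only shows that nothing lets us infer Peter is a pump; the sketch has no way of asserting a negative fact.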

Over the years, new modeling metaphors have been introduced to solve perceived or actual problems with their predecessors. For example the Relational Model had perceived difficulties associated with reporting, model complexity, flexibility, and data distribution. A semantic model helps solve these problems.

Figure 6: Evolution of Model Metaphors

· In response to the perceived reporting issues, OLAP techniques were introduced along with the data warehouse. This greatly eased the problem of user-reporting, and data mining. However it did introduce the problem of data duplication.

o A semantic model can query against a federated model in which information is distributed throughout the original data sources.

· In response to the perceived complexity issues, various forms of object-orientated modeling were introduced. There is no doubt that it is easier to think of one’s problem in terms of an object model rather than a complex relational or ER model, especially when there are a large number of entities and relations.

o The semantic model is built around the very simple concept of statements of facts such as {:Peter, :hasAge, 21^^:years}, and {:Pump101, :manufacturedBy, :Rotek} combined with statements that describe the model such as {:Pump101, :type, :ReciprocatingPump} and {:ReciprocatingPump, :subClassOf, :Pump}.

· The model flexibility problem occurs when, after the model has been designed, the business needs the model to change. In response to this flexibility issue, the choice is to make the original model anticipate all potential uses but then risk complexity, or use an object-relational approach in which it is possible to add new attributes without changing the underlying storage schema.

o In semantic models these relationships are expressed in triples, using RDFS, SKOS, OWL, etc. Thus RDF is also used as the physical model (in RDF stores, at least).

· There have been various responses to data distribution.

o In the relational world there is not much choice other than to replicate the data from heterogeneous data stores using Extract-Transform-Load (ETL) techniques. In the case of homogeneous but distributed databases, distributed queries are possible, although they require intimate knowledge of all the schemas in all of the distributed databases.

o In the object-orientated world we are in a worse situation: it is very difficult to manage a distributed object in which different objects are distributed or attributes are distributed.

The good news is that a semantic approach is the ideal (or even the only) approach that can solve the information integration problem as follows:

1. Convert to RDF normal form: Convert all source data into RDF. The data can be left at source and fetched on demand (federated) or moved into temporary RDF storage

There are already standard ways of doing this for any spreadsheet, relational database, XML schema, and more. For example, TopBraid Suite (http://www.topquadrant.com/products/TB_Suite.html) provides converters and adaptors for all common data sources. It is relatively easy to create more mappings such as OPCUA. The dynamic adapters act as SPARQLEndpoints[6].

2. Federated data model: Create 'rules' that map one vocabulary to another.

The language of these rules would be RDFS, SKOS and OWL. For example you can declare {OPCUA:hasInstrument, owl:sameAs, Mimosa:Instrumented}. Note that these are simply additional statements expressed in RDF which are then used by a reasoner to infer the consequences such as :FI101 is actually the same as :10FIC101.
More sophisticated rules can also be created directly using RDF and SPARQL. For some examples, see SPIN or SPARQL Rules at http://spinrdf.org/ and http://www.w3.org/Submission/2011/SUBM-spin-overview-20110222/

3. Chameleon data services: Create consumer queries that extract the information from the combined model into the standard required using SPARQL queries.

For example even though all instrument data is in OPCUA, a consumer could use a Mimosa interface to fetch this data. The results can then be published as web-services for consumption by external applications using SPARQLMotion (http://www.topquadrant.com/products/SPARQLMotion.html)

Figure 7: Federation End-to-End

Let’s look into these steps in detail:

Convert to RDF normal form

Despite the fact that data will be stored in different formats (relational, XML, object, Excel, etc) according to different schemas they can always be converted into RDF triples. Always is a strong word, but it really does work. There are already ways of doing this for any spreadsheet, relational database, XML schema, and more and it is relatively easy to create more mappings such as OPC-UA. The data can be left at source and fetched on demand (federated) or moved into temporary RDF storage. For example, TopBraid Suite (http://www.topquadrant.com/products/TB_Suite.html) provides converters and adaptors for all common data sources.

Figure 8: Conversion to RDF Normal Form
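As a rough illustration of how mechanical this conversion can be, the Python sketch below turns the rows of a small table into triples: the key column supplies the subject, every other column name becomes a property, and each cell becomes an object. The CSV content, column names and the function rows_to_triples are invented for this example. A real converter would also distinguish literals from resources and mint proper URIs.

```python
import csv
import io

# Hypothetical equipment export; the column names and values are assumptions.
CSV_TEXT = """tag,manufacturedBy,type
Pump101,Rotek,ReciprocatingPump
Pump102,Rotek,CentrifugalPump
"""


def rows_to_triples(csv_text, key_column):
    """Turn each non-key cell of each row into one (subject, property, object) triple."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = ":" + row[key_column]
        for column, value in row.items():
            if column != key_column and value:
                triples.append((subject, ":" + column, ":" + value))
    return triples


for triple in rows_to_triples(CSV_TEXT, "tag"):
    print(triple)
```

Two rows with two non-key columns each yield four triples, and no part of the function depends on which columns the table happens to have.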

Federated Data Model

A federated data model allows different graphs (aka databases) to be aggregated by linking the shared objects. This applies to real-time measurements (OPC-UA), maintenance (MIMOSA), production data (ProdML), or any external database. We can visualize this as combining the graphs of the individual operational data stores into a single graph.

Of course there will be vocabulary differences between the different data-sources. For example, in the OPC-UA data-source you might have a property OPCUA:hasInstrument, and in a MIMOSA data-source the equivalent is called Mimosa:Instrumented. So the federated data model incorporates 'rules' that map one vocabulary to another. The language of these rules would be RDFS, SKOS, and OWL. For example, in OWL, you can declare {OPCUA:hasInstrument owl:sameAs Mimosa:Instrumented}. Note that these are simply additional statements expressed as RDF triples which are then used by a reasoner to infer consequences such as :FI101 is actually the same as :10FIC101.

There will also be identity differences between the different data-sources. These can also be handled by additional statements, such as {:TANK#102, owl:sameAs, :TK102 }. This allows a reasoner to infer that the statement {:TK102, :has_price, 83^^:$} also applies to :TANK#102, implying {:TANK#102, :has_price, 83^^:$}.

Figure 9: Information Federated from Multiple Data Sources
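The effect of these mapping statements can be sketched in a few lines of Python. This is only an approximation of what an OWL reasoner does with owl:sameAs: it copies each statement onto every term declared to be the same as one of its terms. The function same_as_closure and the :Tank101 instrument triple are assumptions made for the example. The mapping statements themselves are the ones given in the text.

```python
# Two source graphs; the instrument triple is illustrative, the price triple is from the text.
opcua = {(":Tank101", "OPCUA:hasInstrument", ":10FIC101")}
mimosa = {(":TK102", ":has_price", 83)}

# Mapping 'rules' are just more triples stored alongside the data.
mappings = {
    ("OPCUA:hasInstrument", "owl:sameAs", "Mimosa:Instrumented"),
    (":TANK#102", "owl:sameAs", ":TK102"),
}


def same_as_closure(graph):
    """Copy every statement onto each term declared owl:sameAs one of its terms."""
    aliases = {}
    for s, p, o in graph:
        if p == "owl:sameAs":
            aliases.setdefault(s, set()).add(o)
            aliases.setdefault(o, set()).add(s)
    result = set(graph)
    changed = True
    while changed:  # repeat so that chains of aliases are followed
        changed = False
        for triple in list(result):
            if triple[1] == "owl:sameAs":
                continue
            for i, term in enumerate(triple):
                for alias in aliases.get(term, ()):
                    new = triple[:i] + (alias,) + triple[i + 1:]
                    if new not in result:
                        result.add(new)
                        changed = True
    return result


federated = same_as_closure(opcua | mimosa | mappings)
print((":TANK#102", ":has_price", 83) in federated)                    # True
print((":Tank101", "Mimosa:Instrumented", ":10FIC101") in federated)   # True
```

A consumer that only knows the Mimosa vocabulary, or only the :TANK#102 identifier, can now find the data without knowing which source it came from.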

Chameleon Data Services

To extract information from the federated graph, the best choice is SPARQL, the semantic equivalent of SQL, only simpler. Whilst SQL allows one to query the contents of multiple tables within a database, SPARQL matches patterns within the graph. With SQL we need to know to which table each field belongs. With SPARQL we define the graph pattern that we want to match, and the query engine will search throughout the federated graphs to find the matches. In the example illustrated below we do not need to know that the price attribute comes from one data source, whilst the volume comes from another. In fact SPARQL allows even further flexibility. The price attribute for Tank#101 could come from a different data source than the price attribute for Tank#102. This is part of the magic of the semantic technology.

Figure 10: Graph Pattern matching with SPARQL
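For readers who have not seen graph pattern matching before, the following Python sketch imitates the core of it. It is not a SPARQL engine. Terms starting with "?" are variables, and a solution is any set of variable bindings that satisfies every triple pattern at once. The helper names match and query are my own, and all the tank data is invented except the price of 83.

```python
def match(pattern, triple, bindings):
    """Try to extend bindings so that pattern equals triple; return None on failure."""
    new = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if isinstance(p_term, str) and p_term.startswith("?"):
            # A variable binds on first use and must agree with itself afterwards.
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new


def query(patterns, graph):
    """Return every set of variable bindings that satisfies all triple patterns."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in graph
            if (extended := match(pattern, triple, bindings)) is not None
        ]
    return solutions


# Two hypothetical data sources; the query never mentions which one holds what.
finance = {(":Tank101", ":has_price", 75), (":Tank102", ":has_price", 83)}
operations = {(":Tank101", ":has_volume", 1200), (":Tank102", ":has_volume", 900)}

results = query(
    [("?tank", ":has_price", "?price"), ("?tank", ":has_volume", "?volume")],
    finance | operations,
)
for row in sorted(results, key=lambda r: r["?tank"]):
    print(row["?tank"], row["?price"], row["?volume"])
```

The shared ?tank variable is what joins price and volume, playing the role that a key and an explicit table join would play in SQL.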

SPARQL can be used to directly query the federated graph for reporting purposes. However, most consumers of the information will expect to interface to a web-service, with SOAP or REST being the most popular. These services do not have to be programmed. Instead they can be declared using SPARQLMotion (http://www.sparqlmotion.org) to produce easily consumed and adaptable web-services. The designer for SPARQLMotion is shown below:

Figure 11: Example SPARQLMotion

Semantic/RDF advantages for Process Manufacturing

Despite solving a complex data integration problem, Semantic/RDF is inherently simpler. Can there be anything simpler than storing all knowledge as RDF triples? Despite this simplicity, we do not lose any expressivity.

There is no predefined schema to limit flexibility. However the schema rules, encoded as tables and keys in the relational model, can still be expressed using RDFS, OWL, and SKOS statements.

Deconstructing all information into statements (triples) allows data from distributed sources to be easily merged into a single graph.

Any information model can be reconstructed from the merged graph using SPARQL and presented as web-services (SOAP or REST).

[1] MIMOSA is a not-for-profit trade association dedicated to developing and encouraging the adoption of open information standards for Operations and Maintenance in manufacturing, fleet, and facility environments. MIMOSA's open standards enable collaborative asset lifecycle management in both commercial and military applications.

[2] The Unified Architecture (UA) is THE next generation OPC standard that provides a cohesive, secure and reliable cross platform framework for access to real time and historical data and events.

[3] WITSML™ (Wellsite Information Transfer Standard Markup Language) is an industry initiative to provide open, non-proprietary, standard interfaces for technology and software that monitor and manage wells, completions and workovers.

[4] ISO 15926 provides integration of life-cycle data for process plants including oil and gas production facilities.

[5] I should really be using URIs instead of text labels for subject, property, and objects, but the intent of the semantic model is conveyed more simply if we avoid identifiers like ‘http://www.example.org/equipment#Pump101’ and use :Pump#101

[6] SPARQL is a query language for RDF. A SPARQL endpoint is a protocol service that makes it possible to query a data source using SPARQL. The source itself does not need to be in RDF. It can, for example, be a traditional relational database. Later in this article we will describe SPARQL in more detail and show some query examples.

Publishing HTML created with SPARQL Web Pages

2011-12-02T07:09:00.000-08:00

Last week we saw how to use SPARQL Web Pages (SWP) to render customized HTML of individual class instances and how to create a web page of all that class's instances with a title at the top. The fine-grained control that SWP gives us over the generated HTML let us take advantage of the jQuery Mobile libraries so that the sample TopBraid application generated web pages appropriate for a smartphone interface, with buttons that expand and collapse at your touch to display details about each class instance.

Testing this application meant choosing from two alternatives:

The first was to run it on TopBraid Composer's built-in TopBraid Live Personal Server, which let us look at the page from any web browser running on the same machine.
The second was to upload the application's project to a TopBraid Live Enterprise Server, where multiple devices, including phones, could access it.

Either way, because TopBraid Live generates these web pages dynamically, if the underlying data is changed, refreshed versions of the web page would reflect this, making TopBraid a great platform for interactive semantic web applications for any device.

You don't have to have a TopBraid Live Enterprise server to deliver pages generated by SWP, though. A simple SPARQLMotion script can save your formatted HTML in disk files that you can copy to a web server that may or may not have TopBraid Live installed. Using this technique, you can use the TopBraid platform to create semantic content publishing applications as well as interactive applications.

The following SPARQLMotion script, which is stored in the application file described last week, does this for the mobile Kennedys web application.

The first module is an sml:ImportRDFFromWorkspace module that reads the file that this script is stored in. That file has the Kennedys data and the SWP formatting markup so that this data can be fed to the next step in the process.

The second module, named mk:GenerateHTML, is an sml:CreateUISPINDocument SPARQLMotion module (from the Text Processing section of the SPARQLMotion palette) whose key setting is its sml:view property, which has the following:

<ui:resourceView
   ui:resource="&lt;http://topbraidlive.org/mobileKennedys&gt;"/>

It's a snippet of XML specifying that the module should create a resource view for the specified resource, which is identified here with a complete URI. (The URI's delimiting angle brackets are escaped because they're in an XML attribute.) The real work to make this happen was all described in the last blog entry, which showed how the SWP code to generate a complete web page was attached to the resource. The mk:GenerateHTML module in this script also specifies that this generated markup will be stored in a variable named doc.

The final mk:SaveFile module in the script is an sml:ExportToTextFile module that saves the contents of the doc variable (set in the module's sml:text property as the SPARQL expression ?doc) to a file called output.html. I also set sml:replace to true so that repeated execution of the script wouldn't append the output onto the result of previous runs.

After you run this script you'll have a web page called output.html that looks like the display shown in the phone browsers in last week's blog entry, and you can copy this file to any web server you want.
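If you think better in a general-purpose language, the shape of this three-module script is roughly the Python below. This is only an analogy and has nothing to do with how SPARQLMotion is implemented. The function names render_page and save_file are made up, and the page is far simpler than the SWP output. The point is the last step: writing the doc value to output.html in a way that replaces earlier output instead of appending to it.

```python
import html
import tempfile
from pathlib import Path


def render_page(title, names):
    """Build a complete HTML document as a string (stand-in for the GenerateHTML step)."""
    items = "\n".join(f"    <li>{html.escape(name)}</li>" for name in names)
    return (
        f"<html>\n<head><title>{html.escape(title)}</title></head>\n"
        f"<body>\n  <ul>\n{items}\n  </ul>\n</body>\n</html>\n"
    )


def save_file(text, path):
    """Write text to path, replacing any earlier output rather than appending to it."""
    Path(path).write_text(text, encoding="utf-8")


# A temporary directory keeps the example from touching real files.
out = Path(tempfile.mkdtemp()) / "output.html"
doc = render_page("Kennedys List", ["Andrew Cuomo", "Edward Kennedy Jr"])
save_file(doc, out)
save_file(doc, out)  # a second run leaves exactly one copy of the page
print(out.read_text(encoding="utf-8") == doc)  # True
```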

This script is very simple. As you bring other SPARQLMotion capabilities into it such as inferencing and reading from all the data formats that TopBraid understands, you can make it much more sophisticated. You can also configure the script to save a collection of multiple files, letting you publish large collections of data in pieces that are digestible for typical browsers. (Phone browsers in particular can get sluggish; my Android LG Ally is not a recent model, and the expanding and collapsing of information about each person on the display of this app is not as quick on the Ally as I'd like it to be.)

So, use your imagination to add new features to this SPARQLMotion script, and you can create dynamic or static web pages for phones or any other kinds of browsers, with all the power of TopBraid behind your application development.

Creating a TopBraid mobile web app with SPARQL Web Pages

2011-11-22T07:18:00.000-08:00

I've written here before about how SPARQL Web Pages (SWP) let you convert your RDF to HTML or XML by embedding SPARQL queries into the appropriate markup. In that very simple example, I showed how to create a web page for an address book entry and then display it both in TopBraid Composer and in a regular web browser.

Today I'm going to show how I did something similar to display a single Person instance from the Kennedys sample data included with TopBraid Composer and then defined a page that showed all the people in that data model. You can download and try the project here. The fun part was displaying it so that it looks like a proper mobile web page on a phone's web browser, as shown here on an Android phone and on an iPhone turned sideways to test the re-orienting capability of the display.

Touching someone's name on the phone expands the display to show the remaining property names and values about that person underneath his or her name. In the picture, I've just touched Andrew Cuomo's name on the Android phone and Edward Kennedy Jr's name on the iPhone, displaying details about each of them below their names. Touching the names again hides their data.

In the picture, the two phone browsers are displaying the output of a TopBraid Live server running this application. As we'll see in the sequel to this blog entry, you can use the same SPARQL Web Page configuration to save HTML disk files with all of this formatting so that the phone browsers could view the static web pages stored on a server that didn't have TopBraid Live installed.

To enable proper mobile display, I used the jQuery Mobile library. jQuery is a set of JavaScript and CSS libraries designed to let you add sophisticated user interfaces to your web pages without worrying about cross-browser compatibility, and jQuery Mobile is a branch of this project specialized for mobile phones. You don't need to know any JavaScript or CSS to use these libraries; if you're happy with one of their display configurations, using these libraries is usually just a matter of including the right file links in your HTML's head element and then setting certain attributes in your HTML elements to reference the libraries.

I began this application by creating an RDF/SPARQLMotion file in TopBraid Composer with a base URI of http://topbraidlive.org/mobileKennedys. I needed SPARQLMotion for the script that creates the static disk file version of the Kennedys display that we'll learn about next week. Next, I imported the kennedys.rdf model from the /TopBraid/Examples folder in the Navigator view. I also imported the SWP html.rdf and tui.rdf models from the Navigator's /TopBraid/UISPIN folder. (This all works the same when the files to import are Turtle ttl files instead of RDF/XML files.)

After importing the necessary files, the next step was to set up the display of data about a Person instance. After importing the files described above, clicking on kennedys:Person under owl:Thing on the Class view shows that the presence of the SWP libraries has added a ui:instanceView property to the kennedys:Person class form. I could have put the HTML to display a person here, like I did with the address book display in the blog entry mentioned above, but for greater flexibility, I created a separate PersonView class to store this markup and pointed at this class from the Person class's ui:instanceView value.

I created this mk:PersonView class (I had assigned the prefix "mk:" to the URI http://topbraidlive.org/mobileKennedys#) as a child of the ui:Element class, which is a child of the ui:Node class added by the SWP libraries. The ui:prototype property on this class's form is the place for the formatting code and markup, but I did a few setup steps before setting it:

Because the app needs to pass a parameter to the code in ui:prototype specifying which person to display, I had to define that parameter. To do this, I created an sp:person child of the sp:arg property in the Properties view to represent the person argument value passed to the prototype. Next, I dragged the new property from the Properties view to the spin:constraint property name on the mk:PersonView form to indicate that this would store the argument passed to the code and markup used to display a single person. This displays the "Create from SPIN template" wizard with all the values filled out the way I needed them, so I just clicked the OK button.
jQuery implements some of its magic with HTML extension attributes named data-collapsed and data-role. TopBraid Composer helps you assemble proper HTML by flagging any non-HTML markup, and it won't like these because they're not declared as HTML 4 properties. So, I declared them myself by making two clones of the html:class property (a subproperty of html:attributes) and renaming them html:data-collapsed and html:data-role. This way, TopBraid Composer wouldn't prevent me from saving HTML markup that used these properties as attributes.
When listing each person's property names and values (for example, Andrew Cuomo's year of birth and first name in the picture above), I certainly didn't want to list the full URI of each property name. Ideally, each property would have an rdfs:label value that I could display instead; if not, I thought it best to just show the local name of the property's URI. To make this easier, I created a new function called mk:bestName as a subclass of spin:Functions (itself a subclass of spin:Modules). I defined a spin:constraint of sp:arg1 for this function and then defined this spin:body for it:
```
SELECT ?label
WHERE {
    BIND (spif:name(?arg1) AS ?name) .
    BIND (IF(fn:contains(?name, ":"), afn:localname(?arg1), ?name) AS ?label) .
}
```
mk:bestName is a good general-purpose function. It calls the SPIN spif:name function, which gets a resource's skos:prefLabel value if available or an rdfs:label value as a second choice. If neither is available, mk:bestName takes the local name of the URI or prefixed name that got returned.
Because members of the kennedys:Person class might have a kennedys:name value that I'd prefer the application to use if available, I declared a similar but more specialized function for the Kennedys data called mk:bestKennedyName. This is also a subclass of spin:Functions, and has a spin:constraint of sp:arg1 and the following as a spin:body:
```
SELECT ?label
WHERE {
    OPTIONAL {
        ?arg1 kennedys:name ?kname .
    } .
    BIND (spif:name(?arg1) AS ?name) .
    BIND (COALESCE(?kname, IF(fn:contains(?name, ":"), afn:localname(?arg1), ?name)) AS ?label) .
}
```
This function body takes advantage of SPARQL 1.1's new COALESCE() function, which returns the value of the first parameter passed to it that can be evaluated without an error.
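The fallback logic of these two functions may be easier to see outside of SPARQL. The Python sketch below mirrors it under simplifying assumptions: None stands in for an unbound or erroring expression, plain dictionaries stand in for the label and kennedys:name lookups, and the sample data and the helper local_name are invented for the example.

```python
def local_name(uri):
    """Return the part of a URI or prefixed name after its last '#', '/' or ':'."""
    for separator in ("#", "/", ":"):
        if separator in uri:
            return uri.rsplit(separator, 1)[1]
    return uri


def coalesce(*candidates):
    """Return the first candidate that is not None, in the spirit of SPARQL's COALESCE."""
    for value in candidates:
        if value is not None:
            return value
    return None


def best_name(resource, labels):
    """Prefer a stored label; otherwise fall back to the local name."""
    return coalesce(labels.get(resource), local_name(resource))


def best_kennedy_name(resource, names, labels):
    """Prefer a kennedys:name value; otherwise behave like best_name."""
    return coalesce(names.get(resource), best_name(resource, labels))


# Hypothetical sample data.
labels = {"kennedys:birthYear": "birth year"}
names = {"kennedys:AndrewCuomo": "Andrew Cuomo"}

print(best_name("kennedys:birthYear", labels))                               # birth year
print(best_name("http://topbraid.org/examples/kennedys#firstName", labels))  # firstName
print(best_kennedy_name("kennedys:AndrewCuomo", names, labels))              # Andrew Cuomo
```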

With the functions, the HTML extensions, and the argument to pass to it all set up for the formatting markup in the mk:PersonView class, I was ready to add that markup and SPARQL code to the ui:prototype property of my new class. It's mostly HTML div elements with attributes set according to the models I saw in the source of the jQuery Mobile demos. The "collapsible" part means that initially only the kennedys:name value will display, as an h3 element, and that clicking on that name (or, on a phone, touching it) will toggle the display of the remaining property names and values about that person.

<div data-collapsed="true" data-role="collapsible">
   <h3>{= spl:object(?person, kennedys:name) }</h3>
   <div class="ui-grid-a">
       <ui:forEach ui:resultSet="{#
               SELECT ?propertyName ?bestValueLabel
               WHERE {
                   ?person ?property ?value .
                   BIND (mk:bestName(?property) AS ?propertyName) .
                   BIND (IF(isIRI(?value), mk:bestKennedyName(?value), ?value)
                      AS ?bestValueLabel) .
               }
               ORDER BY (?property) }">
           <div class="ui-block-a">
               <div class="ui-bar ui-bar-c">{= ?propertyName }</div>
           </div>
           <div class="ui-block-b">
               <div class="ui-bar ui-bar-c">{= ?bestValueLabel }</div>
           </div>
       </ui:forEach>
   </div>
</div>

When you use SWP to define an HTML div element with the data and markup to display something, the SWP engine will create html, head, and body wrapper elements to ensure that a browser viewing the HTML gets a complete web page. The SWP ui:headIncludes property, which you'll see on the mk:PersonView class form with ui:prototype and the other properties there, lets you specify custom markup to add to the HTML head element when the SWP engine sends the web page to the requesting browser. I added the following to this property; it has the meta, link, and script elements necessary to make the resulting HTML a proper jQuery Mobile page:

<ui:group>
   <meta content="width=device-width, minimum-scale=1.0, maximum-scale=1.0"
         name="viewport"/>
   <link href="http://code.jquery.com/mobile/1.0/jquery.mobile-1.0.min.css"
         rel="stylesheet"/>
   <script src="http://code.jquery.com/jquery-1.6.4.min.js"/>
   <script src="http://code.jquery.com/mobile/1.0/jquery.mobile-1.0.min.js"/>
</ui:group>

Then, going back to the kennedys:Person element, I added this ui:instanceView value for it to point at the mk:PersonView class I had created:

<mk:PersonView sp:person="{= ?this }"/>

The ?this variable passes the Person instance currently being processed to be used as the ?person value in the SPARQL query in the mk:PersonView ui:prototype value.

This is all enough to display a single person, but I wanted to display all the Person instances in a sorted list. I attached this view's definition to the ontology resource itself by clicking on the little house icon at the top of TopBraid Composer and then adding this ui:view value to it (note that ui:view wasn't already part of the form, so I dragged it on there from TopBraid Composer's Properties view):

<div>
   <div data-role="header">
       <h1>Kennedys List</h1>
   </div>
   <div data-role="collapsible-set">
       <ui:forEach ui:resultSet="{#
               SELECT ?p
               WHERE {
                   ?p a kennedys:Person .
                   ?p kennedys:lastName ?lname .
               }
               ORDER BY (?lname) }">
           <ui:resourceView ui:resource="{= ?p }"/>
       </ui:forEach>
   </div>
</div>

As with the code to display each individual Person instance, this markup is mostly div elements with attribute settings based on the source of the jQuery Mobile demos I saw. The ui:resourceView element inside the ui:forEach element tells the SWP engine to display the resource according to whatever view was specified for it. In this case, the resource is a kennedys:Person instance, because that's what the SPARQL query here binds to the ?p variable, so it will use the view defined earlier.

To test this, I sent a browser to the URL http://localhost:8083/tbl/uispin?_resource=http://topbraidlive.org/mobileKennedys. (URLs for SPARQL Web Page applications often include a &_base parameter to identify the graph of data to use—in this case, it would be &_base=http://topbraid.org/examples/kennedys—but that was unnecessary here because one of the first steps of creating the mobileKennedys model was dragging the Kennedys data onto its Include tab, so it already knew which data to use.) The _resource parameter tells it which resource to render, so I used my file's base URI here because that's where I attached the markup and SPARQL code to display the full web page. These and other parameters are described in the SWP documentation.
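If you end up generating such URLs from a program, a small helper keeps the parameters straight. The Python sketch below is only an illustration, and the function name swp_url is my own. Note that urlencode percent-encodes the colons and slashes inside the parameter values, so the result looks different from the hand-typed URL above but decodes to the same parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Server address as used in the text for the local TopBraid Live Personal Server.
SERVER = "http://localhost:8083/tbl/uispin"


def swp_url(resource, base=None):
    """Build a SPARQL Web Pages URL with _resource and, optionally, _base."""
    params = {"_resource": resource}
    if base is not None:
        params["_base"] = base
    return SERVER + "?" + urlencode(params)


url = swp_url("http://topbraidlive.org/mobileKennedys")
print(url)
print(parse_qs(urlsplit(url).query))

url_with_base = swp_url(
    "http://topbraidlive.org/mobileKennedys",
    base="http://topbraid.org/examples/kennedys",
)
```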

This should work with any browser. (I recently discovered that picking User Agent from Safari's Develop menu lets you set Safari to emulate a variety of other browsers, including the mobile versions that run on the iPhone and iPad, which helped me to debug some early problems I had with getting the jQuery Mobile code right.) Because you can't access TopBraid Composer's built-in copy of the TopBraid Live Personal edition from a different computer, there's no way for a phone's browser to access this application when running it on TopBraid Composer, so I uploaded the project storing this application to a copy of TopBraid Live to do the test shown in the photograph above.

Next week, I'll show how I extended this application to save a static HTML file of the mobile web display of Kennedys data as an alternative to the TopBraid Live server's dynamic display. I could then copy that file to a web server that doesn't necessarily have TopBraid Live installed on it. Then, any computer or phone web browser can display it. For a preview of how it looks, send your phone's browser to http://www.topquadrant.com/resources/blog/k/—or, if you want a shorter URL to type on your phone, http://bit.ly/topqkm.

Ontologies and Data Models – are they the same?

2011-09-30T10:32:00.000-07:00

Yesterday a newcomer on the TopBraid Users Forum asked how ontologies may be different from logical data models. As is to be expected on the TopBraid Forum, by ontologies he meant specifically ontology models expressed in RDFS/OWL. Because we frequently hear this or similar questions in our trainings, workshops and conversations with customers, I decided to respond in a blog post instead of writing an e-mail.

Data modeling was invented more than thirty years ago to help with the design of databases, specifically, relational databases. As quoted below, the ANSI definition from 1975 differentiated between three data models: conceptual, logical and physical. Data modeling quickly became recognized as a tool for analyzing the semantics of an organization with respect to the structure and flow of the information used in carrying out the organization’s activities. Wikipedia offers the following definition of Data Modeling:

Data modeling is a method used to define and analyze data requirements needed to support the business processes of an organization. The data requirements are recorded as a conceptual data model with associated data definitions. Actual implementation of the conceptual model is called a logical data model.
<…>
In 1975 ANSI described three kinds of data-model instance:

Conceptual schema: describes the semantics of a domain (the scope of the model). For example, it may be a model of the interest area of an organization or of an industry. This consists of entity classes, representing kinds of things of significance in the domain, and relationship assertions about associations between pairs of entity classes. A conceptual schema specifies the kinds of facts or propositions that can be expressed using the model. In that sense, it defines the allowed expressions in an artificial "language" with a scope that is limited by the scope of the model.

Logical schema: describes the structure of some domain of information. This consists of descriptions of (for example) tables, columns, object-oriented classes, and XML tags.

Physical schema: describes the physical means used to store data. This is concerned with partitions, CPUs, tablespaces, and the like.

According to ANSI, this approach allows the three perspectives to be relatively independent of each other. Storage technology can change without affecting either the logical or the conceptual model. The table/column structure can change without (necessarily) affecting the conceptual model.

These definitions describe a clear progression from conceptual to logical to physical data models. Since their origin is in the 70s, they reflect certain technology assumptions that no longer hold true.

When information modeling is done to create a relational database, the conceptual model must be different from the logical model because there is no place in a relational database structure to, for example, capture business rules, create subsumption relationships or describe other key aspects of a conceptual model. This semantic information, collected and documented as part of the initial modeling, is left behind when modelers and designers move on to define a logical data model. The "left behind" parts are used by software developers as they encode business semantics directly into custom programs.

A logical data model is a subset of a conceptual model that can be expressed using a particular technology. However, there are always some performance considerations that require additional changes to the logical data model before it can be implemented in a relational database. Hence, some of the aspects of a logical model are left behind as it gets translated into a physical data model.

Since an ontology is a model of a domain describing the objects that inhabit it, all three types of data models can be thought of as ontologies. They range from the most expressive one, which describes business concepts and processes (the conceptual model), to less expressive ones that move progressively from describing business semantics to describing the physical structures of the data as it is stored in databases (the logical and physical data models). A physical model can be thought of as an ontology of a particular database. Wikipedia goes on to note:

Early phases of many software-development projects emphasize the design of a conceptual data model. Such a design can be detailed into a logical data model. In later stages, this model may be translated into physical data model. However, it is also possible to implement a conceptual model directly.

Semantic Web standards (governed by the W3C, the World Wide Web Consortium) make it possible to implement conceptual models directly. This is possible due to the layered architecture of the Semantic Web technology stack consisting of:

RDF – a canonical data model that is like the relational data model in its ability to connect related objects and unlike the relational data model in that the data objects (or resources in RDF-speak) are highly granular.

The smallest unit of information in RDF is not a table or a row in a table, but an individual statement – a single fact about a resource.

These statements are called RDF triples. For example, “Atlantis decommission-date July, 2011” is a triple where Atlantis is the subject, decommission date is the predicate and July, 2011 is the object. Atlantis and decommission date are RDF resources and July, 2011 is a literal. The subject and predicate of a triple are always RDF resources. An object can be either a resource or a literal value. Predicates that connect two resources are relationships or associations in data modeling speak. Predicates connecting a resource to a literal value are attributes. In OWL they are called object properties and datatype properties, respectively.

Because the RDF model is highly canonical, RDF data is schema-less. There are no constraints that require it to fit into tables or hierarchies. RDF data is simply a network of connected triples. As such, it can be used to represent, if needed, both table structures and hierarchies. Standard mappings have been defined from relational tables and XML hierarchies into RDF.

Another key differentiating factor of RDF is that it was “born on the web”. Each RDF resource has a globally unique identity, a URI (uniform resource identifier). For example, the URI for Atlantis may be http://www.nasa.gov/shuttle/Atlantis and the URI for the decommission date property may be http://www.nasa.gov/lifecycle#decommissionDate . As a result, it is possible to link RDF data over the web in a way similar to how documents can be hyperlinked over the web. By web we mean all HTTP-based networks, including intranets and extranets.
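To make the triple structure concrete, here is a minimal Python sketch (standard library only, not tied to any RDF toolkit). The two URIs for Atlantis and the decommission date come from the text above; the operatedBy property, the NASA resource URI and the choice of xsd:gYearMonth for the date literal are assumptions made up for illustration.

```python
from typing import NamedTuple, Union


class URI(str):
    """A resource, identified by its URI."""


class Literal(NamedTuple):
    """A literal value together with the URI of its datatype."""
    value: str
    datatype: str


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]


ATLANTIS = URI("http://www.nasa.gov/shuttle/Atlantis")
DECOMMISSION_DATE = URI("http://www.nasa.gov/lifecycle#decommissionDate")
# Hypothetical resources, invented for this sketch only.
OPERATED_BY = URI("http://www.nasa.gov/lifecycle#operatedBy")
NASA = URI("http://www.nasa.gov/org#NASA")

triples = [
    # Resource -> literal: an attribute in data modeling terms.
    Triple(ATLANTIS, DECOMMISSION_DATE,
           Literal("2011-07", "http://www.w3.org/2001/XMLSchema#gYearMonth")),
    # Resource -> resource: a relationship in data modeling terms.
    Triple(ATLANTIS, OPERATED_BY, NASA),
]


def property_kind(triple: Triple) -> str:
    """Classify a predicate by looking at the kind of object it points to."""
    return "attribute" if isinstance(triple.obj, Literal) else "relationship"


kinds = [property_kind(t) for t in triples]
print(kinds)
```

Because every statement has the same three-part shape, adding a new kind of fact about Atlantis means appending another tuple rather than altering a table definition.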

RDF databases store and provide query access to RDF data. Just as there are standard languages for querying relational and XML data, there is a standard for querying RDF. It is called SPARQL. True to the web-native nature of RDF, SPARQL is not only a query language, but also a protocol that makes it possible to access RDF data over HTTP.

RDFS (RDF Schema) and OWL (Web Ontology Language) – RDF-based languages for expressing business semantics.

Jointly, RDFS and OWL offer the ability to define classes or groups of resources that share common characteristics, such as Vehicles and Space Shuttles. The richness of RDFS/OWL makes it possible to fully express the meaning of the business concepts. Data models in RDFS/OWL are stored in the same way as the data, in RDF triples. For example, we can have triples stating that Space Shuttle is a class, that it is a subclass of the Vehicle class, and that a vehicle can have only one decommission date (cardinality = 1) whose value must be xsd:date. And you can go beyond cardinality and use the Semantic Web standards to represent a variety of business rules.

Since the data and the schema are stored in the same way, it is possible to query schemas the same way data is queried and to combine search criteria about schemas with the search criteria about data. For example, we can create SPARQL queries to ask for all vehicles that have been decommissioned, all subclasses of a vehicle class, all relationships and attributes a vehicle should have and, when returning decommissioned vehicles, to provide only data values for the fields that have cardinality = 1.
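The idea of querying schema and data together can be sketched without an RDF database. The following Python snippet is only an illustration, not SPARQL and not how a triple store is implemented: it keeps schema and data triples in one set and answers "all decommissioned vehicles" by first walking the subclass statements. The ex: names, including the ex:ShuttleX instance, are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Schema and data live side by side in the same set of triples.
triples = {
    # Schema statements
    ("ex:Vehicle", RDF_TYPE, "owl:Class"),
    ("ex:SpaceShuttle", RDF_TYPE, "owl:Class"),
    ("ex:SpaceShuttle", SUBCLASS_OF, "ex:Vehicle"),
    # Data statements (ex:ShuttleX is a made-up instance with no date)
    ("ex:Atlantis", RDF_TYPE, "ex:SpaceShuttle"),
    ("ex:Atlantis", "ex:decommissionDate", "2011-07"),
    ("ex:ShuttleX", RDF_TYPE, "ex:SpaceShuttle"),
}


def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def subclasses_of(cls):
    """Return cls together with its direct and indirect subclasses."""
    found, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for sub, _, _ in match(p=SUBCLASS_OF, o=current):
            if sub not in found:
                found.add(sub)
                frontier.append(sub)
    return found


def decommissioned(cls):
    """Instances of cls (or of any subclass) that have a decommission date."""
    classes = subclasses_of(cls)
    instances = {s for s, _, o in match(p=RDF_TYPE) if o in classes}
    return sorted(i for i in instances if match(s=i, p="ex:decommissionDate"))


result = decommissioned("ex:Vehicle")
print(result)
```

The same match function serves both the schema lookup (subclasses) and the data lookup (instances and dates), which is the point the paragraph above makes about SPARQL.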

The use of RDF means that the modeling constructs and definitions can be linked and connected. Organizations can refer to each other’s business definitions. Models can be modularized and re-used where appropriate. Differences between related, but not identical, concepts can be described. All of this can now be done in a standards-compliant and interoperable way.

A growing number of standards bodies and communities of interest are publishing RDF/OWL data models for their particular domains. For example:

SKOS - provides a way to represent taxonomies and thesauri
ISO 15926 - offers a data model for sharing life-cycle data for process plants including oil and gas production facilities
Ontology for Media Resources - defines a core set of metadata properties for multimedia resources
SIOC - defines information about online communities
QUDT - provides models describing measurable quantities, units for measuring different kinds of quantities and the data types used to store and manipulate these objects in software
Provenance Vocabulary - defines provenance-related metadata

There is much more that can be added to this post including a discussion on the best practices for ontology modeling, ontology architecture, approaches for connecting and mapping models, using rules and constraints, publishing, versioning and governing models. Each of these topics, however, deserves an exploration in its own right.

I will end by pointing to a few relevant related blogs and web pages we have published before:

How to extend an ontology http://topquadrantblog.blogspot.com/2011/03/how-to-extend-ontology.html
Ontology Mapping with SPINMap http://topquadrantblog.blogspot.com/search/label/SPINMap
Training on RDF, OWL and ontology modeling http://www.topquadrant.com/training/training_overview.html
Transforming XML Schemas and XML into RDF/OWL http://topquadrantblog.blogspot.com/2011/09/living-in-xml-and-owl-world.html
Converting UML models to OWL http://topquadrantblog.blogspot.com/2011/02/converting-uml-models-to-owl-part-1.html

Living in the XML and OWL World - Comprehensive Transformations of XML Schemas and XML data to RDF/OWL

2011-09-28T23:03:00.000-07:00

Many enterprise information models are expressed using XML Schemas. Data between applications is commonly exchanged in XML, compliant with those schemas. Connecting XML data from different systems in a coherent aggregated way is a challenge that confronts many organizations. Capabilities of RDF/OWL to describe semantics of different data models and aggregate disparate data are a natural fit for addressing these challenges.

For a number of years now, TopBraid Composer has included the ability to convert XSDs and associated XML files to RDF/OWL. However, for some XML Schemas our converter did not work as well as customers needed. For the upcoming TopBraid Composer 3.6.0 release, it has been significantly improved to provide more comprehensive coverage of XSD constructs and a more meaningful conversion to OWL. In parallel, we improved our XML data conversion to RDF so that transformations happen automatically based on the generated OWL models. We have also improved the performance of the transformations.

An overview of the approach is illustrated in the following figure:

Since the conversion occurs automatically, users do not have to worry about writing any rules for commonly needed mappings. However, users who need to make further transformations can use SPARQL Rules and SPARQLMotion to customize their generated OWL ontology or further transform the RDF triples representing the XML data.

The content of this blog is organized as follows:

XML Schemas converted as part of our tests
Some challenges in converting XML Schemas to OWL
Illustrative example of transformation rules
Another example of transformation rules
Complete table of supported transformations
A SPARQL Metric Query
Concluding remarks

XML Schemas converted as part of our tests

We tested the importer on a broad range of complicated and large-scale industry standard XSD files, and converted many XML instances with impressive results. The XSDs we have tested with the new importer include:

Banking
- FpML, the Financial products Markup Language
- ISO 20022, a standard for Universal financial industry message scheme

Energy and Utilities
- MultiSpeak, de-facto standard for defining data needed to be exchanged between software applications in order to support the business processes commonly applied at utilities

Government
- DoDAF, the Department of Defense Architecture Framework
- NIEM, the U.S. National Information Exchange Model

Oil and Gas
- ISO 15926, a standard for integration of life-cycle data for process plants including oil and gas production facilities
- WITSML, Wellsite Information Transfer Standard Markup Language

Healthcare
- HL7

Electronics
- IP-XACT, the XML Schema for meta-data documenting Intellectual Property (IP) used in the development, implementation and verification of electronic systems

Other
- ATML, the Auto-Test Markup Language

Some of the converted schemas will be published at LinkedModels.org. To get early access to converted models or for any other questions, contact us at TopQuadrant.

The examples we use in this blog are mainly from the Financial products Markup Language (FpML). All FpML 5.2 XSD and XML instance files were tested. One example is the transparency standard under FpML, for which the transparency.ttl ontology was generated from 23 XSD files.

Some challenges in converting XML Schemas to OWL

Some of the challenges in converting XSD to OWL that were addressed are:

Transforming anonymous types
Converting complex types with simple contents

Resolving conflicting nested element and attribute names during OWL property generation

When and how to distinguish global elements from complex types with similar names during OWL class generation

Generating enumerations

Handling substitution groups both at the XSD and XML levels

Handling the overriding of an XSD type with xsi:type in XML

The example that follows shows the approaches that we have used for the transformation.

Illustrative example of transformation rules

The basic transform for a Complex Type in XSD follows these rules:

An OWL class is generated for a complex type.

The URI of the class is generated in three different ways. If the complex type is global and named, then the name attribute is used. If the complex type is local and named, then the name attribute of the owner element is used. If the complex type is anonymous, then the names of its owner element and its parent element are used.

The xsd:annotation and attribute annotations describing the complex type get generated as dc:description, rdfs:comment and/or skos:definition OWL annotations.

Nested or referenced child elements of the complex type become OWL allValuesFrom restrictions on the class. If the element has a simple type, then a restriction with an OWL datatype property and an XSD datatype range or a user-defined RDFS datatype range is generated. If the element has a complex type, then a restriction with an OWL object property and an OWL class range is generated. The minOccurs and maxOccurs values become OWL cardinality restrictions.

Element group and attribute group references are generated as super classes.

Attributes become restrictions with an OWL datatype property and an XSD datatype range or a user-defined RDFS datatype range.

An example of a Complex Type, Trade, in fpml-doc-5-2.xsd of the transparency standard is displayed below:

<xsd:complexType name="Trade">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
    A type defining an FpML trade.</xsd:documentation>
  </xsd:annotation>
  <xsd:sequence>
    <xsd:element name="tradeHeader" type="TradeHeader">
      <xsd:annotation>
        <xsd:documentation xml:lang="en">
         The information on the trade which is not 
         product specific, e.g. trade date.
         </xsd:documentation>
      </xsd:annotation>
    </xsd:element>
    <xsd:group ref="TradeEconomics.model">
      <xsd:annotation>
        <xsd:documentation xml:lang="en">
        The economics of the trade. In the case of an 
        OTC trade, this is the OTC derivative product.
        In the case of a trade of a security,
        it is the instrument trade economoics.
        </xsd:documentation>
      </xsd:annotation>
    </xsd:group>
  </xsd:sequence>
  <xsd:attribute name="id" type="xsd:ID" />
</xsd:complexType>
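As a rough illustration of the rules listed earlier, the following Python sketch walks a trimmed copy of the Trade type and reports which OWL constructs those rules would call for. It is not the TopBraid importer. It assumes, as a simplification, that any type written with the xsd: prefix is simple and everything else is complex, it ignores annotations and cardinalities, and the describe function and its output layout are invented for this example. The Ref suffix on object properties follows the convention described later in this post.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Trimmed copy of the Trade complex type shown above (annotations removed).
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="Trade">
    <xsd:sequence>
      <xsd:element name="tradeHeader" type="TradeHeader"/>
      <xsd:group ref="TradeEconomics.model"/>
    </xsd:sequence>
    <xsd:attribute name="id" type="xsd:ID"/>
  </xsd:complexType>
</xsd:schema>
"""


def describe(complex_type):
    """Summarize the OWL constructs suggested by one xsd:complexType."""
    summary = {"class": complex_type.get("name"),
               "superclasses": [],
               "restrictions": []}
    for node in complex_type.iter():
        if node.tag == XSD + "group" and node.get("ref"):
            # Element group references become superclasses.
            summary["superclasses"].append(node.get("ref"))
        elif node.tag in (XSD + "element", XSD + "attribute") and node.get("name"):
            type_name = node.get("type", "")
            # Simplifying assumption: attributes and xsd:-prefixed types are simple.
            if node.tag == XSD + "attribute" or type_name.startswith("xsd:"):
                summary["restrictions"].append(
                    (node.get("name"), "owl:DatatypeProperty", type_name))
            else:
                summary["restrictions"].append(
                    (node.get("name") + "Ref", "owl:ObjectProperty", type_name))
    return summary


root = ET.fromstring(SCHEMA)
trade = describe(root.find(XSD + "complexType"))
print(trade)
```

A real converter also has to resolve user-defined simple types, element references and name conflicts, which is where most of the challenges listed above come from.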

The following is the graph of the OWL class generated for the Trade complex type, which shows the OWL class, restrictions, annotations and superclass.

The following class diagram shows a more sophisticated view of Trade and its related classes downstream in the generated ontology (click on the diagram to open a window with a bigger image).

The diagram highlights these advanced features in generation:

A superclass relation exists between Trade, generated from an XSD complex type and TradeEconomics.model, generated from an XSD element group.

In the XSD, the Swap element has the Product element as its substitutionGroup. Thus, A_Global-Swap becomes a subclass of A_Global-Product. The A_Global- prefix is used to distinguish the element-derived classes from similarly named complex-type-derived classes.

dtype:value restrictions are generated to hold the simple contents occurring in complex types. The complex content part of the type becomes other restrictions.

The generated object properties have a Ref suffix to distinguish them from datatype properties with the same names. Both types of properties can be used in restrictions on different classes, as they may be generated from nested or referenced child elements under different complex types.

The instance file, "msg_ex001_new_trade.xml" was imported into the transparency ontology. Here is a peek into that XML file:

...
<trade>
<tradeHeader>
  <partyTradeIdentifier>
    <tradeId tradeIdScheme=
        "http://fpml.org/universal_swap_id">123</tradeId>
    <tradeId tradeIdScheme=
        "http://fpml.org/submitter_trade_id">456</tradeId>
  </partyTradeIdentifier>
  <tradeInformation>
    ...
    <cleared>true</cleared>
    <nonStandardTerms>false</nonStandardTerms>
    <offMarketPrice>false</offMarketPrice>
    <largeSizeTrade>false</largeSizeTrade>
    ...
  </tradeInformation>
  <tradeDate>2011-02-04</tradeDate>
</tradeHeader>
<swap>
  <productType>InterestRateSwap</productType>
  <assetClass>InterestRates</assetClass>
  <swapStream>
   ...
  </swapStream>
  <swapStream>
    ...
  </swapStream>
</swap>
</trade>
...

The above XML constructs were mapped into the following RDF graph, where you can see how the instances, their relationships and their types are generated with respect to the Trade class diagram (click on the graph to open up a window for a more detailed view).

Another example of transformation rules

The basic transform for an Enumeration in XSD follows these rules:

An OWL class is generated from an XSD simple type having XSD enumeration facets. The localname of the class has an Enum suffix to distinguish it from classes generated with similar names.

This class becomes a subclass of EnumeratedValue in the same namespace as the OWL class, which itself becomes a subclass of dtype:EnumeratedValue.

Each XSD enumeration facet becomes an instance of the generated class. dtype:value holds the enumeration value. dtype:order is the order in which the enumeration facet occurs.

An Enumeration class in the same namespace as the OWL class is also generated. This class becomes a subclass of dtype:Enumeration. An instance of this class is generated as a container to refer to all the instances generated from the current simple type.

Enumerated value instance URIs are generated using a concatenation of the abbreviation of the class localname's upper case letters and the dtype:value literal.
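The naming rule for enumerated value instances can be sketched in a few lines of Python. This is an illustration of the rule as stated, not the importer's code: the underscore separator, the function name and the sample value "Explicit" are assumptions, since the text does not say how the abbreviation and the value are joined.

```python
def enum_instance_localname(class_localname: str, value: str,
                            separator: str = "_") -> str:
    """Abbreviate the class localname to its upper-case letters, then append the value.

    The separator is an assumption made for this sketch.
    """
    abbreviation = "".join(ch for ch in class_localname if ch.isupper())
    return f"{abbreviation}{separator}{value}"


# "Explicit" is used here only as an illustrative enumeration value.
localname = enum_instance_localname("PremiumQuoteBasisEnum", "Explicit")
print(localname)
```

Abbreviating the class name keeps the instance URIs short while still making values from different enumerations distinguishable within one namespace.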

The following figure shows a graph for PremiumQuoteBasisEnum class and the OWL constructs generated from the related XSD enumeration facets (click on the diagram to open a window with a bigger image):

Complete table of supported transformations

For the reader interested in more details, a full overview of the mapping transformations is given in the following table:

*Table: Conversion from XSD Constructs to OWL Constructs*
#	XSD/XML Constructs	OWL Constructs
1	`xsd:simpleType`	`owl:Datatype`
2	`xsd:simpleType` with `xsd:enumeration`	Becomes an `owl:Class` as a subclass of `EnumeratedValue`. Instances are created for every enumerated value. An instance of `Enumeration`, referring to all the instances, is created as well as the `owl:oneOf` union over the instances.
3	`xsd:complexType` over `xsd:complexContent`	`owl:Class`
4	`xsd:complexType` over `xsd:simpleContent`	`owl:Class`
5	`xsd:element` (global) with complex type	`owl:Class` and subclass of the class generated from the referenced complex type
6	`xsd:element` (global) with simple type	`owl:Datatype`
7	`xsd:element` (local to a type)	`owl:DatatypeProperty` or `owl:ObjectProperty` depending on the element type. OWL Restrictions are built for the occurrence.
8	`xsd:group`	`owl:Class` and subclass of `A_AbstractElementGroup`
9	`xsd:attributeGroup`	`owl:Class` and subclass of `A_AbstractAttributeGroup`
10	`xsd:minOccurs` and `xsd:maxOccurs`	Cardinality specified in minimum cardinality, maximum cardinality and universal (allValuesFrom) OWL restrictions.
11	Anonymous Complex Type	As for Complex Type except a URI is constructed from the parent element and the nested element reference. Also, the class is defined as a subclass of `A_Anon`.
12	Anonymous Simple Type	As for Simple Type except a URI is constructed from the parent element and the nested element reference.
13	`xsd:default` on an attribute	Uses `dtype:defaultValue` to attach a value to the OWL restriction representing the associated property.
14	Substitution Groups	Subclass statements are generated for the members. Instance files resolve their types by consulting the OWL model at import-time.
15	Annotation attributes on elements	OWL Annotation properties are created and placed directly on the relevant class.
16	Annotations using `xsd:annotation`	Become, based on user selection, `dc:description`, `rdfs:comment` and/or `skos:definition` OWL annotations.
17	`xsi:type` on an XML element	Overrides the schema type with the specified type.

A SPARQL Metric Query

As a quick check on the generated OWL models, the following is a useful SPARQL query that counts the number of properties on each OWL class.



 SELECT ?class (COUNT(DISTINCT ?p) AS ?properties)
 WHERE {
     ?class a owl:Class .
     FILTER( afn:namespace( ?class ) = 
        "http://www.fpml.org/FpML-5/transparency#") .
     OPTIONAL {
         ?class rdfs:subClassOf ?r .
         ?r a owl:Restriction .
         ?r owl:onProperty ?p .
     }
 }
 GROUP BY ?class
 ORDER BY DESC( ?properties )
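For readers who want to see what the query computes without running a SPARQL engine, here is a small Python sketch of the same count over a handful of hand-written triples. The sample triples and the property_counts function are invented for illustration, and a plain prefix check stands in for the afn:namespace filter.

```python
from collections import defaultdict

NS = "http://www.fpml.org/FpML-5/transparency#"

# Invented sample triples; "_:rN" strings stand in for blank-node restrictions.
triples = [
    (NS + "Trade", "rdf:type", "owl:Class"),
    (NS + "Trade", "rdfs:subClassOf", "_:r1"),
    ("_:r1", "rdf:type", "owl:Restriction"),
    ("_:r1", "owl:onProperty", NS + "tradeHeaderRef"),
    (NS + "Trade", "rdfs:subClassOf", "_:r2"),
    ("_:r2", "rdf:type", "owl:Restriction"),
    ("_:r2", "owl:onProperty", NS + "id"),
    # A second restriction on the same property is counted only once.
    (NS + "Trade", "rdfs:subClassOf", "_:r3"),
    ("_:r3", "rdf:type", "owl:Restriction"),
    ("_:r3", "owl:onProperty", NS + "id"),
    # A class without restrictions still appears, with a count of 0.
    (NS + "TradeHeader", "rdf:type", "owl:Class"),
]


def property_counts(triples, namespace):
    """Count distinct restricted properties per class, highest count first."""
    restrictions = {s for s, p, o in triples
                    if p == "rdf:type" and o == "owl:Restriction"}
    on_property = defaultdict(set)
    for s, p, o in triples:
        if p == "owl:onProperty" and s in restrictions:
            on_property[s].add(o)
    counts = {s: set() for s, p, o in triples
              if p == "rdf:type" and o == "owl:Class" and s.startswith(namespace)}
    for s, p, o in triples:
        if p == "rdfs:subClassOf" and s in counts and o in restrictions:
            counts[s] |= on_property[o]
    return sorted(((cls, len(props)) for cls, props in counts.items()),
                  key=lambda item: -item[1])


counts = property_counts(triples, NS)
print(counts)
```

The empty set given to every class up front plays the role of the OPTIONAL block in the query: classes with no restrictions are reported rather than dropped.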

The classes for the transparency ontology have the distribution of properties shown in the following figure (click on the diagram to open a window with a bigger image). For example, TradeInformation has 12 properties:

Concluding remarks

The new capability is easy to use. As before, a convenient import wizard will guide the user. The dialog has a number of new options. XML conversion will happen automatically when users open XML files in TBC or use XML import modules in SPARQLMotion. As long as an XML file is valid against the XSD that it is based on, the XML will be transformed in accordance with the schema. Parts of the XML files that do not validate against a schema will continue to be converted using the default Semantic XML structure. There is also a new option to specify which OWL file to use as a schema when mapping a specific XML file to triples. This feature is also available for spreadsheets and will be covered in a separate blog.

We believe that the importance of this work lies not only in its value for harvesting XML Schemas. The ability to use the automatic creation of triples from XML instance files directly in applications is proving to be key for a number of customers. For example, TopQuadrant is currently using this approach in a project for the North Sea Oil and Gas industry.

The functionality we have described will be released in TopBraid Composer 3.6.0. This release entered internal beta this week. Please contact us if you want to try these capabilities before general availability, which is currently planned for November.

Putting your drag-and-drop SPINMap vocabulary mappings into production

2011-07-21T15:25:00.000-07:00

The "Composing the Semantic Web" blog entry SPINMap: SPARQL-based Ontology Mapping with a Graphical Notation describes TopBraid 3.5's new tool for mapping between vocabularies or ontologies. (It also points to a handy video that demonstrates both simple and sophisticated uses of SPINMap.) Once you've created a mapping, though, how do you use it to convert data? As it turns out, no new technology is necessary; SPINMap just creates SPIN rules that you can apply in a SPARQLMotion script.

Let's look at an example. Imagine that I'm a publisher who receives images and metadata about those images from ExampleCo every month, and I load these images and metadata into my company's Digital Asset Management system. ExampleCo uses their own vocabulary to describe the metadata, but I prefer to use the NEPOMUK vocabulary for describing image metadata, because I know that by taking advantage of a vocabulary used by other systems around the world, my data can more easily interoperate with other data and tools.

Following the steps described in the blog posting mentioned above, I create the mapping from ExampleCo's pd:Image class and its associated properties to the NEPOMUK equivalents. Because the NEPOMUK image vocabulary's nexif:Photo class has so many properties associated with it, the diagram of it doesn't all fit on the screen at once, but it was easy enough to scroll up and down as I mapped the pd:Image properties on the left to various NEPOMUK nexif:Photo properties.

I saved the mapping in its own file, which I called ExampleCo2Nepomuk.ttl. At this point, I could convert a set of ExampleCo metadata by importing a file of that data and ExampleCo2Nepomuk.ttl into the same model and then picking Run Inferences from the Inference menu, assuming that Configure Inferencing on the same menu had TopSPIN configured as the inferencing engine.

I wanted this to be more automated, though, so I put it in a SPARQLMotion script that could be called as a web service or from a TopBraid Ensemble interface. This would make it easier to re-use this mapping every month on each new batch of ExampleCo image data as it comes in:

The script's first module prompts for the input filename, because it will be a new dataset each month. This module hands the filename to the "Get ExampleCo RDF" module, an Import RDF From Workspace module that reads in the ExampleCo data.

At the same time, another Import RDF From Workspace module named "Get mapping rules" reads in the ExampleCo2Nepomuk.ttl file storing the SPIN-based mapping rules. Both of these modules feed their triples to an Apply TopSPIN module named "Apply mapping rules," which has its sml:replace value set to true so that it only passes along the new triples that it creates and not the input triples. The script's last module saves the result in a disk file, but could easily send it off for addition to a triplestore in a Digital Asset Management system.

There's nothing especially new or unusual in this script; what's new is that the rules that it applies to the data were created by a graphical drag-and-drop tool instead of being coded by hand. (Rest assured that the rules stored by the tool are still expressed using standard SPARQL.) With easy data aggregation being one of the great advantages of semantic web applications, it's nice to know that SPINMap lets you define data transformations with less trouble than ever before, making your application development (and application maintenance) even faster.

As an added bonus, because the mappings are stored as SPIN rules (also known as SPARQL Rules), they can easily be combined with other SPARQL Rules that you can run with the same script. These other rules might perform validation to ensure that the data being read conforms to certain data quality standards, or they could calculate new values based on a combination of the incoming data and existing stored data.

Comparing SPIN with RIF

2011-06-26T17:50:00.000-07:00

Since SPIN (SPARQL Inferencing Notation), aka SPARQL Rules, became a W3C Member Submission, we find ourselves responding to growing interest in it.

With this, a question some may ask is how SPIN is different from or similar to RIF - W3C's standard for rules interchange.

While I have heard this asked a couple of times, I was pleasantly surprised that it is not a very common question. Pleasantly, because a certain level of confusion is to be expected about new things, and both SPIN and RIF are relatively new. If so few people ask this question, then the SPIN specification did a good job explaining and positioning it, and people easily grasp the unique and important needs it serves. Still, I thought it was worthwhile to write up my thoughts on comparing SPIN with RIF.

The goal of RIF was to create an interchange format for use between rules engines. As such, unlike SPIN, RIF is not an idea that is specifically or particularly aligned with RDF. This is why RIF was created as XML (although there is now work on RDF serialization). I am not pointing this out as a shortcoming of RIF, but rather to put in perspective the origin and the reason for RIF. In its goals, RIF is similar to OMG's XMI which also uses XML and was created to be an interchange format between different tools.

Given this similarity, XMI’s failure in being a reliable interchange format becomes relevant when considering RIF's future. Will RIF succeed in reaching its goal? One can easily argue that with the variety of available rules languages and engines, RIF’s job is harder than what XMI needed to do to succeed.

As noted here, different rules languages exist because there are different algorithms and formalisms for rules. Furthermore, different rule products have different sets of capabilities. RIF dialects are intended to be the least common denominators for a given type of a rule engine. This means that in order to effectively use the same set of RIF rules in the ‘rules engine A’ and in the ‘rules engine B’, the following needs to happen:

1. The RIF dialect used to express the rules needs to be supported by both rules engines.

Checking the implementation page, one will see that currently the overlap between any two engines is not that great. Some support BLD, some support PRD + Core, others support BLD partial or PRD minus something, etc.

2. The RIF dialect used to express the rules must be expressive enough for the task at hand.

As mentioned above, RIF by design is somewhat of a least common denominator. This means that a user could always do more with a given rules engine than they can express in a dialect of RIF.

For example (as noted here), SPARQL is more expressive than what is possible with RIF. This is not unique to SPARQL, it is true for pretty much any rules technology.

3. The interchange must work.

Given the well-known XMI issues, I am quite keen to see RIF test cases as well as test case results from the implementers.

The attitude of the major rule engine vendors towards RIF is currently, at best, lukewarm. For example, on the Oracle forum, support engineers recommend against attempting to interchange rules by saying:

“In a hybrid environment I'd recommend that rules authored in ILOG be executed in the ILOG engine, and that rules authored in OPA be executed in the OPA engine, rather than attempt to interchange rules between the two products. As long as there is a clear scope boundary between what the rule sets are used for, then there wouldn't be any duplication or interchange of rules.”

Having considered the design goals and challenges of RIF, it is easy to see that the design goals of SPIN are quite different. SPIN is not about capturing rules that can then be translated for execution by different types of rule engines. Rather it is about capturing rules that can be executed directly over RDF data and about having rules that are intimately connected to the Semantic Web models.

With these goals in mind, we identified the following three things as important principles in SPIN's design:

1. Rules can be expressed in a familiar language. People working with RDF must know SPARQL. Using SPARQL for rules means that they don’t need to use another language.

2. Rules can be executed by any RDF database. Since they are in SPARQL, rules are portable – not across rules engines, but across RDF stores.

3. Evolution of the models does not unnecessarily break the rules. For example, let’s say we change the URI of a resource used in a rule. If the rule is stored in some other format (such as XML) and is connected to the underlying RDF only as an opaque blob, it becomes hard to keep these two sets of information in sync.

Finally, SPIN takes an object-oriented approach to rules. It is about programming and about associating behavior with classes, while RIF takes a model-theoretic view on how the rules may relate to ontologies. This is a key difference, as noted in the W3C comments on the SPIN submission.

In short, SPIN and RIF address different needs and have different design goals. They can be considered complementary.

What about using SPIN and RIF together? Given the key role SPARQL plays in the architecture of Semantic Web solutions, I am certain that should RIF get traction in its adoption, someone will create a RIF profile for SPARQL and write a RIF to SPARQL translation.

How to: extend an ontology

2011-03-30T07:04:00.000-07:00

When people work with ontologies, XML schemas and software development, it's almost a cliché to say that re-use of existing work is better than creating something new from scratch. Existing work, though, is not always a perfect fit to your needs, and when you're reusing XML schemas or software source code, the ease of customizing it often depends a lot on how the original work was designed. Customizing OWL ontologies and RDF schemas, on the other hand, is pretty simple nearly all of the time, especially when you use TopBraid Composer.

For example, let's say that you have a taxonomy of business terms to track, and the W3C's SKOS standard defines all the properties you need to maintain metadata about these terms, with two exceptions: SKOS has nothing about the last person to edit a term and it has no slot for the editor's department code, which is a special bit of metadata within your enterprise. Customizing SKOS to include these is just two steps:

Create an empty new ontology and import SKOS into it.
Define your two properties in this new ontology and give them a domain of skos:Concept so that SKOS tools such as TopBraid Enterprise Vocabulary Net (EVN) know that they're potential properties of your SKOS concepts.

To do this with TopBraid Composer, start by creating a new RDF/OWL file in one of your projects. On the wizard dialog box for creating this file, enter the Base URI of your new customized version of the SKOS ontology. If I worked for The Example Company, I might create a base URI of http://example.com/ns/exskos.

On the same dialog box, click the checkbox next to SKOS under "Initial imports" and click Finish. (If you forget to click the checkbox first, you can always drag the skos-core.rdf file from the Navigator View's /TopBraid/SKOS folder to the Imports tab. In fact, to start creating a customized version of any ontology, drag a copy of it into your custom ontology's Imports tab.)

Once the file is created, having a namespace prefix associated with the base URI makes it easier to create the new properties, so on your new ontology's Overview tab add an ex prefix for the http://example.com/ns/exskos# namespace. Don't forget the pound sign; the prefix will be standing in for this URI, and you want ex:editor to represent http://example.com/ns/exskos#editor, not http://example.com/ns/exskoseditor.
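The effect of the pound sign is easy to see if you expand a prefixed name by hand. The short Python sketch below does just that; the expand function is a hypothetical helper written for this illustration and mimics the simple "namespace plus local name" concatenation that prefixes stand for.

```python
def expand(qname: str, prefixes: dict) -> str:
    """Expand a prefixed name such as ex:editor into a full URI."""
    prefix, _, local_name = qname.partition(":")
    # The namespace URI and the local name are simply concatenated.
    return prefixes[prefix] + local_name


with_hash = expand("ex:editor", {"ex": "http://example.com/ns/exskos#"})
without_hash = expand("ex:editor", {"ex": "http://example.com/ns/exskos"})

print(with_hash)     # http://example.com/ns/exskos#editor
print(without_hash)  # http://example.com/ns/exskoseditor
```

Nothing is inserted between the namespace and the local name, so the separator has to be part of the namespace URI itself.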

Now we're ready to add the customizations. Instead of creating a new editor property from scratch, it's better to define it as a subproperty of the Dublin Core dc:creator property so that applications that don't know about our new property but do know about Dublin Core properties will have some clue what it's for.

Drag the dc1-1.rdf Dublin Core ontology file from the /TopBraid/Common folder in TopBraid Composer's Navigator view to the Imports tab to import the Dublin Core ontology. You'll see several new properties join the SKOS ones in the Properties view.
Right-click on dc:creator there and pick Create subproperty. In the Create subproperty wizard, replace the dc:Property_1 value that appears as the default Name of new instance value with ex:editor, which uses the ex prefix that you defined earlier.
Click the checkbox next to rdfs:label on the dialog box's Annotations Template so that an rdfs:label property gets automatically set for this property. (One nice thing about how the RDF data model lets you assign properties to properties is that you can associate human readable names to substitute for the actual property names on forms and reports.)
Click the OK button and you'll see a property form in the middle of your screen for your new property, which will be selected in the Properties view. (It's on the left of the screenshot below because the Classes view is not shown there.)
To show that your ontology is defining this new property for SKOS concepts, click the small white triangle next to the word domain on the ex:editor property form and pick Add existing to indicate that this property's domain will be an existing class. Click skos:Concept, which will be under owl:Thing on the wizard's class tree, and click OK.

You're finished defining your editor property. The department code one will be even quicker to create, because it won't be a subproperty of something else:

Click the Properties view's small white triangle to display its menu and pick Create rdf:Property. Name the new property ex:deptCode on the Create Property wizard dialog box. The Annotations Template's rdfs:label checkbox should already be checked, so click OK.
Set your newest property's domain to skos:Concept the same way you did for ex:editor and save your file.
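
Behind the forms, the steps above come down to a handful of RDF triples. The Python sketch below lists them and prints them in an N-Triples style. It assumes the example base URI used in this entry and the standard RDF, RDFS, OWL, SKOS and Dublin Core namespaces. The import URIs, the rdf:Property typing of ex:editor and the label strings "editor" and "dept code" are assumptions about what the wizards fill in, so the exact triples that TopBraid Composer writes to the file may differ.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
DC = "http://purl.org/dc/elements/1.1/"
BASE = "http://example.com/ns/exskos"  # base URI from the example
EX = BASE + "#"                        # namespace behind the ex prefix


def uri(value):
    return "<" + value + ">"


def literal(text):
    return '"' + text + '"'


triples = [
    # The ontology itself and its two imports (import URIs are assumed).
    (uri(BASE), uri(RDF + "type"), uri(OWL + "Ontology")),
    (uri(BASE), uri(OWL + "imports"), uri("http://www.w3.org/2004/02/skos/core")),
    (uri(BASE), uri(OWL + "imports"), uri("http://purl.org/dc/elements/1.1/")),
    # ex:editor, a subproperty of dc:creator that applies to SKOS concepts.
    (uri(EX + "editor"), uri(RDF + "type"), uri(RDF + "Property")),
    (uri(EX + "editor"), uri(RDFS + "subPropertyOf"), uri(DC + "creator")),
    (uri(EX + "editor"), uri(RDFS + "domain"), uri(SKOS + "Concept")),
    (uri(EX + "editor"), uri(RDFS + "label"), literal("editor")),
    # ex:deptCode, a plain rdf:Property that applies to SKOS concepts.
    (uri(EX + "deptCode"), uri(RDF + "type"), uri(RDF + "Property")),
    (uri(EX + "deptCode"), uri(RDFS + "domain"), uri(SKOS + "Concept")),
    (uri(EX + "deptCode"), uri(RDFS + "label"), literal("dept code")),
]

for subject, predicate, obj in triples:
    print(subject, predicate, obj, ".")
```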

You're done! You now have a customized version of the SKOS ontology. If you now use TopBraid Composer or EVN to create a new instance of the Concept class in this file, you would see editor and dept code fields on your new concept's resource form along with has broader and all the other standard SKOS properties. (If you create concept instances with TopBraid Composer and the labels are toggled to show qnames instead of human-readable labels, they'll say ex:editor and ex:deptCode.) Instead of creating concept instances in this file, though, you would more likely create a new taxonomy file that imports your customized ontology the same way that your ontology imported the SKOS and Dublin Core ontologies, and then you would store your taxonomy's concepts in this new file.

The modularity of this approach brings another benefit that isn't as easy when customizing typical XML schemas and other software resource files: when a SKOS upgrade is released, you can simply delete the import of the current SKOS ontology in your customization of it and import the new one instead, and all of your applications that use your custom ontology should be able to go on using it the same way they did before.

It's nice to know that customization of a standard ontology that nearly meets your needs is so easy, and many organizations are doing this with the SKOS ontology to create a better fit with their vocabulary management requirements. This isn't limited to customizing SKOS, though; the same principle works with any OWL ontology or RDF schema. As an added benefit, if you create a customized version of a particular standard for your enterprise, you can follow these same steps to create customizations of your customization for individual departments within your enterprise.

Converting UML Models to OWL - Part 1: The Approach

2011-02-03T17:12:00.000-08:00

Convert UML to OWL - why would you ever want to do this? One reason suffices: many enterprise models, that serve as either standards or enterprise schemas, are specified in UML. Increasingly, there is interest in having content of UML models re-purposed in RDF/OWL and the need for RDF/OWL to interoperate with systems built from UML Models.

UML Models are notoriously hard to exchange between UML tools, let alone transform into OWL. The exchange format XMI is not only difficult to understand but also has vendor-specific extensions. The vagaries of MOF, CMOF and EMOF create their own challenges. Nonetheless we have done transformations of UML to OWL. Using a model-based transformation approach, based on SPARQL Rules, XMI models of UML models can be converted to OWL. UML class diagrams can be represented in OWL without information loss. The inverse, however, is not true and will require another blog series.

UML to OWL - Part 1 Contents

Part 1 of the series explains the basis of the approach. The complete series of blogs, as currently conceived, is as follows:

Converting UML Models to OWL - Part 1: The Approach
Converting UML Models to OWL - Part 2: Transforming UML Models to OWL Using SPARQL
Converting UML Models to OWL - Part 3: Examples of Industry UML Model Transformations

The content of this blog is organized as follows:

Goals, Objectives and Requirements
Backgrounder on XMI
Backgrounder on MOF
Solution Outline
Overview of Semantic XML
OCMOF - the OWL Representation of CMOF
How the Transformations from UML to OWL Work
Generation of UML Metaclasses
Generation of UML Classes
Generation of UML Class Superclass Relationships
Generation of UML Packages
Generation of UML Package Relationships
Performance
Concluding Remarks

Readers who are very interested in the detailed technical approach should read all sections of this blog in order. Those who just need an overview of the approach could skip sections 9 through 12. Those who have deep knowledge of XMI and MOF may want to skip sections 2 and 3, but I would welcome their feedback on the accuracy of my statements.

Goals, Objectives and Requirements

The OWL Models must faithfully represent packages and the logical models or class diagrams. Out of scope, currently, are all of the other UML models such as Interaction Diagrams and State Diagrams. The approach must be able to convert UML by processing XMI files from specific tools. This requires a strategy for converting from the XML structures of XMI to OWL models.

Backgrounder on XMI

XMI, the XML Metadata Interchange standard, is a serialization format for UML Models. The main purpose of XMI is to define how the XML elements are organized within an XMI file. The XMI spec also defines a mechanism for how one XMI element references another, within and across XMI files. Such a mechanism is needed because a single UML model can legally be serialized to more than one XMI file.

Backgrounder on MOF

MOF began at the time of CORBA and the need for IDL interfaces. MOF 1.4 resulted in its mapping to Java being codified in the Java Community Process (JCP) as the Java Metadata Initiative (JMI). MOF 2.0 was developed in tandem with UML 2.0. The separation of MOF into EMOF and CMOF was motivated by the influence of EMF's Ecore, and model-driven Java development. CMOF was more the motivation of meta model developers. CMOF stands for Complete Meta Object Facility and is an OMG standard for the UML 2 model interchange. More information can be found at this page on the OMG Website.

CMOF includes fully fledged associations, association generalization, property subsetting and redefinition, derived unions, and package merge. Typical XMI container structures look like the example below, from the CMOF UML Infrastructure Model. The basic idea is that a packagedElement owns other elements. A type attribute specifies the type of the packagedElement.
Things get a little busy with how IDs are used for associations and their member ends. That complication, we can leave for Part 2.

<?xml version="1.0" encoding="UTF-8"?>
  <xmi:XMI xmi:version="2.1" xmlns:xmi="http://schema.omg.org/spec/XMI/2.1"
   xmlns:cmof="http://schema.omg.org/spec/MOF/2.0/cmof.xml">
   <cmof:Package xmi:id="_0" name="InfrastructureLibrary">
    <ownedMember xmi:type="cmof:Package" xmi:id="Core" name="Core">
     <ownedMember xmi:type="cmof:Package"
      xmi:id="Core-Abstractions" name="Abstractions">
      <packageImport xmi:type="cmof:packageImport"
      xmi:id="Core-Abstractions-_packageImport.0"
      importedPackage="Core-PrimitiveTypes"
      importingNamespace="Core-Abstractions"/>
      <ownedMember xmi:type="cmof:Package"
       xmi:id="Core-Abstractions-Ownerships" name="Ownerships">
       <packageImport xmi:type="cmof:packageImport"
        xmi:id="Core-Abstractions-Ownerships-_packageImport.0"
        importedPackage="Core-Abstractions-Elements"
        importingNamespace="Core-Abstractions-Ownerships"/>
        <ownedMember xmi:type="cmof:Class"
         xmi:id="Core-Abstractions-Ownerships-Element" name="Element" isAbstract="true">
         <ownedComment xmi:type="cmof:Comment"
          xmi:id="Core-Abstractions-Ownerships-Element-_ownedComment.0"
          annotatedElement="Core-Abstractions-Ownerships-Element">
          <body>An element is a constituent of a model.
          As such, it has the capability of owning other elements.</body>
         </ownedComment>
        <ownedRule xmi:type="cmof:Constraint"
         xmi:id="Core-Abstractions-Ownerships-Element-not_own_self"
         name="not_own_self" constrainedElement="Core-Abstractions-Ownerships-Element"
         namespace="Core-Abstractions-Ownerships-Element">
         <ownedComment xmi:type="cmof:Comment"
          xmi:id="Core-Abstractions-Ownerships-Element-not_own_self-_ownedComment.0"
          annotatedElement="Core-Abstractions-Ownerships-Element-not_own_self">
          <body>An element may not directly or indirectly own itself.</body>
         </ownedComment>
         <specification xmi:type="cmof:OpaqueExpression"
          xmi:id="Core-Abstractions-Ownerships-Element-not_own_self-_specification">
          <language>OCL</language>
          <body>not self.allOwnedElements()->includes(self)</body>
         </specification>
        </ownedRule>
        ...
        <ownedAttribute xmi:type="cmof:Property"
         xmi:id="Core-Abstractions-Ownerships-Element-ownedElement"
         name="ownedElement" type="Core-Abstractions-Ownerships-Element"
         upper="*" lower="0" isReadOnly="true" isDerived="true"
         isDerivedUnion="true" isComposite="true"
         association="Core-Abstractions-Ownerships-A_ownedElement_owner">
         <ownedComment xmi:type="cmof:Comment"
          xmi:id="Core-Abstractions-Ownerships-Element-ownedElement-_ownedComment.0"
          annotatedElement="Core-Abstractions-Ownerships-Element-ownedElement">
          <body>The Elements owned by this element.</body>
         </ownedComment>
        </ownedAttribute>
        ...

Figure 1: A sample of XMI
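
As a rough illustration of the container structure and of ID-based references, the Python sketch below parses a cut-down, well-formed version of the sample using only the standard library. The XMI string is abbreviated from Figure 1, the Core-PrimitiveTypes package is added here only so that the importedPackage reference has a target, and the helper names are ours. Real XMI files carry far more detail and tool-specific extensions.

```python
import xml.etree.ElementTree as ET

XMI_NS = "http://schema.omg.org/spec/XMI/2.1"
XMI_ID = "{%s}id" % XMI_NS      # ElementTree spelling of xmi:id
XMI_TYPE = "{%s}type" % XMI_NS  # ElementTree spelling of xmi:type

SAMPLE = """<xmi:XMI xmi:version="2.1"
    xmlns:xmi="http://schema.omg.org/spec/XMI/2.1"
    xmlns:cmof="http://schema.omg.org/spec/MOF/2.0/cmof.xml">
  <cmof:Package xmi:id="_0" name="InfrastructureLibrary">
    <ownedMember xmi:type="cmof:Package" xmi:id="Core" name="Core">
      <ownedMember xmi:type="cmof:Package"
          xmi:id="Core-PrimitiveTypes" name="PrimitiveTypes"/>
      <ownedMember xmi:type="cmof:Package"
          xmi:id="Core-Abstractions" name="Abstractions">
        <packageImport xmi:type="cmof:packageImport"
            xmi:id="Core-Abstractions-_packageImport.0"
            importedPackage="Core-PrimitiveTypes"
            importingNamespace="Core-Abstractions"/>
      </ownedMember>
    </ownedMember>
  </cmof:Package>
</xmi:XMI>"""

root = ET.fromstring(SAMPLE)

# Index every element that carries an xmi:id so references can be resolved.
by_id = {el.get(XMI_ID): el for el in root.iter() if el.get(XMI_ID)}


def outline(element, depth=0):
    """Yield (depth, xmi:id, xmi:type) for an element and its descendants."""
    yield depth, element.get(XMI_ID), element.get(XMI_TYPE)
    for child in element:
        yield from outline(child, depth + 1)


# Walk the containment tree, starting at the top-level cmof:Package.
for depth, xmi_id, xmi_type in outline(root[0]):
    print("  " * depth + str(xmi_id), "->", xmi_type)

# Resolve the packageImport's reference to the package it imports.
package_import = by_id["Core-Abstractions-_packageImport.0"]
target = by_id[package_import.get("importedPackage")]
print(target.get("name"))
```

Note that the top-level cmof:Package element has no xmi:type attribute; its type is carried by the element name itself.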

For more background on the history of MOF the following references may be of value: MOFLON, and Wikipedia.

Solution Outline

Model-based transformation is the central idea of the approach. To implement it we have developed a metamodel of CMOF in OWL. Our strategy is to get out of XML into RDF Triples as soon as possible. Using an ontology of XML we convert XMI into a composite model of triples. XML is a simple enough structure for the composite object pattern - elements contain elements and elements have attributes. XML elements and attributes that make up the XMI file are transformed into OWL instances of the CMOF metamodel. Once we have the XMI in triples we can map constructs to classes and properties of a CMOF metamodel. This model then serves as the generator for model-based transformations to an OWL model of the UML.

Once these instances are loaded as "raw" RDF, rules fire to perform the transformations. Rules are associated with classes to ensure that instances of those classes are processed in an execution sequence. Using SPARQL Rules (SPIN), instances of a class are each processed through a binding mechanism specified by the ?this variable. SPARQL Rules are comparable to UML's Object Constraint Language (OCL) and the Query/View/Transformation (QVT) approach to transformations.

The benefits of the OWL and SPARQL Rules model-based approach to transformation are:

Intimacy of the rules with RDF/OWL - triples are evaluated directly
Understandability - rules are smaller and expressed in the relevant contexts of the model
Enhanced Performance - evaluation of rules is localized to relevant instances
Customizability and Evolvability – transformations can be changed by modifying models and/or SPARQL rules
Ease of maintenance - rules are associated with the constructs they operate over

Overview of Semantic XML

XMI is imported into the CMOF metamodel using TopBraid Composer's Semantic XML as a mapping method. With Semantic XML, TopBraid can automatically generate an OWL/RDF ontology from any XML file. Each distinct XML element name is mapped into a class, and the elements themselves become instances of those classes. A datatype property is generated for each attribute. Nesting of XML elements is represented in OWL using a composite:child property - an object pattern in OWL that is described at this blog entry.

The key idea of Semantic XML is that each of the generated OWL classes and datatype properties is annotated with an annotation property, sxml:element and sxml:attribute, respectively. These properties relate the OWL concepts to the XML serialization. Note that these annotations are also used if an OWL model needs to be serialized back to XML format.

If you import an XML file into an ontology that already contains classes and properties with Semantic XML annotations, then the loader will reuse those. The mapping is bi-directional and loss-less so that files can be loaded, manipulated and saved without losing structural information.
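
The mapping rules just described can be mimicked in a few lines of Python. This is only a sketch of the idea, not TopBraid's implementation: the node identifiers (node0, node1, ...), the tuple layout and the to_triples helper are invented for illustration, and the parent-to-child direction of composite:child is an assumption. Only the rule "element name to class, element to instance, attribute to datatype property, nesting to composite:child" comes from the description above.

```python
import itertools
import xml.etree.ElementTree as ET


def to_triples(xml_text):
    """Turn an XML document into (subject, predicate, object) tuples."""
    counter = itertools.count()
    triples = []

    def visit(element, parent_id=None):
        node_id = "node%d" % next(counter)
        # Each distinct element name acts as a class; the element is an instance.
        triples.append((node_id, "rdf:type", element.tag))
        # Each XML attribute becomes a datatype property value.
        for name, value in element.attrib.items():
            triples.append((node_id, name, value))
        # Nesting is recorded with a composite:child link from the parent.
        if parent_id is not None:
            triples.append((parent_id, "composite:child", node_id))
        for child in element:
            visit(child, node_id)

    visit(ET.fromstring(xml_text))
    return triples


doc = ('<ownedMember name="Class">'
       '<ownedAttribute name="isAbstract" default="false"/>'
       '</ownedMember>')

for triple in to_triples(doc):
    print(triple)
```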

A video explaining how Semantic XML works is available at this link.

OCMOF - the OWL Representation of CMOF

The strategy for the transformation can be summarized as follows:

Use OWL classes to represent XMI Element Types
Use SPARQL Rules on those classes to generate CMOF Metaclasses
Use Metaclasses to make OWL Classes that represent the UML Model

An OWL metamodel of CMOF represents the kinds of containers, elements and attributes shown above. The metamodel was built by studying the UML Metamodel of UML 2.0; the original motivation for this was to have an automated way of dealing with changes to UML. That will be a future consideration. For now, this has proven to be a valuable way of doing verification and validation. The UML metamodel will be covered in Part 2 of this blog series. For Part 1, it is perhaps instructive to show a small piece of the XMI. Below is the XMI for Basic-Property from the UML model infrastructure.cmof.xmi.

<ownedAttribute xmi:type="cmof:Property"
  xmi:id="Core-Basic-Class-ownedAttribute" name="ownedAttribute"
  type="Core-Basic-Property" isOrdered="true"
  upper="*" lower="0" isComposite="true"
  association="Core-Basic-A_ownedAttribute_class">
  <ownedComment xmi:type="cmof:Comment"
    xmi:id="Core-Basic-Class-ownedAttribute-_ownedComment.0"
    annotatedElement="Core-Basic-Class-ownedAttribute">
    <body>The attributes owned by a class.
    These do not include the inherited attributes.
    Attributes are represented by instances of Property.</body>
  </ownedComment>
 </ownedAttribute>

Figure 2: A fragment of the XMI for the UML metamodel

As an example of XMI element mappings, the sxml:element maps the XMI element for ocmof:ownedAttribute as shown in the Turtle extract from the OWL model below.

ocmof:ownedAttribute
  a  owl:Class ;
  rdfs:label "Attribute"^^xsd:string ;
  rdfs:subClassOf ocmof:TypedThing , ocmof:NamedThing ;
  rdfs:subClassOf
    [ a owl:Restriction ;
      owl:maxCardinality "1"^^xsd:nonNegativeInteger ;
      owl:onProperty ocmof:isComposite
    ] ;
  rdfs:subClassOf
    [ a owl:Restriction ;
      owl:maxCardinality "1"^^xsd:nonNegativeInteger ;
      owl:onProperty ocmof:type
    ] ;
  rdfs:subClassOf
    [ a  owl:Restriction ;
      owl:maxCardinality "1"^^xsd:nonNegativeInteger ;
      owl:onProperty ocmof:isDerivedUnion
    ] ;
  rdfs:subClassOf
    [ a  owl:Restriction ;
     owl:maxCardinality "1"^^xsd:nonNegativeInteger ;
     owl:onProperty ocmof:isReadOnly
    ] ;
  rdfs:subClassOf
    [ a  owl:Restriction ;
      owl:maxCardinality "1"^^xsd:nonNegativeInteger ;
      owl:onProperty ocmof:default
    ] ;
  sxml:element "ownedAttribute"^^xsd:string .

Figure 3: ocmof:ownedAttribute in Turtle

The last line, sxml:element "ownedAttribute"^^xsd:string, is the mapping.

As an example of XMI attribute mappings, the sxml:attribute maps the XMI attribute for ocmof:isOrdered as shown in the Turtle extract from the OWL model below.

ocmof:isOrdered
  a owl:DatatypeProperty ;
  rdfs:domain ocmof:ownedAttribute ,
  ocmof:ownedParameter ,
  ocmof:OwnedEnd ;
  rdfs:label "is ordered"^^xsd:string ;
  rdfs:range xsd:boolean ;
  sxml:attribute "isOrdered"^^xsd:string .

Figure 4: ocmof:isOrdered in Turtle

The last line, sxml:attribute "isOrdered"^^xsd:string, is the mapping.

The transformation to OWL results in the following class for uml:Core-Basic-Property.

Figure 5: A Generated Metaclass Example - uml:Core-Basic-Property

The diagram shows how the datatype properties of the class uml:Core-Basic-Property correspond to the XMI attributes given in the above fragment. For example isComposite becomes the property hasBooleanIsComposite. The prefix hasBoolean is customizable.

First, an OWL model of CMOF XML Elements is used to generate instances of metaclasses to build OWL Classes for XMI Elements. The namespace prefix ocmof has been used to denote all modeling constructs that make up the CMOF metamodel. The prefix cmof is the namespace for all constructs generated from the import of the XMI files.

In the diagram below, we show the main classes of the metamodel. Classes like NamedThing and TypedThing have been introduced to optimize the work of the transformers. Constructs in XMI can typically be both named and typed. This kind of multiple inheritance is no problem for the transformations. The diagram is a partial view only.

Figure 6: Some of the classes of the CMOF OWL model

As an alternate view, the diagram that follows is an HTML report of NamedThing in TopBraid Composer. This is automatically generated using SPARQL Web Pages (aka UISPIN).

Figure 7: OCMOF NamedThing - an abstract class for the transformations

The diagram below shows more details of some ownedElements. Note how attributes of each of these classes relate to CMOF constructs.

Figure 8: Some "ownedElement" OWL Classes in the OCMOF model

These ocmof classes serve as the starting point for generating ocmof meta-classes and instances of these classes that become the UML model transformed into OWL. The figure below shows the main metaclasses that are generated by rules on the ocmof classes.

Figure 9: The key Meta-classes of the CMOF OWL model

How the Transformations from UML to OWL Work

Model-based transformations use rules associated with OWL Classes. OWL Metaclasses are built using a SPARQL rule for instances of TypedThing. The names of the metaclasses are determined from the value of the xmi:type attribute. A number of SPARQL Rules are defined on TypedThing. Priorities are set by the alphabetic ordering given by the first comment line of the rule. These rules look after the generation of:

UML Metaclasses
UML Classes
UML Class Superclass Relationships
UML Packages
UML Package Relationships

Each rule will now be described.
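
The ordering convention can be pictured with a short Python sketch. It is an analogy only, not how SPIN is implemented: rules are modeled as (comment line, function) pairs attached to a class name, sorted by their first comment line, and then applied to each instance of the class in turn, with the instance playing the role of ?this. The rule bodies are placeholders, and whether the engine finishes one rule for all instances before starting the next is an assumption made here.

```python
# Rules attached to a class, deliberately listed out of order.
rules_by_class = {
    "TypedThing": [
        ("# STEP CMOF-SR-002  make UML Classes from CMOF elements",
         lambda this: "class for " + this["id"]),
        ("# STEP CMOF-SR-001  make UML Metaclass from CMOF type",
         lambda this: "metaclass for " + this["type"]),
    ],
}

# Sample instances of the class; each one is bound to 'this' in turn.
instances = {
    "TypedThing": [
        {"id": "Core-Basic-Class", "type": "cmof:Class"},
        {"id": "Core", "type": "cmof:Package"},
    ],
}


def run_rules(class_name):
    """Apply each rule of a class to every instance, in comment-line order."""
    results = []
    ordered = sorted(rules_by_class[class_name], key=lambda rule: rule[0])
    for _comment, rule in ordered:
        for this in instances[class_name]:
            results.append(rule(this))
    return results


for line in run_rules("TypedThing"):
    print(line)
```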

Generation of UML Metaclasses

The first task is to create a metaclass and class for every type of element in the ingested XMI file. This is done using the SPARQL Rule below:


# STEP CMOF-SR-001  make UML Metaclass from CMOF type
CONSTRUCT {
  ?metaClassURI a rdfs:Class .
  ?metaClassURI rdfs:subClassOf cmof:MetaClass .
  ?metaClassURI rdfs:label ?metaClassLabel .
  ?typeURI a owl:Class .
  ?typeURI a ?metaClassURI .
  ?typeURI rdfs:subClassOf uml:Construct .
  ?typeURI rdfs:label ?classLabel .
 }
WHERE {
  ?this xmi:type ?type .
  FILTER (?type != "cmof:Property") .
  BIND (o2o:localNameOfQName(?type) AS ?name) .
  BIND (fn:concat("CMOF ", ?name) AS ?metaClassLabel) .
  BIND (fn:concat("UML ", ?name) AS ?classLabel) .
  BIND (xmi.common:makeUML-URI(?name) AS ?typeURI) .
  BIND (xmi.common:makeCMOF-URI(?name) AS ?metaClassURI) .
}

Figure 10: The SPARQL Rules that make the metaclasses in the OCMOF model

What is going on in these rules? First we explain the "where" clause.

?this xmi:type ?type binds ?this to an instance of TypedThing. For each instance the rule is evaluated.

FILTER (?type != "cmof:Property") blocks further evaluation of the rule if the instance is of type cmof:Property. The reason for this will be explained in Part 2.

BIND (o2o:localNameOfQName(?type) AS ?name) extracts the name of the type from the QName.

BIND (fn:concat("CMOF ", ?name) AS ?metaClassLabel) builds a label for the metaclass. The function fn:concat is from the Jena SPARQL Library. We use it here to prepend "CMOF" to the name we get from the type of the XMI Element.

BIND (fn:concat("UML ", ?name) AS ?classLabel) likewise builds a label for the UML Class from the name. We construct both a metaclass and a class from the XMI type: the metaclass is there to say what kind of things can happen on the classes. In other words, the generated OWL model is a 3-level ontology.

BIND (xmi.common:makeUML-URI(?name) AS ?typeURI) builds a URI for the UML Class corresponding to type. This uses a function call to xmi.common:makeUML-URI whose job it is to build the correct namespace path for a UML construct URI. The implementation is shown below.

SELECT ?uri
WHERE {
  BIND (xmi.common:baseURI() AS ?baseURI) .
  BIND (smf:buildURI("{?baseURI}#{?arg1}") AS ?uri) .
}

where,
smf:buildURI("{?baseURI}#{?arg1}") builds a URI for the name given in ?arg1 with a base URI supplied by the function xmi.common:baseURI().

BIND (xmi.common:makeCMOF-URI(?name) AS ?metaClassURI) builds a URI for the metaclass corresponding to type. Likewise, this constructs a namespace path for CMOF constructs.

Next we explain what is happening in the head of the rule with the Construct statements. These statements use the generated URIs to create instances of meta-classes and classes.

?metaClassURI a rdfs:Class
gives the metaClass its type.

?metaClassURI rdfs:subClassOf cmof:MetaClass
specifies that the metaclass is a sub-class of cmof:MetaClass - an abstract metaclass for all cmof classes.

?metaClassURI rdfs:label ?metaClassLabel
gives the metaclass a human label.

?typeURI a owl:Class
gives the UML Class a type.

?typeURI a ?metaClassURI
gives the UML Class a more specific type so that it can have more properties than owl:Class provides.

?typeURI rdfs:subClassOf uml:Construct
specifies that the UML Class is a subclass of the abstract OWL Class uml:Construct.

?typeURI rdfs:label ?classLabel
gives the UML Class a human label.
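
To make the rule concrete, here is a Python sketch of the values it computes for a single xmi:type. The two base URIs are placeholders invented for this example (this post does not show what xmi.common:baseURI() returns, nor how xmi.common:makeCMOF-URI is implemented), and the helper functions only imitate what the text says o2o:localNameOfQName and smf:buildURI do.

```python
UML_BASE = "http://example.org/uml"    # hypothetical base URI
CMOF_BASE = "http://example.org/cmof"  # hypothetical base URI


def local_name_of_qname(qname):
    """Imitates o2o:localNameOfQName: 'cmof:Package' -> 'Package'."""
    return qname.split(":", 1)[1]


def build_uri(base_uri, name):
    """Imitates the template '{?baseURI}#{?arg1}'."""
    return "{0}#{1}".format(base_uri, name)


def make_metaclass_triples(xmi_type):
    """Return the triples the rule would construct, or [] if filtered out."""
    if xmi_type == "cmof:Property":  # mirrors the FILTER in the WHERE clause
        return []
    name = local_name_of_qname(xmi_type)
    type_uri = build_uri(UML_BASE, name)
    metaclass_uri = build_uri(CMOF_BASE, name)
    return [
        (metaclass_uri, "rdf:type", "rdfs:Class"),
        (metaclass_uri, "rdfs:subClassOf", "cmof:MetaClass"),
        (metaclass_uri, "rdfs:label", "CMOF " + name),
        (type_uri, "rdf:type", "owl:Class"),
        (type_uri, "rdf:type", metaclass_uri),
        (type_uri, "rdfs:subClassOf", "uml:Construct"),
        (type_uri, "rdfs:label", "UML " + name),
    ]


for triple in make_metaclass_triples("cmof:Package"):
    print(triple)
```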

Generation of UML Classes

Once we have the necessary metaclasses we can begin the work of creating instances of those classes. These instances will, of course, be classes (the meta-world can get confusing). This work is done by the SPARQL Rule below.


# STEP CMOF-SR-002  make UML Classes from CMOF elements
CONSTRUCT {
  ?type a rdfs:Class .
  ?type rdfs:subClassOf cmof:MetaClass .
  ?class a ?type .
  ?class rdfs:label ?name .
  ?class ocmof:hasCMOFbasis ?this .
  ?superURI a owl:Class .
  ?superURI a cmof:CategoryClass .
  ?superURI rdfs:label ?super .
  ?subURI a owl:Class .
  ?subURI a cmof:CategoryClass .
  ?subURI rdfs:subClassOf ?superURI .
  ?subURI rdfs:label ?sub .
  ?class rdfs:subClassOf ?mySuperClass .
}
WHERE {
  ?this xmi:type "cmof:Class" .
  ?this xmi:id ?name .
  BIND (o2o:pathPart(?name, "-") AS ?path) .
  OPTIONAL {
    ?path o2o:pairHyphenIncrementally ( ?super ?sub ) .
    BIND (xmi.common:makeUML-URI(smf:buildString("CLASSES_{?super}")) AS ?superURI) .
    BIND (xmi.common:makeUML-URI(smf:buildString("CLASSES_{?sub}")) AS ?subURI) .
    BIND (xmi.common:makeCMOF-Resource("cmof:Class") AS ?type) .
    BIND (xmi.common:makeUML-URI(?name) AS ?class) .
    } .
  BIND (xmi.common:makeUML-URI(smf:buildString("CLASSES_{?path}")) AS ?mySuperClass) .
  }

Figure 11: The SPARQL Rules that make UML Classes

More details of this transformation will be given in Part 2 of this blog series. An interesting aspect of this particular rule to mention now is how it builds deep inheritance structures by using Property Functions to recurse over hyphenated names (more on the use of Property Functions, also known as Magic Properties, with TopBraid Composer can be found at this blog entry). These hyphenated names occur throughout the XMI metamodel of UML. For example Core-Basic-Class looks like:

<ownedMember xmi:type="cmof:Class" xmi:id="Core-Basic-Class" name="Class" superClass="Core-Basic-Type">
  <ownedComment xmi:type="cmof:Comment" xmi:id="Core-Basic-Class-_ownedComment.0"
    annotatedElement="Core-Basic-Class">
    <body>A class is a type that has objects as its instances.</body>
  </ownedComment>
  <ownedAttribute xmi:type="cmof:Property" xmi:id="Core-Basic-Class-isAbstract"
    name="isAbstract" type="Core-PrimitiveTypes-Boolean" default="false">
    <ownedComment xmi:type="cmof:Comment" xmi:id="Core-Basic-Class-isAbstract-_ownedComment.0" annotatedElement="Core-Basic-Class-isAbstract">
    <body>True when a class is abstract.</body>
    </ownedComment>
  </ownedAttribute>
  ...

Figure 12: Example of Hyphenated Names in the UML Metamodel

How this is done in the SPARQL Rule is explained briefly below.

In the SPARQL Rule shown above in Figure 11, the statement in the tail, ?path o2o:pairHyphenIncrementally ( ?super ?sub ), is a Property Function that returns two results, ?super and ?sub, for every hyphenated pair.
For each pair, the statement ?subURI rdfs:subClassOf ?superURI in the head of the rule builds superclass relationships.
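
The source of o2o:pathPart and o2o:pairHyphenIncrementally is not shown in this post, so the Python sketch below is a guess at their behavior, inferred from how the rule uses them: path_part drops the last hyphen-separated segment, and pair_hyphen_incrementally returns each (shorter prefix, longer prefix) pair along the path. Both function names and the handling of names without a hyphen are assumptions.

```python
def path_part(name, separator="-"):
    """Everything before the last separator: 'Core-Basic-Class' -> 'Core-Basic'."""
    return name.rsplit(separator, 1)[0]


def pair_hyphen_incrementally(path):
    """Return (super, sub) pairs for each successive prefix of a hyphenated path."""
    parts = path.split("-")
    prefixes = ["-".join(parts[:i]) for i in range(1, len(parts) + 1)]
    return list(zip(prefixes, prefixes[1:]))


xmi_id = "Core-Abstractions-Ownerships-Element"
path = path_part(xmi_id)

# One category class per prefix, each a subclass of the next shorter prefix.
for super_name, sub_name in pair_hyphen_incrementally(path):
    print("CLASSES_" + sub_name, "rdfs:subClassOf", "CLASSES_" + super_name)

# The UML class itself goes under the category class for its full path.
print(xmi_id, "rdfs:subClassOf", "CLASSES_" + path)
```

A one-segment path such as Core yields no pairs in this sketch, which would be consistent with the pattern being wrapped in OPTIONAL in the rule.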

Generation of UML Class Superclass Relationships

Once we have all of the UML Classes, the next rule can build the rdfs:subClassOf relationships.

# STEP CMOF-SR-005 - fixup the superclass of the root Classes
CONSTRUCT {
  ?class rdfs:subClassOf uml:Class .
}
WHERE {
  ?class a cmof:CategoryClass .
  BIND (afn:localname(?class) AS ?className) .
  FILTER fn:starts-with(?className, "CLASSES_") .
  NOT EXISTS {
  ?class rdfs:subClassOf ?superClass .
 } .
}

The result of executing the preceding UML Class rules is the UML Class Hierarchy shown in the diagram below.

Figure 13: Generated UML Metamodel Class Hierarchy
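
In plain terms, the fix-up rule finds every CLASSES_ category class that has no superclass yet and hangs it under uml:Class. A small Python sketch of that selection, using made-up sample data and a hypothetical fix_up_roots helper:

```python
# Category classes produced by the earlier rule (sample data only).
category_classes = [
    "CLASSES_Core",
    "CLASSES_Core-Basic",
    "CLASSES_Core-Abstractions",
]

# Existing rdfs:subClassOf links, keyed by subclass.
sub_class_of = {
    "CLASSES_Core-Basic": "CLASSES_Core",
    "CLASSES_Core-Abstractions": "CLASSES_Core",
}


def fix_up_roots(classes, parents, root="uml:Class"):
    """Return new (class, root) links for category classes lacking a superclass."""
    return [(name, root) for name in classes
            if name.startswith("CLASSES_") and name not in parents]


print(fix_up_roots(category_classes, sub_class_of))
```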

Generation of UML Packages

# STEP CMOF-SR-020 - make Packages
CONSTRUCT {
  ?package rdfs:label ?name .
  ?package ocmof:hasCMOFbasis ?this .
  ?mySuperClass a owl:Class .
  ?mySuperClass rdfs:label ?path .
  ?superURI a owl:Class .
  ?superURI a cmof:CategoryClass .
  ?superURI rdfs:label ?super .
  ?subURI a owl:Class .
  ?subURI a cmof:CategoryClass .
  ?subURI rdfs:subClassOf ?superURI .
  ?subURI rdfs:label ?sub .
  ?package a ?mySuperClass .
  ?package a uml:Package .
}
WHERE {
  ?this xmi:type "cmof:Package" .
  ?this xmi:id ?name .
  BIND (xmi.common:makeUML-URI(?name) AS ?package) .
  BIND (o2o:pathPart(?name, "-") AS ?path) .
  OPTIONAL {
    ?path o2o:pairHyphenIncrementally ( ?super ?sub ) .
    BIND (xmi.common:makeUML-URI(smf:buildString("PACKAGES_{?super}")) AS ?superURI) .
    BIND (xmi.common:makeUML-URI(smf:buildString("PACKAGES_{?sub}")) AS ?subURI) .
  } .
  BIND (xmi.common:makeUML-URI(smf:buildString("PACKAGES_{?path}")) AS ?mySuperClass) .
}

Generation of UML Package Relationships

# STEP CMOF-SR-024 - fixup the superclass of the root Packages
CONSTRUCT {
  ?packageClass rdfs:subClassOf uml:Package .
}
WHERE {
  ?this xmi:type "cmof:Package" .
  ?package ocmof:hasCMOFbasis ?this .
  ?package a ?packageClass .
  NOT EXISTS {
  ?packageClass rdfs:subClassOf ?superClass .
  } .
 }

The result of executing the preceding UML Package rules is the UML Package Hierarchy shown in the diagram below.

Figure 14: Generated UML Metamodel Package Hierarchy

Performance

As a measurement of the performance with the TopBraid Composer release 3.4.0, the conversion of the UML Infrastructure XMI took 38.611 seconds and generated 19,575 statements (RDF triples) on a DELL Studio XPS Laptop with 4GB of memory, running Windows 7. This translates to an inference speed of 507 TPS (Triples per second).
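
The throughput figure follows directly from the two measurements. As a quick check in Python:

```python
statements = 19575  # RDF triples generated by the conversion
seconds = 38.611    # elapsed conversion time

triples_per_second = statements / seconds
print(round(triples_per_second))  # 507
```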

Concluding Remarks

Part 1 of this blog has introduced the power of model-based transformation using SPARQL Rules as a means to transform XMI to OWL. Our experience in doing this work confirms the extensibility and flexibility of this approach. The subject is a complex one requiring a grounding in the intricacies of UML Metamodeling, and a knowledge of SPARQL and SPARQL Rules. We have attempted to do that briefly in this blog - not an easy matter.

Part 2 of this blog series will discuss transforming UML Models to OWL Using SPARQL.

Running TopBraid Live in the Amazon EC2 cloud

2011-01-19T05:12:00.000-08:00

A recent Dilbert strip inspired me to go through Dave Winer's EC2 for Poets tutorial as a geeky weekend project. It was surprisingly easy and inexpensive to get a computer image running in Amazon's "Elastic Compute Cloud" (EC2) and to then get a copy of TopQuadrant's TopBraid Live running in that image.

These images are cheap to run, as you can see on their price list. Note that the cost per hour of running a default Linux image is not eighty-five cents an hour when using servers in northern Virginia, but eight and a half cents. (It's an additional penny an hour when using their servers in California, Ireland, or Singapore.) If you're willing to spend a dollar or two an hour, you can get full control of some really large-scale computing power without spending much money unless you're planning some long-term use of it, in which case you'll want to compare the options with your requirements more closely than I did. You'll want one of the EBS images, which still offers you a wide choice of platforms. Otherwise, the only way to stop a running image is to terminate it, in which case it (and all your configuration of it) is gone. With an EBS image, once you stop running it, you can always restart it.

I started off with an Ubuntu Linux image, but I had trouble installing Sun Java on it. After starting up a Fedora image, installing Sun Java and Tomcat was easy, and once Tomcat was installed, the installation of TopBraid Live under Tomcat according to the TBL installation instructions was simple and straightforward.

The ease with which I switched my efforts from Ubuntu to Fedora was an important lesson in cloud use; it sure was easier than installing Fedora over Ubuntu on a laptop hard disk, and if I decided to switch back, going back to the Ubuntu EC2 image would take seconds, unlike reverting a hard disk once I'd overwritten the Ubuntu image with Fedora. You can initialize several different images in the EC2 cloud, and if you're not running any of them, Amazon only bills you for storage, which is pennies, so you can set up several to wait for you and start and stop them whenever you like. If you want more memory, cluster processing, 64-bit instead of 32-bit, or other additional resources, you can just try it and see how it goes. This means that when you plan out the deployment of TopBraid applications using 32- or 64-bit TBL, you don't have to worry too much about having an available machine with the best possible operating system and hardware configuration, because you can experiment with different cloud images until you find the setup that's best for you. (I never tried a Windows image or SUSE or one of the other Linux images, but if I was going to roll out a production TBL application I'd explore these further as part of my planning.)

It's pretty impressive to think about the kind of power and flexibility that you can get when you deploy cloud-based semantic web applications, and as I found out with my little experiment, the barriers to entry for trying it out are extremely low. After 16 hours of running one instance, 3 hours of running another, and 3.1 GB of data transfer in the last few weeks, I currently owe Amazon Web Services a total of $2.29. So, if you're holding off on trying TBL because you don't have the appropriate box on which to run it, Amazon's EC2 offers some nice options to try out.

How to: convert a spreadsheet to SKOS

2010-12-29T11:10:00.000-08:00

In an earlier entry, we learned how SPARQL Rules can increase the quality of taxonomies and other controlled vocabularies stored using the W3C SKOS ontology. (As I wrote there, the Simple Knowledge Organization System vocabulary management specification is gaining popularity because, as a standard, it makes it easier to share taxonomies and thesauri between different systems. It also guards investments in vocabulary development against the potential problems of dependence on a proprietary vendor format.)

TopQuadrant's Enterprise Vocabulary Net (EVN) vocabulary manager uses SKOS as its default format for storing data. Whether you use EVN or not, a first step in systematic management of vocabularies is often the conversion of vocabularies stored in ad hoc spreadsheets—an unfortunately very popular way to store them—to SKOS, so today we'll look at how TopBraid makes this conversion easy.

Below is an Excel spreadsheet with some data about a few Caniformia animals. (In the Linnaean classification of animals, Caniformia is a suborder of Carnivora, which is an order of the Mammalia class.) It shows two families of this suborder and a few genera and species of each family, with both the Latin and common name of each species.

Using a SPARQLMotion script, the basic steps of converting a spreadsheet like this to SKOS are:

Read in the spreadsheet as a set of RDF triples.
Use a CONSTRUCT query to convert the spreadsheet triples to SKOS triples. This is the step that varies the most from one conversion to another, because people can arrange spreadsheets any way they want, so the logic of the CONSTRUCT query has to infer the correct relationships between the values on the spreadsheet.
Save the SKOS triples as an RDF file or in whatever format is appropriate to your applications that will use this data.

The following shows the SPARQLMotion script that I used to convert the spreadsheet above.

It has a module for each of the three steps listed above and an additional SetBaseURIStr module to set a ?baseURIStr variable. The script refers to the base URI of the output several times, and instead of hardcoding it in all those places, I decided to use this module to set this variable and to then reference the variable from other places so that resetting the base URI could be done in one place. The "set BaseURIString" module has a very simple SELECT query:

SELECT ?baseURIStr
WHERE {
    LET (?baseURIStr := "http://example.com/taxonomies/animals") .
}

When you import an Excel file into TopBraid, the "Import Excel Cell Instances" SPARQLMotion module can pull triples from the spreadsheet with information such as the fact that a given cell has a row value of 7 (using zero-based counting), a column value of 0, a type value of "xsd:string", and "giant panda" as its contents. This level of detail can be useful for picking apart complex spreadsheets, but for simpler ones, if you instead use an "Import RDF from Workspace" module (in other words, if you have the script open the spreadsheet as if it were an RDF file), TopBraid uses the headings of the spreadsheet to identify more of the semantics of the data. For example, it would create triples saying that the thing identified as Row-6 has a commonName value of "giant panda" and a genus value of "Ailuropoda". This will be easier to convert to SKOS with a CONSTRUCT query.

There are five basic tasks that the conversion module must perform, all through the creation of triples:

Declare that the dataset being created is an ontology.
Import the standard W3C SKOS ontology so that we can reference its classes and properties.
Declare a concept scheme. A SKOS vocabulary can have as many concept schemes as you like, but we'll just create one for our example.
Declare concepts for each species, genus, and family found in the input triples, with a skos:broader property pointing from each one to the appropriate broader concept or, for the families that have none, a skos:hasTopConcept property pointing to the concept from the concept scheme created in the previous step.
Create triples that attach any additional metadata to the appropriate concepts—in this case, to assign the common name value to each species concept. SKOS is very flexible, so if you had additional non-SKOS properties specific to your own applications that you wanted to assign to each concept, the steps would be similar to the ones for attaching the common name values from this spreadsheet to each concept.

When the "Import RDF from Workspace" module reads in a spreadsheet such as caniformia.xls, it uses the spreadsheet's filename to define a prefix for the spreadsheet's properties so that it can refer to those properties with names like caniformia:genus. After opening the spreadsheet directly in TopBraid Composer, I saw that the base URI created for the data and associated with the caniformia: prefix was file:///xls2skos/caniformia.xls, because I had it in a project named xls2skos. I wanted to use this prefix in my SPARQLMotion script's CONSTRUCT query, so I associated this URI with the caniformia: prefix in the Overview tab of the xls2skos.n3 file that stored the SPARQLMotion script.

The actual conversion takes place in the Apply Construct module that I named "convert XLSData". These modules can store multiple CONSTRUCT queries, so I used two. The first does the basic setup of the taxonomy being created, which covers the first three of the five tasks listed above:

CONSTRUCT {
    ?baseURI a owl:Ontology .
    ?baseURI owl:imports <http://www.w3.org/2004/02/skos/core> .
    <http://example.com/taxonomies/animals/caniformia> a skos:ConceptScheme .
    <http://example.com/taxonomies/animals/caniformia> rdfs:label "Caniformia" .
}
WHERE {
    LET (?baseURI := smf:buildURI("<{?baseURIStr}>")) .
}

The second query performs steps 4 and 5:

CONSTRUCT {
    ?speciesURI a skos:Concept .
    ?genusURI a skos:Concept .
    ?familyURI a skos:Concept .
    ?speciesURI skos:prefLabel ?speciesName .
    ?speciesURI skos:altLabel ?commonName .
    ?speciesURI skos:broader ?genusURI .
    ?genusURI skos:broader ?familyURI .
    <http://example.com/taxonomies/animals/caniformia> skos:hasTopConcept ?familyURI .
}
WHERE {
    ?row caniformia:commonName ?commonName .
    ?row caniformia:species ?speciesName .
    LET (?species := smf:encodeURL(?speciesName)) .
    LET (?speciesURI := smf:buildURI("<{?baseURIStr}#{?species}>")) .
    ?row caniformia:genus ?genusName .
    LET (?genus := smf:encodeURL(?genusName)) .
    LET (?genusURI := smf:buildURI("<{?baseURIStr}#{?genus}>")) .
    ?row caniformia:family ?familyName .
    LET (?family := smf:encodeURL(?familyName)) .
    LET (?familyURI := smf:buildURI("<{?baseURIStr}#{?family}>")) .
}

In addition to creating a SKOS concept for each species, it creates one for each genus and family as well, using the skos:broader property to identify the connections between these concepts that make up the hierarchical taxonomy of terms.
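As a rough cross-check of what this second query produces, here is a minimal Python sketch of the same row-to-triples logic. It is not TopBraid code: the rows list, the mint helper, and the use of urllib.parse.quote as a stand-in for smf:encodeURL() and smf:buildURI() are all assumptions made for illustration, and the two sample rows are only loosely modeled on the spreadsheet described above.

```python
from urllib.parse import quote

BASE = "http://example.com/taxonomies/animals"
SCHEME = BASE + "/caniformia"

# Hypothetical rows, shaped like the row-level triples described above.
rows = [
    {"family": "Canidae", "genus": "Canis",
     "species": "Canis lupus", "commonName": "gray wolf"},
    {"family": "Ursidae", "genus": "Ailuropoda",
     "species": "Ailuropoda melanoleuca", "commonName": "giant panda"},
]

def mint(name):
    """Build a concept URI from a name (stand-in for the smf: functions)."""
    return BASE + "#" + quote(name, safe="")

def rows_to_skos(rows):
    """Return a set of (subject, predicate, object) tuples mirroring the CONSTRUCT template."""
    triples = set()
    for row in rows:
        species = mint(row["species"])
        genus = mint(row["genus"])
        family = mint(row["family"])
        for uri in (species, genus, family):
            triples.add((uri, "rdf:type", "skos:Concept"))
        triples.add((species, "skos:prefLabel", row["species"]))
        triples.add((species, "skos:altLabel", row["commonName"]))
        triples.add((species, "skos:broader", genus))
        triples.add((genus, "skos:broader", family))
        triples.add((SCHEME, "skos:hasTopConcept", family))
    return triples

triples = rows_to_skos(rows)
```

Collecting the results in a set means that a genus or family repeated across several rows yields each of its triples only once, which is also how an RDF graph treats duplicate triples.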

One big decision to make with this query was how to create URIs that provided unique identifiers for each new concept being created. I knew that the species, family, and genus names must be unique, so I added those to the base URI after passing them to smf:encodeURL(), a SPARQLMotion extension function that escapes any characters that won't work well in a URI. If you have taxonomy data in a spreadsheet, there may already be a unique number or other form of ID assigned to some or all taxonomy terms that your conversion can grab so that you don't have to create URIs from the names on the spreadsheet like I did.
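The choice between minting URIs from names and reusing an existing ID can be sketched in a few lines of Python. This is only an illustration under stated assumptions: concept_uri and its term_id parameter are hypothetical names, the ID value 42 is invented, and urllib.parse.quote merely approximates what smf:encodeURL() does.

```python
from urllib.parse import quote

BASE = "http://example.com/taxonomies/animals"

def concept_uri(base, name, term_id=None):
    """Prefer an existing ID from the spreadsheet; otherwise escape the name."""
    if term_id is not None:
        local = str(term_id)
    else:
        local = quote(name, safe="")  # escape spaces and other unsafe characters
    return base + "#" + local

from_name = concept_uri(BASE, "Canis lupus")
from_id = concept_uri(BASE, "Canis lupus", term_id=42)  # 42 is a made-up ID
print(from_name)
print(from_id)
```

An ID-based URI stays stable if a name is later corrected or changed, while a name-based URI is easier to read; which matters more depends on how the vocabulary will be maintained.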

I also decided to use species names like "Canis lupus" as the skos:prefLabel value in the output and to use labels from the spreadsheet's "common name" column like "gray wolf" as skos:altLabel values. If you wanted to use common name values as preferred labels and species names as alternative labels, it would be a simple change to the query above.

My "convert XLSData" module also has its sml:replace value set to True so that it doesn't pass along the input triples to the final module, which saves the conversion result. This last module, which I named "Save as TDB", is an "Export to TDB" SPARQLMotion module that saves the conversion results using the Jena TDB format. I could have used a SPARQLMotion "Export to RDF" module, which saves triples as a Turtle or RDF/XML disk file, but I wanted to use the results of my conversion in EVN. EVN requires that you use Jena's TDB or SDB formats so that it can attach metadata to your work to support reporting and workflow tracking.

After running the conversion, here is one view of it in EVN:

This is a minimal example using a small spreadsheet with no non-SKOS metadata. If your spreadsheet includes columns for data that don't fit easily into the SKOS model, you can use TopBraid Composer to create a customized version of SKOS that includes your own properties, and your SPARQLMotion script's conversion module can then add triples for those properties to the result. Viewing them in EVN, they would appear under Custom Properties on the right. And of course, the screen shot above only hints at all that EVN lets you do with your controlled vocabulary once you convert it to SKOS.

Getting started with SPARQL Web Pages

2010-11-10T06:59:00.000-08:00

In earlier entries on this blog, we've seen how SPARQL Rules attached to classes can let you identify constraint violations in instances of those classes and implement other kinds of business logic, all using the SPARQL standard. With release 3.4, which is now in beta, TopQuadrant products have another new application of SPARQL that lets you attach useful metadata to class definitions: descriptions of how you want the class's instances to look in a browser. We call this SPARQL Web Pages.

TopQuadrant vice president of product development Holger Knublauch has written several blog entries introducing SPARQL Web Pages under its original name, UISPIN, such as UISPIN: Creating HTML and SVG Documents with SPARQL, Charts and Business Reports with UISPIN, and UISPIN Example: Documenting SPIN Functions. In this blog entry, we'll see how to get started with a simple but useful example.

Address book information is not as simple to represent in RDF as one might think. A street address, city name, and postal code should be shown in a specific order, but RDF facilities for ordering the property values for a given instance can add annoying layers of complexity to a data model. Without ordering, though, a simple address book entry can be difficult to read, like the following fake address shown in Turtle format:

a:Entry_1
      rdf:type a:Entry ;
      a:city  "San Diego" ;
      a:email "jerry122@hotmail.com" ;
      a:firstName "Jerry" ;
      a:homePhone "(738) 610-2019" ;
      a:lastName "Snyder" ;
      a:mobile "(702) 382-4712" ;
      a:postalCode "39248" ;
      a:region "CA" ;
      a:streetAddress "3137 11th Ave." .

Using TopBraid Composer Maestro Edition, I created an RDF file with an Entry class for address book entries. I declared the properties shown above, with the Entry class as their domain, and then I added a few fake address book instances to the file. The file has a base URI of http://example.org/addressBook, which will be important later.

Next, I imported the html.rdf model from the UISPIN folder of the TopBraid project that is automatically added to every workspace. (In a production application, especially if I was working with a standard ontology instead of a hand-crafted little model defining an address book entry, I'd create a new file that imported the standard class and property definitions as well as html.rdf instead of adding the SPARQL Web page definitions directly into the file that defines the address book model.) The UISPIN folder also includes models to generate SVG, charts, and more; the html.rdf file lets you add HTML-generating SPARQL code to your classes. Below, you can see how I've added the ui:instanceView property from this file's model to the definition of my Entry class:

When I scroll the Class View down a little, you can see what I entered as the ui:instanceView property value. It's an HTML template, with instructions for plugging in instance data, that will be output whenever TopBraid sees an instance of this class:

It lays out a div element that begins with the address entry's mailing address and then has a small table showing the entry's email address and phone numbers, with these property names bolded in the output. You can take advantage of the full power of SPARQL in these templates, as you'll see in Holger's blog entries, but I kept things simple by mostly just using the spl:object() function to insert specific property values into various places in the HTML.

When I view the form tab of the Entry class and click on a row in the Instances view, I see a Resource Form for that instance, like I always did:

You can see a new Browser tab next to the Form tab, though, and it lets me see the instance view formatted according to the HTML template that I created in the ui:instanceView value:

Even better, TopBraid Live can serve up the data using these HTML templates outside of TopBraid Composer, so that sending a browser to the URL http://localhost:8083/tbl/uispin?_resource=http://example.org/addressBook%23Entry_1&_baseURI=http://example.org/addressBook displays the result in the browser (note the use of the base URI to specify the model with the data to display and the escaping of "#" as "%23"):

Of course, this HTML can also have links and reference CSS, Javascript, web services, and other applications on the TBL server where it's hosted, so you can build some very sophisticated user interfaces. Check out Holger's blog entries for further ideas on where you can take this, especially when you start incorporating SPARQL queries and their results into the templates.
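If you generate links to the TopBraid Live URL shown earlier from your own code, the "#" escaping is easy to get wrong by hand. The following Python sketch shows one way to build such a URL; the uispin_url helper is a hypothetical name, and the only things taken from the example are the /tbl/uispin path and the _resource and _baseURI parameters.

```python
from urllib.parse import quote

def uispin_url(server, resource, base_uri):
    """Build a URL like the one in the example, escaping '#' as '%23'."""
    # Leave ':' and '/' readable, as the example URL does; '#' must be escaped
    # so the browser does not treat the rest as a fragment identifier.
    return (server + "/tbl/uispin"
            + "?_resource=" + quote(resource, safe=":/")
            + "&_baseURI=" + quote(base_uri, safe=":/"))

url = uispin_url("http://localhost:8083",
                 "http://example.org/addressBook#Entry_1",
                 "http://example.org/addressBook")
print(url)
```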

TopQuadrant's new Enterprise Vocabulary Manager

2010-10-27T08:44:00.000-07:00

The TopBraid platform can be used to build all kinds of applications and solutions. We've recently noticed one particular area where more and more customers needed help, and where semantic technology and our tools were a great fit: the management of multiple connected vocabularies spread out across an enterprise. To meet this need, we've created TopBraid Enterprise Vocabulary Net (EVN), a solution that works out of the box while having all the power of TopBraid Suite behind its customization capabilities.

The EVN product page has a long list of its features, which provide everything you need to manage taxonomies and thesauri (and even create simple ontologies) in multi-user environments. The ability to review proposed changes before rolling them into production, with a choice of reports and other options for analyzing those changes and their potential impact, will be especially useful in larger organizations.

The use of EVN requires no knowledge of SKOS, RDF, or the related W3C standards, but the use of these standards behind EVN's graphical user interface is what makes EVN both flexible and scalable. The use of public standards for data, models, and application logic makes it much easier to integrate EVN with other systems than any other vocabulary management solutions we've seen in the marketplace. They also make it easier for EVN to let you set up an environment where different vocabularies in different parts of a large organization can work cooperatively with no need to merge those vocabularies into a single large, central vocabulary.

EVN is included in TopBraid Composer Maestro Edition release 3.4, which is now in beta, so you can try it without purchasing a separate product. For a quick overview of the features and what the product looks like, start with the screenshot tour, or jump right in to the tutorial included with EVN's documentation.

How to: read RSS and RDFa from the web with a SPARQLMotion script

2010-10-04T06:54:00.000-07:00

How do you get a SPARQLMotion script to read an RSS or Atom feed as RDF triples? How do you get a SPARQLMotion script to read triples that have been embedded into web pages using RDFa? The answer to both questions is the same: use the specialized SPARQLMotion module for the task. All you have to do is specify the URL of the file with the information you want to read.

To demonstrate both, we'll put together a short script that:

Reads the RSS feed about technology news from Newsweek magazine
Pulls the triples from the RDFa embedded in the Newsweek articles described in the feed
Saves the extracted triples in a Turtle file

Along with Dublin Core properties such as dc:title and dc:description, RDFa attributes in Newsweek articles store additional RDF metadata using the Open Graph vocabulary developed by Facebook. This makes it easier for Facebook to incorporate additional information about news articles in their applications—for example, if people click the Facebook button next to a Newsweek article in order to share it with their Facebook friends.

It also makes it easier for you to use information about these articles in your own applications. The sample application below just saves the retrieved triples in a file, but you could also pass them to other SPARQLMotion modules that could have OpenCalais analyze the text, combine the triples with data from another source, create a new, specialized RSS feed or SPARQL endpoint, or send an email message based on the results of your processing. Retrieving the data is just the beginning.

To create this application, start by creating a new SPARQLMotion file called getnewsweektech. (For more detailed background on the steps involved in creating and running a SPARQLMotion script, see the PDF tutorial TopBraid Application Development Quickstart Guide.)

Create a new SPARQLMotion script in your getnewsweektech.n3 file. For its first module, select sml:ImportNewsFeed from the sml:ImportFromRemoteModules category and name it GetNewsweekTechNewsFeed. To configure it, you only need to set its sml:url value to http://feeds.newsweek.com/newsweek/technology?format=xml, a URL I learned about from Newsweek's web page about their RSS feeds.

Once this module pulls down the RSS data and TopBraid converts it to triples, your script will look through these triples for web page URLs provided as RSS link values and then retrieve the triples that are stored as RDFa in those web pages. The script can't pull the triples from all those web pages at once, so we'll use an IterateOverSelect module to drive the next step. We'll specify a SPARQL SELECT query in the IterateOverSelect module to find the RSS link values, and then for each result that this SELECT query finds, another module will retrieve the triples from the web page named by the link value.

Drag an Iterate over select module from the Control Flow section of the SPARQLMotion palette and name it GetArticleLinks. Paste the following query in as the value for its sml:selectQuery property:

PREFIX rss: <http://purl.org/rss/1.0/> 
SELECT ?articleURLString
WHERE {
  ?s a rss:item .
  ?s rss:link ?articleURL .
  LET (?articleURLString := xsd:string(?articleURL)) .
}

The module that retrieves the RDFa needs a string version of the URL to specify where it should look for the RDFa, so the query above assigns a string version of each rss:item resource's rss:link value to the variable ?articleURLString. The script will execute the body of the IterateOverSelect (a separate module that we haven't created yet) once for each value bound to this variable. You're done configuring this module.
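For readers who think more easily in a general-purpose language, here is a small Python sketch of what this query selects. It is an approximation under stated assumptions: it reads the links straight from RSS 1.0 XML rather than from triples, the feed text is made up, and the example.com article URLs are placeholders.

```python
import xml.etree.ElementTree as ET

RSS = "{http://purl.org/rss/1.0/}"

# A tiny, made-up RSS 1.0 document; the article URLs are placeholders.
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                  xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="http://example.com/articles/1">
    <title>First article</title>
    <link>http://example.com/articles/1</link>
  </item>
  <item rdf:about="http://example.com/articles/2">
    <title>Second article</title>
    <link>http://example.com/articles/2</link>
  </item>
</rdf:RDF>"""

def article_links(feed_xml):
    """Return each item's link as a plain string, one entry per item."""
    root = ET.fromstring(feed_xml)
    links = []
    for item in root.iter(RSS + "item"):
        link = item.findtext(RSS + "link")
        if link:  # skip items that have no link element
            links.append(link.strip())
    return links

links = article_links(FEED)
print(links)
```

Each string in the resulting list plays the role of one binding of ?articleURLString.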

Next, we'll create the body of the IterateOverSelect. This can be a series of modules, but for this application we'll only need one. Drag an Import RDFa module from the Import from Remote section of the SPARQLMotion palette and name it ReadArticleRDFa. When configuring this new module, click the white triangle for its sml:url property's context menu and select Add SPARQL expression. This lets you add any combination of SPARQL keywords, symbols, function calls, and operators that returns a single value; all you need here is the variable reference ?articleURLString. Each time this module retrieves triples from the RDFa in the web page at this URL, it will pass along the triples that it found to the next module. If this module has an sml:needsTidy property, set it to True to make it easier to read web pages that aren't well-formed XML.

For our script's last module, drag an Export to RDF File module from the palette's Export to Local section and call it SaveArticleTriples. Set its sml:targetFilePath value to newsweekTech.n3; it will write this file to the directory that holds the SPARQLMotion file with your script. Set the module's sml:baseURI to http://example.com/newsweek/tech/metadata or to any URI that you like.

All that's left is to connect up the four modules as shown below. When you add a connector out of your Get Article Links Iterate Over Select module, TopBraid Composer will ask you whether your new connector is pointing at the body of the loop (the part to execute for each binding of the selected variable) or at the module that should take control of the script when the iteration is finished. Connect Get Article Links to the Read Article RDFa module with an sm:body link, because that's the part we want executed for each iteration, and connect Get Article Links to Save Article Triples with an sm:next link to transfer control (and the collected triples) there when the iteration is all done.
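The difference between the sm:body and sm:next links can be pictured as an ordinary loop followed by a single final step. The Python sketch below is only an analogy, not a description of how TopBraid runs scripts internally; run_script, fake_read_rdfa, and the example.com URLs are all invented for illustration.

```python
def run_script(links, read_rdfa, save):
    """Run read_rdfa once per link (the body), then pass everything to save (next)."""
    collected = []
    for link in links:                     # one iteration per SELECT result
        collected.extend(read_rdfa(link))  # the sm:body part
    save(collected)                        # the sm:next part, after the loop ends
    return collected

def fake_read_rdfa(url):
    """Stand-in for the RDFa import so the sketch runs without network access."""
    return [(url, "dc:title", "Title of " + url)]

saved = []
result = run_script(["http://example.com/a", "http://example.com/b"],
                    fake_read_rdfa, saved.extend)
print(len(result))
```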

Select the Save Article Triples module and click the green triangle at the top of the workspace to execute the script up to that final module, and you should end up with a newsweekTech.n3 file in the same directory as your getnewsweektech.n3 file that holds the script. This new file will hold triples extracted from the various web pages named in the Newsweek tech news feed.

To branch out, you could substitute the names of other Newsweek feeds, or additional ones, and then collect all the triples together. You could drive the whole thing with a TopBraid Ensemble interface where an end user picks the category of Newsweek news (for example, their technology, politics, business, or entertainment categories) whose metadata should be retrieved. You could also find other publications that store RDFa metadata in their articles, or other websites, such as TopQuadrant's. And, as I mentioned earlier, you could combine this with other features of SPARQLMotion and TopBraid to make a very powerful application.

How to: Find SKOS constraint violations in AGROVOC with SPARQL Rules

2010-08-16T07:24:00.000-07:00

The Simple Knowledge Organization System (SKOS) vocabulary management specification is gaining popularity because, as a standard, it makes it easier to share taxonomies and thesauri between different systems. It also guards investments in vocabulary development against the potential problems of dependence on a proprietary vendor format.

The W3C makes an OWL ontology for SKOS available, which makes it easier to ensure that your vocabulary conforms to the standard. As a comment near the beginning of it tells us, though,

A number of semantic conditions are *not* expressed formally in this schema. These are:

S12
S13
S14
S27
S36
S46

For the conditions listed above, rdfs:comments are used to indicate the conditions.

The comment for S13 says that "skos:prefLabel, skos:altLabel and skos:hiddenLabel are pairwise disjoint properties". In plain English, this means that for a given concept, you can't use the same term for any two of these properties. For example, you shouldn't say that "dog" is both the preferred label and the alternate label for a given concept—it should be one or the other.

If these constraints are in the ontology as comments and not as something that can be implemented by executable code, how do you find violations of these constraints? The simplest way I've found is to use SPARQL Constraints (with SPIN). We've built on Paul Hermans' work to implement these constraints in the ontology at http://topbraid.org/spin/skosspin. It imports the W3C SKOS ontology and adds one rule to the skos:OrderedCollection class for constraint S36 and rules for the other five constraints to the skos:Concept class. For example, it adds the following three rules for constraint S13:

# Constraint S13: skos:prefLabel, skos:altLabel and skos:hiddenLabel 
# are pairwise disjoint properties.
ASK WHERE {
    ?this skos:prefLabel ?label .
    ?this skos:altLabel ?label .
}

ASK WHERE {
    ?this skos:prefLabel ?label .
    ?this skos:hiddenLabel ?label .
}

ASK WHERE {
    ?this skos:hiddenLabel ?label .
    ?this skos:altLabel ?label .
}

If any of these returns a boolean true, then we know that constraint S13 has been violated. (If you look at the skosspin ontology, you'll see these queries represented as triples, which is more difficult to read but easier to implement than rules expressed as SPARQL queries as shown above. TopBraid Composer can convert between the two formats, and so can a SPARQL Text to SPIN RDF Syntax Converter that Holger Knublauch has made available on the web.)
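To make the logic of the three ASK queries concrete, here is a minimal Python sketch of the same pairwise check. It assumes a hypothetical in-memory representation in which a concept is a dict mapping each label property to a set of (text, language) pairs; nothing here is part of SPIN or TopBraid, and the sample concept is invented.

```python
from itertools import combinations

LABEL_PROPS = ("skos:prefLabel", "skos:altLabel", "skos:hiddenLabel")

def s13_violations(concept):
    """Return (prop_a, prop_b, label) for every label shared by two label properties.

    `concept` maps a property name to a set of (text, language) pairs.
    """
    violations = []
    # combinations() yields the same three property pairs as the three ASK queries.
    for a, b in combinations(LABEL_PROPS, 2):
        shared = concept.get(a, set()) & concept.get(b, set())
        for label in sorted(shared):
            violations.append((a, b, label))
    return violations

# Hypothetical concept: "dog" is both a preferred and an alternate label.
concept = {
    "skos:prefLabel": {("dog", "en"), ("chien", "fr")},
    "skos:altLabel": {("dog", "en"), ("hound", "en")},
}
print(s13_violations(concept))
```

Comparing (text, language) pairs rather than bare strings mirrors the way the SPARQL queries match on the whole literal, so the same spelling in two different languages is not reported.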

I tested this with the Food and Agriculture Organization of the United Nations' popular AGROVOC thesaurus, a vocabulary "designed to cover the terminology of all subject fields in agriculture, forestry, fisheries, food and related domains", and found over 1,600 violations of constraint S13. Because this thesaurus has almost 29,000 concepts and preferred and alternate labels in multiple languages for most concepts, it's easier to violate these constraints than you might think, and I never would have found them without the ability to automate this search. For example, concept http://www.fao.org/aos/agrovoc#c_1135 has an English preferred label of "Buds", 14 preferred labels for other languages such as Farsi and Thai, and 24 alternate labels. Among these, the Slovak skos:prefLabel value and the Slovak skos:altLabel value are both "púčiky", so this concept violates constraint S13.

How do we find the violations? I tried it with the free edition of TopBraid Composer, because it has everything you need to define and use SPARQL Rules. (TopBraid Composer Maestro Edition's ability to use these rules from within applications has made it possible for me to add several nice features to applications for some of our clients.)

The screenshot below of TopBraid Composer's Navigator and Imports views shows that I created a checkAGROVOC project and added a checkAGROVOC.n3 ontology file to it. This ontology only does two things: it imports the ag_skos_080422.rdf file that I downloaded from the fao.org web site and it imports the skosspin ontology described above for its SPARQL Rules. (It imports a web version of the skosspin ontology and, because of its 62 meg size, the local copy of the actual AGROVOC thesaurus.)

With these two files imported, I opened TopBraid Composer's Problems view and clicked that view's "Refresh all problems of current TopBraid file" icon. A "Progress Information" message box told me that TopBraid Composer was "Checking SPIN constraints on skos:Concept", which took a few minutes because there were plenty to check.

After it finished checking, the Problems view said that there were Warnings and had a plus sign that I expanded to see the first few constraint violations:

(You may want to play with the column widths a bit, because the Location column is the one you really want to see.) Double-clicking anywhere on a specific warning line shows the details about that concept on the Resource Form, like this:

If you don't see the little yellow warning symbols on the Resource Form that show where the problems are, click the little "Display constraint violation warnings" icon at the top of TopBraid Composer. In this case, a bit of scrolling down when viewing concept c_1002 shows that the same term appears as both Hungarian skos:altLabel and skos:prefLabel property values for this concept.

The Problems view above only shows the first 100 of the 2,972 warnings. Clicking the context menu white triangle in the view's upper-right lets you configure Preferences for the view so that you can reset the number of items to be displayed; I had no problem with a figure of 3,000.

Now that we know what constraint has been violated, there are other ways to list the concepts that need to be corrected. For example, the following query in TopBraid Composer's SPARQL view lists the identifiers that have the same label for these two properties:

SELECT ?s ?label
WHERE {
  ?s skos:prefLabel ?label . 
  ?s skos:altLabel ?label . 
}

Once you execute this query you can export its results to a file and then use that as a reference point to address the issues in the vocabulary.

We could have started off by executing this query on the AGROVOC SKOS file, but remember, at the time we didn't know which constraints had been violated. Using SPARQL Rules as extra metadata for class definitions helps to automate the identification of quality issues with the data, letting us use other techniques to focus on the specific problems and how to fix them.

Are your SKOS vocabularies violating any of the six extra constraints described in the SKOS specification? As I mentioned, this all works with the free version of TopBraid Composer, which is available for Windows, Mac, and Linux, so you can try it yourself to find out. With the TopBraid Composer Maestro edition, you can build applications for end users who can then maintain these vocabularies with a web-based interface instead of using TopBraid Composer. The user interface for notification of constraint violations then becomes one of the many things you can customize to the needs of your end users. You can also define new constraints around your own shop's business rules—for example, to require that all labels begin with an upper-case letter—and you can set TopBraid Composer or your application to highlight these violations as soon as they occur, instead of checking in batch mode like I did above.

To summarize, what we've seen here is really just a starting point, and there are all kinds of places where you can take it to improve the consistency and value of your vocabularies.

How to: use the SPARQLMotion debugger

2010-07-27T06:38:00.000-07:00

Since release 3.3, TopBraid Composer has included an interactive debugger for SPARQLMotion scripts that can make your development go much faster. TopQuadrant VP of Product Development Holger Knublauch wrote a nice overview of the debugger's features in his blog; below is a short hands-on tutorial in the use of the debugger.

We're going to put together a short SPARQLMotion script with a problem that prevents it from running properly. Experienced SPARQLMotion developers may notice the problem when we add it, but leave it in there—we'll see how the SPARQLMotion debugger helps us locate it.

Creating our script

Our script will prompt the user for a string to search for and then list the first and last names of everyone in the sample kennedy data file included with TopBraid Composer who has that string as part of their first name.

First, create a new SPARQLMotion file and give it a base URI of http://www.topquadrant.com/debugdemo and a file name of debugdemo. (For a more detailed description of how to create a SPARQLMotion file, see the "Creating and running a SPARQLMotion script" chapter of the TopBraid Application Development Quickstart Guide.)

Once your new file is open, create the script by selecting Create SPARQLMotion Script from the TopBraid Composer Scripts menu, and for its initial module type select sml:EnterLiteral. This is the module that will ask the user to enter a query string, and you'll find it under sml:ImportModules -> sml:ImportFromVariousModules. For the name of your new module instance, enter GetQueryString. Click the OK button and you'll see your script on the SPARQLMotion workspace with its one module.

Double-click the GetQueryString module icon and enter queryString as the sm:outputVariable value and Enter query string as the sml:text value. When the script runs, this module will display a message box that prompts the user with this message. After the user enters a value into the field on that message box, the value will be stored in the variable queryString for use by later modules in the script. You are now done configuring this module.

There are two more modules to add. Drag an Import RDF From Workspace module from the Import from Local section of the SPARQLMotion palette onto your workspace and name it GetKennedysData. Double-click it to configure it, and set the sml:sourceFilePath value to /TopBraid/Examples/kennedys.rdf. For the third and final module, drag an Apply Construct module from the RDF Processing section of the Palette and name it FindMatchingNames. Set its sml:replace value to true so that the module passes along only the triples that it creates. Set its sml:constructQuery property to the following query:

PREFIX k: <http://topbraid.org/examples/kennedys#>
CONSTRUCT {   
 ?s k:firstName ?first .   
 ?s k:lastName ?last .
} WHERE {   
 ?s k:firstName ?first .   
 ?s k:lastName ?last .   
 FILTER regex(?first, ?searchString, "i") .
}

(Instead of setting the prefix for the kennedys data at the beginning of this query, you could also do it on the script file's Ontology Overview screen.) This query passes along the firstName and lastName triples for anyone in the data file who has the value of the searchString as part of their firstName value. (The "i" provided as the third parameter to the regex() function tells it to do a case-insensitive comparison.)

Connect up the three modules so that the script looks like this:

To test the script, select the FindMatchingNames module and click the green arrow at the top of your workspace to run all the modules up to the selected one. When the GetQueryString module displays a message box asking you for a query string value, enter Carol. When the script finishes running and you see the SPARQLMotion Script Executed message box, make sure that the Display result triples checkbox is checked before continuing so that you can see what data the script found.

Although the kennedys data includes a Caroline and a Carolyn, you won't see any triples in the SPARQLMotion Results view when the script is finished running. Let's use the debugger to find out why.

Using the debugger

The small blue circle icon at the top of your SPARQLMotion workspace toggles whether the selected icon is a debug breakpoint. On the SPARQLMotion workspace, make sure that your FindMatchingNames module is selected and then click that small blue circle. This will add a blue circle to the module icon to indicate that it is now a breakpoint. You can set as many modules as you like as breakpoints, but for this exercise we'll just set this one.

Click the green arrow to run the script again, and enter "Carol" as the query string. When the script reaches the module with the breakpoint, it displays the debugger window:

The left part of the window has the Execution Plan, which lists the modules in the order that they will be executed, with a check mark next to each module that's been set as a break point. As you debug, you're free to check and uncheck any of these. The bottom of the window displays the arguments to the module and their values, and the top shows three tabs: Variables, Query, and Input Graph. Let's start with the last tab: click Input Graph.

On the tab's Input Graph panel, click the plus sign, and you'll see that the kennedys data is definitely being provided as input to the FindMatchingNames module. Click that line, as shown below, and you'll see some of its data appear in the lower panel. Scrolling down there shows that CarolineKennedy and CarolynBessette are in the data being passed along, so we can't blame our script's retrieval of data for its inability to show the expected results.

Now select the debugger window's Query tab. This lets you execute test queries on the data passed to the module. Because FindMatchingNames is an Apply Construct module, the default query shown on the Query tab is a SELECT version of the CONSTRUCT query that you entered when you created this module.

You can enter any SELECT query you want in this panel of the debugger window and run it without affecting the state of the running script. Commenting and uncommenting lines of this query and then re-running it is a particularly valuable technique for exploring what information is available to your SPARQLMotion script at this point in its execution and what your application logic has done with that data.

Perhaps the problem is in the query's FILTER expression. Add a pound sign at the beginning of that line to comment it out, and click the green triangle in the upper-right to run the SELECT query shown in the debugger window. You'll see plenty of s, first, and last values appear in the lower panel, including Caroline Kennedy and Carolyn Bessette, so it looks like a problem in the FILTER expression prevented the query from passing along the requested data. The ?first variable is clearly being set properly, because we saw plenty of output when we commented out this line, so maybe the problem is with the ?searchString variable referenced in the FILTER line's regex() function call.

Click the debugger window's Variables tab to display it, and you'll see the problem: the GetQueryString module stored the entered value in a variable called queryString, and the query's FILTER expression was checking values against a non-existent searchString variable. (In a real debugging session, the Variables tab is probably the first one you'd check, which is why it displays first.) Go back to the Query tab, uncomment the FILTER line, change ?searchString to ?queryString, and click the green arrow. You should now see Caroline Kennedy and Carolyn Bessette and no one else show up under the query in the lower part of the debugger window.
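To see why the misnamed variable produced no output at all rather than an error message, it can help to model the FILTER step outside of SPARQL. In SPARQL, a FILTER expression that references an unbound variable raises an error, and an error makes the filter reject that solution. The following Python sketch imitates that behavior. It is only an illustration: the rows are hand-typed sample values, and apply_filter is a hypothetical helper, not part of SPARQLMotion.

```python
import re

# Illustrative rows standing in for the ?first/?last solutions.
ROWS = [
    {"first": "Caroline", "last": "Kennedy"},
    {"first": "Carolyn", "last": "Bessette"},
    {"first": "John", "last": "Kennedy"},
]

def apply_filter(rows, script_vars, pattern_var):
    """Keep rows whose first name matches the pattern held in pattern_var.

    An unbound variable makes the test fail for every row, mirroring how a
    SPARQL FILTER that raises an error rejects the solution.
    """
    kept = []
    for row in rows:
        bindings = {**script_vars, **row}
        pattern = bindings.get(pattern_var)
        if pattern is None:
            continue  # unbound variable: the filter cannot succeed
        if re.search(pattern, row["first"], re.IGNORECASE):
            kept.append(row)
    return kept

# The only variable that GetQueryString actually set.
script_vars = {"queryString": "Carol"}

broken = apply_filter(ROWS, script_vars, "searchString")
fixed = apply_filter(ROWS, script_vars, "queryString")
print(broken)
print(fixed)
```

With the wrong variable name every row is silently dropped, which matches the empty SPARQLMotion Results view we saw earlier.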

At this point, you've only fixed the debugger window's temporary query used to explore the workings of the script, and the application itself still needs to be fixed. Click the Continue button to resume execution of the script past the breakpoint. If there were more modules and any of them were set as breakpoints, the Continue button would stop at each one and display the debugger window, but your script has no more breakpoints. (If there were more modules after the FindMatchingNames module, the Step Into button would execute them one at a time so that you could review the three debugger tabs for information about those modules as they executed.) Once the script completes, change ?searchString to ?queryString in the FindMatchingNames module's query, click the debugger breakpoint icon while the FindMatchingNames icon is selected to turn off its breakpoint indicator, and run the script again with the same "Carol" input string. It should run with the expected results appearing in the SPARQLMotion Results view.

Debugging your own applications

Although we ran this SPARQLMotion script from within the SPARQLMotion editor, you can still set breakpoints and check all the same information about a SPARQLMotion script that is invoked from somewhere else, for example from a TopBraid Ensemble application or from a web service. (This assumes that the script is running under the TopBraid Live Personal Edition Server included with the TopBraid Composer Maestro Edition, which you use to develop these scripts.) This makes the debugger invaluable for just about all kinds of TopBraid development, and you'll find more uses for it as you use it more. Again, review Holger's blog posting for additional ideas.

How to: Publish your Linked Data with TopBraid Live SPARQL Endpoints

2010-05-07T08:36:00.000-07:00

SPARQL endpoints are an increasingly popular way to expose linked data. Invoking SPARQL Endpoints from TopBraid Composer's SPARQL view was the subject of a previous TQ blog on SPARQL Endpoints. In this entry we will discuss how TopBraid Live can be used to implement a SPARQL Endpoint. SPARQL Endpoints are Web services that conform to the SPARQL protocol. SPARQL queries are passed to a URL where a SPARQL service processes the query and returns results in a defined XML format. A number of SPARQL Endpoints exist for Web data (see the W3C list of current SPARQL Endpoints) and have become important sources for linked data.

A SPARQL Endpoint service implementation is packaged with TopBraid Live and is available out of the box for both TopBraid Live Personal Server (TopBraid Composer-ME running on localhost:8083) and TopBraid Live Enterprise Server (for more information, see the TBL home page). Creating a SPARQL Endpoint for your data is therefore an easy three-step process:

Load the model you wish to query into your TBL/TBC-ME workspace.
Use the GRAPH SPARQL keyword to access any named graph in the workspace.
Send a SPARQL query in the query string of a URL that accesses the TBL SPARQL endpoint.

For example, if you have TBC-ME running, the TopBraid Live Personal Server is automatically available. Open a browser window and enter the following URL:

http://localhost:8083/tbl/sparql?query=SELECT DISTINCT ?p WHERE {GRAPH <http://topbraid.org/countries> {?s ?p ?o} }

This URL passes a query string that is applied to the specified graph, the countries.owl example included in the TopBraid library. The query is passed to TopBraid Live and executed using TBL's SPARQL engine. The results are converted to the SPARQL Endpoint format and returned via HTTP. The above URL specifies the TBL Personal Server (via TBC-ME's localhost:8083) as the endpoint. If you have TopBraid Live Enterprise Edition running on a server, just substitute the server address for your Enterprise server.
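A browser will usually percent-encode the spaces and braces in a URL like this for you, but a script that calls the endpoint has to do it explicitly. The following Python sketch builds the same request URL using only the standard library. It assumes the Personal Server address shown above and that the endpoint reads the query from the query parameter, as in the example URL; build_endpoint_url is a hypothetical helper name, and the sketch stops short of sending the request.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: the Personal Server endpoint address used in the example above.
ENDPOINT = "http://localhost:8083/tbl/sparql"

QUERY = (
    "SELECT DISTINCT ?p "
    "WHERE {GRAPH <http://topbraid.org/countries> {?s ?p ?o} }"
)

def build_endpoint_url(endpoint: str, query: str) -> str:
    """Return a GET URL with the SPARQL query encoded in the query string."""
    return endpoint + "?" + urlencode({"query": query})

url = build_endpoint_url(ENDPOINT, QUERY)
print(url)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

For an Enterprise server, only the ENDPOINT value would change.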

To further explore the ease of creating SPARQL Endpoints with TopBraid Live, click here to access a page that defines an HTML form that submits a query to the TBL Personal Server SPARQL Endpoint. Copy and paste the following queries, which use some of the example models included in the TopBraid library, into that form.

This query finds all countries and their abbreviations from the countries model in TopBraid/Examples:

# Get all countries and abbreviations from countries model
PREFIX countries: <http://topbraid.org/countries#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?name ?abbrev
WHERE
{ GRAPH <http://topbraid.org/countries>
  {  ?country a countries:Country .
     ?country rdfs:label ?name .
     ?country countries:abbreviation ?abbrev .
  }
}
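A SELECT query like this one comes back from the endpoint in the W3C SPARQL Query Results XML Format. As a rough sketch of how a client might read such a response, the following Python code parses a result document into a list of dictionaries. The sample XML is hand-written for illustration and is not actual output from the countries model; parse_results is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C SPARQL Query Results XML Format.
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Hand-written sample response (illustrative values, not real endpoint output).
SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="name"/>
    <variable name="abbrev"/>
  </head>
  <results>
    <result>
      <binding name="name"><literal>Canada</literal></binding>
      <binding name="abbrev"><literal>CA</literal></binding>
    </result>
    <result>
      <binding name="name"><literal>Mexico</literal></binding>
      <binding name="abbrev"><literal>MX</literal></binding>
    </result>
  </results>
</sparql>"""

def parse_results(xml_text):
    """Turn a SPARQL XML result set into a list of {variable: value} dicts."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            # Each binding holds one child: <uri>, <literal>, or <bnode>.
            value = binding[0]
            row[binding.get("name")] = value.text
        rows.append(row)
    return rows

rows = parse_results(SAMPLE)
for row in rows:
    print(row["name"], row["abbrev"])
```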

This query finds all children of Joseph Kennedy from the kennedys model in TopBraid/Examples:

# Find Joe Kennedy's children in kennedys model
PREFIX k: <http://topbraid.org/examples/kennedys#>
SELECT ?cname
WHERE
{ GRAPH <http://topbraid.org/examples/kennedys>
  {  k:JosephKennedy k:child ?child .
     ?child k:name ?cname .
  }
}

Again, substitute your Live server address for "localhost:8083" in the action attribute of the form in the HTML file to apply queries to your Live server.

Using SPIN functions in SPARQL Endpoints

TopBraid SPARQLMotion Functions and user-defined SPIN functions registered in a Live workspace can also be used in SPARQL Endpoint queries. For example, the following query uses the TopBraid SPARQLMotion Function smf:if() to compute the age of all persons at death or their current age using the example kennedys model. Instead of returning variable bindings via SELECT, this query returns an RDF graph via CONSTRUCT. Since the graph is in RDF/XML format, the file returned by the endpoint can easily be imported into existing RDF/OWL models.

# infer age at death or age as of 2010
PREFIX k: <http://topbraid.org/examples/kennedys#>
PREFIX smf: <http://topbraid.org/sparqlmotionfunctions#>
CONSTRUCT {?person k:age ?age}
WHERE
{ GRAPH <http://topbraid.org/examples/kennedys>
  {  ?person k:birthYear ?byear .
     OPTIONAL {?person k:deathYear ?dyear}
     LET (?age := smf:if(bound(?dyear), ?dyear-?byear, 2010-?byear))
  }
}

Note that the age computation is hardcoded for 2010. A SPARQL query that returns the current year can be defined with a few statements. An example is shown in the kennedysSPIN model in the TopBraid Library; see TopBraid/Examples/kennedysSPIN.rdf in the Composer workspace. The SPIN function getCurrentYear (defined as a subclass of spin:Functions, which is a subclass of spin:Modules) finds the current year by taking the first four characters of the xsd:dateTime value returned by the function afn:now().
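The logic described here is small enough to sketch in a few lines. The Python below mirrors the two ideas: taking the year from the first four characters of an xsd:dateTime string, and choosing between age at death and current age the way the smf:if() call does. It is not the code of the getCurrentYear SPIN function; the function names are hypothetical and the birth and death years are illustrative values.

```python
from datetime import datetime, timezone
from typing import Optional

def year_from_datetime(lexical: str) -> int:
    """Take the year as the first four characters of an xsd:dateTime string."""
    return int(lexical[:4])

def age(birth_year: int, death_year: Optional[int], current_year: int) -> int:
    """Age at death if a death year is known, otherwise age in current_year."""
    if death_year is not None:
        return death_year - birth_year
    return current_year - birth_year

# An ISO 8601 timestamp has the same leading "YYYY-" shape as xsd:dateTime.
now_lexical = datetime.now(timezone.utc).isoformat()
this_year = year_from_datetime(now_lexical)

print(age(1917, 1963, this_year))  # death year known, so current year is ignored
print(age(1957, None, this_year))  # no death year, so current year is used
```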

Instead of copying this code into the query, let's register this as a SPIN function so it can be called by any model in the workspace, including SPARQL Endpoints. Do the following:

Rename the file kennedysSPIN.rdf to kennedysSPIN.spin.rdf. Adding the .spin extension registers all of the SPIN functions in this model with the workspace, allowing SPIN functions to be called without importing or opening the files.

From the TBC-ME menu, select Scripts > Refresh/Display SPARQLMotion functions... This will register the functions for the current session. When Live or Composer is started, the system will scan the files in the workspace for .spin files and register all functions. The extra step is needed here only if the file name was changed without stopping the Composer session. A Deploy (Export... Deploy in Composer) to a Live server will automatically refresh scripts.

Now try the same query with the following changes:

# infer age at death or age from current year
PREFIX k: <http://topbraid.org/examples/kennedys#>
PREFIX smf: <http://topbraid.org/sparqlmotionfunctions#>
PREFIX kspin: <http://topbraid.org/examples/kennedysSPIN#>
CONSTRUCT {?person k:age ?age}
WHERE
{ GRAPH <http://topbraid.org/examples/kennedys>
  {  ?person k:birthYear ?byear .
     OPTIONAL {?person k:deathYear ?dyear}
     LET (?age := smf:if(bound(?dyear), ?dyear-?byear, kspin:getCurrentYear()-?byear))
  }
}

Note the use of the user-defined SPIN function getCurrentYear(). This feature can be used to call any SPIN function including those that are defined by SPARQLMotion scripts. This raises the potential of using SPARQL endpoints for a wide range of processing capabilities, including importing models from outside of a Live workspace, processing triples before querying, applying queries to inference results, integrating models from different file types, and other kinds of SPARQL and RDFS/OWL processing. For example, a SPARQL Endpoint request could call a SPARQLMotion script that runs standard RDFS or OWL inferences before submitting the query, thus returning results from both inferred and asserted triples.

Advanced SPARQL Protocol: Federated SPARQL Queries

The SPARQL SERVICE keyword sends a query to a remote service endpoint. Since TopBraid Live supports the SERVICE keyword, SPARQL endpoint queries to TopBraid Live can call other SPARQL Endpoints! Try the following query in the example query form.


PREFIX k: <http://topbraid.org/examples/kennedys#>
PREFIX smf: <http://topbraid.org/sparqlmotionfunctions#>
CONSTRUCT {?child k:birthDate ?birthdate}
WHERE
{ GRAPH <http://topbraid.org/examples/kennedys>
  {  k:RoseFitzgerald k:child ?child .
     ?child k:firstName ?fname .
     ?child k:lastName ?lname .
     ?child k:gender k:female .
     ?child k:spouse ?spouse .
     ?spouse k:lastName ?slname .
     LET (?dbpRsc := smf:buildURI("http://dbpedia.org/resource/{?fname}_{?lname}_{?slname}"))

     SERVICE <http://dbpedia.org/sparql>
     {  ?dbpRsc <http://dbpedia.org/ontology/Person/birthDate> ?birthdate .
     } .
  }
}

This query is applied to the kennedys example model to query for female children of Rose Fitzgerald and sends a query to the DBPedia SPARQL Endpoint to find their birth dates. The buildURI() function will generate a URI that is known in DBPedia, such as <http://dbpedia.org/resource/Eunice_Kennedy_Shriver>. The results from DBPedia bind the birth date to ?birthdate, which is returned in the TopBraid Live SPARQL endpoint response. As long as DBPedia is up and running, the result federates data from two SPARQL Endpoints, realizing the potential of linked data sources.
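The template expansion performed by buildURI() in this query can be pictured with a short Python sketch. This is not the implementation of smf:buildURI, just an imitation of the placeholder substitution described above; build_uri is a hypothetical helper, and it makes no attempt to escape characters that are not valid in a URI.

```python
import re

def build_uri(template: str, bindings: dict) -> str:
    """Replace each {?name} placeholder with the value bound to that name."""
    def substitute(match):
        return str(bindings[match.group(1)])
    return re.sub(r"\{\?(\w+)\}", substitute, template)

TEMPLATE = "http://dbpedia.org/resource/{?fname}_{?lname}_{?slname}"

uri = build_uri(
    TEMPLATE,
    {"fname": "Eunice", "lname": "Kennedy", "slname": "Shriver"},
)
print(uri)
```

The bindings used here correspond to the Eunice Kennedy Shriver example mentioned above.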

Conclusions

SPARQL endpoints are a complement to TopBraid Live's ability to create RESTful Web services. While Web services are more flexible, allowing data to be returned in any text-based format, SPARQL endpoints can be used in a variety of applications expecting SPARQL result sets in an XML format. TopBraid Live significantly improves on existing SPARQL Endpoints with capabilities to federate queries and design functions and scripts that process data for external usage.

These examples demonstrate the power of TopBraid Live as an RDF back-end. Using a straightforward HTML form, one can access the full power of TopBraid Live and advanced SPARQL queries. These examples can be directly applied against the Personal Server version of TopBraid Live, packaged in TopBraid Composer-Maestro Edition (TBC-ME), which is freely available for a 30-day trial. TopBraid Live Enterprise Edition is deployed as a Tomcat servlet for Web-enabled access. For more information, see the TopBraid Live web page.