VOYAGES OF THE SEMANTIC ENTERPRISE: 2009

Wednesday, December 16, 2009

RDFa on topquadrant.com

RDFa is the W3C standard for embedding RDF triples in arbitrary XML using attributes. Its most popular use is in HTML, because it makes it easy to add machine-readable versions of a web page's information with minimal new markup.

You could add this markup to just about any web page, but it's especially useful on pages with information that is good for redistribution and can be described with popular vocabularies. We've just added RDFa to our products, management, and contact pages using the FOAF, Dublin Core, vcard, and GoodRelations vocabularies.

GoodRelations is the newest of these vocabularies, and it's getting popular quickly. A recent Semantic Technology Blog posting described a talk at the 2009 Search Engine Strategies conference in which BestBuy's Lead Web Development Engineer described how "GoodRelations + RDFa improved the rank of the respective pages in Google tremendously." In addition to improving page rank when selling products, RDFa makes it easier to share other kinds of information for use by applications on web pages where the data is already present for human consumption, whether it's research data, government data, or airline flight schedules.

An increasing number of tools are available for extracting triples from web pages with embedded RDFa, and the TopBraid Suite has supported this for a while. A TopBraid application can define a particular internet or intranet web page as an RDFa data source so that each time you run the application it will check for the latest set of triples in that page and then incorporate that data with the other data and logic used for that application.
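
To give a feel for what such extraction involves, here is a minimal sketch that pulls literal-valued triples out of a small RDFa fragment using only Python's standard library. It is not how TopBraid or any conforming RDFa processor works: the class name `TinyRDFaExtractor`, the example.org URIs, and the sample markup are all hypothetical, and only the `about`, `property`, and `content` attributes are handled.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they are not pushed on the stack.
VOID_TAGS = {"meta", "link", "br", "img", "hr", "input"}


class TinyRDFaExtractor(HTMLParser):
    """Collect (subject, property, literal) tuples from a tiny RDFa subset.

    Assumption: only `about`, `property` and `content` are handled; prefixes,
    `rel`, `resource`, `typeof` and datatypes are ignored.
    """

    def __init__(self, base_uri):
        super().__init__()
        self.base_uri = base_uri
        self.triples = []
        self._open = []  # one [tag, subject, property, text_parts] per open element

    def _current_subject(self):
        return self._open[-1][1] if self._open else self.base_uri

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        subject = attrs.get("about") or self._current_subject()
        prop = attrs.get("property")
        if prop and attrs.get("content") is not None:
            # The literal is given explicitly in the content attribute.
            self.triples.append((subject, prop, attrs["content"]))
            prop = None
        if tag not in VOID_TAGS:
            self._open.append([tag, subject, prop, []])

    def handle_data(self, data):
        # Text belongs to every enclosing element that declares a property.
        for entry in self._open:
            if entry[2]:
                entry[3].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or tag not in [entry[0] for entry in self._open]:
            return
        # Pop until the matching element, tolerating unclosed inner tags.
        while self._open:
            open_tag, subject, prop, parts = self._open.pop()
            if prop:
                self.triples.append((subject, prop, "".join(parts).strip()))
            if open_tag == tag:
                break


# Hypothetical markup, loosely modeled on a product page.
SAMPLE = """
<div about="http://example.org/products/composer">
  <h1 property="dc:title">TopBraid Composer</h1>
  <meta property="dc:creator" content="TopQuadrant">
</div>
"""

extractor = TinyRDFaExtractor("http://example.org/products.html")
extractor.feed(SAMPLE)
for triple in extractor.triples:
    print(triple)
```

A real RDFa parser also resolves prefixes to full URIs and handles resource-valued properties, which this sketch deliberately leaves out.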

And now, applications like this can pull data about TopQuadrant's products, people, and places from our HTML web pages!

Monday, November 2, 2009

What's new in TopBraid Suite 3.2

3.2 is a minor release of TopBraid Suite, but it adds more than just background improvements such as faster performance and improvements to both memory management and 64-bit support. Each of TopBraid Suite's components includes new features that will ease the development of Semantic Web applications and offer a wider range of features to include in those applications. The following lists some highlights:

TopBraid Composer

TopBraid Composer makes it easy to develop and integrate Semantic Web data models, SPARQL Inferencing Notation (SPIN) inferencing rules, and SPARQLMotion data processing pipelines into a standards-compliant application. New TopBraid Composer features in release 3.2 include:

OWL 2 support, with features such as property chain axioms, user-defined datatypes, and OWL 2 class axioms. Read more at VP of Product Development Holger Knublauch's blog posting OWL 2 Support in TopBraid Composer.
New SPIN features such as spin:fix to implement suggestions for constraint violation fixes, easier control over rule execution, and the ability to distribute rules across devices. An important new improvement is support for user-defined magic properties in SPIN, allowing users to define rules with backward chaining. You can define a magic property, or property function, using SPARQL and then use it as the predicate in another SPARQL expression to compute new values based on the data being queried. Holger's Magic Properties with SPIN blog posting has some good examples.
A new SPARQL debugger that lets you set break points, display intermediate variable bindings, collect statistics, and see the internal algebra of SPARQL queries as they run. Holger's recent blog posting on this topic walks the reader through the underlying logic, the use of the debugger view, and the use of profiling.
More options for the display of long lists in the Properties and Classes view, making it easier to find what you're looking for quickly.
Easier packaging of a project developed with TopBraid Composer for deployment to a server running TopBraid Live for use by multiple users.
Support for the latest major release of the Eclipse platform, release 3.5.

TopBraid Ensemble

With TopBraid Ensemble, you can create web-based user interfaces for the applications you develop using Composer. End users, without using Composer, can interact with your application's data using a wide choice of graphical interface components that you customize for them. New Ensemble features in 3.2 include:

Support for pop-up windows, so that you can build wizards that lead your application's users through a series of steps.
Support for multi-page applications, letting you spread the interface for more complex applications across a series of pages, as with a tabbed interface.
A new, customizable button component that lets your application's users trigger events with a mouse click.
A new SPARQL Relay component, which can listen for events and trigger SPARQLMotion scripts in the background of your application.

TopBraid Live

TopBraid Live lets you deploy your application for use by hundreds of users with the screens that you designed with Ensemble, with screens that were custom-built using Adobe Flex, or with an automated web services interface. New features in 3.2 include:

User and group management, including optional integration with LDAP roles.
Access Control Lists for projects, folders and graphs.
The ability to host an application as a SPARQL end point.
Significant performance improvements when used with Oracle.

For a complete list of new features in TopBraid Suite release 3.2, see the 3.2 Release Notes.

Wednesday, October 14, 2009

SPIN Tutorial Available

I recently wrote in my personal blog about how, since joining TopQuadrant, I've grown to appreciate how well SPARQL can serve as a rules language. SPARQL Inferencing Notation, or SPIN, lets you associate rules and constraints expressed in SPARQL with classes of triples. While you don't need to use TopQuadrant products to take advantage of SPIN, they sure make it much easier, especially if you want to use those rules as part of an application. I just finished writing a tutorial (pdf) on how to implement SPIN rules and constraints with your models using TopBraid Composer, and except for one optional detail of the tutorial, it all works with the free edition, so it's available for anyone with a Mac or Windows machine to try.

Using a small collection of data about service contracts and materials purchases, the tutorial walks you through the creation of:

your own functions, written in SPARQL and returning values of whatever type you like
inferencing: the generation of new triples based on other data (in the case of the tutorial, the generation of ISO 8601 yyyy-mm-dd format invoiceDate values from "mm/dd/yy" date values stored in the original data)
constructors: the automatic generation of a postingDate value when a new MaterialsPurchase or ServiceContract instance is created
constraints: setting up the system to alert the user to unpaid materials purchases that are more than 90 days old or unpaid service contracts that are more than 60 days old. Instead of using a lot of redundant code to achieve these two different but similar goals, the tutorial shows how to define a reusable template with a SPARQL query and pass parameters to it (in this case, the numbers 60 or 90) when using the template.
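
The tutorial expresses these pieces as SPIN rules and templates written in SPARQL. Purely as a language-neutral illustration of the logic, the following Python sketch shows the date conversion and the parameterized "overdue" check. The function names, record fields, and sample dates are hypothetical and do not come from the tutorial.

```python
from datetime import date, datetime


def to_iso_date(us_date):
    """Convert an "mm/dd/yy" string to ISO 8601 "yyyy-mm-dd".

    Assumption: two-digit years follow Python's %y pivot
    (69-99 become 19xx, 00-68 become 20xx).
    """
    return datetime.strptime(us_date, "%m/%d/%y").date().isoformat()


def overdue_unpaid(records, max_age_days, today):
    """Reusable "template": unpaid records invoiced more than max_age_days ago."""
    return [
        r for r in records
        if not r["paid"]
        and (today - date.fromisoformat(r["invoiceDate"])).days > max_age_days
    ]


# Hypothetical sample data.
purchases = [
    {"id": "mp1", "invoiceDate": to_iso_date("01/15/09"), "paid": False},
    {"id": "mp2", "invoiceDate": to_iso_date("09/20/09"), "paid": False},
]
contracts = [
    {"id": "sc1", "invoiceDate": to_iso_date("08/01/09"), "paid": False},
    {"id": "sc2", "invoiceDate": to_iso_date("06/01/09"), "paid": True},
]

today = date(2009, 10, 14)
# The same check is reused with a different parameter for each class.
late_purchases = overdue_unpaid(purchases, 90, today)
late_contracts = overdue_unpaid(contracts, 60, today)
print([r["id"] for r in late_purchases], [r["id"] for r in late_contracts])
```

Passing 60 or 90 as an argument plays the role that template parameters play in SPIN: one definition, two uses.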

I hope the tutorial demonstrates the potential connections between SPIN technology and real-world business issues to its readers, as well as the ease of implementing it all with TopBraid Composer.

Monday, September 7, 2009

Creating and Managing Metadata about RDF statements

Metadata about RDF statements can be quite useful and even required for a number of purposes. For example, let’s consider questions one may have about a statement “Washington, DC is a capital of the United States”:

What is the provenance of this statement – who said this, when did they say this?
What is the temporal scope of this statement – when did DC become the capital of the US, is it still the capital?
What is the access control for this statement – who can see it, who can change it?

Interest in this topic is evidenced by Paul Hermans' blog post summarizing recent discussions on approaches to implementing such metadata: http://www.proxml.be/users/paul/weblog/9d47d/A_must_read__Temporal_Scope_for_RDF_Triples.html.

We are often asked what TopQuadrant’s recommended approach is to supporting statements about statements – for versioning, for governance, etc. TopBraid Suite is fully flexible in this respect and can be used to implement any number of approaches. However, in our work, we have found an approach based on RDF reification to be particularly useful.

TopBraid Composer provides a Change History view where one can see every added and deleted triple. The view is based on a small ontology called change.owl, available as part of the TopBraid library. It contains a class Change and a handful of properties – added, deleted, graph and timestamp. The class Change is described as follows: “A change to an RDF Graph, encapsulating lists of added and/or deleted rdf:Statements. Additional metadata such as a timeStamp (or author or whatever) can be added”.

We often extend this model for particular applications to add the metadata required by the app, for example, author or scope. TopBraid Composer inserts change statements automatically. Every time a triple is added or deleted, there is a new change statement. In web applications deployed under TopBraid Live, we use SPARQLMotion scripts that start with the sml:TrackChanges module.

sml:TrackChanges is used to implement services that are to be executed as a side effect of a change to an RDF model. In TopBraid, any script containing an instance of this module will be executed as part of each change. The output of this module uses the http://topbraid.org/change ontology, with triples describing the changes that have happened.

In other words, TopBraid listens for changes and, when a change happens, it triggers execution of any scripts containing the TrackChanges module. One can provide a filter to specify what type of changes a script should react to. And, as with any SPARQLMotion script, what happens when the script is triggered is up to the script designer. For example, in the Enterprise Vocabulary Management solution we use this approach to stamp every change with the author id and a timestamp and also to trigger the governance processes – send e-mails about the changes to the appropriate parties, promote approved changes, etc.
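
The listen, filter, and react pattern can be sketched in a few lines of Python. This is only an analogy for the behavior described above, not TopBraid's implementation or API: the `Change` and `TrackedGraph` classes, the `on_change` method, and the example URIs are hypothetical, with the `Change` fields chosen to mirror the added, deleted, graph and timestamp properties of the change ontology.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Change:
    """Mirrors the properties named above: graph, added, deleted, timestamp."""
    graph: str
    added: list
    deleted: list
    timestamp: datetime
    metadata: dict = field(default_factory=dict)  # e.g. author, set by a script


class TrackedGraph:
    """A set of triples that notifies registered scripts after each change."""

    def __init__(self, uri):
        self.uri = uri
        self.triples = set()
        self._listeners = []  # (filter, script) pairs

    def on_change(self, script, change_filter=lambda change: True):
        self._listeners.append((change_filter, script))

    def update(self, add=(), delete=()):
        added = [t for t in add if t not in self.triples]
        deleted = [t for t in delete if t in self.triples]
        if not added and not deleted:
            return None  # nothing actually changed
        self.triples.difference_update(deleted)
        self.triples.update(added)
        change = Change(self.uri, added, deleted, datetime.now(timezone.utc))
        for change_filter, script in self._listeners:
            if change_filter(change):
                script(change)
        return change


log = []


def stamp_author(change):
    change.metadata["author"] = "editor-1"  # hypothetical author id
    log.append(change)


def touches_labels(change):
    # Filter: react only to changes involving rdfs:label triples.
    return any(p == "rdfs:label" for s, p, o in change.added + change.deleted)


graph = TrackedGraph("http://example.org/graphs/vocabulary")
graph.on_change(stamp_author, touches_labels)

graph.update(add=[("ex:DC", "rdfs:label", "Washington, DC")])  # triggers the script
graph.update(add=[("ex:DC", "ex:capitalOf", "ex:US")])         # filtered out
print(len(log), log[0].metadata)
```

In a governance setting, the script body is where e-mail notifications or approval steps would be started.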

Saturday, July 25, 2009

Linked Data and what it really takes to create it

Recently there have been a number of popular discussions (some heated) on whether RDF is necessary for Linked Data. For example:

http://cloudofdata.com/2009/07/does-linked-data-need-rdf/

http://www.semanticsincorporated.com/2009/07/if-linked-data-is-a-brand-it-has-big-problems-to-address.html

I believe that for Linked Data to happen, we need a standard for representing semantic information. URIs are just an addressing scheme and do not carry any semantics.

Once one moves beyond the high-level marketing statements and starts to consider specific technical details of how the data linking could work, it quickly becomes apparent that, at minimum, RDF (or something very much like RDF) is required. This is often lost on bloggers without a sufficiently deep grasp of the underlying technologies and results in somewhat unfocused and meandering discussions. Questions about how exactly URIs alone can bring the data together are left unanswered and glossed over in the rhetoric of high-level statements.

Unfortunately, this makes it hard to conduct substantive discussions. Perhaps this is why I found the recent post on Data Reconciliation Strategies and Their Impact on the Web of Data more insightful and useful than the discussions above. It makes a number of valid points and raises important questions. Having said this, I disagree with the author's conclusion:

"I cannot help but think that any effort tasked to promote and increase such density will look and feel just like Freebase: carefully selecting datasets and painfully trying to reconcile as much data as possible right when it enters the system and entice a community of volunteers to maintain it, curate it and clean up eventual mistakes."

The query proposed by the author, “the height of all towers located in Paris”, raises the following "connectivity" problems:

1. Identity of a tower

2. Identity of predicate height

3. Identity of Paris

I believe it is important to encourage people to provide mappings from their vocabulary/schema to some established vocabularies. These will appear (and are appearing) over time. foaf:Person is a good example. Now, you do not have to and may not be able to use it directly in your data, but as you expose your data to the Linked Open Data (LOD) cloud it would be good practice to provide such mappings. Otherwise, someone else will have to do it.

This is where dbpedia becomes very useful. It has a URI for height. NASA/TopQuadrant units ontology (to be shortly released here http://www.oegov.org/) has that as well. We linked units of measure, quantities and dimensions to dbpedia. If a broader community finds this work useful, it may be that over time this ontology becomes the de-facto standard and people will link to it. But even today there is a possibility to connect between it and other relevant ontologies/datasets - through dbpedia.

The same premise applies to Paris. Some authoritative sources have emerged: geonames and, again, dbpedia. If they cross-reference each other, then linking to either one should do the trick.
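
To make the cross-referencing argument concrete, here is a small Python sketch in which two datasets name Paris differently and a single equivalence link is enough for the "height of all towers located in Paris" query to span both. Everything in it is hypothetical: the prefixed identifiers, the tower names, and the height values are made up for illustration, and the tower type and height predicate are assumed to be shared already so that only the identity of Paris is at issue.

```python
def canonicalizer(same_as_pairs):
    """Return a function mapping each URI to one representative of its
    equivalence cluster (a small union-find over sameAs-style links)."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a
    return find


# Hypothetical cross-reference between two sources' identifiers for Paris.
same_as = [("geo:paris", "dbp:Paris")]

# Hypothetical datasets with illustrative values.
dataset_a = [
    ("a:TowerOne", "rdf:type", "ex:Tower"),
    ("a:TowerOne", "ex:locatedIn", "geo:paris"),
    ("a:TowerOne", "ex:height", 100),
]
dataset_b = [
    ("b:TowerTwo", "rdf:type", "ex:Tower"),
    ("b:TowerTwo", "ex:locatedIn", "dbp:Paris"),
    ("b:TowerTwo", "ex:height", 200),
]

canon = canonicalizer(same_as)
# Rewrite subjects and URI-valued objects to their canonical identifiers.
triples = [
    (canon(s), p, canon(o) if isinstance(o, str) else o)
    for s, p, o in dataset_a + dataset_b
]


def heights_of_towers_in(place):
    place = canon(place)
    towers = {s for s, p, o in triples if p == "rdf:type" and o == "ex:Tower"}
    located = {s for s, p, o in triples if p == "ex:locatedIn" and o == place}
    return {s: o for s, p, o in triples
            if p == "ex:height" and s in towers & located}


print(heights_of_towers_in("dbp:Paris"))
```

Asking with either identifier for Paris returns both towers, which is the point: the publisher of either dataset only had to link to one of the two sources.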

Tower is a trickier problem because it is a type and can be identified in a number of ways. Maybe you have a table (class) Tower. Maybe you have a column (predicate) indicating the type. Maybe you have a table of controlled values for building types and there is a link to the appropriate resource (this will be the case if something has more than one type, say both a tower and a lighthouse). A similar issue may apply to the height if you reify it to provide, for example, a unit of measure or a measurement date.

Some mechanisms are needed to describe less simplistic mappings. In our work SPARQL-based SPIN rules have proven to be an effective standard-based solution for more complex mappings. Overall, I believe there are only a handful of patterns that will constitute 80% of the cases. A good number of these are described above.

The key benefit of using such an approach is that people can start with their own vocabularies and then, at some later stage, add the links to dbpedia. Or they don't, and someone else does it. This freedom is lost if a system (such as Freebase) forces users to do the mapping up front. With the units ontology it was quite easy to add the mappings, and likewise it will be for most other existing data models.

Thursday, July 9, 2009

Data Transformation using Semantic Web Standards

I have created the presentation below in response to a recent discussion about converting XML to RDF.

A person I was talking to assumed that there was a mapping process one needed to go through before a translation of XML (or relational databases, spreadsheets, etc.) into RDF could take place.

Indeed, mapping often happens, but it happens after translation. First the non-RDF information is represented in RDF. Any mappings that are created are also captured in RDF/OWL - either by using constructs such as rdfs:subClassOf and owl:sameAs or, for more complex mappings, by using SPIN (SPARQL rules).

I am always surprised how often people find this approach novel and need time to understand what is going on. I guess this is because RDF is so flexible - it is quite easy to represent any data structure in RDF. And because both data and models are represented in RDF, once imported, structural transformations are very straightforward. RDF is built for change. Other data models do not have this advantage. Hence, they require mappings before importing external data.
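
The "translate first, map afterwards" order can be sketched in Python as follows. This is an illustration under stated assumptions, not the TopBraid import: the sample XML, the `src:` prefix, and the function names are hypothetical, triples are plain tuples, and rdfs:subPropertyOf is used alongside rdfs:subClassOf as one possible mapping construct.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document.
XML = """
<staff>
  <employee id="e1"><name>Ann</name><dept>Sales</dept></employee>
  <employee id="e2"><name>Bob</name><dept>IT</dept></employee>
</staff>
"""


def xml_to_triples(xml_text, ns="src:"):
    """Step 1: mechanical translation with no mapping decisions.

    Each child of the root becomes a resource typed by its tag name, and each
    of its child elements becomes a property with a literal value.
    """
    triples = []
    for elem in ET.fromstring(xml_text):
        subject = ns + elem.get("id")
        triples.append((subject, "rdf:type", ns + elem.tag))
        for child in elem:
            triples.append((subject, ns + child.tag, child.text))
    return triples


# Step 2: mappings are themselves just triples, added after the import.
MAPPINGS = [
    ("src:employee", "rdfs:subClassOf", "foaf:Person"),
    ("src:name", "rdfs:subPropertyOf", "foaf:name"),
]


def apply_mappings(triples, mappings):
    """Infer extra triples from the mappings (one level only, no chaining)."""
    sub_class = {s: o for s, p, o in mappings if p == "rdfs:subClassOf"}
    sub_prop = {s: o for s, p, o in mappings if p == "rdfs:subPropertyOf"}
    inferred = []
    for s, p, o in triples:
        if p == "rdf:type" and o in sub_class:
            inferred.append((s, "rdf:type", sub_class[o]))
        if p in sub_prop:
            inferred.append((s, sub_prop[p], o))
    return triples + inferred


raw = xml_to_triples(XML)
mapped = apply_mappings(raw, MAPPINGS)
for triple in mapped:
    print(triple)
```

Because the mappings live alongside the data, changing or extending them later does not require re-importing the source.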

The presentation below explains in detail how RDF import and transformations are done, including a step-by-step example. The benefits of the approach are also discussed.

Data Transformation using Semantic Web Standards

Monday, July 6, 2009

Presentations from the Second TopBraid User Group Meeting - SemTech2009

The Second TopBraid Suite Open User Group Meeting was conducted at the Semantic Technology Conference 2009 in San Jose, CA, Thursday, June 18, 2009, 09:45 AM - 12:45 PM. The agenda for the meeting and links to the presentations used by the speakers are given below.

User Group Meeting Agenda and Presentations
•9:45am - 9:55am
Welcome (Robert Coyne, TopQuadrant)

•9:55am - 10:30am
Keynote User Talk 1:
"Using SPARQLMotion to Execute Task Networks among Distributed Cyber Physical Systems",
Cyber-physical systems define networks of interactive sensors and actuators, grounded in the physical world. Such systems require a high degree of interoperation to achieve the system's objective. SPARQLMotion, a model-driven scripting language, has been used to achieve that level of interoperation. In addition, this approach results in greater operational redundancy among networks through distributed control. Three recent extensions to SPARQLMotion will be shared along with a motivating example for their use.
John T. Carson, Software Engineer, Lockheed Martin Aeronautics
(slides)

•10:30am - 11:00am
TopBraid - New Capabilities - Sampler 1
Presentation/demo of key aspects of TopQuadrant's Enterprise Vocabulary Management (EVM) Solution Package (just announced). Customers across a range of industries are building EVM solutions on top of the TopBraid Suite platform. In response to customer requirements, TopQuadrant is offering an EVM Solution Package of commonly needed, high-value components including models, scripts (e.g. for workflow management with approvals), and application configuration templates.
Irene Polikoff, CEO and Co-founder, TopQuadrant, Inc.
(slides)

•11:00am - 11:35am
Keynote User Talk 2:
"Managing Your Online Social Graph with TopBraid Composer"
In this short user experience session Marco will show you how he makes use of TopBraid Composer to keep track of the Semantic Web Meetup community (http://www.swnyc.org) and how he manages events, RSVPs and security. In addition we will take a look at some of the editing and reporting features readily available and built into TopBraid Composer to visualize community data for evaluation and the identification of trends.
Marco Neumann, Information Scientist and CEO & Founder, KONA
(slides)

•11:35am - 12:05pm
TopBraid - New Capabilities - Sampler 2
For QA of ontologies, many users have interest in knowing key statistics or metrics regarding their models, such as the number of properties that reference each class. A simple, flexible, convenient to use solution will be demonstrated within TopBraid. By importing a special ontology, and running provided scripts, desired metrics fields get populated, and a reporting script is used to pretty-print the results to html.
Ralph Hodgson, CTO and Co-founder, TopQuadrant, Inc.
(slides)

•12:05pm - 12:40pm
User Feedback Session (conducted through lunch),
moderated by Tom Fitzgerald, Director of Sales, TopQuadrant
Notes from the session:
1. Education – tremendous demand for online training materials. Customers recommend we provide more training material (tutorials, videos, examples). Training is a key to success for many TopBraid customers.
2. Help Menu – context search and more examples – need a “Getting Started” tutorial. The tutorial would include how to use the Help Menu effectively.
3. User’s Forum – “great asset” – would like to see it expanded to include wiki format and possibly all of our products. Discussion of how the forum can be expanded and provide more extensive support, e.g., examples of customer applications.
4. Site Spin – Tim Smith – new capability
5. Mind Mapping integration – Tim Smith talked about the value of providing an easy tool for business users to map out model requirements. The mapping could then be integrated into TBC.
6. Customer Use Cases – request to expand website to include tab for Use Cases and customer scroll

TopQuadrant appreciated and valued this input and is responding quickly, in particular by making the extensive Help provided in TopBraid Composer more accessible and by providing more support assets through our web site. See links on the TopBraid Composer page, Support and other product pages, and this recent post for further details. Additionally, two new slide sets are being provided for download that give an extensive tour of TopBraid Composer capabilities and in-depth details on importing data into RDF and transforming data with TopBraid utilities and power tools.

•12:40pm - 12:45pm
Announcements and Closing Remarks
Robert Coyne, TopQuadrant

(See also Reflections from the First TopBraid Suite User Group Meeting, May 21, 2008, San Jose, CA)

Saturday, July 4, 2009

The Meaning of "semantic" - post II

After writing this earlier post, I've decided to expand on my thoughts about the intersection between the Semantic Web and text mining, natural language processing, etc.

The write-up was ready just in time for TopQuadrant's submission to the monthly Semantic Universe column, so I ended up publishing it there. Here is the link: http://www.semanticuniverse.com/articles-using-semantic-web-standards-improved-text-mining.html.

RDFa - a good way to provide access to your data?

I have been thinking about RDFa recently. With the announcement from Google and continued support from Yahoo/Search Monkey there is an increased buzz around RDFa. So, why RDFa and what is it good for?

TopBraid has had support for RDFa for as long as I can remember – at least two years now. A user can point to a page with RDFa markup and TopBraid will import it. I remember getting excited about this and wanting to mark up all our web pages with RDF. This did not happen, at least partially because RDFa’s interaction with HTML formatting tags is pretty funky – the pages become harder to maintain. Then there was also the persistent question of why do it at all. If one wants to provide data in RDF, why not do exactly that?

Each web page on a site could have a corresponding N3 page. There is a standard tag in HTML that can be used to refer to related information. It can be used to point to the N3 page and/or the naming convention could be the same as for the given HTML page, but with the N3 extension. In TopQuadrant’s case this would be the only alternative solution since the information on our web site is not in a database (at least not yet, this is changing). If it were in a database, then the way to go would be to provide a SPARQL endpoint.

I looked at the RDFa presentation by Mark Birbeck at the Semantic Technologies conference. I did not get a chance to attend – 7:30 AM is way too early for me, but I browsed through the slides. Here is an example of RDFa markup (from the presentation):

This says that there is a dc:creator relationship between the header “RDFa: Now everyone can have an API” and a string “Mark Birbeck”.

Good, but we have not given a URI to the thing we are talking about – a presentation entitled “RDFa: Now everyone can have an API”.

Absence of the URI makes it somewhat hard to talk about the presentation. Any RDFa crawler/importer would have to generate some kind of URI for it. If we used the URI to begin with, we could have simply put the triple {:RDFa_presentation dc:creator “Mark Birbeck” } into an RDF file.

One issue may be the maintenance – having two files to maintain. But embedding RDFa into HTML arguably creates even worse maintenance problems. And if the RDFa markup were automatically generated (most serious publishing happens by generation, not hand crafting), then the maintenance issue is not there – it is easier to generate an RDF file in addition to an HTML file than it is to generate and insert markup. Not to mention that automatic generation means there is a database that could be exposed through SPARQL.

There must be something I am missing here. While I could not attend Mark Birbeck’s presentation, I just discovered he is giving a webinar on July 12th: http://skillsmatter.com/event/ajax-ria/the-possibilities-of-rdfa-and-the-semantic-web/ng-94 . I think I will sign up and see if some of my questions get answered.

I’ll report what I learn here, so stay tuned.

Wednesday, June 24, 2009

Reflections on the User Feedback Panel or More Help, please ...

One striking difference between user input this year and last is that this year we received a considerably smaller number of new functionality requests. It seems that our users are still digesting all the new features that came out in 3.0 and in the ramp up to 3.0 - SPARQLMotion, SPIN, the new TopBraid Ensemble with the end-user composable web applications, etc.

User feedback is very important to us as a key input to the product development plans. Of course, many requirements come on a regular basis through the web user forum, but in-person interaction adds different dimensions. It is immediately clear if a requirement voiced by one user is shared by the rest. You can also quickly explore requirements in more depth. For example, at last year's User Group meeting, we received repeated requests for the support of version control and governance. It took some time to fully understand the requirements and design the solution to address them, but after close interactions with several users, version control is now available as part of the new Enterprise Vocabulary Management solution.

This year most of the requests were for better documentation and educational resources - a SPIN tutorial, more videos, example applications for TBE, etc. It seems that the richer the product suite becomes, the more we need to work on providing resources explaining how to use it.

TopBraid Composer already has a pretty good help facility. However, we've learned that not all its features are well understood by the users. After returning from the conference we have created a new page dedicated to Help http://www.topquadrant.com/products/ComposerHelp.html. It is accessible from a number of places including the download page.

Other requests already in the works include a PowerPoint tour of TBC features, an example TBE application for download with, probably, a video explaining how it was developed and, yes, a SPIN tutorial (once I get a chance). Help is being updated as well, in preparation for the 3.1 release.

Thursday, June 18, 2009

TopBraid User Group Meeting at SemTech2009

TopQuadrant conducted its second TopBraid Open User Group Meeting at the Semantic Technology Conference 2009, on Thurs., June 18, San Jose, CA.

Topics presented and discussed included:

• User Talk 1: Using SPARQLMotion to Execute Task Networks among Distributed Cyber Physical Systems

• TopBraid - New Capabilities Sampler 1: Key aspects of TopQuadrant's Enterprise Vocabulary Management (EVMS) Solution package (just announced)

• User Talk 2: Managing Your Online Social Graph with TopBraid Composer

• TopBraid - New Capabilities - Sampler 2: Ontology Metrics & Report Generation for Quality Assurance of Knowledge Models

• User Feedback Session

More details to follow!

The Meaning of "semantic"

Going through the exhibit hall of the Semantic Technologies conference one quickly notices that there are two types of vendors:

Providers of the middleware and tools that leverage Semantic Web standards (RDF, RDFS, OWL and SPARQL) to support a variety of business applications. This is the area where TopQuadrant plays.
Providers of the search and text mining products.

The word "semantic" is used when talking about both types of products. The reason for this is quite clear. Before W3C coined the term "Semantic Web", the word "semantics" was used to describe "smart software" capable of extracting some meaning from text. With the development of the Semantic Web standards, it is increasingly being used to describe semantics (schemas) of the data and to support integration across different sources and formats - databases, XML, spreadsheets, etc. - as well as to support new, model-driven ways to develop applications.

I wonder if this creates confusion. In fact, I know it does. At a recent customer meeting, we gave a two-hour product presentation. There were many good questions afterward. I did not think there was any confusion until one of the attendees asked if our software had multi-lingual support. I explained that yes, language-specific labels can be provided and information can then be displayed in a selected language. He looked puzzled, thought a little and then said "I am not talking about labels. Different languages have different semantics, like sentence structure, etc. How do you address that?"

I explained that TopBraid Suite does not directly provide text extraction. Instead, we integrate with software that has these capabilities. It can be a product like Calais which provides the extracted concepts directly in RDF or a product that provides results in XML. We can convert it to RDF in an automated way. Integration with the text mining software makes it possible to bring together (and, consequently, query) structured and unstructured information using a common RDF infrastructure.

With this, I wonder if there should be two very distinct tracks at the conference. Is there some other space where these two different streams of technologies come together? If so, can the intersection be better explained and positioned?

Tuesday, June 16, 2009

Starfleet Day One

Well, not exactly day one... We have been on a journey for quite some time now. And a few of us at TopQuadrant have been blogging for a while. However, until today we did not have a shared company blog. Our hope is to make the blog a place for publishing tips on using the TopBraid Suite, discussing best practices for working with the Semantic Web standards, sharing thoughts and ideas, learning from each other and from our customers.

Today is not only the first day for the blog, but also the first day of the exhibit at Semantic Technologies 2009. A busy day, many familiar faces, many new ones as well. Some reflections:

No one asked me today to explain RDF. Such a difference from just a year ago!
Almost no one asked me "what is this technology good for". Everyone I talked to had a pretty clear idea what problems they wanted to solve. Integration was probably the most common theme.
The simplest demos seem to be the most effective ones. We've built a few fairly elaborate demos, but there is only so much one can show and absorb at a busy conference. Luckily, the demos we have are pretty versatile. I plan to post at least one of the demo apps for everyone to download and try.

Tomorrow a good amount of demoing will be done by our partners - Oracle and CTG. Both of the partner demos are focusing on medical informatics.

There is clear evidence that people are ready to adopt the technologies. Many spoke of pilot projects and the need to make IT people comfortable with new infrastructure. For TQ this was good confirmation of our TopBraid Ensemble/Live approach and product direction.

At the same time, some of the things we take for granted in our product are not always known to the market. We will therefore need to do some more communication of the "out-of-the-box" capabilities of the suite.
