Saturday, July 4, 2009

RDFa - a good way to provide access to your data?

I have been thinking about RDFa recently. With the announcement from Google and continued support from Yahoo/SearchMonkey, there is increased buzz around RDFa. So, why RDFa and what is it good for?

TopBraid has had support for RDFa for as long as I can remember – at least two years now. A user can point to a page with RDFa markup and TopBraid will import it. I remember getting excited about this and wanting to mark up all our web pages with RDF. This did not happen, at least partially because RDFa’s interaction with HTML formatting tags is pretty funky – the pages become harder to maintain. Then, there was also the persistent question of why do it at all. If one wants to provide data in RDF, why not do exactly that?

Each web page on a site could have a corresponding N3 page. There is a standard tag in HTML that can be used to refer to related information. It can be used to point to the N3 page, and/or the N3 page could follow the same naming convention as the given HTML page, but with the N3 extension. In TopQuadrant’s case this would be the only alternative, since the information on our web site is not in a database (at least not yet; this is changing). If it were in a database, then the way to go would be to provide a SPARQL endpoint.

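
To make the naming convention concrete, here is a minimal sketch in Python. It assumes the "standard tag" is an HTML link element with rel="alternate", and that "text/n3" is an acceptable media type for N3; both are assumptions, as are the function names and the example.org URLs. It also assumes the page URL ends in a file name rather than a slash.

```python
from html import escape
from posixpath import splitext
from urllib.parse import urlsplit, urlunsplit


def n3_url_for(html_url: str) -> str:
    """Return the URL of the companion N3 page: same path, '.n3' extension."""
    parts = urlsplit(html_url)
    root, _extension = splitext(parts.path)
    # Drop any query string and fragment; they belong to the HTML page.
    return urlunsplit(parts._replace(path=root + ".n3", query="", fragment=""))


def alternate_link_tag(html_url: str) -> str:
    """Build a link element (assumed convention) pointing at the N3 page."""
    href = escape(n3_url_for(html_url), quote=True)
    return f'<link rel="alternate" type="text/n3" href="{href}">'


if __name__ == "__main__":
    print(alternate_link_tag("http://example.org/products/composer.html"))
```

A page generator could place the resulting tag in the head of each HTML page, so that a crawler can find the data either by following the link or by guessing the file name.
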
I looked at the RDFa presentation by Mark Birbeck at the Semantic Technologies conference. I did not get a chance to attend – 7:30 AM is way too early for me – but I browsed through the slides. Here is an example of RDFa markup (from the presentation):

This says that there is a dc:creator relationship between the header “RDFa: Now everyone can have an API” and a string “Mark Birbeck”.

Good, but we have not given a URI to the thing we are talking about – a presentation entitled “RDFa: Now everyone can have an API”.

Absence of the URI makes it somewhat hard to talk about the presentation. Any RDFa crawler/importer would have to generate some kind of URI for it. If we used the URI to begin with, we could have simply put the triple {:RDFa_presentation dc:creator “Mark Birbeck” } into an RDF file.
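
As a rough illustration of what "putting the triple into an RDF file" could look like, the sketch below writes it out as a single N-Triples line (which is also valid N3). The subject URI under example.org is hypothetical, chosen only for the example; the dc:creator URI is the standard Dublin Core one.

```python
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

# Hypothetical URI for the presentation; any stable URI would do.
PRESENTATION = "http://example.org/talks#RDFa_presentation"


def escape_literal(text: str) -> str:
    """Escape the characters that may not appear raw in an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def ntriple(subject_uri: str, predicate_uri: str, literal: str) -> str:
    """Serialise one (URI, URI, plain literal) triple as an N-Triples line."""
    return f'<{subject_uri}> <{predicate_uri}> "{escape_literal(literal)}" .'


if __name__ == "__main__":
    print(ntriple(PRESENTATION, DC_CREATOR, "Mark Birbeck"))
```
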

One issue may be maintenance: having two files to maintain. But embedding RDFa into HTML arguably creates even worse maintenance problems. And if the RDFa markup were automatically generated (most serious publishing happens by generation, not hand crafting), then the maintenance issue is not there – it is easier to generate RDF file in addition to HTML file that it is to generate and insert markups. Not to mention that automatic generation means there is a database that could be exposed through SPARQL.

There must be something I am missing here. While I could not attend Mark Birbeck’s presentation, I just discovered he is giving a webinar on July 12th. I think I will sign up and see if some of my questions get answered.

I’ll report what I learn here, so stay tuned.


Holger Knublauch said...

Irene, I think the key benefit of RDFa is that content markup can be attached to specific locations on the screen. For example, if you post a collection of Event instances (with lat/long/time etc) on some larger web page, then the user could simply mouse over a given Event's description to "see" the triples that are underneath. Browser plug-ins could then allow users to automatically extract those triples to their own calendar etc. But in general I agree with your observation that most other use cases are better handled by SPARQL end points or separate RDF files.

Mark Birbeck said...

Hi Irene,

I'll come back to the RDFa-related issues in another comment. For this one I'd like to just point out that:

(a) You've spelt my surname as "Birbank" in one place, even though you did manage to copy it correctly in other places.

(b) The free seminar is on July 13th, not July 12th.



Mark Birbeck said...

Hi Irene,

I'm not sure I would admit in public that the reason I don't follow RDFa is because I couldn't get out of bed. :)


Obviously it's possible to have separate RDF/XML or RDF/N3 pages, and of course you could set up a SPARQL end-point to deliver your triples in all sorts of formats.

But that misses the point about why RDFa is unique amongst the various RDF serialisations.

It's the only one that can be used within HTML, XHTML, SVG, and so on.

Which means that if you have a means of publishing HTML -- and in this day and age, who hasn't -- then you can now publish RDF.

By the way, you say: "it is easier to generate RDF file (sic) in addition to HTML file that (sic) it is to generate and insert markups". That's simply not true.

First, adding the attributes to your existing HTML-generating pages in ASP.NET, Rails, PHP or whatever, is very easy; as Google pointed out in their SemTech presentation (that one was in the afternoon ;)), it took partners such as Yelp! around a day to add RDFa and Microformats to their sites. And my presentation showed how using RDFa in HTML made possible a project that would simply not have got off the ground otherwise.

But second, adding an additional channel in the way you describe involves fiddling with MIME types and .htaccess files. This is rarely possible for the average blogger, and tricky for the school or small business.

So to recap, no-one is saying that this is the only way to publish RDF. But for many people, it is going to be the only way that their data will be able to 'join' the semantic web, and that is crucial for the semweb community.

One last thing, I'm not sure why you think that there is no URI generated when parsing. Unless overridden with @about, all triples have as their subject the current document.

(Usually we would add an @about="#me" or something like that, to make things clearer in the context of information resources v. resources, but in simple examples we don't always do that.)
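
The subject rule described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a conformant RDFa parser: it only looks at @about, @property and @content (or the element's text), it leaves prefixed names such as dc:creator unexpanded, and the class name, function name and sample markup are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

VOID_TAGS = {"meta", "link", "br", "img", "hr", "input"}


class SubjectSketch(HTMLParser):
    """Collect (subject, property, literal) triples from @property elements."""

    def __init__(self, document_uri):
        super().__init__()
        self.document_uri = document_uri
        self.stack = []       # (tag, subject in scope) for each open element
        self.capture = None   # (subject, property, depth, text parts)
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Default subject: inherited from the parent, else the document itself.
        subject = self.stack[-1][1] if self.stack else self.document_uri
        if "about" in attributes:
            # @about overrides the subject; relative values resolve
            # against the document URI.
            subject = urljoin(self.document_uri, attributes["about"])
        if tag not in VOID_TAGS:
            self.stack.append((tag, subject))
        prop = attributes.get("property")
        if prop is None:
            return
        if "content" in attributes:
            self.triples.append((subject, prop, attributes["content"]))
        elif tag not in VOID_TAGS and self.capture is None:
            # No @content: use the element's text once it closes.
            self.capture = (subject, prop, len(self.stack), [])

    def handle_data(self, data):
        if self.capture is not None:
            self.capture[3].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        for index in range(len(self.stack) - 1, -1, -1):
            if self.stack[index][0] == tag:
                del self.stack[index:]
                break
        else:
            return  # stray end tag; ignore it
        if self.capture is not None and len(self.stack) < self.capture[2]:
            subject, prop, _depth, parts = self.capture
            self.triples.append((subject, prop, "".join(parts).strip()))
            self.capture = None


def extract_triples(html, document_uri):
    parser = SubjectSketch(document_uri)
    parser.feed(html)
    parser.close()
    return parser.triples
```

With a hypothetical page at http://example.org/talk.html, a heading carrying property="dc:creator" and no @about yields a triple whose subject is the page URI, while the same element nested inside a container with about="#me" yields the subject http://example.org/talk.html#me.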



Irene Polikoff said...

Hi Mark.

I fixed the spelling of your name.

Originally, I thought your July 13 session was a webinar. I later understood it was an in-person meeting. Alas, I am not in London, so we will need to wait for another opportunity to interact.

I do follow RDFa even if I still did not get a chance to attend your talk :) In fact, as far as I know, our product, TopBraid Composer, is the only tool that allows one to have an ontology-based approach to RDFa.

This is a blog by TopQuadrant, developers of the TopBraid Suite, created to support the pursuit of our ongoing mission - to explode strange semantic myths, to seek out new models that support a new generation of dynamic business applications, to boldly integrate data that no one has integrated before.