Monday, December 2, 2013

TopQuadrant supports PhUSE Semantic Technology Initiative creating RDF Representations of CDISC Standards for Clinical Trial Information.

Recently, PhUSE and CDISC announced the completion of Phase I of the FDA/PhUSE Semantic Technology Working Group Project. The PhUSE Semantic Technology Working Group aims to investigate how formal semantic standards can support the clinical and non-clinical trial data life cycle from protocol to submission. This deliverable includes a draft set of existing CDISC standards represented in RDF.

In this stage of the project, the focus is on meta-models and models that describe the standards and allow their flexible use. This important first step grew out of an industry-regulator collaboration initiated through PhUSE and supported by many CDISC volunteers who originally developed these foundational standards.

The ADaM representation of this project was co-led by Phil Ashworth, semantic solution architect at TopQuadrant, with Josephine Gough and contributions from Nate Freimark, Dave Jordan, Kirsten Langendorf and Mitra Rocca. ADaM is the last standard in the food chain when it comes to presenting the results of a clinical trial for review by the authorities. Designed to standardize the way that analysis results are stored and thus presented to authorities, ADaM provides standards for statistical analyses to reduce or eliminate any processing required by the reviewer.

According to Phil, the current diverse set of standards leaves the understanding of data and data relationships more open to interpretation. This initiative will create a more efficient, accurate way of representing and submitting data to the FDA, and will allow data to be analyzed based on a common model where the meaning is well understood. For more details on Phil’s involvement in this project, read this wiki on ADaM in RDF standards.

In the next stage of this project, PhUSE and CDISC will take a closer look at demonstrating the capture of information based on these initial models and then submitting to the FDA. Additional information about this project can be found on the PhUSE wiki.

Through our collaboration with PhUSE and CDISC, we hope to enable quicker decision-making for life science professionals. At TopQuadrant, we understand they need to simply and readily bring data together and view, explore and analyze it.

Friday, November 22, 2013

Event Recap: Robert Coyne presented at MANAGE VARIETY—create VALUE from Big Data

Gartner named semantic technologies one of the 10 most important trends in 2013 and beyond. When it comes to Big Data, semantic technologies deliver on the variety aspect of the three Vs of Big Data (Volume, Velocity, and Variety) by enabling the right data to be processed in the correct way. During the Gartner Symposium ITxpo in Barcelona, TopQuadrant’s partner Computas held a satellite event, MANAGE VARIETY—create VALUE from Big Data, on November 10th to share how customers are getting value by managing the variety of their Big Data.

Robert Coyne, a co-founder of TopQuadrant, presented on the company’s vision for semantic information ecosystems and showcased several case studies where Semantic Web Technology is a core element in their solutions. As Robert told the MANAGE VARIETY audience, semantic ecosystem solutions are evolutionary, allowing organizations to build a modern information infrastructure incrementally with capabilities that let them adapt and capitalize on rapidly changing scenarios, have better access to information resources, discover new opportunities, and inform better decision making.

Attendees also learned more about TopQuadrant’s partnership with Computas, which is aimed at delivering efficient semantic solutions in Europe. Current projects include delivering IT services for data exchange and collaboration in the Oil and Gas industry and an ontology management and semantic enrichment infrastructure for the Organization for Economic Co-operation and Development (OECD), whose Knowledge Information Management (KIM) program aims to establish an integrated strategy and framework for managing and delivering information and knowledge.

TopQuadrant is working with Computas to fulfill the vision of the Exploration & Production Information Management Association (EPIM) to build a shared suite of knowledge-based applications for operators on the Norwegian Continental Shelf using semantic technology and industry-standard domain concepts. Computas and TopQuadrant are engaged with EPIM to develop EnvironmentHub, a solution using TopBraid capabilities and tools to provide a flexible semantic web standards-based platform for environmental reporting and data integration. TopQuadrant previously developed the ReportingHub system, a standards-based information-exchange solution that uses semantic web standards to store, query and analyze exploration and production data.

Want to learn more about these initiatives with EPIM and Computas or semantic information ecosystems? Leave us a comment and we’ll fill you in.

Friday, October 11, 2013

TopQuadrant’s Bob DuCharme speaking at Taxonomy Boot Camp 2013

Bob DuCharme is speaking on the upcoming "Semantic Search" panel during Taxonomy Boot Camp on November 6th in Washington D.C. Bob’s topic, "Enhancing Searches With Taxonomies and Semantic Technology," will cover how taxonomies based on semantic technology can help with focusing and augmenting searches, correcting terms, disambiguation and using other vocabulary metadata sources to increase the quality of your search results. Attendees will learn more about how TopBraid EVN, acting as a vocabulary server, can improve a search engine's ability to get more accurate and relevant results, and the role of semantic technology standards in making this happen.

Taxonomy Boot Camp is the premier event for people who manage vocabularies. Bob has spoken at this conference before, noting “it's been interesting to see ideas about the value of semantic technology spread out from thought leaders in this field to a wider audience.”

In addition to speaking, Bob is looking forward to attending presentations at the conference to learn more about the latest techniques for managing controlled vocabularies, and specifically how people use these vocabularies to get more value out of other information assets (for example, by aiding search engines). According to Bob, “talking with attendees also brings me up to date about the latest challenges facing taxonomists, and this provides great input for new TopBraid EVN features.”

Are you attending the show? If so, drop us a line and don’t miss out on Bob and the Semantic Search panel from 3:15 p.m. to 4:00 p.m. on November 6th!

Wednesday, May 29, 2013

Looking forward to Semtechbiz in San Francisco next week

The Semantic Technology & Business series of conferences has held successful events in London, Berlin, New York City, and Washington D.C. in the last few years, but the annual one in San Francisco is the big one. We're a silver sponsor of next week's event, where we'll be looking forward to showing everyone what we've been up to and learning more about what everyone else has been doing with standards-based semantic technology.

In the exposition hall, we'll be at booth 116, right by the exhibit hall entrance. In our booth, we'll be showing TopBraid Life Sciences Insight, a Logical Data Warehouse aimed at people in the Life Sciences industry, and TopBraid Enterprise Vocabulary Net, our standards-based multi-user vocabulary management solution. We'll also be happy to talk about any of our other products and projects.

We'll be giving four talks at the conference:

Stop by our booth or one of our talks to say hi and to tell us about what you've been doing with semantic technologies and what you hope to do. We love the opportunity that this conference gives us to learn the latest about what people are doing with semantic web standards.

Friday, April 5, 2013

Introducing TopBraid Life Sciences Insight—at Bio-IT World 2013

We're looking forward very much to showing TopBraid Life Sciences Insight (TopBraid LSI) at the Bio-IT World conference and expo next week in Boston.

Through our research, we've found that efforts in drug discovery, clinical trial research, and other data-intensive life sciences tasks are often hampered by the need for more efficient federated queries across silos, poor alignment of related data, and dependence on expensive, inflexible tools.

Working with experts in the field, we've developed TopBraid LSI as a Logical or Virtual Data Warehouse: coordinated views on multiple data sources that let you query and use these data sources without actually loading them into a single data warehouse. A web-based interface lets users identify alignments between different data sources without requiring them to know the standards used to store and leverage these alignments, and then, using an approach similar to Map-Reduce, queries can be efficiently distributed across the data sources to return federated answer sets. You can learn more about how TopBraid LSI works and how it can help life sciences professionals from our new white paper (PDF) on it.

TopBraid LSI has also been selected as a finalist for Bio-IT's Best of Show award in the Informatics Tools & Data Category, so we're also looking forward to showing it to the judges. And, we're taking part in the conference's New Product Showcase, which has a very interesting mix of cutting edge research and IT tools for the life sciences field. If you'll be at Bio-IT World, stop by and see us at Booth 323, where you can find out more about TopBraid Life Sciences Insight.

Thursday, July 26, 2012

Who needs SKOS-XL? Maybe no one

The SKOS-XL extension to the W3C’s SKOS standard for vocabulary management adds flexibility in how you track concept names, but it adds complexity and potential confusion that are rarely, if ever, worth it. 

What is the appeal of SKOS-XL? Information modelers who want to separate concepts (so-called conceptual ideas) from terms (the names people use for concepts) often base their thinking on the model of the Semiotic Triangle, also known as Peirce’s Triangle or the triangle of meaning. This model distinguishes a concept that exists in a human mind (a thought) from how it is referred to and from a symbol that evokes it.

A referent is understood as a word. A symbol is typically explained as a pictorial depiction. A key aspect of this theory is its focus on human cognition. It postulates that there can be no name or identity intrinsic to a concept as it only exists as a thought in a human mind.

The challenge of applying this thinking to information modeling is that, ultimately, in information modeling we must commit everything to paper, electronic or otherwise. Thus, every concept must have an identity and a name. As a result, a separate model for concepts and terms, where terms themselves have identities, names and relationships and are tracked separately from concepts, is typically an over-complication that does not deliver practical value. For one thing, even explaining the difference between a concept and a term to a business audience is not simple. Colloquially, these words are often used interchangeably. Once the difference is explained, distinguishing and keeping track of the two on an ongoing basis, when both “concepts” and “terms” are more often than not named using the same words, can be mind-boggling.

SKOS takes a simpler and what we believe to be a more practical approach to information modeling. It provides a way to describe concepts by giving each one:
  • A globally unique identity
  • A preferred label that is unique for a human language (such as English or German) within the scope of a particular “concept scheme”. It is called skos:prefLabel.
  • Any number of alternative labels, called skos:altLabel. A concept's alternative labels in a given language should not be the same as its preferred label in that language.
  • Whatever other properties (attributes and relationships) are deemed necessary:
      • SKOS supplies some standard relationships such as skos:broader, skos:related and skos:exactMatch and a number of annotations that are thought to be universally useful such as skos:definition and skos:editorialNote.
      • Users of SKOS are free to add properties specific to their domain. For example, when using SKOS to describe different companies, a user may want to add a stock ticker field.
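
As a rough illustration of the list above, the shape of a SKOS concept can be sketched as a small Python data structure. This is only a sketch under assumptions: the class name, method names, the example URIs and the ex:stockTicker property are hypothetical, and uniqueness of preferred labels across a whole concept scheme is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A SKOS-style concept: labels are plain strings, not separate resources."""
    uri: str                                          # globally unique identity
    pref_label: dict = field(default_factory=dict)    # language -> single preferred label
    alt_labels: dict = field(default_factory=dict)    # language -> set of alternative labels
    properties: dict = field(default_factory=dict)    # standard or domain-specific properties

    def set_pref_label(self, lang, text):
        # Keying by language allows at most one preferred label per language.
        if text in self.alt_labels.get(lang, set()):
            raise ValueError("preferred label must differ from the alternative labels")
        self.pref_label[lang] = text

    def add_alt_label(self, lang, text):
        if self.pref_label.get(lang) == text:
            raise ValueError("alternative label must differ from the preferred label")
        self.alt_labels.setdefault(lang, set()).add(text)


# Hypothetical example: a company described as a concept.
acme = Concept("http://example.org/company/acme")
acme.set_pref_label("en", "Acme Corporation")
acme.add_alt_label("en", "Acme Corp.")
acme.properties["skos:broader"] = "http://example.org/company/manufacturers"
acme.properties["ex:stockTicker"] = "ACME"   # hypothetical domain-specific property
```

Note that nothing here has an identity except the concept itself; the labels are just values attached to it.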

If needed, metadata about labels can be captured without giving them an identity of their own. TopBraid EVN is a good example of a tool that offers this capability. Apart from the language, such metadata is typically not just about the label itself but about its relationship to the concept: for example, who said that this is a preferred label for this concept, and when. All the relationships are between concepts, not between the labels.

The W3C has published an optional extension to SKOS called SKOS-XL (SKOS eXtension for Labels) that accommodates those who want to give separate identities to concepts and terms. It does not use the word “term”, presumably because terms are often informally understood as concepts and vice versa. Instead, it introduces a class, Label, explained as a “lexical entity”. While the extension is small, with only one new class and five new properties, its implications are far-reaching. As a result, providing tool support for SKOS-XL is considerably more complex than for SKOS proper.

Following the SKOS-XL model, labels are not strings as in SKOS proper, but RDF resources with their own identity. Each label can have only one literal form; this is where the actual text string (the name) goes. The literal form is not one per Label per language, as with SKOS’s constraint for assigning preferred labels, but one per Label. So, to accommodate different languages, different Label resources must be created. At the same time, there can be multiple Label resources with the same literal form (for example, two different Label resources with the literal form “Mouse”). Even a simple SKOS-XL vocabulary is considerably bulkier than its SKOS alternative. Because the SKOS-XL format takes far more space, storage, import/export, and search and query performance can become issues for larger vocabularies.
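
To give a feel for the difference in size, here is a minimal Python sketch that rewrites SKOS label triples into their SKOS-XL equivalents and compares the triple counts. Triples are modeled as plain tuples, the function name and the ex:LabelN naming scheme are hypothetical, and real data would of course live in an RDF store rather than a Python list.

```python
def skos_to_skosxl(triples):
    """Rewrite SKOS label triples as SKOS-XL, minting one Label resource per label string."""
    label_props = {
        "skos:prefLabel": "skosxl:prefLabel",
        "skos:altLabel": "skosxl:altLabel",
    }
    out = []
    counter = 0
    for s, p, o in triples:
        if p in label_props:
            counter += 1
            label = f"ex:Label{counter}"   # hypothetical naming scheme for Label resources
            out.append((s, label_props[p], label))
            out.append((label, "rdf:type", "skosxl:Label"))
            out.append((label, "skosxl:literalForm", o))
        else:
            out.append((s, p, o))
    return out


# One concept with three labels; each literal is a (text, language) pair.
skos = [
    ("ex:Concept1", "rdf:type", "skos:Concept"),
    ("ex:Concept1", "skos:prefLabel", ("love", "en")),
    ("ex:Concept1", "skos:prefLabel", ("Liebe", "de")),
    ("ex:Concept1", "skos:altLabel", ("affection", "en")),
]
skosxl = skos_to_skosxl(skos)
print(len(skos), len(skosxl))   # prints: 4 10
```

In this sketch every label string costs three triples instead of one, which is where the extra bulk comes from.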

Concepts are connected to Labels by relationships that indicate preferred (skosxl:prefLabel) and alternative (skosxl:altLabel) Labels for a Concept. There are no cardinality restrictions on these relationships; that is, a Concept can be linked to multiple Labels using the skosxl:prefLabel link. Labels can be linked to each other using the skosxl:labelRelation relationship. These links are separate from the relationships between Concepts.

Direct use of SKOS properties that associate label strings with Concepts can be tricky when using SKOS-XL. According to SKOS-XL, label strings for Concepts are derived using rules such as:

The property chain (skosxl:prefLabel, skosxl:literalForm) is a sub-property of skos:prefLabel

This means that if there is a Label ex:Label1 with literal form “love” and a Concept ex:Concept1 where ex:Concept1 connects to ex:Label1 using a skosxl:prefLabel relationship, we can conclude that ex:Concept1 has a skos:prefLabel value of “love”. Since simultaneously keeping the integrity of directly entered and inferred values is problematic, any tool supporting SKOS-XL must prevent the user from directly entering label strings for Concepts. This makes it difficult to use the same tool to edit both SKOS and SKOS-XL vocabularies, especially if users want to intermix different vocabulary formats.
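
The property chain rule can be illustrated with a few lines of Python. This is a sketch only: triples are plain tuples, the function name is hypothetical, and a real system would rely on an RDF reasoner or a query rather than hand-written code.

```python
def infer_pref_labels(triples):
    """Apply the chain (skosxl:prefLabel, skosxl:literalForm) => skos:prefLabel."""
    # Each Label resource has exactly one literal form, so a dict lookup is enough.
    literal_form = {s: o for s, p, o in triples if p == "skosxl:literalForm"}
    return [
        (s, "skos:prefLabel", literal_form[o])
        for s, p, o in triples
        if p == "skosxl:prefLabel" and o in literal_form
    ]


triples = [
    ("ex:Concept1", "skosxl:prefLabel", "ex:Label1"),
    ("ex:Label1", "skosxl:literalForm", "love"),
]
inferred = infer_pref_labels(triples)
print(inferred)   # prints: [('ex:Concept1', 'skos:prefLabel', 'love')]
```

The inferred triple never appears in the entered data, which is why a directly entered skos:prefLabel value for the same Concept can end up disagreeing with it.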

Furthermore, a user will see the same text label for different entities. This happens not only because different Labels can have the same literal form, but also because the Concept resources “inherit” string labels from the associated Label resources. This can easily lead to confusing results.

There are also various integrity clashes between SKOS and SKOS-XL. For example:

1. Two different preferred labels in the same language

ex:Concept1 skosxl:prefLabel ex:Label1 ; skosxl:prefLabel ex:Label2 .
ex:Label1 skosxl:literalForm "love"@en .
ex:Label2 skosxl:literalForm "adoration"@en .

This is not “wrong” according to SKOS-XL because a Concept can be connected to multiple Labels using the skosxl:prefLabel relationship. But it means that ex:Concept1 has skos:prefLabel values of both "love"@en and "adoration"@en. This is a violation of SKOS constraint S14, which prohibits a concept from having more than one preferred label string in a given language.

2. Clash between preferred and alternative labels

ex:Concept1 skosxl:prefLabel ex:Label1 ; skosxl:altLabel ex:Label2 .
ex:Label1 skosxl:literalForm "love"@en .
ex:Label2 skosxl:literalForm "love"@en .

Again, this is not “wrong” according to SKOS-XL because different Labels can have the same literal form, but it’s a problem for SKOS because it implies identical English language preferred and alternative label strings for ex:Concept1.
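
Both clashes can be detected mechanically once the label strings are derived. The following Python sketch does this for the two examples above; it assumes triples held as plain tuples with (text, language) pairs for literals, and the function names and problem descriptions are hypothetical rather than taken from any tool.

```python
from collections import defaultdict

# SKOS-XL link -> SKOS label property derived from it via the literal form.
CHAINS = {"skosxl:prefLabel": "skos:prefLabel", "skosxl:altLabel": "skos:altLabel"}


def derived_labels(triples):
    """Return {(concept, skos_property): {(text, lang), ...}} implied by SKOS-XL links."""
    form = {s: o for s, p, o in triples if p == "skosxl:literalForm"}
    derived = defaultdict(set)
    for s, p, o in triples:
        if p in CHAINS and o in form:
            derived[(s, CHAINS[p])].add(form[o])
    return derived


def find_clashes(triples):
    """Report derived label strings that break the SKOS integrity conditions."""
    derived = derived_labels(triples)
    problems = []
    for (concept, prop), labels in sorted(derived.items()):
        if prop != "skos:prefLabel":
            continue
        by_lang = defaultdict(set)
        for text, lang in labels:
            by_lang[lang].add(text)
        for lang, texts in sorted(by_lang.items()):
            if len(texts) > 1:   # more than one preferred label string per language
                problems.append((concept, "multiple preferred labels", lang))
        overlap = labels & derived.get((concept, "skos:altLabel"), set())
        for text, lang in sorted(overlap):   # same string is both preferred and alternative
            problems.append((concept, "preferred label equals alternative label", lang))
    return problems


example1 = [
    ("ex:Concept1", "skosxl:prefLabel", "ex:Label1"),
    ("ex:Concept1", "skosxl:prefLabel", "ex:Label2"),
    ("ex:Label1", "skosxl:literalForm", ("love", "en")),
    ("ex:Label2", "skosxl:literalForm", ("adoration", "en")),
]
example2 = [
    ("ex:Concept1", "skosxl:prefLabel", "ex:Label1"),
    ("ex:Concept1", "skosxl:altLabel", "ex:Label2"),
    ("ex:Label1", "skosxl:literalForm", ("love", "en")),
    ("ex:Label2", "skosxl:literalForm", ("love", "en")),
]
print(find_clashes(example1))
print(find_clashes(example2))
```

A tool that supports SKOS-XL would need to run checks of this kind on every edit, since SKOS-XL itself does not reject either data set.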

Without a doubt, these issues help explain why, while the use of and tool support for SKOS are growing, there are few if any tools for SKOS-XL or published SKOS-XL vocabularies. An even more important factor is the lack of compelling business value that would justify the complexity of SKOS-XL. Having talked to a wide range of users working on business vocabularies, we have yet to hear a use case that cannot be supported by SKOS alone.

Does SKOS-XL look like the only viable approach to your vocabulary management needs? Let’s discuss it—maybe we can help you find a simpler solution.

Tuesday, May 29, 2012

New white paper: Controlled vocabularies, taxonomies, and thesauruses (and ontologies)

What's the difference between a controlled vocabulary, a taxonomy, a thesaurus, and an ontology? We've found people using some of these terms interchangeably, and it's difficult to find reliable official definitions for each.
We've come up with some unofficial working definitions that have served us well when discussing different kinds of controlled vocabularies and their associated metadata. Our new white paper Controlled vocabularies, taxonomies, and thesauruses (and ontologies) describes each of these and which TopQuadrant products give customers the control they need over the kinds of vocabularies and metadata that they're working with.
We hope that this short white paper gives you a better idea of why people build, use, and store controlled vocabularies and the advantages that standards bring to this work.


This is a blog by TopQuadrant, developers of the TopBraid Suite, created to support the pursuit of our ongoing mission - to explode strange semantic myths, to seek out new models that support a new generation of dynamic business applications, to boldly integrate data that no one has integrated before.