Linked Data Blog Aggregator

HyperDanja (Danny Ayers): links for 2008-11-18

HyperDanja (Danny Ayers): links for 2008-11-17

DBpedia Blog: DBpedia version 3.2 released including the new DBpedia Ontology

We are happy to announce the release of DBpedia version 3.2.

The new knowledge base has been extracted from the October 2008 Wikipedia dumps. Compared to the last release, the new knowledge base provides three major improvements:

1. DBpedia Ontology

DBpedia now features a shallow, cross-domain ontology, which has been manually created based on the most commonly used infoboxes within Wikipedia. The ontology currently covers over 170 classes which form a subsumption hierarchy and have 940 properties. The ontology is instantiated by a new infobox data extraction method which is based on hand-generated mappings of Wikipedia infoboxes to the DBpedia ontology. The mappings define fine-grained rules on how to parse infobox values. The mappings also compensate for weaknesses in the Wikipedia infobox system, like having different infoboxes for the same class (currently 350 Wikipedia templates are mapped to 170 ontology classes), using different property names for the same property (currently 2350 template properties are mapped to 940 ontology properties), and not having clearly defined datatypes for properties. Therefore, the instance data within the infobox ontology is much cleaner and better structured than the infobox data within the DBpedia infobox dataset which is generated using the old infobox extraction code. The DBpedia Ontology currently contains about 882,000 instances.
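To make the mapping idea concrete, here is a minimal sketch of how several templates and property names could be folded into one ontology class and one typed property. The template names, property names and parsing rules below are hypothetical and are not taken from the actual DBpedia mappings; the real extraction framework is certainly more elaborate.

```python
# Hypothetical template-to-class mappings: several templates share one class.
CLASS_MAPPINGS = {
    "Infobox_City": "City",
    "Infobox_Settlement": "City",
}

# Hypothetical (template, template property) -> ontology property mappings.
PROPERTY_MAPPINGS = {
    ("Infobox_City", "population_total"): "population",
    ("Infobox_Settlement", "pop"): "population",
    ("Infobox_City", "leader_name"): "leader",
}

# Hypothetical datatype rules: how the raw string of each property is parsed.
PARSERS = {
    "population": lambda raw: int(raw.replace(",", "").strip()),
    "leader": lambda raw: raw.strip(),
}


def map_infobox(template, raw_values):
    """Translate one raw infobox into (ontology class, typed properties)."""
    ontology_class = CLASS_MAPPINGS.get(template)
    typed = {}
    for name, raw in raw_values.items():
        target = PROPERTY_MAPPINGS.get((template, name))
        if target is None:
            continue  # unmapped template properties are skipped in this sketch
        typed[target] = PARSERS[target](raw)
    return ontology_class, typed


city = map_infobox(
    "Infobox_City",
    {"population_total": "3,400,000", "leader_name": " Jane Doe "},
)
settlement = map_infobox("Infobox_Settlement", {"pop": "3400000", "unknown": "x"})
print(city)
print(settlement)
```

Both calls end up with the same class and the same integer-typed `population` property even though the source templates used different names and number formats, which is the kind of clean-up the mappings are described as providing.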

More information about the ontology is found at http://wiki.dbpedia.org/Ontology

2. RDF Links to Freebase

Freebase is an open-license database which provides data about millions of things from various domains. Freebase has recently released a Linked Data interface to their content. As there is a big overlap between DBpedia and Freebase, we have added 2.4 million RDF links to DBpedia pointing at the corresponding things in Freebase. These links can be used to smush and fuse data about a thing from DBpedia and Freebase.
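As a rough illustration of what "smushing" means, the sketch below merges two descriptions whose URIs are connected by an owl:sameAs link. The URIs, property names and values are made up for the example, and plain dictionaries stand in for RDF graphs; it is one simple way to fuse descriptions, not the method used by any particular tool.

```python
from collections import defaultdict

# Hypothetical URIs and property values, made up for illustration only.
DBPEDIA_URI = "http://example.org/dbpedia/Film_X"
FREEBASE_URI = "http://example.org/freebase/guid.1"

dbpedia_data = {DBPEDIA_URI: {"label": {"Film X"}, "runtime": {"120"}}}
freebase_data = {FREEBASE_URI: {"label": {"Film X"}, "genre": {"Drama"}}}

# owl:sameAs links as (subject, object) pairs.
same_as_links = [(DBPEDIA_URI, FREEBASE_URI)]


def smush(sources, links):
    """Merge descriptions of URIs that are connected by sameAs links."""
    parent = {}

    def find(uri):
        # Union-find lookup: follow parents up to the representative URI.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for subject, obj in links:
        # The subject side of each link becomes the canonical URI.
        parent[find(obj)] = find(subject)

    merged = defaultdict(lambda: defaultdict(set))
    for source in sources:
        for uri, properties in source.items():
            for name, values in properties.items():
                merged[find(uri)][name] |= values
    return {uri: dict(props) for uri, props in merged.items()}


merged = smush([dbpedia_data, freebase_data], same_as_links)
print(merged)
```

After the merge there is a single entry under the DBpedia-side URI that carries the properties from both sources.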

For more information about the Freebase links see
http://blog.dbpedia.org/2008/11/15/dbpedia-is-now-interlinked-with-freebase-links-to-opencyc-updated/

3. Cleaner Abstracts

In the old DBpedia dataset, the abstracts for different languages sometimes contained Wikipedia markup and other strange characters. For the 3.2 release, we have improved DBpedia’s abstract extraction code, which results in much cleaner abstracts that can safely be displayed in user interfaces.

Access the new DBpedia knowledge base 

The new DBpedia release can be downloaded from:

http://wiki.dbpedia.org/Downloads32

and is also available via the DBpedia SPARQL endpoint at

http://dbpedia.org/sparql

and via DBpedia’s Linked Data interface. Example URIs:

http://dbpedia.org/resource/Berlin
http://dbpedia.org/page/Oliver_Stone
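For readers who want to try the SPARQL endpoint, here is a minimal sketch that builds a query request with the Python standard library. The `query` parameter comes from the SPARQL protocol; the choice of query and the assumption that the endpoint honours the JSON results media type are mine, not part of the announcement. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint address from the announcement

# Illustrative query: list a few properties of one of the example resources.
QUERY = """
SELECT ?p ?o WHERE {
  <http://dbpedia.org/resource/Berlin> ?p ?o
} LIMIT 10
"""


def build_request(endpoint, query):
    """Build a GET request that passes the query in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Asking for JSON results; whether the server supports this is an assumption.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.full_url)
# To actually send it: urllib.request.urlopen(request).read()
```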

Lots of thanks to everybody who contributed to the DBpedia 3.2 release!

Especially:

1. Georgi Kobilarov (Freie Universität Berlin) who designed and implemented the new infobox extraction framework.
2. Anja Jentsch (Freie Universität Berlin) who contributed to implementing the new extraction framework and wrote the infobox to ontology class mappings.
3. Paul Kreis (Freie Universität Berlin) who improved the datatype extraction code.
4. Andreas Schultz (Freie Universität Berlin) for generating the Freebase to DBpedia RDF links.
5. Everybody at OpenLink Software for hosting DBpedia on a Virtuoso server and for providing the statistics about the new DBpedia knowledge base.

Have fun with the new DBpedia knowledge base!

HyperDanja (Danny Ayers): links for 2008-11-16

HyperDanja (Danny Ayers): links for 2008-11-15

DBpedia Blog: DBpedia is now interlinked with Freebase. Links to OpenCyc updated.

Freebase is an open-license database which provides data about millions of things from various domains. Freebase has recently released a Linked Data interface to their content (see release note). As there is a big overlap between DBpedia and Freebase, we have added 2.4 million RDF links to DBpedia pointing at the corresponding things in Freebase. These links can be used to smush and fuse data about a thing from DBpedia and Freebase. For instance, you can use the Marbles Linked Data browser to view data about the Lord of the Rings from Freebase and DBpedia smushed together.

We have also updated the RDF links to OpenCyc, which allow you to use DBpedia instance data together with conceptual knowledge of OpenCyc.

Example Freebase Link

http://dbpedia.org/resource/Woody_Allen owl:sameAs  http://rdf.freebase.com/ns/guid.9202a8c04000641f800000000004064f

Example OpenCyc Link

http://dbpedia.org/resource/Tetris owl:sameAs http://sw.opencyc.org/2008/06/10/concept/Mx4rv9-ZUpwpEbGdrcN5Y29ycA

The links are available via the DBpedia Linked Data interface and via SPARQL endpoint and can also be downloaded as single files:

HyperDanja (Danny Ayers): links for 2008-11-14

AI3:::Adaptive Information (Mike Bergman): Multi-part Federated Search Interview

Topics Range from the Deep Web to Semantic Web in this Search Luminaries Series

I’m pleased to wrap up a multi-part interview with the Federated Search Blog as part of their ongoing ‘Search Luminaries’ series. Sol Lederman, editor of the blog, does a thorough and comprehensive job!  Over the past month on every Friday, I have answered some 25 or so of his detailed questions.

Federated Search Blog was particularly interested in the deep Web, its discovery and size.  Many of the early questions deal with those themes.  However, by Part 4 things get a bit more current, with the topics shifting to the semantic Web, linked data and Zitgist.

Here are the links to the series:

To give you a flavor of the interview, here is an example of one of the questions (and probably my favorite):

20. Tim Berners-Lee, credited with inventing the World Wide Web, has been talking about the importance and value of the Semantic Web for years yet common folks don’t see much evidence of the Semantic Web gaining traction. Is there substance to the Semantic Web? What’s happening with it now and what does its future look like?

Wow, in 10,000 words or less?

No, actually, this is a very good question. As things go, I am a relative newbie to the semantic Web, only having studied and followed it closely since about 2005. I’m sure my perspective in coming later to the party may not be shared by those at the beginning, which dates to the mid-1990s as Berners-Lee’s vision naturally progressed from a Web of documents, as most of us currently know the Web, to a Web of data.

I think there is indeed incredibly important substance to the semantic Web. But, as I have written elsewhere, the semantic Web is more of a vision than a discernable point in time or a milestone.

The basic idea of the semantic Web is to shift the focus from documents to data. Give data a unique Web address. Characterize that data with rich metadata. Describe how things are related to one another so that relationships and connections can be traced. Provide defined structures for what these things and relationships “mean”; this is what provides the semantics, with the structures and their defined vocabularies known as “ontologies” (which in one analog can be seen as akin to a relational database schema).

As these structures and definitions get put in place, the Web itself then becomes the infrastructure for relating information from everywhere and anywhere on any given topic or subject. While this vision may sound grandiose, just think back to what the Web itself has done for us and documents over the past decade or so. This same architecture and infrastructure can and should be extended to the actual information in those documents, the data. And, oh, by the way, conventional databases can now join this party as well. The vision is very powerful and very cool.

Progress has indeed been slow. Many advocates fairly point to how long it takes to get standards in place and for a while people spoke of the “chicken-and-egg” problem of getting over the threshold of having enough structured data to consume to make it worthwhile to create the tools and applications and showcases that consume that data.

From my perspective, the early visions of the semantic Web were too abstract, a bit off perhaps. First, there was the whole idea of artificial intelligence and machines using the data as opposed to better ways for humans to draw use from the data at hand. The fundamental and exciting engine underneath the semantic Web — the RDF (Resource Description Framework) data model — was not initially treated on its own. It got admixed with XML that made understanding difficult and distinctions vague. There is and remains too much academia and not enough pragmatics driving the bus.

But that is changing and fast.

There is now an immediate and practical “flavor” of the semantic Web called linked data. It has three simple bases:

(1) RDF as the simple but adaptable data model that can represent any information — structured or unstructured — as the basic “triple” statement of subject-predicate-object. That sounds fancy, but just substitute verb for predicate and noun for subject and object. In other words: Dick sees Jane; or the ball is round. It sounds like a kindergartner reader, but that is how data can be easily represented and built up into more complex structures and stories;

(2) Give all objects a unique Web identifier. Unique identifiers are common to any database; in linked data, we just make sure those identifiers conform to the same URIs we see constantly in the address bar of our Web browsers, and:

(3) Post and expose this stuff as accessible on the Web (namely, HTTP).
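The three bases above can be sketched in a few lines. The example below represents triples as plain tuples whose parts are Web-style identifiers; the `http://example.org/` identifiers and the predicate names are hypothetical, and a real system would use an RDF library and publish the data over HTTP rather than keep it in a Python set.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


EX = "http://example.org/"  # hypothetical namespace for the identifiers

# "Dick sees Jane" and "the ball is round", plus one more link between them.
graph = {
    Triple(EX + "Dick", EX + "sees", EX + "Jane"),
    Triple(EX + "ball", EX + "hasShape", EX + "round"),
    Triple(EX + "Jane", EX + "sees", EX + "ball"),
}


def follow(graph, start, predicate):
    """Return the objects reachable from 'start' over one 'predicate' link."""
    return sorted(
        t.object for t in graph if t.subject == start and t.predicate == predicate
    )


seen_by_dick = follow(graph, EX + "Dick", EX + "sees")
seen_by_jane = follow(graph, seen_by_dick[0], EX + "sees")
print(seen_by_dick, seen_by_jane)
```

Chaining two lookups (what does the person Dick sees see?) shows how simple statements build up into a traversable graph.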

My company adds some essential “spice” to these flavors with respect to reference structures and concepts to give the information context, but these simple bases remain the foundation.

These are really not complex steps. They are really no different than the early phases of posting documents on the Web. Only now, we are exposing data.

More importantly, we can forget the chicken-and-egg problem. Each new data link we make brings value, in the similar way that adding a node to a network brings value according to Metcalfe’s Law. Only with linked data, we already have the nodes — the data — we are just establishing the link connections (the verbs, predicates or relations) to flesh out the network graph. Same principle, only our focus is now to connect what is there rather than to add more nodes. (Of course, adding more linked nodes helps as well!)

The absolutely amazing thing about our current circumstance as Web users is that we truly now have simple and readily deployable mechanisms available to finally overcome the decades of enterprise stovepipes. The whole answer is so simple it can be mistaken as snake oil when first presented and not inspected a bit.

As an industry accustomed to hype and cynical about so much of this, I only ask that your readers check out these assertions for themselves and suspend their normal and expected disbelief. For me, in a career of more than 30 years focusing on information and access, I feel like we finally now have the tools, data model and architecture at hand to actually achieve data interoperability.

Thanks again to Sol and Federated Search Blog for this opportunity.

DBTune Blog: Reuters OpenCalais joins the linked data cloud

Still more fancy linked data to play with - just a couple of weeks after Freebase announced that they publish linked data, OpenCalais just announced that they are going to publish linked data as well, by joining up the results of their entity extraction service to DBpedia URIs.

HyperDanja (Danny Ayers): links for 2008-11-13

HyperDanja (Danny Ayers): links for 2008-11-12

HyperDanja (Danny Ayers): Machine Admin - Ubuntu, dead HD, EeePC space, USB stick backups etc.

Some recent experiences that someone else might find handy.

A big tip to start with - whatever your hardware, get yourself a reasonably sized USB stick (say 2GB+) and create a live-bootable Ubuntu image on it (making 2 partitions seems a good idea - one for live Ubuntu and the rest for moving files around). On the Ubuntu (8.10) menus there’s System - Administration - Create Startup Disk.

Main Laptop

When I got back from my European fugue, the HD on my main machine (Dell Latitude D820) was broken, I could only boot to recovery on an old kernel. Bit annoying as I’d only taken the EeePC 900 travelling, and to my knowledge the Dell hadn’t had any bumps. No joy with fsck, on boot the auto-checking/repair was running for more than 48 hrs before I gave up.

I’d left a few GB free space on the Dell so attempted to make that the new boot partition & reinstall Ubuntu there. I don’t know whether it was due to the drive damage or a bug in gparted, but in creating the new partition it lost the table of existing partitions. A bit of googling suggested this is probably recoverable, but my priority was just to get a working machine again (I think all that’s not backed up elsewhere on that drive is a bunch of notes and a handful of photos). So I bought a new HD - I think it was 65 euro for 120GB. It was easy to replace, and (using Ubuntu-on-a-stick) getting a basic install back up was really quick. Downloading and installing the apps I use a lot took a lot longer.

But as it turned out, my core apps plus a handful of trivia only came to about 12GB, so I bought a 16GB USB stick (about 40 euro I think), formatted it to one big ext3 partition and used rsync (as root) to copy everything from the newly populated HD onto it.

Here’s what I’m using on the Dell (in /home/danny):

backup.sh

rsync -avh --delete --exclude-from '/home/danny/rsync_exclude.txt' /  /media/DellBackup/

The stick I named DellBackup, the script needs chmod 777 backup.sh (or similar) and I run it as root.

also:
rsync_exclude.txt

/home/danny/Desktop/downloads
/media
/dev
/var/run
/var/lock
/var/log
/var/spool
/var/backups
/tmp
/home/danny/Desktop/torrents

It’s no doubt well suboptimal, but seems to work ok.

EeePC

Ok, my first issue was buying the WinXP model, which I believe has smaller drives. Not sure if this was another mistake but I put the / mount on sda1 (the whole of the 4GB flash drive) and /home on sdb5 (the whole of the 8GB flash drive - dunno why it went for ‘5’).

Yesterday I discovered I’d filled the sda1 partition, though there was a fair bit of space left on sdb5. While the blog post Moving /usr to another partition is very handy, it doesn’t help much when there isn’t unassigned space or a free partition. Fortunately it turns out that if you boot from an external drive (remember that Ubuntu Live stick?) it’s possible to resize & create partitions on the internal drives. So after ensuring I had backups of /usr and /home (see below) I resized the sdb5 (/home) partition and created another in the free space, which I mounted temporarily, copied the whole of /usr onto it (as root, cp -dpR /usr/* /usr_new/) and then edited /etc/fstab to add the following:

/dev/sdb6 /usr auto defaults,errors=remount-ro 0 1

btw, although I’ve yet to do all the EeePC optimizations that Google can suggest, these bits (in fstab) seemed a good idea:

tmpfs      /var/log        tmpfs        defaults           0    0
tmpfs      /tmp            tmpfs        defaults           0    0
tmpfs      /var/tmp        tmpfs        defaults           0    0

So they all get wiped on reboot.

Anyhow, backup-wise on the EeePC I did the same USB backup trick as above, the scripts this time being:
backup.sh

rsync -avh --delete --exclude-from '/home/danny/rsync_exclude.txt' / /media/eeepcbackup/

rsync_exclude.txt

/home/danny/Desktop/downloads
/media
/dev
/var/run
/var/lock
/var/log
/var/spool
/var/backups
/tmp
/proc
# add /sys ?

Additionally, the EeePC has a flash card socket that wasn’t really doing much, so I thought I might as well use that too. So I bought a 4GB card and again formatted it to ext3, labeling it eeepchome. Another script:
backup-home.sh

rsync -avh --delete /home /media/eeepchome/
rsync -avh --delete /etc /media/eeepchome/

The card is mounted by default, so I think I’ll add that as a daily cron job.

Incidentally, the backup setup I aim for is (at least) two local copies of everything, plus one remote. I use svn for code, so that’s sorted (I have copies on both these laptops). Some time soon I’ll set up rsync to my remote server for the laptops, more or less as above.

I’ve not had much luck with external USB HDs, the USB circuit on my first one broke not long after I bought it, so I pulled out the drive and stuck it into Caro’s machine. The one I’ve got now has a flaky interface, it’ll often drop the connection/not be recognised.

I’ve backed up most of the work-in-progress from my music workstation (currently Frankenstein WinXP desktop) onto that external drive, but once I’ve archived (and/or published) the user data from Caro’s (Frankenstein WinXP) desktop I’ll stack that up with big HDs (probably with RAID - is it 5 that’s the safest?) and use that as the home file/print server.

HyperDanja (Danny Ayers): links for 2008-11-11

HyperDanja (Danny Ayers): links for 2008-11-10

AI3:::Adaptive Information (Mike Bergman): Thinking ‘Inside the Box’ with Description Logics

Inside the Box

Linked Data Need Not Rediscover the Past; A Surprise in Every Box

A standard cliché of management consultants is the exhortation to think “outside the box.” Of course, what is meant by this is to question assumptions, to think differently, to look at problems from new perspectives.

With our recent release of the (linked open data) ‘LOD constellation’ of linked data classes based around UMBEL, I have been fielding a lot of inquiries on what the relationship is of UMBEL to DBpedia. (See, for example, this current interview by the Semantic Web Company with me and Sören Auer of the DBpedia project.) This also fits into the ongoing distinction we have made in the UMBEL project between our subject concepts (classes) and named entities (instances).

What has actually most been helping my thinking is to get fully inside the box (or, rather, boxes, hehe). Let me explain.

The problem with urging outside-the-box thinking is that many of us do a less-than-stellar job of thinking inside the box. We often fail to realize the options and opportunities that are blatantly visible inside the box that could dramatically improve our chances of success.

Naomi Karten [1]

The Description Logics Underpinnings of the Semantic Web

Description logics are one of the key underpinnings to the semantic Web. They grew out of earlier frame-based logic systems from Marvin Minsky and also semantic networks; the term and discipline were first given definition in the 1980s by Ron Brachman, among many others [2].

Description logics (DL, most often expressed in the plural) are a logic semantics for knowledge representation (KR) systems based on first-order predicate logic (FOL). They are a kind of logical metalanguage that can help describe and determine (with various logic tests) the consistency, decidability and inferencing power of a given KR language. The semantic Web ontology languages, OWL Lite and OWL DL (which stands for description logics), are based on DL and were themselves outgrowths of earlier DL languages.

Description logics and their semantics traditionally split concepts and their relationships from the different treatment of individuals and their attributes and roles, expressed as fact assertions. The concept split is known as the TBox (for terminological knowledge, the basis for T in TBox) and represents the schema or taxonomy of the domain at hand. The TBox is the structural and intensional component of conceptual relationships.

Thus, the model is an abstraction of a concrete world where the concepts are interpreted as subsets of the domain as required by the TBox and where the membership of the individuals to concepts and their relationships with one another in terms of roles respect the assertions in the ABox.

Franz Baader and Werner Nutt [3]

The second split of individuals is known as the ABox (for assertions, the basis for A in ABox) and describes the attributes of individuals, the roles between individuals, and other assertions about individuals regarding their class membership with the TBox concepts. Both the TBox and ABox are consistent with set-theoretic principles.

TBox and ABox logic operations differ and their purposes differ. TBox operations are based more on inferencing and tracing or verifying class memberships in the hierarchy (that is, the structural placement or relation of objects in the structure). ABox operations are more rule-based and govern fact checking, instance checking, consistency checking, and the like [3]. ABox reasoning is generally more complex and at a larger scale than that for the TBox.
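A toy example may help fix the split. In the sketch below the TBox is a small subclass hierarchy and the ABox is a set of class assertions about individuals; the class names and all individuals other than Abraham Lincoln are hypothetical placeholders, and the "reasoning" is just a walk up a single-parent hierarchy, far simpler than what a DL reasoner does.

```python
# TBox: hypothetical subclass axioms (child class -> parent class).
TBOX = {
    "President": "Leader",
    "PrimeMinister": "Leader",
    "King": "Leader",
    "Leader": "Person",
}

# ABox: assertions that an individual belongs to a class.
ABOX_TYPES = {
    "Abraham_Lincoln": "President",
    "Some_Prime_Minister": "PrimeMinister",  # placeholder individual
    "Some_Painter": "Person",  # placeholder individual
}


def superclasses(cls):
    """TBox side: every class that subsumes 'cls', including 'cls' itself."""
    chain = [cls]
    while chain[-1] in TBOX:
        chain.append(TBOX[chain[-1]])
    return chain


def instances_of(cls):
    """ABox side: individuals whose asserted class is subsumed by 'cls'."""
    return sorted(
        name for name, asserted in ABOX_TYPES.items() if cls in superclasses(asserted)
    )


leaders = instances_of("Leader")
people = instances_of("Person")
print(leaders)
print(people)
```

The instance data never states that a president is a leader; that knowledge lives only in the TBox, yet it is what lets a query for leaders return the president.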

Early semantic Web systems tended to be very diligent about maintaining these “box” distinctions of purpose, logic and treatment. One might argue, as I do herein, that the usefulness and basis for these splits has been lost somewhat in our first implementations and publishing of linked data systems.

ABox and TBox Analogs in the Linked Data Web

Most of the semantic Web work at the beginning of this decade was pretty explicit about references to description logics and related inferencing engines and computational efficiency. Some of the early commercial semantic Web vendors are still very much focused on this space.

However, with the first release and emphasis on linked data about two years ago, the emphasis seemed to shift to the more pragmatic questions of actually posting and getting data out there. Best practices for cool URIs and publishing and linkage modes assumed prominence. The linking open data (LOD) movement began in earnest and gained mindshare. Of course, many in the DL and OWL development communities continued to discuss logic and inferencing, but now seemingly more as a separate camp to which the linked data tribe paid little heed.

The central hub of this linked data effort has been DBpedia and its pivotal place within the ‘LOD cloud.’ What is remarkable about the LOD cloud, however, is that it is almost entirely an ABox representation of the world and its instances. Starting from the core set of individual instances within Wikipedia, this cloud has now grown to many other sources and the central place for finding linked instance data. If one looks carefully at the LOD cloud and its linkages we can see the prevalence of instance-level relationships and attributes.

Linking Open Data’s “ABox”

Linking Open Data’s “TBox”

In fact, the LOD cloud diagram to upper right from the Wikipedia article on linked data has become the key visual metaphor for the movement. But, as noted, this view is almost exclusively one at the ABox instance level.

The UMBEL project began at roughly the same time and as a response to the release of DBpedia. My question in looking at the first data linked to DBpedia was, What is this content about? Sure, I might be able to find multiple records discussing Abraham Lincoln as a US president regarding attributes like birth date and a list of children, but where could I retrieve records about other presidents or, more broadly, other types of leaders such as prime ministers, kings or dictators?

The intuition was that the linked data and the various FOAF and other distributed instance records it was combining lacked a coherent reference structure of subject topics or concepts with which to describe content. The further intuition was that — while tagging systems and folksonomies would allow any and all users to describe this content with their own metadata — a framework for relating these various assignments to one another was still lacking.

In the nearly two years of development leading to the first beta release of UMBEL we have tried many analogies and metaphors to describe the basis of the 20,000 subject concept classes within UMBEL in relation to its role and other linked data initiatives. While many of those metaphors help visualize use and role, the more formal basis offered by description logics actually helps to most precisely cast UMBEL’s role. For example, in today’s interview with the Semantic Web Company, I note:

“. . . we have described UMBEL as a roadmap, or middleware, or a backbone, or a concept ontology, or an infocline, or a meta layer for metadata, and others. Today, what I tend to use, particularly in reference to DBpedia, is the TBox-ABox distinction in computer science and description logics. UMBEL is more of a class or structural and concept relationships schema — a TBox — while DBpedia is more of an instance and entity layer with attributes — an ABox. I think they are pretty complementary. . . ”

The resulting class level structure produced by UMBEL and its mappings to other classes within existing linked data enabled us to create and then publish the ‘LOD constellation’, a complementary TBox structure to the linked data’s existing ABox one. This diagram to the lower right from the Wikipedia article on linked data now shows this complement.

Completeness and Sufficiency

Description logics have arisen to aid our creating and understanding of knowledge representation systems. From this basis, we can see that the first efforts of the linked data initiative have lacked context, the TBox. At a specific level, the question is not DBpedia v. UMBEL or cloud v. constellation. Both types of structure are required in order to complete the logical framework. By thinking inside the box — by paying attention to our logical underpinnings — we can see that both TBoxes and ABoxes are essential and complementary to creating a useful knowledge representation system.

By more explicitly adopting a description logics framework we can also better address many prior questions of context, coherence and sufficiency. These have been constant themes in my recent writings that I will be revisiting again through the helpful prism of formal description logics.

My interview today with Sören Auer also brought up some important points regarding context. As we have said in other venues, it is important that any TBox be available for context purposes. Whether that should be UMBEL or some other framework depends on the use case. As I noted in the interview, “UMBEL’s specific purpose is to provide a coherent framework for serious knowledge engineers looking to federate data.” Other uses may warrant other frameworks, and certainly not always UMBEL.

But, in any event, I have two cautions to the linked data community: 1) do not take the suggestion to have a reference framework of concepts as being equivalent to adopting a single ontology for the Web; think of any reference structure as an essential missing TBox, and not some call to adopt “one ontology to rule them all,” but 2) in adopting alternative frameworks, take care that whatever is designed or adopted itself be able to meet basic DL logic tests of consistency and coherence.

A Serendipitous Surprise

No one has yet elaborated the significant advantages from design, performance, architectural and flexibility perspectives from a distinct and explicit separation of TBox from ABox — but they’re there!

The many advantages from separate TBox and ABox frameworks are one serendipitous surprise coming from the early development of linked data. To my knowledge, no one has yet elaborated the significant advantages from design, performance, architectural and flexibility perspectives from a distinct and explicit separation of TBox from ABox. We believe these advantages to be substantial.

Realize, as distributed, UMBEL already has both TBox and ABox components. The TBox component is the lightweight UMBEL ontology, with its 20,000 subject concept classes and their hierarchical and other relationships. This component has a vocabulary (or terminology) for aiding the linking to external ontologies. The vocabulary is quite suitable for extension into new domains as well.

The ABox component is the named entities part of instances drawn from Wikipedia and the BBC’s John Peel sessions. Besides being of common, broad interest, these 1.5 million instances (per the current version) are included in the distribution to instantiate the ontology for demonstration and sandbox purposes.

So, UMBEL’s world is quite simple: subject concepts (SCs) and named entities (NEs). Subject concepts are the TBox and classes that define the structure and concept relationships. Named entities are the individual “things” in the world (some lower case such as animals or foods) and are the ABox of instances that populate this structure.

In our early efforts, we concentrated on the SC portion of UMBEL. Most recently, we have been concentrating on the NE component and its NE dictionaries. It was these investigations that drew us into an ABox perspective when looking at design options. The logic and rationale had been sitting there for some years, but it took cracking open the older textbooks to become reacquainted with it.

Once we again began looking inside the box, we began to see and enumerate some significant advantages to an explicit TBox-ABox design, as well as advantages for keeping these components distinct:

  • Easier understood ontologies with a very limited number of predicates
  • Lightweight schema design that is easy to extend
  • Ability to “triangulate” between separate SC (concept) and NE (instance) disambiguation approaches to improve overall precision and recall
  • Attribute information is kept separate from structural and conceptual relationships
  • Easy to swap in varied, multiple and private or public named entity dictionaries
  • Relatively easy extension of the schema ontology into specific domains
  • A design suitable to computation efficiency (rules for ABox; inference and standard reasoning for TBox), and
  • Assignment of NEs to distinct and disjoint “super types” [4] that can bring significant tableaux benefits to ABox reasoning.
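One small consequence of the last point can be sketched directly: if super types are declared pairwise disjoint, a named entity assigned to two of them is an inconsistency that can be caught with a cheap check. The super type names and assignments below are hypothetical, and this is only a dictionary scan, not a tableau reasoner.

```python
# Hypothetical super types, assumed here to be pairwise disjoint.
SUPER_TYPES = {"Person", "Place", "Organization"}

# Hypothetical named-entity assignments; the last one is a deliberate conflict.
assignments = [
    ("Abraham_Lincoln", "Person"),
    ("Berlin", "Place"),
    ("Berlin", "Organization"),
]


def find_conflicts(assignments):
    """Return entities that were assigned to more than one disjoint super type."""
    seen = {}
    for entity, super_type in assignments:
        if super_type not in SUPER_TYPES:
            raise ValueError(f"unknown super type: {super_type}")
        seen.setdefault(entity, set()).add(super_type)
    return {entity: sorted(types) for entity, types in seen.items() if len(types) > 1}


conflicts = find_conflicts(assignments)
print(conflicts)
```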

We are still learning about these advantages and will document them further in pending work on coherence and named entity dictionary (NED) creation.

Thinking Inside the TBox and ABox

The two main points of this article have been to: 1) recognize the important intellectual legacy of description logics and how they can inform the linked data enterprise moving forward; and 2) be explicit about the functional and architectural splits of the TBox from the ABox. Making this split brings many advantages.

There will continue to be many design challenges as linked data proliferates and actually begins to play its role of aiding meaningful knowledge work. The grounding in description logics and the use of DL for testing alternative designs and approaches is a powerful addition to our toolkit.

Sometimes there are indeed many benefits to thinking inside the box.


[1] See http://www.stickyminds.com/sitewide.asp?Function=edetail&ObjectType=COL&ObjectId=8279.
[2] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press, 2003. See Chapter 1. Sample chapters may be viewed from Enrico Franconi’s Description Logics course notes and tutorial at http://www.inf.unibz.it/~franconi/dl/course/, which is an excellent starting reference point on the subject.
[3] Ibid.; see Chapter 2.
[4] These are akin to the lexicographer supersenses that have been applied in WordNet for nouns and verbs (though only nouns are used here). See Massimiliano Ciaramita and Mark Johnson, 2003. Supersense Tagging of Unknown Nouns in WordNet, in Proceedings of the Conf. on Empirical Methods in Natural Language Processing, pp. 168-173, 2003. See http://www.aclweb.org/anthology-new/W/W03/W03-1022.pdf.

HyperDanja (Danny Ayers): links for 2008-11-09

HyperDanja (Danny Ayers): links for 2008-11-08

Blog Data Space (Kingsley Idehen): Cool Fractal Animations

The Web is essentially Fractal in form, so when thinking about the Web it sometimes helps to have cool fractal animations at one's disposal.

Also, when you watch these fractal animations, you should ultimately understand why a centralized approach to "Web Presence" is inherently flawed :-)

HyperDanja (Danny Ayers)links for 2008-11-07

HyperDanja (Danny Ayers)links for 2008-11-06

AI3:::Adaptive Information (Mike Bergman)Virtuoso Now Supports UMBEL

UMBEL (Upper Mapping and Binding Exchange Layer)OpenLink Software

Version 5.0.9 includes UMBEL Class Lookups and Named Entity Extraction

I first wrote about OpenLink Software’s stellar suite of structured Web-related software back in April 2007, with a spotlight on Virtuoso, the company’s flagship ‘universal server’ product. As it has for years, OpenLink continues a steady drumbeat of new releases and extensions. The most recent version upgrade, 5.0.9, was announced today.

In the intervening period I have now personally had the chance to experience Virtuoso first hand, both as the standard hosting platform for Zitgist’s linked data products and services, and as the hosting environment for UMBEL’s various and growing Web services. I can state quite categorically that our ability to get things done fast with few resources depends critically on the unbelievably high-productivity platform that Virtuoso provides. (And, hehe, given our close relationship to OpenLink, we also get great responsiveness and technical support! :) Though, truthfully, OpenLink continues to amaze with its outreach and embrace of all of the important initiatives within the semantic Web community.)

I normally let these standard Virtuoso release announcements pass without comment. But today’s release v. 5.0.9 has an especially important feature from my parochial perspective: the first support for UMBEL.

Virtuoso Reprised

Just to refresh memories, OpenLink’s Virtuoso is a cross-platform universal server for SQL, XML, and RDF data, including data management. It includes a powerful virtual database engine, full-text indexing, native hosting of existing applications, Web Services (WS*) deployment platform, Web application server, and bridges to numerous existing programming languages. Now in version 5.0, Virtuoso is also offered in an open source version. The basic technical architecture of Virtuoso and its robust capabilities is:

Virtuoso Architecture

From an RDF and linked data perspective, Virtuoso is the most scalable and fastest platform on the market. Critical from Zitgist’s perspective are Virtuoso’s more than 100 built-in RDF-izers (or “Sponger cartridges”) that address all major data formats, serializations, relational data and Web 2.0 APIs. But don’t take my word for it: Check out OpenLink’s impressive list of these cartridges and their various linkages throughout the linked data space.

UMBEL Support

The key aspect of the new UMBEL support in Virtuoso is its incorporation of UMBEL lookups and its use of Named Entity extraction into the RDF-izer cartridges. This is but the first of growing support anticipated for UMBEL.

Other New Features

In addition to UMBEL, this version 5.0.9 includes significant performance optimizations to the SQL Engine, SPARQL+RDF Engine, and the ODBC and JDBC drivers.

Other new features include:

  • An Excel mime-type output option in the SPARQL endpoint
  • Enhanced triple options for bif:contains plus new options for transitivity
  • New RDF-izer Cartridges for the Sponger RDF Middleware Layer
  • Support for very large HTTP client requests
  • A sparql-auth endpoint with digest authentication for using SPARUL via SPARQL Protocol
  • New commands for the Ubiquity Firefox plugin.
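As a rough illustration of the sparql-auth item in the list above, here is a minimal Python sketch that builds a SPARUL request and an opener able to answer HTTP digest challenges. The endpoint URL, the graph and resource URIs, the credentials, and the use of a `query` form parameter are all assumptions made for the example; none of them come from the release notes.

```python
import urllib.parse
import urllib.request


def build_sparul_request(endpoint, update):
    """Build (but do not send) a POST request carrying a SPARUL statement."""
    # Assumption: the endpoint reads the statement from a "query" form field.
    body = urllib.parse.urlencode({"query": update}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def digest_opener(endpoint, user, password):
    """Return an opener that responds to HTTP digest challenges for endpoint."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, endpoint, user, password)
    handler = urllib.request.HTTPDigestAuthHandler(manager)
    return urllib.request.build_opener(handler)


# Hypothetical values, for illustration only.
ENDPOINT = "http://localhost:8890/sparql-auth"
UPDATE = (
    "INSERT INTO <http://example.org/graph> "
    '{ <http://example.org/s> <http://example.org/p> "o" }'
)

request = build_sparul_request(ENDPOINT, UPDATE)
# To actually send it against a running server:
# digest_opener(ENDPOINT, "user", "password").open(request)
```

Building the request separately from sending it keeps the sketch checkable without a live server.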

Finally, per usual, there are also minor bug-fixes:

  • Memory leaks
  • SQL query syntax handling
  • SPARQL ’select distinct’
  • XHTML and Javascript validation and other UI issues in the ODS application suite.

For More Details

For more details, you can see these Virtuoso release notes: https://sourceforge.net/project/shownotes.php?release_id=626647&group_id=161622

You can also get information on the Virtuoso open source edition or download it.

HyperDanja (Danny Ayers)links for 2008-11-05

Blog Data Space (Kingsley Idehen)Master Data Management (MDM) & RDF based Linked Data

It is getting clearer by the second that Master Data Management and RDF-based Linked Data are two realms separated by a common desire to provide "Entity Oriented Data Access" to heterogeneous data sources (within the enterprise and/or across the World Wide Web).

Here is how I see Linked Data providing tangible value to MDM tools vendors and users:

  1. Open access to Entities across MDM instances served up by different MDM solutions acting as Linked Data publishers (i.e., expose MDM Entities as RDF resources endowed with dereferenceable URIs thereby enabling Hyperdata-style linking)
  2. Use of RDF-ization middleware to hook disparate data sources (SQL, XML, and other data sources) into existing MDM packages (i.e., the MDM solutions become consumers of RDF Linked Data).

Of course Virtuoso was designed and developed to deliver the above from day one (circa 1998 re. the core and 2005 re. the use of RDF for the final mile) as depicted below:

Image
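To make the first point in the list above a little more concrete, here is a minimal Python sketch of exposing an MDM record as RDF, written out as N-Triples. The base URI, the `schema/` path layout, and the sample record are hypothetical; this is not how Virtuoso or any particular MDM product does it, only an illustration of giving an entity an HTTP URI and describing it with triples.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value):
    """Escape backslashes, quotes and newlines for an N-Triples string literal."""
    text = str(value)
    text = text.replace("\\", "\\\\").replace('"', '\\"')
    return text.replace("\n", "\\n")


def record_to_ntriples(base_uri, entity_type, key, record):
    """Turn one record (a dict of field -> value) into a list of N-Triples lines."""
    # The entity gets its own HTTP URI, so other data sets can link to it.
    subject = f"<{base_uri}/{entity_type}/{key}>"
    lines = [f"{subject} <{RDF_TYPE}> <{base_uri}/schema/{entity_type}> ."]
    for field, value in record.items():
        predicate = f"<{base_uri}/schema/{field}>"
        lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


# Hypothetical customer record from an MDM system.
triples = record_to_ntriples(
    "http://example.com/mdm", "Customer", "42", {"name": 'Acme "Inc"'}
)
for line in triples:
    print(line)
```

Every field value is emitted as a plain string literal here; a fuller mapping would also turn foreign keys into links to other entity URIs.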


Blog Data Space (Kingsley Idehen)YODA & the Data FORCE

The original design document (by TimBL) that led to the WWW (*an important read*) was very clear about the need to create an "information space" that connects heterogeneous data sources. Unfortunately, in trying to create a moniker to distinguish one aspect of the Web (the Linked Document Web) from the part that was overlooked (the Linked Data Web), we ended up with a project code name that's fundamentally a misnomer in the form of: "The Semantic Web".

If we could just take "The Semantic Web" moniker for what it was -- a code name for an aspect of the Web -- and move on, things would get much clearer, fast!

Basically, what is/was the "Semantic Web" should really have been code named: YODA ("You" Oriented Data Access) as a play on: Yoda's appreciation of the FORCE (Fact ORiented Connected Entities) -- the power of intergalactic, interlinked, structured data, fashioned by the World Wide Web courtesy of the HTTP protocol.

Image

As stated in an earlier post, the next phase of the Web is all about the magic of entity "You". The single most important item of reference to every Web user would be the Person Entity ID (URI). Just by remembering your Entity ID, you will have intelligent pathways across, and into, the FORCE that the Linked Data Web delivers. The quality of the pathways and increased density of the FORCE are the keys to high SDQ (tomorrow's SEO). Thus, the SDQ of URIs will ultimately be the unit determinant of value to Web Users along the following personal lines:

  • Does your platform give me Identity (a URI) with high SDQ?
  • Do the Data Source Names (URIs) in your Data Spaces deliver high SDQ?

While most industry commentators continue to ponder and pontificate about what "The Semantic Web" is (unfortunately), the real thing (the "FORCE") is already here, and self-enhancing rapidly.

Assuming we now accept the FORCE is simply an RDF-based Linked Data moniker, and that RDF Linked Data is all about the Web as a structured database, we should start to move our attention over to practical exploitation of this burgeoning global database, and in doing so we should not discard knowledge from the past such as the many great examples available gratis from the Relational Database realm. For instance, we should start paying attention to the discovery, development, and deployment of high level tools such as query builders, report writers, and intelligence oriented analytic tools, none of which should -- at first point of interaction -- expose raw RDF or the SPARQL query language. Along similar lines of thinking, we also need development environments and frameworks that are counterparts to Visual Studio, Access, FileMaker, and the like.


Blog Data Space (Kingsley Idehen)Entity Oriented Data Access

Recent perturbations in Data Access and Data Management technology realms are clear signs of an imminent inflection. In a nutshell, the focus of data access is moving from the "Logical Level" (what you see if you've ever looked at a DBMS schema derived from an Entity Data Model) to the "Conceptual Level" (i.e., the Entity Model becoming concrete).

In recent times I've stumbled across Master Data Management (MDM) which is all about entities that provide holistic views of enterprise data (or what I call: Context Lenses). I've also stumbled across emerging tensions in the .NET realm between LINQ to Entities and LINQ to SQL, where in either case the fundamental issue comes down to the optimal path to "Conceptual Level Access" over the "Logical Level" when dealing with data access in the .NET realm.

Strangely, the emerging realms of RDF Linked Data, MDM, and .NET's Entity Framework remain disconnected.

Another oddity is the obvious, but barely acknowledged, blurring of the lines between the "traditional enterprise employee" and the "individual Web netizen". The fusion between these entities is one of the most defining characteristics of how the Web is reshaping the data landscape.

At the current time, I tend to crystallize my data access world view under the moniker: YODA ("You" Oriented Data Access), based on the following:

  1. Entities are the new focal point of data access, management, and integration
  2. "You" are the entry point (Data Source Name) into this new realm of interconnected Entities that the Web exposes
  3. "You" the "Person" Entity is associated with many other "Things" such as "Organizations", "Other People", "Books", "Music", "Subject Matter" etc.
  4. "You" the "Person" needs Identity in this new global database, which is why "You" need to Identify "Yourself" using an HTTP-based Entity ID (aka. URI)
  5. When "You" have an ID for "Yourself" it becomes much easier for the essence of "You" to be discovered via the Web
  6. When "Others" have IDs for "Themselves" on the Web it becomes much easier for "You" to serendipitously discover or explicitly "Find" things on the Web.


DBpedia BlogDBpedia Mobile won the 2nd prize of the Semantic Web Challenge 2008

We are happy to announce that DBpedia Mobile has won the 2nd prize of the Semantic Web Challenge at the 7th International Semantic Web Conference.

DBpedia Mobile is a location-aware client for the Semantic Web that can be used on an iPhone and other mobile devices. Based on the current GPS position of a mobile device, DBpedia Mobile renders a map indicating nearby locations from the DBpedia dataset. Starting from this map, the user can explore background information about his surroundings by navigating along data links into other Web data sources. DBpedia Mobile has been designed for the use case of a tourist exploring a city. As the application is not restricted to a fixed set of data sources but can retrieve and display data from arbitrary Web data sources, DBpedia Mobile can also be employed within other use cases, including ones unforeseen by its developers. Besides accessing Web data, DBpedia Mobile also enables users to publish their current location, pictures and reviews to the Semantic Web so that they can be used by other Semantic Web applications. Instead of simply being tagged with geographical coordinates, published content is interlinked with a nearby DBpedia resource and thus contributes to the overall richness of the Geospatial Semantic Web.
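The idea of finding nearby locations from a GPS position can be sketched roughly as below. This is not DBpedia Mobile's actual implementation, which is not described here; it is a minimal Python illustration that assumes resources carry `geo:lat` and `geo:long` values from the W3C WGS84 vocabulary and that a simple bounding-box filter is good enough. The function names and the sample coordinates are made up for the example.

```python
import math

KM_PER_DEGREE_LAT = 111.32  # approximate length of one degree of latitude


def bounding_box(lat, lon, radius_km):
    """Approximate (south, north, west, east) box around a point."""
    dlat = radius_km / KM_PER_DEGREE_LAT
    # Degrees of longitude shrink with the cosine of the latitude.
    dlon = radius_km / (KM_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def nearby_query(lat, lon, radius_km=1.0, limit=50):
    """Build a SPARQL query for resources located inside the bounding box."""
    south, north, west, east = bounding_box(lat, lon, radius_km)
    return f"""PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?place ?lat ?long WHERE {{
  ?place geo:lat ?lat ; geo:long ?long .
  FILTER (?lat >= {south:.5f} && ?lat <= {north:.5f} &&
          ?long >= {west:.5f} && ?long <= {east:.5f})
}} LIMIT {limit}"""


# Hypothetical device position (somewhere in central Berlin).
print(nearby_query(52.5163, 13.3777, radius_km=1.0, limit=10))
```

The box is only an approximation of a circle and breaks down near the poles and the date line, which is acceptable for a city-scale tourist scenario.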

For more information about DBpedia Mobile please refer to:

Blog Data Space (Kingsley Idehen)Virtuoso Installation Screencasts

As promised in an earlier post titled: Virtuoso, PHP 3.5 Runtime Hosting, phpBB3, and Linked Data, here are direct links to the "silent movies" mentioned in the past:

Virtuoso is an extremely compact product that is very easy to install. The ease of installation carries over to the PHP runtime when bound to Virtuoso.

HyperDanja (Danny Ayers)links for 2008-11-01

HyperDanja (Danny Ayers)Muzzle Velocity

A little video.

A couple of months ago I bought an air pistol with the hope of frightening away a vicious feral cat that had been taking chunks out of our cats. I was talked out of using it on the basis of “what if you get it in the eye”, and as it happens it would have been after the horse had bolted anyhow. But I’ve found it great fun for target shooting - I’m now inspired to try archery. (Yeah, ok, and it really brings out the 14 yr-old: you should see what happens to a cheese triangle, pure CSI).

Anyhow it’s also quite an interesting piece of engineering, so the natural impulse was to measure stuff. I twittered a few weeks ago for suggestions for testing muzzle velocity using household equipment. Quoll suggested (something like) firing through a disk spinning at a known speed, i.e. on a drill. Probably feasible, but I couldn’t come up with a good approach. The other day it occurred to me it should be possible to do using acoustics - record the sound as the pellet passes through things a measured distance apart. Perfect for a little diversion on a Saturday afternoon.

As it turned out the way I tried this had the impulses too close together to identify visually (overlaid with room reverb). I’ve not tried running FFT or anything over the sound, but I suspect it’s way too distorted. However I got rather a nice percussion effect, so played with Audacity and a sequencer a bit and made the backing track - no other sounds included.

In case anyone fancies trying a bit of analysis, here’s the original audio
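For anyone who does try, the arithmetic behind the acoustic idea is just distance divided by the time between two impulses. Here is a minimal Python sketch under generous assumptions: the samples are already loaded as floats in the range -1 to 1, the two impulses are cleanly separated (which, as noted above, the real recording was not), and the threshold, gap, and synthetic data are invented for illustration.

```python
def impulse_onsets(samples, threshold, min_gap):
    """Return indices where a loud sample follows at least min_gap quieter ones."""
    onsets = []
    last_loud = None
    for index, value in enumerate(samples):
        if abs(value) >= threshold:
            # A new impulse starts only after a long enough quiet stretch.
            if last_loud is None or index - last_loud >= min_gap:
                onsets.append(index)
            last_loud = index
    return onsets


def muzzle_velocity(samples, sample_rate, distance_m, threshold=0.5, min_gap=20):
    """Estimate speed in m/s from the first two impulses in the recording."""
    onsets = impulse_onsets(samples, threshold, min_gap)
    if len(onsets) < 2:
        raise ValueError("need two distinct impulses")
    elapsed_s = (onsets[1] - onsets[0]) / sample_rate
    return distance_m / elapsed_s


# Synthetic recording: two short clicks 441 samples apart at 44.1 kHz,
# with the two targets assumed to be 1 metre apart.
recording = [0.0] * 1000
recording[100:103] = [1.0, -0.8, 0.6]
recording[541:544] = [1.0, -0.8, 0.6]
print(muzzle_velocity(recording, sample_rate=44100, distance_m=1.0))
```

With real audio the hard part is the onset detection, since room reverb smears the impulses together; a longer distance between the targets would spread them further apart in time.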

Blog Data Space (Kingsley Idehen)Welcoming Freebase to the Linked Data Web

Finally! That's all I can say re. Freebase :-) They've now plugged their database and their community driven data curation efforts into the burgeoning Linked Data Web.

Here are some examples of how we distill Entities (People, Places, Music, and other things) from Freebase (X)HTML pages (meaning: we don't have to start from RDF information resources as data sources for the eventual RDF Linked Data we generate):

Tip: Install our OpenLink Data Explorer extension for Firefox. Once installed, simply browse through Freebase, and whenever you encounter a page about something of interest, simply use the following sequences to distill (via the Page Description feature) the entities from the page you are reading:

  • CTRL-Click (Mac OS X)
  • Right-Click (Windows & Linux)


DBTune BlogSPARQLing a funk legend

I just came across this awesome blog post from Kurt. It starts from a real music question (he saw Maceo Parker live, and wanted to know if he wrote one of the songs he played in the encore: Pass the Peas), and finds an answer to it using Semantic Web technologies, in particular SPARQL.

Great stuff!

DBTune BlogFreebase does linked data!

Just a small post, live from ISWC: Freebase does linked data!

You can try it there, and you can try this instance, for example.

Freebase linked data

Added to David Huynh's wonderful Parallax, that's a lot of great news coming from the other side of the Atlantic :-)

Now, to see whether their linked data actually use the Web! Do they link to other web identifiers, available outside Freebase?

I just noticed something weird, also: the read/write permissions are attached to the tracks/films/whatever resources, instead of being attached to the RDF document itself.

AI3:::Adaptive Information (Mike Bergman)WOA! So RESTful it is UMBELievable!

It's UMBELievable!

UMBEL’s New Web Services Embrace a Full Web-Oriented Architecture

I recently wrote about WOA (Web-oriented architecture), a term coined by Nick Gall, and how it represented a natural marriage between RESTful Web services and RESTful linked data. There was, of course, a method behind that posting to foreshadow some pending announcements from UMBEL and Zitgist.

Well, those announcements are now at hand, and it is time to disclose some of the method behind our madness.

As Fred Giasson notes in his announcement posting, UMBEL has just released some new Web services with fully RESTful endpoints. We have been working on the design and architecture behind this for some time and, all I can say is, it’s UMBELievable!

As Fred notes, there is further background information on the UMBEL project — which is a lightweight reference structure based on about 20,000 subject concepts and their relationships for placing Web content and data in context with other data — and the API philosophy underlying these new Web services. For that background, please check out those references; that is not my main point here.

A RESTful Marriage

We discussed much in coming up with the new design for these UMBEL Web services. Most prominent was taking seriously a RESTful design and grounding all of our decisions in the HTTP 1.1 protocol. Given the shared approaches between RESTful services and linked data, this correspondence felt natural.

What was perhaps most surprising, though, was how complete and well suited HTTP was as a design and architectural basis for these services. Sure, we understood the distinctions of GET and POST and persistent URIs and the need to maintain stateless sessions with idempotent design, but what we did not fully appreciate was how content and serialization negotiation and error and status messages also were natural results of paying close attention to HTTP. For example, here is what t