[This is a brief overview of the content at this conference; I'll add and edit soon.]
Personal overview: I enjoyed this conference immensely because of the people involved – a BIG shout out to Lukas, Martin, Adrian, Felix, Roxana, two Patricks, Nicolas, Richard, Simeon, Magnus — SO many Dutch/Belgian people and others who make attending these events valuable even before the content.
Regarding the content, what it tells me is that linked data is a technology that is maturing, but it has quite a way to go before we can say this stuff is ready to roll out in libraries. There are a lot of projects that seem to be doing the same things independently of one another; this situation is in one sense a horrible waste of money and in another a good thing, because I suspect that there are different insights in each of these implementations. At some point someone should aggregate these and make a killer product.
From our NTNU perspective, there is a lot to be said for our approach of abandoning all formats except linked data; ETL seems to be a key concept in many projects, and that is an immense waste of resources that isn't moving us on. Some might argue that ETL is pragmatism, but this isn't so: our approach, and similarly that of Libris (whom we largely copied), shows that abandoning traditional workflows is the pragmatic decision. Keep that stuff out of your systems!
In order for linked data to become a real thing in libraries, there needs to be out-of-the-box software that just works. What's happening in the communities presenting at SWIB13 can be seen as a race towards this end, where we're all competing to be the first to deliver. As I said above, this duplication is both harmful and positive; I suspect that radical pragmatism will create this product, but I doubt that the first to deliver will be the "winner" whose product actually gets adopted.
A final note: a problem with the format of presentations like many of those seen at SWIB13 is projects! There are too many project descriptions going on. It would be better to have a short (two minutes each) run-down of the running projects so that we know who to approach, while the long presentations should concentrate on a few key ideas gleaned from this kind of work. Several presenters managed this; many didn't.
Salo spoke about how hard it is to teach linked data to librarians because linked data is defined in terms that are outside the experience of librarians — it takes comparatively little time to teach HTML, XML, but RDF seems to be very hard to explain. The lack of tools, reliable library data resources, etc. means that linked data take-up is slow.
A chief worry is that there will be too little traction for linked data in libraries because it isn't easy. Salo used the term "negative path dependence", whereby a lesser but easier technology wins out over a superior competitor not because it is good, but because it can be seen to work. In this framing, MARC is the inferior technology that won.
It was pointed out that in order for library linked data to work, the tools to do the job need to be better.
The talk left me with a few questions: I wonder why librarians don't interest themselves more in the key Internet technologies? An understanding of the basics of HTTP is a good thing for librarians who work with URLs anyway… Why should libraries move off the already embedded technology (MARC) when it seems to largely be fit for purpose? What motivates the use of linked data in libraries? The only "new" need Salo identified was "sharing", but perhaps MARC can be more easily shared…
Pfeffer talked about ontology matching in classification systems using matching of manifestations of the terms in bibliographic data.
There are many similarities to the work of Knut Hegna ##REF##
The statistical method used was comparison of the intersection and union of matches. The perceived problem with this method was the lack of negative inferences (cases where there are definitely no links). The links are evaluated by measuring recall and precision against a "gold standard", and the results of this evaluation are fed back into the gold standard.
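The overlap measure sounds like a Jaccard-style score over co-occurring bibliographic records; a minimal sketch of how such matching and its evaluation might look (the function and variable names are mine, not Pfeffer's):

```python
def overlap_score(records_a, records_b):
    """Jaccard-style score for two classification terms: size of the
    intersection of the record sets they occur in, divided by the union."""
    a, b = set(records_a), set(records_b)
    if not (a | b):
        return 0.0
    return len(a & b) / len(a | b)

def precision_recall(proposed_links, gold_standard):
    """Evaluate proposed term-to-term links against a gold standard."""
    proposed, gold = set(proposed_links), set(gold_standard)
    true_positives = proposed & gold
    precision = len(true_positives) / len(proposed) if proposed else 0.0
    recall = len(true_positives) / len(gold) if gold else 0.0
    return precision, recall
```

A score threshold then decides which candidate links get proposed, and the precision/recall feedback is what refines the gold standard over time.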
Pfeffer identified issues in SKOS related to qualification of relations.
This talk was interesting for me as it relates directly to work carried out at University of Oslo/NTNU that I have been involved in.
Again, this talk was about matching in authority files, this time between a bibliographic data set and DBPedia.
Here, issues arise from the fact that DBPedia treats concepts in broad strokes, while classification systems largely use narrow concepts.
Zumstein created a browser plugin that helped subject specialists in their book-ordering workflow; rather than the traditional, multi-search process, the data from various resources (holdings, catalogues, OPAC, etc) is aggregated and presented together so that time is saved.
A mix of different technology is used: RDF, Z39.50, etc.
As the system uses ISBNs, the traditional issues apply (an ISBN is not an identifier for a work, linking editions is hard, etc.). Might be worth looking at the methods in http://journal.code4lib.org/articles/5339
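One standard mitigation is normalizing everything to ISBN-13 before matching; a sketch of that conversion (my illustration, not Zumstein's code):

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 to ISBN-13 so both forms match on one key:
    prefix with '978', drop the old check digit, recompute the new one."""
    digits = isbn10.replace("-", "").replace(" ", "")
    if len(digits) != 10:
        raise ValueError("not an ISBN-10")
    core = "978" + digits[:9]  # old check digit dropped
    # ISBN-13 check digit: weights alternate 1, 3; complement of sum mod 10
    total = sum((1 if i % 2 == 0 else 3) * int(d) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)
```

Of course this only unifies the two ISBN forms; it does nothing for the deeper problem that different editions of the same work carry different ISBNs.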
Talked about motivations and specifications that caused them to choose VIVO; how they adapted it to work for them by modifying core VIVO and using Drupal on top. Problems in the data: lack of identifiers.
A lot of examples of how small OSS projects are sometimes problematic — one guy knows a lot and there’s little documentation, but I suspect that using a commercial product that was “fixed” would leave a much bigger mess (cf. all experience with custom solutions and upgrades in the history of IT).
I have some issues with the approach taken in VIVO; as far as I can see, using normal linked data techniques (like vocabulary re-use) from domains outside the core VIVO ontology makes upgrades break, unless one jumps through vocabulary-importing hoops. As this is the case, I wouldn't call VIVO a linked data tool, but rather an ontology-driven tool. Additionally, technology choices like Jena SDB and Solr date this solution.
Spoke about his projects on authorities and ebooks.
[I'm always positively inclined towards Richard as he got me started on linked data.]
What is the benefit for libraries? What is the benefit for users?
No web of library data yet. Traction is coming in the form of Google’s knowledge graph — RDF-like structures (entities). Why is this relevant to libraries?
Asks where are our users? Google…
BnF: 80+% of search comes from search engines; people aren’t using search interfaces we provide. This is just true, so exposing data on the web is very important.
Changing from cataloguing to catalinking (Eric Miller).
Use the Web. Linked data. Schema.org
Schema.org is a good tool for search engines. BIBFRAME is companion vocabulary for libraries.
Things are happening. Quickly.
[LGS has the best accent in the world.]
We know we need to change, but how?
BIBFRAME is an exchange format built on linked data principles. (A direct replacement for MARC).
Linked data is more agnostic than MARC, data can be anything; but it should support different models, support RDA and other cataloguing rules. Be extensible for new material types.
German national library is an early experimenter…
Provide data in many profiles encoded in RDF, signalled with an HTTP Link header: Link: <http://example.com/a>; rel="profile"
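On the client side, picking the profile out of the response means parsing that Link header; a naive sketch (real code should use a proper RFC 8288 parser, and the URL is just the example above):

```python
def parse_link_header(value: str):
    """Naively split an HTTP Link header value into (target, params) pairs.
    Breaks on commas inside URIs, so treat it as an illustration only."""
    links = []
    for part in value.split(","):
        segments = [s.strip() for s in part.split(";")]
        target = segments[0].strip("<> ")
        params = {}
        for seg in segments[1:]:
            key, sep, val = seg.partition("=")
            if sep:
                params[key.strip()] = val.strip().strip('"')
        links.append((target, params))
    return links
```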
BIBFRAME must be enriched and stabilized. What about version control?
Showed a nasty big form that represents BIBFRAME from the LoC.
GNL implements RDA and FRBR, how well does BIBFRAME transport RDA?
BIBFRAME can replace MARC, but lots of changes must happen in order for this to work out. The benefit is participation with the wider GLAM sector.
A very sensible presentation.
Large-scale ontology project.
Centralized. The reverse of distributed data.
Improve interoperability across the spectrum of users.
Lightweight SKOS ontologies intended for annotations. They provide an upper ontology.
Ontologies are easy to link. Rigid definitions. Can be explicitly related to one-another.
Ontologies used to link different resources together, but the changes are difficult to manage directly because there are many users and systems.
The upper ontology provides an abstract layer that is rarely updated; links between the domain ontologies are reduced. Interaction happens via the upper ontology.
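The maintenance payoff of that hub design is simple arithmetic: linking n domain ontologies pairwise needs n(n-1)/2 mapping sets, while linking each one only to the upper ontology needs n. A trivial sketch:

```python
def pairwise_link_sets(n: int) -> int:
    """Mappings needed when every domain ontology links directly to
    every other domain ontology."""
    return n * (n - 1) // 2

def hub_link_sets(n: int) -> int:
    """Mappings needed when each domain ontology links only to the
    shared upper ontology."""
    return n
```

With ten domain ontologies that is 45 mapping sets versus 10, and each vocabulary update touches far fewer links.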
Trilingual ontology. YSO's top level is based on DOLCE. Cognitive. Daft. Culture before language: represent information in the way Finnish culture represents it, not in the way the Swedish/English languages do.
Holdings ontology is many small ontologies.
Open, feedback welcome
[Another person to whom I'm both indebted and positively inclined towards]
Conversion to RDF from various formats used in library data helps to give an understanding of the data — RDF-thinking helps you to get what the data is modelling, not just simple description
Next logical step: data → linked data (more data, with links).
Sick of the ineffectiveness of MARC in workflows because it didn't support what they needed to do; converting data between formats was a pain in the arse, so they started creating a linked-data-based system. The linked data system adds the functionality they needed. They use JSON-LD because it's a tool non-RDF people can use.
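The JSON-LD point is easy to illustrate: to a non-RDF developer the record below is just JSON, while the @context quietly maps the keys onto vocabulary IRIs. (The record and the toy expansion are my own sketch; real JSON-LD processing is much richer.)

```python
import json

# A minimal JSON-LD-style record: plain JSON to most developers,
# but the @context maps keys to vocabulary IRIs (illustrative URIs).
record = json.loads("""
{
  "@context": {"title": "http://purl.org/dc/terms/title",
               "creator": "http://purl.org/dc/terms/creator"},
  "@id": "http://example.org/book/1",
  "title": "Hamlet",
  "creator": "http://example.org/person/shakespeare"
}
""")

def expand_keys(doc):
    """Toy 'expansion': swap context-mapped keys for their full IRIs.
    A real JSON-LD processor does far more (lists, nesting, types)."""
    ctx = doc.get("@context", {})
    return {ctx.get(k, k): v for k, v in doc.items() if k != "@context"}
```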
RDF lets you solve problems as they arise, while other formats (MARC) can’t extend to cover new functionality.
Open source, self-programmed.
An important distinction is that there's no difference between your data and other people's data. You need to be able to handle the distributed model.
They have MARC, but it’s generated from RDF. Co-existence with the legacy data is necessary. MODS is also there. The formats need to be kept away from the core system — they are exchange formats. Inside, they have linked data, outside they have many data formats. This separation keeps the core system safe from the influence of the bad ideas.
Data aggregated from publishers/vendors breaks the idea of linked data; everyone should publish their own data, and if not, the aggregation must be transparent.
Linked data needs interfaces for users. In order to convince people, they created an interface and helped institutions create their own linked data.
TBL's 5-star model isn't enough; actually using the data makes it real, because to be useful it needs to be used. Linked data with a user interface is useful data. Do things quickly, and try to understand what people are trying to achieve.
RDF isn’t a format; it’s a way of representing data.
Around about here, I noted that my head really started shutting down — the following is a very brief overview of the content. My lack of reporting in no way reflects anything about the content & I apologize beforehand.
Have produced data. It is used a bit by non-library people. It is also used by libraries. Created an OPAC+ based on linked data. Users are happy with results like FRBR-ization, enrichment, navigation changes.
Talked about the Europeana project and its tools and workflows.
[Kai is a good guy who puts a lot of energy into important work on provenance in RDF, unfortunately I missed this year's provenance workshop]
Digitized manuscripts to Europeana.
Tool chain for data migration, Pundit and the openglam community.
Heterogeneous data formats: TEI, MARC, MAB2, etc.
The Europeana Data Model is generic in order to tackle heterogeneous data formats. I have obvious problems with this approach.
Application profiles (schema.org etc.)
Retain original semantics, use existing URIs and type resources.
Reusing existing namespaces.
Approach uncertainties…with confidence level (and reification or named graphs).
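The named-graph variant can be sketched as quads plus a description of each graph; everything here (URIs, the 0.7) is invented for illustration:

```python
# An uncertain statement placed in its own named graph (4th element),
# with the confidence recorded as metadata about that graph.
quads = [
    ("http://example.org/ms/42",
     "http://purl.org/dc/terms/creator",
     "http://example.org/person/anonymous-master",
     "http://example.org/graph/g1"),
]
graph_metadata = {
    "http://example.org/graph/g1": {"confidence": 0.7},
}

def quads_above(quads, graph_metadata, threshold):
    """Keep only statements whose named graph carries a confidence
    at or above the threshold."""
    return [q for q in quads
            if graph_metadata.get(q[3], {}).get("confidence", 0.0) >= threshold]
```

Consumers can then filter by how certain they need the data to be; the reification alternative encodes the same annotation as extra triples about the statement itself.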
Application of LOD to Enrich the Collection of Digitized Medieval Manuscripts at the University of Valencia
Jose Manuel Barrueco Cruz
Talked about the process of converting traditional metadata to linked data and presenting this.
As well as a nice introduction featuring some good – and realistic — ideas around library resource management, Warner gave a coherent, smart and understandable overview of ResourceSync, which is no mean feat.
Enhancing an OAI-PMH Service Using Linked Data: A Report from the Sheet Music Consortium
This was a great presentation that captured my attention even at this late stage in the conference. Some of the methods of creating context seemed very well designed.
At this point my brain really shut down…