Tuesday, October 11, 2011

Data sources

What databases would you like to draw on to enrich your data?  For example, VIAF and id.loc.gov are great sources. Which others are important to you?  

Are there barriers to accessing them, such as charges for access, limits on how often you can access them or how much data you can use?

Are there concerns about availability or persistence?

Please tell us what data you want to be able to draw upon and what pitfalls you see.


  1. The "pitfall" is the "wet-ware". I don't know the basics of HOW to use these things. I run into internal discussions of "why fix something that isn't broken", and my reply of "not broken YET" does not seem to get very far.

  2. For the pools of data: Open Metadata Registry

  3. The #discodev (Discovery Developer) competition highlighted a range of sources that were used by developers building around various types of metadata (not just bibliographic) - the examples are at http://discovery.ac.uk/developers/competition/

    Just to pick an example I know (because it was mine), I created a bookmarklet that used a record in the COPAC union catalogue as a starting point and drew in information from the BBC, Dbpedia, MusicBrainz and MusicNet to deliver an enhanced display for the catalogue record - the details are here http://www.meanboyfriend.com/overdue_ideas/2011/07/compose-yourself/

    The 'pitfall' was the reliance on remote data sources - suddenly the BBC relaunched some of their web services, and the thing I was using broke. I was able to work around the problem by using a cached version of the same data from elsewhere - but it shows the problems of relying on third-party data, and also the advantages of caching data in multiple places.

  4. I did some work using LCSH as made available via http://id.loc.gov in a Linked Data project - and encountered some issues - you can see some slides on this at http://www.slideshare.net/ostephens/linking-lcsh-and-other-stuff (from page 8 onwards), or see me rant about it in this video http://www.ustream.tv/recorded/15986081 from 02:12:30 onwards

  5. Not that it's a key resource for my library, but the National Agricultural Library's Agricultural Thesaurus is now available as Linked Open Data: http://agclass.nal.usda.gov/

    I wish the Getty vocabularies could be freely accessible as linked open data! Which other thesauri (beyond VIAF and those at id.loc.gov) are out there as LOD?

  6. OCLC FAST, LOC genre and other genre lists, TGM I and II, DC type, ULAN, AAT, TGN

  7. The reports from the W3C Library Linked Data Incubator are up, including a very timely (for our purposes) one on Data Sources: http://www.w3.org/2005/Incubator/lld/XGR-lld-vocabdataset-20111025/
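
The point in comment 3 about remote sources breaking, and about keeping cached copies in more than one place, could be sketched roughly as below. This is only an illustration under stated assumptions: `fetch_with_fallback`, the `(name, fetcher)` pairs and the plain dictionary used as a cache are all hypothetical names invented for this example, not part of any of the services mentioned above.

```python
from typing import Callable, Dict, List, Tuple

# A fetcher is any function that takes a record key and returns data,
# raising an exception if the remote source is unavailable.
# (Hypothetical type for illustration only.)
Fetcher = Callable[[str], str]


def fetch_with_fallback(
    key: str,
    sources: List[Tuple[str, Fetcher]],
    cache: Dict[str, str],
) -> Tuple[str, str]:
    """Try each source in order and return (data, name_of_origin).

    A successful result is stored in the local cache. If every source
    fails, the last cached copy is returned instead, if there is one.
    """
    errors = []
    for name, fetch in sources:
        try:
            data = fetch(key)
        except Exception as exc:  # any failure moves us to the next source
            errors.append(f"{name}: {exc}")
            continue
        cache[key] = data  # keep a local copy for the next outage
        return data, name

    if key in cache:
        return cache[key], "cache"

    raise LookupError(f"no source or cached copy for {key!r}: {errors}")
```

In use, the primary service would be listed first, a mirror or cached copy held elsewhere second, and the local cache would only be consulted when both are down. A real version would probably also want timeouts and some way of expiring stale cached records.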