User:Jyoung

From LD4 WBStack Test

Jeff Young - OCLC Research

While I like the idea of mocking up LCSH in a Wikibase instance, I think that basing it on Nomen (as opposed to "concepts" and/or "RWOs") is misguided and increases complexity rather than resolving it.

I understand and agree that Access Control Points are fundamental to MARC (and community localization in general) and that Wikidata isn't a substitute. I would vehemently argue, though, that most of the problems being addressed here CAN be solved using a dedicated RWO-oriented Wikibase model combined with careful thought and enforcement of immutable labels. It would be easier to demonstrate the pros and cons with examples and historical context than to explain them in words, but I'll try starting with a few examples:

Example 1

Is Hello Kitty one thing or three?

Note that the properties that can be associated with each are radically different.

The Fundamental theorem of software engineering states:

"We can solve any problem by introducing an extra level of indirection."

With the humorous clause...

... except for the problem of too many levels of indirection.

Example 2

Is it merely the Nomens that are offensive in these examples, or is it an identifiable concept that underlies these words that is offensive?

  • illegal alien
    • English singular
  • illegal aliens
    • English plural
  • immigrante ilegal
    • Spanish singular
  • etc.
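
One way to picture the "identifiable concept" reading of this question is a minimal sketch in which a single concept owns every label, so that labels can be added, retired, or replaced without the identity of the concept changing. The `Concept` and `Label` types below are hypothetical and are not part of any Wikibase or LCSH API; reusing the LCSH identifier sh85003553 as the concept's key is likewise an assumption made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Label:
    """A human-readable string attached to a concept (hypothetical type)."""
    text: str
    language: str
    number: str  # "singular" or "plural"


@dataclass
class Concept:
    """One identifiable concept that owns all of its labels (hypothetical type)."""
    identifier: str
    labels: list[Label] = field(default_factory=list)

    def labels_in(self, language: str) -> list[str]:
        """Return the label strings recorded for one language code."""
        return [label.text for label in self.labels if label.language == language]


# Assumption: the LCSH identifier stands in for the concept's key.
concept = Concept(
    identifier="sh85003553",
    labels=[
        Label("illegal alien", "en", "singular"),
        Label("illegal aliens", "en", "plural"),
        Label("immigrante ilegal", "es", "singular"),
    ],
)

english_labels = concept.labels_in("en")
spanish_labels = concept.labels_in("es")
```

In this shape the strings are attributes of one thing rather than things in their own right, which is the alternative to a Nomen-per-string model that the question is probing.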

Aside regarding "info" URIs as a solution

During discussion, there was a suggestion that "info" URIs (http://info-uri.info/) were a potential alternative for identifying something without regard to human labeling. The example used was the "info" URI defined for "illegal aliens" in LCSH:

https://id.loc.gov/authorities/subjects/sh85003553.html

The suggestion seemed to be that info:lc/authorities/sh85003553 could be used to identify some kind of abstraction that could be harmlessly decoupled from the harmful string "illegal aliens".

As the administrator of http://info-uri.info/, I immediately objected to that interpretation. The "info" URI registry was established in 2003 as a workaround for perceived limitations in "http" URLs. In those days, URLs (aka "http" URIs) were defined to identify resources that were located on the web, which LCSH was not at the time. W3C overturned this "located on the web" interpretation as a consequence of the httpRange-14 decision in 2005, which effectively negated "info" URIs as a practical solution. In effect, "info" URIs didn't solve the problem; they just made the identifier ambiguous and subject to interpretation by third-party systems that were beyond LC's control.

Nomens and VIAF

An early RDF representation of VIAF included Nomens (aka skosxl:Label) in its data model. These were factored out in 2011 because they didn't deliver the benefits that were expected. An analysis of the decision can be found here: https://outgoing.typepad.com/outgoing/2011/04/changes-to-viafs-rdf.html

Nomens vs Lexicographical data?

The problem with Nomens isn't that they aren't meaningful or useful; it's that they conflate several ideas that need to be teased apart to be practical. Wikidata's Lexicographical model is a great example of such an approach with practical applications for natural language processing:

https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Documentation
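
As a rough illustration of that teasing apart, the sketch below mirrors the three layers of Wikidata's lexicographical model: a lexeme (lemma, language, lexical category), its forms (spellings with grammatical features), and its senses (meanings). The class names, the `forms_with` helper, and the sample values are assumptions made for illustration; they are not copied from Wikidata and are not its API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Form:
    """One inflected spelling of a lexeme, with its grammatical features."""
    representation: str
    grammatical_features: tuple[str, ...]


@dataclass(frozen=True)
class Sense:
    """One meaning of a lexeme, described by a short gloss."""
    gloss: str


@dataclass
class Lexeme:
    """A word in one language, kept separate from its spellings and meanings."""
    lemma: str
    language: str
    lexical_category: str
    forms: list[Form] = field(default_factory=list)
    senses: list[Sense] = field(default_factory=list)

    def forms_with(self, feature: str) -> list[str]:
        """Return the spellings whose grammatical features include `feature`."""
        return [
            form.representation
            for form in self.forms
            if feature in form.grammatical_features
        ]


# Illustrative values only.
lexeme = Lexeme(
    lemma="cat",
    language="en",
    lexical_category="noun",
    forms=[Form("cat", ("singular",)), Form("cats", ("plural",))],
    senses=[Sense("small domesticated feline")],
)

plural_forms = lexeme.forms_with("plural")
```

Singular and plural spellings live on forms, meanings live on senses, and the word itself is the lexeme, so each of the ideas that a single Nomen would carry has its own place.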