User:Xavi Danto

From Rhizome Artbase

About Me

Xavi Danto is a current MSLIS candidate at Pratt Institute, and a recent intern working on Shape Expressions for Rhizome 🧽

Introduction: ShEx for Rhizome's ArtBase

Shape expressions (ShEx, stored in the EntitySchema namespace in Wikibase) provide a mechanism for defining and validating the structure of RDF data in a user-friendly format. They allow data curators to specify patterns that data must conform to, enabling both validation and transformation of data structures. Shape expressions can describe various constraints, such as required properties, datatypes, cardinalities, and the shapes that linked items must themselves match. This makes them particularly useful in contexts like APIs, data interchange formats, and semantic web applications, where ensuring data integrity and compliance is crucial. By providing a clear framework for data modeling, shape expressions enhance interoperability and facilitate better data management across different systems.

For Rhizome, 23 EntitySchemas have been defined for ArtBase, based on an existing data entry form. These schemas also provide a starting point for further integration goals:

  • Improving manual item editing with generated input forms.
  • Providing feedback for item conformity to shape expressions.
  • Developing an API for automated checks against ShEx.
  • Generating visual data model representations.
  • Integrating with query interfaces for filtered results.


In this guide, I describe how to write and format a ShEx schema for the EntitySchema namespace of a Wikibase instance. To apply this guidance, I have selected a few examples, each translated into a table with a corresponding ShEx code block. I then outline how to ensure that ShEx schemas are well formed, along with an in-progress workflow for data validation. Lastly, I reflect on the aspects of this project that are still in development, and on how the progress so far has met the stated goal of data validation for Rhizome's ArtBase.

How to ShEx: A Brief Guide

Namespaces

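This is the easiest part. Below are the prefixes actively used in our ShEx documents. They can also be copied from the Turtle export of any item, available at "https://artbase.rhizome.org/wiki/Special:EntityData/[Q-ID].ttl". When copying them, rewrite Turtle's '@prefix wd: <...> .' as ShExC's 'prefix wd: <...>', with no '@' and no trailing '.':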

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix wd: <https://artbase.rhizome.org/entity/>
prefix wdt: <https://artbase.rhizome.org/prop/direct/>
prefix : <https://example.org/>

Adding an empty prefix (i.e. prefix : <https://example.org/>) can help when troubleshooting the shape label formats preferred by different validation tools, as discussed below.
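The prefix lines in an item's .ttl export use Turtle syntax ('@prefix wd: <...> .'). ShExC instead expects 'prefix wd: <...>', with no '@' and no trailing '.'. The conversion can be scripted.

Below is a minimal Python sketch. It assumes the export declares its prefixes one per line in that Turtle form. The helper name turtle_prefixes_to_shexc is hypothetical and is not part of any ArtBase tooling.

```python
import re

# A Turtle prefix line, e.g. "@prefix wd: <https://artbase.rhizome.org/entity/> ."
# Group 1 is the prefix name (absent for the empty prefix ':'); group 2 is the IRI.
TURTLE_PREFIX = re.compile(r"^\s*@prefix\s+([A-Za-z][\w.-]*)?:\s*<([^>]*)>\s*\.\s*$")


def turtle_prefixes_to_shexc(ttl_text: str) -> list[str]:
    """Turn Turtle '@prefix p: <iri> .' lines into ShExC 'prefix p: <iri>' lines."""
    declarations = []
    for line in ttl_text.splitlines():
        match = TURTLE_PREFIX.match(line)
        if match:  # data triples and any other lines are skipped
            name = match.group(1) or ""
            declarations.append(f"prefix {name}: <{match.group(2)}>")
    return declarations


sample = """@prefix wd: <https://artbase.rhizome.org/entity/> .
@prefix wdt: <https://artbase.rhizome.org/prop/direct/> .
wd:Q1228 wdt:P3 wd:Q5 ."""
print("\n".join(turtle_prefixes_to_shexc(sample)))
```

The sample prints the two ShExC prefix lines and ignores the data triple. The output can be pasted above the first shape.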

Language

There is ample documentation on the many ways this language can be used. However, the official W3C Community Group specification, linked from shex.io, does not always reflect the quirks of each implementation currently available, so it's best to keep things simple. I'll use our first EntitySchema, which describes how an artwork is modeled:

 1 start=@<artwork>
 2 
 3 <artwork>{
 4 wdt:P3 [wd:Q5]
 5 // rdfs:comment "instance of [artwork]"
 6 // rdfs:label "instance of artwork";
 7 
 8 wdt:P29 @<artist> +
 9 // rdfs:comment "artist [item:person] [item:collective]"
10 // rdfs:label "artist is (person/collective)";
11 
12 wdt:P26 xsd:dateTime 
13 // rdfs:comment "inception [date]"
14 // rdfs:label "inception";
15 
16 wdt:P123 @<description> ? 
17 // rdfs:comment "description [item:summary description of], [item:Description of]"
18 // rdfs:label "description";
19 
20 wdt:P45 @<variant> ? 
21 // rdfs:comment "has variant [item: variants] [item: static files]"
22 // rdfs:label "has variant";
23 
24 wdt:P129 @<typeOfAccession> ? 
25 // rdfs:comment "type of accession [item: type of accession]"
26 // rdfs:label "type of accession";
27 
28 wdt:P85 xsd:dateTime 
29 // rdfs:comment "date of accession [date]"
30 // rdfs:label "date of accession";
31 
32 wdt:P126 [ <https://artbase.rhizome.org/wiki/File:>~ ] * 
33 // rdfs:comment "image [link to image]"
34 // rdfs:label "image";
35 
36 wdt:P30 xsd:string ?
37 // rdfs:comment "optional: serial id in the classic Artbase database"
38 // rdfs:label "artbase legacy id";
39 
40 wdt:P31 xsd:string ?
41 // rdfs:comment "optional: identifier used in Rhizome's Collective Access instance"
42 // rdfs:label "collective access legacy id";
43 
44 wdt:P52 xsd:string ?
45 // rdfs:comment "optional: legacy serial id in Collective Access"
46 // rdfs:label "ca id";
47 
48 wdt:P49 xsd:string 
49 // rdfs:comment "canonical unique string identifier"
50 // rdfs:label "slug";
51 
52 wdt:P48 xsd:string ?
53 // rdfs:comment "optional: keywords from classic Artbase"
54 // rdfs:label "artbase legacy tags";
55 }
56 
57 <artist> {
58 wdt:P3 [wd:Q6 wd:Q7] +
59 // rdfs:comment "instance of [item:person] / [item:collective]"
60 // rdfs:label "instance of person or collective" ; 
61 }
62 
63 <description> {
64 wdt:P3 [wd:Q9759 wd:Q4985]
65 // rdfs:comment "instance of description [item:summary description of], [item:Description of]"
66 // rdfs:label "instance of description";
67 }
68 
69 <variant> {
70 wdt:P3 [wd:Q1168 wd:Q11992]+
71 // rdfs:comment "instance of variants/static files"
72 // rdfs:label "instance of variant";
73 }
74 
75 <typeOfAccession> {
76 wdt:P3 [wd:Q11996]
77 // rdfs:comment "instance of [item: type of accession]"
78 // rdfs:label "instance of accession";
79 }

'START' & Shape Labels

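Line 3 opens the first shape, and '<artwork>' is its shape label. Line 1 ('start=@<artwork>') declares the start shape: the shape a node is checked against when a shape map says START instead of naming a shape. This will come in handy when we define shape maps later. Some community members have pointed out that a start declaration is not required, but it does let us work around the following issue:

Our current preferred validation implementation is rudof, and with rudof, validation will not work unless the shape label <artwork> is written as ':artwork'. Shape references follow the same pattern. A reference inside the first shape is written as either '@<example>' or '@:example', and the referenced shape is then declared below the first shape as '<example>' or ':example' respectively.

The two spellings are not interchangeable. With the empty prefix declared above, ':artwork' expands to <https://example.org/artwork>, whereas '<artwork>' is a relative IRI resolved against the schema's base IRI. A reference and the shape it points to must therefore use the same form.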
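The difference between the two spellings becomes clearer if you expand them the way a parser would. The Python sketch below is an illustration, not rudof's code. It assumes the empty prefix is bound to https://example.org/, as in our prefix block, and it uses a made-up base IRI. The base a tool actually resolves '<artwork>' against depends on the implementation and on how the schema is loaded.

```python
from urllib.parse import urljoin

# Prefix bindings from our ShEx documents (only the ones used here).
PREFIXES = {"": "https://example.org/", "wd": "https://artbase.rhizome.org/entity/"}


def expand_shape_label(label: str, base: str) -> str:
    """Expand a ShExC shape label into a full IRI."""
    if label.startswith("<") and label.endswith(">"):
        # '<artwork>' is a relative IRI, resolved against the base IRI.
        return urljoin(base, label[1:-1])
    # ':artwork' is a prefixed name, expanded with the declared prefix.
    prefix, _, local = label.partition(":")
    return PREFIXES[prefix] + local


base = "https://example.org/schemas/E1-Artwork.shex"  # hypothetical base IRI
print(expand_shape_label("<artwork>", base))  # https://example.org/schemas/artwork
print(expand_shape_label(":artwork", base))   # https://example.org/artwork
```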

Constraints

“A ShEx schema is built on node constraints and triple constraints that define what it means for a given RDF data graph to conform. An RDF triple is the three-part data structure of subject, property, and object with which all RDF data is expressed, and an RDF node is the piece of data found in the subject or object position of a triple. […] Node constraints and triple constraints are called "constraints" because they define, or ‘constrain’, the set of RDF nodes and data triples that will pass a conformance test.” (Shape Expressions (ShEx) 2.1 Primer, 2019)

In other words, constraints define how information should be expressed, connected, and controlled. They are the most direct translation of a metadata application profile into a machine-readable format, and in ShEx this translation is fairly straightforward. A basic expression is a triple constraint, made of three parts:

  • a property ("predicate" in ShExJ)
  • a value ("valueExpr" in ShExJ, e.g. a node constraint with a "datatype" or a set of "values")
  • a cardinality

To make this information easier for humans to read, Jose Emilio Labra Gayo and Andra Waagmeester have suggested the use of annotations, which I describe below.
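Because a triple constraint is just property + value + cardinality, turning the rows of a data entry guide into ShExC is largely mechanical. Here is a minimal Python sketch of that translation. TripleConstraint and shape_to_shexc are illustrative names of our own, not part of rudof or any ShEx library, and annotations are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class TripleConstraint:
    predicate: str         # an ArtBase property, e.g. "wdt:P26"
    value: str             # node constraint or shape reference, e.g. "xsd:dateTime", "@:artist"
    cardinality: str = ""  # "" stands for the default, exactly one

    def to_shexc(self) -> str:
        return " ".join(part for part in (self.predicate, self.value, self.cardinality) if part)


def shape_to_shexc(label: str, constraints: list[TripleConstraint]) -> str:
    """Join triple constraints with ';' inside a shape declaration."""
    body = " ;\n".join("  " + tc.to_shexc() for tc in constraints)
    return f"{label} {{\n{body}\n}}"


print(shape_to_shexc(":artist", [
    TripleConstraint("wdt:P3", "[wd:Q6 wd:Q7]", "+"),
    TripleConstraint("wdt:P17", "IRI", "?"),
]))
```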

Property

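In our schemas, a property is always an ArtBase-defined property: the prefix "wdt:" followed by a P-ID (e.g. wdt:P26 for inception). Because "wdt:" is Wikibase's "direct" (truthy) property namespace, these constraints apply to an item's best-ranked statement values, not to full statements with their qualifiers and references.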

Values

The desired values of a property are defined by node constraints, or by references to other shapes when the linked item must meet conditions of its own:

value | example | explanation & notes
ArtBase item | [wd:Q4985] | The property should link to an ArtBase item (wd:), which in this case is a summary (Q4985). Write a single item as a one-member value set in square brackets. A bare IRI such as wd:Q4985 is read by ShExC as a datatype, so it would never match the item itself.
Datatype | xsd:dateTime | Matches a literal with datatype xsd:dateTime. xsd:string should rarely be used here, since fields should link to items whenever possible.
Reference shape | @:derivedFrom | If the linked item must itself meet several conditions, point to another shape in the schema, as described above. If your urge is to constrain a property of the linked item rather than the item directly, you will need a reference shape.
External links | [ <http://webenact.rhizome.org/>~ ] | If a link should point outside of ArtBase, format it as [ <link/>~ ]. This fixes the root of the link while letting the rest of the path change depending on the destination.
ArtBase internal link | [ wd:~ ] | Since 'wd' is defined as <https://artbase.rhizome.org/entity/>, this shortcut matches any ArtBase entity. It is not used often, but it is helpful for 'made of' statements, where the software varies considerably depending on the acquisition.
Value set | [wd:Q12215 wd:Q11994] | The value must be wd:Q12215 or wd:Q11994; data containing either value will pass. If you want to check for only one value rather than the other, define that in a shape map.
Kind | IRI, BNode, Literal, NonLiteral | The object must be of that kind. We use this mainly for IRI, where the link target is not known in advance (often artists' own sites).
Composed with AND, OR, NOT | xsd:string OR IRI | Used especially for descriptions, where the text can be attributed to either Rhizome staff OR an artist (who is either a person or a collective).
IMPORT | IMPORT <http://schema.example/schema2> | Placed above the first shape and used as described in the linked guide. Since our existing reference shapes are very basic, this could be a good way to reuse shared shapes that may change over time. That said, it may cause problems depending on the implementation library used.
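The bracketed forms in the table match quite differently. A value set accepts only the exact IRIs listed. A trailing '~' turns an IRI into a stem that accepts any IRI beginning with it.

The Python sketch below mimics just that matching rule. It treats objects as full IRI strings and assumes the wd: expansion declared in our prefix block. It is a simplified illustration, not how any particular validator is implemented.

```python
WD = "https://artbase.rhizome.org/entity/"  # expansion of the wd: prefix


def in_value_set(obj: str, allowed: set[str]) -> bool:
    """[wd:Q12215 wd:Q11994]: the object must be exactly one of the listed IRIs."""
    return obj in allowed


def matches_stem(obj: str, stem: str) -> bool:
    """[ <http://webenact.rhizome.org/>~ ] or [ wd:~ ]: any IRI starting with the stem."""
    return obj.startswith(stem)


web_capture = {WD + "Q12215", WD + "Q11994"}
print(in_value_set(WD + "Q11994", web_capture))  # True
print(in_value_set(WD + "Q5", web_capture))      # False
# Hypothetical path under the webenact root:
print(matches_stem("http://webenact.rhizome.org/example", "http://webenact.rhizome.org/"))  # True
print(matches_stem(WD + "Q1234", WD))            # True: [ wd:~ ] accepts any ArtBase entity
```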

Cardinalities

The following regular expression conventions are used to specify cardinalities other than the default of "exactly one":

  • "+" - one or more
  • "?" - zero or one
  • "*" - zero or more
  • "{m}" - exactly m
  • "{m,n}" - at least m, no more than n
  • "{m,}" - m or more repetitions
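Since the default is exactly one, an optional property written without '?' will make every item that lacks it fail validation. The Python sketch below spells out what each cardinality allows. It is our own illustration of the rules listed above, not code taken from a ShEx implementation.

```python
import re


def cardinality_bounds(card: str) -> tuple[int, float]:
    """Return (minimum, maximum) occurrences for a ShExC cardinality; max may be infinite."""
    simple = {"": (1, 1), "?": (0, 1), "*": (0, float("inf")), "+": (1, float("inf"))}
    if card in simple:
        return simple[card]
    match = re.fullmatch(r"\{(\d+)(?:(,)(\d*))?\}", card)
    if match is None:
        raise ValueError(f"unrecognized cardinality: {card!r}")
    low = int(match.group(1))
    if match.group(2) is None:       # {m}: exactly m
        return low, low
    if match.group(3) == "":         # {m,}: m or more
        return low, float("inf")
    return low, int(match.group(3))  # {m,n}: between m and n


def count_conforms(count: int, card: str) -> bool:
    """Does a property that occurs `count` times on an item satisfy the cardinality?"""
    low, high = cardinality_bounds(card)
    return low <= count <= high


print(count_conforms(0, "?"))      # True: an optional property may be missing
print(count_conforms(2, "?"))      # False: at most one is allowed
print(count_conforms(0, ""))       # False: the default is exactly one
print(count_conforms(3, "{1,3}"))  # True
```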
Annotations

As seen above, these structured notes are called "annotations". Written with '//', they are attached to a triple constraint as data that a tool can read without affecting how validation is carried out. For now, let's stick to rdfs:comment and rdfs:label:

  • rdfs:comment is meant to guide user input if an API interprets this data. In all of our ShEx so far, the comments were copied directly from the existing data entry guide.
  • rdfs:label defines a form label for API integration down the road.

ShExC in ShExJ

To understand this language further, it helps to look at how a shape expression is parsed into ShExJ, which is ShEx in JSON format. Using rudof, you can view the ShExJ output with the command:

rudof shex -s shex/E1-Artwork.shex

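If you are only looking at the start declaration and the first constraint, the output should look something like the excerpt below.

Note that this excerpt was generated from a constraint written as 'wdt:P3 wd:Q5', so wd:Q5 appears as a "datatype". ShExC reads a bare IRI in the value position as a datatype. To require a link to the item itself, write the value as a set, 'wdt:P3 [wd:Q5]', which ShExJ represents with "values" instead.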

{
  "@context": "http://www.w3.org/ns/shex.jsonld",
  "type": "Schema",
  "start": {
    "type": "NodeConstraint",
    "datatype": "@:artwork"
  },
  "shapes": [
    {
      "type": "ShapeDecl",
      "id": "artwork",
      "abstract": false,
      "shapeExpr": {
        "type": "Shape",
        "expression": {
          "type": "EachOf",
          "expressions": [
            {
              "type": "TripleConstraint",
              "predicate": "https://artbase.rhizome.org/prop/direct/P3",
              "valueExpr": {
                "type": "NodeConstraint",
                "datatype": "https://artbase.rhizome.org/entity/Q5"
              },
              "annotations": [
                {
                  "type": "Annotation",
                  "predicate": "http://www.w3.org/2000/01/rdf-schema#comment",
                  "object": {
                    "value": "instance of [artwork]"
                  }
                },
                {
                  "type": "Annotation",
                  "predicate": "http://www.w3.org/2000/01/rdf-schema#label",
                  "object": {
                    "value": "instance of artwork"
                  }
                }
              ]
            },
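Because annotations survive into ShExJ, they can be harvested for the generated input forms mentioned in the introduction. Below is a minimal Python sketch. It assumes ShExJ structured like the excerpt above: ShapeDecl, then Shape, then EachOf, then TripleConstraint, with literal annotation values stored under "value". form_fields is a hypothetical helper, not part of rudof.

```python
from typing import Optional

RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def triple_constraints(expr: Optional[dict]) -> list[dict]:
    """Flatten EachOf/OneOf groups into a flat list of TripleConstraints."""
    if not expr:
        return []
    if expr.get("type") == "TripleConstraint":
        return [expr]
    if expr.get("type") in ("EachOf", "OneOf"):
        return [tc for sub in expr.get("expressions", []) for tc in triple_constraints(sub)]
    return []


def form_fields(shexj: dict) -> list[dict]:
    """Collect one form field per triple constraint, using its rdfs:label and rdfs:comment."""
    fields = []
    for decl in shexj.get("shapes", []):
        # The excerpt wraps each shape in a ShapeDecl; otherwise use the entry itself.
        shape = decl.get("shapeExpr", decl)
        if not isinstance(shape, dict):
            continue  # e.g. a bare shape reference
        for tc in triple_constraints(shape.get("expression")):
            notes = {}
            for note in tc.get("annotations", []):
                obj = note["object"]
                # Literal annotation values arrive as {"value": ...}; IRIs as plain strings.
                notes[note["predicate"]] = obj["value"] if isinstance(obj, dict) else obj
            fields.append({
                "shape": decl.get("id"),
                "property": tc["predicate"],
                "label": notes.get(RDFS + "label"),
                "help": notes.get(RDFS + "comment"),
            })
    return fields
```

In practice, the input would be rudof's ShExJ output loaded with json.load. The result is one dictionary per constraint, which a form generator could turn into a labeled input with help text.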

Shape Maps

Shape maps tell a validator which nodes to check against which shapes. They come in two forms:

  • a query map, which selects nodes with a SPARQL query
  • a fixed map, which pairs each node with a shape one-to-one

ArtBase's public SPARQL endpoint is currently tricky and does not integrate well with the existing ShEx implementations that support query maps. The syntax of query maps also depends on the validator used. As a result, only fixed maps are possible in this instance, and they are very simple. Here is the basic format:

Examples | Kind | Explanations
wd:Q110@START | Fixed map | <ttl node>@START, or more generally <ttl node>@<shape label>
rudof validate megattl/artwork/Q1228-monochrome-landscapes.ttl --schema shex/E1-Artwork.shex --node wd:Q1228 --shape-label :artwork | Fixed map formatted for the rudof CLI | The command line does not handle bracketed shape labels well, so troubleshooting with ':label' has helped. The results are not yet consistent, but promising.
SPARQL '''SELECT ?item ?itemLabel WHERE {
  ?item wdt:P3 wd:Q1168 .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en"
  }
} LIMIT 10'''@START | Query map formatted for the Simple Online Validator | A SPARQL query that loads items whose "instance of" is "Variant" (wd:Q1168). This map loads ten items and validates whether they conform to the ShEx schema. It is not yet implemented in rudof, and only works with Wikidata in the ShEx2 Simple Online Validator.

In our ShEx documents, query maps are kept as "#" comments beneath the prefix declarations.
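Since fixed maps are what currently works for ArtBase, they can be generated in bulk from a list of Q-IDs. Below is a minimal Python sketch. It assumes comma-separated node@shape associations, which is the ShapeMap convention for listing several pairs. It also assumes the validator resolves the wd: prefix from the schema or data. fixed_shape_map is a hypothetical name.

```python
import re


def fixed_shape_map(qids: list[str], shape_label: str = "START") -> str:
    """Pair each ArtBase item with one shape label, e.g. 'wd:Q1228@:artwork'."""
    for qid in qids:
        if not re.fullmatch(r"Q\d+", qid):
            raise ValueError(f"not an item ID: {qid!r}")
    return ",".join(f"wd:{qid}@{shape_label}" for qid in qids)


print(fixed_shape_map(["Q110"]))                       # wd:Q110@START
print(fixed_shape_map(["Q1228", "Q110"], ":artwork"))  # wd:Q1228@:artwork,wd:Q110@:artwork
```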

Adapting Rhizome's Data Model for ShEx Validation

This is just a selection of ShEx examples adapted from an internal data entry guide. More are located here.

Artwork

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix wd: <https://artbase.rhizome.org/entity/>
prefix wdt: <https://artbase.rhizome.org/prop/direct/>
prefix : <https://example.org/>

#SPARQL '''SELECT ?item ?itemLabel WHERE {
#  ?item wdt:P3 wd:Q5 .
#  SERVICE wikibase:label {
#    bd:serviceParam wikibase:language "en"
#  }
#} LIMIT 10'''@START

start=@<artwork>

<artwork>{
wdt:P3 [wd:Q5]
// rdfs:comment "instance of [artwork]"
// rdfs:label "instance of artwork";

wdt:P29 @<artist> +
// rdfs:comment "artist [item:person] [item:collective]"
// rdfs:label "artist is (person/collective)";

wdt:P26 xsd:dateTime
// rdfs:comment "inception [date]"
// rdfs:label "inception";

wdt:P123 @<description> ?
// rdfs:comment "description [item:summary description of], [item:Description of]"
// rdfs:label "description";

wdt:P45 @<variant> ?
// rdfs:comment "has variant [item: variants] [item: static files]"
// rdfs:label "has variant";

wdt:P129 @<typeOfAccession> ?
// rdfs:comment "type of accession [item: type of accession]"
// rdfs:label "type of accession";

wdt:P85 xsd:dateTime
// rdfs:comment "date of accession [date]"
// rdfs:label "date of accession";

wdt:P126 [ <https://artbase.rhizome.org/wiki/File:>~ ] *
// rdfs:comment "image [link to image]"
// rdfs:label "image";

wdt:P30 xsd:string ?
// rdfs:comment "optional: serial id in the classic Artbase database"
// rdfs:label "artbase legacy id";

wdt:P31 xsd:string ?
// rdfs:comment "optional: identifier used in Rhizome's Collective Access instance"
// rdfs:label "collective access legacy id";

wdt:P52 xsd:string ?
// rdfs:comment "optional: legacy serial id in Collective Access"
// rdfs:label "ca id";

wdt:P49 xsd:string
// rdfs:comment "canonical unique string identifier"
// rdfs:label "slug";

wdt:P48 xsd:string ?
// rdfs:comment "optional: keywords from classic Artbase"
// rdfs:label "artbase legacy tags";
}

<artist> {
wdt:P3 [wd:Q6 wd:Q7] +
// rdfs:comment "instance of [item:person] / [item:collective]"
// rdfs:label "instance of person or collective" ;
}

<description> {
wdt:P3 [wd:Q9759 wd:Q4985]
// rdfs:comment "instance of description [item:summary description of], [item:Description of]"
// rdfs:label "instance of description";
}

<variant> {
wdt:P3 [wd:Q1168 wd:Q11992]+
// rdfs:comment "instance of variants/static files"
// rdfs:label "instance of variant";
}

<typeOfAccession> {
wdt:P3 [wd:Q11996]
// rdfs:comment "instance of [item: type of accession]"
// rdfs:label "instance of accession";
}
E1: Artwork
property | value | cardinality | rdfs:comment | rdfs:label | reference shape
wdt:P3 | [wd:Q5] | | instance of [artwork] | instance of artwork
wdt:P29 | @:artist | + | artist [item:person] [item:collective] | artist is (person/collective) | :artist, see below
    property | value | cardinality | rdfs:comment | rdfs:label
    wdt:P3 | [wd:Q6 wd:Q7] | + | instance of [item:person] / [item:collective] | instance of person or collective
wdt:P26 | xsd:dateTime | | inception [date] | inception
wdt:P123 | @:description | ? | description [item:summary description of], [item:Description of] | description | :description, see below
    property | value | cardinality | rdfs:comment | rdfs:label
    wdt:P3 | [wd:Q9759 wd:Q4985] | + | instance of description [item:summary description of], [item:Description of] | instance of description
wdt:P45 | @:variant | ? | has variant [item: variants] [item: static files] | has variant | :variant, see below
    property | value | cardinality | rdfs:comment | rdfs:label
    wdt:P3 | [wd:Q1168 wd:Q11992] | | instance of variants/static files | instance of variant
wdt:P129 | @:typeOfAccession | ? | type of accession [item: type of accession] | type of accession | :typeOfAccession, see below
    property | value | cardinality | rdfs:comment | rdfs:label
    wdt:P3 | [wd:Q11996] | | instance of [item: type of accession] | instance of accession
wdt:P85 | xsd:dateTime | | date of accession [date] | date of accession
wdt:P126 | [ <https://artbase.rhizome.org/wiki/File:>~ ] | * | image [link to image] | image
wdt:P30 | xsd:string | ? | optional: serial id in the classic Artbase database | artbase legacy id
wdt:P31 | xsd:string | ? | optional: identifier used in Rhizome's Collective Access instance | collective access legacy id
wdt:P52 | xsd:string | ? | optional: legacy serial id in Collective Access | ca id
wdt:P49 | xsd:string | | canonical unique string identifier | slug
wdt:P48 | xsd:string | ? | optional: keywords from classic Artbase | artbase legacy tags
(An empty cardinality cell means the default, exactly one.)

Artist

E2: Artist
property | value | cardinality | rdfs:comment | rdfs:label | reference shape
wdt:P3 | [wd:Q6 wd:Q7] | + | instance of [item:person] [item:collective] | instance of person/collective
wdt:P135 | xsd:string | | administrative property used to generate artists' lists sorted by 'lastname, firstname' | sort by name
wdt:P60 | @:artwork | | artwork [item: artwork main], [item: artwork main] | artwork | :artwork, see below
    property | value | cardinality | rdfs:comment | rdfs:label
    wdt:P3 | [wd:Q5] | + | instance of artwork | instance of artwork
wdt:P17 | IRI | ? | optional: official website [URL: of the website] | official website
wdt:P2 | IRI | ? | optional: (URLs only) used to link two concepts, indicating a high degree of confidence that the concepts can be used interchangeably (for Wikidata items primarily) | exact match
wdt:P52 | xsd:string | ? | optional: legacy serial id in Collective Access | ca id
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix wd: <https://artbase.rhizome.org/entity/>
prefix wdt: <https://artbase.rhizome.org/prop/direct/>
prefix : <https://example.org/>

#SPARQL '''SELECT ?item ?itemLabel WHERE {
#  ?item wdt:P3 wd:Q6 .
#  SERVICE wikibase:label {
#    bd:serviceParam wikibase:language "en"
#  }
#} LIMIT 10'''@START

start=@<artist>

<artist> {
wdt:P3 [wd:Q6 wd:Q7]+
// rdfs:comment "instance of [item:person] [item:collective]"
// rdfs:label "instance of person/collective"
;

wdt:P135 xsd:string
// rdfs:comment "administrative property used to generate artists' lists sorted by 'lastname, firstname'"
// rdfs:label "sort by name" ;

wdt:P60 @<artwork>
// rdfs:comment "artwork [item: artwork main], [item: artwork main]"
// rdfs:label "artwork" ;

wdt:P17 IRI ?
// rdfs:comment "optional: official website [URL: of the website]"
// rdfs:label "official website" ;

wdt:P2 IRI ?
// rdfs:comment "optional: (URLs only) used to link two concepts, indicating a high degree of confidence that the concepts can be used interchangeably (for Wikidata items primarily)"
// rdfs:label "exact match [wikidata item]" ;

wdt:P52 xsd:string ?
// rdfs:comment "optional: legacy serial id in Collective Access"
// rdfs:label "ca id" ;

}

<artwork> {
  wdt:P3 [wd:Q5]
// rdfs:comment "instance of artwork"
// rdfs:label "instance of artwork" ;
}

Description

E3: Description of [artwork]
property | value | cardinality | rdfs:comment | rdfs:label | reference shape
wdt:P3 | [wd:Q9759 wd:Q4985] | + | instance of [item:summary] | instance of summary and/or description
wdt:P124 | @:descriptionOf | | description of [item:Artwork Main] | description of artwork | :descriptionOf, see below
    property | value | cardinality | rdfs:comment | rdfs:label
    wdt:P3 | [wd:Q5] | | instance of [item:Artwork Main] | description of artwork
wdt:P127 | [wd:Q11967] OR @:artist | | attributed to Rhizome Staff or artist | attributed to Rhizome Staff or artist | :artist, see below
    property | value | cardinality | rdfs:comment | rdfs:label
    wdt:P3 | [wd:Q6 wd:Q7] | + | instance of [item:person] / [item:collective] | instance of person or collective
wdt:P26 | xsd:dateTime | | inception [date] | inception
wdt:P102 | @:derivedFrom | ? | optional: derived from [item: Summary description by the artist] | derived from | :derivedFrom, see below
    property | value | cardinality | rdfs:comment | rdfs:label
    wdt:P3 | [wd:Q9759 wd:Q4985] | | instance of [summary description by the artist] or [description] | derived from summary or description by artist/s
wdt:P133 | [wd:Q11967] | ? | optional: copyedited by [item:Rhizome staff] | copyedited by
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix wd: <https://artbase.rhizome.org/entity/>
prefix wdt: <https://artbase.rhizome.org/prop/direct/>
prefix : <https://example.org/>

#SPARQL '''SELECT ?item ?itemLabel WHERE {
#  ?item wdt:P3 wd:Q9759 .
#  SERVICE wikibase:label {
#    bd:serviceParam wikibase:language "en"
#  }
#} LIMIT 10'''@START

start=@<description>

<description>{

wdt:P3 [wd:Q9759 wd:Q4985]+
// rdfs:comment "instance of [item:summary] [item:description]"
// rdfs:label "instance of summary and/or description" ;

wdt:P124 @<descriptionOf>
// rdfs:comment "description of [item:Artwork Main]"
// rdfs:label "description of artwork" ;

wdt:P127 [wd:Q11967] OR @<artist>
// rdfs:comment "attributed to [item:person] [item:collective] [item: Rhizome Staff]"
// rdfs:label "attributed to artist or Rhizome Staff" ;

wdt:P26 xsd:dateTime
// rdfs:comment "inception [date]"
// rdfs:label "inception" ;

wdt:P102 @<derivedFrom> ?
// rdfs:comment "optional: derived from [item: summary description by the artist]"
// rdfs:label "derived from summary by artist/s" ;

wdt:P133 [wd:Q11967] ?
// rdfs:comment "optional: copyedited by [item:Rhizome staff]"
// rdfs:label "copyedited by" ;
}

<descriptionOf> {
wdt:P3 [wd:Q5]
// rdfs:comment "instance of [item:Artwork Main]"
// rdfs:label "description of artwork" ;
}

<artist> {
wdt:P3 [wd:Q6 wd:Q7]
// rdfs:comment "instance of [item:person] [item:collective]"
// rdfs:label "instance of person or collective" ;
}

<derivedFrom> {
wdt:P3 [wd:Q4985 wd:Q9759]
// rdfs:comment "instance of [summary description by the artist] or [description]"
// rdfs:label "derived from summary or description by artist/s" ;
}

Artifact

E4: Artifact - Static Files
property | value | cardinality | rdfs:comment | rdfs:label | reference shape
wdt:P3 | [wd:Q12215 wd:Q11992] | | Instance of [item: artifact]/ [item:static files] | Instance of
wdt:P141 | @:download | | artifact of [item:[download artifact]] | artifact of | :download, see below
    property | value | cardinality | rdfs:comment | rdfs:label
    wdt:P3 | [wd:Q12081] | + | instance of [item:[download artifact]] | instance of download
wdt:P26 | xsd:dateTime | | inception [date] | inception
wdt:P129 | @:typeOfAccession | | type of accession [item: (type of accession)] | type of accession | :typeOfAccession, see below
    property | value | cardinality | rdfs:comment | rdfs:label
    wdt:P3 | [wd:Q11996] | + | instance of [item: (type of accession)] | instance of accession type
wdt:P85 | xsd:dateTime | | date of accession [date] | date of accession
wdt:P118 | [wd:Q11967] OR @:artist | | associated with [item:Rhizome Staff]/[item: artist name] | associated with | :artist, see below
    property | value | cardinality | rdfs:comment | rdfs:label
    wdt:P3 | [wd:Q6 wd:Q7] | + | instance of [item: person OR collective] | instance of artist
wdt:P81 | [ wd:~ ] | * | made of [item: (technical item)s] | made of
wdt:P117 | [wd:Q12139] | | generated by [item:file copy] | generated by
wdt:P102 | IRI | * | derived from [item:artwork outside link] | derived from
wdt:P46 | [ <http://webenact.rhizome.org/>~ ] | * | access URL [http://webenact.rhizome.org/…] | access URL
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix wd: <https://artbase.rhizome.org/entity/>
prefix wdt: <https://artbase.rhizome.org/prop/direct/>
prefix : <https://example.org/>

start=@<artifact>

<artifact> {
    wdt:P3 [wd:Q12215 wd:Q11992]
    // rdfs:comment "Instance of [item: artifact]/ [item:static files]"
    // rdfs:label "Instance of" ;
    wdt:P141 @<download>
    // rdfs:comment "artifact of [item:[download artifact]]"
    // rdfs:label "artifact of" ;
    wdt:P26 xsd:dateTime
    // rdfs:comment "inception [date]"
    // rdfs:label "inception" ;
    wdt:P129 @<typeOfAccession>
    // rdfs:comment "type of accession [item: (type of accession)]"
    // rdfs:label "type of accession" ;
    wdt:P85 xsd:dateTime
    // rdfs:comment "date of accession [date]"
    // rdfs:label "date of accession" ;
    wdt:P118 [wd:Q11967] OR @<artist>
    // rdfs:comment "associated with [item:Rhizome Staff] OR [item: artist name]"
    // rdfs:label "associated with" ;
    wdt:P81 [ wd:~ ] *
    // rdfs:comment "made of [item: (technical item)s]"
    // rdfs:label "made of" ;
    wdt:P117 [wd:Q12139]
    // rdfs:comment "generated by [item:file copy]"
    // rdfs:label "generated by" ;
    wdt:P102 IRI *
    // rdfs:comment "derived from [item:artwork outside link]"
    // rdfs:label "derived from" ;
    wdt:P46 [ <http://webenact.rhizome.org/>~ ] *
    // rdfs:comment "access URL [http://webenact.rhizome.org/…]"
    // rdfs:label "access URL" ;
}

<download> {
    wdt:P3 [wd:Q12081]
    // rdfs:comment "Instance of [item:[download artifact]]"
    // rdfs:label "Instance of download" ;
}

<typeOfAccession> {
    wdt:P3 [wd:Q11996]
    // rdfs:comment "instance of [item: (type of accession)]"
    // rdfs:label "instance of accession" ;
}

<artist> {
    wdt:P3 [wd:Q6 wd:Q7]
    // rdfs:comment "instance of person or collective"
    // rdfs:label "instance of person or collective" ;
}

Variant

E20: Variant - Web Archive
property | value | cardinality | rdfs:comment | rdfs:label | reference shape
wdt:P3 | [wd:Q1168] | | Instance of [item: variant] | Instance of
wdt:P56 | @:artwork | | variant of [item:Artwork Main] | variant of | :artwork, see below
    property | value | cardinality | rdfs:comment | rdfs:label
    wdt:P3 | [wd:Q5] | | Instance of [item:artwork main] | instance of artwork
wdt:P26 | xsd:dateTime | | inception [date] | inception
wdt:P118 | [wd:Q11967] OR @:artist | | associated with [item:Rhizome Staff] OR [item: artist name] | associated with | :artist, see below
    property | value | cardinality | rdfs:comment | rdfs:label
    wdt:P3 | [wd:Q6 wd:Q7] | | instance of person or collective | instance of person or collective
wdt:P117 | [wd:Q4983] | | generated by [item:web capture] | generated by
wdt:P46 | [ <http://webenact.rhizome.org/>~ ] | * | access URL [http://webenact.rhizome.org/…] | access URL
wdt:P139 | @:webCaptureArtifact | | artifact [item:[web capture artifact]] | artifact | :webCaptureArtifact, see below
    property | value | cardinality | rdfs:comment | rdfs:label
    wdt:P3 | [wd:Q12215 wd:Q11994] | | instance of [item:[web capture artifact]] | instance of web capture artifact
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix wd: <https://artbase.rhizome.org/entity/>
prefix wdt: <https://artbase.rhizome.org/prop/direct/>
prefix : <https://example.org/>

start=@<variant>

<variant> {
    wdt:P3 [wd:Q1168]
    // rdfs:comment "Instance of [item: variant]"
    // rdfs:label "Instance of" ;
    wdt:P56 @<artwork>
    // rdfs:comment "variant of [item:Artwork Main]"
    // rdfs:label "variant of" ;
    wdt:P26 xsd:dateTime
    // rdfs:comment "inception [date]"
    // rdfs:label "inception" ;
    wdt:P118 [wd:Q11967] OR @<artist>
    // rdfs:comment "associated with [item:Rhizome Staff] OR [item: artist name]"
    // rdfs:label "associated with" ;
    wdt:P117 [wd:Q4983]
    // rdfs:comment "generated by [item:web capture]"
    // rdfs:label "generated by" ;
    wdt:P46 [ <http://webenact.rhizome.org/>~ ] *
    // rdfs:comment "access URL [http://webenact.rhizome.org/…]"
    // rdfs:label "access URL" ;
    wdt:P139 @<webCaptureArtifact>
    // rdfs:comment "artifact [item:[web capture artifact]]"
    // rdfs:label "artifact" ;
}

<artwork> {
    wdt:P3 [wd:Q5]
    // rdfs:comment "Instance of [item:artwork main]"
    // rdfs:label "Instance of artwork" ;
}

<artist> {
    wdt:P3 [wd:Q6 wd:Q7]
    // rdfs:comment "instance of person or collective"
    // rdfs:label "instance of person or collective" ;
}

<webCaptureArtifact> {
    wdt:P3 [wd:Q12215 wd:Q11994]
    // rdfs:comment "instance of [item:[web capture artifact]]"
    // rdfs:label "instance of web capture artifact" ;
}

Current Workflow

  1. Data entry guide → ShExC (human written) → rudof analysis (ShExJ) → visualization in RDF Shape = ShEx
    • I didn't mention the visualization option above since it's not crucial, but it can help make these relationships concrete when reading ShExJ is confusing.
  2. Compare ShEx to a good model example in the archive. Does it reflect the desired form? Why or why not?
  3. Data → megattl (Jupyter Notebook, has issues) = test data
    • As far as I can tell, no ShEx implementation can validate linked items beyond the RDF it is given. Checking a reference such as '@<artist>' requires the linked item's own triples to be present in the data, so a single item's .ttl export is not enough, which is frustrating. To work around this, the megattl.ipynb notebook concatenates linked items into one "megattl" file for testing. The remaining problem is that it assigns random namespace prefixes to the concatenated items and does not communicate the relationships between the concatenated data sets in a sensible way.
  4. <node being tested, i.e. the Q-ID of the base entity the megattl is derived from>@START = fixed shape map
  5. rudof validate (tinkering, example below)
    • rudof validate megattl/artwork/Q1228-monochrome-landscapes.ttl --schema shex/E1-Artwork.shex --node wd:Q1228 --shape-label :artwork
  6. Further action to be determined (!!)
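Steps 3 to 5 can be sketched in Python. This sketch swaps Turtle for N-Triples, assuming ArtBase's Special:EntityData also serves an .nt format, as Wikibase normally does.

N-Triples writes every IRI in full, so concatenated files have no prefixes to clash or be renamed. Because N-Triples is a subset of Turtle, the merged text can still be saved as a .ttl file for rudof.

This is an untested alternative to the megattl notebook, not the notebook's method. It assumes the exports contain no blank nodes, whose labels could collide across files. All function names are hypothetical, and downloading the files is left out.

```python
import re
import shlex
from pathlib import PurePath

ENTITY = "https://artbase.rhizome.org/entity/"  # expansion of the wd: prefix
# An N-Triples line whose object is an ArtBase item: "<subject> <predicate> <...entity/Q123> ."
OBJECT_ITEM = re.compile(r"<" + re.escape(ENTITY) + r"(Q\d+)>\s*\.\s*$")


def linked_qids(ntriples: str) -> set[str]:
    """Step 3: Q-IDs used as objects, i.e. the items one hop away that also need fetching."""
    return {m.group(1) for line in ntriples.splitlines() if (m := OBJECT_ITEM.search(line))}


def merge_ntriples(*documents: str) -> str:
    """Step 3: concatenate N-Triples, skipping blank lines, comments and duplicate triples."""
    seen, merged = set(), []
    for doc in documents:
        for line in doc.splitlines():
            line = line.strip()
            if line and not line.startswith("#") and line not in seen:
                seen.add(line)
                merged.append(line)
    return "\n".join(merged) + "\n"


def rudof_command(ttl_path: str, schema_path: str, shape_label: str = ":artwork") -> list[str]:
    """Steps 4-5: take the node from the file name's leading Q-ID and build the rudof call."""
    match = re.match(r"(Q\d+)-", PurePath(ttl_path).name)
    if match is None:
        raise ValueError(f"no leading Q-ID in {ttl_path!r}")
    return ["rudof", "validate", ttl_path,
            "--schema", schema_path,
            "--node", f"wd:{match.group(1)}",
            "--shape-label", shape_label]


print(shlex.join(rudof_command("megattl/artwork/Q1228-monochrome-landscapes.ttl",
                               "shex/E1-Artwork.shex")))
```

The printed command matches the rudof example in step 5. To run it, pass the list to subprocess.run(..., check=True), which requires rudof on the PATH.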

Implementation & Toolbox

Choosing tools is the most difficult discussion at the moment. The Simple Online Validator is not currently integrated with ArtBase, although it could be through configuration. We have tested a handful of libraries, and below are our current preferred options.

Name | Type | Description | Evaluation
rudof | Validation (command line) | 'This repo contains an RDF data shapes library implemented in Rust. The implementation supports ShEx, SHACL, DCTap and conversions between different RDF data modeling formalisms.' | Usable, and preferred for this stage of work. Requires a megattl to validate how linked item pages relate to each other.
ShEx2 Simple Online Validator | Validation (web-based app) | From the GitHub source: 'shex.js javascript implementation of Shape Expressions' | Effective, but requires either a public SPARQL endpoint, a 'megattl', or configuration. It is also a bit buggy on the UX side.
MediaWiki Configuration | Configuration | Configures shex.js to embed on an item page in Wikibase or Wikidata. | May work depending on the institution, but not used in this instance.
vscode-shex-extensions | Extension | ShEx extensions for Visual Studio Code. | Effective and easy to set up for editing ShEx in VS Code. The related extension, YASHE, conflicts with some of rudof's syntax preferences and is thus not helpful in this case (although it seems good for Wikidata specifically).
RDF Shape | Validator, Shape Extraction, Visualization | 'RDFShape offers an RDF playground which can be used to teach RDF related technologies. Paper describing RDFShape: RDFShape: An RDF playground based on Shapes, Jose Emilio Labra Gayo, Daniel Fernández Álvarez, Herminio García González, Demo presented at International Semantic Web Conference, Monterey, California - 2018' | This does a lot and is helpful in developing an implementation strategy, but it has the same validation issues as ShEx2.

Conclusion

Although the implementation of ShEx for Rhizome's ArtBase is still in development, it is worthwhile to reflect on the project's outcomes at this stage. The extensive knowledge generated by community efforts around Shape Expressions made it possible to translate the original data entry guide in detail into a multidimensional, machine-readable, and user-friendly format that guides data curation. While implementation tools are still maturing, there appears to be a growing need for a centralized, flexible option for data validation in Wikibase instances, and there is plenty of community support.

Works Consulted & Resources

[via shex.io]

[json-ld] - Manu Sporny; Gregg Kellogg; Markus Lanthaler. W3C. 16 January 2014. W3C Recommendation. URL: https://www.w3.org/TR/json-ld/

[rdf-schema] - Dan Brickley; Ramanathan Guha. W3C. 25 February 2014. W3C Recommendation. URL: https://www.w3.org/TR/rdf-schema/

[rdf11-primer] - Guus Schreiber; Yves Raimond. W3C. 24 June 2014. W3C Note. URL: https://www.w3.org/TR/rdf11-primer/

[shex-semantics] - Eric Prud'hommeaux; Iovka Boneva; Jose Labra Gayo; Gregg Kellogg. URL: http://shex.io/shex-semantics/

[turtle] - Eric Prud'hommeaux; Gavin Carothers. W3C. 25 February 2014. W3C Recommendation. URL: https://www.w3.org/TR/turtle/

[shape-map] - Eric Prud'hommeaux; Thomas Baker. URL: http://shex.io/shape-map/

[shex-vocab] - Gregg Kellogg. URL: http://www.w3.org/ns/shex#

[via scholia & elsewhere]

Gayo, J. E. L. (2022, August 4). WShEx: A language to describe and validate Wikibase entities. arXiv.org. https://arxiv.org/abs/2208.02697v1

Thornton, K., & Seals-Nutt, K. (2022). A Digital Preservation Wikibase. https://doi.org/10.17605/OSF.IO/XKW89

Samuel, J. (2021). ShExStatements: Simplifying Shape Expressions for Wikidata. Companion Proceedings of the Web Conference 2021, 610–615. https://doi.org/10.1145/3442442.3452349

Thornton, K., Solbrig, H., Stupp, G. S., Labra Gayo, J. E., Mietchen, D., Prud’hommeaux, E., & Waagmeester, A. (2019). Using Shape Expressions (ShEx) to Share RDF Data Models and to Guide Curation with Rigorous Validation. In P. Hitzler, M. Fernández, K. Janowicz, A. Zaveri, A. J. G. Gray, V. Lopez, A. Haller, & K. Hammar (Eds.), The Semantic Web (pp. 606–620). Springer International Publishing. https://doi.org/10.1007/978-3-030-21348-0_39

Waagmeester, A., Thornton, K., Werkmeister, L., & Stupp, G. (Directors). (2017). Using Shape Expressions for data quality and consistency in Wikidata [Video recording]. https://media.ccc.de/v/wikidatacon2017-10032-using_shape_expressions_for_data_quality_and_consistency_in_wikidata

Gayo, J. E. L., Prud’hommeaux, E., Boneva, I., & Kontokostas, D. (2018a). Comparing ShEx and SHACL. In J. E. L. Gayo, E. Prud’hommeaux, I. Boneva, & D. Kontokostas (Eds.), Validating RDF Data (pp. 233–266). Springer International Publishing. https://doi.org/10.1007/978-3-031-79478-0_7

Gayo, J. E. L., Prud’hommeaux, E., Boneva, I., & Kontokostas, D. (2018b). Shape Expressions. In J. E. L. Gayo, E. Prud’hommeaux, I. Boneva, & D. Kontokostas (Eds.), Validating RDF Data (pp. 55–117). Springer International Publishing. https://doi.org/10.1007/978-3-031-79478-0_4

Gayo, J. E. L., Préstamo, Á. I., Fernández, D. M., & Arnaud, M.-A. (2024). rudof: A Rust Library for handling RDF data models and Shapes. International Semantic Web Conference, ISWC24, Posters and Demos. Retrieved September 8, 2024, from https://labra.weso.es/publication/2024_rudof_demo/

[on Rhizome's ArtBase and data model]

Espenschied, D. (2021). Basics of Born-Digital Preservation. Rhizome Almanac. Retrieved September 8, 2024, from https://almanac.rhizome.org/pages/preservation-basics

Fauconnier, S. (2018, September 6). Many faces of Wikibase: Rhizome’s archive of born-digital art and digital preservation. Wikimedia Foundation. https://wikimediafoundation.org/news/2018/09/06/rhizome-wikibase/

Ma, X., Espenschied, D., & Moulds, L. J. (2022). Access Quality Metrics for Net Art. In the Proceedings of the 18th International Conference on Digital Preservation. https://maximalmargin.com/96d6a4f016ec0087cd81833ad10c712b/iPres22_Short-Paper_Access-Quality-Metrics-for-Net-Art_20220915.pdf

Rossenova, L. (2020). ArtBase Archive–Context and History: Discovery phase and user research 2017–2019. Rhizome. https://scholar.google.com/scholar?cluster=17112232127183010524&hl=en&oi=scholarr

Rossenova, L. (2017). Presentation and contextualisation in the online archive of internet art. Electronic Visualisation and the Arts (EVA 2017). https://www.scienceopen.com/hosted-document?doi=10.14236/ewic/EVA2017.18

[conversations online]

Wikidata talk:Schemas

Telegram channel for Wikidata EntitySchemas