User:Xavi Danto
About Me
Xavi Danto is a current MSLIS candidate at Pratt Institute and a recent intern who worked on Shape Expressions for Rhizome.
Introduction: ShEx for Rhizome's ArtBase
Shape expressions (ShEx, stored in the EntitySchema namespace in Wikibase) provide a mechanism for defining and validating the structure of RDF data in a user-friendly format. They let data curators specify patterns that data must conform to, enabling both validation and transformation of data structures. Shape expressions can describe many kinds of constraints, such as required properties, datatypes, allowed values, and links to other nodes. This makes them particularly useful for APIs, data interchange formats, and semantic web applications, where data integrity and compliance are crucial. By providing a clear framework for data modeling, shape expressions improve interoperability and support better data management across different systems.
For Rhizome, 23 EntitySchemas have been defined for ArtBase, based on an existing data entry form. These schemas also provide a starting point for further integration goals:
- Improving manual item editing with generated input forms.
- Providing feedback for item conformity to shape expressions.
- Developing an API for automated checks against ShEx.
- Generating visual data model representations.
- Integrating with query interfaces for filtered results.
In this guide, I will explain how to write and format a ShEx schema for the EntitySchema namespace in a Wikibase instance. To apply this guidance, I have selected a few examples, each translated into a table followed by the corresponding ShEx code. Then I will outline how to make sure that ShEx schemas are well formed, along with an in-progress workflow for data validation. Lastly, I will reflect on the aspects of this project that are still in development, and on how the progress so far has met the stated goal of data validation for Rhizome's ArtBase.
How to ShEx: A Brief Guide
Namespaces
This is the easiest part. Below are the namespaces actively used in our ShEx documents. They can also be copied and pasted directly from any item's data, available at "https://artbase.rhizome.org/wiki/Special:EntityData/[Q-ID].ttl":
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix wd: <https://artbase.rhizome.org/entity/>
prefix wdt: <https://artbase.rhizome.org/prop/direct/>
prefix : <https://example.org/>
Adding an empty prefix (i.e. prefix : <https://example.org/>) can help when troubleshooting how a particular validation tool expects shape labels to be written, as discussed in the section on 'START' and shape labels below.
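To show what these declarations actually do, here is a minimal Python sketch of how a prefixed name expands into a full IRI. The `expand` function is hypothetical and deliberately simplified. It is not part of any ShEx tool, but it explains why `wdt:P3` shows up as `https://artbase.rhizome.org/prop/direct/P3` in ShExJ output, and why `:artwork` becomes an absolute IRI once the empty prefix is declared.

```python
# Prefix table copied from the namespace declarations above.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "wd": "https://artbase.rhizome.org/entity/",
    "wdt": "https://artbase.rhizome.org/prop/direct/",
    "": "https://example.org/",  # empty prefix, used for shape labels such as :artwork
}


def expand(pname: str) -> str:
    """Expand a prefixed name such as 'wdt:P3' into a full IRI.

    Simplified: splits at the first ':' and ignores escaped characters and
    other corner cases of the Turtle/ShExC grammar.
    """
    prefix, sep, local = pname.partition(":")
    if not sep:
        raise ValueError(f"not a prefixed name: {pname!r}")
    if prefix not in PREFIXES:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return PREFIXES[prefix] + local


if __name__ == "__main__":
    for name in ("wdt:P3", "wd:Q5", ":artwork"):
        print(f"{name:10} -> {expand(name)}")
```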
Language
There is ample documentation on the many ways this language can be used, but the official W3C Community Group specifications, linked from shex.io, do not always reflect the quirks of each implementation currently available. For that reason, it is best to keep things simple. I'll use our first EntitySchema, which describes how an artwork is modeled:
1 start=@<artwork>
2
3 <artwork>{
4 wdt:P3 [wd:Q5]
5 // rdfs:comment "instance of [artwork]"
6 // rdfs:label "instance of artwork";
7
8 wdt:P29 @<artist> +
9 // rdfs:comment "artist [item:person] [item:collective]"
10 // rdfs:label "artist is (person/collective)";
11
12 wdt:P26 xsd:dateTime
13 // rdfs:comment "inception [date]"
14 // rdfs:label "inception";
15
16 wdt:P123 @<description> ?
17 // rdfs:comment "description [item:summary description of], [item:Description of]"
18 // rdfs:label "description";
19
20 wdt:P45 @<variant> ?
21 // rdfs:comment "has variant [item: variants] [item: static files]"
22 // rdfs:label "has variant";
23
24 wdt:P129 @<typeOfAccession> ?
25 // rdfs:comment "type of accession [item: type of accession]"
26 // rdfs:label "type of accession";
27
28 wdt:P85 xsd:dateTime
29 // rdfs:comment "date of accession [date]"
30 // rdfs:label "date of accession";
31
32 wdt:P126 [ <https://artbase.rhizome.org/wiki/File:>~ ] *
33 // rdfs:comment "image [link to image]"
34 // rdfs:label "image";
35
36 wdt:P30 xsd:string ?
37 // rdfs:comment "optional: serial id in the classic Artbase database"
38 // rdfs:label "artbase legacy id";
39
40 wdt:P31 xsd:string ?
41 // rdfs:comment "optional: identifier used in Rhizome's Collective Access instance"
42 // rdfs:label "collective access legacy id";
43
44 wdt:P52 xsd:string ?
45 // rdfs:comment "optional: legacy serial id in Collective Access"
46 // rdfs:label "ca id";
47
48 wdt:P49 xsd:string
49 // rdfs:comment "canonical unique string identifier"
50 // rdfs:label "slug";
51
52 wdt:P48 xsd:string ?
53 // rdfs:comment "optional: keywords from classic Artbase"
54 // rdfs:label "artbase legacy tags";
55 }
56
57 <artist> {
58 wdt:P3 [wd:Q6 wd:Q7] +
59 // rdfs:comment "instance of [item:person] / [item:collective]"
60 // rdfs:label "instance of person or collective" ;
61 }
62
63 <description> {
64 wdt:P3 [wd:Q9759 wd:Q4985]
65 // rdfs:comment "instance of description [item:summary description of], [item:Description of]"
66 // rdfs:label "instance of description";
67 }
68
69 <variant> {
70 wdt:P3 [wd:Q1168 wd:Q11992]+
71 // rdfs:comment "instance of variants/static files"
72 // rdfs:label "instance of variant";
73 }
74
75 <typeOfAccession> {
76 wdt:P3 [wd:Q11996]
77 // rdfs:comment "instance of [item: type of accession]"
78 // rdfs:label "instance of accession";
79 }
'START' & Shape Labels
Line 3 opens the first shape, which is named by its shape label, <artwork>. Line 1 declares that validation should start from this shape, which will come in handy when we define our shape map later. Some community members have pointed out that the start declaration is not required, but it does let us work around the following issue:
Our currently preferred validator, rudof, will not validate the schema unless the shape label <artwork> is written as ':artwork'. When the first shape refers to another shape (a value shape), the reference is written as either '@<example>' or '@:example', and the referenced shape is then declared as '<example>' or ':example' after the first shape's closing brace.
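If you keep bracketed labels in the EntitySchema but need the ':label' form for rudof, the conversion can be automated. The Python sketch below is a hypothetical helper (`to_prefixed_labels` is not part of rudof or any ShEx library). It assumes the schema declares the empty prefix from the Namespaces section. Because it works on plain text rather than on a parsed schema, check the result by eye before using it.

```python
import re

# A bracketed *relative* shape label such as <artwork> or <typeOfAccession>.
# Absolute IRIs like <https://example.org/> contain ':' and '/', which the
# character class excludes, so prefix declarations are left untouched.
RELATIVE_LABEL = re.compile(r"<([A-Za-z_][A-Za-z0-9_-]*)>")


def to_prefixed_labels(shexc: str) -> str:
    """Rewrite <label> as :label throughout a ShExC document.

    Assumes the schema declares the empty prefix (prefix : <https://example.org/>).
    This is a plain text substitution, not a parser, so it would also rewrite
    a matching <word> inside an annotation string.
    """
    return RELATIVE_LABEL.sub(r":\1", shexc)


SAMPLE = """prefix : <https://example.org/>
prefix wd: <https://artbase.rhizome.org/entity/>
prefix wdt: <https://artbase.rhizome.org/prop/direct/>
start=@<artwork>
<artwork>{
  wdt:P29 @<artist> +
}
<artist> {
  wdt:P3 [wd:Q6 wd:Q7] +
}"""

if __name__ == "__main__":
    print(to_prefixed_labels(SAMPLE))
```

The design choice here is to leave the human-edited schema in whichever form the EntitySchema page prefers and generate the rudof-friendly copy only at validation time.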
Constraints
“A ShEx schema is built on node constraints and triple constraints that define what it means for a given RDF data graph to conform. An RDF triple is the three-part data structure of subject, property, and object with which all RDF data is expressed, and an RDF node is the piece of data found in the subject or object position of a triple. […] Node constraints and triple constraints are called “constraints” because they define, or “constrain”, the set of RDF nodes and data triples that will pass a conformance test.” (Shape Expressions (ShEx) 2.1 Primer, 2019)
In other words, constraints define how information should be expressed, connected, and controlled. They are the most direct translation of a metadata application profile into a machine-readable format. Luckily, this is fairly straightforward in ShEx. A basic expression is a triple constraint, which consists of a property ("predicate" in ShExJ), a value ("valueExpr" in ShExJ), and a cardinality. To make this information easier for humans to read, Jose Emilio Labra Gayo and Andra Waagmeester have suggested using annotations, which I describe below.
Property
Here, a property is always an ArtBase-defined property, written with the prefix "wdt:" followed by a P-number.
Values
The allowed values of a property are defined by node constraints (or by references to other shapes):
| value | example | explanation & notes |
|---|---|---|
| ArtBase item | [wd:Q4985] | The property should link to an ArtBase item (wd:), which in this case is a summary (Q4985). Keep the square brackets: ShExC reads an unbracketed IRI after a property as a datatype, not as a required value. |
| Datatype | xsd:dateTime | Matches a literal with the datatype xsd:dateTime. xsd:string should be used sparingly here, since fields should link to items wherever possible. |
| Reference shape | @:derivedFrom | If the linked item must itself meet certain conditions, the value can point to another shape in the schema, as described above. If you want to constrain the linked item by one of its own properties (for example, its "instance of" value) rather than naming a specific item, you will need a reference shape. |
| External links | [ <http://webenact.rhizome.org/>~ ] | If a link should point outside of ArtBase, use the form [ <link/>~ ]. This fixes the root of the IRI while letting the rest of the path vary. |
| ArtBase internal link | [ wd:~ ] | Since 'wd' is defined as '<https://artbase.rhizome.org/entity/>', this stem matches any ArtBase entity. It is not used often, but it is helpful for 'made of' statements, where the software varies with each acquisition. |
| Value set | [wd:Q12215 wd:Q11994] | The value must be wd:Q12215 or wd:Q11994; data containing either one will pass. If you need to check for one specific value rather than the other, define that check in a shape map. |
| Kind (options) | IRI, BNode, Literal, NonLiteral | The object must be of that kind. This is used primarily with IRI, for links whose target is not known in advance (often artists' websites). |
| Composed with AND, OR, NOT | xsd:string OR IRI | Constraints can be combined. This is used especially for descriptions, where the text attached to a work can be attributed either to Rhizome Staff OR to an artist (a person or collective), as in the description schema (E3) below. |
| IMPORT | IMPORT <http://schema.example/schema2> | An import statement goes above the first shape and is called as described in the linked guide. Since the existing reference shapes are very basic, this could be a good way to reuse shared shapes that may change over time. That said, I can imagine it causing problems depending on the implementation library used. |
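To make these node-constraint kinds more concrete, here is a simplified Python model of how a validator might test a single object value. It is an illustrative sketch, not how rudof or shex.js are implemented. The `IRI` and `Literal` classes and the helper names are hypothetical, datatype checks compare only the datatype IRI, and reference shapes are left out because they require recursive validation of the linked item. The sketch also shows the difference between an unbracketed IRI, which ShExC treats as a datatype, and a one-item value set such as [wd:Q5].

```python
from dataclasses import dataclass
from typing import Callable, Union

WD = "https://artbase.rhizome.org/entity/"
XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str


Term = Union[IRI, Literal]
Check = Callable[[Term], bool]


def value_set(*iris: str) -> Check:
    """[wd:Q6 wd:Q7]: the object must be one of the listed IRIs."""
    allowed = frozenset(iris)
    return lambda t: isinstance(t, IRI) and t.value in allowed


def iri_stem(stem: str) -> Check:
    """[ <stem>~ ]: the object must be an IRI that starts with the stem."""
    return lambda t: isinstance(t, IRI) and t.value.startswith(stem)


def datatype(dt: str) -> Check:
    """xsd:dateTime, xsd:string, ...: the object must be a literal of that datatype.

    Simplified: only the datatype IRI is compared; the lexical form is not checked.
    """
    return lambda t: isinstance(t, Literal) and t.datatype == dt


def is_iri(t: Term) -> bool:
    """The IRI node kind."""
    return isinstance(t, IRI)


def any_of(*checks: Check) -> Check:
    """OR composition: passes if at least one check passes."""
    return lambda t: any(check(t) for check in checks)


if __name__ == "__main__":
    artist_type = value_set(WD + "Q6", WD + "Q7")             # [wd:Q6 wd:Q7]
    any_entity = iri_stem(WD)                                  # [ wd:~ ]
    access_url = iri_stem("http://webenact.rhizome.org/")      # [ <http://webenact.rhizome.org/>~ ]
    inception = datatype(XSD + "dateTime")                     # xsd:dateTime
    string_or_iri = any_of(datatype(XSD + "string"), is_iri)   # xsd:string OR IRI

    print(artist_type(IRI(WD + "Q7")), artist_type(IRI(WD + "Q5")))
    print(any_entity(IRI(WD + "Q12139")))
    print(access_url(IRI("https://example.org/elsewhere")))
    print(inception(Literal("2001-01-01T00:00:00Z", XSD + "dateTime")))
    print(string_or_iri(Literal("no link", XSD + "string")))
    # A bare wd:Q5 acts as a datatype, so an item link fails; [wd:Q5] passes.
    print(datatype(WD + "Q5")(IRI(WD + "Q5")), value_set(WD + "Q5")(IRI(WD + "Q5")))
```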
Cardinalities
The following regular-expression conventions specify cardinalities other than the default of "exactly one" (no suffix):
- "*" - zero or more
- "+" - one or more
- "?" - zero or one
- "{m}" - exactly m
- "{m,n}" - at least m, no more than n
- "{m,}" - m or more repetitions
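As a sketch of what these suffixes mean numerically, the Python below converts a cardinality into the (min, max) pair that ShExJ records, using -1 for "unbounded", and then checks a count against it. No suffix means exactly one. '*' (zero or more) is the suffix on the image property wdt:P126 in the artwork schema. The function names are hypothetical. The '{m,*}' form is included because ShExC's grammar accepts it as an alternative to '{m,}'.

```python
import re

UNBOUNDED = -1  # ShExJ records an unbounded maximum as "max": -1

_SIMPLE = {"": (1, 1), "?": (0, 1), "*": (0, UNBOUNDED), "+": (1, UNBOUNDED)}
# {m}, {m,n}, {m,} and {m,*}
_RANGE = re.compile(r"\{\s*(\d+)\s*(?:(,)\s*(\d+|\*)?\s*)?\}")


def parse_cardinality(card: str) -> tuple[int, int]:
    """Return the (min, max) pair that a ShExC cardinality suffix stands for."""
    card = card.strip()
    if card in _SIMPLE:
        return _SIMPLE[card]
    m = _RANGE.fullmatch(card)
    if m is None:
        raise ValueError(f"unrecognized cardinality: {card!r}")
    low = int(m.group(1))
    if m.group(2) is None:            # {m}: exactly m
        return low, low
    if m.group(3) in (None, "*"):     # {m,} or {m,*}: m or more
        return low, UNBOUNDED
    return low, int(m.group(3))       # {m,n}: between m and n


def count_ok(count: int, card: str) -> bool:
    """Does a property that occurs `count` times satisfy the cardinality?"""
    low, high = parse_cardinality(card)
    return count >= low and (high == UNBOUNDED or count <= high)


if __name__ == "__main__":
    for suffix in ("", "?", "*", "+", "{3}", "{2,5}", "{2,}"):
        print(repr(suffix), parse_cardinality(suffix))
```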
Annotations
As seen above, structured notes are called "annotations" in this context. Annotations, introduced with '//', are parsed along with the schema but do not affect how validation is carried out. For now, let's stick to rdfs:comment and rdfs:label:
- rdfs:comment is meant to guide user input if an API interprets this data. In all of our ShEx so far, it was copied directly from the existing data entry guide.
- rdfs:label defines a form label for API integration down the road.
ShExC in ShExJ
To understand the language further, it helps to look at how a shape expression is parsed into ShExJ, the JSON serialization of ShEx. With rudof, you can view the ShExJ output with the command:
rudof shex -s shex/E1-Artwork.shex
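Truncated to the start declaration and the first triple constraint, the output should look something like the excerpt below. Note that it records wd:Q5 as a "datatype": ShExC reads an unbracketed IRI in value position as a datatype, so this excerpt reflects the constraint written as a bare wdt:P3 wd:Q5. Writing the value set [wd:Q5] instead requires the linked item itself, and in standard ShExJ it appears under "values".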
{
  "@context": "http://www.w3.org/ns/shex.jsonld",
  "type": "Schema",
  "start": {
    "type": "NodeConstraint",
    "datatype": "@:artwork"
  },
  "shapes": [
    {
      "type": "ShapeDecl",
      "id": "artwork",
      "abstract": false,
      "shapeExpr": {
        "type": "Shape",
        "expression": {
          "type": "EachOf",
          "expressions": [
            {
              "type": "TripleConstraint",
              "predicate": "https://artbase.rhizome.org/prop/direct/P3",
              "valueExpr": {
                "type": "NodeConstraint",
                "datatype": "https://artbase.rhizome.org/entity/Q5"
              },
              "annotations": [
                {
                  "type": "Annotation",
                  "predicate": "http://www.w3.org/2000/01/rdf-schema#comment",
                  "object": {
                    "value": "instance of [artwork]"
                  }
                },
                {
                  "type": "Annotation",
                  "predicate": "http://www.w3.org/2000/01/rdf-schema#label",
                  "object": {
                    "value": "instance of artwork"
                  }
                }
              ]
            },
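Because annotations are carried over into ShExJ, the JSON output is a convenient starting point for the generated input forms mentioned in the introduction. The following Python sketch walks a ShExJ document and lists, for each triple constraint, its predicate, its cardinality, and its rdfs:label and rdfs:comment. The function names are hypothetical. The sample is hand-trimmed rather than real rudof output. It writes the P3 value as a value set and adds a P126 constraint with "min": 0 and "max": -1, which is how ShExJ records '*'.

```python
import json

RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def annotation(tc: dict, predicate: str):
    """Return the value of the first annotation with this predicate, or None."""
    for ann in tc.get("annotations", []):
        if ann.get("predicate") == predicate:
            obj = ann.get("object")
            # literal objects look like {"value": ...}; IRI objects are plain strings
            return obj.get("value") if isinstance(obj, dict) else obj
    return None


def triple_constraints(expr):
    """Yield every TripleConstraint inside EachOf/OneOf groups."""
    if not isinstance(expr, dict):  # absent, or a reference to a named triple expression
        return
    if expr.get("type") == "TripleConstraint":
        yield expr
    elif expr.get("type") in ("EachOf", "OneOf"):
        for sub in expr.get("expressions", []):
            yield from triple_constraints(sub)


def form_fields(schema: dict) -> list:
    """One row per triple constraint: (shape id, predicate, min, max, label, comment)."""
    rows = []
    for shape in schema.get("shapes", []):
        # some ShExJ versions (like the rudof excerpt above) wrap shapes in a ShapeDecl
        body = shape.get("shapeExpr", {}) if shape.get("type") == "ShapeDecl" else shape
        for tc in triple_constraints(body.get("expression")):
            rows.append((
                shape.get("id"),
                tc["predicate"],
                tc.get("min", 1),  # ShExJ omits min/max for "exactly one"
                tc.get("max", 1),
                annotation(tc, RDFS + "label"),
                annotation(tc, RDFS + "comment"),
            ))
    return rows


# Hand-trimmed sample modeled on the excerpt above (not real rudof output).
SAMPLE = """{
  "type": "Schema",
  "shapes": [{
    "type": "ShapeDecl",
    "id": "artwork",
    "shapeExpr": {
      "type": "Shape",
      "expression": {
        "type": "EachOf",
        "expressions": [
          {"type": "TripleConstraint",
           "predicate": "https://artbase.rhizome.org/prop/direct/P3",
           "valueExpr": {"type": "NodeConstraint",
                         "values": ["https://artbase.rhizome.org/entity/Q5"]},
           "annotations": [{"type": "Annotation",
                            "predicate": "http://www.w3.org/2000/01/rdf-schema#label",
                            "object": {"value": "instance of artwork"}}]},
          {"type": "TripleConstraint",
           "predicate": "https://artbase.rhizome.org/prop/direct/P126",
           "min": 0, "max": -1}
        ]
      }
    }
  }]
}"""

if __name__ == "__main__":
    for row in form_fields(json.loads(SAMPLE)):
        print(row)
```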
Shape Maps
Shape maps can be written either as a SPARQL query, called a Query Map, or as a Fixed Map, in which each association pairs one node with one shape. ArtBase's public SPARQL endpoint does not currently integrate well with existing ShEx implementations that use query maps, and the format of query maps also depends on the validator used. As a result, only Fixed Maps are workable here for now; fortunately, they are very simple. Here is a basic format for maps:
| Examples | Kind | Explanations |
|---|---|---|
| wd:Q110@START | Fixed Map | <ttl node>@START, or <ttl node>@<shape label> |
| rudof validate megattl/artwork/Q1228-monochrome-landscapes.ttl --schema shex/E1-Artwork.shex --node wd:Q15225 --shape-label :artwork | Fixed map formatted for the rudof CLI | The command line does not accept bracketed shape labels, so writing them as ':label' has helped with troubleshooting. The results are not yet consistent, but they are promising. |
| SPARQL '''SELECT ?item ?itemLabel WHERE { ?item wdt:P3* wd:Q1168 . SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } } LIMIT 10'''@START | Query map formatted for the Simple Online Validator | A SPARQL query that loads items whose "instance of" is "Variant". This map loads ten items and validates them against the ShEx schema. It is not yet implemented in rudof, and it only works with Wikidata in the ShEx2 Simple Online Validator. Query maps are usually kept as '#' comments below the namespace declarations in a ShEx document. |
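Fixed maps are also easy to generate in bulk. The Python sketch below uses hypothetical helper names. It builds a fixed shape map string from Q-IDs, writing each association as node@shape and separating multiple associations with commas. It also assembles the same rudof command shown in the table. It uses only the flags that appear in this guide, and it prints the command instead of running it.

```python
import shlex

WD = "https://artbase.rhizome.org/entity/"


def fixed_shape_map(associations: dict[str, str]) -> str:
    """Build a fixed shape map from {Q-ID: shape label} pairs.

    Each association is written as <node IRI>@shape; several associations
    are separated by commas.
    """
    return ",".join(f"<{WD}{qid}>@{shape}" for qid, shape in associations.items())


def rudof_validate_args(data: str, schema: str, node: str, shape_label: str) -> list[str]:
    """Argument list mirroring the rudof command shown in the table above.

    Only the flags used in this guide are included; check `rudof validate --help`
    for the options your installed version actually supports.
    """
    return ["rudof", "validate", data,
            "--schema", schema,
            "--node", node,
            "--shape-label", shape_label]


if __name__ == "__main__":
    print(fixed_shape_map({"Q110": "START"}))
    args = rudof_validate_args(
        "megattl/artwork/Q1228-monochrome-landscapes.ttl",
        "shex/E1-Artwork.shex",
        "wd:Q1228",
        ":artwork",
    )
    # Printed rather than executed; pass `args` to subprocess.run to actually call rudof.
    print(shlex.join(args))
```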
Adapting Rhizome's Data Model for ShEx Validation
This is just a selection of ShEx examples adapted from an internal data entry guide. More are located here.
Artwork
| E1: Artwork | | | | | |
|---|---|---|---|---|---|
| property | value | cardinality | rdfs:comment | rdfs:label | reference shape |
| wdt:P3 | [wd:Q5] | | instance of [artwork] | instance of artwork | |
| wdt:P29 | @:artist | + | artist [item:person] [item:collective] | artist is (person/collective) | :artist |
| wdt:P26 | xsd:dateTime | | inception [date] | inception | |
| wdt:P123 | @:description | ? | description [item:summary description of], [item:Description of] | description | :description |
| wdt:P45 | @:variant | ? | has variant [item: variants] [item: static files] | has variant | :variant |
| wdt:P129 | @:typeOfAccession | ? | type of accession [item: type of accession] | type of accession | :typeOfAccession |
| wdt:P85 | xsd:dateTime | | date of accession [date] | date of accession | |
| wdt:P126 | [ <https://artbase.rhizome.org/wiki/File:>~ ] | * | image [link to image] | image | |
| wdt:P30 | xsd:string | ? | optional: serial id in the classic Artbase database | artbase legacy id | |
| wdt:P31 | xsd:string | ? | optional: identifier used in Rhizome's Collective Access instance | collective access legacy id | |
| wdt:P52 | xsd:string | ? | optional: legacy serial id in Collective Access | ca id | |
| wdt:P49 | xsd:string | | canonical unique string identifier | slug | |
| wdt:P48 | xsd:string | ? | optional: keywords from classic Artbase | artbase legacy tags | |
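Each row of a table like this maps directly onto one triple constraint, so the ShExC can be drafted from the table rather than typed by hand. The Python sketch below is a hypothetical generator, not part of any ShEx tool. It renders rows in the layout used by the schemas in this guide and escapes double quotes in comments (for example, the "lastname, firstname" comment in E2). The output still needs to be checked with a validator.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One row of an EntitySchema table in this guide."""
    prop: str
    value: str
    cardinality: str = ""  # empty means the ShEx default, exactly one
    comment: str = ""
    label: str = ""


def escape(text: str) -> str:
    """Escape backslashes and double quotes for a ShExC string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_shexc(rows: list[Row], shape_label: str) -> str:
    """Render table rows as one ShExC shape, in the layout used by the schemas here."""
    out = [f"{shape_label} {{"]
    for r in rows:
        block = [f"  {r.prop} {r.value}" + (f" {r.cardinality}" if r.cardinality else "")]
        if r.comment:
            block.append(f'  // rdfs:comment "{escape(r.comment)}"')
        if r.label:
            block.append(f'  // rdfs:label "{escape(r.label)}"')
        block[-1] += " ;"  # ';' separates triple constraints (a trailing one is allowed)
        out.extend(block)
    out.append("}")
    return "\n".join(out)


ROWS = [
    Row("wdt:P3", "[wd:Q5]", "", "instance of [artwork]", "instance of artwork"),
    Row("wdt:P29", "@:artist", "+", "artist [item:person] [item:collective]",
        "artist is (person/collective)"),
    Row("wdt:P30", "xsd:string", "?", "optional: serial id in the classic Artbase database",
        "artbase legacy id"),
]

if __name__ == "__main__":
    print(to_shexc(ROWS, ":artwork"))
```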
Artist
| E2: Artist | | | | | |
|---|---|---|---|---|---|
| property | value | cardinality | rdfs:comment | rdfs:label | reference shape |
| wdt:P3 | [wd:Q6 wd:Q7] | + | instance of [item:person] [item:collective] | instance of person/collective | |
| wdt:P135 | xsd:string | | administrative property used to generate artists' lists sorted by "lastname, firstname" | sort by name | |
| wdt:P60 | @:artwork | | artwork [item: artwork main], [item: artwork main] | artwork | :artwork |
| wdt:P17 | IRI | ? | optional: official website [URL: of the website] | official website | |
| wdt:P2 | IRI | ? | optional: (URLs only) used to link two concepts, indicating a high degree of confidence that the concepts can be used interchangeably (for Wikidata items primarily) | exact match | |
| wdt:P52 | xsd:string | ? | optional: legacy serial id in Collective Access | ca id | |
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix wd: <https://artbase.rhizome.org/entity/>
prefix wdt: <https://artbase.rhizome.org/prop/direct/>
prefix : <https://example.org/>

#SPARQL '''SELECT ?item ?itemLabel WHERE {
#  ?item wdt:P3* wd:Q6 .
#  SERVICE wikibase:label {
#    bd:serviceParam wikibase:language "en"
#  }
#} LIMIT 10'''@START

start=@<artist>

<artist> {
  wdt:P3 [wd:Q6 wd:Q7]+
  // rdfs:comment "instance of [item:person] [item:collective]"
  // rdfs:label "instance of person/collective" ;

  wdt:P135 xsd:string
  // rdfs:comment "administrative property used to generate artists' lists sorted by 'lastname, firstname'"
  // rdfs:label "sort by name" ;

  wdt:P60 @<artwork>
  // rdfs:comment "artwork [item: artwork main], [item: artwork main]"
  // rdfs:label "artwork" ;

  wdt:P17 IRI ?
  // rdfs:comment "optional: official website [URL: of the website]"
  // rdfs:label "official website" ;

  wdt:P2 IRI ?
  // rdfs:comment "optional: (URLs only) used to link two concepts, indicating a high degree of confidence that the concepts can be used interchangeably (for Wikidata items primarily)"
  // rdfs:label "exact match [wikidata item]" ;

  wdt:P52 xsd:string ?
  // rdfs:comment "optional: legacy serial id in Collective Access"
  // rdfs:label "ca id" ;
}

<artwork> {
  wdt:P3 [wd:Q5]
  // rdfs:comment "instance of artwork"
  // rdfs:label "instance of artwork" ;
}
Description
| E3: Description of [artwork] | | | | | |
|---|---|---|---|---|---|
| property | value | cardinality | rdfs:comment | rdfs:label | reference shape |
| wdt:P3 | [wd:Q9759 wd:Q4985] | + | instance of [item:summary] | instance of summary and/or description | |
| wdt:P124 | @:artwork | | description of [item:Artwork Main] | description of artwork | :artwork |
| wdt:P127 | [wd:Q11967] OR @:artist | | attributed to Rhizome Staff or artist | attributed to Rhizome Staff or artist | :artist |
| wdt:P26 | xsd:dateTime | | inception [date] | inception | |
| wdt:P102 | @:derivedFrom | ? | optional: derived from [item: Summary description by the artist] | derived from | :derivedFrom |
| wdt:P133 | [wd:Q11967] | ? | optional: copyedited by [item:Rhizome staff] | copyedited by | |
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix wd: <https://artbase.rhizome.org/entity/>
prefix wdt: <https://artbase.rhizome.org/prop/direct/>
prefix : <https://example.org/>

#SPARQL '''SELECT ?item ?itemLabel WHERE {
#  ?item wdt:P3* wd:Q9759 .
#  SERVICE wikibase:label {
#    bd:serviceParam wikibase:language "en"
#  }
#} LIMIT 10'''@START

start=@<description>

<description> {
  wdt:P3 [wd:Q9759 wd:Q4985]+
  // rdfs:comment "instance of [item:summary] [item:description]"
  // rdfs:label "instance of summary and/or description" ;

  wdt:P124 @<descriptionOf>
  // rdfs:comment "description of [item:Artwork Main]"
  // rdfs:label "description of artwork" ;

  wdt:P127 [wd:Q11967] OR @<artist>
  // rdfs:comment "attributed to [item:person] [item:collective] [item: Rhizome Staff]"
  // rdfs:label "attributed to artist or Rhizome Staff" ;

  wdt:P26 xsd:dateTime
  // rdfs:comment "inception [date]"
  // rdfs:label "inception" ;

  wdt:P102 @<derivedFrom> ?
  // rdfs:comment "optional: derived from [item: summary description by the artist]"
  // rdfs:label "derived from summary by artist/s" ;

  wdt:P133 [wd:Q11967] ?
  // rdfs:comment "optional: copyedited by [item:Rhizome staff]"
  // rdfs:label "copyedited by" ;
}

<descriptionOf> {
  wdt:P3 [wd:Q5]
  // rdfs:comment "instance of [item:Artwork Main]"
  // rdfs:label "description of artwork" ;
}

<artist> {
  wdt:P3 [wd:Q6 wd:Q7]
  // rdfs:comment "instance of [item:person] [item:collective]"
  // rdfs:label "instance of person or collective" ;
}

<derivedFrom> {
  wdt:P3 [wd:Q4985 wd:Q9759]
  // rdfs:comment "instance of [summary description by the artist] or [description]"
  // rdfs:label "derived from summary or description by artist/s" ;
}
Artifact
| E4: Artifact - Static Files | | | | | |
|---|---|---|---|---|---|
| property | value | cardinality | rdfs:comment | rdfs:label | reference shape |
| wdt:P3 | [wd:Q12215 wd:Q11992] | | Instance of [item: artifact]/ [item:static files] | Instance of | |
| wdt:P141 | @:download | | artifact of [item:[download artifact]] | artifact of | :download |
| wdt:P26 | xsd:dateTime | | inception [date] | inception | |
| wdt:P129 | @:typeOfAccession | | type of accession [item: (type of accession)] | type of accession | :typeOfAccession |
| wdt:P85 | xsd:dateTime | | date of accession [date] | date of accession | |
| wdt:P118 | [wd:Q11967] OR @:artist | | associated with [item:Rhizome Staff]/[item: artist name] | associated with | :artist |
| wdt:P81 | [ wd:~ ] | * | made of [item: (technical item)s] | made of | |
| wdt:P117 | [wd:Q12139] | | generated by [item:file copy] | generated by | |
| wdt:P102 | IRI | * | derived from [item:artwork outside link] | derived from | |
| wdt:P46 | [ <http://webenact.rhizome.org/>~ ] | * | access URL [http://webenact.rhizome.org/…] | access URL | |
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix wd: <https://artbase.rhizome.org/entity/>
prefix wdt: <https://artbase.rhizome.org/prop/direct/>
prefix : <https://example.org/>

start=@<artifact>

<artifact> {
  wdt:P3 [wd:Q12215 wd:Q11992]
  // rdfs:comment "Instance of [item: artifact]/ [item:static files]"
  // rdfs:label "Instance of" ;
  wdt:P141 @<download>
  // rdfs:comment "artifact of [item:[download artifact]]"
  // rdfs:label "artifact of" ;
  wdt:P26 xsd:dateTime
  // rdfs:comment "inception [date]"
  // rdfs:label "inception" ;
  wdt:P129 @<typeOfAccession>
  // rdfs:comment "type of accession [item: (type of accession)]"
  // rdfs:label "type of accession" ;
  wdt:P85 xsd:dateTime
  // rdfs:comment "date of accession [date]"
  // rdfs:label "date of accession" ;
  wdt:P118 [wd:Q11967] OR @<artist>
  // rdfs:comment "associated with [item:Rhizome Staff] OR [item: artist name]"
  // rdfs:label "associated with" ;
  wdt:P81 [ wd:~ ] *
  // rdfs:comment "made of [item: (technical item)s]"
  // rdfs:label "made of" ;
  wdt:P117 [wd:Q12139]
  // rdfs:comment "generated by [item:file copy]"
  // rdfs:label "generated by" ;
  wdt:P102 IRI *
  // rdfs:comment "derived from [item:artwork outside link]"
  // rdfs:label "derived from" ;
  wdt:P46 [ <http://webenact.rhizome.org/>~ ] *
  // rdfs:comment "access URL [http://webenact.rhizome.org/…]"
  // rdfs:label "access URL" ;
}

<download> {
  wdt:P3 [wd:Q12081]
  // rdfs:comment "Instance of [item:[download artifact]]"
  // rdfs:label "Instance of download" ;
}

<typeOfAccession> {
  wdt:P3 [wd:Q11996]
  // rdfs:comment "instance of [item: (type of accession)]"
  // rdfs:label "instance of accession" ;
}

<artist> {
  wdt:P3 [wd:Q6 wd:Q7]
  // rdfs:comment "instance of person or collective"
  // rdfs:label "instance of person or collective" ;
}
Variant
| E20: Variant - Web Archive | | | | | |
|---|---|---|---|---|---|
| property | value | cardinality | rdfs:comment | rdfs:label | reference shape |
| wdt:P3 | [wd:Q1168] | | Instance of [item: variant] | Instance of | |
| wdt:P56 | @:artwork | | variant of [item:Artwork Main] | variant of | :artwork |
| wdt:P26 | xsd:dateTime | | inception [date] | inception | |
| wdt:P118 | [wd:Q11967] OR @:artist | | associated with [item:Rhizome Staff] OR [item: artist name] | associated with | :artist |
| wdt:P117 | [wd:Q4983] | | generated by [item:web capture] | generated by | |
| wdt:P46 | [ <http://webenact.rhizome.org/>~ ] | * | access URL [http://webenact.rhizome.org/…] | access URL | |
| wdt:P139 | @:webCaptureArtifact | | artifact [item:[web capture artifact]] | artifact | :webCaptureArtifact |
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix wd: <https://artbase.rhizome.org/entity/>
prefix wdt: <https://artbase.rhizome.org/prop/direct/>
prefix : <https://example.org/>

start=@<variant>

<variant> {
  wdt:P3 [wd:Q1168]
  // rdfs:comment "Instance of [item: variant]"
  // rdfs:label "Instance of" ;
  wdt:P56 @<artwork>
  // rdfs:comment "variant of [item:Artwork Main]"
  // rdfs:label "variant of" ;
  wdt:P26 xsd:dateTime
  // rdfs:comment "inception [date]"
  // rdfs:label "inception" ;
  wdt:P118 [wd:Q11967] OR @<artist>
  // rdfs:comment "associated with [item:Rhizome Staff] OR [item: artist name]"
  // rdfs:label "associated with" ;
  wdt:P117 [wd:Q4983]
  // rdfs:comment "generated by [item:web capture]"
  // rdfs:label "generated by" ;
  wdt:P46 [ <http://webenact.rhizome.org/>~ ] *
  // rdfs:comment "access URL [http://webenact.rhizome.org/…]"
  // rdfs:label "access URL" ;
  wdt:P139 @<webCaptureArtifact>
  // rdfs:comment "artifact [item:[web capture artifact]]"
  // rdfs:label "artifact" ;
}

<artwork> {
  wdt:P3 [wd:Q5]
  // rdfs:comment "Instance of [item:artwork main]"
  // rdfs:label "Instance of artwork" ;
}

<artist> {
  wdt:P3 [wd:Q6 wd:Q7]
  // rdfs:comment "instance of person or collective"
  // rdfs:label "instance of person or collective" ;
}

<webCaptureArtifact> {
  wdt:P3 [wd:Q12215 wd:Q11994]
  // rdfs:comment "instance of [item:[web capture artifact]]"
  // rdfs:label "instance of web capture artifact" ;
}
Current Workflow
- Data entry guide → ShExC (written by hand) → rudof analysis (ShExJ) → visualization in RDF Shape = ShEx
- I didn't mention the visualization option above because it isn't crucial, but it can help make these relationships concrete when the ShExJ is vague or confusing.
- Compare the ShEx to a good model example in the archive. Does it reflect the desired form? Why or why not?
- Data → megattl (Jupyter Notebook, has issues) = test data
- As far as I can tell, no ShEx implementation can test linked items beyond the surface-level RDF of the single item page being validated, which is frustrating. For this reason, the megattl.ipynb notebook concatenates linked objects into one "megattl" file for testing. The remaining problem is that it assigns random namespaces to concatenated items and does not communicate the relationships between the concatenated data sets in a sensible way.
- <node being tested (the Q-ID of the base entity from which the megattl is derived)>@START = fixed shape map
- rudof validate (still tinkering; example below)
- rudof validate megattl/artwork/Q1228-monochrome-landscapes.ttl --schema shex/E1-Artwork.shex --node wd:Q1228 --shape-label :artwork
- Further action to be determined (!!)
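The first step of the megattl approach, finding which linked items need to be pulled in, can be sketched in a few lines of Python. This is not the code in megattl.ipynb. It is a hypothetical helper that assumes items appear in an item's Turtle export either as wd:Q123 or as full https://artbase.rhizome.org/entity/ IRIs. It uses a regular expression instead of a Turtle parser and stops before concatenation, so it does not address the namespace problem described above.

```python
import re
import urllib.request

ENTITY_NS = "https://artbase.rhizome.org/entity/"
ENTITY_DATA = "https://artbase.rhizome.org/wiki/Special:EntityData/{qid}.ttl"

# An item written either as wd:Q123 or as <https://artbase.rhizome.org/entity/Q123>.
QID_PATTERN = re.compile(r"(?:\bwd:|<" + re.escape(ENTITY_NS) + r")(Q\d+)\b")


def linked_qids(ttl: str, base_qid: str) -> list[str]:
    """Q-IDs mentioned in a Turtle export, minus the base item, in numeric order.

    A regex scan, not a Turtle parser: it collects every wd: item it sees.
    """
    found = set(QID_PATTERN.findall(ttl))
    found.discard(base_qid)
    return sorted(found, key=lambda q: int(q[1:]))


def entity_data_url(qid: str) -> str:
    """Turtle export URL for one ArtBase item."""
    return ENTITY_DATA.format(qid=qid)


def fetch_ttl(qid: str) -> str:
    """Download one item's Turtle export (needs network access; not called below)."""
    with urllib.request.urlopen(entity_data_url(qid)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Made-up fragment for illustration; a real export is much longer.
    sample = "wd:Q1228 wdt:P3 wd:Q5 ;\n    wdt:P45 <" + ENTITY_NS + "Q1168> .\n"
    for qid in linked_qids(sample, "Q1228"):
        print(qid, entity_data_url(qid))
```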
Implementation & Toolbox
This is the most difficult discussion at the moment. The Simple Online Validator is not currently integrated with ArtBase, although it could be through configuration. We have tested a handful of libraries, and our currently preferred options are listed below.
| Name | Type | Description | Evaluation |
|---|---|---|---|
| rudof | Validation (command-line) | 'This repo contains an RDF data shapes library implemented in Rust. The implementation supports ShEx, SHACL, DCTap and conversions between different RDF data modeling formalisms.' | Usable, and preferred for this stage of work. Requires a megattl to validate how linked item pages relate to each other. |
| ShEx2 Simple Online Validator | Validation (web-based app) | From the GitHub source: 'shex.js javascript implementation of Shape Expressions' | Effective, but requires a public SPARQL endpoint, a 'megattl', or configuration. It is also a bit buggy on the UX side. |
| MediaWiki Configuration | Configuration | Configures shex.js to be embedded on item pages in Wikibase or Wikidata. | May work depending on the institution, but not used in this instance. |
| vscode-shex-extensions | Extension | ShEx extensions for Visual Studio Code. | Effective and easy to set up for writing ShEx in VS Code. The related extension, YASHE, conflicts with some of rudof's formatting preferences and is therefore not helpful in this case (although it seems good for Wikidata specifically). |
| RDF Shape | Validator, Shape Extraction, Visualization | 'RDFShape offers an RDF playground which can be used to teach RDF related technologies. Paper describing RDFShape: RDFShape: An RDF playground based on Shapes, Jose Emilio Labra Gayo, Daniel Fernández Álvarez, Herminio García González, Demo presented at International Semantic Web Conference, Monterey, California - 2018' | This tool does a lot and is helpful for developing an implementation strategy, but it has the same validation issues as ShEx2. |
Conclusion
Although the implementation of ShEx for Rhizome's ArtBase is still in development, it is worth reflecting on the project's outcome at this stage. The extensive knowledge produced by community efforts around Shape Expressions has made it possible to translate the original data entry guide in detail into a multidimensional, machine-readable, and user-friendly format that guides data curation. While implementation tools are still maturing, there appears to be a growing need for a centralized, flexible option for data validation in Wikibase instances, and plenty of community support for one.
Works Consulted & Resources
[json-ld] - Manu Sporny; Gregg Kellogg; Markus Lanthaler. W3C. 16 January 2014. W3C Recommendation. URL: https://www.w3.org/TR/json-ld/
[rdf-schema] - Dan Brickley; Ramanathan Guha. W3C. 25 February 2014. W3C Recommendation. URL: https://www.w3.org/TR/rdf-schema/
[rdf11-primer] - Guus Schreiber; Yves Raimond. W3C. 24 June 2014. W3C Note. URL: https://www.w3.org/TR/rdf11-primer/
[shex-semantics] - Eric Prud'hommeaux; Iovka Boneva; Jose Labra Gayo; Gregg Kellogg. URL: http://shex.io/shex-semantics/
[turtle] - Eric Prud'hommeaux; Gavin Carothers. W3C. 25 February 2014. W3C Recommendation. URL: https://www.w3.org/TR/turtle/
[shape-map] - Eric Prud'hommeaux; Thomas Baker. URL: http://shex.io/shape-map/
[shex-vocab] - Gregg Kellogg. URL: http://www.w3.org/ns/shex#
[via scholia & elsewhere]
Gayo, J. E. L. (2022, August 4). WShEx: A language to describe and validate Wikibase entities. arXiv.Org. https://arxiv.org/abs/2208.02697v1
Thornton, K., & Seals-Nutt, K. (2022). A Digital Preservation Wikibase. https://doi.org/10.17605/OSF.IO/XKW89
Samuel, J. (2021). ShExStatements: Simplifying Shape Expressions for Wikidata. Companion Proceedings of the Web Conference 2021, 610–615. https://doi.org/10.1145/3442442.3452349
Thornton, K., Solbrig, H., Stupp, G. S., Labra Gayo, J. E., Mietchen, D., Prud'hommeaux, E., & Waagmeester, A. (2019). Using Shape Expressions (ShEx) to Share RDF Data Models and to Guide Curation with Rigorous Validation. In P. Hitzler, M. Fernández, K. Janowicz, A. Zaveri, A. J. G. Gray, V. Lopez, A. Haller, & K. Hammar (Eds.), The Semantic Web (pp. 606–620). Springer International Publishing. https://doi.org/10.1007/978-3-030-21348-0_39
Waagmeester, A., Thornton, K., Werkmeister, L., & Stupp, G. (Directors). (2017). Using Shape Expressions for data quality and consistency in Wikidata [Video recording]. https://media.ccc.de/v/wikidatacon2017-10032-using_shape_expressions_for_data_quality_and_consistency_in_wikidata
Gayo, J. E. L., Prud'hommeaux, E., Boneva, I., & Kontokostas, D. (2018a). Comparing ShEx and SHACL. In J. E. L. Gayo, E. Prud'hommeaux, I. Boneva, & D. Kontokostas (Eds.), Validating RDF Data (pp. 233–266). Springer International Publishing. https://doi.org/10.1007/978-3-031-79478-0_7
Gayo, J. E. L., Prud'hommeaux, E., Boneva, I., & Kontokostas, D. (2018b). Shape Expressions. In J. E. L. Gayo, E. Prud'hommeaux, I. Boneva, & D. Kontokostas (Eds.), Validating RDF Data (pp. 55–117). Springer International Publishing. https://doi.org/10.1007/978-3-031-79478-0_4
Gayo, J. E. L., Préstamo, Á. I., Fernández, D. M., & Arnaud, M.-A. (n.d.). rudof: A Rust Library for handling RDF data models and Shapes | Labra's home page. International Semantic Web Conference, ISWC24, Posters and Demos. Retrieved September 8, 2024, from https://labra.weso.es/publication/2024_rudof_demo/
[on Rhizome's ArtBase and data model]
Espenschied, D. (2021). Basics of Born-Digital Preservation. Rhizome Almanac. Retrieved September 8, 2024, from https://almanac.rhizome.org/pages/preservation-basics
Fauconnier, S. (2018, September 6). Many faces of Wikibase: Rhizome's archive of born-digital art and digital preservation. Wikimedia Foundation. https://wikimediafoundation.org/news/2018/09/06/rhizome-wikibase/
Ma, X., Espenschied, D., & Moulds, L. J. (2022). Access Quality Metrics for Net Art. In the Proceedings of the 18th International Conference on Digital Preservation. https://maximalmargin.com/96d6a4f016ec0087cd81833ad10c712b/iPres22_Short-Paper_Access-Quality-Metrics-for-Net-Art_20220915.pdf
Rossenova, L. (2020). ArtBase Archive: Context and History: Discovery phase and user research 2017–2019. Rhizome, 2020a. https://scholar.google.com/scholar?cluster=17112232127183010524&hl=en&oi=scholarr
Rossenova, L. (2017). Presentation and contextualisation in the online archive of internet art. Electronic Visualisation and the Arts (EVA 2017). https://www.scienceopen.com/hosted-document?doi=10.14236/ewic/EVA2017.18
[conversations online]
Wikidata talk:Schemas (linked)
Telegram channel for Wikidata EntitySchemas