UGain
Linked Data on the Web

Ruben Verborgh, Ghent University, imec

Lecture in the UGain Big Data series, 19 March 2020



Ghent University, imec, IDLab

Except where otherwise noted, the content of these slides is licensed under a Creative Commons Attribution 4.0 International License.


Many Linked Data life cycles have been proposed.
This simple cycle consists of 5 steps:
generation, validation, publication, querying, and enhancement.

Generation is the step in which
we convert non-RDF data to RDF.

Most representations on the Web
are generated by templates.

Templating is not always sufficient
to create 5-star Linked Data.

Linked Data can be generated in batch
through a mapping process.

Mapping can be performed with
ad-hoc scripts for a specific dataset.
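For contrast with declarative mapping languages such as R2RML and RML, an ad-hoc script bakes the structure of one dataset directly into its code. The Python sketch below assumes a hypothetical CSV export of venues (the column names and the `https://ex.com/venues/` base IRI are illustrative assumptions) and writes N-Triples by hand.

```python
import csv
import io

# Hypothetical dataset: a CSV export of venues. The column names
# (id, name, longitude, latitude) are assumptions for this sketch.
CSV_DATA = """id,name,longitude,latitude
78,Vooruit,3.725379,51.0477644
"""

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
XSD_FLOAT = "http://www.w3.org/2001/XMLSchema#float"


def venues_csv_to_ntriples(text: str) -> list:
    """Convert one specific CSV layout into N-Triples lines.

    The structure of the input is hard-coded, so any other dataset
    needs a new script. Literal escaping is skipped on purpose: the
    script only targets this one known file.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<https://ex.com/venues/{row['id']}>"
        lines.append(f'{subject} <{RDFS_LABEL}> "{row["name"]}" .')
        lines.append(f'{subject} <{GEO}long> "{row["longitude"]}"^^<{XSD_FLOAT}> .')
        lines.append(f'{subject} <{GEO}lat> "{row["latitude"]}"^^<{XSD_FLOAT}> .')
    return lines


if __name__ == "__main__":
    print("\n".join(venues_csv_to_ntriples(CSV_DATA)))
```

Reusing this logic for a second dataset means writing a second script; that limitation is what declarative mappings address.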

R2RML is an RDF vocabulary to describe
a mapping of relational data into RDF.

RML is a generalization of R2RML
toward heterogeneous data sources.

Let us consider an example
of musical performances.

A JSON file contains a list of performances:

{ ... "Performance": [
    { "Perf_ID": "567",
      "Venue": { "Name": "Vooruit",
                 "Venue_ID": "78" },
      "Location": { "longitude": "3.725379",
                    "latitude": "51.0477644" } },
    ...
  ] }

The venues could be mapped
using the following RML document.

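@prefix rml: <http://semweb.mmlab.be/ns/rml#>.
@prefix ql:  <http://semweb.mmlab.be/ns/ql#>.
@prefix rr:  <http://www.w3.org/ns/r2rml#>.
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.

<#VenuesMapping>
    rml:logicalSource [
        rml:source "https://ex.com/performances.json";
        rml:referenceFormulation ql:JSONPath;
        # Iterate over every element of the Performance array
        rml:iterator "$.Performance[*]"
    ];
    rr:subjectMap [
        # References are relative to each iterated performance
        rr:template "https://ex.com/venues/{Venue.Venue_ID}"
    ].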

The same mapping then adds
the coordinates of each venue.

<#VenuesMapping>
    rr:predicateObjectMap [
        rr:predicate geo:long;
        rr:objectMap [ rml:reference "Location.longitude";
                       rr:datatype xsd:float ]
    ], [
        rr:predicate geo:lat;
        rr:objectMap [ rml:reference "Location.latitude";
                       rr:datatype xsd:float ]
    ].

The execution of the mapping
results in an RDF dataset.

...
<https://ex.com/venues/78>
    geo:long "3.725379"^^xsd:float;
    geo:lat "51.0477644"^^xsd:float.

<https://ex.com/venues/91>
    geo:long "3.728515"^^xsd:float;
    geo:lat "51.056008"^^xsd:float.
...
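To make the execution step concrete, here is a minimal sketch of what a processor does with a mapping like the one above: iterate over the selected records, fill in the subject template, and emit one triple per predicate-object map. It is not an RML processor: JSONPath is replaced by plain dotted paths, and the dict-based mapping structure and function names are hypothetical.

```python
import json
import re

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
XSD_FLOAT = "http://www.w3.org/2001/XMLSchema#float"

# Hypothetical, simplified stand-in for the RML document:
# dotted paths replace the JSONPath iterator and references.
MAPPING = {
    "iterator": "Performance",
    "subject_template": "https://ex.com/venues/{Venue.Venue_ID}",
    "predicate_object_maps": [
        (GEO + "long", "Location.longitude", XSD_FLOAT),
        (GEO + "lat", "Location.latitude", XSD_FLOAT),
    ],
}

DOCUMENT = json.loads("""
{ "Performance": [
    { "Perf_ID": "567",
      "Venue": { "Name": "Vooruit", "Venue_ID": "78" },
      "Location": { "longitude": "3.725379", "latitude": "51.0477644" } } ] }
""")


def lookup(record, path):
    """Follow a dotted path such as 'Venue.Venue_ID' through nested objects."""
    value = record
    for key in path.split("."):
        value = value[key]
    return value


def execute(mapping, document):
    """Apply the mapping to every iterated record; return a set of triples."""
    triples = set()  # an RDF graph is a set, so repeated venues collapse
    for record in lookup(document, mapping["iterator"]):
        # Fill each {reference} in the subject template with its value
        # (a real processor would also IRI-encode the value).
        subject = re.sub(r"\{([^}]+)\}",
                         lambda m: str(lookup(record, m.group(1))),
                         mapping["subject_template"])
        for predicate, reference, datatype in mapping["predicate_object_maps"]:
            literal = f'"{lookup(record, reference)}"^^<{datatype}>'
            triples.add((f"<{subject}>", f"<{predicate}>", literal))
    return triples


if __name__ == "__main__":
    for s, p, o in sorted(execute(MAPPING, DOCUMENT)):
        print(s, p, o, ".")
```

Several performances can take place at the same venue; collecting the output in a set removes the resulting duplicate triples.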

The mapping can be extended
to include other resources and properties.

After Linked Data has been generated,
we can validate it using semantics.

Validation can be applied
at different data levels.

Databases only allow for
rudimentary constraint validation.

Schemas can validate conformance to
a reusable set of structural constraints.

Ontologies allow for more specific
content-based validation.

In the following example, type checking
identifies an incorrect triple.

The ontology defines the following constraints:

foaf:knows rdfs:domain foaf:Person;
           rdfs:range foaf:Person.
:Mathematics a :Course.
:Course owl:disjointWith foaf:Person.

This triple violates those constraints:

:Albert foaf:knows :Mathematics.
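The type check can be sketched in a few lines: the range of foaf:knows makes :Mathematics a foaf:Person, which clashes with its declared type :Course. The Python below is a hypothetical closed-world simplification, not an RDFS or OWL reasoner; it only handles the rdfs:domain, rdfs:range and owl:disjointWith statements listed by hand.

```python
# Simplified, closed-world check; not an OWL or RDFS reasoner.
# Terms keep the prefixed names used in the example ("a" is rdf:type).
DOMAIN = {"foaf:knows": "foaf:Person"}
RANGE = {"foaf:knows": "foaf:Person"}
DISJOINT = [(":Course", "foaf:Person")]

TYPE_DATA = [
    (":Mathematics", "a", ":Course"),           # from the ontology
    (":Albert", "foaf:knows", ":Mathematics"),  # the suspicious triple
]


def infer_types(triples):
    """Collect asserted types plus the types implied by domain and range."""
    types = {}
    for s, p, o in triples:
        if p == "a":
            types.setdefault(s, set()).add(o)
        if p in DOMAIN:
            types.setdefault(s, set()).add(DOMAIN[p])
        if p in RANGE:
            types.setdefault(o, set()).add(RANGE[p])
    return types


def disjointness_violations(triples):
    """List (resource, class1, class2) for each resource in two disjoint classes."""
    violations = []
    for resource, classes in sorted(infer_types(triples).items()):
        for c1, c2 in DISJOINT:
            if c1 in classes and c2 in classes:
                violations.append((resource, c1, c2))
    return violations


if __name__ == "__main__":
    print(disjointness_violations(TYPE_DATA))
```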

Violations across triples can be identified,
but not always automatically resolved.

The ontology defines the following constraints:

:isBiologicalFatherOf a owl:IrreflexiveProperty,
                        owl:InverseFunctionalProperty.

The triples below are inconsistent (if :Albert and :Jacques are different people):

:Albert  :isBiologicalFatherOf :Albert.
:Albert  :isBiologicalFatherOf :Delphine.
:Jacques :isBiologicalFatherOf :Delphine.

Which of them are correct cannot be determined from the data alone.
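Detecting such conflicts is mechanical; resolving them is not. The hypothetical Python sketch below reports which constraint fails and which triples are involved, but it has no basis for choosing which triple to drop. It assumes that distinct names denote distinct people, which OWL itself does not assume.

```python
# Closed-world sketch that assumes distinct names denote distinct
# people (OWL itself makes no such unique-name assumption).
IRREFLEXIVE = {":isBiologicalFatherOf"}
INVERSE_FUNCTIONAL = {":isBiologicalFatherOf"}

FATHER_DATA = [
    (":Albert", ":isBiologicalFatherOf", ":Albert"),
    (":Albert", ":isBiologicalFatherOf", ":Delphine"),
    (":Jacques", ":isBiologicalFatherOf", ":Delphine"),
]


def find_conflicts(triples):
    """Report which constraint fails and which triples are involved."""
    conflicts = []
    for s, p, o in triples:
        if p in IRREFLEXIVE and s == o:
            conflicts.append(("irreflexive", [(s, p, o)]))
    for prop in sorted(INVERSE_FUNCTIONAL):
        # Group the triples of this property by their object.
        by_object = {}
        for s, p, o in triples:
            if p == prop:
                by_object.setdefault(o, []).append((s, p, o))
        for o, involved in sorted(by_object.items()):
            # Two different subjects for one object break the constraint.
            if len({s for s, _, _ in involved}) > 1:
                conflicts.append(("inverse functional", involved))
    return conflicts


if __name__ == "__main__":
    for constraint, involved in find_conflicts(FATHER_DATA):
        print(constraint, involved)
```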

Automated validation tells you
whether data makes sense.

By validating during the mapping process,
we can detect quality issues before they end up in the generated data.

As soon as Linked Data is ready,
it can be published for consumption.

There are roughly 3 ways of
publishing Linked Data on the Web.

A data dump places all dataset triples
in one or more archive files.
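A dump is often a compressed N-Triples file. The sketch below writes and reads one with Python's gzip module; the file name is hypothetical. It also shows the consumer's side of this approach: the whole file has to be fetched and decompressed before anything can be queried.

```python
import gzip
import os
import tempfile

# A few N-Triples lines standing in for a whole dataset.
TRIPLES = [
    '<https://ex.com/venues/78> <http://www.w3.org/2003/01/geo/wgs84_pos#long> "3.725379"^^<http://www.w3.org/2001/XMLSchema#float> .',
    '<https://ex.com/venues/78> <http://www.w3.org/2003/01/geo/wgs84_pos#lat> "51.0477644"^^<http://www.w3.org/2001/XMLSchema#float> .',
]


def write_dump(path, lines):
    """Publisher side: write every triple into one gzip-compressed file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


def read_dump(path):
    """Consumer side: decompress and read the entire file before use."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "venues.nt.gz")  # hypothetical name
        write_dump(path, TRIPLES)
        print(read_dump(path) == TRIPLES)
```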


A SPARQL endpoint lets clients evaluate
arbitrary (read-only) queries on a server.
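Under the SPARQL 1.1 Protocol, a client can send a query in the `query` parameter of an HTTP GET request and pick a result format through the Accept header. The sketch below only builds such a request with Python's standard library; the endpoint URL is hypothetical, and the query itself would be evaluated entirely by the server.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL, used for illustration only.
ENDPOINT = "https://ex.com/sparql"

QUERY = """PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?venue ?lat WHERE { ?venue geo:lat ?lat } LIMIT 10"""


def sparql_get_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol query-via-GET request.

    The query travels URL-encoded in the 'query' parameter; the Accept
    header asks for the SPARQL JSON results format.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    request = sparql_get_request(ENDPOINT, QUERY)
    print(request.full_url)
    # urllib.request.urlopen(request) would send it to the server.
```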


Linked Data documents provide
per-topic access to a dataset.
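Accessing such a document amounts to dereferencing a resource's URL. Because the fragment of a URL such as `https://ruben.verborgh.org/profile/#me` is never sent to the server, a client requests the document and then looks up the resource inside it. The sketch below builds that request and asks for Turtle through content negotiation; whether a given server offers Turtle is not something it checks.

```python
from urllib.parse import urldefrag
from urllib.request import Request


def document_request(resource_uri):
    """Build the request a client sends to look up a resource.

    The fragment (#me) never reaches the server, so the client asks for
    the document URL and then finds the resource's triples inside it.
    """
    document_url, _fragment = urldefrag(resource_uri)
    return Request(document_url, headers={"Accept": "text/turtle"})


if __name__ == "__main__":
    request = document_request("https://ruben.verborgh.org/profile/#me")
    print(request.full_url)  # the profile document, without the fragment
```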


Once Linked Data is published on the Web,
clients can evaluate queries over it.

Just like on the “human” Web,
querying goes beyond browsing.

The possibilities for query evaluation
depend on how data is made available.

Evaluating queries over a federation
of interfaces introduces new challenges.
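Even a naive approach shows where the difficulty lies: when no single source holds all the data, the client has to combine partial results itself. The sketch below evaluates a basic graph pattern over in-memory stand-ins for two hypothetical sources by sending every pattern to every source and joining the bindings in nested loops. Real federated engines select sources and order joins far more carefully.

```python
from itertools import chain

# In-memory stand-ins for two hypothetical, independently published sources.
VENUES = [("<https://ex.com/venues/78>", "geo:lat", "51.0477644")]
PERFORMANCES = [("<https://ex.com/performances/567>", "ex:venue", "<https://ex.com/venues/78>")]

# ?perf ex:venue ?venue . ?venue geo:lat ?lat
QUERY = [("?perf", "ex:venue", "?venue"), ("?venue", "geo:lat", "?lat")]


def match(pattern, triple, binding):
    """Return binding extended so that pattern matches triple, or None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable may bind to a new value, or must keep its old one.
            if extended.get(term, value) != value:
                return None
            extended[term] = value
        elif term != value:
            return None
    return extended


def evaluate(patterns, sources):
    """Naive federated evaluation of a basic graph pattern (nested-loop join)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            # Every pattern goes to every source: no source selection,
            # which is what makes naive federation expensive.
            for triple in chain.from_iterable(sources):
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


if __name__ == "__main__":
    print(evaluate(QUERY, [VENUES, PERFORMANCES]))
```

Neither source can answer the query on its own; only the combination yields a result.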

Enhancements let client feedback
find its way back to the source.

Data doesn’t stop when published.
It only just begins.

Unfortunately, such feedback loops
are still rare for Linked Data.

Open challenges include:

Provenance allows modeling
the history trail of facts.

Reverse mappings could feed edits
back to the original source.
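A reverse mapping has to undo each step of the forward mapping. The first step, recovering source values from a generated IRI, can be sketched by inverting a subject template such as `https://ex.com/venues/{Venue.Venue_ID}`. The function name is hypothetical, and the approach only works when the template is unambiguous: here values are assumed not to contain a slash, and percent-encoding is ignored.

```python
import re
from typing import Optional


def reverse_template(template, iri) -> Optional[dict]:
    """Recover the source values that a template placed into an IRI.

    Each {reference} becomes a capture group; values are assumed not to
    contain '/', and percent-encoding is not undone.
    """
    parts = re.split(r"\{([^}]+)\}", template)
    literals, references = parts[0::2], parts[1::2]
    pattern = re.escape(literals[0])
    for literal in literals[1:]:
        pattern += "([^/]+)" + re.escape(literal)
    match = re.fullmatch(pattern, iri)
    if match is None:
        return None
    return dict(zip(references, match.groups()))


if __name__ == "__main__":
    print(reverse_template("https://ex.com/venues/{Venue.Venue_ID}",
                           "https://ex.com/venues/78"))
```

Knowing the ID is not yet enough: if many performances share venue 78, an edited latitude would still have to be written back to each matching record, which is one reason reverse mappings remain an open challenge.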


The original Semantic Web vision
features intelligent agents.

Schedule bi-weekly appointments
with a licensed physical therapist,
specialized in a particular field,
living near my home or my workplace.

adapted from The Semantic Web

Do we still need the Semantic Web
with a smartphone in our pockets?

[an iPhone running Siri]

The current generation of agents
only performs preprogrammed acts.

Before Linked Data, the Semantic Web
suffered from a chicken-and-egg problem.


The Web strives to be universal
through independence of many factors.

Your freedom on the Web shouldn’t be influenced by:

The Web brings freedom of expression
to everyone across the world.

The Web brings permissionless innovation
at an unprecedented scale.

The Web has changed tremendously
in a short timespan.

© Tim McDonagh

Many social media platforms
reduce the Web to television.

Our data has become centralized
in a handful of Web platforms.

Within the walled gardens of social media,
you have to move either data or people.

© David Simonds

Ironically, permissionless innovation
even allows platforms that prevent it.

The Facebook founder has no intention of
allowing anyone to build anything on his platform
that does not have his express approval.

Having profited mightily from the Web’s openness,
he has kicked away the ladder that elevated him
to his current eminence.

John Naughton, The Guardian
[photo of a ladder]
© Vinayak Shankar Rao


Decentralization in Solid means not needing
centralized platforms to enjoy the Web.

Different platforms tackle decentralization
at very different scales.

You can choose where you store
every single piece of data you produce.

You can grant apps and people access
to very specific parts of your data.
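One mechanism Solid uses for such grants is Web Access Control, whose access modes include Read, Write, Append, and Control. The sketch below does not implement that specification; it is a hypothetical in-memory model of the idea that a grant names an agent, a part of the pod, and the allowed modes. The pod URL is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Hypothetical grant: who may do what under which container."""
    agent: str        # WebID of a person, or an identifier of an app
    container: str    # a URL ending in '/', covering everything below it
    modes: frozenset  # e.g. frozenset({"Read"}); names borrowed from WAC


# The pod URL is hypothetical; the WebID is Peter's profile.
GRANTS = [
    Grant("http://www.peterlambert.be/#me",
          "https://ruben.pod.example/photos/2020/",
          frozenset({"Read"})),
]


def is_allowed(grants, agent, resource, mode):
    """True if some grant covers this agent, resource, and access mode."""
    return any(g.agent == agent
               and resource.startswith(g.container)
               and mode in g.modes
               for g in grants)


if __name__ == "__main__":
    peter = "http://www.peterlambert.be/#me"
    print(is_allowed(GRANTS, peter, "https://ruben.pod.example/photos/2020/ugain.jpg", "Read"))
```

Requiring container URLs to end in a slash keeps the prefix check from also matching sibling folders such as `/photos/2020-private/`.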

Separating app and storage competition
drives permissionless innovation.

Solid is not a company or organisation.
Solid is not (just) software.

[the Solid logo]

Anyone can build or host
software for Solid.

The Solid server acts as a data pod
that stores and guards your data.

A typical data pod can contain
any data you create or need online.

Solid clients are browser or native apps
that read from or write to your data pod.

Any app you can envision,
you can build with Solid.


Decentralized apps have many back-ends. Back-ends work with many apps.

Linked Data in the RDF model
solves crucial challenges for Solid.

Through URLs and RDF, every piece of data
can link to any other piece of data.

PREFIX as: <https://www.w3.org/ns/activitystreams#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
<#ruben-likes-ugain> a as:Like;
  as:actor  <https://ruben.verborgh.org/profile/#me>;
  as:object <https://www.ugain.ugent.be/#bigdata2020>;
  as:published "2020-03-19T20:00:00Z"^^xsd:dateTime.

Shapes (and hopefully soon semantics)
enable layered compatibility.
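A shape describes what an app needs from the data, not everything the data may contain. The sketch below is a hypothetical, much simplified stand-in for shape languages such as SHACL or ShEx: it checks that a like has exactly one as:actor, one as:object, and one as:published typed as xsd:dateTime, and ignores every other triple.

```python
AS = "https://www.w3.org/ns/activitystreams#"
XSD_DATETIME = "<http://www.w3.org/2001/XMLSchema#dateTime>"

# Hypothetical shape: each property must occur exactly once and pass its check.
LIKE_SHAPE = {
    f"<{AS}actor>": lambda v: v.startswith("<"),   # must be an IRI
    f"<{AS}object>": lambda v: v.startswith("<"),  # must be an IRI
    f"<{AS}published>": lambda v: v.endswith("^^" + XSD_DATETIME),
}

LIKE = [
    ("<#ruben-likes-ugain>", f"<{AS}actor>", "<https://ruben.verborgh.org/profile/#me>"),
    ("<#ruben-likes-ugain>", f"<{AS}object>", "<https://www.ugain.ugent.be/#bigdata2020>"),
    ("<#ruben-likes-ugain>", f"<{AS}published>", '"2020-03-19T20:00:00Z"^^' + XSD_DATETIME),
]


def conforms(triples, node, shape):
    """Check only the properties the shape mentions; other triples are ignored."""
    for prop, check in shape.items():
        values = [o for s, p, o in triples if s == node and p == prop]
        if len(values) != 1 or not check(values[0]):
            return False
    return True


if __name__ == "__main__":
    print(conforms(LIKE, "<#ruben-likes-ugain>", LIKE_SHAPE))
```

Because properties outside the shape are ignored, data that other apps enrich with extra triples keeps conforming, which is one way to read layered compatibility.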


Different source data can be concatenated
(but let’s track provenance).

PREFIX as: <https://www.w3.org/ns/activitystreams#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
<#ruben-likes-ugain> a as:Like;
  as:actor  <https://ruben.verborgh.org/profile/#me>;
  as:object <https://www.ugain.ugent.be/#bigdata2020>;
  as:published "2020-03-19T20:00:00Z"^^xsd:dateTime.
<#peter-likes-ugain> a as:Like;
  as:actor  <http://www.peterlambert.be/#me>;
  as:object <https://www.ugain.ugent.be/#bigdata2020>;
  as:published "2020-03-19T20:05:00Z"^^xsd:dateTime.
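Concatenation needs two precautions. Relative IRIs such as <#ruben-likes-ugain> only make sense relative to the document they come from, so they must be resolved first; and recording that document as a fourth element turns each triple into a quad that keeps its provenance. The document URLs below are hypothetical.

```python
from urllib.parse import urljoin

AS = "https://www.w3.org/ns/activitystreams#"
UGAIN = "<https://www.ugain.ugent.be/#bigdata2020>"

# Hypothetical document URLs for the two sources.
SOURCES = {
    "https://ruben.pod.example/likes.ttl": [
        ("<#ruben-likes-ugain>", f"<{AS}object>", UGAIN),
    ],
    "https://peter.pod.example/likes.ttl": [
        ("<#peter-likes-ugain>", f"<{AS}object>", UGAIN),
    ],
}


def resolve(term, base):
    """Resolve an IRI term against its document; leave literals alone."""
    if term.startswith("<") and term.endswith(">"):
        return "<" + urljoin(base, term[1:-1]) + ">"
    return term


def concatenate(sources):
    """Merge all triples into quads whose fourth element is the source document."""
    return [
        tuple(resolve(term, source) for term in triple) + (source,)
        for source, triples in sources.items()
        for triple in triples
    ]


if __name__ == "__main__":
    for quad in concatenate(SOURCES):
        print(quad)
```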

The traditional way of building apps
does not work well with decentralization.

Building apps over decentralized data
requires different app techniques.
