Distributed Decentralised Semantic Web

John Domingue, Knowledge Media Institute, The Open University

Ruben Verborgh, Ghent University, imec

International Semantic Web Research Summer School, 4 July 2018

Many Linked Data interfaces are possible
in between the two extremes of data dumps and SPARQL endpoints.

Linked Data Fragments is a uniform view
on Linked Data interfaces.

Every Linked Data interface
offers specific fragments
of a Linked Data set.

Each type of Linked Data Fragment
is defined by three characteristics.

Linked Data Fragment

data
What triples does the fragment contain?
metadata
Do we know more about the data/fragment?
controls
How can we access more data?
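
The three characteristics can be pictured as one small record. This is only an illustrative sketch: the class name, field names, and the `ex:` triples are assumptions made for the example, not part of any specification.

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]


@dataclass
class LinkedDataFragment:
    """Illustrative container; class and field names are assumptions."""
    data: list[Triple]  # what triples does the fragment contain?
    metadata: dict[str, object] = field(default_factory=dict)  # what do we know about them?
    controls: list[str] = field(default_factory=list)  # how can we access more data?


# A tiny hypothetical fragment: two made-up triples, a count, and no controls.
example = LinkedDataFragment(
    data=[("ex:a", "ex:knows", "ex:b"), ("ex:b", "ex:knows", "ex:c")],
    metadata={"triples": 2},
)
```

Each interface type then differs only in what it puts into these three slots.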

data dump

data
all dataset triples
metadata
number of triples, file size
controls
(none)

SPARQL query result

data
triples matching the query
metadata
(none)
controls
(none)

Linked Data document

data
triples about a topic
metadata
creator, maintainer, …
controls
links to other Linked Data documents

We designed a new trade-off mix
with low cost and high availability.

A Triple Pattern Fragments interface
is low-cost and enables clients to query.

A Triple Pattern Fragment is designed
to have a good information/cost balance.

Triple Pattern Fragment

data
matches of a triple pattern (paged)
metadata
total number of matches
controls
access to all other Triple Pattern Fragments
of the same dataset
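
What a server has to do for one request can be sketched in a few lines. This is a minimal in-memory sketch, not the reference server: the function names, the dictionary keys, the default page size of 100, and the `ex:` sample triples are all assumptions.

```python
Triple = tuple[str, str, str]


def matches(triple, pattern):
    """A pattern component of None acts as a wildcard (a variable)."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


def triple_pattern_fragment(dataset, s=None, p=None, o=None, page=1, page_size=100):
    """Return one page of matches plus the total match count."""
    matching = [t for t in dataset if matches(t, (s, p, o))]
    start = (page - 1) * page_size
    return {
        "data": matching[start:start + page_size],   # paged matches
        "metadata": {"totalCount": len(matching)},   # total number of matches
        "controls": {"searchTemplate": "{?s,p,o}"},  # how to reach other fragments
    }


# Hypothetical sample data, for illustration only.
DATASET = [
    ("ex:a", "ex:p", "ex:b"),
    ("ex:a", "ex:q", "ex:c"),
    ("ex:d", "ex:p", "ex:b"),
]
fragment = triple_pattern_fragment(DATASET, p="ex:p", page_size=1)
```

Here `fragment` holds one triple on its first page while its metadata reports two matches in total.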

Triple Pattern Fragments are lightweight,
because they do not require a triple store.

Triple patterns are not the final answer.
No interface ever will be.

Triple Pattern Fragment servers
enable clients to be intelligent.

controls
The HTML representation explains:
you can query by triple pattern.

controls
The RDF representation explains:
you can query by triple pattern.
<http://fragments.dbpedia.org/2016-04/en#dataset> hydra:search [
  hydra:template "http://fragments.dbpedia.org/2016-04/en{?s,p,o}";
  hydra:mapping
    [ hydra:variable "s"; hydra:property rdf:subject ],
    [ hydra:variable "p"; hydra:property rdf:predicate ],
    [ hydra:variable "o"; hydra:property rdf:object ]
].

metadata
The HTML representation explains:
this is the number of matches.

metadata
The RDF representation explains:
this is the number of matches.
<#fragment> void:triples 7937.

How can a client evaluate
a SPARQL query over a TPF interface?

Let’s follow the execution
of an example SPARQL query.

Find artists born in cities named Waterloo.

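PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person ?city WHERE {
    ?person rdf:type dbo:Artist.
    ?person dbo:birthPlace ?city.
    ?city foaf:name "Waterloo"@en.
}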

Fragment: http://fragments.dbpedia.org/2016-04/en

The client looks inside the fragment
to see how it can access the dataset.

<http://fragments.dbpedia.org/2016-04/en#dataset> hydra:search [
  hydra:template "http://fragments.dbpedia.org/2016-04/en{?s,p,o}";
  hydra:mapping
    [ hydra:variable "s"; hydra:property rdf:subject ],
    [ hydra:variable "p"; hydra:property rdf:predicate ],
    [ hydra:variable "o"; hydra:property rdf:object ]
].

You can query the dataset by triple pattern.
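
Turning that control into a request URL could look as follows. This is a sketch under assumptions: the helper name is hypothetical, and it only handles a single trailing `{?a,b,c}` expression rather than full URI Template syntax.

```python
import re
from urllib.parse import quote, urlencode


def expand_search_template(template, **bindings):
    """Expand a template ending in one '{?a,b,c}' expression (simplified)."""
    match = re.fullmatch(r"(.*)\{\?([^}]*)\}", template)
    if match is None:
        raise ValueError("unsupported template: " + template)
    base, names = match.group(1), match.group(2).split(",")
    # Variables without a value are left out of the query string.
    params = {n: bindings[n] for n in names if bindings.get(n) is not None}
    if not params:
        return base
    return base + "?" + urlencode(params, quote_via=quote)


TEMPLATE = "http://fragments.dbpedia.org/2016-04/en{?s,p,o}"
url = expand_search_template(TEMPLATE, p="http://xmlns.com/foaf/0.1/name")
# url == "http://fragments.dbpedia.org/2016-04/en?p=http%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2Fname"
```

A generic client reads the template and the variable mapping from the fragment itself, so nothing about the URL layout needs to be hard-coded.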

The client splits the query into triple patterns,
one for each fragment it can request.

  1. ?person rdf:type dbo:Artist.
  2. ?person dbo:birthPlace ?city.
  3. ?city foaf:name "Waterloo"@en.

It gets the first page of all fragments
and inspects their metadata.

  1. ?person rdf:type dbo:Artist. 96,000
    • (first 100 triples)
  2. ?person dbo:birthPlace ?city. 12,000,000
    • (first 100 triples)
  3. ?city foaf:name "Waterloo"@en. 26
    • (first 100 triples)

It starts with the smallest fragment,
because it is the most selective.

  1. ?person rdf:type dbo:Artist.
  2. ?person dbo:birthPlace ?city.
  3. ?city foaf:name "Waterloo"@en. 26
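
The choice itself is a one-liner once the counts are known. A small sketch using the three counts from the fragment metadata; the variable names are assumptions.

```python
# Match counts taken from the metadata of each fragment's first page.
counts = {
    "?person rdf:type dbo:Artist.": 96_000,
    "?person dbo:birthPlace ?city.": 12_000_000,
    '?city foaf:name "Waterloo"@en.': 26,
}

# The pattern with the fewest matches is the most selective starting point.
first_pattern = min(counts, key=counts.get)
```

With these counts, `first_pattern` is the `foaf:name` pattern, so the client has at most 26 candidate cities to explore.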

This process continues recursively
until all options have been tested.

  1. ?person rdf:type dbo:Artist.
  2. ?person dbo:birthPlace dbr:Waterloo,_Iowa.
  3. ?city foaf:name "Waterloo"@en.
    • dbr:Waterloo,_Iowa foaf:name "Waterloo"@en.
    • dbr:Waterloo,_London foaf:name "Waterloo"@en.
    • dbr:Waterloo,_Ontario foaf:name "Waterloo"@en.

It gets the first page of all fragments
and inspects their metadata.

  1. ?person rdf:type dbo:Artist. 96,000
    • (first 100 triples)
  2. ?person dbo:birthPlace dbr:Waterloo,_Iowa. 45
    • (first 100 triples)

It starts with the smallest fragment,
because it is the most selective.

  1. ?person rdf:type dbo:Artist. 96,000
  2. ?person dbo:birthPlace dbr:Waterloo,_Iowa. 45
    • dbr:Allan_Carpenter dbo:birthPlace dbr:Waterloo,_Iowa.
    • dbr:Adam_DeVine dbo:birthPlace dbr:Waterloo,_Iowa.
    • dbr:Bonnie_Koloc dbo:birthPlace dbr:Waterloo,_Iowa.

This process continues recursively
until all options have been tested.

  1. dbr:Allan_Carpenter rdf:type dbo:Artist.
  2. ?person dbo:birthPlace dbr:Waterloo,_Iowa. 45
    • dbr:Allan_Carpenter dbo:birthPlace dbr:Waterloo,_Iowa.
    • dbr:Adam_DeVine dbo:birthPlace dbr:Waterloo,_Iowa.
    • dbr:Bonnie_Koloc dbo:birthPlace dbr:Waterloo,_Iowa.

It gets the first page of the fragment,
which provides mappings for a solution.

  1. dbr:Allan_Carpenter rdf:type dbo:Artist. 1
    • dbr:Allan_Carpenter rdf:type dbo:Artist.

We found a solution mapping.

?person
dbr:Allan_Carpenter
?city
dbr:Waterloo,_Iowa

Some paths will result in empty fragments.
They do not lead to a consistent solution.

  1. dbr:Adam_DeVine rdf:type dbo:Artist. 0

No solution mapping.

At least, according to DBpedia.
It turns out that Adam DeVine is actually an actor.
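
The whole walkthrough can be condensed into a short recursive sketch. It is an illustration under assumptions, not the actual client: `fetch_fragment` stands in for an HTTP request, the data is a toy in-memory list, the `ex:` triples are hypothetical filler that only makes the counts behave like in the example, and a variable repeated within one pattern is not handled.

```python
DATA = [
    ("dbr:Waterloo,_Iowa", "foaf:name", '"Waterloo"@en'),
    ("dbr:Waterloo,_London", "foaf:name", '"Waterloo"@en'),
    ("dbr:Waterloo,_Ontario", "foaf:name", '"Waterloo"@en'),
    ("dbr:Allan_Carpenter", "dbo:birthPlace", "dbr:Waterloo,_Iowa"),
    ("dbr:Adam_DeVine", "dbo:birthPlace", "dbr:Waterloo,_Iowa"),
    ("dbr:Allan_Carpenter", "rdf:type", "dbo:Artist"),
]
# Hypothetical filler so that the name pattern is the smallest fragment.
for i in range(4):
    DATA.append((f"ex:person{i}", "rdf:type", "dbo:Artist"))
    DATA.append((f"ex:person{i}", "dbo:birthPlace", "ex:Elsewhere"))


def is_var(term):
    return term.startswith("?")


def fetch_fragment(pattern):
    """Stand-in for a TPF request: returns (matching triples, total count)."""
    found = [t for t in DATA if all(is_var(p) or p == v for p, v in zip(pattern, t))]
    return found, len(found)


def substitute(pattern, binding):
    return tuple(binding.get(term, term) for term in pattern)


def evaluate(patterns, binding=None):
    """Yield solution mappings by always expanding the smallest fragment."""
    binding = binding or {}
    if not patterns:
        yield binding  # every pattern is satisfied: a solution mapping
        return
    fragments = [fetch_fragment(p) for p in patterns]
    best = min(range(len(patterns)), key=lambda i: fragments[i][1])
    triples, count = fragments[best]
    if count == 0:
        return  # empty fragment: this path has no consistent solution
    chosen = patterns[best]
    remaining = patterns[:best] + patterns[best + 1:]
    for triple in triples:
        new = {term: value for term, value in zip(chosen, triple) if is_var(term)}
        yield from evaluate([substitute(p, new) for p in remaining], {**binding, **new})


QUERY = [
    ("?person", "rdf:type", "dbo:Artist"),
    ("?person", "dbo:birthPlace", "?city"),
    ("?city", "foaf:name", '"Waterloo"@en'),
]
solutions = list(evaluate(QUERY))
```

On this toy data the only solution maps `?person` to `dbr:Allan_Carpenter` and `?city` to `dbr:Waterloo,_Iowa`; the Adam DeVine path ends in an empty fragment, as in the slides.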

We evaluated Triple Pattern Fragments
for server cost and availability.

We ran the Berlin SPARQL benchmark
on Amazon EC2 virtual machines.

We configured the Amazon machines
to generate large loads in a Web-like setting.

The query throughput is lower,
but resilient to high client numbers.

The server traffic is higher,
but individual requests are lighter.

Caching is significantly more effective,
as clients reuse fragments for queries.
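
Why fragments cache well can be illustrated with a toy cache. This is a sketch of the idea only, not the benchmark setup: the cache here is an in-process `lru_cache`, whereas on the Web it would typically be an HTTP cache, and all names are assumptions.

```python
from functools import lru_cache

server_requests = 0  # how often the simulated server is actually hit


@lru_cache(maxsize=None)
def fetch_fragment(pattern: str) -> str:
    """Stand-in for an HTTP GET of a fragment; cached per pattern."""
    global server_requests
    server_requests += 1
    return f"fragment for {pattern}"


# Two different queries that happen to share one triple pattern.
query_a = ["?person rdf:type dbo:Artist.", "?person dbo:birthPlace ?city."]
query_b = ["?person rdf:type dbo:Artist.", '?city foaf:name "Waterloo"@en.']

for query in (query_a, query_b):
    for pattern in query:
        fetch_fragment(pattern)
```

Four fragment lookups reach the simulated server only three times, because the shared pattern is answered from the cache. Complete SPARQL query results are reused far less often, since two clients rarely send exactly the same query.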

The server requires much less CPU,
allowing higher availability at lower cost.

The server enables clients to be intelligent,
so it can remain simple and lightweight.

These experiments verify the possibility
(and necessity) of new types of solutions.

Decentralization can be realized
at very different scales.

Every piece of data in decentralized apps
can come from a different place.

Solid is an application platform for
decentralization through Linked Data.

Multiple decentralized Web apps
share access to data stores.

Different app and storage providers
compete independently.
