Distributed Decentralised Semantic Web

John Domingue

Ruben Verborgh

Distributed Decentralised Semantic Web

Some (societal) issues with data (John)
Decentralized consensus: blockchains (John)
Decentralized data on the Web (Ruben)
Decentralized social networking (Ruben)
Connecting blockchain and Linked Data (John)

Many possible Linked Data interfaces exist
in between the two extremes of data dumps and SPARQL endpoints.

Linked Data Fragments is a uniform view
on Linked Data interfaces.

Every Linked Data interface
offers specific fragments
of a Linked Data set.

Each type of Linked Data Fragment
is defined by three characteristics.

Linked Data Fragment

data: What triples does the fragment contain?
metadata: Do we know more about the data/fragment?
controls: How can we access more data?

data dump

data: all dataset triples
metadata: number of triples, file size
controls: (none)

SPARQL query result

data: triples matching the query
metadata: (none)
controls: (none)

Linked Data document

data: triples about a topic
metadata: creator, maintainer, …
controls: links to other Linked Data documents
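
The three characteristics can be written down as a small record type. The Python sketch below is only an illustration: the class name `FragmentType` and its fields are assumptions made here, not part of any Linked Data Fragments specification. The values are copied from the three fragment types above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentType:
    """Hypothetical record describing one type of Linked Data Fragment."""
    name: str
    data: str      # which triples the fragment contains
    metadata: str  # what else is known about the data/fragment
    controls: str  # how a client can access more data


FRAGMENT_TYPES = [
    FragmentType("data dump", "all dataset triples",
                 "number of triples, file size", "(none)"),
    FragmentType("SPARQL query result", "triples matching the query",
                 "(none)", "(none)"),
    FragmentType("Linked Data document", "triples about a topic",
                 "creator, maintainer, ...",
                 "links to other Linked Data documents"),
]

# Fragment types that give a client some way to reach further data.
navigable = [t.name for t in FRAGMENT_TYPES if t.controls != "(none)"]
print(navigable)
```

Of these three types, only the Linked Data document carries controls.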

We designed a new trade-off mix
with low cost and high availability.

A Triple Pattern Fragments interface
is low-cost and enables clients to query.

A Triple Pattern Fragment is designed
to have a good information/cost balance.

Triple Pattern Fragment

data: matches of a triple pattern (paged)
metadata: total number of matches
controls: access to all other Triple Pattern Fragments
of the same dataset

Triple Pattern Fragments are lightweight,
because they do not require a triple store.

The interface can be realized with many back-ends.
- A SPARQL endpoint could serve as a back-end.
Since queries are relatively simple,
a less expensive data infrastructure is sufficient.
The Header–Dictionary–Triples (HDT) format
stores triples in a compressed file.
- Triple-pattern lookups (and counts) are especially fast.
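
To illustrate why no triple store is needed, here is a minimal sketch of a triple-pattern lookup over a plain in-memory list. The function name `triple_pattern_fragment`, the dictionary it returns, and the page size default are assumptions for this example. Unlike this sketch, a real server may report an estimated count rather than an exact one.

```python
Triple = tuple[str, str, str]

# A few triples taken from the DBpedia example later in this deck.
TRIPLES: list[Triple] = [
    ("dbr:Waterloo,_Iowa", "foaf:name", '"Waterloo"@en'),
    ("dbr:Waterloo,_London", "foaf:name", '"Waterloo"@en'),
    ("dbr:Waterloo,_Ontario", "foaf:name", '"Waterloo"@en'),
    ("dbr:Allan_Carpenter", "dbo:birthPlace", "dbr:Waterloo,_Iowa"),
]


def triple_pattern_fragment(triples, s=None, p=None, o=None,
                            page=1, page_size=100):
    """Return one page of matches plus the total number of matches.

    None acts as a wildcard for subject, predicate, or object.
    """
    matches = [t for t in triples
               if (s is None or t[0] == s)
               and (p is None or t[1] == p)
               and (o is None or t[2] == o)]
    start = (page - 1) * page_size
    return {"data": matches[start:start + page_size],  # paged data
            "total": len(matches)}                     # metadata


fragment = triple_pattern_fragment(TRIPLES, p="foaf:name", o='"Waterloo"@en')
print(fragment["total"], fragment["data"][0])
```

A linear scan is enough for a sketch; a format such as HDT serves the same lookups and counts from a compressed file.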

Triple patterns are not the final answer.
No interface ever will be.

There’s no silver bullet.
Publication and querying always involve trade-offs.
Triple Pattern Fragments aim to test how far
we can get with simple servers and smart clients.
To verify this, we need to execute the same queries
on different systems and measure the impact.

Triple Pattern Fragment servers
enable clients to be intelligent.

controls: The HTML representation explains:
you can query by triple pattern.

controls: The RDF representation explains:
you can query by triple pattern.

@prefix hydra: <http://www.w3.org/ns/hydra/core#>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.

<http://fragments.dbpedia.org/2016-04/en#dataset> hydra:search [
  hydra:template "http://fragments.dbpedia.org/2016-04/en{?s,p,o}";
  hydra:mapping
    [ hydra:variable "s"; hydra:property rdf:subject ],
    [ hydra:variable "p"; hydra:property rdf:predicate ],
    [ hydra:variable "o"; hydra:property rdf:object ]
].
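
A client can turn this control into a concrete fragment URL by expanding the template. The sketch below handles only the `{?a,b,c}` form used above, not URI templates in general. The function name `expand_query_template` is an assumption for illustration.

```python
import re
from urllib.parse import quote


def expand_query_template(template: str, values: dict[str, str]) -> str:
    """Expand a `{?a,b,c}` expression into a query string.

    Variables without a value are left out, so any subset of
    s, p, and o can be fixed while the rest act as wildcards.
    """
    def substitute(match: re.Match) -> str:
        names = match.group(1).split(",")
        pairs = [f"{name}={quote(values[name], safe='')}"
                 for name in names if values.get(name) is not None]
        return "?" + "&".join(pairs) if pairs else ""

    return re.sub(r"\{\?([^}]+)\}", substitute, template)


TEMPLATE = "http://fragments.dbpedia.org/2016-04/en{?s,p,o}"
url = expand_query_template(TEMPLATE, {
    "p": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "o": "http://dbpedia.org/ontology/Artist",
})
print(url)
```

Calling it with no values returns the URL of the fragment that matches every triple in the dataset.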

metadata: The HTML representation explains:
this is the number of matches.

metadata: The RDF representation explains:
this is the number of matches.

<#fragment> void:triples 7937.

How can a client evaluate
a SPARQL query over a TPF interface?

Give the client a SPARQL query,
and the URL of any TPF of the dataset.
It uses the controls inside of the fragment
to determine how to access the dataset.
It reads the metadata to decide
how to plan the query.

Let’s follow the execution
of an example SPARQL query.

Find artists born in cities named Waterloo.

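PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?city WHERE {
    ?person rdf:type dbo:Artist.
    ?person dbo:birthPlace ?city.
    ?city foaf:name "Waterloo"@en.
}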

Fragment: http://fragments.dbpedia.org/2016-04/en

The client looks inside of the fragment
to see how it can access the dataset.

<http://fragments.dbpedia.org/2016-04/en#dataset> hydra:search [
  hydra:template "http://fragments.dbpedia.org/2016-04/en{?s,p,o}";
  hydra:mapping
    [ hydra:variable "s"; hydra:property rdf:subject ],
    [ hydra:variable "p"; hydra:property rdf:predicate ],
    [ hydra:variable "o"; hydra:property rdf:object ]
].

You can query the dataset by triple pattern.

The client splits the query
into the available fragments.

?person rdf:type dbo:Artist.
?person dbo:birthPlace ?city.
?city foaf:name "Waterloo"@en.

It gets the first page of all fragments
and inspects their metadata.

?person rdf:type dbo:Artist. 96,000
- (first 100 triples)
?person dbo:birthPlace ?city. 12,000,000
- (first 100 triples)
?city foaf:name "Waterloo"@en. 26
- (all 26 triples)

It starts with the smallest fragment,
because it is most selective.

?person rdf:type dbo:Artist.
?person dbo:birthPlace ?city.
?city foaf:name "Waterloo"@en. 26
- dbr:Waterloo,_Iowa foaf:name "Waterloo"@en.
- dbr:Waterloo,_London foaf:name "Waterloo"@en.
- dbr:Waterloo,_Ontario foaf:name "Waterloo"@en.
- …
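
Choosing the starting point comes down to comparing the counts in the metadata. A minimal sketch, using the counts shown above (the function name `most_selective` is an assumption for illustration):

```python
# Match counts as reported in the metadata of each first page.
counts = {
    "?person rdf:type dbo:Artist.": 96_000,
    "?person dbo:birthPlace ?city.": 12_000_000,
    '?city foaf:name "Waterloo"@en.': 26,
}


def most_selective(pattern_counts: dict[str, int]) -> str:
    """Return the triple pattern with the fewest matches."""
    return min(pattern_counts, key=pattern_counts.get)


print(most_selective(counts))
```

Starting from the 26 cities keeps the number of follow-up requests small compared with starting from 96,000 artists.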

This process continues recursively
until all options have been tested.

?person rdf:type dbo:Artist.
?person dbo:birthPlace dbr:Waterloo,_Iowa.
~~?city foaf:name "Waterloo"@en.~~
- dbr:Waterloo,_Iowa foaf:name "Waterloo"@en.
- dbr:Waterloo,_London foaf:name "Waterloo"@en.
- dbr:Waterloo,_Ontario foaf:name "Waterloo"@en.
- …

It gets the first page of all fragments
and inspects their metadata.

?person rdf:type dbo:Artist. 96,000
- (first 100 triples)
?person dbo:birthPlace dbr:Waterloo,_Iowa. 45
- (all 45 triples)

It starts with the smallest fragment,
because it is most selective.

?person rdf:type dbo:Artist. 96,000
?person dbo:birthPlace dbr:Waterloo,_Iowa. 45
- dbr:Allan_Carpenter dbo:birthPlace dbr:Waterloo,_Iowa.
- dbr:Adam_DeVine dbo:birthPlace dbr:Waterloo,_Iowa.
- dbr:Bonnie_Koloc dbo:birthPlace dbr:Waterloo,_Iowa.
- …

This process continues recursively
until all options have been tested.

dbr:Allan_Carpenter rdf:type dbo:Artist.
~~?person dbo:birthPlace dbr:Waterloo,_Iowa.~~ 45
- dbr:Allan_Carpenter dbo:birthPlace dbr:Waterloo,_Iowa.
- dbr:Adam_DeVine dbo:birthPlace dbr:Waterloo,_Iowa.
- dbr:Bonnie_Koloc dbo:birthPlace dbr:Waterloo,_Iowa.
- …

It gets the first page of the fragment,
which provides mappings for a solution.

dbr:Allan_Carpenter rdf:type dbo:Artist. 1
- dbr:Allan_Carpenter rdf:type dbo:Artist.

We found a solution mapping.

?person: dbr:Allan_Carpenter
?city: dbr:Waterloo,_Iowa

Some paths will result in empty fragments.
They do not lead to a consistent solution.

dbr:Adam_DeVine rdf:type dbo:Artist. 0

No solution mapping.

At least, according to DBpedia.
It turns out that Adam DeVine is actually an actor.
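
The whole walk-through can be condensed into a short recursive sketch. It runs against a tiny in-memory stand-in for the server that holds only triples whose outcome is shown above. The names `fetch`, `bind`, and `evaluate` are assumptions for illustration, not the API of an actual TPF client. A real client would issue HTTP requests and read the count from the metadata instead of counting matches itself, and this sketch does not handle a variable that occurs twice in one pattern.

```python
Triple = tuple[str, str, str]

# Stand-in dataset: only facts that appear on the slides above.
DATASET: list[Triple] = [
    ("dbr:Waterloo,_Iowa", "foaf:name", '"Waterloo"@en'),
    ("dbr:Waterloo,_London", "foaf:name", '"Waterloo"@en'),
    ("dbr:Waterloo,_Ontario", "foaf:name", '"Waterloo"@en'),
    ("dbr:Allan_Carpenter", "dbo:birthPlace", "dbr:Waterloo,_Iowa"),
    ("dbr:Adam_DeVine", "dbo:birthPlace", "dbr:Waterloo,_Iowa"),
    ("dbr:Allan_Carpenter", "rdf:type", "dbo:Artist"),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def fetch(pattern: Triple) -> list[Triple]:
    """Stand-in for requesting a fragment: all triples matching the pattern."""
    return [t for t in DATASET
            if all(is_var(q) or q == v for q, v in zip(pattern, t))]


def bind(pattern: Triple, solution: dict[str, str]) -> Triple:
    """Replace variables that already have a value in the solution."""
    return tuple(solution.get(term, term) for term in pattern)


def evaluate(patterns: list[Triple], solution=None):
    """Yield solution mappings, always expanding the smallest fragment first."""
    solution = solution or {}
    if not patterns:
        yield solution  # every pattern is satisfied
        return
    bound = [bind(p, solution) for p in patterns]
    fragments = [fetch(p) for p in bound]
    # Most selective pattern first; an empty fragment ends this path.
    i = min(range(len(bound)), key=lambda k: len(fragments[k]))
    rest = patterns[:i] + patterns[i + 1:]
    for triple in fragments[i]:
        extended = dict(solution)
        extended.update({q: v for q, v in zip(bound[i], triple) if is_var(q)})
        yield from evaluate(rest, extended)


QUERY = [
    ("?person", "rdf:type", "dbo:Artist"),
    ("?person", "dbo:birthPlace", "?city"),
    ("?city", "foaf:name", '"Waterloo"@en'),
]
results = list(evaluate(QUERY))
print(results)
```

Because the stand-in dataset is so small, the order in which patterns are chosen differs from the walk-through above, but the solution mapping is the same. Since `evaluate` is a generator, the first result is available before all options have been tested.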

We evaluated Triple Pattern Fragments
for server cost and availability.

We ran the Berlin SPARQL benchmark
on Amazon EC2 virtual machines.

100 million triples
high query diversity
BGP, UNION, FILTER, …

We configured the Amazon machines
to generate large loads in a Web-like setting.

1 server (4 cores)
1 cache
1–244 simultaneous clients (1 core each)

Compared with SPARQL endpoints, the query throughput is lower,
but resilient to high client numbers.

The server traffic is higher,
but individual requests are lighter.

Caching is significantly more effective,
as clients reuse fragments for queries.
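
One way to see why fragments cache well: every fragment has its own URL, so different queries that share a triple pattern request the same resource. The sketch below is a toy cache keyed by URL. The class `FragmentCache`, the stand-in `fake_fetch`, and the shortened URLs of the two queries are all hypothetical.

```python
class FragmentCache:
    """Hypothetical cache that stores fragment pages by URL."""

    def __init__(self, fetch):
        self._fetch = fetch  # function that retrieves a page on a miss
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str):
        if url in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[url] = self._fetch(url)
        return self._store[url]


def fake_fetch(url: str) -> str:
    """Stand-in for an HTTP GET to the fragments server."""
    return f"<page for {url}>"


BASE = "http://fragments.dbpedia.org/2016-04/en"
# Two different queries that both contain "?person rdf:type dbo:Artist".
query_a = [BASE + "?p=rdf%3Atype&o=dbo%3AArtist", BASE + "?p=dbo%3AbirthPlace"]
query_b = [BASE + "?p=rdf%3Atype&o=dbo%3AArtist", BASE + "?p=foaf%3Aname"]

cache = FragmentCache(fake_fetch)
for url in query_a + query_b:
    cache.get(url)
print(cache.hits, cache.misses)
```

The second query finds the shared fragment already in the cache. By contrast, the result of a complete SPARQL query helps only a client that asks exactly the same query.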

The server requires much less CPU,
allowing higher availability at lower cost.

The server enables clients to be intelligent,
so it can remain simple and lightweight.

These experiments verify the possibility
(and necessity) of new types of solutions.

Processing everything on the server is costly.
Processing everything on the client isn’t Web.
Solutions that divide the workload
can offer new perspectives,
if we accept the trade-offs they bring.
Is it realistic to make all queries on the Web fast?
- Maybe we should focus on obtaining first results soon.

Distributed Decentralised Semantic Web

Some (societal) issues with data
Decentralized consensus: blockchains
Decentralized data on the Web
Decentralized social networking
- Creators become owners
- Apps become views

Decentralization can be realized
at very different scales.

Every piece of data in decentralized apps
can come from a different place.

Solid is an application platform for
decentralization through Linked Data.

Multiple decentralized Web apps
share access to data stores.

Different app and storage providers
compete independently.
