Distributed Decentralised Semantic Web
John Domingue
Ruben Verborgh
Possible Linked Data interfaces exist
in between the two extremes.
Linked Data Fragments is a uniform view
on Linked Data interfaces.
Every Linked Data interface
offers specific fragments
of a Linked Data set.
Each type of Linked Data Fragment
is defined by three characteristics.
Linked Data Fragment
- data: What triples does the fragment contain?
- metadata: Do we know more about the data/fragment?
- controls: How can we access more data?
data dump
- data: all dataset triples
- metadata: number of triples, file size
- controls: (none)
SPARQL query result
- data: triples matching the query
- metadata: (none)
- controls: (none)
Linked Data document
- data: triples about a topic
- metadata: creator, maintainer, …
- controls: links to other Linked Data documents
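The three characteristics can be written down as a small data structure. This is only an illustrative sketch: the class name `LinkedDataFragment`, the field types, and the `ex:` example triples are assumptions made here, not part of any specification.

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class LinkedDataFragment:
    """Hypothetical container for the three characteristics of a fragment."""
    data: list[Triple]                                 # triples in the fragment
    metadata: dict[str, object] = field(default_factory=dict)  # facts about the fragment
    controls: dict[str, str] = field(default_factory=dict)     # ways to reach more data


# A data dump: all dataset triples, size metadata, no controls.
dump = LinkedDataFragment(
    data=[("ex:a", "ex:knows", "ex:b"), ("ex:b", "ex:knows", "ex:c")],
    metadata={"triples": 2},
)

# A Linked Data document: triples about one topic, links as controls.
document = LinkedDataFragment(
    data=[("ex:a", "ex:knows", "ex:b")],
    metadata={"creator": "ex:alice"},
    controls={"ex:b": "http://example.org/b"},
)
```

The interface types differ only in what they put into each of the three fields.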
We designed a new trade-off mix
with low cost and high availability.
A Triple Pattern Fragments interface
is low-cost and enables clients to query.
A Triple Pattern Fragment is designed
to have a good information/cost balance.
Triple Pattern Fragment
- data: matches of a triple pattern (paged)
- metadata: total number of matches
- controls: access to all other Triple Pattern Fragments of the same dataset
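A minimal sketch of what a server has to do for one Triple Pattern Fragment, assuming an in-memory list of triples. The function name, the dictionary keys, the `ex:` identifiers and the page size are all hypothetical; a real interface exposes its controls as hypermedia, and its count may be an estimate.

```python
Triple = tuple[str, str, str]

# Tiny hypothetical dataset; a real server would read from a file or database.
DATASET: list[Triple] = [
    ("ex:iowa", "foaf:name", '"Waterloo"@en'),
    ("ex:london", "foaf:name", '"Waterloo"@en'),
    ("ex:ontario", "foaf:name", '"Waterloo"@en'),
    ("ex:carpenter", "dbo:birthPlace", "ex:iowa"),
]


def triple_pattern_fragment(s=None, p=None, o=None, page=1, page_size=2):
    """Return one page of matches for the pattern (None acts as a wildcard)."""
    pattern = (s, p, o)
    matches = [
        triple for triple in DATASET
        if all(term is None or term == value for term, value in zip(pattern, triple))
    ]
    start = (page - 1) * page_size
    has_next = start + page_size < len(matches)
    return {
        "data": matches[start:start + page_size],        # paged matches
        "metadata": {"total_matches": len(matches)},     # count over all pages
        "controls": {"next_page": page + 1 if has_next else None},
    }


first_page = triple_pattern_fragment(p="foaf:name")
```

The server only filters, counts and pages; joins are left to the client.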
Triple Pattern Fragments are lightweight,
because they do not require a triple store.
- The interface can be realized with many back-ends.
- A SPARQL endpoint could serve as back-end.
- Since queries are relatively simple,
  a less expensive data infrastructure is sufficient.
- The Header–Dictionary–Triples (HDT) format
  stores triples in a compressed file.
- Especially triple-pattern lookups (and counts) are fast.
Triple patterns are not the final answer.
No interface ever will be.
- There’s no silver bullet.
  Publication and querying always involve trade-offs.
- Triple Pattern Fragments aim to test how far
  we can get with simple servers and smart clients.
- To verify this, we need to execute the same queries
  on different systems and measure the impact.
How can a client evaluate
a SPARQL query over a TPF interface?
- Give the client a SPARQL query,
  and the URL of any TPF of the dataset.
- It uses the controls inside of the fragment
  to determine how to access the dataset.
- It reads the metadata to decide
  how to plan the query.
Let’s follow the execution
of an example SPARQL query.
Find artists born in cities named Waterloo.
SELECT ?person ?city WHERE {
  ?person rdf:type dbpedia-owl:Artist.
  ?person dbpedia-owl:birthPlace ?city.
  ?city foaf:name "Waterloo"@en.
}
Fragment: http://fragments.dbpedia.org/2016-04/en
The client looks inside of the fragment
to see how it can access the dataset.
<http://fragments.dbpedia.org/2016-04/en#dataset> hydra:search [
  hydra:template "http://fragments.dbpedia.org/2016-04/en{?s,p,o}";
  hydra:mapping
    [ hydra:variable "s"; hydra:property rdf:subject ],
    [ hydra:variable "p"; hydra:property rdf:predicate ],
    [ hydra:variable "o"; hydra:property rdf:object ]
].
You can query the dataset by triple pattern.
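A sketch of how a client might turn this control into a fragment URL. The template `{?s,p,o}` is a form-style URI Template expansion, so defined variables become query parameters and undefined ones are left out. The helper name `fragment_url` is hypothetical, and passing a full IRI and a quoted literal as parameter values is an assumption about what the server accepts.

```python
from urllib.parse import quote, urlencode

TEMPLATE_BASE = "http://fragments.dbpedia.org/2016-04/en"


def fragment_url(base, s=None, p=None, o=None):
    """Expand base{?s,p,o}; variables left as None are omitted."""
    params = [(name, value)
              for name, value in (("s", s), ("p", p), ("o", o))
              if value is not None]
    if not params:
        return base
    # quote() with safe="" percent-encodes every reserved character.
    return base + "?" + urlencode(params, quote_via=quote, safe="")


url = fragment_url(
    TEMPLATE_BASE,
    p="http://xmlns.com/foaf/0.1/name",
    o='"Waterloo"@en',
)
# url == "http://fragments.dbpedia.org/2016-04/en"
#        "?p=http%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2Fname&o=%22Waterloo%22%40en"
```

Because the template is read from the fragment itself, the client needs no prior knowledge of the server's URL structure.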
The client splits the query
into the available fragments.
- ?person rdf:type dbo:Artist.
- ?person dbo:birthPlace ?city.
- ?city foaf:name "Waterloo"@en.
It gets the first page of all fragments
and inspects their metadata.
- ?person rdf:type dbo:Artist. (96,000 matches)
- ?person dbo:birthPlace ?city. (12,000,000 matches)
- ?city foaf:name "Waterloo"@en. (26 matches)
It starts with the smallest fragment,
because it is most selective.
- ?person rdf:type dbo:Artist.
- ?person dbo:birthPlace ?city.
- ?city foaf:name "Waterloo"@en. (26 matches)
  - dbr:Waterloo,_Iowa foaf:name "Waterloo"@en.
  - dbr:Waterloo,_London foaf:name "Waterloo"@en.
  - dbr:Waterloo,_Ontario foaf:name "Waterloo"@en.
  - …
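The selection step can be sketched in a few lines, using the match counts read from the fragment metadata in this example. The function name `most_selective` and the use of plain strings for patterns are illustrative choices.

```python
# Match counts taken from the metadata of each fragment's first page.
counts = {
    "?person rdf:type dbo:Artist.": 96_000,
    "?person dbo:birthPlace ?city.": 12_000_000,
    '?city foaf:name "Waterloo"@en.': 26,
}


def most_selective(pattern_counts):
    """Return the pattern with the fewest matches."""
    return min(pattern_counts, key=pattern_counts.get)


start = most_selective(counts)  # the foaf:name pattern, with 26 matches
```

Starting with the smallest fragment keeps the number of follow-up requests low, since each of its matches spawns a new sub-query.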
This process continues recursively
until all options have been tested.
- ?person rdf:type dbo:Artist.
- ?person dbo:birthPlace dbr:Waterloo,_Iowa.
- ?city foaf:name "Waterloo"@en.
  - dbr:Waterloo,_Iowa foaf:name "Waterloo"@en.
  - dbr:Waterloo,_London foaf:name "Waterloo"@en.
  - dbr:Waterloo,_Ontario foaf:name "Waterloo"@en.
  - …
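Each match of the chosen fragment yields a binding that is substituted into the remaining patterns. A minimal sketch, assuming patterns are tuples of strings and variables start with `?`; the helper name `bind` is hypothetical.

```python
def bind(pattern, bindings):
    """Replace every bound variable in a triple pattern by its value."""
    return tuple(bindings.get(term, term) for term in pattern)


remaining = [
    ("?person", "rdf:type", "dbo:Artist"),
    ("?person", "dbo:birthPlace", "?city"),
]

# The first match of the foaf:name fragment binds ?city.
bound = [bind(pattern, {"?city": "dbr:Waterloo,_Iowa"}) for pattern in remaining]
# The second pattern becomes ?person dbo:birthPlace dbr:Waterloo,_Iowa.
```

The same step is then repeated for the other matches, such as dbr:Waterloo,_London and dbr:Waterloo,_Ontario.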
It gets the first page of all fragments
and inspects their metadata.
- ?person rdf:type dbo:Artist. (96,000 matches)
- ?person dbo:birthPlace dbr:Waterloo,_Iowa. (45 matches)
It starts with the smallest fragment,
because it is most selective.
- ?person rdf:type dbo:Artist. (96,000 matches)
- ?person dbo:birthPlace dbr:Waterloo,_Iowa. (45 matches)
  - dbr:Allan_Carpenter dbo:birthPlace dbr:Waterloo,_Iowa.
  - dbr:Adam_DeVine dbo:birthPlace dbr:Waterloo,_Iowa.
  - dbr:Bonnie_Koloc dbo:birthPlace dbr:Waterloo,_Iowa.
  - …
This process continues recursively
until all options have been tested.
- dbr:Allan_Carpenter rdf:type dbo:Artist.
- ?person dbo:birthPlace dbr:Waterloo,_Iowa. (45 matches)
  - dbr:Allan_Carpenter dbo:birthPlace dbr:Waterloo,_Iowa.
  - dbr:Adam_DeVine dbo:birthPlace dbr:Waterloo,_Iowa.
  - dbr:Bonnie_Koloc dbo:birthPlace dbr:Waterloo,_Iowa.
  - …
It gets the first page of the fragment,
which provides mappings for a solution.
- dbr:Allan_Carpenter rdf:type dbo:Artist. (1 match)
  - dbr:Allan_Carpenter rdf:type dbo:Artist.
We found a solution mapping.
- ?person = dbr:Allan_Carpenter
- ?city = dbr:Waterloo,_Iowa
Some paths will result in empty fragments.
They do not lead to a consistent solution.
- dbr:Adam_DeVine rdf:type dbo:Artist. (0 matches)
No solution mapping.
At least, according to DBpedia.
It turns out that Adam DeVine is actually an actor.
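The whole walk-through can be condensed into a recursive sketch. It runs against a toy in-memory list holding only the triples shown in this example, not against DBpedia, so the match counts and the order in which patterns are chosen differ from the walk-through. The function names are hypothetical, and the matcher does not handle a variable that occurs twice in one pattern.

```python
DATA = [
    ("dbr:Waterloo,_Iowa", "foaf:name", '"Waterloo"@en'),
    ("dbr:Waterloo,_London", "foaf:name", '"Waterloo"@en'),
    ("dbr:Waterloo,_Ontario", "foaf:name", '"Waterloo"@en'),
    ("dbr:Allan_Carpenter", "dbo:birthPlace", "dbr:Waterloo,_Iowa"),
    ("dbr:Adam_DeVine", "dbo:birthPlace", "dbr:Waterloo,_Iowa"),
    ("dbr:Bonnie_Koloc", "dbo:birthPlace", "dbr:Waterloo,_Iowa"),
    ("dbr:Allan_Carpenter", "rdf:type", "dbo:Artist"),
]

QUERY = [
    ("?person", "rdf:type", "dbo:Artist"),
    ("?person", "dbo:birthPlace", "?city"),
    ("?city", "foaf:name", '"Waterloo"@en'),
]


def is_var(term):
    return term.startswith("?")


def fragment(pattern):
    """Stand-in for fetching a fragment: all triples matching the pattern."""
    return [t for t in DATA
            if all(is_var(p) or p == v for p, v in zip(pattern, t))]


def evaluate(patterns, bindings=None):
    """Yield solution mappings, always expanding the smallest fragment first."""
    bindings = bindings or {}
    if not patterns:
        yield bindings  # every pattern matched: a consistent solution
        return
    matches, chosen = min(((fragment(p), p) for p in patterns),
                          key=lambda pair: len(pair[0]))
    rest = [p for p in patterns if p is not chosen]
    for triple in matches:  # an empty fragment ends this path without a solution
        new = {term: value for term, value in zip(chosen, triple) if is_var(term)}
        bound_rest = [tuple(new.get(t, t) for t in p) for p in rest]
        yield from evaluate(bound_rest, {**bindings, **new})


solutions = list(evaluate(QUERY))
```

Because `evaluate` is a generator, a client built this way can hand over the first solution as soon as it is found instead of waiting for all paths to finish.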
We evaluated Triple Pattern Fragments
for server cost and availability.
We configured the Amazon machines
to generate large loads in a Web-like setting.
- 1 server (4 cores)
- 1 cache
- 1–244 simultaneous clients (1 core each)
The query throughput is lower,
but resilient to high client numbers.
The server traffic is higher,
but individual requests are lighter.
Caching is significantly more effective,
as clients reuse fragments for queries.
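Why fragments cache well can be illustrated with a small sketch: different queries are decomposed into the same triple patterns, so the same fragment is requested again and again. Here `functools.lru_cache` stands in for an HTTP cache in front of the server; the function name, the counter and the second query are hypothetical.

```python
from functools import lru_cache

requests_to_server = 0


@lru_cache(maxsize=None)
def fetch_fragment(pattern):
    """Pretend to fetch a fragment; only cache misses reach the server."""
    global requests_to_server
    requests_to_server += 1
    return ("first page of", pattern)  # placeholder for a real response


query_a = [
    ("?person", "rdf:type", "dbo:Artist"),
    ("?person", "dbo:birthPlace", "?city"),
    ("?city", "foaf:name", '"Waterloo"@en'),
]
# A second, different query that shares one triple pattern with the first.
query_b = [
    ("?person", "rdf:type", "dbo:Artist"),
    ("?person", "foaf:name", "?name"),
]

for pattern in query_a + query_b:
    fetch_fragment(pattern)

# Five fragment requests were made, but only four reached the server.
```

A complete SPARQL query result, by contrast, can only be reused by a client that asks exactly the same query.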
The server requires much less CPU,
allowing higher availability at lower cost.
The server enables clients to be intelligent,
so it can remain simple and lightweight.
These experiments verify the possibility
(and necessity) of new types of solutions.
- Processing everything on the server is costly.
  Processing everything on the client isn’t Web.
- Solutions that divide the workload
  can offer new perspectives,
  if we accept the trade-offs they bring.
- Is it realistic to make all queries on the Web fast?
- Maybe we should focus on obtaining first results soon.
Decentralization can be realized
at very different scales.
Every piece of data in decentralized apps
can come from a different place.
Solid is an application platform for
decentralization through Linked Data.
Multiple decentralized Web apps
share access to data stores.
Different app and storage providers
compete independently.