Piecing the puzzle

Self-publishing queryable research data on the Web

Ruben Verborgh, Ghent University – imec

10th Workshop on Linked Data on the Web (LDOW2017), 3 April 2017

Read the accompanying research article.

Who thinks publishing
Linked Data on the Web
is a good idea?

Who publishes their
own Linked Data?

Who enables people to
query their Linked Data?

My open-source pipeline
improves the queryability of
Linked Data on websites.

This pipeline unlocks
the value of your data
for client-side applications.

My personal website contains metadata
about my research and publications.

This includes metadata for:

My data is published following
the Linked Data principles.

My data is modeled using several
ontologies and vocabularies.

I publish my own Linked Data because
we need to practice what we preach.

I publish my own Linked Data because
others already publish it wrongly.

I struggle to keep up with the incompleteness, inaccuracies, duplicates, and wrong entries of:

But who am I generating this data for?

The value of my Linked Data
needs to be unlocked.

I want to:

Traversal-based Linked Data querying
cannot answer all questions adequately.

Solving querying fully at the server side
is too expensive for personal data.

I designed a simple ETL pipeline
to enrich and publish my website’s data.

This process runs every night:

Reasoning on the data and its ontologies
makes hidden semantics explicit.

Reasoning expresses the same data
in different ways for different clients.

step                        time (s)   # triples
extraction                       170      17,000
skolemization ontologies           1      44,000
closure ontologies                39     145,000
closure ontologies & data         62     183,000
subtraction                        1      39,000
removal                            1      36,000
total                            273      36,000
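
The closure and subtraction steps can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual reasoner: the function closure, the two hand-written rules (rdfs:subPropertyOf and owl:inverseOf), the toy axioms, and the ex: identifiers are all assumptions made for the example.

```python
# Illustrative sketch only: two hand-written rules, not the pipeline's reasoner.
SUBPROP = "rdfs:subPropertyOf"
INVERSE = "owl:inverseOf"


def closure(triples):
    """Apply both rules repeatedly until no new triples appear (a fixpoint)."""
    known = set(triples)
    while True:
        derived = set()
        for s, p, o in known:
            for s2, p2, o2 in known:
                # (p subPropertyOf q) and (s p o)  =>  (s q o)
                if p2 == SUBPROP and s2 == p:
                    derived.add((s, o2, o))
                # (p inverseOf q) and (s p o)  =>  (o q s), in both directions
                if p2 == INVERSE and s2 == p:
                    derived.add((o, o2, s))
                if p2 == INVERSE and o2 == p:
                    derived.add((o, s2, s))
        if derived <= known:
            return known
        known |= derived


# Toy axioms and data, chosen for illustration (hypothetical ex: identifiers).
ontology = {
    ("schema:name", SUBPROP, "rdfs:label"),
    ("schema:isPartOf", INVERSE, "schema:hasPart"),
}
data = {
    ("ex:article", "schema:name", "Piecing the puzzle"),
    ("ex:article", "schema:isPartOf", "ex:proceedings"),
}

ontology_closure = closure(ontology)      # step: closure ontologies
full_closure = closure(ontology | data)   # step: closure ontologies & data
enriched = full_closure - ontology_closure  # step: subtraction

for triple in sorted(enriched):
    print(triple)
```

Subtracting the ontology closure keeps the published dataset limited to the original data plus what was derived from it, here an rdfs:label and a schema:hasPart triple.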

Reasoning fills ontological gaps
before querying happens.

property                 # pre   # post
dc:title                   657      714
rdfs:label                 473      714
foaf:name                  394      714
schema:name                439      714
schema:isPartOf            263      263
schema:hasPart               0      263
cito:cites                   0       33
cito:citesAsAuthority       14       14

The resulting data is published
in a Triple Pattern Fragments interface.
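
To show what such an interface offers, here is a minimal in-memory sketch: a client asks for one triple pattern and receives one page of matching triples, a total count, and a pointer to the next page. The function fragment, its parameters, the dictionary keys, and the ex: identifiers are hypothetical; a real interface serves this over HTTP with metadata and hypermedia controls.

```python
def fragment(dataset, subject=None, predicate=None, obj=None, page=1, page_size=2):
    """Return one page of the triples matching a pattern (None is a wildcard)."""
    pattern = (subject, predicate, obj)
    matches = sorted(
        triple for triple in dataset
        if all(want is None or want == have for want, have in zip(pattern, triple))
    )
    start = (page - 1) * page_size
    return {
        "triples": matches[start:start + page_size],
        # Count of all matches, which clients can use to plan their queries.
        "total_count": len(matches),
        # Stand-in for a hypermedia link to the next page.
        "next_page": page + 1 if start + page_size < len(matches) else None,
    }


# Hypothetical example data.
dataset = {
    ("ex:paper1", "cito:cites", "ex:paper2"),
    ("ex:paper1", "cito:cites", "ex:paper3"),
    ("ex:paper1", "cito:cites", "ex:paper4"),
    ("ex:paper1", "dc:title", "Piecing the puzzle"),
}

first = fragment(dataset, predicate="cito:cites")
second = fragment(dataset, predicate="cito:cites", page=first["next_page"])
print(first["total_count"], len(first["triples"]), len(second["triples"]))
```

The server only ever answers single patterns, which keeps its cost low; combining patterns into full queries is left to the client.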

TPF query clients find all results
and find them faster.

                            # results      time (s)
query                       LD    TPF     LD    TPF
people I know                0    196    5.6    2.1
publications I wrote         0    205   10.8    4.0
my publications            134    205   12.6    4.1
works I cite                 0     33    4.0    0.5
my interests (federated)     0      4    4.0    0.4

Open questions about
creating Linked Data:

Open questions about
modeling Linked Data:

Open questions about
Linked Data identifiers:

With minimal tooling,
querying my Linked Data
became better, faster,
and more flexible
even across datasets.

Your website’s Linked Data
can become queryable too.
Just use the pipeline.

So no more excuses ;-)
Self-publish
your Linked Data.

@RubenVerborgh, Ghent University – imec

Browse my Linked Data at data.verborgh.org.
Query my Linked Data at query.verborgh.org.