Een oceaan van data (An ocean of data)

Ruben Verborgh, Ghent University – imec

Informatie aan Zee, 14 October 2021

© Melvin Redeker

ecosystem |ˈēkōˌsistəm|
noun Ecology

Three inventions changed
the world of communication.

  1. writing
    • one-to-one communication
  2. the printing press
    • one-to-many communication
  3. the World Wide Web
    • many-to-many communication

The Web was created as a solution
to heterogeneous information systems.

The Web allows for an ocean of data
to thrive all around the world.

Tables store one kind of data
in rows with identical shapes.

Relational databases store
multiple kinds of data in tables.

Hierarchical models allow for
nested data representations.

Linked Data enables knowledge graphs
that are highly flexible and distributed.

Linked Data lets us capture knowledge
in different and flexible ways.
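
As a rough illustration of the four data models above, the sketch below expresses one small fact in each of them. The `example.org` vocabulary URLs, the table layouts, and the numeric keys are made up for illustration; only the two identifiers for the person and the event appear elsewhere in these slides.

```python
# The same fact ("Ruben likes a conference") in four data models.
# Table layouts, keys, and example.org URLs are hypothetical.

# 1. Table: every row has the same shape.
likes_table = [
    {"person": "Ruben", "event": "Informatie aan Zee 2021"},
]

# 2. Relational: several tables, connected through keys.
people = {1: {"name": "Ruben"}}
events = {7: {"title": "Informatie aan Zee 2021"}}
likes = [{"person_id": 1, "event_id": 7}]

# 3. Hierarchical: data nested inside other data.
nested = {"name": "Ruben", "likes": [{"title": "Informatie aan Zee 2021"}]}

# 4. Linked Data: a graph of (subject, predicate, object) triples
#    in which things are identified by URLs.
RUBEN = "https://ruben.verborgh.org/profile/#me"
EVENT = "https://www.vvbad.be/Informatie-Aan-Zee-2021#this"
triples = {
    (RUBEN, "https://example.org/vocab#likes", EVENT),
    (RUBEN, "https://example.org/vocab#name", "Ruben"),
}

# Anyone can add statements about the same URLs later,
# without changing a schema first.
triples.add((EVENT, "https://example.org/vocab#date", "2021-10-14"))


def objects(graph, subject, predicate):
    """Return all objects linked from `subject` via `predicate`."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


print(objects(triples, RUBEN, "https://example.org/vocab#likes"))
```

The graph form needs no fixed set of columns or nesting order, which is the flexibility the slides refer to.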

© David Simonds

Every piece of data created by a person,
or about them, is stored in their data pod.

Apps and services look the same as before,
but they blend data from many sources.

A person can grant apps and people access
to very specific parts of their data.
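
To make fine-grained access more concrete, here is a minimal sketch of a toy pod in which read access is granted per resource and per agent. It is a deliberate simplification and not Solid's actual authorization mechanism; the `Pod` class, the resource paths, and the app identifier are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """A toy data pod: resources keyed by path, plus per-resource grants."""
    resources: dict = field(default_factory=dict)
    # path -> {agent identifier -> set of modes such as "read"}
    grants: dict = field(default_factory=dict)

    def grant(self, path, agent, *modes):
        """Allow `agent` to use the given modes on exactly one resource."""
        self.grants.setdefault(path, {}).setdefault(agent, set()).update(modes)

    def read(self, path, agent):
        """Return the resource, but only if `agent` was granted read access."""
        if "read" not in self.grants.get(path, {}).get(agent, set()):
            raise PermissionError(f"{agent} may not read {path}")
        return self.resources[path]


# Hypothetical pod contents and app identifier.
pod = Pod(resources={
    "/likes/iaz-2021": {"type": "Like"},
    "/agenda/2021": {"type": "Event"},
})
APP = "https://app.example/#id"

pod.grant("/likes/iaz-2021", APP, "read")
print(pod.read("/likes/iaz-2021", APP))  # allowed: access was granted

try:
    pod.read("/agenda/2021", APP)        # refused: no grant for this path
except PermissionError as error:
    print(error)
```

The point of the sketch is that the grant is attached to a specific resource in the person's own storage, not to the app's copy of the data.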

Separating competition on apps from competition
on storage creates better offerings for all parties.

By abandoning data harvesting,
we restore permissionless innovation.

Solid is not a company or organisation.
Solid is not (just) software.

Crucial challenges in Solid
are solved by Linked Data.

With Linked Data, every piece of data
can link to any other piece of data.

{
  "@context":  "https://www.w3.org/ns/activitystreams",
  "id":        "#ruben-likes-iaz-2021",
  "type":      "Like",
  "actor":     "https://ruben.verborgh.org/profile/#me",
  "object":    "https://www.vvbad.be/Informatie-Aan-Zee-2021#this",
  "published": "2021-10-14T08:00:00Z"
}
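
The links in this document can be followed with very little code. The sketch below parses the JSON above and lists the URLs it points to; it also resolves the relative `id` against the URL where the document might be stored. That storage URL (`DOCUMENT_URL`) is an assumption made for the example, and the code treats the data as plain JSON instead of using a full JSON-LD processor.

```python
import json
from urllib.parse import urljoin

# Hypothetical location of the document; not stated in the slides.
DOCUMENT_URL = "https://ruben.verborgh.org/likes/"

like = json.loads("""
{
  "@context":  "https://www.w3.org/ns/activitystreams",
  "id":        "#ruben-likes-iaz-2021",
  "type":      "Like",
  "actor":     "https://ruben.verborgh.org/profile/#me",
  "object":    "https://www.vvbad.be/Informatie-Aan-Zee-2021#this",
  "published": "2021-10-14T08:00:00Z"
}
""")


def absolute_id(doc, base):
    """Resolve the (possibly relative) id against the document's own URL."""
    return urljoin(base, doc["id"])


def outgoing_links(doc):
    """Collect the properties whose values are absolute HTTP(S) URLs."""
    return {
        key: value
        for key, value in doc.items()
        if key != "@context"
        and isinstance(value, str)
        and value.startswith(("http://", "https://"))
    }


print(absolute_id(like, DOCUMENT_URL))
print(outgoing_links(like))
```

Both the actor and the object live on other servers than the document itself, which is what lets data in one pod refer to data anywhere else.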

Data shapes and their semantics
enable layered compatibility.

{
  "@context":  "https://www.w3.org/ns/activitystreams",
  "id":        "#ruben-likes-iaz-2021",
  "type":      "Like",
  "actor":     "https://ruben.verborgh.org/profile/#me",
  "object":    "https://www.vvbad.be/Informatie-Aan-Zee-2021#this",
  "published": "2021-10-14T08:00:00Z"
}
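
One way to read "layered compatibility" is that apps with different needs can use the same document, each relying only on the part of the shape it understands. The sketch below checks this with simple key sets. The two shape names are hypothetical, and a plain key check is only a stand-in for dedicated shape languages such as SHACL or ShEx.

```python
# Hypothetical shapes: the properties an app requires to be present.
LIKE_SHAPE = {"type", "actor", "object"}
TIMED_LIKE_SHAPE = LIKE_SHAPE | {"published"}


def matches_shape(doc, shape):
    """True if the document carries every property the shape requires.

    Extra properties are ignored, so richer data stays usable by simpler apps.
    """
    return shape <= doc.keys()


full_like = {
    "type": "Like",
    "actor": "https://ruben.verborgh.org/profile/#me",
    "object": "https://www.vvbad.be/Informatie-Aan-Zee-2021#this",
    "published": "2021-10-14T08:00:00Z",
}
# The same Like without a timestamp.
minimal_like = {k: v for k, v in full_like.items() if k != "published"}

print(matches_shape(full_like, LIKE_SHAPE))           # basic app: usable
print(matches_shape(full_like, TIMED_LIKE_SHAPE))     # timeline app: usable
print(matches_shape(minimal_like, LIKE_SHAPE))        # basic app: usable
print(matches_shape(minimal_like, TIMED_LIKE_SHAPE))  # timeline app: not enough
```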

Data from different sources
can be concatenated.

{
  "@context":  "https://www.w3.org/ns/activitystreams",
  "@graph": [{
    "type":      "Like",
    "actor":     "https://ruben.verborgh.org/profile/#me",
    "object":    "https://www.vvbad.be/Informatie-Aan-Zee-2021#this",
    "published": "2021-10-14T08:00:00Z"
  },{
    "type":      "Like",
    "actor":     "https://example.org/people/erhan#me",
    "object":    "https://www.vvbad.be/Informatie-Aan-Zee-2021#this",
    "published": "2021-10-14T08:05:00Z"
  }]
}
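
A combined document like the one above could be produced as follows. This is a minimal sketch that assumes every input shares the same `@context` and handles the data as plain JSON; the `concatenate` helper is hypothetical. Relative `id` values would have to be made absolute before merging, so the example inputs leave them out.

```python
import json


def concatenate(*docs):
    """Merge documents that share one @context into a single @graph."""
    contexts = {json.dumps(doc["@context"], sort_keys=True) for doc in docs}
    if len(contexts) != 1:
        raise ValueError("this sketch only merges documents with one shared @context")
    nodes = []
    for doc in docs:
        if "@graph" in doc:
            # Already a collection: take over its nodes as they are.
            nodes.extend(doc["@graph"])
        else:
            # A single node: everything except the context.
            nodes.append({k: v for k, v in doc.items() if k != "@context"})
    return {"@context": docs[0]["@context"], "@graph": nodes}


CONTEXT = "https://www.w3.org/ns/activitystreams"
EVENT = "https://www.vvbad.be/Informatie-Aan-Zee-2021#this"

from_ruben = {
    "@context": CONTEXT, "type": "Like",
    "actor": "https://ruben.verborgh.org/profile/#me",
    "object": EVENT, "published": "2021-10-14T08:00:00Z",
}
from_erhan = {
    "@context": CONTEXT, "type": "Like",
    "actor": "https://example.org/people/erhan#me",
    "object": EVENT, "published": "2021-10-14T08:05:00Z",
}

merged = concatenate(from_ruben, from_erhan)
print(json.dumps(merged, indent=2))
```

Because both sources identify the event with the same URL, an app can tell after merging that the two likes concern the same thing.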

Collection data starts decentralized.
Why do we centralize via aggregation?

The case of a small metadata producer:
my scholarly publications.

Yet others think they know better.

I have been publishing my own metadata
since before most of these existed.

I want to be the source of truth.
I don’t need to be the only source.

What flows back to data producers
as a return from aggregators?

Imagine all sorts of feedback
we are missing out on.

Current networks are centered
around the aggregator.

We need to create network flows
to and from the aggregator.

The individual network nodes
need to become the source of truth.

Aggregators need to become part
of a larger network.

Aggregators serve as a crucial
but transparent layer in the network.

Aggregators’ main responsibility becomes
fostering a network between nodes.
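
The two-way flow argued for above can be sketched in a few lines. In this toy model, an aggregator keeps track of which node each record came from, names that source in its results, and sends a notification back to the source whenever the record is found. The `Node` and `Aggregator` classes, the `inbox` list, and the sample record are all hypothetical and only illustrate the direction of the flows.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A data producer that remains the source of truth for its records."""
    url: str
    records: list
    inbox: list = field(default_factory=list)  # feedback from aggregators


@dataclass
class Aggregator:
    """Indexes records from many nodes while remembering their source."""
    index: list = field(default_factory=list)  # (source node, record) pairs

    def harvest(self, nodes):
        # Flow towards the aggregator.
        for node in nodes:
            for record in node.records:
                self.index.append((node, record))

    def search(self, title_part):
        hits = [(node, record) for node, record in self.index
                if title_part.lower() in record["title"].lower()]
        # Flow back to the producers: tell each source its record was found.
        for node, record in hits:
            node.inbox.append({"event": "found-in-search", "record": record["id"]})
        # Stay transparent: every result names the node it came from.
        return [{"source": node.url, **record} for node, record in hits]


producer = Node("https://ruben.verborgh.org/",
                [{"id": "#paper-1", "title": "Example paper"}])
aggregator = Aggregator()
aggregator.harvest([producer])

results = aggregator.search("example")
print(results)
print(producer.inbox)
```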

© Melvin Redeker

@RubenVerborgh

ruben.verborgh.org