Decentralizing queries at Web scale
    Ruben Verborgh
    Ghent University – imec
   
  
    
      My biggest frustration
      about the Semantic Web community?
    
  
  
    
      Only a few contributions
      are about the Web.
    
  
  
    
      (Another frustration:
          few things are
          about semantics.)
    
  
  
    
      The Web is our
      main differentiator from
      related communities.
    
  
  
    
      It’s time to talk
      about the Web.
    
  
  
    Decentralizing queries at Web scale
    
   
  
  
    
      Why are we publishing
      Linked Data again?
    
    
      - Linked Data provides a flexible data model.
        - no more field overloading (see the sketch after this list)
      - Linked Data facilitates metadata integration.
      - Linked Data connects metadata across the Web.
        - no single source of truth
   
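    A minimal sketch of that flexibility, assuming Python with rdflib; the record IRI, the ex: namespace, and the extra property are invented for illustration. Adding a second creator, a brand-new property, or a link to an external authority is just another triple, with no schema migration:

      # Minimal sketch (not from the talk): a Linked Data description grows
      # without schema changes. All IRIs below are made up.
      # Requires: pip install rdflib
      from rdflib import Graph, Literal, Namespace, URIRef
      from rdflib.namespace import DCTERMS

      EX = Namespace("https://example.org/vocab#")        # hypothetical vocabulary
      record = URIRef("https://example.org/records/1")    # hypothetical record

      g = Graph()
      g.add((record, DCTERMS.title, Literal("A painting", lang="en")))

      # No field overloading: a second creator or an extra property
      # is simply another triple.
      g.add((record, DCTERMS.creator, URIRef("https://example.org/people/alice")))
      g.add((record, DCTERMS.creator, URIRef("https://example.org/people/bob")))
      g.add((record, EX.restorationNote, Literal("Restored in 1999")))

      # Connecting across the Web: link to an authority we do not maintain.
      g.add((record, DCTERMS.subject, URIRef("https://example.org/authority/paintings")))

      print(g.serialize(format="turtle"))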
  
    
      Why are we publishing
      Linked Data again?
    
    
      - We demand metadata ownership.
        - Aggregators are allowed access; we remain the authority.
      - We have our own metadata priorities.
        - We choose the vocabularies.
      - We cannot maintain all metadata ourselves.
        - We link to other authorities.
   
  
    
      Why are we publishing
      Linked Data again?
    
    
      - Consumers can browse metadata.
        - Show metadata for a specific subject.
      - Consumers can query metadata.
        - Show a custom selection of metadata (see the query sketch below).
      - Aggregators can harvest metadata.
        - Why is this still necessary?
   
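    One way to read "a custom selection of metadata" is a SPARQL query. A minimal sketch, assuming Python with SPARQLWrapper; the endpoint URL and the example IRIs are placeholders, not a real service:

      # Minimal sketch (not from the talk): asking a publisher's endpoint
      # for a custom selection of metadata. Endpoint and IRIs are placeholders.
      # Requires: pip install SPARQLWrapper
      from SPARQLWrapper import SPARQLWrapper, JSON

      sparql = SPARQLWrapper("https://example.org/sparql")  # hypothetical endpoint
      sparql.setQuery("""
          PREFIX dct: <http://purl.org/dc/terms/>
          SELECT ?record ?title WHERE {
            ?record dct:creator <https://example.org/people/alice> ;
                    dct:title   ?title .
          }
          LIMIT 10
      """)
      sparql.setReturnFormat(JSON)

      for row in sparql.query().convert()["results"]["bindings"]:
          print(row["record"]["value"], row["title"]["value"])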
  
    
      Linked Data is decentralized.
      There’s no single source of truth.
    
    
      - Data diversity is highly important
        in an increasingly centralized landscape.
      - Linked Data does it the right way.
        - No disconnected silos, but a connected knowledge graph.
      - We can browse easily across nodes…
        but how do we query?
   
  
  
  
    
      Aggregators merge multiple collections
      into a single centralized view.
    
    
      - They facilitate exploration across datasets.
      - Although currently a technical necessity,
        aggregation comes with drawbacks.
        - Are records up-to-date and complete?
        - Can/should every dataset be included?
          - If so, how to guarantee quality?
        - Where is the benefit for individual publishers?
   
  
    
      Are we publishing Linked Data
      only for the happy few?
    
    
      - If aggregation is a necessity for querying,
        then only those with large infrastructures
        can make sense of the Web’s Linked Data.
        - They offer intelligence as a service.
      - We own (our part of) the data,
        but not the intelligence around it.
   
  
    Decentralizing queries at Web scale
    
   
  
  
    
      Heterogeneity exists on multiple levels
      across metadata collections.
    
    
      - Heterogeneity exists on the data level.
        - We can choose our own vocabularies.
        - How do we ensure they align?
      - Heterogeneity exists on the interface level.
        - We can choose how consumers can query our data.
        - How can clients consume multiple datasets easily?
   
  
    
      Heterogeneity is our best friend
      and our worst enemy.
    
    
      - Anybody on the Web is free
        to publish however they want.
        - This works great for people, sometimes.
        - It often doesn’t work great for machines.
      - Standardization helps us align.
        - delicate balance between flexibility and interoperability
   
  
  
    
      Standardization and agreement
      have provided us with foundations.
    
    
      - the Semantic Web family of standards
      - ontologies and vocabularies
        - Dublin Core
        - DBpedia ontology
        - Wikidata ontology
        - Schema.org
        - …
      - Web APIs
        - Linked Data Platform
        - OAI-ORE
   
  
    
      The current level of standardization
      still leaves some areas uncovered.
    
    
      - vocabulary usage
      - vocabulary agreement
        - the right terms for the right clients
      - Web APIs
        - stop reinventing the wheel
   
  
    
      Which vocabularies should we use
      to describe our metadata, and how?
    
    
      - We need to develop examples and guidance.
        - vocabulary usage
        - URL strategy
        - …
      - Reasoning can fill vocabulary gaps (see the sketch below).
      - We can never cover all vocabularies.
   
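    A minimal sketch of how a little reasoning bridges vocabularies, assuming Python with rdflib; the ex: vocabulary and the record are made up. One subPropertyOf mapping lets a client that asks for dct:title also find values stated with a local term:

      # Minimal sketch (not from the talk): a subPropertyOf mapping fills a
      # vocabulary gap. All example IRIs are made up.
      # Requires: pip install rdflib
      from rdflib import Graph

      g = Graph()
      g.parse(data="""
          @prefix ex:   <https://example.org/vocab#> .
          @prefix dct:  <http://purl.org/dc/terms/> .
          @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

          ex:heading rdfs:subPropertyOf dct:title .          # the mapping
          <https://example.org/records/1> ex:heading "A painting" .
      """, format="turtle")

      # Follow subPropertyOf chains instead of demanding exact vocabulary agreement.
      results = g.query("""
          PREFIX dct:  <http://purl.org/dc/terms/>
          PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
          SELECT ?record ?title WHERE {
            ?record ?p ?title .
            ?p rdfs:subPropertyOf* dct:title .
          }
      """)
      for record, title in results:
          print(record, title)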
  
    
      Web APIs are the Achilles’ heel
      of interoperability on the Web.
    
    
      - Shall we all have our SPARQL endpoints?
      - Shall we all support the Linked Data Platform?
        - That doesn’t solve querying…
      - Shall we all have our own custom APIs?
        - That’s not a sustainable way.
   
  
    
      The Europeana API evolved
      from nightmare to dream.
    
    
      - obtain a record on the website as a human
      - obtain a record on the Web API as a machine
      - obtain a record on the website as a machine
        - 1 step: just GET its URL (hurray for content negotiation; see the sketch below)
   
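    That one step, sketched with Python's requests and rdflib; the record URL is a placeholder rather than an actual Europeana record. One GET with an Accept header returns machine-readable triples from the same URL a human would visit:

      # Minimal sketch (not from the talk): one GET, with content negotiation,
      # turns a record's regular URL into machine-readable metadata.
      # The URL is a placeholder. Requires: pip install requests rdflib
      import requests
      from rdflib import Graph

      url = "https://example.org/records/1"                # placeholder record URL
      response = requests.get(url, headers={"Accept": "text/turtle"})
      response.raise_for_status()

      g = Graph()
      g.parse(data=response.text, format="turtle")
      print(len(g), "triples about", url)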
  
    Decentralizing queries at Web scale
    
   
  
    
      Building better clients starts
      with building better servers.
    
    
      - Web APIs do too much,
        and too much in their own way.
      - We don’t need intelligent servers,
        but servers that enable intelligence.
      - Client-side intelligence enables use cases
        we cannot foresee yet.
   
  
    
      Web APIs essentially need
      what Linked Data did for data.
    
    
      - interoperable
        - different clients for different purposes
      - flexible
        - not one API to rule them all
      - across silos
   
  
    
      The road to better Web APIs:
      evolution or revolution?
    
    
      - API descriptions can reduce
        difficulties with heterogeneity.
      - Unfortunately, they facilitate the status quo.
        - dealing with difficult APIs just becomes easier
      - We need to evolve toward
        an ecosystem of Web APIs.
        - smartAPI brings the FAIR principles to APIs
   
  
    
      The current way of building Web APIs
      is monolithic and top-down.
    
    
      
    
   
  
    
      We need to evolve toward Web APIs
      that are built from the bottom up.
    
    
      
    
   
  
    
      We need a vocabulary for APIs
      to describe their parts.
    
    
      - An API consists of modular blocks.
        - These blocks describe themselves with RDF.
      - Clients recognize blocks across APIs.
        - Unsupported blocks are just ignored (see the sketch after this slide).
      - APIs reuse blocks that are relevant.
        - Blocks can be combined at will.
    
      This brings an ecosystem
      of Web API features.
    
   
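    A minimal sketch of block recognition, assuming Python with rdflib and the Hydra Core vocabulary as one possible way a block might describe itself; the API URL is hypothetical. The client looks for the one feature it understands (a search form) and simply ignores everything else:

      # Minimal sketch (not from the talk): detect a self-described building
      # block in an API's RDF description. The API URL is hypothetical.
      # Requires: pip install rdflib
      from rdflib import Graph, Namespace

      HYDRA = Namespace("http://www.w3.org/ns/hydra/core#")

      g = Graph()
      g.parse("https://example.org/api/", format="turtle")   # hypothetical entry point

      # Is there a search block this client understands?
      for resource, form in g.subject_objects(HYDRA.search):
          print("search block found:", g.value(form, HYDRA.template))

      # Blocks the client does not recognize are simply not used;
      # they do not break anything.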
  
    
      Reusing API building blocks
      enables serendipity in clients.
    
    
      - With Linked Data Fragments,
        we search for interesting building blocks
        (see the sketch below).
      - Clients can query different APIs simultaneously.
      - Decentralization is crucial for making this work.
   
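    A minimal sketch of one such building block, assuming a Triple Pattern Fragments interface and Python with requests and rdflib; the endpoint URL is a placeholder. A full LDF client combines many of these small requests (joins, paging), possibly against several APIs at once:

      # Minimal sketch (not from the talk): request one triple pattern from a
      # Triple Pattern Fragments interface. The endpoint is a placeholder.
      # Requires: pip install requests rdflib
      import requests
      from rdflib import Graph

      FRAGMENTS = "https://example.org/fragments"            # placeholder TPF endpoint

      response = requests.get(
          FRAGMENTS,
          params={"predicate": "http://purl.org/dc/terms/creator"},  # the pattern
          headers={"Accept": "text/turtle"},
      )
      response.raise_for_status()

      g = Graph()
      g.parse(data=response.text, format="turtle")           # data + hypermedia controls
      for s, p, o in g:
          print(s, p, o)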
  
    
      It’s not just about others’ data:
      we need to practice what we preach.
    
    
      - We cannot tell others how to do Linked Data
        if we don’t do it ourselves.
      - I publish my own metadata
        in a queryable way.
        - I’m the authority for my metadata!
      - Do the same, and we’ll have very interesting queries.
   
  
    Decentralizing queries at Web scale
    
   
  
    
      The secret of decentralization:
      we can each have our own
      if we sufficiently look at each other.
    
  
  
    
      Let’s not all reinvent the wheel.
      Let’s not all try to build a car either.
    
  
  
    
      Think reuse.
      Think clients.
      Think Web-scale.
    
  
  
    Decentralizing queries at Web scale
    @RubenVerborgh
    Ghent University – imec
   
  
    Questions to discuss
    
      - How does the relationship we want with aggregators
        differ from the one we have?
      - Centralized, decentralized, or hybrid? And how?