Your JSON is not my JSON
A case for more fine-grained content negotiation
Ruben Verborgh
Ghent University – imec
What is the location of SDSVoc?
What is the location of SDSVoc
by public transport?
- Take Sprinter 4627 at 8:40
- Get off at Amsterdam Science Park at 8:48
- Cross the street (Carolina MacGillavrylaan) at the crosswalk
- Walk past the building of Amsterdam University College
- You will be able to see CWI’s main entrance on your left
What is the location of SDSVoc
by car?
- Exit the A10 ringroad at S113/Watergraafsmeer
- Follow the Science Park signs directing you to the Kruislaan
- Turn left onto the Carolina MacGillavrylaan after passing through the railroad tunnel
- Take the Science Park entry at your right and enter the gate
Independently of my preferences,
SDSVoc is located at the same address.
Science Park 123
1098 XG Amsterdam
Netherlands
Why not always use
the same address?
Exactly.
Content negotiation is not an academic exercise,
but rather a necessity for sustainable publication.
How can a single address
give access to data
my client understands?
This involves resources such as:
- people
- places
- institutions
- laws & decrees
The challenge is to expose this dataset
in a sustainable way.
- backward compatibility
- Applications using today's entry points
should work in the future.
- forward compatibility
- Entry points of the future
should not break today's applications.
Domain resources are identified
by specific URLs.
This URL identifies a local council decision:
https://council.my/decisions/3147/
Each representation of a resource
can also be identified with a specific URL.
https://council.my/decisions/3147.html
https://council.my/decisions/3147.json
https://api.council.my/decisions.php?id=3147&format=xml
- …
Minting only representation-specific URLs
leads to several problems.
- sustainability
- How long will the XML URL exist?
- extensibility
- What URLs will be minted for new formats?
- interoperability
- Do clients using different URLs talk about the same thing?
Each resource should always have
a representation-independent URL.
- mandatory: URL for the decision resource
https://council.my/decisions/3147/
- optional: URLs for representations of the resource
https://council.my/decisions/3147.json
https://council.my/decisions/3147.ttl
https://council.my/decisions/en/3147.xml
- …
Using HTTP content negotiation,
clients send representation preferences.
request
GET /decisions/3147/ HTTP/1.1
Host: council.my
Accept: application/ld+json;q=1.0,text/turtle;q=0.8
Accept-Language: en
…
Using HTTP content negotiation,
servers send a matching representation.
response
HTTP/1.1 200 OK
Content-Type: application/ld+json
Content-Language: en
…
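A minimal sketch of how a server might pick a representation from the Accept header shown in the request. The function names and the list of available types are assumptions for illustration; a real server would also need to handle wildcards such as */* and parameters other than q.

```python
def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so types with equal q keep their header order
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def negotiate(accept_header, available):
    """Return the most preferred media type that is available, or None."""
    for media_type, q in parse_accept(accept_header):
        if q > 0 and media_type in available:
            return media_type
    return None


# Hypothetical set of representations the server holds for /decisions/3147/
AVAILABLE = ["text/html", "text/turtle", "application/ld+json"]
chosen = negotiate("application/ld+json;q=1.0,text/turtle;q=0.8", AVAILABLE)
print(chosen)
```

The URL stays the same for every client; only the headers decide which representation is returned.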
Content negotiation allows for
sustainable data publication.
- sustainability
- The identifiers remain constant.
- extensibility
- New MIME types can be added.
- interoperability
- Clients consuming different representations
use the same URL for the same resource.
MIME types are an underspecification
of the interpretation clients need.
MIME types underspecify on:
- syntax
- structure
- model
A MIME type does not indicate
the most specific syntax.
3147.json is all of these:
- application/octet-stream
- application/json
- application/ld+json
A MIME type does not indicate
the structure within the syntax.
- The JSON MIME type allows many structures.
- Yet API clients typically assume a very specific structure.
- RDF formats are immune to this.
- All possible serializations represent the same graph.
- JSON-LD is especially tricky in this regard.
- When interpreted as RDF triples, structure is unimportant.
- When interpreted as a JSON tree, framing is crucial.
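To illustrate the structure problem, here is a small sketch with two JSON documents that could both be served as application/json for the same decision. The field names and the title are made up for this example; the point is only that a client written against one structure breaks on the other, even though the MIME type is identical.

```python
import json

# Two hypothetical JSON documents describing the same decision.
# Both are valid application/json; only their structure differs.
nested = json.loads('{"decision": {"id": 3147, "title": "Budget 2017"}}')
flat = json.loads('{"id": 3147, "title": "Budget 2017"}')


def read_title(doc):
    """A client hard-coded to the nested structure."""
    return doc["decision"]["title"]


def try_read_title(doc):
    """Return the title, or None when the expected structure is absent."""
    try:
        return read_title(doc)
    except KeyError:
        return None


print(try_read_title(nested))  # the structure this client expects
print(try_read_title(flat))    # same MIME type, different structure
```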
A MIME type does not indicate
how a representation is modeled.
- Non-semantic formats: identical structures
can still have different meanings.
- JSON doesn't specify an interpretation.
- Semantic formats: the same meaning
can be expressed with different vocabularies.
- Matching them requires reasoning and/or ontologies.
Should we simply introduce
more specific MIME types then?
A council decision could have MIME type
application/vnd.council.my.decision+json
- unclear what parser to use
- highly application-specific
- finite extensibility
Especially with RDF syntaxes,
specific MIME types become problematic.
text/vnd.council.my.decision+turtle
application/vnd.council.my.decision+ld+json
application/vnd.foaf.person+ld+json
application/vnd.council.my.decision.foaf.person+ld+json
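A rough sketch of why stacking suffixes does not help a generic client. It assumes the RFC 6838 convention that the structured syntax suffix is whatever follows the last plus sign; the function name is an assumption for illustration.

```python
def split_media_type(media_type):
    """Split 'type/subtype+suffix' into (type, subtype, suffix).

    The suffix is taken to be the text after the LAST '+', so a
    subtype such as 'x+ld+json' yields the suffix 'json' only.
    """
    top, _, subtype = media_type.partition("/")
    base, plus, suffix = subtype.rpartition("+")
    if not plus:
        # No '+' present: the whole subtype, and no suffix
        return top, subtype, None
    return top, base, suffix


examples = [
    "application/ld+json",
    "application/vnd.council.my.decision+json",
    "application/vnd.council.my.decision+ld+json",
]
for mt in examples:
    print(mt, "->", split_media_type(mt))
```

Under this reading, a client that does not know the council's vendor type learns only "json" from the last example, and has no standard way to discover that the document is also JSON-LD.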
Since RDF is self-descriptive,
can representations use multiple models?
- responses become large
- modeling choices remain uncertain
- If no foaf:Person instances exist,
are there no people, or was another vocabulary used?
- some models might be incompatible
- Schema.org is great for discovery,
but has (purposely) sloppy semantics.
HTTP content negotiation is extensible
in multiple dimensions.
- The HTTP spec defines several dimensions.
- Content-Type
- Content-Language
- …
- Other dimensions have been proposed.
- The Memento protocol negotiates over time.
- client: Accept-Datetime
- server: Memento-Datetime
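As a sketch of the time dimension, a client could build the Accept-Datetime request header from a timezone-aware datetime. The helper name and the chosen date are arbitrary assumptions; the value uses the standard HTTP date format.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_request_headers(moment):
    """Build request headers asking for the representation as of `moment`.

    `moment` must be timezone-aware; it is converted to UTC because
    HTTP dates are expressed in GMT.
    """
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    utc_moment = moment.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(utc_moment, usegmt=True)}


# Arbitrary example moment
headers = memento_request_headers(
    datetime(1994, 11, 6, 8, 49, 37, tzinfo=timezone.utc)
)
print(headers)
```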
Since MIME types are limited,
we should negotiate in more dimensions.
A profile
can be defined as:
…additional semantics
that can be used to process a resource representation,
such as constraints, conventions, extensions,
or any other aspects that do not alter the basic media type semantics…
RFC 6906
Servers indicate what profiles
a representation uses.
- as profile parameter of the Content-Type header
- MIME type indicates which parser to use
- profile indicates structure and model
- or using the Link header with rel=profile
- multiple profiles can be combined
- “has this JSON-LD context”
- “uses FOAF for people”
- “uses Wikidata for places”
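A minimal sketch of both server-side options. The profile URIs are hypothetical placeholders, and the sketch assumes the profile parameter carries a space-separated list of URIs, as the JSON-LD media type registration allows.

```python
def content_type_with_profile(media_type, profiles):
    """Build a Content-Type value with a space-separated profile parameter."""
    if not profiles:
        return media_type
    return '{}; profile="{}"'.format(media_type, " ".join(profiles))


def profile_link_header(profiles):
    """Build a Link header value with one rel="profile" link per profile."""
    return ", ".join('<{}>; rel="profile"'.format(p) for p in profiles)


# Hypothetical profile URIs, invented for this example
PROFILES = [
    "https://council.my/profiles/foaf-people",
    "https://council.my/profiles/wikidata-places",
]

print("Content-Type:", content_type_with_profile("application/ld+json", PROFILES))
print("Link:", profile_link_header(PROFILES))
```

In both variants the MIME type alone still tells a generic client which parser to use; the profiles add information without replacing it.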
Clients indicate what profiles
they support or prefer.
- by adding profile parameter(s) to Accept
- might lead to combinatorial explosion
- through a new Accept-Profile header
- would need to be standardized
- profiles can be cumulative
- other negotiation dimensions are typically mutually exclusive
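Since Accept-Profile is not standardized, the following is only a sketch under an assumed syntax: a comma-separated list of profile URIs in angle brackets, each with an optional q value. The profile URIs and file names are hypothetical. It shows one way cumulative profiles could be scored: a representation earns the q of every requested profile it satisfies.

```python
def parse_accept_profile(header):
    """Parse the ASSUMED syntax '<uri>;q=0.5, <uri>' into {uri: q}."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        uri = parts[0].strip("<>")
        if not uri:
            continue
        q = 1.0  # default when no q is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        prefs[uri] = q
    return prefs


def best_representation(prefs, representations):
    """Pick the representation whose profiles sum to the highest q."""
    def score(name):
        return sum(q for uri, q in prefs.items() if uri in representations[name])

    if not representations:
        return None
    best = max(representations, key=score)
    return best if score(best) > 0 else None


# Hypothetical profiles and representations
FOAF = "https://council.my/profiles/foaf-people"
WIKIDATA = "https://council.my/profiles/wikidata-places"
SCHEMA = "https://council.my/profiles/schema-org"
REPRESENTATIONS = {
    "3147.foaf.jsonld": {FOAF},
    "3147.foaf-wikidata.jsonld": {FOAF, WIKIDATA},
    "3147.schema.jsonld": {SCHEMA},
}

prefs = parse_accept_profile("<{}>;q=1.0, <{}>;q=0.5".format(FOAF, WIKIDATA))
print(best_representation(prefs, REPRESENTATIONS))
```

A mutually exclusive dimension such as language would pick exactly one value instead of summing over several.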
Activity Streams
are an example
of a MIME-type/profile combination.
- MIME type: application/activity+json
- Activity Streams clients
- recognized as Activity Streams
- Other clients
- not recognized
- MIME type: application/ld+json; profile="https://www.w3.org/ns/activitystreams"
- Activity Streams clients
- recognized as Activity Streams
- Other clients
- recognized as JSON-LD
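The comparison can be sketched as a small classifier. The function name and the returned labels are assumptions for illustration; header parsing relies on Python's standard email.message module.

```python
from email.message import Message

AS_PROFILE = "https://www.w3.org/ns/activitystreams"


def classify(content_type, understands_activity_streams):
    """Say how a client would recognize a response with this Content-Type."""
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()
    # The profile parameter may hold several space-separated URIs
    profiles = (msg.get_param("profile") or "").split()

    is_activity_streams = media_type == "application/activity+json" or (
        media_type == "application/ld+json" and AS_PROFILE in profiles
    )
    if understands_activity_streams and is_activity_streams:
        return "Activity Streams"
    if media_type == "application/ld+json":
        return "JSON-LD"  # generic fallback for any JSON-LD client
    return "not recognized"


specific = "application/activity+json"
generic = 'application/ld+json; profile="{}"'.format(AS_PROFILE)
for value in (specific, generic):
    for understands in (True, False):
        print(value, understands, "->", classify(value, understands))
```

Only the MIME type plus profile combination gives clients that are unaware of Activity Streams something they can still process.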
I'm not suggesting that
we all must negotiate
over multiple dimensions.
I am suggesting that,
if you decide to publish multiple representations,
you should provide
content negotiation.
HTTP content negotiation
is the extensible path
towards sustainable content interoperability.
@RubenVerborgh