JSON, or JavaScript Object Notation, has become the most widely used serialization and transport format for exchanging data between web services. From its initial conception, the format was quickly and widely appreciated for being simple and concise.
Let's say you want to consume the following JSON object via an API:
Now, let's assume that you want to ensure that `email` and `contact.zipcode` are present in the JSON before consuming it. If they are missing, you shouldn't use the data. The typical approach is to check for the presence of each field by hand, but this whack-a-mole quickly gets tiresome.
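Here is a Ruby sketch of the manual approach, which makes the whack-a-mole concrete. Ruby is used because the API later in this post is built with Sinatra. The payload and the `usable?` helper are hypothetical and were made up for illustration:

```ruby
require "json"

# Hypothetical payload standing in for the API response above; only `email`
# and `contact.zipcode` matter here.
raw = '{"name": "Jane", "email": "jane@example.com", "contact": {"zipcode": "12345"}}'

# Hand-rolled presence checks: one guard per required field. Every new
# constraint means another guard like these somewhere in application code.
def usable?(data)
  return false unless data.is_a?(Hash)
  return false if data["email"].nil?

  contact = data["contact"]
  return false unless contact.is_a?(Hash)
  return false if contact["zipcode"].nil?

  true
end

puts usable?(JSON.parse(raw))                  # prints true
puts usable?(JSON.parse('{"email": "a@b.c"}')) # prints false
```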
Similarly, let's say you are an API provider and you want to tell your API users the basic structure their data must conform to, so that they can test its validity automatically.
If you have ever had to deal with either of these problems, you should be using JSON Schema.
What's a Schema?
Wikipedia defines a schema as a way to define the structure, content and, to some extent, the semantics of XML documents, which is probably the simplest way to explain it. Every element (or node) in a document is given a rule that it must conform to. Defining constraints at this level means you no longer have to handle those edge cases in your application logic, which makes a schema a pretty powerful tool. The original JSON specification had no such mechanism, but one was designed later.
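The toy below applies "a rule for every node" to the earlier example with `email` and `contact.zipcode`. It is a hypothetical sketch that uses only the standard library and understands just `type`, `required` and `properties`. Real validators, such as the json-schema gem used later, cover the full specification.

```ruby
require "json"

# A schema for the email/contact.zipcode payload above, as a Ruby Hash. It
# uses only three draft-4 keywords: `type`, `required` and `properties`.
SCHEMA = {
  "type" => "object",
  "required" => ["email", "contact"],
  "properties" => {
    "email" => { "type" => "string" },
    "contact" => {
      "type" => "object",
      "required" => ["zipcode"],
      "properties" => { "zipcode" => { "type" => "string" } }
    }
  }
}.freeze

# JSON Schema type names used above, mapped to the classes JSON.parse produces.
TYPES = { "object" => Hash, "string" => String }.freeze

# Toy validator: walks schema and data together, applying each node's rule
# and collecting messages with JSON-Pointer-style paths. Not a full validator.
def errors_for(data, schema, path = "#")
  expected = TYPES[schema["type"]]
  return ["#{path} should be of type #{schema['type']}"] if expected && !data.is_a?(expected)
  return [] unless data.is_a?(Hash)

  missing = Array(schema["required"]).reject { |key| data.key?(key) }
  errors = missing.map { |key| "#{path} is missing required property '#{key}'" }
  (schema["properties"] || {}).each do |key, subschema|
    errors.concat(errors_for(data[key], subschema, "#{path}/#{key}")) if data.key?(key)
  end
  errors
end

p errors_for(JSON.parse('{"email": "a@b.c", "contact": {}}'), SCHEMA)
# prints ["#/contact is missing required property 'zipcode'"]
```

The application code now asks a single question ("any errors?"). It no longer guards each field itself.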
Why do we need a Schema?
If you're familiar with HTML, the DOCTYPE declaration on the first line of a page is a schema declaration (in HTML 4 and earlier).
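Here is the HTML 4.01 Transitional DOCTYPE declaration: `<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">`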
This line declares that the rest of the document conforms to the rules in the DTD at the URL http://www.w3.org/TR/html4/loose.dtd. If you declare the document as Strict instead, any element the DTD doesn't define, such as a made-up <sp></sp>, makes the document invalid. In other words, a typo or a tag you forgot to close somewhere breaks conformance. Browsers are forgiving with HTML 4. A strict consumer is not: a browser parsing XHTML as XML refuses to render a document that isn't well-formed (a single unclosed tag is enough) and shows your users an error instead of the page.
At first glance, this looks like a pain, and it is. That's part of the reason DTD-based DOCTYPEs were dropped in HTML5, which uses a bare `<!DOCTYPE html>`. However, HTML is not really a good use case for a schema. A well-defined schema pays off when validating user input, because it moves validation to the language/protocol level instead of the application's implementation. Let's see how defining a schema makes it easy to handle user input errors.
JSON Schema
The JSON Schema specification is divided into three parts:
- JSON Schema Core: The JSON Schema Core specification defines the terminology for a schema. Technically, it is simply the JSON spec, with the only addition being the definition of a new media type, `application/schema+json`. Oh, and a more important contribution of this document is the `$schema` keyword, which identifies the version of the schema and the location of a resource that defines it. This is analogous to the DOCTYPE declaration in HTML 4.01 and older versions of HTML. Schema versions differ in their keywords and in the general structure of a schema document. The resource behind a `$schema` URL is usually a web page that serves a JSON object defining the specification. Confused? Open the URL http://www.w3.org/TR/html4/loose.dtd in a browser and go through its contents. This is the DTD for HTML 4.01 Transitional (the "loose" variant). Declarations like ENTITY, ELEMENT and ATTLIST define the accepted entities, elements and attributes of a valid HTML document. Similarly, the JSON Schema resource URL (which downloads a schema document) defines the constraints that a schema document itself must satisfy.
- JSON Schema Validation: The JSON Schema Validation specification defines the valid ways to express validation constraints. It also defines a set of keywords for specifying validations for a JSON API. For example, `multipleOf`, `maxLength` and `minLength` are defined in this specification. In the examples that follow, we will be using some of these keywords.
- JSON Hyper-Schema: This is another extension of the JSON Schema spec, in which hyperlink- and hypermedia-related keywords are defined. For example, consider a globally recognized avatar (Gravatar). Every Gravatar is composed of three components:
- A Picture ID,
- A Link to the picture,
- Details of the User (name and email ID).
When we query Gravatar's API, we typically get a response with this data encoded as JSON. The response doesn't contain the image itself, only a link to it. Let's look at the JSON representation of a fake profile I've set up on Gravatar:
In this response, the images are represented by hyperlinks, but those hyperlinks are encoded as plain strings. This example is a JSON object returned from a server, but traditional APIs handle input the same way. JSON has no native hyperlink type, so a link is just a string.
JSON Hyper-Schema specifies a more semantic way of representing hyperlinks and images. It does this by defining keywords (as JSON properties) such as `links`, `rel` and `href`. Note that the specification does not try to redefine these terms in general, because they are already defined for HTTP. Instead, it normalizes the way they are used in JSON.
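The sketch below shows a draft-4 hyper-schema fragment with a `links` array for a hypothetical avatar resource. The example.com URLs are placeholders, not Gravatar's actual API. It also includes a deliberately simplified resolver that fills the `{hash}` placeholders in each `href` from an instance:

```ruby
require "erb"

# Hyper-schema for a hypothetical avatar resource (example.com, not
# Gravatar's real API). In draft-4 hyper-schema, `links` holds link
# descriptions whose `href` is a URI template filled in from the instance.
HYPER_SCHEMA = {
  "$schema" => "http://json-schema.org/draft-04/hyper-schema#",
  "type" => "object",
  "properties" => { "hash" => { "type" => "string" } },
  "links" => [
    { "rel" => "self", "href" => "https://example.com/profiles/{hash}" },
    # "avatar" is a made-up relation name for this example.
    { "rel" => "avatar", "href" => "https://example.com/avatars/{hash}.jpg" }
  ]
}.freeze

# Simplified template expansion: each {name} becomes the instance's value,
# percent-encoded. Full RFC 6570 URI templates support much more than this.
def resolve_links(instance, hyper_schema)
  hyper_schema["links"].to_h do |link|
    href = link["href"].gsub(/\{(\w+)\}/) { ERB::Util.url_encode(instance.fetch(Regexp.last_match(1)).to_s) }
    [link["rel"], href]
  end
end

links = resolve_links({ "hash" => "abc123" }, HYPER_SCHEMA)
puts links["avatar"] # prints https://example.com/avatars/abc123.jpg
```

The data itself still carries only the `hash` string. The schema describes how to turn that string into links, and what each link means through its `rel`.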
Drafts
The specification is still under development, and its progress can be tracked by comparing its versions, known as "drafts". At the time of writing, it is at draft 4. Validation keywords can be added or dropped between versions. This article, like many others around the web, refers to draft 4.
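Keywords can change between drafts, so a consumer may want to check which draft a schema declares in its `$schema` keyword before interpreting it. The following is a minimal sketch. It assumes a hypothetical consumer that only understands draft 4. It also treats a missing `$schema` as an error, which is a stricter policy than the specification, where the keyword is optional:

```ruby
require "json"

# Drafts this (hypothetical) consumer knows how to interpret. Keywords differ
# between drafts, so anything else is rejected rather than guessed at.
KNOWN_DRAFTS = {
  "http://json-schema.org/draft-04/schema#" => 4
}.freeze

# Returns the draft number a schema declares via `$schema`, or raises.
# Rejecting a missing `$schema` is a policy choice of this sketch.
def declared_draft(schema_json)
  uri = JSON.parse(schema_json)["$schema"]
  raise ArgumentError, "schema does not declare $schema" if uri.nil?

  KNOWN_DRAFTS.fetch(uri) { raise ArgumentError, "unsupported draft: #{uri}" }
end

puts declared_draft('{"$schema": "http://json-schema.org/draft-04/schema#", "type": "object"}')
# prints 4
```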
Usage
Let's build a basic JSON API that accepts the following data with some constraints:
- A post ID. This is a number and is a required parameter.
- Some free-form text with an attribute of `body`. This is a required parameter.
- A list of tags with an attribute of `tags`. Our paranoid API cannot accept more than 6 tags, though. This is a required parameter.
- An optional list of hyperlinks with an attribute of `references`.
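Each of these constraints maps onto a draft-4 validation keyword. For instance, the 6-tag limit is `maxItems`. The toy below shows the rule that a few such keywords state: `maxItems`, `maxLength`, `minLength` and `multipleOf`. It is an illustration only, not the json-schema gem's code, and the 140-character limit used for a string is a made-up value:

```ruby
# Toy checks for a few draft-4 validation keywords, showing the rule each
# keyword states. This is not any particular library's implementation.
def keyword_errors(value, schema)
  errors = []
  case value
  when String
    # maxLength/minLength count characters, not bytes.
    if schema["maxLength"] && value.length > schema["maxLength"]
      errors << "longer than #{schema['maxLength']} characters"
    end
    if schema["minLength"] && value.length < schema["minLength"]
      errors << "shorter than #{schema['minLength']} characters"
    end
  when Array
    if schema["maxItems"] && value.length > schema["maxItems"]
      errors << "more than #{schema['maxItems']} items"
    end
  when Numeric
    # Valid when dividing by multipleOf leaves no remainder; floats can hit
    # rounding issues, so the examples below use integers.
    if schema["multipleOf"] && !(value % schema["multipleOf"]).zero?
      errors << "not a multiple of #{schema['multipleOf']}"
    end
  end
  errors
end

p keyword_errors(%w[a b c d e f g], { "maxItems" => 6 })          # prints ["more than 6 items"]
p keyword_errors("hi", { "minLength" => 1, "maxLength" => 140 }) # prints []
p keyword_errors(15, { "multipleOf" => 5 })                      # prints []
```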
Let's face it, almost every app you've ever written has had constraints of some sort, and we end up repeating the same verification logic every time. Let's see how we can simplify that.
We will use Sinatra to build the API. This is the basic structure of our `app.rb`:
The `Gemfile`:
We will be using the json-schema gem for validation.
Let's look at the schema that we will define in a `schema.json` file:
- The `properties` attribute holds the main chunk of the schema definition. Under it, each individual API attribute is described by a schema of its own.
- The `required` attribute takes a list of strings naming the API parameters that are required. If any of these parameters is missing from the JSON input to our app, validation fails and an error is reported.
- The `type` keyword specifies the schema type for that particular block. At the top level, we say it's an `object` (analogous to a Ruby Hash). For `body`, `tags` and `references`, the types are `string`, `array` and `array`, respectively.
- When an API parameter accepts an array, the items inside that array can be described by a schema definition of their own. This is done with the `items` attribute, which defines how each item in the array should be validated.
- The `format` attribute selects one of the formats built into the JSON Schema specification. This alleviates the pain of writing regexes to validate common values like `uri`, `ipv4`, `ipv6`, `email`, `date-time` and `hostname`. That's right, no more copy-pasting URI validation regexes from StackOverflow.
- The `$schema` attribute is optional and specifies which version of the JSON Schema spec the schema uses. For our example, we will use draft 4.
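Here is one possible `schema.json` for our API, combining the pieces above. It is reconstructed from the constraints listed earlier, so treat it as a sketch rather than the original file. Two assumptions are ours: that tags are strings, and that `id` uses the `number` type. The schema is written as a Ruby Hash and printed with `JSON.pretty_generate`, which keeps every example in one language:

```ruby
require "json"

# One possible schema.json for the post API, reconstructed from the
# constraints listed earlier; the original file may differ in details.
POST_SCHEMA = {
  "$schema" => "http://json-schema.org/draft-04/schema#",
  "type" => "object",
  "required" => ["id", "body", "tags"],
  "properties" => {
    "id" => { "type" => "number" },
    "body" => { "type" => "string" },
    "tags" => {
      "type" => "array",
      "maxItems" => 6,                  # the "paranoid" 6-tag limit
      "items" => { "type" => "string" } # assumption: tags are strings
    },
    "references" => {
      "type" => "array",
      "items" => { "type" => "string", "format" => "uri" }
    }
  }
}.freeze

# Print the schema; redirect the output to schema.json to save it.
puts JSON.pretty_generate(POST_SCHEMA)
```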
To use this schema in our app, we will create a helper method that validates the input against the schema we just defined. The json-schema gem provides three methods for validation:

- `validate` returns either `true` or `false`.
- `validate!` raises an exception when validation fails.
- `fully_validate` builds up an array of error messages, similar to the errors Rails collects when `ActiveRecord#save` fails.
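The hypothetical `ToyValidator` below mimics the shapes of those three methods to show how they relate. It uses only the standard library. It is not the gem, which you would call as `JSON::Validator.validate(schema, data)` and so on. It also checks only `required`:

```ruby
# Hypothetical stand-in for the json-schema gem's three entry points, so the
# difference between them can be run with only the standard library. It
# checks nothing but the top-level `required` keyword.
module ToyValidator
  class ValidationError < StandardError; end

  module_function

  # Like fully_validate: collect every problem as a message in an array.
  def fully_validate(schema, data)
    missing = Array(schema["required"]).reject { |key| data.key?(key) }
    missing.map { |key| "missing required property '#{key}'" }
  end

  # Like validate: only a true/false answer.
  def validate(schema, data)
    fully_validate(schema, data).empty?
  end

  # Like validate!: raise on the first problem instead of returning a list.
  def validate!(schema, data)
    errors = fully_validate(schema, data)
    raise ValidationError, errors.first unless errors.empty?

    true
  end
end

schema = { "required" => ["id", "body", "tags"] }
input = { "body" => "Hello", "tags" => ["ruby"] }
p ToyValidator.validate(schema, input)       # prints false
p ToyValidator.fully_validate(schema, input) # prints ["missing required property 'id'"]
```

Both `validate` and `validate!` can be derived from the error list. The error list is also the only form that tells the API user everything that is wrong in one round trip.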
We will use `JSON::Validator.fully_validate` in our app and return a nicely formatted JSON response to the user if validation fails.
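Below is a sketch of that flow. The route body is written as a plain Ruby method so it can run without Sinatra. Several details are assumptions for illustration: the names `handle_post` and `require_id`, and the choice of status codes. In the app, the validator would wrap `JSON::Validator.fully_validate`.

```ruby
require "json"

JSON_HEADERS = { "content-type" => "application/json" }.freeze

# Hypothetical route body written as a plain method so it runs without
# Sinatra. `validator` is any callable returning an array of error messages
# (for example, a lambda wrapping JSON::Validator.fully_validate). The result
# is a Rack-style [status, headers, body] triple.
def handle_post(raw_body, validator)
  errors = validator.call(JSON.parse(raw_body))
  if errors.empty?
    [200, JSON_HEADERS, [JSON.generate(status: "ok")]]
  else
    # 422 is an assumption; 400 Bad Request is an equally common choice.
    [422, JSON_HEADERS, [JSON.generate(errors: errors)]]
  end
rescue JSON::ParserError
  [400, JSON_HEADERS, [JSON.generate(errors: ["request body is not valid JSON"])]]
end

require_id = ->(data) { data.is_a?(Hash) && data.key?("id") ? [] : ["missing required property 'id'"] }
status, _headers, body = handle_post('{"body": "Hello", "tags": []}', require_id)
puts status     # prints 422
puts body.first # prints {"errors":["missing required property 'id'"]}
```

Sinatra routes can return such a `[status, headers, body]` array directly, so this helper stays independent of the framework.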
Now, we can use this helper inside routes to check the validity of the input JSON like so:
If the input is valid, the `errors` object will be empty. Otherwise, it will hold a list of errors, which is returned as a JSON response with the appropriate HTTP status code. For example, if we run this app and send a request with a missing `id` parameter, the response will be something similar to the following:
If we send a request where `id` is a string, the `errors` object will hold the following:
One last example: let's try sending a `references` parameter with a malformed URI. We will send the following request (this input is in the file `not_working_wrong_uri.txt`).
The output would be:
Thus, with a simple validation library and a standard that library authors implement across languages, we get input validation with very little setup. A big advantage of following a schema standard is that the basic behavior stays the same regardless of which language the validator is written in. For example, we can reuse the same `schema.json` with a JavaScript library to validate user input in the front end of the API we've just built.
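A schema only names a format such as `uri`, so each language's validator supplies its own check for it, and that check is one less thing you have to maintain. The function below is a rough, hypothetical illustration of what such a check does with the `references` items. It is built on Ruby's standard `URI` parser, and requiring an absolute URI is our assumption:

```ruby
require "uri"

# Rough, hypothetical stand-in for a `uri` format check using Ruby's URI
# parser. Real validators (the json-schema gem, JavaScript libraries, ...)
# may apply different rules; this only shows the shape of the check.
def uri_errors(references)
  references.each_with_index.filter_map do |ref, i|
    # Assumption: a reference must parse and carry a scheme (absolute URI).
    "#/references/#{i} is not an absolute URI" unless URI.parse(ref).absolute?
  rescue URI::InvalidURIError
    "#/references/#{i} is not a valid URI"
  end
end

p uri_errors(["https://example.com/a", "http://exa mple.com", "/just/a/path"])
# prints ["#/references/1 is not a valid URI", "#/references/2 is not an absolute URI"]
```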
Summary
The full app and some sample input files are available in this repo. The json-schema gem is not yet official and might have some unfinished components. For example, the `hostname` and `email` format validations for the `string` type had not been implemented at the time of writing. The JSON Schema specification itself is also under constant revision. But that doesn't mean it's not ready for use: a few of our developers use the gem in one of our projects and are pretty happy with it. Try out the gem and go through the specification to see for yourself why this is beneficial.
More Reading
- Understanding JSON Schema
- JSON Schema Documentation
- This excellent article by David Walsh
- JSON Schema Example: This example uses more keywords that weren't discussed in this post, for example, `title` and `description`.