blog
An Overview of JSON Capabilities Within PostgreSQL
What is JSON?
JSON stands for “JavaScript Object Notation” which is a type of data format popularly used by web applications. This means, the data would be transmitted between web applications and servers in such a format. JSON was introduced as an alternative to the XML format. In the “good old days” the data used to get transmitted in XML format which is a heavy weight data type compared to JSON.Below is an example of JSON formatted string:
{ "ID":"001","name": "Ven", "Country": "Australia", "city": "Sydney", "Job Title":"Database Consultant"}
A JSON string can contain another JSON object with-in itself like shown below:
{ "ID":"001", "name": "Ven", "Job Title":"Database Consultant", "Location":{"Suburb":"Dee Why","city": "Sydney","State":"NSW","Country": "Australia"}}
Modern day web and mobile applications mostly generate the data in JSON format, also termed as “JSON Bytes” which are picked up by the application servers and are sent across to the database. The JSON bytes are in-turn processed, broken down into separate column values and inserted into an RDBMS table.
Example:
{ "ID":"001","name": "Ven", "Country": "Australia", "city": "Sydney", "Job Title":"Database Consultant"}
Above JSON data is converted to an SQL like below..
Insert into test (id, name, country,city,job_title) values (001,'Ven','Australia','Sydney','Database Consultant');
When it comes to storing and processing the JSON data, there are various NoSQL databases supporting it and the most popular one is MongoDB. When it comes to RDBMS databases, until recent times, JSON strings were treated as normal text and there were no data types which specifically recognize, store or process JSON format strings. PostgreSQL, the most popular open-source RDBMS database has come up with JSON data-type which turned out to be highly beneficial for performance, functionality and scalability when it comes to handling JSON data.
PostgreSQL + JSON
PostgreSQL database has become more-and-more popular ever since the JSON data-type was introduced. In-fact, PostgreSQL has been outperforming MongoDB when it comes to processing a large amount of JSON data. The applications can store JSON strings in the PostgreSQL database in the standard JSON format. Developers just need to tell the application to send across the JSON strings to the database as a json data-type and retrieve back in the JSON format. Storing of JSON string in JSON data-type has several advantageous compared to storing the same in TEXT data-type. JSON data-type can accept only valid JSON formatted strings, if the string is not in correct JSON format, an error is generated. JSON data-type helps the application perform efficient and Index based searches which, we will see in detail shortly.
The JSON data-type was introduced in PostgreSQL-9.2 post which, significant enhancements were made. The major addition came-up in PostgreSQL-9.4 with the addition of JSONB data-type. JSONB is an advanced version of JSON data-type which stores the JSON data in binary format. This is the major enhancement which made a big difference to the way JSON data was searched and processed in PostgreSQL. Let us have a detailed look at the advantages of JSON data types.
JSON and JSONB Data Types
JSON data-type stores json formatted strings as a text which is not very powerful and does not support many JSON related functions used for searches. It supports only traditional B-TREE indexing and does not support other Index types which are imperative for faster and efficient search operations across JSON data.
JSONB, the advanced version of JSON data type, is highly recommended for storing and processing JSON documents. It supports a wide range of json operators and has numerous advantages over JSON, like storing JSON formatted strings in binary format, and supporting JSON functions and indexing, to perform efficient searches.
Let us look at the differences.
JSON | JSONB | |
---|---|---|
1 | Pretty much like a TEXT data type which stores only valid JSON document. | Stores the JSON documents in Binary format. |
2 | Stores the JSON documents as-is including white spaces. | Trims off white spaces and stores in a format conducive for faster and efficient searches |
3 | Does not support FULL-TEXT-SEARCH Indexing | Supports FULL-TEXT-SEARCH Indexing |
4 | Does not support wide range of JSON functions and operators | Supports all the JSON functions and operators |
Example for #4 Listed Above
JSON
Below is a table with JSON data type
dbt3=# d product
Table "dbt3.product"
Column | Type | Collation | Nullable | Default
----------------+--------+-----------+----------+---------
item_code | bigint | | not null |
productdetails | json | | |
Indexes:
"product_pkey" PRIMARY KEY, btree (item_code)
Does not support traditional JSON operators (like “@>” or “#>”). Full-Text-Search through JSON data is done using “@>” or “#>” in an SQL which is not supported by JSON data type
dbt3=# select * from product where productdetails @> '{"l_shipmode":"AIR"}' and productdetails @> '{"l_quantity":"27"}';
ERROR: operator does not exist: json @> unknown
LINE 1: select * from product where productdetails @> '{"l_shipmode"...
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
dbt3=#
JSONB
Below is a table with JSONB data type
dbt3=# d products
Table "dbt3.products"
Column | Type | Collation | Nullable | Default
---------------+--------+-----------+----------+---------
item_code | bigint | | not null |
order_details | jsonb | | |
Indexes:
"products_pkey" PRIMARY KEY, btree (item_code)
Supports FULL-TEXT-SEARCHING through JSON data using operators (like “@>”)
dbt3=# select * from products where order_details @> '{"l_shipmode" : "AIR"}' limit 2;
item_code | order_details
-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
4 | {"l_partkey": 21315, "l_orderkey": 1, "l_quantity": 28, "l_shipdate": "1996-04-21", "l_shipmode": "AIR", "l_commitdate": "1996-03-30", "l_shipinstruct": "NONE", "l_extendedprice": 34616.7}
8 | {"l_partkey": 42970, "l_orderkey": 3, "l_quantity": 45, "l_shipdate": "1994-02-02", "l_shipmode": "AIR", "l_commitdate": "1994-01-04", "l_shipinstruct": "NONE", "l_extendedprice": 86083.6}
(2 rows)
How to Query JSON Data
Let us take a look at some PostgreSQL JSON capabilities related to data operationsBelow is how the JSON data looks in a Table. Column “order_details” is of type JSONB
dbt3=# select * from product_details ;
item_code | order_details
-----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | {"l_partkey": 1551894, "l_orderkey": 1, "l_quantity": 17, "l_shipdate": "1996-03-13", "l_shipmode": "TRUCK", "l_commitdate": "1996-02-12", "l_shipinstruct": "DELIVER IN PERSON", "l_extendedprice": 33078.9}
2 | {"l_partkey": 673091, "l_orderkey": 1, "l_quantity": 36, "l_shipdate": "1996-04-12", "l_shipmode": "MAIL", "l_commitdate": "1996-02-28", "l_shipinstruct": "TAKE BACK RETURN", "l_extendedprice": 38306.2}
3 | {"l_partkey": 636998, "l_orderkey": 1, "l_quantity": 8, "l_shipdate": "1996-01-29", "l_shipmode": "REG AIR", "l_commitdate": "1996-03-05", "l_shipinstruct": "TAKE BACK RETURN", "l_extendedprice": 15479.7}
4 | {"l_partkey": 21315, "l_orderkey": 1, "l_quantity": 28, "l_shipdate": "1996-04-21", "l_shipmode": "AIR", "l_commitdate": "1996-03-30", "l_shipinstruct": "NONE", "l_extendedprice": 34616.7}
5 | {"l_partkey": 240267, "l_orderkey": 1, "l_quantity": 24, "l_shipdate": "1996-03-30", "l_shipmode": "FOB", "l_commitdate": "1996-03-14", "l_shipinstruct": "NONE", "l_extendedprice": 28974}
6 | {"l_partkey": 156345, "l_orderkey": 1, "l_quantity": 32, "l_shipdate": "1996-01-30", "l_shipmode": "MAIL", "l_commitdate": "1996-02-07", "l_shipinstruct": "DELIVER IN PERSON", "l_extendedprice": 44842.9}
7 | {"l_partkey": 1061698, "l_orderkey": 2, "l_quantity": 38, "l_shipdate": "1997-01-28", "l_shipmode": "RAIL", "l_commitdate": "1997-01-14", "l_shipinstruct": "TAKE BACK RETURN", "l_extendedprice": 63066.3}
8 | {"l_partkey": 42970, "l_orderkey": 3, "l_quantity": 45, "l_shipdate": "1994-02-02", "l_shipmode": "AIR", "l_commitdate": "1994-01-04", "l_shipinstruct": "NONE", "l_extendedprice": 86083.6}
9 | {"l_partkey": 190355, "l_orderkey": 3, "l_quantity": 49, "l_shipdate": "1993-11-09", "l_shipmode": "RAIL", "l_commitdate": "1993-12-20", "l_shipinstruct": "TAKE BACK RETURN", "l_extendedprice": 70822.1}
10 | {"l_partkey": 1284483, "l_orderkey": 3, "l_quantity": 27, "l_shipdate": "1994-01-16", "l_shipmode": "SHIP", "l_commitdate": "1993-11-22", "l_shipinstruct": "DELIVER IN PERSON", "l_extendedprice": 39620.3}
(10 rows)
Select all the item codes including their shipment dates
dbt3=# select item_code, order_details->'l_shipdate' as shipment_date from product_details ;
item_code | shipment_date
-----------+---------------
1 | "1996-03-13"
2 | "1996-04-12"
3 | "1996-01-29"
4 | "1996-04-21"
5 | "1996-03-30"
6 | "1996-01-30"
7 | "1997-01-28"
8 | "1994-02-02"
9 | "1993-11-09"
10 | "1994-01-16"
(10 rows)
Get the item_code, quantity and price of all the orders arrived by air
dbt3=# select item_code, order_details->'l_quantity' as quantity, order_details->'l_extendedprice' as price, order_details->'l_shipmode' as price from product_details where order_details->>'l_shipmode'='AIR';
item_code | quantity | price | price
-----------+----------+---------+-------
4 | 28 | 34616.7 | "AIR"
8 | 45 | 86083.6 | "AIR"
(2 rows)
The JSON operators “->” and “->>” are used for selections and comparisons in the SQL query. The “->” operator returns the JSON Object field as a field in quotes and the operator “->>” returns the JSON object field as TEXT. The above two SQLs are examples of displaying JSON field values as-is. Below is the example of extracting the JSON field in the TEXT form.
Below is an example of fetching the JSON field in the form of TEXT
dbt3=# select item_code, order_details->>'l_shipdate' as shipment_date from product_details ;
item_code | shipment_date
-----------+---------------
1 | 1996-03-13
2 | 1996-04-12
3 | 1996-01-29
4 | 1996-04-21
5 | 1996-03-30
6 | 1996-01-30
7 | 1997-01-28
8 | 1994-02-02
9 | 1993-11-09
10 | 1994-01-16
(10 rows)
There is another operator called “#>” which is used to query the data part of a JSON Element which is in-turn part of a JSON string. Let us look at an example.
Below is the data in the table.
dbt3=# select * from test_json ;
id | details
-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
10000 | {"Job": "Database Consultant", "name": "Venkata", "Location": {"city": "Sydney", "State": "NSW", "Suburb": "Dee Why", "Country": "Australia"}}
20000 | {"Job": "Database Consultant", "name": "Smith", "Location": {"city": "Sydney", "State": "NSW", "Suburb": "Manly", "Country": "Australia"}}
30000 | {"Job": "Developer", "name": "John", "Location": {"city": "Sydney", "State": "NSW", "Suburb": "Brookvale", "Country": "Australia"}}
50000 | {"cars": {"Ford": [{"doors": 4, "model": "Taurus"}, {"doors": 4, "model": "Escort"}], "Nissan": [{"doors": 4, "model": "Sentra"}, {"doors": 4, "model": "Maxima"}, {"doors": 2, "model": "Skyline"}]}}
40000 | {"Job": "Architect", "name": "James", "Location": {"city": "Melbourne", "State": "NSW", "Suburb": "Trugnania", "Country": "Australia"}}
I want to see all the details with “State” “NSW” and “State” is the JSON object key which is part of the key “Location”. Below is how to query the same.
dbt3=# select * from test_json where details #> '{Location,State}'='"NSW"';
id | details
-------+------------------------------------------------------------------------------------------------------------------------------------------------
10000 | {"Job": "Database Consultant", "name": "Venkata", "Location": {"city": "Sydney", "State": "NSW", "Suburb": "Dee Why", "Country": "Australia"}}
20000 | {"Job": "Database Consultant", "name": "Smith", "Location": {"city": "Sydney", "State": "NSW", "Suburb": "Manly", "Country": "Australia"}}
30000 | {"Job": "Developer", "name": "John", "Location": {"city": "Sydney", "State": "NSW", "Suburb": "Brookvale", "Country": "Australia"}}
30000 | {"Job": "Architect", "name": "James", "Location": {"city": "Melbourne", "State": "NSW", "Suburb": "Trugnania", "Country": "Australia"}}
(4 rows)
Arithmetic operations can be performed on JSON data. Type casting is needed as the data part of JSON column is TEXT.
dbt3=# select item_code, order_details->'l_quantity' as quantity, order_details->'l_extendedprice' as price, order_details->'l_shipmode' as price from product_details where (order_details->'l_quantity')::int > 10;
item_code | quantity | price | price
-----------+----------+---------+---------
1 | 17 | 33078.9 | "TRUCK"
2 | 36 | 38306.2 | "MAIL"
4 | 28 | 34616.7 | "AIR"
5 | 24 | 28974 | "FOB"
6 | 32 | 44842.9 | "MAIL"
7 | 38 | 63066.3 | "RAIL"
8 | 45 | 86083.6 | "AIR"
9 | 49 | 70822.1 | "RAIL"
10 | 27 | 39620.3 | "SHIP"
(9 rows)
Apart from all of the above, following operations can also be performed on JSON using SQLs including JOINs
- Sorting the data using ORDER BY clause
- Aggregation using aggregate functions like SUM, AVG, MIN, MAX etc
- Group the data using GROUP BY clause
How About Performance?
The data in JSON columns will be of text in nature and based on the data size performance problems can be expected. Searches through JSON data can taketime and computing power resulting in slow responses to the application(s). It is imperative for DBAs to ensure SQLs hitting the JSON columns are responding fast enough and rendering good performance. Since the data extraction is done via SQL, The option the DBAs would look for is the possibility of Indexing and yes, JSON Data types do support Indexing options.
Let us take a look at the Indexing options JSON brings us.
Indexing JSONB
JSONB data type supports FULL-TEXT-SEARCH Indexing. This is the most important capability of JSONB which DBAs will be looking forward to when using JSONB data types. A normal Index on an JSON Object key may not help when using JSON specific operators in the search queries.Below is an TEXT SEARCH query which goes for a FULL-TABLE-SCAN
dbt3=# explain select * from products where order_details @> '{"l_shipmode" : "AIR"}';
QUERY PLAN
--------------------------------------------------------------------
Seq Scan on products (cost=0.00..4205822.65 rows=59986 width=252)
Filter: (order_details @> '{"l_shipmode": "AIR"}'::jsonb)
(2 rows)
JSONB supports FULL-TEXT-SEARCH Index type called GIN which helps queries like above.
Now, let me create a GIN Index and see if that helps
dbt3=# create index od_gin_idx on products using gin(order_details jsonb_path_ops);
CREATE INDEX
If you can observe below, the query pickups the GIN Index
dbt3=# explain select * from products where order_details @> '{"l_shipmode" : "AIR"}';
QUERY PLAN
-------------------------------------------------------------------------------
Bitmap Heap Scan on products (cost=576.89..215803.18 rows=59986 width=252)
Recheck Cond: (order_details @> '{"l_shipmode": "AIR"}'::jsonb)
-> Bitmap Index Scan on od_gin_idx (cost=0.00..561.90 rows=59986 width=0)
Index Cond: (order_details @> '{"l_shipmode": "AIR"}'::jsonb)
And a B-TREE index instead of GIN would NOT help
dbt3=# create index idx on products((order_details->>'l_shipmode'));
CREATE INDEX
dbt3=# d products
Table "dbt3.products"
Column | Type | Collation | Nullable | Default
---------------+--------+-----------+----------+---------
item_code | bigint | | not null |
order_details | jsonb | | |
Indexes:
"products_pkey" PRIMARY KEY, btree (item_code)
"idx" btree ((order_details ->> 'l_shipmode'::text))
You can see below, the query prefers FULL-TABLE-SCAN
dbt3=# explain select * from products where order_details @> '{"l_shipmode" : "AIR"}';
QUERY PLAN
--------------------------------------------------------------------
Seq Scan on products (cost=0.00..4205822.65 rows=59986 width=252)
Filter: (order_details @> '{"l_shipmode": "AIR"}'::jsonb)
What is GIN Index ?
GIN stands for Generalised Inverted Index. The core capability of GIN Index is to speed up full text searches. When performing searching based on specific Keys or elements in a TEXT or a document, GIN Index is the way to go. GIN Index stores “Key” (or an element or a value) and the “position list” pairs. The position list is the rowID of the key. This means, if the “Key” occurs at multiple places in the document, GIN Index stores the Key only once along with its position of occurrences which not only keeps the GIN Index compact in size and also helps speed-up the searches in a great way. This is the enhancement in Postgres-9.4.
Challenges with GIN Index
Depending on the complexity of the data, maintaining GIN Indexes can be expensive. Creation of GIN Indexes consumes time and resources as the Index has to search through the whole document to find the Keys and their row IDs. It can be even more challenging if the GIN index is bloated. Also, the size of the GIN index can be very big based on the data size and complexity.
Indexing JSON
JSON does not support text searching and Indexes like GIN
dbt3=# create index pd_gin_idx on product using gin(productdetails jsonb_path_ops);
ERROR: operator class "jsonb_path_ops" does not accept data type json
Normal Indexing like B-TREE is supported by both JSON and JSONB
Yes, normal indexes like B-TREE Index is supported by both JSON and JSONB data types and is not conducive for text search operations. Each JSON object key can be Indexed individually which would really help ONLY when the same object key is used in the WHERE clause.
Let me create an B-TREE Index on JSONB and see how it works
dbt3=# create index idx on products((order_details->>'l_shipmode'));
CREATE INDEX
dbt3=# d products
Table "dbt3.products"
Column | Type | Collation | Nullable | Default
---------------+--------+-----------+----------+---------
item_code | bigint | | not null |
order_details | jsonb | | |
Indexes:
"products_pkey" PRIMARY KEY, btree (item_code)
"idx" btree ((order_details ->> 'l_shipmode'::text))
We have already learned above that a B-TREE index is NOT useful for speeding up SQLs doing FULL-TEXT-SEARCHING on the JSON data using operators (like “@>”) , and such Indexes would ONLY help speed-up the queries like the one below, which are typical RDBMS type SQLs (which are not search queries). Each of the JSON Object key can be Indexed individually, which would help queries speed-up when those Indexed JSON Object Keys are used the WHERE clause.
The example below query uses “l_shipmode” Object Key in the WHERE clause and since it is Indexed the query is going for an index scan. If you wish to search using a different Object Key, then, the query would choose to do a FULL-TABLE-SCAN.
dbt3=# explain select * from products where order_details->>'l_shipmode'='AIR';
QUERY PLAN
---------------------------------------------------------------------------------
Index Scan using idx on products (cost=0.56..1158369.34 rows=299930 width=252)
Index Cond: ((order_details ->> 'l_shipmode'::text) = 'AIR'::text)
Same works with JSON data type as well
dbt3=# create index idx on products((order_details->>'l_shipmode'));
CREATE INDEX
dbt3=# d products
Table "dbt3.products"
Column | Type | Collation | Nullable | Default
---------------+--------+-----------+----------+---------
item_code | bigint | | not null |
order_details | json | | |
Indexes:
"products_pkey" PRIMARY KEY, btree (item_code)
"idx" btree ((order_details ->> 'l_shipmode'::text))
If you can observe, the query is using the Index
dbt3=# explain select * from products where order_details->>'l_shipmode'='AIR';
QUERY PLAN
---------------------------------------------------------------------------------
Index Scan using idx on products (cost=0.56..1158369.34 rows=299930 width=252)
Index Cond: ((order_details ->> 'l_shipmode'::text) = 'AIR'::text)
Conclusion
Here are some things to remember when using PostgreSQL JSON Data…
- PostgreSQL is one of the best options to store and process JSON Data
- With all the powerful features, PostgreSQL can be your document database
- I have seen architectures where two or more data stores are chosen, with a mixture of PostgreSQL and NoSQL databases like MongoDB or Couchbase database. A REST API would help applications push the data to different data stores. With PostgreSQL supporting JSON this complexity in architecture can be avoided by just choosing one data store.
- JSON data in PostgreSQL can be queried and Indexed which renders incredible performance and scalability
- JSONB Data type is the most preferred option as it is good at storage and performance. Fully supports FULL-TEXT-SEARCHING and Indexing. Renders good performance
- Use JSON data type only if you want to store JSON strings as JSON and you are not performing much complex text searches
- The biggest advantage of having JSON in PostgreSQL is that the search can be performed using SQLs
- The JSON search performance in PostgreSQL has been on-par with best NoSQL databases like MongoDB