blog
Understanding Check Constraints in PostgreSQL
Managing data is a big challenge. As our world turns, data continues to be widespread, abundant, and intensive. Therefore, we must take measures to handle the influx.
Validating every single piece of data ‘by hand‘ around the clock is simply impractical. What a fantastic dream. But, after all, it is just that. A dream. Bad data is bad data. No matter how you slice it or dice it (pun intended). It is a problem from the onset, leading to even more issues.
Modern databases handle much of the heavy lifting for us. Many provide built-in solutions to assist in managing this particular area of data.
A sure way to control the data entered into a table’s column is with a data type. Need a column with decimal numbers, having a total digit count of 4, with 2 of those after the decimal?
Sure thing! No problem at all.
NUMERIC(4,2), a viable option, is guarding that column like a watchdog. Can character text values slip in there? Not a snowball’s chance.
PostgreSQL offers a multitude of data types. Chances are, one already exists to satisfy your need(s). If not, you can create your own. (See: PostgreSQL CREATE TYPE)
Yet, data types alone are not enough. You cannot assure the most specific requirements are covered and conform to such broad structuring. Compliance rules and some sort of ‘standard’ are typically required when designing a schema.
Suppose in that same NUMERIC(4,2) column, you only want values greater than 25.25 but less than 74.33? In the event, value 88.22 is stored, the data type is not at fault. By allowing 4 total digits, with 2 at most after the decimal, it is doing its job. Lay the blame elsewhere.
How do we win on this front when it comes to controlling the data allowed in our database? Data consistency is of utmost priority and is integral for any sound data solution. On the (off) chance you controlled the collected data from the onset of its origination source, consistency would likely be less of an issue.
But, a perfect world only exists (maybe) in one of those many fantasy novels I love to read.
Unfortunately incomplete, inconsistent, and ‘dirty’ data are all too common characteristics and realities present in a database-centric field.
However, not all is lost in doom and gloom for we have Check constraints to mitigate these issues. For those specific rules, we must put in place, out of necessity, that ensures we handle and store only consistent data. By mandating those specifications in the database, we can minimize the impact inconsistent data has on our business goals and solutions carrying forward.
What is a Constraint? – A High-Level definition
In this context, a constraint is a type of rule or restriction placed on a database table column. This specificity requires that the data coming in must comply to the set requirement(s) prior to being stored. Said requirement(s) tend to be ‘professionally’ coined (and often are) as business rules. This boils down to a validation boolean test for truth. If the data passes (true), it’s stored. If not, no entry (false).
Constraints Available in PostgreSQL
At the time of writing, the PostgreSQL documentation list 6 categories of constraints.
They are:
- Check Constraints
- Not-Null Constraints
- Unique Constraints
- Primary Keys
- Foreign Keys
- Exclusion Constraints
Check Constraints
A simple example for an INTEGER column would be to disallow values greater than say, 100.
learning=> CREATE TABLE no_go(id INTEGER CHECK (id < 100));
CREATE TABLE
learning=> INSERT INTO no_go(id) VALUES(101);
ERROR: new row for relation "no_go" violates check constraint "no_go_id_check"
DETAIL: Failing row contains (101).
As seen above, attempts to INSERT any values that violate the Check constraint fails.
Check constraints not only monitor columns during INSERT, even UPDATE statements (and others e.g., copy and COPY) must adhere to the restrictions as well.
Suppose the no_go table has this value:
learning=> TABLE no_go;
id
----
55
(1 row)
An UPDATE on the id column value to one that does not conform to the Check constraint also fails:
learning=> UPDATE no_go SET id = 155
learning-> WHERE id = 55;
ERROR: new row for relation "no_go" violates check constraint "no_go_id_check"
DETAIL: Failing row contains (155).
Check constraints must ‘make sense’ for the target column data type. It is invalid to attempt and constrain an INTEGER column to prohibit storing text values since the data type itself will not allow it.
See this example where I try to impose that type of Check constraint during table creation:
learning=> CREATE TABLE num_try(id INTEGER CHECK(id IN ('Bubble', 'YoYo', 'Jack-In-The-Box')));
ERROR: invalid input syntax for integer: "Bubble"
Life Without Check Constraints
An old saying I’ve heard that resonates with me is: “You don’t miss the water until the well runs dry.“
Without Check constraints, we can surely relate for their remarkable benefit is most appreciated when you have to make do without them.
Take this example…
To start we have this table and data that represents trail surface materials:
learning=> SELECT * FROM surface_material;
surface_id | material
------------+--------------
101 | Gravel
202 | Grass
303 | Dirt
404 | Turf
505 | Concrete
606 | Asphalt
707 | Clay
808 | Polyurethane
(8 rows)
And this table with trail names and its own surface_id:
learning=> SELECT * FROM trails;
id | name | surface_id
----+-----------------+------------
1 | Dusty Storm | 303
2 | Runners Trip | 808
3 | Pea Gravel Pass | 101
4 | Back 40 Loop | 404
(4 rows)
We want to assure that table trails only contains surface_id‘s for corresponding values over in table surface_material.
Yes yes, I know. You are screaming at me.
“Can’t this be taken care of with a FOREIGN KEY?!?”
Yes, it can. But I am using it to demonstrate a generic use, along with a pitfall to know (mentioned later in the post).
Without Check constraints, you can resort to a TRIGGER and prevent inconsistent values from being stored.
Here is a crude (but working) example:
CREATE OR REPLACE FUNCTION check_me()
RETURNS TRIGGER AS
$$
BEGIN
IF NEW.surface_id NOT IN (SELECT surface_id FROM surface_material)
THEN Raise Exception '% is not allowed for surface id', NEW.surface_id;
END IF;
RETURN NEW;
END;
$$ LANGUAGE PLpgSQL;
CREATE TRIGGER check_me BEFORE INSERT OR UPDATE ON trails
FOR EACH ROW EXECUTE PROCEDURE check_me();
Attempts to INSERT a value that does not have a corresponding surface_id in table trails, fails:
learning=> INSERT INTO trails(name, surface_id)
learning-> VALUES ('Tennis Walker', 110);
ERROR: 110 is not allowed for surface id
CONTEXT: PL/pgSQL function check_me() line 4 at RAISE
The query results below confirm the ‘offending‘ value was not stored:
learning=> SELECT * FROM trails;
id | name | surface_id
----+-----------------+------------
1 | Dusty Storm | 303
2 | Runners Trip | 808
3 | Pea Gravel Pass | 101
4 | Back 40 Loop | 404
(4 rows)
That sure is a lot of work to prohibit unwanted values.
Let’s re-implement this requirement with a Check constraint.
Since you cannot use a subquery (here is why I used the example above) in the actual Check constraint definition, the values must be hard-coded.
For a small table, or trivial example such as presented here, this is fine. In other scenarios, incorporating more values, you may be better served to seek an alternative solution.
learning=> ALTER TABLE trails ADD CONSTRAINT t_check CHECK (surface_id IN (101, 202, 303, 404, 505, 606, 707, 808));
ALTER TABLE
Here I have named the Check constraint t_check versus letting the system name it.
(Note: The previously defined check_me() FUNCTION and accompanying TRIGGER were dropped (not shown) prior to running the below INSERT.)
learning=> INSERT INTO trails(name, surface_id)
VALUES('Tennis Walker', 110);
ERROR: new row for relation "trails" violates check constraint "t_check"
DETAIL: Failing row contains (7, Tennis Walker, 110).
Would you look at how easy that was! No TRIGGER and FUNCTION needed.
Check constraints make this type of work easy-peasy.
Want to get crafty in the Check constraint definition?
You can.
Suppose you need a table listing trails that are a bit kinder to those with sensitive ankles and knees. No hard surfaces desired here.
You want to assure that any hiking trail or track listed in table nice_trail has a surface material of either ‘Gravel’ or ‘Dirt’.
This Check constraint handles that requirement no problem:
learning=> CREATE TABLE nice_trail(id SERIAL PRIMARY KEY,
learning(> name TEXT, mat_surface_id INTEGER CONSTRAINT better_surface CHECK(id IN (101, 303)));
CREATE TABLE
That absolutely works just fine.
But, how about a FUNCTION that returns both id’s required to make the Check work? Is a FUNCTION allowed in the Check constraint definition?
Yes, one can be incorporated.
Here is a working example.
Up first, the function body and definition:
CREATE OR REPLACE FUNCTION easy_hike(id INTEGER)
RETURNS BOOLEAN AS
$$
BEGIN
IF id IN (SELECT surface_id FROM surface_material WHERE material IN ('Gravel', 'Dirt'))
THEN RETURN true;
ELSE RETURN false;
END IF;
END;
$$ LANGUAGE PLpgSQL;
Notice in this CREATE TABLE statement, I define the Check constraint at the ‘table‘ level whereas previously I have only provided examples at the ‘column‘ level.
Check constraints defined at the table level are perfectly valid:
learning=> CREATE TABLE nice_trail(nt_id SERIAL PRIMARY KEY,
learning(> name TEXT, mat_surface_id INTEGER,
learning(> CONSTRAINT better_surface_check CHECK(easy_hike(mat_surface_id)));
CREATE TABLE
These inserts are all good:
learning=> INSERT INTO nice_trail(name, mat_surface_id)
learning-> VALUES ('Smooth Rock Loop', 101), ('High Water Bluff', 303);
INSERT 0 2
Now comes along an INSERT for a trail that does not meet the restriction on column mat_surface_id:
learning=> INSERT INTO nice_trail(name, mat_surface_id)
learning-> VALUES('South Branch Fork', 404);
ERROR: new row for relation "nice_trail" violates check constraint "better_surface_check"
DETAIL: Failing row contains (3, South Branch Fork, 404).
Our FUNCTION call in the Check constraint definition works as designed, restricting the unwanted column values.
Smoke and Mirrors?
Is everything as it seems with Check constraints? All black and white? No facade up front?
An example worth noting.
We have a simple table in which we want the DEFAULT value to be 10 for the lone INTEGER column present:
learning=> CREATE TABLE surprise(id INTEGER DEFAULT 10, CHECK (id <> 10));
CREATE TABLE
But, I have also included a Check constraint that prohibits a value of 10, by defining id can not be equal to that number.
Which one will win out the day? The DEFAULT or Check constraint?
You might be surprised to know which it is.
I was.
An arbitrary INSERT, working fine:
learning=> INSERT INTO surprise(id) VALUES(17);
INSERT 0 1
learning=> SELECT * FROM surprise;
id
----
17
(1 row)
And an INSERT with the DEFAULT value:
learning=> INSERT INTO surprise(id) VALUES(DEFAULT);
ERROR: new row for relation "surprise" violates check constraint "surpise_id_check"
DETAIL: Failing row contains (10).
Oops…
Again, with an alternate syntax:
learning=> INSERT INTO surprise DEFAULT VALUES;
ERROR: new row for relation "surprise" violates check constraint "surpise_id_check"
DETAIL: Failing row contains (10).
The Check constraint wins out over the DEFAULT value.
Oddball Example
The Check constraint can appear pretty much anywhere in the table definition during creation. Even at the column level, it can be set on a column not involved in the check whatsoever.
Here is an example to illustrate:
learning=> CREATE TABLE mystery(id_1 INTEGER CHECK(id_2 > id_3),
learning(> id_2 INTEGER, id_3 INTEGER);
CREATE TABLE
An INSERT to test the constraint:
learning=> INSERT INTO mystery(id_1, id_2, id_3) VALUES (1, 2, 3);
ERROR: new row for relation "mystery" violates check constraint "mystery_check"
DETAIL: Failing row contains (1, 2, 3).
Works as intended.
VALIDATION and NOT VALID
We have this simple table and data:
learning=> CREATE TABLE v_check(id INTEGER);
CREATE TABLE
learning=> INSERT INTO v_check SELECT * FROM generate_series(1, 425);
INSERT 0 425
Suppose we now need to implement a Check constraint that prohibits any values less than 50.
Imagine this is a large table in production and we can not really afford any acquired lock at the moment, resulting from from an ALTER TABLE statement. But, do need to get this constraint in place, moving forward.
ALTER TABLE will acquire a lock (dependent on each different subform). As mentioned, this table is in production, so we wish to wait until we are out of ‘peak hours‘.
You can use the NO VALID option when creating the Check constraint:
learning=> ALTER TABLE v_check ADD CONSTRAINT fifty_chk CHECK(id > 50) NOT VALID;
ALTER TABLE
Continuing operations, should an attempt for an INSERT or UPDATE that violates the Check constraint:
learning=> INSERT INTO v_check(id) VALUES(22);
ERROR: new row for relation "v_check" violates check constraint "fifty_chk"
DETAIL: Failing row contains (22).
The ‘offending’ column value is prohibited.
Then, during downtime, we validate the Check constraint to apply it against (any) pre-existing columns that may be in violation:
learning=> ALTER TABLE v_check VALIDATE CONSTRAINT fifty_chk;
ERROR: check constraint "fifty_chk" is violated by some row
The message is quite cryptic in my opinion. But, it does inform us that there are rows not in compliance with the constraint.
Here are some key points I wanted to include from the ALTER TABLE documentation (Verbiage directly from the docs in quotes):
- Syntax: ADD table_constraint [ NOT VALID ] – Accompanying description (partial) “This form adds a new constraint to a table using the same syntax as CREATE TABLE, plus the option NOT VALID, which is currently only allowed for foreign key and CHECK constraints. If the constraint is marked NOT VALID, the potentially-lengthy initial check to verify that all rows in the table satisfy the constraint is skipped.”
- Syntax: VALIDATE CONSTRAINT constraint_name – Accompanying description (partial) “This form validates a foreign key or check constraint that was previously created as NOT VALID, by scanning the table to ensure there are no rows for which the constraint is not satisfied.” “Validation acquires only a SHARE UPDATE EXCLUSIVE lock on the table being altered.”
As an aside, two points worth noting I learned along the way. Set-returning functions and subqueries are not allowed in Check constraint definitions. I am sure there are others and I welcome any feedback on them in the comments below.
Check constraints are awesome. Using the ‘built-in’ solutions provided by the PostgreSQL database itself, to enforce any data restriction(s) makes perfect sense. Time and effort spent implementing Check constraints for necessary column(s), far outweighs not implementing any at all. Thus saving time in the long run. The more we lean on the database to handle these kinds of requirements, the better. Enabling us to focus and apply our resources to other areas/aspects of database management.
Thank you for reading.