blog
An Overview of Generated Columns for PostgreSQL
PostgreSQL 12 comes with a great new feature, Generated Columns. The functionality isn’t exactly anything new, but the standardization, ease of use, accessibility, and performance has been improved in this new version.
A Generated Column is a special column in a table that contains data automatically generated from other data within the row. The content of the generated column is automatically populated and updated whenever the source data, such as any other columns in the row, are changed themselves.
Generated Columns in PostgreSQL 12+
In recent versions of PostgreSQL, generated columns are a built-in feature allowing the CREATE TABLE or ALTER TABLE statements to add a column in which the content is automatically ‘generated’ as a result of an expression. These expressions could be simple mathematical operations from other columns, or a more advanced immutable function.Some benefits of implementing a generated column into a database design include:
- The ability to add a column to a table containing computed data without need of updating application code to generate the data to then include it within INSERT and UPDATE operations.
- Reducing processing time on extremely frequent SELECT statements that would process the data on the fly. Since the processing of the data is done at the time of INSERT or UPDATE, the data is generated once and the SELECT statements only need to retrieve the data. In heavy read environments, this may be preferable, as long as the extra data storage used is acceptable.
- Since generated columns are updated automatically when the source data itself is updated, adding a generated column will add an assumed guarantee that the data in the generated column is always correct.
In PostgreSQL 12, only the ‘STORED’ type of generated column is available. In other database systems, a generated column with a type ‘VIRTUAL’ is available, which acts more like a view where the result is calculated on the fly when the data is retrieved. Since the functionality is so similar to views, and simply writing the operation into a select statement, the functionality isn’t as beneficial as the ‘STORED’ functionality discussed here, but there’s a chance future versions will include the feature.
Creating a Table With a Generated Column is done when defining the column itself. In this example, the generated column is ‘profit’, and is automatically generated by subtracting the purchase_price from the sale_price columns, then multiplied by the quantity_sold column.
CREATE TABLE public.transactions (
transactions_sid serial primary key,
transaction_date timestamp with time zone DEFAULT now() NOT NULL,
product_name character varying NOT NULL,
purchase_price double precision NOT NULL,
sale_price double precision NOT NULL,
quantity_sold integer NOT NULL,
profit double precision NOT NULL GENERATED ALWAYS AS ((sale_price - purchase_price) * quantity_sold) STORED
);
In this example, a ‘transactions’ table is created to track some basic transactions and profits of an imaginary coffee shop. Inserting data into this table will show some immediate results.
severalnines=# INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold) VALUES ('House Blend Coffee', 5, 11.99, 1);
severalnines=# INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold) VALUES ('French Roast Coffee', 6, 12.99, 4);
severalnines=# INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold) VALUES ('BULK: House Blend Coffee, 10LB', 40, 100, 6);
severalnines=# SELECT * FROM public.transactions;
transactions_sid | transaction_date | product_name | purchase_price | sale_price | quantity_sold | profit
------------------+-------------------------------+--------------------------------+----------------+------------+---------------+--------
1 | 2020-02-28 04:50:06.626371+00 | House Blend Coffee | 5 | 11.99 | 1 | 6.99
2 | 2020-02-28 04:50:53.313572+00 | French Roast Coffee | 6 | 12.99 | 4 | 27.96
3 | 2020-02-28 04:51:08.531875+00 | BULK: House Blend Coffee, 10LB | 40 | 100 | 6 | 360
When updating the row, the generated column will automatically update:
severalnines=# UPDATE public.transactions SET sale_price = 95 WHERE transactions_sid = 3;
UPDATE 1
severalnines=# SELECT * FROM public.transactions WHERE transactions_sid = 3;
transactions_sid | transaction_date | product_name | purchase_price | sale_price | quantity_sold | profit
------------------+-------------------------------+--------------------------------+----------------+------------+---------------+--------
3 | 2020-02-28 05:55:11.233077+00 | BULK: House Blend Coffee, 10LB | 40 | 95 | 6 | 330
This will ensure that the generated column is always correct, with no additional logic needed on the application side.
NOTE: Generated columns cannot be INSERTED into or UPDATED directly, and any attempt to do so will return in an ERROR:
severalnines=# INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold, profit) VALUES ('BULK: House Blend Coffee, 10LB', 40, 95, 1, 95);
ERROR: cannot insert into column "profit"
DETAIL: Column "profit" is a generated column.
severalnines=# UPDATE public.transactions SET profit = 330 WHERE transactions_sid = 3;
ERROR: column "profit" can only be updated to DEFAULT
DETAIL: Column "profit" is a generated column.
Generated Columns on PostgreSQL 11 and Before
Even though built-in generated columns are new to version 12 of PostgreSQL, the functionally can still be achieved in earlier versions, it just needs a bit more setup with stored procedures and triggers. However, even with the ability to implement it on older versions, in addition to the added functionality that can be beneficial, strict data input compliance is harder to achieve, and depends on PL/pgSQL features and programming ingenuity.
BONUS: The below example will also work on PostgreSQL 12+, so if the added functionality with a function / trigger combo is needed or desired in newer versions, this option is a valid fallback and not restricted to just versions older than 12.
While this is a way to do it on previous versions of PostgreSQL, there are a couple of additional benefits of this method:
- Since mimicking the generated column uses a function, more complex calculations are able to be used. Generated Columns in version 12 require IMMUTABLE operations, but a trigger / function option could use a STABLE or VOLATILE type of function with greater possibilities and likely lesser performance accordingly.
- Using a function that has the option to be STABLE or VOLATILE also opens up the possibility to UPDATE additional columns, UPDATE other tables, or even create new data via INSERTS into other tables. (However, while these trigger / function options are much more flexible, that’s not to say an actual “Generated Column” is lacking, as it does what’s advertised with greater performance and efficiency.)
In this example, a trigger / function is set up to mimic the functionality of a PostgreSQL 12+ generated column, along with two pieces that raise an exception if an INSERT or UPDATE attempt to change the generated column. These can be omitted, but if they are omitted, exceptions will not be raised, and the actual data INSERTed or UPDATEd will be quietly discarded, which generally wouldn’t be recommended.
The trigger itself is set to run BEFORE, which means the processing happens before the actual insert happens, and requires the RETURN of NEW, which is the RECORD that is modified to contain the new generated column value. This specific example was written to run on PostgreSQL version 11.
CREATE TABLE public.transactions (
transactions_sid serial primary key,
transaction_date timestamp with time zone DEFAULT now() NOT NULL,
product_name character varying NOT NULL,
purchase_price double precision NOT NULL,
sale_price double precision NOT NULL,
quantity_sold integer NOT NULL,
profit double precision NOT NULL
);
CREATE OR REPLACE FUNCTION public.generated_column_function()
RETURNS trigger
LANGUAGE plpgsql
IMMUTABLE
AS $function$
BEGIN
-- This statement mimics the ERROR on built in generated columns to refuse INSERTS on the column and return an ERROR.
IF (TG_OP = 'INSERT') THEN
IF (NEW.profit IS NOT NULL) THEN
RAISE EXCEPTION 'ERROR: cannot insert into column "profit"' USING DETAIL = 'Column "profit" is a generated column.';
END IF;
END IF;
-- This statement mimics the ERROR on built in generated columns to refuse UPDATES on the column and return an ERROR.
IF (TG_OP = 'UPDATE') THEN
-- Below, IS DISTINCT FROM is used because it treats nulls like an ordinary value.
IF (NEW.profit::VARCHAR IS DISTINCT FROM OLD.profit::VARCHAR) THEN
RAISE EXCEPTION 'ERROR: cannot update column "profit"' USING DETAIL = 'Column "profit" is a generated column.';
END IF;
END IF;
NEW.profit := ((NEW.sale_price - NEW.purchase_price) * NEW.quantity_sold);
RETURN NEW;
END;
$function$;
CREATE TRIGGER generated_column_trigger BEFORE INSERT OR UPDATE ON public.transactions FOR EACH ROW EXECUTE PROCEDURE public.generated_column_function();
NOTE: Make sure the function has the correct permissions / ownership to be executed by the desired application user(s).
As seen in the previous example, the results are the same in previous versions with a function / trigger solution:
severalnines=# INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold) VALUES ('House Blend Coffee', 5, 11.99, 1);
severalnines=# INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold) VALUES ('French Roast Coffee', 6, 12.99, 4);
severalnines=# INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold) VALUES ('BULK: House Blend Coffee, 10LB', 40, 100, 6);
severalnines=# SELECT * FROM public.transactions;
transactions_sid | transaction_date | product_name | purchase_price | sale_price | quantity_sold | profit
------------------+-------------------------------+--------------------------------+----------------+------------+---------------+--------
1 | 2020-02-28 00:35:14.855511-07 | House Blend Coffee | 5 | 11.99 | 1 | 6.99
2 | 2020-02-28 00:35:21.764449-07 | French Roast Coffee | 6 | 12.99 | 4 | 27.96
3 | 2020-02-28 00:35:27.708761-07 | BULK: House Blend Coffee, 10LB | 40 | 100 | 6 | 360
Updating the data will be similar.
severalnines=# UPDATE public.transactions SET sale_price = 95 WHERE transactions_sid = 3;
UPDATE 1
severalnines=# SELECT * FROM public.transactions WHERE transactions_sid = 3;
transactions_sid | transaction_date | product_name | purchase_price | sale_price | quantity_sold | profit
------------------+-------------------------------+--------------------------------+----------------+------------+---------------+--------
3 | 2020-02-28 00:48:52.464344-07 | BULK: House Blend Coffee, 10LB | 40 | 95 | 6 | 330
Lastly, attempting to INSERT into, or UPDATE the special column itself will result in an ERROR:
severalnines=# INSERT INTO public.transactions (product_name, purchase_price, sale_price, quantity_sold, profit) VALUES ('BULK: House Blend Coffee, 10LB', 40, 95, 1, 95);
ERROR: ERROR: cannot insert into column "profit"
DETAIL: Column "profit" is a generated column.
CONTEXT: PL/pgSQL function generated_column_function() line 7 at RAISE
severalnines=# UPDATE public.transactions SET profit = 3030 WHERE transactions_sid = 3;
ERROR: ERROR: cannot update column "profit"
DETAIL: Column "profit" is a generated column.
CONTEXT: PL/pgSQL function generated_column_function() line 15 at RAISE
In this example, it does act differently than the first generated column setup in a couple of ways that should be noted:
- If the ‘generated column’ is attempted to be updated but no row is found to be updated, it will return success with an “UPDATE 0” result, while an actual Generated Column in version 12 will still return an ERROR, even if no row is found to UPDATE.
- When attempting to update the profit column, which ‘should’ always return an ERROR, if the specified value is the same as the correctly ‘generated’ value, it will succeed. Ultimately the data is correct, however, if the desire is to return an ERROR if the column is specified.
Documentation and PostgreSQL Community
The official documentation for the PostgreSQL Generated Columns is located at the official PostgreSQL Website. Check back when new major versions of PostgreSQL are released to discover new features when they appear.
While generated columns in PostgreSQL 12 are fairly straight forward, implementing similar functionality in previous versions has the potential to get much more complicated. The PostgreSQL community is a very active, massive, worldwide, and multilingual community dedicated to helping people of any level of PostgreSQL experience solve problems and create new solutions such as this.
- IRC: Freenode has a very active channel called #postgres, where users help each other understand concepts, fix errors, or find other resources. A full list of available freenode channels for all things PostgreSQL can be found on the PostgreSQL.org website.
- Mailing Lists: PostgreSQL has a handful of mailing lists that can be joined. Longer form questions / issues can be sent here, and can reach many more people than IRC at any given time. The lists can be found on the PostgreSQL Website, and the lists pgsql-general or pgsql-admin are good resources.
- Slack: The PostgreSQL community has also been thriving on Slack, and can be joined at postgresteam.slack.com. Much like IRC, an active community is available to answer questions and engage in all things PostgreSQL.