The journey to virtual generated columns

October 02, 2025

One of the headlining features of PostgreSQL 18 is virtual generated columns, which I worked on.  This is the story of that feature.

Just to catch everyone up, this is what a virtual generated column is:

CREATE TABLE t1 (a int, b int GENERATED ALWAYS AS (a * 2));

Here is a less abstract example:

CREATE TABLE products (
    id int PRIMARY KEY,
    title varchar,
    price numeric,
    tax numeric GENERATED ALWAYS AS (price * 0.19)
);

(not tax advice).

I started working on generated columns in 2017.  This project followed my earlier project on identity columns (GENERATED ALWAYS AS IDENTITY). These are sometimes confused, because they both use the syntax "GENERATED AS something", but they are at least internally quite different.  Identity columns function more like a default value, while generated columns function are more like a trigger or a view, depending on how you see them.

That was actually the first thing to figure out, how you want generated columns to behave.  Users in the PostgreSQL community had requested them from time to time, as they were coming from other SQL database systems such as DB2, MySQL, and Oracle, and I was familiar with them from having studied the SQL standard extensively in connection with the identity columns project.  But the SQL standard is an abstract description of what commands you run and what results to expect; it doesn't concern itself with how things are stored and when computations actually take place.

After some research, I figured out that there are two different ways generated columns could function: They are either computed when the row is written, or they are computed when the row is read. This distinction is not surprising: There are materialized views (updated on write) and (non-materialized, standard) views (computed on read). Note that the SQL standard only specifies plain views, not materialized views, for the same reason: It doesn't concern itself when things are computed or stored.  If you read between the lines of the commentary from the times when generated columns were invented, it seems they were originally meant to be updated on write, to save having to compute expensive expressions on every read. But then some people figured they could also be useful the other way around, as a nice option to enhance your table schemas. Without generated columns, if you wanted a computed column, you would need to create a view, and while views are powerful, and updatable views exist, views are not without complications and are sometimes overkill.

There has appeared a kind of ad-hoc standard of designating these two kinds of generated columns as "virtual" and "stored". These are not SQL standard terms, but several implementations have adopted them, and so I decided to go with them as well.  Virtual generated columns are like views, in that they are computed on read, stored generated columns are like materialized views in that they are computed on write (except that stored generated columns are always up to date, which is not always the case for materialized views).

Trials and tribulations

For some reason, when I started out on this project, I decided to start with virtual generated columns. I think on balance that might have appeared to be the most requested kind, and maybe I also thought it was the easiest to implement first. I was wrong! And then if I recall this correctly, at some point I thought, hey, I'm making so much progress, let's throw in stored generated columns as well.  This was in some ways sensible, because all the DDL code and the catalog structures and so on are shared, and the difference only happens somewhere deeper in the code where you actually evaluate the expression. So it actually turned out to be useful to consider both kinds at the same time, even if they ended up taking diverging roads.

Stored generated columns ended up being easier because they are really a kind of optimized trigger, and PostgreSQL knows how to handle that. Virtual generated columns are more of a different kind of thing that needed more consideration.

One of the problems with virtual generated columns that I spent a lot of time on initially was figuring out how to not store them. The PostgreSQL code in various layers and subsystems expects a row to consist of stuff, I mean columns. If you try to tell the system, actually, the column at position 5 is virtual and doesn't exist, that turns out to be extremely difficult and fragile.  (If you think further about this, it leads you to other interesting and very difficult projects like column reordering.)  So after a while, I gave up and ripped out the support for virtual generated columns, and PostgreSQL 12 (released in 2019) shipped with support for stored generated columns only.

But this has been a gaping hole in my heart ever since.

Indirectly, I had been hoping that the column reordering project might proceed, and that that would give me new opportunities that pursue my project with the architecture that I had envisioned.  However, this did not end up happening.

So about a year ago, I decided to take a different approach and that something is better than nothing. Instead of trying to rearchitect half of PostgreSQL to allow non-existing columns (or, to be fair, waiting for someone else to do it), let's just use a little bit of storage (pun intended, read on): Make the virtual generated column null on disk. Nulls are stored in a null bitmap, and so if you already have other nullable columns in the table, an additional one will only take one bit, and depending on how many other columns you have, it might not take additional space in the stored row at all. This is still better than having to use a stored generated column when you wanted a virtual one, which was the workaround that people had resorted to until now. (Ok, there could be specific cases where there the storage overhead of both variants is equal, but it's never worse with virtual columns.)  And if we ever get to column reordering and all that, I have an idea how we could make pg_upgrade transition to actual zero storage.

The implementation of virtual generated columns is analogous to a view in that the expansion of the expression happens during rewriting. Except that that was wrong and it needed to be moved to the optimizer, for some complicated reasons having to do with outer joins. I got a lot of help in this area during development, in particular from Dean Rasheed and Richard Guo.

What's next?

One important thing to consider when trying to get a feature into a PostgreSQL release on a yearly release cycle is to limit and contain the scope of a project.  So, first, I tried to make virtual generated columns work analogously to stored generated columns as much as possible.  Some people have asked questions like, why can't I use random() in a generation expression of a virtual generated column?  I guess this might work, but then it wouldn't work like a stored generated column.  I didn't want to have to think about that.  We know how stored generated columns behave, and we have accumulated extensive test coverage for them; let's stick to that.  Second, several features currently do not work together with virtual generated columns.  For example, you can't use them in an index definition.  This just requires more work, and some of that work is already in progress for PostgreSQL 19. But to get the project done for PostgreSQL 18, I had to draw the line somewhere.  So expect we will have more news in this area in the next major releases of PostgreSQL.
 

Share this