PostgreSQL Upserts Demo

postgresql
Posted on: 2021-04-16

You've probably heard of PostgreSQL "upserts", which are INSERTs with an ON CONFLICT clause which says "if there's already a row like this, update it."

In that description, "a row like this" is called the "conflict target", and "update it" is the "conflict action", which could be "do nothing" if you want to leave the existing row as-is (although I wouldn't call that an "upsert").

One thing you might not know is that they can include a WHERE clause, which lets you decide whether you actually do want to update the row. For instance, if the incoming data is older than what you have, you might want to ignore it. As my colleague Luke Imhoff discovered, this matters if data is coming in fast, because updating (even if the new data is identical to the old) creates work for the database (creating new tuples and vacuuming up the old), and a WHERE can prevent that.

Here's a little demo of upserts. You can paste it into psql to play along.

-- Set up a table with a unique id.
CREATE TABLE cats (id integer, name text, age integer);
CREATE UNIQUE INDEX unique_id ON cats (id);

-- Returns the new row's data.
INSERT INTO cats (id, name, age)
VALUES (1, 'Mittens', 3)
RETURNING *;

-- Returns nothing; no rows inserted or updated.
INSERT INTO cats (id, name, age)
VALUES (1, 'Mittens', 4)
ON CONFLICT (id) DO NOTHING
RETURNING *;

-- Returns the new row's data.
INSERT INTO cats (id, name, age)
VALUES (2, 'Mojo', 4)
ON CONFLICT (id) DO NOTHING
RETURNING *;

/*
Returns both rows.  Although order is not specified, and
therefore not guaranteed, in my testing it returns these in
the order inserted.
*/
SELECT * FROM cats; 

-- Modifies the row and returns the updated data.
INSERT INTO cats (id, name, age)
VALUES (1, 'Gloves', 3)
ON CONFLICT (id) DO UPDATE set name = EXCLUDED.name
RETURNING *;

/*
Returns both rows. Although order is not specified, and
therefore not guaranteed, in my testing it returns the
updated row second, presumably because an update actually
means "create a new tuple and delete the old one".
(A "tuple" is the underlying representation of a row.)
*/
SELECT * FROM cats; 

/*
Returns nothing; no rows inserted or updated. The WHERE
shows how you might ignore incoming data if it's (eg) older
than what you already have.
*/
INSERT INTO cats (id, name, age)
VALUES (2, 'Pi', 1)
ON CONFLICT (id) DO UPDATE set name = EXCLUDED.name
WHERE length(EXCLUDED.name) > length(cats.name)
RETURNING *;

/*
Returns both rows. Although order is not specified, and
therefore not guaranteed, in my testing it has not modified
the ordering from the second SELECT, presumably because the
WHERE meant it didn't insert a new tuple.
*/
SELECT * FROM cats; 

-- Modifies the row and returns the new data.
INSERT INTO cats (id, name, age)
VALUES (2, 'Peach Pi', 1)
ON CONFLICT (id) DO UPDATE set name = EXCLUDED.name
WHERE length(EXCLUDED.name) > length(cats.name)
RETURNING *;

/*
Returns both rows. Although order is not specified, and
therefore not guaranteed, in my testing it modified the
ordering, presumably because this time it did insert a tuple
and delete the old one.
*/
SELECT * FROM cats; 

-- Clean up
DROP table cats;

So there you have it.

One other thing to note: an upsert's ON CONFLICT can be ON CONSTRAINT constriant_name, which means they can work with fancy things like exclusion constraints.