The PostgreSQL citext
type (docs here) is a good choice for lots of user-facing text.
For example, if you want to sort names alphabetically, you probably want Banana
to come after apple
, even though upper case Latin characters have smaller codepoint values than lower case ones (see the decimal set in man ascii
).
And if you have a unique index, you probably don't want to allow both foo@example.com
and Foo@example.com
.
citext
does the right thing in both cases: it store the text with the case given, but for comparisons it ignores case.
(Note that citext
is an extension; you have to CREATE EXTENSION IF NOT EXISTS citext
in order to use it.)
Ordering
WITH test_data(name) AS (
VALUES
('Banana'),
('APE'),
('apple')
)
SELECT name FROM test_data ORDER BY name ASC;
name
--------
APE
Banana
apple
(3 rows)
vs
WITH test_data(name) AS (
VALUES
('Banana'::citext),
('APE'::citext),
('apple'::citext)
)
SELECT name FROM test_data ORDER BY name ASC;
name
--------
APE
apple
Banana
(3 rows)
Uniqueness
WITH test_data(name) AS (
VALUES
('Apple'),
('apple'),
('APPLE')
)
SELECT DISTINCT name FROM test_data;
name
-------
APPLE
apple
Apple
(3 rows)
vs
WITH test_data(name) AS (
VALUES
('Apple'::citext),
('apple'::citext),
('APPLE'::citext)
)
SELECT DISTINCT name FROM test_data;
name
-------
Apple
(1 row)