Don't Get Tangled in Your Database Constraints

sql postgresql ruby elixir
Posted on: 2017-06-22

This post was originally published on the Big Nerd Ranch blog.

In my last post, I made a case that database constraints - unique indexes, foreign keys, and more advanced tools like exclusion constraints - fill a necessary role that application-level validations can't.

For example, when two users send simultaneous requests to claim the same username or rent the same property on a holiday weekend, application code alone can't prevent conflicting data from being inserted. Only the database can guarantee consistent data, and constraints are the simplest way to have it do so.

But if they're misused, constraints can cause a lot of frustration.

Some Guidelines For Constraints

For best results, keep these guidelines in mind:

1. Constrain For Certainty

If users should never be missing an email address, make the column NOT NULL to guarantee it. If users should never have duplicate email addresses, add a unique index to guarantee it. If product prices should never be negative, add a CHECK constraint to guarantee it. If reservations should never overlap, add an EXCLUDE constraint to guarantee it.

These guarantees greatly reduce the number of cases your code has to handle and the number of bugs you could create. And in cases like uniqueness or overlap, they protect you from race conditions that your application code can't.

2. Choose Ironclad Rules

Once in place, constraints are - by design - hard to bypass. If you set a column to NOT NULL and find you need to insert a NULL value, you'll have to change the schema first.

So try to distinguish "ironclad" rules from context-dependent ones. "Two users can't rent the same cabin simultaneously" is an ironclad rule. "We need 3 hours between check out and check in for cleaning" may not apply in special cases, or may change based on staffing.

If you think a rule won't always apply, keep it out of the database.

3. Consider the Consequences Of Being Wrong

What if you're unsure whether to add a particular constraint? There are some tradeoffs to consider.

On one hand, it's always possible to remove constraints, but it may be impossible to add them if you've allowed messy data in your system. That suggests you should err on the side of over-constraining your data.

On the other hand, while a missing constraint may force you to do cleanup work, an ill-considered one may prevent users from doing something reasonable, like omitting a "last name" because they actually don't have one. That suggests you should err on the side of under-constraining (and under-validating) your data.

In the end, you have to decide what's likely to cause the fewest, mildest problems for the specific data in your application.

Constraints and Your Code

OK, let's say you have some use cases for constraints. Depending on your language and tools, this may present some challenges in your codebase.

Suppose we run a concert hall, and we have an application for managing events. Every event must have a unique name, and no two events may overlap dates. We'll check both of these conditions with validations, and enforce them with database constraints.

Let's compare the challenges when using Ruby's Active Record and Elixir's Ecto.

Active Record Challenge: Handle Constraint Errors Gracefully

Using ActiveRecord (v5.0), it's straightforward to validate uniqueness: validates :name, uniqueness: true will do it. Checking that events don't overlap requires a custom validation, but it's not very hard (see my example code).

Constraint violations, on the other hand, cause exceptions, and having to rescue multiple exceptions in a method gets ugly fast.

For example, you wouldn't want to do this:

def create
  @event = Event.new(event_params)
  if @event.save
    redirect_to events_path
  else
    render :new
  end
rescue ActiveRecord::RecordNotUnique => e
  if e.is_a?(ActiveRecord::RecordNotUnique) && e.cause.message.match('unique constraint "index_events_on_name"')
    event.errors.add(:name, "has been taken")
    render :new
  else
    # not something we expected
    raise
  end
rescue ActiveRecord::StatementInvalid => e
  if e.cause.is_a?(PG::ExclusionViolation) && e.message.match("no_overlaps")
    event.errors.add(:base, "cannot overlap existing events")
    render :new
  else
    # not something we expected
    raise
  end
end

That's pretty ugly even for one controller action. Repeating all that in the update action would make it even worse.

The default path in Rails is "don't worry about those exceptions", and it's not unreasonable. After all, if you're validating uniqueness of name, the race-condition case where two users try get claim the same name nearly simultaneously should be rare. You could just return an HTTP 500 in that case and be done with it.

That's especially true if there's nothing the user could do to fix the error anyway, as Derek Prior has pointed out. For example, if your code encrypts users' passwords before saving to the database, there's no point validating the presence of encrypted_password or rescuing a NOT NULL exception if your code doesn't set the field. You don't need an error message for the user, you need an exception to alert you of buggy code.

But if you do decide to provide friendly user feedback for race-condition data conflicts, try to keep the controller code minimal and clear.

First, since a violated constraint can generate several different exceptions, we need a nice way to catch them all. Starr Horne blogged about an interesting technique: define a custom class for use in the rescue clause that knows which exceptions we care about.

For example, if we had an Event model, we could nest a ValidationRaceCondition class and override its === class method:

class Event::ValidationRaceCondition
  # returns true if this is something we should rescue
  def self.===(exception)
    return true if exception.is_a?(ActiveRecord::RecordNotUnique) && exception.cause.message.match('unique constraint "index_events_on_name"')
    return true if exception.cause.is_a?(PG::ExclusionViolation) && exception.message.match("no_overlaps")
    false
  end
end

There's some ugly digging around in there, but at least it's contained in one place.

We can then define Event#save_with_constraints, using Event::ValidationRaceCondition as a stand-in for "any of the errors we'd expect if one of our constraints were violated":

# like a normal save, but also returns false if a constraint failed
def save_with_constraints
  save
rescue Event::ValidationRaceCondition
  # re-run validations to set a user-friendly error mesage for whatever the
  # validation missed the first time but the constraints caught
  valid?
  false
end

The rescue clause will catch only the constraint-related exceptions that Event::ValidationRaceCondition describes. At that point, we can re-run our validations and this time, they'll see the conflicting data and set a helpful error message for the user.

With this all wrapped up in save_with_constraints, the controller code is as simple as usual:

def create
  @event = Event.new(event_params)
  if @event.save_with_constraints
    flash[:notice] = "Event created successfully"
    redirect_to events_path
  else
    flash[:error] = "There was a problem creating this event"
    render :new
  end
end

And there you have it! Our validations catch all input errors in one pass, our constraints ensure that we don't allow last-second conflicts, and our users get friendly error messages if constraints are violated, all with minimal fuss.

See my example Rails code if you want more details.

Ecto Challenge: Provide Good Feedback to Users

Update 2017-08-18 - Ecto 2.2 will allow validating for uniqueness.

Elixir's database library Ecto (v2.1) presents different challenges.

Unlike Active Record, Ecto makes constraints the easiest way to guard against conflicting data. For instance, it has built-in support for creating a unique index in its migrations. Calling Ecto.Changeset.unique_constraint(changeset, :name) marks the changeset, signaling that if this constraint is violated, we want to parse the database error into a friendly user-facing message. Ecto has similar functions to work with check constraints and exclusion constraints.

That's great!

However, Ecto doesn't provide a uniqueness validation; the documentation specifically say that Ecto validations "can be executed without a need to interact with the database", which leaves all checks for conflicting data to be done exclusively by constraints.

This is unfortunate, because even if an INSERT would violate 3 constraints, PostgreSQL will only display an error message for the first one it notices. In the worst case, a user might have to submit a form once, fix the validation errors, submit again, fix the first constraint error, and submit two more times to fix the remaining constraint errors!

So despite the fact that it runs counter to the documentation, I recommend that for the best user experience, you layer your Ecto validations and constraints. That is, let your validations check for intrinsic errors like "email can't be blank" and also for conflicting data errors like "username is taken".

That's because in the vast majority of cases, the conflicting data was inserted long before the current request, not milliseconds before. And catching conflicts with a validation lets you inform the user of all these conflicts at once.

To do that, you could use a custom Ecto validation like this:

def validate_no_conflicting_usernames(changeset = %Ecto.Changeset{changes: %{username: username}}) when not is_nil(username) do
  dups_query = from e in User, where: e.username == ^username
  # For updates, don't flag user record as a dup of itself
  id = get_field(changeset, :id)
  dups_query = if is_nil(id) do
    dups_query
  else
    from e in dups_query, where: e.id != ^id
  end

  exists_query = from q in dups_query, select: true, limit: 1
  case Repo.one(exists_query) do
    true -> add_error(
      changeset, :username, "has already been taken", [validation: :validate_no_conflicting_usernames]
    )
    nil  -> changeset
  end
end

# If changeset has no username or a nil username, it isn't a conflict
def validate_no_conflicting_usernames(changeset), do: changeset

With this validation in place, users will get faster feedback than with a uniqueness constraint alone.

See my example Phoenix code if you want more details.

Keep It Friendly, Keep It Clean

Whatever tools you're using, the rule suggested by Derek Prior is a good one: use constraints to prevent bad data and validations to provide user feedback. Let each do the job it's best at.

And as always, strive to keep your code DRY and clear.

You can compare the Rails and Phoenix example apps from this post if you want to explore further.