I recently built a full-text recipe search feature using Ecto and PostgreSQL for Mealthy.com. You can try it out there, or check out this quick demo video.
I thought this was interesting enough to write up (with Mealthy's permission).
This search feature replaced a simpler one, and needed to:
- Support substring matches. For example, "bu" should find recipes like "Vegan Butternut Squash Soup". This was because search could be used globally, or to filter a user's saved recipes as they typed a search term. It would be confusing if typing "bu" filtered out everything, but continuing to type "butternut" made some recipes reappear.
- Understand plurals. For example, searches for "strawberries" and "strawberry" should return the same results.
- Ignore accents. For example, a search for "sauté" vs "saute" should return the same results.
- Search tags. For example, searches for "gluten-free" should return recipes tagged as gluten free, whether or not the recipe title includes those words.
- Support ranking. For example, when searching for "strawberry", a recipe with "strawberry" in the title should be shown before a recipe that's merely tagged "strawberries" because that's one of its ingredients.
- Honor user preferences. For example, a vegetarian user searching for "burger" should have meaty results filtered out, leaving only matches like "Vegan Black Bean Burgers".
- Perform searches quickly. Everybody likes speed, but it was especially important here for the "search after every keystroke" use case. A performance hit when adding or updating recipes was OK, as this was done infrequently, and always by staff.
Substring matches were easy: a simple ILIKE
query on recipe titles would do.
Accent-agnostic, pluralization-agnostic, multi-table search (recipes
+ tags
) was slightly harder, and we discussed using something like Solr. But we were already using PostgreSQL as our primary data store, and its full-text search capabilities were enough for this task, which kept us from having to add another dependency.
If you don't already understand functions like to_tsvector
, you should really read Postgres full-text search is Good Enough!. It's a fantastic post, both as a tutorial and a reference. For this post, I'll assume you know the relevant PostgreSQL functions.
Honoring user preferences meant, in our context, performing the full-text search as a chainable Ecto query, so that other query conditions like "must be vegetarian" could also be applied. In other words, the full-text search function takes a %Ecto.Query{}
and returns a modified one.
After getting search working, I noticed that it was rather slow, so to speed it up I added a materialized view and triggers to update that view as necessary. There may be a better way; I'll cover that at the end.
Elixir Code
First off, let's take a look at the Elixir code behind recipe search. Searching for "thai chicken" on the site ends up calling RecipeSearch.run(query, "thai chicken")
, where query
is a base %Ecto.Query{}
that needs narrowing to match the term. I'll comment on the code below.
defmodule RecipeSearch do
import Ecto.Query
def run(query, search_string) do
_run(query, normalize(search_string))
end
defmacro matching_recipe_ids_and_ranks(search_string) do
quote do
fragment(
"""
SELECT recipe_search.id AS id,
ts_rank(
recipe_search.document, plainto_tsquery(unaccent(?))
) AS rank
FROM recipe_search
WHERE recipe_search.document @@ plainto_tsquery(unaccent(?))
OR recipe_search.title ILIKE ?
""",
^unquote(search_string),
^unquote(search_string),
^"%#{unquote(search_string)}%"
)
end
end
defp _run(query, ""), do: query
defp _run(query, search_string) do
from recipe in query,
join: id_and_rank in matching_recipe_ids_and_ranks(search_string),
on: id_and_rank.id == recipe.id,
order_by: [desc: id_and_rank.rank]
end
defp normalize(search_string) do
search_string
|> String.downcase
|> String.replace("gluten free", "gluten-free")
|> String.replace("dairy free", "dairy-free")
|> String.replace("sugar free", "sugar-free")
|> String.replace("meatless", "no-meat")
|> String.replace("vegetarian", "no-meat")
|> String.replace(~r/\n/, " ")
|> String.replace(~r/\t/, " ")
|> String.replace(~r/\s{2,}/, " ")
|> String.trim
end
end
The run/2
function normalizes the search string (which involves typical things like discarding extra white space, as well as domain-specific things like treating "gluten free" as "gluten-free"), then joins the subquery.
The macro, matching_recipe_ids_and_ranks/1
, builds a subquery to search the recipe_search
view for matching recipes and return their recipe ids and ranks. It takes care of unaccenting the term and looking for a full-text match against the view's document
column, as well as running an alternative ILIKE
search on the title for substring matches. It builds the query with plainto_tsquery
, which means that special characters like @
are ignored and a string like "thai chicken
becomes thai & chicken
- that is, all words must match.
Initially, I had the subquery return only the ids of matching recipes and did something like WHERE id IN subquery
, but this didn't let me make use of the subquery's rank. Joining the subquery allowed me to ORDER BY
the subquery's rank, so that "better" matches for a term could appear first.
The Database View
The subquery selects FROM recipe_search
. This is a view that pulls together information from recipes
and tags
. As I'm about to show, it's a materialized view because that proved to be faster, but for all this module knows, it could be a regular view which simply encapsulates building the document
we're searching.
I created recipe_search
with migration code like this:
execute(
"""
CREATE EXTENSION IF NOT EXISTS unaccent
"""
)
execute(
"""
CREATE MATERIALIZED VIEW recipe_search AS
SELECT
recipes.id AS id,
recipes.title AS title,
(
setweight(to_tsvector(unaccent(recipes.title)), 'A') ||
setweight(to_tsvector(unaccent(coalesce(string_agg(tags.value_slug, ' '), ' '))), 'B')
) AS document
FROM recipes
LEFT JOIN recipes_tags ON recipes_tags.recipe_id = recipes.id
LEFT JOIN tags ON tags.id = recipes_tags.tag_id
GROUP BY recipes.id
"""
)
# to support full-text searches
create index("recipe_search", ["document"], using: :gin)
# to support substring title matches with ILIKE
execute("CREATE INDEX recipe_search_title_trgm_index ON recipe_search USING gin (title gin_trgm_ops)")
# to support updating CONCURRENTLY
create unique_index("recipe_search", [:id])
Essentially, this says:
-
Create a materialized view with one row for each row in
recipes
- For each row, join all the tags for that recipe into one big string
-
Squish the title and tags together into one
document
, weighting title as more important than tags for search purposes - Add some indexes to speed up searches
Using a materialized view means doing work in advance that would otherwise have to be done at query time, and in my testing, it sped up searches from about 60ms to 6ms; I would expect a larger difference as the number of recipes increases.
However, materialized views have a downside: they must be explicitly refreshed, or else searches will return outdated results.
Refreshing the Materialized View
We debated refreshing recipe_search
on a timer, but that seemed wrong in a couple of ways.
First, because there will often be no reason to refresh; nothing has changed. That's wasted effort.
Second, because when something does change, we'd like the refresh to happen immediately, not whenever the timer goes off again.
After some thought, I set this up using triggers. These tell the database that whenever recipes
, or tags
, or recipes_tags
changes, recipes_search
needs to be refreshed.
Here's the Ecto migration code:
execute(
"""
CREATE OR REPLACE FUNCTION refresh_recipe_search()
RETURNS TRIGGER LANGUAGE plpgsql
AS $$
BEGIN
REFRESH MATERIALIZED VIEW CONCURRENTLY recipe_search;
RETURN NULL;
END $$;
"""
)
execute(
"""
CREATE TRIGGER refresh_recipe_search
AFTER INSERT OR UPDATE OR DELETE OR TRUNCATE
ON recipes
FOR EACH STATEMENT
EXECUTE PROCEDURE refresh_recipe_search();
"""
)
execute(
"""
CREATE TRIGGER refresh_recipe_search
AFTER INSERT OR UPDATE OR DELETE OR TRUNCATE
ON recipes_tags
FOR EACH STATEMENT
EXECUTE PROCEDURE refresh_recipe_search();
"""
)
execute(
"""
CREATE TRIGGER refresh_recipe_search
AFTER INSERT OR UPDATE OR DELETE OR TRUNCATE
ON tags
FOR EACH STATEMENT
EXECUTE PROCEDURE refresh_recipe_search();
"""
)
A Possible Improvement
The code here seems to work quite well. The remaining downside is that recalculating the entire materialized view when only one row needs to change is a bit wasteful.
This doesn't seem likely to become a problem anytime soon. However, Rob Sullivan (by way of John Mosesman) suggested a way to fix it if it does: instead of a materialized view, make recipe_search
be a regular table. And instead of rebuilding the whole thing when recipes
or tags
change, just update the recipe_search
rows for specific recipes that need it.
I like this idea and will keep it in mind for the next time I build a feature like this.
Update 2020-07-08
I just found a post which references this one and makes some improvements on my approach.