DB Indexes Do Not Magically Compose
Disclaimer: as with every performance discussion, nothing beats measuring.
Disclaimer 2: Metaphors are great at hiding important distinctions between two situations through massive context flattening.
Having said that, metaphor time.
You have just graduated from the Very Fancy School With An Important Alumni Network. As a graduation gift, you are given two school alumni contact books:
-
One is a list of alumni, sorted by the year of graduation, then sorted by their name.
-
Another is a list of graduates, sorted by their current place of residence, then sorted by their name.
Later on, you try and answer some questions.
How many alumni are in New York City? Easy question to answer with one of the contact books!
How many alumni graduated in 1986? Again, easy question to answer!
Are there any graduates by the name of Paula Plum in New York City? Easy question to answer!
You are then asked how many people who graduated in the 90s who live in New York City. This is a bit trickier. You can cross reference the list of NYC graduates with 10 lists of graduates from the 90s. At least they're all sorted by name so you can "optimize" a bit, and just look at the 11 lists.
Now, I ask you whether, in the history of the school, anybody named Paula Plum graduated from the school. Maybe you pick up the "place of residence" alumni book and hope they don't live in Yorktown. Or you pick up the "year of graduation" alumni book and hope that Paula Plum was a popular name in 1956.
Your database indexes are just data organized in a certain order. If you have a query that only partly aligns with a part of your index, then your database might not have any choice but to look at "everything" to answer your questions.
For example, if your alumni
table has indexes on:
(graduation_year, name)
(place_of_residence, name)
You don't magically get an index on name
from this! Questions that only look at names can still end up pretty slow, unless you also end up filtering somehow by graduation_year
and place_of_residence
.
Similarly, if you just have indexes on name
, graduation_year
and place_of_residence
, if you query on two of those properties, a question like "everyone whose name starts with C living in California" will likely end up as "get everyone whose name starts with C, then check one by one". Or maybe you'll use both indexes, but it will involve pulling a lot of data from both categories.
Of course, on small datasets, or queries where one property is "sufficiently distinct" (if you only have 4 people named Paula Plum, your DB will probably pick that index quickly), you're all golden.
But in a multi-tenant system, where almost every single query has ... AND owner_id = ?
in it, then any index that doesn't include owner_id
is likely to not be carrying its own weight.
For example, imagine a document
table, with title
and owner_id
columns and individual indexes on those columns.
A query like select * from document where title = 'foo' and owner_id = 32
can end up being resolved in a couple ways:
-
Look up documents where
title = 'foo'
across your entire system via an index. Even though you only want one tenant's documents, you'll look for all the documents, then filter down to only that tenant's document by looking inside the document data itself (way costlier than just traversing an index). -
Look up all of the documents where
owner_id = 32
via an index. Even though you only want the documents wheretitle = 'foo'
, you're going to get all of this tenant's documents, then check which ones have the title you care about.
In either case you're potentially looking at a massive performance decrease in cases where one field is often the same across many rows (title = 'New Document'
for example).
If instead you made sure your indexes have your tenancy key first (on (owner_id, title)
), then a filter via owner_id = 32 and title = 'foo'
is going to be mostly instant! You will have already organized your data in a way matching how you query it.
Your DB planner does not have magic inside of it. Help it out with the right kind of index from the outset.