Feature Flags In Depth

Feature flags are a way for us to enable or disable code paths without needing to re-deploy software.

In a world without feature flags, you might develop a new feature on a branch, and when you merge it into main, the feature goes out with the next deploy. The feature's launch is tied to the release (so you have to time your deploy around some marketing strategy), and on top of this the release might go out and immediately cause problems. You then have to roll back (also rolling back any bugfixes that were included), and now there are a bunch of headaches.

Feature flags let us resolve this. They give us a way to enable or disable a feature for users very quickly, without needing a redeploy. This usually involves just adding a row in a database. We can release a feature for a subset of users, release it for everyone, or unrelease a feature if there is an issue, all in seconds instead of minutes.

What does this look like for your codebase in practice? Generally, there is just a single function to determine a feature flag's state.

def render_homepage(request): 
  if feature_is_active(request.user, 'new_homepage'):
    return render(request, 'new_homepage.html')
  return render(request, 'homepage.html')

Inside a request, we check if a certain feature flag is enabled for a user. If it is, we use a different code path.

If you are running a web application with a meaningful number of enterprise users, there is no reason not to have a robust feature flag system.

Straightforward enough! But there are many design decisions involved when building out this kind of system. Here is a bit of a walk-through of design decisions and considerations when using feature flags.

Feature Flag Axes

The most basic version of a feature flag is just a system-wide toggle. But this is less than ideal. Really you want a feature flag system that lets you target subsets of users (mostly for testing purposes).

Generally feature flags should be able to be keyed off of users. That is to say, you should be able to fix a feature to be on or off for a specific user with your feature flag.

But the meaning of “user” in the previous paragraph is somewhat feature dependent. If you have a notion of teams in your system, then you really want a way to have feature flags for teams. That way you can enable a feature for a team, and now all users in it will get access to this feature.

In my experience, feature flags for teams are more important than feature flags for users. It’s feature-dependent, of course (a feature flag around a new user profile page might want to be keyed on users, whereas a feature around core functionality likely needs to be keyed on teams), but you want to be able to turn on features not just for one user, but for all of their collaborators as well.

Sometimes we want to apply a flag to a subset of users/teams. You can do this in a simple way if all you care about is a percentage of users. Hash some user-based ID to a number between 0 and 100, and anyone whose hash falls below some percentage threshold gets the feature.

If you want to do percentage-based rollouts, some important things to keep in mind:

Your hash must be stable across deploys. You really shouldn’t have users popping into and out of your A group or B group.
You really want a way to change that percentage as easily as a feature flag, for example through an administration page in the app. Changing it should not require a deploy.
You should have an easy override. One way you can do this:
- You have a percentage-based feature flag table
- Your team-based feature flag table can say “enable this flag” or “disable this flag”. This allows overriding in both directions if a user contacts you with a request

These should give you the tools to do percentage-based rollouts and not wreak havoc on your system.
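
As a rough illustration of the points above, here is a minimal sketch of a stable percentage check with an override. The function names, the choice of SHA-256, and the idea of mixing the flag name into the hash (so that different flags pick different subsets of users) are assumptions made for this example, not a prescribed design.

```python
import hashlib


def rollout_bucket(flag_name, subject_id):
    """Map a (flag, user-or-team ID) pair to a stable bucket from 0 to 99."""
    # hashlib is used rather than the built-in hash(), which is salted per
    # process for strings and therefore not stable across deploys.
    key = f"{flag_name}:{subject_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(flag_name, subject_id, percentage, override=None):
    """Return True if the subject should get the feature."""
    # An explicit override (True or False) wins in either direction.
    if override is not None:
        return override
    # Buckets 0..percentage-1 get the feature, so 0 means nobody, 100 everybody.
    return rollout_bucket(flag_name, subject_id) < percentage
```

Because the bucket depends only on the flag name and the ID, raising the percentage only ever adds users to the enabled group; nobody who already has the feature loses it.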

Feature Flags As A Service

There are feature flag services. Those services let you outsource a lot of design decisions around feature flags, but it’s a pretty high price to pay (both in money and in extra API calls). Feature flags are not complicated enough to warrant outsourcing them!

Instead, make sure to record feature flag state in your observability and debugging tools. If a page has a feature flag for a new design, every request should have the feature flag for this new design as part of its logged data. If you do this, then not only will you know what happens in specific cases, but you’re a handful of clicks away from getting helpful (if approximate) usage data.
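
A minimal sketch of what that could look like with Python's standard logging module. The logger name, the field names, and the `log_request` helper are all hypothetical; a real setup would more likely hook into whatever request middleware or tracing library you already use.

```python
import json
import logging

logger = logging.getLogger("app.requests")


def request_log_entry(path, user_id, active_flags):
    """Build one structured log entry that includes feature flag state."""
    return {
        "path": path,
        "user_id": user_id,
        # Sorted so identical flag sets always serialize identically.
        "feature_flags": sorted(active_flags),
    }


def log_request(path, user_id, active_flags):
    """Log the entry as a single JSON line and return it."""
    entry = request_log_entry(path, user_id, active_flags)
    logger.info(json.dumps(entry, sort_keys=True))
    return entry
```

With entries shaped like this, filtering logs by `feature_flags` gives a rough view of who is hitting which variant.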

If you want all the metrics that those tools offer, it’s time to get some real observability tools (that will incidentally help you with debugging and performance as well).

I don’t want to dismiss those services too much, but the actual implementation work for feature flags is not hard, if you know where you’re going. Feature flag libraries are all over the place, so pick one you like. Homegrow it if your team doesn’t have too many architecture astronauts.

Feature Flag Storage Schema

There are many feature flag libraries. You can google your stack + "feature flag" and pick one. Otherwise, storing information in your relational database of choice works.

One way to store feature flag information (mostly cribbed from django-waffle):

One table for “feature switches”. Each row has the following:
- Name (the ID for the flag). For example new_profile_page, which you would then check for later.
- Enabled state (On/Off/Undefined). If this is On it is on for everyone. If it’s Off it’s off for everyone. Otherwise the enabled state is determined by the per-user, per-team, or percentage-based settings below.
- Percentage-based enable state. Either empty or a number between 0 and 100. Describes a percent of the population that should have access to this feature.
On your user model, a column for feature flags, as a text column. Just a comma-separated list of feature flag names.
If you have a team model, you could stick the same feature flag data there as well.

In this model, user-based feature flags are “almost free” (some extra storage on the user table, but you can query that along with other user information on each request in the same query). The feature switch table will likely be so small that you will have near-instant querying on that front as well (and could have an application-side cache if you wish, though one needs to always be wary of caching).
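
To make the schema concrete, here is a sketch using SQLite through Python's built-in sqlite3 module. The table and column names (`feature_switch`, `app_user`, `feature_flags`) are made up for this example and are not django-waffle's actual schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE feature_switch (
    name TEXT PRIMARY KEY,
    enabled INTEGER,  -- 1 = On, 0 = Off, NULL = Undefined
    percentage INTEGER CHECK (percentage BETWEEN 0 AND 100)  -- NULL = not set
);
CREATE TABLE app_user (
    id INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE,
    feature_flags TEXT NOT NULL DEFAULT ''  -- comma-separated flag names
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO feature_switch (name, enabled, percentage) VALUES (?, ?, ?)",
    ("new_profile_page", None, 25),
)
conn.execute(
    "INSERT INTO app_user (username, feature_flags) VALUES (?, ?)",
    ("alice", "new_profile_page,awesome_feature"),
)


def user_flags(conn, username):
    """Return the set of flag names stored on a user's row."""
    row = conn.execute(
        "SELECT feature_flags FROM app_user WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return set()
    # filter(None, ...) drops the empty string produced by an empty column.
    return set(filter(None, row[0].split(",")))
```

In a real application the flag column would come back with the user row you are already loading for the request, which is what makes the per-user check nearly free.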

Feature flag checks thus look like:

If a flag is in the “feature switch” table, check that there. If there’s an enabled state of On or Off, that’s the result.
Check the user’s specifically enabled feature flags. Respect that if we don’t have an explicit enabled state in the feature switch table. (Repeat on the team table as needed)
Finally, if none of that is set up, rely on the percentage-based enable state
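
The order of checks above could be sketched as a single function. This is only an illustration: the `Switch` dataclass, the `flag_is_active` name, and the in-memory dictionary standing in for the “feature switch” table are assumptions, and the percentage step uses a SHA-256 based bucket as one possible stable hash.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Switch:
    name: str
    enabled: Optional[bool] = None    # True = On, False = Off, None = Undefined
    percentage: Optional[int] = None  # 0..100, or None if not set


def flag_is_active(name, switches, user_id, user_flags, team_flags=frozenset()):
    """Resolve a flag: global state first, then user/team, then percentage."""
    switch = switches.get(name)
    # 1. An explicit On/Off in the switch table is the result.
    if switch is not None and switch.enabled is not None:
        return switch.enabled
    # 2. Flags listed on the user (or on their team) enable the feature.
    if name in user_flags or name in team_flags:
        return True
    # 3. Otherwise fall back to the percentage-based state, if any.
    if switch is not None and switch.percentage is not None:
        digest = hashlib.sha256(f"{name}:{user_id}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % 100 < switch.percentage
    return False
```

Unknown flags resolve to off here, which seems like the safer default for code that ships before its switch row exists.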

Feature Flag Ergonomics

Make sure your feature flag names are easy to search for. Prefer something distinctive (e.g. new_profile_page_2023 rather than profile_page).

On top of this, it’s good to make feature flags easy to access. Though you’ll likely have a function like feature_flag_enabled(request.user, 'new_profile_page'), it’s nice to have a helper object that would also let you have a list of feature flags being used.

class FeatureFlags:
  def __init__(self, user):
    self.user = user
  # Each class attribute declares one flag, so the class doubles as a list of flags in use
  new_profile_page = FeatureFlag("Profile Page Updates")
  awesome_feature = FeatureFlag("Awesome New Feature")

(In Python you can do something like the above; using descriptors to make things work is an exercise for the reader.)

With something like the above, you can end up with if request.user.flags.new_profile_page. Autocomplete, typo-resistant, the works.
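
For the curious, here is one way the descriptor approach might look. It is a sketch under assumptions: `feature_flag_enabled` is replaced by a stand-in that reads a hypothetical `enabled_flags` set on the user, and `all_flags` is an invented helper for listing the declared flags.

```python
from types import SimpleNamespace


def feature_flag_enabled(user, flag_name):
    # Stand-in for the real lookup; assumes the user carries a set of names.
    return flag_name in getattr(user, "enabled_flags", ())


class FeatureFlag:
    """Descriptor that resolves a flag for the user held by the owner object."""

    def __init__(self, description):
        self.description = description
        self.name = None

    def __set_name__(self, owner, name):
        # Called at class creation; the attribute name becomes the flag name.
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # class-level access returns the descriptor itself
        return feature_flag_enabled(instance.user, self.name)


class FeatureFlags:
    new_profile_page = FeatureFlag("Profile Page Updates")
    awesome_feature = FeatureFlag("Awesome New Feature")

    def __init__(self, user):
        self.user = user

    @classmethod
    def all_flags(cls):
        """Map every declared flag name to its description."""
        return {
            name: attr.description
            for name, attr in vars(cls).items()
            if isinstance(attr, FeatureFlag)
        }


user = SimpleNamespace(enabled_flags={"new_profile_page"})
flags = FeatureFlags(user)
```

Here `flags.new_profile_page` evaluates to True and `flags.awesome_feature` to False, while `FeatureFlags.all_flags()` gives you the list of flags in use without touching the database.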

Feature Flags On The Frontend

If you have a helper class like the above, the simplest way forward is codegen. Have a script that spits out TypeScript definitions for an object, and dump feature flag data into the frontend on a page load. This relies on the idea that feature flags are both not that high in cardinality, and are often read (so you probably “always want them”).

With something like that you should be able to have something like a flags global in your JavaScript, and you can do if (flags.new_profile_page) { ... }.
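
A codegen script along those lines can be quite small. The sketch below is written in Python and assumes a plain list of flag names as input; the function names and the exact shape of the generated declaration and script tag are illustrative choices, not a fixed convention.

```python
import json


def typescript_declaration(flag_names):
    """Generate an ambient .d.ts declaration for a global flags object."""
    fields = "\n".join(
        f"  readonly {name}: boolean;" for name in sorted(flag_names)
    )
    return f"interface Flags {{\n{fields}\n}}\n\ndeclare const flags: Flags;\n"


def bootstrap_script(flag_values):
    """Render the script tag that injects flag state on page load."""
    payload = json.dumps(flag_values, sort_keys=True)
    # Escape "<" so a value can never close the script tag early.
    payload = payload.replace("<", "\\u003c")
    return f"<script>window.flags = {payload};</script>"


if __name__ == "__main__":
    print(typescript_declaration(["new_profile_page", "awesome_feature"]))
    print(bootstrap_script({"new_profile_page": True, "awesome_feature": False}))
```

Running the declaration generator as part of the build means a removed flag becomes a type error in the frontend rather than a silent undefined.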

Injecting feature flags in this way makes them quite visible to anyone snooping around, but it doesn’t introduce a new request. On top of that, in a Single Page Application-like environment you’ll get some caching on those values.

Administering Feature Flags

You really want an administration UI to manage feature flags. You could do everything via the command line, but if customer success staff can self-manage feature flags, you suddenly unlock a valuable tool for getting customers to be happy. Especially when it comes to getting early feedback on a feature designed around some key users’ issues.

Feature Flag Cardinality, Or How To Avoid A Billion Tests

Feature flags are great in helping to quickly merge in code without having long-running feature branches. Other people will be exposed to changes more quickly, and will see the code when operating on other things.

But perhaps you are working on a feature that does take time. Your code is getting merged into main but is effectively hidden from the app by default for months and months.

In production your flag is turned off for most people. So your test suite, serving its goal as a check against issues in production, should test code while the flag is disabled.

During development, one still wants to test functionality though! If you have written some code and are getting it reviewed, a lack of tests is already a big red flag. So you clearly want to have tests with the flag enabled!

How can we have the cake and eat it too? There’s the heavy handed approach: run the test suite with the flag off and run it with the flag on. This doubles your test costs, but is going to guarantee that your tests run in both conditions.

But if you have any long-running flags, then you don’t have 2 configurations, but 2^n configurations, for the n flags hiding in-development features. If we add too much pressure to using feature flags, developers will simply opt for less-safe ways of moving forward.
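
To get a feel for how quickly this grows, here is a small sketch that enumerates every on/off combination for a list of flags. The flag names are just examples and `flag_configurations` is a hypothetical helper.

```python
from itertools import product


def flag_configurations(flag_names):
    """Yield every on/off assignment for the given flags (2^n dictionaries)."""
    for values in product([False, True], repeat=len(flag_names)):
        yield dict(zip(flag_names, values))


configs = list(
    flag_configurations(["new_homepage", "new_profile_page", "awesome_feature"])
)
print(len(configs))  # 8 full test-suite runs for just three flags
```

Ten long-running flags would already mean 1024 runs of the suite, which is why exhaustive coverage is not a realistic goal.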

Trying to navigate testing is hard, because you often end up having to make judgement calls about what is worth testing and what isn’t. Here are some ideas that can make this easier.

Only Use Feature Flags To Control Displaying

Since feature flags are often paired with user-visible changes, you will want to insert flags to control whether some new feature is displayed.

Many people will add that code to control a template, but then also add access control to make sure that APIs are inaccessible to users with the flag off, or to limit batch processing changes to users with the flag on.

If you can get away with it, do not add functionality-related feature flag checks. Instead of checking whether a user has a flag before accessing an API, simply assume that a user accessing the API, using a new field, or otherwise indicating knowledge of the feature has the right to use it.

This seems very cowboy, but here are some important advantages:

During development people might toggle a feature on and off repeatedly. Writing code this way will reduce the number of code paths that blow up because of “User had feature but no longer has it” related issues.
If your feature flags are really just display-related, you will also be able to limit feature flag toggle issues in tests. Your tests should be able to cover the gamut of behavior for the most part, and only a handful of UI-based tests are going to be trickier.

Make Feature Changes Into Additions

One might argue that if you are introducing a behavioral change, then no matter what at a conceptual level you will have a “feature flag”-like thing going on somewhere. Somewhere there will be a branch, and a risk of testing going wrong.

The best way to avoid that sort of issue is to design changes as additions rather than changes. Rely on existing, well trodden paths. Generalize on existing code paths. One would correctly identify that making specific code more general is risky! The original design probably relied on the fact that it wasn’t being used in specific scenarios, and now it would be. But you’ll actually be testing your changes properly, and can roll them out incrementally.

One example: A project had access control features that were linked to user groups, for example “this client’s information is only available to members of this user group”. We wanted to allow people to also specify specific users as ‘owners’ of the client information.

Instead of adding some functionality that would query multiple tables (the user group tables and the user tables), we instead went with a feature change where we would have single-member groups automatically created for a user.

Almost all of our code remained unchanged. We added a bit of data to tell us that a specific user group is one of those “individuals”, added creation of these groups to the user signup. And during testing of this feature, our feature flag was limited to controlling whether the UI would include these groups in API responses.

This did introduce new risks of bugs (things like the group name potentially being desynced from the user name), but the regularity meant that we could rely on existing code for almost all of the backend logic. We could do data backfills without revealing the change, but people could test it and not have knock-on effects elsewhere.

Make Feature Flag Removals A Thing

It’s important to insist on getting feature flags removed when possible. The most extreme version of this is “time bomb” code: feature flags have an associated timestamp, and CI blows up once you’re past the timestamp. If you want to be a bit less global about this, you can attach a username to the flag, so that only that person’s CI builds fail.

At that point the author of the feature flag has two options:

Do the work to remove the feature flag
Bump the timestamp

The second one requires some justifications and, in a word, shame. So often that can just lead to a quick feature flag cleanup (that hopefully doesn’t lead to nasty surprises).
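
A time bomb check does not need much machinery. The following sketch assumes each flag is registered with an expiry date and, optionally, an owner; `FlagExpiry`, `expired_flags`, and the `build_author` parameter are names invented for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class FlagExpiry:
    name: str
    expires_on: date
    owner: Optional[str] = None  # None means the bomb goes off for everyone


def expired_flags(flags, today, build_author=None):
    """Return the names of flags whose time bomb applies to this CI build."""
    failures = []
    for flag in flags:
        if today <= flag.expires_on:
            continue  # not past the timestamp yet
        # Ownerless flags fail every build; owned flags only fail the owner's.
        if flag.owner is None or flag.owner == build_author:
            failures.append(flag.name)
    return failures
```

A CI step could call `expired_flags(registry, date.today(), build_author=...)` and fail the build if the result is non-empty; bumping the timestamp is then a visible, reviewable one-line change.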

Can We Make Feature Flags Too Easy To Use?

Feature flags being completely frictionless help with the following:

They make it easier to get code merged into main. Separating “merging good code” from “releasing a feature” means that we don’t have long-lived branches that aren’t getting tested and suffering from bitrot
They make it easier to get features in front of internal staff. Feature flags let us share ongoing work without requiring staff to set up a local environment, or having a bunch of “staging only flags”
They give us options for shipping features early to key users and getting feedback.

But feature flags can definitely be a crutch. The biggest issue is that even if we limit feature flags to visual aspects, local environments can quickly drift from production. A feature gets enabled in production, and the flag is enabled, but we forget to send up the flag removal for a couple months… this ends with most developers actually working with some old version of the app relative to production!

This is mostly a diligence thing. Celebrating turning features on (and also removing the feature flags) can help with this. Sometimes there need to be decisions to outright remove things that have been behind flags for years. But ultimately feature flags are a key tool for improving iteration cycles, removing more blockers from hitting the “merge” button.