tl;dr: don't turn on all the checks for just a subset of files; check as many files as possible and add checks on a module-by-module basis.

This post is a bit of a response to some online complaining by Armin Ronacher about mypy. While I agree in the abstract that mypy's payoff feels less nice than TypeScript's, I spent 30 minutes or so looking at Sentry's codebase and felt that Armin's problems are not "just" mypy problems, but problems compounded by how Sentry is choosing to adopt mypy.

Adding typing to Python is a pain in the butt in many situations. The payoffs don't come nearly as quickly as with TypeScript, and things like Django are just really tricky to get working.

Having said that, there are very good payoffs once you get to very high coverage. On larger codebases, though, not all paths are created equal.

Unfortunately, Sentry's mypy.ini is, at the time of this writing, a good example of how not to do this.

The most important thing, by far, is to have mypy look at as many of your files as possible.

Imagine you have three Python modules, A, B, and C. A is your platonic ideal of a fully typed module. B has some work to do, and C is kind of cowboy land.

Many projects will say "OK, we will run mypy on A"

[mypy]
files = src/A/

disallow_any_generics=True
disallow_untyped_defs=True
disallow_untyped_calls=True
# 100 other amazing correctness and style options

This will make sure A stays clean, but you won't get any checking of the usage of A in B or C.
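
As a tiny sketch of what that misses (the module layout and names here are invented for illustration):

# src/A/prices.py -- fully typed and included in `files`
def total_cents(amounts: list[int]) -> int:
    return sum(amounts)

# src/B/report.py -- not in `files`, so mypy never even parses it
from A.prices import total_cents

def build_report(raw: str) -> str:
    # This passes a str where list[int] is expected; mypy would flag it,
    # but only if src/B/ were part of the checked files.
    return f"total: {total_cents(raw)} cents"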

So you might go to the next step and include B.

[mypy]
files = src/A/,
        src/B/

# all the important correctness options

But now you hit a bunch of errors, because B is only partially cleaned up!

This is the trap of the perfectionist using any linting-style tool like mypy. You want all the good options, so you turn them on. But then you can only cover a subset of the files.

But would you rather have 100% of the checks on 10% of the code, or 50% of the checks on 50% of the code? It's foolish to pretend there's an objective metric here, but I think the general idea is sound.

If you instead do something like:

[mypy]
files = src/A/,
        src/B/

disallow_any_generics=True
disallow_untyped_defs=True
disallow_untyped_calls=True
[mypy-B.*]
disallow_untyped_calls=False
disallow_untyped_defs=True

You will now at least get some type checking on some of the code.

But what you really want to do, to get src/C/ into the game, is:

[mypy]
files = src/

disallow_any_generics=True
disallow_untyped_defs=True
disallow_untyped_calls=True
[mypy-B.*]
disallow_untyped_calls=False
disallow_untyped_defs=True
[mypy-C.*]
disallow_untyped_calls=False
disallow_untyped_defs=False
disallow_any_generics=False

The important part here is that we may have to disable some extra checks to get C into mypy's set of checked files, even if only a fragment of the checks apply there.

This obviously looks very tedious. For projects with loads of modules, I would recommend writing a template that generates these sections. That keeps most of the tedium at the "configuring mypy per module" step, rather than the "spending a bunch of time handling Any false positives" step.
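
Here is a rough sketch of what such a generator could look like; the module names and tiers are hypothetical, and the output is meant to be written below the [mypy] section of mypy.ini:

# generate_mypy_overrides.py -- rough sketch; module names and tiers are made up
PARTIAL = ["B"]   # typed defs enforced, but untyped calls still tolerated
LEGACY = ["C"]    # loosest tier: only the baseline checks

def overrides() -> str:
    sections = []
    for mod in PARTIAL:
        sections.append(
            f"[mypy-{mod}.*]\n"
            "disallow_untyped_calls = False\n"
        )
    for mod in LEGACY:
        sections.append(
            f"[mypy-{mod}.*]\n"
            "disallow_untyped_calls = False\n"
            "disallow_untyped_defs = False\n"
            "disallow_any_generics = False\n"
        )
    return "\n".join(sections)

if __name__ == "__main__":
    print(overrides())  # paste (or write) this below the [mypy] section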

disallow_untyped_defs feels very wrong to set to False, but we shouldn't forget that mypy still checks the function bodies it can! You don't just want well-typed function definitions, you want their usages checked everywhere else, even if that checking is incomplete.
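
To make that concrete (the module and functions are invented): even in the loosest [mypy-C.*] tier above, anything in C that is annotated still gets real checking.

# src/C/billing.py -- in the loosest tier, but still in `files`
def describe(cents: int) -> str:
    # This body is checked because the function is annotated, even though
    # disallow_untyped_defs is False for C.*; mypy flags the str + float here.
    return "$" + cents / 100

def legacy_helper(x):
    # Unannotated, so mypy skips this body by default; turning on
    # check_untyped_defs would bring bodies like this into the fold too.
    return x + 1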

A point about Any: the unfortunate reality is that right now mypy will often end up making many things into Any when code is incompletely transitioned into type-conformance, and that cascades into a lot of places. Because of this, disallowing Any usage conflicts with things like allowing missing imports. Again, the point is that you want code that can be checked to be checked. Being aggressive about Any usage as a prerequisite to pulling code into mypy's checks will lead to a lot of friction.
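
A hedged illustration of the cascade (legacy_lib is a made-up dependency with no stubs):

# src/B/settings.py -- with ignore_missing_imports = True, everything
# exported by the (hypothetical) stub-less legacy_lib is typed as Any
import legacy_lib

def get_timeout() -> int:
    cfg = legacy_lib.load_config()   # cfg is Any
    return cfg.timeot_secs           # the typo goes unnoticed, and returning
                                     # Any satisfies `-> int` without complaint

An aggressive setting like disallow_any_expr would turn both of those lines into errors, which is exactly the friction described above.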

The disallow_any_* options will directly conflict with the objective of getting broad mypy coverage. Disabling them still gets you some checks, though in some cases more on the level of pylint than TypeScript.

There is a weird side effect of adopting mypy this way: sometimes you make a very minor change, like improving or adding a function signature, and suddenly you uncover 10 little bugs. As long as the team is willing to let people fix what's reasonable, and punt on the hard problems so things don't slow down too much, you can end up slowly but surely with a nicely typed codebase. And, of course, all of your new code can be fresh and clean.
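
A hedged example of that effect (names invented): annotate one function, and mypy immediately lights up its call sites.

from dataclasses import dataclass
from typing import Optional

@dataclass
class User:
    id: int
    email: str

# Before: def find_user(user_id): ...
def find_user(user_id: int) -> Optional[User]:
    # hypothetical lookup; returns None when the user is missing
    return None

# Existing call sites now produce errors along the lines of:
#   find_user("42")       -> incompatible argument type "str", expected "int"
#   find_user(7).email    -> Item "None" of "Optional[User]" has no attribute "email"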

Some other random tips:

  • assert isinstance(thing, SomeClass) is a tried-and-true way of type narrowing. If you really care about assert-stripping (running under python -O), do if not isinstance(thing, SomeClass): raise ValueError(...) instead. Union problems can go away real easily this way (see the first sketch after this list).

  • TypedDict is not that fun ergonomically. It does what it says it will do, but you really have to read the PEP to avoid losing most of the value. Sometimes just opting for dataclasses (or tuples!) is much nicer (see the second sketch after this list). More recent Python versions make tuples nicer to write out.

  • It's important to consider how much you care about types in your test code. I know I'm saying to cover a huge chunk of code, but, well... test code gets run and does its thing anyway. It might be worth having less strict settings there.
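
A small sketch of the narrowing pattern from the first bullet (the shape classes are invented):

from typing import Union

class Circle:
    def __init__(self, radius: float) -> None:
        self.radius = radius

class Square:
    def __init__(self, side: float) -> None:
        self.side = side

def area(shape: Union[Circle, Square]) -> float:
    if not isinstance(shape, Circle):
        raise ValueError(f"expected a Circle, got {type(shape).__name__}")
    # mypy has narrowed `shape` to Circle here, so .radius is fine and the
    # "Square has no attribute radius" union error disappears.
    return 3.14159 * shape.radius ** 2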
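
And for the second bullet, a quick contrast between TypedDict and a dataclass (again, invented types):

from dataclasses import dataclass
from typing import TypedDict

class PointDict(TypedDict):
    x: float
    y: float

@dataclass
class Point:
    x: float
    y: float

p: PointDict = {"x": 1.0, "y": 2.0}  # keys and value types are checked, but it is
                                     # still a plain dict: no methods, no defaults,
                                     # and optional keys mean total=False plus .get()
q = Point(1.0, 2.0)                  # attribute access, defaults, __eq__, repr for free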