Automate code formatting in Python
Right off the bat, let's clarify an important distinction. Writing code that works and writing good code are two very different things. The former is a skill while the latter is an art form, and this difference distinguishes great programmers from the crowd.
When we talk of good code, the word 'good' is vague by design. That's because there are no rules set in stone about what makes code good or bad. All we have are some abstract guidelines such as readability:
Programs must be written for people to read, and only incidentally for machines to execute.
-- Abelson & Sussman
Those are two MIT professors with pretty solid credentials. Identifying the quality of code is an intuition that is honed over time, through practice and experience. Code reviews go a long way towards this goal.
Code review is the process where developers more experienced than yourself read through your code and suggest improvements that could make it better. These suggestions can improve performance, incorporate new language features, patch security oversights, or correct code style.
But manual code reviews are expensive. The time it takes someone to read your code is time they did not spend building awesome stuff. They are also error-prone and by no means comprehensive. There are human limits to knowledge and memory.
Enter automation. For mechanical checks such as code style, automated reviews are faster, less error-prone, and more thorough than their manual counterparts.
Let's dive deep into the process with a sample project containing a single Python file. We'll riddle the file with issues and then set up a workflow that automatically finds and fixes these problems.
A. Codebase with PEP-8 violations
Before we automate code review, let's first write some code to review. Here is a sample program that lists primes up to a given number. It might hurt to look at, which is good as it means your code-olfactory senses are working.
There are a lot of problems with this script. Don't get me wrong, it works, but it's not good.
- Extraneous space before function parenthesis
- 2-space indentation
- No spaces around operators
- Single blank line around functions (PEP 8 asks for two)
- Uppercase F in f-strings
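The original listing is not reproduced here. As a stand-in, a hypothetical list_primes.py that exhibits each problem listed above might look like the following. The function names and the logic are assumptions made for illustration, not the author's script.

```python
# Hypothetical stand-in for list_primes.py. It runs correctly, but it
# deliberately breaks every style point from the list above.
def is_prime (n):
  if n<2:
    return False
  for i in range(2,int(n**0.5)+1):
    if n%i==0:
      return False
  return True

def list_primes (limit):
  return [n for n in range(2,limit+1) if is_prime(n)]

if __name__=="__main__":
  limit=20
  print (F"Primes up to {limit}: {list_primes (limit)}")
```

Every one of these issues is purely cosmetic, which is exactly why they are good candidates for automation.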
B. Black code formatter
Black is a popular code formatter for Python. It is capable of automatically reformatting your Python files, fixing all code style violations. What's neat is that it is pretty opinionated and can't be configured much, making it ideal for automation.
So let's install Black, while also taking the opportunity to set up some first-class dependency management with a tool I personally love, Pipenv. Running the following command creates two files, Pipfile and Pipfile.lock, in the root of the repo and installs Black as a dev dependency.
Running Black on a path with no other arguments (for example, black . from the root of the repo) reformats the matching files in place. Apart from reformatting your files, it has two less dangerous modes.
--check: In this mode, Black purely checks if there are any code style violations. The return code is 0 if there are no violations and non-zero if there are any.
--diff: In this mode, Black shows the changes it will make without actually making them. This mode is helpful if you want to inspect the changes before they are actually made.
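To make the two modes concrete, here is a minimal sketch of driving Black from a Python script. The helper names (black_command, describe_exit_code, run_black) are hypothetical. The exit codes are the ones Black documents for --check: 0 when nothing would change, 1 when some files would be reformatted, and 123 on an internal error.

```python
import subprocess
import sys


def black_command(paths, check=False, diff=False):
    """Build the argument list for invoking Black as a module."""
    cmd = [sys.executable, "-m", "black"]
    if check:
        cmd.append("--check")
    if diff:
        cmd.append("--diff")
    return cmd + list(paths)


def describe_exit_code(code):
    """Translate Black's exit code in --check mode into a short message."""
    if code == 0:
        return "nothing would change"
    if code == 1:
        return "some files would be reformatted"
    if code == 123:
        return "Black hit an internal error"
    return f"unexpected exit code {code}"


def run_black(paths, check=True, diff=False):
    """Run Black and return (exit code, stdout). Requires Black to be installed."""
    result = subprocess.run(
        black_command(paths, check=check, diff=diff),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout
```

Calling run_black(["list_primes.py"], check=True, diff=True) leaves the file untouched and returns the exit code together with the proposed diff, which is the combination a CI job typically wants.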
Code review can be split into two parts: the interesting part, where you solve big-picture issues, and the mundane part, where you identify non-idiomatic snippets and code style violations. Let's automate the boring part of the code review.
1. Set up Git hooks
We just saw how incredible Black is. Wouldn't it be awesome if Black ran automatically every time you were to commit your code? It's possible, with Git hooks. Git hooks are programs that run on your codebase when you execute certain Git commands. The 'pre-commit' hook is of particular interest to us because we'd like the lint check to take place before the commit is created, and prevent the commit from being created if it fails.
Autohooks is a Python package for managing these hooks via Python. It has a plugin system that enables integration with tools like Black. Let's install both Autohooks and the Black-integration plugin.
Make a pyproject.toml file in the root of your repo with the following content.
Activate the hooks and run the check function to see if everything works fine.
Try git commit-ing the poorly written source code. Ha, gotcha! Here's how things will go down:
- The pre-commit hook will be initiated.
- The top-level hooks by Autohooks will be invoked.
- Autohooks will then execute Black with the --check argument.
- Black will return a non-zero code because the file contains errors.
- Git will halt the commit operation.
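The sequence above can be approximated by a hand-written hook. This is a minimal sketch of the idea, not how Autohooks is implemented: the function names are hypothetical, and it assumes Black is installed for the interpreter that runs the script.

```python
import subprocess
import sys


def python_files(name_only_output):
    """Pick the .py paths out of `git diff --name-only` output."""
    return [
        line.strip()
        for line in name_only_output.splitlines()
        if line.strip().endswith(".py")
    ]


def staged_python_files():
    """List staged Python files that were added, copied, or modified."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True,
        text=True,
        check=True,
    )
    return python_files(result.stdout)


def main():
    """Return the exit status the hook should report to Git."""
    files = staged_python_files()
    if not files:
        return 0  # nothing to lint, so let the commit proceed
    # Black exits non-zero when a file would be reformatted; passing that
    # status back to Git is what aborts the commit.
    check = subprocess.run([sys.executable, "-m", "black", "--check", *files])
    return check.returncode


# To use this as a hook script, finish the file with: sys.exit(main())
```

Saved as an executable .git/hooks/pre-commit file with a Python shebang line, a script like this blocks any commit that contains unformatted Python files. Note that it checks the working-tree copy of each staged file rather than the staged content itself, which is one of the details a dedicated tool handles for you.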
💡 Pro-tip: You can bypass the hook with the --no-verify flag on git commit. It's not recommended but we're not the police, so do what you want.
💡 Pro-tip: You can remove the --check argument and then every time you commit, Black will reformat your files for you.
2. Lint check using GitHub Actions
The main drawback of Git hooks is that they are local. In a project with multiple contributors, some people might forget to activate the hooks or actively try to bypass them. In such cases, the solution is to run the lint check on the remote repo itself. GitHub Actions provides an extremely versatile solution for running the lint.
Create the file lint.yml inside the .github/workflows directory of the repo.
This workflow checks out the repository, sets up Python, installs Black and then lints the files. By default, this action runs Black with the --check and --diff arguments. Once you set up linting, all future commits and PRs will pass through Black.
Our list_primes.py file will fail the test. The logs will show both the failing files as well as the diffs for those files (because of the --diff argument). That'll come in pretty handy when you're fixing the violations.
3. Lint fixes using GitHub Workflows
That brings us to the one aspect we haven't addressed yet. Black is capable of reformatting files, but so far, we have only used it to detect issues and present diffs. We've not tapped into Black's full potential.
How about we turn the automation up to 11 and update our GitHub workflow to automatically fix code style violations?
This workflow uses the same two steps as the previous one, that is, checking out the repo and setting up Python. Then we install Pipenv and use that to install Black on the system. The lint-action action runs Black and then commits the changed files. This creates a new commit with the same changes that Black had shown in the diff!
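Conceptually, the "format, then commit" step boils down to something like the sketch below. It illustrates the idea only and is not the action's actual implementation: the function names, the default commit message, and the bot identity are all hypothetical.

```python
import subprocess
import sys


def commit_command(message, name, email):
    """Build a `git commit` call that records a custom committer identity."""
    return [
        "git",
        "-c", f"user.name={name}",
        "-c", f"user.email={email}",
        "commit",
        "--all",
        "--message", message,
    ]


def has_changes(porcelain_output):
    """True if `git status --porcelain` reported any modified files."""
    return bool(porcelain_output.strip())


def format_and_commit(
    paths,
    message="Fix code style issues with Black",  # hypothetical default
    name="Lint Bot",  # hypothetical identity
    email="lint-bot@example.com",
):
    """Reformat with Black and commit the result, if anything changed."""
    subprocess.run([sys.executable, "-m", "black", *paths], check=True)
    status = subprocess.run(
        ["git", "status", "--porcelain", "--untracked-files=no"],
        capture_output=True,
        text=True,
        check=True,
    )
    if not has_changes(status.stdout):
        return False  # already formatted, so there is nothing to commit
    subprocess.run(commit_command(message, name, email), check=True)
    return True
```

In a real workflow the commit would also need to be pushed back to the branch, which the action takes care of.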
💡 Pro-tip: You can customize the author's name and email and also the message of the commit. Just add the following to the with key of the action. It's an opportunity to be creative!
Now that you can be assured that code pushed to the repo is free from style guide deviations, this frees up code reviewers to take a bigger picture look at the code and leave the minutiae to HAL.
End-to-end automation with DeepSource
Phew! That was a lot of work, wasn't it? But guess what, we only looked at style guide violations for now. Adding more features like looking for security gaps, finding possible bugs, and making complex refactors would make this a very long exercise. But it does not have to be.
You could also consider automating this entire audit, review, and refactor process with DeepSource, which can scan your code on every commit and every pull request using several tools (including linters and security analyzers) and can automatically fix many issues. DeepSource also has custom-built analyzers for most languages that are constantly improved and kept up to date.
It’s incredibly easy to set up! You need only add a .deepsource.toml file in your repository root, and DeepSource will pick it up. Much less effort than what we just went through.
Finis coronat opus
Code reviews are a very important learning tool for new developers. It's a way to transfer knowledge, experience, and convention from a senior developer to a junior, a way to understand how even the code that was deemed as final can be made better, cleaner, and more efficient.
I'd venture so far as to say that code reviews are one of the best learning tools for developers, and they are wasted on mundane things like code style. Introduce a little automation and make every code review count.
They know enough who know how to learn.
When done well, a code review can be a truly educational experience. Automation cannot replace that. What automation can do is take the mundane out of the process.