Leveraging static code analysis in a Ruby CI pipeline
Continuous integration, or CI, refers to the culture and the technologies that enable continuously merging features and bug fixes into the main branch of the codebase. Code changes are incorporated as soon as they pass testing, rather than being batched with other updates in a waterfall release process.
Similarly, continuous delivery, or CD, refers to automatically deploying the changed code to the target environment, such as deploying pre-production branches to staging and master to production. CD picks up where CI leaves off, and as such, the two often go hand in hand.
Static code analysis typically falls under the CI aspect of a CI/CD pipeline. Taking the example of a small Ruby project, we'll be setting up a CI workflow to analyze code quality using static analysis in the following areas:
- Consistency with the widely adopted Ruby style guide.
- Layout, such as inconsistent spacing or misaligned indentation.
- Linting, such as inadequate permissions or redundant operations.
- Security analysis, such as the use of unsafe methods.
Creating a sandbox
Let's make a fresh directory for our adventure today. Initialize a Git repository in it and push it to GitHub, as we'll be using GitHub workflows as our CI tool (more on this later).
Setting up Ruby and Bundler
You probably already have Ruby installed on your computer. But I find it best not to use pre-installed Ruby for a couple of reasons. Reason the first, it's generally much older than the latest stable version, and you don't want to miss out on Ruby's newest features, do you? Reason the second, it's relatively easy to break your system by installing, removing, or updating a critical package.
Don't fret; there's a solution: RVM. I won't get into the details of RVM here, but with it you can install and manage several versions of Ruby on one system while keeping the system Ruby pristine.
Next, we set up Bundler, a fantastic package manager for Ruby. We need it to keep track of our project's dependencies. It's extremely straightforward to install.
To keep our project's gems local to the project, we can configure Bundler to install gems at a given path. Create a directory .bundle/ and, inside it, a config file with the following content.
With this configuration, Bundler will install all gems inside a .gems/ folder inside the current project folder proj/. Add both directories .bundle/ and .gems/ to your .gitignore file so that they are not checked into VCS.
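Bundler keeps these project-local settings in a small YAML file. The helper below is a minimal sketch that creates that file from Ruby. It assumes the standard BUNDLE_PATH key, which is the same setting that `bundle config set --local path .gems` records. The name write_bundle_config is hypothetical, and creating the file by hand works just as well.

```ruby
require 'fileutils'
require 'yaml'

# Hypothetical helper: writes <project_dir>/.bundle/config so that Bundler
# installs gems under <project_dir>/<gem_dir> instead of a system location.
def write_bundle_config(project_dir, gem_dir = '.gems')
  bundle_dir = File.join(project_dir, '.bundle')
  FileUtils.mkdir_p(bundle_dir)

  config_path = File.join(bundle_dir, 'config')
  # BUNDLE_PATH is the key Bundler reads for its install path.
  File.write(config_path, { 'BUNDLE_PATH' => gem_dir }.to_yaml)
  config_path
end
```

Calling write_bundle_config(Dir.pwd) from the project root would produce an equivalent .bundle/config.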
Getting familiar with Rubocop
For analyzing the code quality in all the areas we mentioned above, we will be using Rubocop, one of the finest linters available for Ruby. Rubocop comes with an extensive collection of rules, called 'cops', organized in groups, called 'departments', based on their functionality.
To install it, add the rubocop gem to your Gemfile and run bundle install.
To list all the offenses in any given files or directories, pass their names as arguments to rubocop.
Rubocop can also autocorrect most of the offenses it reports, which is incredibly helpful. To enable autocorrection, pass the -a flag, which applies only the corrections considered safe. Passing -A also applies unsafe corrections, which can change the behavior of your code, so avoid it unless you are sure of what you are doing.
You will need to prepend bundle exec to these commands if Rubocop is not globally installed.
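Why is -A riskier? Take Style/FrozenStringLiteralComment, the cop that asks for a # frozen_string_literal: true comment at the top of each file. Its autocorrection is marked unsafe because the comment freezes every string literal in the file, and any code that mutates a literal in place then starts raising FrozenError. The sketch below imitates that effect inside a single file by calling .freeze explicitly. The method exclaim! is hypothetical.

```ruby
# Hypothetical helper: appends "!" in place, mutating its argument.
def exclaim!(text)
  text << '!'
end

# Without the magic comment, a literal like 'Hello' is mutable. The unary
# plus guarantees an unfrozen string either way.
greeting = +'Hello'
exclaim!(greeting) # => "Hello!"

# With `# frozen_string_literal: true` at the top of the file, every string
# literal behaves as if .freeze had been called on it, so the same call fails.
frozen_greeting = 'Hello'.freeze
begin
  exclaim!(frozen_greeting)
rescue FrozenError => e
  warn "FrozenError: #{e.message}"
end
```

The safe -a mode skips such corrections, so you get to decide whether the code relies on mutable literals before the comment is added.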
Now we get to the fun part: scripting in Ruby. Take this script, for example. It takes a file name as an argument and prints that file's content to STDOUT, much like the cat command (hence the name).
It is poorly formatted, violates many rules from the style guide, and even has a couple of gaping security flaws. We'll fix those soon, but first, let's do a preliminary scan with Rubocop and observe the output.
Rubocop found 7 offenses, 6 of which it can correct automatically; these are labeled [Correctable] in the output above.
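You can also tally these counts programmatically, because Rubocop emits machine-readable output when run with --format json. The helper below is a minimal sketch. It assumes the JSON formatter's layout: a files array whose entries each hold an offenses array, with cop_name and correctable fields on every offense. The name summarize_offenses is hypothetical. The department is taken as the part of the cop name before the slash.

```ruby
require 'json'

# Hypothetical helper: summarizes the output of `rubocop --format json`.
# Assumes the formatter's layout: a top-level "files" array whose entries
# hold an "offenses" array, each offense carrying "cop_name" and "correctable".
def summarize_offenses(json_text)
  report = JSON.parse(json_text)
  offenses = report.fetch('files', []).flat_map { |file| file.fetch('offenses', []) }

  {
    total: offenses.size,
    correctable: offenses.count { |offense| offense['correctable'] },
    # Cop names look like "Department/CopName", e.g. "Layout/IndentationWidth".
    by_department: offenses.map { |offense| offense['cop_name'].split('/').first }.tally
  }
end
```

For example, running bundle exec rubocop --format json > report.json and then calling summarize_offenses(File.read('report.json')) would return the total, the autocorrectable count, and a per-department breakdown.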
Picking the infrastructure
Choices abound when it comes to picking a CI/CD infrastructure provider. From Travis CI, a darling of open-source developers, to Jenkins, the tool of choice for enterprise teams who'd rather self-host their customized solution, DevOps engineers are spoilt for choice.
But in my experience, the simplest of these has been GitHub workflows, a GitHub-native solution that lets you set up entire chains of jobs, described as YAML files, that are initiated by specific triggers. We can use them throughout the CI/CD pipeline, from running checks on PRs before merge to deploying the code afterward. There are hundreds of pre-built actions (many officially maintained) that take the effort out of setting up end-to-end pipelines.
Naturally, we'll be using GitHub workflows as our CI pipeline infrastructure. The end goal is to run linting as a check on our PRs and commits, so that only PRs that pass the checks can be merged. GitHub enforces this once the check is marked as required in the repository's branch protection rules.
Adding lint workflow
Let's see what the workflow file would look like in our case. Create a .github/workflows/ directory and, inside it, a file named lint.yml.
The lint workflow consists of a single job. The job is triggered on every push to the master branch and performs three steps:
- actions/checkout: checks out the code repository
- sets up Ruby version 3.0 and the latest compatible version of Bundler, then uses Bundler to install all packages in the Gemfile
- run: runs Rubocop on the entire current working directory
Once the job completes, we get our outcome. In our case, it's a big red cross: the workflow run fails because Rubocop found issues in the code. The logs show the same output we saw earlier, listing the offenses identified by Rubocop.
😢 We'll get there, eventually.
We want our check to pass; nobody likes failing checks. Back in our local setup, let's use Rubocop's autocorrect feature to resolve these issues quickly. First, we let Rubocop take care of the safe, automatic fixes using the -a flag.
Rubocop fixed most of the reported issues, including one introduced during the autocorrect process itself! With only two of the original seven offenses left, we've shaved off roughly 70% of our work with zero effort.
What's left is one offense whose autocorrection is unsafe (so -a skips it) and one security vulnerability that Rubocop cannot fix automatically. We can take care of those:
- The # frozen_string_literal: true magic comment is missing. That's a reasonable thing to add to the file, so we'll let Rubocop add it using the stronger autocorrect flag, -A.
- Kernel#open is a well-known security risk, especially when it is passed untrusted input. If the argument starts with a pipe character (|), Ruby runs the rest as a shell command. We've talked about this (and other security pitfalls) before. Replacing it with File.open, which always treats its argument as a file path, should do the trick.
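For reference, here is a minimal sketch of what a cleaned-up version of the script could look like after both fixes. The original script isn't reproduced here, so print_file and its io parameter are hypothetical. What matters is the magic comment on the first line and the use of File.open, which looks up its argument strictly as a file path.

```ruby
# frozen_string_literal: true

# Hypothetical sketch of a cat-like script; not the article's exact code.
# File.open always treats its argument as a path. Kernel#open, by contrast,
# runs a shell command when the argument starts with "|".
def print_file(path, io = $stdout)
  File.open(path) do |file|
    IO.copy_stream(file, io)
  end
end

# Print each file named on the command line, in order, like cat.
ARGV.each { |path| print_file(path) } if $PROGRAM_NAME == __FILE__
```

Given a name such as '| ls', this version raises Errno::ENOENT for a missing file instead of executing anything.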
Commit and push. We're green now! At this point, you should pat yourself on the back for a job well done.
Now that we're at peak code quality, we need to keep it that way, which means ensuring that no PR degrades the quality of the codebase. To run the check on every incoming PR, add the pull_request event to our lint workflow.
Now, to test that our check is working as expected, we need to open a PR with some code that Rubocop would flag. Let's refactor the lines in our script that read the file so that they use a block.
Check out a new branch from master. Commit and push to this branch and open a PR. You'll see that the checks fail, and thus the PR cannot be merged unless overridden by an administrator.
🚧 Our check is working just fine!
🧠 Brain-teaser: Can you identify why the updated code, using a block, is being flagged by Rubocop?
🤷‍♂️ Hint: here's the Rubocop output for the PR:
🧑‍💻 Answer: Because list is first assigned inside the block, it is local to the block and not accessible outside it. This makes the assignment useless and will break the subsequent use of the variable outside the block.
💡 Lesson: Code analysis can sometimes help identify potential bugs too, if only indirectly!
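The PR's exact code isn't reproduced here, but the pattern that Rubocop's Lint/UselessAssignment cop reports can be sketched as follows. All method names are hypothetical. A variable first assigned inside a block is local to that block, so it no longer exists once the block returns. There are two possible fixes. One is to use the block's return value, which File.open passes through. The other is to declare the variable before the block.

```ruby
# frozen_string_literal: true

# Hypothetical reconstruction of the flagged pattern.
def read_lines_buggy(path)
  File.open(path) do |file|
    list = file.readlines # useless assignment: `list` only lives inside this block
  end
  list # raises NameError: no local variable `list` exists at this scope
end

# Fix 1: File.open returns the block's value, so no variable is needed.
def read_lines_fixed(path)
  File.open(path, &:readlines)
end

# Fix 2: declaring the variable first makes the block assign to it.
def read_lines_outer(path)
  list = nil
  File.open(path) { |file| list = file.readlines }
  list
end
```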
We invested considerable time and effort, and we now have checks and actions set up to monitor code quality in our repo. But what if we didn't have time to spare or didn't want to spend the effort? We're busy developers, after all!
Consider using DeepSource. It continuously scans the code on every commit and every pull request through various static code analyzers (including linters and security analyzers), and can automatically fix some of the issues it finds. DeepSource also has its own custom-built analyzers for most languages, which are constantly improved and kept up to date.
It's incredibly easy to set up! You need only add a .deepsource.toml file to your repository root, and DeepSource will pick it up. It takes much less effort than setting up several workflows in GitHub, and the end result is far more polished.
Automate the tedium away
CI/CD pipelines are essential to the agile development workflow. The ability to add features, squash bugs, and get the changes into production instantly can make a significant difference. Startups live or die based on how often they iterate.
Integrating static analysis into the CI pipeline ensures that only the cleanest and most compliant code makes its way into production. For something that takes very little time to set up, consumes a minuscule amount of resources, and does not significantly affect test/build timings, static analysis can add a lot of confidence to your build process.
Confident iterations await. Till next time!