Tips for writing glob patterns in DeepSource configuration
Test patterns and exclude patterns are optional yet important parts of the .deepsource.toml configuration file. They are written as glob patterns, and they play an essential role in reducing noise and false positives in the issues DeepSource raises for a project.
What is a Glob?
Globbing, or "shell globbing", is the process of writing glob patterns that match files in a filesystem. Glob patterns specify sets of filenames using wildcard characters. For example, the Bash command rm -rf textfiles/*.txt removes (rm) all the files with names ending in .txt from the folder textfiles. Here, * is a wildcard character; combined with .txt, it forms the glob pattern *.txt.
In addition to matching filenames, globs are also used for matching arbitrary strings (wildcard matching), for example with ?, which matches a single character.
Since we are concerned with test-file and exclude-file patterns, we will focus on *, ** and the path separator /, which are widely used in glob patterns that match files in a project.
- A * matches any string, including the empty string. In the example above, textfiles/*.txt, the * matches the name of every file ending in .txt.
- A / is the character commonly used as the path separator.
- ** is the feature known as globstar. It matches all files and zero or more directories and subdirectories. If followed by a /, it matches only directories and subdirectories. To work that way, ** must be the only thing in its path segment; for example, /Demo/**.py will not match recursively.
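The rules in the list above can be tried out with a small Python sketch. It is only an approximation of common globstar behaviour, not DeepSource's actual matcher, and the helper names glob_to_regex and glob_matches are hypothetical. It assumes that * and ? stay within one path segment and that ** acts as globstar only when it fills a whole segment.

```python
import re

def glob_to_regex(pattern):
    """Translate a glob into a regex string (simplified, illustrative rules)."""
    segments = pattern.split("/")
    parts = []
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        if segment == "**":
            # Globstar: a trailing ** matches everything below this point,
            # while **/ matches zero or more whole directories.
            parts.append(".*" if is_last else "(?:[^/]+/)*")
            continue
        # Inside a segment, * and ? never cross a path separator, so a
        # ** that shares its segment (as in **.py) behaves like a plain *.
        body = "".join(
            "[^/]*" if ch == "*" else "[^/]" if ch == "?" else re.escape(ch)
            for ch in segment
        )
        parts.append(body if is_last else body + "/")
    return "".join(parts)

def glob_matches(pattern, path):
    """Return True if the whole path matches the pattern."""
    return re.fullmatch(glob_to_regex(pattern), path) is not None

# Patterns taken from the text, checked against made-up paths.
for pattern, path in [
    ("textfiles/*.txt", "textfiles/notes.txt"),
    ("textfiles/*.txt", "textfiles/old/notes.txt"),
    ("Demo/**.py", "Demo/pkg/mod.py"),
    ("Demo/**/*.py", "Demo/pkg/mod.py"),
]:
    print(pattern, path, glob_matches(pattern, path))
```

Under these assumptions, Demo/**.py does not reach into Demo/pkg/, while Demo/**/*.py does, which is the distinction the last bullet describes.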
Globs in DeepSource configuration
DeepSource configuration has two sections which take glob patterns: exclude_patterns and test_patterns.
- exclude_patterns is a list of glob patterns for files that should be excluded when the analyses are run. These patterns should be relative to the repository's root.
- test_patterns is a list of glob patterns for files that should be marked as tests, or for directories that contain test files. These patterns should also be relative to the repository's root.
By default, DeepSource checks every file and runs analysis on all of them. Setting the exclude_patterns and test_patterns configuration in .deepsource.toml gives DeepSource more context about your code. DeepSource can then selectively analyze the files that are important.
A quick example and why you need globs
When writing Python code, if you don't mark your test files in test_patterns, DeepSource will report usage of the assert statement in them as an issue. assert provides an easy way to check a condition and fail execution, so developers commonly use it to check validity. But when the Python interpreter is invoked with the -O (optimize) flag, assert statements are removed from the bytecode.
So, if assert statements are used for user-facing validation in production code, the check won’t be executed at all, potentially opening up a security vulnerability. It is recommended to use assert statements only in tests. Hence, to keep DeepSource from raising this issue on test code, it is important to add test files to test_patterns.
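The effect of -O can be reproduced without restarting the interpreter. The sketch below uses the built-in compile() with its optimize argument, where level 1 corresponds to running python -O. The function names check_age, load_check_age and safe_check_age are made up for illustration.

```python
SOURCE = '''
def check_age(age):
    assert age >= 18, "must be an adult"  # validation done with assert
    return "allowed"
'''

def load_check_age(optimize):
    """Compile SOURCE at the given optimization level and return check_age."""
    namespace = {}
    exec(compile(SOURCE, "<example>", "exec", optimize=optimize), namespace)
    return namespace["check_age"]

def safe_check_age(age):
    # An explicit check is kept at every optimization level.
    if age < 18:
        raise ValueError("must be an adult")
    return "allowed"

normal = load_check_age(0)     # comparable to running `python`
optimized = load_check_age(1)  # comparable to running `python -O`

try:
    normal(12)
except AssertionError as exc:
    print("normal build rejected the input:", exc)

# The assert was stripped, so invalid input passes straight through.
print("optimized build returned:", optimized(12))
```

The explicit check in safe_check_age is the pattern to prefer in production code, while assert remains fine inside files that test_patterns marks as tests.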
Here is a sample configuration with some examples of test and exclude patterns.
Missing or wrong patterns add noise
Let us look at what happens when test_patterns are written incorrectly. FOSSASIA, in their open-event-server project, wrote the test-file pattern as */tests/**, and DeepSource reported around 1700 issues.
A simple fix, a PR that changed the test-file pattern to tests/**, brought the number of issues raised by DeepSource down to around 1500. The fix suggests that the tests directory sits at the repository root, which */tests/** does not cover because */ requires a parent directory.
Writing glob patterns correctly
Let us now write our own glob patterns for test and exclude patterns. We will use the Glob Tester Tool to test our patterns before using them in the configuration.
Say, we have a project structure like this:
Go to this example of Glob Tester Tool to play with this file structure.
Test file patterns
All the important directories here are sub-directories of /src/. The glob pattern to match all files under the /src/tests/ directory should be written as */tests/**, where */ matches a single directory name followed by a path separator.
So, if we use */tests/** as one of the test-file patterns in DeepSource configuration, DeepSource will look for a directory named tests inside any directory directly under the repository root (here, src), and then recursively match all files in it.
Figure: Using the Glob Tester Tool to find the file paths that match a glob pattern.
With this value, the configuration entry reads test_patterns = ["*/tests/**"].
Protip: If a project has several tests sub-directories at different depths, the same glob pattern won't match all of them. Change the pattern to **/tests/**: the leading **/ matches every occurrence of the tests directory, recursively, in all sub-directories.
Exclude file patterns
We can write exclude_patterns to ignore directories like examples and migrations. Your glob patterns will look like:
Combining them, the complete configuration in .deepsource.toml carries both lists: test_patterns for the tests directory, and exclude_patterns for the examples and migrations directories.
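To see how such a combined configuration might sort files, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the file paths, the exclude patterns */migrations/** and */examples/**, the helper names, the simplified matching rules, and the choice to check exclusion before the test marker. DeepSource's own matcher and precedence may differ.

```python
import re

def glob_to_regex(pattern):
    """Translate a glob into a regex string (simplified, illustrative rules)."""
    segments = pattern.split("/")
    parts = []
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        if segment == "**":
            # A trailing ** matches everything below; **/ matches zero or more directories.
            parts.append(".*" if is_last else "(?:[^/]+/)*")
            continue
        body = "".join(
            "[^/]*" if ch == "*" else "[^/]" if ch == "?" else re.escape(ch)
            for ch in segment
        )
        parts.append(body if is_last else body + "/")
    return "".join(parts)

def glob_matches(pattern, path):
    return re.fullmatch(glob_to_regex(pattern), path) is not None

# Hypothetical values, written in .deepsource.toml as:
#   test_patterns    = ["*/tests/**"]
#   exclude_patterns = ["*/migrations/**", "*/examples/**"]
TEST_PATTERNS = ["*/tests/**"]
EXCLUDE_PATTERNS = ["*/migrations/**", "*/examples/**"]

def classify(path, test_patterns, exclude_patterns):
    """Label a repository-relative path as excluded, test, or source."""
    # Assumption: exclusion is checked before the test marker.
    if any(glob_matches(p, path) for p in exclude_patterns):
        return "excluded"
    if any(glob_matches(p, path) for p in test_patterns):
        return "test"
    return "source"

PATHS = [
    "src/app/views.py",
    "src/tests/test_views.py",
    "src/api/tests/test_deep.py",
    "src/migrations/0001_initial.py",
    "src/examples/demo.py",
]
for path in PATHS:
    print(f"{path:32} -> {classify(path, TEST_PATTERNS, EXCLUDE_PATTERNS)}")
```

With these made-up paths, src/api/tests/test_deep.py is not marked as a test by */tests/** because it sits two levels below the root. Passing ["**/tests/**"] as the test patterns marks it as a test.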