Scanning JavaScript Code For Security Vulnerabilities

July 7, 2023

In early 2023, we added support for taint analysis in our JavaScript analyzer. This feature has allowed us to add significantly more powerful security checks and improve the coverage of existing ones.

In this post, I’ll go over how static analysis tools perform taint checking, and how we’ve implemented it in our analyzer.

Anatomy of a linter

To understand how security scanning works, we must first understand how regular code is scanned for bug-risks and other issues. Most developers will use a linter that runs in the background, and points out suspicious code.

ESLint is the industry standard tool for linting JavaScript projects. It’s open source with a permissive license, can be extended with plugins, and has a well documented API. For these reasons, it is also an excellent case study to understand how linters operate.

Nearly all linters will first read a program’s source code, and build a data structure called an abstract syntax tree (AST). When studying the properties of a program, its syntax tree is more convenient than a mere string that stores the source code. The process of transforming source strings into tree-like data structures is called “parsing”.

Most modern programming languages will have official tooling to produce these ASTs from strings that represent source code. JavaScript, unfortunately, doesn’t have built-in support for parsing its own source code.

As such, we rely on parsing libraries like espree, tree-sitter, or meriyah to create ASTs for us.

Consider the following code snippet:

if (condition) {
  // ... code
}

You should be able to visualize this if-statement’s structure like this:

                      Program
                         │
                         │
                    If Statement
                         │
      ┌──────────────────┼───────────────────┐
      │                  │                   │
test:Identifier   then:BlockStatement   alternate:null
                         │
                         │
                      [empty]

Parsing the snippet shown above will yield this object:

;({
  // Every JS AST represents a "Program",
  // which is a collection of statement nodes.
  type: 'Program',
  body: [
    {
      // every "node" in the syntax tree will have a "type" field
      // that shows the kind of statement or expression
      // a node represents.
      type: 'IfStatement',
      test: {
        type: 'Identifier',
        name: 'condition'
      },
      consequent: {
        type: 'BlockStatement',
        body: [
          /* our if statement has an empty body */
        ]
      },
      // there is no else-block
      alternate: null
    }
  ]
})

This represents a tree-like data structure where every node has a “type”, and then some named properties that represent its children.
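Because every node is a plain object with a `type` field, a generic traversal fits in a few lines. The following is a minimal sketch rather than how any particular parser or linter does it; the `walk` helper and the hand-written `ast` literal (which uses the ESTree field name `body` for the program’s statements) are assumptions made for illustration.

```javascript
// Hypothetical helper: visits `node` and every descendant, depth-first.
function walk(node, visit) {
  if (node === null || typeof node !== 'object') return
  if (Array.isArray(node)) {
    node.forEach((child) => walk(child, visit))
    return
  }
  if (typeof node.type === 'string') visit(node)
  // Any property may hold a child node or a list of child nodes.
  for (const key of Object.keys(node)) {
    walk(node[key], visit)
  }
}

// Hand-written AST for `if (condition) {}`.
const ast = {
  type: 'Program',
  body: [
    {
      type: 'IfStatement',
      test: { type: 'Identifier', name: 'condition' },
      consequent: { type: 'BlockStatement', body: [] },
      alternate: null
    }
  ]
}

const seenTypes = []
walk(ast, (node) => seenTypes.push(node.type))
console.log(seenTypes.join(' > '))
// Program > IfStatement > Identifier > BlockStatement
```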

You can play with this online tool that lets you type in arbitrary JavaScript code, and view the corresponding syntax trees.

Linters like ESLint will have functions that traverse these trees to detect patterns that are indicative of code smells. Often, these functions are grouped together by the type of node which they visit. Consider this lint, for instance:

const noEmptyIfRule = {
  // .. boilerplate for ESLint rules,
  create(context) {
    return {
      IfStatement(ifNode) {
        if (ifNode.consequent.type === 'BlockStatement' && ifNode.consequent.body.length === 0) {
          context.report({
            message: 'Should not have blank if-statements',
            node: ifNode
          })
        }
      }
    }
  }
}

This object, called a “rule” by ESLint, communicates to the linter that it should run the provided logic when visiting IfStatement nodes. When the AST traverser sees an if statement, it calls this function, which then raises an issue if the if-statement has an empty then-block.
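To make the dispatch concrete, here is a rough sketch of how a traverser might call such a rule. This is not ESLint’s actual implementation; `runRule` and the minimal context object (which only supports `report`) are hypothetical stand-ins.

```javascript
// A rule in the same shape as the one above.
const noEmptyIf = {
  create(context) {
    return {
      IfStatement(node) {
        const { consequent } = node
        if (consequent.type === 'BlockStatement' && consequent.body.length === 0) {
          context.report({ node, message: 'Should not have blank if-statements' })
        }
      }
    }
  }
}

// Hypothetical driver: walks the tree and calls the handler named after
// each node's type, if the rule defines one.
function runRule(rule, ast) {
  const reports = []
  const handlers = rule.create({ report: (issue) => reports.push(issue) })

  const visit = (value) => {
    if (Array.isArray(value)) return value.forEach(visit)
    if (value === null || typeof value !== 'object') return
    if (typeof value.type === 'string' && handlers[value.type]) {
      handlers[value.type](value)
    }
    Object.values(value).forEach(visit)
  }

  visit(ast)
  return reports
}

// Hand-written AST for `if (condition) {}`.
const program = {
  type: 'Program',
  body: [
    {
      type: 'IfStatement',
      test: { type: 'Identifier', name: 'condition' },
      consequent: { type: 'BlockStatement', body: [] },
      alternate: null
    }
  ]
}

const issues = runRule(noEmptyIf, program)
console.log(issues.map((issue) => issue.message))
```

The rule never walks the tree itself. It only declares which node types it cares about, which is what lets a linter run many rules in a single traversal.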

Most lints, however, are not so simple. You can only do so much by looking at the mere shape of a user’s program. Often, you want to track where a value ends up. Here is an example of SQL injection in an Express.js codebase:

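import express from 'express'
// `Database` stands in for a client class from some database library.

const app = express()
const db = new Database({
  /* some db config */
})

app.get('/user/:username', async (req, res) => {
  const name = req.params.username
  // Vulnerable: `name` is interpolated directly into the SQL string.
  const query = `select * from users where name = '${name}'`
  const result = await db.runQuery(query)
  if (!result) {
    return res.sendStatus(404)
  }
  res.json(result)
})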

If the username URL parameter contains a malicious string, this app is exposed to SQL injection. However, to detect this vulnerability, we have to:

1. Look for all function calls that match db.runQuery.
2. Ensure that db is an instance of a database class from some database library.
3. Ensure that the argument to runQuery is a value that contains tainted data in some form or another.

The 3rd step is the hardest, and requires more machinery on top of regular tree traversals.
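By contrast, the first step is a plain syntactic match. As a rough illustration, a matcher for calls of the form `db.runQuery(...)` might look like the sketch below. The `isRunQueryCall` helper is hypothetical, and it deliberately stops at step one: it neither confirms that `db` is a database instance nor inspects the argument for taint.

```javascript
// Hypothetical matcher for step 1: is this node a call to `db.runQuery`?
function isRunQueryCall(node) {
  if (node.type !== 'CallExpression') return false
  const { callee } = node
  return (
    callee.type === 'MemberExpression' &&
    !callee.computed && // rules out db['runQuery'] and db[someVar]
    callee.object.type === 'Identifier' &&
    callee.object.name === 'db' &&
    callee.property.type === 'Identifier' &&
    callee.property.name === 'runQuery'
  )
}

// Hand-written AST for `db.runQuery(query)`.
const runQueryCall = {
  type: 'CallExpression',
  callee: {
    type: 'MemberExpression',
    computed: false,
    object: { type: 'Identifier', name: 'db' },
    property: { type: 'Identifier', name: 'runQuery' }
  },
  arguments: [{ type: 'Identifier', name: 'query' }]
}

console.log(isRunQueryCall(runQueryCall)) // true
```

Matching on the literal name `db` is brittle, since the same client could be stored in a variable with any name. That is why the second step exists.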

A taint analysis primer

A sizeable number of security breaches can be traced back to a single design flaw: insecure data reaching sensitive spots in a program.

The goal with vulnerability scanning, then, is to track the flow of malicious data and sound the alarm if it reaches a location that might break the application.

Taint can arise from any data source external to the program, especially if that source isn’t under the developer’s control. In JavaScript applications, there are a few common patterns that invite tainted data inside a program:

A server response returned by fetch, axios.get, or XMLHttpRequest.
Query parameters of the URL in window.location.
The body of an HTTP request (req.body in express.js).
Environment variables that may have been set by a different program (process.env.<var name>).
User input obtained from an HTML form (FormData), etc.

Any function call or variable that may yield tainted data is called a taint source.

Next, we concern ourselves with where this tainted data might end up in a vulnerable program. As we saw in the previous section, database queries are a common target. We’d also like to prevent tainted data from entering other function calls like eval, child_process.exec, vm.run, etc. These commonly targeted function calls are taint sinks.

In most programs, a variable that stores tainted data will go through a few function calls before it actually reaches the sink. In the code snippet below, we’re fetching some JSON data from a URL, ensuring it’s not malicious by calling a validator function, and then appending it to the body of a web page:

const data = await fetch('https://www.sketchy-sketchers.com/item/031')
const json = await data.json()

const sanitizedData = validate(json)
if (sanitizedData) {
  document.body.innerHTML += '' + sanitizedData.description + ''
}

Even though we made a call to fetch, the above code snippet can be considered safe since it passes json to a validator function before using it. Functions that eliminate taint from a data-flow path are called sanitizers.
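The snippet above leaves `validate` undefined, so the following is only an illustration of what one common kind of sanitizer looks like: HTML escaping. The `escapeHtml` function is a hypothetical example, not the validator used above, and it is only appropriate when the value ends up in HTML text or a quoted attribute.

```javascript
// Hypothetical sanitizer: escapes the characters that let a string break
// out of an HTML text node or a quoted attribute value.
function escapeHtml(untrusted) {
  return String(untrusted)
    .replace(/&/g, '&amp;') // must run first so later entities are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
}

const payload = '<img src=x onerror="alert(1)">'
console.log(escapeHtml(payload))
// &lt;img src=x onerror=&quot;alert(1)&quot;&gt;
```

A taint checker does not need to understand how such a function works. It only needs to know that values returned from it can be treated as clean for the relevant sink.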

To prevent SQL injection, most applications will sanitize query parameters arising from user input in a similar fashion.

Conversely, there might be function calls that do not remove taint. For example, the call to find in the snippet below does not perform any sanitization:

const resp = await fetch('https://www.fishy-fishermen.com/users/all')
const userList = await resp.json()

const doug = userList.find((user) => user.name === 'Devious doug')
if (doug) {
  document.getElementById('fisherman-of-the-day').innerHTML = doug
}

Functions that receive and return tainted values without sanitizing them are called taint propagators.

To perform taint analysis, we begin by tracking how tainted data is passed around after being generated. If it only goes from one propagator to another before reaching a sink, we raise an alert. If there is at least one taint sanitizer standing between the sink and the source, the code is considered safe (except if the data is tainted again after being sanitized).
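The rule in the previous paragraph can be sketched as a small state machine over one data-flow path. This is a simplified model under stated assumptions: the path is given as a flat list of steps, each already labeled with a `kind`, and the labels and step names are made up for illustration.

```javascript
// Returns true if tainted data reaches the sink at the end of `path`.
function reachesSinkTainted(path) {
  let tainted = false
  for (const step of path) {
    if (step.kind === 'source') tainted = true
    else if (step.kind === 'sanitizer') tainted = false
    else if (step.kind === 'sink') return tainted
    // propagators leave the taint state unchanged
  }
  return false // the path never reached a sink
}

const unsafePath = [
  { kind: 'source', name: 'fetch' },
  { kind: 'propagator', name: 'find' },
  { kind: 'sink', name: 'innerHTML' }
]

const sanitizedPath = [
  { kind: 'source', name: 'fetch' },
  { kind: 'sanitizer', name: 'validate' },
  { kind: 'sink', name: 'innerHTML' }
]

// Tainted again after being sanitized.
const retaintedPath = [
  { kind: 'source', name: 'fetch' },
  { kind: 'sanitizer', name: 'validate' },
  { kind: 'source', name: 'window.location' },
  { kind: 'sink', name: 'innerHTML' }
]

console.log(reachesSinkTainted(unsafePath)) // true
console.log(reachesSinkTainted(sanitizedPath)) // false
console.log(reachesSinkTainted(retaintedPath)) // true
```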

Taint analysis for JavaScript code

Most taint checkers make multiple passes over the program being scanned:

A flow graph is built from the AST that stores information about the code paths taken by variables.
All taint sources are identified, and every data-flow node originating in such a place is treated as the root for a graph traversal.
Finally, we see if any set of edges in the data-flow graph connect a source and a sink.
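The last two passes amount to a graph search. Below is a minimal sketch, assuming the flow graph has already been built and is stored as an adjacency map in which an edge from `a` to `b` means the value of `a` flows into `b`. Node names are plain strings for brevity, sanitizers are ignored, and the `findTaintedSinks` helper is hypothetical.

```javascript
// Hypothetical data-flow graph for the SQL injection example.
const flowEdges = new Map([
  ['req.params.username', ['name']],
  ['name', ['query']],
  ['query', ['db.runQuery']],
  ['config.pageSize', ['limit']]
])
const taintSources = ['req.params.username']
const taintSinks = new Set(['db.runQuery', 'eval'])

// Breadth-first search from every source; reports each sink that is reachable.
function findTaintedSinks(edges, sources, sinks) {
  const reached = new Set()
  const visited = new Set(sources)
  const queue = [...sources]
  while (queue.length > 0) {
    const current = queue.shift()
    if (sinks.has(current)) reached.add(current)
    for (const next of edges.get(current) || []) {
      if (!visited.has(next)) {
        visited.add(next)
        queue.push(next)
      }
    }
  }
  return [...reached]
}

const taintedSinks = findTaintedSinks(flowEdges, taintSources, taintSinks)
console.log(taintedSinks) // only 'db.runQuery' is reachable
```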

Doing such computation on every single file in a highly dynamic language like JavaScript is very resource intensive. Moreover, there isn’t a standard way to build control-flow graphs (CFGs) from JavaScript code.

The JavaScript analyzer used by DeepSource performs taint analysis on demand. Instead of finding every taint source and following its data flow in the entire project, we do a bottom-up scan on the AST once a sink has been found.

This decision was based on one observation. While the source of tainted data can be anything, a vulnerability is usually classified by the nature of its sink:

console.log: Log injection.
db.query: SQL injection.
child_process.spawn: Shell injection.
element.innerHTML: Cross-site scripting.

We do not build explicit data flow graphs, or track the flow of every single variable in a project. Instead, we parse every file in a repository to build an AST, and then traverse it until we see a pattern that looks like a taint sink.

Once we’ve identified a possible sink for tainted data, we look at the expression that is going into the sink. Often, it will be a non-trivial expression. For example:

element.innerHTML = `Lucky number: ${luckyNumber}`

We see that a template expression containing one variable (luckyNumber) is being assigned to the sink. To determine if this assignment is vulnerable, we’ll have to backtrack and follow the data-flow that luckyNumber takes in the program. If we find a taint source while tracing the origin of the expression, we’ll raise an issue. However, if we see a taint sanitizer before any taint source is seen, we can exit early.

Since a single tainted value can be used in multiple places, we cache the results from our traversal in a table. If luckyNumber is used again in a different sink, we can perform an inexpensive table lookup instead of tracing its data flow again.
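The backtracking and caching described above can be sketched on a toy model. Everything here is an assumption made for illustration: variables are looked up in a hypothetical `definitions` table instead of a real AST, unknown names are treated as untainted, and definitions are assumed not to be cyclic.

```javascript
// How each variable got its value.
const definitions = {
  rawInput: { kind: 'source' }, // e.g. read from the URL
  luckyNumber: { kind: 'derived', from: ['rawInput'] },
  safeNumber: { kind: 'sanitized', from: ['rawInput'] },
  greeting: { kind: 'literal' }
}

const taintCache = new Map()
let traceCount = 0 // number of lookups that were not served from the cache

function isVariableTainted(name) {
  if (taintCache.has(name)) return taintCache.get(name)
  traceCount += 1

  const def = definitions[name]
  let result = false
  if (def !== undefined) {
    if (def.kind === 'source') {
      result = true
    } else if (def.kind === 'derived') {
      // keep walking backwards through whatever this value was built from
      result = def.from.some((origin) => isVariableTainted(origin))
    }
    // 'sanitized' and 'literal' stop the trace early with `false`
  }

  taintCache.set(name, result)
  return result
}

console.log(isVariableTainted('luckyNumber')) // true, traces luckyNumber and rawInput
console.log(isVariableTainted('luckyNumber')) // true, served from the cache
console.log(isVariableTainted('safeNumber')) // false, the sanitizer ends the trace
```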

Implementation

The JS analyzer embeds an in-house linter, within which we write lints that detect code smells. The linter itself is similar to ESLint, and is backwards compatible with most of ESLint’s plugins.

For security issues, we supplement the linter with an additional module called the TaintChecker. Just like the linter has individual lints, the taint checker has “rules”. Every rule is a set of instructions that identifies a taint source.

A lint simply detects patterns that indicate the presence of taint sinks. Once we’ve arrived at the sink, we’ll consult the TaintChecker, which will then inspect all values being passed to the sink and discern if they’re secure.

The overall architecture is explained by this diagram:

To understand it better, let’s have a look at a real example. The following is a lint that detects SQL injection in a JavaScript program:

import { JSLint } from 'linter'
import type * as ESTree from 'estree'

// A "JSLint" describes AST-queries on JavaScript ASTs.
// In this case, we're looking for a call-expression node
// and then running some checks on it.
const detectSqlInjection: JSLint = (linterContext) => {
  // The taint checker is a property of the linter-context
  // that can be accessed by any lint function.
  const { taintChecker } = linterContext

  return {
    CallExpression(node: ESTree.CallExpression) {
      // a "CallExpression" node has two children:
      // 1. callee -> the function being called.
      // 2. arguments -> the list of arguments it's being passed.
      const { callee, arguments: args } = node

      if (
        isDbQuery(callee) &&
        args.length > 0 &&
        // In most security checks, this function does the heavy lifting.
        taintChecker.isExpressionTainted(args[0])
      ) {
        linterContext.raiseIssue(node, 'Possible SQL Injection!')
      }
    }
  }
}

/**
 * @returns `true` if the input node is a call to some function like
 * `db.runQuery`.
 */
function isDbQuery(node: ESTree.Expression): boolean {
  // ...
}

This is a simplified version of the check we use to detect SQL injection. What matters to us is the call to isExpressionTainted. Its internals are implemented as shown below:

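class TaintChecker {
  // memoizes the verdict for every AST node that has already been checked
  private readonly cache = new Map<ESTree.Node, boolean>()

  /**
   * @returns `true` if [node] is (possibly) tainted.
   */
  public isExpressionTainted(node: ESTree.Node): boolean {
    // This AST node has already been covered by a previous run
    const cached = this.cache.get(node)
    if (cached !== undefined) {
      return cached
    }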

    let isTainted = false
    switch (node.type) {
      case 'BinaryExpression': {
        isTainted = this.isExpressionTainted(node.left) || this.isExpressionTainted(node.right)
        break
      }

      case 'Identifier': {
        // identifiers are handled separately by taint rules
        isTainted = this.isIdTainted(node)
        break
      }

      case 'TemplateLiteral': {
        // a template literal is tainted if at least one of its sub-expressions is.
        isTainted = node.expressions.some(this.isExpressionTainted.bind(this))
        break
      }
      // ... cases to handle all expression nodes
    }

    // cache the return value for future early-exits.
    this.cache.set(node, isTainted)
    return isTainted
  }
}

Much like a linter, the taint checker recursively visits an AST using the visitor pattern, except that it does so in reverse: we start from a node and traverse the AST upwards to find out where an expression originates.

For every node that we visit, we check if there’s a corresponding rule defined for it. A rule is a function that accepts a node (and other utils) as input, and returns true if the node is tainted:

type TaintRule = (_this: TaintChecker, node: ESTree.Node, ctx: LinterContext) => boolean

The taint checker maintains a mapping from node-types to rule sets:

class TaintChecker {
  // maps a node-type to a list of taint rules.
  // e.g: "CallExpression" -> [rule1, rule2, rule3]
  private readonly ruleSet = new Map<string, TaintRule[]>()

  // ...
}

Every time an identifier is visited by the isExpressionTainted method, we call a helper called isIdTainted:

class TaintChecker {
  public isIdTainted(node: ESTree.Identifier): boolean {
    const rules = this.ruleSet.get('Identifier')
    if (!rules) return false
    // An Identifier is tainted if at least one rule flags it as such.
    return rules.some((rule) => rule(this, node, this.linterContext))
  }
  // ...
}

Similarly, we can have a set of rules for call-expressions (to discern if they propagate taint), and other AST nodes.
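As an illustration of what call-expression rules could look like, here is a small sketch. It is not DeepSource’s implementation: `MiniTaintChecker` is a stripped-down stand-in for the taint checker, rules take only `(checker, node)`, and `getUntrustedUsers` is a made-up function name that we pretend is a known taint source.

```javascript
// Hypothetical stand-in for the taint checker: only rule lookup and dispatch.
class MiniTaintChecker {
  constructor() {
    this.ruleSet = new Map() // node type -> list of rules
  }

  addRule(nodeType, rule) {
    const rules = this.ruleSet.get(nodeType) || []
    rules.push(rule)
    this.ruleSet.set(nodeType, rules)
  }

  // A node is tainted if at least one rule for its type flags it.
  isTainted(node) {
    const rules = this.ruleSet.get(node.type) || []
    return rules.some((rule) => rule(this, node))
  }
}

// Rule 1: a direct call to `getUntrustedUsers()` is a taint source.
const knownSourceCall = (_checker, node) =>
  node.callee.type === 'Identifier' && node.callee.name === 'getUntrustedUsers'

// Rule 2: `x.find(...)` propagates taint from `x` to its result.
const findPropagates = (checker, node) =>
  node.callee.type === 'MemberExpression' &&
  node.callee.property.type === 'Identifier' &&
  node.callee.property.name === 'find' &&
  checker.isTainted(node.callee.object)

const checker = new MiniTaintChecker()
checker.addRule('CallExpression', knownSourceCall)
checker.addRule('CallExpression', findPropagates)

// Builds a hand-written AST for `<object>.find()`.
const findCallOn = (object) => ({
  type: 'CallExpression',
  callee: {
    type: 'MemberExpression',
    object,
    property: { type: 'Identifier', name: 'find' }
  },
  arguments: []
})

const sourceCall = {
  type: 'CallExpression',
  callee: { type: 'Identifier', name: 'getUntrustedUsers' },
  arguments: []
}
const emptyArray = { type: 'ArrayExpression', elements: [] }

console.log(checker.isTainted(findCallOn(sourceCall))) // true
console.log(checker.isTainted(findCallOn(emptyArray))) // false
```

The propagator rule calls back into the checker for the receiver of the call, which is how taint is followed through a chain of calls.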

Extending a taint checker with rules

If you’ve used ESLint enough, you’ll know that it allows users to author their own rules. Since our taint checker follows a similar architecture, we can extend its coverage by writing more custom taint rules.

Notice how we’re covering our bases with two sets of rules:

A set of lints to cover taint sinks.
A set of taint rules to cover taint sources.

This design implies a many-to-many relationship between sources and sinks, and has another added benefit: When we add a new taint source to the ruleset, we end up improving coverage for every issue.

Increasing coverage for security scanners

We’ve built a solid foundation for detecting taint. Now then, where do we look for vulnerabilities to source our taint rules from?

Thankfully, the good people at the OWASP Foundation have curated a list of the top 10 critical security risks. This standard is updated periodically, and serves as a good starting point for those looking to secure their applications against common attacks.

The MITRE Corporation also maintains a database of CWEs (Common Weakness Enumeration entries) that vulnerability scanners can use to evaluate coverage.

Currently, DeepSource supports standards like OWASP and SANS, and will continue to increase CWE coverage for its security scanning mechanisms.

