Security Code Review Using CAT.NET - Part 2

Hi, Andreas Fuchsberger here again.

How does CAT.NET work?

As I mentioned in Part 1, CAT.NET is an information-flow static analysis tool built on an implementation of tainted-variable analysis.

Tainted-variable analysis addresses an integrity problem: it tries to identify whether less-trusted data obtained from the user might influence other data that the system trusts. Clearly, to do this analysis, sources and sinks of possibly tainted data need to be identified. For managed code, this amounts to identifying methods that originate a tainted value and methods that use a possibly tainted value. CAT.NET uses a number of user-editable XML configuration files to define sources and sinks. CAT.NET then needs to find where information is stored in a variable and where it is used later in any other module of the application.
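To make the idea of sources and sinks concrete, here is a minimal sketch of taint propagation over a toy straight-line program. It is not how CAT.NET is implemented; the method names in SOURCES and SINKS and the tuple format of PROGRAM are assumptions made up for illustration, and the real rules live in the XML configuration files.

```python
# Hypothetical rule set: these method names are illustrative only,
# not taken from CAT.NET's configuration files.
SOURCES = {"read_query_string"}
SINKS = {"execute_sql"}

# A toy straight-line program: (target variable, called method, argument variables).
PROGRAM = [
    ("name", "read_query_string", []),    # 0: name = read_query_string()
    ("greeting", "concat", ["name"]),     # 1: greeting = concat(name)
    ("limit", "constant", []),            # 2: limit = constant()
    (None, "execute_sql", ["greeting"]),  # 3: execute_sql(greeting)
    (None, "execute_sql", ["limit"]),     # 4: execute_sql(limit)
]


def find_tainted_sinks(program, sources, sinks):
    """Return the indices of statements where a tainted variable reaches a sink."""
    tainted = set()
    findings = []
    for index, (target, method, args) in enumerate(program):
        uses_taint = any(arg in tainted for arg in args)
        if method in sinks and uses_taint:
            findings.append(index)
        if target is not None:
            if method in sources or uses_taint:
                tainted.add(target)      # value comes from a source or from tainted data
            else:
                tainted.discard(target)  # variable was overwritten with clean data
    return findings


findings = find_tainted_sinks(PROGRAM, SOURCES, SINKS)
print(findings)
```

Only statement 3 is flagged, because greeting is derived from name, which came from a source, while limit never touches user data.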

CAT.NET uses the Common Compiler Infrastructure (CCI), which is used extensively within Microsoft for building compiler-like tools. CCI is an integrated set of components that encapsulate the logic that compilers and related development tools typically have in common. CCI has many features, but first and foremost for CAT.NET it can directly read the Common Intermediate Language (CIL) used to store binary code in a .NET Framework assembly.

Further, to perform its analysis CAT.NET needs to build a specific heap analysis called flow-insensitive points-to analysis. This analysis computes a “may point to” relation over a loaded assembly or assemblies. We’ll call this relation pointsTo, where pointsTo(o1.f, o2) means that the field f of the object named o1 might refer to the object named o2 in some execution of the program. A may-point-to relation is also computed for local variables: pointsTo(υ, o) means that the local variable υ might refer to the object named o. The relation pointsTo(υ.f, o) holds if there exists an o’ such that pointsTo(υ, o’) and pointsTo(o’.f, o).
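The last rule above, which derives pointsTo(υ.f, o) from the two base relations, can be sketched in a few lines. The facts in var_points_to and field_points_to are hypothetical and hand-written here; in a real analysis they would be computed from the assembly.

```python
# Hypothetical facts for a tiny program; all names are illustrative.
# var_points_to[v] = objects that local variable v may refer to.
var_points_to = {
    "v": {"o1", "o2"},
    "w": {"o3"},
}
# field_points_to[(o, f)] = objects that field f of object o may refer to.
field_points_to = {
    ("o1", "f"): {"o4"},
    ("o2", "f"): {"o5"},
    ("o3", "g"): {"o1"},
}


def points_to_field(variable, field):
    """Return every o such that pointsTo(variable.field, o) holds."""
    result = set()
    # For each intermediate object o' with pointsTo(variable, o') ...
    for intermediate in var_points_to.get(variable, set()):
        # ... collect every o with pointsTo(o'.field, o).
        result |= field_points_to.get((intermediate, field), set())
    return result


print(sorted(points_to_field("v", "f")))
```

Because the relation is a "may" relation, v.f is reported as possibly referring to both o4 and o5, even though any single execution would see only one of them.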

CAT.NET uses a combination of Control Flow and Data Flow Graphs to build the relation for every object in every module supplied to CAT.NET.

Control Flow Graphs

A control flow graph (CFG) is a representation of a program where contiguous regions of code without branches, known as basic blocks, are represented as nodes in a graph and edges between nodes indicate the possible flow of the program. A CFG shows the sequence of events as a program executes.
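As an illustration of basic blocks and edges, the sketch below splits a toy instruction list into blocks and connects them. The instruction format (a kind plus an optional target index) is an assumption invented for this example; it is not CIL and not the CCI API, and the function expects a non-empty instruction list.

```python
def build_cfg(instructions):
    """Split instructions into basic blocks and connect them with edges.

    Each instruction is (kind, target): kind is "op", "jump" (unconditional),
    "branch" (conditional) or "ret"; target is an instruction index or None.
    Returns (blocks, edges), where blocks are half-open index ranges.
    """
    # A leader starts a block: the first instruction, any jump target,
    # and any instruction that follows a jump, branch or return.
    leaders = {0}
    for index, (kind, target) in enumerate(instructions):
        if kind in ("jump", "branch"):
            leaders.add(target)
        if kind in ("jump", "branch", "ret") and index + 1 < len(instructions):
            leaders.add(index + 1)

    starts = sorted(leaders)
    blocks = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(instructions)
        blocks.append((start, end))

    block_of = {start: number for number, (start, _) in enumerate(blocks)}
    edges = set()
    for number, (start, end) in enumerate(blocks):
        kind, target = instructions[end - 1]  # last instruction decides the exits
        if kind in ("jump", "branch"):
            edges.add((number, block_of[target]))
        if kind in ("op", "branch") and end < len(instructions):
            edges.add((number, block_of[end]))  # fall through to the next block
    return blocks, edges


TOY_PROGRAM = [
    ("op", None),    # 0: x = read()
    ("branch", 4),   # 1: if x goto 4
    ("op", None),    # 2: y = 1
    ("jump", 5),     # 3: goto 5
    ("op", None),    # 4: y = 2
    ("ret", None),   # 5: return y
]

blocks, edges = build_cfg(TOY_PROGRAM)
print(blocks)
print(sorted(edges))
```

The if/else shape of the toy program shows up as a diamond: block 0 branches to blocks 1 and 2, and both join again at block 3.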

Data Flow Graphs

A data-flow graph (DFG) is a graph that represents operations, their data dependencies, and the order in which the operations are performed. As such, any algorithm consists of a number of ordered operations. However, simple DFGs are not able to represent loops or subroutine branching. Data Flow Graphs are therefore often augmented with control-flow information and are then known as Control Data Flow Graphs (CDFG). A DFG consists of nodes and arcs, where each node has an input or an output port and an arc represents a connection between an input and an output port.
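A simple DFG for straight-line code can be sketched as follows. Each operation is a node, and an arc runs from the operation that last wrote a value to each operation that reads it. The operation format is an assumption for illustration, and, as noted above, this simple form cannot express loops or calls.

```python
def build_dfg(operations):
    """Build data-flow arcs for straight-line code.

    operations is a list of (output name, list of input names).
    Returns a list of arcs (producer index, consumer index).
    """
    last_writer = {}  # variable name -> index of the operation that last wrote it
    arcs = []
    for index, (output, inputs) in enumerate(operations):
        for name in inputs:
            if name in last_writer:
                arcs.append((last_writer[name], index))
        last_writer[output] = index
    return arcs


OPERATIONS = [
    ("a", []),          # 0: a = input
    ("b", []),          # 1: b = input
    ("c", ["a", "b"]),  # 2: c = a + b
    ("d", ["c", "a"]),  # 3: d = c * a
]

arcs = build_dfg(OPERATIONS)
print(arcs)
```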

Data Flow Super Graphs

As defined by CAT.NET, a data flow super graph is a special type of data flow graph that contains data flow information at both an intra-procedural and inter-procedural level.

CCI provides functions for building the Data Flow and Control Flow Graphs on an intra-procedural level, and CAT.NET uses these to build a Data Flow Super Graph. The Data Flow Super Graph that CAT.NET builds covers all objects across methods in all modules on an inter-procedural level.

Once the Data Flow Super Graph is built, CAT.NET iterates over each of the XML files that make up the CAT.NET rules and searches the Data Flow Super Graph for all data flow paths between the sources and the sinks. It does this by traversing each path in the Data Flow Super Graph and colouring the graph (i.e. assigning a constant to a traversed path) according to the variable's use.
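The search for source-to-sink paths can be pictured with a plain depth-first traversal. This is only a sketch under assumptions: the node names, the adjacency-dictionary representation, and the single rule are all hypothetical, and it does not reproduce CAT.NET's colouring scheme.

```python
def find_paths(graph, sources, sinks):
    """Return every cycle-free path that starts at a source and ends at a sink."""
    paths = []

    def walk(node, path):
        if node in sinks:
            paths.append(path)
            return
        for successor in graph.get(node, []):
            if successor not in path:  # do not revisit a node on the current path
                walk(successor, path + [successor])

    for source in sorted(sources):
        walk(source, [source])
    return paths


# Hypothetical super graph: arcs inside a method and across method boundaries.
SUPER_GRAPH = {
    "Request.QueryString": ["Page.Load::name"],
    "Page.Load::name": ["Helper.Format::text"],     # argument passed to another method
    "Helper.Format::text": ["Helper.Format::return"],
    "Helper.Format::return": ["Page.Load::html"],   # return value flows back
    "Page.Load::html": ["Response.Write"],
    "Config.Read": ["Page.Load::title"],            # never reaches a sink
}

# One hypothetical rule: a set of sources and a set of sinks.
RULE_SOURCES = {"Request.QueryString"}
RULE_SINKS = {"Response.Write"}

paths = find_paths(SUPER_GRAPH, RULE_SOURCES, RULE_SINKS)
for path in paths:
    print(" -> ".join(path))
```

The one path found crosses from Page.Load into Helper.Format and back, which is the kind of flow an intra-procedural graph alone would miss.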

Before reporting that a source is linked to a sink, CAT.NET checks how the variable is transformed and filters out valid transformations. Variables that remain tainted once a complete source-to-sink path has been traversed are reported as a possible vulnerability in the original code, including file name and line numbers.
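The filtering step can be sketched as dropping any source-to-sink path that passes through a transformation considered valid. The paths and the VALID_TRANSFORMATIONS set below are hypothetical examples, not CAT.NET's rule set, and a real report would also carry the file name and line numbers.

```python
# Hypothetical set of transformations that are considered to remove taint.
VALID_TRANSFORMATIONS = {"HtmlEncode"}

# Hypothetical source-to-sink paths, as lists of node names.
CANDIDATE_PATHS = [
    ["Request.QueryString", "name", "HtmlEncode", "safe", "Response.Write"],
    ["Request.QueryString", "name", "Response.Write"],
]


def still_tainted(paths, valid_transformations):
    """Keep only the paths that never pass through a valid transformation."""
    return [
        path
        for path in paths
        if not any(node in valid_transformations for node in path)
    ]


reported = still_tainted(CANDIDATE_PATHS, VALID_TRANSFORMATIONS)
for path in reported:
    print("Possible vulnerability:", " -> ".join(path))
```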

In the next post I will explain the semantics of the XML rules and how to modify the supplied rule set.