- Published on
Graphs in static analysis
- Authors
- Name
- Laxman Vijay
Static analysis is an important part of compiler engineering. Static analysis are used in code optimization, security analysis, bug detection, etc. Graphs are important part of compilers. In this article, lets have a look at the various graphs used in compiler engineering especially for static analysis.
Here is a list of the various graphs used in static analysis:
- Control flow graphs (CFG)
- Dominator tree
- Abstract syntax tree (AST)
- Class hierarchy graph
- Call graph
- Type propagation graph
- Pointer assignment graph
- Interprocedural CFG (Supergraph)
- Exploded supergraph
- Value flow graph
Control flow graph:
A control flow graph (CFG) is a visual representation of the flow of control through the program. It shows the sequence of opertions that are to be performed during the execution of the program. CFGs are in fact a visual representation of an intermediate representation (IR) of the original source code. The CFGs are directed graphs.
A CFG consists of nodes which are called basic blocks. The edges between these basic blocks represent the flow of control across these basic blocks. An edge could be created due to an if condition or a loop in the program.
Each basic block can have an entry and exit point. The entire graph can also have one entry and exit point.
Depending on the usecase, merge points after a branch can also be modelled as a separate node.
The construction of an efficient IR/CFG usually requires usage of a Single Static Assignment (SSA) form. SSA form does not allow reassignment to variables and usage of a phi-node at merge points.
Analyzing the flow of data through a CFG is called data flow analysis. Analysis which use the CFG are also called flow sensitive analysis because they utilize the program flow to perform the analysis.
Dominator tree:
A dominator tree is a variation of control flow graphs (CFG) that clarifies the domination relationship between basic blocks in a CFG. A block A is dominated by block B if every path from CFG's entry point to B must pass through A.
Dominance can be modelled as pre and post dominance.
Dominator trees are used to compute the SSA form and also in certain data flow analyses.
Abstract syntax tree
An Abstract Syntax Tree (AST) is used to represent the program structure. The AST is typically used in static analysis for performing type analysis.
An AST is just a representation of the program. It does not however represent the flow of control. Thus, analysis which use AST are called flow insensitive analysis.
Type analysis is a flow insensitive analysis. Usually an algorithm called unification is used.
AST construction is the first step of compiler construction. This is done by using a lexer and a parser. The CFG and the IR is later built from the AST.
Class hierarchy graph:
Class hierarchy is a DAG structure that represents the classes and its hierarchy. This is common in object oriented languages like Java.
Class hierarchy information can be obtained through Reflection and then a graph can be constructed using the information. Such a hierarchy graph is useful in the later stages of static analysis such as call graph construction.
Call graph:
A call graph is a multigraph which models the method/function invocations from a program. It is also directed.
Generating the call graph is the first step for interprocedural analysis. Call graphs can be generated statically or dynamically. Static generation involves generation of the call graph by traversing the source code directly. Dynamic call graph is generated on the fly during the program exection. This is because for large programs there can be several thousands of method invocations and statically building the graph might become resource constrained.
There are several algorithms to construct a call graph. Many of them depend on the presence of a class hierarchy graph. Some examples of construction algorithms include Class hierarchy analysis, Rapid type analysis, Variable type analysis, Declared type analysis and SPARK.
Type propagation graph:
Type propagation graph is used in the construction of call graph using variable type analysis. Type propagation graphs model the variable types and their aliases.
Pointer assignment graph:
A Pointer assignment graph (PAG) is a directed graph used in pointer analysis to model how pointers are assigned, referenced and dereferenced in a program.
Such PAG graphs are then used for call graph construction.
Interprocedural CFG:
Interprocedural CFGs (ICFGs) are also called as Supergraphs in static analysis. Supergraphs model interprocedural flow in a program. They are a natural extension to CFGs and in addition to the regular flow of the CFG, function calls within the CFG are also considered.
This adds new types of edges like call flow and return flow. Such analysis that include interprocedural flows are also called context sensitive analysis.
Exploded supergraph:
Exploded supergraphs are a variation of ICFGs where the ICFG is extended to all the variables in the analysis domain. This is used in distributive functional interprocedural analyses like IFDS (Interprocedural finite distributive subset) and IDE (Interprocedural distributive environment) frameworks.
Functional approach in interprocedural analysis is a method wherein a summary of a function is construction (This includes the input parameters, return values, etc.).
This summary is customizable and can be modified per needs. Frameworks like IFDS, IDE, Sparse IFDS, etc. model these summaries as graphs. The analysis then becomes a graph reachability problem.
Value flow graph:
A value flow graph models the usage of variables and its aliases throughout the program. Most of the times, the effect of a variable can be directly modelled by observing its usage. This model helps optimizing certain analysis like IFDS.