-
Notifications
You must be signed in to change notification settings - Fork 6
Control Flow Graph
This document was generated from 'src/documentation/print-cfg-wiki.ts' on 2025-05-06, 18:48:23 UTC presenting an overview of flowR's control flow graph (v2.2.12, using R v4.4.3). Please do not edit this file/wiki page directly.
flowR produces three main perspectives of the program: 1) a normalized version of the AST and 2) a dataflow graph, and 3) a control flow graph (CFG). flowR uses this CFG interweaved with its data flow analysis and for some of its queries (e.g., to link to the last call in a Call-Context Query).
Please note that, mostly due to historical reasons, the control dependencies that are stored directly within the DFG provide only a partial view of the CFG. While they provide you with information on the conditional execution of vertices, they do not encode the order of execution. In contrast, the CFG describes a complete view of the program's control flow.
Tip
If you want to investigate the Control Flow Graph,
you can use the :controlflow*
command in the REPL (see the Interface wiki page for more information).
By default, this view does not use basic blocks as, for example, R allows unconditional jumps to occur in spots where conventional languages would assume expressions (e.g., if-conditions).
Yet, by using :controlflowbb*
you can inspect the CFG with basic blocks (although you have to keep in mind that now, there can be a value flow between basic blocks)
For readability, we structure this wiki page into various segments:
For now, let's look at a CFG for a program without any branching:
x <- 2 * 3 + 1
The corresponding CFG is a directed, labeled graph with two types of edges (control and flow dependencies).
flowchart RL
n7(["`RExpressionList (7)`"])
n0(["`RSymbol (0)
#34;x#34;`"])
n1(["`RNumber (1)
#34;2#34;`"])
n2(["`RNumber (2)
#34;3#34;`"])
n3(["`RBinaryOp (3)
#34;2 #42; 3#34;`"])
n3-exit((3-exit))
n4(["`RNumber (4)
#34;1#34;`"])
n5(["`RBinaryOp (5)
#34;2 #42; 3 #43; 1#34;`"])
n5-exit((5-exit))
n6(["`RBinaryOp (6)
#34;x #60;#45; 2 #42; 3 #43; 1#34;`"])
n6-exit((6-exit))
n7-exit((7-exit))
n6 -.->|"FD"| n7
n2 -.->|"FD"| n1
n1 -.->|"FD"| n3
n3-exit -.->|"FD"| n2
n4 -.->|"FD"| n3-exit
n3 -.->|"FD"| n5
n5-exit -.->|"FD"| n4
n5 -.->|"FD"| n0
n0 -.->|"FD"| n6
n6-exit -.->|"FD"| n5-exit
n7-exit -.->|"FD"| n6-exit
style n7 stroke:cyan,stroke-width:6.5px; style n7-exit stroke:green,stroke-width:6.5px;
(The analysis required 13.1 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment.
We used the following simplification: unique-cf-sets
.
)
Important
As the edges describe dependencies they point in the inverse order of execution (which is very helpful for backward analyses)! The visitors abstract away from this and there is no harm in considering an inverted CFG. Yet, you should keep this in mind!
Every normalized node of the normalized AST that has any relevance to the
execution is added and automatically linked using its id (similarly to vertices of the dataflow graph).
Expressions, such as 2 * 3
get an additional node with an artificial id that ends in -exit
to mark whenever their calculation is over.
To gain a better understanding, let's have a look at a simple program with a single branching structure:
flowchart RL
n6(["`RExpressionList (6)`"])
n5["`RIfThenElse (5)
#34;if(u) 3 else 2#34;`"]
n5-condition[[5-condition]]
n5-exit((5-exit))
n0(["`RSymbol (0)
#34;u#34;`"])
n2(["`RExpressionList (2)
#34;3#34;`"])
n1(["`RNumber (1)
#34;3#34;`"])
n2-exit((2-exit))
n4(["`RExpressionList (4)
#34;2#34;`"])
n3(["`RNumber (3)
#34;2#34;`"])
n4-exit((4-exit))
n6-exit((6-exit))
n5 -.->|"FD"| n6
n1 -.->|"FD"| n2
n2-exit -.->|"FD"| n1
n3 -.->|"FD"| n4
n4-exit -.->|"FD"| n3
n5-condition -.->|"FD"| n0
n2 -->|"CD (TRUE)"| n5-condition
n4 -->|"CD (FALSE)"| n5-condition
n0 -.->|"FD"| n5
n5-exit -.->|"FD"| n2-exit
n5-exit -.->|"FD"| n4-exit
n6-exit -.->|"FD"| n5-exit
style n6 stroke:cyan,stroke-width:6.5px; style n6-exit stroke:green,stroke-width:6.5px;
R Code of the CFG
The analysis required 3.9 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment.
We used the following simplification: unique-cf-sets
.
if(u) 3 else 2
Here, you can see the if
node followed by the condition (in this case merely u
) that then splits into two branches for the two possible outcomes.
The if
structure is terminated by the corresponding -exit
node (see the structure section for more details).
For you to compare, the following shows the CFG of an if
without an else
branch:
flowchart RL
n6(["`RExpressionList (6)`"])
n5["`RIfThenElse (5)
#34;if(u || v) 3#34;`"]
n5-condition[[5-condition]]
n5-exit((5-exit))
n0(["`RSymbol (0)
#34;u#34;`"])
n1(["`RSymbol (1)
#34;v#34;`"])
n2(["`RBinaryOp (2)
#34;u || v#34;`"])
n2-exit((2-exit))
n4(["`RExpressionList (4)
#34;3#34;`"])
n3(["`RNumber (3)
#34;3#34;`"])
n4-exit((4-exit))
n6-exit((6-exit))
n5 -.->|"FD"| n6
n1 -.->|"FD"| n0
n0 -.->|"FD"| n2
n2-exit -.->|"FD"| n1
n3 -.->|"FD"| n4
n4-exit -.->|"FD"| n3
n5-condition -.->|"FD"| n2-exit
n4 -->|"CD (TRUE)"| n5-condition
n2 -.->|"FD"| n5
n5-exit -.->|"FD"| n4-exit
n5-exit -->|"CD (FALSE)"| n5-condition
n6-exit -.->|"FD"| n5-exit
style n6 stroke:cyan,stroke-width:6.5px; style n6-exit stroke:green,stroke-width:6.5px;
R Code of the CFG
The analysis required 2.8 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment.
We used the following simplification: unique-cf-sets
.
if(u || v) 3
Activating the calculation of basic blocks produces the following:
flowchart RL
subgraph nbb-5-condition [Block bb-5-condition]
direction RL
n5-condition[[5-condition]]
n2-exit((2-exit))
n5-condition -.-> n2-exit
n1["`RSymbol (1)
#34;v#34;`"]
n2-exit -.-> n1
n0["`RSymbol (0)
#34;u#34;`"]
n1 -.-> n0
n2["`RBinaryOp (2)
#34;u || v#34;`"]
n0 -.-> n2
n5["`RIfThenElse (5)
#34;if(u || v) 3#34;`"]
n2 -.-> n5
n6["`RExpressionList (6)`"]
n5 -.-> n6
end
subgraph nbb-4-exit [Block bb-4-exit]
direction RL
n4-exit((4-exit))
n3["`RNumber (3)
#34;3#34;`"]
n4-exit -.-> n3
n4["`RExpressionList (4)
#34;3#34;`"]
n3 -.-> n4
end
subgraph nbb-6-exit [Block bb-6-exit]
direction RL
n6-exit((6-exit))
n5-exit((5-exit))
n6-exit -.-> n5-exit
end
nbb-6-exit -.->|"FD"| nbb-4-exit
nbb-6-exit -->|"CD (FALSE)"| nbb-5-condition
nbb-4-exit -->|"CD (TRUE)"| nbb-5-condition
style nbb-5-condition stroke:cyan,stroke-width:6.5px; style nbb-6-exit stroke:green,stroke-width:6.5px;
R Code of the CFG
The analysis required 3.0 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment.
We used the following simplifications: unique-cf-sets
, to-basic-blocks
.
if(u || v) 3
Which is probably much more readable if compacted (although the reconstucted code can sometimes be slightly mislieading as flowR tries its best to make it syntactically correct and hence add closing braces etc. which are technically not part of the respective block):
flowchart RL
nbb-5-condition[["`Basic Block (bb-5-condition)
if(u || v) #123; #125;`"]]
nbb-4-exit[["`Basic Block (bb-4-exit)
3`"]]
nbb-6-exit[["`Basic Block (bb-6-exit)
`"]]
nbb-6-exit -.->|"FD"| nbb-4-exit
nbb-6-exit -->|"CD (FALSE)"| nbb-5-condition
nbb-4-exit -->|"CD (TRUE)"| nbb-5-condition
style nbb-5-condition stroke:cyan,stroke-width:6.5px; style nbb-6-exit stroke:green,stroke-width:6.5px;
R Code of the CFG
The analysis required 2.2 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment.
We used the following simplifications: unique-cf-sets
, to-basic-blocks
and render a simplified/compacted version.
if(u || v) 3
The control flow graph also harmonizes with function definitions, and calls:
flowchart RL
n9(["`RExpressionList (9)`"])
n0(["`RSymbol (0)
#34;f#34;`"])
n5-params[[5-params]]
n5-exit((5-exit))
n5(["`RFunctionDefinition (5)
#34;function() #123; 3 #125;#34;`"])
n4(["`RExpressionList (4)`"])
n3(["`RNumber (3)
#34;3#34;`"])
n4-exit((4-exit))
n6(["`RBinaryOp (6)
#34;f #60;#45; function() #123; 3 #125;#34;`"])
n6-exit((6-exit))
n7(["`RSymbol (7)
#34;f()#34;`"])
n8["`RFunctionCall (8)
#34;f()#34;
calls:#91;5#93;`"]
n8-name[[8-name]]
n8-exit((8-exit))
n8-resolved-call-exit((8-resolved-call-exit))
n9-exit((9-exit))
n6 -.->|"FD"| n9
n3 -.->|"FD"| n4
n4-exit -.->|"FD"| n3
n5-params -.->|"FD"| n5
n4 -.->|"FD"| n5-params
n5-exit -.->|"FD"| n4-exit
n5 -.->|"FD"| n0
n0 -.->|"FD"| n6
n6-exit -.->|"FD"| n5
n8 -.->|"FD"| n6-exit
n7 -.->|"FD"| n8
n8-name -.->|"FD"| n7
n8-exit -.->|"FD"| n8-name
n8-resolved-call-exit -.->|"FD"| n8-exit
n8-resolved-call-exit -.->|"FD"| n5-exit
n9-exit -.->|"FD"| n8-resolved-call-exit
style n9 stroke:cyan,stroke-width:6.5px; style n9-exit stroke:green,stroke-width:6.5px;
R Code of the CFG
The analysis required 6.8 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment.
We used the following simplification: unique-cf-sets
.
f <- function() { 3 }
f()
You can produce your very own control flow graph with extractCFG
.
The ControlFlowGraph
class describes everything required to model the control flow graph, with its edge types described by
CfgEdge
and its vertices by CfgSimpleVertex
.
However, you should be aware of the ControlFlowInformation
interface which adds some additional information the the CFG
(and is used during the construction of the CFG as well):
-
ControlFlowInformation
Summarizes the control information of a programDefined at ./src/control-flow/control-flow-graph.ts#L335
/** Summarizes the control information of a program */ export interface ControlFlowInformation<Vertex extends CfgSimpleVertex = CfgSimpleVertex> extends MergeableRecord { /** all active 'return'(-like) unconditional jumps */ returns: NodeId[], /** all active 'break'(-like) unconditional jumps */ breaks: NodeId[], /** all active 'next'(-like) unconditional jumps */ nexts: NodeId[], /** intended to construct a hammock graph, with 0 exit points representing a block that should not be part of the CFG (like a comment) */ entryPoints: NodeId[], /** See {@link ControlFlowInformation#entryPoints|entryPoints} */ exitPoints: NodeId[], /** the control flow graph summarizing the flow information */ graph: ControlFlowGraph<Vertex> }
View more (MergeableRecord)
-
Defined at ./src/util/objects.ts#L11
export type MergeableRecord = Record<string, unknown>
-
Defined at ./src/statistics/features/common-syntax-probability.ts#L29
string: Record<string, Measurement>
-
Defined at ./src/statistics/features/common-syntax-probability.ts#L12
export interface CommonSyntaxTypeCounts<Measurement=bigint> { // just a helper to collect all as well (could be derived from sum) total: Measurement, // counts whenever you pass more than one node that is not sensible for any other category multiple: Measurement, // similar to multiple, but only counts empty (bodies etc.) empty: Measurement, // in case of a = x etc. withArgument: Measurement, // arguments used without value noValue: Measurement, // does include t and f, as well as NULL etc. (any special symbol) singleVar: Record<string, Measurement> number: Record<number, Measurement> // only explicit integers integer: Record<number, Measurement> complex: Record<number, Measurement> string: Record<string, Measurement> logical: Record<typeof RTrue | typeof RFalse, Measurement>, call: Record<string, Measurement>, unnamedCall: Measurement, // binop includes all assignments! binOp: Record<string, Measurement>, unaryOp: Record<string, Measurement>, // unknown content, records lexeme (can include break etc. for bodies), due to my oversight, this includes function definitions other: Record<string, Measurement> }
-
-
Defined at ./src/dataflow/graph/graph.ts#L386
unknown
-
-
To check whether the CFG has the expected shape, you can use the test function assertCfg
which supports testing for
sub-graphs as well (it provides diffing capabilities similar to assertDataflow
).
As the CFG may become unhandy for larger programs, there are simplifications available with simplifyControlFlowInformation
(these can be passed on to the extractCFG
function as well).
All vertex types are summarized in the CfgVertexType
enum which currently contains the following types:
-
MidMarker
(mid) -
EndMarker
(end) -
Statement
(stm) -
Expression
(expr) -
Block
(blk)
We use the CfgBasicBlockVertex
to represent basic blocks and separate
expressions (CfgExpressionVertex
) and statements (CfgStatementVertex
)
as control flow units with and without side effects (if you want to, you can see view statements as effectful expressions).
The markers (CfgMidMarkerVertex
and CfgEndMarkerVertex
)
indicate specific segments of larger expressions/statements (e.g., an if
which has a condition and its branches).
To signal these links, the expressions and statements contain information about the attached markers:
-
Defined at ./src/control-flow/control-flow-graph.ts#L51
interface CfgWithMarker extends CfgBaseVertex { /** mid-markers linked to this statement */ mid?: NodeId[] /** end-markers linked to this statement */ end?: NodeId[] }
View more (CfgBaseVertex)
-
CfgBaseVertex
A plain vertex in theControlFlowGraph
. Please useCfgSimpleVertex
to refer to all potential vertex types within the graph.Defined at ./src/control-flow/control-flow-graph.ts#L40
/** * A plain vertex in the {@link ControlFlowGraph}. * Please use {@link CfgSimpleVertex} to refer to all potential vertex types within the graph. */ interface CfgBaseVertex extends MergeableRecord { /** the type of the vertex */ type: CfgVertexType, /** the id of the vertex, for non-blocks this should directly relate to the AST node */ id: NodeId, /** child nodes attached to this one */ children?: NodeId[], /** if the vertex calls a function, this links all targets of this call */ callTargets?: Set<NodeId>, }
-
Defined at ./src/util/objects.ts#L11
export type MergeableRecord = Record<string, unknown>
-
Defined at ./src/statistics/features/common-syntax-probability.ts#L29
string: Record<string, Measurement>
-
Defined at ./src/statistics/features/common-syntax-probability.ts#L12
export interface CommonSyntaxTypeCounts<Measurement=bigint> { // just a helper to collect all as well (could be derived from sum) total: Measurement, // counts whenever you pass more than one node that is not sensible for any other category multiple: Measurement, // similar to multiple, but only counts empty (bodies etc.) empty: Measurement, // in case of a = x etc. withArgument: Measurement, // arguments used without value noValue: Measurement, // does include t and f, as well as NULL etc. (any special symbol) singleVar: Record<string, Measurement> number: Record<number, Measurement> // only explicit integers integer: Record<number, Measurement> complex: Record<number, Measurement> string: Record<string, Measurement> logical: Record<typeof RTrue | typeof RFalse, Measurement>, call: Record<string, Measurement>, unnamedCall: Measurement, // binop includes all assignments! binOp: Record<string, Measurement>, unaryOp: Record<string, Measurement>, // unknown content, records lexeme (can include break etc. for bodies), due to my oversight, this includes function definitions other: Record<string, Measurement> }
-
-
Defined at ./src/dataflow/graph/graph.ts#L386
unknown
-
-
-
Similarly, the markers contain a link to their root:
-
Defined at ./src/control-flow/control-flow-graph.ts#L66
export interface CfgWithRoot extends CfgBaseVertex { /** the vertex for which this is a marker */ root: NodeId }
View more (CfgBaseVertex)
-
CfgBaseVertex
A plain vertex in theControlFlowGraph
. Please useCfgSimpleVertex
to refer to all potential vertex types within the graph.Defined at ./src/control-flow/control-flow-graph.ts#L40
/** * A plain vertex in the {@link ControlFlowGraph}. * Please use {@link CfgSimpleVertex} to refer to all potential vertex types within the graph. */ interface CfgBaseVertex extends MergeableRecord { /** the type of the vertex */ type: CfgVertexType, /** the id of the vertex, for non-blocks this should directly relate to the AST node */ id: NodeId, /** child nodes attached to this one */ children?: NodeId[], /** if the vertex calls a function, this links all targets of this call */ callTargets?: Set<NodeId>, }
-
Defined at ./src/util/objects.ts#L11
export type MergeableRecord = Record<string, unknown>
-
Defined at ./src/statistics/features/common-syntax-probability.ts#L29
string: Record<string, Measurement>
-
Defined at ./src/statistics/features/common-syntax-probability.ts#L12
export interface CommonSyntaxTypeCounts<Measurement=bigint> { // just a helper to collect all as well (could be derived from sum) total: Measurement, // counts whenever you pass more than one node that is not sensible for any other category multiple: Measurement, // similar to multiple, but only counts empty (bodies etc.) empty: Measurement, // in case of a = x etc. withArgument: Measurement, // arguments used without value noValue: Measurement, // does include t and f, as well as NULL etc. (any special symbol) singleVar: Record<string, Measurement> number: Record<number, Measurement> // only explicit integers integer: Record<number, Measurement> complex: Record<number, Measurement> string: Record<string, Measurement> logical: Record<typeof RTrue | typeof RFalse, Measurement>, call: Record<string, Measurement>, unnamedCall: Measurement, // binop includes all assignments! binOp: Record<string, Measurement>, unaryOp: Record<string, Measurement>, // unknown content, records lexeme (can include break etc. for bodies), due to my oversight, this includes function definitions other: Record<string, Measurement> }
-
-
Defined at ./src/dataflow/graph/graph.ts#L386
unknown
-
-
-
In mermaid visualizations, we use rectangles for statements, rounded rectangles for expressions, circles for exit markers and double-lined rectangles for mid markers. Blocks are visualized as boxes around the contained vertices.
Note
Every CFG vertex has a NodeId
that links it to the normalized AST (although basic blocks will find no counterpart as they are a structuring element of the CFG.
Additionally, it may provide information on the called functions (in case that the current element is a function call).
Have a look at the CfgBaseVertex
interface for more information.
The ControlFlowGraph
uses two types of edges to represent the control flow, separated by the CfgEdgeType
enum
and the two interfaces: CfgFlowDependencyEdge
and CfgControlDependencyEdge
.
The most common edge is the flow dependency (FD) which simply signals that the source vertex happens after the target vertex in the control flow.
So x; y
would produce a flow dependency from y
to x
(additionally to the program-enveloping root expression list):
flowchart RL
n2(["`RExpressionList (2)`"])
n0(["`RSymbol (0)
#34;x#34;`"])
n1(["`RSymbol (1)
#34;y#34;`"])
n2-exit((2-exit))
n0 -.->|"FD"| n2
n1 -.->|"FD"| n0
n2-exit -.->|"FD"| n1
style n2 stroke:cyan,stroke-width:6.5px; style n2-exit stroke:green,stroke-width:6.5px;
(The analysis required 1.5 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment.
We used the following simplification: unique-cf-sets
.
)
Control dependencies (CD) are used to signal that the execution of the source vertex depends on the taget vertex (which, e.g., is the condition of an if
statement or while
loop).
They contain additional information to signal when the source vertex is executed:
-
Defined at ./src/control-flow/control-flow-graph.ts#L109
interface CfgControlDependencyEdge extends MergeableRecord { label: CfgEdgeType.Cd /** the id which caused the control dependency */ caused: NodeId, /** is the control dependency satisfied with a true condition or is it negated (e.g., else-branch)? */ when: typeof RTrue | typeof RFalse }
View more (MergeableRecord)
-
Defined at ./src/util/objects.ts#L11
export type MergeableRecord = Record<string, unknown>
-
Defined at ./src/statistics/features/common-syntax-probability.ts#L29
string: Record<string, Measurement>
-
Defined at ./src/statistics/features/common-syntax-probability.ts#L12
export interface CommonSyntaxTypeCounts<Measurement=bigint> { // just a helper to collect all as well (could be derived from sum) total: Measurement, // counts whenever you pass more than one node that is not sensible for any other category multiple: Measurement, // similar to multiple, but only counts empty (bodies etc.) empty: Measurement, // in case of a = x etc. withArgument: Measurement, // arguments used without value noValue: Measurement, // does include t and f, as well as NULL etc. (any special symbol) singleVar: Record<string, Measurement> number: Record<number, Measurement> // only explicit integers integer: Record<number, Measurement> complex: Record<number, Measurement> string: Record<string, Measurement> logical: Record<typeof RTrue | typeof RFalse, Measurement>, call: Record<string, Measurement>, unnamedCall: Measurement, // binop includes all assignments! binOp: Record<string, Measurement>, unaryOp: Record<string, Measurement>, // unknown content, records lexeme (can include break etc. for bodies), due to my oversight, this includes function definitions other: Record<string, Measurement> }
-
-
Defined at ./src/dataflow/graph/graph.ts#L386
unknown
-
-
The extra caused
link signals the vertex that caused the control flow influence.
Example: if-else
flowchart RL
n6(["`RExpressionList (6)`"])
n5["`RIfThenElse (5)
#34;if(u) 3 else 2#34;`"]
n5-condition[[5-condition]]
n5-exit((5-exit))
n0(["`RSymbol (0)
#34;u#34;`"])
n2(["`RExpressionList (2)
#34;3#34;`"])
n1(["`RNumber (1)
#34;3#34;`"])
n2-exit((2-exit))
n4(["`RExpressionList (4)
#34;2#34;`"])
n3(["`RNumber (3)
#34;2#34;`"])
n4-exit((4-exit))
n6-exit((6-exit))
n5 -.->|"FD"| n6
n1 -.->|"FD"| n2
n2-exit -.->|"FD"| n1
n3 -.->|"FD"| n4
n4-exit -.->|"FD"| n3
n5-condition -.->|"FD"| n0
n2 -->|"CD (TRUE)"| n5-condition
n4 -->|"CD (FALSE)"| n5-condition
n0 -.->|"FD"| n5
n5-exit -.->|"FD"| n2-exit
n5-exit -.->|"FD"| n4-exit
n6-exit -.->|"FD"| n5-exit
style n6 stroke:cyan,stroke-width:6.5px; style n6-exit stroke:green,stroke-width:6.5px;
R Code of the CFG
The analysis required 1.8 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment.
We used the following simplification: unique-cf-sets
.
if(u) 3 else 2
Example: while-loop
flowchart RL
n4(["`RExpressionList (4)`"])
n0(["`RSymbol (0)
#34;u#34;`"])
n3["`RWhileLoop (3)
#34;while(u) b#34;`"]
n3-condition[[3-condition]]
n3-exit((3-exit))
n2(["`RExpressionList (2)
#34;b#34;`"])
n1(["`RSymbol (1)
#34;b#34;`"])
n2-exit((2-exit))
n4-exit((4-exit))
n3 -.->|"FD"| n4
n3 -.->|"FD"| n2-exit
n1 -.->|"FD"| n2
n2-exit -.->|"FD"| n1
n0 -.->|"FD"| n3
n3-condition -.->|"FD"| n0
n2 -->|"CD (TRUE)"| n3-condition
n3-exit -->|"CD (FALSE)"| n3-condition
n4-exit -.->|"FD"| n3-exit
style n4 stroke:cyan,stroke-width:6.5px; style n4-exit stroke:green,stroke-width:6.5px;
R Code of the CFG
The analysis required 7.3 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment.
We used the following simplification: unique-cf-sets
.
while(u) b
Please note that repeat loops do not have control dependencies, as they repeat their body unconditionally. Additionally, the control flow graph does not have to be connected. If you use a repeat without any exit condition, the corresponding exit markers are not reachable from the entry:
Example: repeat-loop (infinite)
flowchart RL
n6(["`RExpressionList (6)`"])
n3(["`RExpressionList (3)`"])
n2(["`RSymbol (2)
#34;b#34;`"])
n3-exit((3-exit))
n4["`RRepeatLoop (4)
#34;repeat #123; b #125;#34;`"]
n4-exit((4-exit))
n5(["`RSymbol (5)
#34;after#34;`"])
n6-exit((6-exit))
n4 -.->|"FD"| n6
n4 -.->|"FD"| n3-exit
n2 -.->|"FD"| n3
n3-exit -.->|"FD"| n2
n3 -.->|"FD"| n4
n5 -.->|"FD"| n4-exit
n6-exit -.->|"FD"| n5
style n6 stroke:cyan,stroke-width:6.5px; style n6-exit stroke:green,stroke-width:6.5px;
R Code of the CFG
The analysis required 2.2 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment.
We used the following simplification: unique-cf-sets
.
repeat { b }; after
Example: repeat-loop (with break)
flowchart RL
n11(["`RExpressionList (11)`"])
n8(["`RExpressionList (8)`"])
n7(["`RExpressionList (7)
#34;b; if(u) break;#34;`"])
n2(["`RSymbol (2)
#34;b#34;`"])
n6["`RIfThenElse (6)
#34;if(u) break#34;`"]
n6-condition[[6-condition]]
n6-exit((6-exit))
n3(["`RSymbol (3)
#34;u#34;`"])
n5(["`RExpressionList (5)
#34;break#34;`"])
n4["`RBreak (4)
#34;break#34;`"]
n7-exit((7-exit))
n8-exit((8-exit))
n9["`RRepeatLoop (9)
#34;repeat #123; b; if(u) break; #125;#34;`"]
n9-exit((9-exit))
n10(["`RSymbol (10)
#34;after#34;`"])
n11-exit((11-exit))
n9 -.->|"FD"| n11
n9 -.->|"FD"| n8-exit
n7 -.->|"FD"| n8
n2 -.->|"FD"| n7
n6 -.->|"FD"| n2
n4 -.->|"FD"| n5
n6-condition -.->|"FD"| n3
n5 -->|"CD (TRUE)"| n6-condition
n3 -.->|"FD"| n6
n6-exit -->|"CD (FALSE)"| n6-condition
n7-exit -.->|"FD"| n6-exit
n8-exit -.->|"FD"| n7-exit
n8 -.->|"FD"| n9
n9-exit -.->|"FD"| n4
n10 -.->|"FD"| n9-exit
n11-exit -.->|"FD"| n10
style n11 stroke:cyan,stroke-width:6.5px; style n11-exit stroke:green,stroke-width:6.5px;
R Code of the CFG
The analysis required 2.3 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment.
We used the following simplification: unique-cf-sets
.
repeat { b; if(u) break; }; after
In the context of a for-loop, the control dependency refer to whether the respective vector still has values to iterate over.
Example: for-loop
flowchart RL
n7(["`RExpressionList (7)`"])
n0(["`RSymbol (0)
#34;i#34;`"])
n6["`RForLoop (6)
#34;for(i in 1#58;10) b#34;`"]
n1(["`RNumber (1)
#34;1#34;`"])
n2(["`RNumber (2)
#34;10#34;`"])
n3(["`RBinaryOp (3)
#34;1#58;10#34;`"])
n3-exit((3-exit))
n5(["`RExpressionList (5)
#34;b#34;`"])
n4(["`RSymbol (4)
#34;b#34;`"])
n5-exit((5-exit))
n6-head[[6-head]]
n6-exit((6-exit))
n7-exit((7-exit))
n6 -.->|"FD"| n7
n6 -.->|"FD"| n5-exit
n2 -.->|"FD"| n1
n1 -.->|"FD"| n3
n3-exit -.->|"FD"| n2
n4 -.->|"FD"| n5
n5-exit -.->|"FD"| n4
n3 -.->|"FD"| n6
n0 -.->|"FD"| n3-exit
n6-head -.->|"FD"| n0
n5 -->|"CD (TRUE)"| n6-head
n6-exit -->|"CD (FALSE)"| n3-exit
n7-exit -.->|"FD"| n6-exit
style n7 stroke:cyan,stroke-width:6.5px; style n7-exit stroke:green,stroke-width:6.5px;
R Code of the CFG
The analysis required 2.4 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment.
We used the following simplification: unique-cf-sets
.
for(i in 1:10) b
If you generate the CFG with the extractCFG
function you can (and, if you want to gain inter-procedural information, should)
pass a matching dataflow graph to it to incorporate the dataflow perspective into the CFG.
The difference becomes obvious when we look at the code f <- function() b; f()
first without the dataflow graph:
flowchart RL
n7(["`RExpressionList (7)`"])
n0(["`RSymbol (0)
#34;f#34;`"])
n3-params[[3-params]]
n3-exit((3-exit))
n3(["`RFunctionDefinition (3)
#34;function() b#34;`"])
n2(["`RExpressionList (2)
#34;b#34;`"])
n1(["`RSymbol (1)
#34;b#34;`"])
n2-exit((2-exit))
n4(["`RBinaryOp (4)
#34;f #60;#45; function() b#34;`"])
n4-exit((4-exit))
n5(["`RSymbol (5)
#34;f()#34;`"])
n6["`RFunctionCall (6)
#34;f()#34;`"]
n6-name[[6-name]]
n6-exit((6-exit))
n7-exit((7-exit))
n4 -.->|"FD"| n7
n1 -.->|"FD"| n2
n2-exit -.->|"FD"| n1
n3-params -.->|"FD"| n3
n2 -.->|"FD"| n3-params
n3-exit -.->|"FD"| n2-exit
n3 -.->|"FD"| n0
n0 -.->|"FD"| n4
n4-exit -.->|"FD"| n3
n6 -.->|"FD"| n4-exit
n5 -.->|"FD"| n6
n6-name -.->|"FD"| n5
n6-exit -.->|"FD"| n6-name
n7-exit -.->|"FD"| n6-exit
style n7 stroke:cyan,stroke-width:6.5px; style n7-exit stroke:green,stroke-width:6.5px;
R Code of the CFG
The analysis required 1.6 ms (including the normalization and parsing with the r-shell engine) within the generation environment.
We used the following simplification: unique-cf-sets
.
f <- function() b; f()
And now, including dataflow information:
flowchart RL
n7(["`RExpressionList (7)`"])
n0(["`RSymbol (0)
#34;f#34;`"])
n3-params[[3-params]]
n3-exit((3-exit))
n3(["`RFunctionDefinition (3)
#34;function() b#34;`"])
n2(["`RExpressionList (2)
#34;b#34;`"])
n1(["`RSymbol (1)
#34;b#34;`"])
n2-exit((2-exit))
n4(["`RBinaryOp (4)
#34;f #60;#45; function() b#34;`"])
n4-exit((4-exit))
n5(["`RSymbol (5)
#34;f()#34;`"])
n6["`RFunctionCall (6)
#34;f()#34;
calls:#91;3#93;`"]
n6-name[[6-name]]
n6-exit((6-exit))
n6-resolved-call-exit((6-resolved-call-exit))
n7-exit((7-exit))
n4 -.->|"FD"| n7
n1 -.->|"FD"| n2
n2-exit -.->|"FD"| n1
n3-params -.->|"FD"| n3
n2 -.->|"FD"| n3-params
n3-exit -.->|"FD"| n2-exit
n3 -.->|"FD"| n0
n0 -.->|"FD"| n4
n4-exit -.->|"FD"| n3
n6 -.->|"FD"| n4-exit
n5 -.->|"FD"| n6
n6-name -.->|"FD"| n5
n6-exit -.->|"FD"| n6-name
n6-resolved-call-exit -.->|"FD"| n6-exit
n6-resolved-call-exit -.->|"FD"| n3-exit
n7-exit -.->|"FD"| n6-resolved-call-exit
style n7 stroke:cyan,stroke-width:6.5px; style n7-exit stroke:green,stroke-width:6.5px;
R Code of the CFG
The analysis required 1.8 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment.
We used the following simplification: unique-cf-sets
.
f <- function() b; f()
There are two important additions:
- A new exit marker, canonically suffixed with
-resolved-call-exit
signals that we are aware of the function call target. This marker always follows the exit marker of the function call and links not just the call but also the exit points of the function definition. - A new calls attribute attached to the function call vertex. This holds the
NodeId
of the function definitions that are called from this vertex.
For built-in functions that are provided by flowR's built-in configuration (see the interface wiki page) the CFG does not contain the additional information directly:
flowchart RL
n4(["`RExpressionList (4)`"])
n0(["`RSymbol (0)
#34;print(3)#34;`"])
n3["`RFunctionCall (3)
#34;print(3)#34;`"]
n3-name[[3-name]]
n3-exit((3-exit))
n2(["`RArgument (2)
#34;3#34;`"])
n2-before-value[[2-before-value]]
n1(["`RNumber (1)
#34;3#34;`"])
n2-exit((2-exit))
n4-exit((4-exit))
n3 -.->|"FD"| n4
n0 -.->|"FD"| n3
n3-name -.->|"FD"| n0
n2-before-value -.->|"FD"| n2
n1 -.->|"FD"| n2-before-value
n2-exit -.->|"FD"| n1
n2 -.->|"FD"| n3-name
n3-exit -.->|"FD"| n2-exit
n4-exit -.->|"FD"| n3-exit
style n4 stroke:cyan,stroke-width:6.5px; style n4-exit stroke:green,stroke-width:6.5px;
R Code of the CFG
The analysis required 2.0 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment.
We used the following simplification: unique-cf-sets
.
print(3)
This is due to the fact that the dataflow graph does contain the required call information (and there are no new control vertices to add as the built-in call has no target in the source code):
flowchart LR
1{{"`#91;RNumber#93; 3
(1)
*1.7*`"}}
3[["`#91;RFunctionCall#93; print
(3)
*1.1-8*
(1)`"]]
built-in:print["`Built-In:
print`"]
style built-in:print stroke:gray,fill:lightgray,stroke-width:2px,opacity:.8;
3 -->|"returns, argument"| 1
3 -.->|"reads"| built-in:print
linkStyle 1 stroke:gray;
R Code of the Dataflow Graph
The analysis required 1.4 ms (including parse and normalize, using the r-shell engine) within the generation environment. We encountered unknown side effects (with ids: 3 (linked)) during the analysis.
print(3)
As mentioned in the introduction, our control flow graph does not use basic blocks by default and hence simply links all vertices independent of whether they have (un-)conditional jumps or not.
On the upside, this tells us the execution order (and, in case of promises, forcing order) of involved expressions and seamlessly handles cases like
x <- return(3)
. On the downside, this makes it hard to apply classical control flow graph algorithms and, in general, makes the graph much harder to read.
Yet, we can request basic blocks or transform an existing CFG into basic blocks using the convertCfgToBasicBlocks
function.
Any program without any (un-)conditional jumps now contains a single basic block:
flowchart RL
nbb-7-exit[["`Basic Block (bb-7-exit)
x #60;#45; 2 #42; 3 #43; 1`"]]
style nbb-7-exit stroke:cyan,stroke-width:6.5px; style nbb-7-exit stroke:green,stroke-width:6.5px;
R Code of the CFG
The analysis required 1.8 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment.
We used the following simplifications: unique-cf-sets
, to-basic-blocks
and render a simplified/compacted version.
x <- 2 * 3 + 1
While the CFG without basic blocks is much bigger:
flowchart RL
n7(["`RExpressionList (7)`"])
n0(["`RSymbol (0)
#34;x#34;`"])
n1(["`RNumber (1)
#34;2#34;`"])
n2(["`RNumber (2)
#34;3#34;`"])
n3(["`RBinaryOp (3)
#34;2 #42; 3#34;`"])
n3-exit((3-exit))
n4(["`RNumber (4)
#34;1#34;`"])
n5(["`RBinaryOp (5)
#34;2 #42; 3 #43; 1#34;`"])
n5-exit((5-exit))
n6(["`RBinaryOp (6)
#34;x #60;#45; 2 #42; 3 #43; 1#34;`"])
n6-exit((6-exit))
n7-exit((7-exit))
n6 -.->|"FD"| n7
n2 -.->|"FD"| n1
n1 -.->|"FD"| n3
n3-exit -.->|"FD"| n2
n4 -.->|"FD"| n3-exit
n3 -.->|"FD"| n5
n5-exit -.->|"FD"| n4
n5 -.->|"FD"| n0
n0 -.->|"FD"| n6
n6-exit -.->|"FD"| n5-exit
n7-exit -.->|"FD"| n6-exit
style n7 stroke:cyan,stroke-width:6.5px; style n7-exit stroke:green,stroke-width:6.5px;
(The analysis required 1.6 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment.
We used the following simplification: unique-cf-sets
.
)
In a way, using the basic blocks perspective does not remove any of these vertices (we just usually visualize them compacted as their execution order should be "obvious").
The vertices are still there, as elems of the CfgBasicBlockVertex
:
flowchart RL
subgraph nbb-7-exit [Block bb-7-exit]
direction RL
n7-exit((7-exit))
n6-exit((6-exit))
n7-exit -.-> n6-exit
n5-exit((5-exit))
n6-exit -.-> n5-exit
n4["`RNumber (4)
#34;1#34;`"]
n5-exit -.-> n4
n3-exit((3-exit))
n4 -.-> n3-exit
n2["`RNumber (2)
#34;3#34;`"]
n3-exit -.-> n2
n1["`RNumber (1)
#34;2#34;`"]
n2 -.-> n1
n3["`RBinaryOp (3)
#34;2 #42; 3#34;`"]
n1 -.-> n3
n5["`RBinaryOp (5)
#34;2 #42; 3 #43; 1#34;`"]
n3 -.-> n5
n0["`RSymbol (0)
#34;x#34;`"]
n5 -.-> n0
n6["`RBinaryOp (6)
#34;x #60;#45; 2 #42; 3 #43; 1#34;`"]
n0 -.-> n6
n7["`RExpressionList (7)`"]
n6 -.-> n7
end
style nbb-7-exit stroke:cyan,stroke-width:6.5px; style nbb-7-exit stroke:green,stroke-width:6.5px;
(The analysis required 1.6 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment.
We used the following simplifications: unique-cf-sets
, to-basic-blocks
.
)
The benefit (for comprehensibility and algorithms) becomes more apparent when we look at a more complicated program:
f <- function(a, b = 3) {
if(a > b) {
return(a * b);
} else {
while(a < b) {
a <- a + 1;
}
return(a);
}
}
print(f(21) + f(42))
With basic blocks, this code looks like this:
flowchart RL
nbb-42-exit[["`Basic Block (bb-42-exit)
`"]]
nbb-42[["`Basic Block (bb-42)
f #60;#45; function(a, b=3) #123; #125;`"]]
nbb-40-condition[["`Basic Block (bb-40-condition)
#123; if(a #62; b) #123; #125; #125;`"]]
nbb-19-exit[["`Basic Block (bb-19-exit)
#123; return(a #42; b) #125;`"]]
nbb-38[["`Basic Block (bb-38)
#123; #123;#125; #125;`"]]
nbb-33-condition[["`Basic Block (bb-33-condition)
while(a #60; b) #123;#125;`"]]
nbb-32-exit[["`Basic Block (bb-32-exit)
#123; a #60;#45; a #43; 1 #125;`"]]
nbb-39-exit[["`Basic Block (bb-39-exit)
return(a)`"]]
nbb-2-exit[["`Basic Block (bb-2-exit)
function(a, b=3) #123; #125;`"]]
nbb-5-exit[["`Basic Block (bb-5-exit)
function(a, b=3) #123; #125;`"]]
nbb-48-exit[["`Basic Block (bb-48-exit)
print(f(21) #43; f(42))`"]]
nbb-52-exit[["`Basic Block (bb-52-exit)
f(42)`"]]
nbb-56-exit[["`Basic Block (bb-56-exit)
`"]]
nbb-42-exit -.->|"FD"| nbb-19-exit
nbb-42-exit -.->|"FD"| nbb-39-exit
nbb-40-condition -.->|"FD"| nbb-2-exit
nbb-40-condition -.->|"FD"| nbb-5-exit
nbb-19-exit -->|"CD (TRUE)"| nbb-40-condition
nbb-38 -->|"CD (FALSE)"| nbb-40-condition
nbb-33-condition -.->|"FD"| nbb-38
nbb-33-condition -.->|"FD"| nbb-32-exit
nbb-32-exit -->|"CD (TRUE)"| nbb-33-condition
nbb-39-exit -->|"CD (FALSE)"| nbb-33-condition
nbb-2-exit -.->|"FD"| nbb-42
nbb-5-exit -.->|"FD"| nbb-42
nbb-48-exit -.->|"FD"| nbb-42
nbb-52-exit -.->|"FD"| nbb-48-exit
nbb-52-exit -.->|"FD"| nbb-42-exit
nbb-56-exit -.->|"FD"| nbb-52-exit
nbb-56-exit -.->|"FD"| nbb-42-exit
style nbb-42 stroke:cyan,stroke-width:6.5px; style nbb-56-exit stroke:green,stroke-width:6.5px;
(The analysis required 10.6 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment.
We used the following simplifications: unique-cf-sets
, to-basic-blocks
and render a simplified/compacted version.
)
Now, without basic blocks, this is a different story...
The full CFG
flowchart RL
n56(["`RExpressionList (56)`"])
n0(["`RSymbol (0)
#34;f#34;`"])
n42-params[[42-params]]
n42-exit((42-exit))
n42(["`RFunctionDefinition (42)
#34;function(a, b = 3) #123;
if(a #62; b) #123;
#92;treturn(a #42; b);
#125; else #123;
#92;twhile(a #60; b) #123;
#92;t#92;ta #60;#45; a #43; 1;
#92;t#125;
#92;treturn(a);
#125;
#125;#34;`"])
n41(["`RExpressionList (41)`"])
n40["`RIfThenElse (40)
#34;if(a #62; b) #123;
#92;treturn(a #42; b);
#125; else #123;
#92;twhile(a #60; b) #123;
#92;t#92;ta #60;#45; a #43; 1;
#92;t#125;
#92;treturn(a);
#125;#34;`"]
n40-condition[[40-condition]]
n40-exit((40-exit))
n8(["`RSymbol (8)
#34;a#34;`"])
n9(["`RSymbol (9)
#34;b#34;`"])
n10(["`RBinaryOp (10)
#34;a #62; b#34;`"])
n10-exit((10-exit))
n19(["`RExpressionList (19)`"])
n13(["`RSymbol (13)
#34;return(a #42; b)#34;`"])
n18["`RFunctionCall (18)
#34;return(a #42; b)#34;`"]
n18-name[[18-name]]
n18-exit((18-exit))
n17(["`RArgument (17)
#34;a #42; b#34;`"])
n17-before-value[[17-before-value]]
n14(["`RSymbol (14)
#34;a#34;`"])
n15(["`RSymbol (15)
#34;b#34;`"])
n16(["`RBinaryOp (16)
#34;a #42; b#34;`"])
n16-exit((16-exit))
n17-exit((17-exit))
n19-exit((19-exit))
n39(["`RExpressionList (39)`"])
n38(["`RExpressionList (38)
#34;while(a #60; b) #123;
#92;t#92;ta #60;#45; a #43; 1;
#92;t#125;
#92;treturn(a);#34;`"])
n22(["`RSymbol (22)
#34;a#34;`"])
n23(["`RSymbol (23)
#34;b#34;`"])
n24(["`RBinaryOp (24)
#34;a #60; b#34;`"])
n24-exit((24-exit))
n33["`RWhileLoop (33)
#34;while(a #60; b) #123;
#92;t#92;ta #60;#45; a #43; 1;
#92;t#125;#34;`"]
n33-condition[[33-condition]]
n33-exit((33-exit))
n32(["`RExpressionList (32)`"])
n27(["`RSymbol (27)
#34;a#34;`"])
n28(["`RSymbol (28)
#34;a#34;`"])
n29(["`RNumber (29)
#34;1#34;`"])
n30(["`RBinaryOp (30)
#34;a #43; 1#34;`"])
n30-exit((30-exit))
n31(["`RBinaryOp (31)
#34;a #60;#45; a #43; 1#34;`"])
n31-exit((31-exit))
n32-exit((32-exit))
n34(["`RSymbol (34)
#34;return(a)#34;`"])
n37["`RFunctionCall (37)
#34;return(a)#34;`"]
n37-name[[37-name]]
n37-exit((37-exit))
n36(["`RArgument (36)
#34;a#34;`"])
n36-before-value[[36-before-value]]
n35(["`RSymbol (35)
#34;a#34;`"])
n36-exit((36-exit))
n38-exit((38-exit))
n39-exit((39-exit))
n41-exit((41-exit))
n2(["`RParameter (2)
#34;a#34;`"])
n1(["`RSymbol (1)
#34;a#34;`"])
n2-before-value[[2-before-value]]
n2-exit((2-exit))
n5(["`RParameter (5)
#34;b#34;`"])
n3(["`RSymbol (3)
#34;b#34;`"])
n5-before-value[[5-before-value]]
n4(["`RNumber (4)
#34;3#34;`"])
n5-exit((5-exit))
n43(["`RBinaryOp (43)
#34;f #60;#45; function(a, b = 3) #123;
if(a #62; b) #123;
#92;treturn(a #42; b);
#125; else #123;
#92;twhile(a #60; b) #123;
#92;t#92;ta #60;#45; a #43; 1;
#92;t#125;
#92;treturn(a);
#125;
#125;#34;`"])
n43-exit((43-exit))
n44(["`RSymbol (44)
#34;print(f(21) #43; f(42))#34;`"])
n55["`RFunctionCall (55)
#34;print(f(21) #43; f(42))#34;`"]
n55-name[[55-name]]
n55-exit((55-exit))
n54(["`RArgument (54)
#34;f(21) #43; f(42)#34;`"])
n54-before-value[[54-before-value]]
n45(["`RSymbol (45)
#34;f(21)#34;`"])
n48(["`RFunctionCall (48)
#34;f(21)#34;
calls:#91;42#93;`"])
n48-name[[48-name]]
n48-exit((48-exit))
n47(["`RArgument (47)
#34;21#34;`"])
n47-before-value[[47-before-value]]
n46(["`RNumber (46)
#34;21#34;`"])
n47-exit((47-exit))
n48-resolved-call-exit((48-resolved-call-exit))
n49(["`RSymbol (49)
#34;f(42)#34;`"])
n52(["`RFunctionCall (52)
#34;f(42)#34;
calls:#91;42#93;`"])
n52-name[[52-name]]
n52-exit((52-exit))
n51(["`RArgument (51)
#34;42#34;`"])
n51-before-value[[51-before-value]]
n50(["`RNumber (50)
#34;42#34;`"])
n51-exit((51-exit))
n52-resolved-call-exit((52-resolved-call-exit))
n53(["`RBinaryOp (53)
#34;f(21) #43; f(42)#34;`"])
n53-exit((53-exit))
n54-exit((54-exit))
n56-exit((56-exit))
n43 -.->|"FD"| n56
n40 -.->|"FD"| n41
n9 -.->|"FD"| n8
n8 -.->|"FD"| n10
n10-exit -.->|"FD"| n9
n18 -.->|"FD"| n19
n13 -.->|"FD"| n18
n18-name -.->|"FD"| n13
n17-before-value -.->|"FD"| n17
n15 -.->|"FD"| n14
n14 -.->|"FD"| n16
n16-exit -.->|"FD"| n15
n16 -.->|"FD"| n17-before-value
n17-exit -.->|"FD"| n16-exit
n17 -.->|"FD"| n18-name
n18-exit -.->|"FD"| n17-exit
n19-exit -.->|"FD"| n18-exit
n38 -.->|"FD"| n39
n33 -.->|"FD"| n38
n33 -.->|"FD"| n32-exit
n23 -.->|"FD"| n22
n22 -.->|"FD"| n24
n24-exit -.->|"FD"| n23
n31 -.->|"FD"| n32
n29 -.->|"FD"| n28
n28 -.->|"FD"| n30
n30-exit -.->|"FD"| n29
n30 -.->|"FD"| n27
n27 -.->|"FD"| n31
n31-exit -.->|"FD"| n30-exit
n32-exit -.->|"FD"| n31-exit
n24 -.->|"FD"| n33
n33-condition -.->|"FD"| n24-exit
n32 -->|"CD (TRUE)"| n33-condition
n33-exit -->|"CD (FALSE)"| n33-condition
n37 -.->|"FD"| n33-exit
n34 -.->|"FD"| n37
n37-name -.->|"FD"| n34
n36-before-value -.->|"FD"| n36
n35 -.->|"FD"| n36-before-value
n36-exit -.->|"FD"| n35
n36 -.->|"FD"| n37-name
n37-exit -.->|"FD"| n36-exit
n38-exit -.->|"FD"| n37-exit
n39-exit -.->|"FD"| n38-exit
n40-condition -.->|"FD"| n10-exit
n19 -->|"CD (TRUE)"| n40-condition
n39 -->|"CD (FALSE)"| n40-condition
n10 -.->|"FD"| n40
n40-exit -.->|"FD"| n19-exit
n40-exit -.->|"FD"| n39-exit
n41-exit -.->|"FD"| n40-exit
n1 -.->|"FD"| n2
n2-before-value -.->|"FD"| n1
n2-exit -.->|"FD"| n2-before-value
n2 -.->|"FD"| n42
n42-params -.->|"FD"| n2-exit
n42-params -.->|"FD"| n5-exit
n3 -.->|"FD"| n5
n5-before-value -.->|"FD"| n3
n4 -.->|"FD"| n5-before-value
n5-exit -.->|"FD"| n4
n5 -.->|"FD"| n42
n41 -.->|"FD"| n42-params
n42-exit -.->|"FD"| n41-exit
n42 -.->|"FD"| n0
n0 -.->|"FD"| n43
n43-exit -.->|"FD"| n42
n55 -.->|"FD"| n43-exit
n44 -.->|"FD"| n55
n55-name -.->|"FD"| n44
n54-before-value -.->|"FD"| n54
n45 -.->|"FD"| n48
n48-name -.->|"FD"| n45
n47-before-value -.->|"FD"| n47
n46 -.->|"FD"| n47-before-value
n47-exit -.->|"FD"| n46
n47 -.->|"FD"| n48-name
n48-exit -.->|"FD"| n47-exit
n48-resolved-call-exit -.->|"FD"| n48-exit
n48-resolved-call-exit -.->|"FD"| n42-exit
n49 -.->|"FD"| n52
n52-name -.->|"FD"| n49
n51-before-value -.->|"FD"| n51
n50 -.->|"FD"| n51-before-value
n51-exit -.->|"FD"| n50
n51 -.->|"FD"| n52-name
n52-exit -.->|"FD"| n51-exit
n52-resolved-call-exit -.->|"FD"| n52-exit
n52-resolved-call-exit -.->|"FD"| n42-exit
n52 -.->|"FD"| n48-resolved-call-exit
n48 -.->|"FD"| n53
n53-exit -.->|"FD"| n52-resolved-call-exit
n53 -.->|"FD"| n54-before-value
n54-exit -.->|"FD"| n53-exit
n54 -.->|"FD"| n55-name
n55-exit -.->|"FD"| n54-exit
n56-exit -.->|"FD"| n55-exit
style n56 stroke:cyan,stroke-width:6.5px; style n56-exit stroke:green,stroke-width:6.5px;
(The analysis required 6.5 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment.
We used the following simplification: unique-cf-sets
.
)
And again it should be noted that even though the example code is more complicated, this is still far from the average real-world script.
There is a plethora of functions that you can use the traverse the normalized AST and the dataflow graph. Similarly, flowR provides you with a set of utility functions and classes that you can use to interact with the control flow graph.
If you are just interested in traversing the vertices within the cfg, two simple functions
visitCfgInOrder
and visitCfgInReverseOrder
are available. For basic blocks
these will automatically traverse the elements contained within the blocks (in the respective order).
For example, the following function will return all numbers contained within the CFG:
function sampleCollectNumbers(cfg: ControlFlowInformation, ast: NormalizedAst): RNumberValue[] {
const numbers: RNumberValue[] = [];
visitCfgInOrder(cfg.graph, cfg.entryPoints, id => {
/* obtain the corresponding node from the AST */
const node = ast.idMap.get(id);
/* if it is present and a number, add the parsed value to the list */
if(isRNumber(node)) {
numbers.push(node.content);
}
});
return numbers;
}
Defined at ./src/documentation/print-cfg-wiki.ts#L51
Calling it with the CFG and AST of the expression x - 1 + 2L * 3
yields the following elements (in this order):
{"num":1,"complexNumber":false,"markedAsInt":false}
{"num":2,"complexNumber":false,"markedAsInt":true}
{"num":3,"complexNumber":false,"markedAsInt":false}
A more useful appearance of these visitors occurs with happensBefore
which uses the CFG to determine whether the execution
of one vertex always, maybe, or never happens before another vertex (see the corresponding query documentation for more information).
As mentioned above, you can use the test function assertCfg
to check whether the control flow graph has the desired shape.
The function supports testing for sub-graphs as well (it provides diffing capabilities similar to assertDataflow
).
If you want to diff two control flow graphs, you can use the diffOfControlFlowGraphs
function.
To be a valid representation of the program, the CFG should satisfy a collection of properties that, in turn, you can automatically assume to hold
when working with it. In general, we verify these in every unit test using assertCfgSatisfiesProperties
,
and you can have a look at the active properties by checking the CfgProperties
object.
In general, we check for a hammock graph (given that the program contains no definite infinite loop) and the absence of direct cycles.
The simple traversal functions are great for simple tasks, but very unhandy when you want to do something more sophisticated that incorporates language semantics such as function calls. Hence, we provide a series of incrementally more sophisticated (but complex) visitors that incorporate various alternative perspectives:
-
Basic CFG Visitor:
As a class-based version of the simple traversal functions -
Syntax-Aware CFG Visitor:
If you want directly incorporate the type of the respective vertex in the normalized AST into your visitor -
Dataflow-Aware CFG Visitor:
If you require the dataflow information as well (e.g., to track built-in function calls, ...) -
Semantic CFG Visitor:
Currently the most advanced visitor that combines syntactic with dataflow information.
The BasicCfgGuidedVisitor
class essential provides the same functionality as the simple traversal functions but in a class-based version.
Using it, you can select whether you want to traverse the CFG in order or in reverse order.
To replicate the number collector from above, you can use the following code:
class CollectNumbersVisitor extends BasicCfgGuidedVisitor {
private numbers: RNumberValue[] = [];
private ast: NormalizedAst;
constructor(controlFlow: ControlFlowInformation, ast: NormalizedAst) {
super({ controlFlow, defaultVisitingOrder: 'forward' });
this.ast = ast;
}
protected override onVisitNode(node: NodeId): void {
const astNode = this.ast.idMap.get(node);
if(isRNumber(astNode)) {
this.numbers.push(astNode.content);
}
super.onVisitNode(node);
}
public getNumbers(): RNumberValue[] {
return this.numbers;
}
}
Defined at ./src/documentation/print-cfg-wiki.ts#L64
Instead of directly calling visitCfgInOrder
we pass the forward
visiting order to the constructor of the visitor.
Executing it with the CFG and AST of the expression x - 1 + 2L * 3
, causes the following numbers to be collected:
{"num":1,"complexNumber":false,"markedAsInt":false}
{"num":2,"complexNumber":false,"markedAsInt":true}
{"num":3,"complexNumber":false,"markedAsInt":false}
The SyntaxAwareCfgGuidedVisitor
class incorporates knowledge of the normalized AST into the CFG traversal and
directly provides specialized visitors for the various node types.
Now, our running example of collecting all numbers simplifies to this:
class CollectNumbersSyntaxVisitor extends SyntaxAwareCfgGuidedVisitor {
private numbers: RNumberValue[] = [];
constructor(controlFlow: ControlFlowInformation, normalizedAst: NormalizedAst) {
super({ controlFlow, normalizedAst, defaultVisitingOrder: 'forward' });
}
protected override visitRNumber(node: RNumber<ParentInformation>): void {
this.numbers.push(node.content);
}
public getNumbers(): RNumberValue[] {
return this.numbers;
}
}
Defined at ./src/documentation/print-cfg-wiki.ts#L87
And again, executing it with the CFG and AST of the expression x - 1 + 2L * 3
, causes the following numbers to be collected:
{"num":1,"complexNumber":false,"markedAsInt":false}
{"num":2,"complexNumber":false,"markedAsInt":true}
{"num":3,"complexNumber":false,"markedAsInt":false}
There is a lot of benefit in incorporating the dataflow information into the CFG traversal, as it contains
information about overwritten function calls, definition targets, and so on.
Our best friend is the getOriginInDfg
function which provides the important information about the origin of a vertex in the dataflow graph.
The DataflowAwareCfgGuidedVisitor
class does some of the basic lifting for us.
While it is not ideal for our goal of collecting all numbers, it shines in other areas such as collecting all used variables, ...
class CollectNumbersDataflowVisitor extends DataflowAwareCfgGuidedVisitor {
private numbers: RNumberValue[] = [];
constructor(controlFlow: ControlFlowInformation, dataflow: DataflowInformation) {
super({ controlFlow, dataflow, defaultVisitingOrder: 'forward' });
}
protected override visitValue(node: DataflowGraphVertexValue): void {
const astNode = this.config.dataflow.graph.idMap?.get(node.id);
if(isRNumber(astNode)) {
this.numbers.push(astNode.content);
}
}
public getNumbers(): RNumberValue[] {
return this.numbers;
}
}
Defined at ./src/documentation/print-cfg-wiki.ts#L103
Again, executing it with the CFG and Dataflow of the expression x - 1 + 2L * 3
, causes the following numbers to be collected:
{"num":1,"complexNumber":false,"markedAsInt":false}
{"num":2,"complexNumber":false,"markedAsInt":true}
{"num":3,"complexNumber":false,"markedAsInt":false}
The SemanticCfgGuidedVisitor
class is flowR's most advanced visitor that combines the syntactic and dataflow information.
The main idea is simple, it provides special handlers for assignments, conditionals, and other R semantics but still follows
the structure of the CFG.
Note
This visitor is still in the design phase so please open up a new issue if you have any suggestions or find any bugs.
To explore what it is capable of, let's create a visitor that prints all values that are used in assignments:
class CollectSourcesSemanticVisitor extends SemanticCfgGuidedVisitor {
private sources: string[] = [];
constructor(controlFlow: ControlFlowInformation, normalizedAst: NormalizedAst, dataflow: DataflowInformation) {
super({ controlFlow, normalizedAst, dataflow, defaultVisitingOrder: 'forward' });
}
protected override onAssignmentCall({ source }: { source?: NodeId }): void {
if(source) {
this.sources.push(recoverName(source, this.config.normalizedAst.idMap) ?? '??');
}
}
public getSources(): NodeId[] {
return this.sources;
}
}
Defined at ./src/documentation/print-cfg-wiki.ts#L122
Executing it with the CFG and Dataflow of the expression x <- 2; 3 -> x; assign("x", 42 + 21)
, causes the following values (/lexemes) to be collected:
2
3
+
Currently maintained by Florian Sihler at Ulm University
Email | GitHub | Penguins | Portfolio