Skip to content

Control Flow Graph

github-actions[bot] edited this page May 6, 2025 · 1 revision

This document was generated from 'src/documentation/print-cfg-wiki.ts' on 2025-05-06, 18:48:23 UTC presenting an overview of flowR's control flow graph (v2.2.12, using R v4.4.3). Please do not edit this file/wiki page directly.

flowR produces three main perspectives of the program: 1) a normalized version of the AST and 2) a dataflow graph, and 3) a control flow graph (CFG). flowR uses this CFG interweaved with its data flow analysis and for some of its queries (e.g., to link to the last call in a Call-Context Query).

Please note that, mostly due to historical reasons, the control dependencies that are stored directly within the DFG provide only a partial view of the CFG. While they provide you with information on the conditional execution of vertices, they do not encode the order of execution. In contrast, the CFG describes a complete view of the program's control flow.

Tip

If you want to investigate the Control Flow Graph, you can use the :controlflow* command in the REPL (see the Interface wiki page for more information). By default, this view does not use basic blocks as, for example, R allows unconditional jumps to occur in spots where conventional languages would assume expressions (e.g., if-conditions). Yet, by using :controlflowbb* you can inspect the CFG with basic blocks (although you have to keep in mind that now, there can be a value flow between basic blocks)

For readability, we structure this wiki page into various segments:

Initial Overview

For now, let's look at a CFG for a program without any branching:

x <- 2 * 3 + 1

The corresponding CFG is a directed, labeled graph with two types of edges (control and flow dependencies).

flowchart RL
    n7(["`RExpressionList (7)`"])
    n0(["`RSymbol (0)
#34;x#34;`"])
    n1(["`RNumber (1)
#34;2#34;`"])
    n2(["`RNumber (2)
#34;3#34;`"])
    n3(["`RBinaryOp (3)
#34;2 #42; 3#34;`"])
    n3-exit((3-exit))
    n4(["`RNumber (4)
#34;1#34;`"])
    n5(["`RBinaryOp (5)
#34;2 #42; 3 #43; 1#34;`"])
    n5-exit((5-exit))
    n6(["`RBinaryOp (6)
#34;x #60;#45; 2 #42; 3 #43; 1#34;`"])
    n6-exit((6-exit))
    n7-exit((7-exit))
    n6 -.->|"FD"| n7
    n2 -.->|"FD"| n1
    n1 -.->|"FD"| n3
    n3-exit -.->|"FD"| n2
    n4 -.->|"FD"| n3-exit
    n3 -.->|"FD"| n5
    n5-exit -.->|"FD"| n4
    n5 -.->|"FD"| n0
    n0 -.->|"FD"| n6
    n6-exit -.->|"FD"| n5-exit
    n7-exit -.->|"FD"| n6-exit
    style n7 stroke:cyan,stroke-width:6.5px;    style n7-exit stroke:green,stroke-width:6.5px;
Loading

(The analysis required 13.1 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment. We used the following simplification: unique-cf-sets . )

Important

As the edges describe dependencies they point in the inverse order of execution (which is very helpful for backward analyses)! The visitors abstract away from this and there is no harm in considering an inverted CFG. Yet, you should keep this in mind!

Every normalized node of the normalized AST that has any relevance to the execution is added and automatically linked using its id (similarly to vertices of the dataflow graph). Expressions, such as 2 * 3 get an additional node with an artificial id that ends in -exit to mark whenever their calculation is over.

To gain a better understanding, let's have a look at a simple program with a single branching structure:

flowchart RL
    n6(["`RExpressionList (6)`"])
    n5["`RIfThenElse (5)
#34;if(u) 3 else 2#34;`"]
    n5-condition[[5-condition]]
    n5-exit((5-exit))
    n0(["`RSymbol (0)
#34;u#34;`"])
    n2(["`RExpressionList (2)
#34;3#34;`"])
    n1(["`RNumber (1)
#34;3#34;`"])
    n2-exit((2-exit))
    n4(["`RExpressionList (4)
#34;2#34;`"])
    n3(["`RNumber (3)
#34;2#34;`"])
    n4-exit((4-exit))
    n6-exit((6-exit))
    n5 -.->|"FD"| n6
    n1 -.->|"FD"| n2
    n2-exit -.->|"FD"| n1
    n3 -.->|"FD"| n4
    n4-exit -.->|"FD"| n3
    n5-condition -.->|"FD"| n0
    n2 -->|"CD (TRUE)"| n5-condition
    n4 -->|"CD (FALSE)"| n5-condition
    n0 -.->|"FD"| n5
    n5-exit -.->|"FD"| n2-exit
    n5-exit -.->|"FD"| n4-exit
    n6-exit -.->|"FD"| n5-exit
    style n6 stroke:cyan,stroke-width:6.5px;    style n6-exit stroke:green,stroke-width:6.5px;
Loading
R Code of the CFG

The analysis required 3.9 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment. We used the following simplification: unique-cf-sets .

if(u) 3 else 2

Here, you can see the if node followed by the condition (in this case merely u) that then splits into two branches for the two possible outcomes. The if structure is terminated by the corresponding -exit node (see the structure section for more details).

For you to compare, the following shows the CFG of an if without an else branch:

flowchart RL
    n6(["`RExpressionList (6)`"])
    n5["`RIfThenElse (5)
#34;if(u || v) 3#34;`"]
    n5-condition[[5-condition]]
    n5-exit((5-exit))
    n0(["`RSymbol (0)
#34;u#34;`"])
    n1(["`RSymbol (1)
#34;v#34;`"])
    n2(["`RBinaryOp (2)
#34;u || v#34;`"])
    n2-exit((2-exit))
    n4(["`RExpressionList (4)
#34;3#34;`"])
    n3(["`RNumber (3)
#34;3#34;`"])
    n4-exit((4-exit))
    n6-exit((6-exit))
    n5 -.->|"FD"| n6
    n1 -.->|"FD"| n0
    n0 -.->|"FD"| n2
    n2-exit -.->|"FD"| n1
    n3 -.->|"FD"| n4
    n4-exit -.->|"FD"| n3
    n5-condition -.->|"FD"| n2-exit
    n4 -->|"CD (TRUE)"| n5-condition
    n2 -.->|"FD"| n5
    n5-exit -.->|"FD"| n4-exit
    n5-exit -->|"CD (FALSE)"| n5-condition
    n6-exit -.->|"FD"| n5-exit
    style n6 stroke:cyan,stroke-width:6.5px;    style n6-exit stroke:green,stroke-width:6.5px;
Loading
R Code of the CFG

The analysis required 2.8 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment. We used the following simplification: unique-cf-sets .

if(u || v) 3

Activating the calculation of basic blocks produces the following:

flowchart RL
    subgraph nbb-5-condition [Block bb-5-condition]
        direction RL
    n5-condition[[5-condition]]
    n2-exit((2-exit))
    n5-condition -.-> n2-exit
    n1["`RSymbol (1)
#34;v#34;`"]
    n2-exit -.-> n1
    n0["`RSymbol (0)
#34;u#34;`"]
    n1 -.-> n0
    n2["`RBinaryOp (2)
#34;u || v#34;`"]
    n0 -.-> n2
    n5["`RIfThenElse (5)
#34;if(u || v) 3#34;`"]
    n2 -.-> n5
    n6["`RExpressionList (6)`"]
    n5 -.-> n6
    end
    subgraph nbb-4-exit [Block bb-4-exit]
        direction RL
    n4-exit((4-exit))
    n3["`RNumber (3)
#34;3#34;`"]
    n4-exit -.-> n3
    n4["`RExpressionList (4)
#34;3#34;`"]
    n3 -.-> n4
    end
    subgraph nbb-6-exit [Block bb-6-exit]
        direction RL
    n6-exit((6-exit))
    n5-exit((5-exit))
    n6-exit -.-> n5-exit
    end
    nbb-6-exit -.->|"FD"| nbb-4-exit
    nbb-6-exit -->|"CD (FALSE)"| nbb-5-condition
    nbb-4-exit -->|"CD (TRUE)"| nbb-5-condition
    style nbb-5-condition stroke:cyan,stroke-width:6.5px;    style nbb-6-exit stroke:green,stroke-width:6.5px;
Loading
R Code of the CFG

The analysis required 3.0 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment. We used the following simplifications: unique-cf-sets, to-basic-blocks .

if(u || v) 3

Which is probably much more readable if compacted (although the reconstucted code can sometimes be slightly mislieading as flowR tries its best to make it syntactically correct and hence add closing braces etc. which are technically not part of the respective block):

flowchart RL
    nbb-5-condition[["`Basic Block (bb-5-condition)
if(u || v) #123; #125;`"]]
    nbb-4-exit[["`Basic Block (bb-4-exit)
3`"]]
    nbb-6-exit[["`Basic Block (bb-6-exit)
`"]]
    nbb-6-exit -.->|"FD"| nbb-4-exit
    nbb-6-exit -->|"CD (FALSE)"| nbb-5-condition
    nbb-4-exit -->|"CD (TRUE)"| nbb-5-condition
    style nbb-5-condition stroke:cyan,stroke-width:6.5px;    style nbb-6-exit stroke:green,stroke-width:6.5px;
Loading
R Code of the CFG

The analysis required 2.2 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment. We used the following simplifications: unique-cf-sets, to-basic-blocks and render a simplified/compacted version.

if(u || v) 3

The control flow graph also harmonizes with function definitions, and calls:

flowchart RL
    n9(["`RExpressionList (9)`"])
    n0(["`RSymbol (0)
#34;f#34;`"])
    n5-params[[5-params]]
    n5-exit((5-exit))
    n5(["`RFunctionDefinition (5)
#34;function() #123; 3 #125;#34;`"])
    n4(["`RExpressionList (4)`"])
    n3(["`RNumber (3)
#34;3#34;`"])
    n4-exit((4-exit))
    n6(["`RBinaryOp (6)
#34;f #60;#45; function() #123; 3 #125;#34;`"])
    n6-exit((6-exit))
    n7(["`RSymbol (7)
#34;f()#34;`"])
    n8["`RFunctionCall (8)
#34;f()#34;
 calls:#91;5#93;`"]
    n8-name[[8-name]]
    n8-exit((8-exit))
    n8-resolved-call-exit((8-resolved-call-exit))
    n9-exit((9-exit))
    n6 -.->|"FD"| n9
    n3 -.->|"FD"| n4
    n4-exit -.->|"FD"| n3
    n5-params -.->|"FD"| n5
    n4 -.->|"FD"| n5-params
    n5-exit -.->|"FD"| n4-exit
    n5 -.->|"FD"| n0
    n0 -.->|"FD"| n6
    n6-exit -.->|"FD"| n5
    n8 -.->|"FD"| n6-exit
    n7 -.->|"FD"| n8
    n8-name -.->|"FD"| n7
    n8-exit -.->|"FD"| n8-name
    n8-resolved-call-exit -.->|"FD"| n8-exit
    n8-resolved-call-exit -.->|"FD"| n5-exit
    n9-exit -.->|"FD"| n8-resolved-call-exit
    style n9 stroke:cyan,stroke-width:6.5px;    style n9-exit stroke:green,stroke-width:6.5px;
Loading
R Code of the CFG

The analysis required 6.8 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment. We used the following simplification: unique-cf-sets .

f <- function() { 3 }
f()

Structure of the Control Flow Graph

You can produce your very own control flow graph with extractCFG. The ControlFlowGraph class describes everything required to model the control flow graph, with its edge types described by CfgEdge and its vertices by CfgSimpleVertex. However, you should be aware of the ControlFlowInformation interface which adds some additional information the the CFG (and is used during the construction of the CFG as well):

  • ControlFlowInformation
    Summarizes the control information of a program

    Defined at ./src/control-flow/control-flow-graph.ts#L335
    /** Summarizes the control information of a program */
    export interface ControlFlowInformation<Vertex extends CfgSimpleVertex = CfgSimpleVertex> extends MergeableRecord {
        /** all active 'return'(-like) unconditional jumps */
        returns:     NodeId[],
        /** all active 'break'(-like) unconditional jumps */
        breaks:      NodeId[],
        /** all active 'next'(-like) unconditional jumps */
        nexts:       NodeId[],
        /** intended to construct a hammock graph, with 0 exit points representing a block that should not be part of the CFG (like a comment) */
        entryPoints: NodeId[],
        /** See {@link ControlFlowInformation#entryPoints|entryPoints} */
        exitPoints:  NodeId[],
        /** the control flow graph summarizing the flow information */
        graph:       ControlFlowGraph<Vertex>
    }
    View more (MergeableRecord)
    • MergeableRecord

      Defined at ./src/util/objects.ts#L11
      export type MergeableRecord = Record<string, unknown>
      • string

        Defined at ./src/statistics/features/common-syntax-probability.ts#L29
        string:       Record<string, Measurement>
        • CommonSyntaxTypeCounts

          Defined at ./src/statistics/features/common-syntax-probability.ts#L12
          export interface CommonSyntaxTypeCounts<Measurement=bigint> {
              // just a helper to collect all as well (could be derived from sum)
              total:        Measurement,
              // counts whenever you pass more than one node that is not sensible for any other category
              multiple:     Measurement,
              // similar to multiple, but only counts empty (bodies etc.)
              empty:        Measurement,
              // in case of a = x etc.
              withArgument: Measurement,
              // arguments used without value
              noValue:      Measurement,
              // does include t and f, as well as NULL etc. (any special symbol)
              singleVar:    Record<string, Measurement>
              number:       Record<number, Measurement>
              // only explicit integers
              integer:      Record<number, Measurement>
              complex:      Record<number, Measurement>
              string:       Record<string, Measurement>
              logical:      Record<typeof RTrue | typeof RFalse, Measurement>,
              call:         Record<string, Measurement>,
              unnamedCall:  Measurement,
              // binop includes all assignments!
              binOp:        Record<string, Measurement>,
              unaryOp:      Record<string, Measurement>,
              // unknown content, records lexeme (can include break etc. for bodies), due to my oversight, this includes function definitions
              other:        Record<string, Measurement>
          }
      • unknown

        Defined at ./src/dataflow/graph/graph.ts#L386
        unknown

To check whether the CFG has the expected shape, you can use the test function assertCfg which supports testing for sub-graphs as well (it provides diffing capabilities similar to assertDataflow). As the CFG may become unhandy for larger programs, there are simplifications available with simplifyControlFlowInformation (these can be passed on to the extractCFG function as well).

CFG Vertices

All vertex types are summarized in the CfgVertexType enum which currently contains the following types:

  • MidMarker (mid)
  • EndMarker (end)
  • Statement (stm)
  • Expression (expr)
  • Block (blk)

We use the CfgBasicBlockVertex to represent basic blocks and separate expressions (CfgExpressionVertex) and statements (CfgStatementVertex) as control flow units with and without side effects (if you want to, you can see view statements as effectful expressions). The markers (CfgMidMarkerVertex and CfgEndMarkerVertex) indicate specific segments of larger expressions/statements (e.g., an if which has a condition and its branches).

To signal these links, the expressions and statements contain information about the attached markers:

  • CfgWithMarker

    Defined at ./src/control-flow/control-flow-graph.ts#L51
    interface CfgWithMarker extends CfgBaseVertex {
        /** mid-markers linked to this statement */
        mid?: NodeId[]
        /** end-markers linked to this statement */
        end?: NodeId[]
    }
    View more (CfgBaseVertex)
    • CfgBaseVertex
      A plain vertex in the ControlFlowGraph . Please use CfgSimpleVertex to refer to all potential vertex types within the graph.

      Defined at ./src/control-flow/control-flow-graph.ts#L40
      /**
       * A plain vertex in the {@link ControlFlowGraph}.
       * Please use {@link CfgSimpleVertex} to refer to all potential vertex types within the graph.
       */
      interface CfgBaseVertex extends MergeableRecord {
          /** the type of the vertex */
          type:         CfgVertexType,
          /** the id of the vertex, for non-blocks this should directly relate to the AST node */
          id:           NodeId,
          /** child nodes attached to this one */
          children?:    NodeId[],
          /** if the vertex calls a function, this links all targets of this call */
          callTargets?: Set<NodeId>,
      }
      • MergeableRecord

        Defined at ./src/util/objects.ts#L11
        export type MergeableRecord = Record<string, unknown>
        • string

          Defined at ./src/statistics/features/common-syntax-probability.ts#L29
          string:       Record<string, Measurement>
          • CommonSyntaxTypeCounts

            Defined at ./src/statistics/features/common-syntax-probability.ts#L12
            export interface CommonSyntaxTypeCounts<Measurement=bigint> {
                // just a helper to collect all as well (could be derived from sum)
                total:        Measurement,
                // counts whenever you pass more than one node that is not sensible for any other category
                multiple:     Measurement,
                // similar to multiple, but only counts empty (bodies etc.)
                empty:        Measurement,
                // in case of a = x etc.
                withArgument: Measurement,
                // arguments used without value
                noValue:      Measurement,
                // does include t and f, as well as NULL etc. (any special symbol)
                singleVar:    Record<string, Measurement>
                number:       Record<number, Measurement>
                // only explicit integers
                integer:      Record<number, Measurement>
                complex:      Record<number, Measurement>
                string:       Record<string, Measurement>
                logical:      Record<typeof RTrue | typeof RFalse, Measurement>,
                call:         Record<string, Measurement>,
                unnamedCall:  Measurement,
                // binop includes all assignments!
                binOp:        Record<string, Measurement>,
                unaryOp:      Record<string, Measurement>,
                // unknown content, records lexeme (can include break etc. for bodies), due to my oversight, this includes function definitions
                other:        Record<string, Measurement>
            }
        • unknown

          Defined at ./src/dataflow/graph/graph.ts#L386
          unknown

Similarly, the markers contain a link to their root:

  • CfgWithRoot

    Defined at ./src/control-flow/control-flow-graph.ts#L66
    export interface CfgWithRoot extends CfgBaseVertex {
        /** the vertex for which this is a marker */
        root: NodeId
    }
    View more (CfgBaseVertex)
    • CfgBaseVertex
      A plain vertex in the ControlFlowGraph . Please use CfgSimpleVertex to refer to all potential vertex types within the graph.

      Defined at ./src/control-flow/control-flow-graph.ts#L40
      /**
       * A plain vertex in the {@link ControlFlowGraph}.
       * Please use {@link CfgSimpleVertex} to refer to all potential vertex types within the graph.
       */
      interface CfgBaseVertex extends MergeableRecord {
          /** the type of the vertex */
          type:         CfgVertexType,
          /** the id of the vertex, for non-blocks this should directly relate to the AST node */
          id:           NodeId,
          /** child nodes attached to this one */
          children?:    NodeId[],
          /** if the vertex calls a function, this links all targets of this call */
          callTargets?: Set<NodeId>,
      }
      • MergeableRecord

        Defined at ./src/util/objects.ts#L11
        export type MergeableRecord = Record<string, unknown>
        • string

          Defined at ./src/statistics/features/common-syntax-probability.ts#L29
          string:       Record<string, Measurement>
          • CommonSyntaxTypeCounts

            Defined at ./src/statistics/features/common-syntax-probability.ts#L12
            export interface CommonSyntaxTypeCounts<Measurement=bigint> {
                // just a helper to collect all as well (could be derived from sum)
                total:        Measurement,
                // counts whenever you pass more than one node that is not sensible for any other category
                multiple:     Measurement,
                // similar to multiple, but only counts empty (bodies etc.)
                empty:        Measurement,
                // in case of a = x etc.
                withArgument: Measurement,
                // arguments used without value
                noValue:      Measurement,
                // does include t and f, as well as NULL etc. (any special symbol)
                singleVar:    Record<string, Measurement>
                number:       Record<number, Measurement>
                // only explicit integers
                integer:      Record<number, Measurement>
                complex:      Record<number, Measurement>
                string:       Record<string, Measurement>
                logical:      Record<typeof RTrue | typeof RFalse, Measurement>,
                call:         Record<string, Measurement>,
                unnamedCall:  Measurement,
                // binop includes all assignments!
                binOp:        Record<string, Measurement>,
                unaryOp:      Record<string, Measurement>,
                // unknown content, records lexeme (can include break etc. for bodies), due to my oversight, this includes function definitions
                other:        Record<string, Measurement>
            }
        • unknown

          Defined at ./src/dataflow/graph/graph.ts#L386
          unknown

In mermaid visualizations, we use rectangles for statements, rounded rectangles for expressions, circles for exit markers and double-lined rectangles for mid markers. Blocks are visualized as boxes around the contained vertices.

Note

Every CFG vertex has a NodeId that links it to the normalized AST (although basic blocks will find no counterpart as they are a structuring element of the CFG. Additionally, it may provide information on the called functions (in case that the current element is a function call). Have a look at the CfgBaseVertex interface for more information.

CFG Edges

The ControlFlowGraph uses two types of edges to represent the control flow, separated by the CfgEdgeType enum and the two interfaces: CfgFlowDependencyEdge and CfgControlDependencyEdge.

Flow Dependencies

The most common edge is the flow dependency (FD) which simply signals that the source vertex happens after the target vertex in the control flow. So x; y would produce a flow dependency from y to x (additionally to the program-enveloping root expression list):

flowchart RL
    n2(["`RExpressionList (2)`"])
    n0(["`RSymbol (0)
#34;x#34;`"])
    n1(["`RSymbol (1)
#34;y#34;`"])
    n2-exit((2-exit))
    n0 -.->|"FD"| n2
    n1 -.->|"FD"| n0
    n2-exit -.->|"FD"| n1
    style n2 stroke:cyan,stroke-width:6.5px;    style n2-exit stroke:green,stroke-width:6.5px;
Loading

(The analysis required 1.5 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment. We used the following simplification: unique-cf-sets . )

Control Dependencies

Control dependencies (CD) are used to signal that the execution of the source vertex depends on the taget vertex (which, e.g., is the condition of an if statement or while loop). They contain additional information to signal when the source vertex is executed:

  • CfgControlDependencyEdge

    Defined at ./src/control-flow/control-flow-graph.ts#L109
    interface CfgControlDependencyEdge extends MergeableRecord {
        label:  CfgEdgeType.Cd
        /** the id which caused the control dependency */
        caused: NodeId,
        /** is the control dependency satisfied with a true condition or is it negated (e.g., else-branch)? */
        when:   typeof RTrue | typeof RFalse
    }
    View more (MergeableRecord)
    • MergeableRecord

      Defined at ./src/util/objects.ts#L11
      export type MergeableRecord = Record<string, unknown>
      • string

        Defined at ./src/statistics/features/common-syntax-probability.ts#L29
        string:       Record<string, Measurement>
        • CommonSyntaxTypeCounts

          Defined at ./src/statistics/features/common-syntax-probability.ts#L12
          export interface CommonSyntaxTypeCounts<Measurement=bigint> {
              // just a helper to collect all as well (could be derived from sum)
              total:        Measurement,
              // counts whenever you pass more than one node that is not sensible for any other category
              multiple:     Measurement,
              // similar to multiple, but only counts empty (bodies etc.)
              empty:        Measurement,
              // in case of a = x etc.
              withArgument: Measurement,
              // arguments used without value
              noValue:      Measurement,
              // does include t and f, as well as NULL etc. (any special symbol)
              singleVar:    Record<string, Measurement>
              number:       Record<number, Measurement>
              // only explicit integers
              integer:      Record<number, Measurement>
              complex:      Record<number, Measurement>
              string:       Record<string, Measurement>
              logical:      Record<typeof RTrue | typeof RFalse, Measurement>,
              call:         Record<string, Measurement>,
              unnamedCall:  Measurement,
              // binop includes all assignments!
              binOp:        Record<string, Measurement>,
              unaryOp:      Record<string, Measurement>,
              // unknown content, records lexeme (can include break etc. for bodies), due to my oversight, this includes function definitions
              other:        Record<string, Measurement>
          }
      • unknown

        Defined at ./src/dataflow/graph/graph.ts#L386
        unknown

The extra caused link signals the vertex that caused the control flow influence.

Example: if-else
flowchart RL
    n6(["`RExpressionList (6)`"])
    n5["`RIfThenElse (5)
#34;if(u) 3 else 2#34;`"]
    n5-condition[[5-condition]]
    n5-exit((5-exit))
    n0(["`RSymbol (0)
#34;u#34;`"])
    n2(["`RExpressionList (2)
#34;3#34;`"])
    n1(["`RNumber (1)
#34;3#34;`"])
    n2-exit((2-exit))
    n4(["`RExpressionList (4)
#34;2#34;`"])
    n3(["`RNumber (3)
#34;2#34;`"])
    n4-exit((4-exit))
    n6-exit((6-exit))
    n5 -.->|"FD"| n6
    n1 -.->|"FD"| n2
    n2-exit -.->|"FD"| n1
    n3 -.->|"FD"| n4
    n4-exit -.->|"FD"| n3
    n5-condition -.->|"FD"| n0
    n2 -->|"CD (TRUE)"| n5-condition
    n4 -->|"CD (FALSE)"| n5-condition
    n0 -.->|"FD"| n5
    n5-exit -.->|"FD"| n2-exit
    n5-exit -.->|"FD"| n4-exit
    n6-exit -.->|"FD"| n5-exit
    style n6 stroke:cyan,stroke-width:6.5px;    style n6-exit stroke:green,stroke-width:6.5px;
Loading
R Code of the CFG

The analysis required 1.8 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment. We used the following simplification: unique-cf-sets .

if(u) 3 else 2
Example: while-loop
flowchart RL
    n4(["`RExpressionList (4)`"])
    n0(["`RSymbol (0)
#34;u#34;`"])
    n3["`RWhileLoop (3)
#34;while(u) b#34;`"]
    n3-condition[[3-condition]]
    n3-exit((3-exit))
    n2(["`RExpressionList (2)
#34;b#34;`"])
    n1(["`RSymbol (1)
#34;b#34;`"])
    n2-exit((2-exit))
    n4-exit((4-exit))
    n3 -.->|"FD"| n4
    n3 -.->|"FD"| n2-exit
    n1 -.->|"FD"| n2
    n2-exit -.->|"FD"| n1
    n0 -.->|"FD"| n3
    n3-condition -.->|"FD"| n0
    n2 -->|"CD (TRUE)"| n3-condition
    n3-exit -->|"CD (FALSE)"| n3-condition
    n4-exit -.->|"FD"| n3-exit
    style n4 stroke:cyan,stroke-width:6.5px;    style n4-exit stroke:green,stroke-width:6.5px;
Loading
R Code of the CFG

The analysis required 7.3 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment. We used the following simplification: unique-cf-sets .

while(u) b

Please note that repeat loops do not have control dependencies, as they repeat their body unconditionally. Additionally, the control flow graph does not have to be connected. If you use a repeat without any exit condition, the corresponding exit markers are not reachable from the entry:

Example: repeat-loop (infinite)
flowchart RL
    n6(["`RExpressionList (6)`"])
    n3(["`RExpressionList (3)`"])
    n2(["`RSymbol (2)
#34;b#34;`"])
    n3-exit((3-exit))
    n4["`RRepeatLoop (4)
#34;repeat #123; b #125;#34;`"]
    n4-exit((4-exit))
    n5(["`RSymbol (5)
#34;after#34;`"])
    n6-exit((6-exit))
    n4 -.->|"FD"| n6
    n4 -.->|"FD"| n3-exit
    n2 -.->|"FD"| n3
    n3-exit -.->|"FD"| n2
    n3 -.->|"FD"| n4
    n5 -.->|"FD"| n4-exit
    n6-exit -.->|"FD"| n5
    style n6 stroke:cyan,stroke-width:6.5px;    style n6-exit stroke:green,stroke-width:6.5px;
Loading
R Code of the CFG

The analysis required 2.2 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment. We used the following simplification: unique-cf-sets .

repeat { b }; after
Example: repeat-loop (with break)
flowchart RL
    n11(["`RExpressionList (11)`"])
    n8(["`RExpressionList (8)`"])
    n7(["`RExpressionList (7)
#34;b; if(u) break;#34;`"])
    n2(["`RSymbol (2)
#34;b#34;`"])
    n6["`RIfThenElse (6)
#34;if(u) break#34;`"]
    n6-condition[[6-condition]]
    n6-exit((6-exit))
    n3(["`RSymbol (3)
#34;u#34;`"])
    n5(["`RExpressionList (5)
#34;break#34;`"])
    n4["`RBreak (4)
#34;break#34;`"]
    n7-exit((7-exit))
    n8-exit((8-exit))
    n9["`RRepeatLoop (9)
#34;repeat #123; b; if(u) break; #125;#34;`"]
    n9-exit((9-exit))
    n10(["`RSymbol (10)
#34;after#34;`"])
    n11-exit((11-exit))
    n9 -.->|"FD"| n11
    n9 -.->|"FD"| n8-exit
    n7 -.->|"FD"| n8
    n2 -.->|"FD"| n7
    n6 -.->|"FD"| n2
    n4 -.->|"FD"| n5
    n6-condition -.->|"FD"| n3
    n5 -->|"CD (TRUE)"| n6-condition
    n3 -.->|"FD"| n6
    n6-exit -->|"CD (FALSE)"| n6-condition
    n7-exit -.->|"FD"| n6-exit
    n8-exit -.->|"FD"| n7-exit
    n8 -.->|"FD"| n9
    n9-exit -.->|"FD"| n4
    n10 -.->|"FD"| n9-exit
    n11-exit -.->|"FD"| n10
    style n11 stroke:cyan,stroke-width:6.5px;    style n11-exit stroke:green,stroke-width:6.5px;
Loading
R Code of the CFG

The analysis required 2.3 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment. We used the following simplification: unique-cf-sets .

repeat { b; if(u) break; }; after

In the context of a for-loop, the control dependency refer to whether the respective vector still has values to iterate over.

Example: for-loop
flowchart RL
    n7(["`RExpressionList (7)`"])
    n0(["`RSymbol (0)
#34;i#34;`"])
    n6["`RForLoop (6)
#34;for(i in 1#58;10) b#34;`"]
    n1(["`RNumber (1)
#34;1#34;`"])
    n2(["`RNumber (2)
#34;10#34;`"])
    n3(["`RBinaryOp (3)
#34;1#58;10#34;`"])
    n3-exit((3-exit))
    n5(["`RExpressionList (5)
#34;b#34;`"])
    n4(["`RSymbol (4)
#34;b#34;`"])
    n5-exit((5-exit))
    n6-head[[6-head]]
    n6-exit((6-exit))
    n7-exit((7-exit))
    n6 -.->|"FD"| n7
    n6 -.->|"FD"| n5-exit
    n2 -.->|"FD"| n1
    n1 -.->|"FD"| n3
    n3-exit -.->|"FD"| n2
    n4 -.->|"FD"| n5
    n5-exit -.->|"FD"| n4
    n3 -.->|"FD"| n6
    n0 -.->|"FD"| n3-exit
    n6-head -.->|"FD"| n0
    n5 -->|"CD (TRUE)"| n6-head
    n6-exit -->|"CD (FALSE)"| n3-exit
    n7-exit -.->|"FD"| n6-exit
    style n7 stroke:cyan,stroke-width:6.5px;    style n7-exit stroke:green,stroke-width:6.5px;
Loading
R Code of the CFG

The analysis required 2.4 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment. We used the following simplification: unique-cf-sets .

for(i in 1:10) b

If you generate the CFG with the extractCFG function you can (and, if you want to gain inter-procedural information, should) pass a matching dataflow graph to it to incorporate the dataflow perspective into the CFG.

The difference becomes obvious when we look at the code f <- function() b; f() first without the dataflow graph:

flowchart RL
    n7(["`RExpressionList (7)`"])
    n0(["`RSymbol (0)
#34;f#34;`"])
    n3-params[[3-params]]
    n3-exit((3-exit))
    n3(["`RFunctionDefinition (3)
#34;function() b#34;`"])
    n2(["`RExpressionList (2)
#34;b#34;`"])
    n1(["`RSymbol (1)
#34;b#34;`"])
    n2-exit((2-exit))
    n4(["`RBinaryOp (4)
#34;f #60;#45; function() b#34;`"])
    n4-exit((4-exit))
    n5(["`RSymbol (5)
#34;f()#34;`"])
    n6["`RFunctionCall (6)
#34;f()#34;`"]
    n6-name[[6-name]]
    n6-exit((6-exit))
    n7-exit((7-exit))
    n4 -.->|"FD"| n7
    n1 -.->|"FD"| n2
    n2-exit -.->|"FD"| n1
    n3-params -.->|"FD"| n3
    n2 -.->|"FD"| n3-params
    n3-exit -.->|"FD"| n2-exit
    n3 -.->|"FD"| n0
    n0 -.->|"FD"| n4
    n4-exit -.->|"FD"| n3
    n6 -.->|"FD"| n4-exit
    n5 -.->|"FD"| n6
    n6-name -.->|"FD"| n5
    n6-exit -.->|"FD"| n6-name
    n7-exit -.->|"FD"| n6-exit
    style n7 stroke:cyan,stroke-width:6.5px;    style n7-exit stroke:green,stroke-width:6.5px;
Loading
R Code of the CFG

The analysis required 1.6 ms (including the normalization and parsing with the r-shell engine) within the generation environment. We used the following simplification: unique-cf-sets .

f <- function() b; f()

And now, including dataflow information:

flowchart RL
    n7(["`RExpressionList (7)`"])
    n0(["`RSymbol (0)
#34;f#34;`"])
    n3-params[[3-params]]
    n3-exit((3-exit))
    n3(["`RFunctionDefinition (3)
#34;function() b#34;`"])
    n2(["`RExpressionList (2)
#34;b#34;`"])
    n1(["`RSymbol (1)
#34;b#34;`"])
    n2-exit((2-exit))
    n4(["`RBinaryOp (4)
#34;f #60;#45; function() b#34;`"])
    n4-exit((4-exit))
    n5(["`RSymbol (5)
#34;f()#34;`"])
    n6["`RFunctionCall (6)
#34;f()#34;
 calls:#91;3#93;`"]
    n6-name[[6-name]]
    n6-exit((6-exit))
    n6-resolved-call-exit((6-resolved-call-exit))
    n7-exit((7-exit))
    n4 -.->|"FD"| n7
    n1 -.->|"FD"| n2
    n2-exit -.->|"FD"| n1
    n3-params -.->|"FD"| n3
    n2 -.->|"FD"| n3-params
    n3-exit -.->|"FD"| n2-exit
    n3 -.->|"FD"| n0
    n0 -.->|"FD"| n4
    n4-exit -.->|"FD"| n3
    n6 -.->|"FD"| n4-exit
    n5 -.->|"FD"| n6
    n6-name -.->|"FD"| n5
    n6-exit -.->|"FD"| n6-name
    n6-resolved-call-exit -.->|"FD"| n6-exit
    n6-resolved-call-exit -.->|"FD"| n3-exit
    n7-exit -.->|"FD"| n6-resolved-call-exit
    style n7 stroke:cyan,stroke-width:6.5px;    style n7-exit stroke:green,stroke-width:6.5px;
Loading
R Code of the CFG

The analysis required 1.8 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment. We used the following simplification: unique-cf-sets .

f <- function() b; f()

There are two important additions:

  1. A new exit marker, canonically suffixed with -resolved-call-exit signals that we are aware of the function call target. This marker always follows the exit marker of the function call and links not just the call but also the exit points of the function definition.
  2. A new calls attribute attached to the function call vertex. This holds the NodeId of the function definitions that are called from this vertex.

For built-in functions that are provided by flowR's built-in configuration (see the interface wiki page) the CFG does not contain the additional information directly:

flowchart RL
    n4(["`RExpressionList (4)`"])
    n0(["`RSymbol (0)
#34;print(3)#34;`"])
    n3["`RFunctionCall (3)
#34;print(3)#34;`"]
    n3-name[[3-name]]
    n3-exit((3-exit))
    n2(["`RArgument (2)
#34;3#34;`"])
    n2-before-value[[2-before-value]]
    n1(["`RNumber (1)
#34;3#34;`"])
    n2-exit((2-exit))
    n4-exit((4-exit))
    n3 -.->|"FD"| n4
    n0 -.->|"FD"| n3
    n3-name -.->|"FD"| n0
    n2-before-value -.->|"FD"| n2
    n1 -.->|"FD"| n2-before-value
    n2-exit -.->|"FD"| n1
    n2 -.->|"FD"| n3-name
    n3-exit -.->|"FD"| n2-exit
    n4-exit -.->|"FD"| n3-exit
    style n4 stroke:cyan,stroke-width:6.5px;    style n4-exit stroke:green,stroke-width:6.5px;
Loading
R Code of the CFG

The analysis required 2.0 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment. We used the following simplification: unique-cf-sets .

print(3)

This is due to the fact that the dataflow graph does contain the required call information (and there are no new control vertices to add as the built-in call has no target in the source code):

flowchart LR
    1{{"`#91;RNumber#93; 3
      (1)
      *1.7*`"}}
    3[["`#91;RFunctionCall#93; print
      (3)
      *1.1-8*
    (1)`"]]
    built-in:print["`Built-In:
print`"]
    style built-in:print stroke:gray,fill:lightgray,stroke-width:2px,opacity:.8;
    3 -->|"returns, argument"| 1
    3 -.->|"reads"| built-in:print
    linkStyle 1 stroke:gray;
Loading
R Code of the Dataflow Graph

The analysis required 1.4 ms (including parse and normalize, using the r-shell engine) within the generation environment. We encountered unknown side effects (with ids: 3 (linked)) during the analysis.

print(3)

Adding Basic Blocks

As mentioned in the introduction, our control flow graph does not use basic blocks by default and hence simply links all vertices independent of whether they have (un-)conditional jumps or not. On the upside, this tells us the execution order (and, in case of promises, forcing order) of involved expressions and seamlessly handles cases like x <- return(3). On the downside, this makes it hard to apply classical control flow graph algorithms and, in general, makes the graph much harder to read. Yet, we can request basic blocks or transform an existing CFG into basic blocks using the convertCfgToBasicBlocks function.

Any program without any (un-)conditional jumps now contains a single basic block:

flowchart RL
    nbb-7-exit[["`Basic Block (bb-7-exit)
x #60;#45; 2 #42; 3 #43; 1`"]]
    style nbb-7-exit stroke:cyan,stroke-width:6.5px;    style nbb-7-exit stroke:green,stroke-width:6.5px;
Loading
R Code of the CFG

The analysis required 1.8 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment. We used the following simplifications: unique-cf-sets, to-basic-blocks and render a simplified/compacted version.

x <- 2 * 3 + 1

While the CFG without basic blocks is much bigger:

flowchart RL
    n7(["`RExpressionList (7)`"])
    n0(["`RSymbol (0)
#34;x#34;`"])
    n1(["`RNumber (1)
#34;2#34;`"])
    n2(["`RNumber (2)
#34;3#34;`"])
    n3(["`RBinaryOp (3)
#34;2 #42; 3#34;`"])
    n3-exit((3-exit))
    n4(["`RNumber (4)
#34;1#34;`"])
    n5(["`RBinaryOp (5)
#34;2 #42; 3 #43; 1#34;`"])
    n5-exit((5-exit))
    n6(["`RBinaryOp (6)
#34;x #60;#45; 2 #42; 3 #43; 1#34;`"])
    n6-exit((6-exit))
    n7-exit((7-exit))
    n6 -.->|"FD"| n7
    n2 -.->|"FD"| n1
    n1 -.->|"FD"| n3
    n3-exit -.->|"FD"| n2
    n4 -.->|"FD"| n3-exit
    n3 -.->|"FD"| n5
    n5-exit -.->|"FD"| n4
    n5 -.->|"FD"| n0
    n0 -.->|"FD"| n6
    n6-exit -.->|"FD"| n5-exit
    n7-exit -.->|"FD"| n6-exit
    style n7 stroke:cyan,stroke-width:6.5px;    style n7-exit stroke:green,stroke-width:6.5px;
Loading

(The analysis required 1.6 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment. We used the following simplification: unique-cf-sets . )

In a way, using the basic blocks perspective does not remove any of these vertices (we just usually visualize them compacted as their execution order should be "obvious"). The vertices are still there, as elems of the CfgBasicBlockVertex:

flowchart RL
    subgraph nbb-7-exit [Block bb-7-exit]
        direction RL
    n7-exit((7-exit))
    n6-exit((6-exit))
    n7-exit -.-> n6-exit
    n5-exit((5-exit))
    n6-exit -.-> n5-exit
    n4["`RNumber (4)
#34;1#34;`"]
    n5-exit -.-> n4
    n3-exit((3-exit))
    n4 -.-> n3-exit
    n2["`RNumber (2)
#34;3#34;`"]
    n3-exit -.-> n2
    n1["`RNumber (1)
#34;2#34;`"]
    n2 -.-> n1
    n3["`RBinaryOp (3)
#34;2 #42; 3#34;`"]
    n1 -.-> n3
    n5["`RBinaryOp (5)
#34;2 #42; 3 #43; 1#34;`"]
    n3 -.-> n5
    n0["`RSymbol (0)
#34;x#34;`"]
    n5 -.-> n0
    n6["`RBinaryOp (6)
#34;x #60;#45; 2 #42; 3 #43; 1#34;`"]
    n0 -.-> n6
    n7["`RExpressionList (7)`"]
    n6 -.-> n7
    end
    style nbb-7-exit stroke:cyan,stroke-width:6.5px;    style nbb-7-exit stroke:green,stroke-width:6.5px;
Loading

(The analysis required 1.6 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment. We used the following simplifications: unique-cf-sets, to-basic-blocks . )

The benefit (for comprehensibility and algorithms) becomes more apparent when we look at a more complicated program:

f <- function(a, b = 3) {
 if(a > b) {
 	return(a * b);
 } else {
 	while(a < b) {
 		a <- a + 1;
 	}
 	return(a);
 }
}

print(f(21) + f(42))

With basic blocks, this code looks like this:

flowchart RL
    nbb-42-exit[["`Basic Block (bb-42-exit)
`"]]
    nbb-42[["`Basic Block (bb-42)
f #60;#45; function(a, b=3) #123; #125;`"]]
    nbb-40-condition[["`Basic Block (bb-40-condition)
#123; if(a #62; b) #123; #125; #125;`"]]
    nbb-19-exit[["`Basic Block (bb-19-exit)
#123; return(a #42; b) #125;`"]]
    nbb-38[["`Basic Block (bb-38)
#123; #123;#125; #125;`"]]
    nbb-33-condition[["`Basic Block (bb-33-condition)
while(a #60; b) #123;#125;`"]]
    nbb-32-exit[["`Basic Block (bb-32-exit)
#123; a #60;#45; a #43; 1 #125;`"]]
    nbb-39-exit[["`Basic Block (bb-39-exit)
return(a)`"]]
    nbb-2-exit[["`Basic Block (bb-2-exit)
function(a, b=3) #123; #125;`"]]
    nbb-5-exit[["`Basic Block (bb-5-exit)
function(a, b=3) #123; #125;`"]]
    nbb-48-exit[["`Basic Block (bb-48-exit)
print(f(21) #43; f(42))`"]]
    nbb-52-exit[["`Basic Block (bb-52-exit)
f(42)`"]]
    nbb-56-exit[["`Basic Block (bb-56-exit)
`"]]
    nbb-42-exit -.->|"FD"| nbb-19-exit
    nbb-42-exit -.->|"FD"| nbb-39-exit
    nbb-40-condition -.->|"FD"| nbb-2-exit
    nbb-40-condition -.->|"FD"| nbb-5-exit
    nbb-19-exit -->|"CD (TRUE)"| nbb-40-condition
    nbb-38 -->|"CD (FALSE)"| nbb-40-condition
    nbb-33-condition -.->|"FD"| nbb-38
    nbb-33-condition -.->|"FD"| nbb-32-exit
    nbb-32-exit -->|"CD (TRUE)"| nbb-33-condition
    nbb-39-exit -->|"CD (FALSE)"| nbb-33-condition
    nbb-2-exit -.->|"FD"| nbb-42
    nbb-5-exit -.->|"FD"| nbb-42
    nbb-48-exit -.->|"FD"| nbb-42
    nbb-52-exit -.->|"FD"| nbb-48-exit
    nbb-52-exit -.->|"FD"| nbb-42-exit
    nbb-56-exit -.->|"FD"| nbb-52-exit
    nbb-56-exit -.->|"FD"| nbb-42-exit
    style nbb-42 stroke:cyan,stroke-width:6.5px;    style nbb-56-exit stroke:green,stroke-width:6.5px;
Loading

(The analysis required 10.6 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment. We used the following simplifications: unique-cf-sets, to-basic-blocks and render a simplified/compacted version. )

Now, without basic blocks, this is a different story...

The full CFG
flowchart RL
    n56(["`RExpressionList (56)`"])
    n0(["`RSymbol (0)
#34;f#34;`"])
    n42-params[[42-params]]
    n42-exit((42-exit))
    n42(["`RFunctionDefinition (42)
#34;function(a, b = 3) #123;
 if(a #62; b) #123;
 #92;treturn(a #42; b);
 #125; else #123;
 #92;twhile(a #60; b) #123;
 #92;t#92;ta #60;#45; a #43; 1;
 #92;t#125;
 #92;treturn(a);
 #125;
#125;#34;`"])
    n41(["`RExpressionList (41)`"])
    n40["`RIfThenElse (40)
#34;if(a #62; b) #123;
 #92;treturn(a #42; b);
 #125; else #123;
 #92;twhile(a #60; b) #123;
 #92;t#92;ta #60;#45; a #43; 1;
 #92;t#125;
 #92;treturn(a);
 #125;#34;`"]
    n40-condition[[40-condition]]
    n40-exit((40-exit))
    n8(["`RSymbol (8)
#34;a#34;`"])
    n9(["`RSymbol (9)
#34;b#34;`"])
    n10(["`RBinaryOp (10)
#34;a #62; b#34;`"])
    n10-exit((10-exit))
    n19(["`RExpressionList (19)`"])
    n13(["`RSymbol (13)
#34;return(a #42; b)#34;`"])
    n18["`RFunctionCall (18)
#34;return(a #42; b)#34;`"]
    n18-name[[18-name]]
    n18-exit((18-exit))
    n17(["`RArgument (17)
#34;a #42; b#34;`"])
    n17-before-value[[17-before-value]]
    n14(["`RSymbol (14)
#34;a#34;`"])
    n15(["`RSymbol (15)
#34;b#34;`"])
    n16(["`RBinaryOp (16)
#34;a #42; b#34;`"])
    n16-exit((16-exit))
    n17-exit((17-exit))
    n19-exit((19-exit))
    n39(["`RExpressionList (39)`"])
    n38(["`RExpressionList (38)
#34;while(a #60; b) #123;
 #92;t#92;ta #60;#45; a #43; 1;
 #92;t#125;
 #92;treturn(a);#34;`"])
    n22(["`RSymbol (22)
#34;a#34;`"])
    n23(["`RSymbol (23)
#34;b#34;`"])
    n24(["`RBinaryOp (24)
#34;a #60; b#34;`"])
    n24-exit((24-exit))
    n33["`RWhileLoop (33)
#34;while(a #60; b) #123;
 #92;t#92;ta #60;#45; a #43; 1;
 #92;t#125;#34;`"]
    n33-condition[[33-condition]]
    n33-exit((33-exit))
    n32(["`RExpressionList (32)`"])
    n27(["`RSymbol (27)
#34;a#34;`"])
    n28(["`RSymbol (28)
#34;a#34;`"])
    n29(["`RNumber (29)
#34;1#34;`"])
    n30(["`RBinaryOp (30)
#34;a #43; 1#34;`"])
    n30-exit((30-exit))
    n31(["`RBinaryOp (31)
#34;a #60;#45; a #43; 1#34;`"])
    n31-exit((31-exit))
    n32-exit((32-exit))
    n34(["`RSymbol (34)
#34;return(a)#34;`"])
    n37["`RFunctionCall (37)
#34;return(a)#34;`"]
    n37-name[[37-name]]
    n37-exit((37-exit))
    n36(["`RArgument (36)
#34;a#34;`"])
    n36-before-value[[36-before-value]]
    n35(["`RSymbol (35)
#34;a#34;`"])
    n36-exit((36-exit))
    n38-exit((38-exit))
    n39-exit((39-exit))
    n41-exit((41-exit))
    n2(["`RParameter (2)
#34;a#34;`"])
    n1(["`RSymbol (1)
#34;a#34;`"])
    n2-before-value[[2-before-value]]
    n2-exit((2-exit))
    n5(["`RParameter (5)
#34;b#34;`"])
    n3(["`RSymbol (3)
#34;b#34;`"])
    n5-before-value[[5-before-value]]
    n4(["`RNumber (4)
#34;3#34;`"])
    n5-exit((5-exit))
    n43(["`RBinaryOp (43)
#34;f #60;#45; function(a, b = 3) #123;
 if(a #62; b) #123;
 #92;treturn(a #42; b);
 #125; else #123;
 #92;twhile(a #60; b) #123;
 #92;t#92;ta #60;#45; a #43; 1;
 #92;t#125;
 #92;treturn(a);
 #125;
#125;#34;`"])
    n43-exit((43-exit))
    n44(["`RSymbol (44)
#34;print(f(21) #43; f(42))#34;`"])
    n55["`RFunctionCall (55)
#34;print(f(21) #43; f(42))#34;`"]
    n55-name[[55-name]]
    n55-exit((55-exit))
    n54(["`RArgument (54)
#34;f(21) #43; f(42)#34;`"])
    n54-before-value[[54-before-value]]
    n45(["`RSymbol (45)
#34;f(21)#34;`"])
    n48(["`RFunctionCall (48)
#34;f(21)#34;
 calls:#91;42#93;`"])
    n48-name[[48-name]]
    n48-exit((48-exit))
    n47(["`RArgument (47)
#34;21#34;`"])
    n47-before-value[[47-before-value]]
    n46(["`RNumber (46)
#34;21#34;`"])
    n47-exit((47-exit))
    n48-resolved-call-exit((48-resolved-call-exit))
    n49(["`RSymbol (49)
#34;f(42)#34;`"])
    n52(["`RFunctionCall (52)
#34;f(42)#34;
 calls:#91;42#93;`"])
    n52-name[[52-name]]
    n52-exit((52-exit))
    n51(["`RArgument (51)
#34;42#34;`"])
    n51-before-value[[51-before-value]]
    n50(["`RNumber (50)
#34;42#34;`"])
    n51-exit((51-exit))
    n52-resolved-call-exit((52-resolved-call-exit))
    n53(["`RBinaryOp (53)
#34;f(21) #43; f(42)#34;`"])
    n53-exit((53-exit))
    n54-exit((54-exit))
    n56-exit((56-exit))
    n43 -.->|"FD"| n56
    n40 -.->|"FD"| n41
    n9 -.->|"FD"| n8
    n8 -.->|"FD"| n10
    n10-exit -.->|"FD"| n9
    n18 -.->|"FD"| n19
    n13 -.->|"FD"| n18
    n18-name -.->|"FD"| n13
    n17-before-value -.->|"FD"| n17
    n15 -.->|"FD"| n14
    n14 -.->|"FD"| n16
    n16-exit -.->|"FD"| n15
    n16 -.->|"FD"| n17-before-value
    n17-exit -.->|"FD"| n16-exit
    n17 -.->|"FD"| n18-name
    n18-exit -.->|"FD"| n17-exit
    n19-exit -.->|"FD"| n18-exit
    n38 -.->|"FD"| n39
    n33 -.->|"FD"| n38
    n33 -.->|"FD"| n32-exit
    n23 -.->|"FD"| n22
    n22 -.->|"FD"| n24
    n24-exit -.->|"FD"| n23
    n31 -.->|"FD"| n32
    n29 -.->|"FD"| n28
    n28 -.->|"FD"| n30
    n30-exit -.->|"FD"| n29
    n30 -.->|"FD"| n27
    n27 -.->|"FD"| n31
    n31-exit -.->|"FD"| n30-exit
    n32-exit -.->|"FD"| n31-exit
    n24 -.->|"FD"| n33
    n33-condition -.->|"FD"| n24-exit
    n32 -->|"CD (TRUE)"| n33-condition
    n33-exit -->|"CD (FALSE)"| n33-condition
    n37 -.->|"FD"| n33-exit
    n34 -.->|"FD"| n37
    n37-name -.->|"FD"| n34
    n36-before-value -.->|"FD"| n36
    n35 -.->|"FD"| n36-before-value
    n36-exit -.->|"FD"| n35
    n36 -.->|"FD"| n37-name
    n37-exit -.->|"FD"| n36-exit
    n38-exit -.->|"FD"| n37-exit
    n39-exit -.->|"FD"| n38-exit
    n40-condition -.->|"FD"| n10-exit
    n19 -->|"CD (TRUE)"| n40-condition
    n39 -->|"CD (FALSE)"| n40-condition
    n10 -.->|"FD"| n40
    n40-exit -.->|"FD"| n19-exit
    n40-exit -.->|"FD"| n39-exit
    n41-exit -.->|"FD"| n40-exit
    n1 -.->|"FD"| n2
    n2-before-value -.->|"FD"| n1
    n2-exit -.->|"FD"| n2-before-value
    n2 -.->|"FD"| n42
    n42-params -.->|"FD"| n2-exit
    n42-params -.->|"FD"| n5-exit
    n3 -.->|"FD"| n5
    n5-before-value -.->|"FD"| n3
    n4 -.->|"FD"| n5-before-value
    n5-exit -.->|"FD"| n4
    n5 -.->|"FD"| n42
    n41 -.->|"FD"| n42-params
    n42-exit -.->|"FD"| n41-exit
    n42 -.->|"FD"| n0
    n0 -.->|"FD"| n43
    n43-exit -.->|"FD"| n42
    n55 -.->|"FD"| n43-exit
    n44 -.->|"FD"| n55
    n55-name -.->|"FD"| n44
    n54-before-value -.->|"FD"| n54
    n45 -.->|"FD"| n48
    n48-name -.->|"FD"| n45
    n47-before-value -.->|"FD"| n47
    n46 -.->|"FD"| n47-before-value
    n47-exit -.->|"FD"| n46
    n47 -.->|"FD"| n48-name
    n48-exit -.->|"FD"| n47-exit
    n48-resolved-call-exit -.->|"FD"| n48-exit
    n48-resolved-call-exit -.->|"FD"| n42-exit
    n49 -.->|"FD"| n52
    n52-name -.->|"FD"| n49
    n51-before-value -.->|"FD"| n51
    n50 -.->|"FD"| n51-before-value
    n51-exit -.->|"FD"| n50
    n51 -.->|"FD"| n52-name
    n52-exit -.->|"FD"| n51-exit
    n52-resolved-call-exit -.->|"FD"| n52-exit
    n52-resolved-call-exit -.->|"FD"| n42-exit
    n52 -.->|"FD"| n48-resolved-call-exit
    n48 -.->|"FD"| n53
    n53-exit -.->|"FD"| n52-resolved-call-exit
    n53 -.->|"FD"| n54-before-value
    n54-exit -.->|"FD"| n53-exit
    n54 -.->|"FD"| n55-name
    n55-exit -.->|"FD"| n54-exit
    n56-exit -.->|"FD"| n55-exit
    style n56 stroke:cyan,stroke-width:6.5px;    style n56-exit stroke:green,stroke-width:6.5px;
Loading

(The analysis required 6.5 ms (including the dataflow analysis, normalization, and parsing with the r-shell engine) within the generation environment. We used the following simplification: unique-cf-sets . )

And again it should be noted that even though the example code is more complicated, this is still far from the average real-world script.

Working with the CFG

There is a plethora of functions that you can use the traverse the normalized AST and the dataflow graph. Similarly, flowR provides you with a set of utility functions and classes that you can use to interact with the control flow graph.

Simple Traversal

If you are just interested in traversing the vertices within the cfg, two simple functions visitCfgInOrder and visitCfgInReverseOrder are available. For basic blocks these will automatically traverse the elements contained within the blocks (in the respective order). For example, the following function will return all numbers contained within the CFG:

function sampleCollectNumbers(cfg: ControlFlowInformation, ast: NormalizedAst): RNumberValue[] {
	const numbers: RNumberValue[] = [];
	visitCfgInOrder(cfg.graph, cfg.entryPoints, id => {
		/* obtain the corresponding node from the AST */
		const node = ast.idMap.get(id);
		/* if it is present and a number, add the parsed value to the list */
		if(isRNumber(node)) {
			numbers.push(node.content);
		}
	});
	return numbers;
}

Defined at ./src/documentation/print-cfg-wiki.ts#L51

Calling it with the CFG and AST of the expression x - 1 + 2L * 3 yields the following elements (in this order):

  • {"num":1,"complexNumber":false,"markedAsInt":false}
  • {"num":2,"complexNumber":false,"markedAsInt":true}
  • {"num":3,"complexNumber":false,"markedAsInt":false}

A more useful appearance of these visitors occurs with happensBefore which uses the CFG to determine whether the execution of one vertex always, maybe, or never happens before another vertex (see the corresponding query documentation for more information).

Diffing and Testing

As mentioned above, you can use the test function assertCfg to check whether the control flow graph has the desired shape. The function supports testing for sub-graphs as well (it provides diffing capabilities similar to assertDataflow). If you want to diff two control flow graphs, you can use the diffOfControlFlowGraphs function.

Checking Properties

To be a valid representation of the program, the CFG should satisfy a collection of properties that, in turn, you can automatically assume to hold when working with it. In general, we verify these in every unit test using assertCfgSatisfiesProperties, and you can have a look at the active properties by checking the CfgProperties object. In general, we check for a hammock graph (given that the program contains no definite infinite loop) and the absence of direct cycles.

Sophisticated CFG Traversal

The simple traversal functions are great for simple tasks, but very unhandy when you want to do something more sophisticated that incorporates language semantics such as function calls. Hence, we provide a series of incrementally more sophisticated (but complex) visitors that incorporate various alternative perspectives:

Basic CFG Visitor

The BasicCfgGuidedVisitor class essential provides the same functionality as the simple traversal functions but in a class-based version. Using it, you can select whether you want to traverse the CFG in order or in reverse order.

To replicate the number collector from above, you can use the following code:

class CollectNumbersVisitor extends BasicCfgGuidedVisitor {
	private numbers: RNumberValue[] = [];
	private ast:     NormalizedAst;

	constructor(controlFlow: ControlFlowInformation, ast: NormalizedAst) {
		super({ controlFlow, defaultVisitingOrder: 'forward' });
		this.ast = ast;
	}

	protected override onVisitNode(node: NodeId): void {
		const astNode = this.ast.idMap.get(node);
		if(isRNumber(astNode)) {
			this.numbers.push(astNode.content);
		}
		super.onVisitNode(node);
	}

	public getNumbers(): RNumberValue[] {
		return this.numbers;
	}
}

Defined at ./src/documentation/print-cfg-wiki.ts#L64

Instead of directly calling visitCfgInOrder we pass the forward visiting order to the constructor of the visitor. Executing it with the CFG and AST of the expression x - 1 + 2L * 3, causes the following numbers to be collected:

  • {"num":1,"complexNumber":false,"markedAsInt":false}
  • {"num":2,"complexNumber":false,"markedAsInt":true}
  • {"num":3,"complexNumber":false,"markedAsInt":false}

Syntax-Aware CFG Visitor

The SyntaxAwareCfgGuidedVisitor class incorporates knowledge of the normalized AST into the CFG traversal and directly provides specialized visitors for the various node types. Now, our running example of collecting all numbers simplifies to this:

class CollectNumbersSyntaxVisitor extends SyntaxAwareCfgGuidedVisitor {
	private numbers: RNumberValue[] = [];

	constructor(controlFlow: ControlFlowInformation, normalizedAst: NormalizedAst) {
		super({ controlFlow, normalizedAst, defaultVisitingOrder: 'forward' });
	}

	protected override visitRNumber(node: RNumber<ParentInformation>): void {
		this.numbers.push(node.content);
	}

	public getNumbers(): RNumberValue[] {
		return this.numbers;
	}
}

Defined at ./src/documentation/print-cfg-wiki.ts#L87

And again, executing it with the CFG and AST of the expression x - 1 + 2L * 3, causes the following numbers to be collected:

  • {"num":1,"complexNumber":false,"markedAsInt":false}
  • {"num":2,"complexNumber":false,"markedAsInt":true}
  • {"num":3,"complexNumber":false,"markedAsInt":false}

Dataflow-Aware CFG Visitor

There is a lot of benefit in incorporating the dataflow information into the CFG traversal, as it contains information about overwritten function calls, definition targets, and so on. Our best friend is the getOriginInDfg function which provides the important information about the origin of a vertex in the dataflow graph. The DataflowAwareCfgGuidedVisitor class does some of the basic lifting for us. While it is not ideal for our goal of collecting all numbers, it shines in other areas such as collecting all used variables, ...

class CollectNumbersDataflowVisitor extends DataflowAwareCfgGuidedVisitor {
	private numbers: RNumberValue[] = [];

	constructor(controlFlow: ControlFlowInformation, dataflow: DataflowInformation) {
		super({ controlFlow, dataflow, defaultVisitingOrder: 'forward' });
	}

	protected override visitValue(node: DataflowGraphVertexValue): void {
		const astNode = this.config.dataflow.graph.idMap?.get(node.id);
		if(isRNumber(astNode)) {
			this.numbers.push(astNode.content);
		}
	}

	public getNumbers(): RNumberValue[] {
		return this.numbers;
	}
}

Defined at ./src/documentation/print-cfg-wiki.ts#L103

Again, executing it with the CFG and Dataflow of the expression x - 1 + 2L * 3, causes the following numbers to be collected:

  • {"num":1,"complexNumber":false,"markedAsInt":false}
  • {"num":2,"complexNumber":false,"markedAsInt":true}
  • {"num":3,"complexNumber":false,"markedAsInt":false}

Semantic CFG Visitor

The SemanticCfgGuidedVisitor class is flowR's most advanced visitor that combines the syntactic and dataflow information. The main idea is simple, it provides special handlers for assignments, conditionals, and other R semantics but still follows the structure of the CFG.

Note

This visitor is still in the design phase so please open up a new issue if you have any suggestions or find any bugs.

To explore what it is capable of, let's create a visitor that prints all values that are used in assignments:

class CollectSourcesSemanticVisitor extends SemanticCfgGuidedVisitor {
	private sources: string[] = [];

	constructor(controlFlow: ControlFlowInformation, normalizedAst: NormalizedAst, dataflow: DataflowInformation) {
		super({ controlFlow, normalizedAst, dataflow, defaultVisitingOrder: 'forward' });
	}

	protected override onAssignmentCall({ source }: { source?: NodeId }): void {
		if(source) {
			this.sources.push(recoverName(source, this.config.normalizedAst.idMap) ?? '??');
		}
	}

	public getSources(): NodeId[] {
		return this.sources;
	}
}

Defined at ./src/documentation/print-cfg-wiki.ts#L122

Executing it with the CFG and Dataflow of the expression x <- 2; 3 -> x; assign("x", 42 + 21), causes the following values (/lexemes) to be collected:

  • 2
  • 3
  • +