Skip to content

Commit

Permalink
Merge latest changes from SeeWasm v1.1
Browse files Browse the repository at this point in the history
  • Loading branch information
harveyghq committed Jul 6, 2024
1 parent c0b3243 commit d690d9b
Show file tree
Hide file tree
Showing 43 changed files with 606 additions and 12,133 deletions.
44 changes: 41 additions & 3 deletions .github/workflows/test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -11,10 +11,9 @@ on:
- '**.py'
- ".github/workflows/*.yml"
workflow_dispatch:

jobs:
test:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
Expand All @@ -25,6 +24,7 @@ jobs:
- "3.10"
- "3.11"
- "3.12"
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup Python
Expand All @@ -49,7 +49,45 @@ jobs:
run: |
curl -JLO "https://github.com/WebAssembly/wabt/releases/download/1.0.32/wabt-1.0.32-ubuntu.tar.gz"
tar xzf wabt-1.0.32-ubuntu.tar.gz
- name: Cache wasi-sdk
id: cache-wasi-sdk
uses: actions/cache@v4
with:
path: wasi-sdk-22.0
key: wasi-sdk
- name: Install wasi-sdk
if: steps.cache-wasi-sdk.outputs.cache-hit != 'true'
run: |
curl -JLO "https://github.com/WebAssembly/wasi-sdk/releases/download/wasi-sdk-22/wasi-sdk-22.0-linux.tar.gz"
tar xzf wasi-sdk-22.0-linux.tar.gz
- name: Cache wasmtime
id: cache-wasmtime
uses: actions/cache@v4
with:
path: ~/.wasmtime
key: wasmtime
- name: Install wasmtime
if: steps.cache-wasmtime.outputs.cache-hit != 'true'
run: |
curl https://wasmtime.dev/install.sh -sSf | bash
- name: Install Rust
run: |
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
rustup target add wasm32-wasi
- uses: actions/setup-go@v5
with:
go-version: 1.22
check-latest: true
cache: true
- name: Install tinygo
run: |
wget https://github.com/tinygo-org/tinygo/releases/download/v0.32.0/tinygo_0.32.0_amd64.deb
sudo dpkg -i tinygo_0.32.0_amd64.deb
- name: Run pytest
run: |
export PATH=$(pwd)/wabt-1.0.32/bin:$PATH
pytest test/test.py --tb=short --durations=0
export PATH=$(pwd)/wasi-sdk-22.0/bin:$PATH
export PATH=$(pwd)/.cargo/bin:$PATH
export PATH=$(pwd)/.wasmtime/bin:$PATH
pytest test.py --tb=short --durations=0
pytest test/test_linux.py --tb=short --durations=0
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,8 @@ log/

# Results
output/
**/debug
**/rust/**/target

# Cache files
.cache/
Expand Down
113 changes: 74 additions & 39 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@

##  Prerequisites 

To run WASEM, install the necessary Python libraries as follows:
To run WASEM, ensure you have Python 3.7 or a later version installed. Then, install the required Python libraries by executing the following command:

```shell
python3 -m pip install -r requirements.txt
Expand All @@ -20,19 +20,29 @@ pip install --upgrade pip wheel
pip install --force-reinstall leb128==1.0.4
```

If you need to visualize the graph (`--visualize`), make sure you have installed `graphviz` on your system. You can use `sudo apt install graphviz` to install.
To verify everything is set up correctly, run the following command:

To analyze files written in other programming languages, you must generate the corresponding WASM file in your local environment. This section provides brief instructions on how to compile C/C++ SGX programs into WASM.
```shell
python3 -m pytest test.py -vv
```

This command traverses the `./test` folder and performs symbolic execution on all Wasm binaries.
If successful, a success message will be displayed, typically **after several seconds**.

Sample Wasm binaries, including "Hello World" in C, Go, and Rust, are provided in the folder.
These can be compiled from their respective source languages; the compilation processes are detailed in [WASI tutorial](https://github.com/bytecodealliance/wasmtime/blob/main/docs/WASI-tutorial.md#compiling-to-wasi) (C and Rust), and [WASI "Hello World" example](https://wasmbyexample.dev/examples/wasi-hello-world/wasi-hello-world.go.en-us.html) (Go).

For Rust and C++ project, you can use `wasm-tools` to demangle symbol names in the `name` section. Install with `cargo install wasm-tools`. Confirm by `wasm-tools --version`. Details can be found at [Wasm Tools](https://github.com/bytecodealliance/wasm-tools).

## Normal Mode

In this section, we would show how to use WASEM to analyze normal Wasm file.
This section demonstrates how to use WASEM to analyze normal Wasm file.

### Options
All valid options are shown in below:

```shell
WASEM, a symbolic execution engine for Wasm module
WASEM, a general symbolic execution framework for WebAssembly (WASM) binaries

optional arguments:
-h, --help show this help message and exit
Expand Down Expand Up @@ -60,57 +70,82 @@ Features:

Analyze:
-s, --symbolic perform the symbolic execution
--search [{dfs,bfs,random,interval}]
set the search algorithm
```

We will detail these options according to their functionalities.

### Input Arguments
According to values given to input arguments, WASEM can deassemble the target binary and construct valid inputs.
WASEM can deassemble the target binary and construct valid inputs based on the values of the input arguments.

Specifically, `-f` is not an optional option, following which the path of to be analyzed Wasm binary should be given.
`--stdin STRING` and `--sym_stdin N` can pass concrete or symbolic bytes through the stdin stream.
The difference between them is that a concrete string has to be passed through `--stdin`, and a string consisting of `N` symbolic characters need to be passed through `--sym_stdin`.
For example, `--sym_stdin 5` will input 5 symbolic bytes if some functions need to read input from stdin.
Specifically, `-f` option is mandatory, and it must be followed by the path of the Wasm binary to be analyzed. The `--stdin STRING` and `--sym_stdin N` options allow users to pass concrete and symbolic bytes through the stdin stream, respectively. A concrete string must be passed using `--stdin`, while a string consisting of `N` symbolic characters must be passed using `--sym_stdin`. For example, `--sym_stdin 5` inputs 5 symbolic bytes for functions that read from stdin.

Similarly, `--args STRING1, STRING2, ...` and `--sym_args N1, N2, ...` pass concrete and symbolic arguments to the Wasm binary.
For instance, if `main` requires three arguments where each of them should be two bytes, `--sym_args 2 2 2` is enough.
Similarly, `--args STRING1, STRING2, ...` and `--sym_args N1, N2, ...` options pass concrete and symbolic arguments to the Wasm binary. For instance, if `main` requires three arguments, each two bytes long, `--sym_args 2 2 2` is enough.

Some programs will interact with files and conduct reading and writing.
WASEM can also simulate this by a *symbolic file system*.
Users have to apply `--sym_files N M` to create `N` symbolic files, where each of them has (or can hold) `M` bytes at most.
Some programs interact with files. WASEM simulates this using a *symbolic file system*. Users can create `N` symbolic files, each with up to `M` bytes, using the `--sym_files N M` option.

Finally, as several high-level programming languages can be compiled to Wasm binaries. We have achieved some specific optimizations, but users have to indicate the source language by `--source_types`.
As multiple high-level programming languages can be compiled to Wasm binaries, we have implemented specific optimizations. To take advantage of these optimizations, users must indicate the source language using the `--source_type` option.

### Features
`--entry` can tell WASEM which function is the entry, from which the symbolic execution performs.
Note that the `__original_main` is the default entry for all Wasm binaries following WASI standard.
The toolchain we mentioned in the [previous section](README.md#Prerequisites) can generate Wasm binaries following WASI standard.
`--entry` specifies the entry function from which symbolic execution begins. By default, the entry function is `__original_main`. Users must specify a proper entry function to ensure the symbolic execution is performed correctly.

As we mentioned in our paper, the given Wasm will be parsed into ICFG.
Sometimes visualizing the ICFG is necessary for debugging.
Thus `--visualize` can achieve this goal.
The input Wasm is parsed into an Interprocedural Control Flow Graph (ICFG), which can be visualized for debugging purposes using the `--visualize` option (requires `graphviz`, installable via `sudo apt install graphviz` on Ubuntu).

During symbolic execution, constraints solving is a bottleneck for the performance.
We have implemented a set of optimizations on the solving process.
The `--incremental` refers to *incremental solving*, which may not always introduce positive optimizations during the analysis. Therefore, we set a flag to allow users to decidie if enable the incremental solving.
The constraint solving process is a bottleneck for symbolic execution performance; however, we have implemented some optimizations to mitigate this issue. The `--incremental` flag enables *incremental solving*. Note that it may not always yield positive results during analysis, and is therefore optional.

The `-v` is an optional option.
Accoding to different values, different levels of logging can be generated, which may help the debugging.
The `-v` option controls the logging level, allowing users to adjust the verbosity of logging output to aid in debugging.

### Analyze
The `-s` is a mandatory option.
It will enable the symbolic execution analysis on the given Wasm binary.
The `-s` is a mandatory option. It enables symbolic execution analysis on the given Wasm binary.

## Example
If we want to execute a program which does not requrie any extra arguments and input, the command should be:
The `--search` option specifies the search algorithm used during symbolic execution. The default algorithm is Depth-First Search (DFS), but users can choose from the following options: `bfs`, `random`, and `interval`.

```shell
python main.py -f test_files/hello_world.wasm -s
### Output
The output of WASEM, including logs and results, is stored in the `output` folder, with each file named according to the pattern `NAME_TIMESTAMP`.

The log file follows a specific format, which illustrates the call trace of the anaylzed program:

```log
2024-07-01 07:50:36,191 | WARNING | Totally remove 27 unrelated functions, around 50.000% of all functions
2024-07-01 07:50:36,205 | INFO | Call: __original_main -> __main_void
2024-07-01 07:50:36,218 | INFO | Call: __main_void -> __wasi_args_sizes_get
2024-07-01 07:50:36,219 | INFO | Call: args_sizes_get (import)
2024-07-01 07:50:36,219 | INFO | args_sizes_get, argc_addr: 70792, arg_buf_size_addr: 70796
2024-07-01 07:50:36,219 | INFO | Return: args_sizes_get (import)
2024-07-01 07:50:36,219 | INFO | Return: __wasi_args_sizes_get
...
```

The corresponding logging and results of feasible paths will be generated in `output` folder.
The result is a JSON file containing feasible paths with their solutions, formatted as follows:

If compilicated arguments are required. For example, a `base64` program whose `main` is like:
```json
{
"Status": "xxx",
"Solution": {"xxx"},
"Output": [
{
"name": "stdout",
"output": "xxx"
},
{
"name": "stderr",
"output": "xxx"
}
]
}
```

You can use `./clean.sh -f` to remove all files in the `output` folder.

### Example
To execute a program that takes no extra arguments or input, use the following command:

```shell
python3 launcher.py -f PATH_TO_WASM_BINARY -s
```

If compilicated arguments are required, for example, a `base64` program with a `main` function like:

```c
// main of base64
Expand All @@ -128,14 +163,14 @@ int main(int argc, char **argv)
// encode or decode
}
```
We can see that the `base64` not only requires a two bytes arguments, but also needs a string of input to encode or decode. Also, the encoded or decoded results will go to a file.
Thus, the command to analyze the `base64` is like:
The `base64` program expects two-byte arguments and a string input to encode or decode, producing output that is written to a file.
Thus, the command to analyze `base64` is like:
```shell
python main.py -f PATH_TO_BASE64 -s --sym_args 2 --sym_stdin 5 --sym_files 1 10 -v info
python3 launcher.py -f PATH_TO_BASE64 -s --sym_args 2 --sym_stdin 5 --sym_files 1 10
```


## SGX Mode

### Compilation
Expand Down
64 changes: 61 additions & 3 deletions clean.sh
Original file line number Diff line number Diff line change
@@ -1,6 +1,64 @@
#!/usr/bin/env bash
set -e

rm -rf output
mkdir -p output/log output/result
touch output/log/.placeholder output/result/.placeholder
OUTPUT_DIR=output

error() {
command printf '\033[1;31mError: %s\033[0m\n\n' "$1" 1>&2
}

usage() {
cat >&2 <<END_USAGE
clean.sh: remove and recreate empty "log" and "result" directories
USAGE:
clean.sh -[fi]
FLAGS:
-h, --help Prints help information
-i, --interactive Interactive mode
-f, --force Force remove output files and directories
END_USAGE
}

# parse command line options
while [ $# -gt 0 ]
do
arg="$1"

case "$arg" in
-h|--help)
usage
exit 1
;;
-f|--force)
shift # shift off the argument
rm -rf $OUTPUT_DIR
mkdir -p $OUTPUT_DIR/log $OUTPUT_DIR/result
touch $OUTPUT_DIR/log/.placeholder $OUTPUT_DIR/result/.placeholder
exit 0
;;
-i|--interactive)
shift # shift off the argument
read -p "Are you sure you want to remove all output files and directories (rm -rf $OUTPUT_DIR)? [y/N] " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]
then
rm -rf $OUTPUT_DIR
mkdir -p $OUTPUT_DIR/log $OUTPUT_DIR/result
touch $OUTPUT_DIR/log/.placeholder $OUTPUT_DIR/result/.placeholder
exit 0
fi
exit 1
;;
*)
error "Unknown option: '$arg'"
usage
exit 1
;;
esac
done

usage
exit 1
11 changes: 0 additions & 11 deletions eunomia/arch/wasm/lib/c_lib.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,10 +2,6 @@
import math

from seewasm.arch.wasm.configuration import Configuration
from seewasm.arch.wasm.dwarfParser import (decode_vararg,
decode_var_type,
get_func_index_from_state,
get_source_location)
from seewasm.arch.wasm.exceptions import (ProcFailTermination,
UnsupportExternalFuncError)
from seewasm.arch.wasm.lib.utils import _extract_params, _loadN, _storeN
Expand Down Expand Up @@ -179,13 +175,6 @@ def emul(self, state, param_str, return_str, data_section, analyzer):
logging.info(
f"\tscanf, pattern_p: {pattern_p}, param_p: {param_p}")

# parse the scanf's argument's type
# get addr of vararg 0.
addr = decode_vararg(state, param_p, 0)

# parse dwarf information
var_type, var_size = decode_var_type(analyzer, state, addr)

pattern = C_extract_string_by_mem_pointer(
pattern_p, data_section, state)

Expand Down
15 changes: 14 additions & 1 deletion eunomia/arch/wasm/lib/go_lib.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,17 @@
5: BitVec("global", 32),
6: BitVec("global.Go", 32)} # module the memory map in wasm_exec.js

# used for vulnerability detection
PANIC_FUNCTIONS = {'runtime.nilPanic': 'nil pointer dereference',
'runtime.lookupPanic': 'index out of range',
'runtime.slicePanic': 'slice out of range',
'runtime.sliceToArrayPointerPanic': 'slice smaller than array',
'runtime.divideByZeroPanic': 'divide by zero',
'runtime.unsafeSlicePanic': 'unsafe.Slice: len out of range',
'runtime.chanMakePanic': 'new channel is too big',
'runtime.negativeShiftPanic': 'negative shift',
'runtime.blockingPanic': 'trying to do blocking operation in exported function'}


def calculateHeapAddresses(state, data_section):
val_81328 = simplify(
Expand Down Expand Up @@ -133,7 +144,7 @@ def decode_golang_string(addr):
# i8* @runtime.alloc(i32 %size, i8* %layout, i8* %context, i8* %parentHandle)
alloc_size = len(write_bytes)
inst_call = WasmInstruction(
0x10, 'call', None, None, b'\x10', 0, 0,
0x10, 1, 'call', None, None, b'\x10', 0, 0,
'call a function', 'call ' + str(runtime_alloc_ind),
-1)
arguments = [BitVecVal(0, 32) for i in range(3)]
Expand Down Expand Up @@ -218,6 +229,7 @@ def decode_golang_string(addr):
out_bytes += format_str[parsed_ind:].encode()
logging.info(f"fmt.printf: {repr(format_str)}")
logging.info(f"{out_bytes}")
state.file_sys[1]["content"] += list(out_bytes)
store32(pret, len(out_bytes))
store32(pret + 4, 0)
manually_constructed = True
Expand Down Expand Up @@ -346,6 +358,7 @@ def decode_golang_string(addr):
elif self.name == 'runtime.putchar':
ch = param_list[0]
print(chr(ch.as_long()), end='')
state.file_sys[1]["content"] += [ch.as_long()]
else:
print(param_list)

Expand Down
Loading

0 comments on commit d690d9b

Please sign in to comment.