
Using PBench

Steve Burnett edited this page Jul 18, 2024 · 15 revisions

This page presents the online help for all pbench commands.

pbench

Run ./pbench --help to see the online help for pbench.

Tool for running Presto benchmarks

Usage:
  pbench [command]

Available Commands:
  cmp         Compare two query result directories
  completion  Generate the autocompletion script for the specified shell
  genconfig   Generate benchmark cluster configurations
  help        Help about any command
  loadjson    Load query JSON files into event listener database and run recorders
  replay      Replay workload from a CSV file
  round       Round the decimal values in the benchmark query output files for easier comparison
  run         Run a benchmark
  save        Save table information for recreating the schema and data

Flags:
  -h, --help   help for pbench

Use "pbench [command] --help" for more information about a command.

pbench cmp

For more information about pbench cmp, see Comparing Benchmarks.

Note: pbench cmp is an experimental feature that is not included in the default builds in Releases.

Run ./pbench cmp --help to see the online help for pbench cmp.

Compare two query result directories

Usage:
  pbench cmp [flags] [directory 1] [directory 2]

Flags:
  -r, --file-id-regex string   regex to extract file id from file names in two directories to find matching files to compare (default ".*(query_\d{2})(?:_c0)?\.output")
  -h, --help                   help for cmp
  -o, --output-path string     diff output path (default "./diff")
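The default --file-id-regex captures an ID such as query_01 from each file name and ignores an optional _c0 suffix, so files in the two directories that yield the same ID are compared with each other. The Go sketch below illustrates only that matching step, assuming pairs are formed by exact ID equality; fileID and matchFiles are hypothetical helpers, not pbench's code, and the diff itself is not modeled.

```go
package main

import (
	"fmt"
	"regexp"
)

// defaultFileIDRegex mirrors the default value of --file-id-regex shown above.
var defaultFileIDRegex = regexp.MustCompile(`.*(query_\d{2})(?:_c0)?\.output`)

// fileID returns the first capture group of re for name, or "" if name does not match.
func fileID(re *regexp.Regexp, name string) string {
	m := re.FindStringSubmatch(name)
	if len(m) < 2 {
		return ""
	}
	return m[1]
}

// matchFiles (hypothetical) pairs names from two directory listings whose extracted IDs are equal.
func matchFiles(re *regexp.Regexp, dir1, dir2 []string) [][2]string {
	byID := make(map[string]string)
	for _, name := range dir2 {
		if id := fileID(re, name); id != "" {
			byID[id] = name
		}
	}
	var pairs [][2]string
	for _, name := range dir1 {
		id := fileID(re, name)
		if other, ok := byID[id]; id != "" && ok {
			pairs = append(pairs, [2]string{name, other})
		}
	}
	return pairs
}

func main() {
	run1 := []string{"query_01.output", "query_02.output", "notes.txt"}
	run2 := []string{"query_01_c0.output", "query_03_c0.output"}
	for _, p := range matchFiles(defaultFileIDRegex, run1, run2) {
		fmt.Printf("%s <-> %s\n", p[0], p[1])
	}
}
```

Files without a counterpart (query_02 and query_03 here) produce no pair.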

pbench completion

Run ./pbench completion --help to see the online help for pbench completion.

Generate the autocompletion script for pbench for the specified shell.
See each sub-command's help for details on how to use the generated script.

Usage:
  pbench completion [command]

Available Commands:
  bash        Generate the autocompletion script for bash
  fish        Generate the autocompletion script for fish
  powershell  Generate the autocompletion script for powershell
  zsh         Generate the autocompletion script for zsh

Flags:
  -h, --help   help for completion

Use "pbench completion [command] --help" for more information about a command.

pbench genconfig

For more information about pbench genconfig, see Generating Benchmark Configurations.

Run ./pbench genconfig --help to see the online help for pbench genconfig.

Generate benchmark cluster configurations

Usage:
  pbench genconfig [flags] [directory to search recursively for config.json]
  pbench genconfig [command]

Available Commands:
  default     Print the built-in default generator parameter file.

Flags:
  -h, --help                    help for genconfig
  -p, --parameter-file string   Specifies the parameter file. Use built-in defaults if not specified.
  -t, --template-dir string     Specifies the template directory. Use built-in template if not specified.

Use "pbench genconfig [command] --help" for more information about a command.

pbench help

Run ./pbench help --help to see the online help for pbench help.

Help provides help for any command in the application.
Simply type pbench help [path to command] for full details.

Usage:
  pbench help [command] [flags]

Flags:
  -h, --help   help for help

For example, the two commands

  • pbench genconfig default --help
  • pbench help genconfig default

return the same output:

Print the built-in default generator parameter file.

Usage:
  pbench genconfig default

Flags:
  -h, --help   help for default

pbench loadjson

Run ./pbench loadjson --help to see the online help for pbench loadjson.

Load query JSON files into event listener database and run recorders

Usage:
  pbench loadjson [flags] [list of files or directories to process]

Flags:
  -c, --comment string       Add a comment to this run (optional)
  -x, --extract-plan         Extract the plan JSON from query JSON then save them to the output path
  -h, --help                 help for loadjson
      --influx string        InfluxDB connection config for run recorder (optional)
      --mysql string         MySQL connection config for event listener and run recorder (optional)
  -n, --name string          Assign a name to this run. (default: "load_<current time>") (default "load_240502-144312")
  -o, --output-path string   Output directory path (default "/Users/<username>/Downloads/pbench")
  -P, --parallel int         Number of parallel threads to load json files (default 10)
  -r, --record-run           Record all the loaded JSON as a run

The default for -P varies by machine because it is set to the number of CPU cores on the system; the value 10 shown above reflects the machine on which this help output was captured.

pbench replay

Run ./pbench replay --help to see the online help for pbench replay.

Replay workload from a CSV file
The fields in the CSV file are:
"query_id","create_time","wall_time_millis","output_rows","written_output_rows","catalog","schema","session_properties","query"
We also expect the queries in this CSV file to be sorted by "create_time" in ascending order.

Usage:
  pbench replay [flags] [workload csv file]

Flags:
      --force-https          Force all API requests to use HTTPS
  -h, --help                 help for replay
  -n, --name string          Assign a name to this run. (default: "replay_<current time>") (default "replay_240620-105652")
  -o, --output-path string   Output directory path (default "/Users/<username>/Downloads/pbench")
  -p, --password string      Presto user password (optional)
  -s, --server string        Presto server address (default "http://127.0.0.1:8080")
      --trino                Use Trino protocol
  -u, --user string          Presto user name (default "pbench")

pbench round

Run ./pbench round --help to see the online help for pbench round.

Note: pbench round is an experimental feature that is not included in the default builds in Releases.

pbench round examines every column in the first row to determine which columns contain decimal values.

After processing the first row, it looks only at the columns that matched. As a result, if an overly long decimal first appears in a later row, in a column that did not match in the first row, it might not be rounded correctly.

A pull request was opened to fix the decimal precision discrepancy between native and Java execution, but so far it does not fully resolve the problem:

https://github.com/facebookincubator/velox/pull/7944

Usage:
  pbench round [flags] [list of files or directories to process]

Flags:
  -e, --file-extension stringArray   Specifies the file extensions to include for processing (including the dot). You can specify multiple file extensions. (default [.output])
  -f, --format string                Specifies the format of the files. Accepted values are: "csv" or "json" which is the output file from the "run" command (default "json")
  -h, --help                         help for round
  -p, --precision int                Decimal precision to preserve. (default 12)
  -r, --recursive                    Recursively walk a path if a directory is provided in the arguments.
  -i, --rewrite-in-place             When turned on, we will rewrite the file in-place. Otherwise, we save the rewritten file separately.

pbench run

For more information about pbench run, see Running PBench.

Run ./pbench run --help to see the online help for pbench run.

Run a benchmark that is defined by a sequence of JSON configuration files.

Usage:
  pbench run [flags] [list of root-level benchmark stage JSON files]

Flags:
  -c, --comment string       Add a comment to this run (optional)
  -h, --help                 help for run
      --influx string        InfluxDB connection config for run recorder (optional)
      --mysql string         MySQL connection config for run recorder (optional)
  -n, --name string          Assign a name to this run. (default: "<main stage name>-<current time>")
  -o, --output-path string   Output directory path (default "current directory")
  -p, --password string      Presto user password (optional)
      --pulumi string        (only works when a MySQL run recorder is specified) Pulumi API config for storing deployment details with MySQL (optional)
  -k, --rand-skip int        Skip the first N random selections from the sequence (optional)
  -e, --seed int             Random seed for randomized execution (default 1712866111317118)
  -s, --server string        Presto server address (default "http://127.0.0.1:8080")
  -u, --user string          Presto user name (default "pbench")
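The -e and -k flags make randomized runs reproducible: the same seed yields the same sequence of random selections, and -k skips the first N of them. The Go sketch below models this with math/rand. The selections function, the stage names, and the assumption that each selection consumes exactly one draw are illustrative only and do not describe pbench's implementation.

```go
package main

import (
	"fmt"
	"math/rand"
)

// selections (hypothetical) returns count picks from choices using a generator seeded
// with seed, after discarding the first skip picks. Each pick consumes one Intn call.
func selections(choices []string, seed int64, skip, count int) []string {
	rng := rand.New(rand.NewSource(seed))
	for i := 0; i < skip; i++ {
		rng.Intn(len(choices)) // advance the sequence without using the result
	}
	picks := make([]string, 0, count)
	for i := 0; i < count; i++ {
		picks = append(picks, choices[rng.Intn(len(choices))])
	}
	return picks
}

func main() {
	stages := []string{"q01", "q02", "q03", "q04"} // hypothetical stage names
	const seed = 1712866111317118                  // the default seed shown in the help output above
	full := selections(stages, seed, 0, 6)
	resumed := selections(stages, seed, 2, 4)
	fmt.Println(full)
	fmt.Println(resumed) // equals full[2:]
}
```

Skipping draws rather than reseeding keeps a resumed run aligned with the original sequence.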

pbench save

Run ./pbench save --help to see the online help for pbench save.

Save table information for recreating the schema and data

Usage:
  pbench save [flags] [list of table names]

Flags:
      --catalog string        Catalog name
  -f, --file string           CSV file to read catalog,schema,table
      --force-https           Force all API requests to use HTTPS
  -h, --help                  help for save
  -o, --output-path string    Output directory path (default "/Users/<username>/Downloads/collect-stats")
  -P, --parallel int          Number of parallel threads to save table summaries. (default 10)
  -p, --password string       Presto user password (optional)
      --schema string         Schema name
  -s, --server string         Presto server address (default "http://127.0.0.1:8080")
      --session stringArray   Session property (property can be used multiple times; format is
                              key=value; use 'SHOW SESSION' in Presto CLI to see available properties)
      --trino                 Use Trino protocol
  -u, --user string           Presto user name (default "pbench")

The default for -P varies by machine because it is set to the number of CPU cores on the system; the value 10 shown above reflects the machine on which this help output was captured.