-
Notifications
You must be signed in to change notification settings - Fork 260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: ssa fuzzer #7641
base: master
Are you sure you want to change the base?
feat: ssa fuzzer #7641
Conversation
tooling/ssa_fuzzer/fuzzer/Cargo.toml
Outdated
@@ -0,0 +1,33 @@ | |||
[package] | |||
name = "fuzz" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might end up with having other fuzzers in the project, at which point this name will have to be a bit more specific I think.
for i in 0..config::NUMBER_OF_VARIABLES_INITIAL { | ||
let witness = Witness(i); | ||
let value = FieldElement::from(data.initial_witness.get(i as usize).unwrap()); | ||
witness_map.insert(witness, value); | ||
values.push(value); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: It's a pity that WitnessMap
doesn't have a constructor that takes an iterator, but you could avoid .get(i as usize).unwrap()
by doing for example:
for (i, f) in data.initial_witness.iter().map(FieldElement::from).enumerate() {
witness_map.insert(Witness(i), f);
values.push(f);
}
let mut acir_ids = vec![]; | ||
let mut brillig_ids = vec![]; | ||
// by default private variables ids are indexed from 0 to NUMBER_OF_VARIABLES_INITIAL - 1 | ||
for i in 0..config::NUMBER_OF_VARIABLES_INITIAL { | ||
acir_ids.push(i); | ||
brillig_ids.push(i); | ||
} | ||
Self { | ||
acir_builder, | ||
brillig_builder, | ||
acir_ids, | ||
brillig_ids, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let mut acir_ids = vec![]; | |
let mut brillig_ids = vec![]; | |
// by default private variables ids are indexed from 0 to NUMBER_OF_VARIABLES_INITIAL - 1 | |
for i in 0..config::NUMBER_OF_VARIABLES_INITIAL { | |
acir_ids.push(i); | |
brillig_ids.push(i); | |
} | |
Self { | |
acir_builder, | |
brillig_builder, | |
acir_ids, | |
brillig_ids, | |
// by default private variables ids are indexed from 0 to NUMBER_OF_VARIABLES_INITIAL - 1 | |
let ids = (0..config::NUMBER_OF_VARIABLES_INITIAL).collect::<Vec<_>>(); | |
Self { | |
acir_builder, | |
brillig_builder, | |
acir_ids: ids.clone(), | |
brillig_ids: ids, |
let acir_arrays_len = self.acir_arrays.len() as u32; | ||
let brillig_arrays_len = self.brillig_arrays.len() as u32; | ||
let acir_array = self.acir_arrays[(array_idx % acir_arrays_len) as usize]; | ||
let brillig_array = self.brillig_arrays[(array_idx % brillig_arrays_len) as usize]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can acir_arrays
and brillig_arrays
be of different length? When do they diverge?
As I understand this is where you project arbitrary lhs
and rhs
numbers to something you can actually use. So, say you have 3 arrays in your program, and the arbitrary lhs
is 1234, then here we will access array 1.
What I'm not sure about is what happens if for example acir_arrays
has 3 items, and brillig_arrays
has 4 items; in that case we will access acir_arrays[1]
and brillig_arrays[2]
. I may be misunderstanding how they are supposed to work, but wondering if it's possible that we built up acir_arrays
and brillig_arrays
together, so e.g. acir_arrays[1]
corresponds to brillig_arrays[1]
, and to achieve consistent results we should be getting the same arrays, but for some reason later we insert brillig_arrays[3]
, and that causes insert_array_get(1234, _)
to refer to two unrelated arrays now, because of the modulo.
Similar thing in the next section where modulo is used for the index itself: if the two arrays can be of different lengths, then how can we reason about getting a random element from both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, the length of the arrays should always match
pub(crate) fn insert_array(&mut self, elements: Vec<u32>) { | ||
let mut acir_values_ids = vec![]; | ||
let mut brillig_values_ids = vec![]; | ||
for elem in elements { | ||
let acir_len = self.acir_ids.len(); | ||
let brillig_len = self.brillig_ids.len(); | ||
acir_values_ids.push(elem % acir_len as u32); | ||
brillig_values_ids.push(elem % brillig_len as u32); | ||
} | ||
let acir_len = acir_values_ids.len(); | ||
let brillig_len = brillig_values_ids.len(); | ||
let acir_array = self.acir_builder.insert_make_array(acir_values_ids); | ||
let brillig_array = self.brillig_builder.insert_make_array(brillig_values_ids); | ||
self.acir_arrays.push(Array { id: acir_array, length: acir_len as u32 }); | ||
self.brillig_arrays.push(Array { id: brillig_array, length: brillig_len as u32 }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub(crate) fn insert_array(&mut self, elements: Vec<u32>) { | |
let mut acir_values_ids = vec![]; | |
let mut brillig_values_ids = vec![]; | |
for elem in elements { | |
let acir_len = self.acir_ids.len(); | |
let brillig_len = self.brillig_ids.len(); | |
acir_values_ids.push(elem % acir_len as u32); | |
brillig_values_ids.push(elem % brillig_len as u32); | |
} | |
let acir_len = acir_values_ids.len(); | |
let brillig_len = brillig_values_ids.len(); | |
let acir_array = self.acir_builder.insert_make_array(acir_values_ids); | |
let brillig_array = self.brillig_builder.insert_make_array(brillig_values_ids); | |
self.acir_arrays.push(Array { id: acir_array, length: acir_len as u32 }); | |
self.brillig_arrays.push(Array { id: brillig_array, length: brillig_len as u32 }); | |
pub(crate) fn insert_array(&mut self, elements: &[u32]) { | |
let mut acir_values_ids = vec![]; | |
let mut brillig_values_ids = vec![]; | |
let acir_len = self.acir_ids.len(); | |
let brillig_len = self.brillig_ids.len(); | |
for elem in elements { | |
acir_values_ids.push(elem % acir_len as u32); | |
brillig_values_ids.push(elem % brillig_len as u32); | |
} | |
let acir_array = self.acir_builder.insert_make_array(acir_values_ids); | |
let brillig_array = self.brillig_builder.insert_make_array(brillig_values_ids); | |
self.acir_arrays.push(Array { id: acir_array, length: elements.len() as u32 }); | |
self.brillig_arrays.push(Array { id: brillig_array, length: elements.len() as u32 }); |
nit: Most of the time we pass &[_]
instead of Vec<_>
unless you really want to pass ownership. You don't have to, just a bit more general as it works with arrays too.
I also moved the acir_ids.len()
out of the loop, just to make it clear it acir_ids
don't change during the loop.
if self.acir_arrays.is_empty() { | ||
// no arrays created | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this ties into my earlier question: how can you be sure that self.brillig_arrays
is also empty?
self.acir_arrays.push(Array { id: acir_return_id, length: acir_array.length }); | ||
self.brillig_arrays.push(Array { id: brillig_return_id, length: brillig_array.length }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can only see them grow in lockstep, but maybe I'm looking in the wrong place.
Maybe a different representation could remove the ambiguity, for example instead of this:
/// ACIR arrays
acir_arrays: Vec<Array>,
/// Brillig arrays
brillig_arrays: Vec<Array>,
It could be just:
/// ACIR and Brillig arrays
arrays: Vec<Array>,
with:
self.arrays.push(Array { acir_id: acir_return_id, brillig_id: brillig_return_id, length: array.length });
I'm probably wrong, just don't see why.
let acir_id = self.acir_ids[(index % acir_array.length) as usize]; | ||
let brillig_id = self.brillig_ids[(index % brillig_array.length) as usize]; | ||
let acir_return_id = self.acir_builder.insert_array_get(acir_array_id, acir_id); | ||
let brillig_return_id = self.brillig_builder.insert_array_get(brillig_array_id, brillig_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to understand what the modulo operation do here.
So we called context.insert_array_get(array_idx, index)
with arbitrary u32
values, like 12345 and 67890. We take array_idx % self.acir_arrays.len()
to pick an array that actually exists, say, acir_arrays[2]
. Then, we look at the length of that array, and use index % acir_array.length
to access acir_ids
. Finally we use whatever is in that acir_id
slot to actually issue a get against the array.
I don't understand why we take a modulo of acir_array.length
to access acir_ids
. What do they have to do with each other? Say acir_arrays[2]
has a length of 3; does it mean that acir_ids[0..3]
are valid indexes for this particular array, but others are not?
When I look at insert_array
, I see this:
pub(crate) fn insert_array(&mut self, elements: Vec<u32>) {
let mut acir_values_ids = vec![];
for elem in elements {
let acir_len = self.acir_ids.len();
acir_values_ids.push(elem % acir_len as u32);
}
let acir_len = acir_values_ids.len();
let acir_array = self.acir_builder.insert_make_array(acir_values_ids);
self.acir_arrays.push(Array { id: acir_array, length: acir_len as u32 });
I interpret that as elements
be a vector of random u32
numbers, which we project down to the length of acir_ids
, and then we create an array that consists of random variable IDs that already existed.
Say we have 5 IDs in acir_ids
and an elements
consists of 100 u32
, then by the end we should have 5 (or 6 with the new array?) values in acir_ids
and 100 values on the 0..5
range in the array.
Now, when we do let acir_id = self.acir_ids[(index % acir_array.length) as usize];
we would be doing acir_ids[67890 % 100]
, which is acir_ids[90]
, which should cause a panic because the acir_ids
only has 5 items.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's my mistake, the index should be a number
if self.is_constant { | ||
(Witness(0), Witness(0)) | ||
} else if self.acir_ids.len() as u32 != NUMBER_OF_VARIABLES_INITIAL { | ||
(Witness(NUMBER_OF_VARIABLES_INITIAL), Witness(NUMBER_OF_VARIABLES_INITIAL)) | ||
} else { | ||
(Witness(NUMBER_OF_VARIABLES_INITIAL - 1), Witness(NUMBER_OF_VARIABLES_INITIAL - 1)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to understand this:
- When in constant mode, instead of adding
NUMBER_OF_VARIABLES_INITIAL
parameter witnesses, we insert them as constants into the circuit, which should allow constant folding to eliminate a lot of the code, and the witness #0 will be the return value. - When we have more (I would write
>= NUMBER_OF_VARIABLES_INITIAL
) variables than the initials, then we assume the one following the parameters is the return value. - If we have exactly
NUMBER_OF_VARIABLES_INITIAL
variables then it's just the inputs, with not even a return value, and we return the last input witness. - It's not spelled out, but we won't have less than
NUMBER_OF_VARIABLES_INITIAL
variables in non-constant mode.
What I don't fully understand is if constant
mode will have a return value at 0, then why doesn't the non-constant case also have a return value at NUMBER_OF_VARIABLES_INITIAL
. As I understand we always create NUMBER_OF_VARIABLES_INITIAL
parameters but we might use less than that. Treating NUMBER_OF_VARIABLES_INITIAL-1
as return value seems strange, as it's a) not a return value and b) might not even be used and c) trivially the same in both circuits, coming from the outside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only case where we return NUMBER_OF_VARIABLES_INITIAL - 1
is when the fuzzer has not provided any valid instructions.
Let's assume we have 3 variables in non-constant mode.
If no instructions were inserted, then we terminate the program with the last element of ids
, which is equal to Witness(2)
. After execution, the resulting witness will be (Witness(0), Witness(1), Witness(2))
, we return Witness(2)
.
If any number of instructions were inserted, then only one (return value) will be added to the final witness, which is (Witness(0), Witness(1), Witness(2), Witness(3)), we return Witness(3).
In constant mode, instead of variables, we insert constants to preserve the logic of interaction with the ids, so the resulting witness will have only one value (return value).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for following up on all my comments!
Comparing with constant folding is a great idea 👏
I sent some more questions about how things work.
You need to delete `rust-toolchain.toml` from root of the project, because cargo-fuzz requires nightly compiler for address sanitizer. | ||
``` | ||
rustup install nightly | ||
rustup default nightly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that this command works as well, so I don't have to default to nightly
:
cargo +nightly fuzz ...
let initial_witness = witness_map; | ||
log::debug!("instructions: {:?}", data.methods.clone()); | ||
log::debug!("initial_witness: {:?}", initial_witness); | ||
|
||
let mut fuzzer = Fuzzer::new(type_.clone(), values); | ||
for method in data.methods.clone() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let initial_witness = witness_map; | |
log::debug!("instructions: {:?}", data.methods.clone()); | |
log::debug!("initial_witness: {:?}", initial_witness); | |
let mut fuzzer = Fuzzer::new(type_.clone(), values); | |
for method in data.methods.clone() { | |
let initial_witness = witness_map; | |
log::debug!("instructions: {:?}", data.methods); | |
log::debug!("initial_witness: {:?}", initial_witness); | |
let mut fuzzer = Fuzzer::new(type_, values); | |
for method in data.methods { |
Printing shouldn't need cloning, and then both seem like the last time we use these variables, so we can move them.
|
||
#[derive(Debug, Error)] | ||
pub enum FuzzerBuilderError { | ||
#[error("Compilation panicked")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[error("Compilation panicked")] | |
#[error("Compilation panicked: {0}")] |
At least I think we have to include the field to actually see the value.
let acir_id = self.acir_ids[(index % array.length) as usize]; | ||
let brillig_id = self.brillig_ids[(index % array.length) as usize]; | ||
let acir_value = u32_to_id_value(value % (self.acir_ids.len() as u32)); | ||
let brillig_value = u32_to_id_value(value % (self.brillig_ids.len() as u32)); | ||
|
||
let acir_return_id = self.acir_builder.insert_array_set(acir_array_id, acir_id, acir_value); | ||
let brillig_return_id = | ||
self.brillig_builder.insert_array_set(brillig_array_id, brillig_id, brillig_value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? What is acir_id
supposed to be? It looks like index
is a random number which we correctly scale down to the length of the array using (index % array.length) as usize
, but then instead of using that value to index the acir_array_id
, we use it to pick an ID from acir_ids
, from the first N IDs, where N is the length of the array.
I thought you should duse use index % array.length
to set the value, which is what seems to be happening in insert_array_get
:
let acir_array_id = array.acir_id;
let index = index % array.length;
let acir_return_id = self.acir_builder.insert_array_get(acir_array_id, index);
fn optimize_into_acir( | ||
builder: FunctionBuilder, | ||
options: SsaEvaluatorOptions, | ||
) -> Result<ArtifactsAndWarnings, RuntimeError> { | ||
let builder = SsaBuilder { | ||
ssa: builder.finish(), | ||
ssa_logging: SsaLogging::None, | ||
print_codegen_timings: false, | ||
}; | ||
let ssa = optimize_all(builder, &options)?; | ||
|
||
let brillig = ssa.to_brillig(&BrilligOptions::default()); | ||
|
||
let ssa = SsaBuilder { | ||
ssa, | ||
ssa_logging: options.ssa_logging.clone(), | ||
print_codegen_timings: options.print_codegen_timings, | ||
} | ||
.run_pass(|ssa| ssa.fold_constants_with_brillig(&brillig), "Inlining Brillig Calls Inlining") | ||
.run_pass(Ssa::dead_instruction_elimination, "Dead Instruction Elimination (2nd)") | ||
.finish(); | ||
|
||
// to triage ssa after optimizations, when crash found | ||
let formatted_ssa = format!("{}", ssa); | ||
log::debug!("formatted_ssa: {:?}", formatted_ssa); | ||
match ssa.into_acir(&brillig, &BrilligOptions::default(), options.expression_width) { | ||
Ok(artifacts) => Ok(ArtifactsAndWarnings(artifacts, vec![])), | ||
Err(e) => Err(e), | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it could easily diverge from the noirc_evaluator::ssa::optimize_into_acir
version.
I see that in the beginning the difference is how the SsaBuilder
is constructed, so maybe instead of this:
mod ssa {
pub(crate) fn optimize_into_acir(
program: Program,
options: &SsaEvaluatorOptions,
) -> Result<ArtifactsAndWarnings, RuntimeError> {
let builder = SsaBuilder::new(
program,
options.ssa_logging.clone(),
options.print_codegen_timings,
&options.emit_ssa,
)?;
let mut ssa = optimize_all(builder, options)?;
... // more steps on the SSA
}
}
mod fuzzer {
fn optimize_into_acir(
builder: FunctionBuilder,
options: SsaEvaluatorOptions,
) -> Result<ArtifactsAndWarnings, RuntimeError> {
let builder = SsaBuilder {
ssa: builder.finish(),
ssa_logging: SsaLogging::None,
print_codegen_timings: false,
};
let ssa = optimize_all(builder, &options)?;
... // same-ish steps on the SSA as above
}
}
We could have a version of optimize_into_acir
that takes an SsaBuilder
and call that from both places:
mod ssa {
pub fn optimize_ssa_builder_into_acir(
builder: SsaBuilder,
options: &SsaEvaluatorOptions,
) -> Result<ArtifactsAndWarnings, RuntimeError> {
let mut ssa = optimize_all(builder, options)?;
... // more steps on the SSA
}
pub(crate) fn optimize_into_acir(
program: Program,
options: &SsaEvaluatorOptions,
) -> Result<ArtifactsAndWarnings, RuntimeError> {
let builder = SsaBuilder::new(
program,
options.ssa_logging.clone(),
options.print_codegen_timings,
&options.emit_ssa,
)?;
optimize_ssa_builder_into_acir(builder, options)
}
mod fuzzer {
fn optimize_function_builder_into_acir(
builder: FunctionBuilder,
options: SsaEvaluatorOptions,
) -> Result<ArtifactsAndWarnings, RuntimeError> {
let builder = SsaBuilder {
ssa: builder.finish(),
ssa_logging: SsaLogging::None,
print_codegen_timings: false,
};
ssa::optimize_ssa_builder_into_acir(builder)
}
}
Description
Summary*
This PR introduces SSA fuzzer for testing ACIR and Brillig execution of ssa built from scratch.
Additional Context
Documentation*
Check one:
PR Checklist*
cargo fmt
on default settings.