Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: ssa fuzzer #7641

Open
wants to merge 70 commits into
base: master
Choose a base branch
from
Open

Conversation

jewelofchaos9
Copy link
Contributor

Description

Summary*

This PR introduces SSA fuzzer for testing ACIR and Brillig execution of ssa built from scratch.

Additional Context

Documentation*

Check one:

  • No documentation needed.
  • Documentation included in this PR.
  • [For Experimental Features] Documentation to be submitted in a separate PR.

PR Checklist*

  • I have tested the changes locally.
  • I have formatted the changes with Prettier and/or cargo fmt on default settings.

@jewelofchaos9 jewelofchaos9 requested a review from aakoshh March 27, 2025 01:27
@@ -0,0 +1,33 @@
[package]
name = "fuzz"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might end up with having other fuzzers in the project, at which point this name will have to be a bit more specific I think.

Comment on lines 71 to 76
for i in 0..config::NUMBER_OF_VARIABLES_INITIAL {
let witness = Witness(i);
let value = FieldElement::from(data.initial_witness.get(i as usize).unwrap());
witness_map.insert(witness, value);
values.push(value);
}
Copy link
Contributor

@aakoshh aakoshh Mar 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It's a pity that WitnessMap doesn't have a constructor that takes an iterator, but you could avoid .get(i as usize).unwrap() by doing for example:

for (i, f) in data.initial_witness.iter().map(FieldElement::from).enumerate() {
  witness_map.insert(Witness(i), f);
  values.push(f);
}

Comment on lines 92 to 103
let mut acir_ids = vec![];
let mut brillig_ids = vec![];
// by default private variables ids are indexed from 0 to NUMBER_OF_VARIABLES_INITIAL - 1
for i in 0..config::NUMBER_OF_VARIABLES_INITIAL {
acir_ids.push(i);
brillig_ids.push(i);
}
Self {
acir_builder,
brillig_builder,
acir_ids,
brillig_ids,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let mut acir_ids = vec![];
let mut brillig_ids = vec![];
// by default private variables ids are indexed from 0 to NUMBER_OF_VARIABLES_INITIAL - 1
for i in 0..config::NUMBER_OF_VARIABLES_INITIAL {
acir_ids.push(i);
brillig_ids.push(i);
}
Self {
acir_builder,
brillig_builder,
acir_ids,
brillig_ids,
// by default private variables ids are indexed from 0 to NUMBER_OF_VARIABLES_INITIAL - 1
let ids = (0..config::NUMBER_OF_VARIABLES_INITIAL).collect::<Vec<_>>();
Self {
acir_builder,
brillig_builder,
acir_ids: ids.clone(),
brillig_ids: ids,

let acir_arrays_len = self.acir_arrays.len() as u32;
let brillig_arrays_len = self.brillig_arrays.len() as u32;
let acir_array = self.acir_arrays[(array_idx % acir_arrays_len) as usize];
let brillig_array = self.brillig_arrays[(array_idx % brillig_arrays_len) as usize];
Copy link
Contributor

@aakoshh aakoshh Mar 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can acir_arrays and brillig_arrays be of different length? When do they diverge?

As I understand this is where you project arbitrary lhs and rhs numbers to something you can actually use. So, say you have 3 arrays in your program, and the arbitrary lhs is 1234, then here we will access array 1.

What I'm not sure about is what happens if for example acir_arrays has 3 items, and brillig_arrays has 4 items; in that case we will access acir_arrays[1] and brillig_arrays[2]. I may be misunderstanding how they are supposed to work, but wondering if it's possible that we built up acir_arrays and brillig_arrays together, so e.g. acir_arrays[1] corresponds to brillig_arrays[1], and to achieve consistent results we should be getting the same arrays, but for some reason later we insert brillig_arrays[3], and that causes insert_array_get(1234, _) to refer to two unrelated arrays now, because of the modulo.

Similar thing in the next section where modulo is used for the index itself: if the two arrays can be of different lengths, then how can we reason about getting a random element from both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, the length of the arrays should always match

Comment on lines 136 to 150
pub(crate) fn insert_array(&mut self, elements: Vec<u32>) {
let mut acir_values_ids = vec![];
let mut brillig_values_ids = vec![];
for elem in elements {
let acir_len = self.acir_ids.len();
let brillig_len = self.brillig_ids.len();
acir_values_ids.push(elem % acir_len as u32);
brillig_values_ids.push(elem % brillig_len as u32);
}
let acir_len = acir_values_ids.len();
let brillig_len = brillig_values_ids.len();
let acir_array = self.acir_builder.insert_make_array(acir_values_ids);
let brillig_array = self.brillig_builder.insert_make_array(brillig_values_ids);
self.acir_arrays.push(Array { id: acir_array, length: acir_len as u32 });
self.brillig_arrays.push(Array { id: brillig_array, length: brillig_len as u32 });
Copy link
Contributor

@aakoshh aakoshh Mar 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub(crate) fn insert_array(&mut self, elements: Vec<u32>) {
let mut acir_values_ids = vec![];
let mut brillig_values_ids = vec![];
for elem in elements {
let acir_len = self.acir_ids.len();
let brillig_len = self.brillig_ids.len();
acir_values_ids.push(elem % acir_len as u32);
brillig_values_ids.push(elem % brillig_len as u32);
}
let acir_len = acir_values_ids.len();
let brillig_len = brillig_values_ids.len();
let acir_array = self.acir_builder.insert_make_array(acir_values_ids);
let brillig_array = self.brillig_builder.insert_make_array(brillig_values_ids);
self.acir_arrays.push(Array { id: acir_array, length: acir_len as u32 });
self.brillig_arrays.push(Array { id: brillig_array, length: brillig_len as u32 });
pub(crate) fn insert_array(&mut self, elements: &[u32]) {
let mut acir_values_ids = vec![];
let mut brillig_values_ids = vec![];
let acir_len = self.acir_ids.len();
let brillig_len = self.brillig_ids.len();
for elem in elements {
acir_values_ids.push(elem % acir_len as u32);
brillig_values_ids.push(elem % brillig_len as u32);
}
let acir_array = self.acir_builder.insert_make_array(acir_values_ids);
let brillig_array = self.brillig_builder.insert_make_array(brillig_values_ids);
self.acir_arrays.push(Array { id: acir_array, length: elements.len() as u32 });
self.brillig_arrays.push(Array { id: brillig_array, length: elements.len() as u32 });

nit: Most of the time we pass &[_] instead of Vec<_> unless you really want to pass ownership. You don't have to, just a bit more general as it works with arrays too.

I also moved the acir_ids.len() out of the loop, just to make it clear it acir_ids don't change during the loop.

Comment on lines 177 to 180
if self.acir_arrays.is_empty() {
// no arrays created
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this ties into my earlier question: how can you be sure that self.brillig_arrays is also empty?

Comment on lines 197 to 198
self.acir_arrays.push(Array { id: acir_return_id, length: acir_array.length });
self.brillig_arrays.push(Array { id: brillig_return_id, length: brillig_array.length });
Copy link
Contributor

@aakoshh aakoshh Mar 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can only see them grow in lockstep, but maybe I'm looking in the wrong place.

Maybe a different representation could remove the ambiguity, for example instead of this:

    /// ACIR arrays
    acir_arrays: Vec<Array>,
    /// Brillig arrays
    brillig_arrays: Vec<Array>,

It could be just:

    /// ACIR and Brillig arrays
    arrays: Vec<Array>,

with:

self.arrays.push(Array { acir_id: acir_return_id, brillig_id: brillig_return_id, length: array.length });

I'm probably wrong, just don't see why.

Comment on lines 167 to 170
let acir_id = self.acir_ids[(index % acir_array.length) as usize];
let brillig_id = self.brillig_ids[(index % brillig_array.length) as usize];
let acir_return_id = self.acir_builder.insert_array_get(acir_array_id, acir_id);
let brillig_return_id = self.brillig_builder.insert_array_get(brillig_array_id, brillig_id);
Copy link
Contributor

@aakoshh aakoshh Mar 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to understand what the modulo operation do here.

So we called context.insert_array_get(array_idx, index) with arbitrary u32 values, like 12345 and 67890. We take array_idx % self.acir_arrays.len() to pick an array that actually exists, say, acir_arrays[2]. Then, we look at the length of that array, and use index % acir_array.length to access acir_ids. Finally we use whatever is in that acir_id slot to actually issue a get against the array.

I don't understand why we take a modulo of acir_array.length to access acir_ids. What do they have to do with each other? Say acir_arrays[2] has a length of 3; does it mean that acir_ids[0..3] are valid indexes for this particular array, but others are not?

When I look at insert_array, I see this:

  pub(crate) fn insert_array(&mut self, elements: Vec<u32>) {
        let mut acir_values_ids = vec![];
        for elem in elements {
            let acir_len = self.acir_ids.len();
            acir_values_ids.push(elem % acir_len as u32);
        }
        let acir_len = acir_values_ids.len();
        let acir_array = self.acir_builder.insert_make_array(acir_values_ids);
        self.acir_arrays.push(Array { id: acir_array, length: acir_len as u32 });

I interpret that as elements be a vector of random u32 numbers, which we project down to the length of acir_ids, and then we create an array that consists of random variable IDs that already existed.

Say we have 5 IDs in acir_ids and an elements consists of 100 u32, then by the end we should have 5 (or 6 with the new array?) values in acir_ids and 100 values on the 0..5 range in the array.

Now, when we do let acir_id = self.acir_ids[(index % acir_array.length) as usize]; we would be doing acir_ids[67890 % 100], which is acir_ids[90], which should cause a panic because the acir_ids only has 5 items.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's my mistake, the index should be a number

Comment on lines +358 to +364
if self.is_constant {
(Witness(0), Witness(0))
} else if self.acir_ids.len() as u32 != NUMBER_OF_VARIABLES_INITIAL {
(Witness(NUMBER_OF_VARIABLES_INITIAL), Witness(NUMBER_OF_VARIABLES_INITIAL))
} else {
(Witness(NUMBER_OF_VARIABLES_INITIAL - 1), Witness(NUMBER_OF_VARIABLES_INITIAL - 1))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to understand this:

  • When in constant mode, instead of adding NUMBER_OF_VARIABLES_INITIAL parameter witnesses, we insert them as constants into the circuit, which should allow constant folding to eliminate a lot of the code, and the witness #0 will be the return value.
  • When we have more (I would write >= NUMBER_OF_VARIABLES_INITIAL) variables than the initials, then we assume the one following the parameters is the return value.
  • If we have exactly NUMBER_OF_VARIABLES_INITIAL variables then it's just the inputs, with not even a return value, and we return the last input witness.
  • It's not spelled out, but we won't have less than NUMBER_OF_VARIABLES_INITIAL variables in non-constant mode.

What I don't fully understand is if constant mode will have a return value at 0, then why doesn't the non-constant case also have a return value at NUMBER_OF_VARIABLES_INITIAL. As I understand we always create NUMBER_OF_VARIABLES_INITIAL parameters but we might use less than that. Treating NUMBER_OF_VARIABLES_INITIAL-1 as return value seems strange, as it's a) not a return value and b) might not even be used and c) trivially the same in both circuits, coming from the outside.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only case where we return NUMBER_OF_VARIABLES_INITIAL - 1 is when the fuzzer has not provided any valid instructions.

Let's assume we have 3 variables in non-constant mode.

If no instructions were inserted, then we terminate the program with the last element of ids, which is equal to Witness(2). After execution, the resulting witness will be (Witness(0), Witness(1), Witness(2)), we return Witness(2).

If any number of instructions were inserted, then only one (return value) will be added to the final witness, which is (Witness(0), Witness(1), Witness(2), Witness(3)), we return Witness(3).

In constant mode, instead of variables, we insert constants to preserve the logic of interaction with the ids, so the resulting witness will have only one value (return value).

Copy link
Contributor

@aakoshh aakoshh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for following up on all my comments!

Comparing with constant folding is a great idea 👏
I sent some more questions about how things work.

@jewelofchaos9 jewelofchaos9 requested a review from aakoshh March 27, 2025 14:21
You need to delete `rust-toolchain.toml` from root of the project, because cargo-fuzz requires nightly compiler for address sanitizer.
```
rustup install nightly
rustup default nightly
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that this command works as well, so I don't have to default to nightly:

cargo +nightly fuzz ...

Comment on lines +76 to +81
let initial_witness = witness_map;
log::debug!("instructions: {:?}", data.methods.clone());
log::debug!("initial_witness: {:?}", initial_witness);

let mut fuzzer = Fuzzer::new(type_.clone(), values);
for method in data.methods.clone() {
Copy link
Contributor

@aakoshh aakoshh Apr 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let initial_witness = witness_map;
log::debug!("instructions: {:?}", data.methods.clone());
log::debug!("initial_witness: {:?}", initial_witness);
let mut fuzzer = Fuzzer::new(type_.clone(), values);
for method in data.methods.clone() {
let initial_witness = witness_map;
log::debug!("instructions: {:?}", data.methods);
log::debug!("initial_witness: {:?}", initial_witness);
let mut fuzzer = Fuzzer::new(type_, values);
for method in data.methods {

Printing shouldn't need cloning, and then both seem like the last time we use these variables, so we can move them.


#[derive(Debug, Error)]
pub enum FuzzerBuilderError {
#[error("Compilation panicked")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[error("Compilation panicked")]
#[error("Compilation panicked: {0}")]

At least I think we have to include the field to actually see the value.

Comment on lines +176 to +183
let acir_id = self.acir_ids[(index % array.length) as usize];
let brillig_id = self.brillig_ids[(index % array.length) as usize];
let acir_value = u32_to_id_value(value % (self.acir_ids.len() as u32));
let brillig_value = u32_to_id_value(value % (self.brillig_ids.len() as u32));

let acir_return_id = self.acir_builder.insert_array_set(acir_array_id, acir_id, acir_value);
let brillig_return_id =
self.brillig_builder.insert_array_set(brillig_array_id, brillig_id, brillig_value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? What is acir_id supposed to be? It looks like index is a random number which we correctly scale down to the length of the array using (index % array.length) as usize, but then instead of using that value to index the acir_array_id, we use it to pick an ID from acir_ids, from the first N IDs, where N is the length of the array.

I thought you should duse use index % array.length to set the value, which is what seems to be happening in insert_array_get:

        let acir_array_id = array.acir_id;
        let index = index % array.length;
        let acir_return_id = self.acir_builder.insert_array_get(acir_array_id, index);

Comment on lines +24 to +53
fn optimize_into_acir(
builder: FunctionBuilder,
options: SsaEvaluatorOptions,
) -> Result<ArtifactsAndWarnings, RuntimeError> {
let builder = SsaBuilder {
ssa: builder.finish(),
ssa_logging: SsaLogging::None,
print_codegen_timings: false,
};
let ssa = optimize_all(builder, &options)?;

let brillig = ssa.to_brillig(&BrilligOptions::default());

let ssa = SsaBuilder {
ssa,
ssa_logging: options.ssa_logging.clone(),
print_codegen_timings: options.print_codegen_timings,
}
.run_pass(|ssa| ssa.fold_constants_with_brillig(&brillig), "Inlining Brillig Calls Inlining")
.run_pass(Ssa::dead_instruction_elimination, "Dead Instruction Elimination (2nd)")
.finish();

// to triage ssa after optimizations, when crash found
let formatted_ssa = format!("{}", ssa);
log::debug!("formatted_ssa: {:?}", formatted_ssa);
match ssa.into_acir(&brillig, &BrilligOptions::default(), options.expression_width) {
Ok(artifacts) => Ok(ArtifactsAndWarnings(artifacts, vec![])),
Err(e) => Err(e),
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it could easily diverge from the noirc_evaluator::ssa::optimize_into_acir version.

I see that in the beginning the difference is how the SsaBuilder is constructed, so maybe instead of this:

mod ssa {
  pub(crate) fn optimize_into_acir(
    program: Program,
    options: &SsaEvaluatorOptions,
  ) -> Result<ArtifactsAndWarnings, RuntimeError> {
    let builder = SsaBuilder::new(
        program,
        options.ssa_logging.clone(),
        options.print_codegen_timings,
        &options.emit_ssa,
    )?;
    let mut ssa = optimize_all(builder, options)?;
    ... // more steps on the SSA
  }
}

mod fuzzer {
  fn optimize_into_acir(
    builder: FunctionBuilder,
    options: SsaEvaluatorOptions,
  ) -> Result<ArtifactsAndWarnings, RuntimeError> {
    let builder = SsaBuilder {
        ssa: builder.finish(),
        ssa_logging: SsaLogging::None,
        print_codegen_timings: false,
    };
    let ssa = optimize_all(builder, &options)?;
    ... // same-ish steps on the SSA as above
  }
}

We could have a version of optimize_into_acir that takes an SsaBuilder and call that from both places:

mod ssa {

  pub fn optimize_ssa_builder_into_acir(
    builder: SsaBuilder,
    options: &SsaEvaluatorOptions,
  ) -> Result<ArtifactsAndWarnings, RuntimeError> {
    let mut ssa = optimize_all(builder, options)?;
    ... // more steps on the SSA
  }
  
  pub(crate) fn optimize_into_acir(
    program: Program,
    options: &SsaEvaluatorOptions,
  ) -> Result<ArtifactsAndWarnings, RuntimeError> {
    let builder = SsaBuilder::new(
        program,
        options.ssa_logging.clone(),
        options.print_codegen_timings,
        &options.emit_ssa,
    )?;
    optimize_ssa_builder_into_acir(builder, options)
}

mod fuzzer {
  fn optimize_function_builder_into_acir(
    builder: FunctionBuilder,
    options: SsaEvaluatorOptions,
  ) -> Result<ArtifactsAndWarnings, RuntimeError> {
    let builder = SsaBuilder {
        ssa: builder.finish(),
        ssa_logging: SsaLogging::None,
        print_codegen_timings: false,
    };
    ssa::optimize_ssa_builder_into_acir(builder)
  }
}

@aakoshh aakoshh mentioned this pull request Apr 1, 2025
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants