Update website
samuelebortolotti committed Oct 16, 2024
1 parent a49e964 commit df03bb9
Showing 12 changed files with 164 additions and 30 deletions.
7 changes: 5 additions & 2 deletions assets/css/benchmark_style.css
@@ -47,10 +47,9 @@ table {
border-bottom: 2px solid black;
border-top: 2px solid black;
border-spacing: 0;
width: 95%;
width: auto;
margin: auto;
margin-bottom: 1em;
display: block;
overflow-x: auto;
}

@@ -266,4 +265,8 @@ body.dark-mode table th, body.dark-mode table td {
body.dark-mode table th {
background-color: #333;
color: #f5f5f5;
}

.digit {
margin: 0px
}
Binary file added assets/images/mnist-0.png
Binary file added assets/images/mnist-1.png
Binary file added assets/images/mnist-2.png
Binary file added assets/images/mnist-3.png
Binary file added assets/images/mnist-4.png
Binary file added assets/images/mnist-5.png
Binary file added assets/images/mnist-6.png
Binary file added assets/images/mnist-7.png
Binary file added assets/images/mnist-8.png
Binary file added assets/images/mnist-9.png
187 changes: 159 additions & 28 deletions index.md
@@ -15,12 +15,12 @@ constraints. However, recent research observed that tasks requiring both
learning and reasoning on background knowledge often suffer from reasoning
shortcuts (RSs): predictors can solve the downstream reasoning task without
associating the correct concepts to the high-dimensional data. To address this
issue, we introduce rsbench, a comprehensive benchmark suite designed to
issue, we introduce ``rsbench``, a comprehensive benchmark suite designed to
systematically evaluate the impact of RSs on models by providing easy access to
highly customizable tasks affected by RSs. Furthermore, rsbench implements
highly customizable tasks affected by RSs. Furthermore, ``rsbench`` implements
common metrics for evaluating concept quality and introduces novel formal
verification procedures for assessing the presence of RSs in learning tasks.
Using rsbench, we highlight that obtaining high quality concepts in both purely
Using ``rsbench``, we highlight that obtaining high-quality concepts in both purely
neural and neuro-symbolic models is a far-from-solved problem.


@@ -62,17 +62,17 @@ mitigation of reasoning shortcuts." NeurIPS 2023.</span>

<h1><a name="overview">Overview</a></h1>

- *A Variety of L&R Tasks*: rsbench offers five L&R tasks and at least one data
- *A Variety of L&R Tasks*: ``rsbench`` offers five L&R tasks and at least one data
set each. The tasks come in different flavors -- *arithmetic*, *logic*, and
*high-stakes* -- and with a formal specification of the corresponding prior
knowledge. rsbench also provides data generators for creating new OOD splits
knowledge. ``rsbench`` also provides data generators for creating new OOD splits
useful for testing the down-stream consequences of RSs.

- *Evaluation*: rsbench comes with implementations for several metrics for
- *Evaluation*: ``rsbench`` comes with implementations for several metrics for
evaluating the quality of *label* and *concept* predictions, as well as
visualization code for them.

- *Verification*: rsbench implements a new algorithm, `countrss`, that makes
- *Verification*: ``rsbench`` implements a new algorithm, `countrss`, that makes
use of automated reasoning packages for formally verifying whether an L&R task
allows for RSs without training any model! This tool works with any prior
knowledge encoded in CNF format, the de facto standard in automated
@@ -98,6 +98,55 @@ mitigation of reasoning shortcuts." NeurIPS 2023.</span>

<h1><a name="usage">Usage</a></h1>

In this section we provide useful information to get started with ``rsbench``.

<h2>Configure and run the data generators</h2>

The data generators are available at the following [GitHub link](https://github.com/unitn-sml/rsbench-code/tree/main/rssgen).

The datasets included are:

- [``MNMath``](#MNMath)
- [``MNLogic``](#MNLogic)
- [``Kand-Logic``](#Kand-Logic)
- [``CLE4EVR``](#CLE4EVR)
- [``SDD-OIA``](#SDD-OIA)

Each generator is highly customizable through configuration files. For `MNMath`, `MNLogic`, and `Kand-Logic`, you need to edit a `.yml` file, with examples and instructions available in the `examples_config` folder. On the other hand, `CLE4EVR` and `SDD-OIA` use `.json` configuration files. For further details, please refer to the respective GitHub page for each generator.

<h2>Load rsbench data and train your model</h2>

To load and use ``rsbench`` data, you can use the provided suite that comprises data loading, model training, and evaluation. This ready-to-use toolkit is available at this [GitHub link](https://github.com/unitn-sml/rsbench-code/tree/main/rsseval). Alternatively, you can create your own dataset class by writing just a few lines of code:

```python
from rss.datasets.xor import MNLOGIC


class required_args:
    def __init__(self):
        self.c_sup = 0        # specifies % supervision available on concepts
        self.which_c = -1     # specifies which concepts to supervise, -1=all
        self.batch_size = 64  # batch size of the loaders


args = required_args()

dataset = MNLOGIC(args)
train_loader, val_loader, test_loader = dataset.get_loaders()

# Placeholders: replace each `...` with your own objects before running.
model = ...      # define your model here
optimizer = ...  # define optimizer here
criterion = ...  # define loss function here

for epoch in range(30):
    for images, labels, concepts in train_loader:
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels, concepts)
        loss.backward()
        optimizer.step()
```

<h2> Quickstart </h2>

We provide a simple tutorial designed to demonstrate how to load and use the data generated by `rsbench`. This tutorial is meant to give a quick overview and get you started with the data we provide. You can access the Google Colab tutorial using the following link:

[MNIST Math Google Colab](https://colab.research.google.com/drive/1QYizKR1yS9dT7pI7dRITdw0HrvIOGjEP#scrollTo=rHrAvZnU-fWe)
@@ -112,7 +161,7 @@ For a more thorough evaluation of the model, we recommend exploring the `rsseval

Within this folder, you'll find a [notebook](https://github.com/unitn-sml/rsbench-code/blob/main/rsseval/rss/notebooks/evaluate.ipynb) dedicated to evaluating concept quality using the metrics discussed in our paper. This will help you assess the performance and quality of the models more comprehensively.

# MNMath
<h1><a name="MNMath">MNMath</a></h1>

<img src="assets/images/rsbench-mnmath.png" alt="mnmath" width="80%" height="auto">

@@ -129,13 +178,102 @@ confuse 3's with 4's and still perfectly predict the output of the system.
However, for a new, out-of-distribution task like $2 + 4$, it will wrongly
output $5$.


**Ready-made**: `MNAdd-Half` WRITEME

**Ready-made**: `MNAdd-EvenOdd` WRITEME


# MNLogic
**Ready-made**: `MNAdd-Half` is a modified version of `MNIST-Addition` that focuses on only half of the digits, specifically those from 0 to 4. It was introduced for the first time in [Marconato et al., 2024b](https://openreview.net/pdf?id=pDcM1k7mgZ).

The dataset includes the following combinations of digits:

<table>
<tr>
<td> <img class="digit" src="assets/images/mnist-0.png" alt="0" width="25"/> + <img class="digit" src="assets/images/mnist-0.png" alt="0" width="25"/> = 0 </td>
</tr>
<tr>
<td> <img class="digit" src="assets/images/mnist-0.png" alt="0" width="25"/> + <img class="digit" src="assets/images/mnist-1.png" alt="1" width="25"/> = 1 </td>
</tr>
<tr>
<td> <img class="digit" src="assets/images/mnist-2.png" alt="2" width="25"/> + <img class="digit" src="assets/images/mnist-3.png" alt="3" width="25"/> = 5 </td>
</tr>
<tr>
<td> <img class="digit" src="assets/images/mnist-2.png" alt="2" width="25"/> + <img class="digit" src="assets/images/mnist-4.png" alt="4" width="25"/> = 6 </td>
</tr>
</table>

The digits 0 and 1 are unaffected by reasoning shortcuts, while digits 2, 3, and 4 can be predicted in various ways, as illustrated below.

The `MNAdd-Half` dataset contains a total of 2940 fully annotated training samples, 840 validation samples, 420 test samples, and an additional 1080 out-of-distribution test samples. The out-of-distribution samples consist exclusively of sums over these same digits, such as 1 + 3 = 4.

There are three potential optimal solutions, two of which are reasoning shortcuts. Specifically:

<img class="digit" src="assets/images/mnist-0.png" alt="0" width="20"/> &rarr; 0,
<img class="digit" src="assets/images/mnist-1.png" alt="1" width="20"/> &rarr; 1,
<img class="digit" src="assets/images/mnist-2.png" alt="2" width="20"/> &rarr; 2,
<img class="digit" src="assets/images/mnist-3.png" alt="3" width="20"/> &rarr; 3,
<img class="digit" src="assets/images/mnist-4.png" alt="4" width="20"/> &rarr; 4

<img class="digit" src="assets/images/mnist-0.png" alt="0" width="20"/> &rarr; 0,
<img class="digit" src="assets/images/mnist-1.png" alt="1" width="20"/> &rarr; 1,
<img class="digit" src="assets/images/mnist-2.png" alt="2" width="20"/> &rarr; 3,
<img class="digit" src="assets/images/mnist-3.png" alt="3" width="20"/> &rarr; 2,
<img class="digit" src="assets/images/mnist-4.png" alt="4" width="20"/> &rarr; 3

<img class="digit" src="assets/images/mnist-0.png" alt="0" width="20"/> &rarr; 0,
<img class="digit" src="assets/images/mnist-1.png" alt="1" width="20"/> &rarr; 1,
<img class="digit" src="assets/images/mnist-2.png" alt="2" width="20"/> &rarr; 4,
<img class="digit" src="assets/images/mnist-3.png" alt="3" width="20"/> &rarr; 1,
<img class="digit" src="assets/images/mnist-4.png" alt="4" width="20"/> &rarr; 2

**Ready-made**: `MNAdd-EvenOdd` is yet another modified version of `MNIST-Addition` that focuses on only some digit combinations, specifically combinations of either even or odd digits. It was first introduced in [Marconato et al., 2023](https://openreview.net/pdf?id=QEHU2o2Q7h).

<table>
<tr>
<td> <img class="digit" src="assets/images/mnist-0.png" alt="0" width="25"/> + <img class="digit" src="assets/images/mnist-6.png" alt="6" width="25"/> = 6 </td>
</tr>
<tr>
<td> <img class="digit" src="assets/images/mnist-2.png" alt="2" width="25"/> + <img class="digit" src="assets/images/mnist-8.png" alt="8" width="25"/> = 10 </td>
</tr>
<tr>
<td> <img class="digit" src="assets/images/mnist-4.png" alt="4" width="25"/> + <img class="digit" src="assets/images/mnist-6.png" alt="6" width="25"/> = 10 </td>
</tr>
<tr>
<td> <img class="digit" src="assets/images/mnist-4.png" alt="4" width="25"/> + <img class="digit" src="assets/images/mnist-8.png" alt="8" width="25"/> = 12 </td>
</tr>
</table>

<table>
<tr>
<td> <img class="digit" src="assets/images/mnist-1.png" alt="1" width="25"/> + <img class="digit" src="assets/images/mnist-5.png" alt="5" width="25"/> = 6 </td>
</tr>
<tr>
<td> <img class="digit" src="assets/images/mnist-3.png" alt="3" width="25"/> + <img class="digit" src="assets/images/mnist-7.png" alt="7" width="25"/> = 10 </td>
</tr>
<tr>
<td> <img class="digit" src="assets/images/mnist-1.png" alt="1" width="25"/> + <img class="digit" src="assets/images/mnist-9.png" alt="9" width="25"/> = 10 </td>
</tr>
<tr>
<td> <img class="digit" src="assets/images/mnist-3.png" alt="3" width="25"/> + <img class="digit" src="assets/images/mnist-9.png" alt="9" width="25"/> = 12 </td>
</tr>

</table>

It contains 6720 fully annotated training samples, 1920 validation samples, and 960 in-distribution test samples, along with 5040 out-of-distribution test samples representing all other sums not seen during training.

As described in [Marconato et al., 2024a](https://openreview.net/pdf?id=tLTtqySDFb), the number of deterministic reasoning shortcuts is obtained by counting the integer solutions for the digits in the linear system defined by the training sums, totaling 49.

An example of an RS in this setting is the following:

<img class="digit" src="assets/images/mnist-0.png" alt="0" width="20"/> &rarr; 5,
<img class="digit" src="assets/images/mnist-1.png" alt="1" width="20"/> &rarr; 5,
<img class="digit" src="assets/images/mnist-2.png" alt="2" width="20"/> &rarr; 7,
<img class="digit" src="assets/images/mnist-3.png" alt="3" width="20"/> &rarr; 7,
<img class="digit" src="assets/images/mnist-4.png" alt="4" width="20"/> &rarr; 9,
<img class="digit" src="assets/images/mnist-5.png" alt="5" width="20"/> &rarr; 1,
<img class="digit" src="assets/images/mnist-6.png" alt="6" width="20"/> &rarr; 1,
<img class="digit" src="assets/images/mnist-7.png" alt="7" width="20"/> &rarr; 3,
<img class="digit" src="assets/images/mnist-8.png" alt="8" width="20"/> &rarr; 3,
<img class="digit" src="assets/images/mnist-9.png" alt="9" width="20"/> &rarr; 5

<h1><a name="MNLogic">MNLogic</a></h1>

<img src="assets/images/rsbench-mnlogic.png" alt="mnlogic" width="80%" height="auto">

@@ -148,24 +286,22 @@ images of zeros and ones representing the truth value of $k$ bits, and the
ground-truth label $y$ is whether they satisfy the formula or not.

By default, `MNLogic` assumes the formula is a $k$-bit XOR, but any other
formula can be supplied. rsbench provides code to generate random CNF formulas,
formula can be supplied. ``rsbench`` provides code to generate random CNF formulas,
that is, random conjunctions of disjunctions (clauses) of $k$ bits. The code
allows controlling the number of bits $k$ and the structure of the
random formula, that is, the number of clauses and their length. It also avoids
trivial data by ensuring each clause is neither a tautology nor a
contradiction.
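
The labeling scheme described above can be sketched in a few lines of plain Python. This is not the `rsbench` generator: the function names (`xor_label`, `random_cnf`, `evaluate_cnf`) and the clause representation are assumptions made for illustration. Each clause is sampled over distinct variables, so it can never contain a variable together with its negation (no tautology) and is never empty (no contradiction).

```python
import random
from functools import reduce
from operator import xor


def xor_label(bits):
    """Default-style label: the XOR (parity) of all k bits."""
    return reduce(xor, bits, 0)


def random_cnf(k, n_clauses, clause_len, rng):
    """Sample a CNF as a list of clauses; a literal is (variable index, negated).

    Hypothetical helper: each clause uses `clause_len` distinct variables,
    which rules out tautological and empty clauses.
    """
    if not 1 <= clause_len <= k:
        raise ValueError("clause_len must be between 1 and k")
    formula = []
    for _ in range(n_clauses):
        variables = rng.sample(range(k), clause_len)
        formula.append([(v, rng.random() < 0.5) for v in variables])
    return formula


def evaluate_cnf(formula, bits):
    """Return 1 if the bit assignment satisfies every clause, else 0."""
    return int(all(
        any(bool(bits[v]) != negated for v, negated in clause)
        for clause in formula
    ))


if __name__ == "__main__":
    rng = random.Random(0)
    formula = random_cnf(k=4, n_clauses=3, clause_len=2, rng=rng)
    bits = [1, 0, 1, 1]
    print("XOR label:", xor_label(bits))
    print("CNF label:", evaluate_cnf(formula, bits))
```

Passing an explicit `random.Random` instance keeps the sampled formula reproducible across runs, which is convenient when the same knowledge must be shared between data generation and evaluation.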


# Kand-Logic
<h1><a name="Kand-Logic">Kand-Logic</a></h1>

<img src="assets/images/rsbench-kandlogic.png" alt="kandlogic" width="80%" height="auto">

This task, inspired by Wassily Kandinsky's paintings and [Mueller and Holzinger 2021](https://www.sciencedirect.com/science/article/pii/S0004370221000977), requires simple (but non-trivial) perceptual processing and relatively complex reasoning in classifying logical patterns on sets of images comprising different shapes and colors. For example, each input can comprise two $64 \times 64$ images, i.e., $x = (x_1, x_2)$, each depicting three geometric primitives with different shapes (`square`, `triangle`, `circle`) and colors (`red`, `blue`, `yellow`). The goal is to predict whether $x_1$ and $x_2$ fit the same predefined logical pattern or not. The pattern is built out of predicates like `all primitives in the image have a different color`, `all primitives have the same color`, and `exactly two primitives have the same shape`.

Unlike `MNLogic`, in `Kand-Logic` each primitive has multiple attributes that cannot easily be processed separately. This means that RSs can easily appear, e.g., confuse shape with color when either is sufficient to entail the right prediction, as in the example above. We provide the data set used in [Marconato et al. 2024](https://arxiv.org/abs/2402.12240) ($3$ images per input with $3$ primitives each) and a generator that allows configuring the number of images and primitives per input and the pattern itself.
Unlike `MNLogic`, in `Kand-Logic` each primitive has multiple attributes that cannot easily be processed separately. This means that RSs can easily appear, e.g., confuse shape with color when either is sufficient to entail the right prediction, as in the example above. We provide the data set used in [Marconato et al. 2024b](https://arxiv.org/abs/2402.12240) ($3$ images per input with $3$ primitives each) and a generator that allows configuring the number of images and primitives per input and the pattern itself.


# CLE4EVR
<h1><a name="CLE4EVR">CLE4EVR</a></h1>

<img src="assets/images/rsbench-cle4evr.png" alt="cle4evr" width="80%" height="auto">

@@ -178,11 +314,7 @@ The default knowledge $\mathsf K$ is designed to induce Reasoning Shortcuts: it

The generator allows customizing the number of objects per image, the knowledge, and whether occlusion is allowed.





# BDD-OIA
<h1><a name="BDD-OIA">BDD-OIA</a></h1>

<img src="assets/images/rsbench-bddoia.png" alt="bddoia" width="80%" height="auto">

@@ -196,8 +328,7 @@ The constraints specify conditions for being able to proceed (${\tt green\\_ligh

Common Reasoning Shortcuts allow a model to, for example, confuse ${\tt pedestrians}$ with ${\tt red\\_light}$, as both imply the correct ${\tt stop}$ action for all training examples.


# SDD-OIA
<h1><a name="SDD-OIA">SDD-OIA</a></h1>

<img src="assets/images/rsbench-sddoia.png" alt="sddoia" width="80%" height="auto">
