Update website

unitn-sml · Oct 16, 2024 · df03bb9
1 parent a49e964
commit df03bb9

Showing 12 changed files with 164 additions and 30 deletions.
diff --git a/assets/css/benchmark_style.css b/assets/css/benchmark_style.css
@@ -47,10 +47,9 @@ table {
   border-bottom: 2px solid black;
   border-top: 2px solid black;
   border-spacing: 0;
-  width: 95%;
+  width: auto;
   margin: auto;
   margin-bottom: 1em;
-  display: block;
   overflow-x: auto;
 }
 
@@ -266,4 +265,8 @@ body.dark-mode table th, body.dark-mode table td {
 body.dark-mode table th {
   background-color: #333;
   color: #f5f5f5;
+}
+
+.digit {
+  margin: 0px
 }
diff --git a/assets/images/mnist-0.png b/assets/images/mnist-0.png
diff --git a/assets/images/mnist-1.png b/assets/images/mnist-1.png
diff --git a/assets/images/mnist-2.png b/assets/images/mnist-2.png
diff --git a/assets/images/mnist-3.png b/assets/images/mnist-3.png
diff --git a/assets/images/mnist-4.png b/assets/images/mnist-4.png
diff --git a/assets/images/mnist-5.png b/assets/images/mnist-5.png
diff --git a/assets/images/mnist-6.png b/assets/images/mnist-6.png
diff --git a/assets/images/mnist-7.png b/assets/images/mnist-7.png
diff --git a/assets/images/mnist-8.png b/assets/images/mnist-8.png
diff --git a/assets/images/mnist-9.png b/assets/images/mnist-9.png
diff --git a/index.md b/index.md
@@ -15,12 +15,12 @@ constraints. However, recent research observed that tasks requiring both
 learning and reasoning on background knowledge often suffer from reasoning
 shortcuts (RSs): predictors can solve the downstream reasoning task without
 associating the correct concepts to the high-dimensional data. To address this
-issue, we introduce rsbench, a comprehensive benchmark suite designed to
+issue, we introduce ``rsbench``, a comprehensive benchmark suite designed to
 systematically evaluate the impact of RSs on models by providing easy access to
-highly customizable tasks affected by RSs. Furthermore, rsbench implements
+highly customizable tasks affected by RSs. Furthermore, ``rsbench`` implements
 common metrics for evaluating concept quality and introduces novel formal
 verification procedures for assessing the presence of RSs in learning tasks.
-Using rsbench, we highlight that obtaining high quality concepts in both purely
+Using ``rsbench``, we highlight that obtaining high quality concepts in both purely
 neural and neuro-symbolic models is a far-from-solved problem.
 
 
@@ -62,17 +62,17 @@ mitigation of reasoning shortcuts." NeurIPS 2023.</span>
 
 <h1><a name="overview">Overview</a></h1>
 
-- *A Variety of L&R Tasks*: rsbench offers five L&R tasks and at least one data
+- *A Variety of L&R Tasks*: ``rsbench`` offers five L&R tasks and at least one data
   set each.  The tasks come in different flavors -- *arithmetic*, *logic*, and
   *high-stakes* -- and with a formal specification of the corresponding prior
-  knowledge.  rsbench also provides data generators for creating new OOD splits
+  knowledge.  ``rsbench`` also provides data generators for creating new OOD splits
   useful for testing the down-stream consequences of RSs.
 
-- *Evaluation*: rsbench comes with implementations for several metrics for
+- *Evaluation*: ``rsbench`` comes with implementations for several metrics for
   evaluating the quality of *label* and *concept* predictions, as well as
   visualization code for them.
 
-- *Verification*: rsbench implements a new algorithm, `countrss`, that makes
+- *Verification*: ``rsbench`` implements a new algorithm, `countrss`, that makes
   use of automated reasoning packages for formally verifying whether an L&R task
   allows for RSs without training any model!  This tool works with any prior
   knowledge encoded in CNF format, the de-facto standard in automated
@@ -98,6 +98,55 @@ mitigation of reasoning shortcuts." NeurIPS 2023.</span>
 
 <h1><a name="usage">Usage</a></h1>
 
+In this section we provide useful information to get started with ``rsbench``.
+
+<h2>Configure and run the data generators</h2> 
+
+The data generators are available at the following [GitHub link](https://github.com/unitn-sml/rsbench-code/tree/main/rssgen).
+
+The datasets included are:
+
+- [``MNMath``](#MNMath)
+- [``MNLogic``](#MNLogic)
+- [``Kand-Logic``](#Kand-Logic)
+- [``CLE4EVR``](#CLE4EVR)
+- [``SDD-OIA``](#SDD-OIA)
+
+Each generator is highly customizable through configuration files. For `MNMath`, `MNLogic`, and `Kand-Logic`, you need to edit a `.yml` file, with examples and instructions available in the `examples_config` folder. On the other hand, `CLE4EVR` and `SDD-OIA` use `.json` configuration files. For further details, please refer to the respective GitHub page for each generator.
+
+<h2>Load rsbench data and train your model</h2>
+
+To load and use ``rsbench`` data, you can use the provided suite, which covers data loading, model training, and evaluation. This ready-to-use toolkit is available at this [GitHub link](https://github.com/unitn-sml/rsbench-code/tree/main/rsseval). Alternatively, you can create your own dataset class by writing just a few lines of code:
+
+```python
+from rss.datasets.xor import MNLOGIC
+
+class required_args:
+    def __init__(self):
+        self.c_sup = 0  # specifies % supervision available on concepts
+        self.which_c = -1  # specifies which concepts to supervise, -1=all
+        self.batch_size = 64  # batch size of the loaders
+
+args = required_args()
+
+dataset = MNLOGIC(args)
+train_loader, val_loader, test_loader = dataset.get_loaders()
+
+model = ...      # placeholder: replace with your model
+optimizer = ...  # placeholder: replace with your optimizer
+criterion = ...  # placeholder: replace with your loss function
+
+for epoch in range(30):
+    for images, labels, concepts in train_loader:
+        optimizer.zero_grad()
+        outputs = model(images)
+        loss = criterion(outputs, labels, concepts)
+        loss.backward()
+        optimizer.step()
+```
+
+<h2> Quickstart </h2>
+
 We provide a simple tutorial designed to demonstrate how to load and use the data generated by `rsbench`. This tutorial is meant to give a quick overview and get you started with the data we provide. You can access the Google Colab tutorial using the following link:
 
 [MNIST Math Google Colab](https://colab.research.google.com/drive/1QYizKR1yS9dT7pI7dRITdw0HrvIOGjEP#scrollTo=rHrAvZnU-fWe)
@@ -112,7 +161,7 @@ For a more thorough evaluation of the model, we recommend exploring the `rsseval
 
 Within this folder, you'll find a [notebook](https://github.com/unitn-sml/rsbench-code/blob/main/rsseval/rss/notebooks/evaluate.ipynb) dedicated to evaluating concept quality using the metrics discussed in our paper. This will help you assess the performance and quality of the models more comprehensively.
 
-# MNMath
+<h1><a name="MNMath">MNMath</a></h1>
 
 <img src="assets/images/rsbench-mnmath.png" alt="mnmath" width="80%" height="auto">
 
@@ -129,13 +178,102 @@ confuse 3's with 4's and still perfectly predict the output of the system.
 However, for a new, out-of-distribution task like $2 + 4$, it will wrongly
 output $5$.
 
-
-**Ready-made**: `MNAdd-Half` WRITEME
-
-**Ready-made**: `MNAdd-EvenOdd` WRITEME
-
-
-# MNLogic
+**Ready-made**: `MNAdd-Half` is a modified version of `MNIST-Addition` that uses only half of the digits, specifically those from 0 to 4. It was first introduced in [Marconato et al., 2024b](https://openreview.net/pdf?id=pDcM1k7mgZ).
+
+The dataset includes the following combinations of digits:
+
+<table>
+  <tr>
+    <td> <img class="digit" src="assets/images/mnist-0.png" alt="0" width="25"/> + <img class="digit" src="assets/images/mnist-0.png" alt="0" width="25"/> = 0 </td>
+  </tr>
+  <tr>
+    <td> <img class="digit" src="assets/images/mnist-0.png" alt="0" width="25"/> + <img class="digit" src="assets/images/mnist-1.png" alt="1" width="25"/> = 1 </td>
+  </tr>
+  <tr>
+    <td> <img class="digit" src="assets/images/mnist-2.png" alt="2" width="25"/> + <img class="digit" src="assets/images/mnist-3.png" alt="3" width="25"/> = 5 </td>
+  </tr>
+  <tr>
+    <td> <img class="digit" src="assets/images/mnist-2.png" alt="2" width="25"/> + <img class="digit" src="assets/images/mnist-4.png" alt="4" width="25"/> = 6 </td>
+  </tr>
+</table>
+
+The digits 0 and 1 are unaffected by reasoning shortcuts, while digits 2, 3, and 4 can be predicted in various ways, as illustrated below.
+
+The `MNAdd-Half` dataset contains a total of 2940 fully annotated training samples, 840 validation samples, 420 test samples, and an additional 1080 out-of-distribution test samples. The out-of-distribution samples consist exclusively of other sums over the same digits, such as 1 + 3 = 4.
+
+There are three potential optimal solutions, two of which are reasoning shortcuts. Specifically:
+
+<img class="digit" src="assets/images/mnist-0.png" alt="0" width="20"/> &rarr; 0, 
+<img class="digit" src="assets/images/mnist-1.png" alt="1" width="20"/> &rarr; 1, 
+<img class="digit" src="assets/images/mnist-2.png" alt="2" width="20"/> &rarr; 2, 
+<img class="digit" src="assets/images/mnist-3.png" alt="3" width="20"/> &rarr; 3, 
+<img class="digit" src="assets/images/mnist-4.png" alt="4" width="20"/> &rarr; 4
+
+<img class="digit" src="assets/images/mnist-0.png" alt="0" width="20"/> &rarr; 0, 
+<img class="digit" src="assets/images/mnist-1.png" alt="1" width="20"/> &rarr; 1, 
+<img class="digit" src="assets/images/mnist-2.png" alt="2" width="20"/> &rarr; 3, 
+<img class="digit" src="assets/images/mnist-3.png" alt="3" width="20"/> &rarr; 2, 
+<img class="digit" src="assets/images/mnist-4.png" alt="4" width="20"/> &rarr; 3
+
+<img class="digit" src="assets/images/mnist-0.png" alt="0" width="20"/> &rarr; 0, 
+<img class="digit" src="assets/images/mnist-1.png" alt="1" width="20"/> &rarr; 1, 
+<img class="digit" src="assets/images/mnist-2.png" alt="2" width="20"/> &rarr; 4, 
+<img class="digit" src="assets/images/mnist-3.png" alt="3" width="20"/> &rarr; 1, 
+<img class="digit" src="assets/images/mnist-4.png" alt="4" width="20"/> &rarr; 2
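
The three solutions listed above can be recovered by brute force. The following is a minimal sketch, not part of `rsbench`: it assumes that each digit image is mapped deterministically to a concept in {0, ..., 4} and that the training sums are exactly the four combinations in the table. The names `TRAIN_SUMS` and `optimal_mappings` are hypothetical.

```python
from itertools import product

# Training sums of MNAdd-Half as listed in the table: (left digit, right digit, sum).
TRAIN_SUMS = [(0, 0, 0), (0, 1, 1), (2, 3, 5), (2, 4, 6)]
DIGITS = range(5)  # assumption: concepts take values in {0, ..., 4}


def optimal_mappings(train_sums=TRAIN_SUMS, digits=DIGITS):
    """Return every digit -> concept map that reproduces all training sums."""
    digits = list(digits)
    solutions = []
    for values in product(digits, repeat=len(digits)):
        mapping = dict(zip(digits, values))
        if all(mapping[a] + mapping[b] == s for a, b, s in train_sums):
            solutions.append(mapping)
    return solutions


solutions = optimal_mappings()
identity = {d: d for d in DIGITS}
shortcuts = [m for m in solutions if m != identity]

for m in solutions:
    print(m, "(intended)" if m == identity else "(reasoning shortcut)")
```

Under these assumptions the search returns the intended mapping plus the two reasoning shortcuts shown above.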
+
+**Ready-made**: `MNAdd-EvenOdd` is yet another modified version of `MNIST-Addition` that focuses on only some digit combinations, specifically combinations of either even or odd digits. It was first introduced in [Marconato et al., 2023](https://openreview.net/pdf?id=QEHU2o2Q7h).
+
+<table>
+  <tr>
+    <td> <img class="digit" src="assets/images/mnist-0.png" alt="0" width="25"/> + <img class="digit" src="assets/images/mnist-6.png" alt="6" width="25"/> = 6 </td>
+    <td></td>
+  </tr>
+  <tr>
+    <td> <img class="digit" src="assets/images/mnist-2.png" alt="2" width="25"/> + <img class="digit" src="assets/images/mnist-8.png" alt="8" width="25"/> = 10 </td>
+  </tr>
+  <tr>
+    <td> <img class="digit" src="assets/images/mnist-4.png" alt="4" width="25"/> + <img class="digit" src="assets/images/mnist-6.png" alt="6" width="25"/> = 10 </td>
+  </tr>
+  <tr>
+    <td> <img class="digit" src="assets/images/mnist-4.png" alt="4" width="25"/> + <img class="digit" src="assets/images/mnist-8.png" alt="8" width="25"/> = 12 </td>
+  </tr>
+</table>
+
+<table>
+  <tr>
+    <td> <img class="digit" src="assets/images/mnist-1.png" alt="1" width="25"/> + <img class="digit" src="assets/images/mnist-5.png" alt="5" width="25"/> = 6 </td>
+    <td></td>
+  </tr>
+  <tr>
+    <td> <img class="digit" src="assets/images/mnist-3.png" alt="3" width="25"/> + <img class="digit" src="assets/images/mnist-7.png" alt="7" width="25"/> = 10 </td>
+  </tr>
+  <tr>
+    <td> <img class="digit" src="assets/images/mnist-1.png" alt="1" width="25"/> + <img class="digit" src="assets/images/mnist-9.png" alt="9" width="25"/> = 10 </td>
+  </tr>
+  <tr>
+    <td> <img class="digit" src="assets/images/mnist-3.png" alt="3" width="25"/> + <img class="digit" src="assets/images/mnist-9.png" alt="9" width="25"/> = 12 </td>
+  </tr>
+
+</table>
+
+It contains 6720 fully annotated training samples, 1920 validation samples, and 960 in-distribution test samples, along with 5040 out-of-distribution test samples representing all other sums not seen during training.
+
+As described in [Marconato et al., 2024a](https://openreview.net/pdf?id=tLTtqySDFb), the number of deterministic reasoning shortcuts can be obtained by counting the integer solutions, over the digits, of the linear system induced by the training sums. There are 49 in total.
+
+An example of an RS in this setting is the following:
+
+<img class="digit" src="assets/images/mnist-0.png" alt="0" width="20"/> &rarr; 5, 
+<img class="digit" src="assets/images/mnist-1.png" alt="1" width="20"/> &rarr; 5, 
+<img class="digit" src="assets/images/mnist-2.png" alt="2" width="20"/> &rarr; 7, 
+<img class="digit" src="assets/images/mnist-3.png" alt="3" width="20"/> &rarr; 7, 
+<img class="digit" src="assets/images/mnist-4.png" alt="4" width="20"/> &rarr; 9,
+<img class="digit" src="assets/images/mnist-5.png" alt="5" width="20"/> &rarr; 1,
+<img class="digit" src="assets/images/mnist-6.png" alt="6" width="20"/> &rarr; 1,
+<img class="digit" src="assets/images/mnist-7.png" alt="7" width="20"/> &rarr; 3,
+<img class="digit" src="assets/images/mnist-8.png" alt="8" width="20"/> &rarr; 3,
+<img class="digit" src="assets/images/mnist-9.png" alt="9" width="20"/> &rarr; 5
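
To see why this mapping counts as a reasoning shortcut, one can check it against the training sums. The sketch below is illustrative only and is not `rsbench` code: it assumes the training set contains exactly the eight digit pairs shown in the two tables, and the helper names are hypothetical.

```python
# Training sums of MNAdd-EvenOdd as listed in the two tables: (left, right, sum).
TRAIN_SUMS = [
    (0, 6, 6), (2, 8, 10), (4, 6, 10), (4, 8, 12),  # even digits
    (1, 5, 6), (3, 7, 10), (1, 9, 10), (3, 9, 12),  # odd digits
]

# The reasoning shortcut shown above: true digit -> predicted concept.
SHORTCUT = {0: 5, 1: 5, 2: 7, 3: 7, 4: 9, 5: 1, 6: 1, 7: 3, 8: 3, 9: 5}


def predicted_sum(mapping, a, b):
    """Sum computed from the (possibly wrong) concepts assigned to digits a and b."""
    return mapping[a] + mapping[b]


def fits_training_data(mapping, train_sums=TRAIN_SUMS):
    """True if the mapping yields the correct label for every training pair."""
    return all(predicted_sum(mapping, a, b) == s for a, b, s in train_sums)


print(fits_training_data(SHORTCUT))   # every training sum is reproduced
print(predicted_sum(SHORTCUT, 0, 1))  # unseen pair 0 + 1, whose true sum is 1
```

The mapping reproduces every training label while assigning the wrong concept to each digit, so it only fails on pairs outside the training combinations.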
+
+<h1><a name="MNLogic">MNLogic</a></h1>
 
 <img src="assets/images/rsbench-mnlogic.png" alt="mnlogic" width="80%" height="auto">
 
@@ -148,24 +286,22 @@ images of zeros and ones representing the truth value of $k$ bits, and the
 ground-truth label $y$ is whether they satisfy the formula or not.
 
 By default, the `MNLogic` assumes the formula is a $k$-bit XOR, but any other
-formula can be supplied. rsbench provides code to generate random CNF formulas,
+formula can be supplied. ``rsbench`` provides code to generate random CNF formulas,
 that is, random conjunctions of disjunctions (clauses) of $k$ bits. The code
 allows controlling the number of bits $k$ and the structure of the
 random formula, that is, the number of clauses and their length. It also avoids
 trivial data by ensuring each clause is neither a tautology nor a
 contradiction.
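
As a rough illustration of the idea, the sketch below samples a random CNF formula over $k$ bits and labels bit vectors with it. It is not the `rsbench` generator: the function names, the `(variable, polarity)` literal encoding, and the choice to avoid tautologies by sampling distinct variables per clause are all assumptions made for this example.

```python
import random
from itertools import product


def random_clause(k, length, rng):
    """Sample a clause over `length` distinct bits, each possibly negated.

    Using distinct variables rules out tautologies such as (x or not x);
    a non-empty clause is never a contradiction on its own.
    """
    variables = rng.sample(range(k), length)
    return [(v, rng.choice([True, False])) for v in variables]


def random_cnf(k, n_clauses, length, seed=0):
    """Sample a conjunction of `n_clauses` random clauses of equal length."""
    rng = random.Random(seed)
    return [random_clause(k, length, rng) for _ in range(n_clauses)]


def satisfies(bits, cnf):
    """A literal (v, positive) holds when bit v equals `positive`."""
    return all(
        any(bool(bits[v]) == positive for v, positive in clause)
        for clause in cnf
    )


def xor_label(bits):
    """Label for the default k-bit XOR formula."""
    return sum(bits) % 2 == 1


cnf = random_cnf(k=4, n_clauses=3, length=2)
labels = {bits: satisfies(bits, cnf) for bits in product([0, 1], repeat=4)}
print(cnf)
print(sum(labels.values()), "of", len(labels), "assignments satisfy the formula")
```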
 
-
-# Kand-Logic
+<h1><a name="Kand-Logic">Kand-Logic</a></h1>
 
 <img src="assets/images/rsbench-kandlogic.png" alt="kandlogic" width="80%" height="auto">
 
 This task, inspired by Wassily Kandinsky's paintings and [Mueller and Holzinger 2021](https://www.sciencedirect.com/science/article/pii/S0004370221000977) requires simple (but non-trivial) perceptual processing and relatively complex reasoning in classifying logical patterns on sets of images comprising different shapes and colors. For example, each input can comprise two $64 \times 64$ images, i.e., $x = (x_1, x_2)$, each depicting three geometric primitives with different shapes (`square`, `triangle`, `circle`) and colors (`red`, `blue`, `yellow`). The goal is to predict whether $x_1$ and $x_2$ fit the same predefined logical pattern or not. The pattern is built out of predicates like `all primitives in the image have a different color`, `all primitives have the same color`, and `exactly two primitives have the same shape`.
 
-Unlike `MNLogic`, in `Kand-Logic` each primitive has multiple attributes that cannot easily be processed separately.  This means that RSs can easily appear, e.g., confuse shape with color when either is sufficient to entail the right prediction, as in the example above. We provide the data set used in [Marconato et al. 2024](https://arxiv.org/abs/2402.12240) ($3$ images per input with $3$ primitives each) and a generator that allows configuring the number of images and primitives per input and the pattern itself.
+Unlike `MNLogic`, in `Kand-Logic` each primitive has multiple attributes that cannot easily be processed separately.  This means that RSs can easily appear, e.g., confuse shape with color when either is sufficient to entail the right prediction, as in the example above. We provide the data set used in [Marconato et al. 2024b](https://arxiv.org/abs/2402.12240) ($3$ images per input with $3$ primitives each) and a generator that allows configuring the number of images and primitives per input and the pattern itself.
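
The predicates mentioned above can be written down directly over symbolic descriptions of an image. This is a minimal sketch under stated assumptions, not the `rsbench` implementation: an image is represented as a list of hypothetical `(shape, color)` pairs, and "exactly two primitives have the same shape" is read as "the most frequent shape appears exactly twice".

```python
from collections import Counter

# Assumed encoding: a primitive is a (shape, color) pair, an image is a list of them.


def all_different_colors(image):
    colors = [color for _, color in image]
    return len(set(colors)) == len(colors)


def all_same_color(image):
    return len({color for _, color in image}) == 1


def exactly_two_same_shape(image):
    counts = Counter(shape for shape, _ in image)
    return max(counts.values()) == 2


x1 = [("square", "red"), ("triangle", "red"), ("circle", "red")]
x2 = [("square", "red"), ("square", "blue"), ("circle", "yellow")]

for name, image in [("x1", x1), ("x2", x2)]:
    print(name, all_different_colors(image), all_same_color(image),
          exactly_two_same_shape(image))
```

Because every predicate reads both attributes from the same primitives, a model that swaps the roles of shape and color can still satisfy a pattern whenever either attribute alone entails the label.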
 
-
-# CLE4EVR
+<h1><a name="CLE4EVR">CLE4EVR</a></h1>
 
 <img src="assets/images/rsbench-cle4evr.png" alt="cle4evr" width="80%" height="auto">
 
@@ -178,11 +314,7 @@ The default knowledge $\mathsf K$ is designed to induce Reasoning Shortcuts: it
 
 The generator allows to customize the number of objects per image, the knowledge, and whether occlusion is allowed.
 
-
-
-
-
-# BDD-OIA
+<h1><a name="BDD-OIA">BDD-OIA</a></h1>
 
 <img src="assets/images/rsbench-bddoia.png" alt="bddoia" width="80%" height="auto">
 
@@ -196,8 +328,7 @@ The constraints specify conditions for being able to proceed (${\tt green\\_ligh
 
 Common Reasoning Shortcuts allow to, for example confuse ${\tt pedestrians}$ with ${\tt red\\_light}$ s, as they both imply the correct $ {\tt stop}$  action for all training examples.
 
-
-# SDD-OIA
+<h1><a name="SDD-OIA">SDD-OIA</a></h1>
 
 <img src="assets/images/rsbench-sddoia.png" alt="sddoia" width="80%" height="auto">