MathGAP

This repository contains MathGAP, a data-generation framework for math word problems with arbitrarily complex proofs. The method was introduced in the paper MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs, ICLR 2025.

With MathGAP you can generate arithmetic math word problems of arbitrary complexity, including ground-truth reasoning traces demonstrating how to solve them. You are in complete control over the proof structure of the problem, the operations involved in solving it, the arithmetic concepts they involve, and surface-level features like numerical values, entities, and agents.

Supported Features

- Generate new math word problems
  - including reasoning traces
- Control the characteristics of the proof tree required to solve each math word problem
  - specify the set of inference rules that can be part of the proof tree
  - specify the width or depth of the proof tree
- Reorder the sentences in the problem formulation
- Vary the surface-level features of the problem
  - specify the range of numbers that can occur in the math word problem
  - specify which agents, entities, etc. to use
- Extend the framework with:
  - custom inference rules
  - custom templates (ways of expressing logical forms in natural language)
  - vocabularies for agents, entities, attributes, and units
  - other languages beyond English
- Track the origin of each word when a problem is rendered as natural language (i.e., whether it is an agent, an entity, etc.)

Using MathGAP

conda create -n mathgap python=3.10
conda activate mathgap
pip install -e mathgap

Generate a nonlinear problem and its solution for a tree of a certain depth:

cd mathgap
python demo_generation.py example-nonlinear --depth 3 --graph

See experiments/opedal24_ood_eval for code specific to the paper, including methods to generate data from the same distribution as the data used in the paper's experiments.

How it works

In a nutshell, MathGAP generates proof trees by applying inference rules in reverse, from conclusions back to premises. Section 3 of the paper describes the formalism, and Section 4.1 explains the generation method. In brief, the nodes of a proof tree are labelled with logical forms that correspond to facts in the world described by a math word problem. The leaf nodes correspond to the problem formulation (e.g., Alice has 5 apples, Bob has 3 more apples than Alice), and the parent nodes correspond to new facts that can be deduced (e.g., Bob has 8 apples). The root usually corresponds to the question and its answer (e.g., How many apples does Bob have?), but this need not be the case: a problem may allow further facts to be deduced beyond what is asked.
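
The apples example above can be written out as a tiny proof tree. The following is a minimal sketch for illustration only: the classes `Container`, `Comparison`, and `Node` and the function `cont_comp_to_cont` are hypothetical names defined here, not MathGAP's actual API.

```python
from dataclasses import dataclass, field
from typing import List


# Toy logical forms (hypothetical; not MathGAP's classes).
@dataclass(frozen=True)
class Container:
    agent: str
    entity: str
    quantity: int


@dataclass(frozen=True)
class Comparison:
    subject: str      # the agent who has more
    other: str        # the agent being compared against
    entity: str
    difference: int


@dataclass
class Node:
    fact: object
    premises: List["Node"] = field(default_factory=list)


def cont_comp_to_cont(cont: Container, comp: Comparison) -> Container:
    """Deduce the subject's quantity from the other agent's container."""
    if cont.agent != comp.other or cont.entity != comp.entity:
        raise ValueError("premises do not match")
    return Container(comp.subject, cont.entity, cont.quantity + comp.difference)


def leaves(node: Node) -> List[Node]:
    """Collect the leaf nodes, i.e. the facts stated in the problem text."""
    if not node.premises:
        return [node]
    return [leaf for premise in node.premises for leaf in leaves(premise)]


# Leaves: "Alice has 5 apples" and "Bob has 3 more apples than Alice".
alice = Node(Container("Alice", "apples", 5))
more = Node(Comparison("Bob", "Alice", "apples", 3))

# Root: the deduced fact that answers "How many apples does Bob have?".
root = Node(cont_comp_to_cont(alice.fact, more.fact), premises=[alice, more])

print(root.fact)  # Container(agent='Bob', entity='apples', quantity=8)
```

This sketch evaluates the tree forward, from leaves to root, to make the structure concrete. MathGAP builds trees in the opposite direction, starting from a conclusion.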

Multiple orderings of axioms may be valid under a given proof tree; we support sampling different such orderings (see Section 5.4 in the paper for one example).

Note that the tree does not use concrete names (e.g., Alice) but rather property placeholders (e.g., agent1), which are later instantiated from a list of possible values. Additionally, each logical form or inference step can be rendered with multiple templates (e.g., Bob has 3 more apples than Alice vs. Alice has 3 apples less than Bob). The rendering process allows tracking of metadata like which character/word in the problem originated from which property.
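
To make placeholder instantiation and origin tracking concrete, here is a minimal sketch under simplifying assumptions. The vocabulary, the bracketed placeholder syntax, and the functions `instantiate` and `render` are invented for this example and do not reflect MathGAP's template format.

```python
import random

# Hypothetical vocabulary and templates, for illustration only.
VOCAB = {"agent": ["Alice", "Bob", "Carol"], "entity": ["apples", "pens"]}

# Two templates for the same comparison logical form.
# Parts in square brackets are property placeholders; others are literal text.
TEMPLATES = [
    ["[agent2]", " has ", "[quantity]", " more ", "[entity]", " than ", "[agent1]", "."],
    ["[agent1]", " has ", "[quantity]", " ", "[entity]", " less than ", "[agent2]", "."],
]


def instantiate(rng: random.Random) -> dict:
    """Pick concrete values for the placeholders (two distinct agents)."""
    agent1, agent2 = rng.sample(VOCAB["agent"], 2)
    return {
        "agent1": agent1,
        "agent2": agent2,
        "entity": rng.choice(VOCAB["entity"]),
        "quantity": str(rng.randint(2, 9)),
    }


def render(template: list, values: dict):
    """Render a template and record which character span came from which property."""
    text, spans = "", []
    for part in template:
        if part.startswith("[") and part.endswith("]"):
            prop = part[1:-1]
            piece = values[prop]
            spans.append((len(text), len(text) + len(piece), prop))
        else:
            piece = part
        text += piece
    return text, spans


values = {"agent1": "Alice", "agent2": "Bob", "entity": "apples", "quantity": "3"}
for template in TEMPLATES:
    sentence, origins = render(template, values)
    print(sentence, origins)
```

Each entry in `origins` is a `(start, end, property)` triple, so `sentence[start:end]` recovers the text that was filled in for that property.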

Extending MathGAP

Logical Forms

If you want to add a new logical form, you can:

- Create the new logical form in mathgap/mathgap/logicalforms. For larger logical forms, the boilerplate code can be generated automatically with mathgap/mathgap/logicalforms/_gen_util.py. You will have to implement some abstract methods yourself.
- Implement the rendering of your new logical form (as text, LaTeX, etc.).
- Create new templates for the logical form in mathgap/mathgap/data/templates and register them in the template catalog in mathgap/mathgap/data/util.py. Depending on how you plan to use the logical form, you need to specify different types of templates:
  - statements: for logical forms that can occur as leaf nodes (or as answers, if you intend to render those)
  - conclusions: for logical forms that can occur as inner nodes of a proof tree (only necessary if you want to render reasoning traces)
  - questions: for logical forms that can be asked for (usually either the root node of the tree or inner nodes)
- Create new inference rules that introduce the logical form as a premise or conclusion.
- If your new logical form can be at the root of a proof tree, extend your generator accordingly.

Inference Rules

To create a new inference rule in mathgap/mathgap/trees/rules/:

- Create a new file named after the types of the premises (e.g., contcont.py) in a folder named after the type of the conclusion (e.g., comp). In this file, create a class named after the premises and conclusion (e.g., ContContComp) that extends InferenceRule.
- Define the metadata: type of conclusion, types of premises, variable times, and parametrization.
- Implement the abstract methods of the inference rule (if the variable times do not differ between premises and conclusion, you can use the default implementation). Note: if your rule introduces a write (e.g., Container) to a variable that is not new, validate in is_reverse_applicable that no other write to the same variable key already occurs at the same time. Also, the order of the premises should respect time (e.g., in ContTransferCont, the Cont needs to be the first premise).
- Add the inference rule to the list of rules that your generator uses.

Instantiators

If you want to enforce new constraints when instantiating properties (e.g. if there should be no subtractions with a carry), then you can simply add a new Instantiator and use it to instantiate your properties.
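
As an illustration of the kind of constraint mentioned above, the following sketch samples a subtraction that needs no borrow in any digit column, using simple rejection sampling. The functions `needs_borrow` and `sample_borrow_free` are hypothetical helpers written for this example. They do not use MathGAP's Instantiator interface, which you would need to consult in the code.

```python
import random


def needs_borrow(minuend: int, subtrahend: int) -> bool:
    """Return True if column-wise subtraction requires a borrow in some digit."""
    while subtrahend > 0:
        if minuend % 10 < subtrahend % 10:
            return True
        minuend //= 10
        subtrahend //= 10
    return False


def sample_borrow_free(rng: random.Random, low: int, high: int, max_tries: int = 1000):
    """Rejection-sample a pair (a, b) with low <= b <= a <= high and no borrow in a - b."""
    for _ in range(max_tries):
        a = rng.randint(low, high)
        b = rng.randint(low, a)
        if not needs_borrow(a, b):
            return a, b
    raise RuntimeError("no valid instantiation found within max_tries")


rng = random.Random(0)
a, b = sample_borrow_free(rng, 10, 99)
print(f"{a} - {b} = {a - b}")
```

Rejection sampling is the simplest option here. A constructive sampler that picks digits column by column would avoid retries but is harder to generalize to other constraints.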

Instantiation Values

To change the values that a property type can be instantiated with, add a new version for that property type in mathgap/mathgap/data and load your new version instead of the default one.

Templates

Look at the existing templates in mathgap/mathgap/data/templates to see how it's done. Key concepts to know:

- Templates have one of three types: statement, question, conclusion.
- Each template consists of parts.
- Partials are named partial templates that can replace parts of templates. This helps reduce complexity: for example, a part can render as either entity or attribute entity, depending on which properties are available in the logical form.
- Parts can be resolved through:
  - properties (the corresponding instantiation of the property will be looked up from the logical form)
  - partials (all possible partials under the given name can be rendered in place of this part)
  - expressions (the instantiated expression is looked up and rendered up to some depth, e.g., 5 + 10 if depth = 1)
  - lists of properties (in this case you can specify which word will be used to join them, e.g., "Alice, Bob and Charlie")
- For groups of templates, you can specify conditions that must hold for a template to be eligible to render a logical form of this type. Conditions include And, Equality, etc. (you can also add custom ones, see conditions.py). Conditions can reference the logical form as well as the tree through queries. For example, "condition": { "type": "equality", "property": "sender", "query": "conclusion.agent" } compares the property sender of the logical form with the agent of the conclusion logical form.
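
The equality condition quoted above can be read as follows. This is a toy evaluator, not the code in conditions.py: the dictionary-based logical form, the `context` structure, and the `"conditions"` key of the `"and"` type are assumptions made for this sketch.

```python
def resolve_query(context: dict, query: str):
    """Follow a dotted path such as 'conclusion.agent' through nested dicts."""
    value = context
    for key in query.split("."):
        value = value[key]
    return value


def holds(condition: dict, logical_form: dict, context: dict) -> bool:
    """Check whether a condition holds for a logical form within a tree context."""
    kind = condition["type"]
    if kind == "equality":
        expected = resolve_query(context, condition["query"])
        return logical_form[condition["property"]] == expected
    if kind == "and":
        # Assumed schema: an "and" condition lists its sub-conditions.
        return all(holds(c, logical_form, context) for c in condition["conditions"])
    raise ValueError(f"unknown condition type: {kind}")


condition = {"type": "equality", "property": "sender", "query": "conclusion.agent"}
transfer = {"sender": "Alice", "receiver": "Bob"}

# The template group is eligible only if the sender is the agent of the conclusion.
print(holds(condition, transfer, {"conclusion": {"agent": "Alice"}}))  # True
print(holds(condition, transfer, {"conclusion": {"agent": "Bob"}}))    # False
```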

Generation

There's a default generator for general nonlinear proof trees (these nonlinear trees differ from those used in the paper; see experiments/opedal24_ood_eval for those). You can specify the stopping criteria (e.g., depth 3) as well as the inference rules that are used. Depending on the set of inference rules you use, you can also generate linear proof trees. If you want a specific tree structure, you can either implement a custom rule-sampling policy or hardcode a new generator.
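
The interplay between the rule set, the stopping criterion, and the resulting tree shape can be sketched with a toy top-down generator. Everything below is a simplified stand-in: the rule table maps a conclusion type to possible premise types, rules are chosen uniformly at random, and none of the names correspond to MathGAP's generator classes.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Toy rule table (hypothetical): conclusion type -> possible premise types.
RULES: Dict[str, List[Tuple[str, ...]]] = {
    "cont": [("cont", "comp"), ("cont", "transfer")],
    "comp": [("cont", "cont")],
}

# With only this rule, every step adds one leaf, so the tree is linear.
LINEAR_RULES: Dict[str, List[Tuple[str, ...]]] = {"cont": [("cont", "transfer")]}


@dataclass
class Node:
    kind: str
    premises: List["Node"] = field(default_factory=list)


def generate(kind: str, depth: int, rng: random.Random, rules=RULES) -> Node:
    """Apply rules in reverse: expand a conclusion into premises until depth is used up."""
    node = Node(kind)
    options = rules.get(kind, [])
    if depth == 0 or not options:
        return node  # stopping criterion reached: this node stays a leaf
    for premise_kind in rng.choice(options):  # uniform rule-sampling policy
        node.premises.append(generate(premise_kind, depth - 1, rng, rules))
    return node


def depth_of(node: Node) -> int:
    """Length of the longest path from this node down to a leaf."""
    if not node.premises:
        return 0
    return 1 + max(depth_of(p) for p in node.premises)


nonlinear = generate("cont", 3, random.Random(0))
linear = generate("cont", 3, random.Random(0), LINEAR_RULES)
print(depth_of(nonlinear), depth_of(linear))
```

Replacing `rng.choice` with a different selection function is the toy analogue of a custom rule-sampling policy.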

Acknowledgements

The authors are grateful for help from Yanick Zengaffinen in developing the code in this repository, which improves upon and generalizes the original codebase.

Citation

@inproceedings{opedal2025mathgap,
title={Math{GAP}: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs},
author={Andreas Opedal and Haruki Shirakami and Bernhard Sch{\"o}lkopf and Abulhair Saparov and Mrinmaya Sachan},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=5ck9PIrTpH}
}
