{tidyLP}
provides what I think is a handy and intuitive interface
between data.frames and linear programs.
You can install the development version of tidyLP from GitHub with:
# install.packages("devtools")
devtools::install_github("colin-fraser/tidyLP")
I’ll start with a classic: tables and chairs (modified from these slides by Zico Kolter at CMU). A furniture company makes tables and chairs. They are made of wood and metal. A table is made of 1 unit of metal and 3 units of wood, and when you sell a table you make a profit of $200. A chair is made of 2 units of metal and 1 unit of wood, and makes a profit of $100. We have 6,000 units of metal and 9,000 units of wood in inventory. How many tables and chairs should we make?
library(tidyLP)
df <- tibble::tribble(
~ product_type,~ metal_units, ~ wood_units, ~ profit,
'table', 1, 3, 200,
'chair', 2, 1, 100
)
df |>
tidy_lp(
# objective function
profit,
# constraints
metal_units ~ leq(6000),
wood_units ~ leq(9000)
) |>
lp_solve() |>
bind_solution(name = 'quantity') |>
select(product_type, quantity)
#> # A tibble: 2 × 2
#> product_type quantity
#> <chr> <dbl>
#> 1 table 2400
#> 2 chair 1800
To start solving the problem with {tidyLP}
, we need a data frame that
expresses the information we need for the objective function and the
constraints. This data frame should have the same number of rows as
there are variables in the objective function. In this case, we are
choosing a number of tables and a number of chairs, so the data frame
should have two rows.
df <- tibble::tribble(
~ product_type, ~ metal_units, ~ wood_units, ~ profit,
'table', 1, 3, 200,
'chair', 2, 1, 100
)
df
#> # A tibble: 2 × 4
#> product_type metal_units wood_units profit
#> <chr> <dbl> <dbl> <dbl>
#> 1 table 1 3 200
#> 2 chair 2 1 100
To build the linear programming problem, we use the tidy_lp
function,
whose signature looks like this:
tidy_lp(
.data,
.objective,
...,
.direction = "max",
.all_int = FALSE,
.all_bin = FALSE
)
In its first argument, it takes the data frame. The second argument is
reserved for the objective function, which can be specified just by
naming a column from the data frame—in this case, we want to maximize
profit, so that’s what it will be. Then we pass constraints as unnamed
arguments to ...
in a special format that uses some formula trickery.
With the data frame formatted correctly, I find that it’s quite natural
to formulate the problem with this interface. We want to do the
following
- Max profit
- Use less than or equal to 6,000 metal units
- Use less than or equal to 9,000 wood units
Using tidy_lp
:
library(tidyLP)
lp_problem <- tidy_lp(df,
profit,
metal_units ~ leq(6000),
wood_units ~ leq(9000))
The constraints are specified using a combination of formulas and some
special helper functions—primarily, leq
, geq
, and eq
, which are
used on the right hand side of the formula to specify the direction of
the constraint.
This builds the linear program, but does not solve it. To solve it, you
pass lp_problem
to lp_solve
.
solution <- lp_solve(lp_problem)
It’s convenient to bind the solution to the original data frame, which
puts the solution in a column called .solution
.
library(dplyr)
bind_solution(solution) |>
select(product_type, .solution)
#> # A tibble: 2 × 2
#> product_type .solution
#> <chr> <dbl>
#> 1 table 2400
#> 2 chair 1800
Just to be sure, let’s have a closer look at the solution.
bind_solution(solution) |>
summarise(across(c(metal_units, wood_units, profit), ~sum(.x * .solution)))
#> # A tibble: 1 × 3
#> metal_units wood_units profit
#> <dbl> <dbl> <dbl>
#> 1 6000 9000 660000
In addition to the constraints laid out above, there are a few more details.
- We only have enough warehouse space to store 1,000 tables.
- Chairs can be bought individually, but each table is sold with two chairs. We need to make sure to produce enough chairs that they can be bundled with the tables.
To specify these constraints in tidyLP
, we can include column
transformations on the left hand side of the constraint formula. Under
the hood, whatever is on the left hand side of the constraint formula is
passed through the data using dplyr::transmute
, so anything that would
work there works here. Thus, the first constraint can be specified as
follows.
lp_problem <- lp_problem |>
add_constraints(
(product_type == 'table') ~ leq(1000)
)
The second constraint can be built similarly. We need
chairs >= 2 * tables
. In tidyLP
, constraints must be specified with
a numeric right hand side, so rearrange this to give
chairs - 2 * tables >= 0
.
lp_problem <- lp_problem |>
add_constraints(
(product_type == 'chair') - 2 * (product_type == 'table') ~ geq(0)
)
new_solution <- lp_problem |>
lp_solve() |>
bind_solution()
new_solution
#> # A tibble: 2 × 5
#> product_type metal_units wood_units profit .solution
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 table 1 3 200 1000
#> 2 chair 2 1 100 2500
These constraints really eat into our profits!
new_solution |>
summarise(across(c(metal_units, wood_units, profit), ~sum(.x * .solution)))
#> # A tibble: 1 × 3
#> metal_units wood_units profit
#> <dbl> <dbl> <dbl>
#> 1 6000 5500 450000
The package includes a (modified, fake, DO NOT USE FOR REAL FANTASY TEAM BUILDING) fantasy basketball dataset.
head(fantasy_points)
#> # A tibble: 6 × 6
#> Id Position Name Projected_FP Salary Team
#> <chr> <chr> <chr> <dbl> <dbl> <chr>
#> 1 82696-84669 PG Luka Doncic 64.6 11800 DAL
#> 2 82696-40199 PF Giannis Antetokounmpo 60.1 11600 MIL
#> 3 82696-15755 PF Anthony Davis 51.1 10500 LAL
#> 4 82696-80808 PF Jayson Tatum 47.9 10300 BOS
#> 5 82696-9488 SF LeBron James 49.8 10200 LAL
#> 6 82696-84671 PG Trae Young 45.8 10000 ATL
In fantasy basketball (Fanduel rules), the problem is to assemble a team
of 9 players that scores the most fantasy points, staying under the
salary cap of 60,000. This can be expressed as a binary linear program.
A naive first attempt with tidyLP
looks like this.
team <- fantasy_points |>
tidy_lp(
Projected_FP, # objective function
Salary ~ leq(60000),
all_variables() ~ eq(9),
.all_bin = TRUE # indicates this is a binary LP
) |>
lp_solve() |>
bind_solution(filter_nonzero = TRUE) # only show rows where the solution is non-zero
team |>
select(Name, Position, Projected_FP, Salary)
#> # A tibble: 9 × 4
#> Name Position Projected_FP Salary
#> <chr> <chr> <dbl> <dbl>
#> 1 Luka Doncic PG 64.6 11800
#> 2 Giannis Antetokounmpo PF 60.1 11600
#> 3 Terry Rozier SG 47.7 7800
#> 4 Fred VanVleet PG 49 7500
#> 5 Scottie Barnes PF 49 6700
#> 6 Ricky Rubio PG 30.3 3500
#> 7 Jaren Jackson PF 32.9 3500
#> 8 Robert Williams PF 31.1 3500
#> 9 Lonzo Ball PG 32.9 3500
Just to double check, let’s make sure this team satisfies the salary cap constraint.
team |>
summarise(sum(Salary))
#> # A tibble: 1 × 1
#> `sum(Salary)`
#> <dbl>
#> 1 59400
✅
However, this is not actually a legal solution, because I’ve left out some important constraints. There are constraints around the distribution of positions: a legal team has two each of point guards, shooting guards, small forwards, and power forwards; and one center. We can use the same methods from the tables and chairs example to specify these constraints.
team <- fantasy_points |>
tidy_lp(
# objective function
Projected_FP,
# constraints
Salary ~ leq(60000),
(Position == 'PG') ~ eq(2),
(Position == 'SG') ~ eq(2),
(Position == 'SF') ~ eq(2),
(Position == 'PF') ~ eq(2),
(Position == 'C') ~ eq(1),
# additional arguments
.all_bin = TRUE
) |>
lp_solve() |>
bind_solution(filter_nonzero = TRUE)
team |>
select(Name, Position, Projected_FP, Salary)
#> # A tibble: 9 × 4
#> Name Position Projected_FP Salary
#> <chr> <chr> <dbl> <dbl>
#> 1 Donovan Mitchell SG 50.9 9500
#> 2 Pascal Siakam C 50.8 9500
#> 3 Terry Rozier SG 47.7 7800
#> 4 Brandon Ingram SF 36.5 7600
#> 5 Fred VanVleet PG 49 7500
#> 6 Scottie Barnes PF 49 6700
#> 7 Bogdan Bogdanovic SF 27.2 4000
#> 8 Jaren Jackson PF 32.9 3500
#> 9 Lonzo Ball PG 32.9 3500
Just to make sure, let’s check that the constraints are satisfied.
team |>
count(Position)
#> # A tibble: 5 × 2
#> Position n
#> <chr> <int>
#> 1 C 1
#> 2 PF 2
#> 3 PG 2
#> 4 SF 2
#> 5 SG 2
team |>
summarise(sum(Salary))
#> # A tibble: 1 × 1
#> `sum(Salary)`
#> <dbl>
#> 1 59600
However, there’s yet another set of constraints: we can only have at most 2 players from any single NBA team.
team |>
count(Team)
#> # A tibble: 7 × 2
#> Team n
#> <chr> <int>
#> 1 ATL 1
#> 2 CHA 1
#> 3 CHI 1
#> 4 CLE 1
#> 5 MEM 1
#> 6 NO 1
#> 7 TOR 3
This solution has too many Raptors. One way to specify these constraints is to continue the same way that we specified the position constriants.
fantasy_points |>
tidy_lp(
# objective function
Projected_FP,
# constraints
Salary ~ leq(60000),
(Position == 'PG') ~ eq(2),
(Position == 'SG') ~ eq(2),
(Position == 'SF') ~ eq(2),
(Position == 'PF') ~ eq(2),
(Position == 'C') ~ eq(1),
(Team == 'ATL') ~ leq(2),
(Team == 'BOS') ~ leq(2),
...
But this would be very tedious. Instead, we can use another trick, which
is to specify a categorical_constraint
. A categorical_constraint
is
actually a set of constraints, one for each unique value in the
specified column.
team <- fantasy_points |>
tidy_lp(
# objective function
Projected_FP,
# constraints
Salary ~ leq(60000),
(Position == 'PG') ~ eq(2),
(Position == 'SG') ~ eq(2),
(Position == 'SF') ~ eq(2),
(Position == 'PF') ~ eq(2),
(Position == 'C') ~ eq(1),
categorical_constraint(Team) ~ leq(2),
# additional arguments
.all_bin = TRUE
) |>
lp_solve() |>
bind_solution(filter_nonzero = TRUE)
team |>
select(Name, Position, Projected_FP, Salary, Team)
#> # A tibble: 9 × 5
#> Name Position Projected_FP Salary Team
#> <chr> <chr> <dbl> <dbl> <chr>
#> 1 Luka Doncic PG 64.6 11800 DAL
#> 2 Donovan Mitchell SG 50.9 9500 CLE
#> 3 Pascal Siakam C 50.8 9500 TOR
#> 4 Terry Rozier SG 47.7 7800 CHA
#> 5 Scottie Barnes PF 49 6700 TOR
#> 6 Bogdan Bogdanovic SF 27.2 4000 ATL
#> 7 Jaren Jackson PF 32.9 3500 MEM
#> 8 Danilo Gallinari SF 20.5 3500 BOS
#> 9 Lonzo Ball PG 32.9 3500 CHI