
categorical_logit_rng() doesn't accept negative_infinity() input #3331

Open
mhollanders opened this issue Feb 10, 2025 · 3 comments

@mhollanders

Hello everyone,

The following Stan program:

generated quantities {
  vector[3] theta = [0, 0.5, 0.5]';
  vector[3] log_theta = log(theta);
  int test = categorical_rng(theta);  // works
  int test2 = categorical_rng(exp(log_theta - log_sum_exp(log_theta)));  // works
  int test3 = categorical_logit_rng(log_theta);  // doesn't work
}

yields this error:

categorical_logit_rng: Log odds parameter[1] is -inf, but must be finite!

Given that exponentiating the normalised log probabilities works for categorical_rng, I think this is a bug. Please let me know if I'm wrong.

Thanks,

Matt

@bob-carpenter
Member

bob-carpenter commented Feb 10, 2025

Hi, @mhollanders, and thanks for the reproducible bug report. Boundary conditions are challenging for something like Stan due to how floating-point arithmetic differs from real numbers. We've been torn in the past on what to do in cases like these, and as you can see, the result isn't entirely consistent.

Alternative case

Does this also work if you set the following?

int test2 = categorical_rng(softmax(log_theta));

If theta is a simplex, then

softmax(log(theta)) = theta = exp(log(theta) - log_sum_exp(log(theta))).
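
As a quick sanity check (a throwaway sketch, not part of any fix), both routes recover theta when it is strictly positive:

generated quantities {
  vector[3] theta = [0.2, 0.3, 0.5]';  // strictly positive simplex
  vector[3] via_softmax = softmax(log(theta));                    // equals theta
  vector[3] via_lse = exp(log(theta) - log_sum_exp(log(theta)));  // equals theta
}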

The case at hand

This case is degenerate for simplexes in that 0 values are on the boundary of parameter space, so they go to plus or minus infinity in the unconstrained representations Stan uses for sampling. Technically, a unit simplex can have zero values, but they play havoc with floating point.

This isn't technically a bug so much as undefined behavior given our documentation. You'll see that the doc says that categorical_logit calls softmax() on the arguments and that the softmax doc requires finite arguments, i.e., $y \in \mathbb{R}^K$. But $\log(0) = -\infty \not\in \mathbb{R}$. Our doc doesn't say what happens for inputs not in $\mathbb{R}^K$.

Flagging this so softmax and categorical_logit throw errors if they get infinite arguments is probably the most sensible thing to do, because infinite arguments tend to break gradients. On the other hand, your example is from the generated quantities block, where we don't calculate gradients. Of course, we don't want differing function behavior in different blocks.

Alternatively, we could try to extend to the boundaries. We've done that in some cases, like allowing 0 entries in the simplex argument of categorical_rng. Specifically, we could set up special branches in the code for categorical_logit_rng that catch negative-infinity values in the input, mask them out of the floating-point arithmetic, then put the results back together. That is, if the input is $y = [-\infty \quad 1.2 \quad -3.7 \quad -\infty \quad 0.3]$, we'd mask out positions 1 and 4, run softmax on $[1.2 \quad -3.7 \quad 0.3]$, then reassemble with zeros in the masked positions. That gets the right answer, though it loses derivative information when the $-\infty$ arises as the logarithm of zero. We'd have to throw an error if there is more than one positive-infinity value.
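
A rough Stan-level sketch of that masking logic might look like the following (the real fix would live in the C++ math library; masked_softmax is a made-up name, it assumes log_sum_exp tolerates a -inf first argument, which it does since exp(-inf) = 0, and degenerate edge cases like an all -inf input are ignored):

functions {
  // hypothetical sketch: softmax that maps -inf entries to probability 0,
  // returns a one-hot vector for a single +inf, and rejects multiple +infs
  vector masked_softmax(vector y) {
    int K = num_elements(y);
    int n_pos_inf = 0;
    vector[K] p = rep_vector(0, K);
    real lse = negative_infinity();
    for (k in 1:K) {
      if (y[k] == positive_infinity()) {
        n_pos_inf += 1;
      } else if (y[k] != negative_infinity()) {
        lse = log_sum_exp(lse, y[k]);  // accumulate over finite entries only
      }
    }
    if (n_pos_inf > 1) {
      reject("masked_softmax: more than one +inf entry");
    }
    if (n_pos_inf == 1) {
      for (k in 1:K) {
        if (y[k] == positive_infinity()) {
          p[k] = 1;  // degenerate one-hot case
        }
      }
      return p;
    }
    for (k in 1:K) {
      if (y[k] != negative_infinity()) {
        p[k] = exp(y[k] - lse);  // masked positions stay exactly 0
      }
    }
    return p;
  }
}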

Workaround

You can apply softmax yourself, or, if that also throws an error, compute exp(log_theta - log_sum_exp(log_theta)) directly as in your example, and then use categorical_rng.
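
Concretely, something like this in generated quantities (just a sketch; the second draw is known to work from the report above, while the first depends on whether softmax accepts -inf):

generated quantities {
  vector[3] theta = [0, 0.5, 0.5]';
  vector[3] log_theta = log(theta);
  int draw1 = categorical_rng(softmax(log_theta));                       // if softmax accepts -inf
  int draw2 = categorical_rng(exp(log_theta - log_sum_exp(log_theta))); // works per the report
}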

Recommendation

I think we should fix this so that it does the right thing on the boundary even if it loses derivative information for parameters.

@mhollanders
Author

Thanks for the clear explanation, Bob. I appreciate your time.

@jachymb

jachymb commented Feb 22, 2025

I found a similar issue when working with multinomial_logit. I'd argue there is a sort of asymmetric duality between negative and positive infinite values for softmax. Negative infinity expresses impossibility at the given position, which simply corresponds to zero probability. But what does positive infinity express? Necessity? Would that make all other values in the vector irrelevant? And even worse, what if there is more than one positive infinity?

Considering that softmax is defined in terms of the exponential function, this corresponds to the following limits:

$\lim_{x\to-\infty} \exp(x) = 0$ (perfectly good real number)

$\lim_{x\to+\infty} \exp(x) = \infty$ (not a real number)

So IMHO it seems reasonable to define:

softmax([..., −∞, ...]) = [..., 0, ...]

softmax([..., +∞, ...]) = error
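
In Stan pseudocode, that proposal would amount to something like this sketch (softmax_ext is a made-up name, and it assumes the built-in log_sum_exp tolerates -inf entries):

functions {
  // hypothetical sketch of the proposed semantics:
  // -inf entries map to probability 0; any +inf entry is an error
  vector softmax_ext(vector y) {
    for (k in 1:num_elements(y)) {
      if (y[k] == positive_infinity()) {
        reject("softmax: argument[", k, "] is +inf, but must be less than +inf");
      }
    }
    return exp(y - log_sum_exp(y));  // exp(-inf - finite) evaluates to 0
  }
}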

Either way, I would suggest at least making the documentation of softmax more explicit on this matter.
