
categorical_logit_rng() doesn't accept negative_infinity() input #3331

Open
mhollanders opened this issue Feb 10, 2025 · 3 comments

@mhollanders

Hello everyone,

The following Stan program:

generated quantities {
  vector[3] theta = [0, 0.5, 0.5]';
  vector[3] log_theta = log(theta);
  int test = categorical_rng(theta);  // works
  int test2 = categorical_rng(exp(log_theta - log_sum_exp(log_theta)));  // works
  int test3 = categorical_logit_rng(log_theta);  // doesn't work
}

yields this error:

categorical_logit_rng: Log odds parameter[1] is -inf, but must be finite!

Given that exponentiating the normalised log probabilities works for categorical_rng, I think this is a bug. Please let me know if I'm wrong.

Thanks,

Matt

@bob-carpenter
Member

bob-carpenter commented Feb 10, 2025

Hi, @mhollanders, and thanks for the reproducible bug report. Boundary conditions are challenging for something like Stan due to how floating-point arithmetic differs from real numbers. We've been torn in the past on what to do in cases like these, and as you can see, the result isn't entirely consistent.

Alternative case

Does this also work if you set the following?

int test2 = categorical_rng(softmax(log_theta));

If theta is a simplex, then

softmax(log(theta)) = theta = exp(log(theta) - log_sum_exp(log(theta))).
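
As a quick sanity check (a throwaway sketch, not part of any fix), both routes recover theta when it is strictly positive:

generated quantities {
  vector[3] theta = [0.2, 0.3, 0.5]';  // strictly positive simplex
  vector[3] via_softmax = softmax(log(theta));                    // equals theta
  vector[3] via_lse = exp(log(theta) - log_sum_exp(log(theta)));  // equals theta
}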

The case at hand

This case is degenerate for simplexes in that 0 values are on the boundary of parameter space, so they go to plus or minus infinity in the unconstrained representations Stan uses for sampling. Technically, a unit simplex can have zero values, but they play havoc with floating point.

This isn't technically a bug so much as undefined behavior given our documentation. You'll see that the doc says that categorical_logit calls softmax() on the arguments and that the softmax doc requires finite arguments, i.e., $y \in \mathbb{R}^K$. But $\log(0) = -\infty \not\in \mathbb{R}$. Our doc doesn't say what happens for inputs not in $\mathbb{R}^K$.

Flagging this so softmax and categorical_logit throw errors if they get infinite arguments is probably the most sensible thing to do, because infinite arguments tend to break gradients. On the other hand, your example is from the generated quantities block, where we don't calculate gradients. Of course, we don't want differing function behavior in different blocks.

Alternatively, we could try to extend to the boundaries. We've done that in some cases, like allowing 0 entries in the simplex argument of categorical_rng. Specifically, we could set up special branches in the code for categorical_logit_rng that catch negative-infinity values in the input, mask them out of the floating-point arithmetic, then put the results back together. That is, if the input is $y = [-\infty \quad 1.2 \quad -3.7 \quad -\infty \quad 0.3]$, we'd mask out positions 1 and 4, run softmax on $[1.2 \quad -3.7 \quad 0.3]$, then reassemble with zeros in the masked positions. That gets the right answer, though it loses derivative information when the $-\infty$ arises as the logarithm of zero. We'd have to throw an error if there is more than one positive-infinity value.
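
A rough Stan-level sketch of that masking logic might look like the following (the real fix would live in the C++ math library; masked_softmax is a made-up name, it assumes log_sum_exp tolerates a -inf first argument, which it does since exp(-inf) = 0, and degenerate edge cases like an all -inf input are ignored):

functions {
  // hypothetical sketch: softmax that maps -inf entries to probability 0,
  // returns a one-hot vector for a single +inf, and rejects multiple +infs
  vector masked_softmax(vector y) {
    int K = num_elements(y);
    int n_pos_inf = 0;
    vector[K] p = rep_vector(0, K);
    real lse = negative_infinity();
    for (k in 1:K) {
      if (y[k] == positive_infinity()) {
        n_pos_inf += 1;
      } else if (y[k] != negative_infinity()) {
        lse = log_sum_exp(lse, y[k]);  // accumulate over finite entries only
      }
    }
    if (n_pos_inf > 1) {
      reject("masked_softmax: more than one +inf entry");
    }
    if (n_pos_inf == 1) {
      for (k in 1:K) {
        if (y[k] == positive_infinity()) {
          p[k] = 1;  // degenerate one-hot case
        }
      }
      return p;
    }
    for (k in 1:K) {
      if (y[k] != negative_infinity()) {
        p[k] = exp(y[k] - lse);  // masked positions stay exactly 0
      }
    }
    return p;
  }
}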

Workaround

You can apply softmax yourself, or, if that also throws an error, compute exp(log_theta - log_sum_exp(log_theta)) directly as in your example, and then use categorical_rng.
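
Concretely, something like this in generated quantities (just a sketch; the second draw is known to work from the report above, while the first depends on whether softmax accepts -inf):

generated quantities {
  vector[3] theta = [0, 0.5, 0.5]';
  vector[3] log_theta = log(theta);
  int draw1 = categorical_rng(softmax(log_theta));                       // if softmax accepts -inf
  int draw2 = categorical_rng(exp(log_theta - log_sum_exp(log_theta))); // works per the report
}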

Recommendation

I think we should fix this so that it does the right thing on the boundary even if it loses derivative information for parameters.

@mhollanders
Author

Thanks for the clear explanation, Bob. I appreciate your time.

@jachymb

jachymb commented Feb 22, 2025

I found a similar issue when working with multinomial_logit. I'd argue there is a sort of asymmetric duality between negative and positive infinite values for softmax. Negative infinity expresses impossibility at the given position, which simply corresponds to zero probability. But what does positive infinity express? Necessity? Would that make all other values in the vector irrelevant? And even worse, what if there is more than one positive infinity?

Considering that softmax is defined in terms of the exponential function, this corresponds to the following limits:

$\lim_{x\to-\infty} \exp(x) = 0$ (perfectly good real number)

$\lim_{x\to+\infty} \exp(x) = \infty$ (not a real number)

So IMHO it seems reasonable to define:

softmax([..., −∞, ...]) = [..., 0, ...]

softmax([..., +∞, ...]) = error
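
In Stan pseudocode, that proposal would amount to something like this sketch (softmax_ext is a made-up name, and it assumes the built-in log_sum_exp tolerates -inf entries):

functions {
  // hypothetical sketch of the proposed semantics:
  // -inf entries map to probability 0; any +inf entry is an error
  vector softmax_ext(vector y) {
    for (k in 1:num_elements(y)) {
      if (y[k] == positive_infinity()) {
        reject("softmax: argument[", k, "] is +inf, but must be less than +inf");
      }
    }
    return exp(y - log_sum_exp(y));  // exp(-inf - finite) evaluates to 0
  }
}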

Either way, I would suggest at least making the documentation of softmax more explicit on this matter.
