Can batch translation on CPU result in different output? #693
Comments
Yes, the same string can have different outputs in batch translation on CPU. I know this can happen with Intel MKL (default backend on Intel CPU) and oneDNN (default on AMD CPU). The numerical result of the dot product attention can be slightly different depending on the number of padding positions in the input. If you are running on an Intel CPU, it is possible to work around this issue by enabling strict numerical reproducibility. Try setting this environment variable:
I actually use AMD for deployment, which is unfortunate 😔 If I set With
Yes. I requested a similar flag in oneDNN, but they don't plan to implement it.
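For reference, Intel MKL's conditional numerical reproducibility (CNR) is controlled through the `MKL_CBWR` environment variable. The following is a minimal sketch, not a setting confirmed by the maintainers: the helper name `enable_mkl_cnr` is hypothetical, the `COMPATIBLE` value is the mode reported later in this thread as consistent but slower, and it assumes the variable has to be set before the MKL-backed library is loaded.

```python
import os


def enable_mkl_cnr(mode: str = "COMPATIBLE") -> str:
    """Set MKL_CBWR for the current process and return the value in effect.

    Hypothetical helper: call it before importing any library that
    loads Intel MKL, on the assumption that the variable is read
    when the library initializes.
    """
    os.environ["MKL_CBWR"] = mode
    return os.environ["MKL_CBWR"]


# Set the mode first, then import the MKL-backed library.
enable_mkl_cnr("COMPATIBLE")
# import ctranslate2  # import only after the variable is set
```

The same effect can be had by exporting the variable in the shell that launches the process. This only applies to the Intel MKL backend; per the comment above, oneDNN has no equivalent flag.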
I've run a couple of tests with AMD and Intel CPUs. I see the wheels are built using very recent versions now.
According to the Intel document,
which would explain why
They use recent MKL versions. For example, CTranslate2 1.20.1 wheels were already using Intel MKL 2021.2.
This is the output with
When it's set to COMPATIBLE on Intel the output is consistent, but significantly slower. In one Intel doc I see:
which seems to suggest that even STRICT CNR doesn't guarantee consistent results; only COMPATIBLE mode will.
Thanks for the feedback. I have not seen a case where In any case, guaranteeing consistent results is generally hard. The easiest approach is to accept that translations can have slight variations, but I understand it is hard to explain that to end users. Right now I'm not aware of another workaround without a performance penalty, but I will keep exploring.
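One general reason why consistency is hard: floating-point addition is not associative, so a kernel that accumulates the same terms in a different order can round differently. The snippet below is only an illustration of that effect in plain Python. It makes no claim about how MKL or oneDNN actually split the work when the amount of padding changes.

```python
# The same three numbers, summed in two different groupings.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # accumulate left to right
right = a + (b + c)  # accumulate right to left

print(left == right)      # the two groupings do not compare equal
print(abs(left - right))  # the gap is on the order of 1e-16
```

A difference this small is usually harmless, but in a decoder it can flip the ranking between two nearly tied candidates, after which the rest of the output diverges.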
Thanks Guillaume. It's a very small subset of content that experiences this with
Definitely! The most noticeable issue we get is currency strings, like
It's vanilla Transformer
Yeah, I completely understand, it's not an easy thing to fix. We've switched to synchronous translation instead of batch for CPU without much of a performance impact, if any, so I'm quite happy to stick with synchronous translation. We actually did the same for our GPU deployments previously. Consistent output is more of a priority for us, so we're willing to use synchronous translation over batch if it guarantees results.
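A minimal sketch of that workaround, assuming "synchronous translation" here means sending each input as its own batch of size one so that no padding is involved. The helper `translate_one_by_one` is hypothetical; it takes any batch-translation callable, so it can be tried without a model.

```python
from typing import Callable, List, Sequence

Tokens = List[str]


def translate_one_by_one(
    translate_batch: Callable[[List[Tokens]], Sequence],
    sources: List[Tokens],
) -> list:
    """Translate each input as a separate single-item batch.

    A batch of size one has nothing to pad against, so the input
    shape for a given sentence no longer depends on its neighbours.
    """
    results = []
    for tokens in sources:
        # Wrap the single input in a list so it forms a batch of one.
        results.extend(translate_batch([tokens]))
    return results


# Stand-in for a real translator: upper-cases every token.
def fake_translate_batch(batch: List[Tokens]) -> List[Tokens]:
    return [[token.upper() for token in tokens] for tokens in batch]


outputs = translate_one_by_one(fake_translate_batch, [["a", "b"], ["c"]])
```

With CTranslate2, the callable would be the `translate_batch` method of a `ctranslate2.Translator` instance. Whether this costs throughput depends on the workload; the comment above reports little or no impact on CPU.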
I have a CPU model that produces different outputs for the same strings at different times.
I think it could be related to the bug from #546, where batch translation yielded different results on GPU. I'm currently using CTranslate2 1.20.1, so there are a lot of updates I'm missing.
Alternatively, I recall that on GPU, batch translation can have slightly different numerical results, and I'm curious whether the same can happen with CPU models and batch translation.