
quic-sanising
Contributor

✨ Add Frequency Penalty Support to On Device Sampling

This PR adds support for the frequency_penalty parameter in On Device Sampling for QEffForCausalLM models. This parameter adjusts token selection based on how often tokens have already appeared in the generated output:

  • Positive values discourage repetition and promote diversity.
  • Negative values encourage repetition.
  • Zero disables the penalty.
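The three cases above can be illustrated with a small host-side sketch. It assumes the common OpenAI/vLLM-style definition, in which each token's logit is reduced by frequency_penalty multiplied by the number of times that token has appeared in the generated output. The on-device kernel in this PR may differ in details such as dtype, masking, or whether prompt tokens are counted. The function name apply_frequency_penalty is hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def apply_frequency_penalty(
    logits: Sequence[float],
    generated_ids: Sequence[int],
    frequency_penalty: float,
) -> List[float]:
    """Hypothetical helper: subtract frequency_penalty * count(token) from each logit.

    Assumes the OpenAI/vLLM-style additive definition; counts only the
    generated tokens passed in, not the prompt.
    """
    counts = Counter(generated_ids)  # token_id -> number of occurrences so far
    penalized = list(logits)
    for token_id, count in counts.items():
        penalized[token_id] -= frequency_penalty * count
    return penalized


logits = [2.0, 1.0, 0.5, 0.0]
history = [0, 0, 2]  # token 0 appeared twice, token 2 once

print(apply_frequency_penalty(logits, history, 0.5))   # positive: [1.0, 1.0, 0.0, 0.0]
print(apply_frequency_penalty(logits, history, -0.5))  # negative: [3.0, 1.0, 1.0, 0.0]
print(apply_frequency_penalty(logits, history, 0.0))   # zero:     [2.0, 1.0, 0.5, 0.0]
```

With a positive penalty, token 0 loses more than token 2 because it has appeared twice. A negative penalty raises the same tokens instead. Tokens that have never been generated are unaffected in every case.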

The implementation tracks token frequencies directly on the QAIC device using scratch buffers, which keeps overhead low and preserves throughput. The feature works with the existing include_sampler=True workflow and complements the other supported penalties, such as the repetition and presence penalties.
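The idea of keeping counts in a device-side scratch buffer can be modeled on the host as a fixed-size [batch, vocab] count table. The table is incremented once per decode step with each sequence's sampled token, so the history never has to be rescanned. This is a hypothetical sketch, not the PR's kernel. The class and method names are illustrative, and the combination with a presence penalty assumes the OpenAI-style additive form: subtract presence_penalty once if a token has appeared at all. The repetition penalty, which is usually multiplicative, is left out.

```python
from typing import List


class FrequencyCounter:
    """Hypothetical host-side model of a per-sequence token-count scratch buffer."""

    def __init__(self, batch_size: int, vocab_size: int) -> None:
        # One row of counts per sequence in the batch, zeroed at the start of generation.
        self.counts = [[0] * vocab_size for _ in range(batch_size)]

    def update(self, sampled_ids: List[int]) -> None:
        # One increment per sequence per decode step; no rescan of past tokens.
        for row, token_id in enumerate(sampled_ids):
            self.counts[row][token_id] += 1

    def penalize(
        self,
        logits: List[List[float]],
        frequency_penalties: List[float],
        presence_penalties: List[float],
    ) -> List[List[float]]:
        # Assumed OpenAI-style form: logit - fp * count - pp * (count > 0),
        # with a separate penalty value per sequence in the batch.
        out = []
        for row, row_logits in enumerate(logits):
            fp = frequency_penalties[row]
            pp = presence_penalties[row]
            out.append([
                logit - fp * c - (pp if c > 0 else 0.0)
                for logit, c in zip(row_logits, self.counts[row])
            ])
        return out


counter = FrequencyCounter(batch_size=2, vocab_size=3)
counter.update([1, 2])  # decode step 1
counter.update([1, 0])  # decode step 2
logits = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
print(counter.penalize(logits, [0.25, 0.0], [0.0, 0.5]))
# [[1.0, 0.5, 1.0], [0.5, 1.0, 0.5]]
```

Because the buffer size is fixed by batch and vocabulary size, the per-step cost is independent of how many tokens have been generated. This is the property that makes on-device tracking attractive.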

quic-sanising and others added 17 commits June 18, 2025 13:38

quic-sanising commented Jul 24, 2025

Depends on PR #463.
