I found that llama.cpp can only return last_hidden_states (using LLAMA_POOLING_TYPE_NONE) or pooler_output (using LLAMA_POOLING_TYPE_CLS or LLAMA_POOLING_TYPE_MEAN). What if I want to get both of them? Is it possible? Please help me. Thank you.
Answered by ggerganov, Jan 20, 2025
Replies: 1 comment 4 replies
Programmatically, you can obtain any intermediate result from the computation using the eval callback. See the …
Extracting the embeddings right before the language modelling head or the pooler seems like a relatively common practice in many applications, so we should extend the public API to do this in an easy way. Something like adding `llama_get_embeddings_pre()` and `llama_get_embeddings_post()` calls.

The callback can tell the engine which tensors to return and which to skip. See how the return value works.
The callback can have any kind of dynamic logic that you wan…