[Question]: Is it possible to get BOTH the last_hidden_states and pooler_output of an embedding model like BERT? #11274

Answered by ggerganov
FdyCN asked this question in Q&A

Extracting the embeddings right before the language modelling head or the pooler seems like a relatively common practice in many applications, so we should extend the public API to do this in an easy way. Something like adding llama_get_embeddings_pre() and llama_get_embeddings_post() calls.

But how can I dump the output of just one specific layer?

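The callback can tell the engine which tensors to return and which to skip. See how the return value works: when the callback is asked about a tensor, returning true requests that tensor's data and returning false skips it.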
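A minimal sketch of that return-value protocol, assuming the callback shape llama.cpp uses for cb_eval, bool (*)(struct ggml_tensor * t, bool ask, void * user_data). To keep it self-contained, MockTensor and run_graph are hypothetical stand-ins for ggml_tensor and the backend scheduler, and the name l_out-1 is only an assumed example of a per-layer output name. Check which names your model's graph actually uses.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for ggml_tensor: a name plus its computed values.
struct MockTensor {
    std::string name;
    std::vector<float> data;
};

// State passed through user_data: which tensor to keep, and where to put it.
struct DumpRequest {
    std::string wanted;           // e.g. "l_out-1" (assumed naming scheme)
    std::vector<float> captured;  // filled once the wanted tensor is computed
};

// Same shape as the eval callback.
// ask == true : return true only for the tensor we want to observe.
// ask == false: the tensor has been computed; copy it and return true to continue.
bool dump_one_tensor(MockTensor * t, bool ask, void * user_data) {
    auto * req = static_cast<DumpRequest *>(user_data);
    if (ask) {
        return t->name == req->wanted;
    }
    req->captured = t->data;
    return true;
}

// Hypothetical driver mimicking the scheduler: ask first, deliver data only if requested.
void run_graph(std::vector<MockTensor> & graph,
               bool (*cb)(MockTensor *, bool, void *), void * user_data) {
    for (auto & t : graph) {
        if (cb(&t, true, user_data) && !cb(&t, false, user_data)) {
            return;  // callback asked to stop the computation
        }
    }
}
```

With a real ggml_tensor the data may sit in a GPU buffer, so the data phase would copy it out with ggml_backend_tensor_get instead of reading a vector directly.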

However, I want a runtime switch: for example, the i-th llama_decode call returns pooler_output and the (i+1)-th llama_decode call returns last_hidden_states.

The callback can have any kind of dynamic logic that you wan…
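One way to get the per-call switch asked about above, sketched under the same assumptions. Because cb_eval and cb_eval_user_data are set in llama_context_params when the context is created, the sketch keeps one fixed callback and only changes the state it reads through user_data before each decode call. MockTensor, run_graph, SwitchState and Wanted are hypothetical, and the names result_embd_pooled and result_norm are assumptions about how the graph labels the pooled output and the final hidden states.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for ggml_tensor: a name plus its computed values.
struct MockTensor {
    std::string name;
    std::vector<float> data;
};

// Which output the next decode call should capture.
enum class Wanted { Pooled, LastHidden };

// State shared with the callback via user_data; flip `mode` between decode calls.
struct SwitchState {
    Wanted mode = Wanted::Pooled;
    std::vector<float> out;
};

// Assumed graph names for the pooled output and the final hidden states.
const char * name_for(Wanted w) {
    return w == Wanted::Pooled ? "result_embd_pooled" : "result_norm";
}

// Same callback shape: the ask phase selects by current mode, the data phase copies.
bool switching_cb(MockTensor * t, bool ask, void * user_data) {
    auto * st = static_cast<SwitchState *>(user_data);
    if (ask) {
        return t->name == name_for(st->mode);
    }
    st->out = t->data;
    return true;
}

// Hypothetical driver standing in for one decode call's graph evaluation.
void run_graph(std::vector<MockTensor> & graph,
               bool (*cb)(MockTensor *, bool, void *), void * user_data) {
    for (auto & t : graph) {
        if (cb(&t, true, user_data) && !cb(&t, false, user_data)) {
            return;
        }
    }
}
```

Usage: set st.mode = Wanted::Pooled before the i-th decode, then st.mode = Wanted::LastHidden before the (i+1)-th. The callback itself never changes, which fits a context whose callback is fixed at creation time.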

Answer selected by FdyCN