Replies: 5 comments 1 reply
-
I think we could definitely separate them, but until now we haven't had anyone asking about this with a concrete use case, so I don't think this is something we should prioritize (we have enough tasks already!).
This is why we have factored out … (see the sketch below).
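For illustration, a minimal sketch of plugging a custom function into the module's `attention_fn` argument, assuming that is the hook being referred to; the toy `uniform_attention` is mine, not part of Flax, and the two-argument `(inputs_q, inputs_kv)` call matches the signature under discussion:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

def uniform_attention(query, key, value, **kwargs):
    # Toy attention variant (for illustration only): every query attends
    # uniformly to all key/value positions, i.e. we just mean-pool the
    # values over the kv-length axis.
    # Shapes follow flax.linen.dot_product_attention's convention:
    #   query/key/value: [batch, length, num_heads, head_dim]
    del key, kwargs  # a real attention_fn would score query against key
    pooled = value.mean(axis=-3, keepdims=True)           # [batch, 1, heads, dim]
    return jnp.repeat(pooled, query.shape[-3], axis=-3)   # [batch, q_len, heads, dim]

attn = nn.MultiHeadDotProductAttention(num_heads=4, attention_fn=uniform_attention)
x = jnp.ones((2, 16, 32))                        # [batch, length, features]
params = attn.init(jax.random.PRNGKey(0), x, x)  # (inputs_q, inputs_kv)
y = attn.apply(params, x, x)                     # y.shape == (2, 16, 32)
```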
I think you are technically right, but on the other hand the name `MultiHeadDotProductAttention` is known from the literature, and since dot-product attention is the default I think the current name is acceptable as well. I don't have a strong opinion here.
-
Hello! If this is still on the agenda, it would be great to have keys and values separated. (I am converting from Haiku, where this is implemented in `MultiHeadAttention`.) An example where the separation is needed is attentive neural processes (see Figure 2). Thanks and kind regards,
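For context, a minimal sketch of the cross-attention step in question, written against `flax.linen.dot_product_attention`; the tensor names and shapes are illustrative, not from the paper's code:

```python
import jax.numpy as jnp
import flax.linen as nn

# Attentive-neural-process cross-attention (Kim et al. 2019, Fig. 2):
# queries come from the target inputs, keys from the context inputs,
# and values from the per-context-point representations r_i, so keys
# and values are genuinely different tensors.
batch, n_context, n_target, heads, head_dim = 2, 10, 5, 4, 8
x_target  = jnp.ones((batch, n_target,  heads, head_dim))  # queries
x_context = jnp.ones((batch, n_context, heads, head_dim))  # keys
r_context = jnp.ones((batch, n_context, heads, head_dim))  # values

out = nn.dot_product_attention(x_target, x_context, r_context)
# out: [batch, n_target, heads, head_dim], one representation per target
```

A module whose call signature only accepts a single `inputs_kv` cannot express this without dropping down to the function-level API.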
-
After this commit, …
-
@cgarciae @marcvanzee I second renaming to … (perhaps even just …).
-
Come to think of it, is there any reason not to just call it …?
-
Hey! I am very curious about the API exposed by `MultiHeadDotProductAttention`. Currently it has the following signature:
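(The block below is my reconstruction of the signature from the `flax.linen` source of the time; the elided fields and exact defaults are assumptions.)

```python
import flax.linen as nn

class MultiHeadDotProductAttention(nn.Module):
    num_heads: int
    # ... further hyperparameters (qkv_features, out_features,
    # dropout_rate, attention_fn=dot_product_attention, ...) elided ...

    def __call__(self, inputs_q, inputs_kv, mask=None, deterministic=None):
        # inputs_q:  [batch, q_length, features]  -> projected to queries
        # inputs_kv: [batch, kv_length, features] -> projected to BOTH
        #            keys and values (the coupling discussed below)
        ...
```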
It seems to be tying `keys` and `values` together into `inputs_kv`, which (while probably the most common option) seems restrictive: both Keras and PyTorch let you define them separately. I don't know of any case where they might be different, but who knows what researchers can come up with. Would it be worth separating `inputs_kv` into `inputs_k` and `inputs_v`?
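For concreteness, a hypothetical sketch of what a module with separate key and value inputs could look like, assembled from `nn.DenseGeneral` and `nn.dot_product_attention`; the `SplitKVAttention` name and its internals are mine, not an existing Flax API:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class SplitKVAttention(nn.Module):
    # Hypothetical module: like MultiHeadDotProductAttention, but keys
    # and values are projected from separate input tensors.
    num_heads: int
    qkv_features: int

    @nn.compact
    def __call__(self, inputs_q, inputs_k, inputs_v):
        head_dim = self.qkv_features // self.num_heads
        dense = lambda name, x: nn.DenseGeneral(
            features=(self.num_heads, head_dim), name=name)(x)
        q = dense('query', inputs_q)  # [batch, q_len,  heads, head_dim]
        k = dense('key',   inputs_k)  # [batch, kv_len, heads, head_dim]
        v = dense('value', inputs_v)  # [batch, kv_len, heads, head_dim]
        x = nn.dot_product_attention(q, k, v)
        # project the heads back to the input feature dimension
        return nn.DenseGeneral(features=inputs_q.shape[-1],
                               axis=(-2, -1), name='out')(x)

q_in = jnp.ones((2, 5, 32))
k_in = jnp.ones((2, 10, 32))
v_in = jnp.ones((2, 10, 32))  # free to differ from k_in
module = SplitKVAttention(num_heads=4, qkv_features=32)
params = module.init(jax.random.PRNGKey(0), q_in, k_in, v_in)
y = module.apply(params, q_in, k_in, v_in)  # [2, 5, 32]
```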
Also, the following comes to mind:
Given `attention_fn`, `MultiHeadAttention` would be the better name here, since contrary to the PyTorch and Keras implementations the user can use whatever form of attention they want.