Port Rotary Positional Embedding from NeuralAttentionlib.jl #2524

mashu · 2024-11-14T16:34:13Z

This is pretty much a standard thing to do these days with a MultiHeadAttention layer, so I think we should have it as part of Flux.
I would be happy if anyone could review it. I find myself missing this feature too often.

CarloLucibello · 2024-11-15T17:01:15Z

Why did you close this? It would make sense to have it. Actually, you could file a PR to NNlib with the function that generates the rotary embedding.

mashu · 2024-11-15T18:19:14Z

Because it might not work on GPU, and I figured I want to rewrite it so that it caches the rotations instead of computing them on the fly.
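To illustrate the caching idea, here is a minimal sketch. It is not the code from this PR, nor the API of NeuralAttentionlib.jl or PositionalEmbeddings.jl: the names `RotaryCache` and `apply_rotary` are hypothetical, the `(head_dim, seq_len, batch)` layout is an assumption, and it assumes the convention that pairs feature `i` with feature `i + head_dim ÷ 2` (other implementations rotate adjacent features instead). The cos/sin tables are built once, and applying the embedding is then only slicing and broadcasting. It has not been checked on a GPU.

```julia
# Hypothetical sketch: precompute the rotation tables once, reuse them per call.
struct RotaryCache{A<:AbstractMatrix}
    cos_table::A   # size (head_dim ÷ 2, max_len)
    sin_table::A   # size (head_dim ÷ 2, max_len)
end

function RotaryCache(head_dim::Int, max_len::Int; base::Real = 10_000, T::Type = Float32)
    iseven(head_dim) || throw(ArgumentError("head_dim must be even"))
    # One frequency per feature pair: base^(-2k / head_dim) for k = 0, 1, ...
    freqs = T.(1 ./ (base .^ ((0:2:head_dim-2) ./ head_dim)))
    positions = T.(0:max_len-1)
    angles = freqs .* positions'          # (head_dim ÷ 2, max_len)
    return RotaryCache(cos.(angles), sin.(angles))
end

# x is assumed to have size (head_dim, seq_len, batch...).
function apply_rotary(cache::RotaryCache, x::AbstractArray)
    head_dim, seq_len = size(x, 1), size(x, 2)
    half = head_dim ÷ 2
    c = @view cache.cos_table[:, 1:seq_len]
    s = @view cache.sin_table[:, 1:seq_len]
    x1 = selectdim(x, 1, 1:half)            # first half of the features
    x2 = selectdim(x, 1, half+1:head_dim)   # second half of the features
    # Rotate each (x1, x2) pair by the cached angle for its position.
    return cat(x1 .* c .- x2 .* s, x1 .* s .+ x2 .* c; dims = 1)
end
```

With this sketch, one would build `cache = RotaryCache(64, 512)` once and then call `apply_rotary(cache, q)` and `apply_rotary(cache, k)` on the queries and keys before computing attention scores.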

CarloLucibello · 2024-11-25T06:39:35Z

This is in a package now
https://github.com/mashu/PositionalEmbeddings.jl

mashu added 4 commits November 14, 2024 11:59

src/layers/rotary.jl: Simplified with_rotary_position_embedding implementation.

ce0fb72

src/layers/rotary.jl: Match forward and gradient exactly with NeuralAttentionlib.jl

d887d93

src/layers/rotary.jl: Add reference to original source.

5abe0e8

test/layers/rotary.jl: Add gradient_test with_rotary_position_embedding(...)

9ca7d32

mashu closed this Nov 15, 2024

CarloLucibello reopened this Nov 15, 2024

CarloLucibello closed this Nov 25, 2024
