Why does the 3D CNN example use tensor-product representations? #85
kalekundert asked this question in Q&A
The SE(3) CNN example uses a "polynomial" representation between each of the residual blocks, built from tensor products of irreps (e.g. $\rho_1 \otimes \rho_1$), where $\rho_i$ is the $i$-th SO(3) irrep.
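For concreteness, here's a minimal sketch of the kind of representation I mean, assuming escnn's `Representation.tensor` and `directsum` work the way I think they do (I may be misreading the example's exact recipe):

```python
from escnn.group import so3_group, directsum

G = so3_group()
rho_0, rho_1 = G.irrep(0), G.irrep(1)

# Tensor square of the frequency-1 irrep: 3 x 3 = 9-dimensional. It decomposes
# as rho_0 + rho_1 + rho_2 internally, but with a fixed change of basis, so the
# 9 channels form a single field.
rho_1x1 = rho_1.tensor(rho_1)

# A "polynomial"-style representation: degree-0, degree-1, and degree-2
# monomials of the frequency-1 irrep (my reading of the example; the exact
# recipe in the repo may differ).
poly = directsum([rho_0, rho_1, rho_1x1])
print(poly.size)  # 1 + 3 + 9 = 13
```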
In contrast, while it's possible I missed something, I think all of the representations considered in Appendix H of [Cesa2022] are just direct sums of SO(3) irreps with different multiplicities, such as $\rho_0 \oplus \rho_1^{\oplus 3} \oplus \rho_2^{\oplus 5}$. (I'm assuming that $\rho^{\oplus 3}$ is shorthand for $\rho \oplus \rho \oplus \rho$, but it's possible I'm misusing this syntax.)
Before looking carefully at the 3D CNN example, my intuition was just to use band-limited spectral regular representations for every intermediate layer, since the Fourier nonlinearities require that representation anyway. But now I'm wondering why one might want to use more complicated representations like $\rho_1 \otimes \rho_1$, or different multiplicities of irreps, especially since the SE(3) CNN example seems to go out of its way to use tensor products.
If I had to guess at the reasoning, I'd say that $\rho_1 \otimes \rho_1$ is 9-dimensional, so it creates a chunk of 9 feature-vector elements that get treated as a single unit by the model (i.e. there won't be any learnable parameters that affect some of those feature-vector elements but not others, due to the way the convolutional kernels are constructed, I think). In contrast, a representation like $\rho_1 \oplus \rho_1 \oplus \rho_1$, which is also 9-dimensional, would be treated more like 3 independent 3-dimensional units. So the trade-off is a smaller number of more expressive features vs. a larger number of less expressive ones. Is that the right way to think about this?
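To make that concrete, here's a sketch of the two 9-dimensional alternatives (assuming `FieldType` accepts any representation of the fibergroup, which I believe it does):

```python
from escnn.gspaces import rot3dOnR3
from escnn.nn import FieldType

gs = rot3dOnR3()
G = gs.fibergroup
rho_1 = G.irrep(1)

# One 9-dimensional field: if my reasoning above is right, the kernel
# constraint treats these 9 channels as a single unit.
big_field = FieldType(gs, [rho_1.tensor(rho_1)])

# Three 3-dimensional fields: each copy of rho_1 is an independent unit, so
# (I think) the kernel constraint allows separate learnable weights per copy.
small_fields = FieldType(gs, [rho_1] * 3)

print(big_field.size, small_fields.size)  # 9 9
```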
More broadly, I'd appreciate any advice you have about things to consider when choosing which representations to use.
Some smaller follow-up questions/clarifications:
1. I do understand that the choice of nonlinearity affects which representations can be used, e.g. gated nonlinearities require trivial representations for the gates, Fourier nonlinearities require spectral regular representations (so the feature vectors can be interpreted as Fourier coefficients), etc. I also understand that the choice of representation for the input and output layers of the whole model is dictated by the problem being solved. So my questions here are just about cases where neither of the above is determinative.
2. Assuming my reasoning about representation sizes is mostly on the right track, is there any reason to prefer $\rho_1 \otimes \rho_1$ over any other 9-dimensional representation, e.g. $\rho_4$? $\rho_4$ itself may not be a very good example, because I think there can only be nonzero equivariant linear maps between irreps of the same degree (Schur's lemma), so I can see preferring the tensor product because it "mixes" more easily with $\rho_1$ and $\rho_2$ in some sense (see the first sketch after this list). But if there are other 9-dimensional reducible representations of SO(3), this seems like a reasonable question.
3. Is there any practical difference between the band-limited spectral regular representation (e.g. as created by `FourierPointwise`) and $\rho_0 \oplus \rho_1^{\oplus 3} \oplus \rho_2^{\oplus 5} \oplus \cdots$? My understanding is that they are exactly the same, but the code to create the spectral regular representation seems a lot more complex than simply direct-summing irreps, so I'm not confident about that (see the second sketch after this list).
4. Do you have any data on which representations perform the best empirically, e.g. on the ModelNet10 or Atom3D LBA datasets? I suspect that this will end up being one of those hyperparameters that just has to be optimized, but it'd be nice to know what has worked in the past.
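For question 2, here's roughly how I'd check which irreps a given representation can "mix" with (a sketch, assuming `Representation.tensor` exists and the `irreps` attribute lists the irrep decomposition):

```python
from escnn.group import so3_group

G = so3_group()
rho_1, rho_4 = G.irrep(1), G.irrep(4)

# rho_1 (x) rho_1 decomposes as rho_0 + rho_1 + rho_2, so by Schur's lemma it
# admits nonzero equivariant linear maps to/from any field containing rho_0,
# rho_1, or rho_2 components.
print(rho_1.tensor(rho_1).irreps)  # [(0,), (1,), (2,)]

# rho_4 is a single degree-4 irrep: it can only map equivariantly to other
# copies of rho_4.
print(rho_4.irreps)  # [(4,)]
```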
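And for question 3, this is the comparison I have in mind (a sketch; I'm assuming `spectral_regular_representation` and `bl_irreps` are the relevant entry points, since I believe that's what `FourierPointwise` uses to build its field type):

```python
from escnn.group import so3_group, directsum

G = so3_group()
L = 2  # band limit

# Band-limited spectral regular representation: each irrep rho_l appears with
# multiplicity equal to its dimension, 2l + 1.
spectral = G.spectral_regular_representation(*G.bl_irreps(L))

# The same thing built by hand: rho_0 + 3*rho_1 + 5*rho_2.
manual = directsum([G.irrep(l) for l in range(L + 1) for _ in range(2 * l + 1)])

print(spectral.size, manual.size)        # 35 35
print(spectral.irreps == manual.irreps)  # True, if I've understood correctly
```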
As always, thanks for taking the time to help me understand all of this. I really appreciate it!!