Implementing basic RNN #162

Draft · wants to merge 31 commits into main from RNN

Conversation

castelao
Collaborator

A work in progress. I'm mostly interested in loading a TF model from HDF5 and applying predict(), but I'll do my best to produce a complete implementation coherent with the rest of the library.

@milancurcic
Member

milancurcic commented Oct 28, 2023

Amazing, thanks and great to see you, @castelao, after so many years. 🙂

Will review this coming week.

@castelao
Collaborator Author

@milancurcic, yes, it's great to connect again. Thank you and the other developers for the time you've put into this library! It is great.

I'm not fluent in modern Fortran, so if you see anything that doesn't make sense, please let me know. And be aware, it is still a WIP.

@milancurcic added the enhancement (New feature or request) label on Nov 1, 2023
@castelao force-pushed the RNN branch 2 times, most recently from a9111b3 to 3fa5281, on November 14, 2023 at 16:24
@milancurcic
Member

I added support for rnn_layer to network % predict so that we can run the simple_rnn example. However, I can't get the simple example to converge, and I'm not very familiar with recurrent networks. @castelao, is there a small toy example, like the one in simple.f90, that's known to converge quickly? We can then use it for testing as well. With a working example and a small test suite (I can add that), this PR is almost good to go.
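For reference, a minimal sketch of what such a toy example could look like, patterned after simple.f90. The rnn(5) constructor is assumed from this PR's draft API, and the target values and learning rate are illustrative only, not a configuration known to converge:

program simple_rnn_toy
  ! Sketch only: a tiny single-sample fit, patterned after simple.f90.
  ! Assumes this PR exports an rnn(n) layer constructor from nf; the
  ! hyperparameters are illustrative and not tuned for convergence.
  use nf, only: dense, input, network, rnn, sgd
  implicit none

  type(network) :: net
  real, allocatable :: x(:), y(:)
  integer, parameter :: num_iterations = 1000
  integer :: n

  net = network([ &
    input(3), &
    rnn(5), &
    dense(2) &
  ])

  x = [0.2, 0.4, 0.6]
  y = [0.123456, 0.246802]

  do n = 0, num_iterations
    call net % forward(x)
    call net % backward(y)
    call net % update(optimizer=sgd(learning_rate=0.1))
    if (mod(n, 100) == 0) print '(i5, 2(3x, f9.6))', n, net % predict(x)
  end do

end program simple_rnn_toy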

@castelao
Collaborator Author

@milancurcic , I have to check how you updated the library since the last time I worked on this and see if what I did is still consistent. If it looks fine, I intend to work on the following:

  • The toy example
  • Load functionality from an HDF5 file exported by TensorFlow. I want to avoid recompiling every time I change my coefficients.

Are there any other requirements before I can submit this PR for review? It will be slow progress, but I'm back to this.

@jvdp1 (Collaborator) left a comment

As I am currently working with neural-fortran, I did a quick review of this PR and left a few minor comments. Overall LGTM. Thank you.

src/nf/nf_layer_submodule.f90 (outdated; resolved)
procedure :: get_params
procedure :: init
procedure :: set_params
! procedure :: set_state

Suggested change (delete this commented-out line):
! procedure :: set_state

Comment on lines +130 to +134
!module subroutine set_state(self, state)
! type(rnn_layer), intent(inout) :: self
! real, intent(in), optional :: state(:)
!end subroutine set_state


Suggested change (delete these commented-out lines):
!module subroutine set_state(self, state)
! type(rnn_layer), intent(inout) :: self
! real, intent(in), optional :: state(:)
!end subroutine set_state

Comment on lines +31 to +34
db = gradient * self % activation % eval_prime(self % z)
dw = matmul(reshape(input, [size(input), 1]), reshape(db, [1, size(db)]))
self % gradient = matmul(self % weights, db)
self % dw = self % dw + dw

I recently modified these lines in nf_dense_layer_submodule.f90 for better performance. The same logic could be applied here IMO.
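For context, below is a sketch of one way the reshape/matmul outer product could be replaced with an in-place accumulation loop; this only illustrates the idea, and the actual change made in nf_dense_layer_submodule.f90 may differ in detail:

! Sketch only: accumulate dw in place instead of building a temporary
! with reshape + matmul; component names follow the snippet above and
! j is an integer loop index declared in the surrounding subroutine.
db = gradient * self % activation % eval_prime(self % z)
do concurrent (j = 1:size(db))
  self % dw(:, j) = self % dw(:, j) + input * db(j)
end do
self % gradient = matmul(self % weights, db)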

class(rnn_layer), intent(in) :: self
real, allocatable :: params(:)

params = [ &

The pack can be avoided here, either by using pointers or because it is not needed at all. See here for some changes.
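For illustration, a sketch of a pointer-based variant of get_params that avoids copying through pack by remapping the 2-d weights onto a rank-1 view; the rnn_layer component names (weights, biases) are assumptions and may not match the PR:

! Sketch only: flatten the weights with a bounds-remapping pointer so
! params is assembled without pack; component names are assumed.
module function get_params(self) result(params)
  class(rnn_layer), intent(in), target :: self
  real, allocatable :: params(:)
  real, pointer :: w_(:) => null()

  ! Rank-1 view over the contiguous 2-d weights component
  w_(1:size(self % weights)) => self % weights
  params = [w_, self % biases]

end function get_params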


Same comment for the subroutines get_gradients and set_params below.

@milancurcic
Member

@castelao Thanks for all the additions. Sounds good.

  • It may be helpful to merge main into here, which would reflect the few recent changes in other layers that @jvdp1 mentioned.
  • Regarding loading recurrent layers from Keras HDF5 files: unless you need it, I'd say it's not a priority. If you do need it, let's add it in a later PR, as considerable additions would be needed to make it work (e.g. a Keras script example in neural-fortran/keras-snippets). That format is also no longer the latest Keras saved-model format (they change too often!). Related but not part of this PR, I'd like to find an easy way to make this part (and the HDF5 dependency) optional at build time, probably via preprocessor flags in the code, but I'd need to read up on how to do this in fpm.toml. (A rough, hypothetical sketch of the idea appears after this list.)
  • But a standalone example program in examples/ is important.
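As a rough illustration of the build-time option mentioned above (not this PR's design), the HDF5-dependent code path could be guarded by a preprocessor macro; the macro name USE_HDF5 is hypothetical, the constructor call shown is illustrative, and enabling cpp and defining macros would be configured in fpm.toml:

! Sketch only: a hypothetical cpp guard around the Keras/HDF5 loading
! path so the HDF5 dependency can be switched off at build time.
#ifdef USE_HDF5
  net = network('model.h5')   ! HDF5-backed loading path (illustrative)
#else
  error stop 'neural-fortran was built without HDF5 support'
#endif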

@castelao
Collaborator Author

@jvdp1 , thanks for your suggestions. I'll work on that.

@milancurcic, yes, I have already rebased it onto main. I'm interested in loading from Keras output, but I see your point. I'll leave that loading capability for another PR and work on the example for this one. Thanks!
