Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Epochs to load CoLA #487

Merged
merged 4 commits into from
May 5, 2020
Merged

Use Epochs to load CoLA #487

merged 4 commits into from
May 5, 2020

Conversation

sgugger
Copy link
Contributor

@sgugger sgugger commented May 5, 2020

This PR changes the way CoLA is loaded to use the Epochs API.
It removes the test set (that was unused) and does the minimal amount of changes to work with the script in BertCoLA.

Two notes where the design changed:

  • renamed dev to validation as it's more expressive
  • put the closure at the end of the init of the CoLA dataset so that we can use a trailing closure.

Added functionality: we can pass a random generator and thus get reproducible results.

@sgugger sgugger requested a review from texasmichelle May 5, 2020 18:17
@texasmichelle
Copy link
Member

The kokoro failure is looking for the functions defined at the bottom of DataUtilities.swift. @BradLarson do we have a common function to use instead?

@sgugger
Copy link
Contributor Author

sgugger commented May 5, 2020

Yeah I just realized that, I deleted the two files thinking they were useless now but did not try to build after, sorry about that. I can leave DatasetUtilities with them for now, there is a TODO to rely on the swift-models at some point, when it is done, this can be removed.

public let batchSize: Int
/// The type of the collection of batches.
public typealias Batches = Slices<Sampling<Samples, ArraySlice<Int>>>
/// The type of the training seauence of epochs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sp: sequence

LazyMapSequence<Batches, LabeledTextBatch>>

//public typealias DevDataIterator = GroupedIterator<MapIterator<ExampleIterator, DataBatch>>
//public typealias TestDataIterator = DevDataIterator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove commented lines

maxSequenceLength: maxSequenceLength,
batchSize: batchSize,
entropy: SystemRandomNumberGenerator()
) { (example: CoLAExample) -> LabeledTextBatch in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the advantages of moving to a trailing closure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more swift-idiomatic from what Dave constantly tells me ;)

The main advantage is that you don't have to define your exampleMap function separately and pass it, you can just put in curly brackets at the end (like in the main colab example).

@texasmichelle
Copy link
Member

This is so helpful!! It cleans up the CoLA implementation considerably. Although there are aspects of Epochs that are not entirely clear to me at first glance, it's great that it covers so much of this functionality. Thank you!

@sgugger
Copy link
Contributor Author

sgugger commented May 5, 2020

Thanks! Hope the design review on Friday will clarify the aspects of Epochs that are unclear.

@texasmichelle
Copy link
Member

Created issues #488 and #489 for later follow-up.

@sgugger sgugger merged commit 08692fb into tensorflow:master May 5, 2020
@sgugger sgugger deleted the cola_epochs branch May 5, 2020 19:48
@texasmichelle
Copy link
Member

Fixes #480

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants