[WIP] Support target features #2227
Hello @anderleich, yes, this is a good starting point, though it would be good to fix the Lint & Tests and maybe add a test to show that everything works fine.
Yes, totally agree. I just wanted to make sure I was on the right track. For now, I've just made the necessary changes in the code for vocabulary generation. That is, to make |
more docs update
onmt_server works with ctranslate2
Add CTranslate2 in requirements
Better docs
fix LM_scoring with v3
* fixed tensorboard logging * added test / validation tests * added tests in github actions
* fixed validation scoring * added default value to choices
various fixes. see comment in PR.
* revisit tgt_prefix
* sources and refs tokens are recovered with vocab.lookup_index * tests with dynamic scoring and copy are reactivated
* process transforms of buckets in batches rather than per example.
* pickable Vocab / v3.0.2
…tions (OpenNMT#2270) * use native crossentropy * doc * fix transform bug
* empty transformed buckets are replaced by None (in build_vocab, it is of size 1 so when the exemple is filtered for instance, we can't reach the first instance of the empty list) * detokenization in scoring_utils is done with apply_reverse * added _detokenize to BPETransform * handle empty TransformPipe by simply detokenizing with ' '.join()
* keep Label Smoothing for Validation (same as Train) * fix save transforms
* various fixes along v3.0.3
* wmt17 example
* better batching
* fix LM scoring
Closing this PR as |
This PR intends to add target features support to OpenNMT-py v3.0. All the code has been adapted for this new version.
Both source and target features support has been refactored for simpler handling of features. The way features are passed to the system has changed: features are now appended to the actual textual data instead of being provided in a separate file. This also simplifies the way features are passed during inference and to the server. It uses the special character `│` as a feature separator, as in previous versions of the OpenNMT framework. For instance:

I've also added a way to provide default values for features. This can be really useful when mixing task-specific data (with features) with general data that has not been annotated. Additionally, the `filterfeats` transform is no longer required, and features are checked in the corpus loading process.

A YAML configuration file would look like this:
For now, I've made the necessary changes in the code for vocabulary generation, that is, to make `onmt_build_vocab` work.

@vince62s, do you think this is a good starting point?
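
To make the inline feature format described above concrete, here is a minimal Python sketch of how a line with appended features could be split into words and feature streams, with default values filled in for unannotated tokens. It is only an illustration of the idea, not the code in this PR: the function name `split_features`, its parameters, and the constant `FEAT_SEP` are hypothetical, and the separator is copied from the description above.

```python
from typing import List, Optional, Tuple

# Separator character as written in the PR description (assumption: the
# actual code base may define its own constant for this).
FEAT_SEP = "│"


def split_features(
    line: str,
    n_feats: int,
    defaults: Optional[List[str]] = None,
    sep: str = FEAT_SEP,
) -> Tuple[List[str], List[List[str]]]:
    """Split 'word<sep>f1<sep>f2 ...' into words and n_feats feature streams.

    Hypothetical helper for illustration only; it is not an OpenNMT-py API.
    """
    words: List[str] = []
    feats: List[List[str]] = [[] for _ in range(n_feats)]
    for token in line.split():
        word, *values = token.split(sep)
        if not values and defaults is not None:
            # Unannotated token (e.g. general-domain data): use the defaults.
            values = list(defaults)
        if len(values) != n_feats:
            # Mirrors the idea of checking features while loading the corpus.
            raise ValueError(
                f"expected {n_feats} feature(s), got {len(values)} in {token!r}"
            )
        words.append(word)
        for stream, value in zip(feats, values):
            stream.append(value)
    return words, feats


if __name__ == "__main__":
    annotated = f"however{FEAT_SEP}C ,{FEAT_SEP}N this{FEAT_SEP}L"
    print(split_features(annotated, n_feats=1))
    # Plain text mixed in with annotated data falls back to the default value.
    print(split_features("hello world", n_feats=1, defaults=["N"]))
```

Raising an error on a feature-count mismatch, rather than silently dropping the example, is one possible design choice for the check at corpus loading time; a real implementation might prefer to skip and log such lines.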