Updated docs
Thilina Rajapakse committed Jul 24, 2021
1 parent 7d8ee7a commit 5aa7921
Showing 3 changed files with 40 additions and 42 deletions.
4 changes: 2 additions & 2 deletions CHANGELOG.md
@@ -5,11 +5,11 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).


-## [0.61.12] - 2021-07-24
+## [0.61.13] - 2021-07-24

### Added

-- Pretraining and finetuning BigBird [whr778](https://github.com/whr778)
+- Pretraining and finetuning BigBird and XLMRoBERTa LMs [whr778](https://github.com/whr778)
## [0.61.10] - 2021-07-13

### Added
76 changes: 37 additions & 39 deletions docs/_docs/20-lm-specifics.md
@@ -2,10 +2,9 @@
title: Language Modeling Specifics
permalink: /docs/lm-specifics/
excerpt: "Specific notes for Language Modeling tasks."
-last_modified_at: 2020/12/08 00:05:36
+last_modified_at: 2021/07/24 13:16:18
toc: true
---

The idea of (probabilistic) language modeling is to estimate the probability of a sentence (or sequence of words). This can be used to find the probability distribution over the next word in a sequence, or over the possible words at a given (masked) position.
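
For illustration (this example is not part of the commit's own changes), the masked-position case can be sketched with the Hugging Face `transformers` fill-mask pipeline, which Simple Transformers builds on; the model name and sentence below are arbitrary placeholders.

```python
# Illustrative sketch only: querying masked-position probabilities with the
# Hugging Face transformers fill-mask pipeline. The model name and sentence
# are arbitrary placeholders.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-cased")
for prediction in fill_mask("Language modeling predicts the [MASK] word."):
    print(prediction["token_str"], prediction["score"])
```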

The commonly used *pre-training* strategies reflect this idea. For example:
@@ -44,26 +43,26 @@ The process of performing Language Modeling in Simple Transformers follows the [
2. Train the model with `train_model()`
3. Evaluate the model with `eval_model()`
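
A minimal sketch of these three steps, assuming placeholder training and evaluation files (`data/train.txt`, `data/test.txt`) and an arbitrary base model:

```python
# A minimal sketch of the create/train/evaluate workflow; the model choice
# and file paths are placeholders, not taken from this commit.
from simpletransformers.language_modeling import LanguageModelingModel

model = LanguageModelingModel("bert", "bert-base-cased")  # 1. create the model
model.train_model("data/train.txt")                       # 2. train
result = model.eval_model("data/test.txt")                # 3. evaluate
print(result)
```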


## Supported Model Types

New model types are regularly added to the library. Language Modeling currently supports the model types given below.

| Model | Model code for `LanguageModelingModel` |
| ---------- | -------------------------------------- |
| BERT | bert |
| BigBird | bigbird |
| CamemBERT | camembert |
| DistilBERT | distilbert |
| ELECTRA | electra |
| GPT-2 | gpt2 |
| Longformer | longformer |
| OpenAI GPT | openai-gpt |
| RoBERTa | roberta |
| XLMRoBERTa | xlmroberta |

**Tip:** The model code is used to specify the `model_type` in a Simple Transformers model.
{: .notice--success}


## ELECTRA Models

The ELECTRA model consists of a generator model and a discriminator model.
@@ -76,45 +75,44 @@ You can configure an ELECTRA model in several ways by using the options below.
- To load a saved ELECTRA model, you can provide the path to the save files as `model_name`.
- However, the pre-trained ELECTRA models made public by Google are available as separate generator and discriminator models. When starting from these models (Language Model fine-tuning), set `model_name` to `electra` and provide the pre-trained models as `generator_name` and `discriminator_name`. These two parameters can also be used to load locally saved generator and/or discriminator models.

```python
from simpletransformers.language_modeling import LanguageModelingModel

model = LanguageModelingModel(
    "electra",
    "electra",
    generator_name="outputs/generator_model",
    discriminator_name="outputs/discriminator_model",
)
```
- When training an ELECTRA language model from scratch, you can define the architecture by using the `generator_config` and `discriminator_config` in the `args` dict. The [default values](https://huggingface.co/transformers/model_doc/electra.html#electraconfig) will be used for any config parameters that aren't specified.

```python
from simpletransformers.language_modeling import LanguageModelingModel

model_args = {
    "vocab_size": 52000,
    "generator_config": {
        "embedding_size": 128,
        "hidden_size": 256,
        "num_hidden_layers": 3,
    },
    "discriminator_config": {
        "embedding_size": 128,
        "hidden_size": 256,
    },
}

train_file = "data/train_all.txt"

model = LanguageModelingModel(
    "electra",
    None,
    args=model_args,
    train_files=train_file,
)
```

Refer to the [Language Modeling Minimal Start](/docs/lm-minimal-start/) for full (minimal) examples.


### Saving ELECTRA models

When using ELECTRA models for downstream tasks, the ELECTRA developers recommend using the discriminator model only. Because of this, Simple Transformers will save the generator and discriminator models separately at the end of training. The discriminator model can then be used for downstream tasks.
@@ -139,7 +137,6 @@ classification_model = ClassificationModel("electra", "outputs/checkpoint-1-epoc
**Note:** Both the `save_discriminator()` and `save_generator()` methods take an optional `output_dir` argument that specifies where the model should be saved.
{: .notice--info}
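
As an illustrative sketch (assuming `model` is the trained ELECTRA `LanguageModelingModel` from the examples above, with placeholder output directories):

```python
# Illustrative sketch: `model` is assumed to be a trained ELECTRA
# LanguageModelingModel; the output directories are placeholders.
model.save_discriminator("outputs/discriminator_model")
model.save_generator("outputs/generator_model")

# The saved discriminator can then back a downstream task.
from simpletransformers.classification import ClassificationModel

classification_model = ClassificationModel("electra", "outputs/discriminator_model")
```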


## Distributed Training

Simple Transformers supports distributed language model training.
@@ -148,6 +145,7 @@ Simple Transformers supports distributed language model training.
{: .notice--success}

You can launch distributed training as shown below.

```bash
python -m torch.distributed.launch --nproc_per_node=4 train_new_lm.py
```
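
The `train_new_lm.py` script itself is not part of this commit; a hypothetical sketch of such a script, assuming the `--local_rank` argument that `torch.distributed.launch` passes to each process is forwarded through the library's `local_rank` model argument, might look like this:

```python
# train_new_lm.py -- a hypothetical sketch of the launched script; the data
# path and base model are placeholders. torch.distributed.launch passes
# --local_rank to every spawned process, and the value is forwarded to the
# model through its args.
import argparse

from simpletransformers.language_modeling import LanguageModelingModel

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=-1)
cli_args = parser.parse_args()

model = LanguageModelingModel(
    "bert",
    "bert-base-cased",
    args={"local_rank": cli_args.local_rank},
)
model.train_model("data/train.txt")
```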
2 changes: 1 addition & 1 deletion setup.py
@@ -5,7 +5,7 @@

setup(
name="simpletransformers",
-version="0.61.12",
+version="0.61.13",
author="Thilina Rajapakse",
author_email="[email protected]",
description="An easy-to-use wrapper library for the Transformers library.",
