Could you please release the processed pretraining data? #8

phellonchen · 2023-02-14T09:29:35Z

No description provided.

StevenTang1998 · 2023-02-14T10:17:17Z

You can download them from https://huggingface.co/RUCAIBox. Since some datasets have license restrictions, we cannot merge them into one dataset. You can merge them yourself.

phellonchen · 2023-02-15T04:54:23Z

Thanks. One more question: where can I find the code for the temperature-scaled mixing strategy (Raffel et al., 2020) with a rate of T = 2, which is used to mitigate the disparity in tasks and datasets? I have not found it in https://github.com/RUCAIBox/TextBox.

StevenTang1998 · 2023-02-15T05:00:02Z

The general pre-training code is still under development. For pre-training MVP, we simply implemented the temperature-scaled mixing strategy by copying instances. You can also use this as a simple alternative.
For example, suppose dataset A has 2 instances and dataset B has 8 instances. We merge them into a unified dataset with the temperature-scaled mixing strategy by doubling the instances in dataset A, which gives 4 instances from A alongside the 8 from B.
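
The copy-based approach described above could be sketched roughly as follows. This is not the actual MVP pre-training code, only a minimal illustration under the assumption that each dataset's share of the mixture should be proportional to n^(1/T), where n is its size, and that the largest dataset is kept as-is. The function names `copy_factors` and `mix_by_copying` are hypothetical, and rounding the factor to the nearest integer is a simplification chosen for this sketch.

```python
def copy_factors(sizes, temperature=2.0):
    """Return an integer repeat count per dataset (hypothetical helper).

    Assumes the target share of dataset i is proportional to n_i ** (1 / T).
    Keeping the largest dataset at one copy, the repeat count for dataset i
    works out to (n_max / n_i) ** (1 - 1 / T), rounded to the nearest integer.
    """
    n_max = max(sizes.values())
    exponent = 1.0 - 1.0 / temperature
    return {
        name: max(1, round((n_max / n) ** exponent))
        for name, n in sizes.items()
    }


def mix_by_copying(datasets, temperature=2.0):
    """Merge datasets into one list, repeating the smaller ones."""
    sizes = {name: len(items) for name, items in datasets.items()}
    factors = copy_factors(sizes, temperature)
    merged = []
    for name, items in datasets.items():
        # List repetition duplicates the instances factors[name] times.
        merged.extend(list(items) * factors[name])
    return merged


# The example from this thread: A has 2 instances, B has 8, T = 2.
datasets = {
    "A": ["a1", "a2"],
    "B": ["b1", "b2", "b3", "b4", "b5", "b6", "b7", "b8"],
}
factors = copy_factors({name: len(items) for name, items in datasets.items()})
merged = mix_by_copying(datasets)
```

With T = 2 the raw 1 : 4 ratio between A and B becomes sqrt(2) : sqrt(8) = 1 : 2, which matches doubling A. In practice you would probably shuffle the merged list before training.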

StevenTang1998 closed this as completed Feb 23, 2023
