self-long-instruct

A project that extends the self-instruct method to the automatic generation of long-context instruction-tuning datasets, building on long-context LLMs, Retrieval-Augmented Generation (RAG), and LLMs-as-Agents.

Base

self-instruct:
- paper link: here | github link: here
Retrieval-Augmented Generation (RAG):
- paper link: here | github link: here
LLMs-as-Agents:
- paper link: here | github link: here

Preparation

Install the pip dependencies:
```
pip install -r requirements.txt
```
Download the punkt tokenizer data from NLTK:
- method1: download through the API
```
import nltk
nltk.download('punkt')
```
- method2: if the API download fails, go to the GitHub repo and follow the steps below:
  - step1: download the whole packages directory into your conda env path, e.g. /home/user/anaconda3/envs/myenv/, and rename it to nltk_data
  - step2: unzip the zip files under nltk_data, especially those in tokenizers/ and taggers/. For convenience, we also provide a function that does this automatically:
```
from src.utils import unzip_nltk_data
# conda env path that contains the nltk_data directory from step1
nltk_data_dir = "/home/user/anaconda3/envs/myenv/"
unzip_nltk_data(nltk_data_dir, remove=True)
```
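If you prefer not to import from `src.utils`, the unzip step can be sketched with the standard library alone. This is a minimal sketch, not the repository's `unzip_nltk_data`: the function name `unzip_all`, its parameters, and the choice to extract each archive next to itself are assumptions made for illustration.

```python
import zipfile
from pathlib import Path
from typing import List


def unzip_all(root: str, remove: bool = False) -> List[str]:
    """Extract every .zip found under `root` into the archive's own directory.

    Hypothetical helper: if `remove` is True, each archive is deleted
    after it has been extracted. Returns the paths of the processed archives.
    """
    processed = []
    # Collect the archive list up front so extraction does not affect the walk.
    for archive in sorted(Path(root).rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
        if remove:
            archive.unlink()
        processed.append(str(archive))
    return processed
```

For example, `unzip_all("/home/user/anaconda3/envs/myenv/nltk_data", remove=True)` would turn `tokenizers/punkt.zip` into a `tokenizers/punkt/` directory, assuming the archive contains a top-level `punkt/` folder.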
Install the poppler tools so that pdf2image works (this assumes your OS is Linux; if not, see the pdf2image installation guide):
```
sudo apt-get install poppler-utils
```
Follow the guide here and install LibreOffice so that unstructured.partition.doc works.
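To confirm that both system tools are visible before running the pipeline, a quick check along these lines may help. It is only a sketch: `find_tools` is a hypothetical helper, and the executable names are assumptions (`pdftoppm` is one of the binaries shipped in poppler-utils, and `soffice` is the usual LibreOffice launcher, which may be named differently on your system).

```python
import shutil
from typing import Dict, Iterable, Optional


def find_tools(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Assumed executable names: pdftoppm (poppler-utils), soffice (LibreOffice).
    for name, path in find_tools(["pdftoppm", "soffice"]).items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

A `NOT FOUND` line means the corresponding install step above has not taken effect in the current shell environment.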

Strivin0311/self-long-instruct

Folders and files

Latest commit

History

Repository files navigation

self-long-instruct

Base

Preparation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages