This repository has been archived by the owner on Jun 30, 2024. It is now read-only.

A project that extends the self-instruct method to the automatic generation of long-context instruction-tuning datasets, based on long-context LLMs, Retrieval-Augmented Generation (RAG), and LLMs-as-Agents.

Strivin0311/self-long-instruct


self-long-instruct

Base

  • self-instruct:
  • Retrieval-Augmented Generation (RAG):
  • LLMs-as-Agents:

Preparation

  • install the pip dependencies:
    pip install -r requirements.txt
  • download the punkt tokenizer data from nltk:
    • method1: download through the API
      import nltk
      nltk.download('punkt')
    • method2: if the API fails, go to the GitHub repo and follow the steps below:
      • step1: download the whole packages directory into your conda env path, e.g. /home/user/anaconda3/envs/myenv/, and rename it to nltk_data
      • step2: unzip the zip files throughout nltk_data, especially those under tokenizers/ and taggers/. For convenience, we also provide a function that does this automatically:
        from src.utils import unzip_nltk_data
        nltk_data_dir = "/home/user/anaconda3/envs/myenv/"
        unzip_nltk_data(nltk_data_dir, remove=True) 
  • install the poppler tools to make pdf2image work (this assumes your OS is Linux; if not, check the pdf2image installation guide):
    sudo apt-get install poppler-utils
  • follow the guide here to install LibreOffice, which unstructured.partition.doc needs in order to work
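
The manual parts of the preparation can be sketched in plain Python. This is only an illustration under stated assumptions, not the repo's own code: `unzip_all` is a hypothetical stand-in for what `src.utils.unzip_nltk_data` might do (its real implementation is not shown here), and `missing_tools` assumes that poppler-utils exposes the `pdftoppm` command and LibreOffice exposes `soffice` on your PATH.

```python
import shutil
import zipfile
from pathlib import Path


def unzip_all(root, remove=False):
    """Extract every .zip found under `root` into the archive's own folder.

    Hypothetical helper for illustration only; it is NOT the repo's
    `unzip_nltk_data`. Returns the list of archives that were extracted.
    """
    handled = []
    # sorted() materialises the listing first, so files created while
    # extracting do not affect the directory walk.
    for archive in sorted(Path(root).rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
        if remove:
            # Delete the archive once its contents are extracted.
            archive.unlink()
        handled.append(archive)
    return handled


def missing_tools(names=("pdftoppm", "soffice")):
    """Return the command names from `names` that are not found on PATH.

    The default names are assumptions: `pdftoppm` ships with poppler-utils
    and `soffice` with LibreOffice on typical Linux installs.
    """
    return [name for name in names if shutil.which(name) is None]
```

For example, `unzip_all("/home/user/anaconda3/envs/myenv/nltk_data", remove=True)` would unpack the archives under tokenizers/ and taggers/ in place, and a non-empty result from `missing_tools()` suggests that poppler or LibreOffice still needs to be installed.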
