[Workflow] Pre-Commit Linting and Formatting #104

Closed
wants to merge 4 commits into from
7 changes: 7 additions & 0 deletions .pre-commit-config.yaml
```diff
@@ -0,0 +1,7 @@
+repos:
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.3.5
+    hooks:
+      - id: ruff
+        args: [--fix]
+      - id: ruff-format
```
9 changes: 8 additions & 1 deletion README.md
````diff
@@ -44,7 +44,7 @@ LLM Finetuning toolkit is a config-based CLI tool for launching a series of LLM
 See poetry documentation page for poetry [installation instructions](https://python-poetry.org/docs/#installation)
 
 ```shell
-poetry install
+poetry install --without dev
 ```
 
 ### [Option 3] pip
````
```diff
@@ -255,3 +255,10 @@ If you would like to contribute to this project, we recommend following the "for
 5. Submit a **Pull request** so that we can review your changes
 
 NOTE: Be sure to merge the latest from "upstream" before making a pull request!
+
+### Setting Up Repo for Development
+
+- We recommend using `poetry` to manage dependencies
+- Install deps via `poetry install`
+- Enter virtual environment with `poetry shell`
+- Install pre-commit hooks using `pre-commit install`
```
173 changes: 115 additions & 58 deletions poetry.lock

Some generated files are not rendered by default.

25 changes: 24 additions & 1 deletion pyproject.toml
@@ -43,8 +43,31 @@ shellingham = "^1.5.4"


[tool.poetry.group.dev.dependencies]
black = "^24.3.0"
pre-commit = "~3.7.0"
ruff = "~0.3.5"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"


[tool.ruff]
lint.ignore = ["C901", "E501", "E741", "F402", "F823" ]
lint.select = ["C", "E", "F", "I", "W"]
line-length = 119
exclude = [
"llama2",
"mistral",
]


[tool.ruff.lint.isort]
lines-after-imports = 2
known-first-party = ["llmtune"]

[tool.ruff.format]
quote-style = "double"
indent-style = "space"
skip-magic-trailing-comma = false
line-ending = "auto"

17 changes: 5 additions & 12 deletions src/data/dataset_generator.py
@@ -1,14 +1,11 @@
import os
from os.path import join, exists
import pickle
import re
from functools import partial
from os.path import exists, join
from typing import Tuple, Union
import pickle

import re
from datasets import Dataset
from rich.console import Console
from rich.layout import Layout
from rich.panel import Panel

from src.data.ingestor import Ingestor, get_ingestor

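The new import order in this hunk follows isort-style rules, which Ruff applies by default. Within a section, plain `import x` statements come before `from x import y` statements, and each group is sorted by module name. Imported names are ordered constants first, then classes, then everything else (Ruff's `order-by-type` behavior). The sketch below is simplified and reproduces only those rules for a single section. It is not Ruff's implementation, and `sort_names` and `sort_section` are hypothetical helpers:

```python
def sort_names(names):
    """Order imported names: CONSTANTS, then Classes, then everything else."""
    def key(name):
        if name.isupper() and len(name) > 1:
            kind = 0  # constant, e.g. MAX_LEN
        elif name[0].isupper():
            kind = 1  # class, e.g. Dataset
        else:
            kind = 2  # function or variable, e.g. load_dataset
        return (kind, name.lower())
    return sorted(names, key=key)


def sort_section(straight, from_imports):
    """Render one import section.

    straight: module names imported as `import x`
    from_imports: mapping of module name -> list of imported names
    """
    lines = [f"import {m}" for m in sorted(straight)]
    for module in sorted(from_imports):
        names = ", ".join(sort_names(from_imports[module]))
        lines.append(f"from {module} import {names}")
    return lines


# The standard-library imports of dataset_generator.py, deliberately shuffled.
stdlib = sort_section(
    straight=["re", "os", "pickle"],
    from_imports={"typing": ["Union", "Tuple"], "os.path": ["join", "exists"], "functools": ["partial"]},
)
# Prints the same six standard-library lines as the new version of this hunk.
print("\n".join(stdlib))
```

The same name ordering explains `from datasets import Dataset, concatenate_datasets, load_dataset` in `src/data/ingestor.py`, where the class `Dataset` sorts ahead of the two functions.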
```diff
@@ -64,12 +61,8 @@ def _format_one_prompt(self, example, is_test: bool = False):
         return example
 
     def _format_prompts(self):
-        self.dataset["train"] = self.dataset["train"].map(
-            partial(self._format_one_prompt, is_test=False)
-        )
-        self.dataset["test"] = self.dataset["test"].map(
-            partial(self._format_one_prompt, is_test=True)
-        )
+        self.dataset["train"] = self.dataset["train"].map(partial(self._format_one_prompt, is_test=False))
+        self.dataset["test"] = self.dataset["test"].map(partial(self._format_one_prompt, is_test=True))
 
     def get_dataset(self) -> Tuple[Dataset, Dataset]:
         self._train_test_split()
```
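The formatter joined the two `.map(...)` calls in `_format_prompts` onto single lines because each now fits within the `line-length = 119` set in `pyproject.toml`. The quick check below assumes that the earlier Black setup used Black's default limit of 88 characters, since no `[tool.black]` section appears in this diff:

```python
LINE_LENGTH = 119   # from [tool.ruff] in pyproject.toml
BLACK_DEFAULT = 88  # Black's default; assumed to be what produced the old wrapping

# The two joined lines from the new version of _format_prompts, with their 8-space indent.
joined = [
    '        self.dataset["train"] = self.dataset["train"].map(partial(self._format_one_prompt, is_test=False))',
    '        self.dataset["test"] = self.dataset["test"].map(partial(self._format_one_prompt, is_test=True))',
]

for line in joined:
    # Each line fits Ruff's limit but would overflow Black's default,
    # which is why the old code was split across three lines per call.
    print(len(line), len(line) <= LINE_LENGTH, len(line) > BLACK_DEFAULT)
```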
9 changes: 3 additions & 6 deletions src/data/ingestor.py
@@ -1,9 +1,8 @@
import csv
from abc import ABC, abstractmethod
from functools import partial

import ijson
import csv
from datasets import Dataset, load_dataset, concatenate_datasets
from datasets import Dataset, concatenate_datasets, load_dataset


def get_ingestor(data_type: str):
```diff
@@ -14,9 +13,7 @@ def get_ingestor(data_type: str):
     elif data_type == "huggingface":
         return HuggingfaceIngestor
     else:
-        raise ValueError(
-            f"'type' must be one of 'json', 'csv', or 'huggingface', you have {data_type}"
-        )
+        raise ValueError(f"'type' must be one of 'json', 'csv', or 'huggingface', you have {data_type}")
 
 
 class Ingestor(ABC):
```
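`get_ingestor` maps the `type` string from the config to an ingestor class and raises `ValueError` for any other value. This hunk only reflows that `raise`. The sketch below shows the same dispatch written as a dictionary lookup. The classes are hypothetical stand-ins, because the bodies of the real ingestor classes are not part of this diff:

```python
# Hypothetical stand-ins for the real ingestor classes.
class JsonIngestorStub: ...
class CsvIngestorStub: ...
class HuggingfaceIngestorStub: ...


# Hypothetical registry mirroring the if/elif chain in get_ingestor.
_INGESTORS = {
    "json": JsonIngestorStub,
    "csv": CsvIngestorStub,
    "huggingface": HuggingfaceIngestorStub,
}


def get_ingestor(data_type: str):
    """Return the ingestor class registered for data_type."""
    try:
        return _INGESTORS[data_type]
    except KeyError:
        # Same message as the original; `from None` hides the internal KeyError.
        raise ValueError(f"'type' must be one of 'json', 'csv', or 'huggingface', you have {data_type}") from None
```

A lookup table keeps the accepted types in one place, but it trades the explicit branches for a module-level dict that must be kept in sync with the error message.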
1 change: 0 additions & 1 deletion src/finetune/finetune.py
```diff
@@ -1,5 +1,4 @@
 from abc import ABC, abstractmethod
-from typing import Union, List, Tuple, Dict
 
 
 class Finetune(ABC):
```