Commit

test(tfidf): rm test corpus from module, adapt doctest
cmdoret committed Nov 11, 2023
1 parent 2288932 commit 6bee671
Showing 1 changed file with 2 additions and 7 deletions.
9 changes: 2 additions & 7 deletions gimie/utils/text.py
@@ -15,11 +15,6 @@
from pydantic.dataclasses import dataclass
import scipy.sparse as sp

- CORPUS = [
- "This is my test document.",
- "This is another test document.",
- ]
-

def tokenize(text: str, sep: str = " ") -> List[str]:
"""Basic tokenizer. Removes punctuation, but not stop words.
@@ -164,9 +159,9 @@ class TfidfVectorizer(BaseModel):
--------
>>> docs = ["The quick brown fox", "jumps over", "the lazy dog."]
>>> vectorizer = TfidfVectorizer(config=TfidfConfig())
- >>> tfidf = vectorizer.fit_transform(CORPUS)
+ >>> tfidf = vectorizer.fit_transform(docs)
>>> tfidf.shape
- (2, 6)
+ (3, 8)
"""

config: TfidfConfig
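
The expected doctest shape changes from (2, 6) to (3, 8) because the matrix has one row per document and one column per distinct token. The following is a minimal sketch of that count, not gimie's implementation: `sketch_tokenize` and `expected_shape` are hypothetical helpers, and lowercasing is an assumption (the module's docstring only states that punctuation is removed and stop words are kept).

```python
import string
from typing import List, Tuple


def sketch_tokenize(text: str, sep: str = " ") -> List[str]:
    """Hypothetical stand-in tokenizer: lowercase, drop punctuation, split on sep."""
    # Assumption: lowercasing, so that "The" and "the" count as one token.
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in cleaned.split(sep) if tok]


def expected_shape(docs: List[str]) -> Tuple[int, int]:
    """Return (number of documents, number of distinct tokens)."""
    vocab = {tok for doc in docs for tok in sketch_tokenize(doc)}
    return (len(docs), len(vocab))


# The module-level corpus removed by this commit.
old_corpus = ["This is my test document.", "This is another test document."]
# The documents already defined in the doctest.
docs = ["The quick brown fox", "jumps over", "the lazy dog."]

print(expected_shape(old_corpus))  # (2, 6)
print(expected_shape(docs))  # (3, 8)
```

Under these assumptions the old corpus yields six distinct tokens across two documents, while `docs` yields eight across three, which matches the updated doctest output.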