v0.2.0
Added c-TF-IDF as an algorithm to extract textual representations from images.

    from concept import ConceptModel

    # img_names: the images to cluster; docs: the candidate words used to describe them
    concept_model = ConceptModel(ctfidf=True)
    concepts = concept_model.fit_transform(img_names, docs=docs)

Using the textual and visual embeddings, we find the best-matching words for each image by
cosine similarity. Then, after clustering the images, we combine all words in a cluster into a single
document. Finally, c-TF-IDF is used to find the best words for each concept cluster.
The benefit of this method is that it takes the entire cluster structure into account when creating the
representations. This is not the case when we only consider words close to the concept embedding.
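
The steps described above can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the helper names (`top_words_per_image`, `ctfidf_top_words`), the toy embeddings, and the hard-coded cluster labels are all assumptions made for illustration, and the weighting used (term frequency per cluster times `log(1 + A / f_t)`, with `A` the average number of words per cluster and `f_t` the total count of a word) is one common c-TF-IDF formulation, assumed here.

```python
import numpy as np


def top_words_per_image(image_emb, word_emb, words, top_n=2):
    """Return the top_n words closest to each image by cosine similarity."""
    # Normalize rows so that the dot product equals cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    word_emb = word_emb / np.linalg.norm(word_emb, axis=1, keepdims=True)
    sim = image_emb @ word_emb.T
    best = np.argsort(-sim, axis=1)[:, :top_n]
    return [[words[j] for j in row] for row in best]


def ctfidf_top_words(words_per_image, clusters, words, top_n=2):
    """Merge the words of each cluster into one document and rank them with c-TF-IDF."""
    index = {w: i for i, w in enumerate(words)}
    labels = sorted(set(clusters))
    counts = np.zeros((len(labels), len(words)))
    for img_words, cluster in zip(words_per_image, clusters):
        for w in img_words:
            counts[labels.index(cluster), index[w]] += 1

    # Term frequency within each cluster document.
    tf = counts / counts.sum(axis=1, keepdims=True)
    # Assumed IDF variant: log(1 + average words per cluster / total word count).
    avg_words = counts.sum() / len(labels)
    idf = np.log(1 + avg_words / np.maximum(counts.sum(axis=0), 1))
    scores = tf * idf

    order = np.argsort(-scores, axis=1)[:, :top_n]
    return {c: [words[j] for j in order[k]] for k, c in enumerate(labels)}


# Toy data (hypothetical): four candidate words and four images in a shared 3-d space.
words = ["cat", "pet", "car", "road"]
word_emb = np.array([[1.0, 0.0, 0.0], [0.8, 0.2, 0.0], [0.0, 1.0, 0.0], [0.0, 0.8, 0.2]])
image_emb = np.array([[1.0, 0.05, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.05], [0.1, 0.9, 0.1]])
clusters = [0, 0, 1, 1]  # assumed output of a separate clustering step

words_per_image = top_words_per_image(image_emb, word_emb, words)
concepts = ctfidf_top_words(words_per_image, clusters, words)
print(concepts)
```

Because the scores are computed from the merged words of every image in a cluster, a word that is frequent within one cluster but rare across the others ranks highest for that cluster.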