In this paper, we propose an end-to-end model for generating encoding vectors for sentences describing an image. We feed the encoded vectors from our network into a Stacked Generative Adversarial Network (StackGAN) to generate high-resolution images. We present different model architectures with word-level and character-level representations and provide an extensive qualitative and quantitative evaluation of the generated images.
The full project report can be found here.
In this paper, we propose an end-to-end hybrid CNN-RNN model to learn the relation between an image and its text description. The learned model is then used to train a StackGAN conditioned on an input caption, whose encoding is generated by our trained model.
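To make this concrete, the sketch below shows one plausible form of such a hybrid CNN-RNN caption encoder. It is a minimal illustration only: the layer sizes, kernel width, and the class name `HybridTextEncoder` are our own assumptions, not the exact architecture from the report.

```python
import torch
import torch.nn as nn

class HybridTextEncoder(nn.Module):
    """Sketch of a hybrid CNN-RNN caption encoder: 1-D convolutions
    extract local n-gram features from token embeddings, and a GRU
    aggregates them into a fixed-size sentence encoding."""

    def __init__(self, vocab_size, embed_dim=128, conv_dim=256, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Convolution over the time axis captures short phrases.
        self.conv = nn.Conv1d(embed_dim, conv_dim, kernel_size=3, padding=1)
        self.rnn = nn.GRU(conv_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) word- or character-level indices
        x = self.embed(token_ids)       # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)           # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv(x))    # (batch, conv_dim, seq_len)
        x = x.transpose(1, 2)           # (batch, seq_len, conv_dim)
        _, h = self.rnn(x)              # h: (1, batch, hidden_dim)
        return h.squeeze(0)             # sentence encoding

# Usage: the returned vector plays the role of the caption embedding
# that conditions the StackGAN generator.
encoder = HybridTextEncoder(vocab_size=5000)
captions = torch.randint(0, 5000, (4, 16))  # batch of 4 dummy captions
embedding = encoder(captions)               # (4, 256)
```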
From the results, we conclude that, given the small number of images generated, the quantitative scores cannot be completely trusted. However, human evaluation suggests that images generated by GANs using our trained text encodings are on par with those from the pre-trained StackGAN model. Another noticeable limitation is that our text representations are currently trained on captions for a single class (dog or bird) only. As next steps for this project, we would like to train our embeddings on the entire COCO dataset, evaluate the generated images at the recommended sample size using both the Inception score and our own hybrid model, and compare the results against current state-of-the-art models.
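For context on the metric mentioned above: the Inception score is exp(E_x[KL(p(y|x) || p(y))]), computed from the class posteriors of a pre-trained Inception network, and it is typically reported over tens of thousands of samples, which is why a small generated set makes it unreliable. Below is a minimal sketch, assuming `probs` holds the Inception softmax outputs for the generated images; the random `probs` in the example is a stand-in for real classifier outputs.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """probs: (num_images, num_classes) softmax outputs p(y|x) from a
    pre-trained Inception network. Returns exp(E_x KL(p(y|x) || p(y)))."""
    p_y = probs.mean(axis=0, keepdims=True)                  # marginal p(y)
    kl = probs * (np.log(probs + eps) - np.log(p_y + eps))   # per-image KL terms
    return float(np.exp(kl.sum(axis=1).mean()))              # higher is better

# Example with random probabilities (real use: Inception-v3 outputs).
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=100)
print(inception_score(probs))
```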
Another interesting conclusion we draw from our results is that hybrid CNN-RNN word-level encodings with attention generate the best images for both StackGAN layer 1 and layer 2. Layer-1 images, as expected, provide a structural outline of the image for the caption, and layer-2 images fill in the final details, refining the skeleton drawn by the layer-1 generator.
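As an illustration of the attention component in the word-level variant, the sketch below pools per-word RNN states into a single caption vector. The scoring layer and dimensions are illustrative assumptions, not the report's exact design.

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Attention over per-word RNN states: scores each word, then
    returns the softmax-weighted sum as the sentence encoding."""

    def __init__(self, hidden_dim=256):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, word_states):
        # word_states: (batch, seq_len, hidden_dim) per-word encodings
        weights = torch.softmax(self.score(word_states), dim=1)  # (batch, seq_len, 1)
        return (weights * word_states).sum(dim=1)                # (batch, hidden_dim)

# Usage: pool word-level hidden states into one caption vector.
pool = AttentionPool()
states = torch.randn(4, 16, 256)
sentence_vec = pool(states)  # (4, 256)
```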
Generative Adversarial Networks have revolutionized the world of image generation, and we would like to continue our efforts to bridge the gap between natural language and computer vision, improving image synthesis from captions and producing better models and results.