Commit 6cfbf47
PIXAR accepted
arranger1044 committed May 16, 2024
1 parent 4d3afce commit 6cfbf47
Showing 2 changed files with 11 additions and 4 deletions.
7 changes: 7 additions & 0 deletions _news/pixar-accepted-acl24findings.md
@@ -0,0 +1,7 @@
---
title: "PIXAR accepted at ACL Findings"
collection: news
permalink: /news/pixar-accepted
date: 2024-05-16
---
<a href="https://arxiv.org/abs/2401.03321"><b>PIXAR</b></a> the first pixel-based generative LLM is accepted at <b>ACL 24 Findings</b>.
8 changes: 4 additions & 4 deletions _publications/tai2024pixar.md
@@ -3,22 +3,22 @@ collection: publications
ref: "tai2024pixar"
permalink: "publications/tai2024pixar"
title: "PIXAR: Auto-Regressive Language Modeling in Pixel Space"
date: 2024-02-26 00:00
date: 2024-05-16 00:00
tags: generative llms
image: "/images/papers/tai2024pixar/pixar.apng"
authors: "Yintao Tai, Xiyang Liao, Alessandro Suglia, Antonio Vergari"
paperurl: "https://arxiv.org/abs/2401.03321"
pdf: "https://arxiv.org/pdf/2401.03321.pdf"
venue: "arXiv 2024"
venue: "ACL 2024 Findings"
code: "https://github.com/april-tools/pixar"
excerpt: "Can LLMs understand, reason about and generate text by operating only on perceptual information such as pixels? We build PIXAR, the first generative pixel-based LLM to answer it."
abstract: "Recent work showed the possibility of building open-vocabulary large language models (LLMs) that directly operate on pixel representations. These models are implemented as autoencoders that reconstruct masked patches of rendered text. However, these pixel-based LLMs are limited to discriminative tasks (e.g., classification) and, similar to BERT, cannot be used to <i>generate text</i>. Therefore, they cannot be used for generative tasks such as free-form question answering. In this work, we introduce PIXAR, the first pixel-based autoregressive LLM that performs text generation. Consisting of only a decoder, PIXAR can perform free-form generative tasks
while keeping the number of parameters on par with previous encoder-decoder models. Furthermore, we highlight the challenges of generating text as non-noisy images and show this is due to using a maximum likelihood objective. To overcome this problem, we propose an adversarial pretraining stage that improves the readability and accuracy of PIXAR by 8.1 on LAMBADA and 8.5 on bAbI--- making it comparable to GPT-2 on text generation tasks. This paves the way to build open-vocabulary LLMs that operate on perceptual input only and calls into question the necessity of the usual symbolic input representation, i.e., text as (sub)tokens."
supplemental:
bibtex: "@article{tai2024pixar,<br/>
bibtex: "@inproceedings{tai2024pixar,<br/>
title={PIXAR: Auto-Regressive Language Modeling in Pixel Space},<br/>
author={Yintao Tai and Xiyang Liao and Alessandro Suglia and Antonio Vergari},<br/>
journal={arXiv preprint arXiv:2401.03321},<br/>
booktitle={Findings of the Association for Computational Linguistics: ACL 2024},<br/>
year={2024}
}"
---
