ALT field filling via AI bot #1089

zero0n3 · 2024-11-12T09:05:01Z

zero0n3
Nov 12, 2024

Hello
would it be possible to make the ALT field of the images automatically filled in by an AI bot, during the image upload phase on the client, before publishing?
the ALT field is designed to be filled in "brutally" with the description of the image, without any human imagination. It is a mere description of the photo. And the latest AI solutions, based on the tests done, perfectly cope with this simple task.
I therefore wonder if it were possible to make sure that during the upload phase of the photos, these are also checked in real time and described directly by a bot.

nikclayton · 2024-11-14T12:12:05Z

nikclayton
Nov 14, 2024
Maintainer

First off,

I'm not sure that:

the ALT field is designed to be filled in "brutally" with the description of the image, without any human imagination.

is at all accurate.

Putting that aside, and ignoring (for the moment) any concerns people might have about using AI to do this (e.g., using systems trained on data without recompense to the original authors or considering the license, the energy costs of AI, etc), it's also not clear that this helps.

The most comprehensive report I could find was https://stefanbohacek.com/blog/impact-of-fediverse-clients-on-the-use-of-alt-text/. This shows one client that does use AI to allow people to generate image descriptions, but the percentage of posts from this app with image descriptions is not higher than posts from people using other clients (Phanpy, etc).

That comes with a number of caveats (difficulty of determining client, people predisposed to provide captions might gravitate to a particular client, etc).

There are also technical challenges to doing this. On their own they're not especially difficult, but they do incur an ongoing cost on the project.

Google's MLKit can label images (https://developers.google.com/ml-kit/vision/image-labeling) but this is not the same thing as generating an effective caption.

Ice Cubes, an iOS client, does do this, and has a relevant blog post about they use OpenAI's Vision API to do this (https://dimillian.medium.com/adding-ai-generated-image-description-to-ice-cubes-c4e7990a5915).

As a thought experiment, doing that in Pachli would require at least:

A mechanism to allow users to opt in to this, with the feature turned off by default (if it was turned on by default user's images would be uploaded to OpenAI without their consent).
Developing and running a service to proxy requests between Pachli and OpenAI. This is because access to the OpenAI API is controlled by a key (password, effectively), and that cannot be distributed with Pachli, as third parties could easily extract the key from the app.

Instead, the proxy would hold the key, receive requests from Pachli, and pass them to/from OpenAI. Developing and then reliably running a proxy like this is significant additional work. It would also have its own privacy issues (because it would be handling user images).

And there would need to be additional work to ensure the service was only used by Pachli clients, otherwise it acts as an open relay for OpenAI.

Ice Cubes does this at -- I think -- https://icecubesrelay.fly.dev/openai.
OpenAI costs money to use (as would any hosting provider for the proxy mentioned in point 2). There's no budget for this in the Pachli project at the moment. And without careful rate limiting or budget configuration the proxy service mentioned in point 2 would provide a mechanism for external users to incur arbitrarily large expenses for the project.

A vastly simpler approach would be to require the user to sign up for an OpenAI account themselves, create an API key, and then paste this key in to Pachli. Then the user becomes responsible for any usage. While that's a lot safer, it's also more user hostile -- my intuition is that very few users would bother to do this. I might be wrong about this.

0 replies

keefmarshall · 2024-11-14T15:27:16Z

keefmarshall
Nov 14, 2024

My take on this: AI use has some quite controversial aspects as you mention above. Plus, Pachli feels like it should be just a mobile app, and shouldn't really need a lot of server-side infrastructure that has to be paid for and maintained.

But, if this is something people are likely to do anyway (i.e. paste an image into a Chatbot and use the description) then it would be simpler to do that for them. As with the translation feature, it would be nice if this was something offered by the home Mastodon/Fediverse instances, so Pachli could just hook into that, and users could make their instance choices according to features - but that doesn't seem to be a thing.

If it was me, I would vote for your final option where anyone who wants this has to add their own credentials / API key - this makes it fully opt-in and leaves no chance of anyone enabling it by accident, and it means only those folks who really want to use it will be able to - while still not excluding anyone from doing if, e.g. they have disabilities or other requirements which make this the only way they can sensibly add alt-text. It also makes it a lot easier from your side, as it's purely done within the client app and no additional server-side infrastructure is necessary.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ALT field filling via AI bot #1089

{{title}}

Replies: 2 comments

{{title}}

{{title}}

Select a reply

ALT field filling via AI bot #1089

zero0n3 Nov 12, 2024

Replies: 2 comments

nikclayton Nov 14, 2024 Maintainer

keefmarshall Nov 14, 2024

zero0n3
Nov 12, 2024

nikclayton
Nov 14, 2024
Maintainer

keefmarshall
Nov 14, 2024