Replies: 2 comments
-
First off, I'm not sure that:
is at all accurate. Putting that aside, and ignoring (for the moment) any concerns people might have about using AI to do this (e.g., using systems trained on data without recompense to the original authors or considering the license, the energy costs of AI, etc), it's also not clear that this helps. The most comprehensive report I could find was https://stefanbohacek.com/blog/impact-of-fediverse-clients-on-the-use-of-alt-text/. This shows one client that does use AI to allow people to generate image descriptions, but the percentage of posts from this app with image descriptions is not higher than posts from people using other clients (Phanpy, etc). That comes with a number of caveats (difficulty of determining client, people predisposed to provide captions might gravitate to a particular client, etc). There are also technical challenges to doing this. On their own they're not especially difficult, but they do incur an ongoing cost on the project. Google's MLKit can label images (https://developers.google.com/ml-kit/vision/image-labeling) but this is not the same thing as generating an effective caption. Ice Cubes, an iOS client, does do this, and has a relevant blog post about they use OpenAI's Vision API to do this (https://dimillian.medium.com/adding-ai-generated-image-description-to-ice-cubes-c4e7990a5915). As a thought experiment, doing that in Pachli would require at least:
A vastly simpler approach would be to require the user to sign up for an OpenAI account themselves, create an API key, and then paste this key in to Pachli. Then the user becomes responsible for any usage. While that's a lot safer, it's also more user hostile -- my intuition is that very few users would bother to do this. I might be wrong about this. |
Beta Was this translation helpful? Give feedback.
-
My take on this: AI use has some quite controversial aspects as you mention above. Plus, Pachli feels like it should be just a mobile app, and shouldn't really need a lot of server-side infrastructure that has to be paid for and maintained. But, if this is something people are likely to do anyway (i.e. paste an image into a Chatbot and use the description) then it would be simpler to do that for them. As with the translation feature, it would be nice if this was something offered by the home Mastodon/Fediverse instances, so Pachli could just hook into that, and users could make their instance choices according to features - but that doesn't seem to be a thing. If it was me, I would vote for your final option where anyone who wants this has to add their own credentials / API key - this makes it fully opt-in and leaves no chance of anyone enabling it by accident, and it means only those folks who really want to use it will be able to - while still not excluding anyone from doing if, e.g. they have disabilities or other requirements which make this the only way they can sensibly add alt-text. It also makes it a lot easier from your side, as it's purely done within the client app and no additional server-side infrastructure is necessary. |
Beta Was this translation helpful? Give feedback.
-
Hello
would it be possible to make the ALT field of the images automatically filled in by an AI bot, during the image upload phase on the client, before publishing?
the ALT field is designed to be filled in "brutally" with the description of the image, without any human imagination. It is a mere description of the photo. And the latest AI solutions, based on the tests done, perfectly cope with this simple task.
I therefore wonder if it were possible to make sure that during the upload phase of the photos, these are also checked in real time and described directly by a bot.
Beta Was this translation helpful? Give feedback.
All reactions