More local model options #2
Hi, the util function here https://github.com/yangkevin2/doc-story-generation/blob/main/story_generation/common/util.py#L927 interfaces with Alpa to get next-token logprobs. You could try changing it to use your local model instead. Just be aware that the quality of the generated text might be a lot worse with a much smaller model, though. Thanks,
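A minimal sketch of the swap described above, assuming a Hugging Face `transformers` causal LM and its tokenizer are already loaded on the local GPU. The linked util.py function's real name, signature, and return format are not shown in this thread, so `top_k_logprobs` and `hf_next_token_logprobs` are hypothetical names. The `(token_text, logprob)` list they return may need reshaping to match what that function's callers expect.

```python
import math
from typing import Callable, List, Sequence, Tuple


def top_k_logprobs(
    logits: Sequence[float], decode: Callable[[int], str], k: int = 5
) -> List[Tuple[str, float]]:
    """Convert raw next-token logits into the k most likely (token_text, logprob) pairs."""
    # Numerically stable log-softmax: log Z = m + log(sum(exp(x - m))).
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    # Token ids ordered from most to least likely; keep the first k.
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    return [(decode(i), logits[i] - log_z) for i in top_ids]


def hf_next_token_logprobs(model, tokenizer, prompt: str, k: int = 5) -> List[Tuple[str, float]]:
    """Hypothetical drop-in: query a local Hugging Face causal LM instead of an Alpa server."""
    import torch  # lazy import so top_k_logprobs works without torch installed

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # logits has shape [batch, seq_len, vocab]; take the last position of the only sequence.
        last_logits = model(**inputs).logits[0, -1]
    return top_k_logprobs(last_logits.float().tolist(), tokenizer.decode, k)
```

Sorting the whole vocabulary in plain Python keeps the helper dependency-free but is slow. In practice, `torch.topk` on the GPU tensor would be the faster choice.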
Thank you! I created a branch and am now navigating my own personal dependency purgatory (AMD GPU, ROCm, accelerate, bitsandbytes 8-bit, etc.). Thanks for your guidance; I will remember you if it makes cool stories.
Hi Kevin and anyone else, there are also a lot of OpenAI calls in various functions. Maybe I can find and change them all, but this has turned into more of a weekend project than an afternoon one, so replacing every OpenAI call with a local model.generate equivalent will take me a while. I may return to it a little at a time; having dug a little deeper, my best guess is a few weeks to completion. Anyone else, feel free to look at my branch and suggest model.generate replacements for the OpenAI calls. -Ben
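One way to make the OpenAI-to-local replacement mechanical is to route every call site through a single shim. The sketch below assumes the calls are plain text completions that take a prompt plus `max_tokens`, `temperature`, `n`, and `stop`. The thread does not list the actual calls, and anything relying on text-davinci-002's suffix or logprob options would need separate handling. `LocalCompleter`, `truncate_at_stop`, and `make_hf_generate` are hypothetical names, not the `openai` package's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

# Hypothetical backend signature: (prompt, max_new_tokens, temperature) -> continuation text only.
GenerateFn = Callable[[str, int, float], str]


def truncate_at_stop(text: str, stop: Optional[Sequence[str]]) -> str:
    """Cut text at the earliest stop sequence, mimicking a completion API's `stop` option."""
    if not stop:
        return text
    hits = [i for i in (text.find(s) for s in stop if s) if i != -1]
    return text[: min(hits)] if hits else text


@dataclass
class LocalCompleter:
    """Hypothetical stand-in for call sites that currently hit the OpenAI completion API."""

    generate: GenerateFn

    def complete(
        self,
        prompt: str,
        max_tokens: int = 256,
        temperature: float = 1.0,
        n: int = 1,
        stop: Optional[Sequence[str]] = None,
    ) -> List[str]:
        # Call the local model n times, since one generate call returns one sample here.
        return [
            truncate_at_stop(self.generate(prompt, max_tokens, temperature), stop)
            for _ in range(n)
        ]


def make_hf_generate(model, tokenizer) -> GenerateFn:
    """Wrap a Hugging Face model's generate() as a GenerateFn (needs transformers + torch)."""

    def generate(prompt: str, max_new_tokens: int, temperature: float) -> str:
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=temperature > 0,
            temperature=temperature if temperature > 0 else 1.0,
        )
        # Drop the prompt tokens so only the continuation is returned, as completion APIs do.
        new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
        return tokenizer.decode(new_tokens, skip_special_tokens=True)

    return generate
```

With this in place, each OpenAI call becomes something like `completer.complete(prompt, max_tokens=..., stop=[...])`. Swapping backends then only touches `make_hf_generate`.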
Oh, yeah, if you don't want to use the GPT-3 API at all, you'll have to replace all of those. Sorry, I thought you meant just the Alpa stuff. As an additional note, using local models on a 16GB GPU will also pretty seriously compromise the quality of the resulting outputs, especially the plan/outline generation. I'm not convinced that part would work at all without an instruction-tuned model (specifically text-davinci-002, since it supports suffix context in addition to a prompt). And in our preliminary experiments using "smaller" 13B models for the main generation procedure, the story quality was quite a bit worse too.
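Local causal models generally accept only a left-to-right prompt, so text-davinci-002's suffix context has no direct equivalent. One common workaround is to fold the suffix into the prompt itself. This is not something the repository does, and no quality claim comes with it; the doubts above about non-instruction-tuned models apply. The function name and the bracketed-target format are assumptions.

```python
def build_suffix_aware_prompt(
    prefix: str, suffix: str, label: str = "The passage below must lead into"
) -> str:
    """Emulate suffix context for a model that only accepts a left-to-right prompt.

    The suffix is stated first as a target, and the prefix goes last so the model's
    continuation begins exactly where the missing text should start.
    """
    if not suffix.strip():
        return prefix  # nothing to condition on; behave like a plain prompt
    return f"[{label}: {suffix.strip()}]\n\n{prefix}"
```

Nothing forces the model to actually arrive at the suffix, so outputs built this way would still need checking or cherry-picking.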
In reply to Kevin Yang's comment of Wed, Mar 15, 2023, 12:52 PM: I have high hopes of getting about 1 good generation out of fewer than 20 attempts with today's models. I agree that, in general, the quality will require more cherry-picking of outputs (reprompting?). With the improvements coming to smaller models (such as LLaMA-13B competing with the 175B GPT-3), getting a fully functional single-GPU storyteller before new models come out seems worthwhile to me. The more I consider the work's potential, the happier I am with the concepts in your paper. And it's open source!
Yeah, if you're willing to do a bit of manual cherry-picking/interaction, then the requirements on model quality definitely go down significantly. I haven't tested the new LLaMA models, but I agree they'd likely work better than the ones we tried previously (e.g., GPT-13B, which is not instruction-tuned). I'd be curious to hear how it goes if you do end up trying that out. Glad you enjoyed the work!
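A minimal sketch of the manual cherry-picking loop discussed above, using a hypothetical helper `cherry_pick` that takes any zero-argument sampler (for example, a local model call with sampling enabled). For sizing the batch, assume each sample independently has about a 1-in-20 chance of being good, per the estimate earlier in the thread. A batch of 20 then contains at least one good draft with probability 1 - 0.95^20, or about 0.64. Larger batches or several rounds may therefore be needed.

```python
from typing import Callable, List, Optional


def cherry_pick(
    generate: Callable[[], str],
    n: int = 20,
    choose: Optional[Callable[[List[str]], int]] = None,
) -> str:
    """Sample n candidate passages and keep the one selected by `choose`.

    `choose` receives every candidate and returns the index to keep. When it is
    omitted, a human picks interactively from the console.
    """
    candidates = [generate() for _ in range(n)]
    if choose is None:
        for i, text in enumerate(candidates):
            print(f"--- candidate {i} ---\n{text}\n")
        index = int(input(f"Keep which candidate (0-{n - 1})? "))
    else:
        index = choose(candidates)
    return candidates[index]
```

Passing a scoring function as `choose` turns the same loop into automatic best-of-n reranking. Leaving it out keeps a human in the loop.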
Hello,
Which lines would one change to use model.generate with a local model on the same host?
I have a 16GB VRAM gaming GPU and have run local inference with bloomz-7B, RWKV 14B, and Pythia 12B.
I want to be able to change just a few lines to generate from a local model instead of hosting an Alpa-served version.
Thanks for your thoughts and consideration.
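A back-of-the-envelope check of what fits in 16 GB: the weights alone take roughly (number of parameters) × (bytes per parameter). The sketch below assumes the nominal sizes in the model names and ignores everything except the weights. The function name `weight_memory_gb` is illustrative.

```python
# Approximate storage per parameter for common weight formats.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, dtype: str = "fp16") -> float:
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes).

    Activations, the KV cache, and framework overhead come on top of this,
    so treat the result as a lower bound on VRAM use.
    """
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


# Nominal sizes taken from the model names; exact parameter counts differ slightly.
for name, n_params in [("bloomz-7B", 7e9), ("Pythia 12B", 12e9), ("13B (e.g. LLaMA-13B)", 13e9), ("RWKV 14B", 14e9)]:
    fp16 = weight_memory_gb(n_params, "fp16")
    int8 = weight_memory_gb(n_params, "int8")
    print(f"{name}: fp16 ~{fp16:.0f} GB, int8 ~{int8:.0f} GB")
```

By this estimate, only the 7B model's fp16 weights (about 14 GB) fit under 16 GB. The 12-14B models need 8-bit (or smaller) weights or CPU offloading, which is why the bitsandbytes 8-bit route mentioned earlier matters on this hardware.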