
Add support for llama.cpp completions #164

Closed
@ShelbyJenkins

Description

Happy to have this crate!

I have it working with llama.cpp in server mode, as documented here: https://github.com/ggerganov/llama.cpp/tree/master/examples/server.
You just create the client like this:

use async_openai::{config::OpenAIConfig, Client};
use backoff::ExponentialBackoffBuilder;

fn setup_client() -> Client<OpenAIConfig> {
    // Retry transient request failures for up to 60 seconds.
    let backoff = ExponentialBackoffBuilder::new()
        .with_max_elapsed_time(Some(std::time::Duration::from_secs(60)))
        .build();

    // Point the client at the local llama.cpp server instead of api.openai.com.
    // llama.cpp ignores the API key, so an empty string is fine.
    let config = OpenAIConfig::new()
        .with_api_key("")
        .with_api_base(format!("http://{}:{}/v1", server::HOST, server::PORT));

    Client::with_config(config).with_backoff(backoff)
}
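
For reference, here is a minimal sketch of a chat completion call through that client. The builder names follow recent async-openai releases and may differ slightly by version, and the model string is a placeholder, since llama.cpp serves whichever model it was launched with.

use async_openai::types::{
    ChatCompletionRequestUserMessageArgs, CreateChatCompletionRequestArgs,
};

async fn ask(client: &Client<OpenAIConfig>) -> Result<String, Box<dyn std::error::Error>> {
    // llama.cpp ignores the model field, but async-openai requires one,
    // so any placeholder string works.
    let request = CreateChatCompletionRequestArgs::default()
        .model("local-model")
        .messages([ChatCompletionRequestUserMessageArgs::default()
            .content("Say hello.")
            .build()?
            .into()])
        .build()?;

    let response = client.chat().create(request).await?;
    Ok(response.choices[0]
        .message
        .content
        .clone()
        .unwrap_or_default())
}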

However, it only works with llama.cpp's /v1/chat/completions endpoint, and that endpoint lacks some features (notably logit bias). The native /completion endpoint, which exposes all the extra features, does not work with this crate.
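
For comparison, the native endpoint can already be reached with a plain HTTP client; below is a hypothetical sketch using reqwest (with its json feature) and serde_json. The field names (prompt, n_predict, logit_bias as [token_id, bias] pairs) are taken from the llama.cpp server README and may change as that project moves.

use serde_json::json;

// Sketch: call llama.cpp's native /completion endpoint directly.
async fn raw_completion(host: &str, port: u16) -> Result<serde_json::Value, reqwest::Error> {
    let body = json!({
        "prompt": "Building a website can be done in 10 simple steps:",
        "n_predict": 128,
        // Bias a specific token id; this option is not exposed on
        // llama.cpp's /v1/chat/completions endpoint.
        "logit_bias": [[15043, -1.0]]
    });

    reqwest::Client::new()
        .post(format!("http://{}:{}/completion", host, port))
        .json(&body)
        .send()
        .await?
        .json::<serde_json::Value>()
        .await
}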

I don't know if this is a tenable long-term solution, but since the Rust llama.cpp crates haven't been updated for months and the llama.cpp library is moving very quickly, I was reluctant to rely on crates that would need an overhaul for every upstream change. This approach seemed more stable over the long term, since the local server's API probably won't change as often.

I think this crate has the potential to be a good base for building projects that rely on multiple APIs as the industry moves towards a standard. I'm interested to hear your thoughts, and if it's viable, I'm happy to contribute anything I create.

Metadata

Labels

out of scope (Requests which are not related to OpenAI API), wontfix (This will not be worked on)
