doc/vimqq.txt

*vimqq.txt*  For Vim 8.0+, Neovim ?? Last change: 2024-09-16

VIMQQ ~

AI plugin for Vim/NeoVim with focus on local evaluation, flexible context
and aggressive cache warmup to hide latency.
Version: 0.0.7
Author: Oleksandr Kuvshynov

1. Introduction ........................................ |vimqq-intro|
2. Installation ........................................ |vimqq-install|
  2.1. Groq ............................................ |vimqq-groq|
  2.2. Local ........................................... |vimqq-local|
  2.3. Claude .......................................... |vimqq-claude|
  2.4. Mistral ......................................... |vimqq-mistral|
3. Usage ............................................... |vimqq-usage|
4. Commands ............................................ |vimqq-commands|
  4.1. Message sending ................................. |vimqq-commands-msg|
  4.2. Forking ......................................... |vimqq-commands-fork|
  4.3. UI Controls ..................................... |vimqq-commands-ui|
5. Mappings ............................................ |vimqq-mappings|
6. Configuration ....................................... |vimqq-config|
7. Changelog ........................................... |vimqq-changelog|

==============================================================================
1. Introduction                                                    *vimqq-intro*


Features:
 - Support for both remote models through paid APIs (Groq, Claude) 
   and local models via llama.cpp server;
 - automated KV cache warmup for local model evaluation;
 - dynamic warmup on typing - in case of long questions, it is a good idea
   to prefill cache for the question itself.
 - mixing different models in the same chat sessions. It is possible to send 
   original message to one model and use a different model for the follow-up
   questions.
 - ability to fork the existing chat, start a new branch of the chat and 
   reuse server-side cache for the shared context.
 - Support both Vim and NeoVim;

What vimqq is not doing:
 - providing any form of autocomplete.  
 - generating code in place, typing it in editor directly, all communication
   is done in the chat buffer.


==============================================================================
2. Installation                                                  *vimqq-install*


Using |packages|:

Copy over the plugin itself:
>
    git clone https://github.com/okuvshynov/vimqq.git ~/.vim/pack/plugins/start/vimqq

The command above makes vimqq automatically loaded at vim start

Update helptags in vim:
>
    :helptags ~/.vim/pack/plugins/start/vimqq/doc

Using vim-plug:
>
    Plug 'okuvshynov/vimqq'

vimqq will not work in 'compatible' mode. 'nocompatible' needs to be set.


==============================================================================
2.1 Installation: Groq                                              *vimqq-groq*


This might be the easiest option to try it out at the moment for free. 

 - Register and get API key at https://console.groq.com/keys
 - Save that key to `$GROQ_API_KEY` environment variable. 
 - Add the configuration to your vimrc:

>
    let g:vqq_groq_models = [
	  \  {'bot_name': 'groq', 'model': 'llama-3.1-70b-versatile'}
    \]


==============================================================================
2.2 Installation: Local                                            *vimqq-local*


To use local models, get and build llama.cpp server
>
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    make -j 16

Download/prepare the models and start llama server:
>
    ./llama.cpp/llama-server
      --model path/to/model.gguf
      --chat-template llama3
      --host 0.0.0.0
      --port 8080

Add a bot endpoint configuration to vimrc file, for example
>
    let g:vqq_llama_servers = [
          \  {'bot_name': 'llama', 'addr': 'http://localhost:8080'},
    \]

It is possible to have multiple bots with different names.


==============================================================================
2.3 Installation: Claude                                          *vimqq-claude*


To use claude models, register and get API key. By default vimqq will look for 
API key in environment variable `$ANTHROPIC_API_KEY`. It is possible to 
override the default with 
>
    let g:vqq_claude_api_key = ...

Bot definition looks similar:
>
    let g:vqq_claude_models = [
          \  {'bot_name': 'sonnet', 'model': 'claude-3-5-sonnet-20240620'}
    \]


==============================================================================
2.4 Installation: Mistral                                        *vimqq-mistral*


To use mistral models using the API, register and get API key. By default
vimqq will look for API key in environment variable `$MISTRAL_API_KEY`.
It is possible to override the default with 
>
    let g:vqq_mistral_api_key = ...

Bot definition looks similar:
>
    let g:vqq_mistral_models = [
          \  {'bot_name': 'mistral', 'model': 'mistral-large-latest'}
    \]

==============================================================================
3. Usage                                                           *vimqq-usage*  


Assuming we have a single bot named "groq", we can start by asking a question:
>
    :Q @groq What are basics of unix philosophy?

We can omit the tag as well, and the first configured bot will be used:
>
    :Q What are basics of unix philosophy?

After running this command new window should open in vertical split showing
the message sent and response. If it is local llama bot, we should see
the reply being appended token by token.

Pressing "q" in the window would change the window view to the list of 
past messages. Up/Down or j/k can be used to navigate the list and <cr>
can be used to select individual chat.

Chat title is generated by the same model after the first response.


==============================================================================
4. Commands                                                     *vimqq-commands*  

------------------------------------------------------------------------------
4.1. Message sending                                        *vimqq-commands-msg*

First group of commands are used to send messages.
    - `:QQ` sends/warms up a message with range provided
    - `:Q`  sends/warms up a message with no range
    - `:QQN` sends/warms up a message with range provided in new chat
    - `:QN`  sends/warms up a message with no range in new chat

The format for calling both commands is similar, except for the range part
>
    :Q [@bot_name] message

------------------------------------------------------------------------------
4.2. Forking                                               *vimqq-commands-fork*

`QF` command forks the current chat reusing the context from the first message.
It is useful in cases of long context, but when you want to start a new
discussion thread. For example,
>
    :QF Suggest a simple task for junior engineer working on the project

It will:
  - take current chat's first message, keep the context and bot 
  - modify the question with 'Suggest a ...'
  - create new chat
  - append amended message to new chat
  - send new chat to the original bot
This way we can reuse the context, which might be long (entire file/project)

Note: this command might get merged to the Q/QQ commands. Similar to `n` 
meaning 'ask in new chat' we might add an option to 'ask in fork'.

------------------------------------------------------------------------------
4.3. UI Control                                              *vimqq-commands-ui*

    - `:QQList` will show the list of past chat threads. User can navigate the
      list and select chat session by pressing <CR>. This action will make chat
      session current.

    - `:QQOpenChat chat_id` opens chat with id=`chat_id`. This action will make
      chat session current.

    - `:QQToggle` shows/hides vimqq window.


==============================================================================
5. Mappings                                                     *vimqq-mappings*  

In chat list view:
 - 'd' will show a dialog confirmation and delete chat if user answers Yes.
 - <cr> will select chat session under cursor, open it and make current.


==============================================================================
6. Configuration                                                  *vimqq-config*  

Configuration is done using global variables with prefix `g:vqq`.

    - `g:vqq_llama_servers` - list of llama.cpp bots. Default is empty.
      Each bot configuration is a dictionary with the following attributes:
      - `bot_name`. string identification for this bot.
        Must be unique and consist of [a-zA-Z0-9_] symbols. Required
      - `addr`. Path to endpoint, in the http://host:port format. Required.
      - `healthcheck_ms`. How often to do healthcheck query.
        Optional, default is 10000ms (=10s)
      - `title_tokens`. When generating title for chat, up to this many
        tokens can be produced. Optional, default is 16.
      - `max_tokens`. When producing response, up to this many tokens can
        be produced. Optional, default is 1024.
      - `system_prompt`. System prompt to include in the query and steer 
        bot behavior. Optional, default is "You are a helpful assistant".

    - `g:vqq_claude_models` - list of claude bots. Default is empty.
      Each bot configuration is a dictionary with the following attributes:
      - `model`. Model to use, must be one the models supported by Claude.
        Check https://docs.anthropic.com/en/docs/about-claude/models for 
        the model list. Required.
      - `bot_name`. string identification for this bot.
        Must be unique and consist of [a-zA-Z0-9_] symbols. Required
      - `title_tokens`. When generating title for chat, up to this many
        tokens can be produced. Optional, default is 16.
      - `max_tokens`. When producing response, up to this many tokens can
        be produced. Optional, default is 1024.

    - `g:vqq_default_bot`. bot_name of the bot used by default, if user
      omits the tag. Optional, default is first bot.

    - `g:vqq_warmup_on_chat_open`. List of bot names for which to issue 
      a warmup query automatically, every time we opened chat history
      for some chat session. Useful for resuming old sessions. Default
      is empty list.

    - `g:vqq_context_template`. Template to use to construct the message
      if some code needs to be included as a context. Replaces placeholders
      {vvq_ctx} and {vvq_msg} with context (selected code) and message.
      Optional, default is "Here's a code snippet: \n\n{vqq_ctx}\n\n{vqq_msg}"

    - `g:vqq_width`. Default chat window width in characters.
      Optional, defutlt is 80.

    - `g:vqq_time_format`. Time format to use in chat sessions list.
      Optional, default is "%b %d %H:%M ", for example "July 23, 18:05"

    - `g:vqq_chats_file`. Path to json file to store message history.
      Optional, default is expand('~/.vim/vqq_chats.json')

    - `g:vqq_context_filetypes`. When extracring project-level context,
      this list of patterns will be used to include files. 
      Default: "*.vim,*.txt,*.md,*.cpp,*.c,*.h,*.hpp,*.py,*.rs"

    - `g:vqq_autowarm_cmd_ms`. When doing warmup queries vimqq can keep
      sending warmup queries as user types the message. It is useful in
      the cases of longer messages with detailed instructions. This logic
      is based on two events: previous warmup finished and message in the
      command-line has changed. If both of them are true, new warmup will
      be sent. To the best of my knowlegde, vim doesn't provide a way to
      listen to the cmdline update events, but we can check its value
      periodically. This parameter defines this period. Default - 1000ms.

    - `g:vqq_log_file`. File location where to append logs.
      Default is `~/.vim/vimqq.log` 

    - `g:vqq_log_level`. Controls how much to log. Can be one of
      'DEBUG', 'INFO', 'ERROR', 'NONE'. All messages logged with 
      configured level or higher will be appended to the file.


==============================================================================
6. Example basic config                                   *vimqq-config-example*

>
    let g:vqq_llama_servers = [
    \{
    \  	'bot_name': 'local',
    \	'addr': 'http://localhost:8080',
    \ 	'do_autowarm': v:true
    \}
    \]

    " chat [l]ist
    nnoremap <leader>ll :<C-u>QQList<cr>

Autowarm will allow plugin to start sending warmup queries to local server
as user types, hiding the latency.

==============================================================================
7. Changelog                                                   *vimqq-changelog*  

* version 0.0.6 (2024-09-16)
  - Support for version control context based on git blame
  - Bug fixes
* version 0.0.7 (2024-09-24)
  - Support for Groq API
  - Automated tests with mock llama server
  - Bug fixes
* version 0.0.8 (2024-12-19)
  - Autowarmup improvements
  - Bug fixes

 vim:tw=78:ts=8:ft=help:noet:nospell