Code in the Matrix - Neovim llama.cpp Plugin

Info
The cover image is AI generated.

Table of Contents

Setup

I’m broke, I have a slow workstation laptop, and I don’t have a Claude Code subscription - if you have those, then you’re in the wrong place, my friend.

There are other Neovim plugins, like ollama.nvim, that can handle prompting and generation for you - just make sure your server/PC can handle it. In this setup we will use llama.cpp and llama.vim for code completions - suitable for low-end hardware.

llama.cpp

I have an old Nvidia GPU that llama.cpp no longer supports, so we need llama.cpp built with Vulkan instead.

environment.systemPackages = with pkgs; [
  # llama-cpp          # plain package from pkgs, no Vulkan
  (llama-cpp.override {
    vulkanSupport = true;
  })
];

Also, Vulkan support is not enabled by default in the upstream nixpkgs package, so we need to override the build. Add this at the top of your configuration.nix file.

{ config, pkgs, ... }:

let
  # Override llama-cpp to enable Vulkan GPU support
  llamaCppVulkan = pkgs.llama-cpp.override {
    vulkanSupport = true;
    cudaSupport = false;
    rocmSupport = false;
  };
in
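The `in` is followed by the rest of your configuration, where `llamaCppVulkan` is now in scope. A minimal sketch of how it continues:

```nix
{
  # llamaCppVulkan can be referenced anywhere below the let-binding,
  # e.g. as a system package or (as we do later) as the
  # services.llama-cpp package.
  environment.systemPackages = [ llamaCppVulkan ];
}
```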

Model

Check the llama.vim repo for model recommendations.

For my broke setup I used ggml-org_Qwen2.5-Coder-1.5B-Q8_0-GGUF_qwen2.5-coder-1.5b-q8_0.gguf.

llama-server

Add the llama-server service. Tune the flags based on your workstation/server specs.

  # llama-cpp server service
  services.llama-cpp = {
    enable = true;
    package = llamaCppVulkan;
    model = "/srv/nvme/llm/models/ggml-org_Qwen2.5-Coder-1.5B-Q8_0-GGUF_qwen2.5-coder-1.5b-q8_0.gguf";
    host = "0.0.0.0";
    port = 8080;
    extraFlags = [
      "-c" "2048"           # context size
      "-ngl" "24"           # layers offloaded to the GPU
      "--threads" "8"
      "--parallel" "1"
      "--batch-size" "128"
      "--ubatch-size" "128"
      "--no-mmap"
    ];
  };
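Since the server binds to 0.0.0.0, it is reachable from the network. If other machines should use it, you likely also need to open the port in the NixOS firewall - a sketch, assuming the default firewall is enabled (skip it if you only connect from localhost):

```nix
# Allow other hosts on the network to reach llama-server
networking.firewall.allowedTCPPorts = [ 8080 ];
```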

Neovim

Add this to your configuration. Change the endpoint if you are hosting llama-server on a different host.

vim.g.llama_config = {
  endpoint = "http://127.0.0.1:8080/infill",
}
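llama.vim exposes more knobs than just the endpoint; on low-end hardware it can help to cap the suggestion length. A sketch - the option names below (`n_predict`, `auto_fim`) are taken from the llama.vim README, so verify them against the repo before relying on them:

```lua
vim.g.llama_config = {
  endpoint = "http://127.0.0.1:8080/infill",
  n_predict = 64,   -- cap tokens generated per suggestion
  auto_fim = true,  -- trigger fill-in-the-middle completion automatically
}
```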

Add llama-vim from pkgs.vimPlugins.

llama-vim
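For context, a sketch of where that entry goes, assuming you manage Neovim through Home Manager's `programs.neovim` module (adapt if you build Neovim differently):

```nix
programs.neovim = {
  enable = true;
  # llama-vim comes from pkgs.vimPlugins
  plugins = with pkgs.vimPlugins; [ llama-vim ];
};
```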

I’m using the default key bindings. Again, check the repo for custom key bindings.

Demo