Code in the Matrix - Neovim llama.cpp Plugin
Table of Contents
Setup
I’m broke, I have a slow workstation laptop, and I don’t have a Claude Code subscription - if you have those, then you’re in the wrong place, my friend.
There are other Neovim plugins, like ollama.nvim, that can handle prompting and generation for you - just make sure your server/PC can handle it. In this setup we will use llama.cpp and llama.vim for code completion - a good fit for low-end hardware.
llama.cpp
I have an old Nvidia GPU that llama.cpp’s CUDA backend no longer supports, so we need the Vulkan build instead.
```nix
environment.systemPackages = with pkgs; [
  (llama-cpp.override {
    vulkanSupport = true;
  })
];
```
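The Vulkan build still needs a working Vulkan driver on the system. On NixOS that means enabling graphics support - a minimal sketch, assuming a recent release (the option was called `hardware.opengl.enable` before NixOS 24.11):

```nix
# Provide the system-wide graphics/Vulkan driver stack
hardware.graphics.enable = true;
```

If the driver is set up correctly, `vulkaninfo --summary` (from the `vulkan-tools` package) should list your GPU.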
The llama-cpp package in nixpkgs is not built with Vulkan support by default, so we override its build. Add this at the top of your configuration.nix file:
```nix
{ config, pkgs, ... }:

let
  # Override llama-cpp to enable Vulkan GPU support
  llamaCppVulkan = pkgs.llama-cpp.override {
    vulkanSupport = true;
    cudaSupport = false;
    rocmSupport = false;
  };
in
```
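For context, here is how the binding slots into the rest of the module - a minimal sketch, with the service options left as a placeholder:

```nix
{ config, pkgs, ... }:

let
  # Vulkan-enabled build of llama-cpp, reusable below
  llamaCppVulkan = pkgs.llama-cpp.override { vulkanSupport = true; };
in
{
  # Also makes the CLI tools (llama-cli, llama-server, ...) available
  environment.systemPackages = [ llamaCppVulkan ];

  # services.llama-cpp = { ... };  # configured in the llama-server section
}
```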
Model
Check the llama.vim repo for model recommendations.
For my broke setup I used ggml-org_Qwen2.5-Coder-1.5B-Q8_0-GGUF_qwen2.5-coder-1.5b-q8_0.gguf.
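The model has to exist at the path the service config points at. One way to fetch it - the Hugging Face repo and file name here are my assumption based on the local filename above, so verify them before running:

```shell
# Download the GGUF model into the directory used by the service config
mkdir -p /srv/nvme/llm/models
curl -L -o /srv/nvme/llm/models/ggml-org_Qwen2.5-Coder-1.5B-Q8_0-GGUF_qwen2.5-coder-1.5b-q8_0.gguf \
  "https://huggingface.co/ggml-org/Qwen2.5-Coder-1.5B-Q8_0-GGUF/resolve/main/qwen2.5-coder-1.5b-q8_0.gguf"
```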
llama-server
Add the llama-server service. Tune the flags based on your workstation/server specs.
```nix
# llama-cpp server service
services.llama-cpp = {
  enable = true;
  package = llamaCppVulkan;
  model = "/srv/nvme/llm/models/ggml-org_Qwen2.5-Coder-1.5B-Q8_0-GGUF_qwen2.5-coder-1.5b-q8_0.gguf";
  host = "0.0.0.0";
  port = 8080;
  extraFlags = [
    "-c" "2048"       # context size
    "-ngl" "24"       # layers offloaded to the GPU
    "--threads" "8"
    "--parallel" "1"
    "--batch-size" "128"
    "--ubatch-size" "128"
    "--no-mmap"
  ];
};
```
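Since `host = "0.0.0.0"` binds on all interfaces, clients on other machines can only reach the server if the port is open in the firewall - a sketch, assuming the default NixOS firewall (skip this if Neovim runs on the same host):

```nix
# Allow LAN clients to reach llama-server
networking.firewall.allowedTCPPorts = [ 8080 ];
```

After a `nixos-rebuild switch`, a quick sanity check is `curl http://127.0.0.1:8080/health`, which should report the server as ready once the model has loaded.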
Neovim
Add this to your Neovim configuration. Change the endpoint if you are hosting llama-server on a different host.
```lua
vim.g.llama_config = {
  endpoint = "http://127.0.0.1:8080/infill",
}
```
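llama.vim exposes more tunables through the same table. A sketch with a few options I found useful - these option names are from the llama.vim README at the time of writing, so verify them against the repo before copying:

```lua
vim.g.llama_config = {
  endpoint = "http://127.0.0.1:8080/infill",
  n_predict = 64,   -- cap on generated tokens per suggestion
  auto_fim = true,  -- trigger fill-in-the-middle completion while typing
  show_info = 0,    -- hide the inline performance stats
}
```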
Add `llama-vim` to your plugin list in pkgs.vimPlugins:

```nix
llama-vim
```
I’m using the default key bindings. Again, check the repo if you want custom key bindings.