Collie - Index-Backed Code Search

collie

Incrementally indexed code search.
Specialized for your projects.
Faster than grep / ripgrep for interactive search.

cargo install collie-search

github.com/suleymanozkeskin/collie

Under-optimised search

ripgrep on a 29K-file repo (Kubernetes). Each query re-walks and re-searches candidate files.

$ rg -l "kubelet"

744–878 ms

Full scan, varies with cache

$ rg -l "context"

403–610 ms

Same cost, different query

$ collie search "kubelet"

12 ms

Index lookup, every time

Fresh run on the current build: rg still lands in the 400–880 ms range, while Collie stayed under 14 ms on exact, prefix, and suffix token queries.
grep and rg also have no understanding of code structure: they match raw text only.

What is Collie?

A lexical and structural search CLI that indexes your codebase once, keeps it current
while the daemon runs, and auto-stops when you're done.

Not a replacement for traditional search: it is specialized for agentic development and indexed per project / worktree.

~10ms

Typical lexical latency on 29K files

11

Languages with symbol extraction

4

Search modes

How it works

# Start the daemon (indexes + watches for changes)
$ collie watch .

# Search instantly
$ collie search "handler"

# Stop when done (or auto-stops after 30min idle)
$ collie stop .

Index once → query faster → incremental updates via FSEvents

Four search modes

Token

Keyword search with % wildcards

$ collie search "kube%"           # prefix
$ collie search "%config"         # suffix
$ collie search "handle request"  # multi-term AND
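
The wildcard forms above can be read as simple string predicates. The following is a minimal Python sketch, assuming `%` appears only at the start and/or end of a term and that a multi-term query requires every term to match some token in the file. `term_matches` and `file_matches` are hypothetical names used for illustration; Collie resolves these queries against its index rather than a token list, and case handling is left out here because the slides do not specify it.

```python
def term_matches(term: str, token: str) -> bool:
    """Match one query term against one token, treating % as a wildcard."""
    leading = term.startswith("%")
    trailing = term.endswith("%") and len(term) > 1
    core = term.strip("%")
    if not core:
        return False                    # a bare "%" carries no information
    if leading and trailing:
        return core in token            # %request%  -> substring
    if trailing:
        return token.startswith(core)   # kube%      -> prefix
    if leading:
        return token.endswith(core)     # %config    -> suffix
    return token == core                # kubelet    -> exact token


def file_matches(query: str, tokens: set[str]) -> bool:
    """Multi-term AND: every term must match at least one token."""
    return all(
        any(term_matches(term, tok) for tok in tokens)
        for term in query.split()
    )
```

Under these assumptions, exact, prefix, and suffix terms map naturally onto lookups in a sorted term dictionary, while a substring term has to consider many more tokens, which is one plausible reason a substring query is slower than the other forms.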

Symbol

Structural queries via Tree-sitter. Filter by kind: lang: path: qname:

$ collie search "kind:fn handler"
$ collie search "kind:struct lang:go"
$ collie search "kind:method qname:Server::run"

Regex

Full regex, index-accelerated. Literal fragments narrow files first.

$ collie search -e "func\s+\w+Handler"
$ collie search -e "TODO|FIXME|HACK"  # multi-pattern
$ collie search -e "impl.*Error" -i  # case-insensitive

Symbol + Regex

Structural narrowing + regex refinement via --symbol-regex

$ collie search "kind:fn %Handler" --symbol-regex '\*.*Server'

Structural query vocabulary

Filters

kind:fn|method|class|struct|enum|interface|trait
     variable|field|property|constant|module|type|import
lang:go|rust|python|typescript|c|cpp|java|csharp|ruby|zig
path:src/api/              # scope to directory
qname:Server::run          # qualified name

Examples

$ collie search "kind:fn lang:go path:pkg/api/ init"
$ collie search "kind:struct Config"
$ collie search "kind:method qname:HollowKubelet::Run"
$ collie search "kind:trait path:src/ %Error%"

14 symbol kinds. 10 languages. Powered by Tree-sitter AST parsing.
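
To make the vocabulary concrete, here is a small Python sketch of how a query string could be split into `key:value` filters and free-text terms. It assumes whitespace-separated words and only the four filter keys shown above; `ParsedQuery` and `parse_query` are hypothetical names, not part of Collie.

```python
from dataclasses import dataclass, field

# The four filter keys listed in the vocabulary above.
FILTER_KEYS = {"kind", "lang", "path", "qname"}


@dataclass
class ParsedQuery:
    filters: dict[str, str] = field(default_factory=dict)
    terms: list[str] = field(default_factory=list)


def parse_query(query: str) -> ParsedQuery:
    """Split a query into key:value filters and free-text terms."""
    parsed = ParsedQuery()
    for word in query.split():
        # Split on the first colon only, so qname:Server::run keeps its "::".
        key, sep, value = word.partition(":")
        if sep and key in FILTER_KEYS and value:
            parsed.filters[key] = value
        else:
            parsed.terms.append(word)
    return parsed
```

For example, `parse_query("kind:fn lang:go path:pkg/api/ init")` yields the filters `kind=fn`, `lang=go`, `path=pkg/api/` and the single term `init`.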

Search beyond source code

Enable in .collie/config.toml

include_pdfs = true

Search code and docs in one query

$ collie search "authentication flow"
  src/auth/handler.go:14        func authFlow() {
  docs/design-spec.pdf:3        Section 4.2: Authentication Flow
  rfcs/auth-v2.pdf:1            RFC: Revised authentication flow

Design specs, RFCs, API docs, research papers — indexed alongside code.
grep and ripgrep can't do this.

Benchmark — Kubernetes repo, 28,903 files

Lexical search: Collie vs ripgrep

Query	Collie p50	ripgrep range	Speedup
`kubelet`	12 ms	744–878 ms	63–74x
`context`	7 ms	403–610 ms	58–88x
`controller`	7 ms	398–427 ms	56–60x
`kube%` (prefix)	8 ms	491–520 ms	58–61x
`%config` (suffix)	8 ms	401–422 ms	48–51x
`%request%` (substring)	104 ms	413–429 ms	4x

Fresh run on the current build, 5 measured runs per query. Collie stayed tight; rg still paid full-scan cost on every query.

Benchmark — Kubernetes repo

Visualized: `kubelet` search

Collie: 12 ms
rg best: 744 ms
rg worst: 878 ms

63–74x faster

Benchmark — Kubernetes repo

Symbol search (no rg equivalent)

Query	Collie p50	Range
`kind:fn handler`	7 ms	6–8 ms
`kind:struct SharedInformerFactory`	6 ms	6–7 ms
`kind:fn path:pkg/ init`	12 ms	11–13 ms
`kind:method qname:HollowKubelet::Run`	6 ms	6–8 ms
`kind:fn validate webhook`	7 ms	6–7 ms

Find functions, structs, methods — not just text matches.
Powered by Tree-sitter across 11 languages.

Benchmark — Kubernetes repo

Regex search (bounded top-k, index-accelerated)

Pattern	Collie p50 (-n 50)	ripgrep p50	Speedup
`func\s+\w+Handler`	10 ms	392 ms	38x
`interface\s*\{`	9 ms	415 ms	46x
`context\.Context`	12 ms	422 ms	34x

Collie extracts literal fragments from your regex,
narrows candidate files via the index, then applies the full regex.

Honest framing: Collie regex is optimized for interactive bounded queries.
For exhaustive -n 0 scans, ripgrep still wins today.
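
The literal-fragment idea can be sketched in a few lines of Python. This is a deliberately conservative illustration, not Collie's extractor: `required_literals` and `search` are hypothetical names, the function gives up (returns no fragments) on alternation, groups, character classes, and `{m,n}` repetition, and the prefilter uses a plain substring test where a real index would consult postings.

```python
import re


def required_literals(pattern: str, min_len: int = 3) -> list[str]:
    """Return literal fragments that every match of `pattern` must contain."""
    fragments: list[str] = []
    current: list[str] = []

    def flush() -> None:
        if len(current) >= min_len:
            fragments.append("".join(current))
        current.clear()

    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt.isalnum():
                flush()               # \s, \w, \d, \b: a class or assertion
            else:
                current.append(nxt)   # \. or \{: an escaped literal character
            i += 2
            continue
        if ch in "|([{":
            return []                 # too complex for this sketch: no prefilter
        if ch in "*?":
            if current:
                current.pop()         # the preceding character may be absent
            flush()
        elif ch in ".+^$)]}":
            flush()
        else:
            current.append(ch)
        i += 1
    flush()
    return fragments


def search(pattern: str, files: dict[str, str]) -> list[str]:
    """Narrow candidates by literal fragments, then apply the full regex."""
    literals = required_literals(pattern)
    candidates = [
        path for path, text in files.items()
        if all(lit in text for lit in literals)
    ]
    regex = re.compile(pattern)
    return [path for path in candidates if regex.search(files[path])]
```

In this sketch `func\s+\w+Handler` yields the fragments `func` and `Handler`, so only files containing both are handed to the regex engine. A pattern such as `TODO|FIXME|HACK` yields no required fragment here, so every file stays a candidate.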

Why this matters for agents

Agents know what they're looking for

ripgrep: guess the source syntax

$ rg "func \(.*Authorization.*\) Validate"          522 ms

collie: say what you mean

$ collie search "kind:method path:pkg/kubeapiserver/options/ %validate%"  72 ms

Agents already know the kind, language, and approximate location.
Structural queries turn that knowledge into precise results.

New capability

Symbol + Regex: --symbol-regex

Narrow with structure first, refine with regex. No rg equivalent.

Intent	Collie	rg regex	Speedup
Methods on `*Server` ending in `Handler`	70 ms	406 ms	5.8x
Methods on `Server` containing `Handler`	7 ms	404 ms	54x
Validate functions related to webhooks	281 ms	373 ms	1.3x

$ collie search "kind:fn %Handler" --symbol-regex '\*.*Server'
$ collie search "kind:method qname:Server::" --symbol-regex 'Handler'
$ collie search "kind:fn %validate%" --symbol-regex 'webhook|Webhook'

The more structural info in the symbol query, the faster the regex refinement.
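
A rough way to picture the two stages, in Python. The `Symbol` record and `symbol_regex_search` are hypothetical, and the sketch assumes the regex is matched against each candidate's declaration text, which the slides do not spell out.

```python
import re
from dataclasses import dataclass


@dataclass
class Symbol:
    kind: str
    name: str
    declaration: str  # assumed: source text of the declaration line


def symbol_regex_search(symbols, kind, name_suffix, pattern):
    """Stage 1 narrows by structure; stage 2 runs the regex on survivors only."""
    narrowed = [
        s for s in symbols
        if s.kind == kind and s.name.endswith(name_suffix)
    ]
    regex = re.compile(pattern)
    refined = [s for s in narrowed if regex.search(s.declaration)]
    return narrowed, refined
```

The regex cost scales with the size of `narrowed`, not with the whole symbol table, which is consistent with the observation that a more specific symbol query makes the refinement faster.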

Agentic benchmark — per-task

Task breakdown (time to first hit)

Task	Symbol	Lexical	ripgrep
Find HollowKubelet::Run	43 ms	23 ms	730 ms
Webhook token authenticator	95 ms	36 ms	548 ms
Authorization options validator	72 ms	19 ms	522 ms
SharedInformerFactory constructor	111 ms	27 ms	398 ms
PodDisruptionBudget validator	69 ms	176 ms	373 ms

Built for AI agents

JSON output

--format json — structured, parseable, no regex-on-grep hacks

Exit codes

0 = results, 1 = nothing found, 2 = error. Shell-scriptable.

Skill card

collie skill prints a reference card for agent context

Latency

Agents run hundreds of searches per session. ~7–12ms vs 400–880ms compounds.
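
As a back-of-the-envelope check on how the latency compounds, the Python snippet below totals search time over one session. The figure of 300 searches is an assumption standing in for "hundreds"; the per-query latencies are the ranges quoted above.

```python
def session_seconds(searches: int, latency_ms: float) -> float:
    """Total time spent waiting on search during one session."""
    return searches * latency_ms / 1000.0


SEARCHES = 300  # assumption: a stand-in for "hundreds of searches"

collie = (session_seconds(SEARCHES, 7), session_seconds(SEARCHES, 12))
rg = (session_seconds(SEARCHES, 400), session_seconds(SEARCHES, 880))

print(f"collie: {collie[0]:.1f}-{collie[1]:.1f} s")  # collie: 2.1-3.6 s
print(f"rg:     {rg[0]:.0f}-{rg[1]:.0f} s")          # rg:     120-264 s
```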

$ collie search "kind:fn handler" \
    --format json -n 5
{
  "type": "symbol",
  "count": 5,
  "results": [{
    "path": "pkg/api/handler.go",
    "line": 42,
    "kind": "function"
  }, ...]
}
$ echo $?
0

Under the hood

source files

↓

Tree-sitter AST parsing → symbols, kinds, scopes

Tantivy Full-text index → tokens, postings, reverse index

↓

notify FSEvents watcher → incremental updates

rayon Parallel indexing → 29K files in 8.7s

↓

collie search

All Rust. Single binary. No runtime dependencies.
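
The index-then-update loop can be illustrated with a toy inverted index. The Python below is for illustration only: Collie itself is Rust on top of Tantivy, and `ToyIndex` is a hypothetical class that models nothing beyond token postings and per-file re-indexing.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


class ToyIndex:
    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)  # token -> paths
        self.file_tokens: dict[str, set[str]] = {}             # path  -> tokens

    def remove(self, path: str) -> None:
        """Drop a file's old postings (file deleted or about to be re-indexed)."""
        for tok in self.file_tokens.pop(path, set()):
            self.postings[tok].discard(path)

    def update(self, path: str, text: str) -> None:
        """(Re)index a single file; a watcher would call this per change event."""
        self.remove(path)
        tokens = set(TOKEN.findall(text))
        self.file_tokens[path] = tokens
        for tok in tokens:
            self.postings[tok].add(path)

    def search(self, token: str) -> list[str]:
        """One dictionary lookup instead of scanning every file."""
        return sorted(self.postings.get(token, ()))
```

A query touches only the postings for its token, and a file change touches only that file's entries, which is the property that keeps lookups cheap while the index stays current.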

11 languages

Go • Rust • Python • TypeScript • JavaScript
C • C++ • Java • C# • Ruby • Zig

Symbol extraction: functions, methods, structs, classes, enums,
interfaces, traits, variables, fields, properties, constants, modules, type aliases, imports

Indexing performance

Repo	Files	Rebuild	Index size	Peak RAM
Collie repo	75	0.3s	675 KB	26 MB
Kubernetes	28,903	8.7s	391 MB	—

One-time cost. After that, notify keeps the index current incrementally.
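
To put the one-time cost in perspective, the Python arithmetic below derives indexing throughput from the Kubernetes row and estimates how many queries it takes for the rebuild to pay for itself. It pairs the 8.7 s rebuild with per-query latencies from the lexical benchmark, treats 1 MB as 1,000 KB, and ignores daemon overhead; `break_even_queries` is a hypothetical helper.

```python
import math

REBUILD_S = 8.7   # Kubernetes rebuild time from the table
FILES = 28_903
INDEX_MB = 391

files_per_second = FILES / REBUILD_S          # about 3,322 files/s
index_kb_per_file = INDEX_MB * 1000 / FILES   # about 13.5 KB per file


def break_even_queries(rebuild_s: float, rg_ms: float, collie_ms: float) -> int:
    """Queries after which the rebuild time is repaid by faster lookups."""
    saved_per_query_s = (rg_ms - collie_ms) / 1000
    return math.ceil(rebuild_s / saved_per_query_s)


# `kubelet`: rg 744-878 ms vs Collie 12 ms -> repaid after 11 to 12 queries
print(break_even_queries(REBUILD_S, 878, 12), break_even_queries(REBUILD_S, 744, 12))
# `context`: rg best case 403 ms vs Collie 7 ms -> repaid after 22 queries
print(break_even_queries(REBUILD_S, 403, 7))
```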

Get started

Install

$ cargo install collie-search

Index

$ collie watch .              # start daemon, index repo
$ collie status .             # verify it's running

Search

$ collie search "handleRequest"
$ collie search "kind:fn path:src/ init"
$ collie search -e "func\s+\w+Handler"
$ collie search "kind:fn %Handler" --symbol-regex '\*.*Server'
$ collie search "config" --format json  # for agents

Teach your agent

$ collie skill               # prints reference card for LLM context

collie

48–88x

faster on bounded lexical queries

11

languages

~10ms

bounded search latency

cargo install collie-search

github.com/suleymanozkeskin/collie
