collie
Incrementally indexed code search.
Specialized for your projects.
Faster than grep / ripgrep for interactive search.
cargo install collie-search

github.com/suleymanozkeskin/collie

Under-optimized search

ripgrep on a 29K-file repo (Kubernetes). Each query re-walks and re-searches candidate files.

$ rg -l "kubelet"            # 744–878 ms: full scan, varies with cache
$ rg -l "context"            # 403–610 ms: same cost, different query
$ collie search "kubelet"    # 12 ms: index lookup, every time

On a fresh run of the current build, rg still landed in the 400–880 ms range, while Collie stayed under 14 ms on the lexical sample.
grep and rg also have no understanding of code structure; they only match raw text.

What is Collie?

A lexical and structural search CLI that indexes your codebase once, keeps it current
while the daemon runs, and auto-stops when you're done.

Not a replacement for traditional search: it is specialized for agentic development and indexed per project or worktree.

~10 ms typical lexical latency on 29K files
11 languages with symbol extraction
4 search modes

How it works

# Start the daemon (indexes + watches for changes)
$ collie watch .

# Search instantly
$ collie search "handler"

# Stop when done (or let it auto-stop after 30 min idle)
$ collie stop .

Index once → query faster → incremental updates via FSEvents
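The watch, search, stop lifecycle above can also be driven from a script. This is a minimal sketch, not part of Collie: `collie_cmd` and `run_session` are hypothetical helper names, and it assumes that `collie watch .` returns once the daemon is up and that the index is queryable right afterwards. Only the subcommands shown above are used.

```python
import subprocess
from typing import List


def collie_cmd(action: str, *args: str) -> List[str]:
    """Build the argv for one of the collie subcommands shown above."""
    return ["collie", action, *args]


def run_session(repo: str, queries: List[str]) -> List[str]:
    """Start the daemon in `repo`, run each query, then stop the daemon.

    Assumption: `collie watch .` returns after starting the daemon.
    """
    subprocess.run(collie_cmd("watch", "."), cwd=repo, check=True)
    outputs = []
    try:
        for query in queries:
            # check=False: a non-zero exit can simply mean "no matches".
            done = subprocess.run(
                collie_cmd("search", query),
                cwd=repo, capture_output=True, text=True, check=False,
            )
            outputs.append(done.stdout)
    finally:
        # Always stop the daemon, even if a search raised.
        subprocess.run(collie_cmd("stop", "."), cwd=repo, check=False)
    return outputs
```

Calling `run_session(".", ["handler"])` mirrors the three shell commands above; the `finally` block is there so a failed search does not leave the daemon running until the idle timeout.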

Four search modes

Token
Keyword search with % wildcards
$ collie search "kube%"           # prefix
$ collie search "%config"         # suffix
$ collie search "handle request"  # multi-term AND
Symbol
Structural queries via Tree-sitter. Filter by kind:, lang:, path:, and qname:.
$ collie search "kind:fn handler"
$ collie search "kind:struct lang:go"
$ collie search "kind:method qname:Server::run"
Regex
Full regex, index-accelerated. Literal fragments narrow files first.
$ collie search -e "func\s+\w+Handler"
$ collie search -e "TODO|FIXME|HACK"  # multi-pattern
$ collie search -e "impl.*Error" -i  # case-insensitive
Symbol + Regex
Structural narrowing + regex refinement via --symbol-regex
$ collie search "kind:fn %Handler" --symbol-regex '\*.*Server'
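To make the `%` wildcard forms in Token mode concrete, here is a small sketch of how prefix, suffix, and substring terms could be matched against individual tokens. It is an illustration only, not Collie's implementation: the helper names are hypothetical, and case-insensitive matching is an assumption rather than documented behavior.

```python
import re
from typing import Iterable


def wildcard_to_regex(term: str) -> "re.Pattern":
    """Translate a %-wildcard term into a regex; % stands for any run of characters."""
    pieces = [re.escape(piece) for piece in term.split("%")]
    # Assumption: token matching ignores case.
    return re.compile(".*".join(pieces), re.IGNORECASE)


def matches(term: str, token: str) -> bool:
    """True if the whole token matches the wildcard term."""
    return wildcard_to_regex(term).fullmatch(token) is not None


def matches_all(query: str, tokens: Iterable[str]) -> bool:
    """Multi-term AND: every term in the query must match at least one token."""
    token_list = list(tokens)
    return all(any(matches(term, tok) for tok in token_list) for term in query.split())
```

With these definitions, `matches("kube%", "kubelet")` holds while `matches("kube%", "minikube")` does not, which is the difference between a prefix term and a substring term such as `%kube%`.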

Structural query vocabulary

Filters
kind:fn|method|class|struct|enum|interface|trait
     variable|field|property|constant|module|type|import
lang:go|rust|python|typescript|c|cpp|java|csharp|ruby|zig
path:src/api/              # scope to directory
qname:Server::run          # qualified name
Examples
$ collie search "kind:fn lang:go path:pkg/api/ init"
$ collie search "kind:struct Config"
$ collie search "kind:method qname:HollowKubelet::Run"
$ collie search "kind:trait path:src/ %Error%"

14 symbol kinds. 11 languages. Powered by Tree-sitter AST parsing.
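An agent that already knows the kind, language, and location can assemble these query strings programmatically. The sketch below is a hypothetical helper (`structural_query` is not a Collie API) that emits filters in the same order as the examples above. Whether filter order matters, and how values containing spaces should be quoted, is not covered here.

```python
from typing import Optional


def structural_query(
    term: str = "",
    *,
    kind: Optional[str] = None,
    lang: Optional[str] = None,
    path: Optional[str] = None,
    qname: Optional[str] = None,
) -> str:
    """Join the filters that were given, followed by the free-text term if any."""
    filters = [("kind", kind), ("lang", lang), ("path", path), ("qname", qname)]
    parts = [f"{name}:{value}" for name, value in filters if value]
    if term:
        parts.append(term)
    return " ".join(parts)
```

For example, `structural_query("init", kind="fn", lang="go", path="pkg/api/")` reproduces the first example query above, ready to pass to `collie search`.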

Search beyond source code

Enable in .collie/config.toml
include_pdfs = true
Search code and docs in one query
$ collie search "authentication flow"
  src/auth/handler.go:14        func authFlow() {
  docs/design-spec.pdf:3        Section 4.2: Authentication Flow
  rfcs/auth-v2.pdf:1            RFC: Revised authentication flow

Design specs, RFCs, API docs, research papers — indexed alongside code.
grep and ripgrep can't do this out of the box.

Lexical search: Collie vs ripgrep

Query                  Collie p50   ripgrep range   Speedup
kubelet                12 ms        744–878 ms      63–74x
context                7 ms         403–610 ms      58–88x
controller             7 ms         398–427 ms      56–60x
kube% (prefix)         8 ms         491–520 ms      58–61x
%config (suffix)       8 ms         401–422 ms      48–51x
%request% (substring)  104 ms       413–429 ms      4x

Fresh run on the current build, 5 measured runs per query. Collie stayed tight; rg still paid full-scan cost on every query.
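The speedup column is the ripgrep range divided by the Collie p50. The sketch below redoes that arithmetic from the rounded figures in the table. The results land within a few units of the published ranges; the small differences would be consistent with the published speedups having been computed from unrounded latencies, though that is an inference and not something stated in the benchmark.

```python
from typing import Tuple


def speedup_range(collie_ms: float, rg_low_ms: float, rg_high_ms: float) -> Tuple[float, float]:
    """Return (best-case rg / collie, worst-case rg / collie)."""
    return rg_low_ms / collie_ms, rg_high_ms / collie_ms


# (Collie p50, rg low, rg high) in ms, copied from the table above.
rows = {
    "kubelet": (12, 744, 878),
    "context": (7, 403, 610),
    "controller": (7, 398, 427),
    "kube% (prefix)": (8, 491, 520),
    "%config (suffix)": (8, 401, 422),
    "%request% (substring)": (104, 413, 429),
}

for query, (collie_ms, rg_low, rg_high) in rows.items():
    low, high = speedup_range(collie_ms, rg_low, rg_high)
    print(f"{query:24s} {low:5.1f}x to {high:5.1f}x")
```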

Visualized: kubelet search

Collie: 12 ms
rg best: 744 ms
rg worst: 878 ms
63–74x faster

Symbol search (no rg equivalent)

Query                                  Collie p50   Range
kind:fn handler                        7 ms         6–8 ms
kind:struct SharedInformerFactory      6 ms         6–7 ms
kind:fn path:pkg/ init                 12 ms        11–13 ms
kind:method qname:HollowKubelet::Run   6 ms         6–8 ms
kind:fn validate webhook               7 ms         6–7 ms

Find functions, structs, methods — not just text matches.
Powered by Tree-sitter across 11 languages.

Regex search (bounded top-k, index-accelerated)

Pattern             Collie p50 (-n 50)   ripgrep p50   Speedup
func\s+\w+Handler   10 ms                392 ms        38x
interface\s*\{      9 ms                 415 ms        46x
context\.Context    12 ms                422 ms        34x

Collie extracts literal fragments from your regex,
narrows candidate files via the index, then applies the full regex.

Honest framing: Collie regex is optimized for interactive bounded queries.
For exhaustive -n 0 scans, ripgrep still wins today.
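The extract, narrow, verify flow described above can be illustrated with a toy model. This sketch is not Collie's implementation: a small in-memory trigram index stands in for the real index (whose internals are not described here), the literal extraction is deliberately conservative, and all function names are hypothetical. Patterns with alternation, groups, or character classes skip narrowing entirely and fall back to checking every file.

```python
import re
from typing import Dict, List, Set


def literal_fragments(pattern: str, min_len: int = 3) -> List[str]:
    """Return literal runs that every match of `pattern` must contain."""
    runs: List[str] = []
    current = ""
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt.isalnum():
                # \s, \w, \d, \b and similar: a class or assertion ends the run.
                runs.append(current)
                current = ""
            else:
                current += nxt  # escaped punctuation is a literal character
            i += 2
            continue
        if ch in "|()[]{}":
            return []  # too complex for this sketch: no narrowing
        if ch in "?*":
            # The preceding character is optional, so drop it from the run.
            runs.append(current[:-1])
            current = ""
        elif ch in "+.^$":
            runs.append(current)
            current = ""
        else:
            current += ch
        i += 1
    runs.append(current)
    return [run for run in runs if len(run) >= min_len]


def trigrams(text: str) -> Set[str]:
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(files: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each trigram to the set of paths whose content contains it."""
    index: Dict[str, Set[str]] = {}
    for path, content in files.items():
        for gram in trigrams(content):
            index.setdefault(gram, set()).add(path)
    return index


def search(pattern: str, files: Dict[str, str], index: Dict[str, Set[str]]) -> List[str]:
    """Narrow candidates via the index, then apply the full regex to the survivors."""
    candidates = set(files)
    for fragment in literal_fragments(pattern):
        for gram in trigrams(fragment):
            candidates &= index.get(gram, set())
    regex = re.compile(pattern)
    return sorted(path for path in candidates if regex.search(files[path]))
```

The narrowing step can only discard files that cannot match, so the final regex pass returns the same results as a full scan while touching fewer files. A pattern like `TODO|FIXME|HACK` yields no required fragment in this sketch, which is why it falls back to checking every file.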

Agents know what they're looking for

ripgrep: guess the source syntax
$ rg "func \(.*Authorization.*\) Validate"          # 522 ms
collie: say what you mean
$ collie search "kind:method path:pkg/kubeapiserver/options/ %validate%"  # 72 ms

Agents already know the kind, language, and approximate location.
Structural queries turn that knowledge into precise results.

Symbol + Regex: --symbol-regex

Narrow with structure first, refine with regex. No rg equivalent.

Intent                                   Collie   rg regex   Speedup
Methods on *Server ending in Handler     70 ms    406 ms     5.8x
Methods on Server containing Handler     7 ms     404 ms     54x
Validate functions related to webhooks   281 ms   373 ms     1.3x
$ collie search "kind:fn %Handler" --symbol-regex '\*.*Server'
$ collie search "kind:method qname:Server::" --symbol-regex 'Handler'
$ collie search "kind:fn %validate%" --symbol-regex 'webhook|Webhook'

The more structural info in the symbol query, the faster the regex refinement.
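The two-stage idea, structural narrowing followed by regex refinement, can be sketched on a toy symbol list. This is an illustration under assumptions: the `Symbol` record and its `signature` field are hypothetical, and what text `--symbol-regex` is actually applied to is not specified here.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Symbol:
    kind: str
    name: str
    signature: str  # hypothetical: the text the refinement regex runs against


def symbol_regex_search(
    symbols: Iterable[Symbol], kind: str, name_suffix: str, regex: str
) -> List[Symbol]:
    """Stage 1: keep symbols of the given kind whose name ends with name_suffix.
    Stage 2: keep only those whose signature matches the regex."""
    narrowed = [s for s in symbols if s.kind == kind and s.name.endswith(name_suffix)]
    pattern = re.compile(regex)
    return [s for s in narrowed if pattern.search(s.signature)]
```

The regex only runs over the symbols that survive the first stage, so a tighter structural filter leaves less work for the second stage.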

Task breakdown (time to first hit)

Task                                Symbol   Lexical   ripgrep
Find HollowKubelet::Run             43 ms    23 ms     730 ms
Webhook token authenticator         95 ms    36 ms     548 ms
Authorization options validator     72 ms    19 ms     522 ms
SharedInformerFactory constructor   111 ms   27 ms     398 ms
PodDisruptionBudget validator       69 ms    176 ms    373 ms

Built for AI agents

JSON output: --format json gives structured, parseable results, no regex-on-grep hacks
Exit codes: 0 = results, 1 = no matches, 2 = error (shell-scriptable)
Skill card: collie skill prints a reference card for agent context
Latency: agents run hundreds of searches per session, so ~7–12 ms vs 400–880 ms per search compounds.
$ collie search "kind:fn handler" \
    --format json -n 5
{
  "type": "symbol",
  "count": 5,
  "results": [{
    "path": "pkg/api/handler.go",
    "line": 42,
    "kind": "function"
  }, ...]
}
$ echo $?
0
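On the agent side, the exit code and the JSON payload can be handled together. The sketch below is a hypothetical helper (`interpret` is not part of Collie) that relies only on the exit codes and the field names visible in the sample output above; any fields beyond those are not assumed.

```python
import json
from typing import Any, Dict, List


def interpret(returncode: int, stdout: str) -> List[Dict[str, Any]]:
    """Map a `collie search --format json` result to a list of hits.

    Exit code 1 (nothing found) becomes an empty list; any other
    non-zero exit code is raised as an error.
    """
    if returncode == 1:
        return []
    if returncode != 0:
        raise RuntimeError(f"collie search failed with exit code {returncode}")
    return json.loads(stdout)["results"]
```

Treating exit code 1 as an empty result rather than a failure lets an agent tell "no matches" apart from "the search itself broke" without parsing stderr.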

Under the hood

source files
Tree-sitter: AST parsing → symbols, kinds, scopes
Tantivy: full-text index → tokens, postings, reverse index
notify: FSEvents watcher → incremental updates
rayon: parallel indexing → 29K files in 8.7s
collie search

All Rust. Single binary. No runtime dependencies.

11 languages

Go • Rust • Python • TypeScript • JavaScript
C • C++ • Java • C# • Ruby • Zig


Symbol extraction: functions, methods, structs, classes, enums,
interfaces, traits, variables, fields, properties, constants, modules, type aliases, imports

Indexing performance


Repo Files Rebuild Index size Peak RAM
Collie repo 75 0.3s 675 KB 26 MB
Kubernetes 28,903 8.7s 391 MB

One-time cost. After that, notify keeps the index current incrementally.

Get started

Install
$ cargo install collie-search
Index
$ collie watch .              # start daemon, index repo
$ collie status .             # verify it's running
Search
$ collie search "handleRequest"
$ collie search "kind:fn path:src/ init"
$ collie search -e "func\s+\w+Handler"
$ collie search "kind:fn %Handler" --symbol-regex '\*.*Server'
$ collie search "config" --format json  # for agents
Teach your agent
$ collie skill               # prints reference card for LLM context
collie
48–88x faster on bounded lexical queries
11 languages
~10 ms bounded search latency
cargo install collie-search

github.com/suleymanozkeskin/collie