Architecture
Content flows through five components:
Data Flow
Indexing
When you run ctxc index, here’s what happens:
- Discover - Source connector lists all files (respects .gitignore)
- Filter - Skip binary files, large files, excluded patterns
- Hash - Compute file hashes to detect changes
- Diff - Compare with stored state to find new/modified/deleted files
- Index - Send changed files to Context Engine for embedding
- Save - Store new state for next incremental run
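
The Discover, Filter, and Hash steps above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: the size cutoff, the excluded directory names, the NUL-byte binary check, and the `snapshot` function are all assumptions made for the example, and .gitignore handling is omitted for brevity.

```python
import hashlib
import os
from pathlib import Path

MAX_BYTES = 1_000_000  # assumed size cutoff, not the tool's real limit
EXCLUDED_DIRS = {".git", "node_modules"}  # assumed exclusion patterns


def looks_binary(data: bytes) -> bool:
    """Heuristic: treat any file containing a NUL byte as binary."""
    return b"\x00" in data


def snapshot(root: str) -> dict[str, str]:
    """Walk root, skip filtered files, and map relative path -> SHA-256 hex digest."""
    state: dict[str, str] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for name in filenames:
            path = Path(dirpath) / name
            if path.stat().st_size > MAX_BYTES:
                continue  # Filter: large file
            data = path.read_bytes()
            if looks_binary(data):
                continue  # Filter: binary file
            rel = path.relative_to(root).as_posix()
            state[rel] = hashlib.sha256(data).hexdigest()  # Hash
    return state
```

The resulting path-to-hash mapping is the kind of state that a Diff step could compare against the previous run and a Save step could persist.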
Searching
When you run ctxc search or query via MCP:
- Query - User submits natural language query
- Embed - Context Engine converts query to vector
- Match - Find semantically similar code chunks
- Return - Results with file paths, line numbers, and snippets
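
To make the Embed, Match, and Return steps concrete, here is a toy sketch. It assumes a bag-of-words count vector as a stand-in for the embedding; a real engine would use learned embeddings that capture meaning rather than shared words. The `embed`, `cosine`, and `search` names and the chunk fields (`path`, `start_line`, `snippet`) are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, chunks: list[dict], top_k: int = 3) -> list[dict]:
    """Rank chunks by similarity to the query and return the best matches."""
    q = embed(query)
    scored = [(cosine(q, embed(c["snippet"])), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop chunks with no overlap at all, then keep the top_k.
    return [c for score, c in scored[:top_k] if score > 0]


chunks = [
    {"path": "src/config.py", "start_line": 10, "snippet": "parse config file and load settings"},
    {"path": "src/http.py", "start_line": 3, "snippet": "send http request with retry"},
]
for hit in search("parse config file", chunks):
    print(hit["path"], hit["start_line"], hit["snippet"])
```

Each result carries the file path, line number, and snippet, mirroring the shape of results described in the Return step.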
File Reading (MCP/Agent)
When an agent needs full file content (not just search snippets):
- Request - Agent requests file by path
- Fetch - MCP server reads from original source (filesystem or Git API)
- Return - Full file content returned to agent
MCP servers and agents need credentials for the original source (e.g., GITHUB_TOKEN) - they read files on demand from the original source, not from the index.
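
For a filesystem source, the Request, Fetch, and Return flow might look like the sketch below. The `read_source_file` function is hypothetical, and the check that rejects paths escaping the source root is an assumption about what a careful server would do, not documented behavior. A Git-backed source would call the hosting API instead of reading from disk.

```python
from pathlib import Path


def read_source_file(root: str, rel_path: str) -> str:
    """Read a file's full content from the original source directory on demand."""
    base = Path(root).resolve()
    target = (base / rel_path).resolve()
    # Refuse paths that escape the source root (e.g. "../secrets").
    if not target.is_relative_to(base):  # requires Python 3.9+
        raise ValueError(f"path escapes source root: {rel_path}")
    return target.read_text(encoding="utf-8")
```

Because the content comes straight from the source, the agent always sees the current file rather than whatever was captured at index time.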
Incremental Updates
Context Connectors tracks file state to avoid re-indexing unchanged files:

| Scenario | What Happens |
|---|---|
| File unchanged | Skipped (hash matches) |
| File modified | Re-indexed |
| File deleted | Removed from index |
| New file | Added to index |
- Local filesystem - Platform-specific directory (e.g., ~/.local/share/context-connectors on Linux)
- S3 - s3://bucket/index-name/prefix
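
The four scenarios in the table above amount to comparing two path-to-hash mappings. The sketch below assumes state is stored as a simple dictionary of relative path to content hash; the `diff_states` function and the sample hashes are hypothetical and only illustrate the comparison.

```python
def diff_states(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify every path by comparing the stored state with the current one."""
    return {
        # New file: present now, absent from stored state -> added to index.
        "added": sorted(p for p in new if p not in old),
        # File modified: present in both, hash differs -> re-indexed.
        "modified": sorted(p for p in new if p in old and new[p] != old[p]),
        # File deleted: in stored state, absent now -> removed from index.
        "deleted": sorted(p for p in old if p not in new),
        # File unchanged: hash matches -> skipped.
        "unchanged": sorted(p for p in new if p in old and new[p] == old[p]),
    }


old_state = {"a.py": "h1", "b.py": "h2", "c.py": "h3"}
new_state = {"a.py": "h1", "b.py": "h2x", "d.py": "h4"}
changes = diff_states(old_state, new_state)
print(changes["added"], changes["modified"], changes["deleted"], changes["unchanged"])
# prints: ['d.py'] ['b.py'] ['c.py'] ['a.py']
```

Only the added and modified paths would need to be sent for embedding, which is what keeps incremental runs cheap.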