Architecture
Content flows through five components:
Source
Connect to your content: GitHub, GitLab, Bitbucket, or a website
Indexer
Discover files, filter, chunk, and send to Context Engine for embedding
Store
Persist index state (local filesystem or S3) for incremental updates
Context Engine
Augment’s semantic search backend stores embeddings and handles queries
Client
Query via CLI, MCP server, or your own application
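To make the boundaries between these components concrete, here is a minimal sketch of how they could be wired together. Everything in it is hypothetical: the Source, Store, and Engine protocols, their method names, and the in-memory classes are illustrations, not the Context Connectors API. In particular, KeywordEngine is a plain substring matcher standing in for the Context Engine, which works with embeddings.

```python
import hashlib
from typing import Protocol


class Source(Protocol):
    def list_files(self) -> dict[str, str]: ...  # path -> text


class Store(Protocol):
    def load(self) -> dict[str, str]: ...  # path -> content hash
    def save(self, state: dict[str, str]) -> None: ...


class Engine(Protocol):
    def add(self, path: str, text: str) -> None: ...
    def search(self, query: str) -> list[str]: ...


class MemorySource:
    def __init__(self, files: dict[str, str]) -> None:
        self.files = files

    def list_files(self) -> dict[str, str]:
        return dict(self.files)


class MemoryStore:
    def __init__(self) -> None:
        self.state: dict[str, str] = {}

    def load(self) -> dict[str, str]:
        return dict(self.state)

    def save(self, state: dict[str, str]) -> None:
        self.state = dict(state)


class KeywordEngine:
    """Stand-in for the Context Engine: substring match, not embeddings."""

    def __init__(self) -> None:
        self.docs: dict[str, str] = {}

    def add(self, path: str, text: str) -> None:
        self.docs[path] = text

    def search(self, query: str) -> list[str]:
        return sorted(p for p, t in self.docs.items() if query.lower() in t.lower())


def run_indexer(source: Source, store: Store, engine: Engine) -> None:
    """Indexer: pull files from the Source, push them to the Engine, save state."""
    files = source.list_files()
    for path, text in files.items():
        engine.add(path, text)
    store.save({p: hashlib.sha256(t.encode()).hexdigest() for p, t in files.items()})
```

A client then only needs the engine: after run_indexer(source, store, engine), calling engine.search("login") returns the paths whose content matches.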
Data Flow
Indexing
When you run ctxc index, here’s what happens:
- Discover - Source connector lists all files (respects .gitignore)
- Filter - Skip binary files, large files, and excluded patterns
- Hash - Compute file hashes to detect changes
- Diff - Compare with stored state to find new/modified/deleted files
- Index - Send changed files to Context Engine for embedding
- Save - Store new state for next incremental run
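The first three steps (Discover, Filter, Hash) can be sketched for a local directory as follows. This is an illustration under stated assumptions, not the tool's implementation: the snapshot function, the 1 MB size cutoff, the NUL-byte test for binary files, and the choice of SHA-256 are all assumed, and the sketch does not parse .gitignore.

```python
import fnmatch
import hashlib
from pathlib import Path

MAX_BYTES = 1_000_000  # assumed size cutoff; the real limit is not stated


def is_binary(data: bytes) -> bool:
    """Heuristic (assumed): a NUL byte in the first 8 KiB marks a binary file."""
    return b"\x00" in data[:8192]


def snapshot(root: Path, exclude: tuple[str, ...] = ()) -> dict[str, str]:
    """Discover, filter, and hash files under root.

    Returns a mapping of relative POSIX path -> SHA-256 hex digest.
    """
    hashes: dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if any(fnmatch.fnmatch(rel, pattern) for pattern in exclude):
            continue  # excluded pattern
        if path.stat().st_size > MAX_BYTES:
            continue  # too large
        data = path.read_bytes()
        if is_binary(data):
            continue  # binary file
        hashes[rel] = hashlib.sha256(data).hexdigest()
    return hashes
```

For example, snapshot(Path("."), exclude=("*.log",)) returns one hash per remaining text file, which is the input the Diff step compares against the stored state.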
Searching
When you run ctxc search or query via MCP:
- Query - User submits natural language query
- Embed - Context Engine converts query to vector
- Match - Find semantically similar code chunks
- Return - Results with file paths, line numbers, and snippets
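The Embed and Match steps can be illustrated with a toy example. The Context Engine's actual model and ranking are not described here, so this sketch swaps in a bag-of-words "embedding" and cosine similarity purely to show the shape of the flow. The embed, cosine, and search functions and the chunk dictionary layout (path, start_line, snippet) are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real engine uses a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, chunks: list[dict], top_k: int = 3) -> list[dict]:
    """Rank chunks by similarity to the query; drop chunks with no overlap."""
    q = embed(query)
    scored = [(cosine(q, embed(c["snippet"])), c) for c in chunks]
    scored = [(s, c) for s, c in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]
```

Each returned chunk carries its file path, starting line, and snippet, mirroring the Return step. Unlike this word-overlap stand-in, a semantic engine can match chunks that share meaning but no words with the query.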
File Reading (MCP/Agent)
When an agent needs full file content (not just search snippets):
- Request - Agent requests file by path
- Fetch - MCP server reads from original source (filesystem or Git API)
- Return - Full file content returned to agent
MCP servers that serve file content need credentials for the source (e.g., GITHUB_TOKEN), because they read files on demand from the original source, not from the index.
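The Fetch step might look like the sketch below for the two source types mentioned. Both functions are hypothetical. read_local guards against paths that escape the source root, and github_file_request builds (without sending) a request to GitHub's REST "repository contents" endpoint with the raw media type. Whether the MCP server uses this particular endpoint is an assumption; the text only says it reads from the Git API.

```python
import os
import urllib.parse
import urllib.request
from pathlib import Path


def read_local(root: Path, rel_path: str) -> str:
    """Read a file from a local source, refusing paths that escape root."""
    base = root.resolve()
    target = (base / rel_path).resolve()
    if not target.is_relative_to(base):
        raise ValueError(f"path escapes source root: {rel_path}")
    return target.read_text(encoding="utf-8")


def github_file_request(owner: str, repo: str, rel_path: str) -> urllib.request.Request:
    """Build (but do not send) a GitHub contents API request for one file."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}/contents/"
        f"{urllib.parse.quote(rel_path)}"
    )
    # Ask for the raw file body instead of the default JSON wrapper.
    headers = {"Accept": "application/vnd.github.raw+json"}
    token = os.environ.get("GITHUB_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)
```

Passing the returned request to urllib.request.urlopen would perform the actual fetch. Private repositories require the token; without one the request is sent unauthenticated.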
Incremental Updates
Context Connectors tracks file state to avoid re-indexing unchanged files:

| Scenario | What Happens |
|---|---|
| File unchanged | Skipped (hash matches) |
| File modified | Re-indexed |
| File deleted | Removed from index |
| New file | Added to index |
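
The four scenarios in the table amount to comparing two path-to-hash mappings. A minimal sketch, assuming state is kept as a dictionary of path to content hash (the diff_state function and this state shape are illustrative, not the tool's actual format):

```python
def diff_state(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify paths by comparing stored hashes (old) with current hashes (new)."""
    return {
        # Present in both with the same hash: skipped.
        "unchanged": sorted(p for p in new if old.get(p) == new[p]),
        # Present in both with a different hash: re-indexed.
        "modified": sorted(p for p in new if p in old and old[p] != new[p]),
        # Present only in the stored state: removed from the index.
        "deleted": sorted(p for p in old if p not in new),
        # Present only in the current scan: added to the index.
        "added": sorted(p for p in new if p not in old),
    }
```

Only the "modified" and "added" paths need to be sent for embedding, which is what keeps repeat runs cheap.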

Index state is stored in one of two locations:
- Local filesystem - Platform-specific directory (e.g., ~/.local/share/context-connectors on Linux)
- S3 - s3://bucket/index-name/prefix
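
For the local filesystem case, persisting state can be as simple as one JSON file per index. The sketch below is an assumption-heavy illustration: the save_state and load_state helpers, the state.json file name, and the per-index subdirectory layout are hypothetical, and the default directory is just the Linux example from the list above.

```python
import json
import os
from pathlib import Path

# Linux example path from the text; other platforms would use a different directory.
DEFAULT_DIR = Path.home() / ".local" / "share" / "context-connectors"


def save_state(name: str, state: dict[str, str], base: Path = DEFAULT_DIR) -> Path:
    """Write state for one index as JSON, replacing any previous file atomically."""
    target = base / name / "state.json"  # hypothetical layout
    target.parent.mkdir(parents=True, exist_ok=True)
    tmp = target.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2, sort_keys=True), encoding="utf-8")
    os.replace(tmp, target)  # atomic rename on the same filesystem
    return target


def load_state(name: str, base: Path = DEFAULT_DIR) -> dict[str, str]:
    """Return the saved state, or an empty dict on the first run."""
    target = base / name / "state.json"
    if not target.exists():
        return {}
    return json.loads(target.read_text(encoding="utf-8"))
```

Writing to a temporary file and renaming it means an interrupted run leaves the previous state intact, so the next incremental run still has a valid baseline.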