Sourcegraph architecture overview
This document provides a high-level overview of Sourcegraph's architecture, detailing the purpose and interactions of each service in the system.
Diagram
You can click on each component to jump to its respective code repository or subtree.
Note: several omissions have been made for clarity:
- Almost every service has a link back to the frontend, from which it gathers configuration updates
- Telemetry sent to Sourcegraph.com
- Sourcegraph observability components, including Prometheus, Grafana, and cAdvisor
Service Quick Links
Core Services
- Frontend - Central service that serves the web UI and GraphQL API
- Gitserver - Stores and provides access to Git repositories
- Repo-updater - Tracks repository states and synchronizes with code hosts
Search Infrastructure
- Zoekt-indexserver - Creates search indices for repositories
- Zoekt-webserver - Serves search queries against the indexed repositories
- Searcher - Handles non-indexed searches for repositories
- Syntect Server - Provides syntax highlighting for code
Code Intelligence
- Symbols - Extracts and indexes symbol information
- Precise-code-intel-worker - Processes code intelligence data
- Worker - Runs background tasks across the system
Data Persistence
- Frontend DB - Primary PostgreSQL database for core data
- Codeintel DB - Database for code intelligence data
- Codeinsights DB - Database for code insights data
- Blob Store - Object storage for large files
- Redis - In-memory data store for caching and sessions
External Components
- Executors - Isolated environments for compute-intensive operations
- Code Hosts - External systems hosting repositories
Infrastructure
- Observability Infrastructure - Prometheus, Grafana, and cAdvisor
- Telemetry - Usage data collection
- External Services and Dependencies - External services Sourcegraph can use
Cody Architecture
- Cody Gateway - Routes requests to AI providers
- Cody Context Fetcher - Provides relevant code context
- Cody Agent - Client-side component in IDE
- Completions API - Handles code completions
- Policy Service - Enforces usage policies
- Cody Proxy - Load balancing between AI providers
- Attribution Tracking - Tracks code origin
- Cody Assistant - Interactive chat interface
Core Services
Frontend
Purpose: The frontend service is the central service in Sourcegraph's architecture. It serves the web application, hosts the GraphQL API, and coordinates most user interactions with Sourcegraph.
Importance: This is the primary entrypoint for most Sourcegraph functionality. Without it, users cannot interact with Sourcegraph through the web UI or API.
Additional Details:
- Handles user authentication and session management
- Enforces repository permissions
- Coordinates interactions between most services
- Manages the settings cascade (user, organization, and global settings)
- Implements the GraphQL API layer that powers both the web UI and external API clients
- Stateless service that can be horizontally scaled
- Organized into multiple internal packages with clear separation of concerns
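The settings cascade mentioned above can be sketched as a simple layered merge, where more specific scopes override more general ones. This is an illustrative sketch, not Sourcegraph's actual types or merge semantics (real settings merge is deep and schema-aware):

```go
package main

import "fmt"

// Settings is an illustrative stand-in for a settings document.
type Settings map[string]any

// mergeCascade applies layers in order: global, then organization,
// then user, with later (more specific) layers winning on conflicts.
func mergeCascade(layers ...Settings) Settings {
	merged := Settings{}
	for _, layer := range layers {
		for k, v := range layer {
			merged[k] = v
		}
	}
	return merged
}

func main() {
	global := Settings{"search.defaultMode": "literal", "experimentalFeatures": false}
	org := Settings{"experimentalFeatures": true}
	user := Settings{"search.defaultMode": "regexp"}
	final := mergeCascade(global, org, user)
	fmt.Println(final["search.defaultMode"], final["experimentalFeatures"])
}
```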
Internal Architecture:
- HTTP Server: Handles incoming HTTP requests using Go's standard library
- GraphQL Engine: Processes GraphQL queries with custom resolvers for various data types
- Authorization Layer: Enforces permissions across all API and UI operations
- Request Router: Routes user requests to appropriate internal handlers
- Service Clients: Contains client code for communicating with other Sourcegraph services
- Database Layer: Manages connections and transactions with the PostgreSQL database
Request Flow:
- User request arrives at the frontend service
- Authentication and session validation occur
- Permission checks are performed for the requested resource
- The request is routed to the appropriate handler (e.g., search, repository view)
- The handler coordinates with other services to fulfill the request
- Results are transformed into the appropriate response format
- Response is returned to the user
Interactions:
- Serves as the central coordination point for all other services
- Stores user data, repository metadata, and other core data in the frontend database
- Acts as a reverse proxy for client requests to other services
- Forwards search requests to zoekt-webserver (indexed search) and searcher (unindexed search)
- Makes API calls to gitserver for repository operations (e.g., file content, commit information)
- Requests repository metadata from repo-updater
- Retrieves code intelligence data from the codeintel database
- Enforces permissions across all accessed resources
- Provides the GraphQL API that all clients use to interact with Sourcegraph
Gitserver
Purpose: Gitserver is a horizontally shardable service that clones and maintains local Git repositories from code hosts, making them available to other Sourcegraph services.
Importance: Without gitserver, Sourcegraph cannot access repository content, making search, code navigation, and most other features non-functional.
Additional Details:
- Maintains a persistent cache of repositories, but code hosts remain the source of truth
- Performs Git operations like clone, fetch, archive, and rev-parse
- Implements custom Git operations optimized for Sourcegraph's use cases
- Uses disk-based caching strategies to optimize performance
- Handles repository cleanup and garbage collection
- Repositories can be sharded across multiple gitserver instances for horizontal scaling if necessary
Internal Architecture:
- Repository Manager: Manages the lifecycle of repositories (cloning, updating, cleaning)
- Git Command Executor: Executes Git commands with appropriate timeouts and resource limits
- Request Handler: Processes API requests for repository operations
- Sharding Logic: Determines which gitserver instance should host a particular repository
- Cleanup Worker: Periodically removes unused repositories to free up disk space
Repository Flow:
- Repository is first requested by a client (through frontend or repo-updater)
- Gitserver checks if the repository exists locally
- If not present, gitserver clones the repository from the code host
- For subsequent operations, gitserver operates on the local copy
- Periodic fetches update the repository with new commits
- Git operations (archive, show, etc.) are performed directly on the local repository
Scaling Characteristics:
- Each gitserver instance has an independent set of repositories
- New gitserver instances can be added to handle more repositories
- Repository distribution uses consistent hashing to minimize redistribution when scaling
- Performance is largely determined by disk I/O speed and available memory
- For detailed scaling information, see the Gitserver Scaling Guide
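The consistent-hashing idea behind repository distribution can be sketched as a hash ring: each instance contributes many virtual points, and a repository maps to the first point at or after its own hash, so adding an instance only moves the repositories that land on its new points. This is a generic sketch of the technique, not gitserver's exact algorithm:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

type ring struct {
	points []uint32          // sorted virtual points on the ring
	owner  map[uint32]string // point -> gitserver instance
}

func hashKey(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func newRing(instances []string, replicas int) *ring {
	r := &ring{owner: map[uint32]string{}}
	for _, inst := range instances {
		for i := 0; i < replicas; i++ {
			p := hashKey(fmt.Sprintf("%s-%d", inst, i))
			r.points = append(r.points, p)
			r.owner[p] = inst
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// shardFor returns the instance responsible for a repository: the
// owner of the first ring point at or after the repository's hash.
func (r *ring) shardFor(repo string) string {
	h := hashKey(repo)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around the ring
	}
	return r.owner[r.points[i]]
}

func main() {
	r := newRing([]string{"gitserver-0", "gitserver-1", "gitserver-2"}, 64)
	fmt.Println(r.shardFor("github.com/sourcegraph/sourcegraph"))
}
```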
Interactions:
- Receives repository update requests from repo-updater to clone or update repositories
- Provides repository data to almost all other services through HTTP APIs
- Serves git data to frontend for repository browsing and file viewing
- Supplies repository content to searcher for unindexed searches
- Provides repository archives to zoekt-indexserver for index creation
- Communicates directly with code hosts for clone and fetch operations
- Executes git commands on behalf of other services
- Implements efficient caching to reduce load on code hosts
Repo-updater
Purpose: The repo-updater service is responsible for keeping repositories in gitserver up-to-date and syncing repository metadata from code hosts.
Importance: Critical for ensuring Sourcegraph has current information about repositories and respects code host rate limits.
Additional Details:
- Singleton service that orchestrates repository updates
- Handles code host API rate limiting and scheduling
- Also responsible for permission syncing from code hosts
- Manages external service connections (GitHub, GitLab, etc.)
- Implements intelligent scheduling algorithms to prioritize updates
- Handles authentication and authorization with various code host APIs
- Maintains an in-memory queue of pending updates
Internal Architecture:
- External Service Manager: Manages connections to code hosts and other external services
- Repository Syncer: Synchronizes repository metadata with code hosts
- Permissions Syncer: Synchronizes repository permissions from code hosts
- Update Scheduler: Schedules repository updates based on priority and last update time
- Rate Limiter: Enforces API rate limits for each code host
- Metrics Collector: Tracks sync status, errors, and performance metrics
Operational Flow:
- External services (code hosts) are configured in Sourcegraph
- Repo-updater periodically polls each external service for repository information
- New repositories are added to the database and existing ones are updated
- Repository update operations are scheduled based on priority and last update time
- Update requests are sent to gitserver instances based on the schedule
- Repository permissions are synced from the code host to Sourcegraph's database
- Metadata about repositories (e.g., fork status, visibility) is kept up to date
Failure Handling:
- Implements exponential backoff for failed API requests
- Continues functioning even if some code hosts are temporarily unavailable
- Retries failed operations with appropriate delays
- Can recover state after service restarts
Interactions:
- Makes API calls to code hosts to fetch repository metadata and permissions
- Instructs gitserver to clone, update, or remove repositories as needed
- Stores repository metadata in the frontend database
- Provides repository listings and metadata to frontend
- Implements rate limiting for code host API requests
- Synchronizes repository permissions from code hosts
- Maintains repository sync schedules based on activity patterns
- Validates external service configurations (GitHub, GitLab, etc.)
- Handles webhooks from code hosts for immediate updates when available
Search Infrastructure
Zoekt-indexserver
Purpose: Creates and maintains the search index for repositories' default branches.
Importance: Enables fast, indexed code search across repositories, which is a core functionality of Sourcegraph.
Additional Details:
- Uses a trigram index for efficient substring matching
- Indexes default branches by default, but is capable of indexing additional branches
- Horizontally scalable for large codebases
- Optimized for handling large repositories and codebases
- Builds specialized indices for different types of searches (content, symbols, etc.)
- Performs incremental updates when repositories change
Technical Implementation:
- Trigram Indexing: Breaks down text into 3-character sequences for efficient substring searching
- Sharded Index Design: Splits large indices into manageable shards
- Content Extraction: Extracts content from various file formats before indexing
- Symbol Extraction: Uses language-specific parsers to extract and index symbols
- Custom Compression: Employs specialized compression techniques for code content
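The trigram idea can be sketched in a few lines: every 3-byte window of the content becomes a key in a posting list, and a query only needs to intersect the posting lists of its own trigrams before verifying candidate positions. This is a toy illustration of the principle, far simpler than zoekt's actual index layout:

```go
package main

import "fmt"

// trigrams builds a posting list mapping each 3-byte window of the
// content to the positions where it occurs.
func trigrams(content string) map[string][]int {
	idx := map[string][]int{}
	for i := 0; i+3 <= len(content); i++ {
		t := content[i : i+3]
		idx[t] = append(idx[t], i)
	}
	return idx
}

// candidates returns positions where a query *might* match, using only
// the index: here, the positions of the query's first trigram. A real
// engine intersects all of the query's trigram lists, then verifies.
func candidates(idx map[string][]int, query string) []int {
	if len(query) < 3 {
		return nil // short queries need a different strategy
	}
	return idx[query[:3]]
}

func main() {
	idx := trigrams("func main() { fmt.Println() }")
	fmt.Println(candidates(idx, "main")) // [5]
}
```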
Indexing Process:
- Receives a request to index a repository
- Retrieves the latest content from gitserver
- Analyzes repository content and extracts text and metadata
- Breaks content into trigrams and other searchable units
- Builds an optimized index structure with various lookup tables
- Compresses the index and writes it to disk
- Signals zoekt-webserver that a new index is available
Performance Characteristics:
- CPU-intensive during index creation
- Memory usage scales with repository size and complexity
- Disk I/O intensive when writing indices
- Can be scaled horizontally by adding more instances and sharding repositories
Interactions:
- Gets repository content from gitserver
- Creates indexes consumed by zoekt-webserver
- Coordinates with frontend to determine which repositories to index
- Emits metrics about indexing performance and coverage
Zoekt-webserver
Purpose: Serves search requests against the trigram search index created by zoekt-indexserver.
Importance: Provides the fast, indexed search capability that makes Sourcegraph search powerful.
Additional Details:
- Highly optimized for low-latency searches
- Includes ranking algorithms for result relevance
- Implements sophisticated query parsing and execution
- Supports various search modifiers and operators
- Memory-maps index files for fast access
- Horizontally scalable to handle large search loads
Technical Implementation:
- In-Memory Index: Keeps critical parts of the index in memory for fast access
- Query Parser: Parses complex search queries into executable search plans
- Search Executor: Executes search plans against the index with parallelism
- Result Ranker: Ranks search results by relevance using several signals
- Result Limiter: Enforces result limits and timeouts to ensure responsiveness
Search Execution Flow:
- Receives a query from frontend via the API
- Parses the query into a structured search plan
- Identifies which index shards need to be searched
- Executes the search in parallel across relevant shards
- Collects and ranks the results by relevance
- Applies post-processing filters (e.g., case sensitivity, regexp matching)
- Returns the formatted results to the caller
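The fan-out-and-rank steps above can be sketched as a parallel search across shards followed by a merge sorted by score. The ranking signal here (shorter lines first) is a toy stand-in for zoekt's real relevance signals:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

type match struct {
	shard string
	line  string
	score int
}

// searchShard scans one shard's lines for the query.
func searchShard(shard string, lines []string, query string) []match {
	var out []match
	for _, l := range lines {
		if strings.Contains(l, query) {
			// Toy relevance signal: shorter lines rank higher.
			out = append(out, match{shard, l, -len(l)})
		}
	}
	return out
}

// search fans out to every shard in parallel, then merges and ranks.
func search(shards map[string][]string, query string) []match {
	var mu sync.Mutex
	var wg sync.WaitGroup
	var all []match
	for name, lines := range shards {
		wg.Add(1)
		go func(name string, lines []string) {
			defer wg.Done()
			res := searchShard(name, lines, query)
			mu.Lock()
			all = append(all, res...)
			mu.Unlock()
		}(name, lines)
	}
	wg.Wait()
	sort.Slice(all, func(i, j int) bool { return all[i].score > all[j].score })
	return all
}

func main() {
	shards := map[string][]string{
		"shard1": {"func Search(q string) {}", "// Search is slow"},
		"shard2": {"no match here"},
	}
	for _, m := range search(shards, "Search") {
		fmt.Println(m.shard, m.line)
	}
}
```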
Performance Optimizations:
- Uses memory mapping for fast index access
- Implements concurrent search execution
- Employs early termination strategies for large result sets
- Caches frequent queries and partial results
- Prioritizes interactive search performance
Interactions:
- Receives search queries from frontend through HTTP API calls
- Utilizes index files created by zoekt-indexserver stored on disk
- Performs parallel searches across multiple index shards
- Returns ranked and formatted search results to frontend
- Communicates index status to frontend for search scoping decisions
- Provides detailed metrics about search performance and throughput
- Coordinates with other zoekt-webserver instances for multi-shard searches
Searcher
Purpose: Performs non-indexed, on-demand searches for content not covered by zoekt.
Importance: Provides search capability for non-default branches and unindexed repositories, ensuring comprehensive search coverage.
Additional Details:
- Used for searching branches other than the default branch
- Performs structural search (syntax-aware pattern matching beyond plain regex)
- Slower than zoekt but more flexible
- Processes repositories on demand rather than pre-indexing
- Supports advanced search patterns including regular expressions
- Implements a local file cache to improve performance for repeated searches
Technical Implementation:
- Archive Fetcher: Retrieves repository archives from gitserver
- Archive Extractor: Extracts repository contents to temporary storage
- Search Executor: Runs search patterns against repository contents
- Pattern Matcher: Implements various pattern matching algorithms (regex, exact, structural)
- Cache Manager: Manages a local cache of recently searched repositories
Search Process:
- Receives a search request for a specific repository and revision
- Checks if the repository is already in the local cache
- If not cached, requests an archive from gitserver
- Extracts the archive to a temporary location
- Executes the search pattern against the extracted files
- Applies filters (file path, language, etc.)
- Formats and returns the matching results
- Optionally caches the repository for future searches
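The cache-then-fetch flow above can be sketched as follows. The `fetch` function stands in for the gitserver archive request; the real searcher extracts a tar archive to disk and streams matches rather than holding files in memory:

```go
package main

import (
	"fmt"
	"strings"
)

type repoFiles map[string]string // path -> content

type searcher struct {
	cache map[string]repoFiles        // keyed by repo@revision
	fetch func(repoRev string) repoFiles // stand-in for the gitserver archive call
}

// search returns the paths of files matching the pattern, fetching the
// repository archive only on a cache miss.
func (s *searcher) search(repoRev, pattern string) []string {
	files, ok := s.cache[repoRev]
	if !ok {
		files = s.fetch(repoRev)
		s.cache[repoRev] = files
	}
	var hits []string
	for path, content := range files {
		if strings.Contains(content, pattern) {
			hits = append(hits, path)
		}
	}
	return hits
}

func main() {
	fetches := 0
	s := &searcher{
		cache: map[string]repoFiles{},
		fetch: func(repoRev string) repoFiles {
			fetches++
			return repoFiles{"main.go": "package main", "doc.md": "docs"}
		},
	}
	s.search("repo@dev-branch", "package")
	s.search("repo@dev-branch", "docs")
	fmt.Println("archive fetches:", fetches) // second search hits the cache
}
```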
Performance Considerations:
- Uses streaming to return results as they're found
- Implements timeouts to prevent long-running searches
- Caches recently searched repositories to avoid repeated downloads
- Applies heuristics to optimize search patterns before execution
- Can be scaled horizontally to handle more concurrent searches
Interactions:
- Receives search requests from frontend through HTTP API calls
- Requests repository archives from gitserver for each search query
- Maintains a local cache of recently searched repositories
- Returns search results to frontend as they are found (streaming)
- Handles multiple concurrent search requests with appropriate limits
- Coordinates timeout handling with frontend for long-running searches
- Reports detailed metrics about search performance and cache efficiency
- Implements fallback search when zoekt indexing is incomplete or unavailable
Syntect Server
Purpose: Provides syntax highlighting for code in any language displayed in Sourcegraph.
Importance: Enhances readability of code in search results, repository browsing, and other code views.
Additional Details:
- Based on the Rust Syntect library
- Supports hundreds of programming languages and file formats
- Optimized for high throughput and low latency
Interactions:
- Receives highlighting requests from frontend
- Used by search UI and repository browsing
Code Intelligence
Symbols
Purpose: Extracts and indexes symbol information (functions, classes, etc.) from code for fast symbol search.
Importance: Enables symbol search and contributes to basic code navigation features.
Additional Details:
- Language-agnostic symbol extraction using regular expressions
- Complements precise code intelligence for languages without dedicated indexers
Interactions:
- Gets repository content from gitserver
- Serves symbol search requests from frontend
Precise-code-intel-worker
Purpose: Processes and converts uploaded LSIF/SCIP code intelligence data into queryable indexes.
Importance: Enables precise code navigation (go-to-definition, find references) across repositories.
Additional Details:
- Handles processing of upload records in a queue
- Converts LSIF/SCIP data into an optimized index format
Interactions:
- Stores processed data in the codeintel database
- Accesses uploads from blob storage
Worker
Purpose: A service for executing background jobs including batch changes processing, code insights computations, and other asynchronous tasks.
Importance: Handles long-running operations that would otherwise block user interactions.
Additional Details:
- Implements a work queue for distributed processing
- Handles retries and error recovery
- Used for executing various background jobs based on configuration
Interactions:
- Communicates with frontend for job coordination
- Accesses various databases depending on the job type
- Interacts with gitserver for repository operations
Data Persistence
Frontend DB
Purpose: Primary PostgreSQL database that stores user data, repository metadata, configuration, and other core application data.
Importance: Stores critical data needed for almost all Sourcegraph operations.
Additional Details:
- Contains user accounts, repository metadata, and configuration
- Used for transactional operations across the application
- Stores settings, user accounts, repository metadata, and more
- Employs database migrations for schema evolution
- Configured with specific optimizations for Sourcegraph's workload
Schema Structure:
- Users and Authentication: Tables for users, organizations, credentials
- Repository Metadata: Tables for repositories, external services, permissions
- Configuration: Settings cascade for different scopes (global, org, user)
- API Metadata: API tokens, client information, usage tracking
- Search Metadata: Saved searches, search statistics, search contexts
- Various Feature Data: Batch changes, code monitoring, notebooks, etc.
Data Access Patterns:
- High read-to-write ratio for most tables
- Transactional integrity for critical operations
- Heavy use of indexes for performance optimization
- PostgreSQL-specific features (e.g., jsonb for settings, array types, etc.)
- Connection pooling to handle concurrent requests efficiently
Scaling Characteristics:
- Vertical scaling for most deployments (larger DB instance)
- Performance typically determined by index efficiency and query patterns
- Read replicas can be configured for large-scale deployments
- Designed to support thousands of repositories and users
Interactions:
- Primary database for the frontend service
- Used by repo-updater for external service and repository metadata
- Stores permissions data for authorization checks
- Referenced by nearly all services for configuration and settings
Codeintel DB
Purpose: PostgreSQL database dedicated to storing code intelligence data.
Importance: Enables precise code navigation features by storing symbol relationships.
Additional Details:
- Stores processed LSIF/SCIP data in an optimized format
- Separated from frontend DB for performance and scaling reasons
Interactions:
- Used by precise-code-intel-worker for writing processed data
- Queried by frontend for code navigation requests
Codeinsights DB
Purpose: PostgreSQL database that stores code insights data and time series information.
Importance: Persists data for code insights dashboards and historical trend analysis.
Additional Details:
- Stores time series data for tracking code metrics over time
- Separated from other databases for performance and scaling reasons
Interactions:
- Written to by worker service when computing insights
- Queried by frontend when rendering code insights dashboards
Blob Store
Purpose: Object storage service for large binary data like LSIF/SCIP uploads and other artifacts.
Importance: Provides scalable storage for large data files that would be inefficient to store in PostgreSQL.
Additional Details:
- Can be configured to use cloud storage (S3, GCS) or local disk
- Used primarily for code intelligence uploads and other large artifacts
Interactions:
- Stores raw LSIF/SCIP uploads before processing
- Accessed by precise-code-intel-worker during processing
Redis
Purpose: In-memory data store used for caching, rate limiting, and other ephemeral data.
Importance: Improves performance by caching frequently accessed data and supporting distributed locking.
Additional Details:
- Used for session data, caching, and rate limiting
- Supports pub/sub mechanisms used by some services
Interactions:
- Used by frontend for caching and session management
- Used by repo-updater for coordination and caching
External Components
Executors
Purpose: Isolated environments for running compute-intensive operations like Batch Changes and Code Insights computations.
Importance: Enables secure, scalable execution of user-provided code and resource-intensive operations.
Additional Details:
- Runs as separate infrastructure from the main Sourcegraph instance
- Provides isolated sandboxed environments
- Horizontally scalable based on compute needs
Interactions:
- Receives jobs from the main Sourcegraph instance
- Returns results to the worker service
Code Hosts
Purpose: External systems (GitHub, GitLab, Bitbucket, etc.) that host the repositories Sourcegraph interacts with.
Importance: Source of truth for all code and repository metadata synchronized to Sourcegraph.
Additional Details:
- Sourcegraph maintains connections to these systems via API tokens
- Rate limits and permissions from code hosts must be respected
Interactions:
- Repo-updater syncs repository metadata and permissions from code hosts
- Gitserver clones and fetches repositories from code hosts
- Batch Changes creates and updates changesets (PRs/MRs) on code hosts
Observability Infrastructure
Prometheus
Purpose: Time-series database that collects, stores, and serves metrics from all Sourcegraph services.
Importance: Critical for monitoring service health, performance, and resource usage across the entire Sourcegraph deployment.
Additional Details:
- Scrapes metrics from all services at configurable intervals
- Evaluates alerting rules to detect potential issues
- Provides query language (PromQL) for metrics analysis
- Stores time-series data with automatic downsampling
Interactions:
- Scrapes metrics endpoints exposed by all Sourcegraph services
- Sends alerts to configured alert managers
- Supplies metrics data to Grafana for visualization
Grafana
Purpose: Visualization platform that creates dashboards and graphs from Prometheus metrics data.
Importance: Provides visual insights into system performance and enables admins to diagnose issues quickly.
Additional Details:
- Ships with pre-configured dashboards for all Sourcegraph services
- Supports alerting based on metric thresholds
- Allows for custom dashboard creation
Interactions:
- Queries Prometheus for metrics data
- Displays real-time and historical performance data
cAdvisor
Purpose: Analyzes and exposes resource usage and performance data from containers.
Importance: Provides container-level metrics that are essential for understanding resource utilization.
Additional Details:
- Automatically discovers all containers in a Sourcegraph deployment
- Collects CPU, memory, network, and disk usage metrics
- Zero configuration required in most deployments
Interactions:
- Metrics are scraped by Prometheus
- Data is visualized in Grafana dashboards
Telemetry
Ping Service
Purpose: Collects anonymous usage data about Sourcegraph instances and sends it to Sourcegraph.
Importance: Provides Sourcegraph with critical insights about feature usage and deployment scales to guide product development.
Additional Details:
- Only sends high-level, anonymized usage statistics
- Can be disabled by admins in site configuration
- Runs daily as a scheduled job
- No code or repository-specific data is ever transmitted
Interactions:
- Frontend service collects usage data from various services
- Pings are sent to Sourcegraph cloud service via HTTPS
More details can be found in Life of a ping.
Cody Architecture
Cody is Sourcegraph's AI-powered coding assistant. For detailed information on Cody's architecture and implementation, refer to the Cody Enterprise Architecture documentation.
Cody Gateway
Purpose: Manages connections to various AI providers (e.g., OpenAI, Anthropic) and handles request routing, authentication, and rate limiting.
Importance: Enables Cody's AI code assistance features while abstracting away the complexity of multiple AI providers.
Additional Details:
- Supports multiple large language model providers
- Handles fallback between providers when necessary
- Manages rate limits and quotas
- Authenticates requests to ensure proper access
Interactions:
- Receives requests from Cody clients (web app, editor extensions)
- Forwards appropriately formatted requests to AI providers
- Returns AI-generated responses to clients
Cody Context Fetcher
Purpose: Gathers relevant code context from the repository to enhance AI prompts with local codebase knowledge.
Importance: Critical for making Cody's responses contextually aware of the user's codebase.
Additional Details:
- Uses embeddings and semantic search to find relevant code
- Intelligently selects context based on query and available context window
- Balances context quality with token limits
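Balancing context quality with token limits can be illustrated with a greedy budget packer: take candidates in relevance order and keep adding snippets until the context window's token budget is spent. This is a hedged sketch of the idea only; the context fetcher's actual selection logic is more sophisticated, and all names here are hypothetical:

```go
package main

import "fmt"

type snippet struct {
	path   string
	tokens int
	score  float64 // relevance from semantic search (illustrative)
}

// selectContext greedily packs the highest-scoring snippets that fit
// within the token budget. Candidates are assumed pre-sorted by
// descending score.
func selectContext(candidates []snippet, budget int) []snippet {
	var chosen []snippet
	for _, s := range candidates {
		if s.tokens <= budget {
			chosen = append(chosen, s)
			budget -= s.tokens
		}
	}
	return chosen
}

func main() {
	cands := []snippet{
		{"auth/session.go", 400, 0.92},
		{"auth/token.go", 700, 0.81},
		{"docs/auth.md", 300, 0.77},
	}
	for _, s := range selectContext(cands, 1000) {
		fmt.Println(s.path)
	}
}
```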
Interactions:
- Uses search infrastructure to find relevant code snippets
- Interacts with gitserver to access repository content
- Provides enhanced context to Cody Gateway for AI requests
Cody Agent
Purpose: Client-side component that runs in the IDE to handle local processing, manage state, and communicate with Sourcegraph's backend services.
Importance: Provides a smooth, responsive experience by managing the communication between the IDE and Sourcegraph services.
Additional Details:
- Manages local state and caching to reduce latency
- Handles connection and authentication with Sourcegraph instance
- Processes local context before sending requests
- Implements IDE-specific interfaces for different editor platforms
Interactions:
- Communicates with Sourcegraph backend services via API
- Interfaces with IDE extensions to provide UI integrations
- Sends requests to Cody Gateway for AI completions and chat
- Manages local file access to gather context
Completions API
Purpose: Handles code completion requests and orchestrates interactions with various LLM providers.
Importance: Core service that powers Cody's intelligent code completions feature.
Additional Details:
- Optimized for low-latency completion requests
- Implements specialized prompts for code completion
- Supports streaming completions for responsive UI
- Applies post-processing to improve completion quality
Interactions:
- Receives completion requests from Cody Agent
- Interfaces with Cody Gateway to access LLM providers
- Utilizes Context Fetcher to enhance prompts with relevant code
- Returns processed completions to clients
Policy Service
Purpose: Enforces usage policies, rate limits, and access controls for Cody features.
Importance: Ensures compliance with licensing, usage agreements, and prevents abuse of the system.
Additional Details:
- Manages user quotas and rate limits
- Enforces feature access based on licensing tier
- Tracks usage analytics for billing and optimization
- Implements configurable policies for enterprise environments
Interactions:
- Validates requests against policy rules
- Integrates with authentication and authorization systems
- Provides usage metrics to telemetry systems
- Communicates policy decisions to other Cody services
Cody Proxy
Purpose: Handles routing, load balancing, and failover between different AI providers.
Importance: Ensures high availability and optimal performance by managing connections to multiple AI backends.
Additional Details:
- Implements sophisticated routing algorithms
- Monitors provider health and performance
- Handles transparent failover between providers
- Optimizes request distribution based on cost and performance
Interactions:
- Sits between Cody Gateway and external AI providers
- Monitors response latency and error rates
- Manages connection pooling to providers
- Implements circuit breaking for unavailable services
Attribution Tracking
Purpose: Tracks which code suggestions come from which sources for proper attribution and transparency.
Importance: Critical for maintaining legal compliance, intellectual property rights, and transparency in AI-generated code.
Additional Details:
- Identifies the origin of code snippets in completions
- Maintains records of source repositories and licenses
- Provides attribution information to users
- Helps enforce license compliance for suggested code
Interactions:
- Analyzes completions to identify code origins
- Cross-references with repository metadata
- Adds attribution metadata to completions
- Integrates with policy service for license enforcement
Cody Assistant
Purpose: Manages the chat interface component that provides interactive coding assistance.
Importance: Provides an intuitive, conversational interface for developers to interact with Cody.
Additional Details:
- Maintains conversation context and history
- Implements specialized commands for different coding tasks
- Supports rich UI elements like code blocks and diagrams
- Provides contextual help and suggestions
Interactions:
- Receives user queries through chat interface
- Coordinates with Context Fetcher for relevant code lookup
- Sends processed requests to Cody Gateway
- Renders responses with appropriate formatting and UI elements
Scaling Sourcegraph
Sourcegraph is designed to scale from small deployments to large enterprise installations with thousands of repositories and users. The Scaling Overview for Services provides detailed information about how each service scales, including:
- Resource requirements for each service
- Scaling factors to consider (number of users, repositories, etc.)
- Storage considerations for different components
- Performance optimization recommendations
When planning to scale your Sourcegraph instance, consider using Grafana dashboards to monitor current resource usage and the Resource Estimator to plan for future growth.
External Services and Dependencies
Sourcegraph can be configured to use external services for improved performance, reliability, and scalability in production environments. While Sourcegraph provides bundled versions of these services, many deployments replace them with managed alternatives.
Database Services
PostgreSQL Databases:
- Purpose: Sourcegraph uses PostgreSQL for all persistent relational data storage
- Variants:
- Frontend DB: Stores user data, repository metadata, configuration, and other core data
- Codeintel DB: Stores code intelligence data
- Codeinsights DB: Stores code insights time series data
- Cloud Alternatives: AWS RDS for PostgreSQL, Google Cloud SQL, Azure Database for PostgreSQL
Caching and Session Storage
Redis Instances:
- Purpose: Provides in-memory data structure store for caching and ephemeral data
- Variants:
- Redis Cache: Stores application cache data
- Redis Store: Stores short-term information such as user sessions
- Cloud Alternatives: Amazon ElastiCache, Google Cloud Memorystore, Azure Cache for Redis
Object Storage
Blob Storage:
- Purpose: Stores large binary objects such as LSIF/SCIP uploads and other artifacts
- Default Implementation: MinIO (S3-compatible)
- Cloud Alternatives: Amazon S3, Google Cloud Storage, Azure Blob Storage
Distributed Tracing
Jaeger:
- Purpose: Provides end-to-end distributed tracing for debugging and monitoring
- Usage: Optional component for advanced debugging and performance analysis
- Cloud Alternatives: AWS X-Ray, Google Cloud Trace, Azure Monitor
External Code Hosts
Sourcegraph connects to various code hosts to synchronize repositories and metadata.
Additional Resources
- Life of a repository - Detailed explanation of repository syncing
- Life of a search query - How search requests flow through the system
- Monitoring architecture - How Sourcegraph's observability system works
- Life of a ping - How usage data is collected
- Background permissions syncing - Details on permission synchronization
- Using external services with Sourcegraph - How to configure external services