Data Insights Platform Runbooks
Overview
Data Insights Platform (DIP) is a unified abstraction to ingest, process, persist & query analytical data events generated across GitLab, enabling us to compute business insights across the product.
It’s designed to be a general-purpose data toolkit that can transport event data from one system to another while dynamically enriching ingested data. It currently serves the following use-cases:
Troubleshooting
Resources
- Contact Information
- Documentation
- Development
- Tooling
- Dashboards
General Architecture
The following are the general components that constitute a Data Insights Platform instance. For details on specific use-cases and/or environment-specific architectures, refer to their dedicated sections as linked earlier in the document.

| Component | Description |
|---|---|
| Ingress | All ingress into our currently supported deployments of Data Insights Platform is proxied via Cloudflare. On the GKE side, we employ ingress-nginx as our ingress controller, which in turn uses an IP whitelist containing advertised Cloudflare IP ranges. |
| Ingesters | The single ingestion mechanism for supported event types, which can be run locally for development and as a cluster in production. This layer is intentionally stateless so it can scale horizontally to ingest large data volumes (a publish sketch follows this table). |
| Message Queue - NATS/Jetstream | All data ingested via the ingesters first lands in NATS/Jetstream so that it is durably persisted before it is parsed, enriched & exported to other downstream systems. |
| Enrichers | Custom framework to enrich incoming data, with the ability to communicate with external components such as the GitLab API or the Data Catalog for metadata. Supported enrichments include pseudonymization or redaction of sensitive parts of ingested data, PII detection, parsing client user-agent strings, etc. (an enrichment sketch follows this table). |
| Exporters | Custom implementations that ship ingested data into designated persistent stores for further querying/processing (a consumer sketch follows this table). ClickHouse Exporter: ClickHouse is our designated persistent database; it persists all analytical data ingested by the Platform, which can then be queried via the Query API. S3/GCS Exporter: shipping data to S3/GCS lands it in Snowflake, powering our current analytical query workflows using Snowflake & Tableau. |
| Storage | ClickHouse: external database that provides durable persistence and advanced OLAP querying capabilities for all analytical data ingested into the Platform. |
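As a rough illustration of the ingester-to-queue handoff, the sketch below publishes an event to a JetStream stream using the nats.go client. The stream name (`EVENTS`), subject (`events.page_view`), connection URL, and payload shape are assumptions for illustration only, not the Platform’s actual configuration.

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	// Connect to NATS; the URL is a placeholder, not the Platform's real endpoint.
	nc, err := nats.Connect("nats://localhost:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Ensure a durable stream exists; "EVENTS" and its subjects are illustrative names.
	if _, err := js.AddStream(&nats.StreamConfig{
		Name:     "EVENTS",
		Subjects: []string{"events.>"},
		Storage:  nats.FileStorage, // durable persistence before enrichment/export
	}); err != nil {
		log.Fatal(err)
	}

	// Publish a hypothetical analytical event; JetStream acknowledges once it is persisted.
	ack, err := js.Publish("events.page_view", []byte(`{"event":"page_view","path":"/dashboard"}`))
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("persisted in stream %s at seq %d", ack.Stream, ack.Sequence)
}
```

The key property this illustrates is that the ingestion layer can stay stateless: once JetStream acknowledges the publish, the event is durably stored and the ingester holds no state of its own.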
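The enrichment step can be pictured as a function applied to each event before export. The sketch below pseudonymizes an assumed `user_id` field with a salted SHA-256 digest and derives a browser family from the user-agent string; the field names, salt handling, and naive user-agent check are hypothetical stand-ins, not how the Platform’s enrichment framework is actually implemented.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// enrich pseudonymizes sensitive fields and adds derived metadata in place.
// Field names ("user_id", "user_agent", "browser_family") are illustrative assumptions.
func enrich(event map[string]string, salt string) {
	// Pseudonymization: replace the raw identifier with a salted SHA-256 digest.
	if id, ok := event["user_id"]; ok {
		sum := sha256.Sum256([]byte(salt + id))
		event["user_id"] = hex.EncodeToString(sum[:])
	}

	// Very naive user-agent parsing, standing in for a real parser.
	if ua, ok := event["user_agent"]; ok {
		switch {
		case strings.Contains(ua, "Firefox"):
			event["browser_family"] = "Firefox"
		case strings.Contains(ua, "Chrome"):
			event["browser_family"] = "Chrome"
		default:
			event["browser_family"] = "Other"
		}
	}
}

func main() {
	event := map[string]string{
		"user_id":    "12345",
		"user_agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0",
	}
	enrich(event, "per-deployment-salt")
	fmt.Println(event)
}
```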
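On the export side, the exporters can be thought of as durable consumers that drain the queue in batches and write to ClickHouse or S3/GCS. The sketch below uses a JetStream pull consumer from nats.go to fetch and acknowledge batches; the consumer name, subject filter, and the commented-out downstream write are assumptions for illustration and do not reflect the actual exporter implementations.

```go
package main

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect("nats://localhost:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Durable pull consumer; "clickhouse-exporter" is an illustrative consumer name.
	sub, err := js.PullSubscribe("events.>", "clickhouse-exporter")
	if err != nil {
		log.Fatal(err)
	}

	for {
		// Fetch a batch of persisted events to export downstream.
		msgs, err := sub.Fetch(100, nats.MaxWait(5*time.Second))
		if err == nats.ErrTimeout {
			continue // no new events yet
		}
		if err != nil {
			log.Fatal(err)
		}

		for _, msg := range msgs {
			// In a real exporter this is where rows would be appended to a
			// ClickHouse batch insert or written out as S3/GCS objects.
			log.Printf("exporting %d bytes from %s", len(msg.Data), msg.Subject)

			// Ack only after the downstream write succeeds so the queue can redeliver on failure.
			if err := msg.Ack(); err != nil {
				log.Print(err)
			}
		}
	}
}
```

Acknowledging only after the downstream write is what lets the queue act as the durability boundary: unacknowledged messages are redelivered, so a failed export does not lose data.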
Message Queueing via NATS
Service Level Indicators (SLI)
Self-managed & Dedicated
We have not yet deployed a Data Insights Platform to serve our self-managed and Dedicated GitLab instances.
Provisioning New Deployments
To kick off related discussions, please start by filing an issue here with details of your use-case.