Vector
- Overview
- Vector Deployments
- Vector Agent
Overview
Vector is a high-performance, open-source observability data pipeline built in Rust by Datadog. It collects, transforms, and routes logs, metrics, and traces with a focus on correctness, performance, and operator ergonomics.
At GitLab, Vector is currently replacing our use of fluentd.
Vector is configured declaratively in YAML (or TOML/JSON) using a pipeline model of sources (where data comes from), transforms (how data is parsed, enriched, or filtered), and sinks (where data is sent).
For full documentation see https://vector.dev/docs/.
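As a hedged illustration of the sources → transforms → sinks model, a minimal standalone Vector config might look like the following. All component names (`demo_*`) and paths are invented for the example; they are not part of our deployment:

```yaml
# Minimal illustrative pipeline: tail files, parse JSON lines, print to stdout.
sources:
  demo_files:
    type: file
    include:
      - /var/log/demo/*.log
transforms:
  demo_parse:
    type: remap            # VRL transform
    inputs: [demo_files]
    source: |
      # Merge parsed JSON into the event if the message is JSON; otherwise keep it as-is.
      . = merge(., object(parse_json(string!(.message)) ?? {}) ?? {})
sinks:
  demo_console:
    type: console
    inputs: [demo_parse]
    encoding:
      codec: json
```

Our production pipelines follow the same three-part shape, with `kubernetes_logs` sources and `gcp_pubsub` sinks instead of files and the console.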
Vector Deployments
We currently have the following Vector deployments:
| Deployment | Role | Description |
|---|---|---|
| vector-agent | Agent (DaemonSet) | Kubernetes log collection, replacing fluentd-elasticsearch |
| vector-archiver | Archiver (Deploy) | Log archival to GCS, replacing fluentd-archiver |
Vector Agent
Summary
Vector Agent is the Kubernetes log collection service that replaces the legacy fluentd-elasticsearch DaemonSet. It runs as a DaemonSet on every node, collecting pod logs from /var/log/pods/ and shipping them to GCP Pub/Sub topics organized by service type.
The service is deployed via ArgoCD in the vector-agent namespace using two Helm charts:
- `vector` (from https://helm.vector.dev) — the Vector binary, deployed as a DaemonSet
- `vector-config` (from registry.ops.gitlab.net/gitlab-com/gl-infra/charts) — generates the pipeline ConfigMap, replicating the existing fluentd-elasticsearch logic
Architecture
```mermaid
graph LR
  A[Kubernetes Pods] -->|write logs| B[/var/log/pods/]
  B -->|read| C[Vector Agent DaemonSet]
  C -->|normalize VRL| D[Transforms]
  D -->|filter| E[Filter Events]
  E -->|publish| F[GCP Pub/Sub Topics]
  F --> G[Pubsubbeat]
  G --> H[Elasticsearch]
  F --> I[GCS Archive]
```
Observability
Dashboards
Viewing vector-agent in ArgoCD
- Navigate to https://argocd.gitlab.net
- In the search/filter bar, type `vector-agent`, or filter by the label `app.kubernetes.io/name=vector-agent`.
- You will see one `vector-agent` Application per cluster where vector-agent is deployed.
- Click on an application to view its sync status, health, and managed resources.
Configuration Management
Configuration repositories
- ArgoCD app config: https://gitlab.com/gitlab-com/gl-infra/argocd/apps
- vector-config Helm chart: https://gitlab.com/gitlab-com/gl-infra/charts/-/tree/main/gitlab/vector-config
- Upstream Vector Helm chart: https://github.com/vectordotdev/helm-charts
ArgoCD repository layout
Configuration lives in the ArgoCD apps repository:
```
argocd/apps/services/vector-agent/
├── service.yaml           # ArgoCD ApplicationSet service definition
├── values.yaml            # Vector Helm chart values (DaemonSet config)
├── values-config.yaml     # Pipeline configuration (sources, transforms, sinks)
└── env/
    ├── gprd/
    │   ├── app.yaml               # Environment-specific chart versions, enabled/disabled
    │   └── values-config.yaml     # Environment-specific values overrides
    ├── gstg/
    │   ├── app.yaml
    │   └── values-config.yaml
    ├── pre/
    │   └── app.yaml
    └── ops/
        └── app.yaml
```

Making changes
- Edit the relevant files in `argocd/apps/services/vector-agent/`.
- Open a merge request.
- After merge, ArgoCD will trigger a sync of the application. You can check the sync status/health via the UI.
For pipeline configuration changes (adding/modifying services), only values-config.yaml needs to be edited. The ConfigMap update triggers a live reload — no pod restart required.
For DaemonSet changes (resources, volumes, tolerations), edit values.yaml. These changes require a pod rollout which happens during ArgoCD sync.
How the configuration works
The vector-config Helm chart generates a single ConfigMap named vector-config containing a vector.yaml file. The Vector agent watches this ConfigMap for changes via VECTOR_WATCH_CONFIG=true, enabling live configuration reloading without pod restarts.
The configuration is built from two concepts:
- Pipeline Templates (`pipelineTemplates`): Reusable definitions of sources, transforms, and sinks. These define the common log collection pattern.
- Pipeline Configs (`pipelineConfigs`): Concrete instances that reference a template and provide service-specific values (log paths, Pub/Sub topics, filters).
Pipeline template
The default template, named `kubernetes`, defines a four-stage pipeline:
| Stage | Component Type | Purpose |
|---|---|---|
| kubernetes_logs | Source | Reads pod logs from /var/log/pods/ |
| normalize | Transform (VRL) | Parses log messages (syslog, nginx, JSON), extracts timestamps, enriches with Kubernetes metadata |
| filter_events | Transform (filter) | Per-pipeline configurable filter for dropping unwanted events |
| pubsub | Sink | Sends events to a GCP Pub/Sub topic with JSON encoding |
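As a hedged sketch (not the chart's literal output), the components this template might render for a pipeline named `rails` would be shaped roughly like this; the shape also explains component names such as `rails_normalize` used with `vector tap`. Paths, the GCP project, and the VRL body here are hypothetical:

```yaml
# Hypothetical rendering for a pipeline named "rails"; field values are invented.
sources:
  rails_kubernetes_logs:
    type: kubernetes_logs
    include_paths_glob_patterns:
      - /var/log/pods/gitlab_webservice-*_*/*/*.log   # hypothetical path
transforms:
  rails_normalize:
    type: remap
    inputs: [rails_kubernetes_logs]
    source: |
      # Stand-in for the shared VRL that parses syslog/nginx/JSON and enriches metadata
      . = merge(., object(parse_json(string!(.message)) ?? {}) ?? {})
  rails_filter_events:
    type: filter
    inputs: [rails_normalize]
    condition: "true"            # per-pipeline filter; allow-all default
sinks:
  rails_pubsub:
    type: gcp_pubsub
    inputs: [rails_filter_events]
    project: example-gcp-project # hypothetical
    topic: pubsub-rails-inf-gprd
    encoding:
      codec: json
```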
Adding a new pipeline
To collect logs for a new service, add an entry to pipelineConfigs in values-config.yaml:
```yaml
pipelineConfigs:
  my-new-service:
    template: kubernetes
    pubsub_topic: "pubsub-my-new-service-inf-{{ .Values._clusterEnvironment }}"
    paths_include:
      - /var/log/pods/my-namespace_my-service-*_*/*/*.log
```

Key fields:
| Field | Required | Description |
|---|---|---|
| template | Yes | Name of the pipeline template to use (typically kubernetes) |
| paths_include | Yes | Glob patterns for log files to collect |
| paths_exclude | No | Glob patterns for log files to exclude |
| pubsub_topic | No | Pub/Sub topic name (defaults to pubsub-<name>-inf-<env>) |
| filter | No | VRL expression for filtering events (defaults to true / allow all) |
| custom_records | No | Map of additional fields to add to every log event |
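A pipeline entry exercising the optional fields might look like this (the service name, paths, and field values are hypothetical):

```yaml
pipelineConfigs:
  example-service:                  # hypothetical service
    template: kubernetes
    paths_include:
      - /var/log/pods/example_*_*/*/*.log
    paths_exclude:
      - /var/log/pods/example_*_*/istio-proxy/*.log   # e.g. skip sidecar logs
    custom_records:
      environment_tier: inf         # added to every event
    filter: |
      # Keep everything except debug-level events
      (string(.severity) ?? "") != "DEBUG"
```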
Path format for Kubernetes pods:
```
/var/log/pods/<namespace>_<pod_name>_<pod_uid>/<container_name>/<n>.log
```

This differs from the old fluentd format, which used /var/log/containers/.
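If you are unsure whether a glob matches, you can sanity-check the pattern against a locally simulated directory layout before shipping it (the `/tmp/pods-demo` paths here are throwaway examples):

```shell
# Recreate the /var/log/pods naming scheme under /tmp and test a glob against it.
mkdir -p /tmp/pods-demo/my-namespace_my-service-abc12_0f3c/my-container
touch /tmp/pods-demo/my-namespace_my-service-abc12_0f3c/my-container/0.log

# Same shape as a paths_include pattern, rooted at the demo directory;
# prints the matching file path.
ls /tmp/pods-demo/my-namespace_my-service-*_*/*/*.log
```

On a real node you can run the same `ls` directly against `/var/log/pods/`.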
Adding a per-pipeline filter
Use the filter field with a VRL expression that returns a boolean:
```yaml
pipelineConfigs:
  packagecloud:
    template: kubernetes
    pubsub_topic: "pubsub-packagecloud-inf-{{ .Values._clusterEnvironment }}"
    paths_include:
      - /var/log/pods/packagecloud_*_*/*/*.log
    filter: |
      sub = string(.subcomponent) ?? ""
      !match_any(sub, [
        r'^metrics$',
        r'^request_log$'
      ])
```

Updating the Vector Agent DaemonSet
Changes to the DaemonSet itself (resources, tolerations, volumes, ports) go in values.yaml. These are values passed to the upstream Vector Helm chart.
Key settings:
```yaml
role: "Agent"                            # DaemonSet mode
existingConfigMaps: ["vector-config"]    # External config from vector-config chart
env:
  - name: VECTOR_WATCH_CONFIG
    value: "true"                        # Hot-reload on ConfigMap changes
resources:
  requests:
    cpu: 150m
    memory: 1024Mi
  limits:
    cpu: 300m
    memory: 2048Mi
```

Updating chart versions
- Update the version in the relevant env/<environment>/app.yaml
- Open a merge request
- After merge, sync the application in ArgoCD
Environment-specific overrides
The values file hierarchy (evaluated in order; later files override earlier ones):
- values-config.yaml — base configuration for all environments
- env/<environment>/values-config.yaml — environment-specific overrides
- env/<environment>/clusters/<cluster-name>/values-config.yaml — cluster-specific overrides
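As a hypothetical illustration of the hierarchy, assuming Helm's usual deep merge of values files, a base entry and a gprd override might look like:

```yaml
# values-config.yaml (base, all environments) — hypothetical pipeline
pipelineConfigs:
  example-service:
    template: kubernetes
    paths_include:
      - /var/log/pods/example_*_*/*/*.log

# env/gprd/values-config.yaml (gprd only) — merged over the base,
# adding a filter without repeating the other fields
pipelineConfigs:
  example-service:
    filter: |
      (string(.severity) ?? "") != "DEBUG"
```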
Troubleshooting
Using vector tap for live log diagnosis
`vector tap` is a powerful debugging tool that lets you observe events flowing through any component in the pipeline in real time. It connects to the Vector API and streams matching events to your terminal.
Prerequisites
- The Vector API must be enabled (it is, on 127.0.0.1:8686)
- You need `kubectl exec` access to a vector-agent pod
Tapping a specific component
To observe events flowing through a specific source or transform:
```shell
kubectl -n vector-agent exec -it <pod-name> -- vector tap 'rails_normalize'
kubectl -n vector-agent exec -it <pod-name> -- vector tap 'rails_pubsub'
```

**Note:** taps show events after each stage's processing, so you cannot tap a sink's output — events have already been forwarded externally at that point.
Using glob patterns
You can use glob patterns to tap multiple components at once:
```shell
kubectl -n vector-agent exec -it <pod-name> -- vector tap 'sidekiq*_kubernetes_logs'
```

Filtering tap output
Combine vector tap with grep or jq to filter output:
```shell
kubectl -n vector-agent exec -it <pod-name> -- vector tap 'rails_normalize' | grep 'correlation_id'
kubectl -n vector-agent exec -it <pod-name> -- vector tap 'rails_normalize' --format json 2>/dev/null | jq '.'
kubectl -n vector-agent exec -it <pod-name> -- vector tap 'rails_normalize' --format json 2>/dev/null | jq 'select(.severity == "ERROR")'
```

Limiting tap output
Section titled “Limiting tap output”kubectl -n vector-agent exec -it <pod-name> -- vector tap 'rails_normalize' --limit 10
kubectl -n vector-agent exec -it <pod-name> -- vector tap 'rails_normalize' --format json
kubectl -n vector-agent exec -it <pod-name> -- vector tap 'rails_normalize' --format yamlDiagnosing a specific pipeline
A common debugging workflow:
- Check the source is receiving events:

  ```shell
  kubectl -n vector-agent exec -it <pod-name> -- vector tap '<pipeline>_kubernetes_logs' --limit 5
  ```

  If no events appear, the log path pattern may be wrong, or no pods are producing logs matching the glob.

- Check the normalize transform is parsing correctly:

  ```shell
  kubectl -n vector-agent exec -it <pod-name> -- vector tap '<pipeline>_normalize' --limit 5 --format json 2>/dev/null | jq '.'
  ```

  Check that the post-normalize format looks correct. We attempt to parse common log types (syslog, nginx, JSON), but always return the log body untouched if parsing fails.

- Check the filter is not dropping all events:

  ```shell
  kubectl -n vector-agent exec -it <pod-name> -- vector tap '<pipeline>_normalize' --limit 100 2>/dev/null | wc -l
  kubectl -n vector-agent exec -it <pod-name> -- vector tap '<pipeline>_filter_events' --limit 100 2>/dev/null | wc -l
  ```

  If the second count is much lower than the first, the per-pipeline filter is dropping events.