
Vector

Vector is a high-performance, open-source observability data pipeline written in Rust and maintained by Datadog. It collects, transforms, and routes logs, metrics, and traces with a focus on correctness, performance, and operator ergonomics.

At GitLab, Vector is currently replacing our use of Fluentd.

Vector is configured declaratively in YAML (or TOML/JSON) using a pipeline model of sources (where data comes from), transforms (how data is parsed, enriched, or filtered), and sinks (where data is sent).
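As a minimal, illustrative sketch of that model (hypothetical component names and paths, not our production pipeline):

```yaml
# A self-contained Vector pipeline: tail files, parse JSON, print to stdout.
sources:
  app_logs:
    type: file
    include:
      - /var/log/app/*.log

transforms:
  parse_json:
    type: remap
    inputs: [app_logs]
    source: |
      # Merge parsed JSON into the event; keep the raw message on failure.
      parsed = object(parse_json(.message) ?? {}) ?? {}
      . = merge(., parsed)

sinks:
  console_out:
    type: console
    inputs: [parse_json]
    encoding:
      codec: json
```

Each transform and sink declares its `inputs`, so the pipeline is an explicit directed graph from sources to sinks.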

For full documentation see https://vector.dev/docs/.

We currently have the following Vector deployments:

| Deployment | Role | Description |
| --- | --- | --- |
| vector-agent | Agent (DaemonSet) | Kubernetes log collection, replacing fluentd-elasticsearch |
| vector-archiver | Archiver (Deployment) | Log archival to GCS, replacing fluentd-archiver |

Vector Agent is the Kubernetes log collection service that replaces the legacy fluentd-elasticsearch DaemonSet. It runs as a DaemonSet on every node, collecting pod logs from /var/log/pods/ and shipping them to GCP Pub/Sub topics organized by service type.

The service is deployed via ArgoCD in the vector-agent namespace using two Helm charts:

  1. vector (from https://helm.vector.dev) — the Vector binary, deployed as a DaemonSet
  2. vector-config (from registry.ops.gitlab.net/gitlab-com/gl-infra/charts) — generates the pipeline ConfigMap, to replicate the existing fluentd-elasticsearch logic.
```mermaid
graph LR
    A[Kubernetes Pods] -->|write logs| B[/var/log/pods/]
    B -->|read| C[Vector Agent DaemonSet]
    C -->|normalize VRL| D[Transforms]
    D -->|filter| E[Filter Events]
    E -->|publish| F[GCP Pub/Sub Topics]
    F --> G[Pubsubbeat]
    G --> H[Elasticsearch]
    F --> I[GCS Archive]
```

To view the deployments in ArgoCD:

  1. Navigate to https://argocd.gitlab.net.
  2. In the search/filter bar, type vector-agent, or filter by the label app.kubernetes.io/name=vector-agent.
  3. You will see one vector-agent Application per cluster where vector-agent is deployed.
  4. Click on an application to view its sync status, health, and managed resources.

Configuration lives in the ArgoCD apps repository:

```
argocd/apps/services/vector-agent/
├── service.yaml           # ArgoCD ApplicationSet service definition
├── values.yaml            # Vector Helm chart values (DaemonSet config)
├── values-config.yaml     # Pipeline configuration (sources, transforms, sinks)
└── env/
    ├── gprd/
    │   ├── app.yaml               # Environment-specific chart versions, enabled/disabled flags
    │   └── values-config.yaml     # Environment-specific values overrides
    ├── gstg/
    │   ├── app.yaml
    │   └── values-config.yaml
    ├── pre/
    │   └── app.yaml
    └── ops/
        └── app.yaml
```

  1. Edit the relevant files in argocd/apps/services/vector-agent/.
  2. Open a merge request.
  3. After merge, ArgoCD will trigger a sync of the application; you can check sync status and health in the UI.

For pipeline configuration changes (adding/modifying services), only values-config.yaml needs to be edited. The ConfigMap update triggers a live reload — no pod restart required.

For DaemonSet changes (resources, volumes, tolerations), edit values.yaml. These changes require a pod rollout which happens during ArgoCD sync.

The vector-config Helm chart generates a single ConfigMap named vector-config containing a vector.yaml file. The Vector agent watches this ConfigMap for changes via VECTOR_WATCH_CONFIG=true, enabling live configuration reloading without pod restarts.

The configuration is built from two concepts:

  1. Pipeline Templates (pipelineTemplates): Reusable definitions of sources, transforms, and sinks. These define the common log collection pattern.
  2. Pipeline Configs (pipelineConfigs): Concrete instances that reference a template and provide service-specific values (log paths, Pub/Sub topics, filters).

The default template named kubernetes defines a three-stage pipeline:

| Stage | Component Type | Purpose |
| --- | --- | --- |
| kubernetes_logs | Source | Reads pod logs from /var/log/pods/ |
| normalize | Transform (VRL) | Parses log messages (syslog, nginx, JSON), extracts timestamps, enriches with Kubernetes metadata |
| filter_events | Transform (filter) | Per-pipeline configurable filter for dropping unwanted events |
| pubsub | Sink | Sends events to a GCP Pub/Sub topic with JSON encoding |
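As an illustrative sketch (not the actual production VRL), a normalize transform in this pattern might try parsers in order and fall back to the raw message when nothing matches:

```yaml
transforms:
  normalize:
    type: remap
    inputs: [kubernetes_logs]
    source: |
      # Try JSON first, then syslog; if neither parses, leave the event untouched.
      msg = string(.message) ?? ""
      structured, err = parse_json(msg)
      if err != null {
        structured, err = parse_syslog(msg)
      }
      if err == null {
        . = merge(., object(structured) ?? {})
      }
```

The key property, matching the behavior described later in this page, is that a parse failure never mutates or drops the event.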

To collect logs for a new service, add an entry to pipelineConfigs in values-config.yaml:

```yaml
pipelineConfigs:
  my-new-service:
    template: kubernetes
    pubsub_topic: "pubsub-my-new-service-inf-{{ .Values._clusterEnvironment }}"
    paths_include:
      - /var/log/pods/my-namespace_my-service-*_*/*/*.log
```

Key fields:

| Field | Required | Description |
| --- | --- | --- |
| template | Yes | Name of the pipeline template to use (typically kubernetes) |
| paths_include | Yes | Glob patterns for log files to collect |
| paths_exclude | No | Glob patterns for log files to exclude |
| pubsub_topic | No | Pub/Sub topic name (defaults to pubsub-<name>-inf-<env>) |
| filter | No | VRL expression for filtering events (defaults to true, allow all) |
| custom_records | No | Map of additional fields to add to every log event |
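For instance, a hypothetical entry exercising the optional fields might look like this (service name, globs, and record values are all invented for illustration):

```yaml
pipelineConfigs:
  my-service:
    template: kubernetes
    # pubsub_topic omitted: defaults to pubsub-my-service-inf-<env>
    paths_include:
      - /var/log/pods/my-namespace_my-service-*_*/*/*.log
    paths_exclude:
      - /var/log/pods/my-namespace_my-service-canary-*_*/*/*.log
    filter: |
      # Keep everything except health-check noise.
      !contains(string(.message) ?? "", "/-/health")
    custom_records:
      tier: sv
      stage: main
```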

Path format for Kubernetes pods:

```
/var/log/pods/<namespace>_<pod_name>_<pod_uid>/<container_name>/<restart_count>.log
```

This differs from the old fluentd format which used /var/log/containers/.
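To sanity-check a glob against this path layout before deploying, a quick local test with bash pattern matching can help (sample path and pattern are invented; note that bash's `*` also crosses `/`, which is looser than per-component glob matching, so treat a bash match as necessary but not sufficient):

```shell
#!/usr/bin/env bash
# Hypothetical pod log path in the /var/log/pods/ layout described above.
path="/var/log/pods/my-namespace_my-service-7d9f_0a1b2c3d/my-service/0.log"
# Candidate paths_include glob.
glob="/var/log/pods/my-namespace_my-service-*_*/*/*.log"

# [[ ... == $glob ]] does shell pattern matching (glob must be unquoted).
if [[ "$path" == $glob ]]; then
  echo "match"
else
  echo "no match"
fi
```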

Use the filter field with a VRL expression that returns a boolean:

```yaml
pipelineConfigs:
  packagecloud:
    template: kubernetes
    pubsub_topic: "pubsub-packagecloud-inf-{{ .Values._clusterEnvironment }}"
    paths_include:
      - /var/log/pods/packagecloud_*_*/*/*.log
    filter: |
      sub = string(.subcomponent) ?? ""
      !match_any(sub, [
        r'^metrics$',
        r'^request_log$'
      ])
```

Changes to the DaemonSet itself (resources, tolerations, volumes, ports) go in values.yaml. These are values passed to the upstream Vector Helm chart.

Key settings:

```yaml
role: "Agent"                          # DaemonSet mode
existingConfigMaps: ["vector-config"]  # External config from the vector-config chart
env:
  - name: VECTOR_WATCH_CONFIG
    value: "true"                      # Hot-reload on ConfigMap changes
resources:
  requests:
    cpu: 150m
    memory: 1024Mi
  limits:
    cpu: 300m
    memory: 2048Mi
```
To upgrade the chart versions:

  1. Update the version in the relevant env/<environment>/app.yaml.
  2. Open a merge request.
  3. After merge, sync the application in ArgoCD.

The value file hierarchy (evaluated in order, later files override earlier ones):

  1. values-config.yaml — base configuration for all environments
  2. env/<environment>/values-config.yaml — environment-specific overrides
  3. env/<environment>/clusters/<cluster-name>/values-config.yaml — cluster-specific overrides
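For example, a hypothetical env/gstg/values-config.yaml that tightens one pipeline's filter only in staging might look like this (filter logic invented for illustration):

```yaml
# Overrides merge over the base values-config.yaml; only the keys set here change.
pipelineConfigs:
  packagecloud:
    filter: |
      # In staging, additionally drop debug-severity events.
      sev = string(.severity) ?? ""
      sev != "DEBUG"
```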

vector tap is a powerful debugging tool that lets you observe events flowing through any component in the pipeline in real time. It connects to the Vector API and streams matching events to your terminal.

Prerequisites:

  • The Vector API must be enabled (it is, on 127.0.0.1:8686).
  • You need kubectl exec access to a vector-agent pod.

To observe events flowing through a specific source or transform:

```shell
kubectl -n vector-agent exec -it <pod-name> -- vector tap 'rails_normalize'
kubectl -n vector-agent exec -it <pod-name> -- vector tap 'rails_pubsub'
```

Note: taps observe events after each stage has processed them; you cannot tap a sink's output, since events have already been forwarded out of Vector at that point.

You can use glob patterns to tap multiple components at once:

```shell
kubectl -n vector-agent exec -it <pod-name> -- vector tap 'sidekiq*_kubernetes_logs'
```

Combine vector tap with grep or jq to filter output:

```shell
kubectl -n vector-agent exec -it <pod-name> -- vector tap 'rails_normalize' | grep 'correlation_id'
kubectl -n vector-agent exec -it <pod-name> -- vector tap 'rails_normalize' --format json 2>/dev/null | jq '.'
kubectl -n vector-agent exec -it <pod-name> -- vector tap 'rails_normalize' --format json 2>/dev/null | jq 'select(.severity == "ERROR")'
```

Use --limit to cap the number of events and --format to control the output encoding:

```shell
kubectl -n vector-agent exec -it <pod-name> -- vector tap 'rails_normalize' --limit 10
kubectl -n vector-agent exec -it <pod-name> -- vector tap 'rails_normalize' --format json
kubectl -n vector-agent exec -it <pod-name> -- vector tap 'rails_normalize' --format yaml
```

A common debugging workflow:

  1. Check the source is receiving events:

    ```shell
    kubectl -n vector-agent exec -it <pod-name> -- vector tap '<pipeline>_kubernetes_logs' --limit 5
    ```

    If no events appear, the log path pattern may be wrong or no pods are producing logs matching the glob.

  2. Check the normalize transform is parsing correctly:

    ```shell
    kubectl -n vector-agent exec -it <pod-name> -- vector tap '<pipeline>_normalize' --limit 5 --format json 2>/dev/null | jq '.'
    ```

    Check that the post-normalize format looks correct. The normalize transform attempts to parse common log types (syslog, nginx, JSON), but always returns the log body untouched if parsing fails.

  3. Check the filter is not dropping all events:

    ```shell
    kubectl -n vector-agent exec -it <pod-name> -- vector tap '<pipeline>_normalize' --limit 100 2>/dev/null | wc -l
    kubectl -n vector-agent exec -it <pod-name> -- vector tap '<pipeline>_filter_events' --limit 100 2>/dev/null | wc -l
    ```

    If the second count is much lower than the first, the filter is dropping most events; revisit the pipeline's VRL filter expression.