Skip to content

Vault Audit Log Analysis

Vault audit logs record all requests and responses to Vault, providing a comprehensive audit trail of who accessed which secrets and when. Analyzing these logs can help with:

  • Security monitoring and compliance verification
  • Identifying unused secrets that could be candidates for removal
  • Tracking secret rotation patterns and ensuring compliance with rotation policies
  • Capacity planning based on usage patterns
  • Troubleshooting authentication and access issues

Before you can analyze Vault audit logs, you need to load them from Google Cloud Storage (GCS) into BigQuery.

Ensure you have access to the gitlab-ops.vault_audit_investigation dataset in BigQuery. This access is typically provided to SREs, but can also be obtained through an access request.

If this dataset does not exist then you should be able to create this through the GCP console UI (related docs).

Create a local file named vault.json with the following schema definition:

[
{
"name": "time",
"type": "TIMESTAMP",
"mode": "REQUIRED",
"description": "The timestamp when the request was received by Vault"
},
{
"name": "backend_type",
"type": "STRING",
"mode": "NULLABLE",
"description": "The type of audit log entry (request, response)"
},
{
"name": "auth",
"type": "JSON",
"mode": "NULLABLE"
},
{
"name": "request",
"type": "JSON",
"mode": "NULLABLE"
},
{
"name": "response",
"type": "JSON",
"mode": "NULLABLE"
}
]

This schema captures the key components of Vault audit logs:

  • time: When the request was received
  • backend_type: Whether this is a request or response log
  • auth: JSON object containing authentication details
  • request: JSON object containing request details
  • response: JSON object containing response details

[!note] When testing modified or new queries it may be beneficial to load a smaller amount of data like a day first. This will speed up iteration and lower costs as BigQuery charges based on the amount of data processed.

Use the following command to load a single day’s worth of Vault audit logs from GCS to BigQuery:

Terminal window
bq load --ignore_unknown_values=true --project_id gitlab-ops --source_format NEWLINE_DELIMITED_JSON \
vault_audit_investigation.single_day \
'gs://gitlab-ops-logging-archive/gke/vault/dt=YYYY-MM-DD/*.gz' vault.json

[!note] Replace YYYY-MM-DD with the date for which you want to analyze logs (typically the previous day).

To load the entire available retention period of data, you can use the following command:

Terminal window
URIS=$(gsutil ls 'gs://gitlab-ops-logging-archive/gke/vault/' | sed 's/$/*.gz/' | tr '\n' ',' | sed 's/,$//')
bq load --ignore_unknown_values=true --project_id gitlab-ops --source_format NEWLINE_DELIMITED_JSON \
vault_audit_investigation.full_retention_period \
"${URIS}" vault.json

Query 1: Secret Access Metrics by Mount Type (KV v2 Data Paths)

Section titled “Query 1: Secret Access Metrics by Mount Type (KV v2 Data Paths)”

This query provides metrics on secret access patterns:

SELECT
count(1) AS count,
count(distinct json_extract_scalar(request, "$.path")) AS unique_paths,
format_timestamp("%Y-%m", date_trunc(time, MONTH)) AS month,
json_extract_scalar(response, "$.mount_type") AS mount_type
FROM `gitlab-ops.vault_audit_investigation.single_day`
WHERE
backend_type = "response" AND
json_extract_scalar(request, "$.operation") = "read" AND
json_extract_scalar(response, "$.mount_type") = "kv" AND
regexp_contains(json_extract_scalar(request, "$.path"), "[^/]+/data/.+")
GROUP BY
backend_type,
format_timestamp("%Y-%m", date_trunc(time, MONTH)),
json_extract_scalar(response, "$.mount_type")

Purpose: This query shows the total number of access requests and unique secrets accessed, broken down by mount type and month.

[!note] The count of unique secret paths will be lower than the actual number of active secrets when using the External Secrets Operator, as it will only sync each secret once during the secret versions lifecycle.

Query 2: All Secret Access Metrics by Mount Type (Excluding Metadata)

Section titled “Query 2: All Secret Access Metrics by Mount Type (Excluding Metadata)”

This query is similar to the first one but includes all mount types while excluding metadata paths:

SELECT
count(1) AS count,
count(distinct json_extract_scalar(request, "$.path")) AS unique_paths,
format_timestamp("%Y-%m", date_trunc(time, MONTH)) AS month,
json_extract_scalar(response, "$.mount_type") AS mount_type
FROM `gitlab-ops.vault_audit_investigation.single_day`
WHERE
backend_type = "response" AND
json_extract_scalar(request, "$.operation") = "read" AND
json_extract_scalar(response, "$.mount_type") = "kv" AND
NOT regexp_contains(json_extract_scalar(request, "$.path"), "[^/]+/metadata/.+")
GROUP BY
backend_type,
format_timestamp("%Y-%m", date_trunc(time, MONTH)),
json_extract_scalar(response, "$.mount_type")

Purpose: This query provides a broader view of secret access patterns across all mount types in Vault.

Query 3: Secret Rotation Metrics by Mount Type

Section titled “Query 3: Secret Rotation Metrics by Mount Type”

This query helps track secret rotation patterns:

SELECT
count(1) AS count,
count(distinct json_extract_scalar(request, "$.path")) AS unique_paths,
format_timestamp("%Y-%m", date_trunc(time, MONTH)) AS month,
json_extract_scalar(response, "$.mount_type") AS mount_type
FROM `gitlab-ops.vault_audit_investigation.single_day`
WHERE
backend_type = "response" AND
json_extract_scalar(request, "$.operation") = "update" AND
json_extract_scalar(response, "$.mount_type") = "kv" AND
json_extract_scalar(response, "$.data.version") != "1" AND
NOT regexp_contains(json_extract_scalar(request, "$.path"), "[^/]+/metadata/.+")
GROUP BY
backend_type,
format_timestamp("%Y-%m", date_trunc(time, MONTH)),
json_extract_scalar(response, "$.mount_type")

Purpose: This query shows the number of secret rotations per month, excluding initial creations (version 1).