Skip to content

Troubleshooting Hashicorp Vault

No Active Vault Instance / Vault Sealed / Vault Low Failure Tolerance

Section titled “No Active Vault Instance / Vault Sealed / Vault Low Failure Tolerance”

The Vault pods are failing to start, have lost quorum or are unable to auto-unseal.

Vault is deployed in a cluster of 5 nodes, so it needs at least 3 healthy nodes to have a quorum and be operational.

Check the status of the Vault deployment and investigate any failing pod for errors:

Terminal window
kubectl --namespace vault get pods
kubectl --namespace vault logs vault-X

You can also check the logs in Elasticsearch instead.

In case of unseal errors:

  • Verify that the Kubernetes Service Account is still associated to its Google Service Account [email protected]:

    Terminal window
    kubectl --namespace vault describe serviceaccount vault
  • Verify that this Service Account has permission to use the unseal KMS key for encryption/decryption.

Vault is unable to send its audit log and thus has stopped all operations until it is able again.

At the time of this writing, the Vault audit logs are written directly to stdout, so they can be collected by Fluentd and shipped to Elasticsearch, which makes failure extremely unlikely.

If Vault fails to write its audit logs it could mean:

  • a bug introduced in Vault: has it been upgraded recently? Search the issues on GitHub.
  • containerd not able to handle the container’s output, possibly affecting other workloads, check the health of node running the active Vault pod.