Cloudflare Troubleshooting
- Cloudflare Grafana Dashboard
- Cloudflare Dashboard
- GitLab.com Firewall Overview
- GitLab.com Traceback Tool
Symptoms
There are certain conditions which indicate a Cloudflare-specific problem. For example, if Cloudflare errors are elevated but production errors are not, the problem is most likely inside Cloudflare.
The following are potential sources of errors.
Static objects cache
The static objects cache for production is deployed as a Cloudflare worker in the gitlab.net zone. If the alert you received indicates the gitlab.net zone, and requests to the `/raw/` or `/-/archive` endpoints are failing, then it is worth checking how the worker is operating. See its runbook for troubleshooting information.
False Positive Triage Process
The following information is intended to help with diagnosing and remediating user reports of Cloudflare blocks due to WAF enforcement. With any WAF product, there will be a small number of user-impacting false positives; our goal is to reduce those as much as possible given the nature of the content hosted on GitLab.com, while still getting some benefit from the Cloudflare WAF product.
Supporting Artifacts to Collect
- If an incident has already been created due to a large number of reports:
  - Copy the generic trace template below and ask users to report their results.
  - If further details are required, direct users to create a confidential Cloudflare Troubleshooting Issue and link it to the incident issue.
- If the problem is a specific URI or request:
  - Direct them to create a Cloudflare Troubleshooting Issue, making it confidential if necessary.
Confirming Cloudflare and other service Connectivity
- Inspect the Cloudflare Grafana Dashboard for any major discrepancies in the return codes between Cloudflare and `haproxy`.
- Log in to https://dash.cloudflare.com and search for the requests which are not working as expected. Are they being blocked or otherwise acted on by any of the Cloudflare services?
- Search the `workhorse` and `rails` production logs for the corresponding requests to verify whether they are making it to GitLab's services.
- On a host experiencing connection issues, add `gitlab.com` to the `/etc/hosts` file with the IP of the origin and reattempt the requests to determine if the problem may be between Cloudflare and GCP.
  - Attempt the same connections using both the DNS-supplied addresses for `gitlab.com` and the hardcoded origin addresses from different GCP regions and/or other cloud providers to further narrow down specific paths exhibiting problems.
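As an alternative to editing `/etc/hosts` for the last step, curl's `--resolve` flag can pin `gitlab.com` to a given address for a single request. The sketch below is only an illustration: the helper names `resolve_arg` and `status_for` are hypothetical, and `203.0.113.10` is a documentation placeholder, not a real origin address.

```shell
# Build the value for curl's --resolve flag, which has the form HOST:PORT:ADDRESS.
# $1 = host, $2 = address to pin the host to, $3 = port (defaults to 443)
resolve_arg() {
  printf '%s:%s:%s\n' "$1" "${3:-443}" "$2"
}

# Print the HTTP status code for a URL.
# $1 = URL, $2 = host, $3 = optional address; when given, the host is pinned
# to that address instead of using the DNS-supplied one.
status_for() {
  if [ -n "${3:-}" ]; then
    curl -s -o /dev/null -w '%{http_code}\n' --max-time 15 \
      --resolve "$(resolve_arg "$2" "$3")" "$1"
  else
    curl -s -o /dev/null -w '%{http_code}\n' --max-time 15 "$1"
  fi
}

# Usage (ORIGIN_IP is a placeholder; substitute the real origin address):
# ORIGIN_IP=203.0.113.10
# echo "via DNS (Cloudflare): $(status_for https://gitlab.com/ gitlab.com)"
# echo "via origin directly:  $(status_for https://gitlab.com/ gitlab.com "$ORIGIN_IP")"
```

If the request succeeds against the origin address but fails through the DNS-supplied address, that points toward Cloudflare or the path between Cloudflare and GCP rather than the GitLab services themselves.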
Generic trace template
<p><details><summary>`curl http://gitlab.com/cdn-cgi/trace`</summary>
<pre><code>PASTE OUTPUT HERE</code></pre>
</details></p>
<p><details><summary>`curl https://gitlab.com/cdn-cgi/trace`</summary>
<pre><code>PASTE OUTPUT HERE</code></pre>
</details></p>
<p><details><summary>`curl -svo /dev/null https://gitlab.com`</summary>
<pre><code>PASTE OUTPUT HERE</code></pre>
</details></p>
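Users who prefer not to paste output by hand could generate the filled-in template with a small helper. This is a minimal sketch; `details_block` is a hypothetical name, and the output is inserted verbatim (no HTML escaping), just as it would be when pasted manually.

```shell
# Wrap the output of one command (read from stdin) in the collapsible block
# used by the template. $1 is the command text shown in the summary line.
details_block() {
  printf '<p><details><summary>`%s`</summary>\n' "$1"
  printf '<pre><code>'
  cat
  printf '</code></pre>\n</details></p>\n'
}

# Usage (runs the three commands from the template):
# curl -s http://gitlab.com/cdn-cgi/trace  | details_block 'curl http://gitlab.com/cdn-cgi/trace'
# curl -s https://gitlab.com/cdn-cgi/trace | details_block 'curl https://gitlab.com/cdn-cgi/trace'
# curl -svo /dev/null https://gitlab.com 2>&1 | details_block 'curl -svo /dev/null https://gitlab.com'
```

The `2>&1` on the last command matters because `curl -v` writes its connection details to standard error.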
## GeoIP Troubleshooting
We use Cloudflare rules to block access to gitlab.com from various locations. When we need to troubleshoot these rules with Cloudflare support, they will ask for a trace from the user being blocked. The user simply has to visit [`/cdn-cgi/trace`](https://gitlab.com/cdn-cgi/trace), and we then provide the output in the support ticket.
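The trace output is a list of `key=value` lines. When triaging a GeoIP report, it can help to pull out the few fields that are usually relevant. The sketch below assumes the output contains `loc` (country code assigned to the client), `colo` (Cloudflare data center), and `ip` (client address as seen by Cloudflare); `trace_field` is a hypothetical helper name.

```shell
# Print the value of one key from cdn-cgi/trace output read on stdin.
# $1 = key name. Everything after the first "=" is printed, so values that
# themselves contain "=" (such as a user agent string) are kept intact.
trace_field() {
  awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2); exit }'
}

# Usage:
# trace="$(curl -s https://gitlab.com/cdn-cgi/trace)"
# printf '%s\n' "$trace" | trace_field loc
# printf '%s\n' "$trace" | trace_field colo
# printf '%s\n' "$trace" | trace_field ip
```

The full, unmodified trace should still be attached to the support ticket; the extracted fields are only a convenience for a first look.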