For a long time we used the traditional method of accessing internal database clusters by SSHing to a bastion host. Due to the overhead and limitations of maintaining the SSH configuration, we’ve moved to using Cloudflare Tunnels combined with Cloudflare Access to dramatically improve the user experience and onboarding times related to database access.
How we used to work
Internally we rely heavily on PostgreSQL to power many services at Cloudflare – including Stream, Images and the Cloudflare Dashboard itself. We run our Postgres clusters on our own hardware within our data centers, and they are not accessible to the public Internet, including employee laptops.
When an employee requires access to one of these databases – be it for staging environments, incident management, or supporting production services – an SSH user account is required. This SSH account has limited access on a bastion host, purely for querying databases within the data center.
The pain we experienced
Provisioning an SSH account to these bastion hosts requires submitting a pull request to our main Infrastructure-as-Code git repository. For engineers this is a cumbersome process, and for non-engineers it is either an unnecessary learning experience, or a burden to whomever they have to ask to complete the work for them.
Both the Security and Site Reliability Engineering (SRE) teams had tolerated this solution as a necessary evil, but had reservations about handing out shell access to machines for this purpose. While the user accounts had few privileges within the bastion hosts, this still allowed users to run commands within a host and requires a lot of trust that the whole stack is secure.
The solution: Cloudflare Zero Trust
As it turns out, the problems we encountered were the same as problems many of our customers encounter as well. We also knew these concerns could be easily addressed with our own products, Cloudflare Tunnel and Cloudflare Access.
To get started, we deployed Cloudflare Tunnel on a pod set up within our internal Kubernetes cluster that maintains access to the database clusters. This established connectivity from our origin to the Cloudflare global network. At this point, our newly created Tunnel was ready to serve traffic to our origin, in this case our PostgreSQL database server. This already simplified orchestration and management as we no longer needed to manage any Access Control List (ACL) changes for the pod itself in order for cloudflared to connect to it.
Next, to ensure that only eligible Cloudflare employees could access the database endpoints, we implemented Cloudflare Access and created identity-driven Zero Trust policies. Access then handled all user authentication for each incoming request over Tunnel and enforced a set of pre-defined identity-based policies to ensure that only certain Cloudflare employees could make connections to our database.
We were also able to better delineate access to staging and production databases by creating independent Tunnels for each. This allowed us to enforce more granular restrictions for production access without impeding our more accessible staging environments. It also had the added benefit of clearly separating the network policies used internally.
Finally, in order for our internal users to connect to these databases, they simply needed to install cloudflared client side on their machine. Once installed, they could run cloudflared access from their endpoint to establish a long-lived TCP connection from their local laptop to the desired database cluster. Each request was then routed to Cloudflare first for policy evaluation through Cloudflare Access. This prompted the user to complete an authentication event which ensured only the Cloudflare engineers defined within our Zero Trust policies were able to establish a connection to the database.
With cloudflared running locally, the user is then free to fire up their favorite database client to connect to the local port and run queries against the remote database cluster as if it is running locally. In short, our users were now able to run a lightweight daemon, cloudflared, on their local machine to route traffic to Cloudflare. Cloudflare Access then evaluated each request against the identity-driven Zero Trust security policies we defined. If the user met these requirements, the request was forwarded onto Cloudflare Tunnel which securely connected internal users to our databases behind Tunnel.
While we enjoy the benefits this workflow gives us, we needed to include a break glass procedure to ensure that we aren’t locked out of fixing our infrastructure if our infrastructure itself is having issues. For this reason, we continue to maintain SSH-based jump-hosts for a limited number of senior staff members to get in and re-establish connectivity.
What we learned
By implementing our own solutions, we were able to enhance our security posture and improve the overall experience for our internal users. We were also able to become a customer of our own products and provide value feedback, insight, and feature requests to the Access and Tunnel teams internally. Oftentimes, we get to be the first to try new features or report regressions in beta builds which ultimately leads to a better experience for our customers as well.
Overall, by implementing Access and Tunnel to forward arbitrary TCP connections, users are able to focus on their job rather than worrying about the nuances of sending strings of complex commands through an SSH client. Our Security and SRE teams are also happier knowing that any connection to our data centers have been authenticated, authorized and logged by Cloudflare Access. If you’d like to get started Cloudflare Tunnel is free for any user and any use case. To get started, sign-up for a Cloudflare Zero Trust account and create your first Tunnel directly from the Zero Trust dashboard.