During my discussions with customers I’ve sometimes heard some incorrect expectations and assumptions when people are defining their backup and recovery strategy. As a database and, in general, data centric person I think it is quite important to understand what the Point In Time Recovery (PITR) means and what Google Cloud SQL can do and what it cannot do. The information here is relevant for September 2022 when the post has been written.
Let’s start from the point of time recovery and how it works. From the high level of view the PITR should provide the ability to restore and recover your data up to the last seconds defining a desirable point of time in the past to represent the consistent dataset at that moment.
To achieve that goal the database recovery uses a combination of backup and stored transaction logs. The transaction logs contain sequential records with all the changes applied to the database dataset. The logs have different names, such as binary log, Write-Ahead Logging (WAL) or Redo logs, but conceptually they are designed to store and apply information for recovery purposes. To recover the database instance should be restored from the latest suitable backup which was completed before the PITR and apply all the changes from the transaction logs starting from time when the backup had started and until the PITR time.
The post is about backup management for AlloyDB. It might be useful for the time when it is written but, probably, will be obsolete very soon when tools and API for the service will mature. A couple of words about AlloyDB backups and how they are created. The backups are quite different from the default backups for Cloud SQL for example. As we know in Cloud SQL all the backups are bound to the instance. What it means is when the instance is deleted then all the backups disappear along with the instance. It makes sense if the backups behind the scenes are storage snapshots from the databases. But in AlloyDB all the backups are decoupled from the cluster and exist by themselves. If you delete a cluster the backups stay. I think it is a way better approach because it provides a better way to protect from some mistakes when an instance is deleted before making a clone or exporting the data. As for now you can see all the backups for existing and deleted instances using the “backups” tab in the console, gcloud utility or listing using GCP REST API.
The GCP Config Connector (CC) is a Kubernetes add-on which allows you to create, change or delete cloud resources outside your cluster. It can deploy various GCP resources such as storage, databases, network and others representing them as a set of Kubernetes resources. It helps with a unified approach for deployment using Kubernetes to deploy the full stack for your application. The resources can be deployed by kubectl, helm or any CD (Continuous Deployment) tool you use in the organization.
A short disclaimer. I am writing it in the middle of March 2022 and it is possible that when you read the blog the information published here is not relevant anymore. Cloud products are evolving very fast.
I write the post to share some observations and potential issues you might have with deploying GCP Memorystore for Redis instances through Anthos Config Connector (ACC) controller. If you are not familiar with ACCI, I strongly recommend reading at least a high level overview of the product. In essence this is a Kubernetes addon which allows you to automatically deploy and manage GCP services by applying a manifest file (YAML or Helm chart) to a Kubernetes cluster with the ACC controller. It allows you to use the Kubernetes cluster as a deployment tool for GCP resources in your organization. This is a really interesting approach and might transform your environment in the cloud. But it implies some challenges around security which I am going to discuss in the blog.
In one of my previous posts I’ve noted that the GCP Cloud SQL for SQL Server doesn’t have point of time recovery as of March 2022. As result the default out of box backups can only provide RPO as 24 hours or more. The exact RPO might vary from day to day since you can only specify a window for backup but not exact time. So far it seems like the only reasonable approach to reduce the RPO is to schedule on-demand backups, and in this post I am going to show how you can do that using a couple of different approaches.
Before starting the post let me clarify that what I am going to describe as the state of readiness of the Google Cloud SQL Server is actual for early February 2022. It is quite possible that some things can be different when you read the post.
For the last several months I was helping some big enterprises to adopt Google Cloud Platform (GCP) and, as part of the implementation, a significant number of SQL Server databases were moving to the GCP Cloud SQL service. But when we started to build the environment in GCP it was clear that the SQL Server option for Cloud SQL is much inferior not only to some other cloud offerings and on-prem installations but also to other databases engines on the same Cloud SQL. In short the SQL Server on GCP Cloud SQL service lacked some essential features. Here I will try to explain why I think the SQL Server in GCP is not mature enough for enterprise.
Lately I work primarily with Google Public Cloud (GCP) and in particular with Kubernetes services (GKE). As result my daily routine command line tools are gcloud, kubectl, nomos and other. And when the GCP cloud shell is really amazing environment which doesn’t require any effort to fire up, sometimes it is not possible to use. When it comes to work from your own laptop you have different options. You can install the tools like Google Cloud SDK following several simple steps from the Google website or you can prepare a docker image and run it in a container. I personally prefer the second way. In such case I can periodically update entire environment without too much effort and easily can span a new environment on any laptop fairly quickly. Here I am sharing what I personally use for my day-to-day activity.
If you’ve been following the recent changes in the linux world you probably remember how Red Hat and Centos announced in December 2020 that the CentOS Project was shifting focus to CentOS Stream and support for CentOS Linux 8 had been cut to December 31, 2021. It created a wave of discussions in the community about the future for Centos as an enterprise platform and some people started to look to alternative Linux distributives. As a result we got a new, community-driven downstream built, same as Centos used to be, Rocky linux.
The downstream build is based on the same code base as the vendor distributive and resembles most features of the “parent” vendor Linux. It is following all the releases after they have been built by the vendor. In most of my tests I am using Oracle Linux when I am in the Oracle cloud but I am using Centos in Google cloud and other public clouds like Azure or AWS. Now we have Rocky Linux available on those platforms and I’ve had a quick look and done some testing using the Rocky Linux 8.4 (Green Obsidian).
In the previous posts I shared my first impression and how to start using the Google Bare Metal Service (BMS). In this post I will try to show some numbers related to the performance of the solution and you can compare it with your existing environment.
Let me start from the box characteristics. For my tests I was using a “o2-standard-32-metal” box located in the us-west2 zone (Los Angeles) . The solution was configured with 2Gbps interconnect and had a couple of storage resources attached to it. The first one was represented by two 512Gb disks based on HDD storage where I placed my binaries and a recovery ASM disk group and the second was a 2Tb volume “all flash” I used for data. Here is summary table:
BMS Box type
Intel(R) Xeon(R) Gold 6234 CPU @ 3.30GHz
512 Gb – Standard disk
512 Gb – Standard disk
2048 Gb – All flash
4 NICs Speed: 25000Mb/s
Oracle Linux 7.9
BMS box characteristics.
Before starting the tests I updated my Oracle Linux and installed a number of packages required for my Oracle database and packages to test IO and Network such as fio and iperf3. Here is a summary table with software and tools used to test the performance.
In the previous post I put some of my thoughts on why you would use the Google Bare Metal Service (BMS) and my first impression about it. In this post I want to talk about the first steps and how you can start to work with the Google Bare Metal Service (BMS).
To put your hands on BMS you need to contact your Google Cloud sales representative and order it. It means you need to know to some extent your requirements and prepare for that. The major preparation steps are described in the Google documentation and here I will try to go through some of them.
The first main step is to outline your architecture and identify the region for the BMS. The service is a region extension and it means it is connected to your regional Google cloud infrastructure by high speed low latency network interconnect. It makes sense to place it where the most of your applications and users are going to be. For example, in my case I’ve chosen the us-west2 (Los Angeles) and it was aligned with my main test app servers and provided the best response time. The 64 bytes ping from an app server in the same region was 0.991 ms on average.