Before starting the post, let me clarify that what I describe as the state of readiness of Google Cloud SQL for SQL Server is accurate as of early February 2022. It is quite possible that some things will be different by the time you read this post.
For the last several months I have been helping some big enterprises adopt Google Cloud Platform (GCP) and, as part of the implementation, a significant number of SQL Server databases were moving to the GCP Cloud SQL service. But when we started to build the environment in GCP it became clear that the SQL Server option for Cloud SQL is much inferior not only to some other cloud offerings and on-prem installations but also to the other database engines on the same Cloud SQL. In short, SQL Server on the GCP Cloud SQL service lacked some essential features. Here I will try to explain why I think SQL Server in GCP is not mature enough for the enterprise.
When we discuss designing a new application, or IT services in general, we talk a lot about the end-user interface, end-user experience, cost of downtime and a thousand other things. But I don’t remember having many discussions about the experience of the developer, infrastructure engineer or IT consultant, and how they deal with all the processes and tools surrounding each and every step of developing and implementing the application or infrastructure. Let me explain what I mean.
Lately I work primarily with Google Cloud Platform (GCP) and in particular with its Kubernetes service (GKE). As a result my daily command line tools are gcloud, kubectl, nomos and others. And while the GCP Cloud Shell is a really amazing environment which doesn’t require any effort to fire up, sometimes it is not possible to use it. When it comes to working from your own laptop you have different options. You can install tools like the Google Cloud SDK by following several simple steps from the Google website, or you can prepare a Docker image and run it in a container. I personally prefer the second way. That way I can periodically update the entire environment without too much effort and can spin up a new environment on any laptop fairly quickly. Here I am sharing what I personally use for my day-to-day activity.
Terraform is probably already the de-facto standard for cloud deployment. I use it on a daily basis, deploying and destroying my test and demo setups in my Oracle cloud tenancy. Sometimes the deployment environment for a demo has too many files, or some files are really big and hard to read due to the number of different resources and parameters included in them. How can we make our configuration more usable? Let’s try Terraform modules and demonstrate how they work. For our tests we are going to use Terraform v1.0.3 and Oracle Cloud Infrastructure (OCI). You will need a working OCI tenancy and, on your machine, Terraform with the required environment variables defined. The full list of required environment variables is provided in the README file in the GitHub repository. Let’s say we have a simple demo or test configuration with a dedicated network, an internet gateway and a VM, and we want to assign multiple security rules using security lists and maybe one or two security groups. We can include all those rules in the configuration file for the network, but maybe there is a better way. What if we want to reuse a similar set of security rules and security groups not only in that deployment but also in some other stacks? We can try to use Terraform modules.
If you’ve been following the recent changes in the Linux world you probably remember how Red Hat and CentOS announced in December 2020 that the CentOS Project was shifting focus to CentOS Stream and that support for CentOS Linux 8 had been cut to December 31, 2021. It created a wave of discussions in the community about the future of CentOS as an enterprise platform, and some people started to look at alternative Linux distributions. As a result we got Rocky Linux, a new community-driven downstream build, the same as CentOS used to be.
A downstream build is based on the same code base as the vendor distribution and reproduces most features of the “parent” vendor Linux. It follows all the releases after they have been built by the vendor. In most of my tests I use Oracle Linux when I am in the Oracle cloud, but I use CentOS in Google Cloud and other public clouds like Azure or AWS. Now we have Rocky Linux available on those platforms, and I’ve had a quick look and done some testing using Rocky Linux 8.4 (Green Obsidian).
Some time ago I wrote a short blog about dependencies between the number of enabled CPUs and how many databases you could build. Today we got another error when we were trying to create a new database. Here is the screenshot of the error.
If you can’t read it on a small screen, it says “Create Database operation failed due to an unknown error. Refer to work request ID 2580d3ff-064e-4e6f-ab06-1327fd02f40e when opening a Service Request at My Oracle Support.” and provides an error code, which is “Error”.
Some time ago I updated my terraform command line tool to version 0.15.3 and was surprised how easily it went. Originally I planned to write a blog about it, but there was not much to write about. The upgrades to version 0.11 or 0.13 were much more painful. Last week HashiCorp announced the General Availability of Terraform version 1.0, which meant that the time for a new upgrade had come. I upgraded it on one of my machines and decided to write a short blog about both upgrades to encourage people to try the upgrade themselves.
Most Oracle DBAs are well aware of the benefits of using large memory pages for the Oracle database SGA to reduce overhead and improve performance. If you want to read more about it you can start with that Oracle blog or read one of the many other articles and blogs. Oracle uses the use_large_pages parameter to control the behaviour of an Oracle instance during startup.
In the versions before 19c we had three possible values: “TRUE”, “FALSE” and “ONLY”. Since Oracle 11.2.0.3, “TRUE” has meant that the instance allocates as many hugepages as are free in the system and gets the rest from normal small pages. “FALSE” tells it not to use hugepages at all, and with “ONLY” an instance can start only if a sufficient number of free hugepages is available in the system to fit the entire SGA. “TRUE” was the default for all databases.
In the 19c version we got one more value, “AUTO_ONLY”, and it is now the default value for Exadata systems running Oracle Database 19c. The description in the documentation is not totally clear and sounds very similar to the description of the “ONLY” value. Here is an excerpt from the documentation:
“It specifies that, during startup, the instance will calculate and request the number of large pages it requires. If the operating system can fulfill this request, then the instance will start successfully. If the operating system cannot fulfill this request, then the instance will fail to start.”
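As a rough illustration of the four values described above, here is a small Python sketch of the decision logic. It is only a toy model built from the descriptions in this post, not Oracle's implementation. The function names, the 2 MiB hugepage size and the obtainable_pages argument (how many pages the operating system could provide on request) are my assumptions for the example.

```python
# Toy model of use_large_pages behaviour as described in the text.
# Assumption: 2 MiB hugepages, a common default on x86_64 Linux.
HUGEPAGE_BYTES = 2 * 1024 * 1024


def pages_needed(sga_bytes, page_bytes=HUGEPAGE_BYTES):
    """Number of large pages required to hold the whole SGA, rounded up."""
    return -(-sga_bytes // page_bytes)


def startup_outcome(value, sga_bytes, free_pages, obtainable_pages=0):
    """Return what the model instance would do at startup.

    free_pages: hugepages that are already free when the instance starts.
    obtainable_pages: hypothetical number of pages the OS can provide
        when the instance requests them (used for AUTO_ONLY only).
    """
    needed = pages_needed(sga_bytes)
    mode = value.upper()

    if mode == "FALSE":
        # Never use hugepages; the whole SGA goes to small pages.
        return {"starts": True, "large_pages": 0, "small_page_bytes": sga_bytes}

    if mode == "TRUE":
        # Take as many free hugepages as possible, the rest from small pages.
        used = min(needed, free_pages)
        rest = max(0, sga_bytes - used * HUGEPAGE_BYTES)
        return {"starts": True, "large_pages": used, "small_page_bytes": rest}

    if mode == "ONLY":
        available = free_pages
    elif mode == "AUTO_ONLY":
        # Modelled only as far as the quoted documentation goes:
        # the instance requests pages and the OS either fulfils it or not.
        available = obtainable_pages
    else:
        raise ValueError(f"unknown use_large_pages value: {value}")

    if available >= needed:
        return {"starts": True, "large_pages": needed, "small_page_bytes": 0}
    return {"starts": False, "large_pages": 0, "small_page_bytes": 0}


if __name__ == "__main__":
    sga = 4 * 1024**3  # a 4 GiB SGA for the example
    for mode in ("TRUE", "FALSE", "ONLY", "AUTO_ONLY"):
        print(mode, startup_outcome(mode, sga, free_pages=1000, obtainable_pages=2048))
```

In this model the only difference between “ONLY” and “AUTO_ONLY” is where the pages come from: pages that are already free versus pages the operating system can supply on request.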
Let me show you how it works. Here is my sandbox with a 19c database; no hugepages are configured on the box by default.
In the previous posts I shared my first impression of the Google Bare Metal Solution (BMS) and how to start using it. In this post I will try to show some numbers related to the performance of the solution so you can compare it with your existing environment.
Let me start with the box characteristics. For my tests I was using an “o2-standard-32-metal” box located in the us-west2 region (Los Angeles). The solution was configured with a 2 Gbps interconnect and had a couple of storage resources attached to it. The first one was represented by two 512 GB disks based on HDD storage, where I placed my binaries and a recovery ASM disk group, and the second was a 2 TB “all flash” volume I used for data. Here is a summary table:
BMS box type: o2-standard-32-metal
CPU: Intel(R) Xeon(R) Gold 6234 CPU @ 3.30GHz
Storage: 512 GB standard disk; 512 GB standard disk; 2048 GB all flash
Network: 4 NICs, speed 25000 Mb/s
OS: Oracle Linux 7.9
Table: BMS box characteristics.
Before starting the tests I updated my Oracle Linux and installed a number of packages required for my Oracle database, as well as packages to test I/O and network performance, such as fio and iperf3. Here is a summary table with the software and tools used to test the performance.
In the previous post I shared some of my thoughts on why you would use the Google Bare Metal Solution (BMS) and my first impression of it. In this post I want to talk about the first steps and how you can start to work with BMS.
To get your hands on BMS you need to contact your Google Cloud sales representative and order it. This means you need to know your requirements to some extent and prepare accordingly. The major preparation steps are described in the Google documentation and here I will try to go through some of them.
The first main step is to outline your architecture and identify the region for the BMS. The service is a region extension, which means it is connected to your regional Google Cloud infrastructure by a high-speed, low-latency network interconnect. It makes sense to place it where most of your applications and users are going to be. For example, in my case I chose us-west2 (Los Angeles); it was aligned with my main test app servers and provided the best response time. A 64-byte ping from an app server in the same region took 0.991 ms on average.