Oracle OCI Resource Manager Discovery.

If you work with Terraform, you are quite familiar with the situation when a lot of resources have already been deployed manually. What options do we have in such a case? The first one is to use the native Terraform Resource Discovery and create the state file, which can be imported to your enterprise configuration. But if you plan to use Resource Manager in OCI, you can use the new Resource Manager Discovery feature. It creates a stack discovering your resources in a compartment.
Let’s see how it works. In my Ashburn region, I have a regional network with private and public subnets, three compute instances, and a MySQL database. All the resources were deployed manually from the console or command line without using OCI Resource Manager.

Now I am going to the Resource manager page and push the button “Create Stack”.

When you do that, you have four options, and the last one is to create a stack from the existing configuration using already deployed resources.

You can choose whether you want all services in the compartment or only a specific subset of resources. It would be nice to choose based on the compartment and tags, but it is not an option for now. Also, please be aware the sub-compartments are not going to be included.

Then you push next, next again and finally, after pushing the “Create” button, you are going to get running Oracle Resource Manager stack with all your selected resources in the compartment. That can be an excellent first step in adopting Infrastructure as Code (IaC) approach in your environment.

For those who are just started with Terraform and Resource Manager, it can be a good training material and syntax template. You can download the zip file with the terraform configuration, modify it, zip it back, and edit the stack by pushing the “Edit Stack” button to upload the new configuration.

If you have a look inside the zip file, you will see normal terraform files that could be used as a basis for your future Resource Manager deployment. The zip file contains a dedicated “*.tf” file for each group of resources. I don’t have most of them, and as a result, many “tf” files are empty.

otochkin$ ll
total 328
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff   462B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    18K 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff   999B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff   1.8K 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff   1.5K 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff   648B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    38B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff   596B 12 Aug  2020
-rw-rw-r--@ 1 otochkin  staff    63B 12 Aug  2020
otochkin$ cat
## This configuration was generated by terraform-provider-oci

Most of the resources are concentrated in the “” file where we can see all compute instances and other resources, including network and database.

otochkin$ cat
## This configuration was generated by terraform-provider-oci
resource oci_core_instance export_app-forms-01 {
  agent_config {
    is_management_disabled = "false"
    is_monitoring_disabled = "false"
  availability_domain =
  compartment_id      = var.compartment_ocid
  create_vnic_details {
    assign_public_ip = "true"
    defined_tags = {

I think the new feature is one of the key improvements that help administrators adopt the automated deployment and management framework. The next blog is about using version control for your terraform in the Oracle cloud. Stay tuned.

Oracle ExaCC Gen 2 new features and improvements.

Some time ago after the last Oracle Open World Christine Kivi wrote a blog stating that this is not “your father’s Oracle” anymore . The rapid development and continuous improvements in Oracle cloud is one of the signs that Oracle is changing. The generation 2 Exadata cloud at customer (ExaCC) was released on that last OOW 19 and initially had some limitations in options and interface. Oracle team promised to fix the issues and provide new functionality, planning some major updates in the next calendar year (2020). And so far as I can see Oracle team is working delivering the promised. Here I will try to review some of the new features implemented for the last several months. This is going to be a relatively long post. You can go to the bottom, read the summary and read in details only about changes you are interested in.

Let’s start from March 31 when shared Oracle homes feature was introduced to the ExaCC gen 2. Until that time we were able to create only one Oracle database per home and it put some limits to the potential number of the databases on the environment. After making the homes shareable it allowed to reduce usage of the local file system and potentially reduce management overhead. It looked like a simple change but considering all the required updates in internal tools responsible for creating, updating, cleaning and patching the database and changes to OCI console it didn’t look like a small fix. Now you can create multiple databases using the same home but keep in mind if you want to patch the home you will need to update all containers using it. Also be aware about location for TDE wallet and dedicated sqlnet.ora and tnsnames.ora for each database.

[oracle@virt1 ~]$ cat testdb01.env | grep TNS
TNS_ADMIN=/u02/app/oracle/product/; export TNS_ADMIN
oracle@virt1 ~]$ srvctl getenv database -db testdb02
[oracle@virt1 ~]$

In the May the improvements continued and we got some changes in the interface and functionality. One of the major changes was the option to choose a character set for you container database. In some cases it is extremely important for earlier than 12.2 database releases. If you have 11gr2 or 12cr1 databases to migrate you have to be able to choose it. The option was theoretically available even before with using “dbaasapi” tool but it didn’t work perfectly well for me. After adding support for console it was working fine from the both interfaces whether it was Web GUI or “dbaasapi” tool.

One more update was announced on the same day in May and was related to the local timezone. It allows to specify a timezone for your exadata infrastructure during creation. The default time zone before that was UTC.

The next update came about a month later in June and included two major updates in functionality. The first one was the long anticipated option to create multiple VM clusters on ExaCC gen 2. Let me explain in a couple of words what it means. The ExaCC behind the scenes is built on virtualized Exadata where host machines (Dom0) and storage managed by Oracle and the VM clusters (OVM based DomU) are managed by the customer. Initial release of ExaCC generation 2 allowed to create only one VM cluster per machine. It means the VM (DomU) on each compute node was taking all available memory, local filesystem etc. With the update you can virtually split your ExaCC to several VM clusters segregating network, storage and access for different environments. The change introduced new features in the OCI API and Web GUI console interface for the ExaCC. 

We can specify CPU, Exadata storage, memory, local filesystem size per cluster. But it is not only that. If you try to go to the page of scaling the existing cluster it has all those values. So, you should be able to scale up and down OCPU, memory and storage allocation for your VM.

One more feature was introduced on the same day in June – offline OCPU scaling for VM clusters. It provided ability to scale your OCPU allocation even when you don’t have connectivity to OCI. Of course you will not be able to scale from Web GUI but you can use the “dbaascli” utility to do that. The command “dbaascli cpuscale update” will put new number of OCPU per VM it will be synchronized with the Web console as soon as you get your connectivity back. So far it is available only in selected regions such us-ashburn-1, ap-hyderabad-1 and sa-sanpaulo-1.

The next major update came just a week after the previous one and made Oracle Autonomous database available on ExaCC. That was a long expected change promised when the generation 2 ExaCC was revealed to the public on the Oracle Open World 2019. The update allows us to create Autonomous Exadata VM Clusters and deploy Autonomous Container Databases and Autonomous Databases. I am sure most of us are aware about Oracle Autonomous databases but if you’ve missed it you can check it here. The deployment is similar to what you can have with Oracle Autonomous on dedicated Exadata Infrastructure. I have to note that we cannot mix non-Autonomous and Autonomous VM clusters on the same ExaCC. So, unfortunately we cannot deploy a non-autonomous VM cluster and try the autonomous on the same ExaCC. The autonomous deployment will take all rac resources and after the deployment we can have up to 12 Autonomous Containers. The max number of the actual Autonomous databases varies from 100 on a quarter rack ExaCC to 400 on the full rack. So the number is roughly the same as the number of available OCPU. I think it is a very promising step and look at that as the second step to proper PaaS databases.

The next update on July 14 enabled per-second billing for OCPU usage on ExaCC and a week after the same update came for Autonomous on ExaCC. Now, instead of paying for an hour even if you scaled up for several minutes you are paying just for time you were actually using the OCPU. The minimum billing is a minute, so if you’ve scaled your OCPU up for less than a minute you pay for a minute at least. I am not sure even if it is technically possible to scale up or down for less than a minute. I seriously doubt that.

The last known and publicly available update on July 28 introduced the new interface feature allowing you to patch GI and database homes from the OCI console. You need to have all nodes up and running and for a database all instances up to perform the patching. It is not too much different from requirements for the dbaascli tool. The main benefit I see is the ability to use Oracle OCI API to embed it to the other maintenance procedures in your OCI environment.

As a summary I am providing list of changes in chronological order:

  • March 31, 2020 – Shared Database Homes for Exadata Cloud at Customer Systems.
  • May 7, 2020 – Character set and national character set can now be configured.
  • June 13, 2020 – Create and manage multiple virtual machines per Exadata system.
  • June 13, 2020 – Scale OCPUs without cloud connectivity.
  • June 23, 2020 – Oracle Autonomous Database.
  • July 14, 2020 – Per-Second Billing for OCPU Usage.
  • July 28, 2020 – Oracle Grid Infrastructure and Oracle Database Patching.

It looks like the pace is gaining momentum and we are getting new updates almost every week now. What are we going to get in the next coming weeks? Let’s see.

Oracle multi-tenant: PDB scope parameters and where you can see them.

Working with Oracle multi-tenant architecture gives us some obvious benefits but also some challenges. What if we want to change a system parameter but only for a certain pluggable database (PDB) and keep it default for all others? Starting from 12.1 Oracle provides the ability to modify parameters on PDB level. If you look to a reference documentation for database parameters it states clearly whether it can be applied on CDB level or not. And with every new release we have more and more parameters which can be changed on PDB level. It has grown from 185 for to 194 on

The changing for a parameter is quite simple. You switch to the container you want to apply the parameter and run the standard “ALTER SYSTEM …” command or you can run it from the upper (CDB$ROOT) level and add clause container=all and apply it to all your containers.

testdb021> ALTER SESSION SET container=pdb02;
SESSION altered.
testdb021> ALTER system SET db_securefile='PREFERRED' sid='*' scope=spfile;
System altered.

Now we know how to modify a parameter. But where are those PDB scope parameters stored and how to verify it? If you try to read your spfile you will not be able to see the pdb scope parameters. The v$spparameter view doesn’t provide result either:

testdb021> SELECT COUNT(*) FROM v$spparameter WHERE con_id=2;

But  you can see the parameters in the v$system_parameter view filtering it by the con_id value:

testdb021> SELECT name,VALUE FROM v$system_parameter WHERE con_id=2 AND isdefault='FALSE';
----------------------------------- ----------------------------------------------------------------------
sga_target			    0
db_securefile			    PREFERRED

So, we can check it in the v$system_parameter view. But how are they actually stored. We have a table pdb_spfile$ in the SYSTEM tablespace where all our PDB-scope parameters are defined.

estdb021>; SELECT sid,name,VALUE$,pdb_uid FROM pdb_spfile$;
SID	   NAME 			       VALUE$						     PDB_UID
---------- ----------------------------------- -------------------------------------------------- ----------
*	   db_securefile		       'PREFERRED'					  2060554552
*	   db_securefile		       'PREFERRED'					  2007906006
testdb021  local_listener		       '(ADDRESS=(PROTOCOL=TCP)(HOST= 2007906006
testdb022  local_listener		       '(ADDRESS=(PROTOCOL=TCP)(HOST= 2007906006
*	   nls_date_format		       'YYYY-MM-DD HH24:MI:SS'				  2007906006

So that table is used to apply the parameters every time when you open the pluggable database. The PDB_UID value can be found in the v$pdb or dba_pdbs views.

testdb021> SELECT pdb_name,con_uid FROM dba_pdbs;
PDB_NAME															    CON_UID
-------------------------------------------------------------------------------------------------------------------------------- ----------
PDB$SEED															 2060554552
PDB02																 2007906006

Hope that small dive into the multitenant architecture can help you in your daily tasks or when you need to adjust your PDB according to requirements. 

Happy tuning.

Migrating a new PDB to the existing 19c DataGuard on ExaCC.

We’ve already discussed  how to migrate databases from a standalone 12.2 database to a pluggable database (PDB) in a 19c container in the Oracle cloud. But what if the target container database (CDB)  is already part of a Data Guard configuration and has several PDB in it? I will try to go through the main steps on how to do that without breaking the replication. 

From the high level the process boiled down to several main steps. We plugged in the new database with the clause “standbys=none”. Then we did all our patching, upgrade and all necessary configuration changes. After that I could instantiate the data on the disaster recovery (DR) site in our standby PDB. The final steps made the new PDB on standby site to use the copied data files and enabled the replication.

Here is the high level flowchart for the process:

We started from adding the new PDB on the primary side. You might notice the clause “standbys=none” in the command.

testdb011> CREATE pluggable DATABASE testdg02 AS clone USING '/home/oracle/working/bcadmgo/pdbcdb2.xml' standbys=NONE;
Pluggable DATABASE created.
testdb011> SELECT name,open_mode,recovery_status FROM v$pdbs;
------------------------------------------------------------ ---------- --------
testdb011> ALTER pluggable DATABASE TESTDG02 OPEN upgrade;
Pluggable DATABASE altered.
testdb011> SHOW pdbs
--------------------- ------------------------------ ---------- ----------
		 2.00 PDB$SEED			     READ ONLY	NO
		 3.00 TESTPDB1			     READ WRITE NO
		 4.00 TESTDG01			     READ WRITE NO
		 5.00 TESTDG02			     MIGRATE	YES

You can see in the output the database plugged in and in “MIGRATE” state on the primary side and the recovery state is “DISABLED” . What would we see on the DR site?

testdb011> SELECT name,open_mode,recovery_status FROM v$pdbs;
------------------------------------------------------------ ---------- --------

The state is “MOUNTED” and recovery is “DISABLED”. It was expected but what should it show for data files since nothing was really replicated. The alert log showed us that the PDB was plugged into the standby CDB with nonexistent datafile using name pattern UNNAMED*.

Recovery created pluggable database TESTDG02
TESTDG02(5):File #26 added to control file as 'UNNAMED00026'. Originally created as:
TESTDG02(5):because the pluggable database was created with nostandby
TESTDG02(5):or the tablespace belonging to the pluggable database is
TESTDG02(5):File #27 added to control file as 'UNNAMED00027'. Originally created as:
TESTDG02(5):because the pluggable database was created with nostandby
TESTDG02(5):or the tablespace belonging to the pluggable database is
TESTDG02(5):File #28 added to control file as 'UNNAMED00028'. Originally created as:
TESTDG02(5):because the pluggable database was created with nostandby
TESTDG02(5):or the tablespace belonging to the pluggable database is

After all patching and upgrade activities on the primary side which you could read in the previous post we were able to open the database. The next step was to make an actual copy of the PDB on the DR site. We used RMAN restore “from service” option. It copied all PDB data files from the source over the network.

Before running the script I recommend to make sure no archived logs are going to be deleted on primary and standby. You can just run the command on both sides to change the RMAN archivelog deletion policy. And monitor the  FRA free size after running it.

RMAN> configure archivelog deletion policy to none;

And if you were using Transparent Data Encryption (TDE) you should have copied your updated wallet over to be able to read the encrypted data files. In the Oracle cloud it will be certainly the case.

The RMAN script is to copy the data was quite simple:

set newname for pluggable database TESTDG02 to new;
restore pluggable database TESTDG02 from service testdb01_moncton;

After running the script we got the copy of all data files for our standby PDB. After doing that we should stop our replication. We used the Data Guard Broker and our command looked like that:

DGMGRL> show configuration
Configuration - dgconf_testdb01
  Protection Mode: MaxPerformance
  testdb01_moncton - Primary database
    testdb01_halifax - Physical standby database
Fast-Start Failover:  Disabled
Configuration Status:
SUCCESS   (status updated 105 seconds ago)
DGMGRL> edit database testdb01_halifax set state='apply-off';

Having the replication stopped we switched our standby PDB to use the copied files.

RMAN> switch pluggable database TESTDG02 to copy;
datafile 26 switched to datafile copy "+DATAC1/TESTDB01_HALIFAX/A748656F0AC4F483E0530AFE10AC16AA/DATAFILE/system.519.1042285693"
datafile 27 switched to datafile copy "+DATAC1/TESTDB01_HALIFAX/A748656F0AC4F483E0530AFE10AC16AA/DATAFILE/sysaux.520.1042285701"
datafile 28 switched to datafile copy "+DATAC1/TESTDB01_HALIFAX/A748656F0AC4F483E0530AFE10AC16AA/DATAFILE/undotbs1.521.1042285709"
datafile 29 switched to datafile copy "+DATAC1/TESTDB01_HALIFAX/A748656F0AC4F483E0530AFE10AC16AA/DATAFILE/undo_2.522.1042285713"
datafile 30 switched to datafile copy "+DATAC1/TESTDB01_HALIFAX/A748656F0AC4F483E0530AFE10AC16AA/DATAFILE/users.523.1042285719"

And enabled the replication for the PDB:

testdb011> ALTER SESSION SET container=TESTDG02;
SESSION altered.
testdb011> ALTER pluggable DATABASE enable recovery;
Pluggable DATABASE altered.

That would not be enough if you had database version 18 and higher. You should change status for the datafile to online. We had version 19.7 and ran a script to online the files.

testdb011> ALTER DATABASE datafile '+DATAC1/TESTDB01_HALIFAX/A748656F0AC4F483E0530AFE10AC16AA/DATAFILE/system.519.1042285693' online;
ALTER DATABASE datafile '+DATAC1/TESTDB01_HALIFAX/A748656F0AC4F483E0530AFE10AC16AA/DATAFILE/sysaux.520.1042285701' online;
ALTER DATABASE datafile '+DATAC1/TESTDB01_HALIFAX/A748656F0AC4F483E0530AFE10AC16AA/DATAFILE/undotbs1.521.1042285709' online;
ALTER DATABASE datafile '+DATAC1/TESTDB01_HALIFAX/A748656F0AC4F483E0530AFE10AC16AA/DATAFILE/undo_2.522.1042285713' online;
ALTER DATABASE datafile '+DATAC1/TESTDB01_HALIFAX/A748656F0AC4F483E0530AFE10AC16AA/DATAFILE/users.523.1042285719' online;
DATABASE altered.

Everything was ready and the last task was to resume the replication in the Data Guard Broker:

DGMGRL> edit database testdb01_halifax set state='apply-on';
DGMGRL> disable configuration
DGMGRL> enable configuration

We could see that the replication was enabled for the PDB:

testdb011> SELECT name,open_mode,recovery_status FROM v$pdbs;
------------------------------------------------------------ ---------- --------

The rest of the action such as switching the database to read-only mode (if you have Active Data Guard) or other configuration steps are up to your requirements.

As you could see the procedure is not complicated but still have quite a few steps to execute. Some steps could be and should be automated but it would be subject for another topic. Happy migrations and stay tuned.

Moving Oracle database to the cloud. 12.2 standalone to 19C RAC PDB.

I see more and more Oracle databases are moving to the public cloud or to a hybrid cloud solution. Depending on the platform, size and used options it could be a different path but the general approach boils down to three main options – Oracle RMAN backup and restore, Oracle Data Guard or Oracle Data Pump with or without transportable tablespaces. Here I want to share our approach for migration from 12.2 standalone database to 19c RAC container as PDB in the Oracle Cloud Infrastructure (OCI). 

Here are the initial conditions and requirements. We had multiple Linux 86×64 12.2 Enterprise edition standalone databases on a file system based storage moving  to Oracle Cloud Extreme Performance 19c with RAC on ASM.

Considering size for the databases and the endian format (little) the most viable option was Oracle DataGuard (DG) . The main question was whether we upgrade our database and convert to a pluggable database (PDB) on-prem and move it to the cloud after or do the migration, upgrade and conversion in one shot using the same downtime window. We’ve chosen the latter.

Here is the high level diagram:

And here is general workflow:

The source database was analyzed by the Oracle preupgrade.jar tool to verify if it was ready to be upgraded to 19c. A few issues were fixed in advance and some reported problems were ignored. There was no universal solution for everybody to tell which warning should be ignored and which one should be taken into consideration and fixed. 

The next step was to prepare the database to use TDE on the target platform. An Oracle encryption wallet and master key was created for the original database. The basic steps are following (all paths and values are arbitrary):

SQL> administer KEY management CREATE keystore '/u01/app/oracle/product/12.2.0/dbhome_1/network/admin' IDENTIFIED BY #SYS_PASSWORD#;
keystore altered.
SQL> administer KEY management SET keystore OPEN IDENTIFIED BY #SYS_PASSWORD#;
keystore altered.
SQL> administer KEY management SET KEY IDENTIFIED BY #SYS_PASSWORD# WITH backup;
keystore altered.
SQL> administer KEY management CREATE auto_login keystore FROM keystore '/u01/app/oracle/product/12.2.0/dbhome_1/network/admin' IDENTIFIED BY #SYS_PASSWORD#;
keystore altered.
SQL> administer KEY management SET keystore close IDENTIFIED BY #SYS_PASSWORD#;
keystore altered.

You can verify the status of your wallet in the v$encryption_wallet view and make sure it is shown as “open”.

After that we created the target 19c container RAC database on DBCS using ASM as storage (you remember you have two options for DBCS). Having the database created we were able to use the first node as a staging area for our DR. We cloned the database software from on-prem to that node and created a dummy database with the same database name but with a different unique name.

[oracle@oracloud01 ~]$./ ORACLE_HOME=/u01/app/oracle/staging/12.2.0/dbhome_1 ORACLE_HOME_NAME=OraHome12201_dbhome_1 ORACLE_BASE=/u01/app/oracle 
[oracle@oracloud01 ~]$ dbca -createDatabase -silent -createAsContainerDatabase false -templateName General_Purpose.dbc -gdbName nsppwcb -storageType ASM -diskGroupName +DATAC1 -recoveryAreaDestination +RECOC1 -recoveryAreaSize 10240 -initParams db_name=appdb,db_unique_name=appdbstg,sga_target=2G,sga_max_size=2G
Enter SYS user password:
Enter SYSTEM user password:

Having set it up and copied the wallet from source to the staging node we were ready to set up a Data Guard standby where the cloud based standby would be using the cloned home and be encrypted using the master key we created on the source. 

I used the Oracle Zero Downtime Migration tool to establish the replication and prepare the staging. It saved time, effort and avoided human mistakes providing a unified, consistent approach to all the migration. We spent some time troubleshooting different  issues during implementation and a dry run for the first database but it paid off later.  We used the parameter “-pauseafter ZDM_CONFIGURE_DG_SRC” to wait before actual cutover. As you could see the Oracle ZDM can be useful even if it cannot cover the complete migration path. 

Before doing cutover we also moved the standby datafiles to be inside the ASM under ” <Disk group>/<CDB unique name>/<Future PDB GUID>>” using rman backup as copy. I might write another short blog with all the details on how to do that.

We were keeping the Data Guard replication until the cutover time when the real downtime for production started. On the scheduled time we resumed the ZDM job to complete the database switchover and make our staging database primary. The command was simple:

./zdmcli resume job -jobid 3
and in the output for "query job" command we got list of completed actions:
ZDM_CLONE_TGT ................. COMPLETED
[zdmuser@vlxpr1008 ~]$

After the successful switchover we started the staging database in read only mode, exported the encryption key and description file to be plugged to the CDB.

SQL> administer KEY management export encryption KEYS WITH secret "my_secret" TO '/home/oracle/appdb01_export.p12' force keystore IDENTIFIED BY #SYS_PASSWORD;
keystore altered.
SQL> !ls -l /home/oracle/appdb01_export.p12
-rw-r--r-- 1 oracle asmadmin 2548 Mar  2 19:48 /home/oracle/appdb01_export.p12
    pdb_descr_file => '/home/oracle/appdb01.xml');
END;  2    3    4
  5  /
PL/SQL PROCEDURE successfully completed.

It is recommended to verify the target PDB for any violations if it will be plugged into the target CDB using “DBMS_PDB.CHECK_PLUG_COMPATIBILITY” package and the exported xml file.

  compatible CONSTANT VARCHAR2(3) :=
           pdb_descr_file => '/home/oracle/appdb1.xml',
           pdb_name       => 'appdb1')
    ELSE 'NO'
END;SQL>   2    3    4    5    6    7    8    9   10   11
 12  /
PL/SQL PROCEDURE successfully completed.
SQL> SELECT line,message,STATUS FROM pdb_plug_in_violations WHERE name='APPDB1' ORDER BY TIME,line;

Some violations like version mismatch and the fact the database is not yet container PDB could be ignored. Also keep in mind that some violations are “ERROR” type and should be fixed sooner or later but some are just “WARNING” and might not have any impact.

After that we shut down our staging database and plugged it into the target CDB with “nocopy” option effectively using already encrypted data files and saving time during the cutover downtime. 

SQL> CREATE pluggable DATABASE appdb01 USING '/home/oracle/appdb01.xml' nocopy;
Pluggable DATABASE created.
SQL> SHOW pdbs
---------- ------------------------------ ---------- ----------

Our database was plugged in to the target container CDB and was ready for upgrade. Before doing the upgrade I imported the master encryption key we used on the source and staging.

SQL> administer KEY management import encryption KEYS WITH secret "my_secret" FROM '/home/oracle/appdb01_export.p12' force keystore IDENTIFIED BY #SYS_PASSWORD WITH backup;
keystore altered.

The next step is to upgrade our new PDB to make it the same version as the container.

SQL>alter session set container=APPDB01;
SQL>startup upgrade;
[oracle@oracloud01 ~]$ cd $ORACLE_HOME/rdbms/admin
[oracle@oracloud01 admin]$ $ORACLE_HOME/perl/bin/perl -c APPDB01 catupgrd.sql
Argument list for []
For Oracle internal use only A = 0
Run in                       c = APPDB01
Do not run in                C = 0
Input Directory              d = 0
Echo OFF                     e = 1
Simulate                     E = 0
Forced cleanup               F = 0
Log Id                       i = 0
Child Process                I = 0
Log Dir                      l = 0
Priority List Name           L = 0
Upgrade Mode active          M = 0
SQL Process Count            n = 0
SQL PDB Process Count        N = 0
Open Mode Normal             o = 0
Start Phase                  p = 0
End Phase                    P = 0
Reverse Order                r = 0
AutoUpgrade Resume           R = 0
Script                       s = 0
Serial Run                   S = 0
RO User Tablespaces          T = 0
Display Phases               y = 0
Debug              z = 0
Debug              Z = 0 VERSION: []
           STATUS: [Production]

After all successful upgrade steps (including fixup.sql script if it is required) our database was almost ready and required only to be converted into PDB.

SQL> ALTER SESSION SET container=appdb01;
SESSION altered.
SQL> @$ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql
SQL> SET trimout ON
SQL> SET trimspool ON
SQL> SET underline "-"
SQL> SET verify OFF
SQL> SET wrap ON
SQL> SET xmloptimizationcheck OFF
SQL> SHOW pdbs
    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         4 APPDB01                        READ WRITE YES
SQL> SHOW con_name
Pluggable DATABASE closed.
SQL> startup
Pluggable DATABASE opened.

And you can verify all components in your pluggable database using dba_registry_view:

SQL> col comp_name FOR a50
SQL> SELECT comp_name, version, STATUS FROM dba_registry;
COMP_NAME                                          VERSION                        STATUS
-------------------------------------------------- ------------------------------ --------------------
Oracle DATABASE Catalog Views                                 VALID
Oracle DATABASE Packages AND Types                            VALID
JServer JAVA Virtual Machine                                  VALID
Oracle XDK                                                    VALID
Oracle DATABASE Java Packages                                 VALID
OLAP Analytic Workspace                                       VALID
Oracle REAL Application Clusters                              VALID
Oracle XML DATABASE                                           VALID
Oracle Workspace Manager                                      VALID
Oracle Text                                                   VALID
Oracle Multimedia                                             VALID
Spatial                                                       VALID
Oracle OLAP API                                               VALID
Oracle Label Security                                         VALID
Oracle DATABASE Vault                                         VALID
15 ROWS selected.

The result is a fully migrated and upgraded database after 1 hour 30 min cutover time. Of course you still need to create services, complete acceptance and verification tests, adjust UNDO tablespaces  and do some application or company specific actions but the migration is done. The staging home and leftovers from the database could be removed if they are not going to be used for the next migration to the same container.

I didn’t put all the small details or issues we encountered and should solve during our migrations – it would be too long and totally unreadable. Hopefully I will be able to create a webinar or discuss on one of the virtual events about different pitfalls and unexpected issues you can expect during migration.

This is only one case when you have a 12.2 source and 19c target in the Oracle cloud but for the last several months we did different migrations involving other source and target versions and platforms. Let us know if you need our help and we will be happy to do that. 

Oracle Zero Downtime Migration troubleshooting.

The Oracle Zero Downtime Migration (ZDM)  tool has been created to help with Oracle databases migrations. It saves time and automates many tasks. The saved effort is getting bigger and accumulates on scale when you need to move multiple databases. Behind the scenes it uses the very well known Oracle Data Guard. As a result you have good solid technology on the basis but at the same time limited to what the DataGuard can do and what it cannot. All details and documentation are available here.  The tool works fine when all prerequisites are met but when you hit an issue you need to dig in and troubleshoot. Here I will try to share some experience with the ZDM troubleshooting. Please note that the information in the blog is actual for the 19.2 version of ZDM and it is possible that the behaviour will be different in the future versions.

We set up the tool, verified all prerequisites and ran a migration job in evaluation mode using the “-eval” parameter but the job failed.

To monitor a job execution you can use a command like “zdmcli query job -jobid 5”. The output will provide the basic information about the job and result of each phase. In case of success it looks like:

[zdmuser@zdmserver ~]$ /opt/oracle/app/zdmhome/bin/zdmcli query job -jobid 8
zdmserver: Audit ID: 516
Job ID: 8
User: zdmuser
Client: zdmuser
Scheduled job command: "zdmcli migrate database -sourcesid SOURCEDB -sourcenode source.localdomain -srcauth zdmauth -srcarg1 user:zdmuser -srcarg2 identity_file:/home/zdmuser/.ssh/id_rsa -srcarg3 sudo_location:/usr/bin/sudo -targetnode target.localdomain -tgtauth zdmauth -tgtarg1 user:opc -tgtarg2 identity_file:/home/zdmuser/.ssh/id_rsa -tgtarg3 sudo_location:/usr/bin/sudo -targethome /u02/app/oracle/product/12.2.0/dbhome1 -rsp /home/zdmuser/zdm_template_sourcedbstg.rsp -eval"
Scheduled job execution start time: 2020-04-20T14:26:25-03. Equivalent local time: 2020-04-20 14:26:25
Current status: SUCCEEDED
Result file path: "/opt/oracle/app/zdmbase/chkbase/scheduled/job-8-2020-04-20-14:26:38.log"
Job execution start time: 2020-04-20 14:26:38
Job execution end time: 2020-04-20 14:32:29
Job execution elapsed time: 5 minutes 50 seconds

But, unfortunately it failed in our case and you could see something like that:

zdmserver: Processing response file …
zdmserver: Starting zero downtime migrate operation …
zdmserver: Executing phase ZDM_GET_SRC_INFO
zdmserver: retrieving information about database "DSST" …
zdmserver: Executing phase ZDM_GET_TGT_INFO
zdmserver: Retrieving information from target node "target.localhost" …
zdmserver: Executing phase ZDM_SETUP_SRC
zdmserver: Setting up ZDM on the source node source.localhost …
zdmserver: Executing phase ZDM_SETUP_TGT
zdmserver: Setting up ZDM on the target node target.localhost …
zdmserver: Executing phase ZDM_GEN_RMAN_PASSWD
zdmserver: Executing phase ZDM_PREUSERACTIONS
zdmserver: Executing phase ZDM_PREUSERACTIONS_TGT
zdmserver: Executing phase ZDM_VALIDATE_SRC
zdmserver: Validating standby on the source node source.localhost …
zdmserver: Executing phase ZDM_VALIDATE_TGT
zdmserver: Validating standby on the target node target.localhost …

It was apparently not enough to troubleshoot the problem. We needed more logs. Luckily we had had full set of the logs on the source and target in the /tmp/zdm-*/log/ directory.

zdmuser@plxde746 ~]$ view /tmp/zdm-237637609/zdm/log/mZDM_oss_standby_validate_src_3119.log
19:25:17.000: Command received is : mZDM_oss_standby_validate_src -sdbsid SOURCEDB -sdbdomain localdomain -sdbhome /opt/oracle/release/ -dbid 111111111 -scn 331201554581 -tdbname sourcedb -tdbhome /u02/app/oracle/product/12.2.0/dbhome1 -sdbScanName source.localdomain -tdbScanName test-scan.localdomain -tdbScanPort 1521 -tdatadg +DATAC1 -tredodg +DATAC1 -trecodg +RECOC1 -bkpPath /migration/staging
19:25:17.000: ### Printing the configuration values from files:
19:25:17.000: /tmp/zdm-237637609/zdm/mZDMconfig_params
19:25:17.000: DATA_DG=+DATAC1

And that log provide fully detailed execution with all parameters, commands and values. That helped to nail down the problem and resolve it.

But it doesn’t work for all cases. Sometimes, when by some reasons it cannot even create the /tmp/zdm* directory on source or target you don’t have any logs at all.

For example, the zdm user connected to the source doesn’t have the privilege to execute some commands as root and fails on the very first steps. In such case you don’t have any other option then try to execute the migration job not in evaluation mode (-eval) but in real migration mode. I recommend in such cases put parameter -pauseafter and specify where you want the job to stop. In my case I used “-pauseafter ZDM_SETUP_SRC” . We ran the job and the execution failed on the very first step.

[zdmuser@zdmserver ~]$ /opt/oracle/app/zdmhome/bin/zdmcli query job -jobid 6
zdmserver: Audit ID: 514
Job ID: 6
User: zdmuser
Client: vlxpr1008
Scheduled job command: "zdmcli migrate database -sourcesid SOURCEDB -sourcenode source.localdomain -srcauth zdmauth -srcarg1 user:zdmuser -srcarg2 identity_file:/home/zdmuser/.ssh/id_rsa -srcarg3 sudo_location:/usr/bin/sudo -targetnode target.localdomain -tgtauth zdmauth -tgtarg1 user:opc -tgtarg2 identity_file:/home/zdmuser/.ssh/id_rsa -tgtarg3 sudo_location:/usr/bin/sudo -targethome /u02/app/oracle/product/12.2.0./dbhome1 -rsp /home/zdmuser/zdm_template_sourcedbstg.rsp -pauseafter ZDM_SETUP_SRC"
Scheduled job execution start time: 2020-03-27T09:20:30-03. Equivalent local time: 2020-03-27 09:20:30
Current status: FAILED
Result file path: "/opt/oracle/app/zdmbase/chkbase/scheduled/job-6-2020-03-27-09:20:31.log"
Job execution start time: 2020-03-27 09:20:31
Job execution end time: 2020-03-27 09:20:48
Job execution elapsed time: 16 seconds

I checked the log and found the following.

[zdmuser@zdmserver ~]$ cat /opt/oracle/app/zdmbase/chkbase/scheduled/job-6-2020-03-27-09\:20\:31.log
zdmserver: Processing response file …
zdmserver: Starting zero downtime migrate operation …
zdmserver: Executing phase ZDM_GET_SRC_INFO
zdmserver: retrieving information about database "DSST" …
PRCF-2056 : The copy operation failed on node: "source.localdomain". Details:
PRCZ-4002 : failed to execute command "/bin/cp" using the privileged execution plugin "zdmauth" on nodes "source.localdomain"
[zdmuser@zdmserver ~]$

From the first glance it looked like we were unable to use “sudo cp” but after several tests it was discovered that we lacked privilege to run “/bin/scp” on the source and could not copy the zdm files from zdmserver. After fixing the problem you can either resume the job using command “zdmcli resume job -jobid 6” or destroy it and run the “eval” job again. To destroy a job you need to run “zdmcli abort job -jobid 6”.

In my experience working with ZDM on several different environments the most of the problems boiled down to network, database software, instances configuration, and permission issues. Let me stop on the last category. The ZDM user on source and target is supposed to have full privileges as a superuser. In Oracle cloud it is an “opc” user which can run any command from “sudo”. But if you move the database from on-prem you might encounter some difficulties getting such privileges. In my case we used help from the Oracle ZDM team and from the Oracle product manager for ZDM. We also did some troubleshooting and adjusting by ourselves to put all commands to /etc/sudoers list for the ZDM user on source machines.

A couple of other problems were related to discrepancy in software level between source and target. The documented parameter “-ignore PATCH_CHECK” didn’t work for us and we used “-ignore ALL” instead. Also I found that for 12.1 and 11g the ZDM tool didn’t encrypt tablespaces on the cloud side during standby creation and used “restoreDatabase” subprogram instead “restoreAndEncryptDatabase” which was used for 12.2 and later versions. 

In summary I can say that despite few bumps and problems with the tool the ZDM was able to significantly reduce effort and number of errors during migrations even for cases when it was used only as a part of the migration process. I am looking forward to the new version and hope it provides more options for migrations. Shoot me an email or get me on twitter if you need help with migration or to make ZDM working.