Oracle 21c warm failover with the Oracle Database Kubernetes Operator (OraOperator), OCI and Portworx storage

Background

In Getting started with Oracle 21c, Oracle database Kubernetes operator (OraOperator), OCI and Portworx storage I shared details of my Terraform script which I use to build OKE clusters with Portworx storage. I detailed how to install the Oracle Database Kubernetes Operator and how to configure it for a single instance database running on Oracle Container Engine for Kubernetes (OKE) service within Oracle Cloud Infrastructure (OCI).

In this post I share how we can use Portworx replication to fail over a Kubernetes Oracle 21c database pod across Availability Domains (ADs) within OCI.

OKE Environment

If you followed my previous post you should already have an OCI OKE and Portworx storage cluster running an Oracle 21c database, so before we start, let's take a quick look at the OKE cluster.

Kubernetes Nodes

Use kubectl get nodes to see the Kubernetes cluster; by including the OKE topology labels we can also confirm the Region and Availability Domain (AD) of each worker node, for example:

% kubectl get nodes -L topology.kubernetes.io/region,topology.kubernetes.io/zone 
NAME          STATUS   ROLES   AGE    VERSION    REGION        ZONE
10.0.10.209   Ready    node    5d2h   v1.20.11   uk-london-1   UK-LONDON-1-AD-1
10.0.10.53    Ready    node    5d2h   v1.20.11   uk-london-1   UK-LONDON-1-AD-3
10.0.10.87    Ready    node    5d2h   v1.20.11   uk-london-1   UK-LONDON-1-AD-2

Portworx

Set up an alias for pxctl and use it to check the version of Portworx:

% export PX_POD=$(kubectl get pods -l name=portworx -n kube-system -o jsonpath='{.items[0].metadata.name}')
% alias pxctl='kubectl exec -n kube-system ${PX_POD} -it -- /opt/pwx/bin/pxctl' 
% pxctl -v                                                                                                 
Defaulted container "portworx" out of: portworx, csi-node-driver-registrar
pxctl version 2.8.1.5-937bfdd
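
While we are here, it is also worth running pxctl status, which reports the overall health of the storage cluster and lists the member storage nodes; a quick way to confirm all three workers are contributing storage.

% pxctl status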

Oracle Database

To avoid having to specify our namespace with -n each time, I will use kubectl config set-context to set a default namespace of oracle-namespace.

% kubectl config set-context --current --namespace=oracle-namespace
Context "context-c4tnd3w25yq" modified.

kubectl commands will now default to the oracle-namespace namespace.

List Databases

To list Oracle databases, confirm the database version, and identify which worker node is being used, run kubectl get singleinstancedatabase, for example:

% kubectl get singleinstancedatabase -n oracle-namespace  
NAME        EDITION      STATUS    VERSION      CONNECT STR                 OEM EXPRESS URL
sidb-pstg   Enterprise   Healthy   21.3.0.0.0   10.0.10.209:30898/PSTGCDB   https://10.0.10.209:30228/em                 
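
For more detail, kubectl describe on the same resource shows the full spec and status the operator maintains for the database (the exact fields will vary with the OraOperator version):

% kubectl describe singleinstancedatabase sidb-pstg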

So, from the above I have determined my OKE Oracle 21c environment looks like this:

[Figure: Oracle 21c running in Availability Domain 1]

None of my OKE worker nodes have public interfaces; however, my Terraform script created a Bastion host with sqlplus installed, so I can use the Bastion host to connect to my database using any of the worker node IPs.

Using sqlplus we can confirm the database instance name, version and edition by querying v$instance, for example:

[opc@px-bastion ~]$ sqlplus sys/Kube#2022@10.0.10.87:30898/PSTGPDB1 as sysdba

SQL*Plus: Release 21.0.0.0.0 - Production on Thu Apr 7 16:32:13 2022
Version 21.3.0.0.0

Copyright (c) 1982, 2021, Oracle.  All rights reserved.

Last Successful login time: Thu Apr 07 2022 16:31:40 +00:00

Connected to:
Oracle Database 21c Enterprise Edition Release 21.0.0.0.0 - Production
Version 21.3.0.0.0

SQL> set linesize 200
SQL> select INSTANCE_NAME, HOST_NAME, VERSION_FULL, EDITION from v$instance;

INSTANCE_NAME	 HOST_NAME							  VERSION_FULL	    EDITION
---------------- ---------------------------------------------------------------- ----------------- -------
PSTGCDB 	 sidb-pstg-dxlwf						  21.3.0.0.0	    EE

Note: the Oracle database host name is the Kubernetes pod name, not the OKE worker node name.
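
If you want to map that pod name back to a worker node, kubectl get pods -o wide reports the hosting node alongside each pod:

% kubectl get pods -o wide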

Let’s create some test data:

SQL> CREATE TABLE test_table TABLESPACE USERS AS
  ( SELECT LEVEL id, 
         SYSDATE+DBMS_RANDOM.VALUE(-1000, 1000) date_value, 
         DBMS_RANDOM.string('A', 20) text_value
  FROM dual
  CONNECT BY LEVEL <= 1000000)
/

Table created.

SQL> select count(*) from test_table;

  COUNT(*)
----------
   1000000

Pod Failover

Cordon Node

Before we kill the Oracle 21c pod, I will remove the Kubernetes worker node from future scheduling using kubectl cordon.

% kubectl cordon 10.0.10.209                                                                               
node/10.0.10.209 cordoned

% kubectl get nodes
NAME          STATUS                     ROLES   AGE     VERSION
10.0.10.209   Ready,SchedulingDisabled   node    5h13m   v1.20.11         
10.0.10.53    Ready                      node    5h13m   v1.20.11
10.0.10.87    Ready                      node    5h13m   v1.20.11        
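
Note that cordoning only prevents new pods from being scheduled; it does not evict the pod that is already running, which is why we delete it ourselves in the next step. An alternative approach (not used here) would be to drain the node, which evicts its pods for you; --ignore-daemonsets is required because Portworx itself runs as a DaemonSet:

% kubectl drain 10.0.10.209 --ignore-daemonsets --delete-emptydir-data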

Kill Pod

To simulate a node crash, we can delete the running pod.

% kubectl delete pod/sidb-pstg-dxlwf
pod "sidb-pstg-dxlwf" deleted

[Figure: Oracle 21c killed in Availability Domain 1]
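
To watch the replacement pod being created in real time, the watch flag on kubectl get pods streams status changes as they happen:

% kubectl get pods -w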

Check Pod Status

We can see that a new Kubernetes pod has been automatically started on a different worker node in Availability Domain 2.

%  kubectl get pods -o wide
NAME              READY   STATUS    RESTARTS   AGE   IP            NODE          NOMINATED NODE   READINESS GATES
sidb-pstg-khp3d   1/1     Running   0          60s   10.244.0.137   10.0.10.87   <none>           <none>

Normally this would be an issue as a worker node is unable to access block storage from a different Availability Domain.

However, as my OKE cluster is using Portworx storage this is not a problem: the data has been replicated across all three Availability Domains by the Portworx storage class (see the sketch below) and is available for use in any AD.
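
For reference, a Portworx StorageClass that keeps three replicas, one per AD, might look something like the minimal sketch below; the class name here is hypothetical, so substitute the one your own deployment uses (the repl parameter is what drives the replication):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: px-repl3-db              # hypothetical name; use your own class
provisioner: pxd.portworx.com    # Portworx CSI provisioner
parameters:
  repl: "3"                      # three replicas, spread across the Availability Domains
  io_profile: "db_remote"        # optional database-tuned I/O profile
allowVolumeExpansion: true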

[Figure: Oracle 21c running in Availability Domain 2]

Check Database Status

If I return to my Bastion host and use sqlplus with the same database connection string, we can see the host name has changed to reflect the new pod name.

[opc@px-bastion ~]$ sqlplus sys/Kube#2022@10.0.10.87:30898/PSTGPDB1 as sysdba

SQL*Plus: Release 21.0.0.0.0 - Production on Tue Apr 12 13:59:08 2022
Version 21.3.0.0.0

Copyright (c) 1982, 2021, Oracle.  All rights reserved.


Connected to:
Oracle Database 21c Enterprise Edition Release 21.0.0.0.0 - Production
Version 21.3.0.0.0

SQL> set linesize 200
SQL> select INSTANCE_NAME, HOST_NAME, VERSION_FULL, EDITION from v$instance;

INSTANCE_NAME	 HOST_NAME							  VERSION_FULL	    EDITION
---------------- ---------------------------------------------------------------- ----------------- -------
PSTGCDB 	 sidb-pstg-khp3d						  21.3.0.0.0	    EE

And if we query the test table we can see that the data is still available.

SQL> select count(*) from test_table;

  COUNT(*)
----------
   1000000

Portworx Volume

We can confirm that our migrated Oracle 21c database, now running on a worker node in Availability Domain 2, is using local block storage by running pxctl volume list and checking the volume status to see where it is attached, for example: up - attached on 10.0.10.87.

% pxctl volume list
Defaulted container "portworx" out of: portworx, csi-node-driver-registrar
ID		     NAME				    SIZE   HA SHARED ENCRYPTED PROXY-VOLUME IO_PRIORITY STATUS	
1106215858654398303 pvc-a9dcdce1-99a2-4db6-b0e6-e7b166ddeb7c 60 GiB 3  no     no        no	    MEDIUM	up - attached on 10.0.10.87	
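
To see exactly where the three replicas live, pxctl volume inspect lists the replica sets and the nodes holding each copy; use the volume name from the listing above:

% pxctl volume inspect pvc-a9dcdce1-99a2-4db6-b0e6-e7b166ddeb7c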

Un-cordon Node

Before we finish, let’s un-cordon the node, making it available again for Kubernetes scheduling.

% kubectl uncordon 10.0.10.209 
node/10.0.10.209 uncordoned

% kubectl get nodes
NAME          STATUS   ROLES   AGE     VERSION
10.0.10.209   Ready    node    5h19m   v1.20.11
10.0.10.53    Ready    node    5h19m   v1.20.11
10.0.10.87    Ready    node    5h19m   v1.20.11  

Summary

In this post I have shared how to deploy an Oracle 21c database with the Oracle Database Kubernetes Operator (OraOperator), and how to use Portworx to provide storage replication across three Availability Domains (ADs) within an Oracle Cloud Infrastructure (OCI) Region.

I have demonstrated how a Portworx Kubernetes storage class can be used to protect data and provide database failover.

Please Note: The current release of OraOperator (v0.1.0) is for development and test only.

DO NOT USE IN PRODUCTION (yet!)
