How to control Kubernetes volume provisioning with Portworx commit labels

Introduction

By default, Portworx thin provisions volumes and balances them according to current usage and load within the cluster, requiring only minimal configuration. This approach lets applications provision volumes easily, as long as you have enough backing storage to cover the actual volume usage.

However, if the volume usage exceeds your available backing storage, and allocating additional storage is not an option, your application will encounter space issues. In this blog post I will show how we can avoid these problems.

Getting Started

Let’s start by checking the size of our Persistent Volumes (PVs)

# kubectl get pv -n oracle-namespace
NAME                                CAPACITY ACC POLICY STATUS CLAIM                                         STORAGECLASS
pvc-1f478ff6-9dc5-4a95-96a3-2016f542c3f6 20Gi RWO Delete Bound oracle-namespace/ora-data193-oracle19c-0      px-ora-sc
...

And the view from our database container

# kubectl exec -it oracle19c-0 -n oracle-namespace -- /bin/bash
[oracle@oracle19c-0 ~]$ df -ht ext4
Filesystem                       Size  Used Avail Use% Mounted on
/dev/pxd/pxd1073375706949672013   20G  3.8G   15G  21% /opt/oracle/oradata
...

And finally, using pxctl volume list

# pxctl volume list --label version=19.3.0.1,app=database
ID NAME SIZE HA SHARED ENCRYPTED IO_PRIORITY STATUS SNAP-ENABLED 
1073375706949672013 pvc-1f478ff6-9dc5-4a95-96a3-2016f542c3f6 20 GiB 3 no no HIGH up - attached on
...
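
The px-ora-sc StorageClass itself is not shown in this post; for context, a Portworx StorageClass consistent with the volume attributes above (ext4, 3 replicas, HIGH IO priority, db IO profile) might look something like the sketch below. The exact parameters are an assumption, not the definition used in this cluster.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: px-ora-sc
provisioner: kubernetes.io/portworx-volume  # matches the provisioner reported on the PVC later in this post
allowVolumeExpansion: true                  # needed for the PVC resize in the next section
parameters:
  fs: "ext4"            # filesystem format reported by pxctl
  repl: "3"             # HA / replication factor of 3
  priority_io: "high"   # IO_PRIORITY HIGH
  io_profile: "db"      # io_profile=db label seen on the volume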

Grow Filesystem

Now let’s grow the ora-data ext4 filesystem by increasing the size of our Persistent Volume Claim to 250Gi.

# kubectl edit pvc/ora-data193-oracle19c-0
persistentvolumeclaim/ora-data193-oracle19c-0 edited

# kubectl get pvc -n oracle-namespace
NAME                   STATUS VOLUME                                CAPACITY ACCESS MODES STORAGECLASS   
ora-data193-oracle19c-0 Bound pvc-1f478ff6-9dc5-4a95-96a3-2016f542c3f6 250Gi      RWO     px-ora-sc      
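
As a non-interactive alternative to kubectl edit, the same resize can be applied with kubectl patch; either way, the StorageClass must have allowVolumeExpansion set to true for the request to be accepted. A minimal sketch, assuming the same PVC and namespace:

# kubectl patch pvc/ora-data193-oracle19c-0 -n oracle-namespace \
    -p '{"spec":{"resources":{"requests":{"storage":"250Gi"}}}}'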

By shelling into our container, we can also confirm that the ext4 filesystem has grown as expected.

# kubectl exec -it oracle19c-0 -n oracle-namespace -- /bin/bash
[oracle@oracle19c-0 ~]$ df -ht ext4
Filesystem                       Size  Used Avail Use% Mounted on
/dev/pxd/pxd1073375706949672013  246G  3.8G  232G   2% /opt/oracle/oradata

Now, let’s create a big tablespace and see what happens.

SQL> create bigfile tablespace SOE datafile '/opt/oracle/oradata/PSTG/PSTGPDB1/soe.dbf' size 50g;
*
ERROR at line 1:
ORA-19502: write error on file "/opt/oracle/oradata/PSTG/PSTGPDB1/soe.dbf",
block number 6440576 (block size=8192)
ORA-27072: File I/O error
Additional information: 4
Additional information: 6440576
Additional information: 409600

Why Did It Fail?

OK, our Oracle tablespace file creation failed, but what went wrong?

We increased the size of our Persistent Volume and saw the ext4 file system grow successfully.

Unfortunately, we did not check whether the back-end storage was adequately sized to satisfy the request.

We can find out what is available by first identifying which nodes are used for our Portworx volume with pxctl volume inspect.

# pxctl volume inspect pvc-ef5bb9c5-80e5-4c7b-bc48-bf56f6788145
Volume :  148734364355060346
Name              :  pvc-ef5bb9c5-80e5-4c7b-bc48-bf56f6788145
Size              :  20 GiB
Format            :  ext4
HA                :  3
IO Priority       :  HIGH
Creation time     :  Feb 11 16:40:36 UTC 2021
Shared            :  no
Status            :  up
State             :  Attached: 4afe8cdf-cac3-40e4-a382-d34755f6f98f (10.225.115.119)
Device Path       :  /dev/pxd/pxd148734364355060346
Labels            :  namespace=oracle-namespace,priority_io=high,pvc=ora-data193-oracle19c-0,repl=3,version=19.3.0.1,app=database,io_profile=db
Reads             :  40384
Reads MS          :  623182
Bytes Read        :  1826086912
Writes            :  1271325
Writes MS         :  12774392
Bytes Written     :  15616274432
IOs in progress   :  0
Bytes used        :  4.2 GiB
Replica sets on nodes:
Set 0
  Node   : 10.225.115.117 (Pool 403bc3eb-6c9d-4b54-88d0-247c49ad8761 )
  Node   : 10.225.115.118 (Pool f563ea69-b62c-4add-aff5-65647d1b194b )
  Node   : 10.225.115.121 (Pool 1b4e6e38-9ff4-45a7-b259-e1bacc264ec3 )
Replication Status  :  Up
Volume consumers  : 
- Name           : oracle19c-0 (b1dcbd3d-4d5e-44af-8bae-ba8c04228640) (Pod)
  Namespace      : oracle-namespace
  Running on     : node-1-4
  Controlled by  : oracle19c (StatefulSet)

And then check the space available on our storage nodes using pxctl status.

# pxctl status
...
Cluster Summary
Cluster ID: px-deploy-1
Cluster UUID: 7f443fd8-6591-42a3-b87c-2d96cafe8213
Scheduler: kubernetes
Nodes: 7 node(s) with storage (7 online)
IP ID SchedulerNodeName StorageNode Used Capacity Status StorageStatus Version Kernel OS
...
10.225.115.117 1ac8941f-3024-4c91-9aeb-7242fef44e55 node-1-2 Yes 10 GiB 64 GiB Online Up 2.5.6.0-80bd45b 3.10.0-1127.19.1.el7.x86_64 CentOS Linux 7 (Core)
...
10.225.115.118 6f07c7b6-ea0f-4381-bc21-bc84031813b7 node-1-3 Yes 10 GiB 64 GiB Online Up 2.5.6.0-80bd45b 3.10.0-1127.19.1.el7.x86_64 CentOS Linux 7 (Core)
...
10.225.115.121 65978e6e-a776-4cd2-a76e-e29d90e7fc73 node-1-5 Yes 10 GiB 64 GiB Online Up 2.5.6.0-80bd45b 3.10.0-1127.19.1.el7.x86_64 CentOS Linux 7 (Core)
...
Warnings: 
 WARNING: Persistent journald logging is not enabled on this node.
Global Storage Pool
 Total Used     :  51 GiB
 Total Capacity :  448 GiB

From the above, we can see our backing storage is undersized: each storage pool is only 64 GiB, so with a replication factor of 3 our 250 Gi volume would need roughly 750 GiB of backing storage, while the whole cluster has only 448 GiB of total capacity.

Disable Thin Provisioning for some nodes

To disable thin provisioning for some nodes in our cluster, we can use the pxctl cluster options update command with the --provisioning-commit-labels flag, providing the following fields in JSON:

  • LabelSelector with either the key/value pairs of the node labels to match, or the reserved node key with a comma-separated list of the node IDs the rule should apply to (see the sketch after this list)
  • OverCommitPercent with the maximum percentage of the backing storage that volumes may provision; setting it to 100 disables thin provisioning
  • SnapReservePercent with the portion of that over-commit limit reserved for snapshots
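
For example, to scope a rule to specific Portworx node IDs rather than node labels, the same flag accepts the reserved node key. The sketch below uses placeholder node IDs, not values from this cluster:

# pxctl cluster options update --provisioning-commit-labels \
    '[{"OverCommitPercent": 100, "SnapReservePercent": 30, "LabelSelector": {"node": "<node-id-1>,<node-id-2>"}}]'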

Start by labelling our Kubernetes nodes to identify which ones are included; in this example I will create a label called app with a value of database.

# kubectl label nodes node-1-1 node-1-2 node-1-3 node-1-4 node-1-5 node-1-6 node-1-7 app=database 
node/node-1-1 labeled
node/node-1-2 labeled
node/node-1-3 labeled
node/node-1-4 labeled
node/node-1-5 labeled
node/node-1-6 labeled
node/node-1-7 labeled

We can easily see labels with kubectl describe node/<node name> or kubectl get nodes --show-labels <node name>.

# kubectl get nodes --show-labels node-1-1
NAME       STATUS   ROLES    AGE   VERSION   LABELS
node-1-1   Ready    <none>   95d   v1.17.0   app=database,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node-1-1,kubernetes.io/os=linux

Now, let’s create a rule to disable thin provisioning for nodes that have a label called app set to database.

# pxctl cluster options update --provisioning-commit-labels '[{"OverCommitPercent": 100, "SnapReservePercent":30,"LabelSelector": {"app": "database"}}]'
Successfully updated cluster wide options

We can then check the cluster options using the -j (JSON) output option.

[root@node-1-1 ~]# pxctl cluster options list -j
{
 "ReplMoveTimeoutMinutes": 1440,
 "AutoDecommissionTimeoutMinutes": 20,
 "InternalSnapIntervalMinutes": 30,
 "ResyncReplAddEnabled": false,
 "ReAddWaitMinutes": 1440,
 "DomainPolicy": 1,
 "OptimizedRestores": false,
 "SmAbortTimeoutSeconds": 0,
 "DisableProvisionRule": {
  "LabelSelector": null
 },
 "ProvisionCommitRule": [
  {
   "OverCommitPercent": 100,
   "SnapReservePercent": 30,
   "LabelSelector": {
    "app": "database"
   }
  }
 ],
...

Test Thin Provisioning Rule

Let’s test our new provisioning rule by trying to create a new 100Gi volume using the following manifest.

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test1
  namespace: oracle-namespace
  labels:
    app: database
    version: 19.3.0.1
spec:
  storageClassName: px-ora-sc
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi

If we apply the above and then describe the Persistent Volume Claim, we see the request has been refused by Portworx as per our rule.

# kubectl apply -f pvc-px.yaml 
persistentvolumeclaim/test1 created

# kubectl describe pvc/test1
Name:          test1
Namespace:     oracle-namespace
StorageClass:  px-ora-sc
Status:        Pending
Volume:        
Labels:        app=database
               version=19.3.0.1
Annotations:   kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{},"labels":{"app":"database","version":"19.3.0.1"},"name":"te...
               volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/portworx-volume
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      
Access Modes:  
VolumeMode:    Filesystem
Mounted By:    <none>

Events:
Type     Reason              Age                From                         Message
----     ------              ----               ----                         -------
Warning  ProvisioningFailed  11s (x2 over 13s)  persistentvolume-controller  Failed to provision volume with StorageClass "px-ora-sc": rpc error: code = Internal desc = Failed to create volume: could not find enough nodes to provision volume: 7 out of 7 pools could not be selected because they did not satisfy the following requirement: pools must not over-commit provisioning space (required 100 GiB) : over-commit limit: 100 %, snap-overcommit limit: 30 % for nodes/pools matching labels: app=database.
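
The refused claim will stay in Pending, so once we have finished testing the rule we can simply remove it (assuming nothing else references it):

# kubectl delete pvc/test1 -n oracle-namespace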

Summary

In this post I have shared why we may need to understand the backing storage presented to Portworx, and demonstrated how we can use the --provisioning-commit-labels option to disable thin provisioning and avoid application issues due to a lack of suitable storage.
