Preamble
After creating our first stateless Kubernetes Nginx pod, let's see why a stateless pod is an issue for a database pod (and not only for databases). In this third article I will focus on why you need stateful containers for certain types of workload.
As it has kept me busy for some time, we will also see how to create the shared storage.
In the next blog post we will see which Kubernetes storage plugin to use to access it.
PostgreSQL stateless deployment creation
I will obviously use the official PostgreSQL image that can be found on Docker Hub.
For this new deployment I have decided to create my own YAML file that I will be able to reuse over and over as I add new functionality. I started from the one of my Nginx deployment as a skeleton:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - image: postgres:latest
        name: postgres
        ports:
        - containerPort: 5433
      restartPolicy: Always
      schedulerName: default-scheduler
Let’s load this new deployment:
[root@server1 ~]# kubectl apply -f postgres.yaml
deployment.apps/postgres created
[root@server1 ~]# kubectl get deployment
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
httpd      1/1     1            1           22h
nginx      1/1     1            1           6d23h
postgres   0/1     1            0           16s
After a while my deployment failed:
[root@server1 ~]# kubectl get pod
NAME                        READY   STATUS              RESTARTS   AGE
httpd-757fb56c8d-7cdj5      1/1     Running             0          22h
nginx-6799fc88d8-xg5kd      1/1     Running             0          23h
postgres-74b5d46bcb-tvv8v   0/1     ContainerCreating   0          59s
[root@server1 ~]# kubectl get pod
NAME                        READY   STATUS    RESTARTS      AGE
httpd-757fb56c8d-7cdj5      1/1     Running   0             22h
nginx-6799fc88d8-xg5kd      1/1     Running   0             23h
postgres-74b5d46bcb-tvv8v   0/1     Error     2 (18s ago)   82s
Get the log with:
[root@server1 ~]# kubectl logs postgres-74b5d46bcb-tvv8v
Error: Database is uninitialized and superuser password is not specified.
       You must specify POSTGRES_PASSWORD to a non-empty value for the
       superuser. For example, "-e POSTGRES_PASSWORD=password" on "docker run".

       You may also use "POSTGRES_HOST_AUTH_METHOD=trust" to allow all
       connections without a password. This is *not* recommended.

       See PostgreSQL documentation about "trust":
       https://www.postgresql.org/docs/current/auth-trust.html
So I added this to my YAML file:
        env:
        - name: POSTGRES_PASSWORD
          value: secure_password
        - name: POSTGRES_DB
          value: testdb
The options I have set are the postgres superuser password with POSTGRES_PASSWORD, and the creation of a default database called testdb with POSTGRES_DB. The Docker Hub PostgreSQL page gives a clear explanation and a list of all possible environment variables.
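For anything beyond a quick test, putting the password in clear text in the deployment YAML is not ideal. Here is a minimal sketch of how the same variable could be fed from a Kubernetes Secret instead; the Secret name postgres-secret and the key password are assumptions of mine for illustration, not something used in this walk-through:

apiVersion: v1
kind: Secret
metadata:
  name: postgres-secret        # hypothetical name
type: Opaque
stringData:
  password: secure_password

Then in the container spec the env entry would reference it:

        env:
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        - name: POSTGRES_DB
          value: testdb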
Delete and re-deploy:
[root@server1 ~]# kubectl delete deployment postgres
deployment.apps "postgres" deleted
[root@server1 ~]# kubectl get deployment
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
httpd   1/1     1            1           22h
nginx   1/1     1            1           6d23h
[root@server1 ~]# kubectl apply -f postgres.yaml
deployment.apps/postgres created
[root@server1 ~]# kubectl get pod
NAME                        READY   STATUS    RESTARTS   AGE
httpd-757fb56c8d-7cdj5      1/1     Running   0          22h
nginx-6799fc88d8-xg5kd      1/1     Running   0          23h
postgres-6d7fcf96b5-gfpxf   1/1     Running   0          5s
You can now connect to the PostgreSQL database from inside the pod (and we can see that the listening port is not 5433 as expected, but still the default 5432):
[root@server1 ~]# kubectl exec -it postgres-6d7fcf96b5-gfpxf -- /bin/bash
root@postgres-6d7fcf96b5-gfpxf:/# su - postgres
postgres@postgres-6d7fcf96b5-gfpxf:~$ psql
psql (14.0 (Debian 14.0-1.pgdg110+1))
Type "help" for help.

postgres=# \l
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
-----------+----------+----------+------------+------------+-----------------------
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 testdb    | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
(4 rows)

postgres=#
Or directly from the server where the pod is running, once I got the IP address of the pod:
[root@server2 ~]# psql --host=192.168.55.19 --port=5432 --username=postgres
Password for user postgres:
psql (13.4, server 14.0 (Debian 14.0-1.pgdg110+1))
WARNING: psql major version 13, server major version 14.
         Some psql features might not work.
Type "help" for help.

postgres=# \l
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
-----------+----------+----------+------------+------------+-----------------------
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 testdb    | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
(4 rows)

postgres=#
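If you wonder how I got the pod IP address, one way among others is simply to ask kubectl for it; the pod name below is of course the one from your own deployment:

[root@server1 ~]# kubectl get pod -o wide
[root@server1 ~]# kubectl get pod postgres-6d7fcf96b5-gfpxf -o jsonpath='{.status.podIP}'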
It finally appears that the "- containerPort: 5433" setting does not provide the expected result: containerPort is merely informational and does not change the port PostgreSQL actually listens on. The solution to change the default port (5432) is to pass it as a server argument:
        args: ["-c", "port=5433"]
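To keep the pod spec consistent, the args entry goes next to the image in the container section, and containerPort should then match the new port. A minimal sketch of the relevant container section as I would write it (containerPort stays purely informational):

      containers:
      - image: postgres:latest
        name: postgres
        args: ["-c", "port=5433"]
        ports:
        - containerPort: 5433
        env:
        - name: POSTGRES_PASSWORD
          value: secure_password
        - name: POSTGRES_DB
          value: testdb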
The stateless issue with database containers
Let’s create a new table in my testdb database and insert a new row into it:
[root@server1 ~]# kubectl exec -it postgres-5594494b8f-2wsvh -- /bin/bash
root@postgres-5594494b8f-2wsvh:/# su - postgres
postgres@postgres-5594494b8f-2wsvh:~$ psql --port=5433
psql (14.0 (Debian 14.0-1.pgdg110+1))
Type "help" for help.

postgres=# \l
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
-----------+----------+----------+------------+------------+-----------------------
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 testdb    | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
(4 rows)

postgres=# \c testdb
You are now connected to database "testdb" as user "postgres".
testdb=# create table test01(id integer, descr varchar(20));
CREATE TABLE
testdb=# insert into test01 values(1,'One');
INSERT 0 1
testdb=# select * from test01;
 id | descr
----+-------
  1 | One
(1 row)

testdb=#
Now I delete the pod, as could happen in a real-life k8s cluster. The pod is automatically recreated by the deployment, and I then try to query my test table:
[root@server1 ~]# kubectl get pod -o wide
NAME                        READY   STATUS    RESTARTS   AGE   IP              NODE                 NOMINATED NODE   READINESS GATES
httpd-757fb56c8d-7cdj5      1/1     Running   0          23h   192.168.55.17   server2.domain.com   <none>           <none>
nginx-6799fc88d8-xg5kd      1/1     Running   0          24h   192.168.55.16   server2.domain.com   <none>           <none>
postgres-5594494b8f-2wsvh   1/1     Running   0          17m   192.168.55.21   server2.domain.com   <none>           <none>
[root@server1 ~]# kubectl delete pod postgres-5594494b8f-2wsvh
pod "postgres-5594494b8f-2wsvh" deleted
[root@server1 ~]# kubectl get pod -o wide
NAME                        READY   STATUS    RESTARTS   AGE   IP              NODE                 NOMINATED NODE   READINESS GATES
httpd-757fb56c8d-7cdj5      1/1     Running   0          23h   192.168.55.17   server2.domain.com   <none>           <none>
nginx-6799fc88d8-xg5kd      1/1     Running   0          24h   192.168.55.16   server2.domain.com   <none>           <none>
postgres-5594494b8f-p88h9   1/1     Running   0          5s    192.168.55.22   server2.domain.com   <none>           <none>
[root@server1 ~]# kubectl exec -it postgres-5594494b8f-p88h9 -- /bin/bash
root@postgres-5594494b8f-p88h9:/# su - postgres
postgres@postgres-5594494b8f-p88h9:~$ psql --port=5433 --dbname=testdb
psql (14.0 (Debian 14.0-1.pgdg110+1))
Type "help" for help.

testdb=# select * from test01;
ERROR:  relation "test01" does not exist
LINE 1: select * from test01;
                      ^
testdb=#
Oops, well, as expected I would say: the information is gone, and in the case of a database this is clearly not acceptable. To make the content persistent we need to work a little more with a persistent volume (PV) and a persistent volume claim (PVC).
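To give an idea of where this is going, here is a minimal sketch of what such a PV/PVC pair might look like once the shared filesystem built below is mounted under /mnt/shared on the nodes. The names, the size and the use of a hostPath volume are assumptions of mine for illustration, not necessarily what the next post will use:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-pv              # hypothetical name
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: /mnt/shared/postgres   # directory on the shared cluster filesystem
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc             # hypothetical name
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi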
Creation of the cluster filesystem between Kubernetes nodes
Kubernetes has a lot of persistent volume plugins available that allow you to mount an incredible number of different storage types on your k8s nodes. On my trial k8s cluster, made of two virtual machines, I have decided to use a shared disk, the same as we have already seen in my Oracle Real Application Cluster (RAC) configuration trial. Once this shared storage and cluster filesystem are created, the idea is to use the local Kubernetes storage plugin.
Once I have attached the shared disk, let's create a new partition:
[root@server1 ~]# fdisk /dev/sdb

Welcome to fdisk (util-linux 2.32.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0xde5a83e7.

Command (m for help): n
Partition type
   p   primary (0 primary, 0 extended, 4 free)
   e   extended (container for logical partitions)
Select (default p):

Using default response p.
Partition number (1-4, default 1):
First sector (2048-2097151, default 2048):
Last sector, +sectors or +size{K,M,G,T,P} (2048-2097151, default 2097151):

Created a new partition 1 of type 'Linux' and of size 1023 MiB.

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

[root@server1 ~]# fdisk -l /dev/sdb
Disk /dev/sdb: 1 GiB, 1073741824 bytes, 2097152 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xde5a83e7

Device     Boot Start     End Sectors  Size Id Type
/dev/sdb1        2048 2097151 2095104 1023M 83 Linux
[root@server1 ~]# mkfs -t xfs /dev/sdb1
meta-data=/dev/sdb1              isize=512    agcount=4, agsize=65472 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=261888, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=1566, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
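If you prefer to script this instead of answering fdisk interactively, the same single partition can in principle be created non-interactively, for example with parted (shown here only as an alternative sketch, I did not use it in this trial):

[root@server1 ~]# parted -s /dev/sdb mklabel msdos mkpart primary xfs 1MiB 100%
[root@server1 ~]# mkfs -t xfs /dev/sdb1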
On the second node, to make the shared partition visible, I had to trick fdisk a bit:
[root@server2 ~]# blkid /dev/sdb1
[root@server2 ~]# fdisk /dev/sdb

Welcome to fdisk (util-linux 2.32.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
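Another option, instead of tricking fdisk with an empty write, would probably be to simply ask the kernel to re-read the partition table:

[root@server2 ~]# partprobe /dev/sdb
[root@server2 ~]# blkid /dev/sdb1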
Then, as I will not have Oracle ASM, I had to choose a shared cluster filesystem…
GFS2
This cluster FS is sponsored by Red Hat and installation is as simple as:
[root@server1 ~]# dnf install gfs2-utils.x86_64
Then create the FS with:
[root@server1 ~]# mkfs -t gfs2 -p lock_dlm -t cluster01:postgres -j 8 /dev/sdb1
It appears to contain an existing filesystem (xfs)
This will destroy any data on /dev/sdb1
Are you sure you want to proceed? [y/n] y
Discarding device contents (may take a while on large devices): Done
Adding journals: Done
Building resource groups: Done
Creating quota file: Done
Writing superblock and syncing: Done
Device:                    /dev/sdb1
Block size:                4096
Device size:               1.00 GB (261888 blocks)
Filesystem size:           1.00 GB (261886 blocks)
Journals:                  8
Journal size:              8MB
Resource groups:           12
Locking protocol:          "lock_dlm"
Lock table:                "cluster01:postgres"
UUID:                      90f456e8-cf74-43af-a838-53b129682f7d
But when I tried to mount it I got:
[root@server1 ~]# mount -a
mount: /mnt/shared: mount(2) system call failed: Transport endpoint is not connected.
From the Red Hat official solution I discovered that the lock_dlm module was not loaded:
[root@server1 ~]# lsmod |grep lock
[root@server1 ~]# modprobe lock_dlm
modprobe: FATAL: Module lock_dlm not found in directory /lib/modules/4.18.0-305.19.1.el8_4.x86_64
[root@server1 ~]# dmesg | grep gfs
[ 8255.885092] gfs2: GFS2 installed
[ 8255.896751] gfs2: fsid=cluster01:postgres: Trying to join cluster "lock_dlm", "cluster01:postgres"
[ 8255.899671] gfs2: fsid=cluster01:postgres: dlm_new_lockspace error -107
[ 8291.542025] gfs2: fsid=cluster01:postgres: Trying to join cluster "lock_dlm", "cluster01:postgres"
[ 8291.542186] gfs2: fsid=cluster01:postgres: dlm_new_lockspace error -107
[ 8376.146357] gfs2: fsid=cluster01:postgres: Trying to join cluster "lock_dlm", "cluster01:postgres"
[ 8376.156197] gfs2: fsid=cluster01:postgres: dlm_new_lockspace error -107
[ 8442.132982] gfs2: fsid=cluster01:postgres: Trying to join cluster "lock_dlm", "cluster01:postgres"
[ 8442.137871] gfs2: fsid=cluster01:postgres: dlm_new_lockspace error -107
[12479.923651] gfs2: fsid=cluster01:postgres: Trying to join cluster "lock_dlm", "cluster01:postgres"
[12479.924713] gfs2: fsid=cluster01:postgres: dlm_new_lockspace error -107
[12861.644565] gfs2: fsid=cluster01:postgres: Trying to join cluster "lock_dlm", "cluster01:postgres"
[12861.644663] gfs2: fsid=cluster01:postgres: dlm_new_lockspace error -107
[13016.278584] gfs2: fsid=cluster01:postgres: Trying to join cluster "lock_dlm", "cluster01:postgres"
[13016.279004] gfs2: fsid=cluster01:postgres: dlm_new_lockspace error -107
[13042.852965] gfs2: fsid=cluster01:postgres: Trying to join cluster "lock_dlm", "cluster01:postgres"
[13042.866282] gfs2: fsid=cluster01:postgres: dlm_new_lockspace error -107
[13362.619425] gfs2: fsid=cluster01:postgres: Trying to join cluster "lock_dlm", "cluster01:postgres"
[13362.631850] gfs2: fsid=cluster01:postgres: dlm_new_lockspace error -107
I tried to install the extra kernel modules:
[root@server1 ~]# dnf install kernel-modules-extra.x86_64
I finally realized that GFS2 is delivered through extra-cost add-ons for Red Hat Enterprise Linux: the High Availability Add-On for clustering and the Resilient Storage Add-On for GFS2. I thought the FS was free, but apparently GFS was free while GFS2 is not…
OCFS2
As I'm using Oracle Linux, OCFS2 sounds like a good idea, and OCFS2 is definitely released under the GNU General Public License. Install it with:
[root@server1 ~]# dnf install ocfs2-tools.x86_64
On one of your nodes, create the OCFS2 cluster with:
[root@server1 ~]# o2cb add-cluster k8socfs2
[root@server1 ~]# o2cb add-node --ip 192.168.56.101 --port 7777 --number 1 k8socfs2 server1.domain.com
[root@server1 ~]# o2cb add-node --ip 192.168.56.102 --port 7777 --number 2 k8socfs2 server2.domain.com
[root@server1 ~]# o2cb register-cluster k8socfs2
[root@server1 ~]# o2cb start-heartbeat k8socfs2
[root@server1 ~]# cat /etc/ocfs2/cluster.conf
cluster:
        heartbeat_mode = local
        node_count = 2
        name = k8socfs2

node:
        number = 1
        cluster = k8socfs2
        ip_port = 7777
        ip_address = 192.168.56.101
        name = server1.domain.com

node:
        number = 2
        cluster = k8socfs2
        ip_port = 7777
        ip_address = 192.168.56.102
        name = server2.domain.com
Copy this configuration file (/etc/ocfs2/cluster.conf) to all nodes of your OCFS2 cluster.
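Assuming root SSH access between the nodes and ocfs2-tools already installed on the target (so that /etc/ocfs2 exists), something like this does the job:

[root@server1 ~]# scp /etc/ocfs2/cluster.conf server2.domain.com:/etc/ocfs2/cluster.conf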
Initialize the OCFS2 cluster stack (O2CB) with:
[root@server1 ~]# o2cb.init --help
Usage: /usr/sbin/o2cb.init {start|stop|restart|force-reload|enable|disable|configure|load|unload|online|offline|force-offline|status|online-status}
[root@server1 /]# o2cb.init configure
Configuring the O2CB driver.

This will configure the on-boot properties of the O2CB driver.
The following questions will determine whether the driver is loaded on
boot.  The current values will be shown in brackets ('[]').  Hitting
<ENTER> without typing an answer will keep that current value.  Ctrl-C
will abort.

Load O2CB driver on boot (y/n) [n]: y
Cluster stack backing O2CB [o2cb]:
Cluster to start on boot (Enter "none" to clear) [ocfs2]: k8socfs2
Specify heartbeat dead threshold (>=7) [31]:
Specify network idle timeout in ms (>=5000) [30000]:
Specify network keepalive delay in ms (>=1000) [2000]:
Specify network reconnect delay in ms (>=2000) [2000]:
Writing O2CB configuration: OK
checking debugfs...
Loading filesystem "ocfs2_dlmfs": Unable to load filesystem "ocfs2_dlmfs"
Failed
[root@server1 /]#
[root@server1 /]# lsmod |egrep -i "ocfs|o2"
[root@server1 /]# modprobe ocfs2_dlmfs
modprobe: FATAL: Module ocfs2_dlmfs not found in directory /lib/modules/4.18.0-305.19.1.el8_4.x86_64
[root@server1 /]# o2cb.init status
Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Driver for "ocfs2_dlmfs": Not loaded
Checking O2CB cluster "ociocfs2": Offline
stat: cannot read file system information for '/dlm': No such file or directory
Debug file system at /sys/kernel/debug: mounted
I realized that all the issues I had (including "mount.ocfs2: Unable to access cluster service while trying to initialize the cluster") were linked to the kernel I was using, which was not the Oracle UEK kernel. All issues were resolved the moment I switched to the Oracle UEK kernel!
Do not forget to start and enable the o2cb service with:
[root@server2 ~]# systemctl status o2cb
● o2cb.service - Load o2cb Modules
   Loaded: loaded (/usr/lib/systemd/system/o2cb.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
[root@server2 ~]# systemctl start o2cb
[root@server2 ~]# systemctl status o2cb
● o2cb.service - Load o2cb Modules
   Loaded: loaded (/usr/lib/systemd/system/o2cb.service; disabled; vendor preset: disabled)
   Active: active (exited) since Mon 2021-10-18 14:04:15 CEST; 1s ago
  Process: 73099 ExecStart=/sbin/o2cb.init enable (code=exited, status=0/SUCCESS)
 Main PID: 73099 (code=exited, status=0/SUCCESS)

Oct 18 14:04:14 server2.domain.com systemd[1]: Starting Load o2cb Modules...
Oct 18 14:04:15 server2.domain.com o2cb.init[73099]: checking debugfs...
Oct 18 14:04:15 server2.domain.com o2cb.init[73099]: Setting cluster stack "o2cb": OK
Oct 18 14:04:15 server2.domain.com o2cb.init[73099]: Cluster ociocfs2 already online
Oct 18 14:04:15 server2.domain.com systemd[1]: Started Load o2cb Modules.
[root@server2 ~]# systemctl enable o2cb
Created symlink /etc/systemd/system/multi-user.target.wants/o2cb.service → /usr/lib/systemd/system/o2cb.service.
Create the FS with:
[root@server1 ~]# man mkfs.ocfs2
[root@server1 ~]# mkfs -t ocfs2 --cluster-name=k8socfs2 --fs-feature-level=max-features --cluster-stack=o2cb -N 4 /dev/sdb1
mkfs.ocfs2 1.8.6
Cluster stack: o2cb
Cluster name: k8socfs2
Stack Flags: 0x0
NOTE: Feature extended slot map may be enabled
Overwriting existing ocfs2 partition.
Proceed (y/N): y
Label:
Features: sparse extended-slotmap backup-super unwritten inline-data strict-journal-super metaecc xattr indexed-dirs usrquota grpquota refcount discontig-bg
Block size: 2048 (11 bits)
Cluster size: 4096 (12 bits)
Volume size: 1072693248 (261888 clusters) (523776 blocks)
Cluster groups: 17 (tail covers 7936 clusters, rest cover 15872 clusters)
Extent allocator size: 4194304 (1 groups)
Journal size: 33554432
Node slots: 4
Creating bitmaps: done
Initializing superblock: done
Writing system files: done
Writing superblock: done
Writing backup superblock: 0 block(s)
Formatting Journals: done
Growing extent allocator: done
Formatting slot map: done
Formatting quota files: done
Writing lost+found: done
mkfs.ocfs2 successful
Get the UUID of the device, add it to /etc/fstab and mount it:
[root@server1 ~]# blkid /dev/sdb1
/dev/sdb1: UUID="ea6e9804-105d-4d4c-96e8-bd54ab5e93d2" BLOCK_SIZE="2048" TYPE="ocfs2" PARTUUID="de5a83e7-01"
[root@server1 ~]# echo "ea6e9804-105d-4d4c-96e8-bd54ab5e93d2" >> /etc/fstab
[root@server1 ~]# vi /etc/fstab
[root@server1 ~]# tail -n 2 /etc/fstab
# Shared storage
UUID="ea6e9804-105d-4d4c-96e8-bd54ab5e93d2" /mnt/shared ocfs2 defaults 0 0
[root@server1 ~]# mount -a
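Since OCFS2 needs the cluster stack and the network before it can mount, it is, as far as I know, safer to add the _netdev mount option so that boot does not hang on this entry; the fstab line would then look like:

UUID="ea6e9804-105d-4d4c-96e8-bd54ab5e93d2" /mnt/shared ocfs2 _netdev,defaults 0 0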
With the Oracle UEK kernel I now have:
[root@server3 postgres]# o2cb.init status
Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Stack glue driver: Loaded
Stack plugin "o2cb": Loaded
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster "ociocfs2": Online
  Heartbeat dead threshold: 31
  Network idle timeout: 30000
  Network keepalive delay: 2000
  Network reconnect delay: 2000
  Heartbeat mode: Local
Checking O2CB heartbeat: Active
Debug file system at /sys/kernel/debug: mounted
[root@server3 ~]# df /mnt/shared
Filesystem     1K-blocks  Used Available Use% Mounted on
/dev/sdb1        1047552 78956    968596   8% /mnt/shared
And I can now share files between all nodes of my k8s cluster…
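A quick sanity check I would do is something like writing a file on one node and reading it from the other:

[root@server1 ~]# echo "hello from server1" > /mnt/shared/test.txt
[root@server2 ~]# cat /mnt/shared/test.txt
hello from server1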
I had an issue where the FS was automatically unmounted by the system:
Oct 18 15:10:50 server1 kernel: o2dlm: Joining domain E80332E942C649EB942623C43D2B35DC
Oct 18 15:10:50 server1 kernel: (
Oct 18 15:10:50 server1 kernel: 1
Oct 18 15:10:50 server1 kernel: ) 1 nodes
Oct 18 15:10:50 server1 kernel: ocfs2: Mounting device (8,17) on (node 1, slot 0) with ordered data mode.
Oct 18 15:10:50 server1 systemd[1]: mnt-postgres.mount: Unit is bound to inactive unit dev-disk-by\x2duuid-b7f61498\x2da4f0\x2d4570\x2da0ed\x2dcb50caa98165.device. Stopping, too.
Oct 18 15:10:50 server1 systemd[1]: Unmounting /mnt/shared...
Oct 18 15:10:50 server1 systemd[10949]: mnt-postgres.mount: Succeeded.
Solved with:
[root@server1 /]# systemctl daemon-reload
I also had an issue where o2cb was not able to register my cluster name:
[root@server2 ~]# o2cb register-cluster k8socfs2
o2cb: Internal logic failure while registering cluster 'k8socfs2'
This was caused by an old trial cluster that had not been completely removed; I solved it with:
[root@server2 /]# o2cb cluster-status
Cluster 'ociocfs2' is online
[root@server2 /]# o2cb unregister-cluster ociocfs2
References
- Docker PostgreSQL Official Image
- A Simple Guide to Oracle Cluster File System (OCFS2) using iSCSI on Oracle Cloud Infrastructure
- Creating the Configuration File for the Cluster Stack
- GFS2 file system creation