PostgreSQL logical replication hands-on


Preamble

PostgreSQL logical replication replicates data objects and their changes between two instances of PostgreSQL. It is the equivalent of MySQL replication and of what you would achieve with GoldenGate in Oracle.

Unlike PostgreSQL physical replication, which replicates byte for byte, logical replication lets you choose at a fine-grained level which objects you want to replicate. This targets the objectives below (see the official documentation for a more complete list):

  • Replicate only a subset of a database, a few schemas or a few objects, while other objects in another schema can still be managed read-write in parallel on the target.
  • Implement upgrade scenarios between different major releases (physical replication is only possible between identical major releases).
  • Make a subset of the source database available for reporting, with additional indexes for performance.

In PostgreSQL logical replication terminology, it works with publications on the source database and subscriptions, to those publications, on the target database:

postgresql_logical_replication01

Inherent to all logical replication technologies is the need for a primary key on the objects you plan to replicate, so that the corresponding rows can easily be located on the target database. Replicating a table without a primary key is possible, but you must understand the performance impact it can imply…
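
If a table really cannot have a primary key, PostgreSQL lets you fall back on a full-row replica identity, at the cost of much more expensive row matching on the subscriber. A minimal sketch, on a hypothetical table test_nopk that has no primary key:

testdb=# alter table test_nopk replica identity full;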

My test environment will be made of two virtual machines:

  • server3.domain.com (192.168.56.103) will be my primary server
  • server4.domain.com (192.168.56.104) will be my standby server

They are both running Oracle Enterprise Linux (OEL) 8.4 and PostgreSQL 14.

PostgreSQL configuration

I have created two distinct PostgreSQL instances on my two servers with the default postgresql.conf configuration file below (initially built from the non-commented-out variables of the default file):

listen_addresses = 'server3.domain.com' # Match server name
port = 5433
archive_mode = on
archive_command = 'test ! -f /postgres/arch/%f && cp %p /postgres/arch/%f'
wal_level = logical # Not required on target instance
max_connections = 100                   # (change requires restart)
shared_buffers = 128MB                  # min 128kB
dynamic_shared_memory_type = posix      # the default is the first option
max_wal_size = 1GB
min_wal_size = 80MB
log_destination = 'stderr'              # Valid values are combinations of
logging_collector = on                  # Enable capturing of stderr and csvlog
log_directory = 'log'                   # directory where log files are written,
log_filename = 'postgresql-%a.log'      # log file name pattern,
log_rotation_age = 1d                   # Automatic rotation of logfiles will
log_rotation_size = 0                   # Automatic rotation of logfiles will
log_truncate_on_rotation = on           # If on, an existing log file with the
log_line_prefix = '%m [%p] '            # special values:
log_timezone = 'CET'
datestyle = 'iso, mdy'
timezone = 'CET'
lc_messages = 'en_US.UTF-8'                     # locale for system error message
lc_monetary = 'en_US.UTF-8'                     # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'                      # locale for number formatting
lc_time = 'en_US.UTF-8'                         # locale for time formatting
default_text_search_config = 'pg_catalog.english'

One important parameter is wal_level, which must be set to logical:

postgres=# show wal_level;
wal_level
-----------
logical
(1 row)
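
If your instance is still at the default value (replica), a sketch of changing it without editing postgresql.conf by hand; in both cases an instance restart is required for wal_level to take effect:

postgres=# alter system set wal_level = logical;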

And depending on what you plan to implement, check the value of these other parameters on the publisher and subscriber side:

postgres=# select name,setting from pg_settings where name in ('max_replication_slots','max_wal_senders','max_worker_processes','max_logical_replication_workers');
              name               | setting
---------------------------------+---------
 max_logical_replication_workers | 4
 max_replication_slots           | 10
 max_wal_senders                 | 10
 max_worker_processes            | 8
(4 rows)

I create a dedicated superuser for myself as well as a dedicated replication account. I always try to avoid the superuser privilege for this replication account, even if it is more complex to handle:

postgres=# create role yjaquier with superuser login password 'secure_password';
CREATE ROLE
postgres=# create role repuser with replication login password 'secure_password';
CREATE ROLE

And I modify the pg_hba.conf file accordingly by adding the lines below (I could have further restricted where repuser is allowed to connect from by specifying an exact IP address):

host    all        yjaquier                  0.0.0.0/0            md5
host    testdb     repuser             192.168.56.0/24            md5
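
pg_hba.conf changes are not picked up automatically, so after the edit reload the configuration, for example from psql:

postgres=# select pg_reload_conf();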

PostgreSQL logical replication configuration and testing

For my testing on my primary server (server3.domain.com) I create a test database and a test table with a primary key and a row inside to test the initial snapshot:

postgres=# create database testdb;
CREATE DATABASE
postgres=# \c testdb
You are now connected to database "testdb" as user "postgres".
testdb=# create table test01(id int primary key, descr varchar(20));
CREATE TABLE
testdb=# insert into test01 values(1,'One');
INSERT 0 1

I create a publication for my test01 table (by default all DML is taken into account, but you can easily restrict it, for example to not replicate deletes):

testdb=# create publication publication01 for table test01;
CREATE PUBLICATION
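
For illustration, a publication limited to inserts and updates only (so deletes would not be replicated) could be created like this; the name publication02 is made up:

testdb=# create publication publication02 for table test01 with (publish = 'insert, update');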

There is a complete chapter in the official documentation on the minimum security you must apply for your dedicated PostgreSQL logical replication user. If, like mine, your account does not have the superuser privilege, you must grant a SELECT privilege or you get this error:

2021-11-11 12:49:26.454 CET [11474] ERROR:  could not start initial contents copy for table "public.test01": ERROR:  permission denied for table test01
2021-11-11 12:49:26.458 CET [9922] LOG:  background worker "logical replication worker" (PID 11474) exited with exit code 1

So:

testdb=# grant select on test01 to repuser;
GRANT

If you plan to add more tables from time to time (or if you have created the publication with the FOR ALL TABLES option) you can also grant the more permissive rights below:

testdb=# grant select on all tables in schema public to repuser;
GRANT
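
Later on, adding a hypothetical new table test02 to the existing publication would be a two-step operation, sketched below: add it on the publisher, then refresh the subscription on the subscriber (where the table definition must already exist):

-- On the publisher:
testdb=# alter publication publication01 add table test02;
-- On the subscriber:
testdb=# alter subscription subscription01 refresh publication;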

On the target instance I create the test database and the definition of my test table, because schema definitions are not replicated:

postgres=# create database testdb;
CREATE DATABASE
postgres=# \c testdb
You are now connected to database "testdb" as user "postgres".
testdb=# create table test01(id int primary key, descr varchar(20));
CREATE TABLE

On the target instance, create a subscription (which can also be synchronous) with:

testdb=# create subscription subscription01
testdb-# connection 'host=server3.domain.com port=5433 user=repuser dbname=testdb password=secure_password'
testdb-# publication publication01;
NOTICE:  created replication slot "subscription01" on publisher
CREATE SUBSCRIPTION

Magically the initial loading of my first row is done:

2021-11-11 12:51:55.585 CET [11575] LOG:  logical replication table synchronization worker for subscription "subscription01", table "test01" has started
2021-11-11 12:51:55.672 CET [11575] LOG:  logical replication table synchronization worker for subscription "subscription01", table "test01" has finished

Well not magically but expected:

The initial data in existing subscribed tables are snapshotted and copied in a parallel instance of a special kind of apply process. This process will create its own replication slot and copy the existing data. As soon as the copy is finished the table contents will become visible to other backends. Once existing data is copied, the worker enters synchronization mode, which ensures that the table is brought up to a synchronized state with the main apply process by streaming any changes that happened during the initial data copy using standard logical replication. During this synchronization phase, the changes are applied and committed in the same order as they happened on the publisher. Once synchronization is done, control of the replication of the table is given back to the main apply process where replication continues as normal.

And when I start to insert, update and delete rows in my source table, they are replicated as expected!

PostgreSQL logical replication monitoring

The official documentation says that logical replication monitoring is the same as physical replication monitoring, so:

On source (publisher) instance:

postgres=# select * from pg_replication_slots;
   slot_name    |  plugin  | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
----------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 subscription01 | pgoutput | logical   |  16386 | testdb   | f         | t      |       9705 |      |          743 | 0/3001D10   | 0/3001D48           | reserved   |               | f
(1 row)

postgres=# select * from pg_publication;
  oid  |    pubname    | pubowner | puballtables | pubinsert | pubupdate | pubdelete | pubtruncate | pubviaroot
-------+---------------+----------+--------------+-----------+-----------+-----------+-------------+------------
 16394 | publication01 |       10 | f            | t         | t         | t         | t           | f
(1 row)

postgres=# select * from pg_publication_tables;
    pubname    | schemaname | tablename
---------------+------------+-----------
 publication01 | public     | test01
(1 row)

On target (subscriber) instance:

postgres=# select * from pg_stat_subscription;
 subid |    subname     |  pid  | relid | received_lsn |      last_msg_send_time       |     last_msg_receipt_time     | latest_end_lsn |        latest_end_time
-------+----------------+-------+-------+--------------+-------------------------------+-------------------------------+----------------+-------------------------------
 16395 | subscription01 | 11262 |       | 0/3001D48    | 2021-11-11 13:06:09.304963+01 | 2021-11-11 13:06:10.650019+01 | 0/3001D48      | 2021-11-11 13:06:09.304963+01
(1 row)

postgres=# select * from pg_subscription;
  oid  | subdbid |    subname     | subowner | subenabled | subbinary | substream |                                      subconninfo                                      |  subslotname   | subsynccommit | subpublications
-------+---------+----------------+----------+------------+-----------+-----------+---------------------------------------------------------------------------------------+----------------+---------------+-----------------
 16395 |   16389 | subscription01 |       10 | t          | f         | f         | host=server3.domain.com port=5433 user=repuser dbname=testdb password=secure_password | subscription01 | off           | {publication01}
(1 row)
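
To get a rough idea of the subscriber lag in bytes, you can compare the confirmed position of the slot with the current WAL position on the publisher, for example with a query like this (a sketch):

postgres=# select slot_name, pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) as lag_bytes
postgres-# from pg_replication_slots;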

Remark:
I was a bit amazed to see the password in clear text in the pg_subscription table. The ~/.pgpass file is definitely a must!

PostgreSQL physical replication hands-on


Preamble

We are just starting our PostgreSQL journey but we already have requirements for high availability. This can be achieved by a (simple) operating system cluster that moves a virtual IP and the ownership of shared disks over to a remote server. In our particular case the disk array is kept up-to-date (synchronously or not) over a dark fiber link between our data centers.

As with Data Guard for Oracle, there is a pure database solution with PostgreSQL called Write-Ahead Log (WAL) shipping, or physical replication. As with Data Guard, this solution has the benefit of reducing the required network bandwidth and, for PostgreSQL, of making the standby server available for read-only queries at no extra cost.

From official documentation:

Warm and hot standby servers can be kept current by reading a stream of write-ahead log (WAL) records. If the main server fails, the standby contains almost all of the data of the main server, and can be quickly made the new primary database server. This can be synchronous or asynchronous and can only be done for the entire database server.

The overall picture of PostgreSQL physical replication is the following:

postgresql_physical_replication01

My test environment will be made of two virtual machines:

  • server3.domain.com (192.168.56.103) will be my primary server
  • server4.domain.com (192.168.56.104) will be my standby server

They are both running Oracle Enterprise Linux (OEL) 8.4 and PostgreSQL 14.

In a future post I will probably give Replication Manager for PostgreSQL clusters (repmgr) a try, but at the time of writing this post the tool is not yet compatible with PostgreSQL 14…

Primary server setup

Installed PostgreSQL directly from official repository:

[root@server3 ~]# dnf -y install postgresql14-server

Created a dedicated mount point:

[root@server3 /]# ll -d /postgres
drwxr-xr-x 3 postgres postgres 20 Oct 29 12:18 /postgres
[root@server3 ~]# df -h /postgres
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/vg00-lvol30 1014M   40M  975M   4% /postgres

Customized the profile by adding a few shortcuts:

[postgres@server3 ~]$ cat /var/lib/pgsql/.pgsql_profile
PATH=/usr/pgsql-14/bin:$PATH
export PATH
MANPATH=/usr/local/pgsql/share/man:$MANPATH
export MANPATH
PGDATA=/postgres/data01
export PGDATA
alias psql="psql --port=5433"
alias pg_start="cd ${PGDATA};pg_ctl -l logfile start"
alias pg_restart="cd ${PGDATA};pg_ctl -l logfile restart"
alias pg_stop="cd ${PGDATA};pg_ctl -l logfile stop"

Initialized the starter instance with:

[postgres@server3 ~]$ pg_ctl initdb -D /postgres/data01
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory /postgres/data01 ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... CET
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    /usr/pgsql-14/bin/pg_ctl -D /postgres/data01 -l logfile start

Then updated the postgresql.conf parameter file a bit:

[postgres@server3 data01]$ cp postgresql.conf postgresql.conf.org
[postgres@server3 data01]$ egrep -v "(^\s*#|^#)" postgresql.conf.org | grep . > postgresql.conf
[postgres@server3 data01]$ vi postgresql.conf
[postgres@server3 data01]$ cat postgresql.conf
listen_addresses = 'server3.domain.com'
port = 5433
archive_mode = on
archive_command = 'test ! -f /postgres/arch/%f && cp %p /postgres/arch/%f'
max_connections = 100                   # (change requires restart)
shared_buffers = 128MB                  # min 128kB
dynamic_shared_memory_type = posix      # the default is the first option
max_wal_size = 1GB
min_wal_size = 80MB
log_destination = 'stderr'              # Valid values are combinations of
logging_collector = on                  # Enable capturing of stderr and csvlog
log_directory = 'log'                   # directory where log files are written,
log_filename = 'postgresql-%a.log'      # log file name pattern,
log_rotation_age = 1d                   # Automatic rotation of logfiles will
log_rotation_size = 0                   # Automatic rotation of logfiles will
log_truncate_on_rotation = on           # If on, an existing log file with the
log_line_prefix = '%m [%p] '            # special values:
log_timezone = 'CET'
datestyle = 'iso, mdy'
timezone = 'CET'
lc_messages = 'en_US.UTF-8'                     # locale for system error message
lc_monetary = 'en_US.UTF-8'                     # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'                      # locale for number formatting
lc_time = 'en_US.UTF-8'                         # locale for time formatting
default_text_search_config = 'pg_catalog.english'

For administrative tools I create my own personal account:

postgres=# create role yjaquier with superuser login password 'secure_password';
CREATE ROLE

Added to pg_hba.conf to allow me to connect from anywhere:

host    all             yjaquier             0.0.0.0/0            md5

It is strongly recommended to create a dedicated replication user with:

postgres=# create role repuser with replication login password 'secure_password';
CREATE ROLE

Added to pg_hba.conf to allow replication-related connections from the standby node:

host    replication     all             192.168.56.0/24            md5

Remark:
You can further increase security by specifying the exact IP address of your standby server with a /32 network mask, for example 192.168.56.104/32. You can also specify the repuser account we created for replication purposes…
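
Following that remark, a more restrictive pg_hba.conf entry could look like this (a sketch):

host    replication     repuser         192.168.56.104/32          md5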

Standby server setup for physical replication

Your standby server must be created from a copy of your primary instance, or you will get this error message:

2021-11-01 11:52:07.697 CET [7416] FATAL:  database system identifier differs between the primary and standby
2021-11-01 11:52:07.697 CET [7416] DETAIL:  The primary's identifier is 7024430574144521833, the standby's identifier is 7024465441503407932.

I will use the default PostgreSQL backup tool (https://blog.yannickjaquier.com/postgresql/postgresql-backup-and-restore-tools-comparison-for-pitr-recovery.html#pg_basebackup) called pg_basebackup. Full instance backup:

[postgres@server3 ~]$ pg_basebackup --pgdata=/postgres/backup/full_backup --format=t --compress=9 --progress --verbose --port=5433 --username=postgres
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/5000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_12536"
34891/34891 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/5000138
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
pg_basebackup: base backup completed

To simulate users’ activity on my primary database I create a new database and a test table with one row after I have performed the full backup:

postgres=# create database testdb;
CREATE DATABASE
postgres=# \c testdb
You are now connected to database "testdb" as user "postgres".
testdb=# create table test01(id integer, descr varchar(20));
CREATE TABLE
testdb=# insert into test01 values(1,'One');
INSERT 0 1

Copy the backup files to your standby server (using scp):

[postgres@server3 data01]$ scp -rp /postgres/backup/full_backup server4.domain.com:/postgres/backup
postgres@server4.domain.com's password:
base.tar.gz                                                                 100% 4294KB  46.1MB/s   00:00
pg_wal.tar.gz                                                               100%   17KB   9.4MB/s   00:00
backup_manifest

On the standby server restore your full backup in an empty PGDATA directory with something like:

[postgres@server4 data01]$ cd ${PGDATA}
[postgres@server4 data01]$ tar xf /postgres/backup/full_backup/base.tar.gz
[postgres@server4 data01]$ tar xf /postgres/backup/full_backup/pg_wal.tar.gz --directory pg_wal

Then you have two options to set up WAL shipping.

Without streaming replication

This implies that your standby server has access to the WAL archive directory of your primary server (NFS or similar), as explained in the official documentation:

The archive location should be accessible from the standby even when the primary is down, i.e., it should reside on the standby server itself or another trusted server, not on the primary server.

Or that you have a process to copy your primary WAL files to the standby server. To test this (sub-optimal) method I have set the following in my standby postgresql.conf file (listen_addresses was also modified to match my standby server name):

restore_command = 'cp /postgres/arch/%f %p'

Create the important standby.signal file:

[postgres@server4 data01]$ touch standby.signal

Start the standby PostgreSQL instance:

[postgres@server4 data01]$ pg_ctl -l logfile start

One cool feature of PostgreSQL is that your standby server is accessible in read-only mode by default (hot_standby = on by default). So for Oracle database folks, you get Active Data Guard for free. If you connect to the standby instance you see that everything is there:

postgres=# \l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 testdb    | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
(4 rows)

postgres=# \c testdb
You are now connected to database "testdb" as user "postgres".
testdb=# \dt
         List of relations
 Schema |  Name  | Type  |  Owner
--------+--------+-------+----------
 public | test01 | table | postgres
(1 row)

testdb=# select * from test01;
 id | descr
----+-------
  1 | One
(1 row)

If you insert new rows into the test01 test table and copy the generated archive file (I have set archive_timeout = 1min on my primary server to have the archive command triggered more often; never do this on a production server), or ultimately copy (scp) the latest generated WAL files from the pg_wal directory of your primary to your standby server, your standby server will process them:

2021-11-05 17:33:09.883 CET [84126] LOG:  restored log file "000000010000000000000011" from archive
2021-11-05 17:33:24.901 CET [84126] LOG:  restored log file "000000010000000000000012" from archive

This is because PostgreSQL log-shipping replication works like this:

At startup, the standby begins by restoring all WAL available in the archive location, calling restore_command. Once it reaches the end of WAL available there and restore_command fails, it tries to restore any WAL available in the pg_wal directory. If that fails, and streaming replication has been configured, the standby tries to connect to the primary server and start streaming WAL from the last valid record found in archive or pg_wal. If that fails or streaming replication is not configured, or if the connection is later disconnected, the standby goes back to step 1 and tries to restore the file from the archive again. This loop of retries from the archive, pg_wal, and via streaming replication goes on until the server is stopped or failover is triggered by a trigger file.

But of course there is a much more elegant solution with streaming replication…

With streaming replication

By far the best method. Update postgresql.conf: change listen_addresses to match the standby server name, add the same restore_command and add the primary_conninfo parameter (instead of supplying the password in clear text in the postgresql.conf file you can also use the ~/.pgpass file):

restore_command = 'cp /postgres/arch/%f %p'
primary_conninfo = 'host=server3.domain.com port=5433 user=repuser password=secure_password options=''-c wal_sender_timeout=5000'''
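
If you prefer not to keep the password in postgresql.conf, a sketch of the equivalent ~/.pgpass entry on the standby (the file must have 0600 permissions, and the password field can then be dropped from primary_conninfo):

# hostname:port:database:username:password
server3.domain.com:5433:*:repuser:secure_password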

Create the important standby.signal file:

[postgres@server4 data01]$ touch standby.signal

Start the standby PostgreSQL instance:

[postgres@server4 data01]$ pg_ctl -l logfile start

And magically the standby server recovers what has been done on the live server since your full backup. If you check the log you see output like:

2021-11-01 14:29:33.568 CET [17242] LOG:  starting PostgreSQL 14.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1), 64-bit
2021-11-01 14:29:33.568 CET [17242] LOG:  listening on IPv4 address "192.168.56.104", port 5433
2021-11-01 14:29:33.570 CET [17242] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5433"
2021-11-01 14:29:33.571 CET [17242] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5433"
2021-11-01 14:29:33.575 CET [17244] LOG:  database system was interrupted; last known up at 2021-11-01 12:25:59 CET
2021-11-01 14:29:33.835 CET [17244] LOG:  entering standby mode
2021-11-01 14:29:33.838 CET [17244] LOG:  redo starts at 0/5000028
2021-11-01 14:29:33.841 CET [17244] LOG:  consistent recovery state reached at 0/5000138
2021-11-01 14:29:33.841 CET [17242] LOG:  database system is ready to accept read-only connections
2021-11-01 14:29:33.885 CET [17248] LOG:  started streaming WAL from primary at 0/6000000 on timeline 1

If I insert a new row in my primary database:

testdb=# insert into test01 values(2,'Two');
INSERT 0 1

I immediately see it on the hot standby node, without any file copy this time:

testdb=# select * from test01;
 id | descr
----+-------
  1 | One
  2 | Two
(2 rows)

Replication slots for physical replication

As the documentation says:

Replication slots provide an automated way to ensure that the primary does not remove WAL segments until they have been received by all standbys. In lieu of using replication slots, it is possible to prevent the removal of old WAL segments using wal_keep_size, or by storing the segments in an archive using archive_command.

Create a replication slot with:

postgres=# SELECT * FROM pg_create_physical_replication_slot('server3_slot');
  slot_name   | lsn
--------------+-----
 server3_slot |
(1 row)

postgres=# SELECT slot_name, slot_type, active FROM pg_replication_slots;
  slot_name   | slot_type | active
--------------+-----------+--------
 server3_slot | physical  | f
(1 row)

Add it to the postgresql.conf file of your standby instance:

primary_slot_name = 'server3_slot'

Then on the primary the active column has changed:

postgres=# SELECT slot_name, slot_type, active FROM pg_replication_slots;
  slot_name   | slot_type | active
--------------+-----------+--------
 server3_slot | physical  | t
(1 row)

To drop the replication slot, either stop the standby (the clean way) or unset the primary_slot_name parameter on your standby instance, then:

postgres=# SELECT slot_name, slot_type, active FROM pg_replication_slots;
  slot_name   | slot_type | active
--------------+-----------+--------
 server3_slot | physical  | f
(1 row)

postgres=# select pg_drop_replication_slot('server3_slot');
 pg_drop_replication_slot
--------------------------

(1 row)

Promoting a standby node

Reading the official documentation I have not seen any mention of switchover (swapping the primary and standby roles while both are still up). It only mentions failover (the primary is dead). Let’s simulate this by killing my primary server:

[postgres@server3 data01]$ ps -ef | grep pgsql-14 | grep -v grep
postgres    4485       1  0 10:37 ?        00:00:00 /usr/pgsql-14/bin/postgres
[postgres@server3 data01]$ kill -9 4485

Something must be done on the standby server as it is still in read-only mode:

testdb=# insert into test01 values(10,'Ten');
ERROR:  cannot execute INSERT in a read-only transaction

You can either run pg_ctl promote or execute pg_promote(); let’s try pg_ctl promote:

[postgres@server4 data01]$ pg_ctl promote
waiting for server to promote.... done
server promoted
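
The SQL alternative mentioned above is a simple function call executed on the standby (a sketch):

postgres=# select pg_promote();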

And that’s it (standby.signal file has been deleted from ${PGDATA} directory):

testdb=# insert into test01 values(10,'Ten');
INSERT 0 1

In this situation the standby is your new primary and the previous primary will have to be re-instantiated to become a standby. To help re-synchronize a big PostgreSQL cluster you might want to look at the pg_rewind tool…
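
For reference, a pg_rewind invocation run on the old primary could look like the sketch below; note that the old primary must be cleanly shut down first and that wal_log_hints = on (or data checksums) must have been enabled beforehand:

[postgres@server3 ~]$ pg_rewind --target-pgdata=/postgres/data01 \
      --source-server='host=server4.domain.com port=5433 user=postgres password=secure_password'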

Physical replication monitoring

On the primary instance one very useful view is pg_stat_replication; we can see:

postgres=# select * from pg_stat_replication;
  pid  | usesysid | usename | application_name |  client_addr   | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | re
play_lag | sync_priority | sync_state |          reply_time
-------+----------+---------+------------------+----------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+-----------+---
---------+---------------+------------+-------------------------------
 17763 |    16386 | repuser | walreceiver      | 192.168.56.104 |                 |       54096 | 2021-11-01 14:29:33.487621+01 |              | streaming | 0/601A308 | 0/601A308 | 0/601A308 | 0/601A308  |           |           |
         |             0 | async      | 2021-11-01 14:59:02.010878+01
(1 row)

On the standby instance you can compute the apply lag with a query like:

postgres=# select last_msg_receipt_time, now(), now()-last_msg_receipt_time delay,to_char(current_timestamp-last_msg_receipt_time,'hh24:mi:ss:ms') delay from pg_stat_wal_receiver;
    last_msg_receipt_time     |              now              |      delay      |    delay
------------------------------+-------------------------------+-----------------+--------------
 2021-11-01 15:47:26.11948+01 | 2021-11-01 15:47:27.362229+01 | 00:00:01.242749 | 00:00:01:242
(1 row)
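
Another commonly used expression for the replay delay, based on the timestamp of the last replayed transaction (a sketch):

postgres=# select now() - pg_last_xact_replay_timestamp() as replication_delay;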

All in one pg_basebackup remote command

The de facto PostgreSQL backup tool (pg_basebackup) provides an all-in-one (complex) command to duplicate the primary instance to your standby server and configure log-shipping replication at the same time (think RMAN duplicate for Oracle DBAs). From my standby node, in an empty PGDATA directory, issue something like:

[postgres@server4 data01]$ pg_basebackup --host=server3.domain.com --username=repuser --port=5433 --pgdata=/postgres/data01/ --format=p --wal-method=stream --progress --write-recovery-conf --create-slot --slot=server3_slot
Password:
34921/34921 kB (100%), 1/1 tablespace

Required options are:

  • --format=p Backup in plain format
  • --wal-method=stream Include the WAL files by streaming them
  • --write-recovery-conf Automatically generate the replication configuration
  • --create-slot Create the replication slot automatically
  • --slot=server3_slot Provide the name of the replication slot

Checking my target /postgres/data01 directory, everything is there, even the standby.signal file:

[postgres@server4 data01]$ ll
total 252
-rw------- 1 postgres postgres    225 Nov  1 17:55 backup_label
-rw------- 1 postgres postgres 181149 Nov  1 17:55 backup_manifest
drwx------ 6 postgres postgres     54 Nov  1 17:55 base
-rw------- 1 postgres postgres     30 Nov  1 17:55 current_logfiles
drwx------ 2 postgres postgres   4096 Nov  1 17:55 global
drwx------ 2 postgres postgres     58 Nov  1 17:55 log
-rw------- 1 postgres postgres   1315 Nov  1 17:55 logfile
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_commit_ts
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_dynshmem
-rw------- 1 postgres postgres   4925 Nov  1 17:55 pg_hba.conf
-rw------- 1 postgres postgres   1636 Nov  1 17:55 pg_ident.conf
drwx------ 4 postgres postgres     68 Nov  1 17:55 pg_logical
drwx------ 4 postgres postgres     36 Nov  1 17:55 pg_multixact
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_notify
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_replslot
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_serial
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_snapshots
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_stat
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_stat_tmp
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_subtrans
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_tblspc
drwx------ 2 postgres postgres      6 Nov  1 17:55 pg_twophase
-rw------- 1 postgres postgres      3 Nov  1 17:55 PG_VERSION
drwx------ 3 postgres postgres     60 Nov  1 17:55 pg_wal
drwx------ 2 postgres postgres     18 Nov  1 17:55 pg_xact
-rw------- 1 postgres postgres    376 Nov  1 17:55 postgresql.auto.conf
-rw------- 1 postgres postgres   1190 Nov  1 17:55 postgresql.conf
-rw------- 1 postgres postgres  28750 Nov  1 17:55 postgresql.conf.org
-rw------- 1 postgres postgres      0 Nov  1 17:55 standby.signal

The replication configuration is in the postgresql.auto.conf file. The replication slot and so on are already there, and when I tried to insert a new row in my primary test table it was replicated with no issue:

[postgres@server4 data01]$ cat postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'user=repuser password=secure_password channel_binding=prefer host=server3.domain.com port=5433 sslmode=prefer sslcompression=0 sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsrvname=postgres target_session_attrs=any'
primary_slot_name = 'server3_slot'

I just had to change the listen_addresses parameter to match my standby server name; all other tasks were done by pg_basebackup… Almost magical for a small server where you don’t have lots of data to transfer over the network.

Kubernetes on virtual machines hands-on – part 4


Preamble

We have seen in the previous post how to create shared storage for our Kubernetes nodes. Now let’s see what we need to create in k8s and how to use this persistent storage to create a stateful container.

Persistent Volume creation

I have spent quite a lot of time on this subject, and after digging a bit on the Internet I realized that it is apparently a tricky subject for lots of people.

Going through the list of available Persistent Volume (PV) types, I realized that almost none of them can handle the clustered file system I have created with OCFS2:

kubernetes08

The only option would be to use local for local storage, but my shared storage is a bit more advanced because I do not lose it when a node is down and it is accessible from all Kubernetes cluster nodes.

In a more real-life scenario you may consider Fibre Channel (FC) coming from a SAN, iSCSI, or even the Container Storage Interface (CSI) with the right plugin. One of the easiest options, if you have it in your company (most companies more or less do), is an NFS mount point. I will probably come back to this topic later as my knowledge of it increases…
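
For reference, a minimal NFS-backed PersistentVolume would look something like the sketch below (the NFS server name and export path are made up):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-example
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteMany
  nfs:
    server: nfs.domain.com        # hypothetical NFS server
    path: /exports/postgres       # hypothetical export path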

Inspired by the official documentation, I have written the YAML file below (pv.yaml):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-pv-volume
spec:
  capacity:
    storage: 1Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /mnt/shared/postgres
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - server2.domain.com

Load the definition with:

[root@server1 ~]# kubectl create -f pv.yaml
persistentvolume/postgres-pv-volume created
[root@server1 ~]# kubectl get pv -o wide
NAME                 CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS    REASON   AGE   VOLUMEMODE
postgres-pv-volume   1Gi        RWO            Delete           Available           local-storage            3s    Filesystem

As my “local” storage is not really local (recall that I have created an OCFS2 clustered file system on a disk shared between my k8s nodes), in one of my trials I tried to allow all nodes in the nodeSelectorTerms property to access the persistent volume. But it ended with a failing pod and this error:

[root@server1 ~]# kubectl describe pod/postgres-59b594d497-76mnk
Name:           postgres-59b594d497-76mnk
Namespace:      default
Priority:       0
Node:           
Labels:         app=postgres
                pod-template-hash=59b594d497
Annotations:    
Status:         Pending
IP:
IPs:            
Controlled By:  ReplicaSet/postgres-59b594d497
Containers:
  postgres:
    Image:      postgres:latest
    Port:       
    Host Port:  
    Args:
      -c
      port=5433
    Environment:
      POSTGRES_PASSWORD:  secure_password
      POSTGRES_DB:        testdb
    Mounts:
      /var/lib/postgresql/datal from postgres-pv-claim (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-njzgf (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  postgres-pv-claim:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  postgres-pv-claim
    ReadOnly:   false
  kube-api-access-njzgf:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  3s (x17 over 16m)  default-scheduler  0/2 nodes are available: 2 node(s) had volume node affinity conflict.

Persistent Volume Claim creation

To allow the pod to request physical storage you have to create a Persistent Volume Claim (PVC). To do this I have created the file below (pvc.yaml):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pv-claim
spec:
  storageClassName: local-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi

Load the definition with:

[root@server1 ~]# kubectl apply -f pvc.yaml
persistentvolumeclaim/postgres-pv-claim created
[root@server1 ~]# kubectl get pvc -o wide
NAME                STATUS   VOLUME               CAPACITY   ACCESS MODES   STORAGECLASS    AGE   VOLUMEMODE
postgres-pv-claim   Bound    postgres-pv-volume   1Gi        RWO            local-storage   5s    Filesystem

PostgreSQL stateful deployment creation

To redeploy the PostgreSQL deployment using the persistent volume claim, I have modified my YAML file to look something like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - image: postgres:latest
        name: postgres
        args: ["-c", "port=5433"]
        env:
        - name: POSTGRES_PASSWORD
          value: secure_password
        - name: POSTGRES_DB
          value: testdb
        volumeMounts:
        - mountPath: "/var/lib/postgresql/data"
          name: postgres-pv-claim
      restartPolicy: Always
      schedulerName: default-scheduler
      volumes:
      - name: postgres-pv-claim
        persistentVolumeClaim:
          claimName: postgres-pv-claim

One beginner mistake I made was to use the root mount point of my shared storage, which obviously contains the lost+found directory. As a result the PostgreSQL initdb tool is not able to create the PostgreSQL database directory, complaining that the folder is not empty:

[root@server1 ~]# kubectl logs postgres-54d7bbfd6c-kxqhh
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

initdb: error: directory "/var/lib/postgresql/data" exists but is not empty
It contains a lost+found directory, perhaps due to it being a mount point.
Using a mount point directly as the data directory is not recommended.
Create a subdirectory under the mount point.
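
One common workaround with the official postgres image is to point its PGDATA environment variable at a subdirectory of the mounted volume, so initdb does not see lost+found (in this walkthrough the persistent volume already points to the /mnt/shared/postgres subdirectory rather than the mount point itself). A sketch of the extra env entry that would go in the container spec:

        env:
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata   # subdirectory of the mount, avoids lost+found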

Load this new deployment by applying the YAML file with:

[root@server1 ~]# kubectl apply -f ~yjaquier/postgres.yaml

The pod should be re-deployed, and once it is running:

[root@server1 ~]# kubectl get pods
NAME                        READY   STATUS    RESTARTS        AGE
httpd-757fb56c8d-7cdj5      1/1     Running   4 (2d18h ago)   6d23h
nginx-6799fc88d8-xg5kd      1/1     Running   4 (2d18h ago)   7d
postgres-54d7bbfd6c-jzmk2   1/1     Running   0               66s

You should already see something created in /mnt/shared/postgres which is a good sign:

[root@server1 ~]# ll /mnt/shared/postgres/
total 78
drwx------ 6 systemd-coredump input  1848 Oct 21 12:50 base
drwx------ 2 systemd-coredump input  1848 Oct 21 12:51 global
drwx------ 2 systemd-coredump input  1848 Oct 21 12:50 pg_commit_ts
drwx------ 2 systemd-coredump input  1848 Oct 21 12:50 pg_dynshmem
-rw------- 1 systemd-coredump input  4821 Oct 21 12:50 pg_hba.conf
-rw------- 1 systemd-coredump input  1636 Oct 21 12:50 pg_ident.conf
drwx------ 4 systemd-coredump input  1848 Oct 21 12:55 pg_logical
drwx------ 4 systemd-coredump input  1848 Oct 21 12:50 pg_multixact
drwx------ 2 systemd-coredump input  1848 Oct 21 12:50 pg_notify
drwx------ 2 systemd-coredump input  1848 Oct 21 12:50 pg_replslot
drwx------ 2 systemd-coredump input  1848 Oct 21 12:50 pg_serial
drwx------ 2 systemd-coredump input  1848 Oct 21 12:50 pg_snapshots
drwx------ 2 systemd-coredump input  1848 Oct 21 12:50 pg_stat
drwx------ 2 systemd-coredump input  1848 Oct 21 12:58 pg_stat_tmp
drwx------ 2 systemd-coredump input  1848 Oct 21 12:50 pg_subtrans
drwx------ 2 systemd-coredump input  1848 Oct 21 12:50 pg_tblspc
drwx------ 2 systemd-coredump input  1848 Oct 21 12:50 pg_twophase
-rw------- 1 systemd-coredump input     3 Oct 21 12:50 PG_VERSION
drwx------ 3 systemd-coredump input  1848 Oct 21 12:50 pg_wal
drwx------ 2 systemd-coredump input  1848 Oct 21 12:50 pg_xact
-rw------- 1 systemd-coredump input    88 Oct 21 12:50 postgresql.auto.conf
-rw------- 1 systemd-coredump input 28835 Oct 21 12:50 postgresql.conf
-rw------- 1 systemd-coredump input    53 Oct 21 12:50 postmaster.opts
-rw------- 1 systemd-coredump input    94 Oct 21 12:50 postmaster.pid

PostgreSQL stateful deployment testing

I connect to the PostgreSQL database of my stateful pod, create a new table and insert a row into it:

[root@server1 ~]# kubectl exec -it postgres-54d7bbfd6c-jzmk2 -- /bin/bash
root@postgres-54d7bbfd6c-jzmk2:/# su - postgres
postgres@postgres-54d7bbfd6c-jzmk2:~$ psql --port=5433 --dbname=testdb
psql (14.0 (Debian 14.0-1.pgdg110+1))
Type "help" for help.

testdb=# create table test01(id integer, descr varchar(20));
CREATE TABLE
testdb=# insert into test01 values(1,'One');
INSERT 0 1
testdb=# select * from test01;
 id | descr
----+-------
  1 | One
(1 row)

testdb=#

Now let’s delete the pod to let Kubernetes schedule a new one:

[root@server1 ~]# kubectl get pods
NAME                        READY   STATUS    RESTARTS        AGE
httpd-757fb56c8d-7cdj5      1/1     Running   4 (2d19h ago)   7d1h
nginx-6799fc88d8-xg5kd      1/1     Running   4 (2d19h ago)   7d1h
postgres-54d7bbfd6c-jzmk2   1/1     Running   0               74m
[root@server1 ~]# kubectl delete pod postgres-54d7bbfd6c-jzmk2
pod "postgres-54d7bbfd6c-jzmk2" deleted

After a brief period a new pod has replaced the one we deleted:

[root@server1 ~]# kubectl get pods
NAME                        READY   STATUS    RESTARTS        AGE
httpd-757fb56c8d-7cdj5      1/1     Running   4 (2d20h ago)   7d1h
nginx-6799fc88d8-xg5kd      1/1     Running   4 (2d20h ago)   7d1h
postgres-54d7bbfd6c-jc6t7   1/1     Running   0               6s

Now let’s see if our table is still there:

[root@server1 ~]# kubectl exec -it postgres-54d7bbfd6c-jc6t7 -- /bin/bash
root@postgres-54d7bbfd6c-jc6t7:/# su - postgres
postgres@postgres-54d7bbfd6c-jc6t7:~$ psql --port=5433 --dbname=testdb
psql (14.0 (Debian 14.0-1.pgdg110+1))
Type "help" for help.

testdb=# select * from test01;
 id | descr
----+-------
  1 | One
(1 row)

testdb=#

Hurrah! My test table still exists and we have not lost anything, so the deployment really is stateful, which is the targeted scenario in the case of a database!

Other Persistent Volumes plugins trials

OpenEBS

OpenEBS (https://github.com/openebs/openebs) claims to be the most widely deployed and easiest to use open-source storage solution for Kubernetes. Promising on paper…

Following the installation guide I ran:

[root@server1 ~]# kubectl apply -f https://openebs.github.io/charts/openebs-operator.yaml
namespace/openebs created
serviceaccount/openebs-maya-operator created
clusterrole.rbac.authorization.k8s.io/openebs-maya-operator created
clusterrolebinding.rbac.authorization.k8s.io/openebs-maya-operator created
customresourcedefinition.apiextensions.k8s.io/blockdevices.openebs.io created
customresourcedefinition.apiextensions.k8s.io/blockdeviceclaims.openebs.io created
configmap/openebs-ndm-config created
daemonset.apps/openebs-ndm created
deployment.apps/openebs-ndm-operator created
deployment.apps/openebs-ndm-cluster-exporter created
service/openebs-ndm-cluster-exporter-service created
daemonset.apps/openebs-ndm-node-exporter created
service/openebs-ndm-node-exporter-service created
deployment.apps/openebs-localpv-provisioner created
storageclass.storage.k8s.io/openebs-hostpath created
storageclass.storage.k8s.io/openebs-device created

Get the storage classes with (kubectl api-resources gives the list of K8s resource types):

[root@server1 ~]# kubectl get sc
NAME               PROVISIONER        RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
openebs-device     openebs.io/local   Delete          WaitForFirstConsumer   false                  83s
openebs-hostpath   openebs.io/local   Delete          WaitForFirstConsumer   false                  83s

The pods created by OpenEBS:

[root@server1 ~]# kubectl get pods -n openebs
NAME                                            READY   STATUS             RESTARTS      AGE
openebs-localpv-provisioner-6756f57d65-5nbf9    1/1     Running            3 (38s ago)   3m21s
openebs-ndm-cluster-exporter-5c985f8b77-zvgm9   1/1     Running            3 (29s ago)   3m22s
openebs-ndm-lspxp                               1/1     Running            0             3m22s
openebs-ndm-node-exporter-qwr4s                 1/1     Running            0             3m21s
openebs-ndm-node-exporter-xrgdm                 0/1     CrashLoopBackOff   2 (27s ago)   3m21s
openebs-ndm-operator-9bdd87f58-t6nx5            0/1     Running            3 (22s ago)   3m22s
openebs-ndm-p77wg                               1/1     Running            0             3m22s

I had issues with the OpenEBS pods: several were constantly crashing and I have not been able to solve it…
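
When pods are stuck in CrashLoopBackOff like this, the usual first step is to look at the pod events and at the logs of the previous container instance; a sketch, reusing one of the pod names above:

[root@server1 ~]# kubectl describe pod openebs-ndm-node-exporter-xrgdm -n openebs
[root@server1 ~]# kubectl logs openebs-ndm-node-exporter-xrgdm -n openebs --previous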

Container Storage Interface (CSI) open-local

As the documentation states, the Container Storage Interface (CSI) is a standard for exposing storage to containers. Based on this generic specification you can find multiple drivers that implement various storage technologies. From the list of CSI drivers, one caught my attention for my fake attached shared disks: open-local.

To install open-local I had to install Helm:

[root@server1 ~]# curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
[root@server1 ~]# ll
total 16000
-rw-------. 1 root root     1897 Dec 10  2019 anaconda-ks.cfg
-rw-r--r--  1 root root   145532 Jul 15 14:20 certifi-2021.5.30-py2.py3-none-any.whl
-rw-r--r--  1 root root    35525 Jul 15 14:20 charset_normalizer-2.0.2-py3-none-any.whl
-rw-r--r--  1 root root       99 Jul 29 14:26 dashboard-adminuser.yaml
-rw-r--r--  1 root root      270 Jul 29 14:29 dashboard-authorization-adminuser.yaml
-rw-r--r--  1 root root    11248 Oct 22 14:20 get_helm.sh
-rwxr-xr-x  1 root root      187 Oct 21 16:40 getscsi
-rw-r--r--  1 root root     1566 Oct 14 16:34 httpd.yaml
-rw-r--r--  1 root root    59633 Jul 15 14:20 idna-3.2-py3-none-any.whl
-rw-r--r--  1 root root 15893915 Jul  7 23:56 minikube-latest.x86_64.rpm
-rw-r--r--  1 root root      216 Aug  2 16:08 nginx.yaml
-rw-r--r--  1 root root    62251 Jul 15 14:20 requests-2.26.0-py2.py3-none-any.whl
-rw-r--r--  1 root root   138494 Jul 15 14:20 urllib3-1.26.6-py2.py3-none-any.whl
[root@server1 ~]# chmod 700 get_helm.sh
[root@server1 ~]# ./get_helm.sh
Downloading https://get.helm.sh/helm-v3.7.1-linux-amd64.tar.gz
Verifying checksum... Done.
Preparing to install helm into /usr/local/bin
helm installed into /usr/local/bin/helm
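
A quick sanity check that the client is installed and usable:

[root@server1 ~]# helm version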

To be honest the English version of the open-local installation guide is not very easy to follow. It is not really spelled out, but you need a zip copy of the GitHub project that you unzip into a folder on your control node. Then customize the ./helm/values.yaml file to match your storage:

[root@server1 helm]# ll
total 8
-rw-r--r-- 1 root root  105 Oct 14 14:03 Chart.yaml
drwxr-xr-x 2 root root  275 Oct 14 14:03 crds
drwxr-xr-x 2 root root  275 Oct 14 14:03 templates
-rw-r--r-- 1 root root 1201 Oct 14 14:03 values.yaml
[root@server1 helm]# vi values.yaml
[root@server1 helm]# grep device: values.yaml
  device: /dev/sdb
[root@server1 open-local-main]# helm install open-local ./helm
NAME: open-local
LAST DEPLOYED: Fri Oct 22 14:37:25 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None

If you later want to add additional storage to the values.yaml file, you can then “upgrade” your open-local installation with:

[root@server1 open-local-main]# helm upgrade open-local ./helm
Release "open-local" has been upgraded. Happy Helming!
NAME: open-local
LAST DEPLOYED: Fri Oct 22 15:53:56 2021
NAMESPACE: default
STATUS: deployed
REVISION: 2
TEST SUITE: None
[root@server1 open-local-main]# helm list
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                   APP VERSION
open-local      default         2               2021-10-22 15:53:56.56525654 +0200 CEST deployed        open-local-v0.2.2

The installation went well:

[root@server1 ~]# kubectl get po -nkube-system  -l app=open-local -o wide
NAME                                              READY   STATUS      RESTARTS   AGE   IP               NODE                 NOMINATED NODE   READINESS GATES
open-local-agent-6d5mz                            3/3     Running     0          10m   192.168.56.102   server2.domain.com              
open-local-agent-kd49b                            3/3     Running     0          10m   192.168.56.101   server1.domain.com              
open-local-csi-provisioner-59cd8644ff-sv42x       1/1     Running     0          10m   192.168.56.102   server2.domain.com              
open-local-csi-resizer-554f54b5b4-6mzvz           1/1     Running     0          10m   192.168.56.102   server2.domain.com              
open-local-csi-snapshotter-64dff4b689-pq8r4       1/1     Running     0          10m   192.168.56.102   server2.domain.com              
open-local-init-job--1-59cnq                      0/1     Completed   0          10m   192.168.56.101   server1.domain.com              
open-local-init-job--1-84j9g                      0/1     Completed   0          10m   192.168.56.101   server1.domain.com              
open-local-init-job--1-nh6nn                      0/1     Completed   0          10m   192.168.56.101   server1.domain.com              
open-local-scheduler-extender-5d48bc465c-cj84l    1/1     Running     0          10m   192.168.56.101   server1.domain.com              
open-local-snapshot-controller-846c8f6578-qbqgx   1/1     Running     0          10m   192.168.56.102   server2.domain.com              
[root@server1 ~]# kubectl get nodelocalstorage
NAME                 STATE       PHASE     AGENTUPDATEAT   SCHEDULERUPDATEAT   SCHEDULERUPDATESTATUS
server1.domain.com   DiskReady   Running   6s              9m16s
server2.domain.com   DiskReady   Running   20s             8m49s

All the storage classes were created:

[root@server1 ~]# kubectl get sc
NAME                        PROVISIONER            RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
open-local-device-hdd       local.csi.aliyun.com   Delete          WaitForFirstConsumer   false                  11m
open-local-device-ssd       local.csi.aliyun.com   Delete          WaitForFirstConsumer   false                  11m
open-local-lvm              local.csi.aliyun.com   Delete          WaitForFirstConsumer   true                   11m
open-local-lvm-xfs          local.csi.aliyun.com   Delete          WaitForFirstConsumer   true                   11m
open-local-mountpoint-hdd   local.csi.aliyun.com   Delete          WaitForFirstConsumer   false                  11m
open-local-mountpoint-ssd   local.csi.aliyun.com   Delete          WaitForFirstConsumer   false                  11m

Before trying with my own PostgreSQL container I tried the provided Nginx example, more particularly the block one available in example/device/sts-block.yaml of the GitHub project. But it didn’t work and the Persistent Volume was not created:

[root@server1 ~]# kubectl get sts
NAME           READY   AGE
nginx-device   0/1     2d19h
[root@server1 ~]# kubectl get pods
NAME             READY   STATUS    RESTARTS   AGE
nginx-device-0   0/1     Pending   0          2d19h
[root@server1 ~]# kubectl get pvc
NAME                  STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS            AGE
html-nginx-device-0   Pending                                      open-local-device-hdd   2d19h
[root@server1 ~]# kubectl get pv
No resources found
[root@server1 ~]# kubectl describe pvc html-nginx-device-0
Name:          html-nginx-device-0
Namespace:     default
StorageClass:  open-local-device-hdd
Status:        Pending
Volume:
Labels:        app=nginx-device
Annotations:   volume.beta.kubernetes.io/storage-provisioner: local.csi.aliyun.com
               volume.kubernetes.io/selected-node: server2.domain.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Block
Used By:       nginx-device-0
Events:
  Type     Reason                Age                     From                                                                          Message
  ----     ------                ----                    ----                                                                          -------
  Normal   Provisioning          8m51s (x268 over 169m)  local.csi.aliyun.com_server2.domain.com_aa0fd726-4fe8-46b0-9ab6-2e669a1b052f  External provisioner is provisioning volume for claim "default/html-nginx-device-0"
  Normal   ExternalProvisioning  4m26s (x636 over 169m)  persistentvolume-controller                                                   waiting for a volume to be created, either by external provisioner "local.csi.aliyun.com" or manually created by system administrator
  Warning  ProvisioningFailed    4m16s (x271 over 168m)  local.csi.aliyun.com_server2.domain.com_aa0fd726-4fe8-46b0-9ab6-2e669a1b052f  failed to provision volume with StorageClass "open-local-device-hdd": rpc error: code = DeadlineExceeded desc = context deadline exceeded

References

The post Kubernetes on virtual machines hands-on – part 4 appeared first on IT World.

]]>
https://blog.yannickjaquier.com/linux/kubernetes-on-virtual-machines-hands-on-part-4.html/feed 0
Kubernetes on virtual machines hands-on – part 3 https://blog.yannickjaquier.com/linux/kubernetes-on-virtual-machines-hands-on-part-3.html https://blog.yannickjaquier.com/linux/kubernetes-on-virtual-machines-hands-on-part-3.html#comments Fri, 17 Jun 2022 07:36:44 +0000 https://blog.yannickjaquier.com/?p=5326 Preamble After the creation of our first Kubernetes Nginx stateless pod let see why stateless pod is an issue (not only) for database pod. In this third article I will focus on why you need stateful container for certain types of workload. As it has kept me busy for some time we will also see […]

The post Kubernetes on virtual machines hands-on – part 3 appeared first on IT World.

]]>

Table of contents

Preamble

After the creation of our first Kubernetes Nginx stateless pod, let's see why stateless pods are an issue for database pods (and not only for them). In this third article I will focus on why you need stateful containers for certain types of workload.

As it has kept me busy for some time, we will also see how to create the shared storage.

In the next blog post we will see which Kubernetes storage plugin to use to access it.

PostgreSQL stateless deployment creation

I will obviously use the official PostgreSQL image that can be found on Docker Hub:

kubernetes07

For this new deployment I have decided to create my own YAML file that I will be able to reuse over and over to add new functionalities. I started from the YAML of my Nginx deployment as a skeleton:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - image: postgres:latest
        name: postgres
        ports:
        - containerPort: 5433
      restartPolicy: Always
      schedulerName: default-scheduler

Let’s load this new deployment:

[root@server1 ~]# kubectl apply -f postgres.yaml
deployment.apps/postgres created
[root@server1 ~]# kubectl get deployment
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
httpd      1/1     1            1           22h
nginx      1/1     1            1           6d23h
postgres   0/1     1            0           16s

After a while my deployment failed:

[root@server1 ~]# kubectl get pod
NAME                        READY   STATUS              RESTARTS   AGE
httpd-757fb56c8d-7cdj5      1/1     Running             0          22h
nginx-6799fc88d8-xg5kd      1/1     Running             0          23h
postgres-74b5d46bcb-tvv8v   0/1     ContainerCreating   0          59s
[root@server1 ~]# kubectl get pod
NAME                        READY   STATUS    RESTARTS      AGE
httpd-757fb56c8d-7cdj5      1/1     Running   0             22h
nginx-6799fc88d8-xg5kd      1/1     Running   0             23h
postgres-74b5d46bcb-tvv8v   0/1     Error     2 (18s ago)   82s

Get the log with:

[root@server1 ~]# kubectl logs postgres-74b5d46bcb-tvv8v
Error: Database is uninitialized and superuser password is not specified.
       You must specify POSTGRES_PASSWORD to a non-empty value for the
       superuser. For example, "-e POSTGRES_PASSWORD=password" on "docker run".

       You may also use "POSTGRES_HOST_AUTH_METHOD=trust" to allow all
       connections without a password. This is *not* recommended.

       See PostgreSQL documentation about "trust":
       https://www.postgresql.org/docs/current/auth-trust.html

So I added this to my YAML file:

env:
- name: POSTGRES_PASSWORD
  value: secure_password
- name: POSTGRES_DB
  value: testdb

The options I have changed are the postgres superuser password with POSTGRES_PASSWORD, and I have requested the creation of a default database called testdb with POSTGRES_DB. The Docker Hub PostgreSQL page gives a clear explanation and a list of all possible environment variables.

Delete and re-deploy:

[root@server1 ~]# kubectl delete deployment postgres
deployment.apps "postgres" deleted
[root@server1 ~]# kubectl get deployment
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
httpd   1/1     1            1           22h
nginx   1/1     1            1           6d23h
[root@server1 ~]# kubectl apply -f postgres.yaml
deployment.apps/postgres created
[root@server1 ~]# kubectl get pod
NAME                        READY   STATUS    RESTARTS   AGE
httpd-757fb56c8d-7cdj5      1/1     Running   0          22h
nginx-6799fc88d8-xg5kd      1/1     Running   0          23h
postgres-6d7fcf96b5-gfpxf   1/1     Running   0          5s

You can now connect to the PostgreSQL database inside the pod with the below commands (we see that the port is not 5433 as expected but still the default 5432):

[root@server1 ~]# kubectl exec -it postgres-6d7fcf96b5-gfpxf -- /bin/bash
root@postgres-6d7fcf96b5-gfpxf:/# su - postgres
postgres@postgres-6d7fcf96b5-gfpxf:~$ psql
psql (14.0 (Debian 14.0-1.pgdg110+1))
Type "help" for help.

postgres=# \l
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
-----------+----------+----------+------------+------------+-----------------------
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 testdb    | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
(4 rows)

postgres=#

Or directly from the server running the pod, once I got the IP address of the pod:

[root@server2 ~]# psql --host=192.168.55.19 --port=5432 --username=postgres
Password for user postgres:
psql (13.4, server 14.0 (Debian 14.0-1.pgdg110+1))
WARNING: psql major version 13, server major version 14.
         Some psql features might not work.
Type "help" for help.

postgres=# \l
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
-----------+----------+----------+------------+------------+-----------------------
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 testdb    | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
(4 rows)

postgres=#

It finally appears that the - containerPort: 5433 setting is not providing the expected result. The solution to change the default port (5432) is:

args: ["-c", "port=5433"]
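
For reference, here is a hedged sketch of how the container section of the deployment could look once the environment variables and the port argument are combined (the password, database name and port are the illustrative values used above):

    spec:
      containers:
      - image: postgres:latest
        name: postgres
        args: ["-c", "port=5433"]          # make the postmaster listen on 5433 instead of the default 5432
        env:
        - name: POSTGRES_PASSWORD
          value: secure_password
        - name: POSTGRES_DB
          value: testdb
        ports:
        - containerPort: 5433              # now consistent with the port PostgreSQL actually listens on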

The stateless issue with database containers

Let's create a new table in my testdb database and insert a new row into it:

[root@server1 ~]# kubectl exec -it postgres-5594494b8f-2wsvh -- /bin/bash
root@postgres-5594494b8f-2wsvh:/# su - postgres
postgres@postgres-5594494b8f-2wsvh:~$ psql --port=5433
psql (14.0 (Debian 14.0-1.pgdg110+1))
Type "help" for help.

postgres=# \l
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
-----------+----------+----------+------------+------------+-----------------------
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 testdb    | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
(4 rows)

postgres=# \c testdb
You are now connected to database "testdb" as user "postgres".
testdb=# create table test01(id integer, descr varchar(20));
CREATE TABLE
testdb=# insert into test01 values(1,'One');
INSERT 0 1
testdb=# select * from test01;
 id | descr
----+-------
  1 | One
(1 row)

testdb=#

Now I delete the pod, as would happen in a real-life k8s cluster. The pod is automatically recreated by the deployment and I then try to get my test table figures:

[root@server1 ~]# kubectl get pod -o wide
NAME                        READY   STATUS    RESTARTS   AGE   IP              NODE                 NOMINATED NODE   READINESS GATES
httpd-757fb56c8d-7cdj5      1/1     Running   0          23h   192.168.55.17   server2.domain.com              
nginx-6799fc88d8-xg5kd      1/1     Running   0          24h   192.168.55.16   server2.domain.com              
postgres-5594494b8f-2wsvh   1/1     Running   0          17m   192.168.55.21   server2.domain.com              
[root@server1 ~]# kubectl delete pod postgres-5594494b8f-2wsvh
pod "postgres-5594494b8f-2wsvh" deleted
[root@server1 ~]# kubectl get pod -o wide
NAME                        READY   STATUS    RESTARTS   AGE   IP              NODE                 NOMINATED NODE   READINESS GATES
httpd-757fb56c8d-7cdj5      1/1     Running   0          23h   192.168.55.17   server2.domain.com              
nginx-6799fc88d8-xg5kd      1/1     Running   0          24h   192.168.55.16   server2.domain.com              
postgres-5594494b8f-p88h9   1/1     Running   0          5s    192.168.55.22   server2.domain.com              
[root@server1 ~]# kubectl exec -it postgres-5594494b8f-p88h9 -- /bin/bash
root@postgres-5594494b8f-p88h9:/# su - postgres
postgres@postgres-5594494b8f-p88h9:~$ psql --port=5433 --dbname=testdb
psql (14.0 (Debian 14.0-1.pgdg110+1))
Type "help" for help.

testdb=# select * from test01;
ERROR:  relation "test01" does not exist
LINE 1: select * from test01;
                      ^
testdb=#

Oops, well, as expected I would say: the information is gone, and in the case of a database this is clearly not acceptable. To make the content persistent we need to work a little more with persistent volumes and persistent volume claims.
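
To give an idea of the direction, here is a minimal, hedged sketch of what such a change could look like: a PersistentVolumeClaim (the postgres-data name, the local-storage class and the 1Gi size are assumptions of mine) mounted on the PostgreSQL data directory in the deployment. Pointing PGDATA to a subdirectory is a commonly used trick with the official image to avoid initdb complaining about a non-empty mount point:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  resources:
    requests:
      storage: 1Gi

And in the pod template of the postgres deployment:

    spec:
      containers:
      - image: postgres:latest
        name: postgres
        env:
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata
        volumeMounts:
        - name: postgres-storage
          mountPath: /var/lib/postgresql/data
      volumes:
      - name: postgres-storage
        persistentVolumeClaim:
          claimName: postgres-data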

Creation of the cluster filesystem between Kubernetes nodes

Kubernetes has a lot of persistent volume plugins available, allowing you to mount an incredible number of different storage types on your k8s nodes. On my trial k8s cluster made of two virtual machines I have decided to use a shared disk, the same as we have already seen in my Oracle Real Application Cluster (RAC) configuration trial. Once this shared storage and its cluster filesystem are created, the idea is to use the local Kubernetes storage plugin.
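
To make that idea concrete, a statically provisioned local PersistentVolume on top of the shared filesystem could look like the sketch below (the path, capacity, class name and node name are assumptions of mine, and a local PV must declare a nodeAffinity section). A claim such as the postgres-data one sketched above, using the same storageClassName, would then bind to it:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/shared/postgres          # directory on the shared filesystem, to be created beforehand
  nodeAffinity:                         # mandatory for local volumes
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - server2.domain.com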

Once I have attached the shared disk let’s create a new partition:

[root@server1 ~]# fdisk /dev/sdb

Welcome to fdisk (util-linux 2.32.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0xde5a83e7.

Command (m for help): n
Partition type
   p   primary (0 primary, 0 extended, 4 free)
   e   extended (container for logical partitions)
Select (default p):

Using default response p.
Partition number (1-4, default 1):
First sector (2048-2097151, default 2048):
Last sector, +sectors or +size{K,M,G,T,P} (2048-2097151, default 2097151):

Created a new partition 1 of type 'Linux' and of size 1023 MiB.

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

[root@server1 ~]# fdisk -l /dev/sdb
Disk /dev/sdb: 1 GiB, 1073741824 bytes, 2097152 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xde5a83e7

Device     Boot Start     End Sectors  Size Id Type
/dev/sdb1        2048 2097151 2095104 1023M 83 Linux
[root@server1 ~]# mkfs -t xfs /dev/sdb1
meta-data=/dev/sdb1              isize=512    agcount=4, agsize=65472 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=261888, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=1566, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

On the second node, to make the shared partition available, I had to trick fdisk a bit:

[root@server2 ~]# blkid /dev/sdb1
[root@server2 ~]# fdisk /dev/sdb

Welcome to fdisk (util-linux 2.32.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
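
As a side note, instead of tricking fdisk it may be enough to simply ask the kernel to re-read the partition table on the second node, for example with:

[root@server2 ~]# partprobe /dev/sdb
[root@server2 ~]# blkid /dev/sdb1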

Then, as I will not have Oracle ASM, I had to decide which shared filesystem to use…

GFS2

This cluster FS is sponsored by RedHat and installation is as simple as:

[root@server1 ~]# dnf install gfs2-utils.x86_64

Then create the FS with:

[root@server1 ~]# mkfs -t gfs2 -p lock_dlm -t cluster01:postgres -j 8 /dev/sdb1
It appears to contain an existing filesystem (xfs)
This will destroy any data on /dev/sdb1
Are you sure you want to proceed? [y/n] y
Discarding device contents (may take a while on large devices): Done
Adding journals: Done
Building resource groups: Done
Creating quota file: Done
Writing superblock and syncing: Done
Device:                    /dev/sdb1
Block size:                4096
Device size:               1.00 GB (261888 blocks)
Filesystem size:           1.00 GB (261886 blocks)
Journals:                  8
Journal size:              8MB
Resource groups:           12
Locking protocol:          "lock_dlm"
Lock table:                "cluster01:postgres"
UUID:                      90f456e8-cf74-43af-a838-53b129682f7d

But when I tried to mount it I got:

[root@server1 ~]# mount -a
mount: /mnt/shared: mount(2) system call failed: Transport endpoint is not connected.

From the RedHat official solution I discovered that the lock_dlm module was not loaded:

[root@server1 ~]# lsmod |grep lock
[root@server1 ~]# modprobe lock_dlm
modprobe: FATAL: Module lock_dlm not found in directory /lib/modules/4.18.0-305.19.1.el8_4.x86_64
[root@server1 ~]# dmesg | grep gfs
[ 8255.885092] gfs2: GFS2 installed
[ 8255.896751] gfs2: fsid=cluster01:postgres: Trying to join cluster "lock_dlm", "cluster01:postgres"
[ 8255.899671] gfs2: fsid=cluster01:postgres: dlm_new_lockspace error -107
[ 8291.542025] gfs2: fsid=cluster01:postgres: Trying to join cluster "lock_dlm", "cluster01:postgres"
[ 8291.542186] gfs2: fsid=cluster01:postgres: dlm_new_lockspace error -107
[ 8376.146357] gfs2: fsid=cluster01:postgres: Trying to join cluster "lock_dlm", "cluster01:postgres"
[ 8376.156197] gfs2: fsid=cluster01:postgres: dlm_new_lockspace error -107
[ 8442.132982] gfs2: fsid=cluster01:postgres: Trying to join cluster "lock_dlm", "cluster01:postgres"
[ 8442.137871] gfs2: fsid=cluster01:postgres: dlm_new_lockspace error -107
[12479.923651] gfs2: fsid=cluster01:postgres: Trying to join cluster "lock_dlm", "cluster01:postgres"
[12479.924713] gfs2: fsid=cluster01:postgres: dlm_new_lockspace error -107
[12861.644565] gfs2: fsid=cluster01:postgres: Trying to join cluster "lock_dlm", "cluster01:postgres"
[12861.644663] gfs2: fsid=cluster01:postgres: dlm_new_lockspace error -107
[13016.278584] gfs2: fsid=cluster01:postgres: Trying to join cluster "lock_dlm", "cluster01:postgres"
[13016.279004] gfs2: fsid=cluster01:postgres: dlm_new_lockspace error -107
[13042.852965] gfs2: fsid=cluster01:postgres: Trying to join cluster "lock_dlm", "cluster01:postgres"
[13042.866282] gfs2: fsid=cluster01:postgres: dlm_new_lockspace error -107
[13362.619425] gfs2: fsid=cluster01:postgres: Trying to join cluster "lock_dlm", "cluster01:postgres"
[13362.631850] gfs2: fsid=cluster01:postgres: dlm_new_lockspace error -107

I have tried to install the extra kernel modules:

[root@server1 ~]# dnf install kernel-modules-extra.x86_64

To finally realize that GFS2 is delivered through extra-cost add-ons in Red Hat Enterprise Linux, namely the High Availability Add-On for clustering and the Resilient Storage Add-On for GFS2. I thought the FS was free, but apparently GFS was free while GFS2 is not…

OCFS2

As I'm using Oracle Linux, OCFS2 sounds like a good idea, and OCFS2 is released under the GNU General Public License. Install it with:

[root@server1 ~]# dnf install ocfs2-tools.x86_64

On one of your nodes, create the OCFS2 cluster with:

[root@server1 ~]# o2cb add-cluster k8socfs2
[root@server1 ~]# o2cb add-node --ip 192.168.56.101 --port 7777 --number 1 k8socfs2 server1.domain.com
[root@server1 ~]# o2cb add-node --ip 192.168.56.102 --port 7777 --number 2 k8socfs2 server2.domain.com
[root@server1 ~]# o2cb register-cluster k8socfs2
[root@server1 ~]# o2cb start-heartbeat k8socfs2
[root@server1 ~]# cat /etc/ocfs2/cluster.conf
cluster:
        heartbeat_mode = local
        node_count = 2
        name = k8socfs2

node:
        number = 1
        cluster = k8socfs2
        ip_port = 7777
        ip_address = 192.168.56.101
        name = server1.domain.com

node:
        number = 2
        cluster = k8socfs2
        ip_port = 7777
        ip_address = 192.168.56.102
        name = server2.domain.com

Copy this configuration file (/etc/ocfs2/cluster.conf) to all nodes of your OCFS2 cluster.
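
Something like the below scp command could be used to push it to the second node (assuming direct root SSH access between the nodes):

[root@server1 ~]# scp /etc/ocfs2/cluster.conf root@server2.domain.com:/etc/ocfs2/cluster.conf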

Initialize the OCFS2 cluster stack (O2CB) with:

[root@server1 ~]# o2cb.init --help
Usage: /usr/sbin/o2cb.init {start|stop|restart|force-reload|enable|disable|configure|load|unload|online|offline|force-offline|status|online-status}
[root@server1 /]# o2cb.init configure
Configuring the O2CB driver.

This will configure the on-boot properties of the O2CB driver.
The following questions will determine whether the driver is loaded on
boot.  The current values will be shown in brackets ('[]').  Hitting
<ENTER> without typing an answer will keep that current value.  Ctrl-C
will abort.

Load O2CB driver on boot (y/n) [n]: y
Cluster stack backing O2CB [o2cb]:
Cluster to start on boot (Enter "none" to clear) [ocfs2]: k8socfs2
Specify heartbeat dead threshold (>=7) [31]:
Specify network idle timeout in ms (>=5000) [30000]:
Specify network keepalive delay in ms (>=1000) [2000]:
Specify network reconnect delay in ms (>=2000) [2000]:
Writing O2CB configuration: OK
checking debugfs...
Loading filesystem "ocfs2_dlmfs": Unable to load filesystem "ocfs2_dlmfs"
Failed
[root@server1 /]#
[root@server1 /]# lsmod |egrep -i "ocfs|o2"
[root@server1 /]# modprobe ocfs2_dlmfs
modprobe: FATAL: Module ocfs2_dlmfs not found in directory /lib/modules/4.18.0-305.19.1.el8_4.x86_64
[root@server1 /]# o2cb.init status
Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Driver for "ocfs2_dlmfs": Not loaded
Checking O2CB cluster "ociocfs2": Offline
stat: cannot read file system information for '/dlm': No such file or directory
Debug file system at /sys/kernel/debug: mounted

I realized that all the issues I had (including mount.ocfs2: Unable to access cluster service while trying to initialize cluster) were linked to the kernel I was using, which was not the Oracle UEK kernel. All issues were resolved the moment I switched to the Oracle UEK kernel!
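
If you need to check which kernel you are running and switch to UEK on Oracle Linux 8, something like the below should do it (a hedged sketch; package names and repositories may differ on your system):

[root@server1 ~]# uname -r                          # UEK kernels contain "uek" in their version string
[root@server1 ~]# dnf install -y kernel-uek         # install the UEK kernel if it is missing
[root@server1 ~]# grubby --info=ALL | grep ^kernel  # list the boot entries to check the default one
[root@server1 ~]# reboot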

Do not forget to start and enable the o2cb service with:

[root@server2 ~]# systemctl status o2cb
● o2cb.service - Load o2cb Modules
   Loaded: loaded (/usr/lib/systemd/system/o2cb.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
[root@server2 ~]# systemctl start o2cb
[root@server2 ~]# systemctl status o2cb
● o2cb.service - Load o2cb Modules
   Loaded: loaded (/usr/lib/systemd/system/o2cb.service; disabled; vendor preset: disabled)
   Active: active (exited) since Mon 2021-10-18 14:04:15 CEST; 1s ago
  Process: 73099 ExecStart=/sbin/o2cb.init enable (code=exited, status=0/SUCCESS)
 Main PID: 73099 (code=exited, status=0/SUCCESS)

Oct 18 14:04:14 server2.domain.com systemd[1]: Starting Load o2cb Modules...
Oct 18 14:04:15 server2.domain.com o2cb.init[73099]: checking debugfs...
Oct 18 14:04:15 server2.domain.com o2cb.init[73099]: Setting cluster stack "o2cb": OK
Oct 18 14:04:15 server2.domain.com o2cb.init[73099]: Cluster ociocfs2 already online
Oct 18 14:04:15 server2.domain.com systemd[1]: Started Load o2cb Modules.
[root@server2 ~]# systemctl enable o2cb
Created symlink /etc/systemd/system/multi-user.target.wants/o2cb.service → /usr/lib/systemd/system/o2cb.service.

Create the FS with:

[root@server1 ~]# man mkfs.ocfs2
[root@server1 ~]# mkfs -t ocfs2 --cluster-name=k8socfs2 --fs-feature-level=max-features --cluster-stack=o2cb -N 4 /dev/sdb1
mkfs.ocfs2 1.8.6
Cluster stack: o2cb
Cluster name: k8socfs2
Stack Flags: 0x0
NOTE: Feature extended slot map may be enabled
Overwriting existing ocfs2 partition.
Proceed (y/N): y
Label:
Features: sparse extended-slotmap backup-super unwritten inline-data strict-journal-super metaecc xattr indexed-dirs usrquota grpquota refcount discontig-bg
Block size: 2048 (11 bits)
Cluster size: 4096 (12 bits)
Volume size: 1072693248 (261888 clusters) (523776 blocks)
Cluster groups: 17 (tail covers 7936 clusters, rest cover 15872 clusters)
Extent allocator size: 4194304 (1 groups)
Journal size: 33554432
Node slots: 4
Creating bitmaps: done
Initializing superblock: done
Writing system files: done
Writing superblock: done
Writing backup superblock: 0 block(s)
Formatting Journals: done
Growing extent allocator: done
Formatting slot map: done
Formatting quota files: done
Writing lost+found: done
mkfs.ocfs2 successful

Get the UUID of the device with:

[root@server1 ~]# blkid /dev/sdb1
/dev/sdb1: UUID="ea6e9804-105d-4d4c-96e8-bd54ab5e93d2" BLOCK_SIZE="2048" TYPE="ocfs2" PARTUUID="de5a83e7-01"
[root@server1 ~]# echo "ea6e9804-105d-4d4c-96e8-bd54ab5e93d2" >> /etc/fstab
[root@server1 ~]# vi /etc/fstab
[root@server1 ~]# tail -n 2 /etc/fstab
# Shared storage
UUID="ea6e9804-105d-4d4c-96e8-bd54ab5e93d2"     /mnt/shared      ocfs2    defaults     0 0
[root@server1 ~]# mount -a
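
As a side note, for a cluster filesystem like OCFS2 it is commonly advised to add the _netdev mount option so that the mount waits for the network and the O2CB stack at boot time; this is an assumption on my side but the fstab line would then become:

UUID="ea6e9804-105d-4d4c-96e8-bd54ab5e93d2"     /mnt/shared      ocfs2    _netdev,defaults     0 0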

With Oracle UEK kernel I now have:

[root@server3 postgres]# o2cb.init status
Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Stack glue driver: Loaded
Stack plugin "o2cb": Loaded
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster "ociocfs2": Online
  Heartbeat dead threshold: 31
  Network idle timeout: 30000
  Network keepalive delay: 2000
  Network reconnect delay: 2000
  Heartbeat mode: Local
Checking O2CB heartbeat: Active
Debug file system at /sys/kernel/debug: mounted
[root@server3 ~]# df /mnt/shared
Filesystem     1K-blocks  Used Available Use% Mounted on
/dev/sdb1        1047552 78956    968596   8% /mnt/shared

And I can share files from all the nodes of my k8s cluster…
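
A quick way to verify the sharing is to create a file from one node and list it from the other one:

[root@server1 ~]# touch /mnt/shared/hello_from_server1
[root@server2 ~]# ls /mnt/shared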

I had an issue where the FS was automatically unmounted by the system:

Oct 18 15:10:50 server1 kernel: o2dlm: Joining domain E80332E942C649EB942623C43D2B35DC
Oct 18 15:10:50 server1 kernel: (
Oct 18 15:10:50 server1 kernel: 1
Oct 18 15:10:50 server1 kernel: ) 1 nodes
Oct 18 15:10:50 server1 kernel: ocfs2: Mounting device (8,17) on (node 1, slot 0) with ordered data mode.
Oct 18 15:10:50 server1 systemd[1]: mnt-postgres.mount: Unit is bound to inactive unit dev-disk-by\x2duuid-b7f61498\x2da4f0\x2d4570\x2da0ed\x2dcb50caa98165.device. Stopping, too.
Oct 18 15:10:50 server1 systemd[1]: Unmounting /mnt/shared...
Oct 18 15:10:50 server1 systemd[10949]: mnt-postgres.mount: Succeeded.

Solved with:

[root@server1 /]# systemctl daemon-reload

I also had an issue where o2cb was not able to register my cluster name:

[root@server2 ~]# o2cb register-cluster k8socfs2
o2cb: Internal logic failure while registering cluster 'k8socfs2'

Old trials had not been completely removed; I solved it with:

[root@server2 /]# o2cb cluster-status
Cluster 'ociocfs2' is online
[root@server2 /]# o2cb unregister-cluster ociocfs2

References

The post Kubernetes on virtual machines hands-on – part 3 appeared first on IT World.

]]>
https://blog.yannickjaquier.com/linux/kubernetes-on-virtual-machines-hands-on-part-3.html/feed 1
Kubernetes on virtual machines hands-on – part 2 https://blog.yannickjaquier.com/linux/kubernetes-on-virtual-machines-hands-on-part-2.html https://blog.yannickjaquier.com/linux/kubernetes-on-virtual-machines-hands-on-part-2.html#comments Sat, 28 May 2022 07:45:05 +0000 https://blog.yannickjaquier.com/?p=5278 Preamble After a first part where we have setup the Kubernetes cluster and added a second node, this second part will be about pods creation with direct download of the image and creation of YAML file to use them later to deploy and, more importantly, modify existing pod. This second part will mainly focus on […]

The post Kubernetes on virtual machines hands-on – part 2 appeared first on IT World.

]]>

Table of contents

Preamble

After a first part where we set up the Kubernetes cluster and added a second node, this second part will be about pod creation with direct download of the image, and about creating YAML files to reuse later to deploy and, more importantly, modify existing pods.

This second part will mainly focus on the setup of useful tools to handle YAML files and on the creation of stateless pods.

Stateless Nginx web server with a deployment

The suggested stateless application deployment to test your newly installed Kubernetes cluster is a simple Nginx web server through a Kubernetes deployment. Create it simply with the below command and wait a few minutes to have it ready (the number of replicas is 1 by default):

[root@server1 ~]# kubectl create deployment nginx --image=nginx
deployment.apps/nginx created
[root@server1 ~]# kubectl get deployment
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   0/1     1            0           9s
[root@server1 ~]# kubectl get deployment
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   1/1     1            1           104s

Confirm the pod is running with one replicaset. By default the node running it in my Kubernetes cluster is NOT the control node (so server2.domain.com in my case):

[root@server1 ~]# kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP             NODE                 NOMINATED NODE   READINESS GATES
nginx-6799fc88d8-zn2dm   1/1     Running   0          40s   192.168.55.2   server2.domain.com              
[root@server1 ~]# kubectl get replicaset
NAME               DESIRED   CURRENT   READY   AGE
nginx-6799fc88d8   1         1         1       6d22h

Expose the deployment as a service on port 80 and get the port on the k8s cluster of your deployment with:

[root@server1 ~]# kubectl expose deployment nginx --type=NodePort --port=80
service/nginx exposed
[root@server1 ~]# kubectl get service nginx
NAME    TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
nginx   NodePort   10.105.114.178   <none>        80:32609/TCP   177m

Then you can check that it works with your web browser (accessing my worker IP address, i.e. server2.domain.com, on port 32609):

kubernetes04

To get more information on your newly created service:

[root@server1 ~]# kubectl describe service nginx
Name:                     nginx
Namespace:                default
Labels:                   app=nginx
Annotations:              <none>
Selector:                 app=nginx
Type:                     NodePort
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.105.114.178
IPs:                      10.105.114.178
Port:                       80/TCP
TargetPort:               80/TCP
NodePort:                   32609/TCP
Endpoints:                192.168.55.2:80
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>
[root@server1 ~]# kubectl get ep nginx
NAME    ENDPOINTS         AGE
nginx   192.168.55.2:80   46m

You can also curl from inside the container with the exec command, using the cluster IP address, with something like:

[root@server1 ~]# kubectl exec nginx-6799fc88d8-zn2dm -- curl -s http://10.105.114.178
.
.
.

Accessing a container from outside the Kubernetes cluster

Even if my network expertise is limited, I really wanted to access the pod with its own IP address, i.e. 192.168.55.2. For this I started by modifying the routing table on my virtual machines' host, which is my Windows 10 desktop:

PS C:\WINDOWS\system32> route print
===========================================================================
Interface List
  6...0a 00 27 00 00 06 ......VirtualBox Host-Only Ethernet Adapter
  5...48 0f cf 33 0a 07 ......Intel(R) Ethernet Connection (2) I218-LM
  1...........................Software Loopback Interface 1
===========================================================================

IPv4 Route Table
===========================================================================
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0    10.70.101.254    10.70.101.129     35
      10.70.101.0    255.255.255.0         On-link     10.70.101.129    291
    10.70.101.129  255.255.255.255         On-link     10.70.101.129    291
    10.70.101.255  255.255.255.255         On-link     10.70.101.129    291
        127.0.0.0        255.0.0.0         On-link         127.0.0.1    331
        127.0.0.1  255.255.255.255         On-link         127.0.0.1    331
  127.255.255.255  255.255.255.255         On-link         127.0.0.1    331
     192.168.56.0    255.255.255.0         On-link      192.168.56.1    281
     192.168.56.1  255.255.255.255         On-link      192.168.56.1    281
   192.168.56.255  255.255.255.255         On-link      192.168.56.1    281
        224.0.0.0        240.0.0.0         On-link         127.0.0.1    331
        224.0.0.0        240.0.0.0         On-link     10.70.101.129    291
        224.0.0.0        240.0.0.0         On-link      192.168.56.1    281
  255.255.255.255  255.255.255.255         On-link         127.0.0.1    331
  255.255.255.255  255.255.255.255         On-link     10.70.101.129    291
  255.255.255.255  255.255.255.255         On-link      192.168.56.1    281
===========================================================================
Persistent Routes:
  None

IPv6 Route Table
===========================================================================
Active Routes:
 If Metric Network Destination      Gateway
  1    331 ::1/128                  On-link
  1    331 ff00::/8                 On-link
===========================================================================
Persistent Routes:
  None

With the initial configuration, any TRACERT.EXE 192.168.55.18 or TRACERT.EXE 192.168.55.1 would end up with something not answering…

So (as administrator), inspired by the 192.168.56.0/24 subnet configured by VirtualBox, I issued the below commands (which give a route to the 192.168.55.0/24 subnet). I also had to specify the interface to use the VirtualBox one (if option):

PS C:\WINDOWS\system32> route add 192.168.55.0 mask 255.255.255.0 192.168.55.1 if 6
 OK!
PS C:\WINDOWS\system32> route add 192.168.55.1 mask 255.255.255.255 192.168.56.1 if 6
 OK!
PS C:\WINDOWS\system32> route add 192.168.55.255 mask 255.255.255.255 192.168.56.1 if 6
 OK!

To remove what you added you can use:

PS C:\WINDOWS\system32> route delete 192.168.55.0
 OK!
PS C:\WINDOWS\system32> route delete 192.168.55.1
 OK!
PS C:\WINDOWS\system32> route delete 192.168.55.255
 OK!

To end up with this routing table:

PS C:\Users\yjaquier> route print
===========================================================================
Interface List
  6...0a 00 27 00 00 06 ......VirtualBox Host-Only Ethernet Adapter
  5...48 0f cf 33 0a 07 ......Intel(R) Ethernet Connection (2) I218-LM
  1...........................Software Loopback Interface 1
===========================================================================

IPv4 Route Table
===========================================================================
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0    10.70.101.254    10.70.101.129     35
      10.70.101.0    255.255.255.0         On-link     10.70.101.129    291
    10.70.101.129  255.255.255.255         On-link     10.70.101.129    291
    10.70.101.255  255.255.255.255         On-link     10.70.101.129    291
        127.0.0.0        255.0.0.0         On-link         127.0.0.1    331
        127.0.0.1  255.255.255.255         On-link         127.0.0.1    331
  127.255.255.255  255.255.255.255         On-link         127.0.0.1    331
     192.168.55.0    255.255.255.0     192.168.55.1     192.168.56.1     26
     192.168.55.1  255.255.255.255         On-link      192.168.56.1     26
   192.168.55.255  255.255.255.255         On-link      192.168.56.1     26
     192.168.56.0    255.255.255.0         On-link      192.168.56.1    281
     192.168.56.1  255.255.255.255         On-link      192.168.56.1    281
   192.168.56.255  255.255.255.255         On-link      192.168.56.1    281
        224.0.0.0        240.0.0.0         On-link         127.0.0.1    331
        224.0.0.0        240.0.0.0         On-link      192.168.56.1    281
        224.0.0.0        240.0.0.0         On-link     10.70.101.129    291
  255.255.255.255  255.255.255.255         On-link         127.0.0.1    331
  255.255.255.255  255.255.255.255         On-link      192.168.56.1    281
  255.255.255.255  255.255.255.255         On-link     10.70.101.129    291
===========================================================================
Persistent Routes:
  None

IPv6 Route Table
===========================================================================
Active Routes:
 If Metric Network Destination      Gateway
  1    331 ::1/128                  On-link
  1    331 ff00::/8                 On-link
===========================================================================
Persistent Routes:
  None

This has allowed access to the Nginx web server when fetching 192.168.55.2 from my desktop. I can also ping the IP address directly from my desktop (so outside of the cluster). I can access the nginx server from the server (server2.domain.com) running the pod but not from my controller node (server1.domain.com):

[root@server1 ~]# ping -c 1 192.168.55.2
PING 192.168.55.2 (192.168.55.2) 56(84) bytes of data.
From 192.168.55.1 icmp_seq=1 Destination Host Unreachable

--- 192.168.55.2 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
[root@server1 ~]# curl http://192.168.55.2
curl: (7) Failed to connect to 192.168.55.2 port 80: No route to host
[root@server2 ~]# ping -c 1 192.168.55.2
PING 192.168.55.2 (192.168.55.2) 56(84) bytes of data.
64 bytes from 192.168.55.2: icmp_seq=1 ttl=64 time=0.079 ms

--- 192.168.55.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.079/0.079/0.079/0.000 ms
[root@server2 ~]# curl 192.168.55.2
.
.
.
kubernetes05

One option to access it from outside the cluster is to enable port forwarding with the kubectl port-forward command, so using something like:

[root@server1 ~]# kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP             NODE                 NOMINATED NODE   READINESS GATES
nginx-6799fc88d8-zn2dm   1/1     Running   0          3h36m   192.168.55.2   server2.domain.com              
[root@server1 ~]# kubectl port-forward pod/nginx-6799fc88d8-zn2dm 8080:80
Forwarding from 127.0.0.1:8080 -> 80
Forwarding from [::1]:8080 -> 80
[root@server1 ~]# kubectl port-forward pod/nginx-6799fc88d8-zn2dm :80
Forwarding from 127.0.0.1:40859 -> 80
Forwarding from [::1]:40859 -> 80
[root@server1 ~]# kubectl port-forward --address 0.0.0.0 pod/nginx-6799fc88d8-zn2dm 8080:80
Forwarding from 0.0.0.0:8080 -> 80

With the last command, on any IP address of the controller node (server1.domain.com) and port 8080, I can access my Nginx server:

kubernetes06

Overall this part is not fully clear to me and I really need to progress in this Kubernetes area…

How to scale a pod with ReplicaSet

For example if I scale my Nginx application to 3 pods:

[root@server1 ~]# kubectl get deployment
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   1/1     1            1           3d5h
[root@server1 ~]# kubectl scale --replicas=3 deployment/nginx
deployment.apps/nginx scaled
[root@server1 ~]# kubectl get deployment
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   1/3     3            1           3d5h
[root@server1 ~]# kubectl get deployment
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   3/3     3            3           3d5h
[root@server1 ~]# kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS        AGE    IP             NODE                 NOMINATED NODE   READINESS GATES
nginx-6799fc88d8-g8v2v   1/1     Running   0               20s    192.168.55.4   server2.domain.com              
nginx-6799fc88d8-zhhsn   1/1     Running   0               20s    192.168.55.5   server2.domain.com              
nginx-6799fc88d8-zn2dm   1/1     Running   1 (2d23h ago)   3d5h   192.168.55.3   server2.domain.com              

If I delete (or kill) one pod then automatically a new one is created:

[root@server1 ~]# kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS        AGE     IP             NODE                 NOMINATED NODE   READINESS GATES
nginx-6799fc88d8-g8v2v   1/1     Running   0               6m16s   192.168.55.4   server2.domain.com              
nginx-6799fc88d8-zhhsn   1/1     Running   0               6m16s   192.168.55.5   server2.domain.com              
nginx-6799fc88d8-zn2dm   1/1     Running   1 (2d23h ago)   3d5h    192.168.55.3   server2.domain.com              
[root@server1 ~]# kubectl delete pod nginx-6799fc88d8-zhhsn
pod "nginx-6799fc88d8-zhhsn" deleted
[root@server1 ~]# kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS        AGE     IP             NODE                 NOMINATED NODE   READINESS GATES
nginx-6799fc88d8-9pbcf   1/1     Running   0               7s      192.168.55.6   server2.domain.com              
nginx-6799fc88d8-g8v2v   1/1     Running   0               6m40s   192.168.55.4   server2.domain.com              
nginx-6799fc88d8-zn2dm   1/1     Running   1 (2d23h ago)   3d5h    192.168.55.3   server2.domain.com              

To see if my pods would also go to my controller node I authorized pod creation on it, because by default it is forbidden:

[root@server1 ~]# kubectl taint nodes --all node-role.kubernetes.io/master-
node/server1.domain.com untainted
error: taint "node-role.kubernetes.io/master" not found

To be honest none of my pods went to my master node, and if you dig a bit on the Internet you will see that pod allocation on nodes is a recurring issue for people. To come back to the original situation simply do:

[root@server1 ~]# kubectl taint nodes server1.domain.com node-role.kubernetes.io/master=:NoSchedule
node/server1.domain.com tainted
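
To check which taints are currently set on a node you can use something like:

[root@server1 ~]# kubectl describe node server1.domain.com | grep -i taints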

Move pods from one node to another

Once I had added this additional node I wanted to move the Nginx pod we created above off my control node. I thought it would be a simple command but I was quite wrong… Currently it is not possible to move pods on the fly from one node to another. The only available option is to re-schedule the pod and use node affinity, with labels, to force a pod to run on a given node, as sketched below.
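
For illustration, a minimal sketch of such a constraint, assuming a hypothetical role=web label (both the label and its value are illustrative). First label the target node:

[root@server1 ~]# kubectl label node server2.domain.com role=web

Then add a nodeSelector to the pod template spec of the deployment:

    spec:
      nodeSelector:
        role: web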

I’m not yet at this level but I have seen plenty of blog posts of people complaining that in their cluster some nodes are highly used while some others are almost idle and they have no option to solve the situation…

How to sanitize your deployment YAML files

To create a deployment or a pod, or to modify existing resources, you often create a YAML file from an existing resource. You can also create this YAML file directly from scratch, and the official k8s documentation is full of examples. One issue I immediately saw is the verbosity of the YAML file generated from an existing resource with a command like:

[root@server1 ~]# kubectl get deployment
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
httpd   1/1     1            1           21h
nginx   1/1     1            1           6d22h
[root@server1 ~]# kubectl get deployment httpd -o yaml

When you have done it you realize that you need a tool to sanitize those generated YAML files, because they are really far from the lean files we see in the official documentation. One tool that often comes up in discussions is kubectl-neat. To implement it, first start by installing krew with the below commands (git is a prerequisite):

[root@server1 ~]# (
>   set -x; cd "$(mktemp -d)" &&
>   OS="$(uname | tr '[:upper:]' '[:lower:]')" &&
>   ARCH="$(uname -m | sed -e 's/x86_64/amd64/' -e 's/\(arm\)\(64\)\?.*/\1\2/' -e 's/aarch64$/arm64/')" &&
>   KREW="krew-${OS}_${ARCH}" &&
>   curl -fsSLO "https://github.com/kubernetes-sigs/krew/releases/latest/download/${KREW}.tar.gz" &&
>   tar zxvf "${KREW}.tar.gz" &&
>   ./"${KREW}" install krew
> )
++ mktemp -d
+ cd /tmp/tmp.mjm5SmGWMR
++ uname
++ tr '[:upper:]' '[:lower:]'
+ OS=linux
++ uname -m
++ sed -e s/x86_64/amd64/ -e 's/\(arm\)\(64\)\?.*/\1\2/' -e 's/aarch64$/arm64/'
+ ARCH=amd64
+ KREW=krew-linux_amd64
+ curl -fsSLO https://github.com/kubernetes-sigs/krew/releases/latest/download/krew-linux_amd64.tar.gz
+ tar zxvf krew-linux_amd64.tar.gz
./LICENSE
./krew-linux_amd64
+ ./krew-linux_amd64 install krew
Adding "default" plugin index from https://github.com/kubernetes-sigs/krew-index.git.
Updated the local copy of plugin index.
Installing plugin: krew
Installed plugin: krew
\
 | Use this plugin:
 |      kubectl krew
 | Documentation:
 |      https://krew.sigs.k8s.io/
 | Caveats:
 | \
 |  | krew is now installed! To start using kubectl plugins, you need to add
 |  | krew's installation directory to your PATH:
 |  |
 |  |   * macOS/Linux:
 |  |     - Add the following to your ~/.bashrc or ~/.zshrc:
 |  |         export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"
 |  |     - Restart your shell.
 |  |
 |  |   * Windows: Add %USERPROFILE%\.krew\bin to your PATH environment variable
 |  |
 |  | To list krew commands and to get help, run:
 |  |   $ kubectl krew
 |  | For a full list of available plugins, run:
 |  |   $ kubectl krew search
 |  |
 |  | You can find documentation at
 |  |   https://krew.sigs.k8s.io/docs/user-guide/quickstart/.
 | /
/

Check that it works with:

[root@server1 ~]# export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"
[root@server1 ~]# kubectl krew
krew is the kubectl plugin manager.
You can invoke krew through kubectl: "kubectl krew [command]..."

Usage:
  kubectl krew [command]

Available Commands:
  completion  generate the autocompletion script for the specified shell
  help        Help about any command
  index       Manage custom plugin indexes
  info        Show information about an available plugin
  install     Install kubectl plugins
  list        List installed kubectl plugins
  search      Discover kubectl plugins
  uninstall   Uninstall plugins
  update      Update the local copy of the plugin index
  upgrade     Upgrade installed plugins to newer versions
  version     Show krew version and diagnostics

Flags:
  -h, --help      help for krew
  -v, --v Level   number for the log level verbosity

Use "kubectl krew [command] --help" for more information about a command.

Install kubectl-neat with:

[root@server1 ~]# kubectl krew install neat
Updated the local copy of plugin index.
Installing plugin: neat
Installed plugin: neat
\
 | Use this plugin:
 |      kubectl neat
 | Documentation:
 |      https://github.com/itaysk/kubectl-neat
/
WARNING: You installed plugin "neat" from the krew-index plugin repository.
   These plugins are not audited for security by the Krew maintainers.
   Run them at your own risk.

Then, if you would like to get a skeleton of a pod to create a similar one, or simply extract a clean YAML pod file from a running pod to modify it, you would do something like:

[root@server1 ~]# kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
httpd-757fb56c8d-7cdj5   1/1     Running   0          20h
nginx-6799fc88d8-xg5kd   1/1     Running   0          21h
[root@server1 ~]# kubectl get pod httpd-757fb56c8d-7cdj5 -o yaml

The problem is that this extracted YAML file contains a lot of extra information; to remove all this redundant extra information, use kubectl-neat with something like:

[root@server1 ~]# kubectl get pod httpd-757fb56c8d-7cdj5 -o yaml | kubectl neat
[root@server1 ~]# kubectl neat get pod httpd-757fb56c8d-7cdj5 -o yaml
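
A typical workflow is then to pipe the cleaned output to a file, edit it and re-apply it, for example:

[root@server1 ~]# kubectl get deployment httpd -o yaml | kubectl neat > httpd.yaml
[root@server1 ~]# vi httpd.yaml
[root@server1 ~]# kubectl apply -f httpd.yaml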

Useful commands

Access a pod:

[root@server1 ~]# kubectl exec -it nginx-6799fc88d8-tdh4p -- /bin/bash
root@nginx-6799fc88d8-tdh4p:/# ls -l /usr/share/nginx/html
total 8
-rw-r--r-- 1 root root 494 Jul  6 14:59 50x.html
-rw-r--r-- 1 root root 612 Jul  6 14:59 index.html

Delete a deployment:

[root@server1 ~]# kubectl get deployment
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   1/1     1            1           3d5h
[root@server1 ~]# kubectl delete deployment nginx
deployment.apps "nginx" deleted

Print the supported API resources on the server:

[root@server1 ~]# kubectl api-resources

References

The post Kubernetes on virtual machines hands-on – part 2 appeared first on IT World.

]]>
https://blog.yannickjaquier.com/linux/kubernetes-on-virtual-machines-hands-on-part-2.html/feed 1
Kubernetes on virtual machines hands-on – part 1 https://blog.yannickjaquier.com/linux/kubernetes-on-virtual-machines-hands-on-part-1.html https://blog.yannickjaquier.com/linux/kubernetes-on-virtual-machines-hands-on-part-1.html#comments Thu, 28 Apr 2022 06:55:50 +0000 https://blog.yannickjaquier.com/?p=5267 Preamble Before describing Kubernetes (K8s for short) let’s briefly see what is container technology. Container is similar to virtualization technology, such a VMWare, but they are lightweight as the underlining operating system is shared amongst containers. Same as VM resource allocation (filesystem, CPU, memory, ..) is controlled to avoid having one container eating all resources […]

The post Kubernetes on virtual machines hands-on – part 1 appeared first on IT World.

]]>

Table of contents

Preamble

Before describing Kubernetes (K8s for short) let's briefly see what container technology is. Containers are similar to virtualization technology, such as VMware, but they are lightweight as the underlying operating system is shared amongst containers. As with VMs, resource allocation (filesystem, CPU, memory, ...) is controlled to avoid having one container eating all the resources and impacting the others. Containers are also portable across OSes and different cloud providers.

Container technology is not something new, and I personally started to hear a lot about containers with the rise of Docker, even if nowadays you have plenty of products that compete with Docker: Podman, containerd and CRI-O to name a few. Even if you have not created containers on your own, you might have tested an application that was containerized, and if you had the underlying infrastructure you have experienced how easy it is to deploy a container and to use the application almost immediately without the burden of configuration.

Once you have all those application containers running, how do you manage them? Kubernetes of course! Kubernetes is an open source platform to manage containerized workloads and services. Examples of tasks handled by Kubernetes are scaling, managing downtime and much more…

My ultimate goal, as you might guess, is to create a container running a database (I have already tested SQL Server in a container) and to create what we call a stateful container. It means that the container has persistent storage: of course you do not want to lose the content of your database in case of a container crash. This first article will focus only on stateless containers, typically a web server where you do not mind losing the content…

For my testing I have used two virtual machines (with VirtualBox) running Oracle Linux Server release 8.4. Kubernetes version is 1.22.2 and Docker is 20.10.9.

This first part will be “just” about creating the cluster with the first master node, as well as adding a second worker node to handle the workload…

Kubernetes installation

I have used a virtual machine under VirtualBox running Oracle Linux 8.4. One important configuration is to activate the virtualization feature and the nested virtualization feature (only recently made available in VirtualBox) with:

kubernetes01

You can confirm the nested virtualization feature is active with:

[root@server1 ~]# grep -E --color 'vmx|svm' /proc/cpuinfo
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm invpcid_single pti tpr_shadow vnmi flexpriority vpid fsgsbase avx2 invpcid md_clear flush_l1d
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm invpcid_single pti tpr_shadow vnmi flexpriority vpid fsgsbase avx2 invpcid md_clear flush_l1d

Add the Kubernetes repository with:

[root@server1 ~]# cat << EOF > /etc/yum.repos.d/kubernetes.repo
  > [kubernetes]
  > name=Kubernetes
  > baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
  > enabled=1
  > gpgcheck=1
  > repo_gpgcheck=1
  > gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
  > EOF

To download packages, if like me you are behind a corporate proxy, configure it with a .wgetrc file in the home folder of your account (root in my case). I have also added check_certificate=off to free me from certificates that are not verified by my proxy server (the literature mentions either check-certificate or check_certificate, so I guess both work):

[root@server1 ~]# cat .wgetrc
use_proxy=on
https_proxy=http://proxy_server:proxy_port/
http_proxy=http://proxy_server:proxy_port/
proxy_user=proxy_account
proxy_password=proxy_password
check_certificate=off
no_proxy=192.168.0.0/16,10.0.0.0/8

You also need to configure the dnf package manager to go through your proxy by adding this to the /etc/dnf/dnf.conf file:

# The proxy server - proxy server:port number
proxy=http://proxy_server:proxy_port
# The account details for yum connections
proxy_username=proxy_account
proxy_password=proxy_password
sslverify=False

Starting from the Kubernetes getting started web page I asked myself where to start. I initially tried with minikube, but if like me you really start with your own virtual machines and plan to add one or more workers to your Kubernetes instance, then it is a cluster managed by yourself and you should use kubeadm! Install kubectl, kubeadm and kubelet with:

[root@server1 ~]# dnf install -y kubelet kubeadm kubectl --disableexcludes=kubernetes

Enable kubelet service with:

[root@server1 ~]# systemctl enable kubelet.service
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /usr/lib/systemd/system/kubelet.service.

Deactivate swap as requested:

[root@server1 ~]# cat /etc/fstab | grep swap
#/dev/mapper/vg00-swap   swap                    swap    defaults        0 0
[root@server1 ~]# swapoff -a
[root@server1 ~]# free -h
              total        used        free      shared  buff/cache   available
Mem:          7.6Gi       311Mi       5.8Gi       8.0Mi       1.5Gi       7.0Gi
Swap:            0B          0B          0B

Network prerequisites for container runtime:

[root@server1 ~]# cat /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
[root@server1 ~]# sysctl --system
[root@server1 ~]# cat /etc/modules-load.d/k8s.conf
overlay
br_netfilter
[root@server1 ~]# modprobe overlay
[root@server1 ~]# modprobe br_netfilter

SELinux was already deactivated on all my nodes…

I have also tried to choose a container runtime other than Docker. I tried using Podman (which I already use widely) and its podman-docker package to have Docker-like commands, but I was never able to configure my Kubernetes cluster that way. I also tried with containerd only; same story, all trials went wrong.

A few errors encountered and their solutions:

  1. Podman, even with the Docker compatibility package, is not really compatible with Kubernetes, particularly because the Docker service is not there. So use command line options to bypass the false errors: --ignore-preflight-errors=all or, to fine tune, --ignore-preflight-errors IsDockerSystemdCheck,SystemVerification,Service-Docker
  2. [server.go:629] “Failed to get the kubelet’s cgroup. Kubelet system container metrics may be missing.” err=”cpu and memory cgroup hierarchy not unified.
    I solved this one by installing libcgroup with dnf install libcgroup
  3. [WARNING HTTPProxy]: Connection to “https://192.168.56.101” uses proxy “http://proxy_user:proxy_password@proxy_server:proxy_port”. If that is not intended, adjust your proxy settings
    I solved it by exporting this variable with export NO_PROXY=192.168.0.0/16,10.0.0.0/8
  4. [WARNING FileExisting-tc]: tc not found in system path
    Solved this one by installing iproute-tc with dnf install iproute-tc

At each run to purge previous configuration file use:

[root@server1 ~]# kubeadm reset

Sometimes it worked, but when looking in the log files I had many errors, particularly with the kubelet service (if you have not yet played with the kubeadm command, the kubelet service simply stays in activating (auto-restart)):

[root@server1 ~]# vi /var/log/messages
[root@server1 ~]# systemctl status kubelet
[root@server1 ~]# journalctl -xeu kubelet

I finally installed docker-ce; to be honest everything worked like a charm from this point:

[root@server1 ~]# dnf config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
Adding repo from: https://download.docker.com/linux/centos/docker-ce.repo
[root@server1 ~]# dnf remove -y buildah runc
[root@server1 ~]# dnf install -y containerd.io docker-ce
[root@server1 ~]# systemctl enable docker
[root@server1 ~]# systemctl daemon-reload
[root@server1 ~]# systemctl restart docker

And I configured Docker for proxy access and to use systemd for the management of the containers' cgroups:

[root@server1 ~]# cat << EOF > /etc/systemd/system/docker.service.d/https-proxy.conf
> [Service]
> Environment="HTTPS_PROXY=http://proxy_account:proxy_password@proxy_server:proxy_port"
EOF

[root@server1 ~]# mkdir /etc/docker
[root@server1 ~]# cat << EOF > /etc/docker/daemon.json
> {
>   "exec-opts": ["native.cgroupdriver=systemd"],
>   "log-driver": "json-file",
>   "log-opts": {
>     "max-size": "100m"
>   },
>   "storage-driver": "overlay2"
> }
EOF
[root@server1 ~]# systemctl daemon-reload
[root@server1 ~]# systemctl restart docker
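
As a sanity check you can confirm that the proxy drop-in is loaded and that Docker really uses the systemd cgroup driver (the second command should return systemd):

[root@server1 ~]# systemctl show docker --property=Environment
[root@server1 ~]# docker info --format '{{.CgroupDriver}}'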

I have anyway benefited from my previous trials and ended up with this kubeadm init command (I have decided to use a different subnet, 192.168.55.0/24, for the pod network):

[root@server1 ~]# export HTTPS_PROXY='http://proxy_account:proxy_password@proxy_server:proxy_port'
[root@server1 ~]# export NO_PROXY=192.168.0.0/16,10.0.0.0/8
[root@server1 ~]# kubeadm init --apiserver-advertise-address 192.168.56.101 --pod-network-cidr 192.168.55.0/24
[init] Using Kubernetes version: v1.22.2
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local server1.domain.com] and IPs [10.96.0.1 192.168.56.101]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost server1.domain.com] and IPs [192.168.56.101 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost server1.domain.com] and IPs [192.168.56.101 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 24.005381 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.22" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node server1.domain.com as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node server1.domain.com as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: wa7cwx.pml1pqv2i9tnhqkf
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.56.101:6443 --token wa7cwx.pml1pqv2i9tnhqkf \
        --discovery-token-ca-cert-hash sha256:83352a4e1e18e59b4e5a453c1d8573b1fcd718982e0e398741d9182a966472fa

Remark:
If you have lost the join command you can regenerate it with:

[root@server1 ~]# kubeadm token create --print-join-command --v=5

When required to install a Pod network add-on I have chosen Flannel. To be able to download it from the Internet you might need to export your proxy configuration if, like me, you are behind a corporate proxy (KUBECONFIG can be put directly in the profile of your root account):

[root@server1 ~]# export KUBECONFIG=/etc/kubernetes/admin.conf
[root@server1 ~]# export HTTPS_PROXY='http://proxy_account:proxy_password@proxy_server:proxy_port'
[root@server1 ~]# export NO_PROXY=192.168.0.0/16,10.0.0.0/8
[root@server1 ~]# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
[root@server1 ~]# kubectl cluster-info
Kubernetes control plane is running at https://192.168.56.101:6443
CoreDNS is running at https://192.168.56.101:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
[root@server1 ~]# kubectl get nodes
NAME                 STATUS   ROLES                  AGE   VERSION
server1.domain.com   Ready    control-plane,master   82m   v1.21.3
[root@server1 ~]# kubectl get pods --all-namespaces
NAMESPACE     NAME                                         READY   STATUS    RESTARTS   AGE
kube-system   coredns-78fcd69978-8mtv9                     1/1     Running   0          87s
kube-system   coredns-78fcd69978-d5vpz                     1/1     Running   0          87s
kube-system   etcd-server1.domain.com                      1/1     Running   2          98s
kube-system   kube-apiserver-server1.domain.com            1/1     Running   0          103s
kube-system   kube-controller-manager-server1.domain.com   1/1     Running   0          98s
kube-system   kube-flannel-ds-4lpk4                        1/1     Running   0          33s
kube-system   kube-proxy-9c2pr                             1/1     Running   0          87s
kube-system   kube-scheduler-server1.domain.com            1/1     Running   0          98s

Kubernetes Web UI dashboard

From the Kubernetes list of add-ons I have chosen to install the web UI dashboard, as it always helps to have a graphical interface to manage things, even if at the end of the day you mainly work from the command line:

[root@server1 ~]# kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.3.1/aio/deploy/recommended.yaml
namespace/kubernetes-dashboard created
serviceaccount/kubernetes-dashboard created
service/kubernetes-dashboard created
secret/kubernetes-dashboard-certs created
secret/kubernetes-dashboard-csrf created
secret/kubernetes-dashboard-key-holder created
configmap/kubernetes-dashboard-settings created
role.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrole.rbac.authorization.k8s.io/kubernetes-dashboard created
rolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
deployment.apps/kubernetes-dashboard created
service/dashboard-metrics-scraper created
Warning: spec.template.metadata.annotations[seccomp.security.alpha.kubernetes.io/pod]: deprecated since v1.19; use the "seccompProfile" field instead
deployment.apps/dashboard-metrics-scraper created

I had an issue and the pods did not come up, so I investigated with:

[root@server1 ~]# kubectl describe pods --namespace=kubernetes-dashboard
Name:         dashboard-metrics-scraper-856586f554-lj429
Namespace:    kubernetes-dashboard
Priority:     0
Node:         server1.domain.com/192.168.56.101
Start Time:   Mon, 26 Jul 2021 15:53:35 +0200
Labels:       k8s-app=dashboard-metrics-scraper
              pod-template-hash=856586f554
.
.
.
Events:
  Type     Reason   Age                    From     Message
  ----     ------   ----                   ----     -------
  Warning  Failed   10m (x19 over 104m)    kubelet  Failed to pull image "kubernetesui/dashboard:v2.3.1": rpc error: code = Unknown desc = context canceled
  Normal   Pulling  5m19s (x20 over 105m)  kubelet  Pulling image "kubernetesui/dashboard:v2.3.1"
  Normal   BackOff  26s (x355 over 104m)   kubelet  Back-off pulling image "kubernetesui/dashboard:v2.3.1"

And pulled the Docker image manually with:

[root@server1 ~]# docker pull kubernetesui/dashboard:v2.3.1
v2.3.1: Pulling from kubernetesui/dashboard
b82bd84ec244: Pull complete
21c9e94e8195: Pull complete
Digest: sha256:ec27f462cf1946220f5a9ace416a84a57c18f98c777876a8054405d1428cc92e
Status: Downloaded newer image for kubernetesui/dashboard:v2.3.1
docker.io/kubernetesui/dashboard:v2.3.1

Finally the dashboard pods were up and running:

[root@server1 ~]# kubectl get pods --namespace=kubernetes-dashboard
NAME                                         READY   STATUS    RESTARTS   AGE
dashboard-metrics-scraper-856586f554-lj429   1/1     Running   1          2d21h
kubernetes-dashboard-67484c44f6-h6zwl        1/1     Running   1          2d21h

Then accessing this dashboard remotely (from my desktop running VirtualBox) has not been that simple. First, as explained in the official documentation, I used the kubectl -n kubernetes-dashboard edit service kubernetes-dashboard command to expose an external port (valid only in a development spirit) and found it with:

[root@server1 ~]# kubectl -n kubernetes-dashboard get pod
NAME                                         READY   STATUS    RESTARTS   AGE
dashboard-metrics-scraper-856586f554-lj429   1/1     Running   1          2d20h
kubernetes-dashboard-67484c44f6-h6zwl        1/1     Running   1          2d20h
[root@server1 ~]# kubectl -n kubernetes-dashboard get service kubernetes-dashboard
NAME                   TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
kubernetes-dashboard   NodePort   10.110.133.71   <none>        443:30736/TCP   2d20h
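
As a side note, the same development-only NodePort exposure can be obtained non-interactively with a patch instead of the interactive edit (a one-liner sketch, same end result); the dashboard is then reachable from the desktop at the randomly assigned NodePort, in my case https://192.168.56.101:30736:

[root@server1 ~]# kubectl -n kubernetes-dashboard patch service kubernetes-dashboard -p '{"spec":{"type":"NodePort"}}'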

Accessing this URL I get the login page, unfortunately not a simple user/password form, so a bit more work is required:

[screenshot kubernetes02: dashboard login page]

So let’s create this account, get its token and log in:

[root@server1 ~]# cat << EOF > dashboard-adminuser.yaml
> apiVersion: v1
> kind: ServiceAccount
> metadata:
>   name: admin-user
>   namespace: kubernetes-dashboard
> EOF
[root@server1 ~]# kubectl apply -f dashboard-adminuser.yaml
serviceaccount/admin-user created
[root@server1 ~]# cat << EOF > dashboard-authorization-adminuser.yaml
> apiVersion: rbac.authorization.k8s.io/v1
> kind: ClusterRoleBinding
> metadata:
>   name: admin-user
> roleRef:
>   apiGroup: rbac.authorization.k8s.io
>   kind: ClusterRole
>   name: cluster-admin
> subjects:
> - kind: ServiceAccount
>   name: admin-user
>   namespace: kubernetes-dashboard
> EOF
[root@server1 ~]# kubectl apply -f dashboard-authorization-adminuser.yaml
clusterrolebinding.rbac.authorization.k8s.io/admin-user created
[root@server1 ~]# kubectl get serviceaccounts --namespace=kubernetes-dashboard
NAME                   SECRETS   AGE
admin-user             1         6m3s
default                1         2d22h
kubernetes-dashboard   1         2d22h
[root@server1 ~]# kubectl -n kubernetes-dashboard get secret $(kubectl -n kubernetes-dashboard get sa/admin-user -o jsonpath="{.secrets[0].name}") -o go-template="{{.data.token | base64decode}}"
eyJhbGciOiJSUzI1NiIsImtpZCI6IkhmUkV4c1BvSmZwSGZrdk5RdEw2LXBZYklEUWdTREpHZENXclBpRnktSEEifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJhZG1pbi11c2VyLXRva2VuLWhkZ242Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImFkbWluLXVzZXIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiI3ZTA3ZTg3NS1lYzU1LTQ1YTEtODUwNy1hNzRlOWJjMjQ4M2MiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZXJuZXRlcy1kYXNoYm9hcmQ6YWRtaW4tdXNlciJ9.s79nHkqPi8wI3R131vKbvxzLW-N5Th6dsdEvQ8oCh31xIyjh5eOCWTFuG4Jyqra02Uu8CeHThh2SyjyRvJcMy948Oah1SIzyTmGwTxzOO0_hyYDNKCRlSYFbKqMqqKoGlaFoqTObi0-wYzgjrMmrIRMt6JYkm05fQgMVYaXBlUIZMbCx3uhBQKyZ270YQe5os1E_6yhNjI30w2SvpG6aVcrr1pDC-wT7aizJ42_oHx0ZB2REOcJhdUII1nCwF6Cd-kbfMN_kqkxLhi5AIHWGWINDoSAR89jR8-DVmd_ttG9Ou5dhiQ4anXYwcF3BhzQsZdZsY8aoEwxni-aLK9DqXQ
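
Remark: on the release used here the service account secret is created automatically; on Kubernetes 1.24 and later this is no longer the case, and a short-lived token would be requested instead (alternative for newer releases only):

[root@server1 ~]# kubectl -n kubernetes-dashboard create token admin-user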

By supplying the token I am finally able to connect:

[screenshot kubernetes03: dashboard after login]

Add a node to a Kubernetes cluster

So far I have only the control-plane node in my cluster:

[root@server1 ~]# kubectl get nodes
NAME                 STATUS   ROLES                  AGE    VERSION
server1.domain.com   Ready    control-plane,master   3d1h   v1.21.3

On my second node (server2.domain.com) I configure the Kubernetes and Docker repositories and install the same packages as on server1.domain.com. I have also, obviously, applied the same operating system prerequisites as on server1.domain.com. Finally I issue the suggested kubeadm join command:

[root@server2 ~]# export HTTPS_PROXY='http://proxy_account:proxy_password@proxy_server:proxy_port'
[root@server2 ~]# export NO_PROXY=192.168.0.0/16,10.0.0.0/8
[root@server2 ~]# kubeadm join 192.168.56.101:6443 --token bbvzqr.z1201gns44iewbo8 --discovery-token-ca-cert-hash sha256:f8dbf9a512fe242b8b818b6528a43285ad8fc41612502a968a09907b8e5e78e7
[preflight] Running pre-flight checks
error execution phase preflight: couldn't validate the identity of the API Server: could not find a JWS signature in the cluster-info ConfigMap for token ID "bbvzqr"
To see the stack trace of this error execute with --v=5 or higher

Unfortunately my token had expired (by default, tokens expire after 24 hours) so I had to create a new one on the control-plane node:

[root@server1 ~]# kubeadm token list
[root@server1 ~]# kubeadm token create
w84q7r.6v9kttkhvj34mco5
[root@server1 ~]# kubeadm token list
TOKEN                     TTL         EXPIRES                     USAGES                   DESCRIPTION                                                EXTRA GROUPS
w84q7r.6v9kttkhvj34mco5   23h         2021-07-30T17:58:10+02:00   authentication,signing                                                        system:bootstrappers:kubeadm:default-node-token
[root@server1 ~]# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
f8dbf9a512fe242b8b818b6528a43285ad8fc41612502a968a09907b8e5e78e7

And this time it went well:

[root@server2 ~]# kubeadm join 192.168.56.101:6443 --token w84q7r.6v9kttkhvj34mco5 --discovery-token-ca-cert-hash sha256:f8dbf9a512fe242b8b818b6528a43285ad8fc41612502a968a09907b8e5e78e7
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

The node is added but, from what I have read, its role has to be set manually:

[root@server1 ~]# kubectl get nodes
NAME                 STATUS     ROLES                  AGE    VERSION
server1.domain.com   Ready      control-plane,master   3d2h   v1.21.3
server2.domain.com   NotReady   <none>                 35s    v1.21.3

Or, more verbosely, including the node labels:

[root@server1 ~]# kubectl get nodes --show-labels
NAME                 STATUS   ROLES                  AGE     VERSION   LABELS
server1.domain.com   Ready    control-plane,master   3d22h   v1.21.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=server1.domain.com,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
server2.domain.com   Ready    <none>                 20h     v1.21.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=server2.domain.com,kubernetes.io/os=linux

To change the label of the freshly added node to worker (from what I have read there is no strict naming convention for node role labels):

[root@server1 ~]# kubectl label node server2.domain.com node-role.kubernetes.io/worker=
node/server2.domain.com labeled
[root@server1 ~]# kubectl get nodes
NAME                 STATUS   ROLES                  AGE     VERSION
server1.domain.com   Ready    control-plane,master   3d22h   v1.21.3
server2.domain.com   Ready    worker                 20h     v1.21.3
[root@server1 ~]# kubectl describe node server2.domain.com
Name:               server2.domain.com
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=server2.domain.com
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
.
.
.
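
As a small aside, should you need to remove that worker label later, the trailing dash syntax of kubectl label does it:

[root@server1 ~]# kubectl label node server2.domain.com node-role.kubernetes.io/worker-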

Lots of system pods have been added:

[root@server1 ~]# kubectl get pods --all-namespaces -o wide
NAMESPACE              NAME                                         READY   STATUS              RESTARTS         AGE    IP               NODE                 NOMINATED NODE   READINESS GATES
default                nginx-6799fc88d8-jk727                       0/1     ContainerCreating   0                19m    <none>           server2.domain.com   <none>           <none>
kube-system            coredns-78fcd69978-8mtv9                     1/1     Running             0                111m   192.168.55.3     server1.domain.com   <none>           <none>
kube-system            coredns-78fcd69978-d5vpz                     1/1     Running             0                111m   192.168.55.2     server1.domain.com   <none>           <none>
kube-system            etcd-server1.domain.com                      1/1     Running             2                111m   192.168.56.101   server1.domain.com   <none>           <none>
kube-system            kube-apiserver-server1.domain.com            1/1     Running             0                111m   192.168.56.101   server1.domain.com   <none>           <none>
kube-system            kube-controller-manager-server1.domain.com   1/1     Running             0                111m   192.168.56.101   server1.domain.com   <none>           <none>
kube-system            kube-flannel-ds-2922x                        0/1     CrashLoopBackOff    21 (2m51s ago)   87m    192.168.56.102   server2.domain.com   <none>           <none>
kube-system            kube-flannel-ds-4lpk4                        1/1     Running             0                110m   192.168.56.101   server1.domain.com   <none>           <none>
kube-system            kube-proxy-9c2pr                             1/1     Running             0                111m   192.168.56.101   server1.domain.com   <none>           <none>
kube-system            kube-proxy-9p268                             1/1     Running             0                87m    192.168.56.102   server2.domain.com   <none>           <none>
kube-system            kube-scheduler-server1.domain.com            1/1     Running             0                111m   192.168.56.101   server1.domain.com   <none>           <none>
kubernetes-dashboard   dashboard-metrics-scraper-856586f554-mwwzw   1/1     Running             0                109m   192.168.55.5     server1.domain.com   <none>           <none>
kubernetes-dashboard   kubernetes-dashboard-67484c44f6-ggvj8        1/1     Running             0                109m   192.168.55.4     server1.domain.com   <none>           <none>

The Flannel container is not able to start on the newly added node; this is apparently a bug:

[root@server1 ~]# kubectl -n kube-system  logs -p kube-flannel-ds-7hr6x
I1008 09:59:16.295218       1 main.go:520] Determining IP address of default interface
I1008 09:59:16.296819       1 main.go:533] Using interface with name enp0s8 and address 10.70.101.44
I1008 09:59:16.296883       1 main.go:550] Defaulting external address to interface address (10.70.101.44)
W1008 09:59:16.296945       1 client_config.go:608] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I1008 09:59:17.400125       1 kube.go:116] Waiting 10m0s for node controller to sync
I1008 09:59:17.400262       1 kube.go:299] Starting kube subnet manager
I1008 09:59:18.400588       1 kube.go:123] Node controller sync successful
I1008 09:59:18.400644       1 main.go:254] Created subnet manager: Kubernetes Subnet Manager - server2.domain.com
I1008 09:59:18.400670       1 main.go:257] Installing signal handlers
I1008 09:59:18.401529       1 main.go:392] Found network config - Backend type: vxlan
I1008 09:59:18.401704       1 vxlan.go:123] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
E1008 09:59:18.402563       1 main.go:293] Error registering network: failed to acquire lease: node "server2.domain.com" pod cidr not assigned
I1008 09:59:18.403201       1 main.go:372] Stopping shutdownHandler...

I solved it by manually assigning the pod network CIDR to the new node:

[root@server1 ~]# kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'
192.168.55.0/24
[root@server1 ~]# kubectl patch node server2.domain.com -p '{"spec":{"podCIDR":"192.168.55.0/24"}}'
node/server2.domain.com patched
[root@server1 ~]# kubectl get pods --all-namespaces -o wide
NAMESPACE              NAME                                         READY   STATUS    RESTARTS      AGE   IP               NODE                 NOMINATED NODE   READINESS GATES
kube-system            coredns-78fcd69978-8mtv9                     1/1     Running   0             20h   192.168.55.3     server1.domain.com   <none>           <none>
kube-system            coredns-78fcd69978-d5vpz                     1/1     Running   0             20h   192.168.55.2     server1.domain.com   <none>           <none>
kube-system            etcd-server1.domain.com                      1/1     Running   2             20h   192.168.56.101   server1.domain.com   <none>           <none>
kube-system            kube-apiserver-server1.domain.com            1/1     Running   0             20h   192.168.56.101   server1.domain.com   <none>           <none>
kube-system            kube-controller-manager-server1.domain.com   1/1     Running   0             20h   192.168.56.101   server1.domain.com   <none>           <none>
kube-system            kube-flannel-ds-4lpk4                        1/1     Running   0             20h   192.168.56.101   server1.domain.com   <none>           <none>
kube-system            kube-flannel-ds-7hr6x                        1/1     Running   8 (11m ago)   22m   192.168.56.102   server2.domain.com   <none>           <none>
kube-system            kube-proxy-86rws                             1/1     Running   0             22m   192.168.56.102   server2.domain.com   <none>           <none>
kube-system            kube-proxy-9c2pr                             1/1     Running   0             20h   192.168.56.101   server1.domain.com   <none>           <none>
kube-system            kube-scheduler-server1.domain.com            1/1     Running   0             20h   192.168.56.101   server1.domain.com   <none>           <none>
kubernetes-dashboard   dashboard-metrics-scraper-856586f554-mwwzw   1/1     Running   0             20h   192.168.55.5     server1.domain.com   <none>           <none>
kubernetes-dashboard   kubernetes-dashboard-67484c44f6-ggvj8        1/1     Running   0             20h   192.168.55.4     server1.domain.com   <none>           <none>
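
To double check that every node now has a pod CIDR assigned you can query it directly (a quick verification sketch):

[root@server1 ~]# kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR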

To remove this node from the cluster use:

[root@server1 ~]# kubectl delete node server2.domain.com
node "server2.domain.com" deleted

Useful commands to debug container issues

A list of commands that give container status as well as container logs:

kubectl get pods --all-namespaces -o wide
kubectl -n kube-system  describe pod kube-flannel-ds-2922x
kubectl -n kube-system  logs -p kube-flannel-ds-2922x
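
A couple of other generic commands that often help (adapt the pod and node names to your situation):

kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp
kubectl -n kube-system get pods -o wide --field-selector spec.nodeName=server2.domain.com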

References

The post Kubernetes on virtual machines hands-on – part 1 appeared first on IT World.
