MariaDB ColumnStore installation and testing – part 1

Preamble

After I have seen the announcement of MariaDB Community saying that ColumnStore has been added as a pluggable storage engine for free I wanted to test it. I have anyway fighted with few changes in the way to install and configure from scratch a MariaDB server so decided to put a small chapter on this part.

This blog post has been written using Oracle Linux 8.2 (yeah I know MariaDB is not supported on this OS but it is really similar to RedHat and free) and MariaDB Community Server 10.5.4.

My first (failed) try has been using my own personalized installation of MariaDB and I have tried to add the ColumnStore storage engine inside, what I have done is presented in a second part and I might come back on it later on. To be honest at the time of writing this post the official documentation is too poor and I have not been able to conclude on this point. So decided to fall back to container implementation as MariaDB has done a lot in blog posts and webinars they created… I’m, of course, using the hype container engine called Podman.

I have also decided to make MariaDB my standard MySQL flavor and so will not use anymore the one coming from Oracle. The main reason being the open source strategy and the governance of MariaDB versus the one of Oracle corporation with MySQL. By the way many big players have already made this transition few years back (Wikipedia, Google, …).

MariaDB ColumnStore container installation and configuration with Podman

I start first by creating a dedicated LVM volume to store containers and images:

[root@server4 ~]# lvcreate -L 10g -n lvol20 vg00
  Logical volume "lvol20" created.
[root@server4 ~]# mkfs -t xfs /dev/vg00/lvol20
meta-data=/dev/vg00/lvol20       isize=512    agcount=4, agsize=655360 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=2621440, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@server4 containers]# grep containers /etc/fstab
/dev/mapper/vg00-lvol20   /var/lib/containers                    xfs    defaults        0 0

Then trying to download the official MariaDB ColumnStore image:

[root@server4 ~]# podman pull mariadb/columnstore
Trying to pull container-registry.oracle.com/mariadb/columnstore...
  Get https://container-registry.oracle.com/v2/: dial tcp: lookup container-registry.oracle.com on 164.129.154.205:53: no such host
Trying to pull docker.io/mariadb/columnstore...
  Get https://registry-1.docker.io/v2/: dial tcp: lookup registry-1.docker.io on 164.129.154.205:53: no such host
Trying to pull registry.fedoraproject.org/mariadb/columnstore...
  Get https://registry.fedoraproject.org/v2/: dial tcp: lookup registry.fedoraproject.org on 164.129.154.205:53: no such host
Trying to pull quay.io/mariadb/columnstore...
  Get https://quay.io/v2/: dial tcp: lookup quay.io on 164.129.154.205:53: no such host
Trying to pull registry.centos.org/mariadb/columnstore...
  Get : dial tcp: lookup registry.centos.org on 164.129.154.205:53: no such host
Error: error pulling image "mariadb/columnstore": unable to pull mariadb/columnstore: 5 errors occurred:
        * Error initializing source docker://container-registry.oracle.com/mariadb/columnstore:latest: error pinging docker registry container-registry.oracle.com: Get https://container-registry.oracle.com/v2/: dial tcp: lookup container-registry.oracle.com on 164.129.154.205:53: no such host
        * Error initializing source docker://mariadb/columnstore:latest: error pinging docker registry registry-1.docker.io: Get https://registry-1.docker.io/v2/: dial tcp: lookup registry-1.docker.io on 164.129.154.205:53: no such host
        * Error initializing source docker://registry.fedoraproject.org/mariadb/columnstore:latest: error pinging docker registry registry.fedoraproject.org: Get https://registry.fedoraproject.org/v2/: dial tcp: lookup registry.fedoraproject.org on 164.129.154.205:53: no such host
        * Error initializing source docker://quay.io/mariadb/columnstore:latest: error pinging docker registry quay.io: Get https://quay.io/v2/: dial tcp: lookup quay.io on 164.129.154.205:53: no such host
        * Error initializing source docker://registry.centos.org/mariadb/columnstore:latest: error pinging docker registry registry.centos.org: Get : dial tcp: lookup registry.centos.org on 164.129.154.205:53: no such host

As suggested I had to configure my corporate proxy:

[root@server4 ~]# cat /etc/profile.d/http_proxy.sh
export HTTP_PROXY=http://proxy_account:proxy_password@proxy_serveur:proxy_port
export HTTPS_PROXY=http://proxy_account:proxy_password@proxy_serveur:proxy_port

Failed for a proxy certificate issue:

[root@server4 ~]# podman pull mariadb/columnstore
Trying to pull container-registry.oracle.com/mariadb/columnstore...
  Get https://container-registry.oracle.com/v2/: x509: certificate signed by unknown authority
Trying to pull docker.io/mariadb/columnstore...
  Get https://registry-1.docker.io/v2/: x509: certificate signed by unknown authority
Trying to pull registry.fedoraproject.org/mariadb/columnstore...
  manifest unknown: manifest unknown
Trying to pull quay.io/mariadb/columnstore...
  Get https://quay.io/v2/: x509: certificate signed by unknown authority
Trying to pull registry.centos.org/mariadb/columnstore...
  Get : x509: certificate signed by unknown authority
Error: error pulling image "mariadb/columnstore": unable to pull mariadb/columnstore: 5 errors occurred:
        * Error initializing source docker://container-registry.oracle.com/mariadb/columnstore:latest: error pinging docker registry container-registry.oracle.com: Get https://container-registry.oracle.com/v2/: x509: certificate signed by unknown authority
        * Error initializing source docker://mariadb/columnstore:latest: error pinging docker registry registry-1.docker.io: Get https://registry-1.docker.io/v2/: x509: certificate signed by unknown authority
        * Error initializing source docker://registry.fedoraproject.org/mariadb/columnstore:latest: Error reading manifest latest in registry.fedoraproject.org/mariadb/columnstore: manifest unknown: manifest unknown
        * Error initializing source docker://quay.io/mariadb/columnstore:latest: error pinging docker registry quay.io: Get https://quay.io/v2/: x509: certificate signed by unknown authority
        * Error initializing source docker://registry.centos.org/mariadb/columnstore:latest: error pinging docker registry registry.centos.org: Get : x509: certificate signed by unknown authority

Exported the one of my Windows/Chrome configuration:

columnstore01
columnstore01

And loaded it in my Linux guest (VirtualBox):

[root@server4 ~]# cp /tmp/zarootca.cer /etc/pki/ca-trust/source/anchors/
[root@server4 ~]# update-ca-trust extract

Went well this time:

[root@server4 ~]# podman pull mariadb/columnstore
Trying to pull container-registry.oracle.com/mariadb/columnstore...
  unable to retrieve auth token: invalid username/password: unauthorized: authentication required
Trying to pull docker.io/mariadb/columnstore...
Getting image source signatures
Copying blob 7361994e337a done
Copying blob 6910e5a164f7 done
Copying blob d3a9faedef9c done
Copying blob 09d6834f75a6 done
Copying blob 68e5e07852c8 done
Copying blob df75e1d0f89f done
Copying blob 026abfbced9b done
Copying blob 97d3b9b39f85 done
Copying blob ae7bd0c62cca done
Copying blob 4feabe6971fa done
Copying blob 3833a7277c1f done
Copying blob 97e0996c4e98 done
Copying config 5a61255d05 done
Writing manifest to image destination
Storing signatures
5a61255d059ff8e913b623218d59b45fcda11364676abcde26c188ab5248dec3

Simply create a new container (mcs_container) using this newly download image with:

[root@server4 ~]# podman run -d -p 3306:3306 --name mcs_container mariadb/columnstore
25809ac451884ab4753be0ed512a6fb08bc2d2c2a8f7384659a4281e2a2fa36d
[root@server4 ~]# podman ps -a
CONTAINER ID  IMAGE                                 COMMAND               CREATED        STATUS            PORTS                   NAMES
25809ac45188  docker.io/mariadb/columnstore:latest  /bin/sh -c column...  9 seconds ago  Up 6 seconds ago  0.0.0.0:3306->3306/tcp  mcs_container

Connect to it with:

[root@server4 ~]# podman exec -it mcs_container bash
[root@25809ac45188 /]# mariadb
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 4
Server version: 10.5.4-MariaDB MariaDB Server
 
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
MariaDB [(none)]> show engines;
+--------------------+---------+-------------------------------------------------------------------------------------------------+--------------+------+------------+
| Engine             | Support | Comment                                                                                         | Transactions | XA   | Savepoints |
+--------------------+---------+-------------------------------------------------------------------------------------------------+--------------+------+------------+
| Columnstore        | YES     | ColumnStore storage engine                                                                      | YES          | NO   | NO         |
| MRG_MyISAM         | YES     | Collection of identical MyISAM tables                                                           | NO           | NO   | NO         |
| MEMORY             | YES     | Hash based, stored in memory, useful for temporary tables                                       | NO           | NO   | NO         |
| Aria               | YES     | Crash-safe tables with MyISAM heritage. Used for internal temporary tables and privilege tables | NO           | NO   | NO         |
| MyISAM             | YES     | Non-transactional engine with good performance and small data footprint                         | NO           | NO   | NO         |
| SEQUENCE           | YES     | Generated tables filled with sequential values                                                  | YES          | NO   | YES        |
| InnoDB             | DEFAULT | Supports transactions, row-level locking, foreign keys and encryption for tables                | YES          | YES  | YES        |
| PERFORMANCE_SCHEMA | YES     | Performance Schema                                                                              | NO           | NO   | NO         |
| CSV                | YES     | Stores tables as CSV files                                                                      | NO           | NO   | NO         |
+--------------------+---------+-------------------------------------------------------------------------------------------------+--------------+------+------------+
9 rows in set (0.001 sec)

MariaDB ColumnStore default schema configuration

On MariaDB official Github I have downloaded a zip copy of their mariadb-columnstore-samples. Push it to your container with:

[root@server4 ~]# podman cp /tmp/mariadb-columnstore-samples-master.zip mcs_container:/tmp

In flights sub-directory the schema creation and loading is made of:

  • create_flights_db.sh
  • get_flight_data.sh
  • load_flight_data.sh

Creation went well but to get (on internet) the data I had (again) to configure my corporate proxy:

[root@25809ac45188 flights]# cat ~/.curlrc
proxy = proxy_serveur:proxy_port
proxy-user = "proxy_account:proxy_password"

I have also added -k option to curl in get_flight_data.sh to avoid certificate issue:

#!/bin/bash
#
# This script will remotely invoke the bureau of transportation statistics web form to retrieve data by month:
# https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time
# for the specific columns listed in the SQL and utilized by the sample schema.
 
mkdir -p data
for y in {2018..2018}; do
  for m in {1..12}; do
    yyyymm="$y-$(printf %02d $m)"
    echo "$yyyymm"
    curl -k -L -o data.zip -d "sqlstr=+SELECT+YEAR%2CMONTH%2CDAY_OF_MONTH%2CDAY_OF_WEEK%2CFL_DATE%2CCARRIER%2CTAIL_NUM%2CFL_NUM%2CORIGIN%2CDEST%2CCRS_DEP_TIME%2CDEP_TIME%2CDEP_DELAY%2CTAXI_OUT%2CWHEELS_OFF%2CWHEELS_ON%2CTAXI_IN%2CCRS_ARR_TIME%2CARR_TIME%2CARR_DELAY%2CCANCELLED%2CCANCELLATION_CODE%2CDIVERTED%2CCRS_ELAPSED_TIME%2CACTUAL_ELAPSED_TIME%2CAIR_TIME%2CDISTANCE%2CCARRIER_DELAY%2CWEATHER_DELAY%2CNAS_DELAY%2CSECURITY_DELAY%2CLATE_AIRCRAFT_DELAY+FROM++T_ONTIME+WHERE+Month+%3D$m+AND+YEAR%3D$y" https://www.transtats.bts.gov/DownLoad_Table.asp?Table_ID=236
    rm -f *.csv
    unzip data.zip
    rm -f data.zip
    mv *.csv $yyyymm.csv
    tail -n +2 $yyyymm.csv > data/$yyyymm.csv
    rm -f $yyyymm.csv
  done
done

Data download and loading went well and I end up the configuration by creating an account to access to figures from remote:

MariaDB [(none))]> grant all on *.* to 'yjaquier'@'%' identified by 'secure_password';
Query OK, 0 rows affected (0.001 sec)

Power BI Desktop configuration

I have obviously started by downloading and installing Power BI Desktop. I have also installed MariaDB Connector/ODBC (3.1.9).

Configure a User DSN in ODBC Data Sources (64 bits):

columnstore02
columnstore02

Supply the account and password we have just created above:

columnstore03
columnstore03

In Power BI choose an ODBC database connection and use the recently created User DSN:

columnstore04
columnstore04

Finally by using the few queries provide in MariaDB columnstore samples Github I have been able to make some graphics. Airports map:

columnstore05
columnstore05

Delay by airlines and by delay type:

columnstore06
columnstore06

Is MariaDB ColumnStore worth the effort ?

Is it really faster to use ColumnStore ? My Linux guest (VirtualBox 6.1.12) has 4 cores and 8GB of RAM, I’m also using VirtualBox Host I/ Cache (Sata 7200 RPM HDD) for the guest disk configuration.

In no way this is a benchmark but I really wanted to have a feeling on how much performance improvement this new columnar storage is delivering. I have not tuned any parameter from the official column store container from MariaDB (InnoDB buffer pool is 128MB).

I have just created a standard InnoDB flights2 table with exact same columns as flights table and fill it with:

MariaDB [flights]> insert into flights2 select * from flights;
Query OK, 7856869 rows affected (6 min 28.798 sec)

And used the airline_delay_types_by_year.sql script, I have created an InnoDB version using my flights2 table and got below result that is an average over five runs:

ColumnstoreInnoDB
1 minute 30 seconds2 minutes 30 seconds

References

About Post Author

Share the knowledge!

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>