Table of contents
Preamble
After I have seen the announcement of MariaDB Community saying that ColumnStore has been added as a pluggable storage engine for free I wanted to test it. I have anyway fighted with few changes in the way to install and configure from scratch a MariaDB server so decided to put a small chapter on this part.
This blog post has been written using Oracle Linux 8.2 (yeah I know MariaDB is not supported on this OS but it is really similar to RedHat and free) and MariaDB Community Server 10.5.4.
My first (failed) try has been using my own personalized installation of MariaDB and I have tried to add the ColumnStore storage engine inside, what I have done is presented in a second part and I might come back on it later on. To be honest at the time of writing this post the official documentation is too poor and I have not been able to conclude on this point. So decided to fall back to container implementation as MariaDB has done a lot in blog posts and webinars they created… I’m, of course, using the hype container engine called Podman.
I have also decided to make MariaDB my standard MySQL flavor and so will not use anymore the one coming from Oracle. The main reason being the open source strategy and the governance of MariaDB versus the one of Oracle corporation with MySQL. By the way many big players have already made this transition few years back (Wikipedia, Google, …).
MariaDB ColumnStore container installation and configuration with Podman
I start first by creating a dedicated LVM volume to store containers and images:
[root@server4 ~]# lvcreate -L 10g -n lvol20 vg00 Logical volume "lvol20" created. |
[root@server4 ~]# mkfs -t xfs /dev/vg00/lvol20 meta-data=/dev/vg00/lvol20 isize=512 agcount=4, agsize=655360 blks = sectsz=512 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=1, rmapbt=0 = reflink=1 data = bsize=4096 blocks=2621440, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0, ftype=1 log =internal log bsize=4096 blocks=2560, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 |
[root@server4 containers]# grep containers /etc/fstab /dev/mapper/vg00-lvol20 /var/lib/containers xfs defaults 0 0 |
Then trying to download the official MariaDB ColumnStore image:
[root@server4 ~]# podman pull mariadb/columnstore Trying to pull container-registry.oracle.com/mariadb/columnstore... Get https://container-registry.oracle.com/v2/: dial tcp: lookup container-registry.oracle.com on 164.129.154.205:53: no such host Trying to pull docker.io/mariadb/columnstore... Get https://registry-1.docker.io/v2/: dial tcp: lookup registry-1.docker.io on 164.129.154.205:53: no such host Trying to pull registry.fedoraproject.org/mariadb/columnstore... Get https://registry.fedoraproject.org/v2/: dial tcp: lookup registry.fedoraproject.org on 164.129.154.205:53: no such host Trying to pull quay.io/mariadb/columnstore... Get https://quay.io/v2/: dial tcp: lookup quay.io on 164.129.154.205:53: no such host Trying to pull registry.centos.org/mariadb/columnstore... Get : dial tcp: lookup registry.centos.org on 164.129.154.205:53: no such host Error: error pulling image "mariadb/columnstore": unable to pull mariadb/columnstore: 5 errors occurred: * Error initializing source docker://container-registry.oracle.com/mariadb/columnstore:latest: error pinging docker registry container-registry.oracle.com: Get https://container-registry.oracle.com/v2/: dial tcp: lookup container-registry.oracle.com on 164.129.154.205:53: no such host * Error initializing source docker://mariadb/columnstore:latest: error pinging docker registry registry-1.docker.io: Get https://registry-1.docker.io/v2/: dial tcp: lookup registry-1.docker.io on 164.129.154.205:53: no such host * Error initializing source docker://registry.fedoraproject.org/mariadb/columnstore:latest: error pinging docker registry registry.fedoraproject.org: Get https://registry.fedoraproject.org/v2/: dial tcp: lookup registry.fedoraproject.org on 164.129.154.205:53: no such host * Error initializing source docker://quay.io/mariadb/columnstore:latest: error pinging docker registry quay.io: Get https://quay.io/v2/: dial tcp: lookup quay.io on 164.129.154.205:53: no such host * Error initializing source docker://registry.centos.org/mariadb/columnstore:latest: error pinging docker registry registry.centos.org: Get : dial tcp: lookup registry.centos.org on 164.129.154.205:53: no such host |
As suggested I had to configure my corporate proxy:
[root@server4 ~]# cat /etc/profile.d/http_proxy.sh export HTTP_PROXY=http://proxy_account:proxy_password@proxy_serveur:proxy_port export HTTPS_PROXY=http://proxy_account:proxy_password@proxy_serveur:proxy_port |
Failed for a proxy certificate issue:
[root@server4 ~]# podman pull mariadb/columnstore Trying to pull container-registry.oracle.com/mariadb/columnstore... Get https://container-registry.oracle.com/v2/: x509: certificate signed by unknown authority Trying to pull docker.io/mariadb/columnstore... Get https://registry-1.docker.io/v2/: x509: certificate signed by unknown authority Trying to pull registry.fedoraproject.org/mariadb/columnstore... manifest unknown: manifest unknown Trying to pull quay.io/mariadb/columnstore... Get https://quay.io/v2/: x509: certificate signed by unknown authority Trying to pull registry.centos.org/mariadb/columnstore... Get : x509: certificate signed by unknown authority Error: error pulling image "mariadb/columnstore": unable to pull mariadb/columnstore: 5 errors occurred: * Error initializing source docker://container-registry.oracle.com/mariadb/columnstore:latest: error pinging docker registry container-registry.oracle.com: Get https://container-registry.oracle.com/v2/: x509: certificate signed by unknown authority * Error initializing source docker://mariadb/columnstore:latest: error pinging docker registry registry-1.docker.io: Get https://registry-1.docker.io/v2/: x509: certificate signed by unknown authority * Error initializing source docker://registry.fedoraproject.org/mariadb/columnstore:latest: Error reading manifest latest in registry.fedoraproject.org/mariadb/columnstore: manifest unknown: manifest unknown * Error initializing source docker://quay.io/mariadb/columnstore:latest: error pinging docker registry quay.io: Get https://quay.io/v2/: x509: certificate signed by unknown authority * Error initializing source docker://registry.centos.org/mariadb/columnstore:latest: error pinging docker registry registry.centos.org: Get : x509: certificate signed by unknown authority |
Exported the one of my Windows/Chrome configuration:
And loaded it in my Linux guest (VirtualBox):
[root@server4 ~]# cp /tmp/zarootca.cer /etc/pki/ca-trust/source/anchors/ [root@server4 ~]# update-ca-trust extract |
Went well this time:
[root@server4 ~]# podman pull mariadb/columnstore Trying to pull container-registry.oracle.com/mariadb/columnstore... unable to retrieve auth token: invalid username/password: unauthorized: authentication required Trying to pull docker.io/mariadb/columnstore... Getting image source signatures Copying blob 7361994e337a done Copying blob 6910e5a164f7 done Copying blob d3a9faedef9c done Copying blob 09d6834f75a6 done Copying blob 68e5e07852c8 done Copying blob df75e1d0f89f done Copying blob 026abfbced9b done Copying blob 97d3b9b39f85 done Copying blob ae7bd0c62cca done Copying blob 4feabe6971fa done Copying blob 3833a7277c1f done Copying blob 97e0996c4e98 done Copying config 5a61255d05 done Writing manifest to image destination Storing signatures 5a61255d059ff8e913b623218d59b45fcda11364676abcde26c188ab5248dec3 |
Simply create a new container (mcs_container) using this newly download image with:
[root@server4 ~]# podman run -d -p 3306:3306 --name mcs_container mariadb/columnstore 25809ac451884ab4753be0ed512a6fb08bc2d2c2a8f7384659a4281e2a2fa36d [root@server4 ~]# podman ps -a CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 25809ac45188 docker.io/mariadb/columnstore:latest /bin/sh -c column... 9 seconds ago Up 6 seconds ago 0.0.0.0:3306->3306/tcp mcs_container |
Connect to it with:
[root@server4 ~]# podman exec -it mcs_container bash [root@25809ac45188 /]# mariadb Welcome to the MariaDB monitor. Commands end with ; or \g. Your MariaDB connection id is 4 Server version: 10.5.4-MariaDB MariaDB Server Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. MariaDB [(none)]> show engines; +--------------------+---------+-------------------------------------------------------------------------------------------------+--------------+------+------------+ | Engine | Support | Comment | Transactions | XA | Savepoints | +--------------------+---------+-------------------------------------------------------------------------------------------------+--------------+------+------------+ | Columnstore | YES | ColumnStore storage engine | YES | NO | NO | | MRG_MyISAM | YES | Collection of identical MyISAM tables | NO | NO | NO | | MEMORY | YES | Hash based, stored in memory, useful for temporary tables | NO | NO | NO | | Aria | YES | Crash-safe tables with MyISAM heritage. Used for internal temporary tables and privilege tables | NO | NO | NO | | MyISAM | YES | Non-transactional engine with good performance and small data footprint | NO | NO | NO | | SEQUENCE | YES | Generated tables filled with sequential values | YES | NO | YES | | InnoDB | DEFAULT | Supports transactions, row-level locking, foreign keys and encryption for tables | YES | YES | YES | | PERFORMANCE_SCHEMA | YES | Performance Schema | NO | NO | NO | | CSV | YES | Stores tables as CSV files | NO | NO | NO | +--------------------+---------+-------------------------------------------------------------------------------------------------+--------------+------+------------+ 9 rows in set (0.001 sec) |
MariaDB ColumnStore default schema configuration
On MariaDB official Github I have downloaded a zip copy of their mariadb-columnstore-samples. Push it to your container with:
[root@server4 ~]# podman cp /tmp/mariadb-columnstore-samples-master.zip mcs_container:/tmp |
In flights sub-directory the schema creation and loading is made of:
- create_flights_db.sh
- get_flight_data.sh
- load_flight_data.sh
Creation went well but to get (on internet) the data I had (again) to configure my corporate proxy:
[root@25809ac45188 flights]# cat ~/.curlrc proxy = proxy_serveur:proxy_port proxy-user = "proxy_account:proxy_password" |
I have also added -k option to curl in get_flight_data.sh to avoid certificate issue:
#!/bin/bash # # This script will remotely invoke the bureau of transportation statistics web form to retrieve data by month: # https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time # for the specific columns listed in the SQL and utilized by the sample schema. mkdir -p data for y in {2018..2018}; do for m in {1..12}; do yyyymm="$y-$(printf %02d $m)" echo "$yyyymm" curl -k -L -o data.zip -d "sqlstr=+SELECT+YEAR%2CMONTH%2CDAY_OF_MONTH%2CDAY_OF_WEEK%2CFL_DATE%2CCARRIER%2CTAIL_NUM%2CFL_NUM%2CORIGIN%2CDEST%2CCRS_DEP_TIME%2CDEP_TIME%2CDEP_DELAY%2CTAXI_OUT%2CWHEELS_OFF%2CWHEELS_ON%2CTAXI_IN%2CCRS_ARR_TIME%2CARR_TIME%2CARR_DELAY%2CCANCELLED%2CCANCELLATION_CODE%2CDIVERTED%2CCRS_ELAPSED_TIME%2CACTUAL_ELAPSED_TIME%2CAIR_TIME%2CDISTANCE%2CCARRIER_DELAY%2CWEATHER_DELAY%2CNAS_DELAY%2CSECURITY_DELAY%2CLATE_AIRCRAFT_DELAY+FROM++T_ONTIME+WHERE+Month+%3D$m+AND+YEAR%3D$y" https://www.transtats.bts.gov/DownLoad_Table.asp?Table_ID=236 rm -f *.csv unzip data.zip rm -f data.zip mv *.csv $yyyymm.csv tail -n +2 $yyyymm.csv > data/$yyyymm.csv rm -f $yyyymm.csv done done |
Data download and loading went well and I end up the configuration by creating an account to access to figures from remote:
MariaDB [(none))]> grant all on *.* to 'yjaquier'@'%' identified by 'secure_password'; Query OK, 0 rows affected (0.001 sec) |
Power BI Desktop configuration
I have obviously started by downloading and installing Power BI Desktop. I have also installed MariaDB Connector/ODBC (3.1.9).
Configure a User DSN in ODBC Data Sources (64 bits):
Supply the account and password we have just created above:
In Power BI choose an ODBC database connection and use the recently created User DSN:
Finally by using the few queries provide in MariaDB columnstore samples Github I have been able to make some graphics. Airports map:
Delay by airlines and by delay type:
Is MariaDB ColumnStore worth the effort ?
Is it really faster to use ColumnStore ? My Linux guest (VirtualBox 6.1.12) has 4 cores and 8GB of RAM, I’m also using VirtualBox Host I/ Cache (Sata 7200 RPM HDD) for the guest disk configuration.
In no way this is a benchmark but I really wanted to have a feeling on how much performance improvement this new columnar storage is delivering. I have not tuned any parameter from the official column store container from MariaDB (InnoDB buffer pool is 128MB).
I have just created a standard InnoDB flights2 table with exact same columns as flights table and fill it with:
MariaDB [flights]> insert into flights2 select * from flights; Query OK, 7856869 rows affected (6 min 28.798 sec) |
And used the airline_delay_types_by_year.sql script, I have created an InnoDB version using my flights2 table and got below result that is an average over five runs:
Columnstore | InnoDB |
---|---|
1 minute 30 seconds | 2 minutes 30 seconds |
References
- Setting up HTTP Proxy variables for podman
- How to use curl command with proxy username/password on Linux/ Unix
- https connection using CURL from command line
- See Columnar Storage for Analytics In Action and Get an Overview of MariaDB Community Server 10.5
- ColumnStore Native Part of MariaDB 10.5