Automate your cluster with Ambari API and Ambari Metrics System

Preamble

Even though we upgraded our Hortonworks Data Platform (HDP) to 3.1.4 to solve many HDP 2 issues, we still encounter a few components that fail for unexpected reasons and/or that keep increasing their memory usage, never releasing it, for no particular reason (at least none we have found so far), Hive Metastore being one of them.

Of course we are using Ambari (2.7.4) and the Hive dashboard in Grafana, but only in a reactive way: stopping/starting components by hand and keeping an eye on memory usage and so on.

To try to automate this we decided to investigate the Ambari API and the Ambari Metrics System (AMS). The documentation of these two components is not the best I have seen and getting started with them can be difficult. But once the basics are understood, what you can do with them is almost limitless…

AMS will be used to programmatically retrieve what you would normally read in Grafana, and the Ambari API will then be used to trigger an action such as restarting a component.

All the scripts below will be in Python 3.8.6, using the requests HTTP package for convenience. I guess they can be transposed to any language of your choice, but around Big Data platforms (and in general) Python is quite popular…

Ambari Metrics System (AMS)

The idea behind using this REST API is to get the latest memory usage and the configured maximum memory value of the Hive Metastore.

First identify your AMS server by clicking on the Ambari Metrics service and getting the details of the Metric Collector component:

[Figure ams01: Metric Collector component details in Ambari]

Once you have identified your AMS server (called AMS_SERVER below), check you can access the REST API by opening the http://AMS_SERVER:6188/ws/v1/timeline/metrics/hosts or http://AMS_SERVER:6188/ws/v1/timeline/metrics/metadata URLs in a web browser. The first URL gives a picture of your servers with their associated components (which I cannot share here) and the second one gives a list of all the available metrics (the Firefox 84.0.2 display is much better than the Chrome one as the JSON output is formatted, and you can even filter it):

[Figure ams02: AMS metrics metadata displayed in a web browser]

To get metric values you must issue an HTTP GET on a URL of the form:

http://AMS_HOST:6188/ws/v1/timeline/metrics?metricNames=<>&hostname=<>&appId=<>&startTime=<>&endTime=<>&precision=<>

If you want only the latest raw value of one or more metric names you can omit the appId, startTime, endTime and precision parameters of the URL…
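Conversely, keeping startTime and endTime (epoch milliseconds) returns a whole series of points, which is handy to detect a memory growth trend rather than a single snapshot. Below is a minimal sketch, assuming the same placeholder host names as in the rest of this post; precision=SECONDS is an assumption to verify against your AMS version (the metadata URL lists what is supported):

import time
import requests

# Sketch only: pull one hour of history for the Hive Metastore heap usage.
# AMS_URL, hostname and appId are the same placeholders as in the script below,
# startTime/endTime are epoch milliseconds, precision=SECONDS is an assumption.
AMS_URL = 'http://amshost01.domain.com:6188/ws/v1/timeline/'
end_time = int(time.time() * 1000)
start_time = end_time - 3600 * 1000
params = {
  'metricNames': 'default.General.heap.used',
  'appId': 'hivemetastore',
  'hostname': 'hiveserver01.domain.com',
  'startTime': start_time,
  'endTime': end_time,
  'precision': 'SECONDS'
}
reply = requests.get(AMS_URL + 'metrics', params=params)
for metric in reply.json()['metrics']:
  # each returned series carries a 'metrics' dictionary of {timestamp: value}
  print(metric['metricname'], '->', len(metric['metrics']), 'data points')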

Then, to know which metric names you need, you can use the Ambari Grafana component: after authentication you can edit any chart and read the metric names it uses (in my example my Hive Metastore is running on two different hosts):

[Figure ams03: metric names taken from an Ambari Grafana chart]

So I need to get the default.General.heap.max and default.General.heap.used metric names from the hostname(s) where the hivemetastore appId is running. In other words, a URL with these parameters: metricNames=default.General.heap.max,default.General.heap.used&appId=hivemetastore&hostname=hiveserver01.domain.com

In Python it gives something like:

#!/usr/bin/env python3
import requests
import json

# -----------------------------------------------------------------------------
#                       Functions
# -----------------------------------------------------------------------------
def human_readable(num):
  """
  this function will convert bytes to MB.... GB... etc
  """
  step_unit = 1024.0
  for x in ['bytes', 'KB', 'MB', 'GB', 'TB']:
    if num < step_unit:
      return "%3.1f %s" % (num, x)
    num /= step_unit

# -----------------------------------------------------------------------------
#         Variables
# -----------------------------------------------------------------------------
AMS_SERVER = 'amshost01.domain.com'
AMS_PORT = '6188'
AMS_URL = 'http://' + AMS_SERVER + ':' + AMS_PORT + '/ws/v1/timeline/'

# -----------------------------------------------------------------------------
#                Main
# -----------------------------------------------------------------------------

try:
  request01 = requests.get(AMS_URL + "metrics?metricNames=default.General.heap.max,default.General.heap.used&appId=hivemetastore&hostname=hiveserver01.domain.com")
  request01_dict = json.loads(request01.text)
  output = {}
  for row in request01_dict['metrics']:
    for key01, value01 in row.items():
      if key01 == 'metricname':
        metricname = value01
      if key01 == 'metrics':
        for key02, value02 in value01.items():
          metricvalue = value02
    output[metricname] = metricvalue
  print('Hive Metastore Heap Max: ' + human_readable(output['default.General.heap.max']))
  print('Hive Metastore Heap Used: ' + human_readable(output['default.General.heap.used']))
  print(("Hive Metastore percentage memory used: {:.0f}").format(output['default.General.heap.used']*100/output['default.General.heap.max']))
except:
  print("Cannot contact AMS server")
  exit(1)

exit(0)

For my live Hive Metastore it gives:

# ./ams_restapi.py
Hive Metastore Heap Max: 24.0 GB
Hive Metastore Heap Used: 5.3 GB
Hive Metastore percentage memory used: 22

You can obviously double-check in Grafana to be 100% sure you are returning what you expect...

Ambari API

In this first chapter we have seen how to get the metrics of your components; those figures can be used to decide to trigger an action on a component. To trigger this action by script you need to use the Ambari API.

As you are using Ambari you already have an Ambari account and password to authenticate against this REST API. This is simple to implement with the Python requests package.
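A small convenience before going further: since every snippet below repeats the same credentials and header, you can carry them once and for all in a requests Session object. A minimal sketch, with obviously placeholder credentials and server name:

import requests

# Sketch: a Session carries the Ambari credentials and the X-Requested-By
# header (required for PUT requests) so they are not repeated on every call.
session = requests.Session()
session.auth = ('ambari_account', 'ambari_password')
session.headers.update({'X-Requested-By': 'ambari'})

AMBARI_URL = 'http://ambariserver01.domain.com:8080/api/v1/clusters/'
reply = session.get(AMBARI_URL)
print(reply.status_code)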

Cluster name

The first information you need to get is the cluster name you have chosen when installing HDP. In Python this can simply be done with the code below, where AMBARI_URL is of the form 'http://' + AMBARI_SERVER + ':' + AMBARI_PORT + '/api/v1/clusters/', for example 'http://ambariserver01.domain.com:8080/api/v1/clusters/':

try:
  request01 = requests.get(AMBARI_URL, auth=('ambari_account', 'ambari_password'))
  request01_dict = json.loads(request01.text)
  cluster_name = request01_dict['items'][0]['Clusters']['cluster_name']
except:
  logging.error("Cannot contact Ambari server")
  print("Cannot contact Ambari server")
  exit(1)

This cluster_name variable will be used in the sub-chapters below...

List services and components

You get the service list with:

try:
  request01 = requests.get(AMBARI_URL + cluster_name + '/services', auth=('ambari_account', 'ambari_password'))
  request01_dict = json.loads(request01.text)
  for row in request01_dict['items']:
    print(row['ServiceInfo']['service_name'])
except:
  logging.error("Cannot contact Ambari server")
  print("Cannot contact Ambari server")
  exit(1)

And you can get components list per service with:

try:
  request01 = requests.get(AMBARI_URL + cluster_name + '/services', auth=('ambari_account', 'ambari_password'))
  request01_dict = json.loads(request01.text)
  # print(request01_dict)
  for row01 in request01_dict['items']:
    print('Service: ' + row01['ServiceInfo']['service_name'])
    request02 = requests.get(AMBARI_URL + cluster_name + '/services/' + row01['ServiceInfo']['service_name'] + '/components', auth=('ambari_account', 'ambari_password'))
    request02_dict = json.loads(request02.text)
    for row02 in request02_dict['items']:
      print('Component: ' + row02['ServiceComponentInfo']['component_name'])
except:
  logging.error("Cannot contact Ambari server")
  print("Cannot contact Ambari server")
  exit(1)

Status of services and components

To get a service status (SERVICE is the variable containing your service name) use the code below. I replace INSTALLED with STOPPED because a service/component is in the INSTALLED state when it is stopped:

try:
  request01 = requests.get(AMBARI_URL + cluster_name + '/services/' + SERVICE + '?fields=ServiceInfo/state', auth=('ambari_account', 'ambari_password'))
  request01_dict = json.loads(request01.text)
  print('Service ' + SERVICE + ' status: '+ request01_dict['ServiceInfo']['state'].replace("INSTALLED","STOPPED"))
except:
  logging.error("Cannot contact Ambari server")
  print("Cannot contact Ambari server")
  exit(1)

To get a component status (the SERVICE and COMPONENT variables respectively contain the service and component names):

try:
  request01 = requests.get(AMBARI_URL + cluster_name + '/services/' + SERVICE + '/components/' + COMPONENT, auth=('ambari_account', 'ambari_password'))
  request01_dict = json.loads(request01.text)
  # If a component is running on multiple hosts
  for row in request01_dict["host_components"]:
    print('Component ' + COMPONENT + ' of service ' + SERVICE + ' running on ' + row["HostRoles"]["host_name"] + ' status: '+ request01_dict['ServiceComponentInfo']['state'].replace("INSTALLED","STOPPED"))
except:
  logging.error("Cannot contact Ambari server")
  print("Cannot contact Ambari server")
  exit(1)

Stop services and components

To stop a service you need an HTTP PUT request, with a body that simply overwrites the state of the service/component with INSTALLED. The context can be customized to get a self-explaining display in the Ambari web application, like this:

ambari_api01
ambari_api01

You must also set the X-Requested-By header on your PUT request. The PUT request returns a message telling you whether your request has been accepted or not.

To stop a service and so all its components:

try:
  data={"RequestInfo":{"context":"Stop service "+SERVICE},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}
  request01 = requests.put(AMBARI_URL + cluster_name + '/services/' + SERVICE, json=data, headers={'X-Requested-By': 'ambari'}, auth=('ambari_account', 'ambari_password'))
  print(request01.text)
except:
  print("Cannot stop service")
  logging.error("Cannot stop service")
  print(request01.status_code)
  print(request01.text)
  exit(1)
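The JSON body returned by the PUT also contains a Requests entry with an href you can poll to follow the operation until it finishes. A hedged sketch continuing from the request01 variable of the block above; the request_status and progress_percent field names are what I observed on Ambari 2.7.4, double-check them on your version:

import json
import time
import requests

# Sketch: poll the asynchronous Ambari request created by the PUT above.
# request01 is the response object of the stop request; field names are
# assumptions based on Ambari 2.7.4 output.
reply = json.loads(request01.text)
if 'Requests' in reply:
  while True:
    status = requests.get(reply['href'], auth=('ambari_account', 'ambari_password')).json()['Requests']
    print(status['request_status'] + ' (' + str(status['progress_percent']) + '%)')
    if status['request_status'] not in ('PENDING', 'IN_PROGRESS'):
      break
    time.sleep(5)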

Stopping a component is a bit more complex because you first need to get the hostname(s) on which your component is running:

try:
  request01 = requests.get(AMBARI_URL + cluster_name + '/services/' + SERVICE + '/components/' + COMPONENT, auth=('ambari_account', 'ambari_password'))
  request01_dict = json.loads(request01.text)
  hosts_put_url = request01_dict['host_components']
except:
  logging.error("Cannot contact Ambari server")
  print("Cannot contact Ambari server")
  exit(1)
try:
  for row in hosts_put_url:
    data={"RequestInfo":{"context":"Stop component " + COMPONENT + " on " + row["HostRoles"]["host_name"]},"Body":{"HostRoles":{"state":"INSTALLED"}}}
    host_put_url = row["href"]
    request02 = requests.put(host_put_url, json=data, headers={'X-Requested-By': 'ambari'}, auth=('ambari_account', 'ambari_password'))
    print(request02.text)
except:
  print("Cannot stop component")
  logging.error("Cannot stop component")
  print(request02.status_code)
  print(request02.text)
  exit(1)

Start services and components

Starting services and components follows exactly the same principle as stopping them, except that the desired state is STARTED.

To start a service and all its components:

try:
  data={"RequestInfo":{"context":"Start service "+SERVICE},"Body":{"ServiceInfo":{"state":"STARTED"}}}
  request01 = requests.put(AMBARI_URL + cluster_name + '/services/' + SERVICE, json=data, headers={'X-Requested-By': 'ambari'}, auth=('ambari_account', 'ambari_password'))
  print(request01.text)
except:
  print("Cannot start service")
  logging.error("Cannot start service")
  print(request01.status_code)
  print(request01.text)
  exit(1)

To start a component on all the hostnames where it is running:

try:
  request01 = requests.get(AMBARI_URL + cluster_name + '/services/' + SERVICE + '/components/' + COMPONENT, auth=('ambari_account', 'ambari_password'))
  request01_dict = json.loads(request01.text)
  hosts_put_url = request01_dict['host_components']
except:
  logging.error("Cannot contact Ambari server")
  print("Cannot contact Ambari server")
  exit(1)
try:
  for row in hosts_put_url:
    data={"RequestInfo":{"context":"Start component " + COMPONENT + " on " + row["HostRoles"]["host_name"]},"Body":{"HostRoles":{"state":"STARTED"}}}
    host_put_url = row["href"]
    request02 = requests.put(host_put_url, json=data, headers={'X-Requested-By': 'ambari'}, auth=('ambari_account', 'ambari_password'))
    print(request02.text)
except:
  print("Cannot start component")
  logging.error("Cannot start component")
  print(request02.status_code)
  print(request02.text)
  exit(1)

Conclusion

What I have finally done is an executable script taking parameters (using the Python argparse package) and I now have a usable script to interact with components. This script can be given to level 1 operations to perform routine actions while on duty, like:

# ./ambari_restapi.py --help
usage: ambari_restapi.py [-h] [--list_services] [--list_components] [--status] [--service SERVICE] [--component COMPONENT] [--stop] [--start]

optional arguments:
  -h, --help            show this help message and exit
  --list_services       List all services
  --list_components     List all components
  --status              Status of service or component, works in conjunction with --service or --component
  --service SERVICE     Service name
  --component COMPONENT
                        Component name
  --stop                Stop service or component, works in conjunction with --service or --component
  --start               Start service or component, works in conjunction with --service or --component

# ./ambari_restapi.py --list_services
AMBARI_INFRA_SOLR
AMBARI_METRICS
HBASE
HDFS
HIVE
MAPREDUCE2
OOZIE
PIG
SMARTSENSE
SPARK2
TEZ
YARN
ZEPPELIN
ZOOKEEPER
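For reference, here is a minimal argparse skeleton producing this kind of interface. It is only the option wiring; in the real script each branch is filled with the corresponding Ambari API snippet shown above:

import argparse

# Sketch of the CLI wiring only: each branch calls the corresponding
# Ambari API snippet presented earlier in this post.
parser = argparse.ArgumentParser()
parser.add_argument('--list_services', action='store_true', help='List all services')
parser.add_argument('--list_components', action='store_true', help='List all components')
parser.add_argument('--status', action='store_true', help='Status of service or component, works in conjunction with --service or --component')
parser.add_argument('--service', help='Service name')
parser.add_argument('--component', help='Component name')
parser.add_argument('--stop', action='store_true', help='Stop service or component, works in conjunction with --service or --component')
parser.add_argument('--start', action='store_true', help='Start service or component, works in conjunction with --service or --component')
args = parser.parse_args()

if args.list_services:
  pass  # GET AMBARI_URL + cluster_name + '/services'
elif args.status and args.service:
  pass  # GET AMBARI_URL + cluster_name + '/services/' + args.service + '?fields=ServiceInfo/state'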

Then, with a monitoring tool, if the AMS result goes above a defined threshold we can trigger an action using the Ambari API script...
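The glue logic is then only a few lines. A hedged sketch: output is the dictionary built by the AMS script earlier, the 90% threshold is an arbitrary example, and restart_component() stands for a hypothetical wrapper around the stop and start snippets above (HIVE_METASTORE being the Ambari component name of the Hive Metastore):

# Sketch of the final glue: 'output' comes from the AMS script above, the
# threshold is an arbitrary example and restart_component() is a hypothetical
# wrapper around the stop/start Ambari API calls shown earlier.
THRESHOLD_PERCENT = 90

used_percent = output['default.General.heap.used'] * 100 / output['default.General.heap.max']
if used_percent > THRESHOLD_PERCENT:
  print('Hive Metastore heap at {:.0f}%, restarting it...'.format(used_percent))
  restart_component('HIVE', 'HIVE_METASTORE')
else:
  print('Hive Metastore heap at {:.0f}%, nothing to do'.format(used_percent))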

Privilege Analysis hands-on for least privileges principle

Preamble

If there is one subject rising quickly in importance where you work it is, without a doubt, security! The subject is complex in essence since there is obviously no single clear procedure to follow to be secure, that would be too simple. When it comes to Oracle database privileges there is clearly one principle that you MUST apply everywhere: the least privileges principle.

Where I work it is always a challenge to apply this principle because, for legacy (bad) reasons, many applicative accounts have been granted xxx ANY yyy privileges to avoid granting the really required objects one by one. Not to mention that some applicative accounts have even been granted the DBA role… It then becomes really difficult to guess what the users really need and use, in order to remove the high privileges and grant many smaller ones instead. You also often do not get any support from users in such a scenario…

With Oracle 12cR1 (12.1.0.1) a very interesting feature was released to achieve this goal of least privileges, called Privilege Analysis (PA). The process is very simple and the official Oracle documentation states it very well:

Privilege analysis enables customers to create a profile for a database user and capture the list of system and object privileges that are being used by this user. The customer can then compare the user’s list of used privileges with the list of granted privileges and reduce the list of granted privileges to match the used privileges.

Unfortunately this feature was initially only available within the paid Database Vault option. The excellent news Oracle published on December 7th, 2018 is that they made this feature free in all Oracle Enterprise Edition versions!

My test database is running on a RedHat 7.8 server and is release 19c with the October 2020 Release Update (RU) 31771877, so the exact database release is 19.9.0.0.201020. It is in fact a pluggable database inside this container database.

Privilege analysis test case

When creating my test database I selected the sample schemas, so I already have a bit of data inside the database:

SQL> set lines 200
SQL> select * from hr.employees fetch next 5 rows only;

EMPLOYEE_ID FIRST_NAME           LAST_NAME                 EMAIL                     PHONE_NUMBER         HIRE_DATE JOB_ID         SALARY COMMISSION_PCT MANAGER_ID DEPARTMENT_ID
----------- -------------------- ------------------------- ------------------------- -------------------- --------- ---------- ---------- -------------- ---------- -------------
        100 Steven               King                      SKING                     515.123.4567         17-JUN-03 AD_PRES         24000                                      90
        101 Neena                Kochhar                   NKOCHHAR                  515.123.4568         21-SEP-05 AD_VP           17000                       100            90
        102 Lex                  De Haan                   LDEHAAN                   515.123.4569         13-JAN-01 AD_VP           17000                       100            90
        103 Alexander            Hunold                    AHUNOLD                   590.423.4567         03-JAN-06 IT_PROG          9000                       102            60
        104 Bruce                Ernst                     BERNST                    590.423.4568         21-MAY-07 IT_PROG          6000                       103            60

The EMPLOYEES table is part of the well known HR schema.

Now let’s imagine I have an HRUSER account used inside an application and, by laziness, I grant the SELECT ANY TABLE and UPDATE ANY TABLE privileges to this account so it can (simply) access and update the HR schema tables.

This also raises some questions when you have multiple schema owners inside the same database or pluggable database, as the powerful SELECT ANY TABLE privilege allows the account to select data from the objects of all database schemas and NOT ONLY the desired one linked to the application this account is supporting… Not even mentioning the extra threat of UPDATE ANY TABLE, DELETE ANY TABLE and so on…

Simple creation:

SQL> create user hruser identified by "secure_password";

User created.

SQL> grant connect, select any table, update any table to hruser;

Grant succeeded.

Privilege analysis testing

Now that our fictive application is running with the HRUSER account, we would like to analyze which privileges HRUSER really uses and see how we can reduce its rights to apply the least privileges principle.

Everything is done through the DBMS_PRIVILEGE_CAPTURE supplied PL/SQL package and you need the CAPTURE_ADMIN role to use it; for convenience I will use the SYS account.

I start by creating a privilege analysis policy for my HRUSER account. I have chosen a G_CONTEXT analysis policy to focus on my HRUSER account, the available capture types being explained in the official documentation:

  • G_DATABASE: Captures all privilege use in the database, except privileges used by the SYS user.
  • G_ROLE: Captures the use of a privilege if the privilege is part of a specified role or list of roles.
  • G_CONTEXT: Captures the use of a privilege if the context specified by the condition parameter evaluates to true.
  • G_ROLE_AND_CONTEXT: Captures the use of a privilege if the privilege is part of the specified list of roles and when the condition specified by the condition parameter is true.
SQL> exec dbms_privilege_capture.create_capture(name=>'hruser_prileges_analysis', description=>'Analyze HRUSER privileges usage',-
> type=>DBMS_PRIVILEGE_CAPTURE.G_CONTEXT,-
> condition=>'SYS_CONTEXT(''USERENV'', ''SESSION_USER'')=''HRUSER''');

PL/SQL procedure successfully completed.

SQL> col description for a40
SQL> col context for a50
SQL> select description,type,context from dba_priv_captures where name='hruser_prileges_analysis';

DESCRIPTION                              TYPE             CONTEXT
---------------------------------------- ---------------- --------------------------------------------------
Analyze HRUSER privileges usage          CONTEXT          SYS_CONTEXT('USERENV', 'SESSION_USER')='HRUSER'

You must then enable the privilege analysis policy with below command. Specifying a run_name is interesting to generate report and compare privilege analysis policy result over multiple capture periods:

SQL> exec dbms_privilege_capture.enable_capture(name=>'hruser_prileges_analysis', run_name=>'hruser_18_dec_2020');

PL/SQL procedure successfully completed.

SQL> col run_name for a20
SQL> select description,type,context,run_name from dba_priv_captures where name='hruser_prileges_analysis';

DESCRIPTION                              TYPE             CONTEXT                                            RUN_NAME
---------------------------------------- ---------------- -------------------------------------------------- --------------------
Analyze HRUSER privileges usage          CONTEXT          SYS_CONTEXT('USERENV', 'SESSION_USER')='HRUSER'    HRUSER_18_DEC_2020

Then, with my previously created HRUSER, I simulate what would be done through a classical application: I select the salary and commission percentage of a sales employee and, because he has performed very well over the past year, I increase his commission:

SQL> select first_name,last_name,salary,commission_pct from hr.employees where employee_id=165;

FIRST_NAME           LAST_NAME                     SALARY COMMISSION_PCT
-------------------- ------------------------- ---------- --------------
David                Lee                             6800             .1

SQL> update hr.employees set commission_pct=0.2 where employee_id=165;

1 row updated.

SQL> commit;

Commit complete.

SQL> select first_name,last_name,salary,commission_pct from hr.employees where employee_id=165;

FIRST_NAME           LAST_NAME                     SALARY COMMISSION_PCT
-------------------- ------------------------- ---------- --------------
David                Lee                             6800             .2

The duration of the capture activity might be quite complex to determine: what if you have weekly, monthly or even yearly jobs? I tend to say that it must run for at least a week, but the right duration is totally up to you and your environment…

Privilege analysis reports and conclusion

Before generating a privilege analysis policy report you must disable it:

SQL> exec dbms_privilege_capture.disable_capture(name=> 'hruser_prileges_analysis');

PL/SQL procedure successfully completed.

I generate a privilege analysis report using the run name I have specified when enabling it:

SQL> exec dbms_privilege_capture.generate_result(name=>'hruser_prileges_analysis', run_name=>'HRUSER_18_DEC_2020', dependency=>true);

PL/SQL procedure successfully completed.

To be honest I am a little bit disappointed here, as I expected the procedure to generate a nice and usable HTML result like the AWR one. In fact the generate_result procedure simply fills the DBA_USED_* data dictionary privilege analysis views and you then have to query them to get your result. The complete list of these views is available in the official documentation.

SQL> col username format a10
SQL> col sys_priv format a16
SQL> col object_owner format a13
SQL> col object_name format a23
SQL> select username,sys_priv, object_owner, object_name from dba_used_privs where capture='hruser_prileges_analysis' and run_name='HRUSER_18_DEC_2020';

USERNAME   SYS_PRIV         OBJECT_OWNER  OBJECT_NAME
---------- ---------------- ------------- -----------------------
HRUSER                      SYS           DBMS_APPLICATION_INFO
HRUSER     UPDATE ANY TABLE HR            EMPLOYEES
HRUSER     CREATE SESSION
HRUSER     SELECT ANY TABLE HR            EMPLOYEES
HRUSER                      SYS           DUAL

In the output above we see that HRUSER has used UPDATE ANY TABLE and SELECT ANY TABLE to respectively update and read the HR.EMPLOYEES table. So a direct grant of select and update on this table would replace the two powerful ANY privileges.

Let’s now list the system privileges HRUSER has used:

SQL> select username, sys_priv from dba_used_sysprivs where capture='hruser_prileges_analysis' and run_name='HRUSER_18_DEC_2020';

USERNAME   SYS_PRIV
---------- ----------------
HRUSER     CREATE SESSION
HRUSER     SELECT ANY TABLE
HRUSER     UPDATE ANY TABLE

In my example I have nothing related to object privileges, but if you need them you can use the DBA_USED_OBJPRIVS(_PATH) and DBA_UNUSED_OBJPRIVS(_PATH) views.

In this final example we can also see that, from the CONNECT role, HRUSER has not used the SET CONTAINER privilege, so we could also replace the CONNECT role by the CREATE SESSION privilege.

SQL> col path for a40
SQL> select sys_priv, path from dba_used_sysprivs_path where capture='hruser_prileges_analysis'  and run_name='HRUSER_18_DEC_2020';

SYS_PRIV         PATH
---------------- ----------------------------------------
UPDATE ANY TABLE GRANT_PATH('HRUSER')
CREATE SESSION   GRANT_PATH('HRUSER', 'CONNECT')
SELECT ANY TABLE GRANT_PATH('HRUSER')

SQL> select sys_priv, path from dba_unused_sysprivs_path where capture='hruser_prileges_analysis' and run_name='HRUSER_18_DEC_2020' and username='HRUSER';

SYS_PRIV         PATH
---------------- ----------------------------------------
SET CONTAINER    GRANT_PATH('HRUSER', 'CONNECT')

All those SQL*Plus queries are nice but light years away from what you can get with Cloud Control (Security / Privilege Analysis menu). The version I have is Cloud Control 13c Release 2 (13.2):

Global view (you even get the capture start and end times, which do not appear easily in the PA views):

[Figure privilege_analysis01: Cloud Control global view of the privilege analysis policy]

Used privileges:

[Figure privilege_analysis02: used privileges in Cloud Control]

Unused privileges (I had to filter on my HRUSER account in Cloud Control):

[Figure privilege_analysis03: unused privileges in Cloud Control]

MariaDB ColumnStore installation and testing – part 2

Preamble

After a first blog post using the container edition of MariaDB ColumnStore I wanted to deploy it on an existing custom MariaDB Server installation, because where I work we prefer to put files where we like, following the MOCA architecture.

I had to give up on this part as the MariaDB documentation is really too poor; I might come back to this article and update it if things evolve positively…

MariaDB Community Server installation and configuration

I have updated my MOCA layout for MariaDB that we have seen a long time ago. MOCA stands for MariaDB Optimal Configuration Architecture. So below is my MariaDB directory naming convention, where mariadb01 is the name of the instance:

  • /mariadb/data01/mariadb01: stores MyISAM and InnoDB files; dataxx directories can also be created to spread I/O
  • /mariadb/dump/mariadb01: all log files (slow log, error log, general log, …)
  • /mariadb/logs/mariadb01: all binary logs (log-bin, relay_log)
  • /mariadb/software/mariadb01: MariaDB binaries. You might also want to use /mariadb/software/10.5.4 and share the binaries between multiple MariaDB instances; I personally believe that the extra 1 GB for binaries is worth the flexibility it gives, in other words you can upgrade one instance without touching the others. The my.cnf file is then stored in a conf sub-directory, as well as the socket and pid files.

I have created a mariadb Linux account in the dba group and a /mariadb mount point of 5GB (xfs).

The binary archive I downloaded is mariadb-10.5.4-linux-systemd-x86_64.tar.gz (for systems with systemd) as I have a recent Linux… The choice of the tar.gz release is obviously deliberate, as I want to be able to put it in the directory of my choice:

[Figure columnstore07: mariadb-10.5.4 tar.gz download selection]

If you take the RPM you can only have one installation per server, which can be really limiting (and really hard to manage with your customers) on modern powerful servers…

I created the /mariadb/software/mariadb01/conf/my.cnf file with the content below (this is just a starting point, tuning it for your own workload is mandatory):

[server]
# Primary variables
basedir                         = /mariadb/software/mariadb01
datadir                         = /mariadb/data01/mariadb01
max_allowed_packet              = 256M
max_connect_errors              = 1000000
pid_file                        = /mariadb/software/mariadb01/conf/mariadb01.pid
skip_external_locking
skip_name_resolve

# Logging
log_error                       = /mariadb/dump/mariadb01/mariadb01.err
log_queries_not_using_indexes   = ON
long_query_time                 = 5
slow_query_log                  = ON     # Disabled for production
slow_query_log_file             = /mariadb/dump/mariadb01/mariadb01-slow.log

tmpdir                          = /tmp
user                            = mariadb

# InnoDB Settings
default_storage_engine          = InnoDB
innodb_buffer_pool_size         = 1G    # Use up to 70-80% of RAM
innodb_file_per_table           = ON
innodb_flush_log_at_trx_commit  = 0
innodb_flush_method             = O_DIRECT
innodb_log_buffer_size          = 16M
innodb_log_file_size            = 512M
innodb_stats_on_metadata        = ON
innodb_read_io_threads          = 64
innodb_write_io_threads         = 64

# New plugin directory for Columnstore
plugin_dir                      = /usr/lib64/mysql/plugin
plugin_maturity                 = beta

[client-server]
port                            = 3316
socket                          = /mariadb/software/mariadb01/conf/mariadb01.sock

As root account I executed:

[root@server4 ~]# /mariadb/software/mariadb01/scripts/mariadb-install-db --defaults-file=/mariadb/software/mariadb01/conf/my.cnf --user=mariadb
Installing MariaDB/MySQL system tables in '/mariadb/data01/mariadb01' ...
OK

To start mysqld at boot time you have to copy
support-files/mysql.server to the right place for your system


Two all-privilege accounts were created.
One is root@localhost, it has no password, but you need to
be system 'root' user to connect. Use, for example, sudo mysql
The second is mariadb@localhost, it has no password either, but
you need to be the system 'mariadb' user to connect.
After connecting you can set the password, if you would need to be
able to connect as any of these users with a password and without sudo

See the MariaDB Knowledgebase at https://mariadb.com/kb or the
MySQL manual for more instructions.

You can start the MariaDB daemon with:
cd '/mariadb/software/mariadb01' ; /mariadb/software/mariadb01/bin/mysqld_safe --datadir='/mariadb/data01/mariadb01'

You can test the MariaDB daemon with mysql-test-run.pl
cd '/mariadb/software/mariadb01/mysql-test' ; perl mysql-test-run.pl

Please report any problems at https://mariadb.org/jira

The latest information about MariaDB is available at https://mariadb.org/.
You can find additional information about the MySQL part at:
https://dev.mysql.com
Consider joining MariaDB's strong and vibrant community:
Get Involved

It is new (at least to me) that from now on you can connect with the mariadb or root account without any password. In my mariadb Linux account I created the three aliases below:

alias mariadb01='/mariadb/software/mariadb01/bin/mariadb --defaults-file=/mariadb/software/mariadb01/conf/my.cnf --user=mariadb'
alias start_mariadb01='cd /mariadb/software/mariadb01/; ./bin/mariadbd-safe --defaults-file=/mariadb/software/mariadb01/conf/my.cnf &'
alias stop_mariadb01='/mariadb/software/mariadb01/bin/mariadb-admin --defaults-file=/mariadb/software/mariadb01/conf/my.cnf --user=mariadb shutdown' 

The start and stop commands are working fine but the client connection (mariadb01 alias) failed with:

/mariadb/software/mariadb01/bin/mariadb: error while loading shared libraries: libncurses.so.5: cannot open shared object file: No such file or directory

I resolved it with:

dnf -y install ncurses-compat-libs-6.1-7.20180224.el8.x86_64

You can also connect with the root Linux account using the command below (thanks to unix_socket authentication, MariaDB accounts cannot be faked):

[root@server4 ~]# /mariadb/software/mariadb01/bin/mariadb --defaults-file=/mariadb/software/mariadb01/conf/my.cnf --user=root
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 5
Server version: 10.5.4-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]>

MariaDB ColumnStore installation and configuration

I expected, as it is written everywhere, to have ColumnStore available as a storage engine, but found nothing installed by default:

MariaDB [(none)]> show engines;
+--------------------+---------+-------------------------------------------------------------------------------------------------+--------------+------+------------+
| Engine             | Support | Comment                                                                                         | Transactions | XA   | Savepoints |
+--------------------+---------+-------------------------------------------------------------------------------------------------+--------------+------+------------+
| MRG_MyISAM         | YES     | Collection of identical MyISAM tables                                                           | NO           | NO   | NO         |
| CSV                | YES     | Stores tables as CSV files                                                                      | NO           | NO   | NO         |
| MEMORY             | YES     | Hash based, stored in memory, useful for temporary tables                                       | NO           | NO   | NO         |
| SEQUENCE           | YES     | Generated tables filled with sequential values                                                  | YES          | NO   | YES        |
| Aria               | YES     | Crash-safe tables with MyISAM heritage. Used for internal temporary tables and privilege tables | NO           | NO   | NO         |
| MyISAM             | YES     | Non-transactional engine with good performance and small data footprint                         | NO           | NO   | NO         |
| PERFORMANCE_SCHEMA | YES     | Performance Schema                                                                              | NO           | NO   | NO         |
| InnoDB             | DEFAULT | Supports transactions, row-level locking, foreign keys and encryption for tables                | YES          | YES  | YES        |
+--------------------+---------+-------------------------------------------------------------------------------------------------+--------------+------+------------+
8 rows in set (0.000 sec)

MariaDB [(none)]> show plugins;
+-------------------------------+----------+--------------------+---------+---------+
| Name                          | Status   | Type               | Library | License |
+-------------------------------+----------+--------------------+---------+---------+
| binlog                        | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| mysql_native_password         | ACTIVE   | AUTHENTICATION     | NULL    | GPL     |
| mysql_old_password            | ACTIVE   | AUTHENTICATION     | NULL    | GPL     |
| MRG_MyISAM                    | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| MEMORY                        | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| CSV                           | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| Aria                          | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| MyISAM                        | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| SPATIAL_REF_SYS               | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| GEOMETRY_COLUMNS              | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| inet6                         | ACTIVE   | DATA TYPE          | NULL    | GPL     |
| inet_aton                     | ACTIVE   | FUNCTION           | NULL    | GPL     |
| inet_ntoa                     | ACTIVE   | FUNCTION           | NULL    | GPL     |
| inet6_aton                    | ACTIVE   | FUNCTION           | NULL    | GPL     |
| inet6_ntoa                    | ACTIVE   | FUNCTION           | NULL    | GPL     |
| is_ipv4                       | ACTIVE   | FUNCTION           | NULL    | GPL     |
| is_ipv6                       | ACTIVE   | FUNCTION           | NULL    | GPL     |
| is_ipv4_compat                | ACTIVE   | FUNCTION           | NULL    | GPL     |
| is_ipv4_mapped                | ACTIVE   | FUNCTION           | NULL    | GPL     |
| CLIENT_STATISTICS             | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INDEX_STATISTICS              | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| TABLE_STATISTICS              | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| USER_STATISTICS               | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| wsrep                         | ACTIVE   | REPLICATION        | NULL    | GPL     |
| SQL_SEQUENCE                  | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| PERFORMANCE_SCHEMA            | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| InnoDB                        | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| INNODB_TRX                    | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_LOCKS                  | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_LOCK_WAITS             | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_CMP                    | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_CMP_RESET              | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_CMPMEM                 | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_CMPMEM_RESET           | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_CMP_PER_INDEX          | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_CMP_PER_INDEX_RESET    | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_BUFFER_PAGE            | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_BUFFER_PAGE_LRU        | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_BUFFER_POOL_STATS      | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_METRICS                | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_FT_DEFAULT_STOPWORD    | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_FT_DELETED             | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_FT_BEING_DELETED       | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_FT_CONFIG              | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_FT_INDEX_CACHE         | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_FT_INDEX_TABLE         | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_SYS_TABLES             | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_SYS_TABLESTATS         | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_SYS_INDEXES            | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_SYS_COLUMNS            | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_SYS_FIELDS             | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_SYS_FOREIGN            | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_SYS_FOREIGN_COLS       | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_SYS_TABLESPACES        | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_SYS_DATAFILES          | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_SYS_VIRTUAL            | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_MUTEXES                | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_SYS_SEMAPHORE_WAITS    | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_TABLESPACES_ENCRYPTION | ACTIVE   | INFORMATION SCHEMA | NULL    | BSD     |
| SEQUENCE                      | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| user_variables                | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| unix_socket                   | ACTIVE   | AUTHENTICATION     | NULL    | GPL     |
| FEEDBACK                      | DISABLED | INFORMATION SCHEMA | NULL    | GPL     |
| THREAD_POOL_GROUPS            | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| THREAD_POOL_QUEUES            | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| THREAD_POOL_STATS             | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| THREAD_POOL_WAITS             | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| partition                     | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
+-------------------------------+----------+--------------------+---------+---------+
68 rows in set (0.002 sec)

There is also this query, which I found on the MariaDB web site:

SELECT plugin_name, plugin_version, plugin_maturity FROM information_schema.plugins ORDER BY plugin_name;

I had to configure the official MariaDB repository as explained in the documentation:

[root@server4 ~]# cat /etc/yum.repos.d/mariadb.repo
# MariaDB 10.5 RedHat repository list - created 2020-07-09 15:06 UTC
# http://downloads.mariadb.org/mariadb/repositories/
[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.5/rhel8-amd64
module_hotfixes=1
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1

Once the repository is configured you can see what's available with:

dnf list mariadb*

I can see a MariaDB-columnstore-engine.x86_64 package, but installing it will also install MariaDB-server.x86_64, which I do not want… So far I have not found a way to get just the .so file to inject the ColumnStore storage engine into my custom MariaDB Server installation…

[root@server4 ~]# mcsSetConfig CrossEngineSupport Host 127.0.0.1
[root@server4 ~]# mcsSetConfig CrossEngineSupport Port 3316
[root@server4 ~]# mcsSetConfig CrossEngineSupport User cross_engine
[root@server4 ~]# mcsSetConfig CrossEngineSupport Password cross_engine_passwd
MariaDB [(none)]> CREATE USER 'cross_engine'@'127.0.0.1' IDENTIFIED BY "cross_engine_passwd";
Query OK, 0 rows affected (0.001 sec)

MariaDB [(none)]> GRANT SELECT ON *.* TO 'cross_engine'@'127.0.0.1';
Query OK, 0 rows affected (0.001 sec)
[root@server4 ~]# systemctl status mariadb-columnstore
● mariadb-columnstore.service - mariadb-columnstore
   Loaded: loaded (/usr/lib/systemd/system/mariadb-columnstore.service; enabled; vendor preset: disabled)
   Active: active (exited) since Mon 2020-07-13 15:42:19 CEST; 3min 15s ago
  Process: 27960 ExecStop=/usr/bin/mariadb-columnstore-stop.sh (code=exited, status=0/SUCCESS)
  Process: 27998 ExecStart=/usr/bin/mariadb-columnstore-start.sh (code=exited, status=0/SUCCESS)
 Main PID: 27998 (code=exited, status=0/SUCCESS)

Jul 13 15:42:11 server4.domain.com systemd[1]: Stopped mariadb-columnstore.
Jul 13 15:42:11 server4.domain.com systemd[1]: Starting mariadb-columnstore...
Jul 13 15:42:12 server4.domain.com mariadb-columnstore-start.sh[27998]: Job for mcs-storagemanager.service failed because the control process exited with error code.
Jul 13 15:42:12 server4.domain.com mariadb-columnstore-start.sh[27998]: See "systemctl status mcs-storagemanager.service" and "journalctl -xe" for details.
Jul 13 15:42:19 server4.domain.com systemd[1]: Started mariadb-columnstore.
[root@server4 ~]# systemctl status mcs-storagemanager.service
● mcs-storagemanager.service - storagemanager
   Loaded: loaded (/usr/lib/systemd/system/mcs-storagemanager.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2020-07-13 15:42:14 CEST; 8min ago
  Process: 28010 ExecStartPre=/usr/bin/mcs-start-storagemanager.py (code=exited, status=1/FAILURE)

Jul 13 15:42:14 server4.domain.com systemd[1]: Starting storagemanager...
Jul 13 15:42:14 server4.domain.com mcs-start-storagemanager.py[28010]: S3 storage has not been set up for MariaDB ColumnStore. StorageManager service fails to start.
Jul 13 15:42:14 server4.domain.com systemd[1]: mcs-storagemanager.service: Control process exited, code=exited status=1
Jul 13 15:42:14 server4.domain.com systemd[1]: mcs-storagemanager.service: Failed with result 'exit-code'.
Jul 13 15:42:14 server4.domain.com systemd[1]: Failed to start storagemanager.
[root@server4 columnstore]# cat /var/log/mariadb/columnstore/debug.log
Jul 13 15:05:42 server4 IDBFile[26302]: 42.238256 |0|0|0| D 35 CAL0002: Failed to open file: /var/lib/columnstore/data1/systemFiles/dbrm/tablelocks, exception: unable to open Buffered file
Jul 13 15:05:42 server4 controllernode[26302]: 42.238358 |0|0|0| D 29 CAL0000: TableLockServer::load(): could not open the save file/var/lib/columnstore/data1/systemFiles/dbrm/tablelocks
Jul 13 15:42:17 server4 IDBFile[28020]: 17.117913 |0|0|0| D 35 CAL0002: Failed to open file: /var/lib/columnstore/data1/systemFiles/dbrm/tablelocks, exception: unable to open Buffered file
Jul 13 15:42:17 server4 controllernode[28020]: 17.118009 |0|0|0| D 29 CAL0000: TableLockServer::load(): could not open the save file/var/lib/columnstore/data1/systemFiles/dbrm/tablelocks
[root@server4 columnstore]# grep -v ^# /etc/columnstore/storagemanager.cnf | grep -v -e '^$'
[ObjectStorage]
service = LocalStorage
object_size = 5M
metadata_path = /mariadb/columnstore/storagemanager/metadata
journal_path = /mariadb/columnstore/storagemanager/journal
max_concurrent_downloads = 21

max_concurrent_uploads = 21
common_prefix_depth = 3
[S3]
region = some_region
bucket = some_bucket
[LocalStorage]
path = /mariadb/columnstore/storagemanager/fake-cloud
fake_latency = n
max_latency = 50000
[Cache]
cache_size = 2g
path = /mariadb/columnstore/storagemanager/cache
[mariadb@server4 mariadb]$ mkdir -p /mariadb/columnstore/storagemanager/fake-cloud
[mariadb@server4 mariadb]$ mkdir -p /mariadb/columnstore/storagemanager/cache
[mariadb@server4 mariadb]$ mkdir -p /mariadb/columnstore/storagemanager/metadata
[mariadb@server4 mariadb]$ mkdir -p /mariadb/columnstore/storagemanager/journal
[root@server4 ~]# mcsGetConfig -a | grep /var/lib
SystemConfig.DBRoot1 = /var/lib/columnstore/data1
SystemConfig.DBRMRoot = /var/lib/columnstore/data1/systemFiles/dbrm/BRM_saves
SystemConfig.TableLockSaveFile = /var/lib/columnstore/data1/systemFiles/dbrm/tablelocks
SessionManager.TxnIDFile = /var/lib/columnstore/data1/systemFiles/dbrm/SMTxnID
OIDManager.OIDBitmapFile = /var/lib/columnstore/data1/systemFiles/dbrm/oidbitmap
WriteEngine.BulkRoot = /var/lib/columnstore/data/bulk
WriteEngine.BulkRollbackDir = /var/lib/columnstore/data1/systemFiles/bulkRollback
[root@server4 ~]# mcsSetConfig SystemConfig DBRoot1 /mariadb/columnstore/data1
[root@server4 ~]# mcsGetConfig SystemConfig DBRoot1
/mariadb/columnstore/data1
[root@server4 ~]# mcsSetConfig SystemConfig DBRMRoot /mariadb/columnstore/data1/systemFiles/dbrm/BRM_saves
[root@server4 ~]# mcsSetConfig SystemConfig TableLockSaveFile /mariadb/columnstore/data1/systemFiles/dbrm/tablelocks
[root@server4 ~]# mcsSetConfig SessionManager TxnIDFile /mariadb/columnstore/data1/systemFiles/dbrm/SMTxnID
[root@server4 ~]# mcsSetConfig OIDManager OIDBitmapFile /mariadb/columnstore/data1/systemFiles/dbrm/oidbitmap
[root@server4 ~]# mcsSetConfig WriteEngine BulkRoot /mariadb/columnstore/data/bulk
[root@server4 ~]# mcsSetConfig WriteEngine BulkRollbackDir /mariadb/columnstore/data1/systemFiles/bulkRollback
[root@server4 ~]# mkdir -p /mariadb/columnstore/data1/systemFiles/dbrm/BRM_saves /mariadb/columnstore/data1/systemFiles/dbrm/tablelocks
[root@server4 ~]# mkdir -p /mariadb/columnstore/data1/systemFiles/dbrm/SMTxnID /mariadb/columnstore/data1/systemFiles/dbrm/SMTxnID
[root@server4 ~]# mkdir -p /mariadb/columnstore/data/bulk /mariadb/columnstore/data1/systemFiles/bulkRollback

Taking inspiration from the container version I have changed the plugin_dir variable and the plugin maturity allowance to:

# New plugin directory for Columnstore
plugin_dir                      = /usr/lib64/mysql/plugin
plugin_maturity                 = beta

The plugin_maturity parameter is needed to avoid this error:

MariaDB [(none)]> INSTALL PLUGIN IF NOT EXISTS Columnstore SONAME 'ha_columnstore.so';
ERROR 1126 (HY000): Can't open shared library 'ha_columnstore.so' (errno: 1, Loading of beta plugin Columnstore is prohibited by --plugin-maturity=gamma)

Then I tried to load the plugin, checking the engine list before and after:

MariaDB [(none)]> show engines;
+--------------------+---------+-------------------------------------------------------------------------------------------------+--------------+------+------------+
| Engine             | Support | Comment                                                                                         | Transactions | XA   | Savepoints |
+--------------------+---------+-------------------------------------------------------------------------------------------------+--------------+------+------------+
| CSV                | YES     | Stores tables as CSV files                                                                      | NO           | NO   | NO         |
| MRG_MyISAM         | YES     | Collection of identical MyISAM tables                                                           | NO           | NO   | NO         |
| MEMORY             | YES     | Hash based, stored in memory, useful for temporary tables                                       | NO           | NO   | NO         |
| Aria               | YES     | Crash-safe tables with MyISAM heritage. Used for internal temporary tables and privilege tables | NO           | NO   | NO         |
| MyISAM             | YES     | Non-transactional engine with good performance and small data footprint                         | NO           | NO   | NO         |
| SEQUENCE           | YES     | Generated tables filled with sequential values                                                  | YES          | NO   | YES        |
| InnoDB             | DEFAULT | Supports transactions, row-level locking, foreign keys and encryption for tables                | YES          | YES  | YES        |
| PERFORMANCE_SCHEMA | YES     | Performance Schema                                                                              | NO           | NO   | NO         |
+--------------------+---------+-------------------------------------------------------------------------------------------------+--------------+------+------------+
8 rows in set (0.001 sec)

MariaDB [(none)]> INSTALL PLUGIN IF NOT EXISTS Columnstore SONAME 'ha_columnstore.so';
Query OK, 0 rows affected (0.111 sec)

MariaDB [(none)]> show engines;
+--------------------+---------+-------------------------------------------------------------------------------------------------+--------------+------+------------+
| Engine             | Support | Comment                                                                                         | Transactions | XA   | Savepoints |
+--------------------+---------+-------------------------------------------------------------------------------------------------+--------------+------+------------+
| Columnstore        | YES     | ColumnStore storage engine                                                                      | YES          | NO   | NO         |
| MRG_MyISAM         | YES     | Collection of identical MyISAM tables                                                           | NO           | NO   | NO         |
| MEMORY             | YES     | Hash based, stored in memory, useful for temporary tables                                       | NO           | NO   | NO         |
| Aria               | YES     | Crash-safe tables with MyISAM heritage. Used for internal temporary tables and privilege tables | NO           | NO   | NO         |
| MyISAM             | YES     | Non-transactional engine with good performance and small data footprint                         | NO           | NO   | NO         |
| SEQUENCE           | YES     | Generated tables filled with sequential values                                                  | YES          | NO   | YES        |
| InnoDB             | DEFAULT | Supports transactions, row-level locking, foreign keys and encryption for tables                | YES          | YES  | YES        |
| PERFORMANCE_SCHEMA | YES     | Performance Schema                                                                              | NO           | NO   | NO         |
| CSV                | YES     | Stores tables as CSV files                                                                      | NO           | NO   | NO         |
+--------------------+---------+-------------------------------------------------------------------------------------------------+--------------+------+------------+
9 rows in set (0.001 sec)

On paper it works, but the connection is lost as soon as you try to create a table with the ColumnStore storage engine…

MariaDB ColumnStore installation and testing – part 1

Preamble

After I saw the announcement from MariaDB Community saying that ColumnStore had been added as a free pluggable storage engine, I wanted to test it. I have anyway fought with a few changes in the way to install and configure a MariaDB server from scratch, so I decided to add a small chapter on this part.

This blog post has been written using Oracle Linux 8.2 (yeah I know MariaDB is not supported on this OS but it is really similar to RedHat and free) and MariaDB Community Server 10.5.4.

My first (failed) try was with my own personalized installation of MariaDB into which I tried to add the ColumnStore storage engine; what I have done is presented in the second part and I might come back to it later on. To be honest, at the time of writing this post the official documentation is too poor and I have not been able to conclude on this point. So I decided to fall back to the container implementation, as MariaDB has done a lot around it in the blog posts and webinars they created… I am, of course, using the hyped container engine called Podman.

I have also decided to make MariaDB my standard MySQL flavor and will no longer use the one coming from Oracle. The main reason is the open source strategy and governance of MariaDB versus that of Oracle Corporation with MySQL. By the way, many big players already made this transition a few years back (Wikipedia, Google, …).

MariaDB ColumnStore container installation and configuration with Podman

I start by creating a dedicated LVM volume to store containers and images:

[root@server4 ~]# lvcreate -L 10g -n lvol20 vg00
  Logical volume "lvol20" created.
[root@server4 ~]# mkfs -t xfs /dev/vg00/lvol20
meta-data=/dev/vg00/lvol20       isize=512    agcount=4, agsize=655360 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=2621440, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@server4 containers]# grep containers /etc/fstab
/dev/mapper/vg00-lvol20   /var/lib/containers                    xfs    defaults        0 0
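The filesystem then just needs to be mounted on the Podman storage directory before pulling anything. A quick sketch of that step (assuming /var/lib/containers is not already in use):

mkdir -p /var/lib/containers
mount /var/lib/containers
df -h /var/lib/containers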

Then trying to download the official MariaDB ColumnStore image:

[root@server4 ~]# podman pull mariadb/columnstore
Trying to pull container-registry.oracle.com/mariadb/columnstore...
  Get https://container-registry.oracle.com/v2/: dial tcp: lookup container-registry.oracle.com on 164.129.154.205:53: no such host
Trying to pull docker.io/mariadb/columnstore...
  Get https://registry-1.docker.io/v2/: dial tcp: lookup registry-1.docker.io on 164.129.154.205:53: no such host
Trying to pull registry.fedoraproject.org/mariadb/columnstore...
  Get https://registry.fedoraproject.org/v2/: dial tcp: lookup registry.fedoraproject.org on 164.129.154.205:53: no such host
Trying to pull quay.io/mariadb/columnstore...
  Get https://quay.io/v2/: dial tcp: lookup quay.io on 164.129.154.205:53: no such host
Trying to pull registry.centos.org/mariadb/columnstore...
  Get https://registry.centos.org/v2/: dial tcp: lookup registry.centos.org on 164.129.154.205:53: no such host
Error: error pulling image "mariadb/columnstore": unable to pull mariadb/columnstore: 5 errors occurred:
        * Error initializing source docker://container-registry.oracle.com/mariadb/columnstore:latest: error pinging docker registry container-registry.oracle.com: Get https://container-registry.oracle.com/v2/: dial tcp: lookup container-registry.oracle.com on 164.129.154.205:53: no such host
        * Error initializing source docker://mariadb/columnstore:latest: error pinging docker registry registry-1.docker.io: Get https://registry-1.docker.io/v2/: dial tcp: lookup registry-1.docker.io on 164.129.154.205:53: no such host
        * Error initializing source docker://registry.fedoraproject.org/mariadb/columnstore:latest: error pinging docker registry registry.fedoraproject.org: Get https://registry.fedoraproject.org/v2/: dial tcp: lookup registry.fedoraproject.org on 164.129.154.205:53: no such host
        * Error initializing source docker://quay.io/mariadb/columnstore:latest: error pinging docker registry quay.io: Get https://quay.io/v2/: dial tcp: lookup quay.io on 164.129.154.205:53: no such host
        * Error initializing source docker://registry.centos.org/mariadb/columnstore:latest: error pinging docker registry registry.centos.org: Get https://registry.centos.org/v2/: dial tcp: lookup registry.centos.org on 164.129.154.205:53: no such host

As suggested I had to configure my corporate proxy:

[root@server4 ~]# cat /etc/profile.d/http_proxy.sh
export HTTP_PROXY=http://proxy_account:proxy_password@proxy_serveur:proxy_port
export HTTPS_PROXY=http://proxy_account:proxy_password@proxy_serveur:proxy_port

It failed with a proxy certificate issue:

[root@server4 ~]# podman pull mariadb/columnstore
Trying to pull container-registry.oracle.com/mariadb/columnstore...
  Get https://container-registry.oracle.com/v2/: x509: certificate signed by unknown authority
Trying to pull docker.io/mariadb/columnstore...
  Get https://registry-1.docker.io/v2/: x509: certificate signed by unknown authority
Trying to pull registry.fedoraproject.org/mariadb/columnstore...
  manifest unknown: manifest unknown
Trying to pull quay.io/mariadb/columnstore...
  Get https://quay.io/v2/: x509: certificate signed by unknown authority
Trying to pull registry.centos.org/mariadb/columnstore...
  Get https://registry.centos.org/v2/: x509: certificate signed by unknown authority
Error: error pulling image "mariadb/columnstore": unable to pull mariadb/columnstore: 5 errors occurred:
        * Error initializing source docker://container-registry.oracle.com/mariadb/columnstore:latest: error pinging docker registry container-registry.oracle.com: Get https://container-registry.oracle.com/v2/: x509: certificate signed by unknown authority
        * Error initializing source docker://mariadb/columnstore:latest: error pinging docker registry registry-1.docker.io: Get https://registry-1.docker.io/v2/: x509: certificate signed by unknown authority
        * Error initializing source docker://registry.fedoraproject.org/mariadb/columnstore:latest: Error reading manifest latest in registry.fedoraproject.org/mariadb/columnstore: manifest unknown: manifest unknown
        * Error initializing source docker://quay.io/mariadb/columnstore:latest: error pinging docker registry quay.io: Get https://quay.io/v2/: x509: certificate signed by unknown authority
        * Error initializing source docker://registry.centos.org/mariadb/columnstore:latest: error pinging docker registry registry.centos.org: Get https://registry.centos.org/v2/: x509: certificate signed by unknown authority

I exported the root CA certificate used by the proxy from my Windows/Chrome configuration:

columnstore01
columnstore01

And loaded it in my Linux guest (VirtualBox):

[root@server4 ~]# cp /tmp/zarootca.cer /etc/pki/ca-trust/source/anchors/
[root@server4 ~]# update-ca-trust extract

Went well this time:

[root@server4 ~]# podman pull mariadb/columnstore
Trying to pull container-registry.oracle.com/mariadb/columnstore...
  unable to retrieve auth token: invalid username/password: unauthorized: authentication required
Trying to pull docker.io/mariadb/columnstore...
Getting image source signatures
Copying blob 7361994e337a done
Copying blob 6910e5a164f7 done
Copying blob d3a9faedef9c done
Copying blob 09d6834f75a6 done
Copying blob 68e5e07852c8 done
Copying blob df75e1d0f89f done
Copying blob 026abfbced9b done
Copying blob 97d3b9b39f85 done
Copying blob ae7bd0c62cca done
Copying blob 4feabe6971fa done
Copying blob 3833a7277c1f done
Copying blob 97e0996c4e98 done
Copying config 5a61255d05 done
Writing manifest to image destination
Storing signatures
5a61255d059ff8e913b623218d59b45fcda11364676abcde26c188ab5248dec3

Simply create a new container (mcs_container) using this newly downloaded image with:

[root@server4 ~]# podman run -d -p 3306:3306 --name mcs_container mariadb/columnstore
25809ac451884ab4753be0ed512a6fb08bc2d2c2a8f7384659a4281e2a2fa36d
[root@server4 ~]# podman ps -a
CONTAINER ID  IMAGE                                 COMMAND               CREATED        STATUS            PORTS                   NAMES
25809ac45188  docker.io/mariadb/columnstore:latest  /bin/sh -c column...  9 seconds ago  Up 6 seconds ago  0.0.0.0:3306->3306/tcp  mcs_container

Connect to it with:

[root@server4 ~]# podman exec -it mcs_container bash
[root@25809ac45188 /]# mariadb
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 4
Server version: 10.5.4-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> show engines;
+--------------------+---------+-------------------------------------------------------------------------------------------------+--------------+------+------------+
| Engine             | Support | Comment                                                                                         | Transactions | XA   | Savepoints |
+--------------------+---------+-------------------------------------------------------------------------------------------------+--------------+------+------------+
| Columnstore        | YES     | ColumnStore storage engine                                                                      | YES          | NO   | NO         |
| MRG_MyISAM         | YES     | Collection of identical MyISAM tables                                                           | NO           | NO   | NO         |
| MEMORY             | YES     | Hash based, stored in memory, useful for temporary tables                                       | NO           | NO   | NO         |
| Aria               | YES     | Crash-safe tables with MyISAM heritage. Used for internal temporary tables and privilege tables | NO           | NO   | NO         |
| MyISAM             | YES     | Non-transactional engine with good performance and small data footprint                         | NO           | NO   | NO         |
| SEQUENCE           | YES     | Generated tables filled with sequential values                                                  | YES          | NO   | YES        |
| InnoDB             | DEFAULT | Supports transactions, row-level locking, foreign keys and encryption for tables                | YES          | YES  | YES        |
| PERFORMANCE_SCHEMA | YES     | Performance Schema                                                                              | NO           | NO   | NO         |
| CSV                | YES     | Stores tables as CSV files                                                                      | NO           | NO   | NO         |
+--------------------+---------+-------------------------------------------------------------------------------------------------+--------------+------+------------+
9 rows in set (0.001 sec)

MariaDB ColumnStore default schema configuration

From the official MariaDB GitHub I have downloaded a zip copy of their mariadb-columnstore-samples repository. Push it to your container with:

[root@server4 ~]# podman cp /tmp/mariadb-columnstore-samples-master.zip mcs_container:/tmp

In the flights sub-directory the schema creation and loading is made of the following scripts (a usage sketch follows the list):

  • create_flights_db.sh
  • get_flight_data.sh
  • load_flight_data.sh
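Roughly how I ran them from inside the container (a sketch: it assumes unzip is available in the image and that the archive extracts to mariadb-columnstore-samples-master):

podman exec -it mcs_container bash
cd /tmp && unzip mariadb-columnstore-samples-master.zip
cd mariadb-columnstore-samples-master/flights
./create_flights_db.sh
./get_flight_data.sh
./load_flight_data.sh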

The schema creation went well, but to fetch the data (from the Internet) I had (again) to configure my corporate proxy:

[root@25809ac45188 flights]# cat ~/.curlrc
proxy = proxy_serveur:proxy_port
proxy-user = "proxy_account:proxy_password"

I have also added the -k option to curl in get_flight_data.sh to avoid the certificate issue:

#!/bin/bash
#
# This script will remotely invoke the bureau of transportation statistics web form to retrieve data by month:
# https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time
# for the specific columns listed in the SQL and utilized by the sample schema.

mkdir -p data
for y in {2018..2018}; do
  for m in {1..12}; do
    yyyymm="$y-$(printf %02d $m)"
    echo "$yyyymm"
    curl -k -L -o data.zip -d "sqlstr=+SELECT+YEAR%2CMONTH%2CDAY_OF_MONTH%2CDAY_OF_WEEK%2CFL_DATE%2CCARRIER%2CTAIL_NUM%2CFL_NUM%2CORIGIN%2CDEST%2CCRS_DEP_TIME%2CDEP_TIME%2CDEP_DELAY%2CTAXI_OUT%2CWHEELS_OFF%2CWHEELS_ON%2CTAXI_IN%2CCRS_ARR_TIME%2CARR_TIME%2CARR_DELAY%2CCANCELLED%2CCANCELLATION_CODE%2CDIVERTED%2CCRS_ELAPSED_TIME%2CACTUAL_ELAPSED_TIME%2CAIR_TIME%2CDISTANCE%2CCARRIER_DELAY%2CWEATHER_DELAY%2CNAS_DELAY%2CSECURITY_DELAY%2CLATE_AIRCRAFT_DELAY+FROM++T_ONTIME+WHERE+Month+%3D$m+AND+YEAR%3D$y" https://www.transtats.bts.gov/DownLoad_Table.asp?Table_ID=236
    rm -f *.csv
    unzip data.zip
    rm -f data.zip
    mv *.csv $yyyymm.csv
    tail -n +2 $yyyymm.csv > data/$yyyymm.csv
    rm -f $yyyymm.csv
  done
done

Data download and loading went well and I ended the configuration by creating an account to access the figures remotely:

MariaDB [(none)]> grant all on *.* to 'yjaquier'@'%' identified by 'secure_password';
Query OK, 0 rows affected (0.001 sec)
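A quick way to validate the account is to connect from any remote machine that can reach the server on port 3306 (a sketch; adapt the hostname to your environment):

mariadb -h server4 -P 3306 -u yjaquier -p flights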

Power BI Desktop configuration

I have obviously started by downloading and installing Power BI Desktop. I have also installed MariaDB Connector/ODBC (3.1.9).

Configure a User DSN in ODBC Data Sources (64-bit):

columnstore02
columnstore02

Supply the account and password we have just created above:

columnstore03
columnstore03

In Power BI choose an ODBC database connection and use the recently created User DSN:

columnstore04
columnstore04

Finally, by using the few queries provided in the MariaDB ColumnStore samples GitHub repository, I have been able to build some graphics. Airports map:

columnstore05
columnstore05

Delay by airlines and by delay type:

columnstore06
columnstore06

Is MariaDB ColumnStore worth the effort?

Is it really faster to use ColumnStore? My Linux guest (VirtualBox 6.1.12) has 4 cores and 8GB of RAM, and I am using the VirtualBox Host I/O Cache (SATA 7200 RPM HDD) for the guest disk configuration.

This is in no way a benchmark, but I really wanted to get a feel for how much performance improvement this new columnar storage delivers. I have not tuned any parameter of the official ColumnStore container from MariaDB (the InnoDB buffer pool is 128MB).

I have just created a standard InnoDB flights2 table with the exact same columns as the flights table and filled it with:

MariaDB [flights]> insert into flights2 select * from flights;
Query OK, 7856869 rows affected (6 min 28.798 sec)
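In case you wonder how the empty flights2 table itself can be created, one possible way (a sketch, not necessarily how I did it) is a CREATE TABLE … AS SELECT returning zero rows and forcing the InnoDB engine:

podman exec -it mcs_container mariadb flights -e "create table flights2 engine=InnoDB as select * from flights limit 0;"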

I then used the airline_delay_types_by_year.sql script, created an InnoDB version of it running against my flights2 table, and got the result below, which is an average over five runs:

ColumnStore: 1 minute 30 seconds
InnoDB: 2 minutes 30 seconds

References

The post MariaDB ColumnStore installation and testing – part 1 appeared first on IT World.

]]>
https://blog.yannickjaquier.com/mysql/mariadb-columnstore-installation-and-testing-part-1.html/feed 1
Hiveserver2 monitoring with Jconsole and Grafana in HDP 3.x https://blog.yannickjaquier.com/hadoop/hiveserver2-monitoring-with-jconsole-and-grafana-in-hdp-3-x.html https://blog.yannickjaquier.com/hadoop/hiveserver2-monitoring-with-jconsole-and-grafana-in-hdp-3-x.html#respond Sun, 22 Nov 2020 08:54:14 +0000 https://blog.yannickjaquier.com/?p=5027 Preamble Since we have migrated in HFP 3 we had recurring issue with HiveServer2 memory (Ambari memory alert) or the process simply got stuck. When trying to monitor memory consumption we discovered that with HDP 3.x the Ambari Metrics for HiveServer2 heap usage are not displayed anymore in Grafana (No datapoints): I have tried to […]

The post Hiveserver2 monitoring with Jconsole and Grafana in HDP 3.x appeared first on IT World.

]]>

Table of contents

Preamble

Since we migrated to HDP 3 we have had recurring issues with HiveServer2 memory (Ambari memory alerts) or with the process simply getting stuck. When trying to monitor memory consumption we discovered that with HDP 3.x the Ambari Metrics for HiveServer2 heap usage are not displayed anymore in Grafana (No datapoints):

hiveserver201
hiveserver201

I have tried to correct the chart in Grafana when logged in (admin account), but even if the metrics are listed there is nothing to display:

hiveserver202
hiveserver202

We were advised to use Java Management Extensions (JMX) technology to monitor and manage this HiveServer2 Java process with jconsole. Using jconsole would also allow us to trigger a garbage collection on demand.

Last but not least I have finally found an article on how to restore the HiveServer2 metrics in Grafana…

We are running Hortonworks Data Platform (HDP) 3.1.4, so Hive 3.1.0, and Ambari 2.7.4.

Hiveserver2 monitoring with Jconsole

As clearly described in the official Java documentation, to activate JMX for your Java process you simply need to add the option below when executing it:

-Dcom.sun.management.jmxremote

Adding only this parameter allows local monitoring, i.e. jconsole must be launched on the server where your Java process is running.

The Cloudera documentation has (as usual) quite a few typos; what you need to modify is the HADOOP_OPTS environment variable. For HiveServer2 this is done in Ambari, in the Hive service, through the hive-env template parameter. I have chosen to re-export the variable while keeping its previous value so as not to change the default Ambari setting:

hiveserver203
hiveserver203

So I added:

export HADOOP_OPTS="-Dcom.sun.management.jmxremote=true $HADOOP_OPTS"

You can find your Hiveserver2 process pid using:

[root@hive_server ~]# ps -ef | grep java |grep hiveserver2
hive      9645     1  0 09:55 ?        00:01:14 /usr/jdk64/jdk1.8.0_112/bin/java -Dproc_jar -Dhdp.version=3.1.4.0-315 -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=8004 -Xloggc:/var/log/hive/hiveserver2-gc-%t.log -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCCause -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/hive/hs2_heapdump.hprof -Dhive.log.dir=/var/log/hive -Dhive.log.file=hiveserver2.log -Dhdp.version=3.1.4.0-315 -Xmx8192m -Dproc_hiveserver2 -Xmx48284m -Dlog4j.configurationFile=hive-log4j2.properties -Djava.util.logging.config.file=/usr/hdp/current/hive-server2/conf//parquet-logging.properties -Dyarn.log.dir=/var/log/hadoop/hive -Dyarn.log.file=hadoop.log -Dyarn.home.dir=/usr/hdp/3.1.4.0-315/hadoop-yarn -Dyarn.root.logger=INFO,console -Djava.library.path=:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:/usr/hdp/3.1.4.0-315/hadoop/lib/native/Linux-amd64-64:/usr/hdp/current/hadoop-client/lib/native -Dhadoop.log.dir=/var/log/hadoop/hive -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/current/hadoop-client -Dhadoop.id.str=hive -Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /usr/hdp/3.1.4.0-315/hive/lib/hive-service-3.1.0.3.1.4.0-315.jar org.apache.hive.service.server.HiveServer2 --hiveconf hive.aux.jars.path=file:///usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar
hive     18276     1  0 10:02 ?        00:00:49 /usr/jdk64/jdk1.8.0_112/bin/java -Dproc_jar -Dhdp.version=3.1.4.0-315 -Djava.net.preferIPv4Stack=true -Xloggc:/var/log/hive/hiveserverinteractive-gc-%t.log -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCCause -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/hive/hsi_heapdump.hprof -Dhive.log.dir=/var/log/hive -Dhive.log.file=hiveserver2Interactive.log -Dhdp.version=3.1.4.0-315 -Xmx8192m -Dproc_hiveserver2 -Xmx2048m -Dlog4j.configurationFile=hive-log4j2.properties -Djava.util.logging.config.file=/usr/hdp/current/hive-server2/conf_llap//parquet-logging.properties -Dyarn.log.dir=/var/log/hadoop/hive -Dyarn.log.file=hadoop.log -Dyarn.home.dir=/usr/hdp/3.1.4.0-315/hadoop-yarn -Dyarn.root.logger=INFO,console -Djava.library.path=:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:/usr/hdp/3.1.4.0-315/hadoop/lib/native/Linux-amd64-64:/usr/hdp/current/hadoop-client/lib/native -Dhadoop.log.dir=/var/log/hadoop/hive -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/current/hadoop-client -Dhadoop.id.str=hive -Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /usr/hdp/3.1.4.0-315/hive/lib/hive-service-3.1.0.3.1.4.0-315.jar org.apache.hive.service.server.HiveServer2 --hiveconf hive.aux.jars.path=file:///usr/hdp/current/hive-server2/lib/hive-hcatalog-core.jar

Then run the jconsole command from your $JDK_HOME/bin directory. You obviously need to set the DISPLAY environment variable and have an X server running on your desktop (MobaXterm strongly recommended).
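For example, against the HiveServer2 pid found above (my_desktop_ip being a placeholder for the workstation running your X server):

export DISPLAY=my_desktop_ip:0.0
cd $JDK_HOME/bin
./jconsole 9645

You then get the jconsole connection window: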

hiveserver204
hiveserver204

Click on Connect and acknowledge the SSL warning; you can now start monitoring:

hiveserver205
hiveserver205

Or trigger a garbage collection if you think you have an unexpected memory issue with the process:

hiveserver206
hiveserver206

If you want to be able to monitor remotely you need to add the com.sun.management.jmxremote.port=portnum parameter.

To disable SSL if you do not have a certificate use com.sun.management.jmxremote.ssl=false.

Then comes authentication, you can:

  • Deactivate it with com.sun.management.jmxremote.authenticate=false (not recommended but the simplest way to connect remotely)
  • Use LDAP authentication with com.sun.management.jmxremote.login.config=ExampleCompanyConfig and java.security.auth.login.config=ldap.config
  • Use file-based authentication with com.sun.management.jmxremote.password.file=pwFilePath

To at least activate file-based authentication, get the password file template from $JDK_HOME/jre/lib/management/jmxremote.password.template and put it in /usr/hdp/current/hive-server2/conf/. Rename it to jmxremote.password and create the users (roles in fact, according to the official documentation).
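Something along these lines (a sketch; the hive:hadoop ownership simply matches the account running HiveServer2, as visible further below):

cp $JDK_HOME/jre/lib/management/jmxremote.password.template /usr/hdp/current/hive-server2/conf/jmxremote.password
chown hive:hadoop /usr/hdp/current/hive-server2/conf/jmxremote.password

After adding my user at the end of the file it looks like this: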

[root@hive_server ~]# tail /usr/hdp/current/hive-server2/conf/jmxremote.password
# For # security, you should either restrict the access to this file,
# or specify another, less accessible file in the management config file
# as described above.
#
# Following are two commented-out entries.  The "measureRole" role has
# password "QED".  The "controlRole" role has password "R&D".
#
# monitorRole  QED
# controlRole   R&D
yjaquier   secure_password

Also modify $JDK_HOME/jre/lib/management/jmxremote.access to reflect your newly created role/user. Here I have just chosen to give it the same rights as the highest privilege role:

[root@hive_server jdk1.8.0_112]# tail $JDK_HOME/jre/lib/management/jmxremote.access
# o The "controlRole" role has readwrite access and can create the standard
#   Timer and Monitor MBeans defined by the JMX API.

monitorRole   readonly
controlRole   readwrite \
              create javax.management.monitor.*,javax.management.timer.* \
              unregister
yjaquier   readwrite \
           create javax.management.monitor.*,javax.management.timer.* \
           unregister

My HADOOP_OPTS variable is now:

export HADOOP_OPTS="-Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.password.file=$HIVE_CONF_DIR/jmxremote.password -Dcom.sun.management.jmxremote.port=8004 $HADOOP_OPTS"

If HiveServer2 does not restart you will most probably find something like this in the error log:

[root@hive_server hive]# cat hive-server2.err
Error: Password file read access must be restricted: /usr/hdp/current/hive-server2/conf//jmxremote.password
Error: Password file read access must be restricted: /usr/hdp/current/hive-server2/conf//jmxremote.password

Easy to solve with:

[root@hive_server conf]# chmod 600 jmxremote.password
[root@hive_server conf]# ll jmxremote.password
-rw------- 1 hive hadoop 2880 Jun 25 14:32 jmxremote.password

And with the local jconsole program of my desktop JDK installation (C:\Program Files\Java\jdk1.8.0_241\bin for me) I can connect remotely (also notice the better visual quality of the Windows edition).
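The remote connection string is simply host:port, so something like this (hive_server being the placeholder hostname used in this post):

jconsole hive_server:8004

Which gives: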

hiveserver207
hiveserver207

And I get the exact same display as with the Linux release; in a way it is even more convenient because from your desktop you can access any Java process of your Hadoop cluster…

hiveserver208
hiveserver208

Hiveserver2 monitoring with Grafana

Coming back to the Ambari configuration I have seen a strange list of parameters:

hiveserver209
hiveserver209

While in the official Confluence documentation I can see (https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Metrics.1):

hiveserver210
hiveserver210

And this parameter is truly set to false in my environment:

0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> set hive.server2.metrics.enabled;
+-------------------------------------+
|                 set                 |
+-------------------------------------+
| hive.server2.metrics.enabled=false  |
+-------------------------------------+
1 row selected (0.251 seconds)

So clearly Ambari has a bug and the parameter in “Advanced hiveserver2-site” is not the right one. I decided to add it in “Custom hiveserver2-site”, saved and restarted the required components, and ended up with a strange behavior: the parameter got moved to “Advanced hiveserver2-site” with a checkbox (like it should have been from the beginning):

hiveserver211
hiveserver211

Back in the Ambari Metrics Grafana I have modified the HiveServer2 chart to use the default.General.memory.heap.max, default.General.memory.heap.used and default.General.memory.heap.committed metrics, to finally get:

hiveserver212
hiveserver212

References

The post Hiveserver2 monitoring with Jconsole and Grafana in HDP 3.x appeared first on IT World.

]]>
https://blog.yannickjaquier.com/hadoop/hiveserver2-monitoring-with-jconsole-and-grafana-in-hdp-3-x.html/feed 0
Spark dynamic allocation how to configure and use it https://blog.yannickjaquier.com/hadoop/spark-dynamic-allocation-how-to-configure-and-use-it.html https://blog.yannickjaquier.com/hadoop/spark-dynamic-allocation-how-to-configure-and-use-it.html#respond Thu, 22 Oct 2020 08:52:23 +0000 https://blog.yannickjaquier.com/?p=4994 Preamble Since we have started to put Spark job in production we asked ourselves the question of how many executors, number of cores per executor and executor memory we should put. What if we put too much and are wasting resources and could we improve the response time if we put more ? In other […]

The post Spark dynamic allocation how to configure and use it appeared first on IT World.

]]>

Table of contents

Preamble

Since we started to put Spark jobs in production we have asked ourselves how many executors, how many cores per executor and how much executor memory we should allocate. What if we allocate too much and waste resources, and could we improve the response time if we allocated more?

In other words, these spark-submit parameters (we have a Hortonworks Hadoop cluster and so are using YARN; a usage sketch follows the list):

  • --executor-memory MEM – Memory per executor (e.g. 1000M, 2G) (Default: 1G).
  • --executor-cores NUM – Number of cores per executor. (Default: 1 in YARN mode, or all available cores on the worker in standalone mode)
  • --num-executors NUM – Number of executors to launch (Default: 2). If dynamic allocation is enabled, the initial number of executors will be at least NUM.
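Purely to illustrate where these flags sit, a hypothetical static allocation would look like this (my_application.py is a placeholder):

spark-submit --master yarn --queue llap \
  --num-executors 4 --executor-cores 2 --executor-memory 2G \
  my_application.py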

And in fact, as written in the description of --num-executors above, Spark dynamic allocation partially answers the former question.

Spark dynamic allocation is a feature allowing your Spark application to automatically scale the number of executors up and down. Only the number of executors is dynamic: the memory size and the number of cores of each executor must still be set explicitly in your application or on the spark-submit command line. So the promise is that your application will dynamically request more executors and release them back to the cluster pool based on its workload. Of course, if using YARN you remain tightly bound to the resources allocated to the queue to which you submitted your application (--queue parameter of spark-submit).

This blog post has been written using Hortonworks Data Platform (HDP) 3.1.4 and so Spark2 2.3.2.

Spark dynamic allocation setup

As written in the official documentation, the shuffle jar must be added to the classpath of all NodeManagers. If, like me, you are running HDP 3 you will discover that everything is already configured. The jar of this external shuffle library is:

[root@server jars]# ll /usr/hdp/current/spark2-client/jars/*shuffle*
-rw-r--r-- 1 root root 67763 Aug 23  2019 /usr/hdp/current/spark2-client/jars/spark-network-shuffle_2.11-2.3.2.3.1.4.0-315.jar

And in Ambari the YARN configuration was also already done:

spark_dynamic_allocation01
spark_dynamic_allocation01

Remark:
We still have the old Spark 1 variables; you should now concentrate only on the spark2_xx variables. Similarly, it is spark2_shuffle that must be appended to yarn.nodemanager.aux-services.
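Concretely, on the YARN side this boils down to properties along these lines (shown only as a reference sketch; the exact names and values below are an assumption, check them in your own Ambari configuration):

yarn.nodemanager.aux-services=mapreduce_shuffle,spark2_shuffle
yarn.nodemanager.aux-services.spark2_shuffle.class=org.apache.spark.network.yarn.YarnShuffleService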

Then, again quoting the official documentation, you have two parameters to set inside your application to have the feature activated:

There are two requirements for using this feature. First, your application must set spark.dynamicAllocation.enabled to true. Second, you must set up an external shuffle service on each worker node in the same cluster and set spark.shuffle.service.enabled to true in your application.

This part was not obvious to me, but as written, spark.dynamicAllocation.enabled and spark.shuffle.service.enabled must not only be set at cluster level but also in your application or as spark-submit parameters! I would even say that setting those parameters in Ambari makes no difference.
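On the spark-submit command line that would look something like this (a sketch; my_application.py is a hypothetical placeholder, and in my own test further below I set the properties in the Python code instead):

spark-submit --master yarn \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  my_application.py

As you can see below, everything was anyway already set by default in my HDP 3.1.4 cluster: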

spark_dynamic_allocation02
spark_dynamic_allocation02
spark_dynamic_allocation03
spark_dynamic_allocation03

For the complete list of parameters refer to the official Spark dynamic allocation parameter list.

Spark dynamic allocation testing

For the testing code I have done a PySpark mix of multiple test snippets I have seen around the Internet. Using Python saves me a boring sbt compilation phase before testing…

The source code is (spark_dynamic_allocation.py):

# from pyspark.sql import SparkSession
from pyspark import SparkConf
from pyspark import SparkContext
# from pyspark_llap import HiveWarehouseSession
from time import sleep

def wait_x_seconds(x):
  sleep(x*10)

conf = SparkConf().setAppName("Spark dynamic allocation").\
        set("spark.dynamicAllocation.enabled", "true").\
        set("spark.shuffle.service.enabled", "true").\
        set("spark.dynamicAllocation.initialExecutors", "1").\
        set("spark.dynamicAllocation.executorIdleTimeout", "5s").\
        set("spark.executor.cores", "1").\
        set("spark.executor.memory", "512m")

sc = SparkContext.getOrCreate(conf)

# spark = SparkSession.builder.config(conf=conf).enableHiveSupport().getOrCreate()
# spark.stop()

sc.parallelize(range(1,6), 5).foreach(wait_x_seconds)

exit()

So in short I run five parallel tasks that each wait x*10 seconds, where x goes from 1 to 5 (range(1,6)). We start with one executor and expect Spark to scale up and then down as the shorter timers end in order (10 seconds, 20 seconds, ..). I have also exaggerated the parameters a bit: spark.dynamicAllocation.executorIdleTimeout is changed to 5s so that I can see the executors being killed in my example (the default is 60s).

The command to execute it is below; the Hive Warehouse Connector is not really mandatory here but it became a habit. Notice that I do not specify anything on the command line as everything is set up in the Python script:

spark-submit --master yarn --queue llap --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.4.0-315.jar \
--py-files /usr/hdp/current/hive_warehouse_connector/pyspark_hwc-1.0.0.3.1.4.0-315.zip spark_dynamic_allocation.py

By default our spark-submit is in INFO mode, and the important part of the output is:

.
20/04/09 14:34:14 INFO Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
20/04/09 14:34:16 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.75.37.249:36332) with ID 1
20/04/09 14:34:16 INFO ExecutorAllocationManager: New executor 1 has registered (new total is 1)
.
.
20/04/09 14:34:17 INFO ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 2)
20/04/09 14:34:18 INFO ExecutorAllocationManager: Requesting 2 new executors because tasks are backlogged (new desired total will be 4)
20/04/09 14:34:19 INFO ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 5)
20/04/09 14:34:20 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.75.37.249:36354) with ID 2
20/04/09 14:34:20 INFO ExecutorAllocationManager: New executor 2 has registered (new total is 2)
20/04/09 14:34:20 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, yarn01.domain.com, executor 2, partition 1, PROCESS_LOCAL, 7869 bytes)
20/04/09 14:34:20 INFO BlockManagerMasterEndpoint: Registering block manager yarn01.domain.com:29181 with 114.6 MB RAM, BlockManagerId(2, yarn01.domain.com, 29181, None)
20/04/09 14:34:20 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on yarn01.domain.com:29181 (size: 3.7 KB, free: 114.6 MB)
20/04/09 14:34:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.75.37.249:36366) with ID 3
20/04/09 14:34:21 INFO ExecutorAllocationManager: New executor 3 has registered (new total is 3)
20/04/09 14:34:21 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, yarn01.domain.com, executor 3, partition 2, PROCESS_LOCAL, 7869 bytes)
20/04/09 14:34:21 INFO BlockManagerMasterEndpoint: Registering block manager yarn01.domain.com:44000 with 114.6 MB RAM, BlockManagerId(3, yarn01.domain.com, 44000, None)
20/04/09 14:34:21 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on yarn01.domain.com:44000 (size: 3.7 KB, free: 114.6 MB)
20/04/09 14:34:22 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.75.37.249:36376) with ID 5
20/04/09 14:34:22 INFO ExecutorAllocationManager: New executor 5 has registered (new total is 4)
20/04/09 14:34:22 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, yarn01.domain.com, executor 5, partition 3, PROCESS_LOCAL, 7869 bytes)
20/04/09 14:34:22 INFO BlockManagerMasterEndpoint: Registering block manager yarn01.domain.com:32822 with 114.6 MB RAM, BlockManagerId(5, yarn01.domain.com, 32822, None)
20/04/09 14:34:22 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on yarn01.domain.com:32822 (size: 3.7 KB, free: 114.6 MB)
20/04/09 14:34:27 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, yarn01.domain.com, executor 1, partition 4, PROCESS_LOCAL, 7869 bytes)
20/04/09 14:34:27 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 10890 ms on yarn01.domain.com (executor 1) (1/5)
20/04/09 14:34:27 INFO PythonAccumulatorV2: Connected to AccumulatorServer at host: 127.0.0.1 port: 31354
20/04/09 14:34:29 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.75.37.248:57764) with ID 4
20/04/09 14:34:29 INFO ExecutorAllocationManager: New executor 4 has registered (new total is 5)
20/04/09 14:34:29 INFO BlockManagerMasterEndpoint: Registering block manager worker01.domain.com:38365 with 114.6 MB RAM, BlockManagerId(4, worker01.domain.com, 38365, None)
20/04/09 14:34:34 INFO ExecutorAllocationManager: Request to remove executorIds: 4
20/04/09 14:34:34 INFO YarnClientSchedulerBackend: Requesting to kill executor(s) 4
20/04/09 14:34:34 INFO YarnClientSchedulerBackend: Actual list of executor(s) to be killed is 4
20/04/09 14:34:34 INFO ExecutorAllocationManager: Removing executor 4 because it has been idle for 5 seconds (new desired total will be 4)
20/04/09 14:34:38 INFO YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 4.
20/04/09 14:34:38 INFO DAGScheduler: Executor lost: 4 (epoch 0)
20/04/09 14:34:38 INFO BlockManagerMasterEndpoint: Trying to remove executor 4 from BlockManagerMaster.
20/04/09 14:34:38 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(4, worker01.domain.com, 38365, None)
20/04/09 14:34:38 INFO BlockManagerMaster: Removed 4 successfully in removeExecutor
20/04/09 14:34:38 INFO YarnScheduler: Executor 4 on worker01.domain.com killed by driver.
20/04/09 14:34:38 INFO ExecutorAllocationManager: Existing executor 4 has been removed (new total is 4)
20/04/09 14:34:41 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 20892 ms on yarn01.domain.com (executor 2) (2/5)
20/04/09 14:34:46 INFO ExecutorAllocationManager: Request to remove executorIds: 2
20/04/09 14:34:46 INFO YarnClientSchedulerBackend: Requesting to kill executor(s) 2
20/04/09 14:34:46 INFO YarnClientSchedulerBackend: Actual list of executor(s) to be killed is 2
20/04/09 14:34:46 INFO ExecutorAllocationManager: Removing executor 2 because it has been idle for 5 seconds (new desired total will be 3)
20/04/09 14:34:48 INFO YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 2.
20/04/09 14:34:48 INFO DAGScheduler: Executor lost: 2 (epoch 0)
20/04/09 14:34:48 INFO BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster.
20/04/09 14:34:48 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(2, yarn01.domain.com, 29181, None)
20/04/09 14:34:48 INFO BlockManagerMaster: Removed 2 successfully in removeExecutor
20/04/09 14:34:48 INFO YarnScheduler: Executor 2 on yarn01.domain.com killed by driver.
20/04/09 14:34:48 INFO ExecutorAllocationManager: Existing executor 2 has been removed (new total is 3)
20/04/09 14:34:52 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 30897 ms on yarn01.domain.com (executor 3) (3/5)
20/04/09 14:34:57 INFO ExecutorAllocationManager: Request to remove executorIds: 3
20/04/09 14:34:57 INFO YarnClientSchedulerBackend: Requesting to kill executor(s) 3
20/04/09 14:34:57 INFO YarnClientSchedulerBackend: Actual list of executor(s) to be killed is 3
20/04/09 14:34:57 INFO ExecutorAllocationManager: Removing executor 3 because it has been idle for 5 seconds (new desired total will be 2)
20/04/09 14:34:59 INFO YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 3.
20/04/09 14:34:59 INFO DAGScheduler: Executor lost: 3 (epoch 0)
20/04/09 14:34:59 INFO BlockManagerMasterEndpoint: Trying to remove executor 3 from BlockManagerMaster.
20/04/09 14:34:59 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(3, yarn01.domain.com, 44000, None)
20/04/09 14:34:59 INFO BlockManagerMaster: Removed 3 successfully in removeExecutor
20/04/09 14:34:59 INFO YarnScheduler: Executor 3 on yarn01.domain.com killed by driver.
20/04/09 14:34:59 INFO ExecutorAllocationManager: Existing executor 3 has been removed (new total is 2)
20/04/09 14:35:03 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 40831 ms on yarn01.domain.com (executor 5) (4/5)
20/04/09 14:35:08 INFO ExecutorAllocationManager: Request to remove executorIds: 5
20/04/09 14:35:08 INFO YarnClientSchedulerBackend: Requesting to kill executor(s) 5
20/04/09 14:35:08 INFO YarnClientSchedulerBackend: Actual list of executor(s) to be killed is 5
20/04/09 14:35:08 INFO ExecutorAllocationManager: Removing executor 5 because it has been idle for 5 seconds (new desired total will be 1)
20/04/09 14:35:10 INFO YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 5.
20/04/09 14:35:10 INFO DAGScheduler: Executor lost: 5 (epoch 0)
20/04/09 14:35:10 INFO BlockManagerMasterEndpoint: Trying to remove executor 5 from BlockManagerMaster.
20/04/09 14:35:10 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(5, yarn01.domain.com, 32822, None)
20/04/09 14:35:10 INFO BlockManagerMaster: Removed 5 successfully in removeExecutor
20/04/09 14:35:10 INFO YarnScheduler: Executor 5 on yarn01.domain.com killed by driver.
20/04/09 14:35:10 INFO ExecutorAllocationManager: Existing executor 5 has been removed (new total is 1)
20/04/09 14:35:17 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 50053 ms on yarn01.domain.com (executor 1) (5/5)
20/04/09 14:35:17 INFO YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
.

We clearly see the allocation and removal of executors, but it is even clearer in the Spark UI web interface:

spark_dynamic_allocation04
spark_dynamic_allocation04

The executors dynamically added, in blue, contrast well with the ones dynamically removed, in red…

One of my colleagues asked me what happens if, by mistake, he allocates too many initial executors and his over-allocation wastes resources. I tried this by raising the initial executor count in my code to a value higher than needed (for example 10):

set("spark.dynamicAllocation.initialExecutors", "10").\

And Spark dynamic allocation was really clever, de-allocating the non-needed executors almost instantly:

spark_dynamic_allocation05
spark_dynamic_allocation05

References

The post Spark dynamic allocation how to configure and use it appeared first on IT World.

]]>
https://blog.yannickjaquier.com/hadoop/spark-dynamic-allocation-how-to-configure-and-use-it.html/feed 0