How to load JSON into MariaDB

Preamble

I got contacted on how to load a JSON file into a MariaDB table. I initially thought it would be a 5 minutes job but at the end it was not that straightforward and I have not found many good blog post about it so this small article…

To generate a dummy JSON file to load I have used one of the first search result I found: JSON Generator – Tool for generating random data, Mockaroo is also a good fit. I used below code to generate what I wanted:

JG.repeat(3, 3, {
  id: JG.objectId(),
  first_name: JG.firstName(),
  last_name: JG.lastName(),
  company: JG.company(),
  email() {
    return (
      _.snakeCase(this.first_name) + '.' + _.snakeCase(this.last_name) +
      '@' +
      this.company +
      JG.domainZone()
    ).toLowerCase();
  },
});

This gave me this dummy JSON file to load:

[
  {
    "id": "6423f77eda3e4733f720bd62",
    "first_name": "Suarez",
    "last_name": "Hurst",
    "company": "Ohmnet",
    "email": "suarez.hurst@ohmnet.cymru"
  },
  {
    "id": "6423f77e05169d9c0a7e0f25",
    "first_name": "Dickson",
    "last_name": "Wolf",
    "company": "Ceprene",
    "email": "dickson.wolf@ceprene.sexy"
  },
  {
    "id": "6423f77e2d0484b76b85bd29",
    "first_name": "Cynthia",
    "last_name": "Rodriquez",
    "company": "Rubadub",
    "email": "cynthia.rodriquez@rubadub.guge"
  }
]

My (powerful) test server is made of 12 x86 cores and 64GB memory running Red Hat Enterprise Linux Server release 7.9 (Maipo) and MariaDB Community 10.6.12 where I have allocated 1GB to InnoDB.

json_mariadb01
json_mariadb01

Preparation

I used my MOCA filesystems layout and this below my.cnf:

[server]
# Primary variables
basedir                         = /mariadb/software/mariadb01
datadir                         = /mariadb/data01/mariadb01
max_allowed_packet              = 256M
max_connect_errors              = 1000000
pid_file                        = /mariadb/software/mariadb01/conf/mariadb01.pid
skip_external_locking
skip_name_resolve
log_bin                         = /mariadb/logs/mariadb01/mariadb-bin
log_bin_trust_function_creators = on
 
# Logging
log_error                       = /mariadb/dump/mariadb01/mariadb01.err
log_queries_not_using_indexes   = ON
long_query_time                 = 5
slow_query_log                  = ON     # Disabled for production
slow_query_log_file             = /mariadb/dump/mariadb01/mariadb01-slow.log
 
tmpdir                          = /tmp
user                            = mariadb
 
# InnoDB Settings
default_storage_engine          = InnoDB
innodb_buffer_pool_size         = 1G    # Use up to 70-80% of RAM
innodb_file_per_table           = ON
innodb_flush_log_at_trx_commit  = 0
innodb_flush_method             = O_DIRECT
innodb_log_buffer_size          = 16M
innodb_log_file_size            = 512M
innodb_stats_on_metadata        = ON
innodb_read_io_threads          = 64
innodb_write_io_threads         = 64
 
# Query cache
query_cache_size = 10M
query_cache_type = ON
 
[client-server]
port                            = 3316
socket                          = /mariadb/software/mariadb01/conf/mariadb01.sock

I have created below table to load my JSON file:

MariaDB [test]> create table test01 (
    -> id int not null auto_increment,
    -> val json not null,
    ->    primary key (id)
    -> );
Query OK, 0 rows affected (0.004 sec)

MariaDB says JSON datatype is an alias for LOGNTEXT and I was thinking to add a check constraint to ensure I’m inserting valid JSON only but in fact this is done by default with JSON datatype:

MariaDB [test]> show create table test01;
+--------+------------------------------------------------------------------------------------+
| Table  | Create Table                                                                       |
+--------+------------------------------------------------------------------------------------+
| test01 | CREATE TABLE `test01` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `val` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL CHECK (json_valid(`val`)),
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci |
+--------+------------------------------------------------------------------------------------+
1 row in set (0.000 sec)

If you want to reset your table for multiple trials do:

MariaDB [test]> truncate table test01;
Query OK, 0 rows affected (0.006 sec)
 
MariaDB [test]> alter table test01 auto_increment=1;
Query OK, 0 rows affected (0.005 sec)
Records: 0  Duplicates: 0  Warnings: 0

Load JSON into MariaDB only with SQL

My first idea was to be SQL only but obviously LOAD DATA INFILE cannot be used as this is more for CSV files and in my case a row is spread over multiple physical lines in my JSON file…

MariaDB [test]> load data local infile '/tmp/file01.json' into table test01(val);
Query OK, 0 rows affected, 23 warnings (0.001 sec)
Records: 0  Deleted: 0  Skipped: 0  Warnings: 23
 
MariaDB [test]> show warnings;
+---------+------+----------------------------------------------------+
| Level   | Code | Message                                            |
+---------+------+----------------------------------------------------+
| Warning | 4025 | CONSTRAINT `test01.val` failed for `test`.`test01` |
| Warning | 4025 | CONSTRAINT `test01.val` failed for `test`.`test01` |
| Warning | 4025 | CONSTRAINT `test01.val` failed for `test`.`test01` |
| Warning | 4025 | CONSTRAINT `test01.val` failed for `test`.`test01` |
| Warning | 4025 | CONSTRAINT `test01.val` failed for `test`.`test01` |
| Warning | 4025 | CONSTRAINT `test01.val` failed for `test`.`test01` |
| Warning | 4025 | CONSTRAINT `test01.val` failed for `test`.`test01` |
| Warning | 4025 | CONSTRAINT `test01.val` failed for `test`.`test01` |
| Warning | 4025 | CONSTRAINT `test01.val` failed for `test`.`test01` |
| Warning | 4025 | CONSTRAINT `test01.val` failed for `test`.`test01` |
| Warning | 4025 | CONSTRAINT `test01.val` failed for `test`.`test01` |
| Warning | 4025 | CONSTRAINT `test01.val` failed for `test`.`test01` |
| Warning | 4025 | CONSTRAINT `test01.val` failed for `test`.`test01` |
| Warning | 4025 | CONSTRAINT `test01.val` failed for `test`.`test01` |
| Warning | 4025 | CONSTRAINT `test01.val` failed for `test`.`test01` |
| Warning | 4025 | CONSTRAINT `test01.val` failed for `test`.`test01` |
| Warning | 4025 | CONSTRAINT `test01.val` failed for `test`.`test01` |
| Warning | 4025 | CONSTRAINT `test01.val` failed for `test`.`test01` |
| Warning | 4025 | CONSTRAINT `test01.val` failed for `test`.`test01` |
| Warning | 4025 | CONSTRAINT `test01.val` failed for `test`.`test01` |
| Warning | 4025 | CONSTRAINT `test01.val` failed for `test`.`test01` |
| Warning | 4025 | CONSTRAINT `test01.val` failed for `test`.`test01` |
| Warning | 4025 | CONSTRAINT `test01.val` failed for `test`.`test01` |
+---------+------+----------------------------------------------------+
23 rows in set (0.000 sec)

So tried with (make the JSON file readable by everyone):

MariaDB [test]> insert into test01(val) values(load_file('/tmp/file01.json'));
Query OK, 1 row affected (0.003 sec)
 
MariaDB [test]> select * from test01;
+----+------------------------------------------------------------------------+
| id | val                                                                    |
+----+------------------------------------------------------------------------+
|  1 | [
  {
    "id": "6423f77eda3e4733f720bd62",
    "first_name": "Suarez",
    "last_name": "Hurst",
    "company": "Ohmnet",
    "email": "suarez.hurst@ohmnet.cymru"
  },
  {
    "id": "6423f77e05169d9c0a7e0f25",
    "first_name": "Dickson",
    "last_name": "Wolf",
    "company": "Ceprene",
    "email": "dickson.wolf@ceprene.sexy"
  },
  {
    "id": "6423f77e2d0484b76b85bd29",
    "first_name": "Cynthia",
    "last_name": "Rodriquez",
    "company": "Rubadub",
    "email": "cynthia.rodriquez@rubadub.guge"
  }
] |
+----+------------------------------------------------------------------------+
1 row in set (0.000 sec)

JSON is loaded but the result is not what I really wanted as now all queries are relatively complex:

MariaDB [test]> select json_valid(val) from test01;
+-----------------+
| json_valid(val) |
+-----------------+
|               1 |
+-----------------+
1 row in set (0.000 sec)
 
MariaDB [test]> select json_length(val) from test01;
+------------------+
| json_length(val) |
+------------------+
|                3 |
+------------------+
1 row in set (0.000 sec)
 
MariaDB [test]> select json_extract(val,'$[0]') from test01;
+---------------------------------------------------------------------------------------------------------------------------------------------+
| json_extract(val,'$[0]')                                                                                                                    |
+---------------------------------------------------------------------------------------------------------------------------------------------+
| {"id": "6423f77eda3e4733f720bd62", "first_name": "Suarez", "last_name": "Hurst", "company": "Ohmnet", "email": "suarez.hurst@ohmnet.cymru"} |
+---------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.000 sec)
 
MariaDB [test]> select json_value(json_extract(val,'$[0]'),'$.first_name') from test01;
+-----------------------------------------------------+
| json_value(json_extract(val,'$[0]'),'$.first_name') |
+-----------------------------------------------------+
| Suarez                                              |
+-----------------------------------------------------+
1 row in set (0.000 sec)

Also I have not been able to iterate in a procedural loop, below code is NOT working:

MariaDB [test]> delimiter //
MariaDB [test]> for i in 0..2
    -> do
    ->   select json_value(json_extract(val,'$['+i+']'),'$.first_name') from test01;
    -> end for;
    -> //
+---------------------------------------------------------+
| json_value(JSON_extract(val,'$['+i+']'),'$.first_name') |
+---------------------------------------------------------+
| NULL                                                    |
+---------------------------------------------------------+
1 row in set (0.000 sec)
 
+---------------------------------------------------------+
| json_value(JSON_extract(val,'$['+i+']'),'$.first_name') |
+---------------------------------------------------------+
| NULL                                                    |
+---------------------------------------------------------+
1 row in set (0.001 sec)
 
+---------------------------------------------------------+
| json_value(JSON_extract(val,'$['+i+']'),'$.first_name') |
+---------------------------------------------------------+
| NULL                                                    |
+---------------------------------------------------------+
1 row in set (0.001 sec)
 
Query OK, 0 rows affected, 3 warnings (0.001 sec)

Load JSON into MariaDB with Python

I came to conclusion that to have something clean I had to use a more advanced programming language than SQL, so decided to use Python (3.11.2) that I’m currently using a lot with Databricks. I have it on my laptop with Miniconda.

In my Miniconda I have created a dedicated environment where I have installed the official MariaDB Python connector:

$ conda-env list
$ conda create -n mariadb
$ conda activate mariadb
$ conda install pip
$ pip install mariadb
Collecting mariadb
  Downloading mariadb-1.1.6-cp311-cp311-win_amd64.whl (193 kB)
     ------------------------------------ 193.1/193.1 kB 615.8 kB/s eta 0:00:00
Collecting packaging
  Downloading packaging-23.0-py3-none-any.whl (42 kB)
     ---------------------------------------- 42.7/42.7 kB 1.0 MB/s eta 0:00:00
Installing collected packages: packaging, mariadb
Successfully installed mariadb-1.1.6 packaging-23.0

I have written below classic code using execute() function. Do not forget the commit() function at the end as autocommit is off by default and you see nothing if missing (real life experience):

import os, json, mariadb
 
# read JSON file which is in the next parent folder
file = os.path.abspath('./') + "/file01.json"
json_data=open(file).read()
# json_obj = json.dumps(json.loads(json_data))
json_obj = json.loads(json_data)
 
# Connect to MariaDB Platform
try:
  connection = mariadb.connect(
    user="yjaquier",
    password="secure_password",
    host="euls20092.sgp.st.com",
    port=3316,
    database="test"
  )
except mariadb.Error as e:
  print(f"Error connecting to MariaDB Platform: {e}")
  exit(1)
 
# Get Cursor
cursor = connection.cursor()
 
for json_item in json_obj:
  print(json.dumps(json_item))
  sql="INSERT INTO test01 (val) VALUES (?)"
  data=(json.dumps(json_item),)
  try:
    cursor.execute(sql, data)
  except mariadb.Error as e:
    print(f"Error inserting in MariaDB Platform: {e}")
 
# Commit
connection.commit()
# Close cursor and connection
cursor.close()
connection.close()

There is a so called more efficient version with executemany() function:

import os, json, mariadb
 
# read JSON file which is in the next parent folder
file = os.path.abspath('./') + "/file01.json"
json_data=open(file).read()
# json_obj = json.dumps(json.loads(json_data))
json_obj = json.loads(json_data)
 
# Connect to MariaDB Platform
try:
    connection = mariadb.connect(
        user="yjaquier",
        password="secure_password",
        host="euls20092.sgp.st.com",
        port=3316,
        database="test"
    )
except mariadb.Error as e:
    print(f"Error connecting to MariaDB Platform: {e}")
    exit(1)
 
# Get Cursor
cursor = connection.cursor()
 
data=[]
sql="INSERT INTO test01 (val) VALUES (?)"
 
for json_item in json_obj:
  print(json.dumps(json_item))
  data.append((json.dumps(json_item),))
 
try:
  cursor.executemany(sql, data)
except mariadb.Error as e:
  print(f"Error inserting in MariaDB Platform: {e}")
 
# Commit
connection.commit()
# Close cursor and connection
cursor.close()
connection.close()

I have now (as I initially expected) one row per JSON item of the array in my JSON file. And all JSON queries are now much more fluid:

MariaDB [test]> select * from test01;
+----+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | val                                                                                                                                                    |
+----+--------------------------------------------------------------------------------------------------------------------------------------------------------+
|  1 | {"id": "6423f77eda3e4733f720bd62", "first_name": "Suarez", "last_name": "Hurst", "company": "Ohmnet", "email": "suarez.hurst@ohmnet.cymru"}            |
|  2 | {"id": "6423f77e05169d9c0a7e0f25", "first_name": "Dickson", "last_name": "Wolf", "company": "Ceprene", "email": "dickson.wolf@ceprene.sexy"}           |
|  3 | {"id": "6423f77e2d0484b76b85bd29", "first_name": "Cynthia", "last_name": "Rodriquez", "company": "Rubadub", "email": "cynthia.rodriquez@rubadub.guge"} |
+----+--------------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.000 sec)
 
MariaDB [test]> select json_value(val,'$.first_name') from test01;
+--------------------------------+
| json_value(val,'$.first_name') |
+--------------------------------+
| Suarez                         |
| Dickson                        |
| Cynthia                        |
+--------------------------------+
3 rows in set (0.000 sec)

References

About Post Author

Share the knowledge!
This entry was posted in MariaDB and tagged , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>