Preamble
I was contacted about how to load a JSON file into a MariaDB table. I initially thought it would be a five-minute job, but in the end it was not that straightforward, and I have not found many good blog posts about it, hence this small article…
To generate a dummy JSON file to load, I used one of the first search results I found: JSON Generator – Tool for generating random data (Mockaroo is also a good fit). I used the code below to generate what I wanted:
JG.repeat(3, 3, {
  id: JG.objectId(),
  first_name: JG.firstName(),
  last_name: JG.lastName(),
  company: JG.company(),
  email() {
    return (
      _.snakeCase(this.first_name) +
      '.' +
      _.snakeCase(this.last_name) +
      '@' +
      this.company +
      JG.domainZone()
    ).toLowerCase();
  },
});
This gave me this dummy JSON file to load:
[
  {
    "id": "6423f77eda3e4733f720bd62",
    "first_name": "Suarez",
    "last_name": "Hurst",
    "company": "Ohmnet",
    "email": "suarez.hurst@ohmnet.cymru"
  },
  {
    "id": "6423f77e05169d9c0a7e0f25",
    "first_name": "Dickson",
    "last_name": "Wolf",
    "company": "Ceprene",
    "email": "dickson.wolf@ceprene.sexy"
  },
  {
    "id": "6423f77e2d0484b76b85bd29",
    "first_name": "Cynthia",
    "last_name": "Rodriquez",
    "company": "Rubadub",
    "email": "cynthia.rodriquez@rubadub.guge"
  }
]
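Before loading anything, a quick client-side sanity check with Python's standard json module confirms the file parses; a minimal sketch, with the array trimmed to one record for brevity:

```python
import json

# The generated array, trimmed to one record, as it would be read
# from /tmp/file01.json
json_data = """
[
  {"id": "6423f77eda3e4733f720bd62", "first_name": "Suarez",
   "last_name": "Hurst", "company": "Ohmnet",
   "email": "suarez.hurst@ohmnet.cymru"}
]
"""

# json.loads() raises ValueError on malformed input, so a successful
# parse doubles as a validity check before any database work
records = json.loads(json_data)
print(len(records), records[0]["first_name"])
```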
My (powerful) test server has 12 x86 cores and 64 GB of memory, running Red Hat Enterprise Linux Server release 7.9 (Maipo) and MariaDB Community 10.6.12, where I have allocated 1 GB to InnoDB.
Preparation
I used my MOCA filesystem layout and the my.cnf below:
[server]
# Primary variables
basedir = /mariadb/software/mariadb01
datadir = /mariadb/data01/mariadb01
max_allowed_packet = 256M
max_connect_errors = 1000000
pid_file = /mariadb/software/mariadb01/conf/mariadb01.pid
skip_external_locking
skip_name_resolve
log_bin = /mariadb/logs/mariadb01/mariadb-bin
log_bin_trust_function_creators = on

# Logging
log_error = /mariadb/dump/mariadb01/mariadb01.err
log_queries_not_using_indexes = ON
long_query_time = 5
slow_query_log = ON # Disabled for production
slow_query_log_file = /mariadb/dump/mariadb01/mariadb01-slow.log
tmpdir = /tmp
user = mariadb

# InnoDB Settings
default_storage_engine = InnoDB
innodb_buffer_pool_size = 1G # Use up to 70-80% of RAM
innodb_file_per_table = ON
innodb_flush_log_at_trx_commit = 0
innodb_flush_method = O_DIRECT
innodb_log_buffer_size = 16M
innodb_log_file_size = 512M
innodb_stats_on_metadata = ON
innodb_read_io_threads = 64
innodb_write_io_threads = 64

# Query cache
query_cache_size = 10M
query_cache_type = ON

[client-server]
port = 3316
socket = /mariadb/software/mariadb01/conf/mariadb01.sock
I created the table below to load my JSON file:
MariaDB [test]> create table test01 (
    -> id int not null auto_increment,
    -> val json not null,
    -> primary key (id)
    -> );
Query OK, 0 rows affected (0.004 sec)
MariaDB says the JSON datatype is an alias for LONGTEXT. I was thinking of adding a check constraint to ensure I insert only valid JSON, but in fact this is done by default with the JSON datatype:
MariaDB [test]> show create table test01;
+--------+------------------------------------------------------------------------------------+
| Table  | Create Table                                                                       |
+--------+------------------------------------------------------------------------------------+
| test01 | CREATE TABLE `test01` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `val` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL CHECK (json_valid(`val`)),
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci |
+--------+------------------------------------------------------------------------------------+
1 row in set (0.000 sec)
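A rough client-side counterpart of that json_valid() check can be sketched with the standard json module, for instance to filter records before sending them over the wire (MariaDB's validator and Python's parser may disagree on edge cases, so this is only a pre-check, not a replacement for the CHECK constraint):

```python
import json

def is_valid_json(text: str) -> bool:
    # Rough client-side counterpart of MariaDB's json_valid(): a
    # successful parse suggests the INSERT will pass the CHECK constraint
    try:
        json.loads(text)
        return True
    except ValueError:
        return False

print(is_valid_json('{"first_name": "Suarez"}'))  # True
print(is_valid_json('{broken'))                   # False
```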
If you want to reset your table between trials, do:
MariaDB [test]> truncate table test01;
Query OK, 0 rows affected (0.006 sec)

MariaDB [test]> alter table test01 auto_increment=1;
Query OK, 0 rows affected (0.005 sec)
Records: 0  Duplicates: 0  Warnings: 0
Load JSON into MariaDB only with SQL
My first idea was to stay SQL-only, but obviously LOAD DATA INFILE cannot be used, as it is designed for CSV-like files and, in my case, a single logical row is spread over multiple physical lines in my JSON file…
MariaDB [test]> load data local infile '/tmp/file01.json' into table test01(val);
Query OK, 0 rows affected, 23 warnings (0.001 sec)
Records: 0  Deleted: 0  Skipped: 0  Warnings: 23

MariaDB [test]> show warnings;
+---------+------+----------------------------------------------------+
| Level   | Code | Message                                            |
+---------+------+----------------------------------------------------+
| Warning | 4025 | CONSTRAINT `test01.val` failed for `test`.`test01` |
| Warning | 4025 | CONSTRAINT `test01.val` failed for `test`.`test01` |
| Warning | 4025 | CONSTRAINT `test01.val` failed for `test`.`test01` |
| ...     | ...  | (same warning repeated for each input line)        |
+---------+------+----------------------------------------------------+
23 rows in set (0.000 sec)
So I tried with LOAD_FILE() instead (after making the JSON file readable by everyone):
MariaDB [test]> insert into test01(val) values(load_file('/tmp/file01.json'));
Query OK, 1 row affected (0.003 sec)

MariaDB [test]> select * from test01;
+----+------------------------------------------------------------------------+
| id | val                                                                    |
+----+------------------------------------------------------------------------+
|  1 | [ { "id": "6423f77eda3e4733f720bd62", "first_name": "Suarez", "last_name": "Hurst", "company": "Ohmnet", "email": "suarez.hurst@ohmnet.cymru" }, { "id": "6423f77e05169d9c0a7e0f25", "first_name": "Dickson", "last_name": "Wolf", "company": "Ceprene", "email": "dickson.wolf@ceprene.sexy" }, { "id": "6423f77e2d0484b76b85bd29", "first_name": "Cynthia", "last_name": "Rodriquez", "company": "Rubadub", "email": "cynthia.rodriquez@rubadub.guge" } ] |
+----+------------------------------------------------------------------------+
1 row in set (0.000 sec)
The JSON is loaded, but the result is not really what I wanted: the whole file sits in a single row, so all queries are now relatively complex:
MariaDB [test]> select json_valid(val) from test01;
+-----------------+
| json_valid(val) |
+-----------------+
|               1 |
+-----------------+
1 row in set (0.000 sec)

MariaDB [test]> select json_length(val) from test01;
+------------------+
| json_length(val) |
+------------------+
|                3 |
+------------------+
1 row in set (0.000 sec)

MariaDB [test]> select json_extract(val,'$[0]') from test01;
+---------------------------------------------------------------------------------------------------------------------------------------------+
| json_extract(val,'$[0]')                                                                                                                    |
+---------------------------------------------------------------------------------------------------------------------------------------------+
| {"id": "6423f77eda3e4733f720bd62", "first_name": "Suarez", "last_name": "Hurst", "company": "Ohmnet", "email": "suarez.hurst@ohmnet.cymru"} |
+---------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.000 sec)

MariaDB [test]> select json_value(json_extract(val,'$[0]'),'$.first_name') from test01;
+-----------------------------------------------------+
| json_value(json_extract(val,'$[0]'),'$.first_name') |
+-----------------------------------------------------+
| Suarez                                              |
+-----------------------------------------------------+
1 row in set (0.000 sec)
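For comparison, once the whole array is parsed client-side, the same navigation is plain indexing; a minimal Python sketch mirroring json_extract('$[0]') and json_value('$.first_name'), with the data trimmed to one record:

```python
import json

# The single-row content of the val column, trimmed to one record
val = '[{"id": "6423f77eda3e4733f720bd62", "first_name": "Suarez"}]'
doc = json.loads(val)

# json_extract(val, '$[0]'): the first object of the array
first_item = doc[0]

# json_value(json_extract(val, '$[0]'), '$.first_name'): a scalar
print(first_item["first_name"])  # Suarez
```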
I have also not been able to iterate over the array in a procedural loop; the code below does NOT work:
MariaDB [test]> delimiter //
MariaDB [test]> for i in 0..2
    -> do
    -> select json_value(json_extract(val,'$['+i+']'),'$.first_name') from test01;
    -> end for;
    -> //
+---------------------------------------------------------+
| json_value(JSON_extract(val,'$['+i+']'),'$.first_name') |
+---------------------------------------------------------+
| NULL                                                    |
+---------------------------------------------------------+
1 row in set (0.000 sec)

+---------------------------------------------------------+
| json_value(JSON_extract(val,'$['+i+']'),'$.first_name') |
+---------------------------------------------------------+
| NULL                                                    |
+---------------------------------------------------------+
1 row in set (0.001 sec)

+---------------------------------------------------------+
| json_value(JSON_extract(val,'$['+i+']'),'$.first_name') |
+---------------------------------------------------------+
| NULL                                                    |
+---------------------------------------------------------+
1 row in set (0.001 sec)

Query OK, 0 rows affected, 3 warnings (0.001 sec)
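A likely culprit: in MariaDB SQL, + performs numeric addition, not string concatenation, so '$['+i+']' never produces '$[0]' and the invalid path yields NULL; the path would have to be built with CONCAT('$[', i, ']') instead. The intended iteration is easy to sketch client-side in Python:

```python
import json

val = json.loads(
    '[{"first_name": "Suarez"}, {"first_name": "Dickson"},'
    ' {"first_name": "Cynthia"}]'
)

# What the SQL FOR loop was meant to do: one first_name per array index
for i in range(3):
    path = "$[" + str(i) + "]"  # the string CONCAT('$[', i, ']') builds
    print(path, "->", val[i]["first_name"])
```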
Load JSON into MariaDB with Python
I came to the conclusion that, to get something clean, I had to use a more advanced programming language than SQL, so I decided to use Python (3.11.2), which I currently use a lot with Databricks. I have it on my laptop with Miniconda.
In my Miniconda I created a dedicated environment where I installed the official MariaDB Python connector:
$ conda-env list
$ conda create -n mariadb
$ conda activate mariadb
$ conda install pip
$ pip install mariadb
Collecting mariadb
  Downloading mariadb-1.1.6-cp311-cp311-win_amd64.whl (193 kB)
     ------------------------------------ 193.1/193.1 kB 615.8 kB/s eta 0:00:00
Collecting packaging
  Downloading packaging-23.0-py3-none-any.whl (42 kB)
     ---------------------------------------- 42.7/42.7 kB 1.0 MB/s eta 0:00:00
Installing collected packages: packaging, mariadb
Successfully installed mariadb-1.1.6 packaging-23.0
I wrote the classic code below using the execute() function. Do not forget the commit() call at the end: autocommit is off by default and, if it is missing, you will see nothing in the table (real-life experience):
import os, json, mariadb

# Read the JSON file located in the script's folder
file = os.path.abspath('./') + "/file01.json"
json_data = open(file).read()
# json_obj = json.dumps(json.loads(json_data))
json_obj = json.loads(json_data)

# Connect to MariaDB Platform
try:
    connection = mariadb.connect(
        user="yjaquier",
        password="secure_password",
        host="euls20092.sgp.st.com",
        port=3316,
        database="test"
    )
except mariadb.Error as e:
    print(f"Error connecting to MariaDB Platform: {e}")
    exit(1)

# Get cursor
cursor = connection.cursor()

# One INSERT per item of the JSON array
for json_item in json_obj:
    print(json.dumps(json_item))
    sql = "INSERT INTO test01 (val) VALUES (?)"
    data = (json.dumps(json_item),)
    try:
        cursor.execute(sql, data)
    except mariadb.Error as e:
        print(f"Error inserting in MariaDB Platform: {e}")

# Commit (autocommit is off by default)
connection.commit()

# Close cursor and connection
cursor.close()
connection.close()
There is a supposedly more efficient version using the executemany() function:
import os, json, mariadb

# Read the JSON file located in the script's folder
file = os.path.abspath('./') + "/file01.json"
json_data = open(file).read()
# json_obj = json.dumps(json.loads(json_data))
json_obj = json.loads(json_data)

# Connect to MariaDB Platform
try:
    connection = mariadb.connect(
        user="yjaquier",
        password="secure_password",
        host="euls20092.sgp.st.com",
        port=3316,
        database="test"
    )
except mariadb.Error as e:
    print(f"Error connecting to MariaDB Platform: {e}")
    exit(1)

# Get cursor
cursor = connection.cursor()

# Build one parameter tuple per item, then insert them all at once
data = []
sql = "INSERT INTO test01 (val) VALUES (?)"
for json_item in json_obj:
    print(json.dumps(json_item))
    data.append((json.dumps(json_item),))
try:
    cursor.executemany(sql, data)
except mariadb.Error as e:
    print(f"Error inserting in MariaDB Platform: {e}")

# Commit (autocommit is off by default)
connection.commit()

# Close cursor and connection
cursor.close()
connection.close()
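Since the mariadb connector follows the Python DB-API, the executemany() pattern can be exercised without a server using the standard library's sqlite3 module (used here purely for illustration; the cursor methods and the ? placeholder are the same):

```python
import json
import sqlite3

records = [
    {"first_name": "Suarez"},
    {"first_name": "Dickson"},
    {"first_name": "Cynthia"},
]

# In-memory database standing in for the MariaDB test01 table
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute(
    "CREATE TABLE test01 (id INTEGER PRIMARY KEY AUTOINCREMENT, val TEXT NOT NULL)"
)

# One parameter tuple per row, sent in a single executemany() call
data = [(json.dumps(item),) for item in records]
cursor.executemany("INSERT INTO test01 (val) VALUES (?)", data)
connection.commit()

cursor.execute("SELECT COUNT(*) FROM test01")
print(cursor.fetchone()[0])  # 3
```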
I now have (as I initially expected) one row per JSON item of the array in my JSON file, and all JSON queries are much more fluid:
MariaDB [test]> select * from test01;
+----+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | val                                                                                                                                                    |
+----+--------------------------------------------------------------------------------------------------------------------------------------------------------+
|  1 | {"id": "6423f77eda3e4733f720bd62", "first_name": "Suarez", "last_name": "Hurst", "company": "Ohmnet", "email": "suarez.hurst@ohmnet.cymru"}            |
|  2 | {"id": "6423f77e05169d9c0a7e0f25", "first_name": "Dickson", "last_name": "Wolf", "company": "Ceprene", "email": "dickson.wolf@ceprene.sexy"}           |
|  3 | {"id": "6423f77e2d0484b76b85bd29", "first_name": "Cynthia", "last_name": "Rodriquez", "company": "Rubadub", "email": "cynthia.rodriquez@rubadub.guge"} |
+----+--------------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.000 sec)

MariaDB [test]> select json_value(val,'$.first_name') from test01;
+--------------------------------+
| json_value(val,'$.first_name') |
+--------------------------------+
| Suarez                         |
| Dickson                        |
| Cynthia                        |
+--------------------------------+
3 rows in set (0.000 sec)