Hadoop backup: what parts to backup and how to do it ?
Posted on September 27, 2019, updated on May 6, 2024 by Yannick Jaquier
Table of contents: Preamble; Mandatory parts to backup; Configuration files; Ambari server meta info; NameNode metadata; Ambari repository database; Backup with Point In Time Recovery (PITR) capability; Backup with no PITR capability; Hive repository database; Backup with Point In Time Recovery (PITR) capability; Backup with no PITR capability; Not mandatory parts to backup; JournalNodes; Parts nice to backup; HDFS; References
HDFS capacity planning computation and analysis
Posted on August 30, 2019, updated on April 22, 2024 by Yannick Jaquier
Table of contents: Preamble; HDFS capacity planning first estimation; HDFS snapshot situation; After delete of HDFS snapshot; References
ORC versus Parquet compression and response time
Posted on August 2, 2019, updated on February 20, 2020 by Yannick Jaquier
Table of contents: Preamble; ORC versus Parquet compression; ORC versus Parquet response time; References
HDFS balancer options to speed up balance operations
Posted on July 5, 2019, updated on May 6, 2024 by Yannick Jaquier
Table of contents: Preamble; HDFS Balancer; References
JournalNode Web UI time out critical error on port 8480
Posted on June 7, 2019, updated on April 17, 2024 by Yannick Jaquier
Table of contents: Preamble; JournalNode Web UI time out resolution; References
YARN command line for low level management of applications
Posted on May 10, 2019, updated on May 6, 2024 by Yannick Jaquier
Table of contents: Preamble; Problematic situation and YARN command line first trial; YARN command line to the rescue; References