Spark dynamic allocation how to configure and use it (posted on October 22, 2020, updated on October 13, 2020, by Yannick Jaquier). Table of contents: Preamble; Spark dynamic allocation setup; Spark dynamic allocation testing; References.
On the importance to have good Hive statistics on your tables (posted on March 23, 2020, updated on March 20, 2020, by Yannick Jaquier). Table of contents: Preamble; The problematic queries; Problem has gone with good Hive statistics; References.
Hive fetch task really improving response time by bypassing MapReduce ? (posted on November 24, 2019, updated on February 28, 2020, by Yannick Jaquier). Table of contents: Preamble; Identical queries not same response time; Partitions statistics and concatenation; Fetch task performing worst than MapReduce ?; To go further; References.
ORC versus Parquet compression and response time (posted on August 2, 2019, updated on February 20, 2020, by Yannick Jaquier). Table of contents: Preamble; ORC versus Parquet compression; ORC versus Parquet response time; References.
How to identify table fragmentation and remove it ? (posted on December 18, 2018, updated on October 16, 2019, by Yannick Jaquier). Table of contents: Preamble; Legacy situation; Newest methods to estimate tables size; Table fragmentation identification; Move, shrink or export/import ?; References.
How to non intrusively find index rebuild or shrink candidates ? (posted on November 23, 2018, updated on November 28, 2019, by Yannick Jaquier). Table of contents: Preamble; Legacy situation; Newest methods to estimate indexes size; Index rebuild candidates list; To go further; Rebuild or shrink ?; References.