Hive fetch task really improving response time by bypassing MapReduce ?

Preamble

Our internal customers have started to report erratic response time for similar queries in their Spotfire dashboard. They claimed the queries were almost the same but response time was ten times slower for some of them…

Isn’t it my first performance issue with Hadoop ? Yessss it is !!!!

We are running HDP-2.6 (2.6.4.0-91) and Hive release in this edition is 1.2.1000.

Identical queries not same response time

The first job was to extract the two “similar” queries from the Spotfire dashboard and compare them. I have written similar between quotes because what is similar from user perspective can be really different in real life. To execute them and find myself in the flow of all queries in log files I have used the same trick as for Oracle, means adding a comment with my first name.

Query01:

SELECT --Yannick01
lot_id,
wafer_id,
flow_id,
param_id,
start_t,
finish_t,
param_name,
param_unit,
param_low_limit,
param_high_limit,
nb_dies_tested,
nb_dies_failed
FROM prod_spotfire_refined.tbl_bin_stat_orc
WHERE fab = "C2WF"
AND lot_partition IN ("Q842")
AND lot_id IN ("Q842889")
AND wafer_id IN ('Q842889-01E3','Q842889-02D6','Q842889-03D1','Q842889-04C4','Q842889-05B7','Q842889-06B2','Q842889-07A5','Q842889-08A0','Q842889-09G6',
'Q842889-10A0','Q842889-11G6','Q842889-12G1','Q842889-13F4','Q842889-14E7','Q842889-15E2','Q842889-16D5','Q842889-17D0','Q842889-18C3','Q842889-19B6',
'Q842889-20C3','Q842889-21B6','Q842889-22B1','Q842889-23A4','Q842889-24H2')
AND flow_id IN ("EWS1")
AND start_t IN ('2019.01.04-19:50:55','2019.01.05-02:21:26','2019.01.05-08:33:59','2019.01.05-14:06:24','2019.01.05-19:35:23','2019.01.05-22:25:30',
'2019.01.06-03:49:53','2019.01.06-09:19:52','2019.01.06-14:23:47','2019.01.06-19:27:35','2019.01.07-00:47:59','2019.01.07-06:37:21','2019.01.07-11:15:14',
'2019.01.07-15:56:55','2019.01.07-20:05:50','2019.01.07-22:48:09','2019.01.08-04:37:37','2019.01.08-08:48:26','2019.01.08-13:51:34','2019.01.08-18:31:38',
'2019.01.09-00:01:41','2019.01.09-04:11:44','2019.01.09-09:45:08','2019.01.09-13:47:11')
AND finish_t IN ('2019.01.05-01:52:02','2019.01.05-08:30:52','2019.01.05-14:01:33','2019.01.05-19:32:20','2019.01.05-22:22:15','2019.01.06-03:46:46',
'2019.01.06-09:16:43','2019.01.06-14:20:42','2019.01.06-19:24:22','2019.01.07-00:44:48','2019.01.07-06:34:15','2019.01.07-11:12:02','2019.01.07-15:53:45',
'2019.01.07-20:02:41','2019.01.07-22:26:37','2019.01.08-04:34:30','2019.01.08-08:45:20','2019.01.08-13:48:27','2019.01.08-18:28:33','2019.01.08-23:58:34',
'2019.01.09-04:08:41','2019.01.09-09:41:57','2019.01.09-13:44:03','2019.01.09-18:01:54')
AND hbin_number IN ("9")
AND sbin_number IN ("403");

Query02:

SELECT --Yannick02
lot_id,
wafer_id,
flow_id,
param_id,
start_t,
finish_t,
param_name,
param_unit,
param_low_limit,
param_high_limit,
nb_dies_tested,
nb_dies_failed
FROM prod_spotfire_refined.tbl_bin_stat_orc
WHERE fab = "C2WF"
AND lot_partition IN ("Q840")
AND lot_id IN ("Q840401")
AND wafer_id IN ('Q840401-01E6','Q840401-02E1','Q840401-03D4','Q840401-04C7','Q840401-05C2','Q840401-06B5','Q840401-07B0','Q840401-08A3','Q840401-09H1',
'Q840401-10A3','Q840401-11H1','Q840401-12G4','Q840401-13F7','Q840401-14F2','Q840401-15E5','Q840401-16E0','Q840401-17D3','Q840401-18C6','Q840401-19C1',
'Q840401-20C6','Q840401-21C1','Q840401-22B4','Q840401-23A7','Q840401-24A2','Q840401-25H0')
AND flow_id IN ("EWS1")
AND start_t IN ('2018.12.27-10:42:54','2018.12.27-12:01:57','2018.12.27-13:18:47','2018.12.27-14:36:31','2018.12.27-15:55:57','2018.12.27-17:13:42',
'2018.12.27-18:31:34','2018.12.27-19:49:27','2018.12.27-21:05:30','2018.12.27-22:23:36','2018.12.27-23:40:11','2018.12.28-00:56:40','2018.12.28-02:15:51',
'2018.12.28-03:41:23','2018.12.28-04:58:02','2018.12.28-06:16:11','2018.12.28-07:34:40','2018.12.28-08:55:29','2018.12.28-10:13:25','2018.12.28-11:30:34',
'2018.12.28-12:48:08','2018.12.28-14:06:12','2018.12.28-15:23:00','2018.12.28-16:39:50','2018.12.28-17:56:57')
AND finish_t IN ('2018.12.27-12:00:40','2018.12.27-13:17:30','2018.12.27-14:35:13','2018.12.27-15:54:39','2018.12.27-17:12:26','2018.12.27-18:30:16',
'2018.12.27-19:48:12','2018.12.27-21:04:15','2018.12.27-22:22:19','2018.12.27-23:38:53','2018.12.28-00:55:24','2018.12.28-02:14:34','2018.12.28-03:32:41',
'2018.12.28-04:56:43','2018.12.28-06:14:52','2018.12.28-07:33:21','2018.12.28-08:54:12','2018.12.28-10:12:06','2018.12.28-11:29:17','2018.12.28-12:46:50',
'2018.12.28-14:04:55','2018.12.28-15:21:43','2018.12.28-16:38:30','2018.12.28-17:55:38','2018.12.28-19:13:18')
AND hbin_number IN ("1")
AND sbin_number IN ("1");

At this stage I have to say that queries are pretty similar and our users might be right as the response time should not be that different between the two queries…

Query02 returns 682 rows in around 330 seconds. Query01 returns 38,664 rows in around 25 seconds (16 seconds to really execute the query as you can see below, the rest is network transfert).

So we have a factor of 13 times less efficient to return 57 times less row

Partitions statistics and concatenation

I have started by checking that the two involved partitions are almost the same from statistics point of view and that compaction has been done for both of them.

From global statistics there is a difference but not that much. Number of rows and total partition size are really close. So close that it cannot explain the factor ten in response time:

0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc partition(fab = "C2WF", lot_partition="Q842");
+-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+-----------------------------+--+
|             col_name              |                                                          data_type                                                          |           comment           |
+-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+-----------------------------+--+
| # col_name                        | data_type                                                                                                                   | comment                     |
|                                   | NULL                                                                                                                        | NULL                        |
| lot_id                            | string                                                                                                                      |                             |
| wafer_id                          | string                                                                                                                      |                             |
| flow_id                           | string                                                                                                                      |                             |
| start_t                           | string                                                                                                                      |                             |
| finish_t                          | string                                                                                                                      |                             |
| hbin_number                       | int                                                                                                                         |                             |
| hbin_name                         | string                                                                                                                      |                             |
| sbin_number                       | int                                                                                                                         |                             |
| sbin_name                         | string                                                                                                                      |                             |
| param_id                          | string                                                                                                                      |                             |
| param_name                        | string                                                                                                                      |                             |
| param_unit                        | string                                                                                                                      |                             |
| param_low_limit                   | float                                                                                                                       |                             |
| param_high_limit                  | float                                                                                                                       |                             |
| nb_dies_tested                    | int                                                                                                                         |                             |
| nb_dies_failed                    | int                                                                                                                         |                             |
| nb_dies_good                      | int                                                                                                                         |                             |
| ingestion_date                    | string                                                                                                                      |                             |
|                                   | NULL                                                                                                                        | NULL                        |
| # Partition Information           | NULL                                                                                                                        | NULL                        |
| # col_name                        | data_type                                                                                                                   | comment                     |
|                                   | NULL                                                                                                                        | NULL                        |
| fab                               | string                                                                                                                      |                             |
| lot_partition                     | string                                                                                                                      |                             |
|                                   | NULL                                                                                                                        | NULL                        |
| # Detailed Partition Information  | NULL                                                                                                                        | NULL                        |
| Partition Value:                  | [C2WF, Q842]                                                                                                                | NULL                        |
| Database:                         | prod_spotfire_refined                                                                                                       | NULL                        |
| Table:                            | tbl_bin_stat_orc                                                                                                            | NULL                        |
| CreateTime:                       | Mon Feb 11 13:03:15 CET 2019                                                                                                | NULL                        |
| LastAccessTime:                   | UNKNOWN                                                                                                                     | NULL                        |
| Protect Mode:                     | None                                                                                                                        | NULL                        |
| Location:                         | hdfs://ManufacturingDataLakeHdfs/apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842  | NULL                        |
| Partition Parameters:             | NULL                                                                                                                        | NULL                        |
|                                   | COLUMN_STATS_ACCURATE                                                                                                       | {\"BASIC_STATS\":\"true\"}  |
|                                   | numFiles                                                                                                                    | 14                          |
|                                   | numRows                                                                                                                     | 143216514                   |
|                                   | rawDataSize                                                                                                                 | 151433505731                |
|                                   | totalSize                                                                                                                   | 1353827219                  |
|                                   | transient_lastDdlTime                                                                                                       | 1554163118                  |
|                                   | NULL                                                                                                                        | NULL                        |
| # Storage Information             | NULL                                                                                                                        | NULL                        |
| SerDe Library:                    | org.apache.hadoop.hive.ql.io.orc.OrcSerde                                                                                   | NULL                        |
| InputFormat:                      | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat                                                                             | NULL                        |
| OutputFormat:                     | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat                                                                            | NULL                        |
| Compressed:                       | No                                                                                                                          | NULL                        |
| Num Buckets:                      | -1                                                                                                                          | NULL                        |
| Bucket Columns:                   | []                                                                                                                          | NULL                        |
| Sort Columns:                     | []                                                                                                                          | NULL                        |
| Storage Desc Params:              | NULL                                                                                                                        | NULL                        |
|                                   | serialization.format                                                                                                        | 1                           |
+-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+-----------------------------+--+
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc partition(fab = "C2WF", lot_partition="Q840");
+-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+-----------------------------+--+
|             col_name              |                                                          data_type                                                          |           comment           |
+-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+-----------------------------+--+
| # col_name                        | data_type                                                                                                                   | comment                     |
|                                   | NULL                                                                                                                        | NULL                        |
| lot_id                            | string                                                                                                                      |                             |
| wafer_id                          | string                                                                                                                      |                             |
| flow_id                           | string                                                                                                                      |                             |
| start_t                           | string                                                                                                                      |                             |
| finish_t                          | string                                                                                                                      |                             |
| hbin_number                       | int                                                                                                                         |                             |
| hbin_name                         | string                                                                                                                      |                             |
| sbin_number                       | int                                                                                                                         |                             |
| sbin_name                         | string                                                                                                                      |                             |
| param_id                          | string                                                                                                                      |                             |
| param_name                        | string                                                                                                                      |                             |
| param_unit                        | string                                                                                                                      |                             |
| param_low_limit                   | float                                                                                                                       |                             |
| param_high_limit                  | float                                                                                                                       |                             |
| nb_dies_tested                    | int                                                                                                                         |                             |
| nb_dies_failed                    | int                                                                                                                         |                             |
| nb_dies_good                      | int                                                                                                                         |                             |
| ingestion_date                    | string                                                                                                                      |                             |
|                                   | NULL                                                                                                                        | NULL                        |
| # Partition Information           | NULL                                                                                                                        | NULL                        |
| # col_name                        | data_type                                                                                                                   | comment                     |
|                                   | NULL                                                                                                                        | NULL                        |
| fab                               | string                                                                                                                      |                             |
| lot_partition                     | string                                                                                                                      |                             |
|                                   | NULL                                                                                                                        | NULL                        |
| # Detailed Partition Information  | NULL                                                                                                                        | NULL                        |
| Partition Value:                  | [C2WF, Q840]                                                                                                                | NULL                        |
| Database:                         | prod_spotfire_refined                                                                                                       | NULL                        |
| Table:                            | tbl_bin_stat_orc                                                                                                            | NULL                        |
| CreateTime:                       | Mon Feb 11 13:02:12 CET 2019                                                                                                | NULL                        |
| LastAccessTime:                   | UNKNOWN                                                                                                                     | NULL                        |
| Protect Mode:                     | None                                                                                                                        | NULL                        |
| Location:                         | hdfs://ManufacturingDataLakeHdfs/apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840  | NULL                        |
| Partition Parameters:             | NULL                                                                                                                        | NULL                        |
|                                   | COLUMN_STATS_ACCURATE                                                                                                       | {\"BASIC_STATS\":\"true\"}  |
|                                   | numFiles                                                                                                                    | 11                          |
|                                   | numRows                                                                                                                     | 109795564                   |
|                                   | rawDataSize                                                                                                                 | 116612625917                |
|                                   | totalSize                                                                                                                   | 989787753                   |
|                                   | transient_lastDdlTime                                                                                                       | 1554163118                  |
|                                   | NULL                                                                                                                        | NULL                        |
| # Storage Information             | NULL                                                                                                                        | NULL                        |
| SerDe Library:                    | org.apache.hadoop.hive.ql.io.orc.OrcSerde                                                                                   | NULL                        |
| InputFormat:                      | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat                                                                             | NULL                        |
| OutputFormat:                     | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat                                                                            | NULL                        |
| Compressed:                       | No                                                                                                                          | NULL                        |
| Num Buckets:                      | -1                                                                                                                          | NULL                        |
| Bucket Columns:                   | []                                                                                                                          | NULL                        |
| Sort Columns:                     | []                                                                                                                          | NULL                        |
| Storage Desc Params:              | NULL                                                                                                                        | NULL                        |
|                                   | serialization.format                                                                                                        | 1                           |
+-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+-----------------------------+--+

Even if a bit boring and long I have also checked the columns statistics, again not much differences:

0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc param_id partition(fab = "C2WF", lot_partition="Q840");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| param_id                | string                |                       |                       | 0                     | 92393                 | 5.7145                | 10                    |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.445 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc param_id partition(fab = "C2WF", lot_partition="Q842");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| param_id                | string                |                       |                       | 0                     | 169450                | 5.6886                | 10                    |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.457 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc lot_id partition(fab = "C2WF", lot_partition="Q840");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| lot_id                  | string                |                       |                       | 0                     | 290                   | 7.1705                | 10                    |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.407 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc lot_id partition(fab = "C2WF", lot_partition="Q842");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| lot_id                  | string                |                       |                       | 0                     | 172                   | 7.071                 | 10                    |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.416 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc wafer_id partition(fab = "C2WF", lot_partition="Q840");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| wafer_id                | string                |                       |                       | 0                     | 3744                  | 11.9485               | 12                    |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.444 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc wafer_id partition(fab = "C2WF", lot_partition="Q842");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| wafer_id                | string                |                       |                       | 0                     | 6867                  | 11.8972               | 12                    |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.398 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc flow_id partition(fab = "C2WF", lot_partition="Q840");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| flow_id                 | string                |                       |                       | 0                     | 18                    | 4.1747                | 30                    |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.423 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc flow_id partition(fab = "C2WF", lot_partition="Q842");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| flow_id                 | string                |                       |                       | 0                     | 37                    | 4.3368                | 31                    |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.448 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc start_t partition(fab = "C2WF", lot_partition="Q840");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| start_t                 | string                |                       |                       | 0                     | 17811                 | 19.0                  | 19                    |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.425 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc start_t partition(fab = "C2WF", lot_partition="Q842");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| start_t                 | string                |                       |                       | 0                     | 11549                 | 19.0                  | 19                    |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.408 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc finish_t partition(fab = "C2WF", lot_partition="Q840");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| finish_t                | string                |                       |                       | 0                     | 11059                 | 19.0                  | 19                    |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.398 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc finish_t partition(fab = "C2WF", lot_partition="Q842");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| finish_t                | string                |                       |                       | 0                     | 12060                 | 19.0                  | 19                    |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.382 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc hbin_number partition(fab = "C2WF", lot_partition="Q840");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| hbin_number             | int                   | 0                     | 65535                 | 0                     | 61                    |                       |                       |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.356 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc hbin_number partition(fab = "C2WF", lot_partition="Q842");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| hbin_number             | int                   | 0                     | 65535                 | 0                     | 58                    |                       |                       |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.364 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc sbin_number partition(fab = "C2WF", lot_partition="Q840");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| sbin_number             | int                   | 1                     | 65535                 | 0                     | 3288                  |                       |                       |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.417 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc sbin_number partition(fab = "C2WF", lot_partition="Q842");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| sbin_number             | int                   | 0                     | 65535                 | 0                     | 3148                  |                       |                       |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.344 seconds)

The global table and statistics information can be get in a more condensed way using:

0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe extended prod_spotfire_refined.tbl_bin_stat_orc partition(fab = "C2WF", lot_partition="Q840");
 
| Detailed Partition Information  | Partition(values:[C2WF, Q840], dbName:prod_spotfire_refined, tableName:tbl_bin_stat_orc, createTime:1549886532, lastAccessTime:0, sd:StorageDescriptor(cols:[FieldSchema(name:lot_id, type:string, comment:null), FieldSchema(name:wafer_id, type:string, comment:null), FieldSchema(name:flow_id, type:string, comment:null), FieldSchema(name:start_t, type:string, comment:null), FieldSchema(name:finish_t, type:string, comment:null), FieldSchema(name:hbin_number, type:int, comment:null), FieldSchema(name:hbin_name, type:string, comment:null), FieldSchema(name:sbin_number, type:int, comment:null), FieldSchema(name:sbin_name, type:string, comment:null), FieldSchema(name:param_id, type:string, comment:null), FieldSchema(name:param_name, type:string, comment:null), FieldSchema(name:param_unit, type:string, comment:null), FieldSchema(name:param_low_limit, type:float, comment:null), FieldSchema(name:param_high_limit, type:float, comment:null), FieldSchema(name:nb_dies_tested, type:int, comment:null), FieldSchema(name:nb_dies_failed, type:int, comment:null), FieldSchema(name:nb_dies_good, type:int, comment:null), FieldSchema(name:ingestion_date, type:string, comment:null), FieldSchema(name:fab, type:string, comment:null), FieldSchema(name:lot_partition, type:string, comment:null)], location:hdfs://ManufacturingDataLakeHdfs/apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840, inputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.ql.io.orc.OrcSerde, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), parameters:{totalSize=989787753,  numRows=109795564, rawDataSize=116612625917, COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, numFiles=11, transient_lastDdlTime=1554163118})  |                       |
 
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe extended prod_spotfire_refined.tbl_bin_stat_orc partition(fab = "C2WF", lot_partition="Q842");
 
| Detailed Partition Information  | Partition(values:[C2WF, Q842], dbName:prod_spotfire_refined, tableName:tbl_bin_stat_orc, createTime:1549886595, lastAccessTime:0, sd:StorageDescriptor(cols:[FieldSchema(name:lot_id, type:string, comment:null), FieldSchema(name:wafer_id, type:string, comment:null), FieldSchema(name:flow_id, type:string, comment:null), FieldSchema(name:start_t, type:string, comment:null), FieldSchema(name:finish_t, type:string, comment:null), FieldSchema(name:hbin_number, type:int, comment:null), FieldSchema(name:hbin_name, type:string, comment:null), FieldSchema(name:sbin_number, type:int, comment:null), FieldSchema(name:sbin_name, type:string, comment:null), FieldSchema(name:param_id, type:string, comment:null), FieldSchema(name:param_name, type:string, comment:null), FieldSchema(name:param_unit, type:string, comment:null), FieldSchema(name:param_low_limit, type:float, comment:null), FieldSchema(name:param_high_limit, type:float, comment:null), FieldSchema(name:nb_dies_tested, type:int, comment:null), FieldSchema(name:nb_dies_failed, type:int, comment:null), FieldSchema(name:nb_dies_good, type:int, comment:null), FieldSchema(name:ingestion_date, type:string, comment:null), FieldSchema(name:fab, type:string, comment:null), FieldSchema(name:lot_partition, type:string, comment:null)], location:hdfs://ManufacturingDataLakeHdfs/apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842, inputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.ql.io.orc.OrcSerde, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), parameters:{totalSize=1353827219, numRows=143216514, rawDataSize=151433505731, COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, numFiles=14, transient_lastDdlTime=1554163118})  |

Same as you would do on a traditional RDBMS I have extracted the explain plans of two queries:

explain --extended
select --Yannick01
lot_id,
wafer_id,
flow_id,
param_id,
start_t,
finish_t,
param_name,
param_unit,
param_low_limit,
param_high_limit,
nb_dies_tested,
nb_dies_failed
from prod_spotfire_refined.tbl_bin_stat_orc
where fab = "C2WF"
and lot_partition in ("Q842")
and lot_id in ("Q842889")
and wafer_id in ('Q842889-01E3','Q842889-02D6','Q842889-03D1','Q842889-04C4','Q842889-05B7','Q842889-06B2','Q842889-07A5','Q842889-08A0','Q842889-09G6','Q842889-10A0','Q842889-11G6','Q842889-12G1','Q842889-13F4','Q842889-14E7','Q842889-15E2','Q842889-16D5','Q842889-17D0','Q842889-18C3','Q842889-19B6','Q842889-20C3','Q842889-21B6','Q842889-22B1','Q842889-23A4','Q842889-24H2')
and flow_id in ("EWS1")
and start_t in ('2019.01.04-19:50:55','2019.01.05-02:21:26','2019.01.05-08:33:59','2019.01.05-14:06:24','2019.01.05-19:35:23','2019.01.05-22:25:30','2019.01.06-03:49:53','2019.01.06-09:19:52','2019.01.06-14:23:47','2019.01.06-19:27:35','2019.01.07-00:47:59','2019.01.07-06:37:21','2019.01.07-11:15:14','2019.01.07-15:56:55','2019.01.07-20:05:50','2019.01.07-22:48:09','2019.01.08-04:37:37','2019.01.08-08:48:26','2019.01.08-13:51:34','2019.01.08-18:31:38','2019.01.09-00:01:41','2019.01.09-04:11:44','2019.01.09-09:45:08','2019.01.09-13:47:11')
and finish_t in ('2019.01.05-01:52:02','2019.01.05-08:30:52','2019.01.05-14:01:33','2019.01.05-19:32:20','2019.01.05-22:22:15','2019.01.06-03:46:46','2019.01.06-09:16:43','2019.01.06-14:20:42','2019.01.06-19:24:22','2019.01.07-00:44:48','2019.01.07-06:34:15','2019.01.07-11:12:02','2019.01.07-15:53:45','2019.01.07-20:02:41','2019.01.07-22:26:37','2019.01.08-04:34:30','2019.01.08-08:45:20','2019.01.08-13:48:27','2019.01.08-18:28:33','2019.01.08-23:58:34','2019.01.09-04:08:41','2019.01.09-09:41:57','2019.01.09-13:44:03','2019.01.09-18:01:54')
and hbin_number in ("9")
and sbin_number in ("403");
 
Explain                                                                                                                                                                                                                                                         
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
Plan not optimized by CBO.                                                                                                                                                                                                                                      
 
Stage-0                                                                                                                                                                                                                                                         
   Fetch Operator                                                                                                                                                                                                                                               
      limit:-1                                                                                                                                                                                                                                                  
      Stage-1                                                                                                                                                                                                                                                   
         Map 1                                                                                                                                                                                                                                                  
         File Output Operator [FS_2725118]                                                                                                                                                                                                                      
            compressed:false                                                                                                                                                                                                                                    
            Statistics:Num rows: 1 Data size: 780 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                  
            table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}                                      
            Select Operator [SEL_2725117]                                                                                                                                                                                                                       
               outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11"]                                                                                                                            
               Statistics:Num rows: 1 Data size: 780 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                               
               Filter Operator [FIL_2725119]                                                                                                                                                                                                                    
                  predicate:((lot_partition) IN ('Q842') and (lot_id) IN ('Q842889') and (wafer_id) IN ('Q842889-01E3', 'Q842889-02D6', 'Q842889-03D1', 'Q842889-04C4', 'Q842889-05B7', 'Q842889-06B2', 'Q842889-07A5', 'Q842889-08A0', 'Q842889-09G6', 'Q842889-10A0', 'Q842889-11G6', 'Q842889-12G1', 'Q842889-13F4', 'Q842889-14E7', 'Q842889-15E2', 'Q842889-16D5', 'Q842889-17D0', 'Q842889-18C3', 'Q842889-19B6', 'Q842889-20C3', 'Q842889-21B6', 'Q842889-22B1', 'Q842889-23A4', 'Q842889-24H2') and (flow_id) IN ('EWS1') and (start_t) IN ('2019.01.04-19:50:55', '2019.01.05-02:21:26', '2019.01.05-08:33:59', '2019.01.05-14:06:24', '2019.01.05-19:35:23', '2019.01.05-22:25:30', '2019.01.06-03:49:53', '2019.01.06-09:19:52', '2019.01.06-14:23:47', '2019.01.06-19:27:35', '2019.01.07-00:47:59', '2019.01.07-06:37:21', '2019.01.07-11:15:14', '2019.01.07-15:56:55', '2019.01.07-20:05:50', '2019.01.07-22:48:09', '2019.01.08-04:37:37', '2019.01.08-08:48:26', '2019.01.08-13:51:34', '2019.01.08-18:31:38', '2019.01.09-00:01:41', '2019.01.09-04:11:44', '2019.01.09-09:45:08', '2019.01.09-13:47:11') and (finish_t) IN ('2019.01.05-01:52:02', '2019.01.05-08:30:52', '2019.01.05-14:01:33', '2019.01.05-19:32:20', '2019.01.05-22:22:15', '2019.01.06-03:46:46', '2019.01.06-09:16:43', '2019.01.06-14:20:42', '2019.01.06-19:24:22', '2019.01.07-00:44:48', '2019.01.07-06:34:15', '2019.01.07-11:12:02', '2019.01.07-15:53:45', '2019.01.07-20:02:41', '2019.01.07-22:26:37', '2019.01.08-04:34:30', '2019.01.08-08:45:20', '2019.01.08-13:48:27', '2019.01.08-18:28:33', '2019.01.08-23:58:34', '2019.01.09-04:08:41', '2019.01.09-09:41:57', '2019.01.09-13:44:03', '2019.01.09-18:01:54') and (hbin_number) IN ('9') and (sbin_number) IN ('403')) (type: boolean) 
                  Statistics:Num rows: 1 Data size: 972 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                            
                  TableScan [TS_2725115]                                                                                                                                                                                                                        
                     alias:tbl_bin_stat_orc                                                                                                                                                                                                                     
                     Statistics:Num rows: 143216514 Data size: 151433505731 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                        
 
 
21 rows selected.

Remark:
I also tried the EXTENDED mode of EXPAIN PLAN command (only available option of my Hive release, but plenty other are available in latest Hive releases) that, obviously, provide a more verbose output. Maybe too much verbose but from it I have seen an interesting additional information that tell you that your query is effectively using partition pruning and access only to the expected partitions:

partition values:                                                                                                                                                                                                                                   
  fab C2WF                                                                                                                                                                                                                                          
  lot_partition Q840
explain --extended
select --Yannick02
lot_id,
wafer_id,
flow_id,
param_id,
start_t,
finish_t,
param_name,
param_unit,
param_low_limit,
param_high_limit,
nb_dies_tested,
nb_dies_failed
from prod_spotfire_refined.tbl_bin_stat_orc
where fab = "C2WF"
and lot_partition in ("Q840")
and lot_id in ("Q840401")
and wafer_id in ('Q840401-01E6','Q840401-02E1','Q840401-03D4','Q840401-04C7','Q840401-05C2','Q840401-06B5','Q840401-07B0','Q840401-08A3','Q840401-09H1','Q840401-10A3','Q840401-11H1','Q840401-12G4','Q840401-13F7','Q840401-14F2','Q840401-15E5','Q840401-16E0','Q840401-17D3','Q840401-18C6','Q840401-19C1','Q840401-20C6','Q840401-21C1','Q840401-22B4','Q840401-23A7','Q840401-24A2','Q840401-25H0')
and flow_id in ("EWS1")
and start_t in ('2018.12.27-10:42:54','2018.12.27-12:01:57','2018.12.27-13:18:47','2018.12.27-14:36:31','2018.12.27-15:55:57','2018.12.27-17:13:42','2018.12.27-18:31:34','2018.12.27-19:49:27','2018.12.27-21:05:30','2018.12.27-22:23:36','2018.12.27-23:40:11','2018.12.28-00:56:40','2018.12.28-02:15:51','2018.12.28-03:41:23','2018.12.28-04:58:02','2018.12.28-06:16:11','2018.12.28-07:34:40','2018.12.28-08:55:29','2018.12.28-10:13:25','2018.12.28-11:30:34','2018.12.28-12:48:08','2018.12.28-14:06:12','2018.12.28-15:23:00','2018.12.28-16:39:50','2018.12.28-17:56:57')
and finish_t in ('2018.12.27-12:00:40','2018.12.27-13:17:30','2018.12.27-14:35:13','2018.12.27-15:54:39','2018.12.27-17:12:26','2018.12.27-18:30:16','2018.12.27-19:48:12','2018.12.27-21:04:15','2018.12.27-22:22:19','2018.12.27-23:38:53','2018.12.28-00:55:24','2018.12.28-02:14:34','2018.12.28-03:32:41','2018.12.28-04:56:43','2018.12.28-06:14:52','2018.12.28-07:33:21','2018.12.28-08:54:12','2018.12.28-10:12:06','2018.12.28-11:29:17','2018.12.28-12:46:50','2018.12.28-14:04:55','2018.12.28-15:21:43','2018.12.28-16:38:30','2018.12.28-17:55:38','2018.12.28-19:13:18')
and hbin_number in ("1")
and sbin_number in ("1");
 
 
Explain                                                                                                                                                                                                                                                         
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
Plan not optimized by CBO.                                                                                                                                                                                                                                      
 
Stage-0                                                                                                                                                                                                                                                         
   Fetch Operator                                                                                                                                                                                                                                               
      limit:-1                                                                                                                                                                                                                                                  
      Select Operator [SEL_2725124]                                                                                                                                                                                                                             
         outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11"]                                                                                                                                  
         Filter Operator [FIL_2725126]                                                                                                                                                                                                                          
            predicate:((lot_partition) IN ('Q840') and (lot_id) IN ('Q840401') and (wafer_id) IN ('Q840401-01E6', 'Q840401-02E1', 'Q840401-03D4', 'Q840401-04C7', 'Q840401-05C2', 'Q840401-06B5', 'Q840401-07B0', 'Q840401-08A3', 'Q840401-09H1', 'Q840401-10A3', 'Q840401-11H1', 'Q840401-12G4', 'Q840401-13F7', 'Q840401-14F2', 'Q840401-15E5', 'Q840401-16E0', 'Q840401-17D3', 'Q840401-18C6', 'Q840401-19C1', 'Q840401-20C6', 'Q840401-21C1', 'Q840401-22B4', 'Q840401-23A7', 'Q840401-24A2', 'Q840401-25H0') and (flow_id) IN ('EWS1') and (start_t) IN ('2018.12.27-10:42:54', '2018.12.27-12:01:57', '2018.12.27-13:18:47', '2018.12.27-14:36:31', '2018.12.27-15:55:57', '2018.12.27-17:13:42', '2018.12.27-18:31:34', '2018.12.27-19:49:27', '2018.12.27-21:05:30', '2018.12.27-22:23:36', '2018.12.27-23:40:11', '2018.12.28-00:56:40', '2018.12.28-02:15:51', '2018.12.28-03:41:23', '2018.12.28-04:58:02', '2018.12.28-06:16:11', '2018.12.28-07:34:40', '2018.12.28-08:55:29', '2018.12.28-10:13:25', '2018.12.28-11:30:34', '2018.12.28-12:48:08', '2018.12.28-14:06:12', '2018.12.28-15:23:00', '2018.12.28-16:39:50', '2018.12.28-17:56:57') and (finish_t) IN ('2018.12.27-12:00:40', '2018.12.27-13:17:30', '2018.12.27-14:35:13', '2018.12.27-15:54:39', '2018.12.27-17:12:26', '2018.12.27-18:30:16', '2018.12.27-19:48:12', '2018.12.27-21:04:15', '2018.12.27-22:22:19', '2018.12.27-23:38:53', '2018.12.28-00:55:24', '2018.12.28-02:14:34', '2018.12.28-03:32:41', '2018.12.28-04:56:43', '2018.12.28-06:14:52', '2018.12.28-07:33:21', '2018.12.28-08:54:12', '2018.12.28-10:12:06', '2018.12.28-11:29:17', '2018.12.28-12:46:50', '2018.12.28-14:04:55', '2018.12.28-15:21:43', '2018.12.28-16:38:30', '2018.12.28-17:55:38', '2018.12.28-19:13:18') and (hbin_number) IN ('1') and (sbin_number) IN ('1')) (type: boolean) 
            TableScan [TS_2725122]                                                                                                                                                                                                                              
               alias:tbl_bin_stat_orc                                                                                                                                                                                                                           
 
 
12 rows selected.

We can anyway notice that the explain plan of Query02 is not displaying any statistics even if they are there in the table. I would have expected in clear text something like “fetch task”, let see why…

When executing the queries, the first observation I have made is that query02 does not do a MapReduce job (using TZ engine as configured by default on our cluster) but a direct HDFS access. You can see that the query is NOT doing a MapReduce in Beeline when just after having submitted your query you don’t see the usual graphical display of number of Map and Reduce jobs but directly the query result…

Query01 performs a standard MapReduce job:

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      9          9        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 15.92 s
--------------------------------------------------------------------------------

We have seen above that there is a factor of 13 times less efficient to return 57 times less row. so looks like the direct HDFS access called a fetch task is not that optimal…

Remark:
The “Plan not optimized by CBO” message might look scary so decided to dig a bit on it. First you have to be on your Hive server and check /var/log/hive/hiveserver2.log file (Hadoop parameter is hive_log_dir). Now you understand why I have added stupid comment in my query (–Yannick01 and –Yannick02), as this was a trick I was using with Oracle to find myself in V$SQL. Here opening this huge file you can search for your name and below the text of your query you should be able to see something like:

2019-04-04 12:06:12,633 INFO  [HiveServer2-Handler-Pool: Thread-15548880]: parse.BaseSemanticAnalyzer (CalcitePlanner.java:canCBOHandleAst(405)) - Not invoking CBO because the statement has too few joins

I have also checked the partitions sizes as well as number of ORC files in each partition (compaction):

hdfs@client_node:~$ hdfs dfs -ls -r -t /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842
Found 14 items
-rwxrwxrwx   3 mfgdl_ingestion hadoop   10185787 2019-03-30 17:16 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000006_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop    4240294 2019-03-30 17:16 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000007_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop   10696310 2019-03-30 17:16 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000005_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop  268531575 2019-03-30 17:16 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000001_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop  263145620 2019-03-30 17:16 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000004_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop  257281211 2019-03-30 17:28 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000003_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop  275941797 2019-03-30 17:34 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000000_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop  252093563 2019-03-30 17:40 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000002_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop    2127431 2019-03-30 22:38 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000018_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop    2144463 2019-03-31 10:21 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000018_0_copy_1
-rwxrwxrwx   3 mfgdl_ingestion hadoop    2833270 2019-03-31 23:38 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000015_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop    1563968 2019-04-01 10:50 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000021_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop    1418099 2019-04-01 17:31 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000021_0_copy_1
-rwxrwxrwx   3 mfgdl_ingestion hadoop    1623831 2019-04-02 01:58 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000017_0
 
hdfs@client_node:~$ hdfs dfs -du -h -s /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842
1.3 G  /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842
 
hdfs@client_node:~$ hdfs dfs -ls -r -t /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840
Found 12 items
-rwxrwxrwx   3 mfgdl_ingestion hadoop  298096825 2019-03-30 17:01 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000000_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop    7651387 2019-03-30 17:01 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000004_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop  262040206 2019-03-30 17:02 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000001_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop  177619545 2019-03-30 17:02 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000003_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop  241590172 2019-03-30 17:13 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000002_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop     588415 2019-03-30 22:38 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000035_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop     466219 2019-03-31 10:21 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000037_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop     496742 2019-03-31 23:38 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000038_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop     463094 2019-04-01 10:50 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000037_0_copy_1
-rwxrwxrwx   3 mfgdl_ingestion hadoop     362538 2019-04-01 17:31 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000041_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop     412610 2019-04-02 01:58 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000038_0_copy_1
drwxrwxrwx   - mfgaewsp        hadoop          0 2019-04-03 16:12 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/.hive-staging_hive_2019-04-03_16-12-30_095_3617150963696131727-133149
 
hdfs@client_node:~$ hdfs dfs -du -h -s /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840
943.9 M  /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840

Again no particular big differences in size and number of files for two partitions…

To speed up long query I have tried to compact the partition using:

ALTER TABLE prod_spotfire_refined.tbl_bin_stat_orc PARTITION(fab = "C2WF", lot_partition="Q840") CONCATENATE;

Reaching this status:

hdfs@client_node:~$ hdfs dfs -ls -r -t /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840
Found 5 items
-rwxrwxrwx   3 mfgaewsp hadoop    8702924 2019-04-03 16:12 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000004_0
-rwxrwxrwx   3 mfgaewsp hadoop  266770170 2019-04-03 16:12 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000001_0
-rwxrwxrwx   3 mfgaewsp hadoop  178527568 2019-04-03 16:12 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000003_0
-rwxrwxrwx   3 mfgaewsp hadoop  242418393 2019-04-03 16:13 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000002_0
-rwxrwxrwx   3 mfgaewsp hadoop  291710785 2019-04-03 16:13 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000000_0

Then we discovered that concatenate command is destroying the existing statistics so we had to recompute them again !

ANALYZE TABLE prod_spotfire_refined.tbl_bin_stat_orc PARTITION(fab = "C2WF", lot_partition="Q840") COMPUTE STATISTICS FOR COLUMNS;

Remark:
We have been able to use the FOR COLUMNS without specifying the columns list because our table does not contains any complex type like Array.

Statistics went back to initial situation:

0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc partition(fab = "C2WF", lot_partition="Q840");
+-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
|             col_name              |                                                          data_type                                                          |                                                                                                                                                                                                                                    comment                                                                                                                                                                                                                                     |
+-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
| # col_name                        | data_type                                                                                                                   | comment                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|                                   | NULL                                                                                                                        | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| lot_id                            | string                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| wafer_id                          | string                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| flow_id                           | string                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| start_t                           | string                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| finish_t                          | string                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| hbin_number                       | int                                                                                                                         |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| hbin_name                         | string                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| sbin_number                       | int                                                                                                                         |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| sbin_name                         | string                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| param_id                          | string                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| param_name                        | string                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| param_unit                        | string                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| param_low_limit                   | float                                                                                                                       |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| param_high_limit                  | float                                                                                                                       |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| nb_dies_tested                    | int                                                                                                                         |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| nb_dies_failed                    | int                                                                                                                         |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| nb_dies_good                      | int                                                                                                                         |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| ingestion_date                    | string                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
|                                   | NULL                                                                                                                        | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| # Partition Information           | NULL                                                                                                                        | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| # col_name                        | data_type                                                                                                                   | comment                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|                                   | NULL                                                                                                                        | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| fab                               | string                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| lot_partition                     | string                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
|                                   | NULL                                                                                                                        | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| # Detailed Partition Information  | NULL                                                                                                                        | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| Partition Value:                  | [C2WF, Q840]                                                                                                                | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| Database:                         | prod_spotfire_refined                                                                                                       | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| Table:                            | tbl_bin_stat_orc                                                                                                            | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| CreateTime:                       | Mon Feb 11 13:02:12 CET 2019                                                                                                | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| LastAccessTime:                   | UNKNOWN                                                                                                                     | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| Protect Mode:                     | None                                                                                                                        | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| Location:                         | hdfs://ManufacturingDataLakeHdfs/apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840  | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| Partition Parameters:             | NULL                                                                                                                        | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
|                                   | COLUMN_STATS_ACCURATE                                                                                                       | {\"COLUMN_STATS\":{\"lot_id\":\"true\",\"wafer_id\":\"true\",\"flow_id\":\"true\",\"start_t\":\"true\",\"finish_t\":\"true\",\"hbin_number\":\"true\",\"hbin_name\":\"true\",\"sbin_number\":\"true\",\"sbin_name\":\"true\",\"param_id\":\"true\",\"param_name\":\"true\",\"param_unit\":\"true\",\"param_low_limit\":\"true\",\"param_high_limit\":\"true\",\"nb_dies_tested\":\"true\",\"nb_dies_failed\":\"true\",\"nb_dies_good\":\"true\",\"ingestion_date\":\"true\"}}  |
|                                   | numFiles                                                                                                                    | 5                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
|                                   | numRows                                                                                                                     | 109795564                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
|                                   | rawDataSize                                                                                                                 | 116612625917                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
|                                   | totalSize                                                                                                                   | 988129840                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
|                                   | transient_lastDdlTime                                                                                                       | 1554300803                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
|                                   | NULL                                                                                                                        | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| # Storage Information             | NULL                                                                                                                        | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| SerDe Library:                    | org.apache.hadoop.hive.ql.io.orc.OrcSerde                                                                                   | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| InputFormat:                      | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat                                                                             | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| OutputFormat:                     | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat                                                                            | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| Compressed:                       | No                                                                                                                          | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| Num Buckets:                      | -1                                                                                                                          | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| Bucket Columns:                   | []                                                                                                                          | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| Sort Columns:                     | []                                                                                                                          | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| Storage Desc Params:              | NULL                                                                                                                        | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
|                                   | serialization.format                                                                                                        | 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
+-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
describe extended prod_spotfire_refined.tbl_bin_stat_orc partition(fab = "C2WF", lot_partition="Q840");
 
| Detailed Partition Information  | Partition(values:[C2WF, Q840], dbName:prod_spotfire_refined, tableName:tbl_bin_stat_orc, createTime:1549886532, lastAccessTime:0, sd:StorageDescriptor(cols:[FieldSchema(name:lot_id, type:string, comment:null), FieldSchema(name:wafer_id, type:string, comment:null), FieldSchema(name:flow_id, type:string, comment:null), FieldSchema(name:start_t, type:string, comment:null), FieldSchema(name:finish_t, type:string, comment:null), FieldSchema(name:hbin_number, type:int, comment:null), FieldSchema(name:hbin_name, type:string, comment:null), FieldSchema(name:sbin_number, type:int, comment:null), FieldSchema(name:sbin_name, type:string, comment:null), FieldSchema(name:param_id, type:string, comment:null), FieldSchema(name:param_name, type:string, comment:null), FieldSchema(name:param_unit, type:string, comment:null), FieldSchema(name:param_low_limit, type:float, comment:null), FieldSchema(name:param_high_limit, type:float, comment:null), FieldSchema(name:nb_dies_tested, type:int, comment:null), FieldSchema(name:nb_dies_failed, type:int, comment:null), FieldSchema(name:nb_dies_good, type:int, comment:null), FieldSchema(name:ingestion_date, type:string, comment:null), FieldSchema(name:fab, type:string, comment:null), FieldSchema(name:lot_partition, type:string, comment:null)], location:hdfs://ManufacturingDataLakeHdfs/apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840, inputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.ql.io.orc.OrcSerde, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), parameters:{totalSize=988129840, numRows=109795564, rawDataSize=116612625917, COLUMN_STATS_ACCURATE={"COLUMN_STATS":{"lot_id":"true","wafer_id":"true","flow_id":"true","start_t":"true","finish_t":"true","hbin_number":"true","hbin_name":"true","sbin_number":"true","sbin_name":"true","param_id":"true","param_name":"true","param_unit":"true","param_low_limit":"true","param_high_limit":"true","nb_dies_tested":"true","nb_dies_failed":"true","nb_dies_good":"true","ingestion_date":"true"}}, numFiles=5, transient_lastDdlTime=1554300803})  |

Remark:
Note that the ALL COLUMNS change the COLUMN_STATS_ACCURATE value that has moved from {\”BASIC_STATS\”:\”true\”} to {\”COLUMN_STATS\”: xxx}.

But it did not change the poor response type of Query02….

Fetch task performing worst than MapReduce ?

From the official documentation and many web sites around they say that simple queries can be executed as fetch task instead of traditional Map Reduce to minimize latency. Fetch tasks are direct HDFS access using “hdfs dfs –get” or “hdfs dfs –cat” commands son on the paper the overhead of creation MAp and Reduce jobs is gone. On simple queries means single sourced query not having any subquery and not having any aggregations, distincts, lateral views and joins.

You can change parameters only for your session using:

0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> set hive.fetch.task.conversion;
+----------------------------------+--+
|               set                |
+----------------------------------+--+
| hive.fetch.task.conversion=more  |
+----------------------------------+--+
1 row selected (0.007 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> set hive.fetch.task.conversion.threshold;
+--------------------------------------------------+--+
|                       set                        |
+--------------------------------------------------+--+
| hive.fetch.task.conversion.threshold=1073741824  |
+--------------------------------------------------+--+
1 row selected (0.007 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> set hive.fetch.task.conversion.threshold=524288000;
No rows affected (0.003 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> set hive.fetch.task.conversion.threshold;
+-------------------------------------------------+--+
|                       set                       |
+-------------------------------------------------+--+
| hive.fetch.task.conversion.threshold=524288000  |
+-------------------------------------------------+--+
1 row selected (0.006 seconds)
 
 
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> set hive.fetch.task.conversion=none;
No rows affected (0.004 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> set hive.fetch.task.conversion;
+----------------------------------+--+
|               set                |
+----------------------------------+--+
| hive.fetch.task.conversion=none  |
+----------------------------------+--+

After changing hive.fetch.task.conversion to none, means disabling Hive fetch task, the query move to a traditional MapReduce job:

INFO  : Status: Running (Executing on YARN cluster with App id application_1551777498072_117912)
 
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED     14         14        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 20.19 s
--------------------------------------------------------------------------------

And more importantly query execution time moved from around from 330 seconds to 28 seconds, 20 seconds for execution, rest is network transfer

Fetch task are supposed to be much faster than MapReduce job but we clearly show that it’s not the case at all on our cluster… Bug ?

To go further

To debug Hive fetch, with the help of a consultant working for us, we have exported an online variable:

hdfs@client_node:~$ export HADOOP_ROOT_LOGGER=debug,console

Then we simulated a Hive fetch on a datafile of one partition with a size of less than 1GB with:

hdfs@client_node:~$ hdfs dfs -cat /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q846/000000_0 > /tmp/test_cq
19/04/08 12:17:38 DEBUG util.Shell: setsid exited with exit code 0
19/04/08 12:17:38 DEBUG conf.Configuration: parsing URL jar:file:/usr/hdp/2.6.4.0-91/hadoop/hadoop-common-2.7.3.2.6.4.0-91.jar!/core-default.xml
19/04/08 12:17:38 DEBUG conf.Configuration: parsing input stream sun.net.www.protocol.jar.JarURLConnection$JarURLInputStream@7a36aefa
19/04/08 12:17:38 DEBUG conf.Configuration: parsing URL file:/etc/hadoop/2.6.4.0-91/0/core-site.xml
19/04/08 12:17:38 DEBUG conf.Configuration: parsing input stream java.io.BufferedInputStream@58c1c010
19/04/08 12:17:38 DEBUG security.SecurityUtil: Setting hadoop.security.token.service.use_ip to true
19/04/08 12:17:38 DEBUG util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
19/04/08 12:17:38 DEBUG security.Groups:  Creating new Groups object
19/04/08 12:17:38 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
19/04/08 12:17:38 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
19/04/08 12:17:38 DEBUG security.JniBasedUnixGroupsMapping: Using JniBasedUnixGroupsMapping for Group resolution
19/04/08 12:17:38 DEBUG security.JniBasedUnixGroupsMappingWithFallback: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMapping
19/04/08 12:17:38 DEBUG security.Groups: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
19/04/08 12:17:38 DEBUG security.UserGroupInformation: hadoop login
19/04/08 12:17:38 DEBUG security.UserGroupInformation: hadoop login commit
19/04/08 12:17:38 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: hdfs
19/04/08 12:17:38 DEBUG security.UserGroupInformation: Using user: "UnixPrincipal: hdfs" with name hdfs
19/04/08 12:17:38 DEBUG security.UserGroupInformation: User entry: "hdfs"
19/04/08 12:17:38 DEBUG security.UserGroupInformation: Assuming keytab is managed externally since logged in from subject.
19/04/08 12:17:38 DEBUG security.UserGroupInformation: UGI loginUser:hdfs (auth:SIMPLE)
19/04/08 12:17:38 DEBUG hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
19/04/08 12:17:38 DEBUG hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = true
19/04/08 12:17:38 DEBUG hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
19/04/08 12:17:38 DEBUG hdfs.BlockReaderLocal: dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket
19/04/08 12:17:39 DEBUG hdfs.HAUtil: No HA service delegation token found for logical URI hdfs://ManufacturingDataLakeHdfs
19/04/08 12:17:39 DEBUG hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
19/04/08 12:17:39 DEBUG hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = true
19/04/08 12:17:39 DEBUG hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
19/04/08 12:17:39 DEBUG hdfs.BlockReaderLocal: dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket
19/04/08 12:17:39 DEBUG retry.RetryUtils: multipleLinearRandomRetry = null
19/04/08 12:17:39 DEBUG ipc.Server: rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper, rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$S  erver$ProtoBufRpcInvoker@238d68ff
19/04/08 12:17:39 DEBUG ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@70e38ce1
19/04/08 12:17:39 DEBUG unix.DomainSocketWatcher: org.apache.hadoop.net.unix.DomainSocketWatcher$2@3dfc1841: starting with interruptCheckPeriodMs = 60000
19/04/08 12:17:39 DEBUG shortcircuit.DomainSocketFactory: The short-circuit local reads feature is enabled.
19/04/08 12:17:39 DEBUG sasl.DataTransferSaslUtil: DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection
19/04/08 12:17:39 DEBUG ipc.Client: The ping interval is 60000 ms.
19/04/08 12:17:39 DEBUG ipc.Client: Connecting to master_namenode.domain.com/10.75.144.1:8020
19/04/08 12:17:39 DEBUG ipc.Client: IPC Client (1605851606) connection to master_namenode.domain.com/10.75.144.1:8020 from hdfs: starting, having connections 1
19/04/08 12:17:39 DEBUG ipc.Client: IPC Client (1605851606) connection to master_namenode.domain.com/10.75.144.1:8020 from hdfs sending #0
19/04/08 12:17:39 DEBUG ipc.Client: IPC Client (1605851606) connection to master_namenode.domain.com/10.75.144.1:8020 from hdfs got value #0
19/04/08 12:17:39 DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took 53ms
19/04/08 12:17:39 DEBUG ipc.Client: IPC Client (1605851606) connection to master_namenode.domain.com/10.75.144.1:8020 from hdfs sending #1
19/04/08 12:17:39 DEBUG ipc.Client: IPC Client (1605851606) connection to master_namenode.domain.com/10.75.144.1:8020 from hdfs got value #1
19/04/08 12:17:39 DEBUG ipc.ProtobufRpcEngine: Call: getBlockLocations took 1ms
19/04/08 12:17:39 DEBUG azure.NativeAzureFileSystem: finalize() called.
19/04/08 12:17:39 DEBUG azure.NativeAzureFileSystem: finalize() called.
19/04/08 12:17:39 DEBUG hdfs.DFSClient: newInfo = LocatedBlocks{
  fileLength=281939725
  underConstruction=false
  blocks=[LocatedBlock{BP-1711156358-10.75.144.1-1519036486930:blk_1212723761_139044745; getBlockSize()=268435456; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[10.75.144.15:50010,DS-7bf7ad15-da5c  -454b-adb5-67f1ea89e0e6,DISK], DatanodeInfoWithStorage[10.75.144.14:50010,DS-ee0aca78-2c5c-48bf-aeb8-3e4ed5823be1,DISK], DatanodeInfoWithStorage[10.75.144.11:50010,DS-92c5ac9c-6a51-464a-9925-8f0cc06d3f3f,D  ISK]]}, LocatedBlock{BP-1711156358-10.75.144.1-1519036486930:blk_1212733643_139054627; getBlockSize()=13504269; corrupt=false; offset=268435456; locs=[DatanodeInfoWithStorage[10.75.144.15:50010,DS-291244e5  -8577-4095-b200-3233661417db,DISK], DatanodeInfoWithStorage[10.75.144.11:50010,DS-bb818fac-2bb1-4ccf-9a17-a9afbecf5f6f,DISK], DatanodeInfoWithStorage[10.75.144.14:50010,DS-96c13ead-02bb-4191-b6c9-02c2f632e  bf0,DISK]]}]
  lastLocatedBlock=LocatedBlock{BP-1711156358-10.75.144.1-1519036486930:blk_1212733643_139054627; getBlockSize()=13504269; corrupt=false; offset=268435456; locs=[DatanodeInfoWithStorage[10.75.144.14:50010,  DS-96c13ead-02bb-4191-b6c9-02c2f632ebf0,DISK], DatanodeInfoWithStorage[10.75.144.15:50010,DS-291244e5-8577-4095-b200-3233661417db,DISK], DatanodeInfoWithStorage[10.75.144.11:50010,DS-bb818fac-2bb1-4ccf-9a1  7-a9afbecf5f6f,DISK]]}
  isLastBlockComplete=true}
19/04/08 12:17:39 DEBUG hdfs.DFSClient: Connecting to datanode 10.75.144.15:50010
19/04/08 12:17:39 DEBUG util.PerformanceAdvisory: BlockReaderFactory(fileName=/apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q846/000000_0, block=BP-1711156358-10.75.  144.1-1519036486930:blk_1212723761_139044745): PathInfo{path=, state=UNUSABLE} is not usable for short circuit; giving up on BlockReaderLocal.
19/04/08 12:17:39 DEBUG ipc.Client: IPC Client (1605851606) connection to master_namenode.domain.com/10.75.144.1:8020 from hdfs sending #2
19/04/08 12:17:39 DEBUG ipc.Client: IPC Client (1605851606) connection to master_namenode.domain.com/10.75.144.1:8020 from hdfs got value #2
19/04/08 12:17:39 DEBUG ipc.ProtobufRpcEngine: Call: getServerDefaults took 1ms
19/04/08 12:17:39 DEBUG sasl.SaslDataTransferClient: SASL client skipping handshake in unsecured configuration for addr = /10.75.144.15, datanodeId = DatanodeInfoWithStorage[10.75.144.15:50010,DS-7bf7ad15-  da5c-454b-adb5-67f1ea89e0e6,DISK]
19/04/08 12:17:41 DEBUG hdfs.DFSClient: Connecting to datanode 10.75.144.15:50010
19/04/08 12:17:41 DEBUG util.PerformanceAdvisory: BlockReaderFactory(fileName=/apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q846/000000_0, block=BP-1711156358-10.75.  144.1-1519036486930:blk_1212733643_139054627): PathInfo{path=, state=UNUSABLE} is not usable for short circuit; giving up on BlockReaderLocal.
19/04/08 12:17:41 DEBUG ipc.Client: stopping client from cache: org.apache.hadoop.ipc.Client@70e38ce1
19/04/08 12:17:41 DEBUG ipc.Client: removing client from cache: org.apache.hadoop.ipc.Client@70e38ce1
19/04/08 12:17:41 DEBUG ipc.Client: stopping actual client because no more references remain: org.apache.hadoop.ipc.Client@70e38ce1
19/04/08 12:17:41 DEBUG ipc.Client: Stopping client
19/04/08 12:17:41 DEBUG ipc.Client: IPC Client (1605851606) connection to master_namenode.domain.com/10.75.144.1:8020 from hdfs: closed
19/04/08 12:17:41 DEBUG ipc.Client: IPC Client (1605851606) connection to master_namenode.domain.com/10.75.144.1:8020 from hdfs: stopped, remaining connections 0
19/04/08 12:17:41 DEBUG util.ShutdownHookManager: ShutdownHookManger complete shutdown.

And we have seen that dfs.client.read.shortcircuit was correctly used…

Questioning few Cloudera engineers they suggested to increase HiveServer2 memory as in case of a Hive fetch the ORC decompression and filtering is done directly by Hive server. Using Grafana from Ambari we saw some pikes in HiveServer2 memory and so decided to increase its memory from 12GB to 32GB. This was also possible as our server has plenty of memory. When I talk of ORC decompression it is because we have chosen this parameters:

hive_fetch01
hive_fetch01

We retried the query with absolutely no improvement… At that time we also investigated in HiveServer2 log file and found:

2019-04-09 15:00:34,183 INFO  [HiveServer2-Handler-Pool: Thread-112]: orc.ReaderImpl (ReaderImpl.java:rowsOptions(478)) - Reading ORC rows from hdfs://ManufacturingDataLakeHdfs/apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q846/000002_0 with {include: [true, true, true, true, true, true, true, false, true, false, true, true, true, true, true, true, true, false, false], offset: 0, length: 257464687, sarg: leaf-0 = (IN lot_partition Q846), leaf-1 = (IN lot_id Q840401), leaf-2 = (IN wafer_id Q840401-01E6 Q840401-02E1 Q840401-03D4 Q840401-04C7 Q840401-05C2 Q840401-06B5 Q840401-07B0 Q840401-08A3 Q840401-09H1 Q840401-10A3 Q840401-11H1 Q840401-12G4 Q840401-13F7 Q840401-14F2 Q840401-15E5 Q840401-16E0 Q840401-17D3 Q840401-18C6 Q840401-19C1 Q840401-20C6 Q840401-21C1 Q840401-22B4 Q840401-23A7 Q840401-24A2 Q840401-25H0), leaf-3 = (IN flow_id EWS1), leaf-4 = (IN start_t 2018.12.27-10:42:54 2018.12.27-12:01:57 2018.12.27-13:18:47 2018.12.27-14:36:31 2018.12.27-15:55:57 2018.12.27-17:13:42 2018.12.27-18:31:34 2018.12.27-19:49:27 2018.12.27-21:05:30 2018.12.27-22:23:36 2018.12.27-23:40:11 2018.12.28-00:56:40 2018.12.28-02:15:51 2018.12.28-03:41:23 2018.12.28-04:58:02 2018.12.28-06:16:11 2018.12.28-07:34:40 2018.12.28-08:55:29 2018.12.28-10:13:25 2018.12.28-11:30:34 2018.12.28-12:48:08 2018.12.28-14:06:12 2018.12.28-15:23:00 2018.12.28-16:39:50 2018.12.28-17:56:57), leaf-5 = (IN finish_t 2018.12.27-12:00:40 2018.12.27-13:17:30 2018.12.27-14:35:13 2018.12.27-15:54:39 2018.12.27-17:12:26 2018.12.27-18:30:16 2018.12.27-19:48:12 2018.12.27-21:04:15 2018.12.27-22:22:19 2018.12.27-23:38:53 2018.12.28-00:55:24 2018.12.28-02:14:34 2018.12.28-03:32:41 2018.12.28-04:56:43 2018.12.28-06:14:52 2018.12.28-07:33:21 2018.12.28-08:54:12 2018.12.28-10:12:06 2018.12.28-11:29:17 2018.12.28-12:46:50 2018.12.28-14:04:55 2018.12.28-15:21:43 2018.12.28-16:38:30 2018.12.28-17:55:38 2018.12.28-19:13:18), leaf-6 = (IN hbin_number 1), leaf-7 = (IN sbin_number 1), expr = (and leaf-0 leaf-1 leaf-2 leaf-3 leaf-4 leaf-5 leaf-6 leaf-7), columns: ['null', 'lot_id', 'wafer_id', 'flow_id', 'start_t', 'finish_t', 'hbin_number', 'null', 'sbin_number', 'null', 'param_id', 'param_name', 'param_unit', 'param_low_limit', 'param_high_limit', 'nb_dies_tested', 'nb_dies_failed', 'null', 'null']}
2019-04-09 15:02:24,610 INFO  [HiveServer2-Handler-Pool: Thread-112]: orc.OrcUtils (OrcUtils.java:getDesiredRowTypeDescr(810)) - Using schema evolution configuration variables schema.evolution.columns [lot_id, wafer_id, flow_id, start_t, finish_t, hbin_number, hbin_name, sbin_number, sbin_name, param_id, param_name, param_unit, param_low_limit, param_high_limit, nb_dies_tested, nb_dies_failed, nb_dies_good, ingestion_date] / schema.evolution.columns.types [string, string, string, string, string, int, string, int, string, string, string, string, float, float, int, int, int, string] (isAcid false)

If you see below this file is only around 230MB in size. So it has taken almost 2 minutes (!!!) to read, decompress and filter this small file. So clearly there is something wrong with Hive fetch task and we have decided to deactivate it by setting cluster wide below parameter:

0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> set hive.fetch.task.conversion;
+----------------------------------+--+
|               set                |
+----------------------------------+--+
| hive.fetch.task.conversion=none  |
+----------------------------------+--+
1 row selected (0.09 seconds)

At the end both queries are now running in less than 25-30 seconds and we will monitor for any other negative side effects…

References

Yannick Jaquier on LinkedinYannick Jaquier on RssYannick Jaquier on Twitter
Yannick Jaquier
Find more about me on social media.
This entry was posted in Hadoop and tagged . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>