Hive concatenate command issues and workaround

IT World, https://blog.yannickjaquier.com, Fri, 24 Jan 2020



Preamble

To maintain good performance we have developed a script (to be shared in another blog post) that concatenates the partitions of our Hive tables every week. It reduces the number of ORC files per partition and, as such, the number of map and reduce tasks required to access the partitions in Hive queries.
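
The selection logic of such a maintenance script can be sketched as a simple threshold check. This is a minimal, hypothetical sketch (the function name and the threshold are assumptions, not the real script, which will be shared separately): it takes a mapping of partition spec to ORC file count and returns the partitions worth concatenating.

```python
def partitions_to_concatenate(file_counts, threshold=10):
    """Return partition specs whose ORC file count exceeds the threshold.

    file_counts: dict mapping a partition spec string
    (e.g. 'fab=CTM8/lot_partition=58053') to the number of ORC files
    currently stored in that partition directory.
    threshold: maximum acceptable file count before a concatenate is scheduled.
    """
    return [spec for spec, count in sorted(file_counts.items()) if count > threshold]
```

Each returned spec would then be fed to one `ALTER TABLE ... PARTITION (...) CONCATENATE` statement.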

Everything went well until one week we started to get an error message for partitions of one of our tables. The odd part was that the command still worked for some partitions while it failed for others.

This table was created in Hive, but we fill its partitions from a PySpark script. The partitions therefore got created by Spark, since the partition directories in HDFS are only created when rows are inserted.

Our cluster is running HDP-2.6.4.0.

The relevant component versions are:

  • Hive 1.2.1000
  • Spark 2.2.0.2.6.4.0-91
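
Before scheduling any concatenation, the weekly script first needs the ORC file count per partition. A hypothetical helper (the function name and parsing assumptions are ours, not part of any Hadoop API) that counts data files per Hive-style partition from the text output of `hdfs dfs -ls -R <table location>` could look like:

```python
from collections import Counter

def orc_files_per_partition(ls_output):
    """Count data files per partition from `hdfs dfs -ls -R` text output.

    Assumes Hive-style partition directories (key=value path components)
    under the table location; skips directories and hidden files such as
    .hive-staging_* leftovers.
    """
    counts = Counter()
    for line in ls_output.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[0].startswith("d"):
            continue  # skip malformed lines and directory entries
        path = fields[-1]
        if "/." in path:
            continue  # skip hidden staging files
        partition = "/".join(p for p in path.split("/") if "=" in p)
        if partition:
            counts[partition] += 1
    return dict(counts)
```

The resulting counts can then be compared against a threshold to decide which partitions to concatenate.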

Concatenate command failing due to access rights

The complete error message is the following:

0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> alter table prod_ews_refined.tbl_wafer_param_stats partition (fab="CTM8",lot_partition="58053") concatenate;
INFO  : Session is already open
INFO  : Dag name: hive_20190830162358_1591b2d1-7489-4f60-8b9f-725fb50a4648
INFO  : Tez session was closed. Reopening...
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
File Merge ....      RUNNING      5          4        0        1       4       0
--------------------------------------------------------------------------------
VERTICES: 00/01  [====================>>------] 80%   ELAPSED TIME: 5.52 s
--------------------------------------------------------------------------------
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=File Merge, vertexId=vertex_1565718945091_75870_2_00, diagnostics=[Task failed, taskId=task_1565718945091_75870_2_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
        at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators
        at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.close(MergeFileRecordProcessor.java:184)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:164)
        ... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close AbstractFileMergeOperator
        at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:272)
        at org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:250)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:620)
        at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.close(MergeFileRecordProcessor.java:176)
        ... 15 more
Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=mfgdl_ingestion, access=EXECUTE, inode="/apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/.hive-staging_hive_2019-08-30_16-23-58_034_3892822656895858873-3012/_tmp.-ext-10000/000000_0_copy_1/000000_0_copy_7":mfgdl_ingestion:hadoop:-rw-r--r--
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:353)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:292)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:238)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1950)
        at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:108)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4142)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1137)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:866)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2167)
        at org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1442)
        at org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1438)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1454)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1447)
        at org.apache.hadoop.hive.ql.exec.Utilities.moveFile(Utilities.java:1807)
        at org.apache.hadoop.hive.ql.exec.Utilities.renameOrMoveFiles(Utilities.java:1843)
        at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:258)
        ... 18 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=mfgdl_ingestion, access=EXECUTE, inode="/apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/.hive-staging_hive_2019-08-30_16-23-58_034_3892822656895858873-3012/_tmp.-ext-10000/000000_0_copy_1/000000_0_copy_7":mfgdl_ingestion:hadoop:-rw-r--r--
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:353)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:292)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:238)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1950)
        at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:108)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4142)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1137)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:866)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)

        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
        at org.apache.hadoop.ipc.Client.call(Client.java:1498)
        at org.apache.hadoop.ipc.Client.call(Client.java:1398)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
        at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:823)
        at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
        at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2165)
        ... 26 more
], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
        at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:266)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:140)
        at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
        at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.run(MergeFileRecordProcessor.java:150)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
        ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedConstructorAccessor18.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:252)
        ... 18 more
Caused by: java.io.FileNotFoundException: File does not exist: /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000000_0_copy_5
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2025)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1996)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:700)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
        at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1240)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1225)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213)
        at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:309)
        at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:274)
        at org.apache.hadoop.hdfs.DFSInputStream.(DFSInputStream.java:266)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1538)
        at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:332)
        at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:327)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:340)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:786)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractFileTail(ReaderImpl.java:355)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.(ReaderImpl.java:319)
        at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:241)
        at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeRecordReader.(OrcFileStripeMergeRecordReader.java:47)
        at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeInputFormat.getRecordReader(OrcFileStripeMergeInputFormat.java:37)
        at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.(CombineHiveRecordReader.java:67)
        ... 22 more
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000000_0_copy_5
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2025)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1996)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:700)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)

        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
        at org.apache.hadoop.ipc.Client.call(Client.java:1498)
        at org.apache.hadoop.ipc.Client.call(Client.java:1398)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
        at com.sun.proxy.$Proxy11.getBlockLocations(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:272)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
        at com.sun.proxy.$Proxy12.getBlockLocations(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1238)
        ... 39 more
], TaskAttempt 2 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
        at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:266)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:140)
        at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
        at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.run(MergeFileRecordProcessor.java:150)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
        ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:252)
        ... 18 more
Caused by: java.io.FileNotFoundException: File does not exist: /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000000_0_copy_2
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2025)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1996)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:700)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
        at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1240)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1225)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213)
        at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:309)
        at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:274)
        at org.apache.hadoop.hdfs.DFSInputStream.(DFSInputStream.java:266)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1538)
        at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:332)
        at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:327)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:340)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:786)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractFileTail(ReaderImpl.java:355)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.(ReaderImpl.java:319)
        at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:241)
        at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeRecordReader.(OrcFileStripeMergeRecordReader.java:47)
        at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeInputFormat.getRecordReader(OrcFileStripeMergeInputFormat.java:37)
        at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.(CombineHiveRecordReader.java:67)
        ... 23 more
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000000_0_copy_2
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2025)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1996)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:700)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)

        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
        at org.apache.hadoop.ipc.Client.call(Client.java:1498)
        at org.apache.hadoop.ipc.Client.call(Client.java:1398)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
        at com.sun.proxy.$Proxy11.getBlockLocations(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:272)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
        at com.sun.proxy.$Proxy12.getBlockLocations(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1238)
        ... 40 more
], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
        at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:266)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:140)
        at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
        at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.run(MergeFileRecordProcessor.java:150)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
        ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:252)
        ... 18 more
Caused by: java.io.FileNotFoundException: File does not exist: /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000000_0_copy_2
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2025)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1996)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:700)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
        at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1240)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1225)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213)
        at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:309)
        at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:274)
        at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:266)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1538)
        at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:332)
        at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:327)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:340)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:786)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractFileTail(ReaderImpl.java:355)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:319)
        at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:241)
        at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeRecordReader.<init>(OrcFileStripeMergeRecordReader.java:47)
        at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeInputFormat.getRecordReader(OrcFileStripeMergeInputFormat.java:37)
        at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:67)
        ... 23 more
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000000_0_copy_2
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2025)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1996)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:700)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)

        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
        at org.apache.hadoop.ipc.Client.call(Client.java:1498)
        at org.apache.hadoop.ipc.Client.call(Client.java:1398)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
        at com.sun.proxy.$Proxy11.getBlockLocations(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:272)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
        at com.sun.proxy.$Proxy12.getBlockLocations(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1238)
        ... 40 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1565718945091_75870_2_00 [File Merge] killed/failed due to:OWN_TASK_FAILURE]
ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=2)

While, as written above, the same command still works fine for other partitions:

0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> alter table prod_ews_refined.tbl_wafer_param_stats partition (fab="CTM8",lot_partition="59591") concatenate;
INFO  : Session is already open
INFO  : Dag name: hive_20190830145138_334957cb-f329-43af-953b-d03e213c9b03
INFO  : Tez session was closed. Reopening...
INFO  : Session re-established.
INFO  : Status: Running (Executing on YARN cluster with App id application_1565718945091_74778)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
File Merge .....   SUCCEEDED      4          4        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 31.15 s
--------------------------------------------------------------------------------
INFO  : Loading data to table prod_ews_refined.tbl_wafer_param_stats partition (fab=CTM8, lot_partition=59591) from hdfs://ManufacturingDataLakeHdfs/apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=59591/.hive-staging_hive_2019-08-30_14-51-38_835_3100511055765258809-3012/-ext-10000
INFO  : Partition prod_ews_refined.tbl_wafer_param_stats{fab=CTM8, lot_partition=59591} stats: [numFiles=4, totalSize=107827]
No rows affected (56.472 seconds)

Extracting the root cause from the long error message above, we get:

Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=mfgdl_ingestion, access=EXECUTE, inode="/apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/.hive-staging_hive_2019-08-30_16-23-58_034_3892822656895858873-3012/_tmp.-ext-10000/000000_0_copy_1/000000_0_copy_7":mfgdl_ingestion:hadoop:-rw-r--r--

Clearly there is an access rights problem with the concatenate command. Yet the command is launched with the exact same user that created and fills the partitions and, more importantly, the error happens randomly on only a few partitions… No doubt we are hitting a bug; I cannot say whether it is Spark or Hive related…

HDFS default access right with Hive and Spark

I first noticed that for the failing partition the rights on the ORC files are not all the same (a mix of 755, rwxr-xr-x and 644, rw-r--r--):

hdfs@clientnode:~$ hdfs dfs -ls /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053
Found 35 items
.
.
-rwxr-xr-x   3 mfgdl_ingestion hadoop       1959 2019-08-25 18:12 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000001_0_copy_8
-rwxr-xr-x   3 mfgdl_ingestion hadoop       2563 2019-08-25 18:12 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000002_0_copy_1
-rwxr-xr-x   3 mfgdl_ingestion hadoop       1967 2019-08-25 18:11 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000002_0_copy_2
-rwxr-xr-x   3 mfgdl_ingestion hadoop       2190 2019-08-25 18:08 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000002_0_copy_3
-rwxr-xr-x   3 mfgdl_ingestion hadoop       1985 2019-08-25 18:11 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000002_0_copy_4
-rwxr-xr-x   3 mfgdl_ingestion hadoop       3508 2019-08-26 20:19 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000004_0
-rw-r--r--   3 mfgdl_ingestion hadoop       2009 2019-08-30 06:18 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/part-00004-9be63b72-9ccc-48f9-a422-b9a6420a3f6f.c000.zlib.orc
-rw-r--r--   3 mfgdl_ingestion hadoop       1959 2019-08-27 21:16 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/part-00007-dd3cce93-8301-42b4-add0-570ee27a5d66.c000.zlib.orc
-rw-r--r--   3 mfgdl_ingestion hadoop       2133 2019-08-27 21:16 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/part-00030-dd3cce93-8301-42b4-add0-570ee27a5d66.c000.zlib.orc
-rw-r--r--   3 mfgdl_ingestion hadoop       2137 2019-08-30 06:18 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/part-00047-9be63b72-9ccc-48f9-a422-b9a6420a3f6f.c000.zlib.orc
.
.
.

While for a working partition they are all the same (755, rwxr-xr-x):

hdfs@clientnode:~$ hdfs dfs -ls /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=59591
Found 4 items
-rwxr-xr-x   3 mfgdl_ingestion hadoop      41397 2019-08-30 14:52 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=59591/000000_0
-rwxr-xr-x   3 mfgdl_ingestion hadoop      39713 2019-08-30 14:52 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=59591/000001_0
-rwxr-xr-x   3 mfgdl_ingestion hadoop      21324 2019-08-30 14:52 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=59591/000002_0
-rwxr-xr-x   3 mfgdl_ingestion hadoop       5393 2019-08-30 14:52 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=59591/000003_0
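To spot which files are missing the execute bit without eyeballing long listings, the textual output of hdfs dfs -ls can be scanned. A minimal Python sketch (the helper name find_non_exec_files and the parsing approach are my own, not a Hadoop API):

```python
import re

def find_non_exec_files(ls_output):
    """Return paths of plain files whose permission string lacks the
    owner execute bit (e.g. the rw-r--r-- files written by Spark)."""
    flagged = []
    for line in ls_output.splitlines():
        # hdfs dfs -ls file line: perms, replication, owner, group,
        # size, date, time, path
        m = re.match(r'^(-[rwx-]{9})\s+\d+\s+\S+\s+\S+\s+\d+\s+\S+\s+\S+\s+(\S+)$', line)
        if m and m.group(1)[3] != 'x':  # index 3 = owner execute bit
            flagged.append(m.group(2))
    return flagged

# Two sample lines taken from the failing partition listing above
sample = """\
-rwxr-xr-x   3 mfgdl_ingestion hadoop       3508 2019-08-26 20:19 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000004_0
-rw-r--r--   3 mfgdl_ingestion hadoop       2009 2019-08-30 06:18 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/part-00004-9be63b72-9ccc-48f9-a422-b9a6420a3f6f.c000.zlib.orc"""

print(find_non_exec_files(sample))
```

Piping hdfs dfs -ls -R of the table directory into such a script gives the list of partitions needing a chmod.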

I have also checked whether all the files really belong to the partition (no ghost files) and apparently they do. From Hive we can only verify that the number of files (totalNumberFiles) is consistent; we cannot get the actual list of HDFS files that make up the partition:

0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> use prod_ews_refined;
No rows affected (0.438 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> show table extended like tbl_wafer_param_stats partition (fab="CTM8",lot_partition="58053");
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
|                                                                                                                                                                                                                           tab_name                                                                                                                                                                                                                            |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
| tableName:tbl_wafer_param_stats                                                                                                                                                                                                                                                                                                                                                                                                                               |
| owner:mfgdl_ingestion                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| location:hdfs://ManufacturingDataLakeHdfs/apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053                                                                                                                                                                                                                                                                                                                          |
| inputformat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat                                                                                                                                                                                                                                                                                                                                                                                                   |
| outputformat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat                                                                                                                                                                                                                                                                                                                                                                                                 |
| columns:struct columns { string start_t, string finish_t, string lot_id, string wafer_id, string flow_id, i32 param_id, string param_name, float param_low_limit, float param_high_limit, string param_unit, string ingestion_date, float param_p01, float param_q1, float param_median, float param_q3, float param_p99, float param_min_value, double param_avg_value, float param_max_value, double param_stddev, i64 nb_dies_tested, i64 nb_dies_failed}  |
| partitioned:true                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| partitionColumns:struct partition_columns { string fab, string lot_partition}                                                                                                                                                                                                                                                                                                                                                                                 |
| totalNumberFiles:34                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| totalFileSize:88754                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| maxFileSize:16710                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| minFileSize:1954                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| lastAccessTime:1567138817781                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| lastUpdateTime:1567179342705                                                                                                                                                                                                                                                                                                                                                                                                                                  |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
15 rows selected (0.444 seconds)

Remark
I have tried the IN|FROM database_name clause as described in the official Hive documentation:

SHOW TABLE EXTENDED [IN|FROM database_name] LIKE 'identifier_with_wildcards' [PARTITION(partition_spec)];

But I have not been able to make it work, so I finally fell back on the USE database_name statement…

As written above, this table is filled using a PySpark script with code like:

dataframe.write.mode('append').format('orc').option("compression","zlib").partitionBy('fab','lot_partition').saveAsTable("prod_ews_refined.tbl_wafer_param_stats")

I have checked the default HDFS mask:

0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> set fs.permissions.umask-mode;
+--------------------------------+--+
|              set               |
+--------------------------------+--+
| fs.permissions.umask-mode=022  |
+--------------------------------+--+
1 row selected (0.061 seconds)

It means that by default files will be created with 644, rw-r--r-- (666 minus 022) and directories with 755, rwxr-xr-x (777 minus 022).
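The umask arithmetic can be checked in a couple of lines (mode = base mode AND NOT umask, with base 666 for files and 777 for directories):

```python
# Effective mode = base mode & ~umask
umask = 0o022
file_mode = 0o666 & ~umask  # files start from 666
dir_mode = 0o777 & ~umask   # directories start from 777
print(oct(file_mode), oct(dir_mode))  # 0o644 and 0o755
```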

But digging a bit inside the tree of my database:

hdfs@clientnode:~$ hdfs dfs -ls /apps/hive/warehouse/prod_ews_refined.db/
Found 7 items
drwxrwxrwx   - mfgdl_ingestion hadoop          0 2019-09-02 10:12 /apps/hive/warehouse/prod_ews_refined.db/tbl_bin_param_stat
drwxrwxrwx   - mfgdl_ingestion hadoop          0 2019-08-31 01:38 /apps/hive/warehouse/prod_ews_refined.db/tbl_die_bin
drwxrwxrwx   - mfgdl_ingestion hadoop          0 2019-09-02 10:36 /apps/hive/warehouse/prod_ews_refined.db/tbl_die_param_flow
drwxrwxrwx   - mfgdl_ingestion hadoop          0 2019-09-02 10:34 /apps/hive/warehouse/prod_ews_refined.db/tbl_die_sp
drwxrwxrwx   - mfgdl_ingestion hadoop          0 2019-08-30 19:26 /apps/hive/warehouse/prod_ews_refined.db/tbl_stdf
drwxr-xr-x   - mfgdl_ingestion hadoop          0 2019-09-02 10:39 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats
drwxrwxrwx   - mfgdl_ingestion hadoop          0 2019-08-30 12:02 /apps/hive/warehouse/prod_ews_refined.db/tbl_wsr_map
hdfs@clientnode:~$ hdfs dfs -ls -d /apps/hive/warehouse/prod_ews_refined.db
drwxrwxrwx   - mfgdl_ingestion hadoop          0 2019-07-16 23:17 /apps/hive/warehouse/prod_ews_refined.db
hdfs@clientnode:~$ hdfs dfs -ls /apps/hive/
Found 1 items
drwxrwxrwx   - hive hadoop          0 2019-09-02 10:09 /apps/hive/warehouse

The parameter below explains why we always get 777 on sub-directories, even though I have read that it applies more to files than to sub-directories (!!):

0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> set hive.warehouse.subdir.inherit.perms;
+-------------------------------------------+--+
|                    set                    |
+-------------------------------------------+--+
| hive.warehouse.subdir.inherit.perms=true  |
+-------------------------------------------+--+
1 row selected (0.103 seconds)

So this explains why the HDFS files (ORC in my case) of tables created and filled by Hive have 777 on both directories and files. For partitions created by Spark, the default HDFS mask (fs.permissions.umask-mode) applies, so 755 (rwxr-xr-x) for directories and 644 (rw-r--r--) for files is the expected behavior.
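A possible preventive workaround, which I have not validated, would be to make the Spark job write with a fully permissive umask so its files come out like Hive's. Spark forwards any spark.hadoop.* property to the Hadoop configuration, so the fragment below is one hypothetical way to pass it (adjust to your deployment):

```
# Hypothetical spark-submit options - untested sketch:
spark-submit \
  --conf spark.hadoop.fs.permissions.umask-mode=000 \
  your_ingestion_script.py
```

With umask 000 the files would be created 666 and the directories 777, at the cost of world-writable data, so weigh this against your security requirements.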

So, to solve the execute permission issue on the partition, I issued:

hdfs@clientnode:~$ hdfs dfs -chmod 755 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/part*

And the concatenate went well this time:

0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> alter table prod_ews_refined.tbl_wafer_param_stats partition (fab="CTM8",lot_partition="58053") concatenate;
INFO  : Session is already open
INFO  : Dag name: hive_20190902152242_ac7ab990-fdfd-4094-89b4-4926c49364ee
INFO  : Tez session was closed. Reopening...
INFO  : Session re-established.
INFO  : Status: Running (Executing on YARN cluster with App id application_1565718945091_85673)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
File Merge .....   SUCCEEDED      5          5        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 12.87 s
--------------------------------------------------------------------------------
INFO  : Loading data to table prod_ews_refined.tbl_wafer_param_stats partition (fab=CTM8, lot_partition=58053) from hdfs://ManufacturingDataLakeHdfs/apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/.hive-staging_hive_2019-09-02_15-22-42_356_8162954430835088222-15910/-ext-10000
INFO  : Partition prod_ews_refined.tbl_wafer_param_stats{fab=CTM8, lot_partition=58053} stats: [numFiles=8, numRows=57, totalSize=65242, rawDataSize=42834]
No rows affected (29.993 seconds)

With the number of files drastically reduced:

hdfs@clientnode:~$ hdfs dfs -ls /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053
Found 9 items
drwxr-xr-x   - mfgdl_ingestion hadoop          0 2019-08-29 19:20 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/.hive-staging_hive_2019-08-29_18-10-26_174_7966264139569655341-114
-rwxr-xr-x   3 mfgdl_ingestion hadoop      20051 2019-09-02 15:23 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000000_0
-rwxr-xr-x   3 mfgdl_ingestion hadoop      19020 2019-09-02 15:23 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000001_0
-rwxr-xr-x   3 mfgdl_ingestion hadoop       2009 2019-08-30 06:18 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000001_0_copy_1
-rwxr-xr-x   3 mfgdl_ingestion hadoop       1959 2019-08-27 21:16 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000001_0_copy_2
-rwxr-xr-x   3 mfgdl_ingestion hadoop       2163 2019-08-27 03:42 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000001_0_copy_3
-rwxr-xr-x   3 mfgdl_ingestion hadoop      14413 2019-09-02 15:23 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000002_0
-rwxr-xr-x   3 mfgdl_ingestion hadoop       3508 2019-09-02 15:23 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000003_0
-rwxr-xr-x   3 mfgdl_ingestion hadoop       2119 2019-09-02 15:23 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000004_0

To go further

To try to clarify this story of default access rights with hive.warehouse.subdir.inherit.perms and fs.permissions.umask-mode, I created a default table with those parameters set to true and 022 respectively:

drop table yannick01 purge;

create table default.yannick01
(
value string
)
partitioned by(fab string, int_partition int)
stored as orc;

I insert a dummy record with:

insert into default.yannick01 partition (fab="CTM8", int_partition=1) values ("One");

As expected, all directories and files end up with 777 because my /apps/hive/warehouse directory is 777 and, with hive.warehouse.subdir.inherit.perms=true, sub-directories and files inherit its permissions:

hdfs@clientnode:~$ hdfs dfs -ls -d /apps/hive/warehouse/yannick01
drwxrwxrwx   - mfgdl_ingestion hadoop          0 2019-09-03 11:21 /apps/hive/warehouse/yannick01

hdfs@clientnode:~$ hdfs dfs -ls /apps/hive/warehouse/yannick01
Found 1 items
drwxrwxrwx   - mfgdl_ingestion hadoop          0 2019-09-02 18:03 /apps/hive/warehouse/yannick01/fab=CTM8

hdfs@clientnode:~$ hdfs dfs -ls /apps/hive/warehouse/yannick01/fab=CTM8
Found 1 items
drwxrwxrwx   - mfgdl_ingestion hadoop          0 2019-09-02 18:03 /apps/hive/warehouse/yannick01/fab=CTM8/int_partition=1

hdfs@clientnode:~$ hdfs dfs -ls /apps/hive/warehouse/yannick01/fab=CTM8/int_partition=1
Found 1 items
-rwxrwxrwx   3 mfgdl_ingestion hadoop        214 2019-09-02 18:04 /apps/hive/warehouse/yannick01/fab=CTM8/int_partition=1/000000_0

Now, if I set the following and re-execute the creation and insert scripts:

0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> set hive.warehouse.subdir.inherit.perms=false;
No rows affected (0.003 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> set hive.warehouse.subdir.inherit.perms;
+--------------------------------------------+--+
|                    set                     |
+--------------------------------------------+--+
| hive.warehouse.subdir.inherit.perms=false  |
+--------------------------------------------+--+
1 row selected (0.005 seconds)

I get:

hdfs@clientnode:~$ hdfs dfs -ls -d /apps/hive/warehouse/yannick01
drwxrwxrwx   - mfgdl_ingestion hadoop          0 2019-09-03 11:21 /apps/hive/warehouse/yannick01

hdfs@clientnode:~$ hdfs dfs -ls /apps/hive/warehouse/yannick01/fab=CTM8
Found 1 items
drwxrwxrwx   - mfgdl_ingestion hadoop          0 2019-09-03 11:22 /apps/hive/warehouse/yannick01/fab=CTM8/int_partition=1

hdfs@clientnode:~$ hdfs dfs -ls /apps/hive/warehouse/yannick01/fab=CTM8/int_partition=1
Found 1 items
-rw-r--r--   3 mfgdl_ingestion hadoop        223 2019-09-03 11:22 /apps/hive/warehouse/yannick01/fab=CTM8/int_partition=1/000000_0
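With inheritance disabled, the resulting modes simply follow the umask. A quick sketch of the arithmetic (standard POSIX rule: base mode 777 for directories, 666 for files, with the umask bits cleared):

```shell
umask_mode=022
# Directories start from 777, files from 666; the umask bits are removed.
dir_mode=$(printf '%o' $(( 0777 & ~0$umask_mode )))
file_mode=$(printf '%o' $(( 0666 & ~0$umask_mode )))
echo "dirs: $dir_mode, files: $file_mode"
# prints: dirs: 755, files: 644
```

Which matches the rw-r--r-- (644) seen on the newly written 000000_0 file above.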

And this time the concatenate command works because the directory containing the ORC files has 777 permissions. So the execute permission is not required on the ORC files themselves: granting write access on the directory that stores them (hdfs dfs -chmod g+w,o+w or hdfs dfs -chmod 777) is enough. But clearly this is much less secure…
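In practice, rather than leaving partition directories world-writable, one option is to open them up only for the duration of the concatenate. A hedged sketch (the path is the example partition from this post; the restored mode 755 is an assumption about your desired default):

```shell
# Hypothetical wrapper around a concatenate run: temporarily grant write
# access on the partition directory, then restore a tighter mode.
DIR='/apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053'

hdfs dfs -chmod 777 "$DIR"   # let the merge rename files inside the directory
# ... run in beeline: alter table ... partition (...) concatenate; ...
hdfs dfs -chmod 755 "$DIR"   # restore a more secure default afterwards
```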

0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> alter table default.yannick01 partition (fab="CTM8", int_partition=1) concatenate;
INFO  : Session is already open
INFO  : Dag name: hive_20190904163205_31445337-20a1-42d3-80ac-e08028d6a2a1
INFO  : Status: Running (Executing on YARN cluster with App id application_1565718945091_100063)

INFO  : Loading data to table default.yannick01 partition (fab=CTM8, int_partition=1) from hdfs://ManufacturingDataLakeHdfs/apps/hive/warehouse/yannick01/fab=CTM8/int_partition=1/.hive-staging_hive_2019-09-04_16-32-05_102_6350061968665706892-148/-ext-10000
INFO  : Partition default.yannick01{fab=CTM8, int_partition=1} stats: [numFiles=2, numRows=1, totalSize=158170, rawDataSize=87]
No rows affected (3.291 seconds)

It worked well until…

Everything worked as expected until we hit a partition with a lot of files: even with the correct rights set, the command failed with:

ERROR : Status: Failed
ERROR : Vertex failed, vertexName=File Merge, vertexId=vertex_1565718945091_143224_2_00, diagnostics=[Task failed, taskId=task_1565718945091_143224_2_00_000001, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
        at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators
        at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.close(MergeFileRecordProcessor.java:184)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:164)
        ... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close AbstractFileMergeOperator
        at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:272)
        at org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:250)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:620)
        at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.close(MergeFileRecordProcessor.java:176)
        ... 15 more
Caused by: org.apache.hadoop.ipc.RemoteException(java.lang.IllegalStateException): Current inode is not a directory: 000001_0_copy_21(INodeFile@1491d045), parentDir=_tmp.-ext-10000/
        at org.apache.hadoop.hdfs.server.namenode.INode.asDirectory(INode.java:331)
        at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.verifyFsLimitsForRename(FSDirRenameOp.java:117)
        at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.unprotectedRenameTo(FSDirRenameOp.java:189)
        at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameTo(FSDirRenameOp.java:492)
        at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameToInt(FSDirRenameOp.java:73)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:3938)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rename(NameNodeRpcServer.java:993)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.rename(ClientNamenodeProtocolServerSideTranslatorPB.java:587)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)

        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
        at org.apache.hadoop.ipc.Client.call(Client.java:1498)
        at org.apache.hadoop.ipc.Client.call(Client.java:1398)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
        at com.sun.proxy.$Proxy11.rename(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.rename(ClientNamenodeProtocolTranslatorPB.java:529)
        at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
        at com.sun.proxy.$Proxy12.rename(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:2006)
        at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:732)
        at org.apache.hadoop.hive.ql.exec.Utilities.moveFile(Utilities.java:1815)
        at org.apache.hadoop.hive.ql.exec.Utilities.renameOrMoveFiles(Utilities.java:1843)
        at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:258)
        ... 18 more
], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
        at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:266)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:140)
        at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
        at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.run(MergeFileRecordProcessor.java:150)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
        ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedConstructorAccessor18.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:252)
        ... 18 more
Caused by: java.io.FileNotFoundException: File does not exist: /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats_old/fab=C2WF/lot_partition=Q9053/000001_0_copy_993
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2025)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1996)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:700)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
        at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1240)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1225)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213)
        at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:309)
        at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:274)
        at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:266)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1538)
        at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:332)
        at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:327)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:340)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:786)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractFileTail(ReaderImpl.java:355)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:319)
        at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:241)
        at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeRecordReader.<init>(OrcFileStripeMergeRecordReader.java:47)
        at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeInputFormat.getRecordReader(OrcFileStripeMergeInputFormat.java:37)
        at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:67)
        ... 22 more
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats_old/fab=C2WF/lot_partition=Q9053/000001_0_copy_993
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2025)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1996)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:700)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)

        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
        at org.apache.hadoop.ipc.Client.call(Client.java:1498)
        at org.apache.hadoop.ipc.Client.call(Client.java:1398)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
        at com.sun.proxy.$Proxy11.getBlockLocations(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:272)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
        at com.sun.proxy.$Proxy12.getBlockLocations(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1238)
        ... 39 more
], TaskAttempt 2 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
        at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:266)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:140)
        at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
        at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.run(MergeFileRecordProcessor.java:150)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
        ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:252)
        ... 18 more
Caused by: java.io.FileNotFoundException: File does not exist: /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats_old/fab=C2WF/lot_partition=Q9053/000001_0_copy_969
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2025)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1996)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:700)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
        at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1240)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1225)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213)
        at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:309)
        at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:274)
        at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:266)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1538)
        at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:332)
        at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:327)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:340)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:786)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractFileTail(ReaderImpl.java:355)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:319)
        at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:241)
        at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeRecordReader.<init>(OrcFileStripeMergeRecordReader.java:47)
        at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeInputFormat.getRecordReader(OrcFileStripeMergeInputFormat.java:37)
        at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:67)
        ... 23 more
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats_old/fab=C2WF/lot_partition=Q9053/000001_0_copy_969
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2025)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1996)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:700)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)

        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
        at org.apache.hadoop.ipc.Client.call(Client.java:1498)
        at org.apache.hadoop.ipc.Client.call(Client.java:1398)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
        at com.sun.proxy.$Proxy11.getBlockLocations(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:272)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
        at com.sun.proxy.$Proxy12.getBlockLocations(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1238)
        ... 40 more
], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
        at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:266)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:140)
        at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
        at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.run(MergeFileRecordProcessor.java:150)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
        ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:252)
        ... 18 more
Caused by: java.io.FileNotFoundException: File does not exist: /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats_old/fab=C2WF/lot_partition=Q9053/000001_0_copy_969
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2025)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1996)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:700)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
        at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1240)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1225)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213)
        at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:309)
        at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:274)
        at org.apache.hadoop.hdfs.DFSInputStream.(DFSInputStream.java:266)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1538)
        at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:332)
        at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:327)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:340)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:786)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractFileTail(ReaderImpl.java:355)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.(ReaderImpl.java:319)
        at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:241)
        at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeRecordReader.(OrcFileStripeMergeRecordReader.java:47)
        at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeInputFormat.getRecordReader(OrcFileStripeMergeInputFormat.java:37)
        at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.(CombineHiveRecordReader.java:67)
        ... 23 more
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats_old/fab=C2WF/lot_partition=Q9053/000001_0_copy_969
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2025)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1996)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:700)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)

        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
        at org.apache.hadoop.ipc.Client.call(Client.java:1498)
        at org.apache.hadoop.ipc.Client.call(Client.java:1398)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
        at com.sun.proxy.$Proxy11.getBlockLocations(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:272)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
        at com.sun.proxy.$Proxy12.getBlockLocations(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1238)
        ... 40 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:1, Vertex vertex_1565718945091_143224_2_00 [File Merge] killed/failed due to:OWN_TASK_FAILURE]
ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=2)

I initially thought it could be an HDFS filesystem issue, so I ran an fsck on the partition directory:

hdfs@clientnode:~$ hdfs fsck /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats_old/fab=C2WF/lot_partition=Q9053/
Connecting to namenode via http://namenode01.domain.com:50070/fsck?ugi=hdfs&path=%2Fapps%2Fhive%2Fwarehouse%2Fprod_ews_refined.db%2Ftbl_wafer_param_stats_old%2Ffab%3DC2WF%2Flot_partition%3DQ9053
FSCK started by hdfs (auth:SIMPLE) from /10.75.144.5 for path /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats_old/fab=C2WF/lot_partition=Q9053 at Mon Sep 16 17:01:29 CEST 2019

Status: HEALTHY
 Total size:    159181622 B
 Total dirs:    1
 Total files:   21778
 Total symlinks:                0
 Total blocks (validated):      21778 (avg. block size 7309 B)
 Minimally replicated blocks:   21778 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          8
 Number of racks:               2
FSCK ended at Mon Sep 16 17:01:29 CEST 2019 in 377 milliseconds


The filesystem under path '/apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats_old/fab=C2WF/lot_partition=Q9053' is HEALTHY

To read all the logs in a convenient manner I strongly encourage using Tez View. To do so, note the application id displayed when executing the command in beeline:

.
.
INFO  : Status: Running (Executing on YARN cluster with App id application_1565718945091_161035)
.
.

Then access the resource using a URL such as http://resourcemanager01.domain.com:8088/cluster/app/application_1565718945091_142464 and click on the ApplicationMaster link in the displayed page. Finally, opening the details in the Dag tab, you should see something like:

concatenate01

You can also generate a text file using:

[yarn@clientnode ~]$ yarn logs -applicationId application_1565718945091_141922 > /tmp/yan.txt
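When chasing several failing statements, fetching each application id by hand gets tedious. A minimal Python sketch (the log line format is the one shown above; the /tmp output path is just an example) that pulls the application ids out of captured beeline output and prints the corresponding yarn logs commands:

```python
import re

def extract_application_ids(log_text):
    """Return the distinct YARN application ids found in a log,
    in order of first appearance (format: application_<cluster>_<seq>)."""
    seen = []
    for app_id in re.findall(r"application_\d+_\d+", log_text):
        if app_id not in seen:
            seen.append(app_id)
    return seen

# The INFO line printed by beeline while the statement runs:
log = ("INFO  : Status: Running (Executing on YARN cluster with "
       "App id application_1565718945091_161035)")
for app_id in extract_application_ids(log):
    # Hypothetical output location; adapt to your environment.
    print("yarn logs -applicationId {} > /tmp/{}.txt".format(app_id, app_id))
```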

From this huge amount of logs I obviously saw plenty of strange error messages, but none of them led to a clear conclusion:

  • |OrcFileMergeOperator|: Incompatible ORC file merge! Writer version mismatch for
  • |tez.RecordProcessor|: Hit error while closing operators – failing tree
  • |tez.TezProcessor|: java.lang.RuntimeException: Hive Runtime Error while closing operators
  • org.apache.hadoop.ipc.RemoteException(java.lang.IllegalStateException): Current inode is not a directory:

Yet the number of files in the partition was decreasing between two executions, so the merge was partially doing its job:

hdfs@clientnode:~$ hdfs dfs -ls /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=C2WF/lot_partition=Q9053 | wc -l
21693

hdfs@clientnode:~$ hdfs dfs -ls /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=C2WF/lot_partition=Q9053 | wc -l
21642

So this is clearly another bug.

Last but not least, one of my teammates noticed a strange behavior while doing a simple count on a partition. He generated a pure Hive table by inserting the rows of the one created by Spark:

0: jdbc:hive2://zookeeer01.domain.com:2181,zoo> select count(*) from prod_ews_refined.tbl_hive_generated where fab="R8WF" and lot_partition="G3473";
+------+--+
| _c0  |
+------+--+
| 137  |
+------+--+
1 row selected (0.045 seconds)
0: jdbc:hive2://zookeeer01.domain.com:2181,zoo> select count(*) from prod_ews_refined.tbl_spark_generated where fab="R8WF" and lot_partition="G3473";
+------+--+
| _c0  |
+------+--+
| 130  |
+------+--+
1 row selected (0.058 seconds)

But when exporting the rows of the Spark-generated table to a csv file (--outformat=csv2), the output is the correct one:

Connected to: Apache Hive (version 1.2.1000.2.6.4.0-91)
Driver: Hive JDBC (version 1.2.1000.2.6.4.0-91)
Transaction isolation: TRANSACTION_REPEATABLE_READ
INFO  : Tez session hasn't been created yet. Opening session
INFO  : Dag name: select * from prod_ew...ot_partition="G3473"(Stage-1)
INFO  : Status: Running (Executing on YARN cluster with App id application_1565718945091_133078)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      2          2        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 2.11 s     
--------------------------------------------------------------------------------
137 rows selected (7.998 seconds)

So that is one more bug, which has led to the project of upgrading our cluster to the latest HDP version to prepare the migration to Cloudera, as Hortonworks is dead…

In the meantime, the non-satisfactory solution we have implemented is to fill a pure Hive table with an INSERT AS SELECT from the Spark-generated table…
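The interim workaround boils down to a single HiveQL statement. A minimal sketch that assembles it (table names taken from the queries above; the exact column list and the assumption of dynamic partitioning, which requires hive.exec.dynamic.partition.mode=nonstrict, may differ in your setup):

```python
def insert_as_select(target, source, partition_cols):
    """Build the INSERT ... SELECT used as a workaround to copy a
    Spark-generated partitioned table into a pure Hive table.
    Dynamic partitioning is assumed, so the partition columns must
    come last in the SELECT list (true for SELECT * here, since they
    are the table's partition columns)."""
    return "INSERT OVERWRITE TABLE {} PARTITION ({}) SELECT * FROM {}".format(
        target, ", ".join(partition_cols), source)

stmt = insert_as_select("prod_ews_refined.tbl_hive_generated",
                        "prod_ews_refined.tbl_spark_generated",
                        ["fab", "lot_partition"])
print(stmt)
```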

References

How to handle HDFS blocks with corrupted replicas or under replicated
https://blog.yannickjaquier.com/hadoop/how-to-handle-hdfs-blocks-with-corrupted-replicas-or-under-replicated.html
Wed, 25 Dec 2019 14:00:23 +0000


Table of contents

Preamble

In Ambari, and in HDFS more precisely, there are two widgets that will catch your eye if they are not equal to zero: Blocks With Corrupted Replicas and Under Replicated Blocks. In the graphical interface this looks something like:

hdfs_blocks01

Managing under replicated blocks

You get a complete list of impacted files with this command. The egrep filter simply removes all the lines made only of dot characters:

hdfs@client_node:~$ hdfs fsck / | egrep -v '^\.+'
Connecting to namenode via http://namenode01.domain.com:50070/fsck?ugi=hdfs&path=%2F
FSCK started by hdfs (auth:SIMPLE) from /10.75.144.5 for path / at Tue Jul 23 14:58:19 CEST 2019
/tmp/hive/training/dda2d815-27d3-43e0-9d3b-4aea3983c1a9/hive_2018-09-18_12-13-21_550_5155956994526391933-12/_tez_scratch_dir/split_Map_1/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1087250107_13523212. Target Replicas is 10 but found 8 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/ambari-qa/.staging/job_1525749609269_2679/libjars/hive-shims-0.23-1.2.1000.2.6.4.0-91.jar:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1074437588_697325. Target Replicas is 10 but found 8 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/ambari-qa/.staging/job_1525749609269_2679/libjars/hive-shims-common-1.2.1000.2.6.4.0-91.jar:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1074437448_697185. Target Replicas is 10 but found 8 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/hdfs/.staging/job_1541585350344_0008/job.jar:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1092991377_19266614. Target Replicas is 10 but found 8 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/hdfs/.staging/job_1541585350344_0008/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1092991378_19266615. Target Replicas is 10 but found 8 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/hdfs/.staging/job_1541585350344_0041/job.jar:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1093022580_19297823. Target Replicas is 10 but found 8 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/hdfs/.staging/job_1541585350344_0041/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1093022581_19297824. Target Replicas is 10 but found 8 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/training/.staging/job_1519657336782_0105/job.jar:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1073754565_13755. Target Replicas is 10 but found 8 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/training/.staging/job_1519657336782_0105/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1073754566_13756. Target Replicas is 10 but found 8 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/training/.staging/job_1519657336782_0105/libjars/hive-hcatalog-core.jar:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1073754564_13754. Target Replicas is 10 but found 8 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/training/.staging/job_1536057043538_0001/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1085621525_11894367. Target Replicas is 10 but found 8 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/training/.staging/job_1536057043538_0002/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1085621527_11894369. Target Replicas is 10 but found 8 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/training/.staging/job_1536057043538_0004/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1085621593_11894435. Target Replicas is 10 but found 8 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/training/.staging/job_1536057043538_0023/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1085622064_11894906. Target Replicas is 10 but found 8 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/training/.staging/job_1536057043538_0025/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1085622086_11894928. Target Replicas is 10 but found 8 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/training/.staging/job_1536057043538_0027/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1085622115_11894957. Target Replicas is 10 but found 8 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/training/.staging/job_1536057043538_0028/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1085622133_11894975. Target Replicas is 10 but found 8 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/training/.staging/job_1536642465198_0002/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1086397707_12670663. Target Replicas is 10 but found 8 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/training/.staging/job_1536642465198_0003/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1086397706_12670662. Target Replicas is 10 but found 8 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/training/.staging/job_1536642465198_0004/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1086397708_12670664. Target Replicas is 10 but found 8 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/training/.staging/job_1536642465198_0005/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1086397718_12670674. Target Replicas is 10 but found 8 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/training/.staging/job_1536642465198_0006/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1086397720_12670676. Target Replicas is 10 but found 8 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/training/.staging/job_1536642465198_0007/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1086397721_12670677. Target Replicas is 10 but found 8 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
/user/training/.staging/job_1536642465198_2307/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1086509846_12782817. Target Replicas is 10 but found 8 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
 Total size:    25053842686185 B (Total open files size: 33152660866 B)
 Total dirs:    1500114
 Total files:   12517972
 Total symlinks:                0 (Files currently being written: 268)
 Total blocks (validated):      12534979 (avg. block size 1998714 B) (Total open file blocks (not validated): 325)
 Minimally replicated blocks:   12534979 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       24 (1.9146422E-4 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.9715698
 Corrupt blocks:                0
 Missing replicas:              48 (1.2886386E-4 %)
 Number of data-nodes:          8
 Number of racks:               2
FSCK ended at Tue Jul 23 15:01:08 CEST 2019 in 169140 milliseconds


The filesystem under path '/' is HEALTHY

All those files have the strange “Target Replicas is 10 but found 8 live replica(s)” message while our default replication factor is 3. This is because job submission files (job.jar, job.split, libjars) are written with a higher replication factor, 10 by default via mapreduce.client.submit.file.replication, which our 8-datanode cluster cannot satisfy. So I decided to set them back to 3 with:

 
hdfs@client_node:~$ hdfs dfs -setrep 3 /user/training/.staging/job_1536642465198_2307/job.split
Replication 3 set: /user/training/.staging/job_1536642465198_2307/job.split

And it solved the issue (24 to 23 under replicated blocks):

 Total size:    25009165459559 B (Total open files size: 1006863 B)
 Total dirs:    1500150
 Total files:   12528930
 Total symlinks:                0 (Files currently being written: 47)
 Total blocks (validated):      12545325 (avg. block size 1993504 B) (Total open file blocks (not validated): 43)
 Minimally replicated blocks:   12545325 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       23 (1.8333523E-4 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.9715445
 Corrupt blocks:                0
 Missing replicas:              46 (1.2339374E-4 %)
 Number of data-nodes:          8
 Number of racks:               2
FSCK ended at Tue Jul 23 15:49:27 CEST 2019 in 155922 milliseconds

This is also visible graphically in Ambari:

hdfs_blocks02

There is also the option to delete the files if they are of no interest or if, as in my case, it is an old temporary file like this one from my list:

hdfs@client_node:~$ hdfs dfs -ls -d /tmp/hive/training/dda2d815*
drwx------   - training hdfs          0 2018-09-18 12:53 /tmp/hive/training/dda2d815-27d3-43e0-9d3b-4aea3983c1a9

Then delete it, skipping the trash (no recovery possible; if you are unsure, keep the trash activated, but you will then have to wait for HDFS to purge the trash after fs.trash.interval minutes):

hdfs@client_node:~$ hdfs dfs -rm -r -f -skipTrash /tmp/hive/training/dda2d815-27d3-43e0-9d3b-4aea3983c1a9
Deleted /tmp/hive/training/dda2d815-27d3-43e0-9d3b-4aea3983c1a9

Which obviously gives the same kind of result (one more under-replicated block cleared):

 Total size:    25019302903260 B (Total open files size: 395112 B)
 Total dirs:    1439889
 Total files:   12474754
 Total symlinks:                0 (Files currently being written: 69)
 Total blocks (validated):      12491149 (avg. block size 2002962 B) (Total open file blocks (not validated): 57)
 Minimally replicated blocks:   12491149 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       22 (1.7612471E-4 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.971359
 Corrupt blocks:                0
 Missing replicas:              44 (1.185481E-4 %)
 Number of data-nodes:          8
 Number of racks:               2
FSCK ended at Tue Jul 23 16:18:51 CEST 2019 in 141604 milliseconds
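When dozens of files are impacted, reading the fsck report by hand and running setrep one file at a time gets tedious. A minimal Python sketch, assuming the per-file line format shown in the fsck output above, that turns the report into the corresponding hdfs dfs -setrep commands:

```python
import re

# Matches the per-file lines of `hdfs fsck /` output, e.g.:
# /user/hdfs/.staging/job_x/job.split:  Under replicated BP-...:blk_... Target Replicas is 10 ...
UNDER_REPLICATED = re.compile(r"^(/\S+):\s+Under replicated\b")

def setrep_commands(fsck_output, replication=3):
    """Turn an fsck report into the `hdfs dfs -setrep` commands that
    reset each under-replicated file to the wanted replication factor."""
    return ["hdfs dfs -setrep {} {}".format(replication, m.group(1))
            for m in map(UNDER_REPLICATED.match, fsck_output.splitlines()) if m]

sample = ("/user/hdfs/.staging/job_1541585350344_0008/job.split:  Under replicated "
          "BP-1711156358-10.75.144.1-1519036486930:blk_1092991378_19266615. "
          "Target Replicas is 10 but found 8 live replica(s).")
print("\n".join(setrep_commands(sample)))
```

In practice you would feed it the captured output of `hdfs fsck / | egrep -v '^\.+'` and pipe the printed commands back to a shell.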

Managing blocks with corrupted replicas

Simulating corrupted blocks is no piece of cake and, even though my Ambari display was showing corrupted blocks, in reality I had none:

hdfs@client_node:~$ hdfs fsck -list-corruptfileblocks
Connecting to namenode via http://namenode01.domain.com:50070/fsck?ugi=hdfs&listcorruptfileblocks=1&path=%2F
The filesystem under path '/' has 0 CORRUPT files

On a TI environment that plenty of people have played with, I finally got the (hopefully) rare corrupted blocks message. In real life it should not happen, as by default your HDFS replication factor is 3 (it even prevented me from starting HBase):

[hdfs@client_node ~]$ hdfs dfsadmin -report
Configured Capacity: 6139207680 (5.72 GB)
Present Capacity: 5701219000 (5.31 GB)
DFS Remaining: 2659930112 (2.48 GB)
DFS Used: 3041288888 (2.83 GB)
DFS Used%: 53.34%
Replicated Blocks:
        Under replicated blocks: 0
        Blocks with corrupt replicas: 0
        Missing blocks: 10
        Missing blocks (with replication factor 1): 0
        Pending deletion blocks: 0
Erasure Coded Block Groups:
        Low redundancy block groups: 0
        Block groups with corrupt internal blocks: 0
        Missing block groups: 0
        Pending deletion blocks: 0
.
.

More precisely:

[hdfs@client_node ~]$ hdfs fsck -list-corruptfileblocks
Connecting to namenode via http://namenode01.domain.com:50070/fsck?ugi=hdfs&listcorruptfileblocks=1&path=%2F
The list of corrupt files under path '/' are:
blk_1073744038  /hdp/apps/3.0.0.0-1634/mapreduce/mapreduce.tar.gz
blk_1073744039  /hdp/apps/3.0.0.0-1634/mapreduce/mapreduce.tar.gz
blk_1073744040  /hdp/apps/3.0.0.0-1634/mapreduce/mapreduce.tar.gz
blk_1073744041  /hdp/apps/3.0.0.0-1634/yarn/service-dep.tar.gz
blk_1073744042  /apps/hbase/data/hbase.version
blk_1073744043  /apps/hbase/data/hbase.id
blk_1073744044  /apps/hbase/data/data/hbase/meta/1588230740/.regioninfo
blk_1073744045  /apps/hbase/data/data/hbase/meta/.tabledesc/.tableinfo.0000000001
blk_1073744046  /apps/hbase/data/MasterProcWALs/pv2-00000000000000000001.log
blk_1073744047  /apps/hbase/data/WALs/datanode01.domain.com,16020,1576839178870/datanode01.domain.com%2C16020%2C1576839178870.1576839191114
The filesystem under path '/' has 10 CORRUPT files

Here is why I was not able to start the components related to HBase, and even the HBase daemon itself was failing:

[hdfs@client_node ~]$ hdfs dfs -cat /apps/hbase/data/hbase.version
19/12/20 15:41:12 WARN hdfs.DFSClient: No live nodes contain block BP-369465004-10.75.46.68-1539340329592:blk_1073744042_3218 after checking nodes = [], ignoredNodes = null
19/12/20 15:41:12 INFO hdfs.DFSClient: No node available for BP-369465004-10.75.46.68-1539340329592:blk_1073744042_3218 file=/apps/hbase/data/hbase.version
19/12/20 15:41:12 INFO hdfs.DFSClient: Could not obtain BP-369465004-10.75.46.68-1539340329592:blk_1073744042_3218 from any node:  No live nodes contain current block Block locations: Dead nodes: . Will get new block locations from namenode and retry...
19/12/20 15:41:12 WARN hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 2312.7779354694017 msec.
19/12/20 15:41:14 WARN hdfs.DFSClient: No live nodes contain block BP-369465004-10.75.46.68-1539340329592:blk_1073744042_3218 after checking nodes = [], ignoredNodes = null
19/12/20 15:41:14 INFO hdfs.DFSClient: No node available for BP-369465004-10.75.46.68-1539340329592:blk_1073744042_3218 file=/apps/hbase/data/hbase.version
19/12/20 15:41:14 INFO hdfs.DFSClient: Could not obtain BP-369465004-10.75.46.68-1539340329592:blk_1073744042_3218 from any node:  No live nodes contain current block Block locations: Dead nodes: . Will get new block locations from namenode and retry...
19/12/20 15:41:14 WARN hdfs.DFSClient: DFS chooseDataNode: got # 2 IOException, will wait for 8143.439259842901 msec.
19/12/20 15:41:22 WARN hdfs.DFSClient: No live nodes contain block BP-369465004-10.75.46.68-1539340329592:blk_1073744042_3218 after checking nodes = [], ignoredNodes = null
19/12/20 15:41:22 INFO hdfs.DFSClient: No node available for BP-369465004-10.75.46.68-1539340329592:blk_1073744042_3218 file=/apps/hbase/data/hbase.version
19/12/20 15:41:22 INFO hdfs.DFSClient: Could not obtain BP-369465004-10.75.46.68-1539340329592:blk_1073744042_3218 from any node:  No live nodes contain current block Block locations: Dead nodes: . Will get new block locations from namenode and retry...
19/12/20 15:41:22 WARN hdfs.DFSClient: DFS chooseDataNode: got # 3 IOException, will wait for 11760.036759939097 msec.
19/12/20 15:41:34 WARN hdfs.DFSClient: No live nodes contain block BP-369465004-10.75.46.68-1539340329592:blk_1073744042_3218 after checking nodes = [], ignoredNodes = null
19/12/20 15:41:34 WARN hdfs.DFSClient: Could not obtain block: BP-369465004-10.75.46.68-1539340329592:blk_1073744042_3218 file=/apps/hbase/data/hbase.version No live nodes contain current block Block locations: Dead nodes: . Throwing a BlockMissingException
19/12/20 15:41:34 WARN hdfs.DFSClient: No live nodes contain block BP-369465004-10.75.46.68-1539340329592:blk_1073744042_3218 after checking nodes = [], ignoredNodes = null
19/12/20 15:41:34 WARN hdfs.DFSClient: Could not obtain block: BP-369465004-10.75.46.68-1539340329592:blk_1073744042_3218 file=/apps/hbase/data/hbase.version No live nodes contain current block Block locations: Dead nodes: . Throwing a BlockMissingException
19/12/20 15:41:34 WARN hdfs.DFSClient: DFS Read
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-369465004-10.75.46.68-1539340329592:blk_1073744042_3218 file=/apps/hbase/data/hbase.version
        at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:870)
        at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:853)
        at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:832)
        at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:564)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:754)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:820)
        at java.io.DataInputStream.read(DataInputStream.java:100)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:94)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:68)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:129)
        at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:101)
        at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:96)
        at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:331)
        at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:303)
        at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:285)
        at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:269)
        at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:120)
        at org.apache.hadoop.fs.shell.Command.run(Command.java:176)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:328)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:391)
cat: Could not obtain block: BP-369465004-10.75.46.68-1539340329592:blk_1073744042_3218 file=/apps/hbase/data/hbase.version

Here there is no particular miracle to solve it: you have to delete the impacted files:

[hdfs@client_node ~]$ hdfs fsck / -delete
Connecting to namenode via http://namenode01.domain.com:50070/fsck?ugi=hdfs&delete=1&path=%2F
FSCK started by hdfs (auth:SIMPLE) from /10.75.46.69 for path / at Fri Dec 20 15:41:53 CET 2019

/apps/hbase/data/MasterProcWALs/pv2-00000000000000000001.log: MISSING 1 blocks of total size 34358 B.
/apps/hbase/data/WALs/datanode01.domain.com,16020,1576839178870/datanode01.domain.com%2C16020%2C1576839178870.1576839191114: MISSING 1 blocks of total size 98 B.
/apps/hbase/data/data/hbase/meta/.tabledesc/.tableinfo.0000000001: MISSING 1 blocks of total size 996 B.
/apps/hbase/data/data/hbase/meta/1588230740/.regioninfo: MISSING 1 blocks of total size 32 B.
/apps/hbase/data/hbase.id: MISSING 1 blocks of total size 42 B.
/apps/hbase/data/hbase.version: MISSING 1 blocks of total size 7 B.
/hdp/apps/3.0.0.0-1634/mapreduce/mapreduce.tar.gz: MISSING 3 blocks of total size 306766434 B.
/hdp/apps/3.0.0.0-1634/yarn/service-dep.tar.gz: MISSING 1 blocks of total size 92348160 B.
Status: CORRUPT
 Number of data-nodes:  3
 Number of racks:               1
 Total dirs:                    780
 Total symlinks:                0

Replicated Blocks:
 Total size:    1404802556 B
 Total files:   190 (Files currently being written: 1)
 Total blocks (validated):      47 (avg. block size 29889416 B)
  ********************************
  UNDER MIN REPL'D BLOCKS:      10 (21.276596 %)
  MINIMAL BLOCK REPLICATION:    1
  CORRUPT FILES:        8
  MISSING BLOCKS:       10
  MISSING SIZE:         399150127 B
  ********************************
 Minimally replicated blocks:   37 (78.723404 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.3617022
 Missing blocks:                10
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)

Erasure Coded Block Groups:
 Total size:    0 B
 Total files:   0
 Total block groups (validated):        0
 Minimally erasure-coded block groups:  0
 Over-erasure-coded block groups:       0
 Under-erasure-coded block groups:      0
 Unsatisfactory placement block groups: 0
 Average block group size:      0.0
 Missing block groups:          0
 Corrupt block groups:          0
 Missing internal blocks:       0
FSCK ended at Fri Dec 20 15:41:54 CET 2019 in 306 milliseconds


The filesystem under path '/' is CORRUPT
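As a sanity check, the per-file MISSING lines in the report add up exactly to the MISSING SIZE counter of the summary (399150127 B here). A minimal Python sketch, assuming the fsck report has been captured to a string, that totals them:

```python
import re

def missing_bytes(fsck_report: str) -> int:
    """Sum the sizes of the MISSING blocks listed per file in an 'hdfs fsck' report."""
    # Per-file lines look like: "<path>: MISSING <n> blocks of total size <size> B."
    sizes = re.findall(r"MISSING \d+ blocks? of total size (\d+) B", fsck_report)
    return sum(int(s) for s in sizes)
```

Fed with the report above it returns 399150127, matching the summary line (the regex intentionally skips the "MISSING SIZE:" summary line so nothing is counted twice).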

Finally solved (with data loss, of course):

[hdfs@client_node ~]$ hdfs fsck -list-corruptfileblocks
Connecting to namenode via http://namenode01.domain.com:50070/fsck?ugi=hdfs&listcorruptfileblocks=1&path=%2F
The filesystem under path '/' has 0 CORRUPT files

And:

[hdfs@client_node ~]$ hdfs dfsadmin -report
Configured Capacity: 6139207680 (5.72 GB)
Present Capacity: 5701216450 (5.31 GB)
DFS Remaining: 2659930112 (2.48 GB)
DFS Used: 3041286338 (2.83 GB)
DFS Used%: 53.34%
Replicated Blocks:
        Under replicated blocks: 0
        Blocks with corrupt replicas: 0
        Missing blocks: 0
        Missing blocks (with replication factor 1): 0
        Pending deletion blocks: 0
Erasure Coded Block Groups:
        Low redundancy block groups: 0
        Block groups with corrupt internal blocks: 0
        Missing block groups: 0
        Pending deletion blocks: 0

The post How to handle HDFS blocks with corrupted replicas or under replicated appeared first on IT World.

Hive fetch task really improving response time by bypassing MapReduce ? https://blog.yannickjaquier.com/hadoop/hive-fetch-task-really-improving-response-time-by-bypassing-mapreduce.html Sun, 24 Nov 2019 09:42:15 +0000


Table of contents

Preamble

Our internal customers started to report erratic response times for similar queries in their Spotfire dashboards. They claimed the queries were almost the same, but the response time was ten times slower for some of them…

Isn’t it my first performance issue with Hadoop? Yessss it is!!!!

We are running HDP-2.6 (2.6.4.0-91) and the Hive release in this edition is 1.2.1000.

Identical queries not same response time

The first job was to extract the two “similar” queries from the Spotfire dashboard and compare them. I have put similar in quotes because what looks similar from a user’s perspective can be really different in real life. To execute them and easily spot them in the flow of all the queries in the log files, I used the same trick as with Oracle: adding a comment containing my first name.

Query01:

select --Yannick01
lot_id,
wafer_id,
flow_id,
param_id,
start_t,
finish_t,
param_name,
param_unit,
param_low_limit,
param_high_limit,
nb_dies_tested,
nb_dies_failed
from prod_spotfire_refined.tbl_bin_stat_orc
where fab = "C2WF"
and lot_partition in ("Q842")
and lot_id in ("Q842889")
and wafer_id in ('Q842889-01E3','Q842889-02D6','Q842889-03D1','Q842889-04C4','Q842889-05B7','Q842889-06B2','Q842889-07A5','Q842889-08A0','Q842889-09G6',
'Q842889-10A0','Q842889-11G6','Q842889-12G1','Q842889-13F4','Q842889-14E7','Q842889-15E2','Q842889-16D5','Q842889-17D0','Q842889-18C3','Q842889-19B6',
'Q842889-20C3','Q842889-21B6','Q842889-22B1','Q842889-23A4','Q842889-24H2')
and flow_id in ("EWS1")
and start_t in ('2019.01.04-19:50:55','2019.01.05-02:21:26','2019.01.05-08:33:59','2019.01.05-14:06:24','2019.01.05-19:35:23','2019.01.05-22:25:30',
'2019.01.06-03:49:53','2019.01.06-09:19:52','2019.01.06-14:23:47','2019.01.06-19:27:35','2019.01.07-00:47:59','2019.01.07-06:37:21','2019.01.07-11:15:14',
'2019.01.07-15:56:55','2019.01.07-20:05:50','2019.01.07-22:48:09','2019.01.08-04:37:37','2019.01.08-08:48:26','2019.01.08-13:51:34','2019.01.08-18:31:38',
'2019.01.09-00:01:41','2019.01.09-04:11:44','2019.01.09-09:45:08','2019.01.09-13:47:11')
and finish_t in ('2019.01.05-01:52:02','2019.01.05-08:30:52','2019.01.05-14:01:33','2019.01.05-19:32:20','2019.01.05-22:22:15','2019.01.06-03:46:46',
'2019.01.06-09:16:43','2019.01.06-14:20:42','2019.01.06-19:24:22','2019.01.07-00:44:48','2019.01.07-06:34:15','2019.01.07-11:12:02','2019.01.07-15:53:45',
'2019.01.07-20:02:41','2019.01.07-22:26:37','2019.01.08-04:34:30','2019.01.08-08:45:20','2019.01.08-13:48:27','2019.01.08-18:28:33','2019.01.08-23:58:34',
'2019.01.09-04:08:41','2019.01.09-09:41:57','2019.01.09-13:44:03','2019.01.09-18:01:54')
and hbin_number in ("9")
and sbin_number in ("403");

Query02:

select --Yannick02
lot_id,
wafer_id,
flow_id,
param_id,
start_t,
finish_t,
param_name,
param_unit,
param_low_limit,
param_high_limit,
nb_dies_tested,
nb_dies_failed
from prod_spotfire_refined.tbl_bin_stat_orc
where fab = "C2WF"
and lot_partition in ("Q840")
and lot_id in ("Q840401")
and wafer_id in ('Q840401-01E6','Q840401-02E1','Q840401-03D4','Q840401-04C7','Q840401-05C2','Q840401-06B5','Q840401-07B0','Q840401-08A3','Q840401-09H1',
'Q840401-10A3','Q840401-11H1','Q840401-12G4','Q840401-13F7','Q840401-14F2','Q840401-15E5','Q840401-16E0','Q840401-17D3','Q840401-18C6','Q840401-19C1',
'Q840401-20C6','Q840401-21C1','Q840401-22B4','Q840401-23A7','Q840401-24A2','Q840401-25H0')
and flow_id in ("EWS1")
and start_t in ('2018.12.27-10:42:54','2018.12.27-12:01:57','2018.12.27-13:18:47','2018.12.27-14:36:31','2018.12.27-15:55:57','2018.12.27-17:13:42',
'2018.12.27-18:31:34','2018.12.27-19:49:27','2018.12.27-21:05:30','2018.12.27-22:23:36','2018.12.27-23:40:11','2018.12.28-00:56:40','2018.12.28-02:15:51',
'2018.12.28-03:41:23','2018.12.28-04:58:02','2018.12.28-06:16:11','2018.12.28-07:34:40','2018.12.28-08:55:29','2018.12.28-10:13:25','2018.12.28-11:30:34',
'2018.12.28-12:48:08','2018.12.28-14:06:12','2018.12.28-15:23:00','2018.12.28-16:39:50','2018.12.28-17:56:57')
and finish_t in ('2018.12.27-12:00:40','2018.12.27-13:17:30','2018.12.27-14:35:13','2018.12.27-15:54:39','2018.12.27-17:12:26','2018.12.27-18:30:16',
'2018.12.27-19:48:12','2018.12.27-21:04:15','2018.12.27-22:22:19','2018.12.27-23:38:53','2018.12.28-00:55:24','2018.12.28-02:14:34','2018.12.28-03:32:41',
'2018.12.28-04:56:43','2018.12.28-06:14:52','2018.12.28-07:33:21','2018.12.28-08:54:12','2018.12.28-10:12:06','2018.12.28-11:29:17','2018.12.28-12:46:50',
'2018.12.28-14:04:55','2018.12.28-15:21:43','2018.12.28-16:38:30','2018.12.28-17:55:38','2018.12.28-19:13:18')
and hbin_number in ("1")
and sbin_number in ("1");

At this stage I have to say that the queries are pretty similar, so our users might be right: the response time should not be that different between the two queries…

Query02 returns 682 rows in around 330 seconds. Query01 returns 38,664 rows in around 25 seconds (16 seconds to really execute the query as you can see below; the rest is network transfer).

So Query02 is around 13 times slower while returning 57 times fewer rows…
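For the record, both factors come straight from the measured numbers:

```python
# Measured figures from the two runs above
rows_q1, secs_q1 = 38_664, 25   # Query01
rows_q2, secs_q2 = 682, 330     # Query02

print(round(secs_q2 / secs_q1))   # ~13: Query02 is 13 times slower...
print(round(rows_q1 / rows_q2))   # ~57: ...while returning 57 times fewer rows
```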

Partitions statistics and concatenation

I started by checking that the two partitions involved are almost the same from a statistics point of view and that compaction had been done on both of them.

From the global statistics there is a difference, but not that much: the number of rows and the total partition size are really close. So close that it cannot explain the factor of ten in response time:

0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc partition(fab = "C2WF", lot_partition="Q842");
+-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+-----------------------------+--+
|             col_name              |                                                          data_type                                                          |           comment           |
+-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+-----------------------------+--+
| # col_name                        | data_type                                                                                                                   | comment                     |
|                                   | NULL                                                                                                                        | NULL                        |
| lot_id                            | string                                                                                                                      |                             |
| wafer_id                          | string                                                                                                                      |                             |
| flow_id                           | string                                                                                                                      |                             |
| start_t                           | string                                                                                                                      |                             |
| finish_t                          | string                                                                                                                      |                             |
| hbin_number                       | int                                                                                                                         |                             |
| hbin_name                         | string                                                                                                                      |                             |
| sbin_number                       | int                                                                                                                         |                             |
| sbin_name                         | string                                                                                                                      |                             |
| param_id                          | string                                                                                                                      |                             |
| param_name                        | string                                                                                                                      |                             |
| param_unit                        | string                                                                                                                      |                             |
| param_low_limit                   | float                                                                                                                       |                             |
| param_high_limit                  | float                                                                                                                       |                             |
| nb_dies_tested                    | int                                                                                                                         |                             |
| nb_dies_failed                    | int                                                                                                                         |                             |
| nb_dies_good                      | int                                                                                                                         |                             |
| ingestion_date                    | string                                                                                                                      |                             |
|                                   | NULL                                                                                                                        | NULL                        |
| # Partition Information           | NULL                                                                                                                        | NULL                        |
| # col_name                        | data_type                                                                                                                   | comment                     |
|                                   | NULL                                                                                                                        | NULL                        |
| fab                               | string                                                                                                                      |                             |
| lot_partition                     | string                                                                                                                      |                             |
|                                   | NULL                                                                                                                        | NULL                        |
| # Detailed Partition Information  | NULL                                                                                                                        | NULL                        |
| Partition Value:                  | [C2WF, Q842]                                                                                                                | NULL                        |
| Database:                         | prod_spotfire_refined                                                                                                       | NULL                        |
| Table:                            | tbl_bin_stat_orc                                                                                                            | NULL                        |
| CreateTime:                       | Mon Feb 11 13:03:15 CET 2019                                                                                                | NULL                        |
| LastAccessTime:                   | UNKNOWN                                                                                                                     | NULL                        |
| Protect Mode:                     | None                                                                                                                        | NULL                        |
| Location:                         | hdfs://ManufacturingDataLakeHdfs/apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842  | NULL                        |
| Partition Parameters:             | NULL                                                                                                                        | NULL                        |
|                                   | COLUMN_STATS_ACCURATE                                                                                                       | {\"BASIC_STATS\":\"true\"}  |
|                                   | numFiles                                                                                                                    | 14                          |
|                                   | numRows                                                                                                                     | 143216514                   |
|                                   | rawDataSize                                                                                                                 | 151433505731                |
|                                   | totalSize                                                                                                                   | 1353827219                  |
|                                   | transient_lastDdlTime                                                                                                       | 1554163118                  |
|                                   | NULL                                                                                                                        | NULL                        |
| # Storage Information             | NULL                                                                                                                        | NULL                        |
| SerDe Library:                    | org.apache.hadoop.hive.ql.io.orc.OrcSerde                                                                                   | NULL                        |
| InputFormat:                      | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat                                                                             | NULL                        |
| OutputFormat:                     | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat                                                                            | NULL                        |
| Compressed:                       | No                                                                                                                          | NULL                        |
| Num Buckets:                      | -1                                                                                                                          | NULL                        |
| Bucket Columns:                   | []                                                                                                                          | NULL                        |
| Sort Columns:                     | []                                                                                                                          | NULL                        |
| Storage Desc Params:              | NULL                                                                                                                        | NULL                        |
|                                   | serialization.format                                                                                                        | 1                           |
+-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+-----------------------------+--+
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc partition(fab = "C2WF", lot_partition="Q840");
+-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+-----------------------------+--+
|             col_name              |                                                          data_type                                                          |           comment           |
+-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+-----------------------------+--+
| # col_name                        | data_type                                                                                                                   | comment                     |
|                                   | NULL                                                                                                                        | NULL                        |
| lot_id                            | string                                                                                                                      |                             |
| wafer_id                          | string                                                                                                                      |                             |
| flow_id                           | string                                                                                                                      |                             |
| start_t                           | string                                                                                                                      |                             |
| finish_t                          | string                                                                                                                      |                             |
| hbin_number                       | int                                                                                                                         |                             |
| hbin_name                         | string                                                                                                                      |                             |
| sbin_number                       | int                                                                                                                         |                             |
| sbin_name                         | string                                                                                                                      |                             |
| param_id                          | string                                                                                                                      |                             |
| param_name                        | string                                                                                                                      |                             |
| param_unit                        | string                                                                                                                      |                             |
| param_low_limit                   | float                                                                                                                       |                             |
| param_high_limit                  | float                                                                                                                       |                             |
| nb_dies_tested                    | int                                                                                                                         |                             |
| nb_dies_failed                    | int                                                                                                                         |                             |
| nb_dies_good                      | int                                                                                                                         |                             |
| ingestion_date                    | string                                                                                                                      |                             |
|                                   | NULL                                                                                                                        | NULL                        |
| # Partition Information           | NULL                                                                                                                        | NULL                        |
| # col_name                        | data_type                                                                                                                   | comment                     |
|                                   | NULL                                                                                                                        | NULL                        |
| fab                               | string                                                                                                                      |                             |
| lot_partition                     | string                                                                                                                      |                             |
|                                   | NULL                                                                                                                        | NULL                        |
| # Detailed Partition Information  | NULL                                                                                                                        | NULL                        |
| Partition Value:                  | [C2WF, Q840]                                                                                                                | NULL                        |
| Database:                         | prod_spotfire_refined                                                                                                       | NULL                        |
| Table:                            | tbl_bin_stat_orc                                                                                                            | NULL                        |
| CreateTime:                       | Mon Feb 11 13:02:12 CET 2019                                                                                                | NULL                        |
| LastAccessTime:                   | UNKNOWN                                                                                                                     | NULL                        |
| Protect Mode:                     | None                                                                                                                        | NULL                        |
| Location:                         | hdfs://ManufacturingDataLakeHdfs/apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840  | NULL                        |
| Partition Parameters:             | NULL                                                                                                                        | NULL                        |
|                                   | COLUMN_STATS_ACCURATE                                                                                                       | {\"BASIC_STATS\":\"true\"}  |
|                                   | numFiles                                                                                                                    | 11                          |
|                                   | numRows                                                                                                                     | 109795564                   |
|                                   | rawDataSize                                                                                                                 | 116612625917                |
|                                   | totalSize                                                                                                                   | 989787753                   |
|                                   | transient_lastDdlTime                                                                                                       | 1554163118                  |
|                                   | NULL                                                                                                                        | NULL                        |
| # Storage Information             | NULL                                                                                                                        | NULL                        |
| SerDe Library:                    | org.apache.hadoop.hive.ql.io.orc.OrcSerde                                                                                   | NULL                        |
| InputFormat:                      | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat                                                                             | NULL                        |
| OutputFormat:                     | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat                                                                            | NULL                        |
| Compressed:                       | No                                                                                                                          | NULL                        |
| Num Buckets:                      | -1                                                                                                                          | NULL                        |
| Bucket Columns:                   | []                                                                                                                          | NULL                        |
| Sort Columns:                     | []                                                                                                                          | NULL                        |
| Storage Desc Params:              | NULL                                                                                                                        | NULL                        |
|                                   | serialization.format                                                                                                        | 1                           |
+-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+-----------------------------+--+
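A minimal sketch (the numFiles/numRows/totalSize figures are copied from the two DESCRIBE FORMATTED outputs above) confirming that the average file size and the rows per file are in the same ballpark for both partitions:

```python
partitions = {
    "Q842": {"numFiles": 14, "numRows": 143_216_514, "totalSize": 1_353_827_219},
    "Q840": {"numFiles": 11, "numRows": 109_795_564, "totalSize": 989_787_753},
}
for name, p in partitions.items():
    mb_per_file = p["totalSize"] / p["numFiles"] / 1024 ** 2
    rows_per_file = p["numRows"] / p["numFiles"]
    # Q842: ~92 MB per file, Q840: ~86 MB per file
    print(f"{name}: {mb_per_file:.0f} MB and {rows_per_file:,.0f} rows per file")
```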

Even if a bit long and boring, I have also checked the column statistics; again, not many differences:

0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc param_id partition(fab = "C2WF", lot_partition="Q840");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| param_id                | string                |                       |                       | 0                     | 92393                 | 5.7145                | 10                    |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.445 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc param_id partition(fab = "C2WF", lot_partition="Q842");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| param_id                | string                |                       |                       | 0                     | 169450                | 5.6886                | 10                    |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.457 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc lot_id partition(fab = "C2WF", lot_partition="Q840");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| lot_id                  | string                |                       |                       | 0                     | 290                   | 7.1705                | 10                    |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.407 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc lot_id partition(fab = "C2WF", lot_partition="Q842");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| lot_id                  | string                |                       |                       | 0                     | 172                   | 7.071                 | 10                    |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.416 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc wafer_id partition(fab = "C2WF", lot_partition="Q840");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| wafer_id                | string                |                       |                       | 0                     | 3744                  | 11.9485               | 12                    |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.444 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc wafer_id partition(fab = "C2WF", lot_partition="Q842");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| wafer_id                | string                |                       |                       | 0                     | 6867                  | 11.8972               | 12                    |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.398 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc flow_id partition(fab = "C2WF", lot_partition="Q840");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| flow_id                 | string                |                       |                       | 0                     | 18                    | 4.1747                | 30                    |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.423 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc flow_id partition(fab = "C2WF", lot_partition="Q842");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| flow_id                 | string                |                       |                       | 0                     | 37                    | 4.3368                | 31                    |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.448 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc start_t partition(fab = "C2WF", lot_partition="Q840");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| start_t                 | string                |                       |                       | 0                     | 17811                 | 19.0                  | 19                    |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.425 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc start_t partition(fab = "C2WF", lot_partition="Q842");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| start_t                 | string                |                       |                       | 0                     | 11549                 | 19.0                  | 19                    |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.408 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc finish_t partition(fab = "C2WF", lot_partition="Q840");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| finish_t                | string                |                       |                       | 0                     | 11059                 | 19.0                  | 19                    |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.398 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc finish_t partition(fab = "C2WF", lot_partition="Q842");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| finish_t                | string                |                       |                       | 0                     | 12060                 | 19.0                  | 19                    |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.382 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc hbin_number partition(fab = "C2WF", lot_partition="Q840");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| hbin_number             | int                   | 0                     | 65535                 | 0                     | 61                    |                       |                       |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.356 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc hbin_number partition(fab = "C2WF", lot_partition="Q842");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| hbin_number             | int                   | 0                     | 65535                 | 0                     | 58                    |                       |                       |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.364 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc sbin_number partition(fab = "C2WF", lot_partition="Q840");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| sbin_number             | int                   | 1                     | 65535                 | 0                     | 3288                  |                       |                       |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.417 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc sbin_number partition(fab = "C2WF", lot_partition="Q842");
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
|        col_name         |       data_type       |          min          |          max          |       num_nulls       |    distinct_count     |      avg_col_len      |      max_col_len      |       num_trues       |      num_falses       |        comment        |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
| # col_name              | data_type             | min                   | max                   | num_nulls             | distinct_count        | avg_col_len           | max_col_len           | num_trues             | num_falses            | comment               |
|                         | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  | NULL                  |
| sbin_number             | int                   | 0                     | 65535                 | 0                     | 3148                  |                       |                       |                       |                       | from deserializer     |
+-------------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--+
3 rows selected (0.344 seconds)

The global table and statistics information can be obtained in a more condensed way using:

0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe extended prod_spotfire_refined.tbl_bin_stat_orc partition(fab = "C2WF", lot_partition="Q840");

| Detailed Partition Information  | Partition(values:[C2WF, Q840], dbName:prod_spotfire_refined, tableName:tbl_bin_stat_orc, createTime:1549886532, lastAccessTime:0, sd:StorageDescriptor(cols:[FieldSchema(name:lot_id, type:string, comment:null), FieldSchema(name:wafer_id, type:string, comment:null), FieldSchema(name:flow_id, type:string, comment:null), FieldSchema(name:start_t, type:string, comment:null), FieldSchema(name:finish_t, type:string, comment:null), FieldSchema(name:hbin_number, type:int, comment:null), FieldSchema(name:hbin_name, type:string, comment:null), FieldSchema(name:sbin_number, type:int, comment:null), FieldSchema(name:sbin_name, type:string, comment:null), FieldSchema(name:param_id, type:string, comment:null), FieldSchema(name:param_name, type:string, comment:null), FieldSchema(name:param_unit, type:string, comment:null), FieldSchema(name:param_low_limit, type:float, comment:null), FieldSchema(name:param_high_limit, type:float, comment:null), FieldSchema(name:nb_dies_tested, type:int, comment:null), FieldSchema(name:nb_dies_failed, type:int, comment:null), FieldSchema(name:nb_dies_good, type:int, comment:null), FieldSchema(name:ingestion_date, type:string, comment:null), FieldSchema(name:fab, type:string, comment:null), FieldSchema(name:lot_partition, type:string, comment:null)], location:hdfs://ManufacturingDataLakeHdfs/apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840, inputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.ql.io.orc.OrcSerde, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), parameters:{totalSize=989787753,  numRows=109795564, rawDataSize=116612625917, 
COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, numFiles=11, transient_lastDdlTime=1554163118})  |                       |

0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe extended prod_spotfire_refined.tbl_bin_stat_orc partition(fab = "C2WF", lot_partition="Q842");

| Detailed Partition Information  | Partition(values:[C2WF, Q842], dbName:prod_spotfire_refined, tableName:tbl_bin_stat_orc, createTime:1549886595, lastAccessTime:0, sd:StorageDescriptor(cols:[FieldSchema(name:lot_id, type:string, comment:null), FieldSchema(name:wafer_id, type:string, comment:null), FieldSchema(name:flow_id, type:string, comment:null), FieldSchema(name:start_t, type:string, comment:null), FieldSchema(name:finish_t, type:string, comment:null), FieldSchema(name:hbin_number, type:int, comment:null), FieldSchema(name:hbin_name, type:string, comment:null), FieldSchema(name:sbin_number, type:int, comment:null), FieldSchema(name:sbin_name, type:string, comment:null), FieldSchema(name:param_id, type:string, comment:null), FieldSchema(name:param_name, type:string, comment:null), FieldSchema(name:param_unit, type:string, comment:null), FieldSchema(name:param_low_limit, type:float, comment:null), FieldSchema(name:param_high_limit, type:float, comment:null), FieldSchema(name:nb_dies_tested, type:int, comment:null), FieldSchema(name:nb_dies_failed, type:int, comment:null), FieldSchema(name:nb_dies_good, type:int, comment:null), FieldSchema(name:ingestion_date, type:string, comment:null), FieldSchema(name:fab, type:string, comment:null), FieldSchema(name:lot_partition, type:string, comment:null)], location:hdfs://ManufacturingDataLakeHdfs/apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842, inputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.ql.io.orc.OrcSerde, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), parameters:{totalSize=1353827219, numRows=143216514, rawDataSize=151433505731, 
COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, numFiles=14, transient_lastDdlTime=1554163118})  |             
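The figures buried in that long "Detailed Partition Information" string can also be pulled out programmatically. A minimal Python sketch, parsing the parameters text shown above with a regular expression (this is plain string parsing, not a Hive API):

```python
import re

# Extract basic statistics from the "Detailed Partition Information"
# string returned by DESCRIBE EXTENDED.
def partition_stats(detail):
    stats = {}
    for key in ("totalSize", "numRows", "rawDataSize", "numFiles"):
        m = re.search(key + r"=(\d+)", detail)
        if m:
            stats[key] = int(m.group(1))
    return stats

# Fragment taken from the DESCRIBE EXTENDED output above (Q842)
detail = ("parameters:{totalSize=1353827219, numRows=143216514, "
          "rawDataSize=151433505731, numFiles=14, "
          "transient_lastDdlTime=1554163118}")
print(partition_stats(detail))
```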

As you would do on a traditional RDBMS, I have extracted the explain plans of the two queries:

explain --extended
select --Yannick01
lot_id,
wafer_id,
flow_id,
param_id,
start_t,
finish_t,
param_name,
param_unit,
param_low_limit,
param_high_limit,
nb_dies_tested,
nb_dies_failed
from prod_spotfire_refined.tbl_bin_stat_orc
where fab = "C2WF"
and lot_partition in ("Q842")
and lot_id in ("Q842889")
and wafer_id in ('Q842889-01E3','Q842889-02D6','Q842889-03D1','Q842889-04C4','Q842889-05B7','Q842889-06B2','Q842889-07A5','Q842889-08A0','Q842889-09G6','Q842889-10A0','Q842889-11G6','Q842889-12G1','Q842889-13F4','Q842889-14E7','Q842889-15E2','Q842889-16D5','Q842889-17D0','Q842889-18C3','Q842889-19B6','Q842889-20C3','Q842889-21B6','Q842889-22B1','Q842889-23A4','Q842889-24H2')
and flow_id in ("EWS1")
and start_t in ('2019.01.04-19:50:55','2019.01.05-02:21:26','2019.01.05-08:33:59','2019.01.05-14:06:24','2019.01.05-19:35:23','2019.01.05-22:25:30','2019.01.06-03:49:53','2019.01.06-09:19:52','2019.01.06-14:23:47','2019.01.06-19:27:35','2019.01.07-00:47:59','2019.01.07-06:37:21','2019.01.07-11:15:14','2019.01.07-15:56:55','2019.01.07-20:05:50','2019.01.07-22:48:09','2019.01.08-04:37:37','2019.01.08-08:48:26','2019.01.08-13:51:34','2019.01.08-18:31:38','2019.01.09-00:01:41','2019.01.09-04:11:44','2019.01.09-09:45:08','2019.01.09-13:47:11')
and finish_t in ('2019.01.05-01:52:02','2019.01.05-08:30:52','2019.01.05-14:01:33','2019.01.05-19:32:20','2019.01.05-22:22:15','2019.01.06-03:46:46','2019.01.06-09:16:43','2019.01.06-14:20:42','2019.01.06-19:24:22','2019.01.07-00:44:48','2019.01.07-06:34:15','2019.01.07-11:12:02','2019.01.07-15:53:45','2019.01.07-20:02:41','2019.01.07-22:26:37','2019.01.08-04:34:30','2019.01.08-08:45:20','2019.01.08-13:48:27','2019.01.08-18:28:33','2019.01.08-23:58:34','2019.01.09-04:08:41','2019.01.09-09:41:57','2019.01.09-13:44:03','2019.01.09-18:01:54')
and hbin_number in ("9")
and sbin_number in ("403");

Explain                                                                                                                                                                                                                                                         
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
Plan not optimized by CBO.                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                
Stage-0                                                                                                                                                                                                                                                         
   Fetch Operator                                                                                                                                                                                                                                               
      limit:-1                                                                                                                                                                                                                                                  
      Stage-1                                                                                                                                                                                                                                                   
         Map 1                                                                                                                                                                                                                                                  
         File Output Operator [FS_2725118]                                                                                                                                                                                                                      
            compressed:false                                                                                                                                                                                                                                    
            Statistics:Num rows: 1 Data size: 780 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                                  
            table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}                                      
            Select Operator [SEL_2725117]                                                                                                                                                                                                                       
               outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11"]                                                                                                                            
               Statistics:Num rows: 1 Data size: 780 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                               
               Filter Operator [FIL_2725119]                                                                                                                                                                                                                    
                  predicate:((lot_partition) IN ('Q842') and (lot_id) IN ('Q842889') and (wafer_id) IN ('Q842889-01E3', 'Q842889-02D6', 'Q842889-03D1', 'Q842889-04C4', 'Q842889-05B7', 'Q842889-06B2', 'Q842889-07A5', 'Q842889-08A0', 'Q842889-09G6', 'Q842889-10A0', 'Q842889-11G6', 'Q842889-12G1', 'Q842889-13F4', 'Q842889-14E7', 'Q842889-15E2', 'Q842889-16D5', 'Q842889-17D0', 'Q842889-18C3', 'Q842889-19B6', 'Q842889-20C3', 'Q842889-21B6', 'Q842889-22B1', 'Q842889-23A4', 'Q842889-24H2') and (flow_id) IN ('EWS1') and (start_t) IN ('2019.01.04-19:50:55', '2019.01.05-02:21:26', '2019.01.05-08:33:59', '2019.01.05-14:06:24', '2019.01.05-19:35:23', '2019.01.05-22:25:30', '2019.01.06-03:49:53', '2019.01.06-09:19:52', '2019.01.06-14:23:47', '2019.01.06-19:27:35', '2019.01.07-00:47:59', '2019.01.07-06:37:21', '2019.01.07-11:15:14', '2019.01.07-15:56:55', '2019.01.07-20:05:50', '2019.01.07-22:48:09', '2019.01.08-04:37:37', '2019.01.08-08:48:26', '2019.01.08-13:51:34', '2019.01.08-18:31:38', '2019.01.09-00:01:41', '2019.01.09-04:11:44', '2019.01.09-09:45:08', '2019.01.09-13:47:11') and (finish_t) IN ('2019.01.05-01:52:02', '2019.01.05-08:30:52', '2019.01.05-14:01:33', '2019.01.05-19:32:20', '2019.01.05-22:22:15', '2019.01.06-03:46:46', '2019.01.06-09:16:43', '2019.01.06-14:20:42', '2019.01.06-19:24:22', '2019.01.07-00:44:48', '2019.01.07-06:34:15', '2019.01.07-11:12:02', '2019.01.07-15:53:45', '2019.01.07-20:02:41', '2019.01.07-22:26:37', '2019.01.08-04:34:30', '2019.01.08-08:45:20', '2019.01.08-13:48:27', '2019.01.08-18:28:33', '2019.01.08-23:58:34', '2019.01.09-04:08:41', '2019.01.09-09:41:57', '2019.01.09-13:44:03', '2019.01.09-18:01:54') and (hbin_number) IN ('9') and (sbin_number) IN ('403')) (type: boolean) 
                  Statistics:Num rows: 1 Data size: 972 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                                            
                  TableScan [TS_2725115]                                                                                                                                                                                                                        
                     alias:tbl_bin_stat_orc                                                                                                                                                                                                                     
                     Statistics:Num rows: 143216514 Data size: 151433505731 Basic stats: COMPLETE Column stats: COMPLETE                                                                                                                                        
                                                                                                                                                                                                                                                                

21 rows selected. 

Remark:
I also tried the EXTENDED mode of the EXPLAIN PLAN command (the only option available in my Hive release, although plenty of others exist in the latest Hive releases). It obviously produces a more verbose output, maybe too verbose, but in it I found an interesting piece of additional information confirming that the query is effectively using partition pruning and accessing only the expected partitions:

partition values:                                                                                                                                                                                                                                   
  fab C2WF                                                                                                                                                                                                                                          
  lot_partition Q840     
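Rather than eyeballing the whole verbose EXPLAIN EXTENDED output, the pruning check can be scripted. A small sketch that collects the "partition values:" blocks from the plan text (the sample string below is the fragment shown just above, not a live EXPLAIN run):

```python
# List the partitions an EXPLAIN EXTENDED output actually touches
# by collecting the key/value pairs under each "partition values:" line.
def pruned_partitions(explain_text):
    partitions = []
    lines = explain_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "partition values:":
            values = {}
            for follow in lines[i + 1:]:
                parts = follow.strip().split(None, 1)
                if len(parts) != 2:
                    break
                values[parts[0]] = parts[1]
            partitions.append(values)
    return partitions

sample = """partition values:
  fab C2WF
  lot_partition Q840
"""
print(pruned_partitions(sample))
```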
explain --extended
select --Yannick02
lot_id,
wafer_id,
flow_id,
param_id,
start_t,
finish_t,
param_name,
param_unit,
param_low_limit,
param_high_limit,
nb_dies_tested,
nb_dies_failed
from prod_spotfire_refined.tbl_bin_stat_orc
where fab = "C2WF"
and lot_partition in ("Q840")
and lot_id in ("Q840401")
and wafer_id in ('Q840401-01E6','Q840401-02E1','Q840401-03D4','Q840401-04C7','Q840401-05C2','Q840401-06B5','Q840401-07B0','Q840401-08A3','Q840401-09H1','Q840401-10A3','Q840401-11H1','Q840401-12G4','Q840401-13F7','Q840401-14F2','Q840401-15E5','Q840401-16E0','Q840401-17D3','Q840401-18C6','Q840401-19C1','Q840401-20C6','Q840401-21C1','Q840401-22B4','Q840401-23A7','Q840401-24A2','Q840401-25H0')
and flow_id in ("EWS1")
and start_t in ('2018.12.27-10:42:54','2018.12.27-12:01:57','2018.12.27-13:18:47','2018.12.27-14:36:31','2018.12.27-15:55:57','2018.12.27-17:13:42','2018.12.27-18:31:34','2018.12.27-19:49:27','2018.12.27-21:05:30','2018.12.27-22:23:36','2018.12.27-23:40:11','2018.12.28-00:56:40','2018.12.28-02:15:51','2018.12.28-03:41:23','2018.12.28-04:58:02','2018.12.28-06:16:11','2018.12.28-07:34:40','2018.12.28-08:55:29','2018.12.28-10:13:25','2018.12.28-11:30:34','2018.12.28-12:48:08','2018.12.28-14:06:12','2018.12.28-15:23:00','2018.12.28-16:39:50','2018.12.28-17:56:57')
and finish_t in ('2018.12.27-12:00:40','2018.12.27-13:17:30','2018.12.27-14:35:13','2018.12.27-15:54:39','2018.12.27-17:12:26','2018.12.27-18:30:16','2018.12.27-19:48:12','2018.12.27-21:04:15','2018.12.27-22:22:19','2018.12.27-23:38:53','2018.12.28-00:55:24','2018.12.28-02:14:34','2018.12.28-03:32:41','2018.12.28-04:56:43','2018.12.28-06:14:52','2018.12.28-07:33:21','2018.12.28-08:54:12','2018.12.28-10:12:06','2018.12.28-11:29:17','2018.12.28-12:46:50','2018.12.28-14:04:55','2018.12.28-15:21:43','2018.12.28-16:38:30','2018.12.28-17:55:38','2018.12.28-19:13:18')
and hbin_number in ("1")
and sbin_number in ("1");


Explain                                                                                                                                                                                                                                                         
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
Plan not optimized by CBO.                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                
Stage-0                                                                                                                                                                                                                                                         
   Fetch Operator                                                                                                                                                                                                                                               
      limit:-1                                                                                                                                                                                                                                                  
      Select Operator [SEL_2725124]                                                                                                                                                                                                                             
         outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11"]                                                                                                                                  
         Filter Operator [FIL_2725126]                                                                                                                                                                                                                          
            predicate:((lot_partition) IN ('Q840') and (lot_id) IN ('Q840401') and (wafer_id) IN ('Q840401-01E6', 'Q840401-02E1', 'Q840401-03D4', 'Q840401-04C7', 'Q840401-05C2', 'Q840401-06B5', 'Q840401-07B0', 'Q840401-08A3', 'Q840401-09H1', 'Q840401-10A3', 'Q840401-11H1', 'Q840401-12G4', 'Q840401-13F7', 'Q840401-14F2', 'Q840401-15E5', 'Q840401-16E0', 'Q840401-17D3', 'Q840401-18C6', 'Q840401-19C1', 'Q840401-20C6', 'Q840401-21C1', 'Q840401-22B4', 'Q840401-23A7', 'Q840401-24A2', 'Q840401-25H0') and (flow_id) IN ('EWS1') and (start_t) IN ('2018.12.27-10:42:54', '2018.12.27-12:01:57', '2018.12.27-13:18:47', '2018.12.27-14:36:31', '2018.12.27-15:55:57', '2018.12.27-17:13:42', '2018.12.27-18:31:34', '2018.12.27-19:49:27', '2018.12.27-21:05:30', '2018.12.27-22:23:36', '2018.12.27-23:40:11', '2018.12.28-00:56:40', '2018.12.28-02:15:51', '2018.12.28-03:41:23', '2018.12.28-04:58:02', '2018.12.28-06:16:11', '2018.12.28-07:34:40', '2018.12.28-08:55:29', '2018.12.28-10:13:25', '2018.12.28-11:30:34', '2018.12.28-12:48:08', '2018.12.28-14:06:12', '2018.12.28-15:23:00', '2018.12.28-16:39:50', '2018.12.28-17:56:57') and (finish_t) IN ('2018.12.27-12:00:40', '2018.12.27-13:17:30', '2018.12.27-14:35:13', '2018.12.27-15:54:39', '2018.12.27-17:12:26', '2018.12.27-18:30:16', '2018.12.27-19:48:12', '2018.12.27-21:04:15', '2018.12.27-22:22:19', '2018.12.27-23:38:53', '2018.12.28-00:55:24', '2018.12.28-02:14:34', '2018.12.28-03:32:41', '2018.12.28-04:56:43', '2018.12.28-06:14:52', '2018.12.28-07:33:21', '2018.12.28-08:54:12', '2018.12.28-10:12:06', '2018.12.28-11:29:17', '2018.12.28-12:46:50', '2018.12.28-14:04:55', '2018.12.28-15:21:43', '2018.12.28-16:38:30', '2018.12.28-17:55:38', '2018.12.28-19:13:18') and (hbin_number) IN ('1') and (sbin_number) IN ('1')) (type: boolean) 
            TableScan [TS_2725122]                                                                                                                                                                                                                              
               alias:tbl_bin_stat_orc                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                

12 rows selected.

We can anyway notice that the explain plan of query02 does not display any statistics even though they exist for the table. I would also have expected to see something explicit like "fetch task"; let's see why…

When executing the queries, the first observation I made is that query02 does not trigger a MapReduce job (using the Tez engine, as configured by default on our cluster) but a direct HDFS access. You can tell that a query is NOT doing MapReduce in Beeline when, just after submitting it, you do not see the usual graphical display of the number of Map and Reduce jobs but directly the query result…

Query01 performs a standard MapReduce job:

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      9          9        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 15.92 s
--------------------------------------------------------------------------------

We have seen above that query02 is roughly 13 times less efficient while returning 57 times fewer rows, so the direct HDFS access, called a fetch task, is not that optimal…

Remark:
The "Plan not optimized by CBO" message might look scary, so I decided to dig into it. First, go to your Hive server and check the /var/log/hive/hiveserver2.log file (the Hadoop parameter is hive_log_dir). Now you understand why I added dummy comments in my queries (--Yannick01 and --Yannick02): this is a trick I used with Oracle to find my own statements in V$SQL. Opening this huge file, you can search for your marker, and below the text of your query you should see something like:

2019-04-04 12:06:12,633 INFO  [HiveServer2-Handler-Pool: Thread-15548880]: parse.BaseSemanticAnalyzer (CalcitePlanner.java:canCBOHandleAst(405)) - Not invoking CBO because the statement has too few joins
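The manual search through that huge log file can be scripted as well. A minimal sketch that locates the marker comment and returns the lines following it; the file written here is a tiny stand-in for the real /var/log/hive/hiveserver2.log, with content borrowed from the log line above:

```python
import os
import tempfile

# Find a tagged query in hiveserver2.log and return the line(s)
# that follow it -- the same trick as searching the file by hand.
def lines_after_marker(path, marker, count=1):
    with open(path) as f:
        lines = f.read().splitlines()
    hits = []
    for i, line in enumerate(lines):
        if marker in line:
            hits.extend(lines[i + 1:i + 1 + count])
    return hits

# Tiny sample standing in for the real hiveserver2.log
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as log:
    log.write("select --Yannick02\n"
              "2019-04-04 12:06:12,633 INFO ... Not invoking CBO "
              "because the statement has too few joins\n")
hits = lines_after_marker(log.name, "--Yannick02")
print(hits[0])
os.unlink(log.name)
```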

I have also checked the partition sizes as well as the number of ORC files in each partition (compaction):

hdfs@client_node:~$ hdfs dfs -ls -r -t /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842
Found 14 items
-rwxrwxrwx   3 mfgdl_ingestion hadoop   10185787 2019-03-30 17:16 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000006_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop    4240294 2019-03-30 17:16 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000007_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop   10696310 2019-03-30 17:16 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000005_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop  268531575 2019-03-30 17:16 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000001_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop  263145620 2019-03-30 17:16 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000004_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop  257281211 2019-03-30 17:28 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000003_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop  275941797 2019-03-30 17:34 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000000_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop  252093563 2019-03-30 17:40 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000002_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop    2127431 2019-03-30 22:38 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000018_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop    2144463 2019-03-31 10:21 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000018_0_copy_1
-rwxrwxrwx   3 mfgdl_ingestion hadoop    2833270 2019-03-31 23:38 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000015_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop    1563968 2019-04-01 10:50 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000021_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop    1418099 2019-04-01 17:31 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000021_0_copy_1
-rwxrwxrwx   3 mfgdl_ingestion hadoop    1623831 2019-04-02 01:58 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842/000017_0

hdfs@client_node:~$ hdfs dfs -du -h -s /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842
1.3 G  /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q842

hdfs@client_node:~$ hdfs dfs -ls -r -t /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840
Found 12 items
-rwxrwxrwx   3 mfgdl_ingestion hadoop  298096825 2019-03-30 17:01 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000000_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop    7651387 2019-03-30 17:01 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000004_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop  262040206 2019-03-30 17:02 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000001_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop  177619545 2019-03-30 17:02 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000003_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop  241590172 2019-03-30 17:13 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000002_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop     588415 2019-03-30 22:38 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000035_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop     466219 2019-03-31 10:21 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000037_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop     496742 2019-03-31 23:38 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000038_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop     463094 2019-04-01 10:50 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000037_0_copy_1
-rwxrwxrwx   3 mfgdl_ingestion hadoop     362538 2019-04-01 17:31 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000041_0
-rwxrwxrwx   3 mfgdl_ingestion hadoop     412610 2019-04-02 01:58 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000038_0_copy_1
drwxrwxrwx   - mfgaewsp        hadoop          0 2019-04-03 16:12 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/.hive-staging_hive_2019-04-03_16-12-30_095_3617150963696131727-133149

hdfs@client_node:~$ hdfs dfs -du -h -s /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840
943.9 M  /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840

Again, no particularly big difference in size or number of files between the two partitions…
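One size detail that does line up with the fetch task versus MapReduce behaviour is that the two partitions sit on opposite sides of the fetch-task conversion threshold. A quick sketch, assuming the default hive.fetch.task.conversion.threshold of 1 GB (an assumption; check the actual value with SET on your cluster, and note that only simple select/filter queries qualify for conversion):

```python
# Hive converts a simple query into a direct-HDFS "fetch task" when the
# data to read stays under hive.fetch.task.conversion.threshold
# (assumed here to be the default 1 GB).
FETCH_TASK_THRESHOLD = 1024 ** 3  # 1073741824 bytes

def uses_fetch_task(partition_bytes, threshold=FETCH_TASK_THRESHOLD):
    return partition_bytes <= threshold

# totalSize values from the DESCRIBE EXTENDED outputs above
print(uses_fetch_task(989787753))   # Q840: under 1 GB -> True (fetch task)
print(uses_fetch_task(1353827219))  # Q842: over 1 GB -> False (MapReduce)
```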

To speed up the long-running query I tried to compact the partition using:

ALTER TABLE prod_spotfire_refined.tbl_bin_stat_orc partition(fab = "C2WF", lot_partition="Q840") CONCATENATE;

Reaching this status:

hdfs@client_node:~$ hdfs dfs -ls -r -t /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840
Found 5 items
-rwxrwxrwx   3 mfgaewsp hadoop    8702924 2019-04-03 16:12 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000004_0
-rwxrwxrwx   3 mfgaewsp hadoop  266770170 2019-04-03 16:12 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000001_0
-rwxrwxrwx   3 mfgaewsp hadoop  178527568 2019-04-03 16:12 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000003_0
-rwxrwxrwx   3 mfgaewsp hadoop  242418393 2019-04-03 16:13 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000002_0
-rwxrwxrwx   3 mfgaewsp hadoop  291710785 2019-04-03 16:13 /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840/000000_0

Then we discovered that the concatenate command destroys the existing statistics, so we had to recompute them!

ANALYZE TABLE prod_spotfire_refined.tbl_bin_stat_orc partition(fab = "C2WF", lot_partition="Q840") COMPUTE STATISTICS FOR COLUMNS;

Remark:
We have been able to use FOR COLUMNS without specifying the column list because our table does not contain any complex types like ARRAY.
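Since concatenate wipes the column statistics, the two statements are best generated and executed as a pair so the ANALYZE is never forgotten. A minimal Python sketch that builds the SQL strings (the helper is illustrative, not a Hive API; how you submit the statements, e.g. via beeline, is up to you):

```python
# Build the CONCATENATE + ANALYZE pair for one partition so the
# statistics are always recomputed right after compaction.
def compaction_statements(table, partition):
    spec = ", ".join(f'{col} = "{val}"' for col, val in partition.items())
    return [
        f"ALTER TABLE {table} PARTITION ({spec}) CONCATENATE;",
        f"ANALYZE TABLE {table} PARTITION ({spec}) "
        "COMPUTE STATISTICS FOR COLUMNS;",
    ]

for stmt in compaction_statements(
        "prod_spotfire_refined.tbl_bin_stat_orc",
        {"fab": "C2WF", "lot_partition": "Q840"}):
    print(stmt)
```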

Statistics went back to initial situation:

0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> describe formatted prod_spotfire_refined.tbl_bin_stat_orc partition(fab = "C2WF", lot_partition="Q840");
+-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
|             col_name              |                                                          data_type                                                          |                                                                                                                                                                                                                                    comment                                                                                                                                                                                                                                     |
+-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
| # col_name                        | data_type                                                                                                                   | comment                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|                                   | NULL                                                                                                                        | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| lot_id                            | string                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| wafer_id                          | string                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| flow_id                           | string                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| start_t                           | string                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| finish_t                          | string                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| hbin_number                       | int                                                                                                                         |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| hbin_name                         | string                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| sbin_number                       | int                                                                                                                         |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| sbin_name                         | string                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| param_id                          | string                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| param_name                        | string                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| param_unit                        | string                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| param_low_limit                   | float                                                                                                                       |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| param_high_limit                  | float                                                                                                                       |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| nb_dies_tested                    | int                                                                                                                         |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| nb_dies_failed                    | int                                                                                                                         |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| nb_dies_good                      | int                                                                                                                         |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| ingestion_date                    | string                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
|                                   | NULL                                                                                                                        | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| # Partition Information           | NULL                                                                                                                        | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| # col_name                        | data_type                                                                                                                   | comment                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|                                   | NULL                                                                                                                        | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| fab                               | string                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| lot_partition                     | string                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
|                                   | NULL                                                                                                                        | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| # Detailed Partition Information  | NULL                                                                                                                        | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| Partition Value:                  | [C2WF, Q840]                                                                                                                | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| Database:                         | prod_spotfire_refined                                                                                                       | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| Table:                            | tbl_bin_stat_orc                                                                                                            | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| CreateTime:                       | Mon Feb 11 13:02:12 CET 2019                                                                                                | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| LastAccessTime:                   | UNKNOWN                                                                                                                     | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| Protect Mode:                     | None                                                                                                                        | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| Location:                         | hdfs://ManufacturingDataLakeHdfs/apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840  | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| Partition Parameters:             | NULL                                                                                                                        | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
|                                   | COLUMN_STATS_ACCURATE                                                                                                       | {\"COLUMN_STATS\":{\"lot_id\":\"true\",\"wafer_id\":\"true\",\"flow_id\":\"true\",\"start_t\":\"true\",\"finish_t\":\"true\",\"hbin_number\":\"true\",\"hbin_name\":\"true\",\"sbin_number\":\"true\",\"sbin_name\":\"true\",\"param_id\":\"true\",\"param_name\":\"true\",\"param_unit\":\"true\",\"param_low_limit\":\"true\",\"param_high_limit\":\"true\",\"nb_dies_tested\":\"true\",\"nb_dies_failed\":\"true\",\"nb_dies_good\":\"true\",\"ingestion_date\":\"true\"}}  |
|                                   | numFiles                                                                                                                    | 5                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
|                                   | numRows                                                                                                                     | 109795564                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
|                                   | rawDataSize                                                                                                                 | 116612625917                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
|                                   | totalSize                                                                                                                   | 988129840                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
|                                   | transient_lastDdlTime                                                                                                       | 1554300803                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
|                                   | NULL                                                                                                                        | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| # Storage Information             | NULL                                                                                                                        | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| SerDe Library:                    | org.apache.hadoop.hive.ql.io.orc.OrcSerde                                                                                   | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| InputFormat:                      | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat                                                                             | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| OutputFormat:                     | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat                                                                            | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| Compressed:                       | No                                                                                                                          | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| Num Buckets:                      | -1                                                                                                                          | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| Bucket Columns:                   | []                                                                                                                          | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| Sort Columns:                     | []                                                                                                                          | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| Storage Desc Params:              | NULL                                                                                                                        | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
|                                   | serialization.format                                                                                                        | 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
+-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
describe extended prod_spotfire_refined.tbl_bin_stat_orc partition(fab = "C2WF", lot_partition="Q840");

| Detailed Partition Information  | Partition(values:[C2WF, Q840], dbName:prod_spotfire_refined, tableName:tbl_bin_stat_orc, createTime:1549886532, lastAccessTime:0, sd:StorageDescriptor(cols:[FieldSchema(name:lot_id, type:string, comment:null), FieldSchema(name:wafer_id, type:string, comment:null), FieldSchema(name:flow_id, type:string, comment:null), FieldSchema(name:start_t, type:string, comment:null), FieldSchema(name:finish_t, type:string, comment:null), FieldSchema(name:hbin_number, type:int, comment:null), FieldSchema(name:hbin_name, type:string, comment:null), FieldSchema(name:sbin_number, type:int, comment:null), FieldSchema(name:sbin_name, type:string, comment:null), FieldSchema(name:param_id, type:string, comment:null), FieldSchema(name:param_name, type:string, comment:null), FieldSchema(name:param_unit, type:string, comment:null), FieldSchema(name:param_low_limit, type:float, comment:null), FieldSchema(name:param_high_limit, type:float, comment:null), FieldSchema(name:nb_dies_tested, type:int, comment:null), FieldSchema(name:nb_dies_failed, type:int, comment:null), FieldSchema(name:nb_dies_good, type:int, comment:null), FieldSchema(name:ingestion_date, type:string, comment:null), FieldSchema(name:fab, type:string, comment:null), FieldSchema(name:lot_partition, type:string, comment:null)], location:hdfs://ManufacturingDataLakeHdfs/apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q840, inputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.ql.io.orc.OrcSerde, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), parameters:{totalSize=988129840, numRows=109795564, rawDataSize=116612625917, 
COLUMN_STATS_ACCURATE={"COLUMN_STATS":{"lot_id":"true","wafer_id":"true","flow_id":"true","start_t":"true","finish_t":"true","hbin_number":"true","hbin_name":"true","sbin_number":"true","sbin_name":"true","param_id":"true","param_name":"true","param_unit":"true","param_low_limit":"true","param_high_limit":"true","nb_dies_tested":"true","nb_dies_failed":"true","nb_dies_good":"true","ingestion_date":"true"}}, numFiles=5, transient_lastDdlTime=1554300803})  |       

Remark:
Note that the ALL COLUMNS statistics computation changed the COLUMN_STATS_ACCURATE value from {"BASIC_STATS":"true"} to {"COLUMN_STATS": xxx}.

But it did not change the poor response time of Query02…

Fetch task performing worse than MapReduce ?

From the official documentation and many web sites around, simple queries can be executed as a fetch task instead of a traditional MapReduce job to minimize latency. A fetch task is a direct HDFS access, using commands like "hdfs dfs -get" or "hdfs dfs -cat", so on paper the overhead of creating Map and Reduce jobs is gone. By simple queries they mean single-source queries with no subquery and no aggregation, distinct, lateral view or join.
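As a rough illustration of the rules above, the conversion decision can be thought of as a predicate over the query shape and the input size. This is a simplified sketch in Python, not Hive's actual source code; the function and parameter names are mine:

```python
# Simplified sketch (NOT Hive's real code) of the fetch-task conversion rule:
# with hive.fetch.task.conversion=more, a single-source query with no joins,
# aggregations, distincts or subqueries, whose input size is below
# hive.fetch.task.conversion.threshold, runs as a direct HDFS read.

def eligible_for_fetch(conversion, input_bytes, threshold_bytes,
                       has_joins, has_aggregates, has_subquery):
    """Return True when the query may run as a fetch task (illustrative only)."""
    if conversion == "none":          # feature disabled cluster- or session-wide
        return False
    if has_joins or has_aggregates or has_subquery:
        return False                  # only simple single-source queries qualify
    return input_bytes <= threshold_bytes

# A ~988 MB partition scan with the default 1 GiB threshold qualifies:
print(eligible_for_fetch("more", 988_129_840, 1_073_741_824,
                         False, False, False))  # True
```

This is only a mental model of the behaviour described above; the real checks live in Hive's SimpleFetchOptimizer.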

You can change parameters only for your session using:

0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> set hive.fetch.task.conversion;
+----------------------------------+--+
|               set                |
+----------------------------------+--+
| hive.fetch.task.conversion=more  |
+----------------------------------+--+
1 row selected (0.007 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> set hive.fetch.task.conversion.threshold;
+--------------------------------------------------+--+
|                       set                        |
+--------------------------------------------------+--+
| hive.fetch.task.conversion.threshold=1073741824  |
+--------------------------------------------------+--+
1 row selected (0.007 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> set hive.fetch.task.conversion.threshold=524288000;
No rows affected (0.003 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> set hive.fetch.task.conversion.threshold;
+-------------------------------------------------+--+
|                       set                       |
+-------------------------------------------------+--+
| hive.fetch.task.conversion.threshold=524288000  |
+-------------------------------------------------+--+
1 row selected (0.006 seconds)


0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> set hive.fetch.task.conversion=none;
No rows affected (0.004 seconds)
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> set hive.fetch.task.conversion;
+----------------------------------+--+
|               set                |
+----------------------------------+--+
| hive.fetch.task.conversion=none  |
+----------------------------------+--+

After changing hive.fetch.task.conversion to none, which disables the Hive fetch task, the query moves to a traditional MapReduce job:

INFO  : Status: Running (Executing on YARN cluster with App id application_1551777498072_117912)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED     14         14        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 20.19 s
--------------------------------------------------------------------------------

And more importantly, query execution time dropped from around 330 seconds to 28 seconds: 20 seconds for execution, the rest being network transfer.

Fetch tasks are supposed to be much faster than MapReduce jobs but we clearly see that it is not the case at all on our cluster… Bug ?

To go further

To debug the Hive fetch, with the help of a consultant working for us, we exported an environment variable:

hdfs@client_node:~$ export HADOOP_ROOT_LOGGER=debug,console

Then we simulated a Hive fetch on a datafile of one partition with a size of less than 1GB with:

hdfs@client_node:~$ hdfs dfs -cat /apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q846/000000_0 > /tmp/test_cq
19/04/08 12:17:38 DEBUG util.Shell: setsid exited with exit code 0
19/04/08 12:17:38 DEBUG conf.Configuration: parsing URL jar:file:/usr/hdp/2.6.4.0-91/hadoop/hadoop-common-2.7.3.2.6.4.0-91.jar!/core-default.xml
19/04/08 12:17:38 DEBUG conf.Configuration: parsing input stream sun.net.www.protocol.jar.JarURLConnection$JarURLInputStream@7a36aefa
19/04/08 12:17:38 DEBUG conf.Configuration: parsing URL file:/etc/hadoop/2.6.4.0-91/0/core-site.xml
19/04/08 12:17:38 DEBUG conf.Configuration: parsing input stream java.io.BufferedInputStream@58c1c010
19/04/08 12:17:38 DEBUG security.SecurityUtil: Setting hadoop.security.token.service.use_ip to true
19/04/08 12:17:38 DEBUG util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
19/04/08 12:17:38 DEBUG security.Groups:  Creating new Groups object
19/04/08 12:17:38 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
19/04/08 12:17:38 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
19/04/08 12:17:38 DEBUG security.JniBasedUnixGroupsMapping: Using JniBasedUnixGroupsMapping for Group resolution
19/04/08 12:17:38 DEBUG security.JniBasedUnixGroupsMappingWithFallback: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMapping
19/04/08 12:17:38 DEBUG security.Groups: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
19/04/08 12:17:38 DEBUG security.UserGroupInformation: hadoop login
19/04/08 12:17:38 DEBUG security.UserGroupInformation: hadoop login commit
19/04/08 12:17:38 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: hdfs
19/04/08 12:17:38 DEBUG security.UserGroupInformation: Using user: "UnixPrincipal: hdfs" with name hdfs
19/04/08 12:17:38 DEBUG security.UserGroupInformation: User entry: "hdfs"
19/04/08 12:17:38 DEBUG security.UserGroupInformation: Assuming keytab is managed externally since logged in from subject.
19/04/08 12:17:38 DEBUG security.UserGroupInformation: UGI loginUser:hdfs (auth:SIMPLE)
19/04/08 12:17:38 DEBUG hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
19/04/08 12:17:38 DEBUG hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = true
19/04/08 12:17:38 DEBUG hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
19/04/08 12:17:38 DEBUG hdfs.BlockReaderLocal: dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket
19/04/08 12:17:39 DEBUG hdfs.HAUtil: No HA service delegation token found for logical URI hdfs://ManufacturingDataLakeHdfs
19/04/08 12:17:39 DEBUG hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
19/04/08 12:17:39 DEBUG hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = true
19/04/08 12:17:39 DEBUG hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
19/04/08 12:17:39 DEBUG hdfs.BlockReaderLocal: dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket
19/04/08 12:17:39 DEBUG retry.RetryUtils: multipleLinearRandomRetry = null
19/04/08 12:17:39 DEBUG ipc.Server: rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper, rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$S  erver$ProtoBufRpcInvoker@238d68ff
19/04/08 12:17:39 DEBUG ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@70e38ce1
19/04/08 12:17:39 DEBUG unix.DomainSocketWatcher: org.apache.hadoop.net.unix.DomainSocketWatcher$2@3dfc1841: starting with interruptCheckPeriodMs = 60000
19/04/08 12:17:39 DEBUG shortcircuit.DomainSocketFactory: The short-circuit local reads feature is enabled.
19/04/08 12:17:39 DEBUG sasl.DataTransferSaslUtil: DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection
19/04/08 12:17:39 DEBUG ipc.Client: The ping interval is 60000 ms.
19/04/08 12:17:39 DEBUG ipc.Client: Connecting to master_namenode.domain.com/10.75.144.1:8020
19/04/08 12:17:39 DEBUG ipc.Client: IPC Client (1605851606) connection to master_namenode.domain.com/10.75.144.1:8020 from hdfs: starting, having connections 1
19/04/08 12:17:39 DEBUG ipc.Client: IPC Client (1605851606) connection to master_namenode.domain.com/10.75.144.1:8020 from hdfs sending #0
19/04/08 12:17:39 DEBUG ipc.Client: IPC Client (1605851606) connection to master_namenode.domain.com/10.75.144.1:8020 from hdfs got value #0
19/04/08 12:17:39 DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took 53ms
19/04/08 12:17:39 DEBUG ipc.Client: IPC Client (1605851606) connection to master_namenode.domain.com/10.75.144.1:8020 from hdfs sending #1
19/04/08 12:17:39 DEBUG ipc.Client: IPC Client (1605851606) connection to master_namenode.domain.com/10.75.144.1:8020 from hdfs got value #1
19/04/08 12:17:39 DEBUG ipc.ProtobufRpcEngine: Call: getBlockLocations took 1ms
19/04/08 12:17:39 DEBUG azure.NativeAzureFileSystem: finalize() called.
19/04/08 12:17:39 DEBUG azure.NativeAzureFileSystem: finalize() called.
19/04/08 12:17:39 DEBUG hdfs.DFSClient: newInfo = LocatedBlocks{
  fileLength=281939725
  underConstruction=false
  blocks=[LocatedBlock{BP-1711156358-10.75.144.1-1519036486930:blk_1212723761_139044745; getBlockSize()=268435456; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[10.75.144.15:50010,DS-7bf7ad15-da5c  -454b-adb5-67f1ea89e0e6,DISK], DatanodeInfoWithStorage[10.75.144.14:50010,DS-ee0aca78-2c5c-48bf-aeb8-3e4ed5823be1,DISK], DatanodeInfoWithStorage[10.75.144.11:50010,DS-92c5ac9c-6a51-464a-9925-8f0cc06d3f3f,D  ISK]]}, LocatedBlock{BP-1711156358-10.75.144.1-1519036486930:blk_1212733643_139054627; getBlockSize()=13504269; corrupt=false; offset=268435456; locs=[DatanodeInfoWithStorage[10.75.144.15:50010,DS-291244e5  -8577-4095-b200-3233661417db,DISK], DatanodeInfoWithStorage[10.75.144.11:50010,DS-bb818fac-2bb1-4ccf-9a17-a9afbecf5f6f,DISK], DatanodeInfoWithStorage[10.75.144.14:50010,DS-96c13ead-02bb-4191-b6c9-02c2f632e  bf0,DISK]]}]
  lastLocatedBlock=LocatedBlock{BP-1711156358-10.75.144.1-1519036486930:blk_1212733643_139054627; getBlockSize()=13504269; corrupt=false; offset=268435456; locs=[DatanodeInfoWithStorage[10.75.144.14:50010,  DS-96c13ead-02bb-4191-b6c9-02c2f632ebf0,DISK], DatanodeInfoWithStorage[10.75.144.15:50010,DS-291244e5-8577-4095-b200-3233661417db,DISK], DatanodeInfoWithStorage[10.75.144.11:50010,DS-bb818fac-2bb1-4ccf-9a1  7-a9afbecf5f6f,DISK]]}
  isLastBlockComplete=true}
19/04/08 12:17:39 DEBUG hdfs.DFSClient: Connecting to datanode 10.75.144.15:50010
19/04/08 12:17:39 DEBUG util.PerformanceAdvisory: BlockReaderFactory(fileName=/apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q846/000000_0, block=BP-1711156358-10.75.  144.1-1519036486930:blk_1212723761_139044745): PathInfo{path=, state=UNUSABLE} is not usable for short circuit; giving up on BlockReaderLocal.
19/04/08 12:17:39 DEBUG ipc.Client: IPC Client (1605851606) connection to master_namenode.domain.com/10.75.144.1:8020 from hdfs sending #2
19/04/08 12:17:39 DEBUG ipc.Client: IPC Client (1605851606) connection to master_namenode.domain.com/10.75.144.1:8020 from hdfs got value #2
19/04/08 12:17:39 DEBUG ipc.ProtobufRpcEngine: Call: getServerDefaults took 1ms
19/04/08 12:17:39 DEBUG sasl.SaslDataTransferClient: SASL client skipping handshake in unsecured configuration for addr = /10.75.144.15, datanodeId = DatanodeInfoWithStorage[10.75.144.15:50010,DS-7bf7ad15-  da5c-454b-adb5-67f1ea89e0e6,DISK]
19/04/08 12:17:41 DEBUG hdfs.DFSClient: Connecting to datanode 10.75.144.15:50010
19/04/08 12:17:41 DEBUG util.PerformanceAdvisory: BlockReaderFactory(fileName=/apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q846/000000_0, block=BP-1711156358-10.75.  144.1-1519036486930:blk_1212733643_139054627): PathInfo{path=, state=UNUSABLE} is not usable for short circuit; giving up on BlockReaderLocal.
19/04/08 12:17:41 DEBUG ipc.Client: stopping client from cache: org.apache.hadoop.ipc.Client@70e38ce1
19/04/08 12:17:41 DEBUG ipc.Client: removing client from cache: org.apache.hadoop.ipc.Client@70e38ce1
19/04/08 12:17:41 DEBUG ipc.Client: stopping actual client because no more references remain: org.apache.hadoop.ipc.Client@70e38ce1
19/04/08 12:17:41 DEBUG ipc.Client: Stopping client
19/04/08 12:17:41 DEBUG ipc.Client: IPC Client (1605851606) connection to master_namenode.domain.com/10.75.144.1:8020 from hdfs: closed
19/04/08 12:17:41 DEBUG ipc.Client: IPC Client (1605851606) connection to master_namenode.domain.com/10.75.144.1:8020 from hdfs: stopped, remaining connections 0
19/04/08 12:17:41 DEBUG util.ShutdownHookManager: ShutdownHookManger complete shutdown.

And we have seen that dfs.client.read.shortcircuit was correctly used…

Questioning a few Cloudera engineers, they suggested increasing HiveServer2 memory as, in the case of a Hive fetch, the ORC decompression and filtering is done directly by the Hive server. Using Grafana from Ambari we saw some spikes in HiveServer2 memory and so decided to increase its memory from 12GB to 32GB. This was also possible as our server has plenty of memory. When I talk of ORC decompression it is because we have chosen these parameters:

hive_fetch01

We retried the query with absolutely no improvement… At that time we also investigated the HiveServer2 log file and found:

2019-04-09 15:00:34,183 INFO  [HiveServer2-Handler-Pool: Thread-112]: orc.ReaderImpl (ReaderImpl.java:rowsOptions(478)) - Reading ORC rows from hdfs://ManufacturingDataLakeHdfs/apps/hive/warehouse/prod_spotfire_refined.db/tbl_bin_stat_orc/fab=C2WF/lot_partition=Q846/000002_0 with {include: [true, true, true, true, true, true, true, false, true, false, true, true, true, true, true, true, true, false, false], offset: 0, length: 257464687, sarg: leaf-0 = (IN lot_partition Q846), leaf-1 = (IN lot_id Q840401), leaf-2 = (IN wafer_id Q840401-01E6 Q840401-02E1 Q840401-03D4 Q840401-04C7 Q840401-05C2 Q840401-06B5 Q840401-07B0 Q840401-08A3 Q840401-09H1 Q840401-10A3 Q840401-11H1 Q840401-12G4 Q840401-13F7 Q840401-14F2 Q840401-15E5 Q840401-16E0 Q840401-17D3 Q840401-18C6 Q840401-19C1 Q840401-20C6 Q840401-21C1 Q840401-22B4 Q840401-23A7 Q840401-24A2 Q840401-25H0), leaf-3 = (IN flow_id EWS1), leaf-4 = (IN start_t 2018.12.27-10:42:54 2018.12.27-12:01:57 2018.12.27-13:18:47 2018.12.27-14:36:31 2018.12.27-15:55:57 2018.12.27-17:13:42 2018.12.27-18:31:34 2018.12.27-19:49:27 2018.12.27-21:05:30 2018.12.27-22:23:36 2018.12.27-23:40:11 2018.12.28-00:56:40 2018.12.28-02:15:51 2018.12.28-03:41:23 2018.12.28-04:58:02 2018.12.28-06:16:11 2018.12.28-07:34:40 2018.12.28-08:55:29 2018.12.28-10:13:25 2018.12.28-11:30:34 2018.12.28-12:48:08 2018.12.28-14:06:12 2018.12.28-15:23:00 2018.12.28-16:39:50 2018.12.28-17:56:57), leaf-5 = (IN finish_t 2018.12.27-12:00:40 2018.12.27-13:17:30 2018.12.27-14:35:13 2018.12.27-15:54:39 2018.12.27-17:12:26 2018.12.27-18:30:16 2018.12.27-19:48:12 2018.12.27-21:04:15 2018.12.27-22:22:19 2018.12.27-23:38:53 2018.12.28-00:55:24 2018.12.28-02:14:34 2018.12.28-03:32:41 2018.12.28-04:56:43 2018.12.28-06:14:52 2018.12.28-07:33:21 2018.12.28-08:54:12 2018.12.28-10:12:06 2018.12.28-11:29:17 2018.12.28-12:46:50 2018.12.28-14:04:55 2018.12.28-15:21:43 2018.12.28-16:38:30 2018.12.28-17:55:38 2018.12.28-19:13:18), leaf-6 = (IN hbin_number 1), leaf-7 = (IN sbin_number 1), expr = 
(and leaf-0 leaf-1 leaf-2 leaf-3 leaf-4 leaf-5 leaf-6 leaf-7), columns: ['null', 'lot_id', 'wafer_id', 'flow_id', 'start_t', 'finish_t', 'hbin_number', 'null', 'sbin_number', 'null', 'param_id', 'param_name', 'param_unit', 'param_low_limit', 'param_high_limit', 'nb_dies_tested', 'nb_dies_failed', 'null', 'null']}
2019-04-09 15:02:24,610 INFO  [HiveServer2-Handler-Pool: Thread-112]: orc.OrcUtils (OrcUtils.java:getDesiredRowTypeDescr(810)) - Using schema evolution configuration variables schema.evolution.columns [lot_id, wafer_id, flow_id, start_t, finish_t, hbin_number, hbin_name, sbin_number, sbin_name, param_id, param_name, param_unit, param_low_limit, param_high_limit, nb_dies_tested, nb_dies_failed, nb_dies_good, ingestion_date] / schema.evolution.columns.types [string, string, string, string, string, int, string, int, string, string, string, string, float, float, int, int, int, string] (isAcid false)

As you can see in the log above, this file is only around 250MB in size (length: 257464687). So it has taken almost 2 minutes (!!!) to read, decompress and filter this small file. Clearly there is something wrong with the Hive fetch task and we have decided to deactivate it by setting the below parameter cluster-wide:

0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> set hive.fetch.task.conversion;
+----------------------------------+--+
|               set                |
+----------------------------------+--+
| hive.fetch.task.conversion=none  |
+----------------------------------+--+
1 row selected (0.09 seconds)
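A quick back-of-the-envelope calculation, using the file length from the HiveServer2 log and the two timestamps around it, shows just how slow the fetch path was compared with the Tez run shown earlier (figures rounded):

```python
# Rough throughput comparison between the fetch task and the MapReduce/Tez run.
# Figures are taken from the log outputs above, rounded.
file_bytes = 257_464_687   # 'length' reported for the ORC file in the log
fetch_seconds = 110        # 15:00:34 -> 15:02:24 in the HiveServer2 log
tez_seconds = 20           # ELAPSED TIME of the Tez DAG shown earlier

fetch_mib_per_s = file_bytes / fetch_seconds / 2**20
print(round(fetch_mib_per_s, 1))    # ~2.2 MiB/s, single-threaded in HiveServer2

# The Tez job spread the work over 14 map tasks and finished in ~20 s,
# roughly 5x faster end to end despite the job startup overhead.
print(fetch_seconds / tez_seconds)  # 5.5
```

Around 2 MiB/s for a local ORC read is far below what a single disk can deliver, which is what made us conclude the fetch path itself was the bottleneck.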

At the end both queries now run in 25-30 seconds and we will monitor for any other negative side effects…


The post Hive fetch task really improving response time by bypassing MapReduce ? appeared first on IT World.

Fetch Zookeeper information from Python with Kazoo to connect Hive https://blog.yannickjaquier.com/hadoop/fetch-zookeeper-information-from-python-with-kazoo-to-connect-hive.html https://blog.yannickjaquier.com/hadoop/fetch-zookeeper-information-from-python-with-kazoo-to-connect-hive.html#respond Fri, 25 Oct 2019 07:53:32 +0000 https://blog.yannickjaquier.com/?p=4802 Preamble Straight from the beginning when I have landed to our Hadoop project (HortonWorks) I have seen all our Python scripts directly connecting to our Hive server, with PyHive, bypassing Zookeeper coordination. I also noticed that every Beeline client connection where well using (obviously) the HiveServer2 JDBC URL. I have left this point open for […]



Preamble

Straight from the beginning, when I landed on our Hadoop project (HortonWorks), I saw that all our Python scripts were connecting directly to our Hive server with PyHive, bypassing Zookeeper coordination. I also noticed that every Beeline client connection was (obviously) using the HiveServer2 JDBC URL. I left this point open for later, until we decided to improve our High Availability (HA) by running a few components on multiple servers. And when it came to our HiveServer2, which is now running on two edge nodes of our Hadoop cluster, I decided to dig out this thread…

The good way of connecting to HiveServer2 is to first get the current status and configuration from Zookeeper and then use this information in PyHive (for example) to make a Hive connection. Zookeeper acts here as a configuration keeper as well as an availability watcher, meaning Zookeeper will not return information for a dead HiveServer2.
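For illustration, HiveServer2 instances typically register themselves under a Zookeeper namespace (commonly /hiveserver2) with znode names of the form serverUri=host:port;version=…;sequence=…. A minimal sketch of extracting the host and port from such an entry could look like this (the znode string below is made up, and the exact format may vary by distribution, so treat this as an assumption rather than a guarantee):

```python
# Minimal sketch: parse a HiveServer2 znode name as published under the
# Zookeeper discovery namespace (commonly /hiveserver2). The znode name is a
# semicolon-separated list of key=value pairs; format assumed, not guaranteed.

def parse_hs2_znode(znode):
    """Return (host, port) from a 'serverUri=host:port;...' znode name."""
    fields = dict(item.split("=", 1) for item in znode.split(";") if "=" in item)
    host, port = fields["serverUri"].rsplit(":", 1)
    return host, int(port)

# Hypothetical znode name, for illustration only:
znode = "serverUri=hive01.domain.com:10000;version=1.2.1000;sequence=0000000012"
print(parse_hs2_znode(znode))  # ('hive01.domain.com', 10000)
```

The host and port recovered this way are what you would then feed to a PyHive connection.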

Digging a bit on the Internet, I quickly came to the obvious conclusion that the Kazoo Python package was a must-try !

This blog post has been written using kazoo 2.5.0, Python 3.7.3. My Hadoop cluster is HortonWorks Data Platform (HDP) 2.6.4. All developed scripts are running on a Fedora 30 virtual machine.

Kazoo development environment installation

Anaconda is the preferred Python 3.7 environment manager, so I started by downloading and installing it for Python 3.7 on my Fedora virtual machine. The release I have installed is:

[root@fedora1 ~]# anaconda --version
anaconda 30.25.6

It also gives you access to conda, the command-line environment manager:

(base) [root@fedora1 ~]# conda --version
conda 4.6.14
(base) [root@fedora1 ~]# conda info

     active environment : base
    active env location : /opt/anaconda3
            shell level : 1
       user config file : /root/.condarc
 populated config files : /root/.condarc
          conda version : 4.6.14
    conda-build version : 3.17.8
         python version : 3.7.3.final.0
       base environment : /opt/anaconda3  (writable)
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/free/linux-64
                          https://repo.anaconda.com/pkgs/free/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /opt/anaconda3/pkgs
                          /root/.conda/pkgs
       envs directories : /opt/anaconda3/envs
                          /root/.conda/envs
               platform : linux-64
             user-agent : conda/4.6.14 requests/2.21.0 CPython/3.7.3 Linux/5.0.11-300.fc30.x86_64 fedora/30 glibc/2.29
                UID:GID : 0:0
             netrc file : None
           offline mode : False

As I am behind a corporate proxy I had to customize my .condarc profile a little bit:

(base) [root@fedora1 ~]# cat  .condarc
ssl_verify: False
proxy_servers:
    http: http://proxy_user:proxy_password@proxy_server:proxy_port
    https: https://proxy_user:proxy_password@proxy_server:proxy_port

Remark
I have also been obliged to set ssl_verify to false to avoid certificate issues with my proxy server…

I created a kazoo conda environment with:

(base) [root@fedora1 ~]# conda create -n kazoo
Collecting package metadata: done
Solving environment: done

## Package Plan ##

  environment location: /opt/anaconda3/envs/kazoo



Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate kazoo
#
# To deactivate an active environment, use
#
#     $ conda deactivate

Activate it with:

[root@fedora1 ~]# conda activate kazoo
(kazoo) [root@fedora1 ~]#

As you have seen I am working as root, which is a very bad idea, so it is better to use a normal account (and keep root for more important tasks, see below). To do so, initialize Conda with (my shell is obviously bash):

[yjaquier@fedora1 ~]$ /opt/anaconda3/bin/conda init bash
no change     /opt/anaconda3/condabin/conda
no change     /opt/anaconda3/bin/conda
no change     /opt/anaconda3/bin/conda-env
no change     /opt/anaconda3/bin/activate
no change     /opt/anaconda3/bin/deactivate
no change     /opt/anaconda3/etc/profile.d/conda.sh
no change     /opt/anaconda3/etc/fish/conf.d/conda.fish
no change     /opt/anaconda3/shell/condabin/Conda.psm1
no change     /opt/anaconda3/shell/condabin/conda-hook.ps1
no change     /opt/anaconda3/lib/python3.7/site-packages/xonsh/conda.xsh
no change     /opt/anaconda3/etc/profile.d/conda.csh
modified      /home/yjaquier/.bashrc

==> For changes to take effect, close and re-open your current shell. <==

Logoff and logon again and use the newly created environment with:

(base) [yjaquier@fedora1 ~]$ conda activate kazoo
(kazoo) [yjaquier@fedora1 ~]$

You also need to configure your conda environment (.condarc) the same way as above...

For package management and search your reference will be https://anaconda.org. Here is an example of a search for Python (direct link: https://anaconda.org/search?q=python):

kazoo01

If you open the most downloaded Python package (a good practice in my opinion) you will find the command to install it:

conda install -c conda-forge python

But then you cannot modify the environment with your own account: packages must be added by root, which is, in my opinion, a very good practice:

EnvironmentNotWritableError: The current user does not have write permissions to the target environment.
  environment location: /opt/anaconda3/envs/kazoo
  uid: 1000
  gid: 100

So, executing with the root account (in the kazoo conda environment):

(kazoo) [root@fedora1 ~]# conda install -c conda-forge python
Collecting package metadata: done
Solving environment: done

## Package Plan ##

  environment location: /opt/anaconda3/envs/kazoo

  added / updated specs:
    - python


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    bzip2-1.0.6                |    h14c3975_1002         415 KB  conda-forge
    certifi-2019.3.9           |           py37_0         149 KB  conda-forge
    pip-19.1                   |           py37_0         1.8 MB  conda-forge
    python-3.7.3               |       h5b0a415_0        35.7 MB  conda-forge
    setuptools-41.0.1          |           py37_0         616 KB  conda-forge
    wheel-0.33.1               |           py37_0          34 KB  conda-forge
    ------------------------------------------------------------
                                           Total:        38.7 MB

The following NEW packages will be INSTALLED:

  bzip2              conda-forge/linux-64::bzip2-1.0.6-h14c3975_1002
  ca-certificates    conda-forge/linux-64::ca-certificates-2019.3.9-hecc5488_0
  certifi            conda-forge/linux-64::certifi-2019.3.9-py37_0
  libffi             conda-forge/linux-64::libffi-3.2.1-he1b5a44_1006
  libgcc-ng          pkgs/main/linux-64::libgcc-ng-8.2.0-hdf63c60_1
  libstdcxx-ng       pkgs/main/linux-64::libstdcxx-ng-8.2.0-hdf63c60_1
  ncurses            conda-forge/linux-64::ncurses-6.1-hf484d3e_1002
  openssl            conda-forge/linux-64::openssl-1.1.1b-h14c3975_1
  pip                conda-forge/linux-64::pip-19.1-py37_0
  python             conda-forge/linux-64::python-3.7.3-h5b0a415_0
  readline           conda-forge/linux-64::readline-7.0-hf8c457e_1001
  setuptools         conda-forge/linux-64::setuptools-41.0.1-py37_0
  sqlite             conda-forge/linux-64::sqlite-3.26.0-h67949de_1001
  tk                 conda-forge/linux-64::tk-8.6.9-h84994c4_1001
  wheel              conda-forge/linux-64::wheel-0.33.1-py37_0
  xz                 conda-forge/linux-64::xz-5.2.4-h14c3975_1001
  zlib               conda-forge/linux-64::zlib-1.2.11-h14c3975_1004


Proceed ([y]/n)? y


Downloading and Extracting Packages
python-3.7.3         | 35.7 MB   | #################################################################################################################################################################### | 100%
certifi-2019.3.9     | 149 KB    | #################################################################################################################################################################### | 100%
wheel-0.33.1         | 34 KB     | #################################################################################################################################################################### | 100%
setuptools-41.0.1    | 616 KB    | #################################################################################################################################################################### | 100%
pip-19.1             | 1.8 MB    | #################################################################################################################################################################### | 100%
bzip2-1.0.6          | 415 KB    | #################################################################################################################################################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

I have also installed Kazoo using:

(kazoo) [root@fedora1 ~]# conda install -c conda-forge kazoo

I have also installed PyHive to connect to Hive, and Pandas to manipulate data structures (although, in the end, I did not use it):

conda install -c anaconda pyhive
conda install -c conda-forge pandas

Python source code

The small script I have written (kazoo_testing.py, shown below) is, for the Kazoo part, mostly taken from the official documentation, so do not hesitate to visit it:

from kazoo.client import KazooClient,KazooState

def my_listener(state):
  if state == KazooState.LOST:
    # Register somewhere that the session was lost
    print('Connection lost !!')
  elif state == KazooState.SUSPENDED:
    # Handle being disconnected from Zookeeper
    print('Connection suspended !!')
  else:
    # Handle being connected/reconnected to Zookeeper
    print('Connected !!')

zk = KazooClient(hosts='zookeeper_server01.domain.com:2181,zookeeper_server02.domain.com:2181,zookeeper_server03.domain.com:2181')

zk.add_listener(my_listener)
#zk.start()
zk.start(timeout=5)

# Display Zookeeper information
print(zk.get_children('/'))

#print(zk.get_children('hiveserver2')[0])
print(zk.get_children(path='hiveserver2'))

for hiveserver2 in zk.get_children(path='hiveserver2'):
  array01=hiveserver2.split(';')[0].split('=')[1].split(':')
  hive_hostname=array01[0]
  hive_port=array01[1]
  print('Hive hostname: ' + hive_hostname)
  print('Hive port: ' + hive_port)

The list of Zookeeper servers can be taken from the Hive Ambari page, where you can copy/paste the so-called HIVESERVER2 JDBC URL.
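The string-splitting in the loop above can be isolated into a small helper function. This is just a sketch, assuming the znode format returned by my HDP 2.6 cluster (other versions may differ):

```python
def parse_hiveserver2_znode(znode):
    """Extract (hostname, port) from a HiveServer2 znode name such as
    'serverUri=host.domain.com:10000;version=...;sequence=...'."""
    host_port = znode.split(';')[0].split('=')[1]   # 'host.domain.com:10000'
    hostname, port = host_port.split(':')
    return hostname, int(port)

znode = 'serverUri=hiveserver201.domain.com:10000;version=1.2.1000.2.6.4.0-91;sequence=0000000042'
print(parse_hiveserver2_znode(znode))  # ('hiveserver201.domain.com', 10000)
```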

The above source code does not include the PyHive connection, but once you get the Hive host name and port you can easily connect with something like (the configuration parameter is optional):

from pyhive import hive
import pandas as pd

# Hive connection 
connection=hive.connect(
    host = hive_hostname, 
    port = hive_port,
    configuration={'tez.queue.name': 'your_yarn_queue_name'}, 
    username = "your_account"
    )

pandas01=pd.read_sql("select * from ...", connection)

print(pandas01.sample(10))

Kazoo testing

I have the chance to have two HiveServer2 instances configured in my Hortonworks Hadoop cluster, which is, by the way, strongly suggested if you aim to be Highly Available (HA). When the two HiveServer2 processes are up and running I get the below result:

(kazoo) [yjaquier@fedora1 ~]$ python kazoo_testing.py
Connected !!
['registry', 'cluster', 'brokers', 'storm', 'zookeeper', 'infra-solr', 'hbase-unsecure', 'tracers', 'hadoop-ha', 'admin', 'isr_change_notification',
 'accumulo', 'logsearch', 'controller_epoch', 'hiveserver2', 'druid', 'rmstore', 'ambari-metrics-cluster', 'consumers', 'config']
['serverUri=hiveserver201.domain.com:10000;version=1.2.1000.2.6.4.0-91;sequence=0000000042', 'serverUri=hiveserver202.domain.com:10000;version=1.2.1000.2.6.4.0-91;sequence=0000000043']
Hive hostname: hiveserver201.domain.com
Hive port: 10000
Hive hostname: hiveserver202.domain.com
Hive port: 10000

If I stop the first HiveServer2 then, after a while (I suppose the time for Zookeeper to detect and propagate the change), I finally get:

(kazoo) [yjaquier@fedora1 ~]$ python kazoo_testing.py
Connected !!
['registry', 'cluster', 'brokers', 'storm', 'zookeeper', 'infra-solr', 'hbase-unsecure', 'tracers', 'hadoop-ha', 'admin', 'isr_change_notification',
 'accumulo', 'logsearch', 'controller_epoch', 'hiveserver2', 'druid', 'rmstore', 'ambari-metrics-cluster', 'consumers', 'config']
['serverUri=hiveserver202.domain.com:10000;version=1.2.1000.2.6.4.0-91;sequence=0000000042']
Hive hostname: hiveserver202.domain.com
Hive port: 10000

References

The post Fetch Zookeeper information from Python with Kazoo to connect Hive appeared first on IT World.

]]>
https://blog.yannickjaquier.com/hadoop/fetch-zookeeper-information-from-python-with-kazoo-to-connect-hive.html/feed 0
Hadoop backup: what parts to backup and how to do it ? https://blog.yannickjaquier.com/hadoop/hadoop-backup-what-parts-to-backup-and-how-to-do-it.html https://blog.yannickjaquier.com/hadoop/hadoop-backup-what-parts-to-backup-and-how-to-do-it.html#respond Fri, 27 Sep 2019 08:09:45 +0000 https://blog.yannickjaquier.com/?p=4617 Preamble Hadoop backup, wide and highly important subject and most probably like me you have been surprised by poor availability of official documents and this is most probably why you have landed here trying to find a first answer ! Needless to say this blog post is far to be complete so please do not […]

The post Hadoop backup: what parts to backup and how to do it ? appeared first on IT World.

]]>

Table of contents

Preamble

hadoop_backup01

Hadoop backup is a wide and highly important subject, and most probably, like me, you have been surprised by the poor availability of official documentation — which is most probably why you have landed here trying to find a first answer! Needless to say, this blog post is far from complete, so please do not hesitate to submit a comment and I will enrich this document with great pleasure!

One of the main difficulties with Hadoop is its scale-out nature, which makes it hard to understand what is nice to back up and what is REALLY important to back up.

I have split the article in three parts:

  • First part is what you MUST backup to be able to survive to a major issue
  • Second part is what is not required to be backed-up.
  • Third part is what is nice to backup.

Also, I repeat, I’m interested in any comment you might have that would help enrich this document or correct any mistake…

Mandatory parts to backup

Configuration files

All files under /etc and /usr/hdp on edge nodes (so not on your worker nodes). In principle you could recreate them from scratch, but you surely do not want to lose multiple months or years of fine tuning, do you?

Theoretically all your configuration files will be saved when saving the Ambari server meta info, but if you have a corporate tool to back up your host OS it is worth including the two above directories, as it is sometimes much simpler to restore a single file with those tools…

Those edge nodes are:

  • Master nodes
  • Management nodes
  • Client nodes
  • Utilities node (Hive, …)
  • Analytics Nodes

In other words, all nodes except the worker nodes.

Ambari server meta info

[root@mgmtserver ~]# ambari-server backup /tmp/ambari-server-backup.zip
Using python  /usr/bin/python
Backing up Ambari File System state... *this will not backup the server database*
Backup requested.
Backup process initiated.
Creating zip file...
Zip file created at /tmp/ambari-server-backup.zip
Backup complete.
Ambari Server 'backup' completed successfully.
[root@mgmtserver ~]# ll /tmp/ambari-server-backup.zip
-rw-r--r-- 1 root root 2444590592 Dec  3 17:01 /tmp/ambari-server-backup.zip

To restore this backup in case of a big crash the command is:

[root@mgmtserver ~]# ambari-server restore /tmp/ambari-server-backup.zip

NameNode metadata

As is commonly said about the NameNode, this component is key:

The NameNode is the centerpiece of an HDFS file system. It keeps the directory tree of all files in the file system, and tracks where across the cluster the file data is kept. It does not store the data of these files itself.

And since it is not an Oracle database, a continuous backup is not possible:

Regardless of the solution, a full, up-to-date continuous backup of the namespace is not possible. Some of the most recent data is always lost. HDFS is not an Online Transaction Processing (OLTP) system. Most data can be easily recreated if you re-run Extract, Transform, Load (ETL) or processing jobs.

The always-working procedure to back up your NameNode is really simple:

[hdfs@namenode_primary ~]$ hdfs dfsadmin -saveNamespace
saveNamespace: Safe mode should be turned ON in order to create namespace image.
[hdfs@namenode_primary ~]$ hdfs dfsadmin -safemode enter
Safe mode is ON
[hdfs@namenode_primary ~]$ hdfs dfsadmin -safemode get
Safe mode is ON
[hdfs@namenode_primary ~]$ hdfs dfsadmin -saveNamespace
Save namespace successful
[hdfs@namenode_primary ~]$ hdfs dfsadmin -safemode leave
Safe mode is OFF
[hdfs@namenode_primary ~]$ hdfs dfsadmin -safemode get
Safe mode is OFF
[hdfs@namenode_primary ~]$ hdfs dfsadmin -fetchImage /tmp
19/01/07 12:57:10 INFO namenode.TransferFsImage: Opening connection to http://namenode_primary.domain.com:50070/imagetransfer?getimage=1&txid=latest
19/01/07 12:57:10 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds
19/01/07 12:57:10 INFO namenode.TransferFsImage: Combined time for fsimage download and fsync to all disks took 0.04s. The fsimage download took 0.04s at 167097.56 KB/s. Synchronous (fsync) write to disk of /tmp/fsimage_0000000000002300573 took 0.00s.

Then you can put the file that has been copied to the /tmp directory in a safe place (tape, SAN, NFS, …). But this procedure has the unfortunate side effect of putting your entire cluster in read-only mode (safemode), so on a 24/7 production cluster this is surely not something you can accept…

All your running processes will fail with something like:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot complete file /apps/hive/warehouse/database.db/table_orc/.hive-staging_hive_2019-02-12_06-12-16_976_7305679997226277861-21596/_task_tmp.-ext-10002/_tmp.000199_0. Name node is in safe mode.
It was turned on manually. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
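The whole enter-safemode / saveNamespace / fetchImage / leave-safemode sequence can of course be scripted. Below is a minimal sketch — the step list and the injectable runner are my own convention, not an HDFS API — which also makes the ordering testable without a live cluster:

```python
import subprocess

# Sequence from the transcript above; assumes the hdfs binary is on PATH.
SAVE_NAMESPACE_STEPS = [
    ['hdfs', 'dfsadmin', '-safemode', 'enter'],
    ['hdfs', 'dfsadmin', '-saveNamespace'],
    ['hdfs', 'dfsadmin', '-fetchImage', '/tmp'],
    ['hdfs', 'dfsadmin', '-safemode', 'leave'],
]

def run_steps(steps, runner=subprocess.check_call):
    """Run each command in order; 'runner' is injectable for testing."""
    for cmd in steps:
        runner(cmd)
```

Keeping the `leave` step last is what limits the read-only window to the duration of the saveNamespace itself.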

In the initial releases of Hadoop the NameNode was a Single Point Of Failure (SPOF), as you could only have what is called a secondary NameNode. The secondary NameNode handles an important CPU-intensive task called checkpointing. Checkpointing is the operation of combining the edits log files (edits_xx files) and the latest fsimage file to create an up-to-date HDFS filesystem metadata snapshot (fsimage_xxx file). But the secondary NameNode cannot be used as a failover of the primary NameNode, so in case of failure it can only be used to rebuild the primary NameNode, not to take over its role.

In Hadoop 2.0 this limitation is gone, and in High Availability (HA) mode you can have a standby NameNode that does the same job as the secondary NameNode and can also take the role of the primary NameNode with a simple switch.

If for any reason this checkpoint operation has not happened for a long time, you will receive the scary NameNode Last Checkpoint Ambari alert:

hadoop_backup02

This alert will also trigger the below Ambari warning when you try to stop the NameNode process (on restart the NameNode reads the latest fsimage and re-applies all the edits log files generated since):

hadoop_backup03

Needless to say, having your NameNode service in High Availability (active/standby) is strongly suggested!

Whether you have the NameNode in HA or not, there is a list of important parameters to consider, with the values we have chosen (maybe I should decrease the checkpoint period value):

  • dfs.namenode.name.dir = /hadoop/hdfs
  • dfs.namenode.checkpoint.period = 21600 (in seconds i.e. 6 hours)
  • dfs.namenode.checkpoint.txns = 1000000
  • dfs.namenode.checkpoint.check.period = 60

In this setup, on your standby or secondary NameNode, every dfs.namenode.checkpoint.period seconds or every dfs.namenode.checkpoint.txns transactions, whichever is reached first, a new checkpoint file is created, and the cool thing is that this latest checkpoint is copied back to your active NameNode. In the listings below, the checkpoint at 07:08 is the periodic automatic checkpoint, while the one at 06:15 is the one we explicitly triggered with the hdfs dfsadmin -saveNamespace command.
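The "whichever is reached first" rule can be expressed as a tiny predicate. This is an illustrative sketch of the decision, using the values listed above, not the actual NameNode code:

```python
def checkpoint_due(seconds_since_last, txns_since_last,
                   period=21600, txns=1000000):
    """True when either dfs.namenode.checkpoint.period (seconds) or
    dfs.namenode.checkpoint.txns (transactions) has been reached."""
    return seconds_since_last >= period or txns_since_last >= txns

print(checkpoint_due(21600, 0))      # True: 6-hour period reached
print(checkpoint_due(3600, 500000))  # False: neither threshold reached
```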

On standby NameNode:

[root@namenode_standby ~]# ll -rt /hadoop/hdfs/current/fsimage*
-rw-r--r-- 1 hdfs hadoop 650179252 Feb 13 06:15 /hadoop/hdfs/current/fsimage_0000000000520456166
-rw-r--r-- 1 hdfs hadoop        62 Feb 13 06:15 /hadoop/hdfs/current/fsimage_0000000000520456166.md5
-rw-r--r-- 1 hdfs hadoop 650235574 Feb 13 07:08 /hadoop/hdfs/current/fsimage_0000000000520466841
-rw-r--r-- 1 hdfs hadoop        62 Feb 13 07:08 /hadoop/hdfs/current/fsimage_0000000000520466841.md5

On active Namenode:

[root@namenode_primary ~]# ll -rt /hadoop/hdfs/current/fsimage*
-rw-r--r-- 1 hdfs hadoop        62 Feb 13 06:15 /hadoop/hdfs/current/fsimage_0000000000520456198.md5
-rw-r--r-- 1 hdfs hadoop 650179470 Feb 13 06:15 /hadoop/hdfs/current/fsimage_0000000000520456198
-rw-r--r-- 1 hdfs hadoop 650235574 Feb 13 07:08 /hadoop/hdfs/current/fsimage_0000000000520466841
-rw-r--r-- 1 hdfs hadoop        62 Feb 13 07:08 /hadoop/hdfs/current/fsimage_0000000000520466841.md5

So in a NameNode HA cluster you can just regularly copy the dfs.namenode.name.dir directory to a safe place (tape, NFS, …), and you are not obliged to enter this impacting safemode.
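A minimal sketch of such a periodic copy, assuming the backup destination (the backup_root path below is illustrative) is a mounted NFS share and the script runs as a user that can read dfs.namenode.name.dir:

```python
import os
import shutil
import time

def backup_namenode_dir(name_dir='/hadoop/hdfs',
                        backup_root='/nfs/namenode_backups'):
    """Copy the NameNode metadata directory (dfs.namenode.name.dir) to a
    timestamped folder under backup_root; both default paths are examples."""
    dest = os.path.join(backup_root, time.strftime('%Y%m%d_%H%M%S'))
    shutil.copytree(name_dir, dest)
    return dest
```

Scheduled from cron every few hours, this gives you off-cluster copies of the fsimage and edits files without any safemode window.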

If at some point you don’t have Ambari and/or you want to script it, here are the commands to get your active and standby NameNode servers:

[hdfs@namenode_primary ~]$ hdfs getconf -confKey dfs.ha.namenodes.mycluster
nn1,nn2
[hdfs@namenode_primary ~]$ hdfs getconf -confKey dfs.namenode.rpc-address.mycluster.nn1
namenode_standby.domain.com:8020
[hdfs@namenode_primary ~]$ hdfs getconf -confKey dfs.namenode.rpc-address.mycluster.nn2
namenode_primary.domain.com:8020
[hdfs@namenode_primary ~]$ hdfs haadmin -getServiceState nn1
standby
[hdfs@namenode_primary ~]$ hdfs haadmin -getServiceState nn2
active

Ambari repository database

Our Ambari repository database is a PostgreSQL one; if you have chosen MySQL, refer to the next chapter.

Backup with Point In Time Recovery (PITR) capability

As clearly explained in the documentation, there is a tool to do it called pg_basebackup. To use it you need to put your PostgreSQL instance in write-ahead log (WAL) archiving mode, which is the equivalent of MySQL binary logging or Oracle archive log mode. This is done by setting three parameters in the postgresql.conf file:

  • wal_level = replica
  • archive_mode = on
  • archive_command = 'test ! -f /var/lib/pgsql/backups/%f && cp %p /var/lib/pgsql/backups/%f'

Remark:
The archive command chosen here is just an example that copies WAL files to a backup directory, which you obviously need to save to a secure place.

If not done you will end up with below error message:

[postgres@fedora1 ~]$ pg_basebackup --pgdata=/tmp/pgbackup01
pg_basebackup: could not get write-ahead log end position from server: ERROR:  could not open file "./.postgresql.conf.swp": Permission denied
pg_basebackup: removing data directory "/tmp/pgbackup01"

Once done and activated (restart required), you can make an online backup that can be used to perform PITR with:

[postgres@fedora1 ~]$ pg_basebackup --pgdata=/tmp/pgbackup01
[postgres@fedora1 ~]$ ll /tmp/pgbackup01
total 52
-rw------- 1 postgres postgres   206 Nov 30 18:02 backup_label
drwx------ 6 postgres postgres   120 Nov 30 18:02 base
-rw------- 1 postgres postgres    30 Nov 30 18:02 current_logfiles
drwx------ 2 postgres postgres  1220 Nov 30 18:02 global
drwx------ 2 postgres postgres    80 Nov 30 18:02 log
drwx------ 2 postgres postgres    40 Nov 30 18:02 pg_commit_ts
drwx------ 2 postgres postgres    40 Nov 30 18:02 pg_dynshmem
-rw------- 1 postgres postgres  4414 Nov 30 18:02 pg_hba.conf
-rw------- 1 postgres postgres  1636 Nov 30 18:02 pg_ident.conf
drwx------ 2 postgres postgres    40 Nov 30 18:02 pg_log
drwx------ 4 postgres postgres   100 Nov 30 18:02 pg_logical
drwx------ 4 postgres postgres    80 Nov 30 18:02 pg_multixact
drwx------ 2 postgres postgres    40 Nov 30 18:02 pg_notify
drwx------ 2 postgres postgres    40 Nov 30 18:02 pg_replslot
drwx------ 2 postgres postgres    40 Nov 30 18:02 pg_serial
drwx------ 2 postgres postgres    40 Nov 30 18:02 pg_snapshots
drwx------ 2 postgres postgres    40 Nov 30 18:02 pg_stat
drwx------ 2 postgres postgres    40 Nov 30 18:02 pg_stat_tmp
drwx------ 2 postgres postgres    40 Nov 30 18:02 pg_subtrans
drwx------ 2 postgres postgres    40 Nov 30 18:02 pg_tblspc
drwx------ 2 postgres postgres    40 Nov 30 18:02 pg_twophase
-rw------- 1 postgres postgres     3 Nov 30 18:02 PG_VERSION
drwx------ 3 postgres postgres    80 Nov 30 18:02 pg_wal
drwx------ 2 postgres postgres    60 Nov 30 18:02 pg_xact
-rw------- 1 postgres postgres    88 Nov 30 18:02 postgresql.auto.conf
-rw------- 1 postgres postgres 22848 Nov 30 18:02 postgresql.conf
[postgres@fedora1 pg_wal]$ ll /var/lib/pgsql/backups/
total 32772
-rw------- 1 postgres postgres 16777216 Nov 30 18:02 000000010000000000000002
-rw------- 1 postgres postgres 16777216 Nov 30 18:02 000000010000000000000003
-rw------- 1 postgres postgres      302 Nov 30 18:02 000000010000000000000003.00000060.backup
[postgres@fedora1 pg_wal]$ cat /var/lib/pgsql/backups/000000010000000000000003.00000060.backup
START WAL LOCATION: 0/3000060 (file 000000010000000000000003)
STOP WAL LOCATION: 0/3000130 (file 000000010000000000000003)
CHECKPOINT LOCATION: 0/3000098
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2018-11-30 18:02:03 CET
LABEL: pg_basebackup base backup
STOP TIME: 2018-11-30 18:02:03 CET
[postgres@fedora1 pg_wal]$ ll /var/lib/pgsql/data/pg_wal/
total 49156
-rw------- 1 postgres postgres 16777216 Nov 30 18:02 000000010000000000000002
-rw------- 1 postgres postgres 16777216 Nov 30 18:02 000000010000000000000003
-rw------- 1 postgres postgres      302 Nov 30 18:02 000000010000000000000003.00000060.backup
-rw------- 1 postgres postgres 16777216 Nov 30 18:02 000000010000000000000004
drwx------ 2 postgres postgres      133 Nov 30 18:02 archive_status
[postgres@fedora1 pg_wal]$ ll /var/lib/pgsql/data/pg_wal/archive_status/
total 0
-rw------- 1 postgres postgres 0 Nov 30 18:02 000000010000000000000002.done
-rw------- 1 postgres postgres 0 Nov 30 18:02 000000010000000000000003.00000060.backup.done
-rw------- 1 postgres postgres 0 Nov 30 18:02 000000010000000000000003.done

You can also directly generate TAR files with:

[postgres@fedora1 pg_wal]$ pg_basebackup --pgdata=/tmp/pgbackup02 --format=t
[postgres@fedora1 pg_wal]$ ll /tmp/pgbackup02
total 48128
-rw-r--r-- 1 postgres postgres 32500224 Nov 30 18:11 base.tar
-rw------- 1 postgres postgres 16778752 Nov 30 18:11 pg_wal.tar

Backup with no PITR capability

This method is obviously based on the creation of a dump file, using either pg_dump or pg_dumpall.

At this stage, either you do everything with the postgres Linux account, which is able to connect without a password thanks to the default pg_hba.conf file:

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             postgres                                     peer
# IPv4 local connections:
host    all             postgres             127.0.0.1/32            ident
# IPv6 local connections:
host    all            postgres             ::1/128                 ident
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     postgres                                     peer
host    replication     postgres             127.0.0.1/32            ident
host    replication     postgres             ::1/128                 ident

Or you set it up for another account that has fewer privileges — the owner of the database you want to back up, for example. I initially tried with PGPASSWORD, but this apparently no longer works in recent releases of PostgreSQL (10.6 is the release I used to test the feature):

[postgres@fedora1 ~]$ export PGPASSWORD='secure_password'
[postgres@fedora1 ~]$ echo $PGPASSWORD
secure_password
[postgres@fedora1 ~]$ psql --dbname=ambari --username=ambari --password
Password for user ambari:

Our Ambari repository database is older (PostgreSQL 9.2.23), but to prepare for the future it is better to move to a password file. A password file is a file called ~/.pgpass with the following structure:

hostname:port:database:username:password

I have created it like:

[postgres@fedora1 ~]$ ll /var/lib/pgsql/.pgpass
-rw-r--r-- 1 postgres postgres 37 Nov 30 15:12 /var/lib/pgsql/.pgpass
[postgres@fedora1 ~]$ cat /var/lib/pgsql/.pgpass
localhost:5432:ambari:ambari:secure_password

The file permissions must be 0600 or less, or you will get:

[postgres@fedora1 ~]$ psql --dbname=ambari --username=ambari
WARNING: password file "/var/lib/pgsql/.pgpass" has group or world access; permissions should be u=rw (0600) or less
Password for user ambari:
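Creating the file with the right permissions can of course be scripted. A sketch (the enforced 0600 mode is what libpq expects, otherwise it ignores the file):

```python
import os

def write_pgpass(path, host, port, database, username, password):
    """Write a single-entry PostgreSQL password file and enforce 0600
    permissions, since libpq refuses a group/world-readable file."""
    with open(path, 'w') as f:
        f.write(f'{host}:{port}:{database}:{username}:{password}\n')
    os.chmod(path, 0o600)

# Example matching the entry shown above:
# write_pgpass('/var/lib/pgsql/.pgpass', 'localhost', 5432,
#              'ambari', 'ambari', 'secure_password')
```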

Then you can connect without specifying a password:

[postgres@fedora1 ~]$ psql --dbname=ambari --username=ambari
psql (10.6)
Type "help" for help.

ambari=>

All this to do a backup of all databases with:

[postgres@fedora1 ~]$ pg_dumpall --file=/tmp/pgbackup.sql
[postgres@fedora1 ~]$ ll /tmp/pgbackup.sql
-rw-r--r-- 1 postgres postgres 3768 Nov 30 16:55 /tmp/pgbackup.sql

Or just the Ambari one with:

[postgres@fedora1 ~]$ pg_dump --file=/tmp/pgbackup_ambari.sql ambari
[postgres@fedora1 ~]$ ll /tmp/pgbackup_ambari.sql
-rw-r--r-- 1 postgres postgres 1117 Nov 30 16:57 /tmp/pgbackup_ambari.sql

Hive repository database

Our Hive repository database is a MySQL one; if you have chosen PostgreSQL, refer to the previous chapter.

Backup with Point In Time Recovery (PITR) capability

You must activate the binary log by setting the log-bin parameter in the my.cnf file with something like (see my MySQL replication with GTID post: https://blog.yannickjaquier.com/mysql/mysql-replication-with-global-transaction-identifiers-gtid-hands-on.html):

log-bin = /mysql/logs/mysql01/mysql-bin

You should end up with below configuration:

+---------------------------------+------------------------------------+
| Variable_name                   | Value                              |
+---------------------------------+------------------------------------+
| log_bin                         | ON                                 |
| log_bin_basename                | /mysql/logs/mysql01/mysql-bin      |
| log_bin_index                   | /mysql/logs/mysql01mysql-bin.index |
+---------------------------------+------------------------------------+

First of all: you must regularly back up the MySQL binary logs!

Before any online backup (snapshot), do the following to rotate the binary logs:

mysql> show binary logs;
+------------------+-----------+
| Log_name         | File_size |
+------------------+-----------+
| mysql-bin.001087 |       242 |
| mysql-bin.001088 |       242 |
| mysql-bin.001089 |       242 |
| mysql-bin.001090 |      9638 |
| mysql-bin.001091 |      1538 |
| mysql-bin.001092 |       242 |
| mysql-bin.001093 |       242 |
| mysql-bin.001094 |      1402 |
| mysql-bin.001095 |      4314 |
| mysql-bin.001096 |      2304 |
| mysql-bin.001097 |       120 |
+------------------+-----------+
11 rows in set (0.00 sec)

mysql> flush logs;
Query OK, 0 rows affected (0.41 sec)

mysql> show binary logs;
+------------------+-----------+
| Log_name         | File_size |
+------------------+-----------+
| mysql-bin.001088 |       242 |
| mysql-bin.001089 |       242 |
| mysql-bin.001090 |      9638 |
| mysql-bin.001091 |      1538 |
| mysql-bin.001092 |       242 |
| mysql-bin.001093 |       242 |
| mysql-bin.001094 |      1402 |
| mysql-bin.001095 |      4314 |
| mysql-bin.001096 |      2304 |
| mysql-bin.001097 |       167 |
| mysql-bin.001098 |       120 |
+------------------+-----------+
11 rows in set (0.00 sec)

mysql> purge binary logs to 'mysql-bin.001098';
Query OK, 0 rows affected (0.00 sec)

mysql> show binary logs;
+------------------+-----------+
| Log_name         | File_size |
+------------------+-----------+
| mysql-bin.001098 |       120 |
+------------------+-----------+
1 row in set (0.00 sec)
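The rotate-then-purge logic above boils down to: flush to open a fresh log, then purge everything older than it. A sketch of that decision as a hypothetical helper (not a MySQL API):

```python
def logs_to_purge(binary_logs):
    """Given the log names from SHOW BINARY LOGS (after FLUSH LOGS),
    return (logs safe to purge, the current log to keep).

    Relies on the fixed-width numeric suffix, so lexical order == age order."""
    ordered = sorted(binary_logs)
    return ordered[:-1], ordered[-1]

to_purge, keep = logs_to_purge(['mysql-bin.001097', 'mysql-bin.001098'])
print(f"purge binary logs to '{keep}';")  # prints the PURGE statement used above
```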

Then take the snapshot while keeping the tables under a read lock, with something like:

FLUSH TABLES WITH READ LOCK;
\! lvcreate --snapshot --size 100M --name lvol98_save /dev/vg00/lvol98 or any snapshot command
UNLOCK TABLES;

Backup with no PITR capability

If you don’t want to activate and manage binary logging, or can afford to lose multiple hours of transactions, you can simply perform a MySQL dump — even just once a week once your cluster has stabilized. Use a command like the one below to create a simple dump file:

[mysql@server1 ~] mysqldump --user=root -p --single-transaction --all-databases > /tmp/backup.sql

Not mandatory parts to backup

JournalNodes

From Cloudera official documentation:

High-availabilty clusters use JournalNodes to synchronize active and standby NameNodes. The active NameNode writes to each JournalNode with changes, or “edits,” to HDFS namespace metadata. During failover, the standby NameNode applies all edits from the JournalNodes before promoting itself to the active state.

Those JournalNodes are installed only if your NameNode is in HA mode. They are the preferred method to handle shared storage between your primary and standby NameNodes; this method is called Quorum Journal Manager (QJM).

Each time a new edits file is created or modified on the primary NameNode, it is also written to a majority (quorum) of JournalNodes. The standby NameNode constantly monitors the JournalNodes for changes and applies them to its own namespace, so as to be ready to take over from the primary NameNode in case of failure. All JournalNodes store more or less the same files (edits_xx files and an edits_inprogress_xx file) as the NameNodes, except that they do not hold the fsimage_xx checkpoint results. You must have three or more (an odd number of) JournalNodes for high availability and to handle split-brain scenarios.

The working directory of JournalNodes is defined by:

  • dfs.journalnode.edits.dir = /var/qjn

On one JournalNode the real directory will be (the cluster name is the one chosen at installation):

[root@journalnode01 ~]# ll -rt /var/qjn//current
.
.
.
-rw-r--r-- 1 hdfs hadoop 1006436 Jan 18 12:13 edits_0000000000433896848-0000000000433901168
-rw-r--r-- 1 hdfs hadoop  133375 Jan 18 12:15 edits_0000000000433901169-0000000000433901822
-rw-r--r-- 1 hdfs hadoop  133652 Jan 18 12:17 edits_0000000000433901823-0000000000433902395
-rw-r--r-- 1 hdfs hadoop  918778 Jan 18 12:19 edits_0000000000433902396-0000000000433906383
-rw-r--r-- 1 hdfs hadoop  801672 Jan 18 12:21 edits_0000000000433906384-0000000000433910273
-rw-r--r-- 1 hdfs hadoop   76329 Jan 18 12:23 edits_0000000000433910274-0000000000433910699
-rw-r--r-- 1 hdfs hadoop   90404 Jan 18 12:25 edits_0000000000433910700-0000000000433911201
-rw-r--r-- 1 hdfs hadoop   48435 Jan 18 12:27 edits_0000000000433911202-0000000000433911468
-rw-r--r-- 1 hdfs hadoop  882923 Jan 18 12:29 edits_0000000000433911469-0000000000433915208
-rw-r--r-- 1 hdfs hadoop 1048576 Jan 18 12:31 edits_inprogress_0000000000433915209
-rw-r--r-- 1 hdfs hadoop       8 Jan 18 12:31 committed-txid

So, as such, JournalNodes do not contain any required information that cannot be rebuilt from the NameNode — there is nothing to back up.

Parts nice to backup

HDFS

In essence your Hadoop cluster has surely been built to handle terabytes, if not petabytes, of data, so backing up all your HDFS data is technically not possible. First, HDFS replicates each data block (of dfs.blocksize in size, 128 MB by default) multiple times (the parameter is dfs.replication and is set to 3 in my case), and you have surely configured what is called rack awareness, meaning your worker nodes are physically located in different racks of your computer room.

So, in other words, if you lose one or multiple worker nodes, or even a complete rack of your Hadoop cluster, this is going to be completely transparent to your applications. At worst you might suffer a performance decrease, but no interruption to production (ITP).

But what if you lose the entire data center where your Hadoop cluster is located? We initially had the idea to split our cluster between two data centers geographically separated by 20-30 kilometers (12 to 18 miles), but this would require a (dedicated) low-latency, high-speed link (dark fiber or similar) between the two data centers, which is most probably not cost effective…

This is why the most commonly implemented architecture is a second, smaller cluster on a remote site where you try to keep a copy of your main Hadoop cluster. This copy can be done with the provided Hadoop tool called DistCp, or simply by running the exact same ingestion process on this failover cluster…

Running the same ingestion process on two distinct clusters might sound like a bad idea, but if you store your source raw files on a low-cost NFS filer then, first, you can easily back them up to tape. Secondly, you can feed the same exact copy to two (or more) Hadoop clusters, and in case of a crash or consistency issue you are able to restart the ingestion from the raw files. The secondary cluster can then be, with no issue, smaller than the primary one, as only ingestion will run on it. Interactive queries and users will remain on the primary cluster…

Here I have not mentioned HDFS snapshots at all because, for me, they are not a backup solution at all! They are no different from an NFS snapshot, and the only case they cover is human error. In case of a hardware failure or a data center failure, an HDFS snapshot will be of no help, as you will lose it at the same time as the crash…

References

The post Hadoop backup: what parts to backup and how to do it ? appeared first on IT World.

]]>
https://blog.yannickjaquier.com/hadoop/hadoop-backup-what-parts-to-backup-and-how-to-do-it.html/feed 0
HDFS capacity planning computation and analysis https://blog.yannickjaquier.com/hadoop/hdfs-capacity-planning-generation-analysis.html https://blog.yannickjaquier.com/hadoop/hdfs-capacity-planning-generation-analysis.html#respond Fri, 30 Aug 2019 08:02:42 +0000 https://blog.yannickjaquier.com/?p=4538 Preamble In our ramp up period we wanted to estimate already consumed HDFS size as well as how is split this used space. This would help us build HDFS capacity planning plan and know which investment would be needed. I have found tons of document on how to do it but the snapshot “issue” I […]

The post HDFS capacity planning computation and analysis appeared first on IT World.

]]>

Table of contents

Preamble

In our ramp-up period we wanted to estimate the already consumed HDFS size, as well as how this used space is split. This would help us build an HDFS capacity planning plan and know which investments would be needed. I have found tons of documents on how to do it, but the snapshot “issue” I had was a nice discovery…

HDFS capacity planning first estimation

The first two commands you would really use are:

[hdfs@clientnode ~]$ hdfs dfs -df -h /
Filesystem                          Size    Used  Available  Use%
hdfs://DataLakeHdfs               89.5 T  22.4 T     62.5 T   25%

And:

[hdfs@clientnode ~]$ hdfs dfs -du -s -h /
5.9 T  /

You can drill down into directory sizes with:

[hdfs@clientnode ~]$ hdfs dfs -du -h /
169.0 G   /app-logs
466.7 M   /apps
12.5 G    /ats
3.1 T     /data
710.4 M   /hdp
0         /livy2-recovery
0         /mapred
16.8 M    /mr-history
1004.4 M  /spark2-history
2.1 T     /tmp
479.7 G   /user
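If you want to aggregate such listings programmatically, you first need to convert the human-readable sizes back to bytes. A sketch, assuming the single-letter unit suffixes printed by hdfs dfs -du -h:

```python
UNITS = {'K': 1024, 'M': 1024**2, 'G': 1024**3, 'T': 1024**4, 'P': 1024**5}

def human_to_bytes(size):
    """Convert a size printed by 'hdfs dfs -du -h' (e.g. '169.0 G', '0')
    to a byte count."""
    value, _, unit = size.partition(' ')
    if not unit:
        return int(float(value))  # plain byte count such as '0'
    return int(float(value) * UNITS[unit[0].upper()])

print(human_to_bytes('169.0 G'))  # 181462368256
```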

In HDFS you have dfs.datanode.du.reserved, which specifies a reserved space in bytes per volume. This is set to 1304332800 bytes (about 1243.9 MB) in my environment.

I also have below HDFS parameters that will be part of formula:

Parameter | Value | Description
dfs.datanode.du.reserved | 1304332800 bytes | Reserved space in bytes per volume. Always leave this much space free for non-DFS use.
dfs.blocksize | 128 MB | The default block size for new files, in bytes. You can use the following suffixes (case insensitive): k (kilo), m (mega), g (giga), t (tera), p (peta), e (exa) to specify the size (such as 128k, 512m, 1g, etc.), or provide the complete size in bytes (such as 134217728 for 128 MB).
dfs.replication | 3 | Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.
fs.trash.interval | 360 (minutes) | Number of minutes after which the checkpoint gets deleted. If zero, the trash feature is disabled. This option may be configured both on the server and the client. If trash is disabled server side then the client side configuration is checked. If trash is enabled on the server side then the value configured on the server is used and the client configuration value is ignored.
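These parameters are the inputs of a rough capacity-planning formula: usable logical space ≈ (raw capacity − reserved space) / replication factor. Below is a minimal sketch using this cluster's numbers; the number of data volumes per node is a made-up assumption for illustration (check dfs.datanode.data.dir on your own nodes):

```python
# Rough HDFS capacity-planning estimate: the *logical* (pre-replication)
# space a cluster offers once reserved space and replication are deducted.
def usable_logical_bytes(nodes, capacity_per_node, reserved_per_volume,
                         volumes_per_node, replication):
    raw = nodes * capacity_per_node
    reserved = nodes * volumes_per_node * reserved_per_volume
    return (raw - reserved) / replication

estimate = usable_logical_bytes(
    nodes=5,
    capacity_per_node=19675609717760,   # 17.89 TB per node, from `hdfs dfsadmin -report`
    reserved_per_volume=1304332800,     # dfs.datanode.du.reserved
    volumes_per_node=8,                 # assumption for illustration only
    replication=3,                      # dfs.replication
)
print(round(estimate / 1024**4, 1))  # → 29.8 logical TB available for user data
```

With a replication factor of 3, the 89.5 TB of raw capacity shrinks to roughly 30 TB of usable logical space, which is the number that should drive investment decisions.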

You can get a complete report, with more precise numbers than the hdfs dfs -df -h / command, for all your worker nodes with the command below:

[hdfs@clientnode ~]$ hdfs dfsadmin -report
Configured Capacity: 98378048588800 (89.47 TB)
Present Capacity: 93368566571440 (84.92 TB)
DFS Remaining: 68685157293611 (62.47 TB)
DFS Used: 24683409277829 (22.45 TB)
DFS Used%: 26.44%
Under replicated blocks: 20
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (5):

Name: 10.75.144.13:50010 (worker3.domain.com)
Hostname: worker3.domain.com
Rack: /AH/26
Decommission Status : Normal
Configured Capacity: 19675609717760 (17.89 TB)
DFS Used: 3676038734820 (3.34 TB)
Non DFS Used: 0 (0 B)
DFS Remaining: 14998265052417 (13.64 TB)
DFS Used%: 18.68%
DFS Remaining%: 76.23%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 16
Last contact: Wed Oct 24 14:57:06 CEST 2018
Last Block Report: Wed Oct 24 11:48:58 CEST 2018


Name: 10.75.144.12:50010 (worker2.domain.com)
Hostname: worker2.domain.com
Rack: /AH/26
Decommission Status : Normal
Configured Capacity: 19675609717760 (17.89 TB)
DFS Used: 3884987861604 (3.53 TB)
Non DFS Used: 0 (0 B)
DFS Remaining: 14789450082223 (13.45 TB)
DFS Used%: 19.75%
DFS Remaining%: 75.17%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 14
Last contact: Wed Oct 24 14:57:06 CEST 2018
Last Block Report: Wed Oct 24 09:44:51 CEST 2018


Name: 10.75.144.14:50010 (worker4.domain.com)
Hostname: worker4.domain.com
Rack: /AH/27
Decommission Status : Normal
Configured Capacity: 19675609717760 (17.89 TB)
DFS Used: 6604991718895 (6.01 TB)
Non DFS Used: 0 (0 B)
DFS Remaining: 12068909438191 (10.98 TB)
DFS Used%: 33.57%
DFS Remaining%: 61.34%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 22
Last contact: Wed Oct 24 14:57:06 CEST 2018
Last Block Report: Wed Oct 24 12:36:28 CEST 2018


Name: 10.75.144.11:50010 (worker1.domain.com)
Hostname: worker1.domain.com
Rack: /AH/26
Decommission Status : Normal
Configured Capacity: 19675609717760 (17.89 TB)
DFS Used: 3983207846801 (3.62 TB)
Non DFS Used: 0 (0 B)
DFS Remaining: 14690022328249 (13.36 TB)
DFS Used%: 20.24%
DFS Remaining%: 74.66%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 32
Last contact: Wed Oct 24 14:57:06 CEST 2018
Last Block Report: Wed Oct 24 13:50:10 CEST 2018


Name: 10.75.144.15:50010 (worker5.domain.com)
Hostname: worker5.domain.com
Rack: /AH/27
Decommission Status : Normal
Configured Capacity: 19675609717760 (17.89 TB)
DFS Used: 6534183115709 (5.94 TB)
Non DFS Used: 0 (0 B)
DFS Remaining: 12138510392531 (11.04 TB)
DFS Used%: 33.21%
DFS Remaining%: 61.69%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 40
Last contact: Wed Oct 24 14:57:04 CEST 2018
Last Block Report: Wed Oct 24 10:41:56 CEST 2018

So far, if I do the computation, I get 5.9 TB * 3 (dfs.replication) = 17.7 TB, which is well below the 22.4 TB used reported by the hdfs dfs -df -h / command… Where have the missing 4.7 TB gone? Quite a few TB, isn't it?
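The per-datanode figures can be cross-checked against the cluster total, and the same arithmetic makes the gap explicit. A small sketch with the numbers from the dfsadmin report above:

```python
# Cross-check: the per-datanode "DFS Used" values from `hdfs dfsadmin -report`
# must sum to the cluster-wide total, and comparing raw usage against
# logical size * replication exposes unexplained space.
dfs_used_per_node = [
    3676038734820,  # worker3
    3884987861604,  # worker2
    6604991718895,  # worker4
    3983207846801,  # worker1
    6534183115709,  # worker5
]
total_used = sum(dfs_used_per_node)
print(total_used)  # → 24683409277829, the cluster-wide "DFS Used" from the report

TB = 1024**4
logical_tb = 5.9                   # from `hdfs dfs -du -s -h /`
expected_raw_tb = logical_tb * 3   # dfs.replication = 3
actual_raw_tb = total_used / TB
print(round(expected_raw_tb, 1))                  # → 17.7
print(round(actual_raw_tb - expected_raw_tb, 1))  # → 4.7, the unexplained TB
```

Those 4.7 unexplained TB are exactly what the snapshot investigation in the next section accounts for.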

HDFS snapshot situation

Then, after a bit of investigation, I had the idea to check whether HDFS snapshots had been created on my HDFS:

[hdfs@clientnode ~]$ hdfs lsSnapshottableDir -help
Usage:
hdfs lsSnapshottableDir:
        Get the list of snapshottable directories that are owned by the current user.
        Return all the snapshottable directories if the current user is a super user.

[hdfs@clientnode ~]$ hdfs lsSnapshottableDir
drwxr-xr-x 0 hdfs hdfs 0 2018-07-13 18:14 1 65536 /

You can get snapshot(s) name(s) with:

[hdfs@clientnode ~]$ hdfs dfs -ls /.snapshot
Found 1 items
drwxr-xr-x   - hdfs hdfs          0 2018-07-13 18:14 /.snapshot/s20180713-101304.832

Computing the real snapshot size is not possible, because when a snapshot entry is only a pointer to an original (unmodified) block, the size of that original block is counted anyway:

[hdfs@clientnode ~]$ hdfs dfs -du -h /.snapshot
3.3 T  /.snapshot/s20180713-101304.832

You can also get a graphical view using the NameNode UI in Ambari:

hdfs_capacity_planning01

Here we are: a snapshot of the HDFS root directory has been created… I rate this tricky because you don't see it with an hdfs dfs -du command:

[hdfs@clientnode ~]$ hdfs dfs -du -h /
169.0 G   /app-logs
466.7 M   /apps
4.7 G     /ats
3.1 T     /data
710.4 M   /hdp
0         /livy2-recovery
0         /mapred
0         /mr-history
1004.4 M  /spark2-history
2.1 T     /tmp
173.9 G   /user

I have also performed an HDFS filesystem check to be sure everything is fine and no blocks have been marked corrupted:

[hdfs@clientnode ~]$ hdfs fsck /
.
.

........................
/user/training/.staging/job_1519657336782_0105/job.jar:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1073754565_13755. Target Replicas is 10 but found 5 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/user/training/.staging/job_1519657336782_0105/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1073754566_13756. Target Replicas is 10 but found 5 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.....
/user/training/.staging/job_1519657336782_0105/libjars/hive-hcatalog-core.jar:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1073754564_13754. Target Replicas is 10 but found 5 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.
/user/training/.staging/job_1536057043538_0001/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1085621525_11894367. Target Replicas is 10 but found 5 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.....
/user/training/.staging/job_1536057043538_0002/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1085621527_11894369. Target Replicas is 10 but found 5 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.....
/user/training/.staging/job_1536057043538_0004/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1085621593_11894435. Target Replicas is 10 but found 5 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.....
/user/training/.staging/job_1536057043538_0023/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1085622064_11894906. Target Replicas is 10 but found 5 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.....
/user/training/.staging/job_1536057043538_0025/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1085622086_11894928. Target Replicas is 10 but found 5 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.....
/user/training/.staging/job_1536057043538_0027/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1085622115_11894957. Target Replicas is 10 but found 5 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.....
/user/training/.staging/job_1536057043538_0028/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1085622133_11894975. Target Replicas is 10 but found 5 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.....
/user/training/.staging/job_1536642465198_0002/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1086397707_12670663. Target Replicas is 10 but found 5 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.....
/user/training/.staging/job_1536642465198_0003/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1086397706_12670662. Target Replicas is 10 but found 5 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.....
/user/training/.staging/job_1536642465198_0004/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1086397708_12670664. Target Replicas is 10 but found 5 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.....
/user/training/.staging/job_1536642465198_0005/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1086397718_12670674. Target Replicas is 10 but found 5 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.....
/user/training/.staging/job_1536642465198_0006/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1086397720_12670676. Target Replicas is 10 but found 5 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.....
/user/training/.staging/job_1536642465198_0007/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1086397721_12670677. Target Replicas is 10 but found 5 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
.....
/user/training/.staging/job_1536642465198_2307/job.split:  Under replicated BP-1711156358-10.75.144.1-1519036486930:blk_1086509846_12782817. Target Replicas is 10 but found 5 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
....

Status: HEALTHY
 Total size:    5981414347660 B (Total open files size: 455501 B)
 Total dirs:    740032
 Total files:   3766023
 Total symlinks:                0 (Files currently being written: 17)
 Total blocks (validated):      3781239 (avg. block size 1581866 B) (Total open file blocks (not validated): 17)
 Minimally replicated blocks:   3781239 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       20 (5.2892714E-4 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0000105
 Corrupt blocks:                0
 Missing replicas:              100 (8.8153436E-4 %)
 Number of data-nodes:          5
 Number of racks:               2
FSCK ended at Wed Oct 17 16:12:48 CEST 2018 in 61172 milliseconds


The filesystem under path '/' is HEALTHY

After delete of HDFS snapshot

Get the snapshot name(s) and delete them as shown below. I have also forbidden further creation of snapshots on the root directory (snapshotting / does not make sense in my opinion):

[hdfs@clientnode  ~]$ hdfs dfsadmin -disallowSnapshot /
disallowSnapshot: The directory / has snapshot(s). Please redo the operation after removing all the snapshots.
[hdfs@clientnode ~]$ hdfs dfs -ls /.snapshot
Found 1 items
drwxr-xr-x   - hdfs hdfs          0 2018-07-13 18:14 /.snapshot/s20180713-101304.832
[hdfs@clientnode ~]$ hdfs dfs -deleteSnapshot / s20180713-101304.832
[hdfs@clientnode ~]$ hdfs dfsadmin -disallowSnapshot /
Disallowing snaphot on / succeeded
[hdfs@clientnode ~]$ hdfs lsSnapshottableDir

After a cleaning phase I reached the stable situation below:

[hdfs@clientnode ~]$ hdfs dfs -df -h /
Filesystem                          Size    Used  Available  Use%
hdfs://DataLakeHdfs               89.5 T  16.8 T     68.1 T   19%
[hdfs@clientnode ~]$ hdfs dfs -du -s -h /
5.5 T  /

So the computation is now more accurate, as 5.5 TB * 3 = 16.5 TB, close to the 16.8 TB reported.

As you may have noticed, my /tmp directory is 2.1 TB, which is a lot of space for a temporary directory. In my case, all the occupied space was in directories under /tmp/hive. It turned out to be leftovers from aborted Hive queries, which can be safely deleted (we currently have one directory of 1.7 TB!):

Parameter | Value | Description
hive.exec.scratchdir | /tmp//hive (Hive 0.8.0 and earlier); /tmp/hive- (Hive 0.8.1 to 0.14.0); /tmp/hive (Hive 0.14.0 and later) | This directory is used by Hive to store the plans for the different map/reduce stages of a query, as well as to store the intermediate outputs of these stages. Hive 0.14.0 and later: HDFS root scratch directory for Hive jobs, which gets created with write-all (733) permission. For each connecting user, an HDFS scratch directory ${hive.exec.scratchdir}/ is created with ${hive.scratch.dir.permission}.


The post HDFS capacity planning computation and analysis appeared first on IT World.
