Preamble
To maintain good performance we have developed a script (to be shared in another blog post) that concatenates the partitions of our Hive tables every week. It reduces the number of ORC files per partition and, as such, reduces the number of map and reduce tasks required to access the partitions in Hive queries.
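The idea of the weekly job is simply to loop over the partitions and issue an ALTER TABLE … CONCATENATE statement per partition. Below is a minimal sketch in Python of what such a loop can look like; the JDBC URL, table name and partition list are illustrative assumptions, not the actual script:

#!/usr/bin/env python
# Minimal sketch only: the real script will be shared in another blog post.
import subprocess

# Assumptions: beeline is on the PATH and the JDBC URL matches your HiveServer2
# ZooKeeper discovery setup; the table and partitions below are illustrative.
jdbc_url = ("jdbc:hive2://zookeeper01.domain.com:2181/"
            ";serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2")
table = "prod_ews_refined.tbl_wafer_param_stats"
partitions = [("CTM8", "58053"), ("CTM8", "59591")]  # normally discovered dynamically

for fab, lot_partition in partitions:
    sql = ('alter table {0} partition (fab="{1}", lot_partition="{2}") '
           'concatenate;'.format(table, fab, lot_partition))
    subprocess.check_call(["beeline", "-u", jdbc_url, "-e", sql])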
It all went well until, one week, we started to get an error message for partitions of one of our tables. The funny situation was that for some partitions it still worked well while for others we got an error message.
This table has been created in Hive but we fill the partitions in a PySpark script. So the partitions got created by Spark because, obviously, it is when you insert rows that the partition directories are created in HDFS…
Our cluster is running HDP-2.6.4.0.
The component versions concerned are:
- Hive 1.2.1000
- Spark 2.2.0.2.6.4.0-91
Concatenate command failing due to access rights
The complete error message is the following:
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> alter table prod_ews_refined.tbl_wafer_param_stats partition (fab="CTM8",lot_partition="58053") concatenate; INFO : Session is already open INFO : Dag name: hive_20190830162358_1591b2d1-7489-4f60-8b9f-725fb50a4648 INFO : Tez session was closed. Reopening... -------------------------------------------------------------------------------- VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED -------------------------------------------------------------------------------- File Merge .... RUNNING 5 4 0 1 4 0 -------------------------------------------------------------------------------- VERTICES: 00/01 [====================>>------] 80% ELAPSED TIME: 5.52 s -------------------------------------------------------------------------------- ERROR : Status: Failed ERROR : Vertex failed, vertexName=File Merge, vertexId=vertex_1565718945091_75870_2_00, diagnostics=[Task failed, taskId=task_1565718945091_75870_2_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173) at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.close(MergeFileRecordProcessor.java:184) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:164) ... 14 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close AbstractFileMergeOperator at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:272) at org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:250) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:620) at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.close(MergeFileRecordProcessor.java:176) ... 
15 more Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=mfgdl_ingestion, access=EXECUTE, inode="/apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/.hive-staging_hive_2019-08-30_16-23-58_034_3892822656895858873-3012/_tmp.-ext-10000/000000_0_copy_1/000000_0_copy_7":mfgdl_ingestion:hadoop:-rw-r--r-- at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:353) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:292) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:238) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1950) at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:108) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4142) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1137) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:866) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2167) at org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1442) at org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1438) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1454) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1447) at org.apache.hadoop.hive.ql.exec.Utilities.moveFile(Utilities.java:1807) at org.apache.hadoop.hive.ql.exec.Utilities.renameOrMoveFiles(Utilities.java:1843) at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:258) ... 
18 more Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=mfgdl_ingestion, access=EXECUTE, inode="/apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/.hive-staging_hive_2019-08-30_16-23-58_034_3892822656895858873-3012/_tmp.-ext-10000/000000_0_copy_1/000000_0_copy_7":mfgdl_ingestion:hadoop:-rw-r--r-- at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:353) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:292) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:238) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1950) at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:108) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4142) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1137) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:866) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554) at org.apache.hadoop.ipc.Client.call(Client.java:1498) at org.apache.hadoop.ipc.Client.call(Client.java:1398) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:823) at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185) at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2165) ... 
26 more ], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.io.IOException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173) at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:266) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:140) at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113) at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.run(MergeFileRecordProcessor.java:150) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150) ... 14 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedConstructorAccessor18.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:252) ... 
18 more Caused by: java.io.FileNotFoundException: File does not exist: /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000000_0_copy_5 at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2025) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1996) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:700) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1240) at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1225) at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213) at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:309) at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:274) at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:266) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1538) at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:332) at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:327) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:340) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:786) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractFileTail(ReaderImpl.java:355) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:319) at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:241) at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeRecordReader.<init>(OrcFileStripeMergeRecordReader.java:47) at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeInputFormat.getRecordReader(OrcFileStripeMergeInputFormat.java:37) at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:67) ... 22 more Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000000_0_copy_5 at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2025) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1996) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:700) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554) at org.apache.hadoop.ipc.Client.call(Client.java:1498) at org.apache.hadoop.ipc.Client.call(Client.java:1398) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) at com.sun.proxy.$Proxy11.getBlockLocations(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:272) at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185) at com.sun.proxy.$Proxy12.getBlockLocations(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1238) ... 
39 more ], TaskAttempt 2 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.io.IOException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173) at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:266) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:140) at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113) at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.run(MergeFileRecordProcessor.java:150) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150) ... 14 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:252) ... 
18 more Caused by: java.io.FileNotFoundException: File does not exist: /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000000_0_copy_2 at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2025) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1996) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:700) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1240) at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1225) at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213) at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:309) at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:274) at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:266) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1538) at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:332) at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:327) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:340) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:786) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractFileTail(ReaderImpl.java:355) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:319) at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:241) at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeRecordReader.<init>(OrcFileStripeMergeRecordReader.java:47) at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeInputFormat.getRecordReader(OrcFileStripeMergeInputFormat.java:37) at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:67) ... 23 more Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000000_0_copy_2 at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2025) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1996) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:700) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554) at org.apache.hadoop.ipc.Client.call(Client.java:1498) at org.apache.hadoop.ipc.Client.call(Client.java:1398) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) at com.sun.proxy.$Proxy11.getBlockLocations(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:272) at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185) at com.sun.proxy.$Proxy12.getBlockLocations(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1238) ... 
40 more ], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.io.IOException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173) at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:266) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:140) at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113) at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.run(MergeFileRecordProcessor.java:150) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150) ... 14 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:252) ... 
18 more Caused by: java.io.FileNotFoundException: File does not exist: /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000000_0_copy_2 at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2025) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1996) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:700) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1240) at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1225) at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213) at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:309) at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:274) at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:266) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1538) at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:332) at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:327) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:340) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:786) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractFileTail(ReaderImpl.java:355) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:319) at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:241) at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeRecordReader.<init>(OrcFileStripeMergeRecordReader.java:47) at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeInputFormat.getRecordReader(OrcFileStripeMergeInputFormat.java:37) at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:67) ... 23 more Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000000_0_copy_2 at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2025) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1996) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:700) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554) at org.apache.hadoop.ipc.Client.call(Client.java:1498) at org.apache.hadoop.ipc.Client.call(Client.java:1398) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) at com.sun.proxy.$Proxy11.getBlockLocations(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:272) at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185) at com.sun.proxy.$Proxy12.getBlockLocations(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1238) ... 40 more ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1565718945091_75870_2_00 [File Merge] killed/failed due to:OWN_TASK_FAILURE] ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=2) |
While, as written above, for some other partitions all is still working fine:
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> alter table prod_ews_refined.tbl_wafer_param_stats partition (fab="CTM8",lot_partition="59591") concatenate; INFO : Session is already open INFO : Dag name: hive_20190830145138_334957cb-f329-43af-953b-d03e213c9b03 INFO : Tez session was closed. Reopening... INFO : Session re-established. INFO : Status: Running (Executing on YARN cluster with App id application_1565718945091_74778) -------------------------------------------------------------------------------- VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED -------------------------------------------------------------------------------- File Merge ..... SUCCEEDED 4 4 0 0 0 0 -------------------------------------------------------------------------------- VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 31.15 s -------------------------------------------------------------------------------- INFO : Loading data to table prod_ews_refined.tbl_wafer_param_stats partition (fab=CTM8, lot_partition=59591) from hdfs://ManufacturingDataLakeHdfs/apps/hive/warehouse/prod_ews_refined.db/tbl_wafe r_param_stats/fab=CTM8/lot_partition=59591/.hive-staging_hive_2019-08-30_14-51-38_835_3100511055765258809-3012/-ext-10000 INFO : Partition prod_ews_refined.tbl_wafer_param_stats{fab=CTM8, lot_partition=59591} stats: [numFiles=4, totalSize=107827] No rows affected (56.472 seconds) |
If we extract the root cause from the long error message above, it is:
Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=mfgdl_ingestion, access=EXECUTE, inode="/apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/.hive-staging_hive_2019-08-30_16-23-58_034_3892822656895858873-3012/_tmp.-ext-10000/000000_0_copy_1/000000_0_copy_7":mfgdl_ingestion:hadoop:-rw-r--r-- |
Clearly there is an access rights problem with the concatenate command. The puzzle is that the command is launched with the exact same user as the one that created and fills the partitions and, more importantly, the error happens randomly on only a few partitions… No doubt we are hitting a bug; I cannot say whether it is Spark or Hive related…
HDFS default access rights with Hive and Spark
I first noticed that for the failing partition the rights on the ORC files are not all the same (a mix of 755, rwxr-xr-x, and 644, rw-r--r--):
hdfs@clientnode:~$ hdfs dfs -ls /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053 Found 35 items . . -rwxr-xr-x 3 mfgdl_ingestion hadoop 1959 2019-08-25 18:12 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000001_0_copy_8 -rwxr-xr-x 3 mfgdl_ingestion hadoop 2563 2019-08-25 18:12 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000002_0_copy_1 -rwxr-xr-x 3 mfgdl_ingestion hadoop 1967 2019-08-25 18:11 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000002_0_copy_2 -rwxr-xr-x 3 mfgdl_ingestion hadoop 2190 2019-08-25 18:08 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000002_0_copy_3 -rwxr-xr-x 3 mfgdl_ingestion hadoop 1985 2019-08-25 18:11 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000002_0_copy_4 -rwxr-xr-x 3 mfgdl_ingestion hadoop 3508 2019-08-26 20:19 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000004_0 -rw-r--r-- 3 mfgdl_ingestion hadoop 2009 2019-08-30 06:18 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/part-00004-9be63b72-9ccc-48f9-a422-b9a6420a3f6f.c000.zlib.orc -rw-r--r-- 3 mfgdl_ingestion hadoop 1959 2019-08-27 21:16 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/part-00007-dd3cce93-8301-42b4-add0-570ee27a5d66.c000.zlib.orc -rw-r--r-- 3 mfgdl_ingestion hadoop 2133 2019-08-27 21:16 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/part-00030-dd3cce93-8301-42b4-add0-570ee27a5d66.c000.zlib.orc -rw-r--r-- 3 mfgdl_ingestion hadoop 2137 2019-08-30 06:18 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/part-00047-9be63b72-9ccc-48f9-a422-b9a6420a3f6f.c000.zlib.orc . . . |
While for a working partition they are all the same (755, rwxr-xr-x):
hdfs@clientnode:~$ hdfs dfs -ls /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=59591 Found 4 items -rwxr-xr-x 3 mfgdl_ingestion hadoop 41397 2019-08-30 14:52 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=59591/000000_0 -rwxr-xr-x 3 mfgdl_ingestion hadoop 39713 2019-08-30 14:52 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=59591/000001_0 -rwxr-xr-x 3 mfgdl_ingestion hadoop 21324 2019-08-30 14:52 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=59591/000002_0 -rwxr-xr-x 3 mfgdl_ingestion hadoop 5393 2019-08-30 14:52 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=59591/000003_0 |
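As the error shows up randomly per partition, one handy way to spot other affected partitions is to scan the table directory for data files missing the execute bit. A small sketch, assuming the standard hdfs dfs -ls output format and the paths used in this post:

import subprocess

table_dir = "/apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats"

# hdfs dfs -ls -R prints: permissions replication owner group size date time path
out = subprocess.check_output(["hdfs", "dfs", "-ls", "-R", table_dir])
for line in out.decode().splitlines():
    fields = line.split()
    if len(fields) < 8:
        continue  # skip "Found N items" and empty lines
    perms, path = fields[0], fields[-1]
    # Regular files (leading "-") with no execute bit anywhere, e.g. -rw-r--r--
    if perms.startswith("-") and "x" not in perms:
        print("{0} {1}".format(perms, path))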
I have also tried to check whether all the files really belong to the partition (no ghost files), and apparently they do. From Hive we can only verify that the number of files (totalNumberFiles) is consistent; we cannot get the actual list of HDFS files that make up the partition:
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> use prod_ews_refined; No rows affected (0.438 seconds) 0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> show table extended like tbl_wafer_param_stats partition (fab="CTM8",lot_partition="58053"); +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+ | tab_name | +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+ | tableName:tbl_wafer_param_stats | | owner:mfgdl_ingestion | | location:hdfs://ManufacturingDataLakeHdfs/apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053 | | inputformat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat | | outputformat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat | | columns:struct columns { string start_t, string finish_t, string lot_id, string wafer_id, string flow_id, i32 param_id, string param_name, float param_low_limit, float param_high_limit, string param_unit, string ingestion_date, float param_p01, float param_q1, float param_median, float param_q3, float param_p99, float param_min_value, double param_avg_value, float param_max_value, double param_stddev, i64 nb_dies_tested, i64 nb_dies_failed} | | partitioned:true | | partitionColumns:struct partition_columns { string fab, string lot_partition} | | totalNumberFiles:34 | | totalFileSize:88754 | | maxFileSize:16710 | | minFileSize:1954 | | lastAccessTime:1567138817781 | | lastUpdateTime:1567179342705 | | | +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+ 15 rows selected (0.444 seconds) |
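Since Hive only reports the file count, a quick cross-check of totalNumberFiles against what is really stored in HDFS can be scripted as below (a sketch; note that hidden .hive-staging directories, if any are left behind, are counted too):

import subprocess

part_dir = ("/apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/"
            "fab=CTM8/lot_partition=58053")

# hdfs dfs -count prints: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
out = subprocess.check_output(["hdfs", "dfs", "-count", part_dir])
dir_count, file_count, content_size, pathname = out.split()[:4]
# Hive reported totalNumberFiles=34 above; compare with what HDFS sees.
print("HDFS file count: %s (Hive totalNumberFiles: 34)" % file_count.decode())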
Remark
I have tried the IN|FROM database_name clause as described in the official Hive documentation:
SHOW TABLE EXTENDED [IN|FROM database_name] LIKE 'identifier_with_wildcards' [PARTITION(partition_spec)]; |
But I have not been able to make it work, so I finally decided to use the USE database_name statement…
As written above, this table is filled using PySpark with code like:
dataframe.write.mode('append').format('orc').option("compression","zlib").partitionBy('fab','lot_partition').saveAsTable("prod_ews_refined.tbl_wafer_param_stats") |
I have checked the default HDFS mask:
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> set fs.permissions.umask-mode; +--------------------------------+--+ | set | +--------------------------------+--+ | fs.permissions.umask-mode=022 | +--------------------------------+--+ 1 row selected (0.061 seconds) |
It means that files will be created with 644, rw-r--r-- (666 - 022), and directories will be created with 755, rwxr-xr-x (777 - 022) by default.
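As a quick sanity check of this arithmetic, the effective mode is the default mode with the umask bits cleared, which the subtraction shorthand matches here:

# Effective mode = default mode & ~umask (here the "666 - 022" shorthand matches).
umask = 0o022
print(oct(0o666 & ~umask))  # 0o644 -> rw-r--r-- for files
print(oct(0o777 & ~umask))  # 0o755 -> rwxr-xr-x for directories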
But digging a bit inside the directory tree of my database:
hdfs@clientnode:~$ hdfs dfs -ls /apps/hive/warehouse/prod_ews_refined.db/ Found 7 items drwxrwxrwx - mfgdl_ingestion hadoop 0 2019-09-02 10:12 /apps/hive/warehouse/prod_ews_refined.db/tbl_bin_param_stat drwxrwxrwx - mfgdl_ingestion hadoop 0 2019-08-31 01:38 /apps/hive/warehouse/prod_ews_refined.db/tbl_die_bin drwxrwxrwx - mfgdl_ingestion hadoop 0 2019-09-02 10:36 /apps/hive/warehouse/prod_ews_refined.db/tbl_die_param_flow drwxrwxrwx - mfgdl_ingestion hadoop 0 2019-09-02 10:34 /apps/hive/warehouse/prod_ews_refined.db/tbl_die_sp drwxrwxrwx - mfgdl_ingestion hadoop 0 2019-08-30 19:26 /apps/hive/warehouse/prod_ews_refined.db/tbl_stdf drwxr-xr-x - mfgdl_ingestion hadoop 0 2019-09-02 10:39 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats drwxrwxrwx - mfgdl_ingestion hadoop 0 2019-08-30 12:02 /apps/hive/warehouse/prod_ews_refined.db/tbl_wsr_map hdfs@clientnode:~$ hdfs dfs -ls -d /apps/hive/warehouse/prod_ews_refined.db drwxrwxrwx - mfgdl_ingestion hadoop 0 2019-07-16 23:17 /apps/hive/warehouse/prod_ews_refined.db hdfs@clientnode:~$ hdfs dfs -ls /apps/hive/ Found 1 items drwxrwxrwx - hive hadoop 0 2019-09-02 10:09 /apps/hive/warehouse |
The parameter below explains why we always have 777 on sub-directories. Despite its name, I have seen that this parameter applies to files as well, not only to sub-directories (!!):
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> set hive.warehouse.subdir.inherit.perms; +-------------------------------------------+--+ | set | +-------------------------------------------+--+ | hive.warehouse.subdir.inherit.perms=true | +-------------------------------------------+--+ 1 row selected (0.103 seconds) |
So this explains why the directories and HDFS files (ORC for me) of tables created and filled by Hive have 777. For partitions created by Spark, the default HDFS umask (fs.permissions.umask-mode) applies, and 755 (rwxr-xr-x) for directories and 644 (rw-r--r--) for files is the expected behavior.
So to solve the execute permission issue on the partition I have issued:
hdfs@clientnode:~$ hdfs dfs -chmod 755 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/part* |
And the concatenate went well this time:
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> alter table prod_ews_refined.tbl_wafer_param_stats partition (fab="CTM8",lot_partition="58053") concatenate; INFO : Session is already open INFO : Dag name: hive_20190902152242_ac7ab990-fdfd-4094-89b4-4926c49364ee INFO : Tez session was closed. Reopening... INFO : Session re-established. INFO : Status: Running (Executing on YARN cluster with App id application_1565718945091_85673) -------------------------------------------------------------------------------- VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED -------------------------------------------------------------------------------- File Merge ..... SUCCEEDED 5 5 0 0 0 0 -------------------------------------------------------------------------------- VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 12.87 s -------------------------------------------------------------------------------- INFO : Loading data to table prod_ews_refined.tbl_wafer_param_stats partition (fab=CTM8, lot_partition=58053) from hdfs://ManufacturingDataLakeHdfs/apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/.hive-staging_hive_2019-09-02_15-22-42_356_8162954430835088222-15910/-ext-10000 INFO : Partition prod_ews_refined.tbl_wafer_param_stats{fab=CTM8, lot_partition=58053} stats: [numFiles=8, numRows=57, totalSize=65242, rawDataSize=42834] No rows affected (29.993 seconds) |
With the number of files reduced drastically:
hdfs@clientnode:~$ hdfs dfs -ls /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053 Found 9 items drwxr-xr-x - mfgdl_ingestion hadoop 0 2019-08-29 19:20 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/.hive-staging_hive_2019-08-29_18-10-26_174_7966264139569655341-114 -rwxr-xr-x 3 mfgdl_ingestion hadoop 20051 2019-09-02 15:23 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000000_0 -rwxr-xr-x 3 mfgdl_ingestion hadoop 19020 2019-09-02 15:23 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000001_0 -rwxr-xr-x 3 mfgdl_ingestion hadoop 2009 2019-08-30 06:18 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000001_0_copy_1 -rwxr-xr-x 3 mfgdl_ingestion hadoop 1959 2019-08-27 21:16 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000001_0_copy_2 -rwxr-xr-x 3 mfgdl_ingestion hadoop 2163 2019-08-27 03:42 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000001_0_copy_3 -rwxr-xr-x 3 mfgdl_ingestion hadoop 14413 2019-09-02 15:23 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000002_0 -rwxr-xr-x 3 mfgdl_ingestion hadoop 3508 2019-09-02 15:23 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000003_0 -rwxr-xr-x 3 mfgdl_ingestion hadoop 2119 2019-09-02 15:23 /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=CTM8/lot_partition=58053/000004_0 |
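To avoid patching rights by hand after each Spark load, one option, which I give here as an assumption rather than something validated in this post, would be to relax the umask directly in the PySpark session, so that the partition directories created by Spark get 777 (which, as the next section shows, is what makes the concatenate pass):

from pyspark.sql import SparkSession

# Assumption: the standard Hadoop property fs.permissions.umask-mode can be
# passed through the spark.hadoop.* prefix so that directories and files
# written by saveAsTable are created with umask 000 (777 directories, 666
# files). As insecure as the Hive defaults discussed in this post.
spark = (SparkSession.builder
         .appName("ews_refined_ingestion")  # hypothetical application name
         .config("spark.hadoop.fs.permissions.umask-mode", "000")
         .enableHiveSupport()
         .getOrCreate())

# Then the exact same write as above:
# dataframe.write.mode('append').format('orc').option("compression", "zlib") \
#     .partitionBy('fab', 'lot_partition') \
#     .saveAsTable("prod_ews_refined.tbl_wafer_param_stats")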
To go further
To try to clarify this story of default access rights with hive.warehouse.subdir.inherit.perms and fs.permissions.umask-mode, I have created a test table with those parameters set respectively to true and 022:
DROP TABLE default.yannick01 PURGE;
CREATE TABLE default.yannick01 (value string)
PARTITIONED BY (fab string, int_partition int)
STORED AS ORC;
I inserted a dummy record with:
INSERT INTO DEFAULT.yannick01 PARTITION (fab="CTM8", int_partition=1) VALUES ("One"); |
As expected I have 777 for all directories and files because my /apps/hive/warehouse directory is 777:
hdfs@clientnode:~$ hdfs dfs -ls -d /apps/hive/warehouse/yannick01 drwxrwxrwx - mfgdl_ingestion hadoop 0 2019-09-03 11:21 /apps/hive/warehouse/yannick01 hdfs@clientnode:~$ hdfs dfs -ls /apps/hive/warehouse/yannick01 Found 1 items drwxrwxrwx - mfgdl_ingestion hadoop 0 2019-09-02 18:03 /apps/hive/warehouse/yannick01/fab=CTM8 hdfs@clientnode:~$ hdfs dfs -ls /apps/hive/warehouse/yannick01/fab=CTM8 Found 1 items drwxrwxrwx - mfgdl_ingestion hadoop 0 2019-09-02 18:03 /apps/hive/warehouse/yannick01/fab=CTM8/int_partition=1 hdfs@clientnode:~$ hdfs dfs -ls /apps/hive/warehouse/yannick01/fab=CTM8/int_partition=1 Found 1 items -rwxrwxrwx 3 mfgdl_ingestion hadoop 214 2019-09-02 18:04 /apps/hive/warehouse/yannick01/fab=CTM8/int_partition=1/000000_0 |
Now if I re-execute the creation script and set this:
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> set hive.warehouse.subdir.inherit.perms=false; No rows affected (0.003 seconds) 0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> set hive.warehouse.subdir.inherit.perms; +--------------------------------------------+--+ | set | +--------------------------------------------+--+ | hive.warehouse.subdir.inherit.perms=false | +--------------------------------------------+--+ 1 row selected (0.005 seconds) |
I get:
hdfs@clientnode:~$ hdfs dfs -ls -d /apps/hive/warehouse/yannick01 drwxrwxrwx - mfgdl_ingestion hadoop 0 2019-09-03 11:21 /apps/hive/warehouse/yannick01 hdfs@clientnode:~$ hdfs dfs -ls /apps/hive/warehouse/yannick01/fab=CTM8 Found 1 items drwxrwxrwx - mfgdl_ingestion hadoop 0 2019-09-03 11:22 /apps/hive/warehouse/yannick01/fab=CTM8/int_partition=1 hdfs@clientnode:~$ hdfs dfs -ls /apps/hive/warehouse/yannick01/fab=CTM8/int_partition=1 Found 1 items -rw-r--r-- 3 mfgdl_ingestion hadoop 223 2019-09-03 11:22 /apps/hive/warehouse/yannick01/fab=CTM8/int_partition=1/000000_0 |
And the concatenate command works fine because the directory containing the ORC HDFS files has 777. So it is not required to set the execute permission on the ORC files: having 777 (write for group and others, hdfs dfs -chmod g+w,o+w, or simply hdfs dfs -chmod 777) on the directory where the files are stored also works. But clearly this is much less secure…
0: jdbc:hive2://zookeeper01.domain.com:2181,zoo> alter table default.yannick01 partition (fab="CTM8", int_partition=1) concatenate; INFO : Session is already open INFO : Dag name: hive_20190904163205_31445337-20a1-42d3-80ac-e08028d6a2a1 INFO : Status: Running (Executing on YARN cluster with App id application_1565718945091_100063) INFO : Loading data to table default.yannick01 partition (fab=CTM8, int_partition=1) from hdfs://ManufacturingDataLakeHdfs/apps/hive/warehouse/yannick01/fab=CTM8/int_partition=1/.hive-staging_hive_2019-09-04_16-32-05_102_6350061968665706892-148/-ext-10000 INFO : Partition default.yannick01{fab=CTM8, int_partition=1} stats: [numFiles=2, numRows=1, totalSize=158170, rawDataSize=87] No rows affected (3.291 seconds) |
Worked well till…
It all worked as expected until we hit a partition with lots of files; even after setting the correct rights, it failed with:
ERROR : Status: Failed ERROR : Vertex failed, vertexName=File Merge, vertexId=vertex_1565718945091_143224_2_00, diagnostics=[Task failed, taskId=task_1565718945091_143224_2_00_000001, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173) at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.close(MergeFileRecordProcessor.java:184) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:164) ... 14 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close AbstractFileMergeOperator at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:272) at org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:250) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:620) at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.close(MergeFileRecordProcessor.java:176) ... 
15 more Caused by: org.apache.hadoop.ipc.RemoteException(java.lang.IllegalStateException): Current inode is not a directory: 000001_0_copy_21(INodeFile@1491d045), parentDir=_tmp.-ext-10000/ at org.apache.hadoop.hdfs.server.namenode.INode.asDirectory(INode.java:331) at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.verifyFsLimitsForRename(FSDirRenameOp.java:117) at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.unprotectedRenameTo(FSDirRenameOp.java:189) at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameTo(FSDirRenameOp.java:492) at org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameToInt(FSDirRenameOp.java:73) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:3938) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rename(NameNodeRpcServer.java:993) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.rename(ClientNamenodeProtocolServerSideTranslatorPB.java:587) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554) at org.apache.hadoop.ipc.Client.call(Client.java:1498) at org.apache.hadoop.ipc.Client.call(Client.java:1398) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) at com.sun.proxy.$Proxy11.rename(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.rename(ClientNamenodeProtocolTranslatorPB.java:529) at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185) at com.sun.proxy.$Proxy12.rename(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:2006) at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:732) at org.apache.hadoop.hive.ql.exec.Utilities.moveFile(Utilities.java:1815) at org.apache.hadoop.hive.ql.exec.Utilities.renameOrMoveFiles(Utilities.java:1843) at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:258) ... 
18 more ], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.io.IOException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173) at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:266) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:140) at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113) at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.run(MergeFileRecordProcessor.java:150) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150) ... 14 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedConstructorAccessor18.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:252) ... 
18 more Caused by: java.io.FileNotFoundException: File does not exist: /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats_old/fab=C2WF/lot_partition=Q9053/000001_0_copy_993 at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2025) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1996) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:700) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1240) at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1225) at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213) at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:309) at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:274) at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:266) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1538) at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:332) at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:327) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:340) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:786) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractFileTail(ReaderImpl.java:355) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:319) at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:241) at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeRecordReader.<init>(OrcFileStripeMergeRecordReader.java:47) at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeInputFormat.getRecordReader(OrcFileStripeMergeInputFormat.java:37) at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:67) ... 22 more Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats_old/fab=C2WF/lot_partition=Q9053/000001_0_copy_993 at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2025) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1996) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:700) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554) at org.apache.hadoop.ipc.Client.call(Client.java:1498) at org.apache.hadoop.ipc.Client.call(Client.java:1398) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) at com.sun.proxy.$Proxy11.getBlockLocations(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:272) at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185) at com.sun.proxy.$Proxy12.getBlockLocations(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1238) ... 
39 more ], TaskAttempt 2 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.io.IOException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173) at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:266) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:140) at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113) at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.run(MergeFileRecordProcessor.java:150) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150) ... 14 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:252) ... 
18 more Caused by: java.io.FileNotFoundException: File does not exist: /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats_old/fab=C2WF/lot_partition=Q9053/000001_0_copy_969 at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2025) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1996) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:700) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1240) at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1225) at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213) at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:309) at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:274) at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:266) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1538) at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:332) at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:327) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:340) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:786) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractFileTail(ReaderImpl.java:355) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:319) at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:241) at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeRecordReader.<init>(OrcFileStripeMergeRecordReader.java:47) at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeInputFormat.getRecordReader(OrcFileStripeMergeInputFormat.java:37) at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:67) ... 23 more Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats_old/fab=C2WF/lot_partition=Q9053/000001_0_copy_969 at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2025) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1996) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:700) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554) at org.apache.hadoop.ipc.Client.call(Client.java:1498) at org.apache.hadoop.ipc.Client.call(Client.java:1398) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) at com.sun.proxy.$Proxy11.getBlockLocations(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:272) at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185) at com.sun.proxy.$Proxy12.getBlockLocations(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1238) ... 
40 more ], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.io.IOException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173) at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:266) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:140) at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113) at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.run(MergeFileRecordProcessor.java:150) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150) ... 14 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:252) ... 
18 more Caused by: java.io.FileNotFoundException: File does not exist: /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats_old/fab=C2WF/lot_partition=Q9053/000001_0_copy_969 at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2025) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1996) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:700) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1240) at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1225) at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213) at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:309) at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:274) at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:266) at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1538) at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:332) at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:327) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:340) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:786) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractFileTail(ReaderImpl.java:355) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:319) at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:241) at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeRecordReader.<init>(OrcFileStripeMergeRecordReader.java:47) at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeInputFormat.getRecordReader(OrcFileStripeMergeInputFormat.java:37) at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:67) ... 23 more Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats_old/fab=C2WF/lot_partition=Q9053/000001_0_copy_969 at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2025) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1996) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:700) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554) at org.apache.hadoop.ipc.Client.call(Client.java:1498) at org.apache.hadoop.ipc.Client.call(Client.java:1398) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) at com.sun.proxy.$Proxy11.getBlockLocations(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:272) at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185) at com.sun.proxy.$Proxy12.getBlockLocations(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1238) ... 40 more ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:1, Vertex vertex_1565718945091_143224_2_00 [File Merge] killed/failed due to:OWN_TASK_FAILURE] ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=2) |
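A quick way to see how many files the problematic partition actually holds (the path is the one from the FileNotFoundException above):

hdfs dfs -ls /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats_old/fab=C2WF/lot_partition=Q9053 | wc -l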
I initially thought it could be an HDFS filesystem issue, so:
hdfs@clientnode:~$ hdfs fsck /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats_old/fab=C2WF/lot_partition=Q9053/ Connecting to namenode via http://namenode01.domain.com:50070/fsck?ugi=hdfs&path=%2Fapps%2Fhive%2Fwarehouse%2Fprod_ews_refined.db%2Ftbl_wafer_param_stats_old%2Ffab%3DC2WF%2Flot_partition%3DQ9053 FSCK started by hdfs (auth:SIMPLE) from /10.75.144.5 for path /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats_old/fab=C2WF/lot_partition=Q9053 at Mon Sep 16 17:01:29 CEST 2019 Status: HEALTHY Total size: 159181622 B Total dirs: 1 Total files: 21778 Total symlinks: 0 Total blocks (validated): 21778 (avg. block size 7309 B) Minimally replicated blocks: 21778 (100.0 %) Over-replicated blocks: 0 (0.0 %) Under-replicated blocks: 0 (0.0 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor: 3 Average block replication: 3.0 Corrupt blocks: 0 Missing replicas: 0 (0.0 %) Number of data-nodes: 8 Number of racks: 2 FSCK ended at Mon Sep 16 17:01:29 CEST 2019 in 377 milliseconds The filesystem under path '/apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats_old/fab=C2WF/lot_partition=Q9053' is HEALTHY |
The filesystem being healthy, I turned to the Tez logs. To read them in a convenient manner I strongly encourage using Tez View. To find the right application, note the application id printed when executing the command in beeline:
. . INFO : Status: Running (Executing on YARN cluster with App id application_1565718945091_161035) . . |
Then access the application using a URL like http://resourcemanager01.domain.com:8088/cluster/app/application_1565718945091_142464 and click on the ApplicationMaster link in the displayed page. Finally, open the Dag tab and drill into the details of the failed DAG.
You can also dump the full logs to a text file using:
[yarn@clientnode ~]$ yarn logs -applicationId application_1565718945091_141922 > /tmp/yan.txt |
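Once dumped, grepping for the usual suspects helps to surface the interesting lines in such a big file (a quick sketch reusing the file generated above):

grep -niE 'error|exception' /tmp/yan.txt | less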
In this huge amount of logs I have obviously seen plenty of strange error messages, but none of them led to a clear conclusion:
- |OrcFileMergeOperator|: Incompatible ORC file merge! Writer version mismatch for
- |tez.RecordProcessor|: Hit error while closing operators – failing tree
- |tez.TezProcessor|: java.lang.RuntimeException: Hive Runtime Error while closing operators
- org.apache.hadoop.ipc.RemoteException(java.lang.IllegalStateException): Current inode is not a directory:
Yet, while the command kept failing, the number of files in the partition was decreasing between two successive checks:
hdfs@clientnode:~$ hdfs dfs -ls /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=C2WF/lot_partition=Q9053 | wc -l 21693 hdfs@clientnode:~$ hdfs dfs -ls /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=C2WF/lot_partition=Q9053 | wc -l 21642 |
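To follow this live rather than re-running the command by hand, a small loop does the job (a sketch; the 60-second interval is arbitrary):

while true; do
  hdfs dfs -ls /apps/hive/warehouse/prod_ews_refined.db/tbl_wafer_param_stats/fab=C2WF/lot_partition=Q9053 | wc -l
  sleep 60
done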
So clearly another bug…
Last but not least, one of my teammates noticed a strange behavior when doing a simple count on a partition. He generated a pure Hive table by inserting the rows of the table created by Spark, along the lines of the sketch below; the same count on the two tables then returns different results:
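A sketch of how that copy can be produced (only the table names come from the queries below; the exact DDL is assumed):

-- dynamic partitioning so fab and lot_partition are taken from the SELECT
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE prod_ews_refined.tbl_hive_generated PARTITION (fab, lot_partition)
SELECT * FROM prod_ews_refined.tbl_spark_generated;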
0: jdbc:hive2://zookeeer01.domain.com:2181,zoo> select count(*) from prod_ews_refined.tbl_hive_generated where fab="R8WF" and lot_partition="G3473"; +------+--+ | _c0 | +------+--+ | 137 | +------+--+ 1 row selected (0.045 seconds) 0: jdbc:hive2://zookeeer01.domain.com:2181,zoo> select count(*) from prod_ews_refined.tbl_spark_generated where fab="R8WF" and lot_partition="G3473"; +------+--+ | _c0 | +------+--+ | 130 | +------+--+ 1 row selected (0.058 seconds) |
But when exporting the rows of the Spark-generated table to a CSV file (--outformat=csv2), the output is the correct one.
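The export command looks something like this (the JDBC URL and output file are illustrative; the query is the one from the counts above):

beeline -u 'jdbc:hive2://zookeeper01.domain.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2' \
  --outformat=csv2 \
  -e 'select * from prod_ews_refined.tbl_spark_generated where fab="R8WF" and lot_partition="G3473"' > /tmp/export.csv

And the run returns the expected 137 rows: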
Connected to: Apache Hive (version 1.2.1000.2.6.4.0-91) Driver: Hive JDBC (version 1.2.1000.2.6.4.0-91) Transaction isolation: TRANSACTION_REPEATABLE_READ INFO : Tez session hasn't been created yet. Opening session INFO : Dag name: select * from prod_ew...ot_partition="G3473"(Stage-1) INFO : Status: Running (Executing on YARN cluster with App id application_1565718945091_133078) -------------------------------------------------------------------------------- VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED -------------------------------------------------------------------------------- Map 1 .......... SUCCEEDED 2 2 0 0 0 0 -------------------------------------------------------------------------------- VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 2.11 s -------------------------------------------------------------------------------- 137 rows selected (7.998 seconds) |
So one more bug, which has led to the project of upgrading our cluster to the latest HDP version to prepare the migration to Cloudera, as Hortonworks is dead…
In the meantime, the non-satisfactory workaround we have implemented is to fill a pure Hive table with an INSERT AS SELECT from the Spark-generated table…
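With such a Hive-created copy in place, the weekly concatenate can run against it instead of the Spark-written table (a minimal sketch following the INSERT pattern shown earlier; the _hive suffix is a hypothetical name for the copy):

ALTER TABLE prod_ews_refined.tbl_wafer_param_stats_hive PARTITION (fab="CTM8", lot_partition="58053") CONCATENATE;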