Monday, December 28, 2015

MR debugging by taking JVM heap dumps

Taking a heap dump manually:
jmap -histo:live <pid> (histogram of live objects)
jmap -dump:live,format=b,file=file-name.bin <pid> (dumps the JVM heap to a file on disk)
  1. Log on to the datanode where the map/reduce JVM is running and run ps -eaf | grep attempt_id to get the pid.
  2. Run the jmap command with sudo -u as the appropriate user (the one that owns the task JVM) to get the heap dump.
  3. Never use the -F (force) option while taking the dump with jmap.
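The manual steps above can be strung together in a small shell sketch. The helper names pid_from_ps and dump_heap are made up for illustration and are not part of Hadoop or the JDK. The sketch only assumes that ps, awk, sudo and jmap are available on the datanode.

```shell
#!/bin/sh
# Sketch only: pid_from_ps and dump_heap are illustrative helper names.

# Read `ps -eaf` output on stdin and print the PID (2nd column) of the first
# process whose command line contains the given attempt id.
pid_from_ps() {
    # Skip grep/awk helper processes, which also carry the id in their arguments.
    awk -v id="$1" 'index($0, id) && $0 !~ /grep|awk/ { print $2; exit }'
}

# dump_heap <attempt_id> <user-owning-the-jvm> <output-file>
dump_heap() {
    pid=$(ps -eaf | pid_from_ps "$1")
    if [ -z "$pid" ]; then
        echo "no task JVM found for $1" >&2
        return 1
    fi
    # Same jmap invocation as above, run as the user that owns the task JVM.
    sudo -u "$2" jmap -dump:live,format=b,file="$3" "$pid"
}
```

After sourcing the file on the datanode, a call would look like dump_heap <attempt_id> <user> /tmp/file-name.bin, with the attempt id taken from the job's task list.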

To analyse the dump, use jhat.
jhat -port <port-number> <heap-file-path>
What to look for in the jhat analysis:
  1. Objects with the largest memory footprint.
  2. Objects with the highest instance count.
Taking a heap dump on OutOfMemoryError using JVM -XX options
Set the following option in the job configuration:
set '-Xmx512m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/@taskid@S2sSdebug.hprof'
This option launches the map/reduce task JVM with the given arguments, which gives us a handle on the JVM's memory-related parameters.
A few things to note:
  1. -Xmx512m: maximum heap size (512 MB here).
  2. -XX:+HeapDumpOnOutOfMemoryError: dump the heap to disk when the JVM runs out of memory.
  3. -XX:HeapDumpPath=/tmp/@taskid@S2sSdebug.hprof: @taskid@ is replaced by the Hadoop framework with the actual task id, which is unique.
To retrieve the dump, log on to the datanode. The file is in /tmp, named @taskid@S2sSdebug.hprof with @taskid@ replaced by the actual task id. jhat can then be used to analyze the dump.
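Once logged on to a datanode, the dumps written by this setting can be located with a short sketch like the one below. The function name list_task_dumps is hypothetical. The file-name suffix is the one used in the HeapDumpPath example above, and -maxdepth is assumed to be supported by the local find.

```shell
#!/bin/sh
# Sketch only: list_task_dumps is an illustrative helper name.
# list_task_dumps [dir]: print the heap dumps whose names end in the
# S2sSdebug.hprof suffix from the HeapDumpPath example; dir defaults to /tmp.
list_task_dumps() {
    dir=${1:-/tmp}
    find "$dir" -maxdepth 1 -type f -name '*S2sSdebug.hprof' | sort
}
```

Each path it prints can be passed to jhat as the heap file path.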
Taking a heap dump on OutOfMemoryError and collecting the dump files from all datanodes in an HDFS location for further analysis
The option above requires logging on to the datanode where the map/reduce task was spawned and running jmap and jhat on that machine. For an MR job with hundreds of map/reduce tasks this becomes very difficult. The option below provides a mechanism to collect all heap dumps in a specified HDFS location.
Make a shell script named :
text=`echo $PWD | sed 's=/=\\_=g'`        (this helps in figuring out which heap dump belongs to which task)
hadoop fs -put heapdump.hprof    /user/kunalg/hprof/$text.hprof
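To make the naming step concrete, here is a minimal sketch of the path-to-name transformation, assuming the intent of the sed expression is to replace every slash in the working directory with an underscore. The function name dump_name and the example path are hypothetical.

```shell
#!/bin/sh
# Sketch only: dump_name is an illustrative helper name.
# dump_name <working-dir>: derive the uploaded file name from the task's
# working directory by turning every "/" into "_" and appending .hprof.
dump_name() {
    printf '%s.hprof\n' "$(printf '%s' "$1" | sed 's=/=_=g')"
}

# Example with a made-up working directory:
dump_name /data/local/attempt_1   # prints _data_local_attempt_1.hprof
```

Because each task attempt runs in its own working directory, the resulting names should not collide in the shared HDFS directory.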
  1. Place the script in an HDFS location using hadoop dfs -put <hdfs location> (example: /user/kunalg/).
  2. Create a directory on HDFS where you want to gather all the heap dumps and give it 777 permissions (example: hadoop dfs -chmod -R 777 /user/kunalg/hprof).
  3. Set the following properties in the MR job:
  • set '-Xmx256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./heapdump.hprof  -XX:OnOutOfMemoryError=./'
  • set mapred.create.symlink 'yes'
  • set mapred.cache.files 'hdfs:///user/kunalg/
Run the MR job. An OOME in any task on any datanode will produce a heap dump and place the dump file in the specified HDFS location.
You can verify that the script ran correctly in the task's stdout log.
In the stdout log:
java.lang.OutOfMemoryError: Java heap space
Dumping heap to ./heapdump.hprof ...
Heap dump file created [12039655 bytes in 0.081 secs]
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="./"
#   Executing /bin/sh -c "./"...
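If a copy of a task's stdout log is at hand, the check can be scripted. The sketch below is only an illustration: the function name check_oom_log is hypothetical, and it simply greps for two of the messages shown in the log excerpt above.

```shell
#!/bin/sh
# Sketch only: check_oom_log is an illustrative helper name.
# check_oom_log <stdout-log>: succeed only if the log shows both that the heap
# dump was written and that the OnOutOfMemoryError command was executed.
check_oom_log() {
    grep -q 'Heap dump file created' "$1" &&
        grep -q 'Executing /bin/sh -c' "$1"
}
```

A zero exit status means both messages were found. It does not prove that the upload to HDFS succeeded, so the target directory still needs to be checked.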
Using the Hadoop default profiler for profiling and finding issues

set mapred.task.profile 'true';
set mapred.task.profile.params '-agentlib:hprof=cpu=samples,heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s'
set mapred.task.profile.maps '0-1'
set mapred.task.profile.reduces '0-1'
The profiler reports details for the task JVMs in the specified range. The location of the profile output is available in the task logs, under the profile.out section.
