Monday, December 28, 2015

MR debugging by taking JVM heap dumps

Taking a heap dump manually:
jmap -histo:live <pid>    (histogram of live objects)
jmap -dump:live,format=b,file=file-name.bin <pid>    (dump the JVM heap to a file on disk)
  1. Log on to the datanode where the map/reduce JVM is running and run ps -eaf | grep <attempt_id> to get the pid.
  2. Take the heap dump with jmap, running it via sudo -u as the user that owns the task JVM (see the sketch after this list).
  3. Never use the -F (force) option while taking the dump with jmap.
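A minimal sketch of the manual steps above; the attempt id, user, and pid are placeholders:

# on the datanode running the task, find the task JVM's pid (the attempt id here is hypothetical)
ps -eaf | grep attempt_201512280000_0001_m_000000_0

# run jmap as the user that owns the task JVM ("mapred" is just an example user)
sudo -u mapred jmap -histo:live <pid>
sudo -u mapred jmap -dump:live,format=b,file=/tmp/task-heap.bin <pid>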


To analyze the dump, use jhat:
jhat -port <port-no> <heap_file_path>
What to look for in the jhat analysis:
  1. Objects/classes with the highest memory footprint
  2. Objects with the highest instance count
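For example, to analyze a dump taken with the commands above (the port and file path are placeholders):

# start the jhat web server on the chosen port, then browse http://<datanode-host>:7000
# the histogram view shows instance counts and total size per class
jhat -port 7000 /tmp/task-heap.bin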
Taking a heap dump on OutOfMemoryError using JVM -XX options
Set the following option in the job configuration:
set mapred.child.java.opts '-Xmx512m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/@taskid@S2sSdebug.hprof'
This launches the map/reduce task JVMs with the specified value, giving us a handle to control various JVM memory-related parameters.
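If the job driver goes through ToolRunner (an assumption about the job), roughly the same option can be passed as a generic -D option on the command line instead of via set; the jar name, driver class, and input/output paths below are placeholders:

# assumes the driver implements Tool so generic -D options are honored
hadoop jar my-job.jar com.example.MyDriver \
    -D mapred.child.java.opts='-Xmx512m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/@taskid@S2sSdebug.hprof' \
    /input /output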
A few things to note:
  1. -Xmx512m : maximum heap size for the task JVM (512 MB here)
  2. -XX:+HeapDumpOnOutOfMemoryError : dump the heap to disk when the JVM runs out of memory
  3. -XX:HeapDumpPath=/tmp/@taskid@S2sSdebug.hprof : @taskid@ is replaced by the Hadoop framework with the actual task id, which is unique
One needs to log on to the datanode; the heap dump file will be present under /tmp, named <taskid>S2sSdebug.hprof (@taskid@ is replaced by the Hadoop framework with the original task id). jhat can then be used to analyze the dump.
Taking a heap dump on OutOfMemoryError and collecting the dump files from all datanodes into an HDFS location for further analysis
The option mentioned above requires one to log on to the datanode on which the map/reduce task was spawned and run jmap/jhat on that machine. A job with hundreds of map/reduce tasks can make this process very tedious. The approach below provides a mechanism to collect all heap dumps in a specified HDFS location.
Make a shell script named dump.sh:
#!/bin/sh
# encode the task's working directory into the file name, so we can tell which heap dump belongs to which task
text=`echo $PWD | sed 's=/=\\_=g'`
hadoop fs -put heapdump.hprof /user/kunalg/hprof/$text.hprof
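For example, if a task's working directory were /data/mapred/local/attempt_201512280000_0001_m_000000_0/work (a hypothetical path), the sed substitution would name the dump _data_mapred_local_attempt_201512280000_0001_m_000000_0_work.hprof in HDFS, making it easy to map a dump back to its task attempt.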
  1. Place the dump.sh script in an HDFS location using hadoop dfs -put dump.sh <hdfs location> (example: /user/kunalg/dump.sh).
  2. Create a directory on HDFS where you want to gather all the heap dumps and give that directory 777 permissions (example: hadoop dfs -chmod -R 777 /user/kunalg/hprof); a sketch of these two steps follows this list.
  3. Set the following properties in the MR job:
  • set mapred.child.java.opts '-Xmx256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./heapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
  • set mapred.create.symlink 'yes'
  • set mapred.cache.files 'hdfs:///user/kunalg/dump.sh#dump.sh'
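A minimal sketch of steps 1 and 2, using the /user/kunalg paths from the example above:

# stage the script and create a world-writable directory for the dumps
hadoop fs -put dump.sh /user/kunalg/dump.sh
hadoop fs -mkdir /user/kunalg/hprof
hadoop fs -chmod -R 777 /user/kunalg/hprof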
Run the MR job; an OOM error on any of the datanodes will trigger a heap dump and place the dump file in the specified HDFS location.
One can verify that the script executed correctly in the task's stdout log:
java.lang.OutOfMemoryError: Java heap space
Dumping heap to ./heapdump.hprof ...
Heap dump file created [12039655 bytes in 0.081 secs]
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="./dump.sh"
#   Executing /bin/sh -c "./dump.sh"...
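Once dump files start appearing, they can be listed in HDFS, pulled down locally, and analyzed with jhat; the dump file name below is a placeholder:

# list the collected dumps, fetch one locally, and open it in jhat
hadoop fs -ls /user/kunalg/hprof
hadoop fs -get /user/kunalg/hprof/_some_task_workdir.hprof .
jhat -port 7000 _some_task_workdir.hprof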
Using the Hadoop default profiler for profiling and finding issues

set mapred.task.profile 'true';
set mapred.task.profile.params '-agentlib:hprof=cpu=samples,heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s';
set mapred.task.profile.maps '0-1';
set mapred.task.profile.reduces '0-1';
The profiler will provide details for the task JVMs in the specified range. The profiler output will be available in the task logs, under the profile.out section.
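On a YARN cluster with log aggregation enabled (an assumption about the environment), the aggregated task logs, including the profiler output, can usually be pulled once the job finishes; the application id below is a placeholder:

# fetch the aggregated task logs for the finished job and look for the hprof output
yarn logs -applicationId application_1451260000000_0001 | less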
