EMR master instance is not reachable


I ran into a rare issue today: my EMR cluster would not resize.
Apparently, the master node was having trouble communicating with the EMR service.
The cluster status showed "Master instance is not reachable. Please check your master instance status". Yet I could log in to the master node and use Hadoop and HDFS without any issues. The only symptom was that the cluster would not resize.

Searching the internet turned up just one relevant result:
https://forums.aws.amazon.com/thread.jspa?messageID=695687

Although that thread said the issue affected the EMR 3.x AMIs and my cluster was on 4.x, I suspected it was the same problem. To verify, I checked whether the instance controller service was running on my other clusters.

$ ps awux | grep instancecontroller
hadoop    2389  0.4  2.0 3584132 309980 ?      Sl   Apr13  97:41 /etc/alternatives/jre/bin/java -Xmx1024m -XX:OnOutOfMemoryError=kill -9 %p -XX:MinHeapFreeRatio=10 -server -cp /usr/share/aws/emr/instance-controller/lib/*:/home/hadoop/conf -Dlog4j.defaultInitOverride aws157.instancecontroller.Main
hadoop   18522  0.0  0.0 110460  2080 pts/2    S+   19:08   0:00 grep --color=auto instancecontroller
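
The same check can be scripted. This is only a minimal sketch: the function name controller_status is made up for illustration, and it assumes the controller's main class shows up as instancecontroller.Main in the process listing, as in the output above. It reads the listing from stdin so it can be tried against saved ps output as well.

```shell
#!/bin/sh
# Report whether the instance controller JVM appears in a process listing.
# The listing is read from stdin, e.g. the output of `ps awux`.
# Assumption: the main class appears as "instancecontroller.Main".
controller_status() {
    # Drop the grep process itself, then look for the controller's main class.
    if grep -v 'grep' | grep -q 'instancecontroller\.Main'; then
        echo running
    else
        echo missing
    fi
}

# Usage on a node:
#   ps awux | controller_status
```
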

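The healthy clusters had this process running, so I tried starting it by hand on the broken master with the following command. I left out the -XX:OnOutOfMemoryError="kill -9 %p" option because it was giving an error when copied from the ps output (which does not show the quotes around it).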

$ /etc/alternatives/jre/bin/java -Xmx1024m -XX:MinHeapFreeRatio=10 -server -cp /usr/share/aws/emr/instance-controller/lib/*:/home/hadoop/conf -Dlog4j.defltInitOverride aws157.instancecontroller.Main &

And with that, the status on the EMR console went back to OK. All good so far.
Later, though, I found that downsizing still didn't work. I terminated all the TASK nodes manually, and the CORE nodes too (always keep your data in S3 instead of HDFS). HDFS now reports missing blocks, but since the data lives in S3 that doesn't matter.
I then tried adding a TASK node and it worked.
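
As a rough sketch of the restart step: one reader reports in the comments below that an init script at /etc/init.d/instance-controller contained the right command on their version. Assuming that path exists on your node (it may not), a script could prefer it and only fall back to the hand-written java command otherwise. The function name pick_start_command is hypothetical, and it only prints which route to take rather than running anything.

```shell
#!/bin/sh
# Print which way to start the instance controller.
# $1: path to the init script (default is the path one reader reported;
#     whether it exists depends on the EMR version).
pick_start_command() {
    init_script="${1:-/etc/init.d/instance-controller}"
    if [ -x "$init_script" ]; then
        # The init script carries the correct java command for this version.
        echo "sudo $init_script start"
    else
        # No init script found: fall back to running java by hand.
        echo "manual java invocation"
    fi
}

# Usage on a node:
#   pick_start_command
```

Preferring the init script avoids copying the java command line from ps output, which is where the quoting problem came from.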

Comments

Sohil Jain said…
Thank you Vipul. Your forum helped me resolve a production issue today.

There is an error in the command above. Please change defltInitOverride to defaultInitOverride.
Unknown said…
@sohil: his command was right. I was able to get it working correctly.
Unknown said…
Which command is correct may depend on the version. I used:

sudo /etc/init.d/instance-controller start

that file contained the correct command to run for me.
