EMR master instance is not reachable

April 29, 2016

I ran into a rare issue today: my EMR cluster would not resize.

Apparently, the master node was having trouble communicating with the EMR service.

The cluster status showed "Master instance is not reachable. Please check your master instance status". However, I was able to log in to the master node and use Hadoop and HDFS without any issues; the cluster just would not resize.

A web search turned up only one relevant result:

https://forums.aws.amazon.com/thread.jspa?messageID=695687

That thread said the issue affected the EMR 3.x AMIs and my cluster was on 4.x, but I suspected the same cause. To verify, I checked whether the service it mentions (the instance controller) was running on my other clusters:

$ ps awux | grep instancecontroller

hadoop 2389 0.4 2.0 3584132 309980 ? Sl Apr13 97:41 /etc/alternatives/jre/bin/java -Xmx1024m -XX:OnOutOfMemoryError=kill -9 %p -XX:MinHeapFreeRatio=10 -server -cp /usr/share/aws/emr/instance-controller/lib/*:/home/hadoop/conf -Dlog4j.defaultInitOverride aws157.instancecontroller.Main

hadoop 18522 0.0 0.0 110460 2080 pts/2 S+ 19:08 0:00 grep --color=auto instancecontroller
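The second line of that output is the grep itself, so a bare `ps | grep` always prints something even when the controller is gone. Below is a minimal sketch of a check that avoids that false positive. It assumes the controller's main class name ends in `instancecontroller.Main` (it was `aws157.instancecontroller.Main` on this 4.x cluster; the prefix may differ between EMR versions), and the function name `has_instance_controller` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a process listing (e.g. from `ps awux`) on stdin
# and succeeds only if the EMR instance controller's main class is present.
# Assumption: the class name ends in "instancecontroller.Main".
# The [i] bracket means this grep's own command line cannot match the
# pattern, so the grep process never shows up as a false positive.
has_instance_controller() {
  grep -q '[i]nstancecontroller\.Main'
}

# Usage on a node:
#   if ps awux | has_instance_controller; then echo running; else echo missing; fi
```

Run it on a healthy node and on the problem master; on the broken master it should report the controller as missing.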

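So I tried starting it on the problem master with the command below. I left out the -XX:OnOutOfMemoryError=kill -9 %p option because, pasted without quotes, the shell splits it into separate arguments and java rejects -9 as an unrecognized option.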

$ /etc/alternatives/jre/bin/java -Xmx1024m -XX:MinHeapFreeRatio=10 -server -cp /usr/share/aws/emr/instance-controller/lib/*:/home/hadoop/conf -Dlog4j.defaultInitOverride aws157.instancecontroller.Main &

Voila! The status on the EMR console went back to OK. All good so far.
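If you need to relaunch the controller again, here is a sketch of a somewhat more robust way to do it. It keeps every flag from the `ps` output, including the -XX:OnOutOfMemoryError option, by storing the command in a bash array so that `kill -9 %p` stays a single argument, and it uses `nohup` so a hangup from the SSH session does not kill the process. The array name, the function name and the log path are hypothetical. The class name and classpath are copied from one 4.x cluster; take yours from `ps` on a healthy node of the same EMR version, and run it as the `hadoop` user, which owned the original process.

```shell
#!/usr/bin/env bash
# Command line copied from `ps awux` on a healthy cluster (EMR 4.x).
# A bash array keeps '-XX:OnOutOfMemoryError=kill -9 %p' as ONE argument and
# stops the shell from glob-expanding the '*' in the classpath.
# Hypothetical name; adjust the class name to match your EMR version.
instance_controller_cmd=(
  /etc/alternatives/jre/bin/java
  -Xmx1024m
  '-XX:OnOutOfMemoryError=kill -9 %p'
  -XX:MinHeapFreeRatio=10
  -server
  -cp '/usr/share/aws/emr/instance-controller/lib/*:/home/hadoop/conf'
  -Dlog4j.defaultInitOverride
  aws157.instancecontroller.Main
)

# Hypothetical helper: start the controller in the background, immune to
# SIGHUP from the SSH session, with stdout/stderr written to a log file.
start_instance_controller() {
  local log_file=${1:-/tmp/instance-controller.out}   # assumed log path
  nohup "${instance_controller_cmd[@]}" >"$log_file" 2>&1 &
  echo "started instance controller with PID $!"
}
```

After calling `start_instance_controller` on the master, run `ps awux | grep instancecontroller` again to confirm the java process is up before checking the console.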

Later, though, I found that downsizing still did not work. So I terminated all the TASK nodes manually, and the CORE nodes too (always keep your data in S3 rather than HDFS). HDFS now reports missing blocks, but who cares.
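The terminated CORE nodes took their DataNodes with them, so some HDFS blocks no longer have any replica. Here is a minimal sketch for seeing what was lost, using the standard `hdfs dfsadmin -report` and `hdfs fsck` commands. The function name is hypothetical, and the exact wording of the report lines can vary between Hadoop versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize HDFS damage after CORE (DataNode) instances
# were terminated. Run on the master node as a user allowed to run dfsadmin
# (e.g. hadoop).
hdfs_damage_report() {
  # Cluster-wide counters; the label wording differs across Hadoop versions.
  hdfs dfsadmin -report | grep -i 'missing blocks'
  # Files that have at least one missing or corrupt block.
  hdfs fsck / -list-corruptfileblocks
}
```

If the affected files really are disposable (for example because the data lives in S3), `hdfs fsck / -delete` removes the corrupt files so HDFS should stop reporting them. The deletion cannot be undone.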

I tried adding a TASK node and it worked.
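Resizing can also be driven from the AWS CLI instead of the console, which makes it easier to see whether a request is actually being acted on. Here is a minimal sketch, assuming the AWS CLI is installed and configured for the cluster's account and region. The function name is hypothetical, and the cluster ID and instance group ID you pass in are placeholders; `aws emr list-instance-groups --cluster-id <your cluster ID>` shows the real group IDs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ask EMR to resize one instance group (e.g. the TASK
# group), then print that group's state and instance counts.
# Arguments: cluster ID (j-...), instance group ID (ig-...), target count.
resize_instance_group() {
  local cluster_id=$1 group_id=$2 count=$3
  aws emr modify-instance-groups \
    --instance-groups "InstanceGroupId=${group_id},InstanceCount=${count}" || return
  # Prints tab-separated: <state> <requested count> <running count>
  aws emr list-instance-groups --cluster-id "$cluster_id" \
    --query "InstanceGroups[?Id=='${group_id}'].[Status.State,RequestedInstanceCount,RunningInstanceCount]" \
    --output text
}
```

Running the listing part again after a few minutes shows whether the running count moves toward the requested count; a group that stays in `RESIZING` for a long time may be a sign that the resize is not being carried out.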

Comments

Sohil Jain said…

Thank you, Vipul. Your post helped me resolve a production issue today.

There is an error in the command above. Please change defltInitOverride to defaultInitOverride.

May 20, 2019 at 11:08 PM

Unknown said…

@Sohil: his command was right; I was able to get it working.

October 30, 2019 at 2:13 AM

Unknown said…

Which command is correct may depend on version. I used:

sudo /etc/init.d/instance-controller start

That file contained the correct command to run in my case.

December 1, 2020 at 3:44 AM

Vipul Agrawal
