The VM on the solx86 platform crashes immediately when the stack size is set to
unlimited. A customer reported this problem when trying to submit a simple Java
application through SunONE GridEngine. By default, SunONE GridEngine sets the
stack size to unlimited on execution hosts, so customers will always see this
crash if they simply accept the default value (and most customers do).
To reproduce this bug:
1. Log in to a solx86 machine
2. Set the stack size to unlimited (hard limit first, then soft limit)
$ ulimit -H -s unlimited ; ulimit -s unlimited
3. Run java -version in the same shell
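
The steps above can be wrapped in a small helper so that the changed limits do not leak into the login shell. This is only a sketch: the function name run_with_unlimited_stack and the exit code 125 are illustrative choices, not part of the original report, and raising the hard limit may be refused if it is already finite for a non-root user.

```shell
#!/usr/bin/bash
# run_with_unlimited_stack CMD [ARGS...]
# Hypothetical helper: runs CMD in a subshell whose hard and soft stack
# limits are both set to "unlimited" (steps 2 and 3 above).
# Returns 125 if the limits could not be raised; otherwise returns the
# exit status of CMD.
run_with_unlimited_stack() {
  (
    # The hard limit must be raised before the soft limit can follow it.
    ulimit -H -s unlimited 2>/dev/null || exit 125
    ulimit -S -s unlimited 2>/dev/null || exit 125
    "$@"
  )
}
```

Calling `run_with_unlimited_stack java -version` on an affected solx86 machine should then show the crash, while the calling shell keeps its original limits.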
From Andy Schwierskott <###@###.###>
To Grid Engine Support Group <###@###.###>
Subject CPRE Re: Sun Grid Engine 5.3 and java app sigabrt (fwd)
Hi,
just FYI, the reason for the crashed Java app was the "infinity" resource
limit setting of the queue.
I'm wondering how someone can write software which crashes when the stack
size is set to infinity? All kinds of software and any OS version seem to be
vulnerable: Java, simulation apps on Linux, an older version of the IBM C
compiler on AIX...
This is a good example where you really don't know if such a problem is
related to the operating system or not. Probably the customer will tell you
"it runs on XYZ, but crashes on ABC" and you easily might focus your support
efforts on the wrong area;-)
Andy
---------- Forwarded message ----------
Date: Wed, 29 Jan 2003 18:02:58 -0700 (MST)
From: Geoff Shipman <###@###.###>
To: ###@###.###
Subject: Re: Sun Grid Engine 5.3 and java app sigabrt
Andy,
Thanks for the update on the alias; I have corrected that in my mailtool. I
received word from cu that setting the ulimit values to what the shell had
outside of SGE worked for him. They were happy and gave the OK to close the
case.
Thanks
}From: Andy Schwierskott <###@###.###>
}X-X-Sender: as114086@sr-ergb01-01
}To: Geoff Shipman <###@###.###>
}cc: ###@###.###
}Subject: Re: Sun Grid Engine 5.3 and java app sigabrt
}MIME-Version: 1.0
}
}Geoff,
}
}(please use the alias "###@###.###" for our internal support alias)
}
}I have a guess which turned out to be true in many similar cases: By default
}the SGE queue config sets all Unix resource limits to "unlimited". Some
}applications seem not to be able to handle an "infinite" stack size.
}
}Try to either configure "standard" limits in the queue config or add
}"ulimit" calls for hard and soft limits in the script before the application
}is called (just use these values you get with "ulimit -a" and "ulimit -a
}-H" in the user shell).
}
}Andy
}
}
}
}
}> Hello all,
}>
}> I have a customer running Sun Grid Engine 5.3 on Solaris 8 X86 systems that
}> has Linux systems that submit jobs to the grid engine.
}>
}> CU has encountered a SIGABRT when issuing a hello world type of java app to
}> the grid engine. This java app works fine outside of sun grid engine.
}>
}> Is this a known bug, or am I missing something? Please let me know.
}>
}> I am attaching the working and broken trusses from cu as well as the script
}> he uses to submit the job and the java code plus the classes used. I do not
}> have a grid environment setup so I am unable to duplicate.
}>
}> Please reply to me directly as I am not on this alias.
}>
}> Thanks
}>
}> Here is the Java sample code ...
}>
}> package test;
}> public class Tester {
}>     public static void main(String[] args) {
}>         System.out.println("hello world!");
}>     }
}> }
}>
}> Here is the shell script that submits the job ...
}>
}> #!/usr/bin/bash
}>
}> LD_LIBRARY_PATH=/usr/local/j2re1.4.0_03/lib:/usr/local/j2re1.4.0_03/i386:/metro1/opt/dba/sge/lib:/metro1/opt/dba/sge/lib/solaris86
}> CLASSPATH="/metro1/opt/dba/meerkat/classes"
}> export CLASSPATH
}>
}> set
}> #ulimit -a
}> truss -faeo /tmp/java.out.truss -wall /usr/local/j2re1.4.0_03/bin/java test.Tester
}>
}>
}> Geoff Shipman
}> Technical Support Engineer ( OS )
}> OS Team
}
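
The workaround suggested in the thread above (explicit "ulimit" calls for the soft and hard limits in the job script, before the application is called) could be sketched roughly as follows. The function name cap_stack, the variable STACK_KB, and the default of 8192 KB are assumptions for illustration only; the thread recommends using whatever values "ulimit -a" and "ulimit -a -H" report in the user's normal shell.

```shell
#!/usr/bin/bash
# Hypothetical finite stack limit in KB. Replace it with the value that
# "ulimit -s" reports in the user's shell outside of SGE.
STACK_KB=${STACK_KB:-8192}

# cap_stack: if the stack limits inherited from the queue are "unlimited",
# replace them with finite values for this shell and everything it starts.
cap_stack() {
  if [ "$(ulimit -S -s)" = "unlimited" ]; then
    ulimit -S -s "$STACK_KB" || return 1
  fi
  if [ "$(ulimit -H -s)" = "unlimited" ]; then
    # Pin the hard limit to the (now finite) soft limit. A non-root user
    # cannot raise the hard limit again afterwards.
    ulimit -H -s "$(ulimit -S -s)" || return 1
  fi
}
```

In a job script such as the one quoted above, `cap_stack` would be called once before the line that starts java, so that the JVM inherits the finite limits. Configuring "standard" limits in the queue config, the other option mentioned in the thread, avoids the need for this in every script.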