Analyze and fix Selenium Grid / RC stability issues with Firefox (and other browsers)
By neokrates, written on June 24, 2010
We had one particular stability problem with Selenium Grid. On Linux, during particularly “heavy” tests with many Firefox instances, some browsers seemed to disappear. We found several causes for that behavior and stabilized our tests. Here are some lessons we learned along the way.
Ubuntu Linux 9.x
Debian GNU/Linux 5.0.3
- Selenium grid remote control standalone 1.0.4
- Selenium server 1.0.1
- Selenium grid hub standalone 1.0.4
- Selenium grid tools standalone 1.0.4
Should also work for:
Other Hudson, Unix system, browser and grid combinations
Problem ONE. Memory issues, overcommit and general OOM
This problem is most likely when /proc/sys/vm/overcommit_memory is set to 1, which tells Linux to always overcommit, that is, to promise processes more memory than actually exists. (Mode 0, the default, overcommits heuristically and refuses only obviously excessive requests; mode 2 disables overcommit, so allocations fail instead.) In the rare cases where all such processes use their committed memory at the same time, the kernel must decide which process to kill. It then writes “killed” and/or “out of memory” messages to a log under /var/log/….
Just to make sure, put 0 in overcommit_memory (as root):
echo "0" > /proc/sys/vm/overcommit_memory
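Before changing the setting, it can help to see what it currently is. The following is a minimal sketch, not part of our original setup: the helper name describe_overcommit_mode is made up for illustration, and the descriptions paraphrase the three kernel modes.

```shell
#!/bin/sh
# Map an overcommit_memory value to a short description.
# (describe_overcommit_mode is a hypothetical helper name.)
describe_overcommit_mode() {
    case "$1" in
        0) echo "heuristic overcommit (kernel default)" ;;
        1) echo "always overcommit" ;;
        2) echo "strict accounting, no overcommit" ;;
        *) echo "unknown mode: $1" ;;
    esac
}

# Print the current mode, if the file is readable on this system.
mode_file=/proc/sys/vm/overcommit_memory
if [ -r "$mode_file" ]; then
    mode=$(cat "$mode_file")
    echo "vm.overcommit_memory = $mode: $(describe_overcommit_mode "$mode")"
fi

# To change the value (root required), either of these should work:
#   sysctl -w vm.overcommit_memory=0
#   echo 0 > /proc/sys/vm/overcommit_memory
# To keep it across reboots, add this line to /etc/sysctl.conf:
#   vm.overcommit_memory = 0
```

Writing to /proc only lasts until the next reboot, which is why the sysctl.conf line is mentioned in the comments.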
Out-of-memory (OOM) killer
When the system runs short on memory, the OOM killer kills a process picked by the kernel's heuristics. The victim can look random, say rpm or syslog, and the programmer is unable to do anything about it.
That is a risk factor, and you can detect browsers being killed this way by looking at the kernel log:
less /var/log/messages
...
Jun 21 18:41:23 diuw-desktop kernel: [2704707.444818] firefox invoked oom-killer: gfp_mask=0x280da, order=0, oomkilladj=0
Jun 21 18:41:23 diuw-desktop kernel: [2704707.444824] firefox cpuset=/mems_allowed=0
Jun 21 18:41:23 diuw-desktop kernel: [2704707.444828] Pid: 25669, comm: firefox Tainted: P 2.6.31-20-generic #58-Ubuntu
Jun 21 18:41:23 diuw-desktop kernel: [2704707.444830] Call Trace:
Jun 21 18:41:23 diuw-desktop kernel: [2704707.444841] [<c01b5a2f>] oom_kill_process+0x9f/0x250
Jun 21 18:41:23 diuw-desktop kernel: [2704707.444845] [<c01b603e>] ? select_bad_process+0xbe/0xf0
Jun 21 18:41:23 diuw-desktop kernel: [2704707.444848] [<c01b60c1>] __out_of_memory+0x51/0xa0
Jun 21 18:41:23 diuw-desktop kernel: [2704707.444851] [<c01b6163>] out_of_memory+0x53/0xb0
Jun 21 18:41:23 diuw-desktop kernel: [2704707.444854] [<c01b83f6>] __alloc_pages_slowpath+0x3f6/0x490
Jun 21 18:41:23 diuw-desktop kernel: [2704707.444858] [<c01b859f>] __alloc_pages_nodemask+0x10f/0x120
Jun 21 18:41:23 diuw-desktop kernel: [2704707.444861] [<c01ca1f6>] do_anonymous_page+0x66/0x200
Jun 21 18:41:23 diuw-desktop kernel: [2704707.444865] [<c012d2fd>] ? kmap_atomic_prot+0xcd/0xf0
Jun 21 18:41:23 diuw-desktop kernel: [2704707.444868] [<c01cc5e0>] handle_mm_fault+0x330/0x380
Jun 21 18:41:23 diuw-desktop kernel: [2704707.444874] [<c05760f8>] do_page_fault+0x148/0x380
Jun 21 18:41:23 diuw-desktop kernel: [2704707.444877] [<c0575fb0>] ? do_page_fault+0x0/0x380
Jun 21 18:41:23 diuw-desktop kernel: [2704707.444880] [<c0573fe3>] error_code+0x73/0x80
...
You might want to read this for more info: http://www.win.tue.nl/~aeb/linux/lk/lk-9.html
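Paging through the log by hand gets tedious on a busy machine. As a rough sketch, the function below counts how often each command triggered the OOM killer, based on the "invoked oom-killer" line format shown in the excerpt above. The function name oom_summary is hypothetical, and other kernel versions may word the message differently.

```shell
#!/bin/sh
# Summarize OOM killer invocations from syslog-style text on stdin.
# Prints "<count> <command>" per command, most frequent first.
# (oom_summary is a hypothetical helper name.)
oom_summary() {
    awk '/invoked oom-killer/ {
             # The command name is the word right before "invoked".
             for (i = 2; i <= NF; i++)
                 if ($i == "invoked" && $(i + 1) ~ /^oom-killer/)
                     count[$(i - 1)]++
         }
         END { for (cmd in count) print count[cmd], cmd }' | sort -rn
}

# Usage (the log path is the one from this article; on some systems
# kernel messages end up in another file under /var/log):
#   oom_summary < /var/log/messages
```

If firefox shows up in the output, the disappearing browsers are most likely being killed for memory rather than crashing on their own.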
Problem TWO. Firefox is unstable
On Debian Lenny, we had some stability issues with Firefox 3.5.7 and 3.6.3. This became clear when I saw that the Selenium HUB<->RC connection was still there, but the browser was gone.
After some tests and research, we found that the Swiftfox build of Firefox was generally faster and didn’t crash. (It normally replaces ‘classic’ Firefox. The user doesn’t see that it is a “different kind of fox”, but gets more speed and stability with Swiftfox.)
Simple way to install:
1. Go to http://getswiftfox.com/deb.htm
2. Select your processor and download the matching .deb package
3. Install it:
sudo dpkg -i swiftfox-YOUR-VERSION-HERE.deb
The general idea for any OS/browser combination is to try different browser versions or look through the bug lists for your particular browser version.
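When comparing browser versions, it helps to confirm the symptom from the start of this section (RC still connected, browser gone) on the RC machine itself. Here is a small sketch that logs how many browser processes are alive. The helper name count_procs is made up, and the process name firefox is an assumption taken from the kernel log earlier in this article; some builds use a different name.

```shell
#!/bin/sh
# Count lines on stdin that exactly match the given command name.
# (count_procs is a hypothetical helper name.)
count_procs() {
    # grep -c prints 0 but exits non-zero when nothing matches.
    grep -c -x -- "$1" || true
}

# Log a timestamped sample of running browser processes.
if command -v ps >/dev/null 2>&1; then
    n=$(ps -e -o comm= | count_procs firefox)
    echo "$(date '+%Y-%m-%d %H:%M:%S') firefox processes: $n"
fi
```

Run from cron or a loop during a test run, the samples can be matched against build logs to see when a browser vanished.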
Problem THREE. Too many parallel builds (and open browsers) freeze the system.
Poor system performance and low resources will most likely turn into functional problems for the Grid and for every started browser.
How this kind of problem can be identified:
Use the Linux top command to see current CPU and memory usage. If it gets too high, builds may slow down so much that they fail.
Try executing commands in a console. If the system reacts slowly (15-30 sec delay), it is under heavy load.
Check the build times. They tend to be 30-70% longer when the system is overloaded.
We reduced the number of parallel builds and running RCs to solve the system overload issue.
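The manual checks above can be approximated by a script, for example before a build starts more browsers. This is only a sketch under an assumption of ours: it treats the machine as overloaded when the 1-minute load average exceeds the number of CPUs, a common rule of thumb and not a threshold we measured. The function name is_overloaded is hypothetical.

```shell
#!/bin/sh
# Succeeds (exit 0) when the load average exceeds the CPU count.
# $1 = 1-minute load average, $2 = number of CPUs.
# (is_overloaded is a hypothetical helper; the threshold is an assumption.)
is_overloaded() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { exit !(load + 0 > cpus + 0) }'
}

# Check the current machine, if /proc/loadavg is available.
if [ -r /proc/loadavg ]; then
    load=$(cut -d ' ' -f 1 /proc/loadavg)
    cpus=$(getconf _NPROCESSORS_ONLN)
    if is_overloaded "$load" "$cpus"; then
        echo "overloaded: load $load on $cpus CPUs, hold back new builds"
    else
        echo "ok: load $load on $cpus CPUs"
    fi
fi
```

A build wrapper could call is_overloaded and wait or skip when it succeeds, which has the same effect as lowering the number of parallel builds by hand.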
Problem FOUR. Browser and plug-ins
Does the browser disappear during tests of particular pages?
Do you test pages with Flash, Silverlight, Windows Media, etc.? If your browser tends to disappear during tests of particular pages, the browser<->plugin interplay might be the root cause. Read through bug reports, and maybe upgrade the browser or just the plug-in. We had some issues with Flash. They were gone once we introduced a newer Firefox version.
That’s it, have fun