Memory leak in the 6.54_x86_64 for Linux?
**Kokomiko** (Joined: 18 Jul 08, Posts: 190, Credit: 24,093,690, RAC: 0)

My Linux box has a problem with the 6.54_x86_64 app. 4 GB of RAM is not enough, and other WUs are waiting for memory. The 6.54_x86_64 app uses all of my RAM; I found only 35 MB free. I never saw this problem with 6.53. After an update of BOINC from 6.4.2 to 6.4.5 (I still miss 6.5.0 for 64-bit Linux) the failure has not shown up so far, but I'm still waiting and keeping an eye on it. On Vista 64, application 6.55 uses only 35 MB.

**[AF>Libristes>Jip] Elgrande71** (Joined: 16 Jul 08, Posts: 45, Credit: 78,618,001, RAC: 0)

Same thing for me on GTX 280 and 8800 GTS 512 graphics cards. I tried increasing the RAM settings, and that seems to fix the problem. It occurred on a Q6600-based computer (4 GB RAM) and a Celeron D420-based computer (2 GB RAM).

**Venturini Dario[VENETO]** (Joined: 26 Jul 08, Posts: 44, Credit: 4,832,360, RAC: 0)

My WU with 6.54 has grown from 50 MB to 180 MB in about 1 hour, and it keeps growing.

**GDF** (Joined: 14 Mar 07, Posts: 1958, Credit: 629,356, RAC: 0)

Try using 6.5.0. It cannot be a memory leak in the application if it disappears when the BOINC version changes. The Linux version is now out too.
gdf

**koschi** (Joined: 14 Aug 08, Posts: 127, Credit: 913,858,161, RAC: 18)

They released 6.5.0 for Linux as a 32-bit build; unfortunately there is no 64-bit build... I have now found the same problem on one of my systems (BOINC 6.4.2 with acemd 6.54). Memory usage is increasing pretty fast:

```
root@frickelbude:~# while true; do ps aux | grep acemd |grep -v grep; sleep 60; done
boinc 23292 8.2 2.8 81544 58488 ? RNLl 12:04 0:19 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 23292 8.2 3.1 87168 64180 ? RNLl 12:04 0:24 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 23292 8.2 3.4 93400 70300 ? SNLl 12:04 0:29 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 23292 8.2 3.6 98600 75576 ? SNLl 12:04 0:34 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
^C
root@frickelbude:~# invoke-rc.d boinc-client restart
 * Stopping BOINC core client: boinc ...done.
 * Starting BOINC core client: boinc ...done.
 * Setting up scheduling for BOINC core client and children: ...done.
root@frickelbude:~# while true; do ps aux | grep acemd |grep -v grep; sleep 60; done
boinc 23802 42.0 1.8 61272 38172 ? SNLl 12:11 0:00 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 23802 8.6 2.0 64480 41464 ? SNLl 12:11 0:05 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 23802 8.4 2.2 70224 47152 ? SNLl 12:11 0:10 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 23802 8.3 2.5 75828 52840 ? SNLl 12:11 0:15 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 23802 8.3 2.8 81564 58536 ? SNLl 12:11 0:20 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 23802 8.2 3.1 87208 64228 ? SNLl 12:11 0:24 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 23802 8.2 3.4 93404 70304 ? SNLl 12:11 0:29 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 23802 8.2 3.6 98616 75608 ? RNLl 12:11 0:34 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 23802 8.2 3.9 104368 81304 ? RNLl 12:11 0:39 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 23802 8.2 4.2 110000 87004 ? RNLl 12:11 0:44 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 23802 8.2 4.5 115744 92696 ? RNLl 12:11 0:49 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
```

The system is described here: http://www.sysprofile.de/id84658

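The ps loop above samples the RSS column (resident memory, in KiB) roughly once a minute. A minimal Python sketch for turning such samples into a growth rate, assuming the standard procps `ps aux` column order (the header koschi prints later in the thread) and exactly 60 s between samples, which the `sleep 60` loop only approximates. The function and type names are illustrative and not part of BOINC or acemd.

```python
# Sketch: estimate RSS growth of a process from `ps aux` samples.
# Assumes the Linux procps `ps aux` layout:
#   USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
# with VSZ and RSS in KiB, and one sample per minute (the `sleep 60` loop).
from typing import List, NamedTuple


class PsSample(NamedTuple):
    pid: int
    vsz_kib: int
    rss_kib: int


def parse_ps_aux_line(line: str) -> PsSample:
    # maxsplit=10 keeps COMMAND (which contains spaces) in the last field.
    fields = line.split(None, 10)
    return PsSample(pid=int(fields[1]), vsz_kib=int(fields[4]), rss_kib=int(fields[5]))


def rss_slope_kib_per_min(rss_kib: List[int], interval_min: float = 1.0) -> float:
    """Least-squares slope of RSS over time; samples are `interval_min` minutes apart."""
    n = len(rss_kib)
    if n < 2:
        raise ValueError("need at least two samples")
    times = [i * interval_min for i in range(n)]
    mean_t = sum(times) / n
    mean_r = sum(rss_kib) / n
    num = sum((t - mean_t) * (r - mean_r) for t, r in zip(times, rss_kib))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


if __name__ == "__main__":
    # RSS column of the second loop above (after the BOINC restart).
    run2 = [38172, 41464, 47152, 52840, 58536, 64228, 70304, 75608, 81304, 87004, 92696]
    print(f"{rss_slope_kib_per_min(run2):.0f} KiB/min")
```

For the second run above the fitted slope comes out at roughly 5,600 KiB (about 5.5 MiB) per minute. A least-squares fit rather than last-minus-first keeps one odd sample from skewing the estimate; a unit whose memory really stays flat, like the GPUTEST6 one reported later, would give a slope close to zero.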
**GDF** (Joined: 14 Mar 07, Posts: 1958, Credit: 629,356, RAC: 0)

Which WU is that for?
gdf

**koschi** (Joined: 14 Aug 08, Posts: 127, Credit: 913,858,161, RAC: 18)

mC16040-SH2_US-5-40-SH2_US1720000_0
http://www.gpugrid.net/result.php?resultid=173191

**[AF>Libristes>Jip] Elgrande71** (Joined: 16 Jul 08, Posts: 45, Credit: 78,618,001, RAC: 0)

(Joined: 31 Oct 08, Posts: 10, Credit: 6,090,581, RAC: 0)

**koschi** (Joined: 14 Aug 08, Posts: 127, Credit: 913,858,161, RAC: 18)

I have two units on my dual-core: mC16040-SH2_US-5-40-SH2_US1720000 and ME12403-SH2_US-4-40-SH2_US950000. Both are now waiting for memory; the first is stuck at 77%, the second at 35%... I will cancel the 35% one and hope I get a WU of another type. We'll see if that makes a difference. On my other computer a GPUTEST6 unit is running without any issues...

edit: I can't get a new one :-( "not available for your type of computer, bla..."

**GDF** (Joined: 14 Mar 07, Posts: 1958, Credit: 629,356, RAC: 0)

Please move to 6.4.5 and see whether the problem disappears, as reported in the first post. 6.4.5 is now safe, since the server bug has been fixed.
gdf

(Joined: 28 Aug 08, Posts: 10, Credit: 142,385,295, RAC: 0)

I have the memory leak too. My system: Intel Quad 6600, openSUSE 11.0 bit, BOINC 6.5.0 64-bit, NVIDIA 260. In BOINC/slots/0 (the slot acemd_6.54 is using) there are two files of the same size, output.dcd and output.vel.dcd, and their size is exactly the amount of memory the acemd_6.54 process is using. Both files (and the process) keep growing and growing... I don't have a Windows system with CUDA, so I cannot compare this with the Windows files...

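The claim that output.dcd and output.vel.dcd grow in step with the process can be checked side by side, without the `grep -v grep` trick. A Linux-only Python sketch that finds processes by scanning /proc/&lt;pid&gt;/cmdline, reads VmRSS (the same figure as the RSS column of ps) from /proc/&lt;pid&gt;/status, and lists the two file sizes in the process's working directory. It assumes, as is normal for BOINC, that the client starts each app inside its slot directory so /proc/&lt;pid&gt;/cwd points there; reading another user's cwd usually requires root or the boinc user. All function names are made up for this example.

```python
# Sketch: compare acemd's resident memory with the trajectory file sizes in its slot.
# Linux-only (reads /proc). Assumption: /proc/<pid>/cwd is the BOINC slot directory.
import os
import re
from typing import Dict, Iterable, List, Optional


def find_pids(pattern: str) -> List[int]:
    """PIDs whose command line contains `pattern`, excluding this script itself."""
    if not os.path.isdir("/proc"):
        return []  # not Linux, nothing to scan
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit() or int(entry) == os.getpid():
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:  # process exited meanwhile, or no permission
            continue
        if pattern in cmdline:
            pids.append(int(entry))
    return pids


def vmrss_kib(status_text: str) -> Optional[int]:
    """VmRSS in kB from the text of /proc/<pid>/status; None if the field is absent."""
    m = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def slot_file_sizes_kib(slot_dir: str, names: Iterable[str]) -> Dict[str, int]:
    """Size in KiB of each named file in `slot_dir`; missing files are skipped."""
    sizes = {}
    for name in names:
        path = os.path.join(slot_dir, name)
        if os.path.isfile(path):
            sizes[name] = os.path.getsize(path) // 1024
    return sizes


def report(pattern: str = "acemd") -> None:
    for pid in find_pids(pattern):
        try:
            with open(f"/proc/{pid}/status") as f:
                rss = vmrss_kib(f.read())
            slot_dir = os.readlink(f"/proc/{pid}/cwd")
        except OSError:  # exited, or not permitted (run as root or the boinc user)
            continue
        sizes = slot_file_sizes_kib(slot_dir, ["output.dcd", "output.vel.dcd"])
        print(f"pid {pid} ({slot_dir}): VmRSS {rss} KiB, files {sizes}")


if __name__ == "__main__":
    report()
```

Running it once a minute (or wrapping `report()` in a loop with `time.sleep(60)`) gives both numbers in one line per process, which makes it easy to see whether the files and the RSS really track each other or diverge later on.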
**koschi** (Joined: 14 Aug 08, Posts: 127, Credit: 913,858,161, RAC: 18)

I have now upgraded to 6.4.5. The remaining task stays in "waiting for memory" mode, and again I got:

```
So 21 Dez 2008 20:48:57 CET|GPUGRID|Message from server: No work sent
So 21 Dez 2008 20:48:57 CET|GPUGRID|Message from server: Full-atom molecular dynamics for Cell processor is not available for your type of computer.
So 21 Dez 2008 20:48:57 CET|GPUGRID|Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.
```

My slot 14, which holds the stalled 77% WU, is just 19 MB...

edit: I'm going nuts. Without any help from my side, the work unit has now continued crunching. I wonder how, as there can hardly be more memory available than 30 minutes ago, when I restarted my system and the WU wouldn't start. But again the app eats up my memory:

```
root@frickelbude:/var/lib/boinc-client/slots# while true; do ps aux | grep acemd | grep -v grep; sleep 60;done
boinc 12414 9.1 2.9 83496 60488 ? RNLl 21:10 0:24 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 12414 8.8 3.2 89096 66128 ? RNLl 21:10 0:29 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc 12414 8.7 3.4 94748 71764 ? RNLl 21:10 0:34 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
root@frickelbude:/var/lib/boinc-client/slots# ps aux | head -1
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
```

(Joined: 28 Aug 08, Posts: 10, Credit: 142,385,295, RAC: 0)

...after a few minutes the files in the slot grow very slowly, but the application keeps growing at the same rate as at the beginning (more than 1 GB).

**koschi** (Joined: 14 Aug 08, Posts: 127, Credit: 913,858,161, RAC: 18)

This morning my quad was crunching an SH2_USPME-5 work unit (pN16075-SH2_USPME-5-40-SH2_USPME470000) and memory usage was increasing again. I stopped it and started a GPUTEST6 unit (lY10341-GPUTEST6-1-20-acemd_0); memory usage was stable, not a single kilobyte more after several minutes... Then I stopped the GPUTEST6 unit and started an SH2_US unit (to20339-SH2_US_1-5-40-SH2_US_1240000_0), and memory usage immediately started growing...

(Joined: 17 Aug 08, Posts: 2705, Credit: 1,311,122,549, RAC: 0)

So can we say for certain that the Linux GPU client or driver has a problem with certain WUs, probably independently of the BOINC client? How many credits do these WUs yield?

MrS
Scanning for our furry friends since Jan 2002

**Venturini Dario[VENETO]** (Joined: 26 Jul 08, Posts: 44, Credit: 4,832,360, RAC: 0)

I had already upgraded to 6.4.5 when I posted in this thread about the memory leak. By the way, I am out of work now and not getting new WUs. Bah.

**[AF>Libristes>Jip] Elgrande71** (Joined: 16 Jul 08, Posts: 45, Credit: 78,618,001, RAC: 0)

Another WU with a "Maximum memory exceeded" error: ZO25834-SH2_USPME_1-0-40-SH2_USPME_110000_0

| Field | Value |
|---|---|
| Workunit | 130854 |
| Created | 22 Dec 2008 10:31:25 UTC |
| Sent | 22 Dec 2008 11:15:21 UTC |
| Received | 22 Dec 2008 16:43:18 UTC |
| Server state | Over |
| Outcome | Client error |
| Client state | Compute error |
| Exit status | -177 (0xffffffffffffff4f) |
| Computer ID | 15576 |
| Report deadline | 26 Dec 2008 11:15:21 UTC |
| CPU time | 7496.045 |

stderr out:

```
<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
Maximum memory exceeded
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce 8800 GTS 512"
# Clock rate: 1620000 kilohertz
# Number of multiprocessors: 16
# Number of cores: 128
MDIO ERROR: cannot open file "restart.coor"
</stderr_txt>
]]>
```

Could it be fixed soon, please?

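The result page prints the exit status twice: -177 and 0xffffffffffffff4f are the same value, the second being its 64-bit two's-complement form. As far as we know, -177 is BOINC's ERR_RSC_LIMIT_EXCEEDED code, which is consistent with the "Maximum memory exceeded" message. A small Python sketch of the conversion, plus a rough time-to-limit estimate assuming linear growth; the memory bound used in the example is a hypothetical number, since the actual per-workunit limit is not given anywhere in this thread.

```python
# Sketch: decode the exit status shown on the result page, and estimate how long
# a task has before reaching a memory bound if its RSS grows linearly.
# The 1 GiB bound below is hypothetical, not the real GPUGRID workunit limit.


def exit_status_hex(status: int, bits: int = 64) -> str:
    """Two's-complement hex form of a (negative) exit status."""
    return hex(status & ((1 << bits) - 1))


def minutes_until_limit(current_kib: float, growth_kib_per_min: float, limit_kib: float) -> float:
    """Minutes until RSS reaches `limit_kib` under linear growth; inf if not growing."""
    if growth_kib_per_min <= 0:
        return float("inf")
    return max(0.0, (limit_kib - current_kib) / growth_kib_per_min)


if __name__ == "__main__":
    print(exit_status_hex(-177))  # 0xffffffffffffff4f, as on the result page
    # Hypothetical inputs: 92,696 KiB now, ~5,600 KiB/min growth, 1 GiB bound.
    print(f"{minutes_until_limit(92696, 5600, 1024 * 1024):.0f} min")
```

With growth in the range reported in this thread, even a generous bound is reached within a few hours, which would fit a task failing partway through rather than at the start.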
**koschi** (Joined: 14 Aug 08, Posts: 127, Credit: 913,858,161, RAC: 18)

I have now found an acemd_6.57_x86_64-pc-linux-gnu__cuda process running; unfortunately, memory usage still grows over time... CPU usage of that app is ~40% of one core on a C2D at 3.4 GHz... Are Windows and Linux swapping roles now? ;-)

**koschi** (Joined: 14 Aug 08, Posts: 127, Credit: 913,858,161, RAC: 18)

I'm running 6.57 on a GPUTEST6 unit right now, with no problems. Load is at ~8% (normal on my system) and memory is stable. If I switch back to the SH2_USPME unit, I again get increasing memory usage and 40% load on one CPU core. Something is wrong either with the WUs or with the apps... My BOINC version is 6.5.0.

©2025 Universitat Pompeu Fabra