Memory leak in the 6.54_x86_64 for Linux?

Kokomiko
Joined: 18 Jul 08
Posts: 190
Credit: 24,093,690
RAC: 0
Message 4592 - Posted: 20 Dec 2008, 1:53:27 UTC

My Linux box has a problem with the 6.54_x86_64 application. 4 GB of RAM is not enough; other WUs are waiting for memory. The 6.54_x86_64 uses all of my RAM, and I found only 35 MB free. I never saw this problem with 6.53. After updating BOINC from 6.4.2 to 6.4.5 (I still miss a 6.5.0 build for 64-bit Linux) the failure has not reappeared so far; I'm still waiting and keeping an eye on it. On Vista 64 the 6.55 application uses only 35 MB.

[AF>Libristes>Jip] Elgrande71
Joined: 16 Jul 08
Posts: 45
Credit: 78,618,001
RAC: 0
Message 4606 - Posted: 20 Dec 2008, 11:22:05 UTC - in response to Message 4592.  

Same thing for me on GTX 280 and 8800 GTS 512 graphics cards.
I tried increasing the RAM settings, and that seems to fix the problem.
This problem occurred on computers based on a Q6600 (4 GB RAM) and a Celeron D420 (2 GB RAM).

Venturini Dario[VENETO]
Joined: 26 Jul 08
Posts: 44
Credit: 4,832,360
RAC: 0
Message 4657 - Posted: 21 Dec 2008, 8:51:09 UTC

In about 1 hour, my WU running 6.54 has grown from 50 MB to 180 MB, and it keeps growing.

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1958
Credit: 629,356
RAC: 0
Message 4659 - Posted: 21 Dec 2008, 9:02:24 UTC - in response to Message 4657.  

Try using 6.5.0.

It cannot be a memory leak in the application if it disappears when you change the BOINC version. The Linux version is now out as well.

gdf

koschi
Joined: 14 Aug 08
Posts: 127
Credit: 913,858,161
RAC: 18
Message 4661 - Posted: 21 Dec 2008, 11:30:44 UTC
Last modified: 21 Dec 2008, 11:33:27 UTC

They released 6.5.0 for Linux as a 32-bit build only; unfortunately there is no 64-bit build...

I have now found the same problem on one of my systems (BOINC 6.4.2 with acemd 6.54); memory usage is increasing pretty fast:
root@frickelbude:~# while true; do ps aux | grep acemd |grep -v grep; sleep 60; done
boinc    23292  8.2  2.8  81544 58488 ?        RNLl 12:04   0:19 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc    23292  8.2  3.1  87168 64180 ?        RNLl 12:04   0:24 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc    23292  8.2  3.4  93400 70300 ?        SNLl 12:04   0:29 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc    23292  8.2  3.6  98600 75576 ?        SNLl 12:04   0:34 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
^C
root@frickelbude:~# invoke-rc.d boinc-client restart
 * Stopping BOINC core client: boinc
   ...done.
 * Starting BOINC core client: boinc
   ...done.
 * Setting up scheduling for BOINC core client and children:
   ...done.
root@frickelbude:~# while true; do ps aux | grep acemd |grep -v grep; sleep 60; done
boinc    23802 42.0  1.8  61272 38172 ?        SNLl 12:11   0:00 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc    23802  8.6  2.0  64480 41464 ?        SNLl 12:11   0:05 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc    23802  8.4  2.2  70224 47152 ?        SNLl 12:11   0:10 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc    23802  8.3  2.5  75828 52840 ?        SNLl 12:11   0:15 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc    23802  8.3  2.8  81564 58536 ?        SNLl 12:11   0:20 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc    23802  8.2  3.1  87208 64228 ?        SNLl 12:11   0:24 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc    23802  8.2  3.4  93404 70304 ?        SNLl 12:11   0:29 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc    23802  8.2  3.6  98616 75608 ?        RNLl 12:11   0:34 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc    23802  8.2  3.9 104368 81304 ?        RNLl 12:11   0:39 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc    23802  8.2  4.2 110000 87004 ?        RNLl 12:11   0:44 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc    23802  8.2  4.5 115744 92696 ?        RNLl 12:11   0:49 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
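
The samples above can be boiled down to a growth rate. This is a minimal sketch, assuming the input is plain `ps aux` output in which column 6 is RSS in KiB and the samples are evenly spaced; the function name `rss_growth` is made up for illustration.

```shell
#!/bin/sh
# rss_growth: read `ps aux` lines on stdin and report how fast RSS grows.
# Assumes column 6 is RSS in KiB and that the samples are evenly spaced.
rss_growth() {
    awk '
        NR == 1 { first = $6 }   # RSS of the first sample
                { last = $6 }    # RSS of the most recent sample
        END {
            if (NR < 2) { print "need at least two samples"; exit 1 }
            printf "%d samples, RSS %d -> %d KiB, avg %.0f KiB per interval\n", NR, first, last, (last - first) / (NR - 1)
        }'
}

# Example: the first four samples of PID 23292 from this post (60 s apart).
rss_growth <<'EOF'
boinc    23292  8.2  2.8  81544 58488 ?        RNLl 12:04   0:19 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc    23292  8.2  3.1  87168 64180 ?        RNLl 12:04   0:24 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc    23292  8.2  3.4  93400 70300 ?        SNLl 12:04   0:29 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc    23292  8.2  3.6  98600 75576 ?        SNLl 12:04   0:34 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
EOF
# prints: 4 samples, RSS 58488 -> 75576 KiB, avg 5696 KiB per interval
```

With one sample per minute, that is about 5.6 MiB per minute, or roughly 330 MiB per hour if the growth stays linear.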


The system is the following:
http://www.sysprofile.de/id84658

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1958
Credit: 629,356
RAC: 0
Message 4666 - Posted: 21 Dec 2008, 12:11:58 UTC - in response to Message 4661.  

Which WU is that for?

gdf

koschi
Joined: 14 Aug 08
Posts: 127
Credit: 913,858,161
RAC: 18
Message 4671 - Posted: 21 Dec 2008, 12:33:20 UTC

mC16040-SH2_US-5-40-SH2_US1720000_0
http://www.gpugrid.net/result.php?resultid=173191

[AF>Libristes>Jip] Elgrande71
Joined: 16 Jul 08
Posts: 45
Credit: 78,618,001
RAC: 0
Message 4682 - Posted: 21 Dec 2008, 16:33:56 UTC - in response to Message 4671.  

Look at this type of WU, the host concerned, and the other one.
It's a bit frustrating.

Bok
Joined: 31 Oct 08
Posts: 10
Credit: 6,090,581
RAC: 0
Message 4684 - Posted: 21 Dec 2008, 18:33:36 UTC

I've got the same problem :(

this host

It just started a day or so ago.

koschi
Joined: 14 Aug 08
Posts: 127
Credit: 913,858,161
RAC: 18
Message 4686 - Posted: 21 Dec 2008, 19:06:32 UTC
Last modified: 21 Dec 2008, 19:19:10 UTC

I have two units on my dual-core machine: mC16040-SH2_US-5-40-SH2_US1720000 and ME12403-SH2_US-4-40-SH2_US950000.

Both are waiting for memory now, the first is stuck at 77%, the second at 35%...

I will cancel the 35% one and hope I'll get a WU of another type. We'll see if that makes the difference.

On my other computer a GPUTEST6 unit is running without any issues...

edit:

can't get a new one :-(
"not available for your type of computer, bla..."

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1958
Credit: 629,356
RAC: 0
Message 4687 - Posted: 21 Dec 2008, 19:30:28 UTC - in response to Message 4686.  

Please move to 6.4.5 and see if the problem disappears, as reported in the first post.
6.4.5 is safe now that the server bug has been fixed.
gdf

DeleteNull
Joined: 28 Aug 08
Posts: 10
Credit: 142,385,295
RAC: 0
Message 4688 - Posted: 21 Dec 2008, 19:40:20 UTC
Last modified: 21 Dec 2008, 19:45:33 UTC

I have the memory leak too.

My system: Intel Quad 6600, openSUSE 11.0 64-bit, BOINC 6.5.0 64-bit, NVIDIA 260.

In BOINC/slots/0 (the slot that acemd_6.54 is using) you will find two files of the same size (output.dcd and output.vel.dcd), and the size of each file is exactly the amount of memory the acemd_6.54 process is using.

Both files (and the process) are growing and growing....
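
One way to check this observation is to print the two file sizes next to the resident memory of the process. This is a rough sketch, assuming a POSIX-style shell with `wc` and `ps` available; the helper names (`file_kib`, `rss_kib`, `compare_slot`) and the slot path in the usage comment are placeholders, so adjust them to your installation.

```shell
#!/bin/sh
# Size of a file in KiB (rounded down). Uses wc so it does not depend on GNU stat.
file_kib() {
    bytes=$(wc -c < "$1") || return 1
    echo $(( bytes / 1024 ))
}

# Resident set size of a process in KiB, as reported by ps.
rss_kib() {
    ps -o rss= -p "$1" | tr -d ' '
}

# Print the sizes of the two trajectory files next to the RSS of the process.
# Arguments: slot directory, PID of the science application.
compare_slot() {
    slot_dir=$1
    pid=$2
    for f in output.dcd output.vel.dcd; do
        if [ -f "$slot_dir/$f" ]; then
            printf '%-16s %8d KiB\n' "$f" "$(file_kib "$slot_dir/$f")"
        fi
    done
    printf '%-16s %8d KiB\n' "RSS (pid $pid)" "$(rss_kib "$pid")"
}

# Hypothetical usage (path and process lookup are assumptions):
#   compare_slot /var/lib/boinc-client/slots/0 "$(pgrep -o acemd)"
```

Running it once a minute shows whether the files and the process really grow in lockstep or only start out at the same size.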

I don't have a Windows system (with CUDA), so I cannot compare this with the files on Windows...

koschi
Joined: 14 Aug 08
Posts: 127
Credit: 913,858,161
RAC: 18
Message 4689 - Posted: 21 Dec 2008, 20:04:55 UTC
Last modified: 21 Dec 2008, 20:18:36 UTC

I have upgraded to 6.4.5 now; the remaining task stays in "waiting for memory" mode, and again I got:

So 21 Dez 2008 20:48:57 CET|GPUGRID|Message from server: No work sent
So 21 Dez 2008 20:48:57 CET|GPUGRID|Message from server: Full-atom molecular dynamics for Cell processor is not available for your type of computer.
So 21 Dez 2008 20:48:57 CET|GPUGRID|Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.


My slot 14, which holds the WU stalled at 77%, is just 19 MB...

edit:

I'm going nuts: without any help from my side, the work unit has now resumed crunching. I wonder how, as there can hardly be more memory available than 30 minutes ago, when I restarted my system and the WU wouldn't start.
But again the app is eating up my memory:
root@frickelbude:/var/lib/boinc-client/slots# while true; do ps aux | grep acemd | grep -v grep; sleep 60;done
boinc    12414  9.1  2.9  83496 60488 ?        RNLl 21:10   0:24 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc    12414  8.8  3.2  89096 66128 ?        RNLl 21:10   0:29 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
boinc    12414  8.7  3.4  94748 71764 ?        RNLl 21:10   0:34 acemd_6.54_x86_64-pc-linux-gnu__cuda --device 0
root@frickelbude:/var/lib/boinc-client/slots# ps aux | head -1
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND

DeleteNull
Joined: 28 Aug 08
Posts: 10
Credit: 142,385,295
RAC: 0
Message 4690 - Posted: 21 Dec 2008, 20:36:22 UTC

...after a few minutes the files in the slot grow only very slowly, but the application keeps growing at the same speed as at the beginning.

(more than 1 GB)

koschi
Joined: 14 Aug 08
Posts: 127
Credit: 913,858,161
RAC: 18
Message 4724 - Posted: 22 Dec 2008, 10:58:06 UTC

This morning my quad was crunching an SH2_USPME-5 workunit (pN16075-SH2_USPME-5-40-SH2_USPME470000) and the memory usage was increasing again. I stopped it and started a GPUTEST6 unit (lY10341-GPUTEST6-1-20-acemd_0): the memory usage is stable, not a single KB more after several minutes... Then I stopped the GPUTEST6 unit and started an SH2_US one (to20339-SH2_US_1-5-40-SH2_US_1240000_0), and the memory usage immediately started growing...

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 4737 - Posted: 22 Dec 2008, 17:29:30 UTC
Last modified: 22 Dec 2008, 17:30:36 UTC

So we can say for certain that the Linux GPU client or driver has a problem with certain WUs, probably independent of the BOINC client?
How many credits do these WUs yield?

MrS
Scanning for our furry friends since Jan 2002

Venturini Dario[VENETO]
Joined: 26 Jul 08
Posts: 44
Credit: 4,832,360
RAC: 0
Message 4740 - Posted: 22 Dec 2008, 17:56:34 UTC

I had already upgraded to 6.4.5 when I posted in this thread about the memory leak.

Btw, I am out of work now, and not getting new WUs. Bah.

[AF>Libristes>Jip] Elgrande71
Joined: 16 Jul 08
Posts: 45
Credit: 78,618,001
RAC: 0
Message 4754 - Posted: 22 Dec 2008, 19:47:11 UTC - in response to Message 4740.  

Another WU with a "Maximum memory exceeded" error:
ZO25834-SH2_USPME_1-0-40-SH2_USPME_110000_0
Workunit 130854
Created 22 Dec 2008 10:31:25 UTC
Sent 22 Dec 2008 11:15:21 UTC
Received 22 Dec 2008 16:43:18 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status -177 (0xffffffffffffff4f)
Computer ID 15576
Report deadline 26 Dec 2008 11:15:21 UTC
CPU time 7496.045
stderr out

<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
Maximum memory exceeded
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce 8800 GTS 512"
# Clock rate: 1620000 kilohertz
# Number of multiprocessors: 16
# Number of cores: 128
MDIO ERROR: cannot open file "restart.coor"

</stderr_txt>
]]>
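
The two forms of the exit status in the result above are the same number: 0xffffffffffffff4f is -177 written as a 64-bit two's-complement value (177 is 0xb1, and 0x100 - 0xb1 = 0x4f). A quick check, assuming a shell whose `printf` accepts negative arguments for `%x` (bash does); `to_hex64` is just an illustrative name.

```shell
#!/bin/sh
# Print a signed exit status as 64-bit two's-complement hex,
# the same form the result page shows in parentheses.
to_hex64() {
    printf '0x%016x\n' "$1"
}

to_hex64 -177   # prints 0xffffffffffffff4f
```

So the hex value carries no extra information; the line to read is the "Maximum memory exceeded" message in the stderr output.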

Could it be fixed soon, please?

koschi
Joined: 14 Aug 08
Posts: 127
Credit: 913,858,161
RAC: 18
Message 4777 - Posted: 23 Dec 2008, 6:20:22 UTC
Last modified: 23 Dec 2008, 6:23:08 UTC

I have now found an acemd_6.57_x86_64-pc-linux-gnu__cuda process running; unfortunately memory usage still grows over time... CPU usage of that app is ~40% of one core on a 3.4 GHz C2D... Have Windows and Linux swapped roles now? ;-)

koschi
Joined: 14 Aug 08
Posts: 127
Credit: 913,858,161
RAC: 18
Message 4794 - Posted: 23 Dec 2008, 13:41:23 UTC

I'm running 6.57 on a GPUTEST6 unit right now with no problems. Load is at ~8% (normal on my system) and memory is stable.
If I switch back to the SH2_USPME unit, memory usage starts increasing again and I get 40% load on one CPU core.

There is something wrong with either the WUs or the apps...
My BOINC version is 6.5.0.