New NOELIA Longruns
Retvari Zoltan · Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0
BTW, these "fixed" NOELIA tasks are running fine on all of my hosts. One of these workunits was stuck on my GTX590 for 7 hours; the progress indicator had not increased since my previous post. A system restart helped. Bye-bye 24-hour bonus...
dskagcommunity · Joined: 28 Apr 11 · Posts: 463 · Credit: 958,266,958 · RAC: 34
Soon my first fixed NOELIA unit will be finished on cuda31. Unfortunately, it is the first WU on my GTX 285 that needs more than 24 hours (25 hours ;)) to compute :( Bye-bye 24-hour bonus... But it seems to work here (this one, anyway; we will see with the next ones ^^). DSKAG Austria Research Team: http://www.research.dskag.at
ritterm · Joined: 31 Jul 09 · Posts: 88 · Credit: 244,413,897 · RAC: 0
And another one bites the dust after almost 7 hours: run9_replica14-NOELIA_sh2fragment_fixed-1-4-RND1629_2. Its Stderr output includes: "SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574. Assertion failed: a, file swanlibnv2.cpp, line 59"
dskagcommunity · Joined: 28 Apr 11 · Posts: 463 · Credit: 958,266,958 · RAC: 34

Joined: 24 Dec 08 · Posts: 738 · Credit: 200,909,904 · RAC: 0
> Soon my first fixed NOELIA unit will be finished on cuda31. Unfortunately, it is the first WU on my GTX 285 that needs more than 24 hours (25 hours ;)) to compute :( Bye-bye 24-hour bonus... But it seems to work here (this one, anyway; we will see with the next ones ^^)

Dskagcommunity, you might want to update to the 301.42 drivers. They should still work on your GTX 285, and that way you can get the speed advantages of the CUDA 4.2 app. BOINC blog
dskagcommunity · Joined: 28 Apr 11 · Posts: 463 · Credit: 958,266,958 · RAC: 34
I downgraded them because the 4.2 app runs much slower on the GTX 285 (up to 15,000 s!). It also caused some WUs to error out, and I don't want to change any of the stock-clocked card settings. So I prefer to compute one sort of WU in over 24 hours while the rest compute reliably and in good times.
Joined: 24 Dec 08 · Posts: 738 · Credit: 200,909,904 · RAC: 0
> I downgraded them because the 4.2 app runs much slower on the GTX 285 (up to 15,000 s!). It also caused some WUs to error out, and I don't want to change any of the stock-clocked card settings. So I prefer to compute one sort of WU in over 24 hours while the rest compute reliably and in good times.

I wonder if you were getting the downclock bug, which only appears when the cards are running hot (but not overheating). It's supposedly fixed in the 304 (beta) drivers, but I have heard of issues running other projects' apps (SETI) with the 304 drivers. What temperatures does your GTX 285 run at under load?
dskagcommunity · Joined: 28 Apr 11 · Posts: 463 · Credit: 958,266,958 · RAC: 34
79-80 °C, in a room with AC and with a manually set fan speed, because otherwise it would run at 90-92 °C, which I found a little too much for 24h operation. I don't install beta things. If there is a downclocking problem, then I hope 304 is available as a stable release soon ^^ Someone told me to raise the GPU voltage slightly to get rid of the errors, but I don't want to touch these settings because I have only had bad experiences with such things. Either it runs as it comes from the factory or not ;) That's the reason I don't buy any OC cards.
Joined: 24 Dec 08 · Posts: 738 · Credit: 200,909,904 · RAC: 0
> 79-80 °C, in a room with AC and with a manually set fan speed, because otherwise it would run at 90-92 °C, which I found a little too much for 24h operation. I don't install beta things. If there is a downclocking problem, then I hope 304 is available as a stable release soon ^^ Someone told me to raise the GPU voltage slightly to get rid of the errors, but I don't want to touch these settings because I have only had bad experiences with such things. Either it runs as it comes from the factory or not ;) That's the reason I don't buy any OC cards.

Yep, that would be enough to trip the overheating bug. Anyway, at least you know why 301.42 doesn't work well for your config. Hopefully they'll sort out the issues with 304 and get a good one out the door soon.
Joined: 21 Mar 10 · Posts: 23 · Credit: 861,667,631 · RAC: 0
My system is currently processing WU 3601638, which is now at 41 hours elapsed and 61% complete. It looks likely to take 2 to 2.5 times longer than most units my system has worked on recently. Should I let this WU run to completion, or does it indicate a problem? It seems that most WUs of this type had problems with immediate failures. I don't know whether this is also a problem with the cuda31 app on my machine (which seems to have been handling cuda42 work properly).
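One rough way to judge whether a slow task is still worth running is to extrapolate its total run time from the BOINC progress indicator. The sketch below assumes progress is linear in elapsed time (a downclock or a restart from an older checkpoint would break that), and `projected_total_hours` and `remaining_hours` are hypothetical helper names, not part of BOINC:

```python
def projected_total_hours(elapsed_h: float, fraction_done: float) -> float:
    """Linearly extrapolate total run time from elapsed time and progress.

    Assumption: progress accrues at a constant rate over the whole run.
    """
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_h / fraction_done


def remaining_hours(elapsed_h: float, fraction_done: float) -> float:
    """Hours still to go under the same constant-rate assumption."""
    return projected_total_hours(elapsed_h, fraction_done) - elapsed_h


# Figures from the post: 41 hours elapsed at 61% progress.
total = projected_total_hours(41.0, 0.61)
left = remaining_hours(41.0, 0.61)
print(f"projected total: {total:.1f} h, remaining: {left:.1f} h")
# prints: projected total: 67.2 h, remaining: 26.2 h
```

For this WU the extrapolation gives roughly 67 hours in total, which is close to the run time reported for the finished task in the reply below (240,549.78 s, about 66.8 hours).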
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
Presumably you mean this task:

| Task | Work unit | Sent | Time reported | Status | Run time (s) | CPU time (s) | Credit | Application |
|---|---|---|---|---|---|---|---|---|
| run10_replica19-NOELIA_sh2fragment_fixed-0-4-RND1077_1 | 3601638 | 1 Aug 2012, 14:31:44 UTC | 4 Aug 2012, 22:41:22 UTC | Completed and validated | 240,549.78 | 7,615.19 | 67,500.00 | Long runs (8-12 hours on fastest card) v6.16 (cuda31) |

There is probably more than one thing affecting performance in this case. As you say, it ran under the 3.1 app, which is around 50% slower for most tasks. Still, 2.8 days on a GTX 550 Ti is too long. Although the 3.1 tasks report a Stderr output file, it doesn't show anything interesting in this case. Stderr output:

    <core_client_version>6.12.34</core_client_version>
    <![CDATA[
    <stderr_txt>
    # Using device 0
    # There is 1 device supporting CUDA
    # Device 0: "GeForce GTX 550 Ti"
    # Clock rate: 1.80 GHz
    # Total amount of global memory: 1072889856 bytes
    # Number of multiprocessors: 4
    # Number of cores: 32
    MDIO: cannot open file "restart.coor"
    # Using device 0
    # There is 1 device supporting CUDA
    # Device 0: "GeForce GTX 550 Ti"
    # Clock rate: 1.80 GHz
    # Total amount of global memory: 1072889856 bytes
    # Number of multiprocessors: 4
    # Number of cores: 32
    # Time per step (avg over 1930000 steps): 48.186 ms
    # Approximate elapsed time for entire WU: 240929.858 s
    18:17:15 (28048): called boinc_finish
    </stderr_txt>
    ]]>

The task was stopped just once during the run (by a system restart, for example). I don't know what the expected 'Time per step' values are for these tasks on a similar card. My guess is that your GPU downclocked for a period during the run and ran at 100 MHz for a while, and maybe ran at normal clock rates again after the restart. Increasing the GPU fan speed can sometimes prevent downclocking. Another possibility is that your CPU (a dual-core Opteron) was struggling; I think the 3.1 app is more demanding of the CPU, which isn't really high end. I think your system also uses DDR2, which can cause some performance loss (visible as lower GPU utilization).
If you are also running CPU tasks, one would be fine, but two would result in poor GPU performance (unless you redefined nice values). Just running these tasks on the new app seems to avoid this problem: your card ran a similar NOELIA_sh2fragment_fixed task on the cuda42 app in almost a third of the time.
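The two timing lines at the end of the Stderr output above appear to be related by simple multiplication (estimated WU time ≈ time per step × total steps), which lets you back out how many steps the WU contains. That relation is an inference from the numbers, not documented ACEMD behaviour, and `parse_timing` is a hypothetical helper:

```python
import re

# Excerpt of the Stderr output quoted in the post above.
STDERR = """\
# Time per step (avg over 1930000 steps): 48.186 ms
# Approximate elapsed time for entire WU: 240929.858 s
"""


def parse_timing(text: str) -> tuple[float, float]:
    """Return (ms per step, estimated WU seconds) from the stderr timing lines."""
    step = re.search(r"Time per step \(avg over \d+ steps\): ([\d.]+) ms", text)
    total = re.search(r"elapsed time for entire WU: ([\d.]+) s", text)
    if step is None or total is None:
        raise ValueError("timing lines not found")
    return float(step.group(1)), float(total.group(1))


ms_per_step, est_seconds = parse_timing(STDERR)
# Assumption: estimate = seconds per step * total steps in the WU.
implied_steps = est_seconds / (ms_per_step / 1000.0)
print(f"implied WU length: ~{round(implied_steps, -3):,.0f} steps")
print(f"estimate: {est_seconds / 86400:.2f} days")
# prints:
# implied WU length: ~5,000,000 steps
# estimate: 2.79 days
```

Under that assumption the WU is about 5,000,000 steps long, and at 48.186 ms per step it needs about 2.79 days, consistent with the 2.8 days observed for this task.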
©2025 Universitat Pompeu Fabra