GA: information and issues
| Author | Message |
|---|---|
|
Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0
|
GA10R failed after 7 min. Ton (ftpd) Netherlands |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
|
Stoneageman Joined: 25 May 09 Posts: 224 Credit: 34,057,374,498 RAC: 0
|
GA11R....two failed & two completed ok. Looks like there's still an issue! |
|
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
Sorry guys, GA is making me sweat too... However for now I am not aware of mistakes in GA11. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 2
|
Add another exit code 1 (0x1) to the collection: f196r4-TONI_GA11R-0-1-RND1898_0 Edit - And a 'ERROR: file tclutil.cpp line 31: get_Dvec() element 0 (b) ': f136r9-TONI_GA11R-0-1-RND4524_0 |
|
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
Hi Richard, could it be that you are getting errors on host 45218 since you upgraded from 6.10.48 to 6.10.51? |
|
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
Stopped all suspicious GAxx. There is a small number of GAUS1 out that should run fine, except they produce large uploads. A batch of GAUS2 should work well. Thanks for all of your reports. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 2
|
Hi Richard, Interesting question. True, but I don't think you can claim "cause and effect". I upgraded host 43404 to BOINC v6.10.51 at the same time. That host hasn't thrown any errors yet - but then, it hasn't been issued any GA11R tasks either. The other difference between the hosts is the driver: 43404 (factory overclocked 9800GTX+, no errors at the moment) is running NVidia drivers 190.38, while on host 45218 (stock speed 9800GT, errors on GA11R) I upgraded from 197.13 to 197.45 in the same session as I installed v6.10.51. (Both 197 drivers have difficulty holding my 1600 x 1200 resolution when I switch the DVI KVM to another host.) I'm active in BOINC development testing, and I'm not aware of any changes in v6.10.51 that could cause application errors - if anything, the 197 drivers might be more of a problem, because (at least as reported by BOINC) they leave less GPU RAM available for apps to use. Both hosts are currently running TONI_GAUS1 tasks (do they count as 'GA' for the purposes of this thread?): 43404 is at 15%, 45218 is at 65%. That'll be the first head-to-head comparison between the two hosts - results in a few hours. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 2
|
Uh oh. "exit code 1 (0x1)" on f5r6-TONI_GAUS1-0-50-RND5224_1 - that's on 43404, the 9800 GTX+ that was OK overnight. |
|
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
Yes, the driver is more likely to be the culprit than the BOINC version. (And I would also be cautious about cause and effect.) Your hosts make an interesting pair. So, if I understand correctly: 43404, factory OC, v190.38: errors; 45218, was 197.13, now 197.45: errors after the upgrade (but it did not crunch GA before that). Is that correct? |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 2
|
> Yes the driver is more likely to be a culprit than the BOINC v. (And I would be also cautious in cause-effects). I wouldn't put it that way. 43404: factory oc -> v 190.38, 1 error with GAUS1, success with CAPBIND, HERG and pTEEI. Also succeeded with a GA10F and a GA8F, errored with another GA8F. 45218: fair comment, but it has no GA tasks shown from before the upgrade, ONLY (so far) GA11R tasks since the upgrade. PS - I have a third 'control' host, 43362, with a non-overclocked 9800GT, BOINC v6.10.36, 190.38 driver. But it hasn't got any GA tasks yet.... Got to go out now, will review outcomes when I get back. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
I'm also getting errors on my system with four GT240's. Two things are interesting: - they are all TONI_GA11R-0-1-RND - the system picked up a few Beta 6.22 tasks. Although these all ran successfully, perhaps they interfered with the GA11R tasks in some way. I say this because I saw the same pattern a few days ago; Betas OK but other WU's failed. The failures: 2266574 1428922 3 May 2010 13:02:26 UTC 3 May 2010 15:17:42 UTC Error while computing 4,354.23 518.53 3,429.72 --- ACEMD - GPU molecular dynamics v6.03 (cuda); 2265705 1429788 3 May 2010 21:25:49 UTC 4 May 2010 0:32:03 UTC Error while computing 827.67 108.02 3,429.72 --- ACEMD - GPU molecular dynamics v6.03 (cuda); 2271301 1429669 4 May 2010 1:32:37 UTC 4 May 2010 6:44:26 UTC Error while computing 18,111.57 2,181.78 3,429.72 --- ACEMD - GPU molecular dynamics v6.03 (cuda). Task names: f184r5-TONI_GA11R-0-1-RND4020_2 f123r4-TONI_GA11R-0-1-RND7715_1 f193r9-TONI_GA11R-0-1-RND1232_0 (6.10.45, 19745, 3 of the cards on this system are OC'd but complete other tasks OK) |
|
Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0
|
f199r1-TONI_GA11R-0-1-RND7585_1 Workunit 1429855 Created 4 May 2010 5:26:42 UTC Sent 4 May 2010 5:40:45 UTC Received 4 May 2010 14:56:44 UTC Server state Over Outcome Success Client state None Exit status 0 (0x0) Computer ID 47762 Report deadline 9 May 2010 5:40:45 UTC Run time 14756.328125 CPU time 1624.516 stderr out <core_client_version>6.10.50</core_client_version> <![CDATA[ <stderr_txt> # There are 2 devices supporting CUDA # Device 0: "GeForce GTX 295" # Clock rate: 1.24 GHz # Total amount of global memory: 939327488 bytes # Number of multiprocessors: 30 # Number of cores: 240 # Device 1: "GeForce GTX 295" # Clock rate: 1.24 GHz # Total amount of global memory: 939196416 bytes # Number of multiprocessors: 30 # Number of cores: 240 MDIO ERROR: cannot open file "restart.coor" # Time per step: 29.504 ms # Approximate elapsed time for entire WU: 14751.813 s called boinc_finish </stderr_txt> ]]> This one is OK with driver 197.45, Windows XP Pro and BOINC 6.10.50. Ton (ftpd) Netherlands |
|
Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0
|
f8r6-TONI_GAUS1-0-50-RND3701_2 Workunit 1431573 Created 4 May 2010 5:33:57 UTC Sent 4 May 2010 5:40:45 UTC Received 4 May 2010 15:00:57 UTC Server state Over Outcome Success Client state None Exit status 0 (0x0) Computer ID 47762 Report deadline 9 May 2010 5:40:45 UTC Run time 18665.984375 CPU time 2017.328 stderr out <core_client_version>6.10.50</core_client_version> <![CDATA[ <stderr_txt> # There are 2 devices supporting CUDA # Device 0: "GeForce GTX 295" # Clock rate: 1.24 GHz # Total amount of global memory: 939327488 bytes # Number of multiprocessors: 30 # Number of cores: 240 # Device 1: "GeForce GTX 295" # Clock rate: 1.24 GHz # Total amount of global memory: 939196416 bytes # Number of multiprocessors: 30 # Number of cores: 240 MDIO ERROR: cannot open file "restart.coor" # Time per step: 28.732 ms # Approximate elapsed time for entire WU: 18675.547 s called boinc_finish </stderr_txt> ]]> This one is also OK, only 60.13 MB upload, same driver and BOINC version. Ton (ftpd) Netherlands |
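For anyone comparing many of these stderr dumps by hand, here is a rough Python sketch that pulls the device list and timing lines out of one. The function name `summarize_stderr` and the dictionary keys are made up for illustration; the regular expressions only assume the line layout seen in the two results posted above and may not match other application versions.

```python
import re

# stderr text copied from the GAUS1 result posted above.
STDERR_SAMPLE = """\
# There are 2 devices supporting CUDA
# Device 0: "GeForce GTX 295"
# Clock rate: 1.24 GHz
# Total amount of global memory: 939327488 bytes
# Number of multiprocessors: 30
# Number of cores: 240
# Device 1: "GeForce GTX 295"
# Clock rate: 1.24 GHz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 30
# Number of cores: 240
MDIO ERROR: cannot open file "restart.coor"
# Time per step: 28.732 ms
# Approximate elapsed time for entire WU: 18675.547 s
called boinc_finish
"""


def summarize_stderr(text):
    """Extract detected devices and timing figures from one stderr dump."""
    devices = re.findall(r'# Device (\d+): "([^"]+)"', text)
    step = re.search(r"# Time per step: ([\d.]+) ms", text)
    total = re.search(r"# Approximate elapsed time for entire WU: ([\d.]+) s", text)
    return {
        "devices": [(int(index), name) for index, name in devices],
        "ms_per_step": float(step.group(1)) if step else None,
        "elapsed_s": float(total.group(1)) if total else None,
        # The application prints this line when it exits normally.
        "finished": "called boinc_finish" in text,
    }


summary = summarize_stderr(STDERR_SAMPLE)
print(summary)
```

Note that this only recovers the devices the application detected. Nothing in the output shown says which of the two devices actually ran the task.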
CNT - IQE Joined: 21 Sep 09 Posts: 3 Credit: 1,951,950,972 RAC: 0
|
Hi, is it possible to reactivate the log information about the GPU? We want to know which card a WU is crunched on. We have a 4x295 system, and now, if there is an errored WU, we don't know which card is bad. The old log had information like: WU is started on GPU 4... Computer: http://www.gpugrid.net/show_host_detail.php?hostid=59988 OS: Ubuntu 9.10 64b Drivers: 195.36.15 It's only TONI-GA WUs. Error WUs in the last 7 days: http://www.gpugrid.net/result.php?resultid=2268379 http://www.gpugrid.net/result.php?resultid=2268375 http://www.gpugrid.net/result.php?resultid=2265727 http://www.gpugrid.net/result.php?resultid=2265544 http://www.gpugrid.net/result.php?resultid=2265061 http://www.gpugrid.net/result.php?resultid=2264541 http://www.gpugrid.net/result.php?resultid=2241247 http://www.gpugrid.net/result.php?resultid=2241094 http://www.gpugrid.net/result.php?resultid=2240609 http://www.gpugrid.net/result.php?resultid=2240370 http://www.gpugrid.net/result.php?resultid=2240225 http://www.gpugrid.net/result.php?resultid=2228009 |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 2
|
Uh oh. "exit code 1 (0x1)" on f5r6-TONI_GAUS1-0-50-RND5224_1 - that's on 43404, the 9800 GTX+ that was OK overnight. But it was followed immediately by SUCCESS with f92r9-TONI_GAUS1-0-50-RND9426_1. You're not wrong about that upload! |
K1atOdessa Joined: 25 Feb 08 Posts: 249 Credit: 444,646,963 RAC: 0
|
Is it possible to reactivate log information about GPU? We want to know, on which card is WU crunched. Definitely would like to have that information in the WU's again. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 2
|
Latest conundrum: f10r3-TONI_GAUS2-0-50-RND3526_0 FAILED on stock 9800GT f79r2-TONI_GAUS2-0-50-RND5339_1 SUCCESS on overclocked 9800GTX+ |
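To keep track of what has been reported so far, here is a small Python sketch that tallies the outcomes quoted in this thread by batch. The name pattern is only a guess from the task names posted here (prefix, `TONI_<batch>`, two numbers, `RND` digits, a numeric suffix), and `batch_of` and `tally` are made-up helper names. The list covers only tasks that were named with an outcome in the posts above, so it is a small, self-selected sample.

```python
import re
from collections import Counter

# (task name, outcome) pairs as reported in posts in this thread.
REPORTS = [
    ("f196r4-TONI_GA11R-0-1-RND1898_0", "error"),
    ("f136r9-TONI_GA11R-0-1-RND4524_0", "error"),
    ("f184r5-TONI_GA11R-0-1-RND4020_2", "error"),
    ("f123r4-TONI_GA11R-0-1-RND7715_1", "error"),
    ("f193r9-TONI_GA11R-0-1-RND1232_0", "error"),
    ("f199r1-TONI_GA11R-0-1-RND7585_1", "success"),
    ("f5r6-TONI_GAUS1-0-50-RND5224_1", "error"),
    ("f8r6-TONI_GAUS1-0-50-RND3701_2", "success"),
    ("f92r9-TONI_GAUS1-0-50-RND9426_1", "success"),
    ("f10r3-TONI_GAUS2-0-50-RND3526_0", "error"),
    ("f79r2-TONI_GAUS2-0-50-RND5339_1", "success"),
]

# Assumed layout: <prefix>-TONI_<batch>-<n>-<m>-RND<digits>_<suffix>
NAME_RE = re.compile(r"^[^-]+-TONI_(?P<batch>[^-]+)-\d+-\d+-RND\d+_\d+$")


def batch_of(task_name):
    """Return the batch label (e.g. 'GA11R'), or None if the name does not fit."""
    match = NAME_RE.match(task_name)
    return match.group("batch") if match else None


def tally(reports):
    """Count reports per (batch, outcome) pair."""
    counts = Counter()
    for name, outcome in reports:
        counts[(batch_of(name), outcome)] += 1
    return counts


counts = tally(REPORTS)
for (batch, outcome), n in sorted(counts.items()):
    print(batch, outcome, n)
```

With this few reports the counts are only a bookkeeping aid; they don't show whether the driver, the BOINC version or the batch itself is responsible.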
|
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
> Definitely would like to have that information in the WU's again. It should be restored in some coming update in fact. |
K1atOdessa Joined: 25 Feb 08 Posts: 249 Credit: 444,646,963 RAC: 0
|
> Definitely would like to have that information in the WU's again. Thanks for the update. |