GA: information and issues

Message boards : Graphics cards (GPUs) : GA: information and issues

ftpd
Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Message 16740 - Posted: 2 May 2010, 11:22:42 UTC

GA10R failed after 7 min.


Ton (ftpd) Netherlands

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 16743 - Posted: 2 May 2010, 12:22:48 UTC - in response to Message 16740.  

f114r5-TONI_GA10R-0-1-RND7226

Too many errors (may have bug)


Stoneageman
Joined: 25 May 09
Posts: 224
Credit: 34,057,374,498
RAC: 0
Message 16780 - Posted: 3 May 2010, 13:52:58 UTC - in response to Message 16743.  

GA11R: two failed and two completed OK. Looks like there's still an issue!

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 16781 - Posted: 3 May 2010, 14:10:06 UTC - in response to Message 16780.  
Last modified: 3 May 2010, 14:19:44 UTC

Sorry, guys, GA is making me sweat too. However, for now I am not aware of any mistakes in GA11.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 2
Message 16794 - Posted: 3 May 2010, 22:07:09 UTC
Last modified: 3 May 2010, 22:09:26 UTC

Add another exit code 1 (0x1) to the collection:

f196r4-TONI_GA11R-0-1-RND1898_0

Edit: and an 'ERROR: file tclutil.cpp line 31: get_Dvec() element 0 (b)':

f136r9-TONI_GA11R-0-1-RND4524_0

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 16797 - Posted: 4 May 2010, 9:13:52 UTC - in response to Message 16794.  

Hi Richard,

Could it be that you have been getting errors on host 45218 since you upgraded from 6.10.48 to 6.10.51?

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 16798 - Posted: 4 May 2010, 9:16:41 UTC - in response to Message 16797.  
Last modified: 4 May 2010, 9:22:29 UTC

I have stopped all suspicious GAxx batches. A small number of GAUS1 tasks are still out; they should run fine, except that they produce large uploads. A batch of GAUS2 should work well. Thanks for all of your reports.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 2
Message 16799 - Posted: 4 May 2010, 9:37:00 UTC - in response to Message 16797.  

> Hi Richard,
>
> Could it be that you have been getting errors on host 45218 since you upgraded from 6.10.48 to 6.10.51?

Interesting question. True, but I don't think you can claim "cause and effect".

I upgraded host 43404 to BOINC v6.10.51 at the same time. That host hasn't thrown any errors yet - but then, it hasn't been issued any GA11R tasks either.

The other difference between the hosts is that 43404 (factory-overclocked 9800GTX+, no errors at the moment) is running NVIDIA drivers 190.38, while on host 45218 (stock-speed 9800GT, errors on GA11R) I upgraded from 197.13 to 197.45 in the same session as I installed v6.10.51. (Both 197 drivers have difficulty holding my 1600 x 1200 resolution when I switch the DVI KVM to another host.) I'm active in BOINC development testing, and I'm not aware of any changes in v6.10.51 that could cause application errors. If anything, the 197 drivers might be more of a problem, because (at least as reported by BOINC) they leave less GPU RAM available for apps to use.

Both hosts are currently running TONI_GAUS1 tasks (do they count as 'GA' for the purposes of this thread?): 43404 is at 15%, 45218 is at 65%. That'll be the first head-to-head comparison between the two hosts, with results in a few hours.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 2
Message 16800 - Posted: 4 May 2010, 9:51:24 UTC

Uh oh. "exit code 1 (0x1)" on f5r6-TONI_GAUS1-0-50-RND5224_1 - that's on 43404, the 9800 GTX+ that was OK overnight.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 16801 - Posted: 4 May 2010, 10:11:45 UTC - in response to Message 16800.  

Yes, the driver is more likely to be the culprit than the BOINC version (and I would also be cautious about cause and effect).

Your hosts make an interesting pair. So, if I understand correctly:

43404, factory OC -> v 190.38, errors
45218 was 197.13, now 197.45, errors after upgrade (but did not crunch GA before it)

Is that correct?

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 2
Message 16802 - Posted: 4 May 2010, 10:22:01 UTC - in response to Message 16801.  
Last modified: 4 May 2010, 10:32:32 UTC

> Yes, the driver is more likely to be the culprit than the BOINC version (and I would also be cautious about cause and effect).
>
> Your hosts make an interesting pair. So, if I understand correctly:
>
> 43404, factory OC -> v 190.38, errors
> 45218 was 197.13, now 197.45, errors after upgrade (but did not crunch GA before it)
>
> Is that correct?

I wouldn't put it that way.

43404, factory OC -> v 190.38: 1 error with GAUS1; successes with CAPBIND, HERG and pTEEI. It also succeeded with a GA10F and a GA8F, but errored with another GA8F.

45218: fair comment, but it has no GA tasks shown from before the upgrade, ONLY (so far) GA11R tasks since the upgrade.

PS: I have a third 'control' host, 43362, with a non-overclocked 9800GT, BOINC v6.10.36 and the 190.38 driver. But it hasn't received any GA tasks yet...

Got to go out now, will review outcomes when I get back.

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 16803 - Posted: 4 May 2010, 11:24:17 UTC - in response to Message 16802.  
Last modified: 4 May 2010, 11:32:57 UTC

I'm also getting errors on my system with four GT240s.

Two things are interesting:
- they are all TONI_GA11R-0-1-RND tasks
- the system picked up a few Beta 6.22 tasks. Although these all ran successfully, perhaps they interfered with the GA11R tasks in some way.

I say this because I saw the same pattern a few days ago: Betas OK, but other WUs failed.


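The failures (columns: task ID, workunit ID, sent, reported, status, run time (s), CPU time (s), claimed credit, granted credit, application):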
2266574 1428922 3 May 2010 13:02:26 UTC 3 May 2010 15:17:42 UTC Error while computing 4,354.23 518.53 3,429.72 --- ACEMD - GPU molecular dynamics v6.03 (cuda)
2265705 1429788 3 May 2010 21:25:49 UTC 4 May 2010 0:32:03 UTC Error while computing 827.67 108.02 3,429.72 --- ACEMD - GPU molecular dynamics v6.03 (cuda)
2271301 1429669 4 May 2010 1:32:37 UTC 4 May 2010 6:44:26 UTC Error while computing 18,111.57 2,181.78 3,429.72 --- ACEMD - GPU molecular dynamics v6.03 (cuda)

f184r5-TONI_GA11R-0-1-RND4020_2
f123r4-TONI_GA11R-0-1-RND7715_1
f193r9-TONI_GA11R-0-1-RND1232_0

(BOINC 6.10.45, driver 197.45; 3 of the cards on this system are OC'd but complete other tasks OK)

ftpd
Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Message 16814 - Posted: 4 May 2010, 14:58:26 UTC
Last modified: 4 May 2010, 15:00:00 UTC

f199r1-TONI_GA11R-0-1-RND7585_1
Workunit 1429855
Created 4 May 2010 5:26:42 UTC
Sent 4 May 2010 5:40:45 UTC
Received 4 May 2010 14:56:44 UTC
Server state Over
Outcome Success
Client state None
Exit status 0 (0x0)
Computer ID 47762
Report deadline 9 May 2010 5:40:45 UTC
Run time 14756.328125
CPU time 1624.516
stderr out <core_client_version>6.10.50</core_client_version>
<![CDATA[
<stderr_txt>
# There are 2 devices supporting CUDA
# Device 0: "GeForce GTX 295"
# Clock rate: 1.24 GHz
# Total amount of global memory: 939327488 bytes
# Number of multiprocessors: 30
# Number of cores: 240
# Device 1: "GeForce GTX 295"
# Clock rate: 1.24 GHz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 30
# Number of cores: 240
MDIO ERROR: cannot open file "restart.coor"
# Time per step: 29.504 ms
# Approximate elapsed time for entire WU: 14751.813 s
called boinc_finish

</stderr_txt>
]]>

This one is OK with driver 197.45, Windows XP Pro and BOINC 6.10.50.
Ton (ftpd) Netherlands
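The stderr above has a simple line-oriented layout: one `# Device N: "name"` header per CUDA device, followed by `# Key: value` property lines, then the run timings. As a minimal sketch (a hypothetical helper, not part of ACEMD or BOINC, and assuming the layout stays exactly as quoted here), it can be split into per-device properties and timings with two regular expressions, which makes comparing hosts side by side easier. Note that it can only recover what the log prints: every CUDA device in the host, not the one the task actually ran on.

```python
import re

# Hypothetical helper: the patterns are based only on the stderr layout
# quoted in this thread, not on any documented ACEMD log format.
DEVICE_RE = re.compile(r'^# Device (\d+): "([^"]+)"$')
PROP_RE = re.compile(r"^# ([^:]+): (.+)$")


def parse_acemd_stderr(text):
    """Return (devices, timings) parsed from an ACEMD stderr_txt block."""
    devices = {}  # device index -> {"name": ..., property: value}
    timings = {}  # "ms_per_step" and "elapsed_s", if present
    current = None  # index of the device whose properties are being read
    for raw in text.splitlines():
        line = raw.strip()
        m = DEVICE_RE.match(line)
        if m:
            current = int(m.group(1))
            devices[current] = {"name": m.group(2)}
            continue
        m = PROP_RE.match(line)
        if not m:
            continue  # e.g. "MDIO ERROR: ..." or "called boinc_finish"
        key, value = m.group(1), m.group(2)
        if key == "Time per step":
            timings["ms_per_step"] = float(value.split()[0])
        elif key == "Approximate elapsed time for entire WU":
            timings["elapsed_s"] = float(value.split()[0])
        elif current is not None:
            devices[current][key] = value
    return devices, timings


# The stderr_txt body from the post above.
SAMPLE = """# There are 2 devices supporting CUDA
# Device 0: "GeForce GTX 295"
# Clock rate: 1.24 GHz
# Total amount of global memory: 939327488 bytes
# Number of multiprocessors: 30
# Number of cores: 240
# Device 1: "GeForce GTX 295"
# Clock rate: 1.24 GHz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 30
# Number of cores: 240
MDIO ERROR: cannot open file "restart.coor"
# Time per step: 29.504 ms
# Approximate elapsed time for entire WU: 14751.813 s
called boinc_finish"""

if __name__ == "__main__":
    devices, timings = parse_acemd_stderr(SAMPLE)
    for idx, props in sorted(devices.items()):
        print(idx, props["name"], props["Total amount of global memory"])
    print(timings)
```

The same function can be fed the second GTX 295 log below to compare time per step between the GA11R and GAUS1 tasks.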

ftpd
Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Message 16815 - Posted: 4 May 2010, 15:03:29 UTC

f8r6-TONI_GAUS1-0-50-RND3701_2
Workunit 1431573
Created 4 May 2010 5:33:57 UTC
Sent 4 May 2010 5:40:45 UTC
Received 4 May 2010 15:00:57 UTC
Server state Over
Outcome Success
Client state None
Exit status 0 (0x0)
Computer ID 47762
Report deadline 9 May 2010 5:40:45 UTC
Run time 18665.984375
CPU time 2017.328
stderr out <core_client_version>6.10.50</core_client_version>
<![CDATA[
<stderr_txt>
# There are 2 devices supporting CUDA
# Device 0: "GeForce GTX 295"
# Clock rate: 1.24 GHz
# Total amount of global memory: 939327488 bytes
# Number of multiprocessors: 30
# Number of cores: 240
# Device 1: "GeForce GTX 295"
# Clock rate: 1.24 GHz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 30
# Number of cores: 240
MDIO ERROR: cannot open file "restart.coor"
# Time per step: 28.732 ms
# Approximate elapsed time for entire WU: 18675.547 s
called boinc_finish

</stderr_txt>
]]>


This one is also OK, with only a 60.13 MB upload, on the same driver and BOINC version.
Ton (ftpd) Netherlands

CNT - IQE
Joined: 21 Sep 09
Posts: 3
Credit: 1,951,950,972
RAC: 0
Message 16820 - Posted: 4 May 2010, 17:56:35 UTC
Last modified: 4 May 2010, 18:42:52 UTC

Hi,

Is it possible to reactivate the GPU information in the log?

We want to know which card a WU was crunched on.

We have a system with four GTX 295s, and now, if a WU errors out, we don't know which card is bad.

The old log contained information like:

WU is started on GPU 4...

Computer:

http://www.gpugrid.net/show_host_detail.php?hostid=59988

OS: Ubuntu 9.10 64b
Drivers: 195.36.15


It's only TONI-GA WUs.
Errored WUs in the last 7 days:

http://www.gpugrid.net/result.php?resultid=2268379
http://www.gpugrid.net/result.php?resultid=2268375
http://www.gpugrid.net/result.php?resultid=2265727
http://www.gpugrid.net/result.php?resultid=2265544
http://www.gpugrid.net/result.php?resultid=2265061
http://www.gpugrid.net/result.php?resultid=2264541
http://www.gpugrid.net/result.php?resultid=2241247
http://www.gpugrid.net/result.php?resultid=2241094
http://www.gpugrid.net/result.php?resultid=2240609
http://www.gpugrid.net/result.php?resultid=2240370
http://www.gpugrid.net/result.php?resultid=2240225
http://www.gpugrid.net/result.php?resultid=2228009
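Once the device index is logged again, finding a bad card in a multi-GPU host comes down to counting failures per GPU. A minimal sketch, assuming the restored line keeps the old wording quoted above (`WU is started on GPU N`); that format, the `errors_per_gpu` helper and the sample records are all hypothetical:

```python
import re
from collections import Counter

# Assumption: the restored log line reads like the old one, e.g.
# "WU is started on GPU 4...". The real wording may differ.
GPU_LINE_RE = re.compile(r"started on GPU (\d+)")


def errors_per_gpu(results):
    """Count failed results per GPU index.

    `results` is an iterable of (failed, stderr_text) pairs. Failed results
    whose stderr has no GPU line are counted under the key None.
    """
    counts = Counter()
    for failed, stderr_text in results:
        if not failed:
            continue
        m = GPU_LINE_RE.search(stderr_text)
        counts[int(m.group(1)) if m else None] += 1
    return counts


if __name__ == "__main__":
    # Made-up records, for illustration only.
    sample = [
        (True, "WU is started on GPU 4...\n"),
        (False, "WU is started on GPU 1...\n"),
        (True, "WU is started on GPU 4...\n"),
        (True, "no device line in this log\n"),
    ]
    for gpu, n in errors_per_gpu(sample).most_common():
        print(f"GPU {gpu}: {n} failed")
```

A card that turns up far more often than its siblings would be the first suspect; with only a dozen failures, though, the counts are noisy.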

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 2
Message 16821 - Posted: 4 May 2010, 19:05:30 UTC - in response to Message 16800.  

> Uh oh. "exit code 1 (0x1)" on f5r6-TONI_GAUS1-0-50-RND5224_1 - that's on 43404, the 9800 GTX+ that was OK overnight.

But it was followed immediately by SUCCESS with f92r9-TONI_GAUS1-0-50-RND9426_1. You're not wrong about that upload!

K1atOdessa
Joined: 25 Feb 08
Posts: 249
Credit: 444,646,963
RAC: 0
Message 16826 - Posted: 4 May 2010, 22:15:01 UTC - in response to Message 16820.  

> Is it possible to reactivate the GPU information in the log? We want to know which card a WU was crunched on.


Definitely would like to have that information in the WUs again.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 2
Message 16831 - Posted: 5 May 2010, 13:23:08 UTC

Latest conundrum:

f10r3-TONI_GAUS2-0-50-RND3526_0 FAILED on stock 9800GT
f79r2-TONI_GAUS2-0-50-RND5339_1 SUCCESS on overclocked 9800GTX+
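The reports so far can be tallied per batch by pulling the batch label out of each task name. The sketch below covers only the tasks named individually in this thread, with the outcome each poster reported; unnamed results (Stoneageman's four GA11R tasks, the GA10F and GA8F runs) are left out, and since people mostly post failures, it says nothing about overall failure rates. The `TONI_<batch>-` pattern is an assumption based on the names quoted here.

```python
import re
from collections import defaultdict

# Assumption: the batch label sits between "TONI_" and the next "-".
BATCH_RE = re.compile(r"TONI_([A-Z0-9]+)-")

# Tasks named in this thread, with the outcome reported for each
# (True = success, False = error).
REPORTED = [
    ("f114r5-TONI_GA10R-0-1-RND7226", False),
    ("f196r4-TONI_GA11R-0-1-RND1898_0", False),
    ("f136r9-TONI_GA11R-0-1-RND4524_0", False),
    ("f184r5-TONI_GA11R-0-1-RND4020_2", False),
    ("f123r4-TONI_GA11R-0-1-RND7715_1", False),
    ("f193r9-TONI_GA11R-0-1-RND1232_0", False),
    ("f199r1-TONI_GA11R-0-1-RND7585_1", True),
    ("f5r6-TONI_GAUS1-0-50-RND5224_1", False),
    ("f8r6-TONI_GAUS1-0-50-RND3701_2", True),
    ("f92r9-TONI_GAUS1-0-50-RND9426_1", True),
    ("f10r3-TONI_GAUS2-0-50-RND3526_0", False),
    ("f79r2-TONI_GAUS2-0-50-RND5339_1", True),
]


def tally_by_batch(reports):
    """Return {batch: [successes, failures]} for (task_name, succeeded) pairs."""
    table = defaultdict(lambda: [0, 0])
    for name, succeeded in reports:
        m = BATCH_RE.search(name)
        batch = m.group(1) if m else "unknown"
        table[batch][0 if succeeded else 1] += 1
    return dict(table)


if __name__ == "__main__":
    for batch, (ok, failed) in sorted(tally_by_batch(REPORTED).items()):
        print(f"{batch}: {ok} succeeded, {failed} failed")
```

Splitting the same records by host or driver instead of by batch only needs a different key, but as Toni and Richard note above, a tally like this shows correlation, not cause and effect.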

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 16864 - Posted: 6 May 2010, 11:42:19 UTC - in response to Message 16826.  

> Definitely would like to have that information in the WUs again.

In fact, it should be restored in a coming update.

K1atOdessa
Joined: 25 Feb 08
Posts: 249
Credit: 444,646,963
RAC: 0
Message 16866 - Posted: 6 May 2010, 13:38:14 UTC - in response to Message 16864.  

>> Definitely would like to have that information in the WUs again.
>
> In fact, it should be restored in a coming update.


Thanks for the update.

©2026 Universitat Pompeu Fabra