Beta testing starting soon

Message boards : News : Beta testing starting soon

TJ

Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 28454 - Posted: 8 Feb 2013, 23:46:26 UTC - in response to Message 28450.  

Hi,
we cannot test until the beta queue is cleared.

There is a problem with the new app and it has been difficult to test it out.

gdf


Well, I will help to clear the beta queue, but I don't get them often. I have checked "run test applications" and "beta".
Greetings from TJ
Bedrich Hajek

Joined: 28 Mar 09
Posts: 490
Credit: 11,731,645,728
RAC: 57
Message 28466 - Posted: 10 Feb 2013, 2:14:32 UTC

I had a particularly bad experience with a beta unit. Most bad WUs simply give you a computation error message when they crash, and you go on to the next WU without any reboot or computer crash. This unit ran for a few seconds and froze the computer, which then blue-screened and rebooted. It did this a few times before I aborted the unit. It also caused another perfectly good WU to crash.

Here is the link:

http://www.gpugrid.net/workunit.php?wuid=4137081
skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 28467 - Posted: 10 Feb 2013, 13:37:53 UTC - in response to Message 28466.  

This task appeared to do something similar: it somehow caused a system reboot.

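Task: 6486709
Work unit: 4137080
Computer: 139265
Sent: 10 Feb 2013, 5:57:04 UTC
Reported: 10 Feb 2013, 13:11:46 UTC
Status: Error while computing
Run time (sec): 2.15
CPU time (sec): 0.03
Credit: ---
Application: ACEMD beta version v6.48 (cuda42)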

http://www.gpugrid.net/workunit.php?wuid=4137080

Thanks,
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 351
Message 28476 - Posted: 11 Feb 2013, 23:53:07 UTC

I think this one belongs in the list too:

http://www.gpugrid.net/workunit.php?wuid=4138484

It gave me a BSOD after 7 seconds with

The problem seems to be caused by the following file: dxgkrnl.sys

STOP: 0x00000116 (0xfffffa801b5de010, 0xfffff88006dc4404, 0x0000000000000000, 0x000000000000000d)

That's from BlueScreenView: the original had a line about an NVIDIA driver crash and not restarting within the time allowed. BSV doesn't seem able to retrieve that information - I'll record it manually (and accurately) if it happens again.
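
For reference, the STOP line above can be split into the bug check code and its parameters with a few lines of Python. This is only a minimal sketch, not a diagnostic tool: `parse_stop_line` and `BUGCHECK_NAMES` are names made up for this example, and the only mapping included is 0x116, which Microsoft's bug check reference lists as VIDEO_TDR_FAILURE (a failed attempt to reset the display driver after a timeout).

```python
import re

# Hypothetical lookup table for this example; only the code seen in this thread is listed.
BUGCHECK_NAMES = {0x116: "VIDEO_TDR_FAILURE"}


def parse_stop_line(text):
    """Return (code, [parameters]) parsed from a 'STOP: 0x... (...)' line.

    The parameter list may wrap across lines, as it does in the pasted report.
    """
    match = re.search(r"STOP:\s*(0x[0-9a-fA-F]+)\s*\(([^)]*)\)", text)
    if match is None:
        raise ValueError("no STOP line found")
    code = int(match.group(1), 16)
    # Parameters are separated by commas and/or whitespace (including newlines).
    params = [int(p, 16) for p in re.split(r"[,\s]+", match.group(2).strip()) if p]
    return code, params


report = """STOP: 0x00000116 (0xfffffa801b5de010, 0xfffff88006dc4404, 0x0000000000000000,
0x000000000000000d)"""

code, params = parse_stop_line(report)
print(hex(code), BUGCHECK_NAMES.get(code, "unknown"))
for index, value in enumerate(params, start=1):
    print(f"parameter {index}: {value:#018x}")
```

Keeping the parameters as integers rather than strings makes it easy to compare reports from different crashes.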
nate

Joined: 6 Jun 11
Posts: 124
Credit: 2,928,865
RAC: 0
Message 28482 - Posted: 12 Feb 2013, 15:17:04 UTC

Thanks for bringing it to our attention. There was an issue building a small number of the simulations, which our checks didn't catch before they were sent out. We have cancelled the work units that were crashing machines, but it is possible that there are others, so let us know if it happens again. Crashing your machines is obviously the last thing we want to do. In the future we can avoid this with the additional checks we'll be doing for this type of work unit.

nate
TJ

Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 28483 - Posted: 12 Feb 2013, 15:29:39 UTC

This WU, trypsin_lig_904_3-NOELIA_RC3_equ-0-1-RND0962, caused the NVIDIA driver to stop. However, it recovered automatically without rebooting the system.
Greetings from TJ
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 28486 - Posted: 13 Feb 2013, 9:31:45 UTC
Last modified: 13 Feb 2013, 9:38:16 UTC

Been running a few betas now. GPU load varies between WUs in the range of 2x - 3x% (GTX660Ti). Accordingly, power consumption, temperature, fan speed and memory controller load are really low. Runtimes for 1500-credit WUs vary between 1700 and 4000 s.

Edit: another observation. GPUGRID beta 6.48 gets an entire core of my i7. However, GPU load decreased considerably as soon as I ran one Einstein CPU task alongside it. Running 6 Einsteins reduces GPU load by ~25%. I don't think the regular apps have been this fragile.

MrS
Scanning for our furry friends since Jan 2002
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 351
Message 28494 - Posted: 13 Feb 2013, 16:22:27 UTC - in response to Message 28476.  

Had a repeat of my BSOD, though this time it seemed to be another project which triggered it.

The exact phrase on screen is:

"Attempt to reset the display driver and recover from timeout failed" (stop 116)

There was also a reference to nvlddmkm.sys

Host is an i7 3770K with dual Gainward Phantom GTX 670 - driver 310.90 WHQL
Bedrich Hajek

Joined: 28 Mar 09
Posts: 490
Credit: 11,731,645,728
RAC: 57
Message 28502 - Posted: 13 Feb 2013, 23:14:18 UTC

Here is a beta unit that ran rather slowly.

trypsin_lig_491_run4-NOELIA_RL3_equ-0-1-RND5688

Sent: 13 Feb 2013, 11:57:24 UTC
Reported: 13 Feb 2013, 19:50:06 UTC
Status: Completed and validated
Run time (sec): 26,096.32
CPU time (sec): 25,685.34
Credit: 1,500.00
Application: ACEMD beta version v6.48 (cuda42)


See link:

http://www.gpugrid.net/workunit.php?wuid=4142465

TJ

Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 28517 - Posted: 14 Feb 2013, 15:40:08 UTC

I have set my system to accept only beta WUs to help clear the queue. However, today I got 9 WUs that errored out quickly. All are Noelia's: run 2 and run 3, plus a single run 4. All the run4-Noelia WUs from yesterday and this morning (11 and 4) finished correctly.

This is the error message:
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
ERROR: file mdioload.cpp line 207: Error reading parmtop file
called boinc_finish

</stderr_txt>
]]>

My wing(wo)men had errors on the same WUs.
Greetings from TJ
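
When comparing many failed tasks, it can help to pull the exit code and the `ERROR:` line out of a stderr dump like the one above. The following is a minimal Python sketch; `summarize_task_error` is a name invented for this example, and the regular expressions assume the layout shown in this post (an `exit code N (0x..)` line and `ERROR: file X line N: message` lines), which other applications may not follow.

```python
import re


def summarize_task_error(stderr_dump):
    """Extract the exit code and any ERROR lines from a pasted stderr dump.

    Assumes the text layout shown in this thread; returns None for the
    exit code fields when no matching line is present.
    """
    exit_match = re.search(r"exit code (-?\d+) \((0x[0-9a-fA-F]+)\)", stderr_dump)
    error_lines = re.findall(
        r"^ERROR: file (\S+) line (\d+): (.+)$", stderr_dump, flags=re.MULTILINE
    )
    return {
        "exit_code": int(exit_match.group(1)) if exit_match else None,
        "exit_code_hex": exit_match.group(2) if exit_match else None,
        "errors": [
            {"file": name, "line": int(number), "message": message.strip()}
            for name, number, message in error_lines
        ],
    }


SAMPLE = """<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
ERROR: file mdioload.cpp line 207: Error reading parmtop file
called boinc_finish

</stderr_txt>
]]>"""

summary = summarize_task_error(SAMPLE)
print(summary["exit_code"], summary["errors"])
```

Grouping the summaries by `file` and `line` would show quickly whether a batch of failures shares one cause, as these nine appear to.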
TJ

Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 28531 - Posted: 15 Feb 2013, 15:33:56 UTC
Last modified: 15 Feb 2013, 15:34:33 UTC

This one, trypsin_lig_1259_run2-NOELIA_RL3_equ-0-1-RND7950, and 2 more (1 of them a run1) resulted in an unresponsive system. The mouse pointer could be moved but not clicked. All windows froze for a few minutes, then the screen went blank and came back with a notification that the display driver had recovered, but all windows froze again immediately. I had to abort these one by one in the seconds the system was responsive.
WinVista x64 ultimate, i7, 12GB, GTX285, driver 310.90 CUDA version 5, BOINC 7.0.28

The system is now running a short run (cuda31) without problems so far.
Greetings from TJ
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 351
Message 28564 - Posted: 17 Feb 2013, 15:11:21 UTC - in response to Message 28450.  

Hi,
we cannot test until the beta queue is cleared.

OK, we seem to be done. The server status page says there are no tasks in the Beta queue, and my log just got these messages:

17/02/2013 15:07:56 | GPUGRID | Reporting 1 completed tasks
17/02/2013 15:07:56 | GPUGRID | Requesting new tasks for NVIDIA
17/02/2013 15:07:58 | GPUGRID | No tasks sent
17/02/2013 15:07:58 | GPUGRID | No tasks are available for ACEMD beta version

So, fastening my seat belt and holding on tight for the next twist in the roller-coaster ride that is beta testing... :-)
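
The same check can be scripted against a saved copy of the event log. This is a minimal sketch under stated assumptions: the `timestamp | project | message` layout and the `dd/mm/yyyy` date format are taken from the lines quoted above (the date format depends on the host's locale), and `parse_event_log_line` and `beta_queue_empty` are hypothetical helper names, not part of BOINC.

```python
from datetime import datetime


def parse_event_log_line(line):
    """Split one 'timestamp | project | message' line into its three fields."""
    parts = [part.strip() for part in line.split("|", 2)]
    if len(parts) != 3:
        raise ValueError(f"unexpected log line: {line!r}")
    # Date format as seen in this post; other locales will need another pattern.
    timestamp = datetime.strptime(parts[0], "%d/%m/%Y %H:%M:%S")
    return timestamp, parts[1], parts[2]


def beta_queue_empty(log_lines, project="GPUGRID"):
    """Return True if the scheduler reported no ACEMD beta tasks for the project."""
    for line in log_lines:
        _, line_project, message = parse_event_log_line(line)
        if line_project == project and message.startswith(
            "No tasks are available for ACEMD beta"
        ):
            return True
    return False


LOG = [
    "17/02/2013 15:07:56 | GPUGRID | Reporting 1 completed tasks",
    "17/02/2013 15:07:56 | GPUGRID | Requesting new tasks for NVIDIA",
    "17/02/2013 15:07:58 | GPUGRID | No tasks sent",
    "17/02/2013 15:07:58 | GPUGRID | No tasks are available for ACEMD beta version",
]

print(beta_queue_empty(LOG))
```

Note that this message only says the scheduler had nothing to send at that moment; tasks still in progress can fail and return to the queue later.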
skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 28565 - Posted: 17 Feb 2013, 15:28:28 UTC - in response to Message 28564.  
Last modified: 17 Feb 2013, 15:57:42 UTC

ACEMD beta version: 0 unsent, 248 in progress, runtime of last 100 tasks 0.57 h average (0.17 - 1.26), 111 users in last 24 hours

There are 248 in progress.
If they fail, they will return and go back into the queue to be resent repeatedly until the x_7th failure.
If they succeed, do they auto-generate a new task?

- Just got a couple of Betas. The first one basically killed my system!
Lots of GPU driver restarts, then a blue screen/crash. It recovered to Windows and BOINC started, but no GPU was detected. I closed and reopened BOINC: still no GPU detected. I restarted, and now Windows won't start up; it just keeps restarting.
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 351
Message 28569 - Posted: 17 Feb 2013, 18:26:31 UTC - in response to Message 28565.  

Well, if a task crashes, a replacement is generated. I just got

http://www.gpugrid.net/workunit.php?wuid=4144581

courtesy of somebody who hoarded it for three days and then crashed it (shouldn't really be doing Beta testing on an anonymous host, and certainly not for this project on a host with a three-day turnround)

skgiven, if you're warning us about rogue tasks which are likely to blue-screen our machines, can you identify the tasks, please? I'd like to know whether my resend is from the same batch, so I can make appropriate decisions how to leave my test machine when I go out later this evening.
skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 28570 - Posted: 17 Feb 2013, 19:20:39 UTC - in response to Message 28569.  

Probably this one, trypsin_lig_904_4-NOELIA_RC3_equ-0-1-RND8427_4

I would suggest you run a nice Long WU :)
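
To check whether a resend comes from the same batch as a task named in this thread, the names can be compared field by field. The sketch below is a guess at the naming scheme, based only on the names posted here: it assumes `<system>-<batch>-<n>-<m>-RND<digits>` with an optional trailing `_<k>` for the copy of the work unit that was sent out. The field names and the helpers `split_task_name` and `same_batch` are hypothetical; the project may define the parts differently.

```python
import re

# Assumed layout, inferred from names quoted in this thread (not an official spec).
TASK_NAME = re.compile(
    r"^(?P<system>.+?)-(?P<batch>[A-Za-z0-9_]+)-(?P<n>\d+)-(?P<m>\d+)"
    r"-RND(?P<rnd>\d+)(?:_(?P<resend>\d+))?$"
)


def split_task_name(name):
    """Return the assumed fields of a task or work unit name as a dict."""
    match = TASK_NAME.match(name)
    if match is None:
        raise ValueError(f"name does not fit the assumed layout: {name!r}")
    return match.groupdict()


def same_batch(first, second):
    """True if both names carry the same batch field (e.g. NOELIA_RC3_equ)."""
    return split_task_name(first)["batch"] == split_task_name(second)["batch"]


suspect = "trypsin_lig_904_4-NOELIA_RC3_equ-0-1-RND8427_4"
earlier = "trypsin_lig_904_3-NOELIA_RC3_equ-0-1-RND0962"
slow_one = "trypsin_lig_491_run4-NOELIA_RL3_equ-0-1-RND5688"

print(split_task_name(suspect))
print(same_batch(suspect, earlier), same_batch(suspect, slow_one))
```

A matching batch field is only a hint that two tasks were built the same way; it does not prove that a given resend will crash.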


©2025 Universitat Pompeu Fabra