Beta testing starting soon
| Author | Message |
|---|---|
|
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0
|
Hi, Well, I will help to clear the beta queue, but I don't get them often. I have checked "run test applications" and "beta". Greetings from TJ |
|
Joined: 28 Mar 09 Posts: 490 Credit: 11,731,645,728 RAC: 69
|
I had a particularly bad experience with a beta unit. Most bad WUs simply give you a computation error message when they crash, and you go on to the next WU, without any reboot or computer crash. This unit ran for a few seconds, froze up the computer, then blue-screened, and the computer rebooted. It did this a few times before I aborted the unit. It also caused another perfectly good WU to crash as well. Here is the link: http://www.gpugrid.net/workunit.php?wuid=4137081 |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
This task appeared to do something similar: it caused a system reboot somehow. Task 6486709, work unit 4137080, host 139265; sent 10 Feb 2013 5:57:04 UTC, reported 10 Feb 2013 13:11:46 UTC; status: Error while computing; run time 2.15 s, CPU time 0.03 s, credit ---; application: ACEMD beta version v6.48 (cuda42). http://www.gpugrid.net/workunit.php?wuid=4137080 Thanks, FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
I think this one belongs in the list too: http://www.gpugrid.net/workunit.php?wuid=4138484 It gave me a BSOD after 7 seconds, with: "The problem seems to be caused by the following file: dxgkrnl.sys" STOP: 0x00000116 (0xfffffa801b5de010, 0xfffff88006dc4404, 0x0000000000000000, 0x000000000000000d). That's from BlueScreenView: the original had a line about an NVIDIA driver crash and not restarting within the time allowed. BSV doesn't seem able to retrieve that information - I'll record it manually (and accurately) if it happens again. |
nate Joined: 6 Jun 11 Posts: 124 Credit: 2,928,865 RAC: 0
|
Thanks for bringing it to our attention. There was an issue building a small number of the simulations, which our checks didn't catch before they were sent out. We have cancelled the work units that were crashing machines, but it is possible that there are others so let us know if it happens again. Crashing your machines is obviously the last thing we want to do. In the future we can avoid this with additional checks we'll be doing for this type of work unit. nate |
|
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0
|
This WU: trypsin_lig_904_3-NOELIA_RC3_equ-0-1-RND0962, resulted in the NVIDIA driver stopping. However, it recovered automatically without rebooting the system. Greetings from TJ |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
Been running a few betas now. GPU load varies between WUs in the range of 2x% to 3x% (GTX660Ti). Accordingly, power consumption, temperature, fan speed and memory controller load are really low. Runtimes for 1500-credit WUs vary between 1700 and 4000 s. Edit: another observation: GPUGRID beta 6.48 gets an entire core of my i7. However, GPU load decreased considerably as soon as I ran one Einstein CPU task alongside. Running 6 Einsteins reduces GPU load by ~25%. I don't think the regular apps have been this fragile. MrS Scanning for our furry friends since Jan 2002 |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
Had a repeat of my BSOD, though this time it seemed to be another project which triggered it. The exact phrase on screen is: "Attempt to reset the display driver and recover from timeout failed" (stop 116). There was also a reference to nvlddmkm.sys. Host is an i7 3770K with dual Gainward Phantom GTX 670 - driver 310.90 WHQL |
|
Joined: 28 Mar 09 Posts: 490 Credit: 11,731,645,728 RAC: 69
|
Here is a beta unit that ran rather slowly: trypsin_lig_491_run4-NOELIA_RL3_equ-0-1-RND5688; sent 13 Feb 2013 11:57:24 UTC, reported 13 Feb 2013 19:50:06 UTC; status: Completed and validated; run time 26,096.32 s, CPU time 25,685.34 s, credit 1,500.00; application: ACEMD beta version v6.48 (cuda42). See link: http://www.gpugrid.net/workunit.php?wuid=4142465 |
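To put "rather slowly" into numbers, here is a rough calculation using only figures quoted in this thread: the 26,096.32 s run time and 1,500 credit from the result above, and the 1700 to 4000 s range reported earlier for 1500-credit betas on a GTX660Ti. The variable names are just for illustration, and the comparison is across different hosts, so treat it as indicative only.

```python
from datetime import datetime

# Figures copied from the result row quoted above.
run_time_s = 26_096.32
cpu_time_s = 25_685.34
credit = 1_500.00

# Range reported earlier in the thread for 1500-credit betas (different host).
typical_range_s = (1700, 4000)

sent = datetime(2013, 2, 13, 11, 57, 24)
reported = datetime(2013, 2, 13, 19, 50, 6)

# Wall-clock time between the task being sent and the result being reported.
turnaround_s = (reported - sent).total_seconds()

# How many times longer this task ran than the slow and fast ends of the range.
slowdown_vs_slowest = run_time_s / typical_range_s[1]
slowdown_vs_fastest = run_time_s / typical_range_s[0]

# Credit earned per hour of run time, and share of run time spent on the CPU.
credit_per_hour = credit / (run_time_s / 3600)
cpu_fraction = cpu_time_s / run_time_s

print(f"turnaround: {turnaround_s:.0f} s")
print(f"slowdown: {slowdown_vs_slowest:.1f}x to {slowdown_vs_fastest:.1f}x")
print(f"credit/hour: {credit_per_hour:.1f}")
print(f"CPU time / run time: {cpu_fraction:.1%}")
```

The CPU time being nearly equal to the run time is at least consistent with the earlier observation that the 6.48 beta takes an entire CPU core.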
|
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0
|
I have set my system to accept only beta WUs to help clear the queue. However, today I got 9 WUs that errored out quickly. All are Noelia's run 2 and run 3, with only one run 4. All the run4-Noelia WUs from yesterday and this morning (11 and 4) finished correctly. This is the error message: <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> - exit code 98 (0x62) </message> <stderr_txt> ERROR: file mdioload.cpp line 207: Error reading parmtop file called boinc_finish </stderr_txt> ]]> My wing(wo)men had errors for the same WUs. Greetings from TJ |
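For anyone sorting through a batch of failed results by script rather than by eye, the error text above is regular enough to pick apart. This is a minimal sketch in Python, assuming every report has the shape quoted in the post above. The function name `parse_boinc_error` and the returned keys are made up for illustration and are not part of BOINC, and treating "called boinc_finish" as a separate trailing line is a guess.

```python
import re

# Error report as quoted in the post above (whitespace collapsed by the forum).
REPORT = (
    "<core_client_version>6.10.58</core_client_version> <![CDATA[ <message> "
    "- exit code 98 (0x62) </message> <stderr_txt> ERROR: file mdioload.cpp "
    "line 207: Error reading parmtop file called boinc_finish </stderr_txt> ]]>"
)


def parse_boinc_error(report: str) -> dict:
    """Extract client version, exit code and the first ERROR line.

    Hypothetical helper: the patterns are inferred from this one sample only.
    """
    version = re.search(
        r"<core_client_version>([\d.]+)</core_client_version>", report
    )
    exit_code = re.search(r"exit code (\d+) \((0x[0-9a-fA-F]+)\)", report)
    # Assumption: "called boinc_finish" is a separate trailer, not part of
    # the error message itself, so the message stops before it.
    error = re.search(
        r"ERROR: file (\S+) line (\d+): "
        r"(.+?)(?= called boinc_finish|\s*</stderr_txt>)",
        report,
    )
    return {
        "client_version": version.group(1) if version else None,
        "exit_code": int(exit_code.group(1)) if exit_code else None,
        "exit_code_hex": exit_code.group(2) if exit_code else None,
        "source_file": error.group(1) if error else None,
        "source_line": int(error.group(2)) if error else None,
        "message": error.group(3).strip() if error else None,
    }


info = parse_boinc_error(REPORT)
print(info)
```

Grouping results by `source_file`, `source_line` and `message` would be one way to confirm that a set of quick failures shares a single cause.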
|
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0
|
This one: trypsin_lig_1259_run2-NOELIA_RL3_equ-0-1-RND7950 and 2 more (1 run1) resulted in an unresponsive system. The mouse pointer was movable but not clickable. All windows froze for a few minutes, then the screen went blank and came back with a notification that the display driver had recovered, but again all windows froze immediately. I had to abort these step by step in the seconds the system was responsive. WinVista x64 Ultimate, i7, 12GB, GTX285, driver 310.90, CUDA version 5, BOINC 7.0.28. The system is now running a short run (cuda31) without problems so far. Greetings from TJ |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
Hi, OK, we seem to be done. The server status page says there are no tasks in the Beta queue, and my log just got these messages: 17/02/2013 15:07:56 | GPUGRID | Reporting 1 completed tasks So, fastening my seat belt and holding on tight for the next twist in the roller-coaster ride that is beta testing... :-) |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
ACEMD beta version: 0 unsent, 248 in progress, runtime of last 100 tasks 0.57 h (0.17 - 1.26), 111 users in last 24 hours. There are 248 in progress. If they fail they will return and go back into the queue to be resent repeatedly until the x_7th failure. If they succeed, do they auto-generate a new task? - Just got a couple of Betas. The first one basically killed my system! Lots of GPU driver restarts, blue screen/crash, recovered to Windows, BOINC starts, no GPU detected, closed and opened BOINC, still no GPU detected. Restarted, Windows wouldn't start up; it just keeps restarting. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
Well, if a task crashes, a replacement is generated. I just got http://www.gpugrid.net/workunit.php?wuid=4144581 courtesy of somebody who hoarded it for three days and then crashed it (shouldn't really be doing Beta testing on an anonymous host, and certainly not for this project on a host with a three-day turnround). skgiven, if you're warning us about rogue tasks which are likely to blue-screen our machines, can you identify the tasks, please? I'd like to know whether my resend is from the same batch, so I can make appropriate decisions about how to leave my test machine when I go out later this evening. |
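The resend behaviour described in the last two posts (a failed task goes back into the queue and a replacement is sent out, up to some failure limit) can be sketched as a toy model. This is not the BOINC server code: `MAX_FAILURES`, `run_workunit` and the host callbacks are invented for illustration, and the limit of 7 is only a guess based on the "7th failure" wording above.

```python
from typing import Callable, List

# Assumption: the project gives up on a work unit after this many failures.
MAX_FAILURES = 7


def run_workunit(
    hosts: List[Callable[[], bool]], max_failures: int = MAX_FAILURES
) -> str:
    """Send one work unit to hosts in turn until one succeeds or the limit hits.

    Each host is modelled as a callable returning True (result validated)
    or False (task crashed or errored, so a replacement is generated).
    """
    failures = 0
    for attempt in hosts:
        if attempt():
            return f"validated after {failures} failed attempt(s)"
        failures += 1
        if failures >= max_failures:
            # Too many errors: stop generating replacement tasks.
            return f"abandoned after {failures} failures"
    # Ran out of hosts in this simulation before either outcome was reached.
    return f"still in progress after {failures} failed attempt(s)"


def crash() -> bool:
    return False


def ok() -> bool:
    return True


print(run_workunit([crash, crash, ok]))
print(run_workunit([crash] * 10))
```

In this model a genuinely bad work unit can still reach up to seven hosts before it is retired, which is why identifying the bad batch early matters to anyone accepting resends.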
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Probably this one: trypsin_lig_904_4-NOELIA_RC3_equ-0-1-RND8427_4 - I would suggest you run a nice Long WU :) |
©2025 Universitat Pompeu Fabra