New NATHAN_KID WUs on long

Profile Beyond
Message 30527 - Posted: 29 May 2013, 13:42:47 UTC - in response to Message 30483.  

Ok, there are two groups from me on the grid right now:
NATHAN_KIDc22_2
NATHAN_KIDc22_SODcharge

So you guys know. The second group might run a little faster. You'll have to tell me.

Thanks for the info. The SOD WUs do run faster and they just barely allow a GTX 460 to slip under the 24 hour mark (if micromanaged). So for me it's a large improvement, but still on the long side.

Micromanagement drill: set a backup project and set BOINC to report immediately. Then DL a GPUGrid WU, then turn off GPUGrid work fetch, then wait until you notice the GPUGrid WU is done and the backup project is running, then turn on work fetch to DL a new WU, then pause the WU from the backup project so the GPUGrid WU starts immediately, then un-pause the backup project WU so it will run again when the GPUGrid WU finishes. Repeat ad infinitum... If you're lucky you've squeaked under 24 hours.
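
A minimal sketch of the drill above, written as the boinccmd command lines one pass would need. It only builds the commands; it does not run them or detect when a WU finishes. The GPUGrid URL is assumed from the links in this thread, and the backup project URL and task name are hypothetical placeholders. "Report immediately" is normally a client configuration option (report_results_immediately in cc_config.xml) rather than a command, so it is not part of the list.

```python
# Sketch only: builds boinccmd command lines for one pass of the drill.
GPUGRID_URL = "http://www.gpugrid.net/"      # assumed project URL
BACKUP_URL = "http://backup.example.org/"    # hypothetical backup project


def drill_commands(backup_task, gpugrid_url=GPUGRID_URL, backup_url=BACKUP_URL):
    """Return (when, command) pairs for one pass of the drill, in order."""
    return [
        ("after a GPUGrid WU has downloaded",
         ["boinccmd", "--project", gpugrid_url, "nomorework"]),
        ("once that WU is done and the backup task is running",
         ["boinccmd", "--project", gpugrid_url, "allowmorework"]),
        ("to request the new WU right away",
         ["boinccmd", "--project", gpugrid_url, "update"]),
        ("so the new GPUGrid WU starts immediately",
         ["boinccmd", "--task", backup_url, backup_task, "suspend"]),
        ("so the backup task runs again when the GPUGrid WU finishes",
         ["boinccmd", "--task", backup_url, backup_task, "resume"]),
    ]


if __name__ == "__main__":
    for when, cmd in drill_commands("backup_task_0"):  # hypothetical task name
        print(f"{when}: {' '.join(cmd)}")
```

The waiting steps are still manual here, which is exactly the part the post complains about.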
tomba

Message 30528 - Posted: 29 May 2013, 14:08:44 UTC - in response to Message 30527.  

The SOD WUs do run faster and they just barely allow a GTX 460 to slip under the 24 hour mark (if micromanaged).

Do what I just did: replaced my GTX 460 with a GTX 660. You'll sleep easy!
Vagelis Giannadakis

Message 30529 - Posted: 29 May 2013, 14:09:07 UTC

I can confirm the new NATHAN WUs; I'm about to finish I84R7-NATHAN_KIDc22_SODcharge-0-10-RND9833.

Runtime: just above 18h on a GTX 650 Ti on Linux x86_64.

Nice-behaving, these NATHANs! Keep 'em coming, Nate!
Profile Beyond
Message 30531 - Posted: 29 May 2013, 14:50:00 UTC - in response to Message 30528.  

The SOD WUs do run faster and they just barely allow a GTX 460 to slip under the 24 hour mark (if micromanaged).

Do what I just did: replaced my GTX 460 with a GTX 660. You'll sleep easy!

If I had unlimited money I'd buy the fastest GPUs available. Some of us are retired and on fixed incomes. Buying less expensive cards and paying the electric bill is enough of a strain. Of course if you send $$$ I'll certainly purchase some GTX 660 GPUs ;-)
JK, BTW: congrats on your GTX 660, it's a nice card.
tomba

Message 30533 - Posted: 29 May 2013, 15:09:26 UTC - in response to Message 30531.  

Some of us are retired and on fixed incomes.

Me too, but I had a strategy vs. She who holds the purse strings and must be obeyed. We replace our rigs every four years. My rig is almost four years old. She was persuaded that changing the PSU from 425W to 620W, and replacing the video card, was a better alternative to a new PC!

BTW: congrats on your GTX 660, it's a nice card.

Thank you! Love it!!
Profile skgiven
Volunteer moderator
Volunteer tester
Message 30554 - Posted: 30 May 2013, 12:55:53 UTC - in response to Message 30533.  

I'm getting system restarts while running NATHAN_KIDc22 WU's on XP-x86.
The only other app I have running is WUProp (NCI).
The tasks are recovering however.

I also had one of these WU's fail on Linux:

    Exit status 255 (0xff) Unknown error number

    Stderr output

    <core_client_version>7.0.27</core_client_version>
    <![CDATA[
    <message>
    process exited with code 255 (0xff, -1)
    </message>
    <stderr_txt>
    MDIO: cannot open file "output.restart.coor"

    </stderr_txt>
    ]]>



FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help
Profile Beyond
Message 30556 - Posted: 30 May 2013, 13:19:46 UTC - in response to Message 30554.  

I'm getting system restarts while running NATHAN_KIDc22 WU's on XP-x86.
The only other app I have running is WUProp (NCI).
The tasks are recovering however.

I had several acemd crashes (more than are listed in the file below, think there were about 5) on one of the NATHAN_KID WUs yesterday on a non-OCed GTX 460:

<core_client_version>7.0.64</core_client_version>
<![CDATA[
<stderr_txt>
MDIO: cannot open file "output.restart.coor"
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
# Time per step (avg over 4705000 steps): 	6.844 ms
# Approximate elapsed time for entire WU:  	82131.953 s
called boinc_finish

</stderr_txt>
]]>


Did the usual drill: shut down BOINC, THEN hit the X on the acemd error message, then reboot (as the GPU can sometimes become unstable when this happens). It eventually finished successfully (and I beat the 24hr deadline by 12 minutes!). This WU failed on 2 previous machines:

http://www.gpugrid.net/workunit.php?wuid=4482496
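
For readers who want to check the numbers in a log like the one above, here is a rough sketch that counts the fatal CUDA errors and compares the log's own runtime estimate with the 24-hour mark. The sample text is copied from the stderr output in this post; the function name and the returned field names are made up for illustration, and the total step count is only inferred by dividing the two logged figures.

```python
import re

DEADLINE_S = 24 * 3600  # the 24-hour mark discussed in this thread

# Lines copied from the stderr output quoted above.
SAMPLE = """\
MDIO: cannot open file "output.restart.coor"
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
# Time per step (avg over 4705000 steps): \t6.844 ms
# Approximate elapsed time for entire WU:  \t82131.953 s
"""


def summarize(stderr_text):
    """Count fatal errors and estimate the slack against the 24 h mark."""
    codes = re.findall(r"SWAN : FATAL : Cuda driver error (\d+)", stderr_text)
    step = re.search(r"Time per step.*?:\s*([\d.]+) ms", stderr_text)
    total = re.search(r"Approximate elapsed time for entire WU:\s*([\d.]+) s",
                      stderr_text)
    step_ms = float(step.group(1))
    total_s = float(total.group(1))
    # Inferred, not logged: total steps = estimated runtime / time per step.
    implied_steps = round(total_s / (step_ms / 1000.0))
    return {
        "fatal_count": len(codes),
        "error_codes": sorted(set(codes)),
        "implied_steps": implied_steps,
        "margin_min": (DEADLINE_S - total_s) / 60.0,
        "max_step_ms_for_deadline": DEADLINE_S / implied_steps * 1000.0,
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(key, value)
```

On these figures the pure compute estimate leaves a little over 70 minutes of slack, so time lost to crashes, restarts and uploading can plausibly account for finishing with only 12 minutes to spare.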

terencewee*

Message 30574 - Posted: 31 May 2013, 8:31:31 UTC - in response to Message 30556.  


It eventually finished successfully (and I beat the 24hr deadline by 12 minutes!).


12mins?

I should feel especially lucky then... to get this (I10R4-NATHAN_KIDc22_2-6-8-RND2039) in with 51secs to spare. :)

GPU-load was 90%.



terencewee*
Sic itur ad astra.
Vagelis Giannadakis

Message 30576 - Posted: 31 May 2013, 9:47:18 UTC - in response to Message 30574.  

Ahh, the thrill of sending your WU in within the 24h window!!
GPUGRID

Message 30599 - Posted: 1 Jun 2013, 0:44:55 UTC - in response to Message 30576.  

Ahh, the thrill of sending your WU in within the 24h window!!

Hint: sell all the old hardware around and grab the best GPU you can. Save energy and produce more :)
Gamers would love our 1 year "old" series 5 and older cards :D
Profile skgiven
Volunteer moderator
Volunteer tester
Message 30602 - Posted: 1 Jun 2013, 9:10:20 UTC - in response to Message 30554.  
Last modified: 1 Jun 2013, 9:31:53 UTC

I'm getting system restarts while running NATHAN_KIDc22 WU's on XP-x86.
The only other app I have running is WUProp (NCI).
The tasks are recovering however.

A different type of problem this time:
The WU ran for 29h but had only reached 30%. There was an acemd.2865P.exe pop-up error sitting on the screen when I checked. I exited BOINC, restarted and got the same error; however, the Elapsed time now told me that the WU had only run for ~5h30min. When I suspended the task the error message disappeared. I started to run a different WU from GPUGrid, then suspended it and ran an Einstein WU (which had been suspended), then suspended that and tried to run the Nathan WU. After about 10sec I got the same error message. I closed the message and the task went to 100% and Error after ~10sec.

I81R1-NATHAN_KIDc22_3-0-8-RND4827_0 (WU 4488285): sent 31 May 2013 3:33:05 UTC, deadline 5 Jun 2013 3:33:05 UTC, In progress, Long runs (8-12 hours on fastest card) v6.18 (cuda42)

Stderr output

<core_client_version>7.0.64</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified.
(0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
MDIO: cannot open file "output.restart.coor"
Kernel not foundAssertion failed: a, file swanlibnv2.cpp, line 59

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
Kernel not foundAssertion failed: a, file swanlibnv2.cpp, line 59

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
Kernel not foundAssertion failed: a, file swanlibnv2.cpp, line 59

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

</stderr_txt>
]]>


I also had one of these WU's fail on Linux:


Exit status 255 (0xff) Unknown error number

Stderr output

<core_client_version>7.0.27</core_client_version>
<![CDATA[
<message>
process exited with code 255 (0xff, -1)
</message>
<stderr_txt>
MDIO: cannot open file "output.restart.coor"

</stderr_txt>
]]>


...and 2 more, same system, same error:

I69R6-NATHAN_KIDc22_3-0-8-RND5284_1 (WU 4488166): sent 31 May 2013 3:14:50 UTC, reported 31 May 2013 6:08:41 UTC, Error while computing, run time 9,273.96 s, CPU time 9,196.11 s, credit ---, Long runs (8-12 hours on fastest card) v6.18 (cuda42)
I97R8-NATHAN_KIDc22_3-0-8-RND1236_0 (WU 4488457): sent 31 May 2013 6:08:41 UTC, reported 31 May 2013 12:10:10 UTC, Error while computing, run time 21,353.54 s, CPU time 21,177.70 s, credit ---, Long runs (8-12 hours on fastest card) v6.18 (cuda42)
Profile Beyond
Message 30603 - Posted: 1 Jun 2013, 11:47:34 UTC - in response to Message 30602.  

A different type of problem this time:
The WU ran for 29h but had only reached 30%. There was an acemd.2865P.exe pop-up error sitting on the screen when I checked. I exited Boinc, restarted and got the same error, however the Elapsed time now told me that the WU had only run for ~5h30min.

Stderr output

<core_client_version>7.0.64</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified.
(0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
MDIO: cannot open file "output.restart.coor"
Kernel not foundAssertion failed: a, file swanlibnv2.cpp, line 59

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
Kernel not foundAssertion failed: a, file swanlibnv2.cpp, line 59

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
Kernel not foundAssertion failed: a, file swanlibnv2.cpp, line 59

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

</stderr_txt>
]]>

Looks like pretty much exactly what happened here:

http://www.gpugrid.net/forum_thread.php?id=3378&nowrap=true#30556
Profile skgiven
Volunteer moderator
Volunteer tester
Message 30631 - Posted: 2 Jun 2013, 9:27:52 UTC - in response to Message 30603.  

I31R1-NATHAN_KIDc22_SODcharge-3-10-RND8395_0 (WU 4492253): sent 1 Jun 2013 19:43:28 UTC, reported 1 Jun 2013 23:27:30 UTC, Error while computing, run time 13,091.41 s, CPU time 12,971.71 s, credit ---, Long runs (8-12 hours on fastest card) v6.18 (cuda42)
I76R5-NATHAN_KIDc22_SODcharge-3-10-RND5030_0 (WU 4491243): sent 1 Jun 2013 10:43:07 UTC, reported 1 Jun 2013 19:43:28 UTC, Error while computing, run time 12,157.94 s, CPU time 12,044.00 s, credit ---, Long runs (8-12 hours on fastest card) v6.18 (cuda42)
I14R2-NATHAN_KIDc22_3-2-8-RND5964_0 (WU 4490995): sent 1 Jun 2013 6:06:40 UTC, reported 1 Jun 2013 10:43:07 UTC, Error while computing, run time 15,460.73 s, CPU time 15,325.41 s, credit ---, Long runs (8-12 hours on fastest card) v6.18 (cuda42)

Top two:
Stderr output

<core_client_version>7.0.27</core_client_version>
<![CDATA[
<message>
process exited with code 255 (0xff, -1)
</message>
<stderr_txt>
MDIO: cannot open file "output.restart.coor"

</stderr_txt>
]]>


Stderr output

<core_client_version>7.0.27</core_client_version>
<![CDATA[
<message>
process exited with code 255 (0xff, -1)
</message>
<stderr_txt>
MDIO: cannot open file "output.restart.coor"
SWAN : FATAL : Cuda driver error 700 in file 'swanlibnv2.cpp' in line 1841.

</stderr_txt>
]]>

Jim Daniels (JD)

Message 30662 - Posted: 5 Jun 2013, 1:01:58 UTC - in response to Message 30602.  

I have had two WUs with the same error:

I36R2-NATHAN_KIDc22_2-3-8-RND5161_8

Run time 11.7 seconds. CPU time 0.55 seconds.

There are seven other error reports on this WU all with very small CPU times.

and an older one

I55R4-NATHAN_KIDc22_SODcharge-1-10-RND5713_0

Run time 140.58 seconds. CPU time 99.79 seconds.
There are no other error reports and one completion report for this WU.
klepel

Message 30699 - Posted: 7 Jun 2013, 0:38:58 UTC

STRANGE: http://www.gpugrid.net/result.php?resultid=6930954, I19R7-NATHAN_KIDc22_3-1-8-RND5865_1:

Runtime: 14:51.51 and counting, advanced: 16.457 %, Remaining: 32:13:07.
EVGA OC Scanner X says: GPU load 66-82%, MEM load: 32%, 412 MB and MCU load 20%.

On a GTX570, AMD 6200 FX.

I have to go, so this is up for comments. I will not be able to do anything until tomorrow morning.
Vagelis Giannadakis

Message 30702 - Posted: 7 Jun 2013, 9:47:01 UTC - in response to Message 30699.  

Something must be wrong with this WU, it has failed on another host:

http://www.gpugrid.net/result.php?resultid=6930512
Stefan
Project administrator
Project developer
Project tester
Project scientist

Message 30704 - Posted: 7 Jun 2013, 12:17:48 UTC - in response to Message 30702.  

WUs can fail many times. I had one which failed 8 times and then succeeded on the last attempt. So I think it's not really any indication.
Profile skgiven
Volunteer moderator
Volunteer tester
Message 30708 - Posted: 7 Jun 2013, 13:12:36 UTC - in response to Message 30704.  
Last modified: 7 Jun 2013, 13:26:25 UTC

The computer it failed on was a Titan (which cannot run these WU's):
    Coprocessors NVIDIA GeForce GTX TITAN (4095MB) driver: 314.22



If a WU fails on several known good systems then perhaps there is a problem, but you have to dig deep to find how reliable the previous systems were.

In the case of the GTX570, it's most likely that the GPU clocks dropped, but these were not reported, and neither was whether the system is set to prefer maximum performance, or how much of the CPU was being used...

The recommended settings are listed in the FAQ's, and there is a suggested way to ask for help (because we cannot see your system's setup).
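
One way to capture the figures mentioned above (GPU clocks and load) before asking for help is to sample them periodically. The sketch below assumes a driver whose nvidia-smi supports the --query-gpu fields clocks.sm, utilization.gpu and temperature.gpu; whether the drivers in use at the time of this thread reported them for these cards is not something the thread establishes. The function names and the 90% threshold are illustrative choices, not project recommendations.

```python
import subprocess

QUERY = "clocks.sm,utilization.gpu,temperature.gpu"


def parse_sample(line):
    """Parse one 'clock, util, temp' CSV line (no header, no units)."""
    # Note: int() will raise if the driver reports a field as unsupported.
    clock, util, temp = (int(x.strip()) for x in line.split(","))
    return {"sm_clock_mhz": clock, "gpu_util_pct": util, "temp_c": temp}


def read_gpu_sample():
    """Query the first GPU once via nvidia-smi (assumes it is on PATH)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sample(out.splitlines()[0])


def clock_drops(samples, tolerance=0.9):
    """Return samples whose SM clock is below tolerance * the highest seen."""
    peak = max(s["sm_clock_mhz"] for s in samples)
    return [s for s in samples if s["sm_clock_mhz"] < tolerance * peak]


if __name__ == "__main__":
    # Hypothetical readings, only to show the shape of the data.
    history = [parse_sample("1464, 90, 70"), parse_sample("810, 66, 68")]
    print(clock_drops(history))
```

Posting a short history like this alongside the task link would show whether a slow WU coincided with a clock drop.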


klepel

Message 30710 - Posted: 7 Jun 2013, 14:51:59 UTC - in response to Message 30699.  

STRANGE: http://www.gpugrid.net/result.php?resultid=6930954, I19R7-NATHAN_KIDc22_3-1-8-RND5865_1:

UPDATE on this one: Finished after 28:11:19 (hh:mm:ss). It will miss the 24-hour deadline...
It is uploading at the moment.
In the case of the GTX570, it's most likely that the GPU clocks dropped, but these were not reported, and neither was whether the system is set to prefer maximum performance, or how much of the CPU was being used...

This system normally does these RNDXXXX tasks in around 58,000 seconds, and has done so quite a few times, so this is really a strange WU. I have never had issues with the system, although these RND tasks take quite long for one of the faster video cards of the last generation.

The video card is an EVGA GTX 570 SC (not pushed any further; I prefer that it runs more or less cool, at 68 °C to 73 °C with fan speed at 85%), and on the AMD 6200 FX there is a core reserved for the video card, as recommended. The card is a recent replacement by EVGA for a faulty card, so it is new.
klepel

Message 30711 - Posted: 7 Jun 2013, 14:56:42 UTC - in response to Message 30710.  

Oh, and I just noticed the upload file has a size of 107.88 MB (so more or less double what it was before), so it will take quite a while to upload, as my internet connection will break several times and I will end up with the famous 5-hour breaks between each upload try.
©2025 Universitat Pompeu Fabra