New project in long queue

Message boards : News : New project in long queue
noelia

Message 28895 - Posted: 1 Mar 2013, 10:49:14 UTC

Hello all,

After testing the new application, it is time to send a new project. I'm sending around 6,000 WUs to the long queue at the moment. Credits will be around 100,000. Let me know if you have any issues, since this is the first big thing we have submitted to the recently updated long queue and these WUs include new features.

Noelia
flashawk

Message 28896 - Posted: 1 Mar 2013, 11:16:58 UTC

I can't download any. I keep trying, but no long runs in the last hour.
Bedrich Hajek

Message 28899 - Posted: 1 Mar 2013, 12:20:38 UTC - in response to Message 28896.  

These units appear to be very long; finishing time could be close to 20 hours on my computers. Assuming there are no errors!!
flashawk

Message 28911 - Posted: 2 Mar 2013, 7:02:10 UTC

I've had to abort 3 NOELIAs in the past 2 hours; GPU usage was at 100% and the memory controller was at 0%. I had to reboot the computer to get the GPUs working again. Windows popped up an error message complaining that "acemd.2865P.exe had to be terminated unexpectedly". As soon as I suspended the NOELIA work unit, the error message went away.

This was on a GTX 560, GTX 670 and a GTX 680. Windows XP 64-bit, BOINC v7.0.28.
dskagcommunity
Message 28930 - Posted: 3 Mar 2013, 9:20:30 UTC
Last modified: 3 Mar 2013, 9:21:51 UTC

So far I got one error after 14,000 secs :( and a second one which was successful after 15 hours (560 Ti 448-core edition, 157k credits). Now it's calculating a third one... let's see.
DSKAG Austria Research Team: http://www.research.dskag.at



Beyond
Message 28933 - Posted: 3 Mar 2013, 13:17:06 UTC - in response to Message 28895.  

The first one of these I received was:

005px1x2-NOELIA_005p-0-2-RND6570_0

After running for over 24 hours this happened:

<core_client_version>7.0.52</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified.
 (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

</stderr_txt>
]]>


I have another one that's at 62.5% after 14 hours. Looking at some of the NOELIA WUs, they seem to be failing all over the place, some of them repeatedly. They're also too long for my machines to process and return in 24 hours. After the one that's running either errors out or completes, I will be aborting the NOELIA WUs. Wasting 24+ hours of GPU time per failure is not my favorite way to waste electricity. Sorry. BTW, the TONI WUs run fine.
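For anyone sifting through a batch of failed results, the <stderr_txt> block in reports like the one above can be pulled out mechanically. A minimal sketch (not an official BOINC tool; it just assumes the tag layout shown in the dumps in this thread):

```python
import re

def extract_stderr(report: str) -> str:
    """Pull the body of the <stderr_txt> block out of a BOINC result report."""
    match = re.search(r"<stderr_txt>(.*?)</stderr_txt>", report, re.DOTALL)
    return match.group(1).strip() if match else ""

sample = '''<core_client_version>7.0.52</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified.
 (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
</stderr_txt>
]]>'''

# First line of the captured stderr body:
print(extract_stderr(sample).splitlines()[0])  # MDIO: cannot open file "restart.coor"
```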
microchip
Message 28934 - Posted: 3 Mar 2013, 14:10:34 UTC

I've found NOELIA WUs to be highly unreliable, even on the short queue. I don't like getting one, as I've no idea if it'll complete without errors. I had to abort a short NOELIA one yesterday as it kept crunching in circles: it crunched for some minutes and then returned to the beginning to do the same all over again.

Team Belgium
Retvari Zoltan
Message 28935 - Posted: 3 Mar 2013, 14:17:03 UTC
Last modified: 3 Mar 2013, 14:18:20 UTC

These new NOELIA tasks don't use a full CPU thread (core, if you like) to feed a Kepler-type GPU, the way other workunits (like TONI_AGG) used to. Is this behavior intentional or not? Maybe that's why it takes so long to process them. It takes 40,400 secs for my overclocked GTX 580 to finish these tasks, while it takes 38,800 for a (slightly overclocked) GTX 670, so there is a demonstrable loss (~5%) in their performance.
GPUGRID

Message 28936 - Posted: 3 Mar 2013, 15:19:13 UTC

Some of the new NOELIA units are bugged somehow, I think. Some run fine, some of them don't.
Jim Daniels (JD)

Message 28937 - Posted: 3 Mar 2013, 17:17:41 UTC

I posted this in the "long application updated to the latest version" thread, but Firehawk implied these issues should be reported in this thread. I don't know if this is a 6.18 issue or a NOELIA WU issue, but I guess time will tell. So I apologize in advance for the double posting, if that is a bigger faux pas than not knowing which thread is the appropriate one to post to. ;-)

--------------------

While running my first 6.18 long run task my laptop locked up and I had to do a hard reboot. After the system was back up this WU had terminated with an error. The details are below. However, I have run two 6.18 WUs successfully since then. It appears one other host also terminated with an error on this WU.

The NOELIA WUs seem to be averaging about 80% GPU utilization on my GTX 680M, and the run times are over 18 hours. I don't know how much effect having to share CPU time is having on these numbers.

Error Details:

i7-3740QM 16GB - Win7 Pro x64 - GTX 680m (Alienware 9.18.13.717)
GPU: dedicated to GPUGRID - CPU: SETI, Poem, Milkyway, WUProp, FreeHAL

------------------------------------------------------------------------

Work Unit: 4209987 (041px21x2-NOELIA_041p-0-2-RND9096_0)

Stderr output

<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

</stderr_txt>
]]>

flashawk

Message 28938 - Posted: 3 Mar 2013, 17:40:37 UTC

You're getting us hawks mixed up. I've been using this name since '95, and I think that's the first time that's happened.
Jim Daniels (JD)

Message 28940 - Posted: 3 Mar 2013, 19:09:22 UTC - in response to Message 28938.  

Mea Culpa.
Stoneageman
Message 28941 - Posted: 3 Mar 2013, 19:28:05 UTC
Last modified: 3 Mar 2013, 19:31:29 UTC

I'm also getting several Noelia tasks making very slow progress, the same problem as was flagged in the beta test. The size of the upload is causing issues for me as well.
Bikermatt
Message 28942 - Posted: 3 Mar 2013, 20:00:13 UTC

The Noelia workunits refuse to run on my 660 Ti Linux system; they lock up or make no progress. I have finished one each on two different Linux systems with 670s without problems.
Jim1348

Message 28943 - Posted: 3 Mar 2013, 20:33:00 UTC

The Noelia longs either fail in the first 3 to 4 minutes on my GTX 560 and GTX 650 Ti (the only four failures I have had on GPUGRID), or else they complete successfully.

I can't tell if they take any longer, though; the last one took 23 hours instead of the more usual 18, but I have seen that on the 6.17 work units also, so it may just be the size of the work unit. I have not had any hangs thus far (Win7 64-bit; BOINC 7.0.52 x64).

All in all, it is not that bad for me.
flashawk

Message 28944 - Posted: 3 Mar 2013, 21:12:11 UTC

Well Jim, now you can understand how most of us got 90% of our errors. If you had looked closer, you would have noticed that almost all of them came from the first run of NOELIAs in early February. Instead, you thought you would display your distributed computing prowess and give us your expert advice, and proceeded to tell us about our substandard components, our inability to overclock correctly, and the overheating issues we must be having.

I'm referencing this thread.

http://www.gpugrid.net/forum_thread.php?id=3299
Jim1348

Message 28945 - Posted: 3 Mar 2013, 22:12:07 UTC - in response to Message 28944.  

flashawk,

Thank you for your insight. But I just started on Feb. 14, and I think it was well past your first group of errors. At any rate, they ran fine on my cards even though not on some others, where they often failed after an hour or more. Maybe you can give better advice?
flashawk

Message 28946 - Posted: 3 Mar 2013, 22:44:05 UTC

I guess it's understandable; the best advice I could ever give in my 51 years is "wait and see". I don't walk into another's club house and start rearranging the furniture. There have been many times when I've jumped to quick conclusions in my own mind, only to find out later that I was wrong.

Anyway, I didn't mean to be too harsh. Let me be the first to say "Welcome to GPUGRID", and I'm sure you have a lot to contribute.
Richard Haselgrove

Message 28947 - Posted: 4 Mar 2013, 0:19:33 UTC
Last modified: 4 Mar 2013, 0:44:59 UTC

I managed to get http://www.gpugrid.net/result.php?resultid=6567675 to run to completion, by making sure it wasn't interrupted during computation. But 12 hours on a GTX 670 is a long time to run without task switching, when you're trying to support more than one BOINC project.

Edit - on the other hand, task http://www.gpugrid.net/result.php?resultid=6563457 following behind on the same card with the same configuration failed three times with

SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

The TONI task following, again on the same card, seems to have started and to be running normally.
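If preemption during computation really is the trigger, one possible workaround (a sketch only; the 720-minute value is arbitrary and just needs to exceed a task's run time between checkpoints) is to stretch BOINC's task-switch interval via a global_prefs_override.xml in the BOINC data directory:

```xml
<!-- global_prefs_override.xml fragment (sketch): raise the "switch between
     tasks every N minutes" preference so a long GPU task is less likely
     to be preempted mid-run -->
<global_preferences>
   <cpu_scheduling_period_minutes>720</cpu_scheduling_period_minutes>
</global_preferences>
```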
flashawk

Message 28948 - Posted: 4 Mar 2013, 1:07:04 UTC - in response to Message 28947.  
Last modified: 4 Mar 2013, 1:27:51 UTC

Richard Haselgrove wrote:

SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.


I just had the same thing happen to me, Richard. Right after the computation error, a TONI WU started on the same GPU card and sat idle with 0% GPU load and 0% memory controller usage. I had to suspend BOINC and reboot to get the GPU crunching again. As far as times go on my GTX 670s, the NOELIA WUs' output files have ranged from 112 MB to 172 MB so far; the smaller one took 7.5 hours and the larger one took 11.75 hours.

So I think the size of the output file directly affects the run time (as usual). They may have to pull the plug on this batch and rework them; we'll have to wait and see what they decide.

Edit: Check out this one, which I just downloaded a couple of minutes ago. I noticed it ended in a 6, which means I'm the 7th person to get it. This is off the hook, man!

http://www.gpugrid.net/workunit.php?wuid=4210634
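The trailing number flashawk is reading is BOINC's replication counter: the _N at the end of a task name is zero-based, so a task ending in _6 is indeed the seventh copy issued. A quick sketch of the arithmetic (the helper name is mine, not a BOINC API):

```python
def issue_number(task_name: str) -> int:
    """1-based count of how many copies of this workunit have been sent out,
    read from BOINC's zero-based _N replication suffix on the task name."""
    replication = int(task_name.rsplit("_", 1)[1])
    return replication + 1  # _0 is the first copy issued

print(issue_number("041px21x2-NOELIA_041p-0-2-RND9096_0"))  # 1 (first copy)
print(issue_number("041px21x2-NOELIA_041p-0-2-RND9096_6"))  # 7 (seventh copy)
```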
©2025 Universitat Pompeu Fabra