Message boards : News : New project in long queue
Joined: 5 Jul 12 Posts: 35 Credit: 393,375 RAC: 0
Hello all,

After testing the new application, it is time to send out a new project. I'm sending around 6,000 WUs to the long queue at the moment. Credits will be around 100,000. Let me know if you have any issues, since this is the first big batch we have submitted to the recently updated long queue and these WUs include new features.

Noelia
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0
I can't download any. I keep trying, but there have been no long runs in the last hour.
Joined: 28 Mar 09 Posts: 490 Credit: 11,731,645,728 RAC: 47,738
These units appear to be very long; finishing time could be close to 20 hours on my computers. Assuming there are no errors!
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0
I've had to abort 3 NOELIAs in the past 2 hours; GPU usage was at 100% and the memory controller was at 0%. I had to reboot the computer to get the GPUs working again. Windows popped up an error message complaining that "acemd.2865P.exe had to be terminated unexpectedly". As soon as I suspended the NOELIA work unit, the error message went away. This was on a GTX 560, a GTX 670 and a GTX 680. Windows XP 64-bit, BOINC v7.0.28.
Joined: 28 Apr 11 Posts: 462 Credit: 958,266,958 RAC: 28,485
So far I've had one error after 14,000 secs :( and a second one which was successful after 15 hours (560 Ti 448-core edition, 157k credits). Now it's calculating a third one... let's see.

DSKAG Austria Research Team: http://www.research.dskag.at
Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
The first one of these I received was: 005px1x2-NOELIA_005p-0-2-RND6570_0

After running for over 24 hours this happened:

<core_client_version>7.0.52</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
</stderr_txt>
]]>

I have another one that's at 62.5% after 14 hours. Looking at some of the NOELIA WUs, they seem to be failing all over the place, some of them repeatedly. They're also too long for my machines to process and return within 24 hours. After the one that's running either errors out or completes, I will be aborting the NOELIA WUs. Wasting 24+ hours of GPU time per failure is not my favorite way to waste electricity. Sorry.

BTW, the TONI WUs run fine.
Joined: 4 Sep 11 Posts: 110 Credit: 326,102,587 RAC: 0
I've found NOELIA WUs to be highly unreliable, even on the short queue. I don't like getting one, as I've no idea whether it will complete without errors. I had to abort a short NOELIA one yesterday as it kept crunching in circles: it crunched for some minutes and then returned to the beginning to do the same all over again.

Team Belgium
Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 1
These new NOELIA tasks don't use a full CPU thread (core, if you like) to feed a Kepler-type GPU, the way other workunits (such as TONI_AGG) did. Is this behavior intentional or not? Maybe that's why it takes so long to process them. It takes 40,400 secs for my overclocked GTX 580 to finish these tasks, while it takes 38,800 for a (slightly overclocked) GTX 670, so there is a demonstrable loss (~5%) in their performance.
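A quick back-of-the-envelope check of those figures (a minimal sketch only; the variable names are made up and the per-task times are the ones quoted in the post above):

```python
# Per-task run times quoted in the post above.
gtx580_sec = 40_400   # overclocked GTX 580
gtx670_sec = 38_800   # slightly overclocked GTX 670

margin = (gtx580_sec - gtx670_sec) / gtx580_sec
print(f"The GTX 670 finishes about {margin:.1%} sooner than the GTX 580")
# Roughly 4% here; the post's point is that this lead is smaller than the
# GTX 670 shows on workunits (such as TONI_AGG) that keep a full CPU thread
# busy feeding the GPU, which is where the estimated ~5% loss comes from.
```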
Joined: 12 Dec 11 Posts: 91 Credit: 2,730,095,033 RAC: 0
Some of the new NOELIA units are bugged somehow, I think. Some run fine, some of them don't.
Joined: 20 Jan 13 Posts: 9 Credit: 206,731,892 RAC: 0
I posted this in the "Long application updated to the latest version" thread, but Firehawk suggested these issues should be reported in this thread. I don't know if this is a 6.18 issue or a NOELIA WU issue, but I guess time will tell. So I apologize in advance for the double posting, if that is a bigger faux pas than not knowing which thread is the appropriate one to post to. ;-)

While running my first 6.18 long-run task, my laptop locked up and I had to do a hard reboot. After the system was back up, this WU had terminated with an error; the details are below. However, I have run two 6.18 WUs successfully since then. It appears one other host also terminated with an error on this WU.

The NOELIA WUs seem to be averaging about 80% utilization on the GTX 680M, and the run times are over 18 hours. I don't know how much effect having to share CPU time is having on these numbers.

Error details:
i7-3740QM, 16 GB - Win7 Pro x64 - GTX 680M (Alienware 9.18.13.717)
GPU: dedicated to GPUGRID - CPU: SETI, Poem, Milkyway, WUProp, FreeHAL

Work Unit: 4209987 (041px21x2-NOELIA_041p-0-2-RND9096_0)
Stderr output:
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
</stderr_txt>
]]>
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0
You're getting us hawks mixed up. I've been using this name since '95, and I think that's the first time that's happened.
Joined: 20 Jan 13 Posts: 9 Credit: 206,731,892 RAC: 0
Mea culpa.
Joined: 25 May 09 Posts: 224 Credit: 34,057,374,498 RAC: 0
I am also getting several Noelia tasks making very slow progress, the same problem as was flagged in the beta test. The size of the upload is also causing issues for me.
Joined: 8 Apr 10 Posts: 37 Credit: 4,422,457,619 RAC: 64,437
The Noelia workunits refuse to run on my 660 Ti Linux system; they lock up or make no progress. I have finished one on each of two different Linux systems with 670s without problems.
Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0
The Noelia longs either fail in the first 3 to 4 minutes on my GTX 560 and GTX 650 Ti (the only four failures I have had on GPUGRID), or else they complete successfully. I can't tell if they take any longer though; the last one took 23 hours, instead of the more usual 18 hours, but I have seen that on the 6.17 work units also, so it may just be the size of the work unit. I have not had any hangs thus far (Win7 64-bit; BOINC 7.0.52 x64). All in all, it is not that bad for me.
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0
Well Jim, now you can understand how most of us got 90% of our errors. If you had looked closer, you would have noticed that almost all of them came from a first run of NOELIAs in early February. Instead, you thought you would display your distributed computing prowess, give us your expert advice, and proceed to tell us about our substandard components, our inability to overclock correctly, and the overheating issues we must be having. I'm referring to this thread: http://www.gpugrid.net/forum_thread.php?id=3299
Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0
flashawk, thank you for your insight. But I just started on Feb. 14, and I think that was well past your first group of errors. At any rate, they ran fine on my cards even though they didn't on some others, where they often failed after an hour or more. Maybe you can give better advice?
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0
I guess it's understandable; the best advice I could ever give in my 51 years is "wait and see". I don't walk into another's clubhouse and start rearranging the furniture. There have been many times when I've jumped to quick conclusions in my own mind, only to find out later that I was wrong. Anyway, I didn't mean to be too harsh; let me be the first to say "Welcome to GPUGRID", and I'm sure you have a lot to contribute.
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172
I managed to get http://www.gpugrid.net/result.php?resultid=6567675 to run to completion, by making sure it wasn't interrupted during computation. But 12 hours on a GTX 670 is a long time to run without task switching when you're trying to support more than one BOINC project.

Edit - on the other hand, task http://www.gpugrid.net/result.php?resultid=6563457, following behind on the same card with the same configuration, failed three times with:

SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.

The TONI task following, again on the same card, seems to have started and to be running normally.
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0
Richard Haselgrove wrote:

I just had the same thing happen to me, Richard. Right after the computation error, a TONI WU started on the same GPU card and it sat idle with 0% GPU load and 0% memory controller usage. I had to suspend BOINC and reboot to get the GPU crunching again.

As far as times go on my GTX 670s, the NOELIA WUs have ranged from 112 MB to 172 MB so far; the smaller one took 7.5 hours and the larger one took 11.75 hours. So I think the size of the output file directly affects the run time (as usual). They may have to pull the plug on this batch and rework them; we'll have to wait and see what they decide.

Edit: Check out this one, I just downloaded it a couple of minutes ago. I noticed it ended in a 6, which means I'm the 7th person to get it. This is off the hook, man! http://www.gpugrid.net/workunit.php?wuid=4210634
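A rough consistency check of that size-versus-runtime observation (a sketch only; the figures are the ones reported in the post above, and two samples are far from conclusive):

```python
# Two NOELIA tasks on a GTX 670, figures taken from the post above.
small_mb, small_hours = 112, 7.5
large_mb, large_hours = 172, 11.75

size_ratio = large_mb / small_mb        # ~1.54
time_ratio = large_hours / small_hours  # ~1.57

print(f"size ratio {size_ratio:.2f} vs time ratio {time_ratio:.2f}")
# The two ratios are close, which is consistent with run time scaling roughly
# linearly with the amount of trajectory data written, though two data points
# do not prove a proportional relationship.
```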