Old Noelia WUs

TJ

Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 29141 - Posted: 13 Mar 2013, 11:01:13 UTC

The previous batch of Noelia's betas ran fine on my Windows Vista 32-bit PC with driver 314.7 and BOINC 6.10.58. The batch from the last few days errors out after hours with the message that the acemd driver stopped and has recovered from an unexpected error. I am now trying the long runs from Nathan on my GTX550Ti.
Greetings from TJ

Oktan
Joined: 28 Mar 09
Posts: 16
Credit: 953,280,454
RAC: 0
Level
Glu
Message 29142 - Posted: 13 Mar 2013, 11:18:51 UTC - in response to Message 29139.  

Hi there, I'm having problems on my Linux box. I haven't been able to run any work at all for 3-4 days.

Mvh/ Oktan

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 29143 - Posted: 13 Mar 2013, 11:22:00 UTC - in response to Message 29139.  
Last modified: 13 Mar 2013, 11:31:32 UTC

Thanks for the reply, Nate. I'm glad to hear that you guys are looking to improve the testability for Windows, even before issuing tasks on the Beta application to us Beta users.

Regarding your request for info, my previously mentioned NOELIA task failures are happening on Windows 8 Pro x64, using BOINC v7.0.55 x64 beta, running nVidia drivers 314.14 beta, using 2 video cards, GTX 660 Ti and GTX 460.

It appears to me that, when a GPUGrid task causes the nVidia driver to stop responding, Windows catches the error and restarts the driver (instead of BSOD), giving a Taskbar balloon to the effect of "The nVidia driver had a problem and has been restarted successfully." (I'm not sure of the exact text). When this happens, in addition to the GPUGrid task erroring out on my main video card, crunching on my other GPU (which is usually doing World Community Grid Help Conquer Cancer work) also results in its tasks erroring out.

I believe the next tasks that get processed after that driver recovery, are successful, unless another NOELIA task on the beta app causes an additional driver crash and recovery.

If you have any more resources to test these tasks out more, locally, it would save us a huge headache. I understand I signed up for these beta tasks, and I understand that seeing these errors is part of the gig, and so... If you find a way to replicate the error locally, then I'd politely ask that you also remove the bugged tasks from the beta queue. If you cannot yet reproduce the problem locally, then we'll keep erroring them for you, as part of our obligation.

Not sure if this much info helps, but that's the behavior I'm seeing on my Windows 8 x64 PC, and if you need anything more, feel free to ask.

Kind regards,
Jacob Klein

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Message 29144 - Posted: 13 Mar 2013, 11:24:51 UTC - in response to Message 29139.  

What we are thinking is that this might be related to the Windows application. Has anyone who experiences these problems seen them on a linux box? Is it only Windows? The more we know, the more quickly we can improve. The last thing we want is to crash your machines. A failed WU is one thing. Locking up cruncher machines is much, much worse. Please let us know so we can fix it.

I've just aborted one of your long run tasks which looked as if it was going bad - http://www.gpugrid.net/workunit.php?wuid=4246107 (replication _6 is always a bad sign).

The first cruncher to try it was running Linux.

Killer 69
Joined: 7 Jan 09
Posts: 3
Credit: 3,624,425
RAC: 0
Level
Ala
Message 29146 - Posted: 13 Mar 2013, 15:50:08 UTC

All NOELIA tasks at the moment freeze my Linux box almost completely, to the point that I have to restart the computer. What's worse, I did deselect beta tasks, but after the reboot BOINC downloads more of those tasks from the ACEMDBETA queue and I'm back in the reboot cycle.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Message 29147 - Posted: 13 Mar 2013, 16:03:24 UTC - in response to Message 29146.  
Last modified: 13 Mar 2013, 16:04:34 UTC

All NOELIA tasks at the moment freeze my Linux box almost completely, to the point that I have to restart the computer. What's worse, I did deselect beta tasks, but after the reboot BOINC downloads more of those tasks from the ACEMDBETA queue and I'm back in the reboot cycle.

Deselect "Run test applications? This helps us develop applications, but may cause jobs to fail on your computer" as well.

Killer 69
Joined: 7 Jan 09
Posts: 3
Credit: 3,624,425
RAC: 0
Level
Ala
Message 29150 - Posted: 13 Mar 2013, 17:20:32 UTC - in response to Message 29147.  

OK, I still had test applications selected. After deselecting that and resetting the project, I now got a NATHAN long run task, which is also pretty odd, because I have only short runs enabled at the moment.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Message 29151 - Posted: 13 Mar 2013, 17:29:49 UTC - in response to Message 29150.  

OK, I still had test applications selected. After deselecting that and resetting the project, I now got a NATHAN long run task, which is also pretty odd, because I have only short runs enabled at the moment.

There aren't any short run tasks available today. Might you have had "If no work for selected applications is available, accept work from other applications?" selected as well?

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 29154 - Posted: 13 Mar 2013, 18:14:17 UTC - in response to Message 29151.  

109nx33-NOELIA_109n_equ-1-2-RND6949_0 4248581 139265 12 Mar 2013 | 8:04:33 UTC 13 Mar 2013 | 14:48:02 UTC Error while computing 58,742.39 1.73 --- ACEMD beta version v6.49 (cuda42)

This Long WU hung after 16h on a W7 system with a GTX660Ti. The GPU sat at zero usage and the app stayed running/crashed preventing new work units from starting or a backup GPU project from running. It also prevented an additional CPU core from being used at a CPU project. Saw the usual cuda driver pop-up error.

Stderr output

<core_client_version>7.0.44</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
MDIO: cannot open file "output.restart.coor"
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
MDIO: cannot open file "output.restart.coor"
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

</stderr_txt>
]]>
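
For anyone comparing failures across machines, the repeated SWAN lines in a stderr dump like the one above can be tallied with a short script. This is only a sketch: the regular expression is inferred from the log lines quoted in this post, and the function name `summarize_swan_errors` is made up for illustration, not part of any GPUGRID or BOINC tool.

```python
import re
from collections import Counter

# Pattern inferred from lines such as:
#   SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
# (assumption: other acemd builds may word this differently)
SWAN_FATAL = re.compile(
    r"SWAN : FATAL : Cuda driver error (?P<code>\d+) "
    r"in file '(?P<file>[^']+)' in line (?P<line>\d+)\."
)


def summarize_swan_errors(stderr_text):
    """Count each distinct (error code, source file, line) triple in the text."""
    counts = Counter()
    for match in SWAN_FATAL.finditer(stderr_text):
        key = (int(match["code"]), match["file"], int(match["line"]))
        counts[key] += 1
    return counts


# Sample copied from the stderr output quoted above.
sample = """MDIO: cannot open file "output.restart.coor"
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59
MDIO: cannot open file "output.restart.coor"
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59
"""

for (code, source, line), n in summarize_swan_errors(sample).items():
    print(f"driver error {code} at {source}:{line} seen {n} time(s)")
```

Pasting the stderr text of several failed tasks into `sample` would show whether they all die at the same driver error and source line.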

The next two WUs also failed:
148px38-NOELIA_148p_equ-1-2-RND3814_6 4249317 139265 13 Mar 2013 | 17:29:07 UTC 13 Mar 2013 | 17:32:46 UTC Error while computing 31.09 1.76 --- ACEMD beta version v6.49 (cuda42)
216px36-NOELIA_216p_equ-1-2-RND0721_0 4249016 139265 12 Mar 2013 | 9:48:07 UTC 13 Mar 2013 | 14:48:02 UTC Error while computing 12.60 1.81 --- ACEMD beta version v6.49 (cuda42)

I don't see the point in testing a WU 7 or more times, especially if it's one of a batch of hundreds.
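
To spot such heavily resent WUs quickly, the number after the last underscore of a task name (the `_6` in the first name above) can be pulled out with a few lines of Python. A sketch only: the helper name `replication_index` is hypothetical, and reading index N as attempt N+1 assumes the numbering starts at `_0`.

```python
def replication_index(task_name):
    """Return the integer after the final underscore of a task name."""
    _, _, suffix = task_name.rpartition("_")
    if not suffix.isdigit():
        raise ValueError(f"no numeric replication suffix in {task_name!r}")
    return int(suffix)


# Task names copied from the results listed above.
tasks = [
    "109nx33-NOELIA_109n_equ-1-2-RND6949_0",
    "148px38-NOELIA_148p_equ-1-2-RND3814_6",
    "216px36-NOELIA_216p_equ-1-2-RND0721_0",
]

for name in tasks:
    idx = replication_index(name)
    # Assumption: suffixes start at _0, so index N is attempt N + 1.
    print(f"{name}: replication _{idx}, attempt {idx + 1}")
```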

Again, I suggest you start up an alpha project to test on properly - Beta testing shouldn't crash systems, hang drivers or banjax the OS!
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Martin Aliger
Joined: 16 Jul 10
Posts: 7
Credit: 35,198,028
RAC: 0
Level
Val
Message 29156 - Posted: 13 Mar 2013, 20:07:36 UTC
Last modified: 13 Mar 2013, 20:14:10 UTC

All of these betas failed on my machine. Moreover, I opted out of beta and updated, but am still receiving them (and only them).

I also observed that my W7 always restarts the driver a few seconds after the acemd application is killed.

And all those WUs failed immediately. If you see longer run times in the statistics, that's because there is a crash message box on screen, which counts toward the run time. Sometimes it's on screen for hours...

Dylan
Joined: 16 Jul 12
Posts: 98
Credit: 386,043,752
RAC: 0
Level
Asp
Message 29158 - Posted: 13 Mar 2013, 20:18:33 UTC - in response to Message 29156.  

Martin, did you follow this thread on how to completely opt out of beta tasks?


http://www.gpugrid.net/forum_thread.php?id=3272

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 29159 - Posted: 13 Mar 2013, 20:20:05 UTC
Last modified: 13 Mar 2013, 20:22:23 UTC

Seven in a row of the 6.49 ACEMD beta NOELIAs failed for me also, all in 8 seconds or less, so I am giving it a rest for now. That was on a Kepler GTX 650 Ti card, and I will try a Fermi GTX 560 tomorrow to see if that does any better. This is on Win7 64-bit, and BOINC 7.0.56 x64. Those cards have been basically error free for the last several days, since the last Noelia errors.

Tsukiouji
Joined: 6 Jan 13
Posts: 1
Credit: 1,548,050
RAC: 0
Level
Ala
Message 29164 - Posted: 14 Mar 2013, 11:08:10 UTC

A few NOELIA WUs failed recently on my system too.
I'm running GTS450 (314.14, Win7 x64).

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 29165 - Posted: 14 Mar 2013, 12:39:22 UTC - in response to Message 29164.  

Tsukiouji,
When I clicked your link, I got a page that says "No access". For your account settings, in GPUGRID preferences, do you have "Should GPUGRID show your computers on its web site?" set to yes?

nenym
Joined: 31 Mar 09
Posts: 137
Credit: 1,429,587,071
RAC: 0
Level
Met
Message 29166 - Posted: 14 Mar 2013, 14:49:35 UTC - in response to Message 29165.  

The problem is with the link. The filter "http://www.gpugrid.net/results.php?userid=94436" can be set, and its results seen, by the owner only. There is no problem with "host" filters, e.g. http://www.gpugrid.net/results.php?hostid=144019.

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 29170 - Posted: 14 Mar 2013, 15:54:42 UTC - in response to Message 29166.  

Thanks for the explanation -- I was able to find the user's tasks by clicking on their name, and looking at the tasks for the only computer. Link: http://www.gpugrid.net/results.php?hostid=144019

Anyway... The way I look at this issue is...

The project admins have already made a decision whether they want the beta testers to suffer through and to "process all these failures".

So, what I do is look at the server status page here http://www.gpugrid.net/server_status.php ... and just keep praying that the "Unsent" count for the "ACEMD beta version" app goes down quicker.

Good news - it's pretty much exhausted - Now maybe my system will be stable again!

Martin Aliger
Joined: 16 Jul 10
Posts: 7
Credit: 35,198,028
RAC: 0
Level
Val
Message 29174 - Posted: 15 Mar 2013, 4:14:14 UTC - in response to Message 29158.  

Martin, did you follow this thread on how to completely opt out of beta tasks?


http://www.gpugrid.net/forum_thread.php?id=3272


No, but I'm in all other queues, so there is (plenty of) other work.

But the problem is solved now. Admins cancelled existing beta tasks and no others are waiting. I'll opt in to beta again to help test on Win platform.

AdamYusko
Joined: 29 Jun 12
Posts: 26
Credit: 21,540,800
RAC: 0
Level
Pro
Message 29185 - Posted: 16 Mar 2013, 23:53:19 UTC

Now I am not sure if this is an error with my machine, which has been offline for a few weeks, or if it is due to a bug in the Noelia tasks it got earlier today.

Both tasks errored out, and both had issues with the file "restart.coor".

One had this output:
<message>
process exited with code 255 (0xff, -1)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
MDIO: cannot open file "restart.coor"


the other had a much shorter but similar output of:

<message>
process exited with code 255 (0xff, -1)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"


GPUGRID
Joined: 12 Dec 11
Posts: 91
Credit: 2,730,095,033
RAC: 0
Level
Phe
Message 29189 - Posted: 17 Mar 2013, 12:36:08 UTC
Last modified: 17 Mar 2013, 12:36:36 UTC

I'm still having the BSOD/reboot thing on my triple 690 rig, every two days, even with the NATHAN long units. Only a full cache abort and clean units will solve it, but then in two days another one will come.

idimitro
Joined: 25 Jun 12
Posts: 3
Credit: 47,912,263
RAC: 0
Level
Val
Message 29281 - Posted: 29 Mar 2013, 13:01:30 UTC

After working for a few days, the Nathan packages are also crashing the application and my driver.
I liked the cause of this project, but basically I cannot allow it to crash my computer and interrupt my work.
Hasta la vista.