Old Noelia WUs
Author | Message |
---|---|
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0 |
The previous batch of Noelia's betas ran fine on my Windows Vista 32-bit PC with driver 314.7 and BOINC 6.10.58. The batch from the last few days errors out after hours, with the message that the acemd driver stopped and has recovered from an unexpected error. I am now trying the long runs from Nathan on my GTX550Ti. Greetings from TJ |
Joined: 28 Mar 09 Posts: 16 Credit: 953,280,454 RAC: 0 |
Hi there, I'm having problems on my Linux box; I haven't been able to run any work at all for 3-4 days. Kind regards, Oktan |
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 |
Thanks for the reply, Nate. I'm glad to hear that you guys are looking to improve the testability for Windows, even before issuing tasks on the Beta application to us Beta users. Regarding your request for info, my previously mentioned NOELIA task failures are happening on Windows 8 Pro x64, using BOINC v7.0.55 x64 beta, running nVidia drivers 314.14 beta, using 2 video cards, GTX 660 Ti and GTX 460. It appears to me that, when a GPUGrid task causes the nVidia driver to stop responding, Windows catches the error and restarts the driver (instead of BSOD), giving a Taskbar balloon to the effect of "The nVidia driver had a problem and has been restarted successfully." (I'm not sure of the exact text). When this happens, in addition to the GPUGrid task erroring out on my main video card, crunching on my other GPU (which is usually doing World Community Grid Help Conquer Cancer work) also results in its tasks erroring out. I believe the next tasks that get processed after that driver recovery, are successful, unless another NOELIA task on the beta app causes an additional driver crash and recovery. If you have any more resources to test these tasks out more, locally, it would save us a huge headache. I understand I signed up for these beta tasks, and I understand that seeing these errors is part of the gig, and so... If you find a way to replicate the error locally, then I'd politely ask that you also remove the bugged tasks from the beta queue. If you cannot yet reproduce the problem locally, then we'll keep erroring them for you, as part of our obligation. Not sure if this much info helps, but that's the behavior I'm seeing on my Windows 8 x64 PC, and if you need anything more, feel free to ask. Kind regards, Jacob Klein |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172 |
What we are thinking is that this might be related to the Windows application. Has anyone who experiences these problems seen them on a linux box? Is it only Windows? The more we know, the more quickly we can improve. The last thing we want is to crash your machines. A failed WU is one thing. Locking up cruncher machines is much, much worse. Please let us know so we can fix it. I've just aborted one of your long run tasks which looked as if it was going bad - http://www.gpugrid.net/workunit.php?wuid=4246107 (replication _6 is always a bad sign). The first cruncher to try it was running Linux. |
Joined: 7 Jan 09 Posts: 3 Credit: 3,624,425 RAC: 0 |
At the moment, all NOELIA tasks freeze my Linux machine so completely that I have to restart the computer. What's worse, I de-selected beta tasks, but after the reboot BOINC downloads more of those tasks from the ACEMDBETA queue and I'm back in the reboot cycle. |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172 |
At the moment, all NOELIA tasks freeze my Linux machine so completely that I have to restart the computer. What's worse, I de-selected beta tasks, but after the reboot BOINC downloads more of those tasks from the ACEMDBETA queue and I'm back in the reboot cycle. Deselect "Run test applications?" as well. |
Joined: 7 Jan 09 Posts: 3 Credit: 3,624,425 RAC: 0 |
OK, I still had test applications selected. After deselecting that and resetting the project, I now got a NATHAN long run task, which is also pretty odd, because I have only short runs enabled at the moment. |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172 |
OK, I still had test applications selected. After deselecting that and resetting the project, I now got a NATHAN long run task, which is also pretty odd, because I have only short runs enabled at the moment. There aren't any short run tasks available today. Might you have had "If no work for selected applications is available, accept work from other applications?" selected as well? |
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
109nx33-NOELIA_109n_equ-1-2-RND6949_0 4248581 139265 12 Mar 2013 | 8:04:33 UTC 13 Mar 2013 | 14:48:02 UTC Error while computing 58,742.39 1.73 --- ACEMD beta version v6.49 (cuda42)

This long WU hung after 16h on a W7 system with a GTX660Ti. The GPU sat at zero usage and the app stayed running in a crashed state, preventing new work units from starting or a backup GPU project from running. It also prevented an additional CPU core from being used by a CPU project. I saw the usual CUDA driver pop-up error.

Stderr output
<core_client_version>7.0.44</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
MDIO: cannot open file "output.restart.coor"
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
MDIO: cannot open file "output.restart.coor"
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
</stderr_txt>
]]>

The next two WUs also failed:
148px38-NOELIA_148p_equ-1-2-RND3814_6 4249317 139265 13 Mar 2013 | 17:29:07 UTC 13 Mar 2013 | 17:32:46 UTC Error while computing 31.09 1.76 --- ACEMD beta version v6.49 (cuda42)
216px36-NOELIA_216p_equ-1-2-RND0721_0 4249016 139265 12 Mar 2013 | 9:48:07 UTC 13 Mar 2013 | 14:48:02 UTC Error while computing 12.60 1.81 --- ACEMD beta version v6.49 (cuda42)

I don't see the point in testing a WU 7 or more times, especially if it's one of a batch of hundreds. Again, I suggest you start up an alpha project to test on properly. Beta testing shouldn't crash systems, hang drivers or banjax the OS!
FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
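For anyone sorting through a pile of failed results like the ones above, here is a minimal Python sketch that pulls the driver error code and source location out of stderr text. It is not project code: the function name `find_swan_errors` is made up for illustration, and the line format is assumed from the single excerpt quoted in this post, not from any ACEMD documentation.

```python
import re

# Pattern modelled on the fatal line quoted in the post above.
# Assumption: other ACEMD builds may word this line differently.
SWAN_FATAL = re.compile(
    r"SWAN : FATAL : Cuda driver error (\d+) in file '([^']+)' in line (\d+)"
)


def find_swan_errors(stderr_text):
    """Return (error_code, source_file, line_number) for each fatal SWAN line."""
    return [
        (int(code), source, int(line))
        for code, source, line in SWAN_FATAL.findall(stderr_text)
    ]


# Sample text copied from the stderr excerpt in the post.
sample = (
    'MDIO: cannot open file "output.restart.coor" '
    "SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574. "
    "Assertion failed: a, file swanlibnv2.cpp, line 59"
)

print(find_swan_errors(sample))
```

Counting how often each (code, file, line) triple shows up across several results would be one way to tell a single recurring fault from unrelated failures.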
Joined: 16 Jul 10 Posts: 7 Credit: 35,198,028 RAC: 0 |
All of these betas failed on my machine. Moreover, I opted out of beta and updated, but am still receiving them (and only them). I also observed that my W7 always restarts the driver a few seconds after the acemd application is killed. And all those WUs failed immediately. If you see some run times in the statistics, that's because there is a crash message box on screen which counts towards the running time. Sometimes it's on screen for hours... |
Joined: 16 Jul 12 Posts: 98 Credit: 386,043,752 RAC: 0 |
Martin, did you follow this thread on how to completely opt out of beta tasks? http://www.gpugrid.net/forum_thread.php?id=3272 |
Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 |
Seven in a row of the 6.49 ACEMD beta NOELIAs failed for me also, all in 8 seconds or less, so I am giving it a rest for now. That was on a Kepler GTX 650 Ti card, and I will try a Fermi GTX 560 tomorrow to see if that does any better. This is on Win7 64-bit, and BOINC 7.0.56 x64. Those cards have been basically error free for the last several days, since the last Noelia errors. |
Joined: 6 Jan 13 Posts: 1 Credit: 1,548,050 RAC: 0 |
A few NOELIA WUs failed recently on my system too. I'm running GTS450 (314.14, Win7 x64). |
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 |
Tsukiouji, When I clicked your link, I got a page that says "No access". For your account settings, in GPUGRID preferences, do you have "Should GPUGRID show your computers on its web site?" set to yes? |
Joined: 31 Mar 09 Posts: 137 Credit: 1,429,587,071 RAC: 0 |
The problem is with the link. The filter "http://www.gpugrid.net/results.php?userid=94436" can only be set, and its results seen, by the account owner. There is no problem with "host" filters, e.g. http://www.gpugrid.net/results.php?hostid=144019. |
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 |
Thanks for the explanation -- I was able to find the user's tasks by clicking on their name, and looking at the tasks for the only computer. Link: http://www.gpugrid.net/results.php?hostid=144019 Anyway... The way I look at this issue is... The project admins have already made a decision whether they want the beta testers to suffer through and to "process all these failures". So, what I do is, look at the server status page here http://www.gpugrid.net/server_status.php ... and just keep praying that the "Unsent" tasks for the "ACEMD beta version" app goes down quicker. Good news - it's pretty much exhausted - Now maybe my system will be stable again! |
Joined: 16 Jul 10 Posts: 7 Credit: 35,198,028 RAC: 0 |
Martin, did you follow this thread on how to completely opt out of beta tasks? No, but I'm in all other queues, so there is (plenty of) other work. But the problem is solved now. Admins cancelled existing beta tasks and no others are waiting. I'll opt in to beta again to help test on Win platform. |
Joined: 29 Jun 12 Posts: 26 Credit: 21,540,800 RAC: 0 |
Now I am not sure if this is an error with my machine (it has been offline for a few weeks) or if it is due to a bug in the Noelia tasks it got earlier today. Both tasks reported errors involving the file "restart.coor". One had this output: <message> The other had a much shorter but similar output: <message> |
Joined: 12 Dec 11 Posts: 91 Credit: 2,730,095,033 RAC: 0 |
I'm still getting the BSOD/reboot problem on my triple 690 rig every two days, even with the NATHAN long units. Only a full cache abort and clean units will solve it, but then two days later another one comes. |
Joined: 25 Jun 12 Posts: 3 Credit: 47,912,263 RAC: 0 |
After working for a few days, the Nathan packages are also crashing the application and my driver. I liked the cause of this project, but I simply cannot allow it to crash my computer and interrupt my work. Hasta la vista. |
©2025 Universitat Pompeu Fabra