Old Noelia WUs
Author | Message |
---|---|
Joined: 12 Dec 11 Posts: 91 Credit: 2,730,095,033 RAC: 0 |
I'm still having the BSOD/reboot problem on my triple 690 rig every two days, even with the NATHAN long units. A full cache abort and clean units solve it, but two days later another one comes. On my end, I suspect that one of the 690s is not that strong; removing its overclock seems to improve the machine's stability. This should be a machine fault, because none of my other machines do it. Plus, no one else seems to have the same BSOD problem with the current units, so the problem is here. I just want to share it, because it's not a project fault. BTW, I would like to have more news from the results front, so I can proudly share it with my family and friends, and maybe find some more volunteers for the cause. Typo edited* |
Joined: 13 Aug 09 Posts: 24 Credit: 156,684,745 RAC: 0 |
I'm still having the BSOD/reboot thing on my triple 690 rig, each two days, even with the NATHAN long units. Just a full cache abort and clean units will solve it, but then in two days another one will come. BSODs Strike Back! I don't have my 690s OC'ed, and my system crashed today with NATHAN units, e.g. http://www.gpugrid.net/workunit.php?wuid=4313870 (I deactivated the project before error reports from this unit could be assembled, as the system BSODs before BOINC notices it). I had been working through them for a month or so without a BSOD, after experiencing the same crash reports seen elsewhere around here (e.g. http://www.gpugrid.net/forum_thread.php?id=3308&nowrap=true#29090). I will be crunching my backup project until this is fixed. |
Joined: 8 Apr 10 Posts: 37 Credit: 4,422,457,619 RAC: 64,437 |
I just noticed I have two Noelia WUs on my Linux boxes for the first time in a few weeks. They were both stuck at 0% and the boxes had to be rebooted to get the GPU running again. |
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0 |
I just noticed I have two Noelia WU on my linux boxes for the first time in a few weeks. They were both stuck at 0% and the boxes had to be rebooted to get the gpu running again. Exact same thing here, Windows XP Pro 64 bit. I had 3 NOELIAs come through. I caught one at 0% after 5 1/2 hours of crunching on a GTX680: the GPU was at 99%, while the memory controller and the CPU usage for that GPU were both at 0%. The other 2 caused a 2685 error, and one NOELIA hosed a CPDN work unit that I had over 250 hours on. I am not signed on to do beta testing; these came through the regular server (I also did a TONI without issue). Interesting that they slipped them through like this, makes me feel like they don't trust us. |
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
Interesting that they slipped them through like this, makes me feel like they don't trust us. No, the way I understand it is that Noelia is testing new functionality, which had been added in the recent app update but wasn't used in previous WUs (except the infamous Noelias). To me it looks like there's more alpha and beta testing needed here. And serious debugging. MrS Scanning for our furry friends since Jan 2002 |
Joined: 25 Mar 12 Posts: 103 Credit: 14,948,929,771 RAC: 11,649 |
Same here, this morning the machine (Ubuntu 64, 2x660GTIs) was hung, reboot to see that there was a Noelia stuck at 0%, wait to see if it progresses...no way...a couple of reboots more to finally abort and get back to normality. Weekends are not the best moments for new trials imho. |
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0 |
Well, I guess you're getting information through the moderators' lounge. I seriously didn't see any post about those work units coming through, or I would have been on the lookout. I guess I got a little complacent doing the NATHANs for the last month. I just can't wrap my mind around the fact that she (NOELIA) always has problems with her work units and it's tough for anyone to figure out why. |
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
On 30th March I had a Short task sit for 18h before I spotted it doing nothing, 47x2-NOELIA_TRYP_0-2-3-RND8854_6 (6.52app). Since then I've had three Nathan tasks fail and one Noelia 148nx9xBIS-NOELIA_148n-1-2-RND8819_1 (all 6.18apps). It bugs me too when tasks fail after 6h, run indefinitely or crash systems. 'moderators lounge' - ha! FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
Joined: 6 Jun 11 Posts: 124 Credit: 2,928,865 RAC: 0 |
Nothing has changed with the NATHAN tasks. They have been running for weeks with historically low error rates, so they really shouldn't be a problem, as far as I can imagine. I know almost nothing at this point about the new NOELIA WUs, but I have suspended them for now considering the complaints. |
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0 |
Nothing has changed with the NATHAN tasks. They have been running for weeks with historically low error rates, so they really shouldn't be a problem, as far as I can imagine. I know almost nothing at this point about the new NOELIA WUs, but I have suspended them for now considering the complaints. Ya buddy, you got the touch. Maybe you can work your magic on rebuilding the NOELIAs, you seem to have the "Right Stuff". I admit, I have no idea what goes into writing these WUs; Noelia must be doing something fundamentally different from the rest of the scientists at GPUGRID. I'm hoping she'll get it right soon and this will all have been worth it. |
Joined: 7 Dec 12 Posts: 92 Credit: 225,897,225 RAC: 0 |
Please NO MORE NEW LONG NOELIA tasks until they are really tested. I had been running all tasks fine for a few weeks, but yesterday I got a new long Noelia and the same result again: a hang. |
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
There have been some really odd errors in the last couple of months, e.g. I11R10-NATHAN_dhfr36_3-26-32-RND2505_7. Stderr output: <core_client_version>7.0.44</core_client_version> <![CDATA[ <message> - exit code 98 (0x62) </message> <stderr_txt> MDIO: unexpected end-of-file for file "input.coor": reached end-of-file before reading 39350 coordinates ERROR: file mdioload.cpp line 80: Unable to read bincoordfile called boinc_finish </stderr_txt> ]]> Would like plenty of Noelia's NOELIA_Klebe_Equ WUs. |
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 |
Nothing has changed with the NATHAN tasks. They have been running for weeks with historically low error rates, so they really shouldn't be a problem, as far as I can imagine. I know almost nothing at this point about the new NOELIA WUs, but I have suspended them for now considering the complaints. Thank you Nate for suspending them. I really hope you guys can figure out the problems in your staging environment, before even sending them through the beta app. If there's anything I can do to help (like some sort of pre-Beta test, if possible), you can PM me. I really enjoy testing, especially when I know it might fail, but I expect the production apps to be near-error-free. Regards, Jacob |
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0 |
I just got another NOELIA long WU and it gave me an error message after 30 seconds of run time. I had to reboot to get the GPU working again. |
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Had a NOELIA beta fail this morning, 291px1x1BIS-NOELIA_291p_beta-1-2-RND9212 |
Joined: 25 Apr 12 Posts: 32 Credit: 945,543,997 RAC: 0 |
063ppx1xBIS-NOELIA_063pp_beta-0-2-RND4224_2 WU has run for 8 hr 20 min with another 8 hr 05 min projected. Seems excessive on a GTX580. |
Joined: 5 Dec 11 Posts: 147 Credit: 69,970,684 RAC: 0 |
If you travel, I would recommend getting an app on a mobile device to bring with you that will allow you to remote into the computers. An example would be TeamViewer, which is free. You can set TeamViewer to start with Windows and auto-login, so if the computer at home is set up this way, you will still have access to it after a reboot. |
Joined: 25 Apr 12 Posts: 32 Credit: 945,543,997 RAC: 0 |
Further to http://www.gpugrid.net/forum_thread.php?id=3318&nowrap=true#29409: 063ppx1xBIS-NOELIA_063pp_beta-0-2-RND4224_2 crashed after 10+ hours, locking up the whole system and requiring a reboot. The following error is from the task: <core_client_version>7.0.31</core_client_version> <![CDATA[ <message> The system cannot find the path specified. (0x3) - exit code 3 (0x3) </message> <stderr_txt> MDIO: cannot open file "restart.coor" SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1574. Assertion failed: a, file swanlibnv2.cpp, line 59 This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information. </stderr_txt> ]]> |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172 |
I aborted 063px1x1BIS-NOELIA_063p_beta-1-2-RND8034_1 after it had given the "acemd.2865P.exe has encountered a problem ..." popup error three times in succession. |
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0 |
I guess I should have clarified, the NOELIA that crashed on me came through the regular server. Richard, I always get the 2865P error, I thought it was a Windows XP thing. |
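
Several posts in this thread describe tasks that sat at 0% for hours (5 1/2 h, 18 h) before anyone noticed. For anyone who wants to catch that sooner, here is a minimal sketch of a stall check. It is not part of BOINC or GPUGRID: the `Sample` type, the `is_stalled` function, and the 30-minute window are all made up for illustration, and it assumes you already collect (elapsed time, fraction done) readings for a task yourself, in time order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One progress reading for a task (hypothetical structure)."""
    elapsed_s: float      # seconds since the task started
    fraction_done: float  # progress between 0.0 and 1.0


def is_stalled(samples: List[Sample],
               window_s: float = 1800.0,
               min_gain: float = 0.001) -> bool:
    """Return True if progress grew by less than min_gain over the
    most recent window_s seconds. Samples must be in time order."""
    if not samples:
        return False
    latest = samples[-1]
    # Pick the newest sample that is at least window_s older than the latest.
    baseline = None
    for s in samples:
        if latest.elapsed_s - s.elapsed_s >= window_s:
            baseline = s
        else:
            break
    if baseline is None:
        return False  # not enough history yet to judge
    return latest.fraction_done - baseline.fraction_done < min_gain


if __name__ == "__main__":
    stuck = [Sample(0, 0.0), Sample(1800, 0.0), Sample(19800, 0.0)]
    healthy = [Sample(0, 0.0), Sample(1800, 0.05), Sample(3600, 0.10)]
    print("stuck task stalled:", is_stalled(stuck))
    print("healthy task stalled:", is_stalled(healthy))
```

The window and threshold are guesses; a slow card on a long WU may legitimately gain very little in 30 minutes, so tune them for your own hardware before aborting anything on the strength of this check.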
©2025 Universitat Pompeu Fabra