Old Noelia WUs

GPUGRID

Joined: 12 Dec 11
Posts: 91
Credit: 2,730,095,033
RAC: 0
Message 29287 - Posted: 30 Mar 2013, 15:37:56 UTC - in response to Message 29189.  
Last modified: 30 Mar 2013, 15:39:08 UTC

I'm still getting the BSOD/reboot problem on my triple-690 rig every two days, even with the NATHAN long units. Only a full cache abort and fresh units solve it, but two days later another one comes.


On my end, I suspect that one of the 690s is not that strong. Removing its overclock seems to improve the machine's stability. This is probably a fault in my machine, because none of my other machines do it. Also, no one else seems to have the same BSOD problem with the current units, so the problem is on my side. I just wanted to share it, because it is not a project fault.
BTW, I would like to have more news from the results front, so I can proudly share it with my family and friends, and maybe find some more volunteers for the cause.

Typo edited*
ID: 29287 · Rating: 0
Jorge Alberto Ramos Oliveira

Joined: 13 Aug 09
Posts: 24
Credit: 156,684,745
RAC: 0
Message 29303 - Posted: 31 Mar 2013, 23:40:02 UTC - in response to Message 29287.  

I'm still getting the BSOD/reboot problem on my triple-690 rig every two days, even with the NATHAN long units. Only a full cache abort and fresh units solve it, but two days later another one comes.


On my end, I suspect that one of the 690s is not that strong. Removing its overclock seems to improve the machine's stability. This is probably a fault in my machine, because none of my other machines do it. Also, no one else seems to have the same BSOD problem with the current units, so the problem is on my side. I just wanted to share it, because it is not a project fault.
BTW, I would like to have more news from the results front, so I can proudly share it with my family and friends, and maybe find some more volunteers for the cause.

Typo edited*


BSODs Strike Back!

I don't have my 690s overclocked, and my system crashed today with NATHAN units, e.g. http://www.gpugrid.net/workunit.php?wuid=4313870 (I deactivated the project before error reports from this unit could be assembled, as the system BSODs before BOINC notices it).

I had been working through them for a month or so without a BSOD, after experiencing the same crash reports seen elsewhere around here (e.g. http://www.gpugrid.net/forum_thread.php?id=3308&nowrap=true#29090).

I will be crunching my backup project until this is fixed.
ID: 29303 · Rating: 0
Bikermatt

Joined: 8 Apr 10
Posts: 37
Credit: 4,422,457,619
RAC: 64,437
Message 29330 - Posted: 6 Apr 2013, 3:18:06 UTC

I just noticed I have two Noelia WUs on my Linux boxes for the first time in a few weeks. They were both stuck at 0%, and the boxes had to be rebooted to get the GPU running again.
ID: 29330 · Rating: 0
flashawk

Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Message 29335 - Posted: 6 Apr 2013, 7:36:26 UTC - in response to Message 29330.  

I just noticed I have two Noelia WUs on my Linux boxes for the first time in a few weeks. They were both stuck at 0%, and the boxes had to be rebooted to get the GPU running again.


Exact same thing here, on Windows XP Pro 64-bit. I had 3 NOELIAs come through. I caught one at 0% after 5 1/2 hours of crunching on a GTX 680: the GPU was at 99%, while the memory controller and the CPU usage for that GPU were at 0%. The other 2 caused a 2685 error, and one NOELIA hosed a CPDN work unit that I had over 250 hours on. I am not signed up for beta testing; these came through the regular server (I also did a TONI without issue).

Interesting that they slipped them through like this, makes me feel like they don't trust us.

ID: 29335 · Rating: 0
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester

Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 29338 - Posted: 6 Apr 2013, 8:34:30 UTC - in response to Message 29335.  

Interesting that they slipped them through like this, makes me feel like they don't trust us.

No, the way I understand it is that Noelia is testing new functionality, which had been added in the recent app update but wasn't used in previous WUs (except the infamous Noelias).

To me it looks like there's more alpha and beta testing needed here. And serious debugging.

MrS
Scanning for our furry friends since Jan 2002
ID: 29338 · Rating: 0
Trotador

Joined: 25 Mar 12
Posts: 103
Credit: 14,948,929,771
RAC: 11,649
Message 29339 - Posted: 6 Apr 2013, 8:45:23 UTC

Same here. This morning the machine (Ubuntu 64, 2x GTX 660 Ti) was hung. I rebooted and saw that there was a Noelia stuck at 0%, then waited to see if it would progress... no way... and after a couple more reboots I finally aborted it and got back to normal.

Weekends are not the best time for new trials, IMHO.

ID: 29339 · Rating: 0
flashawk

Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Message 29340 - Posted: 6 Apr 2013, 9:02:35 UTC
Last modified: 6 Apr 2013, 9:04:36 UTC

Well, I guess you're getting information through the moderators lounge. I seriously didn't see any post about those work units coming through, or I would have been on the lookout.

I guess I got a little complacent doing the NATHANs for the last month. I just can't wrap my mind around the fact that she (NOELIA) always has problems with her work units and it's tough for anyone to figure out why.
ID: 29340 · Rating: 0
skgiven
Volunteer moderator
Volunteer tester

Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 29341 - Posted: 6 Apr 2013, 9:02:42 UTC - in response to Message 29339.  
Last modified: 6 Apr 2013, 9:07:21 UTC

On 30th March I had a Short task sit for 18h before I spotted it doing nothing, 47x2-NOELIA_TRYP_0-2-3-RND8854_6 (6.52app). Since then I've had three Nathan tasks fail and one Noelia 148nx9xBIS-NOELIA_148n-1-2-RND8819_1 (all 6.18apps).

It bugs me too when tasks fail after 6h, run indefinitely or crash systems.

'moderators lounge' - ha!
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help
ID: 29341 · Rating: 0
nate

Joined: 6 Jun 11
Posts: 124
Credit: 2,928,865
RAC: 0
Message 29360 - Posted: 6 Apr 2013, 22:14:52 UTC

Nothing has changed with the NATHAN tasks. They have been running for weeks with historically low error rates, so they really shouldn't be a problem, as far as I can imagine. I know almost nothing at this point about the new NOELIA WUs, but I have suspended them for now considering the complaints.
ID: 29360 · Rating: 0
flashawk

Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Message 29362 - Posted: 6 Apr 2013, 23:36:25 UTC - in response to Message 29360.  

Nothing has changed with the NATHAN tasks. They have been running for weeks with historically low error rates, so they really shouldn't be a problem, as far as I can imagine. I know almost nothing at this point about the new NOELIA WUs, but I have suspended them for now considering the complaints.


Ya buddy, you got the touch. Maybe you can work your magic on rebuilding the NOELIAs; you seem to have the "Right Stuff". I admit, I have no idea what goes into writing these WUs. Noelia must be doing something fundamentally different from the rest of the scientists at GPUGRID. I'm hoping she'll get it right soon and this will all have been worth it.

ID: 29362 · Rating: 0
Mumak

Joined: 7 Dec 12
Posts: 92
Credit: 225,897,225
RAC: 0
Message 29363 - Posted: 7 Apr 2013, 5:36:58 UTC

Please NO MORE NEW LONG NOELIA tasks until they are really tested.
I have been running all tasks without problems for a few weeks, but yesterday I got a new long Noelia and the same result again: a hang.
ID: 29363 · Rating: 0
skgiven
Volunteer moderator
Volunteer tester

Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 29364 - Posted: 7 Apr 2013, 6:28:57 UTC - in response to Message 29360.  
Last modified: 18 Apr 2013, 12:25:17 UTC

There have been some really odd errors in the last couple of months, for example:
I11R10-NATHAN_dhfr36_3-26-32-RND2505_7
Stderr output

<core_client_version>7.0.44</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
MDIO: unexpected end-of-file for file "input.coor": reached end-of-file before reading 39350 coordinates
ERROR: file mdioload.cpp line 80: Unable to read bincoordfile

called boinc_finish

</stderr_txt>
]]>
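One way to sanity-check a coordinate file like the one named in this message is to compare its size on disk with the size the atom count implies. The sketch below is only an illustration and rests on an assumption not stated in this thread: that "input.coor" uses the NAMD-style binary layout (a 4-byte little-endian atom count followed by x, y, z as 8-byte doubles per atom). The function names are hypothetical, not part of the application.

```python
import os
import struct


def expected_bincoor_size(n_atoms: int) -> int:
    """Size in bytes under the ASSUMED layout: int32 count + 3 doubles per atom."""
    return 4 + 24 * n_atoms


def check_bincoor(path: str, expected_atoms: int) -> str:
    """Return a short diagnosis for a binary coordinate file (hypothetical helper)."""
    size = os.path.getsize(path)
    if size < 4:
        return "file too short to hold an atom count"
    with open(path, "rb") as fh:
        # Little-endian byte order is an assumption here.
        (n_atoms,) = struct.unpack("<i", fh.read(4))
    if n_atoms != expected_atoms:
        return f"header says {n_atoms} atoms, expected {expected_atoms}"
    want = expected_bincoor_size(n_atoms)
    if size < want:
        return f"truncated: {size} bytes on disk, {want} expected"
    return "ok"
```

Under that assumed layout, the 39350 coordinates named in the error would need 4 + 24 x 39350 = 944,404 bytes, so a shorter file would be consistent with an incomplete download or write.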


Would like plenty of Noelia's NOELIA_Klebe_Equ WU's.
ID: 29364 · Rating: 0
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 29395 - Posted: 9 Apr 2013, 3:39:59 UTC - in response to Message 29360.  

Nothing has changed with the NATHAN tasks. They have been running for weeks with historically low error rates, so they really shouldn't be a problem, as far as I can imagine. I know almost nothing at this point about the new NOELIA WUs, but I have suspended them for now considering the complaints.


Thank you Nate for suspending them. I really hope you guys can figure out the problems in your staging environment, before even sending them through the beta app. If there's anything I can do to help (like some sort of pre-Beta test, if possible), you can PM me. I really enjoy testing, especially when I know it might fail, but I expect the production apps to be near-error-free.

Regards,
Jacob
ID: 29395 · Rating: 0
flashawk

Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Message 29406 - Posted: 11 Apr 2013, 9:42:31 UTC

I just got another NOELIA long WU and it gave me an error message after 30 seconds of run time. I had to reboot to get the GPU working again.
ID: 29406 · Rating: 0
skgiven
Volunteer moderator
Volunteer tester

Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 29407 - Posted: 11 Apr 2013, 11:28:36 UTC - in response to Message 29406.  

ID: 29407 · Rating: 0
The King's Own

Joined: 25 Apr 12
Posts: 32
Credit: 945,543,997
RAC: 0
Message 29409 - Posted: 11 Apr 2013, 13:05:45 UTC

063ppx1xBIS-NOELIA_063pp_beta-0-2-RND4224_2
WU has run for 8 hr 20 min with another 8 hr 05 min projected.
Seems excessive on a GTX580
ID: 29409 · Rating: 0
Simba123

Joined: 5 Dec 11
Posts: 147
Credit: 69,970,684
RAC: 0
Message 29411 - Posted: 11 Apr 2013, 13:17:21 UTC - in response to Message 29111.  

If you travel, I would recommend getting an app on a mobile device to bring with you that will allow you to remote into the computers. An example would be TeamViewer, which is free.


https://play.google.com/store/apps/details?id=com.teamviewer.teamviewer.market.mobile&hl=en

Exactly what I do on my tablet. The problem is, when the big rig starts to reboot, I can't access it. Hope it won't happen.



You can set TeamViewer to start with Windows and log in automatically, so if the computer at home is set up this way, you will still have access to it after a reboot.
ID: 29411 · Rating: 0
The King's Own

Joined: 25 Apr 12
Posts: 32
Credit: 945,543,997
RAC: 0
Message 29412 - Posted: 11 Apr 2013, 15:57:06 UTC

Further to http://www.gpugrid.net/forum_thread.php?id=3318&nowrap=true#29409


063ppx1xBIS-NOELIA_063pp_beta-0-2-RND4224_2 crashed after 10+ hours, locking up the whole system and requiring a reboot.

The task reported the following error:


<core_client_version>7.0.31</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

</stderr_txt>
]]>
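For anyone collecting several of these reports, a small script can pull out the fields that matter. The following is a rough sketch, not a GPUGRID or BOINC tool: `summarize_stderr` is a hypothetical helper that scans the pasted text for the exit code, any "Cuda driver error" number, and the MDIO lines. The one-entry lookup table reflects the CUDA driver API, where result code 702 is `CUDA_ERROR_LAUNCH_TIMEOUT`; whether that is the root cause here is not something the thread establishes.

```python
import re

# Partial lookup of CUDA driver API result codes; only the code seen above.
CUDA_DRIVER_ERRORS = {
    702: "CUDA_ERROR_LAUNCH_TIMEOUT",
}


def summarize_stderr(report: str) -> dict:
    """Extract exit code, CUDA driver error and MDIO lines from a pasted report."""
    exit_match = re.search(r"exit code (-?\d+)", report)
    cuda_match = re.search(r"Cuda driver error (\d+)", report)
    cuda_code = int(cuda_match.group(1)) if cuda_match else None
    mdio_lines = [
        line.strip()
        for line in report.splitlines()
        if line.strip().startswith("MDIO:")
    ]
    return {
        "exit_code": int(exit_match.group(1)) if exit_match else None,
        "cuda_error": cuda_code,
        "cuda_name": CUDA_DRIVER_ERRORS.get(cuda_code),
        "mdio": mdio_lines,
    }
```

Passing the report above as a string would give an exit code of 3, CUDA error 702 and the single MDIO line about "restart.coor", which makes it easier to compare failures across hosts.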

ID: 29412 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Message 29413 - Posted: 11 Apr 2013, 16:02:06 UTC

I aborted 063px1x1BIS-NOELIA_063p_beta-1-2-RND8034_1 after it had given the "acemd.2865P.exe has encountered a problem ..." popup error three times in succession.
ID: 29413 · Rating: 0
flashawk

Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Message 29418 - Posted: 11 Apr 2013, 19:20:51 UTC

I guess I should have clarified, the NOELIA that crashed on me came through the regular server. Richard, I always get the 2865P error, I thought it was a Windows XP thing.
ID: 29418 · Rating: 0