WU: NOELIA_KLEBEs

Stoneageman
Joined: 25 May 09
Posts: 224
Credit: 34,057,374,498
RAC: 0
Message 32423 - Posted: 28 Aug 2013, 21:49:21 UTC

So far I've had
one long wu error after 13s

one short wu complete OK
one long wu stall, ran for 2hr on a 660 but little progress, so got the bullet

Be on your guard!


ID: 32423 · Rating: 0
5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Message 32430 - Posted: 28 Aug 2013, 23:52:33 UTC

ID: 32430 · Rating: 0
Stefan
Project administrator
Project developer
Project tester
Project scientist
Joined: 5 Mar 13
Posts: 348
Credit: 0
RAC: 0
Message 32438 - Posted: 29 Aug 2013, 8:18:06 UTC
Last modified: 29 Aug 2013, 8:24:32 UTC

Since I saw a few error posts popping up about Noelia's new WUs and there was no official thread, I made this thread to collect them all. Once the team is in the office, I will let them know.
ID: 32438 · Rating: 0
flashawk
Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Message 32443 - Posted: 29 Aug 2013, 10:00:33 UTC

I've had 5 NOELIAs fail in the past 24 hours.
ID: 32443 · Rating: 0
noelia
Joined: 5 Jul 12
Posts: 35
Credit: 393,375
RAC: 0
Message 32444 - Posted: 29 Aug 2013, 10:16:26 UTC

Hi, for some reason some of you are having problems with these WUs in the new application, so we've moved them to the beta queue to have a proper look. I've also just sent 50 WUs under the name KLEBEbeta with a much simpler configuration file. These simulations are really important, and fixing this bug will also help with similar future projects in drug discovery. Please report any problems you have with the KLEBEs and KLEBEbeta groups.
ID: 32444 · Rating: 0
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 32446 - Posted: 29 Aug 2013, 11:34:38 UTC - in response to Message 32444.  
Last modified: 29 Aug 2013, 11:57:33 UTC

Noelia,

Thanks for moving to beta to try to fix the problems!
I'm noticing a "new" problem with the NOELIA_KLEBEbeta tasks, though.

Essentially, they get assigned to a GPU, and then when they try to "get going", BOINC shows that they run for about 15-40 seconds, and then the task resets back to the beginning (with Elapsed back to 0 seconds), and it retries.
It just keeps retrying until failure.
Additionally, if the user closes BOINC, the acemd.800-55.exe process for that task does not close properly (it still remains in the Task Manager's process list, even though all other related BOINC processes have exited normally).

Also, stderr.txt for one of the tasks that I aborted (http://www.gpugrid.net/result.php?resultid=7221709) contained the following lines, which might give a hint as to what's happening:
swanMemset failed
Can't acquire lockfile - exiting
FILE_LOCK::unlock(): close failed.: No error
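
For anyone triaging their own failed tasks, the three stderr lines above can be searched for mechanically. What follows is only a rough sketch: the signature list contains just the strings quoted in this post, and the helper name `find_known_errors` is made up for illustration. It is not part of BOINC or the acemd application.

```python
# Hypothetical helper, not part of BOINC or acemd.
# The signatures are only the strings quoted in this thread.
KNOWN_SIGNATURES = (
    "swanMemset failed",
    "Can't acquire lockfile",
    "FILE_LOCK::unlock(): close failed",
)


def find_known_errors(stderr_text):
    """Return the known signatures that occur in stderr_text, in list order."""
    return [sig for sig in KNOWN_SIGNATURES if sig in stderr_text]


if __name__ == "__main__":
    sample = (
        "swanMemset failed\n"
        "Can't acquire lockfile - exiting\n"
        "FILE_LOCK::unlock(): close failed.: No error\n"
    )
    for sig in find_known_errors(sample):
        print("found:", sig)
```

A match only says that the output resembles the failure reported here; it does not say whether the cause lies in the task set or in the application.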

I have not seen this behavior before today, so I think there is at least 1 new bug here.

This happens both on my GTX 660 Ti, as well as my GTX 460, in Windows 8.1 Preview x64.

The current task exhibiting this behavior is:
109nx4-NOELIA_KLEBEbeta-0-3-RND0846_0

I hope this information helps you to track it down to correct the problem(s) quickly, as right now my GPU is spinning in circles and doing no work. Are you able to reproduce the problem in your testing?

If there's anything else you might need, please let us know.

Thanks,
Jacob
ID: 32446 · Rating: 0
MJH
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 32447 - Posted: 29 Aug 2013, 11:59:05 UTC

I've probably fixed the fault. There'll be an updated acemdbeta app very soon.

MJH
ID: 32447 · Rating: 0
MJH
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 32449 - Posted: 29 Aug 2013, 12:19:56 UTC - in response to Message 32447.  

801 is now live.
ID: 32449 · Rating: 0
dskagcommunity
Joined: 28 Apr 11
Posts: 462
Credit: 958,266,958
RAC: 31,461
Message 32450 - Posted: 29 Aug 2013, 12:29:45 UTC
Last modified: 29 Aug 2013, 12:35:27 UTC

Lol, what "luck" ^^

http://www.gpugrid.net/workunit.php?wuid=4729004

One of my machines failed this WU, and I got exactly the same WU on my next machine, where it got stuck O.o I only noticed it now, after 2 hours. Phew, early enough before the weekend ^^
DSKAG Austria Research Team: http://www.research.dskag.at



ID: 32450 · Rating: 0
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 32451 - Posted: 29 Aug 2013, 12:30:16 UTC - in response to Message 32449.  

Thanks for the prompt response. I got an 801 KLEBEbeta, and it's at least getting off the ground now.

I hope that you can see that it is very difficult for us to know if the problem is in the task set, or if the problem is in the application.

Will continue to monitor...
ID: 32451 · Rating: 0
MJH
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 32452 - Posted: 29 Aug 2013, 12:36:38 UTC
Last modified: 29 Aug 2013, 12:36:56 UTC

Would anyone with a cc 1.3 card - Geforce GTX 200 series - please try some of the current acemdbeta v801 Noelia-KLEBE WUs and report back here?

MJH
ID: 32452 · Rating: 0
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 32453 - Posted: 29 Aug 2013, 12:37:07 UTC - in response to Message 32451.  
Last modified: 29 Aug 2013, 12:37:40 UTC

I tested suspending one of these KLEBEbeta tasks, and it caused a driver reset. So, the problem still persists.

Can you please look into it more closely? The issue has to do with how the KLEBE tasks are exiting: it seems they are not releasing the GPU in a timely fashion, compared to every other GPU task I run (across all my GPU projects).

Maybe compare the exit logic of a KLEBE task, versus the exit logic of other GPUGrid task types?
ID: 32453 · Rating: 0
Operator
Joined: 15 May 11
Posts: 108
Credit: 297,176,099
RAC: 0
Message 32454 - Posted: 29 Aug 2013, 12:42:04 UTC

On my Titan box I've gotten two of these NOELIAs.

Both exhibited the same behavior.

"8/29/2013 7:36:55 AM | GPUGRID | Task 063px38-NOELIA_KLEBEs-1-3-RND3786_0 exited with zero status but no 'finished' file"

Over and over again without making much progress.
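
To put a number on "over and over again", the repeated message can be counted per task in a saved copy of the BOINC event log. This is a minimal sketch: the pattern is based only on the single log line quoted above, so other BOINC versions may word the message differently, and `count_restarts` is a hypothetical helper name rather than a BOINC tool.

```python
import re
from collections import Counter

# Pattern derived only from the one log line quoted in this post (assumption:
# the wording is the same on other hosts and BOINC versions).
RESTART_RE = re.compile(
    r"Task (\S+) exited with zero status but no 'finished' file"
)


def count_restarts(log_lines):
    """Count 'no finished file' restarts per task name."""
    counts = Counter()
    for line in log_lines:
        match = RESTART_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    line = (
        "8/29/2013 7:36:55 AM | GPUGRID | Task "
        "063px38-NOELIA_KLEBEs-1-3-RND3786_0 exited with zero status "
        "but no 'finished' file"
    )
    for task, n in count_restarts([line, line]).items():
        print(task, "restarted", n, "times")
```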

So I pulled the trigger to kill them.

So it's back to babysitting to make sure I only get NATHAN longs for the time being.

That NOELIA sure has a reputation! ;-}

Operator

ID: 32454 · Rating: 0
MJH
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 32455 - Posted: 29 Aug 2013, 12:43:38 UTC - in response to Message 32453.  

Jacob,

The current beta addresses the "swanMemset failed" and "access violation" errors.
The suspend problem I have not yet investigated. (Is it with 'suspend to memory' or 'suspend and exit'? )

MJH
ID: 32455 · Rating: 0
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 32456 - Posted: 29 Aug 2013, 12:49:56 UTC - in response to Message 32455.  
Last modified: 29 Aug 2013, 13:00:27 UTC

Jacob,

The current beta addresses the "swanMemset failed" and "access violation" errors.
The suspend problem I have not yet investigated. (Is it with 'suspend to memory' or 'suspend and exit'? )

MJH


There's a whole thread about it, where I posted in as much detail as I could about the problem, on 4/4/2013 (4+ months ago), here:
http://www.gpugrid.net/forum_thread.php?id=3333

It happens whenever a NOELIA task (especially KLEBE) is suspended for any reason, including:
- BOINC set to Snooze
- BOINC set to Snooze GPU
- BOINC set to Suspend
- BOINC set to Suspend GPU
- BOINC set to Suspend due to exclusive app running
- BOINC set to Suspend GPU due to exclusive GPU app running
- GPUGrid project set to Suspend
- NOELIA KLEBE task set to Suspend
- BOINC exited with "Stop running tasks" checked

Something in the KLEBE exit logic has been causing driver resets and watchdog timeouts, for several months, for many of your Windows users. I sure hope you guys can work together to get a handle on it!

Note: I do use the "Leave application in memory when suspended" setting, but so far as I know, that is irrelevant to GPU tasks. When a GPU task is suspended, BOINC has to remove it from memory, regardless of that user setting. It treats GPU tasks differently because there's no PageFile backing the GPU RAM.

Thanks for looking into this. It's my biggest problem across all of my 20 BOINC projects.
ID: 32456 · Rating: 0
TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 32457 - Posted: 29 Aug 2013, 13:16:01 UTC

I had two NOELIA KLEBEbeta tasks on my 770. They start, but at 0.021% complete there is no more progress even though they keep running: 2h57m16s elapsed and 0h0m0s remaining. This was with app 8.00; I have now aborted these WUs and will try the new 8.01 app.

I now have one with the 8.01 (cuda55) app (NOELIA KLEBEbeta) and it is running normally. Twelve hours to finish; progress goes up, elapsed time goes up, and remaining time goes down. Win7 x64, BOINC 7.0.64, driver 326.80
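
The difference between the stalled and the normal task described here can be expressed as a simple check on progress samples. This is a sketch under stated assumptions: the samples are noted by hand from the BOINC Manager, the 30-minute window is an arbitrary choice, and `is_stalled` is a hypothetical helper, not something BOINC provides.

```python
def is_stalled(samples, min_minutes=30.0):
    """Return True if progress has not increased over at least min_minutes.

    samples is a list of (elapsed_minutes, percent_done) tuples, oldest first.
    The 30-minute default is an arbitrary assumption, not a project rule.
    """
    if len(samples) < 2:
        return False
    last_elapsed, last_percent = samples[-1]
    # Walk back to the most recent sample that is old enough to compare with.
    for elapsed, percent in reversed(samples[:-1]):
        if last_elapsed - elapsed >= min_minutes:
            return percent >= last_percent
    return False  # not enough history yet


if __name__ == "__main__":
    # Roughly the stalled case above: stuck at 0.021% after almost 3 hours.
    stuck = [(0, 0.0), (10, 0.021), (60, 0.021), (177, 0.021)]
    print("stalled:", is_stalled(stuck))
```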
Greetings from TJ
ID: 32457 · Rating: 0
Carlos Augusto Engel
Joined: 5 Jun 09
Posts: 38
Credit: 2,880,758,878
RAC: 0
Message 32459 - Posted: 29 Aug 2013, 14:07:08 UTC - in response to Message 32457.  

I had two NOELIA KLEBEbeta tasks on my 770. They start, but at 0.021% complete there is no more progress even though they keep running: 2h57m16s elapsed and 0h0m0s remaining. This was with app 8.00; I have now aborted these WUs and will try the new 8.01 app.




Same thing here, but after 30 minutes I stopped it.

http://www.gpugrid.net/result.php?resultid=7221521


ID: 32459 · Rating: 0
dskagcommunity
Joined: 28 Apr 11
Posts: 462
Credit: 958,266,958
RAC: 31,461
Message 32464 - Posted: 29 Aug 2013, 14:51:21 UTC - in response to Message 32452.  
Last modified: 29 Aug 2013, 15:02:22 UTC

Would anyone with a cc 1.3 card - Geforce GTX 200 series - please try some of the current acemdbeta v801 Noelia-KLEBE WUs and report back here?

MJH


OK, I started one with 8.01. But this can take some time, even on my 670 MHz GTX 285. I normally don't run GPUGrid on this card anymore. It will need about 33 hours. The short run with 8.00 was OK on this card. I don't think anybody still uses a power-hungry 200 series card for long runs O.o
DSKAG Austria Research Team: http://www.research.dskag.at



ID: 32464 · Rating: 0
TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 32467 - Posted: 29 Aug 2013, 15:12:49 UTC

With the new 8.01 app they run normally!

I noticed the following:
On the 770 I have this one: 063px79-NOELIA_KLEEBEbeta2-0-3-RND678_0
MEM use: 1003MB
Clock: 1097MHz (however I have set the clock to 1060MHz!)
GPU load: 87%
Temp: 65°C
7.5% done in 40 minutes

On the 660 I have this one: 109nx37-NOELIA_KLEEBEbeta-0-3-RND0283_0
MEM use: 779MB
Clock: 1045MHz (as I set it)
GPU load: ~88%
Temp: 67°C
5.6% done in 40 minutes
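
The two progress figures above can be turned into a rough total runtime by linear extrapolation. This is a sketch that assumes progress stays linear for the whole run, which the thread does not confirm; `estimate_total_hours` is a hypothetical helper name.

```python
def estimate_total_hours(percent_done, elapsed_minutes):
    """Linearly extrapolate total runtime in hours from progress so far.

    Assumes the task keeps progressing at the same rate (an assumption).
    """
    if percent_done <= 0:
        raise ValueError("percent_done must be positive")
    return elapsed_minutes / (percent_done / 100.0) / 60.0


if __name__ == "__main__":
    # Figures quoted in this post: 7.5% and 5.6% after 40 minutes.
    for card, percent in (("770", 7.5), ("660", 5.6)):
        hours = estimate_total_hours(percent, 40)
        print(f"GTX {card}: about {hours:.1f} h total")
```

With these numbers the sketch gives about 8.9 hours for the 770 and about 11.9 hours for the 660. The two tasks are different WUs, so this is not a like-for-like comparison of the cards.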

I know these are not the same WUs, and the GPUs are not the same either. But it is strange that the WU can push the clock higher; or this may be a result of the faulty WU that I aborted without rebooting afterwards.

Due to the difference in memory load, it may also be that cards with only 1 GB cannot run these WUs as before. That may result in some comments ;-)
Greetings from TJ
ID: 32467 · Rating: 0
dskagcommunity
Joined: 28 Apr 11
Posts: 462
Credit: 958,266,958
RAC: 31,461
Message 32468 - Posted: 29 Aug 2013, 15:16:55 UTC - in response to Message 32464.  

Would anyone with a cc 1.3 card - Geforce GTX 200 series - please try some of the current acemdbeta v801 Noelia-KLEBE WUs and report back here?

MJH


OK, I started one with 8.01. But this can take some time, even on my 670 MHz GTX 285. I normally don't run GPUGrid on this card anymore. It will need about 33 hours. The short run with 8.00 was OK on this card. I don't think anybody still uses a power-hungry 200 series card for long runs O.o


Oh, and if somebody else has also started one on a 200 series card, please tell me. I got my energy bill today, so I would love to stop it in the next few hours if it's not needed :p
DSKAG Austria Research Team: http://www.research.dskag.at



ID: 32468 · Rating: 0

©2025 Universitat Pompeu Fabra