New project in long queue

Message boards : News : New project in long queue

Bedrich Hajek
Joined: 28 Mar 09
Posts: 490
Credit: 11,731,645,728
RAC: 47,738
Level
Trp
Message 28989 - Posted: 5 Mar 2013, 0:49:34 UTC - in response to Message 28972.  

We're looking at the issue. The problematic WUs have been cancelled for now. The problem was clearly on our end, but it seems that there were multiple reasons they were having issues, and mostly not Noelia's fault.

Nate


Were the issues related to the new application, the WUs, or both?


ID: 28989 · Rating: 0

flashawk
Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Level
Arg
Message 28998 - Posted: 6 Mar 2013, 7:26:23 UTC

How big are the uploads for these reworked NOELIAs supposed to be? The 3 I've finished were barely over 4 MB after 11 1/2 hours of "crunching". Is this about right?
ID: 28998 · Rating: 0

FrRie
Joined: 21 Dec 11
Posts: 2
Credit: 21,062,866
RAC: 0
Level
Pro
Message 29000 - Posted: 6 Mar 2013, 11:45:09 UTC

... I got messages like "abort by user" - but I didn't abort any ...
I observed incrementing of remaining time in one case ...
Windows 7 Ultimate 64-bit
BOINC 7.0.28
GTX 580

Thanks for reactions

FrRie
ID: 29000 · Rating: 0

idimitro
Joined: 25 Jun 12
Posts: 3
Credit: 47,912,263
RAC: 0
Level
Val
Message 29001 - Posted: 6 Mar 2013, 12:58:28 UTC

I also have the same problem with the tasks killing the acemd process. When I checked the thread I got even more pissed off: one week after the "bomb" was thrown, no reaction from NOELIA, no official response, no retraction of the packages. Nothing.
This is a waste of our time and, as somebody mentioned, I prefer to spend my electricity on something useful.
Can somebody PM me how to block these NOELIA packages?
ID: 29001 · Rating: 0

microchip
Joined: 4 Sep 11
Posts: 110
Credit: 326,102,587
RAC: 0
Level
Asp
Message 29002 - Posted: 6 Mar 2013, 13:08:20 UTC - in response to Message 29001.  
Last modified: 6 Mar 2013, 13:08:38 UTC

I also have the same problem with the tasks killing the acemd process. When I checked the thread I got even more pissed off: one week after the "bomb" was thrown, no reaction from NOELIA, no official response, no retraction of the packages. Nothing.
This is a waste of our time and, as somebody mentioned, I prefer to spend my electricity on something useful.
Can somebody PM me how to block these NOELIA packages?


It's not possible to block specific tasks. At least that's what I learned from my own thread. http://www.gpugrid.net/forum_thread.php?id=3315

Team Belgium
ID: 29002 · Rating: 0

Bedrich Hajek
Joined: 28 Mar 09
Posts: 490
Credit: 11,731,645,728
RAC: 47,738
Level
Trp
Message 29003 - Posted: 6 Mar 2013, 13:08:30 UTC - in response to Message 29001.  

I also have the same problem with the tasks killing the acemd process. When I checked the thread I got even more pissed off: one week after the "bomb" was thrown, no reaction from NOELIA, no official response, no retraction of the packages. Nothing.
This is a waste of our time and, as somebody mentioned, I prefer to spend my electricity on something useful.
Can somebody PM me how to block these NOELIA packages?


It happened to me too. I had 2 NOELIA units that were aborted by user, which I didn't abort. They were otherwise running fine. So, what is happening?
ID: 29003 · Rating: 0

nate
Joined: 6 Jun 11
Posts: 124
Credit: 2,928,865
RAC: 0
Level
Ala
Message 29004 - Posted: 6 Mar 2013, 13:25:16 UTC - in response to Message 29001.  

I also have the same problem with the tasks killing the acemd process. When I checked the thread I got even more pissed off: one week after the "bomb" was thrown, no reaction from NOELIA, no official response, no retraction of the packages. Nothing.
This is a waste of our time and, as somebody mentioned, I prefer to spend my electricity on something useful.
Can somebody PM me how to block these NOELIA packages?


http://www.gpugrid.net/forum_thread.php?id=3311&nowrap=true#28972
ID: 29004 · Rating: 0

nate
Joined: 6 Jun 11
Posts: 124
Credit: 2,928,865
RAC: 0
Level
Ala
Message 29006 - Posted: 6 Mar 2013, 14:21:34 UTC
Last modified: 6 Mar 2013, 14:22:13 UTC

Were the issues related to the new application, the WUs, or both?


The WUs were not set to upload the smaller file size format we are now trying to move to. They were set to use the old format, which could result in very large file upload sizes, as some people complained about.

The problem with the application was an obscure one. It wasn't an issue with the application per se, but rather with how the application interacts with BOINC and this specific type of configuration file for the simulations. In short, at the start of every WU the application was running a function that it was only supposed to run for the first WU in a chain. This caused all but the first WU in a chain to fail. This isn't a problem locally for us, but with how BOINC handles the files, it became a problem. We are working on a long-term fix, but for now we have simply found a way around it.
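
To make the chain issue a bit more concrete, here is a rough Python sketch of the general idea of guarding a one-time step so that it only runs for the first WU in a chain. This is not ACEMD or BOINC code and not necessarily how the fix will look; `run_work_unit`, `step_index` and the log strings are all invented for illustration.

```python
# Hypothetical illustration only: none of these names exist in ACEMD or BOINC.
def run_work_unit(step_index: int, log: list) -> None:
    """Simulate one WU in a chain, doing the one-time step only for the first WU."""
    if step_index == 0:
        # Only the first WU in a chain should perform this step.
        log.append("one-time setup")
    log.append(f"simulate step {step_index}")


log = []
for step in range(3):
    run_work_unit(step, log)
print(log)
```

The bug described above would correspond to the guard being missing, so the one-time step would run again for every later WU in the chain.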

... I got messages like "abort by user" - but I didn't abort any ...

It happened to me too. I had 2 NOELIA units that were aborted by user, which I didn't abort. They were otherwise running fine. So, what is happening?


I am not sure what is happening. Even if we cancel a group of WUs, they should complete on your computer (if they are good simulations). The "Abort by user" can only come from the user/client, typically when you deliberately cancel a WU with the "Abort" button. Hopefully there is nothing else going on...
ID: 29006 · Rating: 0

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Message 29008 - Posted: 6 Mar 2013, 14:46:03 UTC - in response to Message 29006.  

... I got messages like "abort by user" - but I didn't abort any ...

It happened to me too. I had 2 NOELIA units that were aborted by user, which I didn't abort. They were otherwise running fine. So, what is happening?


I am not sure what is happening. Even if we cancel a group of WUs, they should complete on your computer (if they are good simulations). The "Abort by user" can only come from the user/client, typically when you deliberately cancel a WU with the "Abort" button. Hopefully there is nothing else going on...

Ah. That's one I can help you with.

I got an 'aborted by user', too - task 6581613. If you look at the task details, it has "exit status 202".

At some stage in the development of recent BOINC clients, David updated and expanded the range of error and exit status codes returned by the client. Unfortunately, he didn't - at first, and until prodded - update the decode tables used on project web sites.

You need to update html/inc/result.inc on your web server to something later than
http://boinc.berkeley.edu/trac/changeset/1f7ddbfe3a27498e7fd2b4f50f3bf9269b7dae25/boinc/html/inc/result.inc
to get a proper website display using

case 202: return "EXIT_ABORTED_BY_PROJECT";

Full story in http://boinc.berkeley.edu/dev/forum_thread.php?id=7704
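
As a rough illustration of the decode-table idea, here is a minimal Python sketch (the real table lives in the PHP file html/inc/result.inc, which this does not reproduce). Only the 202 entry comes from this thread; the names `EXIT_STATUS_NAMES` and `describe_exit_status`, and the "unknown" fallback, are assumptions made for the example.

```python
# Illustrative sketch, not the BOINC web code. Only the 202 entry is taken
# from this thread; a real table would hold many more codes.
EXIT_STATUS_NAMES = {
    202: "EXIT_ABORTED_BY_PROJECT",
}


def describe_exit_status(code: int) -> str:
    """Format an exit status as 'decimal (0xhex) NAME', or 'unknown' if unlisted."""
    name = EXIT_STATUS_NAMES.get(code, "unknown")
    return f"{code} (0x{code:x}) {name}"


print(describe_exit_status(202))  # 202 (0xca) EXIT_ABORTED_BY_PROJECT
```

A web page whose table lacks the 202 entry would have to fall back to some other label, which would explain a misleading "Aborted by user" display.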
ID: 29008 · Rating: 0

Operator
Joined: 15 May 11
Posts: 108
Credit: 297,176,099
RAC: 0
Level
Asn
Message 29009 - Posted: 6 Mar 2013, 15:15:01 UTC - in response to Message 29008.  

I was surprised to see some "Aborted By User" tasks this morning, especially since they happened while I was sleeping!

As an example: 290px20xbis-NOELIA_290p-0-2-RND4773_1

Created: 5 Mar 2013 | 19:22:08 UTC
Sent: 5 Mar 2013 | 21:01:03 UTC
Received: 6 Mar 2013 | 9:58:01 UTC
Server state: Over
Outcome: Computation error
Client state: Aborted by user
Exit status: 202 (0xca)

But after viewing the details of the task itself it said "WU cancelled" in red.

http://www.gpugrid.net/workunit.php?wuid=4227683

name: 290px20xbis-NOELIA_290p-0-2-RND4773
application: Long runs (8-12 hours on fastest card)
created: 5 Mar 2013 | 16:59:14 UTC
minimum quorum: 1
initial replication: 1
max # of error/total/success tasks: 7, 10, 6
errors: WU cancelled

So they got cut off in mid crunch, and on the surface it makes it look like we aborted them.

Operator
ID: 29009 · Rating: 0

flashawk
Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Level
Arg
Message 29013 - Posted: 6 Mar 2013, 17:37:14 UTC

Out of all 4 of my machines, I had 7 "Aborted by user" errors last night. My computers will be on probation by tomorrow and I won’t be able to download work units.
ID: 29013 · Rating: 0

Ken_g6
Joined: 6 Aug 11
Posts: 8
Credit: 76,046,994
RAC: 0
Level
Thr
Message 29014 - Posted: 6 Mar 2013, 18:04:23 UTC

I haven't had the server abort any Noelias lately. I've just had them all segfault within an hour or two. :(
ID: 29014 · Rating: 0

GPUGRID
Joined: 12 Dec 11
Posts: 91
Credit: 2,730,095,033
RAC: 0
Level
Phe
Message 29028 - Posted: 7 Mar 2013, 5:00:41 UTC

Still having the same problems with NOELIAs. That will put my biggest machine down, because this one is BSODing and ruining all the cache, which is very hard to build atm.
ID: 29028 · Rating: 0

flashawk
Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Level
Arg
Message 29029 - Posted: 7 Mar 2013, 6:22:10 UTC

It seems to me that these NOELIAs are suffering from memory leaks. When my card finishes one and starts the next, the GPU pegs at 99 - 100% and the memory controller stays at 0%. If I reboot, all is well and works fine. The previous WU won't release the memory on the GPU, thus the reboot. This is Windows XP Pro 64 bit; different operating systems seem to be dealing with it differently. Windows 7 and 8 are getting BSODs or driver crashes. I also get the "acemd.2865P.exe had to be terminated unexpectedly" error. Oh well, I don't even know if this stuff we post helps or gets read.
ID: 29029 · Rating: 0

wdiz
Joined: 4 Nov 08
Posts: 20
Credit: 871,871,594
RAC: 0
Level
Glu
Message 29030 - Posted: 7 Mar 2013, 8:10:30 UTC
Last modified: 7 Mar 2013, 8:10:49 UTC

Same here. It seems that the new NOELIA WUs don't work well; they freeze the computer. I had to reset the project.
I'm running Arch Linux (kernel 3.7.10-1-ARCH) with a GTX 660Ti and a GTX 580.
BOINC 7.0.53
ID: 29030 · Rating: 0

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 1
Level
Trp
Message 29032 - Posted: 7 Mar 2013, 10:57:04 UTC
Last modified: 7 Mar 2013, 11:18:18 UTC

It looks to me like this problem is related to the architecture of the host operating system, as all (1, 2, 3) of my Windows XP x64 systems have a lot of errors, while all (1, 2, 3) of my Windows XP x86 systems are running these NOELIAs fine.
ID: 29032 · Rating: 0

GPUGRID
Joined: 12 Dec 11
Posts: 91
Credit: 2,730,095,033
RAC: 0
Level
Phe
Message 29033 - Posted: 7 Mar 2013, 11:21:03 UTC

Only NOELIAs are incoming. Impossible to run the project atm. Too bad, it's a big farm.
ID: 29033 · Rating: 0

Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 29036 - Posted: 7 Mar 2013, 12:07:34 UTC - in response to Message 29033.  

Only NOELIAs are incoming. Impossible to run the project atm. Too bad, it's a big farm.

I just moved to a different project too. Too bad, I liked helping out here but they don't seem to test anything before release.
ID: 29036 · Rating: 0

nate
Joined: 6 Jun 11
Posts: 124
Credit: 2,928,865
RAC: 0
Level
Ala
Message 29038 - Posted: 7 Mar 2013, 12:10:31 UTC - in response to Message 29032.  
Last modified: 7 Mar 2013, 12:18:54 UTC

It looks to me like this problem is related to the architecture of the host operating system, as all (1, 2, 3) of my Windows XP x64 systems have a lot of errors, while all (1, 2, 3) of my Windows XP x86 systems are running these NOELIAs fine.


Thanks, we're looking at it. Obviously this is pretty serious. I will submit some additional WUs to the long queue that I know for sure are good simulations, so that we can get a handle on this.

Edit: I have now submitted some simulations we know are good to the long queue. If it is an issue with the app, we will find out. They are named NATHAN_dhfr36_3.
ID: 29038 · Rating: 0

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Message 29039 - Posted: 7 Mar 2013, 12:19:38 UTC - in response to Message 29032.  

It looks to me like this problem is related to the architecture of the host operating system, as all (1, 2, 3) of my Windows XP x64 systems have a lot of errors, while all (1, 2, 3) of my Windows XP x86 systems are running these NOELIAs fine.

I've just reported (in Number Crunching) a failure with a long queue task under Windows 7/64, which didn't freeze the computer or poison the GPU, while short queue tasks under XP/32 are (mostly) running.
ID: 29039 · Rating: 0

©2025 Universitat Pompeu Fabra