New project in long queue
Author | Message |
---|---|
Joined: 28 Mar 09 Posts: 490 Credit: 11,731,645,728 RAC: 47,738 |
"We're looking at the issue. The problematic WUs have been cancelled for now. The problem was clearly on our end, but it seems that there were multiple reasons they were having issues, and mostly not Noelia's fault." Were the issues related to the new application, the WUs, or both? |
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0 |
How big are the uploads for these reworked NOELIAs supposed to be? The 3 I've finished were barely over 4 MB after 11 1/2 hours of "crunching". Is this about right? |
Joined: 21 Dec 11 Posts: 2 Credit: 21,062,866 RAC: 0 |
... I got messages like "abort by user" - but I didn't abort any ... In one case I also saw the remaining time increasing ... System: Windows 7 Ultimate 64-bit, BOINC 7.0.28, GTX 580. Thanks for any replies. FrRie |
Joined: 25 Jun 12 Posts: 3 Credit: 47,912,263 RAC: 0 |
I also have the same problem with the tasks killing the acemd process. When I checked the thread I got even more pissed off - one week after the "bomb" was thrown, no reaction from Noelia, no official response, no retraction of the packages - nothing. This is a waste of our time and, as somebody mentioned, I'd prefer to spend my electricity on something useful. Can somebody PM me how to block these Noelia packages? |
Joined: 4 Sep 11 Posts: 110 Credit: 326,102,587 RAC: 0 |
"I also have the same problem with the tasks killing the acemd process. When I checked the thread I got even more pissed off - one week after the 'bomb' was thrown, no reaction from Noelia, no official response, no retraction of the packages - nothing." It's not possible to block specific tasks. At least that's what I learned from my own thread. http://www.gpugrid.net/forum_thread.php?id=3315 Team Belgium |
Joined: 28 Mar 09 Posts: 490 Credit: 11,731,645,728 RAC: 47,738 |
"I also have the same problem with the tasks killing the acemd process. When I checked the thread I got even more pissed off - one week after the 'bomb' was thrown, no reaction from Noelia, no official response, no retraction of the packages - nothing." It happened to me too. I had 2 Noelia units marked as aborted by user, which I didn't abort. They were otherwise running fine. So, what is happening? |
Joined: 6 Jun 11 Posts: 124 Credit: 2,928,865 RAC: 0 |
"I also have the same problem with the tasks killing the acemd process. When I checked the thread I got even more pissed off - one week after the 'bomb' was thrown, no reaction from Noelia, no official response, no retraction of the packages - nothing." http://www.gpugrid.net/forum_thread.php?id=3311&nowrap=true#28972 |
Joined: 6 Jun 11 Posts: 124 Credit: 2,928,865 RAC: 0 |
"Were the issues related to the new application, the WUs, or both?" The WUs were not set to upload in the smaller file format we are now trying to move to. They were set to use the old format, which could result in very large uploads, as some people complained about. The problem with the application was an obscure one. It wasn't an issue with the application per se, but rather with how the application interacts with BOINC and this specific type of configuration file for the simulations. In short, at the start of every WU the application was running a function that it was only supposed to run in the first WU of a chain. This caused all but the first WU in a chain to fail. This isn't a problem when we run locally, but because of how BOINC handles the files, it became one. We are working on a long-term fix; for now we have simply found a way around it. "... I got messages like 'abort by user' - but I didn't abort any ..." I am not sure what is happening. Even if we cancel a group of WUs, they should complete on your computer (if they are good simulations). The "Abort by user" status can only come from the user/client, typically when you deliberately cancel a WU with the "Abort" button. Hopefully there is nothing else going on... |
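To make the chained-WU bug above concrete, here is a minimal Python sketch of the general pattern: one-time setup guarded so that it runs only for the first work unit in a chain. This is not the ACEMD or BOINC code; `ChainState`, `run_work_unit`, and the zero-based `step` index are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ChainState:
    """Hypothetical state carried from one WU in a chain to the next."""
    initialized: bool = False
    log: list = field(default_factory=list)


def run_work_unit(step: int, state: ChainState) -> None:
    """Run one WU; one-time setup happens only when step == 0."""
    if step == 0:
        # First WU in the chain: perform the one-time setup.
        state.initialized = True
        state.log.append("setup")
    elif not state.initialized:
        # A later WU must inherit state from the first one, not rebuild it.
        raise RuntimeError(f"step {step} started without chain state")
    state.log.append(f"simulate step {step}")


state = ChainState()
for step in range(3):
    run_work_unit(step, state)
print(state.log)
```

The failure described in the post corresponds to running the setup branch on every step instead of only on step 0; how the real application decides which WU is first in a chain is not stated in the thread.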
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172 |
"... I got messages like 'abort by user' - but I didn't abort any ..." Ah. That's one I can help you with. I got an 'aborted by user', too - task 6581613. If you look at the task details, it has "exit status 202". At some stage in the development of recent BOINC clients, David updated and expanded the range of error and exit status codes returned by the client. Unfortunately, he didn't - at first, and until prodded - update the decode tables used on project web sites. You need to update html/inc/result.inc on your web server to something later than http://boinc.berkeley.edu/trac/changeset/1f7ddbfe3a27498e7fd2b4f50f3bf9269b7dae25/boinc/html/inc/result.inc to get a proper website display, using case 202: return "EXIT_ABORTED_BY_PROJECT"; Full story in http://boinc.berkeley.edu/dev/forum_thread.php?id=7704 |
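The decode-table idea in the post above can be sketched in a few lines. The real table lives in the PHP file html/inc/result.inc on the project web server; the Python below is only an illustration, and `EXIT_STATUS_NAMES` and `decode_exit_status` are hypothetical names. The only entry taken from the thread is 202; a real table would hold many more codes.

```python
# Illustrative decode table; only the 202 entry comes from the post above.
EXIT_STATUS_NAMES = {
    202: "EXIT_ABORTED_BY_PROJECT",
}


def decode_exit_status(code: int) -> str:
    """Return a readable label for a client exit status code."""
    name = EXIT_STATUS_NAMES.get(code)
    if name is None:
        # A table that lacks newer codes cannot label them correctly.
        return f"unknown exit status {code} (0x{code:x})"
    return f"{name} ({code}, 0x{code:x})"


print(decode_exit_status(202))
```

With an entry for 202 in place, a task would be labelled as aborted by the project rather than by the user, which is the display fix the post asks for.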
Joined: 15 May 11 Posts: 108 Credit: 297,176,099 RAC: 0 |
I was surprised to see some "Aborted By User" tasks this morning, especially since they happened while I was sleeping! As an example: 290px20xbis-NOELIA_290p-0-2-RND4773_1. Created: 5 Mar 2013, 19:22:08 UTC; Sent: 5 Mar 2013, 21:01:03 UTC; Received: 6 Mar 2013, 9:58:01 UTC; Server state: Over; Outcome: Computation error; Client state: Aborted by user; Exit status: 202 (0xca). But after viewing the details of the task itself it said "WU cancelled" in red. http://www.gpugrid.net/workunit.php?wuid=4227683 Name: 290px20xbis-NOELIA_290p-0-2-RND4773; application: Long runs (8-12 hours on fastest card); created: 5 Mar 2013, 16:59:14 UTC; minimum quorum: 1; initial replication: 1; max # of error/total/success tasks: 7, 10, 6; errors: WU cancelled. So they got cut off in mid crunch, and on the surface it makes it look like we aborted them. Operator |
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0 |
Out of all 4 of my machines, I had 7 "Aborted by user" errors last night. My computers will be on probation by tomorrow and I won’t be able to download work units. |
Joined: 6 Aug 11 Posts: 8 Credit: 76,046,994 RAC: 0 |
I haven't had the server abort any Noelias lately. I've just had them all segfault within an hour or two. :( |
Joined: 12 Dec 11 Posts: 91 Credit: 2,730,095,033 RAC: 0 |
Still having the same problems with Noelias. That will put my biggest machine down, because this one is BSODing and ruining the whole cache, which is very hard to build atm. |
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0 |
It seems to me that these NOELIAs are suffering from memory leaks. When my card finishes one and starts the next, the GPU pegs at 99 - 100% and the memory controller stays at 0%. If I reboot, all is well and works fine. The previous WU won't release the memory on the GPU, thus the reboot. This is Windows XP Pro 64-bit; different operating systems seem to be dealing with it differently. Windows 7 and 8 are getting BSODs or driver crashes. I also get the "acemd.2865P.exe had to be terminated unexpectedly" error. Oh well, I don't even know if this stuff we post helps or gets read. |
Joined: 4 Nov 08 Posts: 20 Credit: 871,871,594 RAC: 0 |
Same here. It seems that the new Noelia WU doesn't work well: it freezes the computer. I had to reset the project. I'm running Arch Linux (kernel 3.7.10-1-ARCH) with a GTX 660 Ti and a GTX 580, BOINC 7.0.53. |
Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 1 |
|
Joined: 12 Dec 11 Posts: 91 Credit: 2,730,095,033 RAC: 0 |
Just Noelias incoming. Impossible to run the project atm. Too bad, it's a big farm. |
Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0 |
"Just Noelias incoming. Impossible to run the project atm. Too bad, it's a big farm." I just moved to a different project too. Too bad, I liked helping out here but they don't seem to test anything before release. |
Joined: 6 Jun 11 Posts: 124 Credit: 2,928,865 RAC: 0 |
"It looks to me that this problem is related to the architecture of the host operating system, as all (1, 2, 3) of my Windows XP x64 systems have a lot of errors, while all (1, 2, 3) of my Windows XP x86 systems are running these NOELIAs fine." Thanks, we're looking at it. Obviously this is pretty serious. I will submit some additional work to the long queue that I know for sure are good simulations, so that we can get a handle on this. Edit: I have now submitted to the long queue some simulations we know are good. If it is an issue with the app, we will find out. They are named NATHAN_dhfr36_3. |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172 |
"It looks to me that this problem is related to the architecture of the host operating system, as all (1, 2, 3) of my Windows XP x64 systems have a lot of errors, while all (1, 2, 3) of my Windows XP x86 systems are running these NOELIAs fine." I've just reported (in Number Crunching) a failure with a long queue task under Windows 7/64, which didn't freeze the computer or poison the GPU, while short queue tasks under XP/32 are (mostly) running. |
©2025 Universitat Pompeu Fabra