Message boards : News : New project in long queue
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 326,008
So I think the size of the output file directly affects the run time (as usual). They may have to pull the plug on this batch and rework them; we'll have to wait and see what they decide.

Far more likely that the tasks which run - by design - for a long time generate a large output file.

After the last NOELIA failure (which triggered a driver restart), I ran a couple of small BOINC tasks from another project. The first one errored, the second ran correctly. After that, I ran a long TONI: successful completion, no computer restart needed. I'm running the 314.07 driver.
Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 1
My system hasn't been changed since the application upgrade. I've had no problems until these new NOELIA tasks arrived. (I've received a couple of tasks with names ending in _4 and _5.) They show all the strange behaviour a workunit can:
- 95-100% GPU usage with no increase in the progress indicator (even after hours of processing)
- the same as above, but with 0% GPU usage
- causing the following workunit (a TONI, for example) to show the same strange behaviour (a system restart can fix this)
- a significant change in GPU usage (from 75-80% to 95-100%) after a couple of minutes, but no progress
- the progress indicator stays at 0% when I abort a stuck task.
Joined: 12 Dec 11 · Posts: 91 · Credit: 2,730,095,033 · RAC: 0
I'm having a new, weird issue, but only on my AMD 3x690 rig. Three times now: BSODs and system restarts. It only goes away if all the workunits (and the cache!) are aborted. I don't have a clue why this happens, but this AMD rig is rock solid in normal crunching and it's doing more than 2M per day on its own.
Joined: 28 Mar 09 · Posts: 490 · Credit: 11,731,645,728 · RAC: 52,725
My system hasn't been changed since the application upgrade.

I have had the same issues, and on top of that I got an error message saying that acemd.2865.exe has crashed, and the video card ends up running at a slower speed. I have had more errors with this application than the last time I did beta testing.
Joined: 29 Oct 08 · Posts: 3 · Credit: 493,308,259 · RAC: 8,915
Hello! I just want to add my experience with the latest batch: until late yesterday / early this morning, my capable PCs ran just fine!

The two Windows PCs (ID: 67760 and ID: 145297) started to crash after running for about 4 minutes; the BOINC messages told me that output files were missing, and no checkpointing was done during the short run before crashing. I also took a look at my wingmen; most errors were "The system cannot find the path specified. (0x3) - exit code 3 (0x3)". Sometimes exit codes -1 and -9 also occurred.

To eliminate a Windows driver or other Windows error, I loaded some WUs onto this host (ID: 132991) running Ubuntu. Sure enough, after running for about 5 minutes they crashed, telling me (via the BOINC Messages tab, of course) "exited with zero status but no 'finished' file"; this happened several times before failing with "Output file absent". When looking at the outcome after upload, this is what I got:

Stderr output
<core_client_version>7.0.27</core_client_version>
<![CDATA[
<message>
process exited with code 255 (0xff, -1)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
MDIO: cannot open file "restart.coor"
MDIO: cannot open file "restart.coor"
</stderr_txt>
]]>

Hope this can help debugging the batch!

PS: All three PCs are now running TONI WUs without the need to restart!

With regards,
Hans Sveen
Oslo, Norway
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 326,008
I have had more errors with this application than the last time I did beta testing.

I think we need to try to distinguish between 'application' problems and 'task' problems - or 'project' problems (as in research project), as Noelia called it when starting this thread.
Joined: 28 Mar 09 · Posts: 490 · Credit: 11,731,645,728 · RAC: 52,725
I have had more errors with this application than the last time I did beta testing.

So do you think it is mere coincidence that we started getting these errors after the application changed to version 6.18? Maybe you are right. There is a way to prove it: run the failed units under application 6.17. If they fail, it's the units; if they don't fail, it's the new application.
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 326,008
I have had more errors with this application than the last time I did beta testing.

In my personal experience, all TONI tasks, and 50% of NOELIA tasks, have run correctly under application version 6.18.
Joined: 7 Jun 12 · Posts: 112 · Credit: 1,140,895,172 · RAC: 0
041px48x2-NOELIA_041p-1-2-RND9263: after 15 hours, when the work on this task ended, the NVIDIA driver crashed and the result was marked as faulty. Another one, nn016_r2-TONI_AGGd8-38-100-RND3157_0, was marked as correct. But these problems have been going on for more than a week now; it's insane. The NVIDIA driver crashes even on a proper shutdown of BOINC Manager, for example.

Now these tasks are coming in: Ann166_r2-TONI_AGGd8-11-100-RND7649_0, nn137_r2-TONI_AGGd8-20-100-RND8105_0 and Ann027_r2-TONI_AGGd8-19-100-RND9134_3. But I'm sceptical, and I don't think they will end well. Two weeks of crunching with nothing to show for it, like many volunteers now.
Joined: 23 Nov 08 · Posts: 1112 · Credit: 6,162,416,256 · RAC: 0
I have had more errors with this application than the last time I did beta testing.

Richard, this is my experience exactly. All TONIs run fine and 50% of NOELIAs crash. TONI should maybe give a clinic to the others. I don't think it has much to do with 6.18 either; it's just that the new NOELIAs were released at the same time as 6.18.
Joined: 6 Aug 11 · Posts: 8 · Credit: 76,046,994 · RAC: 0
The first Noelia (the angels did say...) took over 48 hours on a GTX 460 768MB that has completed most work in 25 hours or less, but it finally completed today. The second one I got, which many have apparently had a different problem with, kept restarting on my machine with:

SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.

That seems to be an out-of-GPU-memory error. So maybe someone should set stricter minimum memory limits on these Noelia tasks?

Edit: Technically, that wasn't my first Noelia, just the first one of this batch. I got at least one, probably more, in February, and they took 25 hours but were otherwise fine.
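[Editor's note] If it really is a memory problem, the kind of check being suggested could in principle be done on the host before a task starts. The sketch below (C++ with the CUDA runtime API) is only an illustration of that idea, not GPUGRID's actual mechanism; the 768 MB threshold and the function name are assumptions.

#include <cstdio>
#include <cuda_runtime.h>

// Illustrative only: refuse to start a task if the GPU does not report
// enough free memory. The threshold is an assumption, not a project value.
static bool gpu_has_enough_memory(size_t required_bytes)
{
    size_t free_bytes = 0, total_bytes = 0;
    if (cudaMemGetInfo(&free_bytes, &total_bytes) != cudaSuccess)
        return false;  // cannot query the device, so play it safe
    std::printf("GPU memory: %zu MB free of %zu MB total\n",
                free_bytes >> 20, total_bytes >> 20);
    return free_bytes >= required_bytes;
}

int main()
{
    const size_t required = 768ull << 20;  // assumed per-task requirement
    if (!gpu_has_enough_memory(required)) {
        std::printf("Not enough free GPU memory for this task.\n");
        return 1;
    }
    std::printf("Memory check passed.\n");
    return 0;
}

On a 768 MB card that is also driving a display, such a check would fail well before the nominal card size is reached, which would be consistent with the restarts reported above.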
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 326,008
The first Noelia

I see that both WUs are marked as errors: "WU cancelled". Something may be happening behind the scenes.
Joined: 12 Nov 07 · Posts: 696 · Credit: 27,266,655 · RAC: 0
These NOELIA WUs have been cancelled. Their successors will have a slightly different configuration that will hopefully be more stable.

Note that with this app, GPUs of compute capability 1.0, 1.1 and 1.2 are no longer supported. This means that only GeForce GTX 260s and higher will get Long WUs.

MJH
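[Editor's note] For anyone unsure whether their card clears that bar, a small stand-alone program using the CUDA runtime API can report each device's compute capability. The >= 1.3 cut-off below simply restates MJH's statement (GTX 260 class and up); the program itself is an illustration, not something the project ships.

#include <cstdio>
#include <cuda_runtime.h>

// Illustrative only: list each GPU and whether it meets the new
// compute-capability 1.3 minimum for Long WUs.
int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("No CUDA-capable GPU found.\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
            continue;
        const bool supported =
            (prop.major > 1) || (prop.major == 1 && prop.minor >= 3);
        std::printf("GPU %d: %s, compute capability %d.%d -> %s\n",
                    dev, prop.name, prop.major, prop.minor,
                    supported ? "eligible for Long WUs"
                              : "no longer supported");
    }
    return 0;
}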
Joined: 6 Jun 11 · Posts: 124 · Credit: 2,928,865 · RAC: 0
We're looking at the issue. The problematic WUs have been cancelled for now. The problem was clearly on our end, but it seems that there were multiple reasons they were having issues, and mostly not Noelia's fault. She'll resend new simulations that avoid the problems in the next day or so. The large upload sizes will also be fixed.

As always, thanks for making your concerns known and alerting us to the issue.

Nate
Joined: 12 Nov 07 · Posts: 696 · Credit: 27,266,655 · RAC: 0
Be aware also that these and subsequent WUs will fail if you have overridden the application version and are not running the latest.

MJH
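[Editor's note] Overriding the application version is normally done by running the project under BOINC's anonymous platform with an app_info.xml file in the project directory. The rough sketch below shows what such an override tends to look like; the app name, plan class and file name are illustrative guesses (the executable name is borrowed from an earlier post), so check your own client_state.xml for the real values rather than copying this. Per MJH's note, pinning an old version this way is exactly what will make the new WUs fail.

<app_info>
  <app>
    <name>acemdlong</name>                <!-- assumed app name; verify in client_state.xml -->
  </app>
  <file_info>
    <name>acemd.2865.exe</name>           <!-- executable named in an earlier post; illustrative -->
    <executable/>
  </file_info>
  <app_version>
    <app_name>acemdlong</app_name>
    <version_num>618</version_num>        <!-- 6.18; pinning an older number causes the failures MJH describes -->
    <plan_class>cuda42</plan_class>       <!-- assumed plan class -->
    <coproc>
      <type>CUDA</type>
      <count>1</count>
    </coproc>
    <file_ref>
      <file_name>acemd.2865.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>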
Joined: 23 Nov 08 · Posts: 1112 · Credit: 6,162,416,256 · RAC: 0
We're looking at the issue. The problematic WUs have been cancelled for now.

Were the TONI WUs cancelled too? They ran fine.
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 326,008
We're looking at the issue. The problematic WUs have been cancelled for now.

And the two I have in progress are still fine, and shown as viable on the website.
Joined: 23 Nov 08 · Posts: 1112 · Credit: 6,162,416,256 · RAC: 0
We're looking at the issue. The problematic WUs have been cancelled for now.

Just got a couple of new ones. It seems the queue coincidentally ran dry for a while:

GPUGRID 03-04-13 13:45 Requesting new tasks for NVIDIA
GPUGRID 03-04-13 13:45 Scheduler request completed: got 0 new tasks
GPUGRID 03-04-13 13:45 No tasks sent
GPUGRID 03-04-13 13:45 No tasks are available for Long runs (8-12 hours on fastest card)
Joined: 12 Dec 11 · Posts: 91 · Credit: 2,730,095,033 · RAC: 0
We're looking at the issue. The problematic WUs have been cancelled for now. The problem was clearly on our end, but it seems that there were multiple reasons they were having issues, and mostly not Noelia's fault. She'll resend new simulations that avoid the problems in the next day or so. The large upload sizes will also be fixed.

Thank you, guys. Another thing I really appreciate about this project is your awesome and fast support. Which didn't happen on the project I ran for the past 13 years... sadly.