New app is out for testing
Joined: 14 Mar 07; Posts: 1958; Credit: 629,356; RAC: 0
We have finished beta testing and are now submitting workunits to a new queue for short runs. If all goes well, we will update the long queue as well. The new app is cuda4.2 only, of course. We will soon disable cuda3.1, as that application is far too old.
gdf
Joined: 11 Jul 09; Posts: 1639; Credit: 10,159,968,649; RAC: 326,008
I've got one of these waiting to run, and I noticed it's already up to replication _4: http://www.gpugrid.net/workunit.php?wuid=4173049
3 of the previous runs ended with error -9. Anything special you'd like me to watch out for when it runs?
Joined: 16 Mar 11; Posts: 509; Credit: 179,005,236; RAC: 0
Put your safety glasses on and watch for smoke?
BOINC <<--- credit whores, pedants, alien hunters
Joined: 20 Jan 10; Posts: 4; Credit: 2,569,014; RAC: 0
Does this mean that those of us who have only been able to run the cuda 3.1 code are no longer wanted? |
Joined: 28 Apr 11; Posts: 462; Credit: 958,266,958; RAC: 31,461
Hmm, I'm surprised that cuda31 will finally be disabled after it was moved to a separate short-run queue. My GTX 285 can normally do 6 WUs per day :(
DSKAG Austria Research Team: http://www.research.dskag.at
Joined: 14 Mar 07; Posts: 1958; Credit: 629,356; RAC: 0
It will still be possible to run on 280s, but with newer drivers. The new application simply cannot be compiled with cuda3.1.
gdf
Joined: 28 Apr 11; Posts: 462; Credit: 958,266,958; RAC: 31,461
Possible, but at half the current performance I won't spend >200 W on 3 short WUs per day ;) Buuuut perhaps the new app runs better, so I will test some WUs once the cuda31 queue is empty. I will report back then ;)
PS: is it a typo that the site now shows cuda32? Or is this cuda31 or something else?
Joined: 11 Jul 09; Posts: 1639; Credit: 10,159,968,649; RAC: 326,008
"Put your safety glasses on and watch for smoke?"

Well, I went to bed and pulled the duvet over my head, which amounts to much the same thing.

Results for host 43404

As you can see, the _4 task completed successfully, as did the subsequent _7. That was the last opportunity to get any science done, according to the "max # of error/total/success tasks 7, 10, 6" policy. And now I've got another _4. That's a horribly high error rate: are you sure this app was ready for prime time?

While we're here, could we have some thoughts about the naming of the various application types, please? It's very misleading to have two separate (but identically named) filters for short runs, especially when the second one (appid=18) seems to be described as "CUDA 3.2" on the task selection preference page, while jobs from that queue were allocated to my host as cuda42.
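The "max # of error/total/success tasks 7, 10, 6" policy quoted above caps how many replicas a workunit may burn through. A minimal sketch of how such limits could be checked, assuming a limit counts as breached only once a count is strictly greater than it (the exact comparison used by the BOINC server is an assumption here). `WorkunitLimits` and `workunit_verdict` are hypothetical names, and only the "Too many errors (may have bug)" message is taken from this thread.

```python
from dataclasses import dataclass


@dataclass
class WorkunitLimits:
    """Per-workunit limits, in the order quoted on the workunit page: 7, 10, 6."""
    max_error_results: int = 7
    max_total_results: int = 10
    max_success_results: int = 6


def workunit_verdict(outcomes: list[str], limits: WorkunitLimits) -> str:
    """Classify a workunit from its task outcomes ("error" or "success").

    Assumption: a limit is breached only when a count is strictly greater than it.
    The second and third messages are hypothetical labels.
    """
    n_errors = outcomes.count("error")
    n_successes = outcomes.count("success")
    if n_errors > limits.max_error_results:
        return "Too many errors (may have bug)"
    if len(outcomes) > limits.max_total_results:
        return "too many total tasks"
    if n_successes > limits.max_success_results:
        return "too many successful tasks"
    return "limits not exceeded"


# Under the assumed rule, seven errors still allow another replica; the eighth ends it.
print(workunit_verdict(["error"] * 7, WorkunitLimits()))  # -> limits not exceeded
print(workunit_verdict(["error"] * 8, WorkunitLimits()))  # -> Too many errors (may have bug)
```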
Joined: 23 Apr 09; Posts: 3968; Credit: 1,995,359,260; RAC: 0
"PS: is it a typo that the site now shows cuda32? Or is this cuda31 or something else?"

Yes, it should be 3.1, but seeing as it's being deprecated I wouldn't worry about it now.
Joined: 23 Apr 09; Posts: 3968; Credit: 1,995,359,260; RAC: 0
Just watched a task complete and two subsequent ones fail after 2 seconds.

| Task | Work unit | Sent | Time reported | Status | Run time (s) | CPU time (s) | Credit | Application |
|---|---|---|---|---|---|---|---|---|
| trypsin_lig_375_run1-NOELIA_RL3_equ-0-1-RND1921_1 | 4141973 | 13 Feb 2013 9:40:31 UTC | 13 Feb 2013 10:58:54 UTC | Completed and validated | 2,033.93 | 1,484.83 | 1,500.00 | ACEMD beta version v6.48 (cuda42) |
| trypsin_lig_905_run3-NOELIA_RL3_equ-0-1-RND5342_2 | 4144209 | 14 Feb 2013 11:03:01 UTC | 14 Feb 2013 11:03:51 UTC | Error while computing | 2.07 | 0.06 | --- | ACEMD beta version v6.48 (cuda42) |
| trypsin_lig_905_run2-NOELIA_RL3_equ-0-1-RND6964_2 | 4144208 | 14 Feb 2013 11:03:01 UTC | 14 Feb 2013 11:03:51 UTC | Error while computing | 2.11 | 0.05 | --- | ACEMD beta version v6.48 (cuda42) |

Stderr output:

<core_client_version>7.0.44</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
ERROR: file mdioload.cpp line 207: Error reading parmtop file
called boinc_finish
</stderr_txt>
]]>

Both tasks that failed had already failed 2 times before and have not been resent:

| Task | Computer | Sent | Time reported | Status | Run time (s) | CPU time (s) | Credit | Application |
|---|---|---|---|---|---|---|---|---|
| 6459826 | 30790 | 14 Feb 2013 8:51:55 UTC | 14 Feb 2013 9:18:53 UTC | Error while computing | 3.05 | 0.14 | --- | ACEMD beta version v6.48 (cuda42) |
| 6503647 | 126506 | 14 Feb 2013 10:24:54 UTC | 14 Feb 2013 10:30:32 UTC | Error while computing | 2.06 | 0.08 | --- | ACEMD beta version v6.48 (cuda42) |
| 6503815 | 139265 | 14 Feb 2013 11:03:01 UTC | 14 Feb 2013 11:03:51 UTC | Error while computing | 2.11 | 0.05 | --- | ACEMD beta version v6.48 (cuda42) |
| 6503960 | --- | --- | --- | Unsent | --- | --- | --- | |

FAQ's HOW TO:
- Opt out of Beta Tests
- Ask for Help
Joined: 28 Mar 09; Posts: 490; Credit: 11,731,645,728; RAC: 52,725
"Just watched a task complete and two subsequent ones fail after 2 seconds."

I had a bunch of failures as well:
http://www.gpugrid.net/workunit.php?wuid=4144270
http://www.gpugrid.net/workunit.php?wuid=4144240
http://www.gpugrid.net/workunit.php?wuid=4144211
http://www.gpugrid.net/workunit.php?wuid=4144208
http://www.gpugrid.net/workunit.php?wuid=4144196
Joined: 23 Apr 09; Posts: 3968; Credit: 1,995,359,260; RAC: 0
These Betas are all failing on my systems, so I've had to suspend further Beta testing for a while (otherwise I'll stop getting tasks):

| Task | Work unit | Computer | Sent | Time reported | Status | Run time (s) | CPU time (s) | Credit | Application |
|---|---|---|---|---|---|---|---|---|---|
| trypsin_lig_941_run4-NOELIA_RL3_equ-0-1-RND4515_3 | 4144364 | 139265 | 14 Feb 2013 13:17:12 UTC | 14 Feb 2013 13:19:09 UTC | Error while computing | 2.07 | 0.05 | --- | ACEMD beta version v6.48 (cuda42) |
| trypsin_lig_940_run3-NOELIA_RL3_equ-0-1-RND0852_1 | 4144359 | 139265 | 14 Feb 2013 13:17:12 UTC | 14 Feb 2013 13:19:09 UTC | Error while computing | 2.07 | 0.05 | --- | ACEMD beta version v6.48 (cuda42) |
| trypsin_lig_941_run3-NOELIA_RL3_equ-0-1-RND2477_2 | 4144363 | 139859 | 14 Feb 2013 12:10:32 UTC | 14 Feb 2013 12:16:50 UTC | Error while computing | 2.35 | 0.08 | --- | ACEMD beta version v6.48 (cuda42) |
| trypsin_lig_911_run2-NOELIA_RL3_equ-0-1-RND2760_2 | 4144232 | 139265 | 14 Feb 2013 11:45:48 UTC | 14 Feb 2013 11:47:38 UTC | Error while computing | 2.11 | 0.05 | --- | ACEMD beta version v6.48 (cuda42) |
| trypsin_lig_929_run2-NOELIA_RL3_equ-0-1-RND8942_1 | 4144310 | 139859 | 14 Feb 2013 12:22:28 UTC | 14 Feb 2013 12:28:57 UTC | Error while computing | 2.26 | 0.08 | --- | ACEMD beta version v6.48 (cuda42) |
| trypsin_lig_933_run4-NOELIA_RL3_equ-0-1-RND6668_1 | 4144329 | 139859 | 14 Feb 2013 11:59:09 UTC | 14 Feb 2013 12:04:48 UTC | Error while computing | 2.29 | 0.06 | --- | ACEMD beta version v6.48 (cuda42) |
| trypsin_lig_912_run3-NOELIA_RL3_equ-0-1-RND2352_2 | 4144238 | 139859 | 14 Feb 2013 12:16:50 UTC | 14 Feb 2013 12:22:28 UTC | Error while computing | 2.24 | 0.08 | --- | ACEMD beta version v6.48 (cuda42) |
| trypsin_lig_900_run3-NOELIA_RL3_equ-0-1-RND4793_2 | 4144189 | 139265 | 14 Feb 2013 11:45:48 UTC | 14 Feb 2013 11:47:38 UTC | Error while computing | 2.06 | 0.05 | --- | ACEMD beta version v6.48 (cuda42) |
| trypsin_lig_916_run3-NOELIA_RL3_equ-0-1-RND4035_2 | 4144255 | 139859 | 14 Feb 2013 11:46:58 UTC | 14 Feb 2013 11:52:44 UTC | Error while computing | 2.21 | 0.09 | --- | ACEMD beta version v6.48 (cuda42) |
| trypsin_lig_900_run2-NOELIA_RL3_equ-0-1-RND3255_2 | 4144188 | 139859 | 14 Feb 2013 11:41:13 UTC | 14 Feb 2013 11:46:58 UTC | Error while computing | 2.20 | 0.05 | --- | ACEMD beta version v6.48 (cuda42) |
| trypsin_lig_905_run3-NOELIA_RL3_equ-0-1-RND5342_2 | 4144209 | 139265 | 14 Feb 2013 11:03:01 UTC | 14 Feb 2013 11:03:51 UTC | Error while computing | 2.07 | 0.06 | --- | ACEMD beta version v6.48 (cuda42) |
| trypsin_lig_905_run2-NOELIA_RL3_equ-0-1-RND6964_2 | 4144208 | 139265 | 14 Feb 2013 11:03:01 UTC | 14 Feb 2013 11:03:51 UTC | Error while computing | 2.11 | 0.05 | --- | ACEMD beta version v6.48 (cuda42) |

I would suggest that anyone else seeing numerous errors stop running the Betas for a while. Stick to the Long and/or Short tasks, and after you have completed a few, try the odd Beta again.
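With a dozen failures like these, it can help to separate tasks that die within seconds from genuine mid-run crashes. A minimal sketch, assuming the (work unit ID, host ID, run time) triples have been copied by hand from the error list above; the 10-second cutoff and the `quick_failures_per_host` helper are hypothetical, not something GPUGRID or BOINC provides.

```python
from collections import Counter

# (work unit ID, host ID, run time in seconds), copied from the error list above.
FAILED_TASKS = [
    (4144364, 139265, 2.07),
    (4144359, 139265, 2.07),
    (4144363, 139859, 2.35),
    (4144232, 139265, 2.11),
    (4144310, 139859, 2.26),
    (4144329, 139859, 2.29),
    (4144238, 139859, 2.24),
    (4144189, 139265, 2.06),
    (4144255, 139859, 2.21),
    (4144188, 139859, 2.20),
    (4144209, 139265, 2.07),
    (4144208, 139265, 2.11),
]

# Hypothetical cutoff: a task that stops this early never really started computing.
QUICK_FAIL_SECONDS = 10.0


def quick_failures_per_host(tasks, threshold=QUICK_FAIL_SECONDS):
    """Count, per host, the tasks whose run time was below `threshold` seconds."""
    return Counter(host for _wu, host, run_time in tasks if run_time < threshold)


print(quick_failures_per_host(FAILED_TASKS))  # -> Counter({139265: 6, 139859: 6})
```

Every failure in that list ends after roughly 2 seconds on both hosts, which is the pattern that points away from overheating or driver crashes.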
Joined: 11 Jul 09; Posts: 1639; Credit: 10,159,968,649; RAC: 326,008
Tried a few as confirmation, with the same result: 12 errors in a row (Beta tasks for host 132158). But it must be a data error: you can see that the host has over 100 valid tasks, all done last weekend after the call went out to clear the queue so that proper application testing could resume. At least these tasks weren't of the crashing/BSODing kind.
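The "it must be a data error" reasoning can be made explicit: if the same workunit fails on several unrelated hosts, its input data is a more likely culprit than any one machine. A rough sketch of that heuristic, using the three failed tasks from the work-unit listing earlier in the thread (hosts 30790, 126506 and 139265) plus one hypothetical workunit; the labels `listed_wu` and `other_wu`, the helper name and the two-host cutoff are all assumptions.

```python
from collections import defaultdict


def suspect_data_errors(failures, min_hosts=2):
    """Return the work units that failed on at least `min_hosts` distinct hosts.

    `failures` is an iterable of (work unit label, host ID) pairs for errored tasks.
    """
    hosts_by_wu = defaultdict(set)
    for wu, host in failures:
        hosts_by_wu[wu].add(host)
    return sorted(wu for wu, hosts in hosts_by_wu.items() if len(hosts) >= min_hosts)


failures = [
    # The three failed tasks from the work-unit listing earlier in the thread.
    ("listed_wu", 30790),
    ("listed_wu", 126506),
    ("listed_wu", 139265),
    # Hypothetical work unit that has so far failed on a single host only.
    ("other_wu", "some_host"),
]
print(suspect_data_errors(failures))  # -> ['listed_wu']
```

A host with a long run of valid results, like the one above, combined with workunits that fail everywhere, is the strongest version of this signal.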
Joined: 25 May 09; Posts: 224; Credit: 34,057,374,498; RAC: 0
Thought I'd dip a toe back into the Beta testing pool, but I'm getting 'No beta tasks available'. Is it Windows only?
Joined: 23 Apr 09; Posts: 3968; Credit: 1,995,359,260; RAC: 0
|
Joined: 9 Dec 08; Posts: 1006; Credit: 5,068,599; RAC: 0
Hi, a subset of the betas did indeed have a problem that made them fail immediately. We devised a way to selectively remove individual unsent tasks and cancelled them, so many should have disappeared from the queue; those already downloaded will disappear gradually.
Joined: 23 Apr 09; Posts: 3968; Credit: 1,995,359,260; RAC: 0
By 'disappear gradually' I presume you mean they will fail, get resent, fail, get resent, fail and then be cancelled. If it weren't for the stubborn scheduler, the 2-second runtime wouldn't be such an issue.

Anyway, I've been running a few again and they are not failing. However, the other issues persist. Of note is the dependence on high CPU kernel time: at 85% CPU usage I was seeing 10% GPU usage, and on another system with only 50% CPU usage (but high kernel usage) I saw only 2% GPU utilization. Another app was hogging kernel time and memory, and GPU utilization went up to 50% when I suspended it.
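To reproduce GPU-utilization readings like these on NVIDIA hardware, one option is polling `nvidia-smi` with its query interface. A minimal sketch, assuming `nvidia-smi` is installed and on the PATH; the helper names are hypothetical, and the parser is kept separate so it can be checked without a GPU.

```python
import subprocess

# nvidia-smi query flags; assumes the NVIDIA driver tools are installed and on PATH.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_utilization(csv_text: str) -> dict[int, int]:
    """Map GPU index -> utilization in percent, from lines such as "0, 10".

    GPUs that report "[N/A]" instead of a number would need extra handling.
    """
    usage = {}
    for line in csv_text.strip().splitlines():
        index, util = (field.strip() for field in line.split(","))
        usage[int(index)] = int(util)
    return usage


def sample_utilization() -> dict[int, int]:
    """Run nvidia-smi once and return the current utilization of each GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_utilization(result.stdout)
```

Calling `sample_utilization()` every few seconds, first with the kernel-hungry app running and then with it suspended, gives the kind of before/after comparison described above.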
Joined: 23 Apr 09; Posts: 3968; Credit: 1,995,359,260; RAC: 0
trypsin_lig_901_run1-NOELIA_RL3_equ-0-1-RND1273_7
errors: Too many errors (may have bug)

All the same 2-second errors: http://www.gpugrid.net/workunit.php?wuid=4144191
Joined: 28 Apr 11; Posts: 462; Credit: 958,266,958; RAC: 31,461
As of today, only 1 user has connected to the short cuda31 queue in the last 24 hours (per the server stats). I'm proud to say I'm that lonely guy ;) So I need at least 3 more days of 24-hour crunching to clear this queue (~4 hours per WU). That's just a rough estimate of when the admin staff can deactivate it, and the problems with queue selection on some computers should go away then ;)
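The estimate is plain throughput arithmetic: at ~4 hours per WU, one GPU crunching around the clock finishes 24 / 4 = 6 WUs per day, the same rate quoted earlier in the thread for a GTX 285. A sketch of that calculation; the post does not give the queue size, so `remaining_wus` is a hypothetical input and `days_to_clear` a hypothetical helper.

```python
import math


def days_to_clear(remaining_wus: int, hours_per_wu: float = 4.0,
                  hours_per_day: float = 24.0) -> int:
    """Whole days one GPU needs to finish `remaining_wus`, crunching `hours_per_day`."""
    wus_per_day = hours_per_day / hours_per_wu  # 24 / 4 = 6 WUs per day
    return math.ceil(remaining_wus / wus_per_day)


print(days_to_clear(18))  # -> 3
```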
Joined: 28 Apr 11; Posts: 462; Credit: 958,266,958; RAC: 31,461
Hmm, OK, GPUGRID doesn't send me any more tasks from the cuda31 queue. Strange. Who is supposed to compute them now? O.o
©2025 Universitat Pompeu Fabra