More tasks: MDAD*
Author | Message |
---|---|
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,348,595 RAC: 4,765,598 |
80-90% of what I'm downloading is bombing out. Setting NNT (No New Tasks) until it calms down. All of the errors are kicking me into long backoffs and just wasting time. |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 326,008 |
What's more, I'm starting to get resends of the tasks which couldn't negotiate an SSL server name earlier today, and need manual tweaking to download. Night-time here, and I don't want to stop BOINC to muck about, because they're on machines with mixed GPUs and can't be relied on to restart on the right card. They'll just have to wait it out overnight and I'll sort them out in the morning. |
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 960 |
... seems WUs are slightly shaky. I had 3 faulty ones last night, all with "195 (0xc3) EXIT_CHILD_FAILED": http://www.gpugrid.net/result.php?resultid=25128849 http://www.gpugrid.net/result.php?resultid=25125004 http://www.gpugrid.net/result.php?resultid=25082116 |
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
Confirmed. About 10% of the tasks were created with a missing file, which makes them crash on startup. I'm figuring out the best course of action. |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 326,008 |
"Confirmed. About 10% of the tasks were created with a missing file, which makes them crash on startup. I'm figuring out the best course of action." OK, so long as you know - I'll carry on burning them off as quickly as I can ;-) You're going to have a bit of an extra bandwidth bill this month for us downloading the files that were created. |
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
I'm cancelling them. |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 326,008 |
"I'm cancelling them." And it's working. All GPUs are either running productive work, or have viable tasks waiting to run after backup projects have finished. Thank you. |
Joined: 1 May 11 Posts: 9 Credit: 144,358,529 RAC: 0 |
5/22/2020 4:37:15 AM | GPUGRID | Started download of 1a5cA00_379_0-TONI_MDADex7sa-0-pdb_file |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 326,008 |
It often happens here - the project server is very busy, and I think constrained for bandwidth. Wait a couple of minutes and try again. |
Joined: 21 Apr 20 Posts: 13 Credit: 4,411,884 RAC: 0 |
23 of 23 don't work:
Task; Work unit; Computer; Sent; Reported; Status; Run time (s); CPU time (s); Credit; Application
1ezgA00_320_4-TONI_MDADex1se-0-50-RND5032_5; 20179781; 543598; 22 May 2020, 7:57:25 UTC; 22 May 2020, 8:47:14 UTC; Error while computing; 5.30; 0.00; ---; New version of ACEMD v2.10 (cuda101)
1ev0A00_450_3-TONI_MDADex1se-0-50-RND6497_4; 20179280; 543598; 22 May 2020, 7:22:19 UTC; 22 May 2020, 7:24:03 UTC; Error while computing; 6.17; 0.00; ---; New version of ACEMD v2.10 (cuda101)
1eu3A02_348_4-TONI_MDADex1se-0-50-RND0288_5; 20179102; 543598; 22 May 2020, 7:20:15 UTC; 22 May 2020, 7:22:19 UTC; Error while computing; 6.17; 0.02; ---; New version of ACEMD v2.10 (cuda101)
1etb200_348_4-TONI_MDADex1se-0-50-RND4587_5; 20178954; 543598; 22 May 2020, 7:18:06 UTC; 22 May 2020, 7:20:15 UTC; Error while computing; 6.16; 0.00; ---; New version of ACEMD v2.10 (cuda101)
1e8gA04_379_2-TONI_MDADex1se-0-50-RND1569_7; 20176164; 543598; 22 May 2020, 6:56:28 UTC; 22 May 2020, 7:18:06 UTC; Error while computing; 6.38; 0.00; ---; New version of ACEMD v2.10 (cuda101)
1e8gA04_450_3-TONI_MDADex1se-0-50-RND0934_6; 20176186; 543598; 22 May 2020, 6:06:23 UTC; 22 May 2020, 6:08:27 UTC; Error while computing; 6.25; 0.02; ---; New version of ACEMD v2.10 (cuda101)
1ba5A00_413_3-TONI_MDADex1sb-0-50-RND8087_6; 20163061; 543598; 22 May 2020, 6:04:55 UTC; 22 May 2020, 6:06:23 UTC; Error while computing; 7.15; 0.00; ---; New version of ACEMD v2.10 (cuda101)
1eb4A02_413_2-TONI_MDADex1se-0-50-RND9618_7; 20176742; 543598; 22 May 2020, 6:02:37 UTC; 22 May 2020, 6:04:55 UTC; Error while computing; 9.20; 0.02; ---; New version of ACEMD v2.10 (cuda101)
1encA00_320_0-TONI_MDADex1se-0-50-RND5173_2; 20178305; 543598; 22 May 2020, 6:00:23 UTC; 22 May 2020, 6:02:37 UTC; Error while computing; 6.12; 0.00; ---; New version of ACEMD v2.10 (cuda101)
1edqA03_348_4-TONI_MDADex1se-0-50-RND8101_5; 20177101; 543598; 22 May 2020, 5:58:59 UTC; 22 May 2020, 6:00:23 UTC; Error while computing; 6.07; 0.00; ---; New version of ACEMD v2.10 (cuda101)
1e8uA00_450_4-TONI_MDADex1se-0-50-RND7534_2; 20176331; 543598; 22 May 2020, 5:55:17 UTC; 22 May 2020, 5:56:55 UTC; Error while computing; 7.13; 0.00; ---; New version of ACEMD v2.10 (cuda101)
1ej6A01_450_0-TONI_MDADex1se-0-50-RND9222_4; 20177970; 543598; 22 May 2020, 5:53:54 UTC; 22 May 2020, 5:55:17 UTC; Error while computing; 6.55; 0.00; ---; New version of ACEMD v2.10 (cuda101)
1e5wA04_413_4-TONI_MDADex1se-0-50-RND9110_7; 20175698; 543598; 22 May 2020, 5:52:10 UTC; 22 May 2020, 5:53:54 UTC; Error while computing; 6.57; 0.02; ---; New version of ACEMD v2.10 (cuda101)
1efpB00_413_3-TONI_MDADex1se-0-50-RND6574_6; 20177356; 543598; 22 May 2020, 5:50:31 UTC; 22 May 2020, 5:52:10 UTC; Error while computing; 5.85; 0.00; ---; New version of ACEMD v2.10 (cuda101)
1e20A00_320_4-TONI_MDADex1se-0-50-RND7445_6; 20175171; 543598; 22 May 2020, 5:48:44 UTC; 22 May 2020, 5:50:31 UTC; Error while computing; 6.53; 0.02; ---; New version of ACEMD v2.10 (cuda101)
1e8uA00_450_0-TONI_MDADex1se-0-50-RND6268_2; 20176319; 543598; 22 May 2020, 5:46:13 UTC; 22 May 2020, 5:48:44 UTC; Error while computing; 6.37; 0.00; ---; New version of ACEMD v2.10 (cuda101)
1e7lA02_450_4-TONI_MDADex1se-0-50-RND8816_6; 20175989; 543598; 22 May 2020, 5:43:58 UTC; 22 May 2020, 5:46:13 UTC; Error while computing; 30.73; 0.66; ---; New version of ACEMD v2.10 (cuda101)
1e6dM01_379_2-TONI_MDADex1se-0-50-RND3537_5; 20175771; 543598; 22 May 2020, 5:42:03 UTC; 22 May 2020, 5:43:58 UTC; Error while computing; 6.23; 0.00; ---; New version of ACEMD v2.10 (cuda101)
1ej5A00_413_0-TONI_MDADex1se-0-50-RND4767_1; 20177877; 543598; 22 May 2020, 5:40:10 UTC; 22 May 2020, 5:42:03 UTC; Error while computing; 6.15; 0.00; ---; New version of ACEMD v2.10 (cuda101)
1e6vA03_348_1-TONI_MDADex1se-0-50-RND5457_5; 20175828; 543598; 22 May 2020, 5:38:16 UTC; 22 May 2020, 5:40:10 UTC; Error while computing; 6.56; 0.00; ---; New version of ACEMD v2.10 (cuda101) |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 326,008 |
Read the older posts in this thread. There was a problem, but it's over - they've been cancelled. |
Joined: 21 Apr 20 Posts: 13 Credit: 4,411,884 RAC: 0 |
Cancelled? I still get them:
1eb6A00_450_3-TONI_MDADex1se-0-50-RND7847_4; 20176835; 543598; 22 May 2020, 5:20:15 UTC; 22 May 2020, 5:38:16 UTC; Error while computing; 6.38; 0.00; ---; New version of ACEMD v2.10 (cuda101)
1eokA00_379_1-TONI_MDADex1se-0-50-RND6186_0; 20178430; 543598; 22 May 2020, 5:56:55 UTC; 22 May 2020, 5:58:59 UTC; Error while computing; 6.91; 0.00; ---; New version of ACEMD v2.10 (cuda101)
1a8oA00_348_3-TONI_MDADpr4sa-9-10-RND7509_0; 20009637; 543598; 11 May 2020, 3:23:47 UTC; 11 May 2020, 7:04:15 UTC; Error while computing; 6.10; 0.02; ---; New version of ACEMD v2.10 (cuda101)
Not only ...ex1..., even pr4sa... |
Joined: 8 May 18 Posts: 190 Credit: 104,426,808 RAC: 0 |
Tasks 0-50 seem to work right now. I had 36 failures. Tullio |
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
Cancellation is always flaky. Let them wither. |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 326,008 |
"Cancelled? I still get them" You got them - past tense. You had already returned them before Toni got into the office this morning and started thinking about what to do. |
Joined: 21 Apr 20 Posts: 13 Credit: 4,411,884 RAC: 0 |
"Cancelled? I still get them" Yes, the first 20 WUs, but I read that he had stopped the bad ones, and I still got at least 3 bad ones afterwards. I didn't know there was some latency when he stopped them. But now it looks good and is working fine, so there's no problem. (I posted the work list only so he could see whether there was a system error on some work that he hadn't noticed. I know you are all working hard on it.) No hard feelings. Thank you for all your support. |
Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0 |
So far I'm batting .500 on this batch: 54 successes and 54 'exceptional condition' errors. 🤔 I'm curious if we are purposely "pushing the envelope" here, Toni. It looks like we're exploring the outer boundaries of the acemd3 program viability from under my rock. |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 326,008 |
"Confirmed. About 10% of the tasks were created with a missing file, which makes them crash on startup. I'm figuring out the best course of action." Just pulling forward what Toni has already written in this thread. The only envelope we're pushing is that of one very tired researcher, who - like all of us - makes mistakes from time to time. |
Joined: 16 Apr 09 Posts: 503 Credit: 769,991,668 RAC: 0 |
I've only had one task since the latest changes. https://www.gpugrid.net/result.php?resultid=25200010 Its output showed some dump sections, but it appears to have downloaded, run, and uploaded correctly otherwise. Marked as Valid. |
Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0 |
Grosso appears to be feeling the strain of so many WUs failing and hosts requesting replacement downloads. Things are pretty slow on my end, only one host at a time getting anything, and that download is intermittent. It looks to me like the shutting down of SETI@home triggered an unexpected hardware bottleneck for many other projects. I wonder if a policy of 2 'spares' per GPU might alleviate this some. |
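
For anyone scripting around the intermittent downloads and long backoffs mentioned in this thread, here is a minimal sketch of a capped exponential backoff with jitter. It is only an illustration of the general technique, not how the BOINC client actually schedules its retries. The names `backoff_delays`, `retry_with_backoff` and `fetch`, and the `base`, `cap` and jitter values, are all hypothetical.

```python
import random
import time
from typing import Callable, List, Optional


def backoff_delays(base: float, cap: float, attempts: int) -> List[float]:
    """Un-jittered wait before each retry: base, 2*base, 4*base, ... never above cap."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]


def retry_with_backoff(
    fetch: Callable[[], bool],
    base: float = 60.0,      # assumed first wait in seconds ("a couple of minutes")
    cap: float = 3600.0,     # assumed upper bound on any single wait
    attempts: int = 8,       # number of retries after the first try
    sleep: Optional[Callable[[float], None]] = None,
    rng: Optional[random.Random] = None,
) -> bool:
    """Call fetch() until it returns True or the retries run out.

    Each wait is drawn uniformly from [delay/2, delay] so that many hosts
    retrying at once do not all hit the server at the same moment.
    """
    sleep = sleep or time.sleep
    rng = rng or random.Random()
    if fetch():
        return True
    for delay in backoff_delays(base, cap, attempts):
        sleep(rng.uniform(0.5 * delay, delay))
        if fetch():
            return True
    return False
```

The doubling keeps a busy server from being hammered by failed hosts, and the cap stops any single wait from growing without limit. Passing `sleep` and `rng` in as parameters is just a convenience so the function can be exercised without actually waiting.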
©2025 Universitat Pompeu Fabra