More tasks: MDAD*

Ian&Steve C.

Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Level
Trp
Message 54851 - Posted: 21 May 2020, 22:19:25 UTC

80-90% of the tasks I'm downloading are bombing out. Setting NNT (No New Tasks) until it calms down. All of the errors are kicking me into long backoffs and just wasting time.
ID: 54851 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 326,008
Level
Trp
Message 54852 - Posted: 21 May 2020, 22:32:31 UTC

What's more, I'm starting to get resends of the tasks which couldn't negotiate an SSL server name earlier today, and need manual tweaking to download. Night-time here, and I don't want to stop BOINC to muck about, because they're on machines with mixed GPUs and can't be relied on to restart on the right card.

They'll just have to wait it out overnight and I'll sort them out in the morning.
ID: 54852 · Rating: 0
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 960
Level
Trp
Message 54861 - Posted: 22 May 2020, 4:49:32 UTC - in response to Message 54848.  

... seems WUs are slightly shaky

I've had 3 faulty ones last night, all with "195 (0xc3) EXIT_CHILD_FAILED":

http://www.gpugrid.net/result.php?resultid=25128849
http://www.gpugrid.net/result.php?resultid=25125004
http://www.gpugrid.net/result.php?resultid=25082116
ID: 54861 · Rating: 0
Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Message 54862 - Posted: 22 May 2020, 7:22:25 UTC - in response to Message 54861.  

Confirmed. About 10% of the tasks were created with a missing file, which makes them crash on startup. I'm figuring out the best course of action.
ID: 54862 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 326,008
Level
Trp
Message 54863 - Posted: 22 May 2020, 7:48:40 UTC - in response to Message 54862.  

Confirmed. About 10% of the tasks were created with a missing file, which makes them crash on startup. I'm figuring out the best course of action.

OK, so long as you know - I'll carry on burning them off as quickly as I can ;-)

You're going to have a bit of an extra bandwidth bill this month for us downloading the files that were created.
ID: 54863 · Rating: 0
Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Message 54864 - Posted: 22 May 2020, 8:07:10 UTC - in response to Message 54863.  

I'm cancelling them.
ID: 54864 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 326,008
Level
Trp
Message 54865 - Posted: 22 May 2020, 8:28:57 UTC - in response to Message 54864.  

I'm cancelling them.

And it's working. All GPUs are either running productive work, or have viable tasks waiting to run after backup projects have finished. Thank you.
ID: 54865 · Rating: 0
BladeD

Joined: 1 May 11
Posts: 9
Credit: 144,358,529
RAC: 0
Level
Cys
Message 54867 - Posted: 22 May 2020, 9:37:19 UTC

5/22/2020 4:37:15 AM | GPUGRID | Started download of 1a5cA00_379_0-TONI_MDADex7sa-0-pdb_file
5/22/2020 4:37:16 AM | | Project communication failed: attempting access to reference site
5/22/2020 4:37:16 AM | GPUGRID | Temporarily failed download of 1a5cA00_379_0-TONI_MDADex7sa-0-pdb_file: transient HTTP error
5/22/2020 4:37:16 AM | GPUGRID | Backing off 04:46:52 on download of 1a5cA00_379_0-TONI_MDADex7sa-0-pdb_file
5/22/2020 4:37:17 AM | | Internet access OK - project servers may be temporarily down.
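
For anyone who wants to see how long their downloads are being held back, here is a minimal Python sketch that pulls the backoff delays out of event log text like the lines above. The function name `parse_backoffs` and the sample string are illustrative assumptions, not part of BOINC; the pattern only assumes the "Backing off HH:MM:SS on download of <file>" wording shown in this log.

```python
import re
from datetime import timedelta

# Matches the "Backing off HH:MM:SS on download of <file>" wording seen in the log above.
# Assumption: other client versions may word this message differently.
BACKOFF_RE = re.compile(r"Backing off (\d+):(\d{2}):(\d{2}) on download of (\S+)")


def parse_backoffs(log_text):
    """Return a list of (file_name, timedelta) pairs, one per download backoff line."""
    results = []
    for line in log_text.splitlines():
        match = BACKOFF_RE.search(line)
        if match:
            hours, minutes, seconds = (int(g) for g in match.group(1, 2, 3))
            delay = timedelta(hours=hours, minutes=minutes, seconds=seconds)
            results.append((match.group(4), delay))
    return results


# Two lines copied from the log above, used as sample input.
SAMPLE_LOG = """\
5/22/2020 4:37:16 AM | GPUGRID | Temporarily failed download of 1a5cA00_379_0-TONI_MDADex7sa-0-pdb_file: transient HTTP error
5/22/2020 4:37:16 AM | GPUGRID | Backing off 04:46:52 on download of 1a5cA00_379_0-TONI_MDADex7sa-0-pdb_file
"""

if __name__ == "__main__":
    for name, delay in parse_backoffs(SAMPLE_LOG):
        print(f"{name}: retry in {int(delay.total_seconds())} s")
```

For the sample lines this reports a delay of 17212 seconds, which is the 04:46:52 shown in the log.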
ID: 54867 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 326,008
Level
Trp
Message 54868 - Posted: 22 May 2020, 9:48:39 UTC - in response to Message 54867.  

It often happens here - the project server is very busy, and I think constrained for bandwidth. Wait a couple of minutes and try again.
ID: 54868 · Rating: 0
Zirma

Joined: 21 Apr 20
Posts: 13
Credit: 4,411,884
RAC: 0
Level
Ala
Message 54869 - Posted: 22 May 2020, 10:09:31 UTC

23 of 23 don't work

(Columns: task name, work unit ID, computer ID, sent, time reported, status, run time (sec), CPU time (sec), credit, application)
1ezgA00_320_4-TONI_MDADex1se-0-50-RND5032_5 20179781 543598 22 May 2020 | 7:57:25 UTC 22 May 2020 | 8:47:14 UTC Error while computing 5.30 0.00 --- New version of ACEMD v2.10 (cuda101)
1ev0A00_450_3-TONI_MDADex1se-0-50-RND6497_4 20179280 543598 22 May 2020 | 7:22:19 UTC 22 May 2020 | 7:24:03 UTC Error while computing 6.17 0.00 --- New version of ACEMD v2.10 (cuda101)
1eu3A02_348_4-TONI_MDADex1se-0-50-RND0288_5 20179102 543598 22 May 2020 | 7:20:15 UTC 22 May 2020 | 7:22:19 UTC Error while computing 6.17 0.02 --- New version of ACEMD v2.10 (cuda101)
1etb200_348_4-TONI_MDADex1se-0-50-RND4587_5 20178954 543598 22 May 2020 | 7:18:06 UTC 22 May 2020 | 7:20:15 UTC Error while computing 6.16 0.00 --- New version of ACEMD v2.10 (cuda101)
1e8gA04_379_2-TONI_MDADex1se-0-50-RND1569_7 20176164 543598 22 May 2020 | 6:56:28 UTC 22 May 2020 | 7:18:06 UTC Error while computing 6.38 0.00 --- New version of ACEMD v2.10 (cuda101)
1e8gA04_450_3-TONI_MDADex1se-0-50-RND0934_6 20176186 543598 22 May 2020 | 6:06:23 UTC 22 May 2020 | 6:08:27 UTC Error while computing 6.25 0.02 --- New version of ACEMD v2.10 (cuda101)
1ba5A00_413_3-TONI_MDADex1sb-0-50-RND8087_6 20163061 543598 22 May 2020 | 6:04:55 UTC 22 May 2020 | 6:06:23 UTC Error while computing 7.15 0.00 --- New version of ACEMD v2.10 (cuda101)
1eb4A02_413_2-TONI_MDADex1se-0-50-RND9618_7 20176742 543598 22 May 2020 | 6:02:37 UTC 22 May 2020 | 6:04:55 UTC Error while computing 9.20 0.02 --- New version of ACEMD v2.10 (cuda101)
1encA00_320_0-TONI_MDADex1se-0-50-RND5173_2 20178305 543598 22 May 2020 | 6:00:23 UTC 22 May 2020 | 6:02:37 UTC Error while computing 6.12 0.00 --- New version of ACEMD v2.10 (cuda101)
1edqA03_348_4-TONI_MDADex1se-0-50-RND8101_5 20177101 543598 22 May 2020 | 5:58:59 UTC 22 May 2020 | 6:00:23 UTC Error while computing 6.07 0.00 --- New version of ACEMD v2.10 (cuda101)
1e8uA00_450_4-TONI_MDADex1se-0-50-RND7534_2 20176331 543598 22 May 2020 | 5:55:17 UTC 22 May 2020 | 5:56:55 UTC Error while computing 7.13 0.00 --- New version of ACEMD v2.10 (cuda101)
1ej6A01_450_0-TONI_MDADex1se-0-50-RND9222_4 20177970 543598 22 May 2020 | 5:53:54 UTC 22 May 2020 | 5:55:17 UTC Error while computing 6.55 0.00 --- New version of ACEMD v2.10 (cuda101)
1e5wA04_413_4-TONI_MDADex1se-0-50-RND9110_7 20175698 543598 22 May 2020 | 5:52:10 UTC 22 May 2020 | 5:53:54 UTC Error while computing 6.57 0.02 --- New version of ACEMD v2.10 (cuda101)
1efpB00_413_3-TONI_MDADex1se-0-50-RND6574_6 20177356 543598 22 May 2020 | 5:50:31 UTC 22 May 2020 | 5:52:10 UTC Error while computing 5.85 0.00 --- New version of ACEMD v2.10 (cuda101)
1e20A00_320_4-TONI_MDADex1se-0-50-RND7445_6 20175171 543598 22 May 2020 | 5:48:44 UTC 22 May 2020 | 5:50:31 UTC Error while computing 6.53 0.02 --- New version of ACEMD v2.10 (cuda101)
1e8uA00_450_0-TONI_MDADex1se-0-50-RND6268_2 20176319 543598 22 May 2020 | 5:46:13 UTC 22 May 2020 | 5:48:44 UTC Error while computing 6.37 0.00 --- New version of ACEMD v2.10 (cuda101)
1e7lA02_450_4-TONI_MDADex1se-0-50-RND8816_6 20175989 543598 22 May 2020 | 5:43:58 UTC 22 May 2020 | 5:46:13 UTC Error while computing 30.73 0.66 --- New version of ACEMD v2.10 (cuda101)
1e6dM01_379_2-TONI_MDADex1se-0-50-RND3537_5 20175771 543598 22 May 2020 | 5:42:03 UTC 22 May 2020 | 5:43:58 UTC Error while computing 6.23 0.00 --- New version of ACEMD v2.10 (cuda101)
1ej5A00_413_0-TONI_MDADex1se-0-50-RND4767_1 20177877 543598 22 May 2020 | 5:40:10 UTC 22 May 2020 | 5:42:03 UTC Error while computing 6.15 0.00 --- New version of ACEMD v2.10 (cuda101)
1e6vA03_348_1-TONI_MDADex1se-0-50-RND5457_5 20175828 543598 22 May 2020 | 5:38:16 UTC 22 May 2020 | 5:40:10 UTC Error while computing 6.56 0.00 --- New version of ACEMD v2.10 (cuda101)
ID: 54869 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 326,008
Level
Trp
Message 54870 - Posted: 22 May 2020, 10:11:31 UTC - in response to Message 54869.  

Read the older posts in this thread. There was a problem, but it's over - they've been cancelled.
ID: 54870 · Rating: 0
Zirma

Joined: 21 Apr 20
Posts: 13
Credit: 4,411,884
RAC: 0
Level
Ala
Message 54871 - Posted: 22 May 2020, 10:14:20 UTC - in response to Message 54870.  

Cancelled? I still get them.

1eb6A00_450_3-TONI_MDADex1se-0-50-RND7847_4 20176835 543598 22 May 2020 | 5:20:15 UTC 22 May 2020 | 5:38:16 UTC Error while computing 6.38 0.00 --- New version of ACEMD v2.10 (cuda101)
1eokA00_379_1-TONI_MDADex1se-0-50-RND6186_0 20178430 543598 22 May 2020 | 5:56:55 UTC 22 May 2020 | 5:58:59 UTC Error while computing 6.91 0.00 --- New version of ACEMD v2.10 (cuda101)
1a8oA00_348_3-TONI_MDADpr4sa-9-10-RND7509_0 20009637 543598 11 May 2020 | 3:23:47 UTC 11 May 2020 | 7:04:15 UTC Error while computing 6.10 0.02 --- New version of ACEMD v2.10 (cuda101)

Not only ...ex1..., even pr4sa...
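
To check which batches the failures belong to, a rough Python sketch like the one below can tally pasted task rows by the batch tag in the task name. It assumes the tag is the `MDAD...` part that follows `TONI_`, which is inferred from the names above rather than from any project documentation; `failures_by_batch` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed task-name layout, inferred from the rows above:
#   <structure>-TONI_<batch>-<n>-<m>-RND<digits>_<digit>
TASK_RE = re.compile(r"^\S+?-TONI_(MDAD\w+)-\d+-\d+-RND\d+_\d+\s")


def failures_by_batch(rows):
    """Count rows whose status is 'Error while computing', grouped by batch tag."""
    counts = Counter()
    for row in rows:
        match = TASK_RE.match(row)
        if match and "Error while computing" in row:
            counts[match.group(1)] += 1
    return counts


# The three rows quoted in this post, used as sample input.
SAMPLE_ROWS = [
    "1eb6A00_450_3-TONI_MDADex1se-0-50-RND7847_4 20176835 543598 22 May 2020 | 5:20:15 UTC 22 May 2020 | 5:38:16 UTC Error while computing 6.38 0.00 --- New version of ACEMD v2.10 (cuda101)",
    "1eokA00_379_1-TONI_MDADex1se-0-50-RND6186_0 20178430 543598 22 May 2020 | 5:56:55 UTC 22 May 2020 | 5:58:59 UTC Error while computing 6.91 0.00 --- New version of ACEMD v2.10 (cuda101)",
    "1a8oA00_348_3-TONI_MDADpr4sa-9-10-RND7509_0 20009637 543598 11 May 2020 | 3:23:47 UTC 11 May 2020 | 7:04:15 UTC Error while computing 6.10 0.02 --- New version of ACEMD v2.10 (cuda101)",
]

if __name__ == "__main__":
    for batch, count in failures_by_batch(SAMPLE_ROWS).most_common():
        print(f"{batch}: {count} failed")
```

On the three rows above this counts two failures for `MDADex1se` and one for `MDADpr4sa`.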
ID: 54871 · Rating: 0
tullio

Joined: 8 May 18
Posts: 190
Credit: 104,426,808
RAC: 0
Level
Cys
Message 54872 - Posted: 22 May 2020, 10:39:56 UTC

Tasks 0-50 seem to work right now. I had 36 failures.
Tullio
ID: 54872 · Rating: 0
Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Message 54873 - Posted: 22 May 2020, 10:42:22 UTC - in response to Message 54871.  

Cancellation is always flaky. Let them wither.
ID: 54873 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 326,008
Level
Trp
Message 54874 - Posted: 22 May 2020, 11:37:24 UTC - in response to Message 54871.  

Cancelled? I still get them.

You got them - past tense. You had already returned them before Toni got into the office this morning and started thinking about what to do.
ID: 54874 · Rating: 0
Zirma

Joined: 21 Apr 20
Posts: 13
Credit: 4,411,884
RAC: 0
Level
Ala
Message 54875 - Posted: 22 May 2020, 11:51:06 UTC - in response to Message 54874.  

Cancelled? I still get them.

You got them - past tense. You had already returned them before Toni got into the office this morning and started thinking about what to do.


Yes, the first 20 WUs, but I read that he had stopped the bad ones, and I got at least 3 bad ones after that. I didn't know there was some latency when he stopped them. But now it looks good and is working fine, so there's no problem. (I posted the work list only so he could see whether there was a system error on some work that he had not noticed. I know you are all working hard on it.) No hard feelings. Thank you for all your support.
ID: 54875 · Rating: 0
Pop Piasa

Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Message 54887 - Posted: 22 May 2020, 21:50:34 UTC

So far I'm batting .500 on this batch.
54 successes and 54 'exceptional condition' errors.

🤔 I'm curious if we are purposely "pushing the envelope" here, Toni. From under my rock, it looks like we're exploring the outer boundaries of the acemd3 program's viability.
ID: 54887 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 326,008
Level
Trp
Message 54888 - Posted: 22 May 2020, 22:01:58 UTC - in response to Message 54862.  

Confirmed. About 10% of the tasks were created with a missing file, which makes them crash on startup. I'm figuring out the best course of action.

Just pulling forward what Toni has already written in this thread. The only envelope we're pushing is that of one very tired researcher, who - like all of us - makes mistakes from time to time.
ID: 54888 · Rating: 0
robertmiles

Joined: 16 Apr 09
Posts: 503
Credit: 769,991,668
RAC: 0
Level
Glu
Message 54889 - Posted: 22 May 2020, 22:11:37 UTC
Last modified: 22 May 2020, 22:15:42 UTC

I've only had one task since the latest changes.

https://www.gpugrid.net/result.php?resultid=25200010

Its output showed some dump sections, but it appears to have downloaded, run, and uploaded correctly otherwise. Marked as Valid.
ID: 54889 · Rating: 0
Pop Piasa

Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Message 54892 - Posted: 22 May 2020, 23:17:27 UTC
Last modified: 22 May 2020, 23:42:26 UTC

Grosso appears to be feeling the strain of so many WUs failing and hosts requesting replacement downloads. Things are pretty slow on my end, only one host at a time getting anything, and that download is intermittent.

It looks to me like the shutting down of SETI@home triggered an unexpected hardware bottleneck for many other projects.

I wonder if a policy of 2 'spares' per GPU might alleviate this some.
ID: 54892 · Rating: 0

©2025 Universitat Pompeu Fabra