Large scale experiment: MDAD

Toni (Volunteer moderator, Project administrator, Project developer, Project tester, Project scientist) · Joined: 9 Dec 08 · Posts: 1006 · Credit: 5,068,599 · RAC: 0
Message 53551 - Posted: 29 Jan 2020, 18:31:04 UTC - in response to Message 53550.
Actually they were only 500. Better this way - they came out too large. Feel free to abort them.

Zalster · Joined: 26 Feb 14 · Posts: 211 · Credit: 4,496,324,562 · RAC: 0
Message 53552 - Posted: 29 Jan 2020, 18:36:17 UTC - in response to Message 53551.
Toni wrote: Actually they were only 500. Better this way - they came out too large. Feel free to abort them.
I had 6 of them, about 4,500-4,900 s in, when the server cancelled them... Now you have me curious how long they would have run.

Erich56 · Joined: 1 Jan 15 · Posts: 1168 · Credit: 12,317,898,501 · RAC: 75,187
Message 53553 - Posted: 29 Jan 2020, 18:37:48 UTC. Last modified: 29 Jan 2020, 18:43:32 UTC
About an hour ago, two of my tasks (on two different hosts) were "aborted by project" after about 5,900 seconds:
http://www.gpugrid.net/result.php?resultid=21644737
http://www.gpugrid.net/result.php?resultid=21644681
What happened?
Edit: just now, two more tasks from the batch mentioned in Toni's message ("To come back on topic, there is a batch ('MDADpr1') of ~50k workunits being created. I hope it's correct.") were aborted by the server right after they started. What's wrong with them?

Toni (Volunteer moderator, Project administrator, Project developer, Project tester, Project scientist) · Joined: 9 Dec 08 · Posts: 1006 · Credit: 5,068,599 · RAC: 0
Message 53554 - Posted: 29 Jan 2020, 18:38:17 UTC - in response to Message 53552. Last modified: 29 Jan 2020, 18:38:57 UTC
Ok, I did not know the server would cancel running WUs. Good to know. They would have run around 6h-ish, but I was not sure they wouldn't fail at the end due to large uploads. The next test batch (MDADpr2) is out.

BladeD · Joined: 1 May 11 · Posts: 9 · Credit: 144,358,529 · RAC: 0
Message 53555 - Posted: 29 Jan 2020, 18:58:51 UTC - in response to Message 53554.
Toni wrote: Ok, I did not know the server would cancel running WUs. Good to know. They would have run around 6h-ish, but I was not sure they wouldn't fail at the end due to large uploads. The next test batch (MDADpr2) is out.
Okay, glad to see that I have the good ones!

Retvari Zoltan · Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0
Message 53558 - Posted: 30 Jan 2020, 1:18:30 UTC - in response to Message 53548. Last modified: 30 Jan 2020, 1:26:02 UTC
Quote: As long as a host turns in valid work in a timely manner, I don't think any kind of new restriction is needed. The faster hosts get more work done for the project, which should keep the scientists happy with the progress of their research.
GPUGrid differs from SETI@home in how our computers drive the research forward. At GPUGrid our hosts actually create the data the scientists analyse, while SETI@home uses pre-recorded data split into many small chunks for the hosts to process. At SETI@home the individual pieces can be processed independently, but at GPUGrid fresh workunits are generated from the result of the previous run. If your host grabs 64 workunits but actually processes only one at a time, it holds up the other 63 "chains of workunits". The more you grab, the more delay you put into the progress of the ongoing MD simulation batches.
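Retvari's point about chained workunits can be made concrete with a toy model. This is a minimal sketch under stated assumptions, not GPUGrid's actual scheduler: each chain gets its next workunit only when the previous result is returned, a host runs its cached tasks strictly one after another, and every task takes the same time (6 hours here, borrowed loosely from Toni's estimate above). The function names are hypothetical.

```python
"""Toy model (hypothetical): how caching many tasks stalls chained workunits.

Assumptions: a chain's next workunit is created only after the previous
result comes back; the host runs its cached tasks one after another; every
task takes the same runtime.
"""


def return_times(cached_tasks: int, runtime_h: float) -> list[float]:
    """Hours after download at which each cached task is returned."""
    return [(i + 1) * runtime_h for i in range(cached_tasks)]


def idle_chain_hours(cached_tasks: int, runtime_h: float) -> float:
    """Total hours the chains sit in the host's queue before their task starts."""
    # Task i (0-based) waits i * runtime_h for the tasks ahead of it.
    return sum(t - runtime_h for t in return_times(cached_tasks, runtime_h))


if __name__ == "__main__":
    for cached in (1, 2, 64):
        last = return_times(cached, 6.0)[-1]
        idle = idle_chain_hours(cached, 6.0)
        print(f"{cached:>2} cached: last chain resumes after {last:.0f} h, "
              f"{idle:.0f} idle chain-hours in total")
```

Under these assumptions, a host caching 64 tasks leaves the last chain waiting 384 hours before it can advance, and the 64 chains accumulate 12,096 idle hours between them, while a host caching a single task adds no queueing delay at all. A host with several GPUs running in parallel divides these figures, but the shape of the argument stays the same.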

Keith Myers · Joined: 13 Dec 17 · Posts: 1424 · Credit: 9,189,946,190 · RAC: 34,713
Message 53559 - Posted: 30 Jan 2020, 3:57:06 UTC - in response to Message 53554. Last modified: 30 Jan 2020, 4:38:11 UTC
Toni wrote: Ok, I did not know the server would cancel running WUs. Good to know. They would have run around 6h-ish, but I was not sure they wouldn't fail at the end due to large uploads. The next test batch (MDADpr2) is out.
The MDADpr2 tasks aren't small in their own right either: a 188 MB upload, only at 60% so far after an hour.
[Edit] Also see Toni made good on the credit re-adjustment. Now only getting a quarter of what was awarded prior for 4 times the length of processing time. https://www.gpugrid.net/workunit.php?wuid=16977060 More in line with the previous batch of work.

Erich56 · Joined: 1 Jan 15 · Posts: 1168 · Credit: 12,317,898,501 · RAC: 75,187
Message 53560 - Posted: 30 Jan 2020, 5:50:57 UTC - in response to Message 53559. Last modified: 30 Jan 2020, 6:43:19 UTC
Keith Myers wrote: [Edit] Also see Toni made good on the credit re-adjustment. Now only getting a quarter of what was awarded prior for 4 times the length of processing time.
Hm, this is the first time I've read someone complaining about credit being too high :-)

Keith Myers · Joined: 13 Dec 17 · Posts: 1424 · Credit: 9,189,946,190 · RAC: 34,713
Message 53561 - Posted: 30 Jan 2020, 8:08:35 UTC - in response to Message 53560.
Keith Myers wrote (quoted by Erich56): [Edit] Also see Toni made good on the credit re-adjustment. Now only getting a quarter of what was awarded prior for 4 times the length of processing time.
Erich56 wrote: Hm, this is the first time I've read someone complaining about credit being too high :-)
My comment was simply an observation. The discussion about how much credit different projects award belongs in another thread; it has been hashed out many times over before. Search for CreditScrew or CreditNew. Oh, where is Jeff Cobb?

robertmiles · Joined: 16 Apr 09 · Posts: 503 · Credit: 769,991,668 · RAC: 0
Message 53563 - Posted: 30 Jan 2020, 15:46:51 UTC - in response to Message 53541.
Quote: It's not just the speed. There's some DDoS prevention algorithm in operation, because my hosts get blocked if they try to contact the server one after another in rapid succession (from the same public IP address). What can we do to mitigate this effect??? OAS: Many projects are adding a Max # WUs option in Preferences. Maybe add it with a choice of 1 or 2. OAS: Bunkering for serial projects should be banned one way or another. In these "races" and "sprints", some folks request as many WUs per host as they can get but don't submit them to the work server until after the race start time, i.e. bunkering. A few days ago I triggered something on GPUGrid that I've never seen before on a BOINC project. A fluke combination of things had me upgrade my drivers but delay a reboot. It wouldn't have bothered anything else, but an unbeknownst slug of GPUGrid WUs had appeared. All those WUs had computation errors. Then both computers got banned with a Project Request. I thought it would be the 24-hour timeout I'd seen folks mention before, but it persisted for days. After a few days I tried a manual Project Update and it started working again. Can this Project Requested Ban be applied to bunkerers???
PrimeGrid has found a way to reduce bunkering: in races, count only tasks that were both downloaded and returned during the period scheduled for the race.
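The PrimeGrid rule robertmiles describes amounts to a simple filter on task timestamps. The sketch below assumes a hypothetical `Task` record with send and return times; it is not PrimeGrid's or BOINC's actual code, only an illustration of the rule as stated: a task counts only if it was both downloaded and returned inside the race window.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Task:
    """Hypothetical task record; field names are illustrative only."""
    task_id: int
    sent: datetime      # when the server sent the task to the host
    returned: datetime  # when the host reported the result
    credit: float


def in_window(moment: datetime, start: datetime, end: datetime) -> bool:
    """True if the moment falls inside the race window (inclusive)."""
    return start <= moment <= end


def race_credit(tasks: list[Task], start: datetime, end: datetime) -> float:
    """Sum credit only for tasks both sent and returned during the race."""
    return sum(
        t.credit
        for t in tasks
        if in_window(t.sent, start, end) and in_window(t.returned, start, end)
    )
```

With this rule, a bunkered task downloaded before the race starts earns nothing for the race even if it is reported during it, so caching work in advance no longer pays off.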

Erich56 · Joined: 1 Jan 15 · Posts: 1168 · Credit: 12,317,898,501 · RAC: 75,187
Message 53569 - Posted: 30 Jan 2020, 19:26:35 UTC - in response to Message 53559. Last modified: 30 Jan 2020, 20:02:59 UTC
Keith Myers wrote: Also see Toni made good on the credit re-adjustment. Now only getting a quarter of what was awarded prior for 4 times the length of processing time.
However, even now there are some unexplained differences, e.g. between the following two tasks, which ran on the same GPU (GTX 980 Ti) in the same PC:
http://www.gpugrid.net/result.php?resultid=21645452 runtime: 39,444 s, 202,525 credits
http://www.gpugrid.net/result.php?resultid=21645453 runtime: 39,899 s, 168,771 credits
Any idea why?
Edit: only now did I realize what happened: the second task missed the 24-hour limit by 1 minute 17 seconds. Hence the 20% difference in credit :-(
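Erich's 20% figure can be checked with a little arithmetic. The sketch assumes GPUGrid's return-time bonus is +50% for results returned within 24 hours and +25% within 48 hours; those multipliers are not stated in this thread, so treat them as an assumption. Under that assumption the ratio between the two awards should be 1.50 / 1.25 = 1.2, and dividing each award by its multiplier should give the same base credit. The helper names are hypothetical.

```python
# Assumed return-time multipliers (not stated in this thread): +50% within
# 24 h, +25% within 48 h.
FAST_BONUS = 1.50
SLOW_BONUS = 1.25


def implied_ratio(fast_credit: float, slow_credit: float) -> float:
    """Ratio between the credit of an on-time task and a late one."""
    return fast_credit / slow_credit


def base_credit(awarded: float, multiplier: float) -> float:
    """Undo the assumed return-time multiplier to recover the base credit."""
    return awarded / multiplier


if __name__ == "__main__":
    ratio = implied_ratio(202_525, 168_771)
    print(f"observed ratio {ratio:.3f}, assumed ratio {FAST_BONUS / SLOW_BONUS:.3f}")
    print(f"base credit ~ {base_credit(202_525, FAST_BONUS):,.0f} "
          f"vs {base_credit(168_771, SLOW_BONUS):,.0f}")
```

Both tasks come out at roughly 135,017 base credit, which fits the explanation that the only difference between them was missing the 24-hour cutoff.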

Keith Myers · Joined: 13 Dec 17 · Posts: 1424 · Credit: 9,189,946,190 · RAC: 34,713
Message 53570 - Posted: 30 Jan 2020, 21:38:06 UTC - in response to Message 53569.
Toni also explained over in the QC Chemistry forum that tasks run for different lengths of time depending on how many atoms are in the model. So within the same MDADpr2 campaign, credit awards can differ depending on the task and whether it is hard or easy to crunch. Add the early-return bonus and late-return penalty on top of that, and there can be a lot of variability.

davidBAM · Joined: 17 Sep 18 · Posts: 11 · Credit: 1,857,385,729 · RAC: 0
Message 53572 - Posted: 31 Jan 2020, 3:51:11 UTC. Last modified: 31 Jan 2020, 4:41:43 UTC
I crunch competitively on up to 20 Nvidia Turing cards and believe that every WU I do is returned within 24 hours. You have already solved the 'bunkering' problem, but if you want to improve the supply of WUs to us volunteers, it is very, very simple. Just follow PrimeGrid's lead and remove GPUGrid from the projects whitelisted by Gridcoin. Keep it to unpaid volunteers.

[VENETO] boboviz · Joined: 10 Sep 10 · Posts: 164 · Credit: 388,132 · RAC: 0
Message 53573 - Posted: 31 Jan 2020, 9:29:43 UTC - in response to Message 53502.
Quote: It is a little ironic that a project specially for GPUs supports fewer GPUs than other projects. Einstein, Milky Way, SETI, etc. have no problem.
If I'm not wrong, the problem is that they don't have a dedicated GPU developer. Today it is not impossible to convert CUDA code to OpenCL, but it seems they are not able to do this.

Aurum · Joined: 12 Jul 17 · Posts: 404 · Credit: 17,408,899,587 · RAC: 0
Message 53574 - Posted: 31 Jan 2020, 14:22:39 UTC - in response to Message 53572.
davidBAM wrote: I crunch competitively on up to 20 Nvidia Turing cards and believe that every WU I do is returned within 24 hours. You have already solved the 'bunkering' problem, but if you want to improve the supply of WUs to us volunteers, it is very, very simple. Just follow PrimeGrid's lead and remove GPUGrid from the projects whitelisted by Gridcoin. Keep it to unpaid volunteers.
I've got a better idea: avoid PrimeGrid.

Jim1348 · Joined: 28 Jul 12 · Posts: 819 · Credit: 1,591,285,971 · RAC: 0
Message 53575 - Posted: 31 Jan 2020, 14:58:56 UTC - in response to Message 53573.
boboviz wrote: Today it is not impossible to convert CUDA code to OpenCL, but it seems they are not able to do this.
There is no reason to. They have more than enough volunteers with Nvidia cards, and it is simpler to support one set rather than two. In fact, from the problems I have seen, I think that even if you went to OpenCL for both, supporting both manufacturers would be harder. Supporting both is more about political correctness than need.

Erich56 · Joined: 1 Jan 15 · Posts: 1168 · Credit: 12,317,898,501 · RAC: 75,187
Message 53576 - Posted: 31 Jan 2020, 15:05:38 UTC - in response to Message 53575.
Jim1348 wrote: ... They have more than enough volunteers with Nvidia cards
...and very often they don't have enough work for them. Hence, bringing a second group of crunchers on board as well would only make the "no tasks available" problem worse.

robertmiles · Joined: 16 Apr 09 · Posts: 503 · Credit: 769,991,668 · RAC: 0
Message 53577 - Posted: 31 Jan 2020, 16:11:19 UTC - in response to Message 53573.
Quote (quoted by boboviz): It is a little ironic that a project specially for GPUs supports fewer GPUs than other projects. Einstein, Milky Way, SETI, etc. have no problem.
boboviz wrote: If I'm not wrong, the problem is that they don't have a dedicated GPU developer. Today it is not impossible to convert CUDA code to OpenCL, but it seems they are not able to do this.
I've seen a program called swan that is supposed to be able to do this automatically. I have no idea whether an up-to-date version is available. I'd expect whether GPUGRID actually does this to depend on how fast the resulting OpenCL code runs: if it is much slower than the CUDA code, why would they want to release it?
Note: I found a version of swan, with a note saying that it is no longer maintained and is therefore deprecated. If you're good enough in both CUDA and OpenCL, why don't you take over maintenance of this program and see if you can make it produce an OpenCL version of the GPUGRID code that runs fast enough to be worth releasing? https://github.com/Acellera/swan

Toni (Volunteer moderator, Project administrator, Project developer, Project tester, Project scientist) · Joined: 9 Dec 08 · Posts: 1006 · Credit: 5,068,599 · RAC: 0
Message 53578 - Posted: 31 Jan 2020, 16:22:36 UTC - in response to Message 53577.
As was correctly said above, it's not a technical problem, but a matter of putting effort where it is more critical, i.e. the scientific part (experiment preparation and analysis).

klepel · Joined: 23 Dec 09 · Posts: 189 · Credit: 4,802,631,008 · RAC: 6,009
Message 53579 - Posted: 31 Jan 2020, 18:03:16 UTC - in response to Message 53572.
davidBAM wrote: I crunch competitively on up to 20 Nvidia Turing cards and believe that every WU I do is returned within 24 hours.
Cheers! Congratulations on the personal success that lets you buy so many GPUs and keep them crunching for all the years to come!
davidBAM wrote: You have already solved the 'bunkering' problem, but if you want to improve the supply of WUs to us volunteers, it is very, very simple. Just follow PrimeGrid's lead and remove GPUGrid from the projects whitelisted by Gridcoin.
As I understand it, it is better for the project team to get the results sooner rather than later, so they can analyze and investigate them and issue new WUs if needed, rather than wait for a few happy crunchers to crunch them over a long time (as might be the case with PrimeGrid, which you just mentioned). So they have an interest in having the biggest possible pool of Nvidia GPUs at their disposal! I have never read that BOINC guarantees an uninterrupted supply of work so that volunteers always have something to crunch.
davidBAM wrote: Keep it to unpaid volunteers.
PAID?! Where is this paid "volunteer"? Just as an example, I spend about USD 300.00 per month on electricity just for BOINC, besides all the hardware I buy for BOINC, which I would not buy if I were not an addict. I earn about USD 9.00 worth of Gridcoin per month, so I see it as a very small subsidy at best, or just another dope to keep me crunching BOINC!