Large scale experiment: MDAD

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Message 53551 - Posted: 29 Jan 2020, 18:31:04 UTC - in response to Message 53550.  

Actually they were only 500. Better this way - they came out too large. Feel free to abort them.
Zalster
Message 53552 - Posted: 29 Jan 2020, 18:36:17 UTC - in response to Message 53551.  

Toni wrote:
Actually they were only 500. Better this way - they came out too large. Feel free to abort them.


Had 6 of them, about 4500s-4900s into them when the server cancelled them.....

Now you have me curious as to how long they would have run....
Erich56

Message 53553 - Posted: 29 Jan 2020, 18:37:48 UTC
Last modified: 29 Jan 2020, 18:43:32 UTC

about an hour ago, I had two tasks (on two different hosts) that were "aborted by project" after about 5,900 seconds:

http://www.gpugrid.net/result.php?resultid=21644737
http://www.gpugrid.net/result.php?resultid=21644681

what happened?

edit: just now, two other ones like those mentioned in Toni's message
"To come back on topic, there is a batch ("MDADpr1") of ~50k workunits being created. I hope it's correct."
were aborted by server, right after start.
What's wrong with them?
Toni
Message 53554 - Posted: 29 Jan 2020, 18:38:17 UTC - in response to Message 53552.  
Last modified: 29 Jan 2020, 18:38:57 UTC

Ok, I did not know the server would cancel running WUs. Good to know. They would have run around 6h-ish, but I was not sure they wouldn't fail at the end due to large uploads.

The next test batch (MDADpr2) is out.
BladeD
Message 53555 - Posted: 29 Jan 2020, 18:58:51 UTC - in response to Message 53554.  

Toni wrote:
Ok, I did not know the server would cancel running WUs. Good to know. They would have run around 6h-ish, but I was not sure they wouldn't fail at the end due to large uploads.

The next test batch (MDADpr2) is out.

Okay, glad to see that I have the good ones!
Retvari Zoltan
Message 53558 - Posted: 30 Jan 2020, 1:18:30 UTC - in response to Message 53548.  
Last modified: 30 Jan 2020, 1:26:02 UTC

As long as a host turns in valid work and in a timely manner, I don't think any kind of new restriction is needed. The faster hosts get more work done for the project which should keep the scientists happy with the progress of their research.
GPUGrid differs from SETI@home in how our computers advance the research: at GPUGrid our hosts actually produce the data that the scientists analyse, while SETI@home uses pre-recorded data split into many small chunks to be processed by the hosts. At SETI@home the individual pieces can be processed independently, but at GPUGrid fresh workunits are generated from the result of the previous run. If your host grabs 64 workunits but actually processes only 1 at a time, then your host hinders the progress of the other 63 "chains of workunits". The more you grab, the more delay you put into the progress of the ongoing MD simulation batches.
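To put rough numbers on that argument, here is a toy model in Python. It is only a sketch under stated assumptions, not GPUGrid's actual scheduler: the function name, the 10 steps per chain, and the 6 hours per task are all made up for illustration. It assumes each step of a chain is issued only after the previous result is returned, and that the step in question always waits at the back of a host's cache.

```python
def chain_completion_hours(steps_per_chain: int, cached_tasks: int,
                           hours_per_task: float) -> float:
    """Worst-case wall-clock time for one chain to finish all its steps.

    Toy assumptions (hypothetical, not GPUGrid's real scheduler):
    - every step lands on a host holding `cached_tasks` tasks and running
      them strictly one at a time;
    - the step we care about is the last one in that host's queue;
    - the next step is created only after the previous result is returned.
    """
    if steps_per_chain < 0 or cached_tasks < 1 or hours_per_task < 0:
        raise ValueError("invalid parameters")
    # The step waits for every task ahead of it, then runs itself.
    wait_per_step = cached_tasks * hours_per_task
    return steps_per_chain * wait_per_step


# Illustrative values only: 10 steps per chain, 6 hours per task.
for cache in (1, 2, 64):
    hours = chain_completion_hours(steps_per_chain=10, cached_tasks=cache,
                                   hours_per_task=6.0)
    print(f"cache of {cache:2d} task(s): chain finishes after {hours:.0f} h")
```

In this model the delay grows linearly with the cache size, which is the point being made: a host that holds many tasks but runs one at a time stretches every chain it touches.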
Keith Myers
Message 53559 - Posted: 30 Jan 2020, 3:57:06 UTC - in response to Message 53554.  
Last modified: 30 Jan 2020, 4:38:11 UTC

Toni wrote:
Ok, I did not know the server would cancel running WUs. Good to know. They would have run around 6h-ish, but I was not sure they wouldn't fail at the end due to large uploads.

The next test batch (MDADpr2) is out.

The MDADpr2 batch isn't small in its own right. A 188MB upload is only at 60% so far after an hour.

[Edit] Also see Toni made good on the credit re-adjustment. Now only getting a quarter of what was awarded prior for 4 times the length of processing time.
https://www.gpugrid.net/workunit.php?wuid=16977060
More in line with the previous batch of work.
Erich56

Message 53560 - Posted: 30 Jan 2020, 5:50:57 UTC - in response to Message 53559.  
Last modified: 30 Jan 2020, 6:43:19 UTC

Keith Myers wrote:
[Edit] Also see Toni made good on the credit re-adjustment. Now only getting a quarter of what was awarded prior for 4 times the length of processing time.

hm, this is the first time I've read someone complaining about credit being too high :-)
Keith Myers
Message 53561 - Posted: 30 Jan 2020, 8:08:35 UTC - in response to Message 53560.  

Keith Myers wrote:
[Edit] Also see Toni made good on the credit re-adjustment. Now only getting a quarter of what was awarded prior for 4 times the length of processing time.

Erich56 wrote:
hm, this is the first time I've read someone complaining about credit being too high :-)

My comment was simply an observation. The discussion about credit awarded among projects needs to be in another thread.

That has been hashed to death before many times over.

Search on CreditScrew or CreditNew. Oh where is Jeff Cobb?
robertmiles
Message 53563 - Posted: 30 Jan 2020, 15:46:51 UTC - in response to Message 53541.  

It's not just the speed. There's some DDOS prevention algorithm in operation, because my hosts get blocked if they try to contact the server one by one in rapid succession (from the same public IP address).
What can we do to mitigate this effect???

OAS: Many projects are adding a Max # WUs option in Preferences. Maybe add it with the choice of 1 or 2.

OAS: Bunkering for serial projects should be banned one way or another. These "races" and "sprints" have some folks requesting as many WUs per host as they can get but they don't get submitted to the work server until after the race start time, i.e. bunkering.

I triggered something a few days ago on GPUGrid that I've never seen before on a BOINC project. It was a fluke combination of things that had me upgrade my drivers but delayed a reboot. It wouldn't have bothered anything else but an unbeknownst slug of GPUGrid WUs had appeared. All those WUs had computation errors. Then both computers got banned with a Project Request. I thought it would be a 24-hour timeout I'd seen folks mention before but it persisted for days. After a few days I tried a manual Project Update and it started working again. Can this Project Requested Ban be applied to bunkerers???

PrimeGrid has found a way to reduce bunkering - in the races, count only tasks that were both downloaded and returned during the period scheduled for the race.
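As a sketch of how such a rule could be expressed (this is not PrimeGrid's actual code; the `Task` record, the `race_credit` function, and the dates below are hypothetical), a task counts toward the race only if both its download time and its return time fall inside the race window:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Task:
    downloaded: datetime  # when the host received the task
    returned: datetime    # when the host reported the result
    credit: float


def race_credit(tasks, race_start, race_end):
    """Sum credit only for tasks downloaded AND returned within the race window."""
    return sum(
        t.credit
        for t in tasks
        if race_start <= t.downloaded <= t.returned <= race_end
    )


# Hypothetical race window and tasks, for illustration only.
race_start = datetime(2020, 2, 1)
race_end = datetime(2020, 2, 4)
tasks = [
    # Bunkered: fetched before the race, held back, returned just after the start.
    Task(datetime(2020, 1, 30), datetime(2020, 2, 1, 1), 100.0),
    # Fetched and returned during the race.
    Task(datetime(2020, 2, 1, 2), datetime(2020, 2, 2), 50.0),
    # Fetched during the race but returned after it ended.
    Task(datetime(2020, 2, 3), datetime(2020, 2, 5), 70.0),
]
print(race_credit(tasks, race_start, race_end))
```

Under this rule the bunkered task earns nothing toward the race, so holding work back until the start time no longer helps.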
Erich56

Message 53569 - Posted: 30 Jan 2020, 19:26:35 UTC - in response to Message 53559.  
Last modified: 30 Jan 2020, 20:02:59 UTC

Keith Myers wrote:
Also see Toni made good on the credit re-adjustment. Now only getting a quarter of what was awarded prior for 4 times the length of processing time.

however, even now there are some unexplained differences, e.g. between the following two tasks, which ran on the same GPU (GTX980Ti) in the same PC:

http://www.gpugrid.net/result.php?resultid=21645452
runtime: 39,444 secs - 202,525 credit points

http://www.gpugrid.net/result.php?resultid=21645453
runtime: 39,899 secs - 168,771 credit points

any idea how come?

Edit: only now I realized what happened: the second task cited above missed the 24-hour limit by 1 minute 17 seconds. Hence the 20% difference in credit :-(
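The two credit figures are consistent with a tiered return bonus. The thread does not spell out the rule, so the tiers below (+50% for results returned within 24 hours, +25% within 48 hours) are an assumption, as are the function name and the inferred base credit. A minimal Python sketch:

```python
def credit_with_bonus(base_credit: float, hours_to_return: float) -> float:
    """Apply an ASSUMED tiered bonus: +50% within 24 h, +25% within 48 h, else none."""
    if hours_to_return <= 24:
        return base_credit * 1.50
    if hours_to_return <= 48:
        return base_credit * 1.25
    return base_credit


# Base credit implied by the faster task, under the assumed tiers.
base = 202_525 / 1.5

on_time = credit_with_bonus(base, 23.9)
late = credit_with_bonus(base, 24 + 77 / 3600)  # 1 min 17 s past the 24-hour mark

print(round(on_time), round(late))
print(f"ratio: {on_time / late:.2f}")
```

With these assumed tiers the late result rounds to 168,771, matching the second task, and the on-time result is 1.5 / 1.25 = 1.2 times larger, which is the 20% difference noted in the edit.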
Keith Myers
Message 53570 - Posted: 30 Jan 2020, 21:38:06 UTC - in response to Message 53569.  

Also, Toni explained over in the QC Chemistry forum that tasks run for different lengths of time depending on how many atoms are in the model.

So for the exact same MDADpr2 campaign, there can be differing credit awards depending on the task and whether it is hard to crunch or easy.

Throw the early return benefit and late return penalty on top of that, and there can be a lot of variability.
davidBAM

Message 53572 - Posted: 31 Jan 2020, 3:51:11 UTC
Last modified: 31 Jan 2020, 4:41:43 UTC

I crunch competitively on up to 20 nVidia Turing cards and believe that every WU I do is returned within 24 hours.

You have already solved the 'bunkering' problem but if you want to improve the supply of WU to us volunteers it is very very simple. Just follow PrimeGrid's lead and remove GPUGrid from the projects white-listed by Gridcoin. Keep it to unpaid volunteers.
[VENETO] boboviz

Message 53573 - Posted: 31 Jan 2020, 9:29:43 UTC - in response to Message 53502.  

It is a little ironic that a project specially for GPU's supports less GPU's than other projects. Einstein, Milky Way, Seti, etc. no problem.

If I'm not wrong, the problem is that they don't have a "hard" GPU developer.
Today it is not impossible to convert CUDA code to OpenCL, but it seems that they are not able to do this.
Aurum
Message 53574 - Posted: 31 Jan 2020, 14:22:39 UTC - in response to Message 53572.  

davidBAM wrote:
I crunch competitively on up to 20 nVidia Turing cards and believe that every WU I do is returned within 24 hours.

You have already solved the 'bunkering' problem but if you want to improve the supply of WU to us volunteers it is very very simple. Just follow PrimeGrid's lead and remove GPUGrid from the projects white-listed by Gridcoin. Keep it to unpaid volunteers.

I've got a better idea: avoid PrimeGrid.
Jim1348

Message 53575 - Posted: 31 Jan 2020, 14:58:56 UTC - in response to Message 53573.  

[VENETO] boboviz wrote:
Today it is not impossible to convert CUDA code to OpenCL, but it seems that they are not able to do this.

There is no reason to. They have more than enough volunteers with Nvidia cards, and it is simpler to support one set rather than two.

In fact, even if you went to OpenCL for both, I think it is harder to support both manufacturers, judging from the problems I have seen. Supporting both is more for political-correctness reasons than out of need.
Erich56

Message 53576 - Posted: 31 Jan 2020, 15:05:38 UTC - in response to Message 53575.  

Jim1348 wrote:
... They have more than enough volunteers with Nvidia cards

and very often they don't have enough work for them. Hence, bringing a second group of crunchers on board in addition would only enlarge the problem of "no tasks available" ...
robertmiles
Message 53577 - Posted: 31 Jan 2020, 16:11:19 UTC - in response to Message 53573.  

It is a little ironic that a project specially for GPU's supports less GPU's than other projects. Einstein, Milky Way, Seti, etc. no problem.

If I'm not wrong, the problem is that they don't have a "hard" GPU developer.
Today it is not impossible to convert CUDA code to OpenCL, but it seems that they are not able to do this.

I've seen a program called swan that is supposed to be able to do this automatically. No idea if an up-to-date version is available.

I'd expect whether GPUGRID actually does this to depend on how fast the resulting OpenCL code runs. If it is much slower than the CUDA code, why would they want to release it?

Note - I found a version of swan, with a note saying that it is no longer maintained and is therefore deprecated. If you're good enough in both CUDA and OpenCL, why don't you take over maintenance of this program, and see if you can make it produce an OpenCL version of the GPUGRID code that runs fast enough to be worth releasing?

https://github.com/Acellera/swan
Toni
Message 53578 - Posted: 31 Jan 2020, 16:22:36 UTC - in response to Message 53577.  

As was correctly said above, it's not a technical problem, but a matter of putting effort where it is more critical, i.e. the scientific part (experiment preparation and analysis).
klepel

Message 53579 - Posted: 31 Jan 2020, 18:03:16 UTC - in response to Message 53572.  

davidBAM wrote:
I crunch competitively on up to 20 nVidia Turing cards and believe that every WU I do is returned within 24 hours.

Cheers! Congratulations on your personal success, which lets you buy so many GPUs and keep them crunching for all the years to come!

davidBAM wrote:
You have already solved the 'bunkering' problem but if you want to improve the supply of WU to us volunteers it is very very simple. Just follow PrimeGrid's lead and remove GPUGrid from the projects white-listed by Gridcoin.

As I understand it, for the project team it is better to get the results sooner rather than later, so they are able to analyze and investigate them and issue new WUs if needed, rather than to wait for a few happy crunchers to crunch them for a long time (as might be the case with PrimeGrid, since you mentioned them), so they have an interest in having the biggest possible pool of Nvidia GPUs at their disposal!
I never read that BOINC guarantees an uninterrupted work supply, so that volunteers will always have work to crunch.

davidBAM wrote:
Keep it to unpaid volunteers.

PAID?! Where is this paid “volunteer”? Just as an example, I spend about USD 300.00 on electric bills per month just for BOINC, besides all the hardware I buy for BOINC, which I would not buy if I were not an addict.
I earn about USD 9.00 equivalent of Gridcoins per month, so I would rather see it as a very small subsidy at best, or just another dope to keep me crunching BOINC!

©2025 Universitat Pompeu Fabra