New D3RBanditTest workunits

Message boards : News : New D3RBanditTest workunits

Previous · 1 · 2 · 3 · 4 · 5 · 6 . . . 14 · Next

Ian&Steve C.

Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Message 56554 - Posted: 16 Feb 2021, 13:21:02 UTC - in response to Message 56540.  

I'm not seeing many resends, mostly _0 and _1 original tasks.

No issues getting work or returning valid results.


There’s no quorum required here. So _1 is a resend.
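As a rough illustration of that naming rule: assuming the usual BOINC convention that a task name is the workunit name plus an underscore and a replica index, and a project that needs only one result per workunit, `_0` is the original copy and any higher index is a resend. The function name `task_kind` is hypothetical, and this sketch only looks at the name, not at the server's state:

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a task as original or resend from its name.
# Assumes the usual BOINC convention (task name = workunit name + "_" + replica
# index) and a project that needs only one result per workunit (no quorum).
task_kind() {
  local name=$1
  local idx=${name##*_}            # text after the last underscore
  if [[ ! $idx =~ ^[0-9]+$ ]]; then
    echo "unknown"                 # no numeric replica suffix
  elif (( 10#$idx == 0 )); then    # force base 10 so "08" is not read as octal
    echo "original"
  else
    echo "resend"
  fi
}
```

For example, `task_kind e16s23_e1s182p0f136-ADRIA_D3RBandit_batch0-0-1-RND8763_1` prints `resend`, while the bare workunit name (no `_N` suffix) prints `unknown`.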
YamFan

Joined: 11 Feb 16
Posts: 2
Credit: 60,774,031
RAC: 0
Message 56557 - Posted: 16 Feb 2021, 14:24:00 UTC

WU: New version of ACEMD 2.11 (cuda101)
Name: e16s23_e1s182p0f136-ADRIA_D3RBandit_batch0-0-1-RND8763

Currently at 4 days 6 hours, with approx. 6 hours left (just in time for the deadline, I hope). GPU: 750 Ti, running continuously, although I have also been using the computer for other things at times.

So, yes, current work units are long. Much longer than "normal".
This post is just for reference.
YamFan

Joined: 11 Feb 16
Posts: 2
Credit: 60,774,031
RAC: 0
Message 56558 - Posted: 16 Feb 2021, 14:26:02 UTC - in response to Message 56553.  

I agree. Overclocking the GPU is much more likely to cause WU failures.
tullio

Joined: 8 May 18
Posts: 190
Credit: 104,426,808
RAC: 0
Message 56560 - Posted: 16 Feb 2021, 15:15:04 UTC
Last modified: 16 Feb 2021, 15:15:35 UTC

Task 27021821 was canceled by server. Why? It was waiting to run.
Tullio
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Message 56561 - Posted: 16 Feb 2021, 15:18:30 UTC - in response to Message 56560.  

Task 27021821 was canceled by server. Why? It was waiting to run.
Tullio


http://www.gpugrid.net/workunit.php?wuid=27021821

Because it was no longer needed. The original host returned the task after the deadline, but before you had processed it. Since a quorum is not required, they only need one result, and letting you process something that already has a result would just be a waste of time.
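A minimal sketch of the behaviour reported in this thread once one valid result is in: unsent copies are marked "Didn't need" (Pop Piasa's observation), copies waiting to run are cancelled (as here), and copies already running are left to finish (ServicEnginIC's report further down). The state names and the function `copy_fate` are hypothetical labels, not BOINC's actual server logic:

```bash
#!/usr/bin/env bash
# Sketch of what happens to the other copies of a workunit once one valid
# result has been returned, as described in this thread (no quorum needed).
# State names are hypothetical labels, not real BOINC server field values.
copy_fate() {
  case $1 in
    unsent)  echo "not sent; shown as \"Didn't need\"" ;;
    queued)  echo "cancelled by server before it starts" ;;
    running) echo "left to run; credit depends on its own deadline" ;;
    *)       echo "unknown state: $1" >&2; return 1 ;;
  esac
}
```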
Bill

Joined: 28 Oct 20
Posts: 2
Credit: 153,096,086
RAC: 0
Message 56562 - Posted: 16 Feb 2021, 15:31:46 UTC - in response to Message 56535.  
Last modified: 16 Feb 2021, 15:32:12 UTC

I have a 1660 Super that takes about 34-38 hours on these new units; still seeing temps in the 60s, and it only uses 15% of the GPU in Task Manager, though.

Interesting. My 1660 Ti is completing these tasks in about 25 hours. I would not have thought there would be that big a difference in crunching time.

You have a 2600 as well, while I'm running a 2200G, so my slower-CPU theory doesn't seem to hold here.

EDIT: Oh hey, this is my first post here. Hi everyone!
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Message 56564 - Posted: 16 Feb 2021, 16:03:37 UTC - in response to Message 56562.  

The 1660 Ti is still faster than a 1660S. You're both on Windows, so your times should be a little more comparable to my 1660S Linux times (27 hrs).

His first task actually completed in about 29 hrs, which seems right. The second task took 40 hrs, but I can only speculate about the reason; maybe he was doing other things on the system that slowed down processing.
Aurum

Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 2
Message 56566 - Posted: 16 Feb 2021, 16:11:20 UTC

Is it overclocking if you're underpowering???
This script runs 2080 Ti WUs in about 15 hours at 36% less power. First run this command:
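sudo nvidia-xconfig --enable-all-gpus --cool-bits=28 --allow-empty-initial-configuration   # 28 = 4 (fan control) + 8 (clock offsets) + 16 (overvoltage)
Then restart X (Coolbits is read when the X server starts) and execute this script:
#!/bin/bash
# nvidia-smi -pm/-acp/-pl need root (run with sudo);
# nvidia-settings needs a running X server with Coolbits enabled.
/usr/bin/nvidia-smi -pm 1                  # enable persistence mode
/usr/bin/nvidia-smi -acp UNRESTRICTED      # let non-root users change application clocks
/usr/bin/nvidia-smi -i 0 -pl 160           # cap GPU 0 at 160 W
/usr/bin/nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1"  # 0=Adaptive, 1=Prefer Maximum Performance, 2=Auto
/usr/bin/nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[3]=400" -a "[gpu:0]/GPUGraphicsClockOffset[3]=100"  # memory and core clock offsets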
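To check what the script actually applied, `nvidia-smi` can report the default and current power limits with `--query-gpu=power.default_limit,power.limit --format=csv,noheader,nounits`. The helper below is a sketch that turns that CSV into the percentage cut; the function name is hypothetical, and the 250 W in the test line is an assumed default limit for a 2080 Ti, which is what makes 160 W come out at the 36% quoted above:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "default, current" power limits in watts, one GPU
# per line in the order nvidia-smi lists them, e.g. from
#   nvidia-smi --query-gpu=power.default_limit,power.limit --format=csv,noheader,nounits
# and report how far below the default each card is capped.
power_cut_report() {
  awk -F', *' '{
    printf "GPU %d: %.0f W -> %.0f W (%.0f%% less)\n", NR - 1, $1, $2, 100 * (1 - $2 / $1)
  }'
}
```

Usage: pipe the `nvidia-smi` query above into `power_cut_report`; offline, `echo "250.00, 160.00" | power_cut_report` prints `GPU 0: 250 W -> 160 W (36% less)`.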
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Message 56568 - Posted: 16 Feb 2021, 16:26:54 UTC - in response to Message 56566.  

160 W seems too low for a 2080 Ti, IMO. You're really restricting performance at that point.

For reference:

My RTX 2070s run in about 17 hr @ 150 W (not far from the performance of your 2080 Ti, but the 2070 is much less expensive).

My RTX 2080 Tis run in about 10 hr @ 225 W: 50% faster than yours for 40% more power.
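The same comparison can be expressed as energy per task. This back-of-the-envelope sketch uses only the run times and power limits quoted in this exchange (Aurum's 15 hr @ 160 W and the two figures above), and it assumes each card draws its full power limit for the whole run, so the watt-hours are upper bounds; `task_energy` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: energy per task (Wh) and tasks per day from run time
# (hours) and power limit (W), assuming the card sits at its limit all run long.
task_energy() {
  local label=$1 hours=$2 watts=$3
  awk -v l="$label" -v h="$hours" -v w="$watts" \
    'BEGIN { printf "%s: %.0f Wh/task, %.2f tasks/day\n", l, h * w, 24 / h }'
}

task_energy "RTX 2070 @ 150 W"    17 150   # 2550 Wh/task, 1.41 tasks/day
task_energy "RTX 2080 Ti @ 160 W" 15 160   # 2400 Wh/task, 1.60 tasks/day
task_energy "RTX 2080 Ti @ 225 W" 10 225   # 2250 Wh/task, 2.40 tasks/day
```

On these numbers the 2080 Ti at 225 W uses slightly less energy per task than at 160 W, because it finishes so much sooner; real draw is usually below the limit, so actual figures will be lower.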


Pop Piasa

Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Message 56569 - Posted: 16 Feb 2021, 16:37:57 UTC - in response to Message 56545.  

One example: WU #27023500 has been reported by a GTX 750 Ti at one of my hosts in 446,845.54 seconds, more than 4 hours after deadline.
This task has been rewarded with 348,750.00 credits, and hasn't been resent to any other host, with "Didn't need" legend.
Maybe the project managers are quietly granting, in this way, the request of many Gpugrid users in this regard (?)


This is an interesting opportunity to study how the server handles tardy but completed WUs. My GTX 750ti was also allowed to run past the deadline and received equal credit to yours. https://www.gpugrid.net/workunit.php?wuid=27023657

Could it be that the server can detect a WU being actively crunched and stop cancellation? Or is this a normal delay function we see?

It had already created another task but had not sent it yet, and marked it as "didn't need" when my host reported.
Run time was 408,214.47 sec. CPU time was 406,250.30 sec
GPU clock was 1366MHz
Mem clock was 2833MHz
With my homespun fan intake mod it ran at 55C with 22C ambient room temp.

Just can't kill it.
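BOINC reports run times in seconds; converting them makes figures like the two above easier to compare with a deadline given in days. A minimal sketch (the function name is hypothetical; it truncates fractional seconds and accepts comma thousands separators as written in these posts):

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a BOINC run time in seconds (e.g. "408,214.47")
# to days, hours and minutes. Fractional seconds are dropped, not rounded.
fmt_duration() {
  local s=${1%.*}                  # drop the fractional part
  s=${s//,/}                       # drop thousands separators
  printf '%dd %dh %dm\n' $((s / 86400)) $((s % 86400 / 3600)) $((s % 3600 / 60))
}
```

For example, `fmt_duration 446,845.54` prints `5d 4h 7m` and `fmt_duration 408,214.47` prints `4d 17h 23m`.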
Aurum

Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 2
Message 56570 - Posted: 16 Feb 2021, 16:39:16 UTC - in response to Message 56568.  

160W seems too low for a 2080ti
My circuit breakers are perfectly optimized. Not to mention heat management. I'm going to have to sit this race out.
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Message 56571 - Posted: 16 Feb 2021, 16:51:35 UTC - in response to Message 56569.  

Could it be that the server can detect a WU being actively crunched and stop cancellation? Or is this a normal delay function we see?



I think it's probably just first come, first served: whoever returns it first "wins" and the other task gets the axe. Another user reported that his task was cancelled off his host (but he hadn't started crunching it yet). I'm unsure what would happen to a task that was axed after crunching had already started.

Ian&Steve C.

Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Message 56572 - Posted: 16 Feb 2021, 16:54:16 UTC - in response to Message 56570.  
Last modified: 16 Feb 2021, 17:01:15 UTC

160W seems too low for a 2080ti
My circuit breakers are perfectly optimized. Not to mention heat management. I'm going to have to sit this race out.


I understand. I actually have my 6x 2080ti host split between 2 breakers to avoid overloading a single 20A one (there are other systems on one of the circuits). I'm just saying that if you're going to restrict it that far, you might be better off with a cheaper card to begin with and save some money.
Pop Piasa

Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Message 56575 - Posted: 16 Feb 2021, 17:26:46 UTC - in response to Message 56534.  
Last modified: 16 Feb 2021, 17:38:52 UTC

Thanks Zoltan, I've given up for now and switched that GPU to FAH.
It seems to be doing more FLOPS/hr when running FAHcore CUDA vs ACEMD, but that may be just the difference in scoring procedures.


I have received another WU for the GTX 750ti. The send glitch has been remediated.

I'm the 5th user to receive this WU iteration.
https://www.gpugrid.net/workunit.php?wuid=27024256

It is a batch_1 (the 2nd batch) and the 0_1 refers to it being number 0 of 1, hence it's the only iteration that will exist of this WU.

[edit]
It just dawned on me that these WUs might be an experiment in having the same host perform all the generations of the model simulation consecutively instead of on different hosts. Or am I way off?
Keith Myers

Joined: 13 Dec 17
Posts: 1416
Credit: 9,119,446,190
RAC: 614,515
Message 56576 - Posted: 16 Feb 2021, 17:31:28 UTC - in response to Message 56554.  

I'm not seeing many resends, mostly _0 and _1 original tasks.

No issues getting work or returning valid results.


There’s no quorum required here. So _1 is a resend.

Uhh, Duh . . forgot where I'm at.
ServicEnginIC

Joined: 24 Sep 10
Posts: 592
Credit: 11,972,186,510
RAC: 998,578
Message 56579 - Posted: 16 Feb 2021, 17:57:03 UTC - in response to Message 56571.  

another user reported that his task was cancelled off of his host (but he hadn't started crunching yet). Unsure what would happen to a task that was axed while crunching had already started.

When a resent task has already started processing and the overdue host then reports its result, the resent task is allowed to run to the end, and three scenarios may occur:
- 1) The deadline for the resent task is reached at this second host. Then, even if it is completed afterwards, it won't receive any credit, because there is already a previous valid result for it. The server will label it "Completed, too late to validate".
- 2) What I call a "credit paradox" happens when this resent task is finished at the second host in time for the full or mid bonus. It will still receive only the standard credit amount, without any bonus, matching the credit already granted to the overdue task.
- 3) When the resent task is finished past 48 hours but before its deadline, it will also receive the standard (no bonus) credit amount.
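The three cases can be condensed into a small decision function. This is a sketch, not the project's validator: the bonus tiers (+50% within 24 hours, +25% within 48 hours) are an assumption based on the usual GPUGRID scheme and the 48-hour boundary mentioned above, the 348,750 base credit in the test is the figure quoted earlier in the thread, the 120-hour deadline is an assumed value, and the function names are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch only. Assumed bonus tiers: +50% if returned within 24 h,
# +25% within 48 h, no bonus after that (not confirmed by the project).
normal_credit() {              # normal_credit BASE HOURS_TO_RETURN
  awk -v b="$1" -v h="$2" 'BEGIN {
    m = (h <= 24) ? 1.5 : ((h <= 48) ? 1.25 : 1.0)
    printf "%.2f\n", b * m
  }'
}

# A resend that started before the overdue original was reported, per the
# three cases above: no credit past its own deadline, otherwise only the base
# credit (matching the overdue task), whatever bonus tier it would have reached.
resend_credit() {              # resend_credit BASE HOURS_TO_RETURN DEADLINE_HOURS
  awk -v b="$1" -v h="$2" -v d="$3" 'BEGIN {
    if (h > d) printf "0.00 (Completed, too late to validate)\n"
    else       printf "%.2f\n", b
  }'
}
```

With those assumptions, a task returned in 20 hours would normally earn 523125.00, but as a resend in the "credit paradox" case it gets 348750.00.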
Aurum

Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 2
Message 56581 - Posted: 16 Feb 2021, 18:37:28 UTC - in response to Message 56572.  

better off with a cheaper card to begin with and save some money.

I guess you haven't been shopping for a GPU lately. Prices are really high. Perfect time to sell cards not buy them.

Best thing I did was switch computers over to 240 V 20 A circuits. The most frequent problem I used to have was the A-phase getting out of balance with the B-phase and tripping one leg of the main breaker. With 240 V both phases are exactly balanced and PSUs run 6-8% more efficiently.
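The arithmetic behind the 240 V switch is I = P / V: the same load draws half the current at 240 V, and a 240 V load spans both legs, so they carry the same current. A sketch with hypothetical helper names; the 80% figure is the common North American rule of thumb for continuous loads on a breaker, used here as an assumption, and the 1350 W load is only an example (six GPUs capped at 225 W, ignoring CPU and PSU losses):

```bash
#!/usr/bin/env bash
# Hypothetical helpers for branch-circuit arithmetic: I = P / V.
amps() {                       # amps WATTS VOLTS
  awk -v w="$1" -v v="$2" 'BEGIN { printf "%.2f\n", w / v }'
}

# Continuous-load budget, assuming the 80%-of-breaker-rating rule of thumb.
max_continuous_watts() {       # max_continuous_watts BREAKER_AMPS VOLTS
  awk -v a="$1" -v v="$2" 'BEGIN { printf "%.0f\n", a * v * 0.8 }'
}
```

For example, `amps 1350 120` prints `11.25`, about double the roughly 5.6 A the same load draws at 240 V, and a 20 A breaker gives a budget of `1920` W at 120 V versus `3840` W at 240 V.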

Now if I could just find a diagnostic tool to tell me when a PSU is on its last legs. E.g., this Inline PSU Tester for $590 is pricey, but it looks like the most comprehensive one I've found: https://www.passmark.com/products/inline-psu-tester/index.php
If anyone knows of other brands please share.
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Message 56583 - Posted: 16 Feb 2021, 19:14:55 UTC - in response to Message 56581.  
Last modified: 16 Feb 2021, 19:16:11 UTC

better off with a cheaper card to begin with and save some money.

I guess you haven't been shopping for a GPU lately. Prices are really high. Perfect time to sell cards not buy them.

Best thing I did was switch computers over to 240 V 20 A circuits. The most frequent problem I used to have was the A-phase getting out of balance with the B-phase and tripping one leg of the main breaker. With 240 V both phases are exactly balanced and PSUs run 6-8% more efficiently.

Now if I could just find a diagnostic tool to tell me when a PSU is on its last legs. E.g., this Inline PSU Tester for $590 is pricey, but it looks like the most comprehensive one I've found: https://www.passmark.com/products/inline-psu-tester/index.php
If anyone knows of other brands please share.


I'm aware of the situation. But if you sell a 2080 Ti and re-buy a 2070, you're still left with more money, no? Even at higher prices, everything just shifts up with the market. You're restricting the 2080 Ti so much that it performs similarly to a 2070S, so why have the 2080 Ti? That was my only point.

I agree about 240 V, and I normally run my systems at a remote location on 240 V, but due to renovations I have one system temporarily at my house. But if you're on 240 V, why restrict it so much?

I use the voltage telemetry (via IPMI) to identify when a PSU might be failing.
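One way to act on that telemetry is to compare each rail with its nominal voltage. A sketch, assuming `ipmitool sdr type Voltage` output in the common `name | id | status | entity | value Volts` column layout (columns and sensor names vary by BMC) and the ATX ±5% tolerance for the +12 V, +5 V and +3.3 V rails; `check_rails` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: flag main PSU rails outside +/-5% of nominal.
# Input: lines like "12V | 30h | ok | 7.17 | 12.10 Volts" (e.g. from
# `ipmitool sdr type Voltage`); sensor naming differs between boards.
check_rails() {
  awk -F'|' '{
    name = $1; gsub(/^ +| +$/, "", name)      # trim the sensor name
    split($NF, r, " ")                       # last column, e.g. "12.10 Volts"
    if (r[1] !~ /^[0-9.]+$/) next            # skip "no reading" and similar
    v = r[1] + 0
    if (name ~ /12V/) nom = 12
    else if (name ~ /3\.3V|3V3/) nom = 3.3
    else if (name ~ /5V/) nom = 5
    else next                                # not a rail we know how to judge
    dev = 100 * (v - nom) / nom
    status = (dev > 5 || dev < -5) ? "WARN" : "ok"
    printf "%s %s %.2f V (%+.1f%%)\n", status, name, v, dev
  }'
}
```

Usage: `ipmitool sdr type Voltage | check_rails`; a reading of 4.70 V on a 5V sensor, for instance, is reported as `WARN 5V 4.70 V (-6.0%)`.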
bozz4science

Joined: 22 May 20
Posts: 110
Credit: 115,525,136
RAC: 345
Message 56587 - Posted: 16 Feb 2021, 23:07:56 UTC
Last modified: 16 Feb 2021, 23:10:51 UTC

Are there prolonged CPU-heavy periods where GPU util drops nearly to zero? Otherwise, I probably have an issue with my card. Just ~2 hrs into a new WU and it stalled. BOINC Manager reports steadily increasing processor time since the last checkpoint, but GPU util has been at 0% for nearly 30 min. Is that normal?

Weird: I just suspended/unsuspended it, it jumped back to the latest checkpoint, and immediately the GPU util spiked back to normal levels. I'll see if the same issue comes up again between now and the next checkpoint.
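To catch a stall like this without watching the GPU by hand, utilization can be sampled and a warning printed after a run of idle samples. A sketch, assuming a single GPU at index 0; `watch_stall` and its thresholds are hypothetical, and it only reports, it does not suspend or resume the task:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read one GPU utilization value (%) per line and warn
# once when LIMIT consecutive samples are at or below FLOOR percent.
watch_stall() {                # watch_stall [LIMIT=30] [FLOOR=5]
  local limit=${1:-30} floor=${2:-5} low=0 u
  while read -r u; do
    u=${u//[^0-9]/}            # keep digits only ("98 %" -> "98")
    [[ -z $u ]] && continue
    if (( 10#$u <= floor )); then
      low=$((low + 1))
      if (( low == limit )); then
        echo "stall: $low samples at <= $floor% GPU utilization"
      fi
    else
      low=0                    # utilization recovered; start counting again
    fi
  done
}
```

Usage: `nvidia-smi -i 0 --query-gpu=utilization.gpu --format=csv,noheader,nounits -l 60 | watch_stall 30 5` takes one sample a minute, so it warns after about 30 minutes at or below 5%.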
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Message 56589 - Posted: 17 Feb 2021, 0:17:31 UTC - in response to Message 56587.  

Are there prolonged CPU-heavy periods where GPU util drops nearly to zero? Otherwise, I probably have an issue with my card. Just ~2 hrs into a new WU and it stalled. BOINC Manager reports steadily increasing processor time since the last checkpoint, but GPU util has been at 0% for nearly 30 min. Is that normal?

Weird: I just suspended/unsuspended it, it jumped back to the latest checkpoint, and immediately the GPU util spiked back to normal levels. I'll see if the same issue comes up again between now and the next checkpoint.

Not normal, in my experience. Mine stay pegged at 98% GPU utilization for the entire run.

©2025 Universitat Pompeu Fabra