New D3RBanditTest workunits
Author | Message |
---|---|
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,348,595 RAC: 4,765,598 |
"I'm not seeing many resends, mostly _0 and _1 original tasks." There's no quorum required here, so _1 is a resend. |
Joined: 11 Feb 16 Posts: 2 Credit: 60,774,031 RAC: 0 |
WU: New version of ACEMD 2.11 (cuda101). Name: e16s23_e1s182p0f136-ADRIA_D3RBandit_batch0-0-1-RND8763. Currently at 4 days, 6 hours, with approx. 6 hours left (just in time for the deadline, I hope). GPU: 750 Ti, running continuously, although I have been using the computer for other things too at times. So yes, the current work units are long, much longer than "normal". This post is just for reference. ✌ |
Joined: 11 Feb 16 Posts: 2 Credit: 60,774,031 RAC: 0 |
I agree. Overclocking the GPU is much more likely to cause WU failure. ✌ |
Joined: 8 May 18 Posts: 190 Credit: 104,426,808 RAC: 0 |
Task 27021821 was canceled by server. Why? It was waiting to run. Tullio |
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,348,595 RAC: 4,765,598 |
"Task 27021821 was canceled by server. Why? It was waiting to run." http://www.gpugrid.net/workunit.php?wuid=27021821 Because it was no longer needed. The original host returned the task after the deadline, but before you had processed it. Since a quorum is not required, they only need one result. Letting you process something that already has a result would just be a waste of time. |
Joined: 28 Oct 20 Posts: 2 Credit: 153,096,086 RAC: 0 |
"I have a 1660 super that takes about 34-38 hours on these new units, still seeing temps in the 60s, only uses 15% of gpu in task manager tho" Interesting. My 1660 Ti is completing these tasks in about 25 hours. I would not have thought that there would be that much of a lead in crunching time. You have a 2600 as well, while I'm running a 2200G, so my theory of a slower CPU doesn't seem to hold here. EDIT: Oh hey, this is my first post here. Hi everyone! |
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,348,595 RAC: 4,765,598 |
The 1660 Ti is still faster than a 1660S. You're both on Windows, so your times should be a little more comparable to my 1660S Linux times (27 hrs). His first task actually completed in about 29 hrs, which seems right. The second task took 40 hrs, but I can only speculate about the reason. Maybe he was doing other things on the system that slowed down processing. |
Joined: 12 Jul 17 Posts: 404 Credit: 17,408,899,587 RAC: 2 |
Is it overclocking if you're underpowering??? This script runs 2080 Ti WUs in about 15 hours at 36% less power. First run this command: `sudo nvidia-xconfig --enable-all-gpus --cool-bits=28 --allow-empty-initial-configuration` Then execute this script: `#!/bin/bash` `/usr/bin/nvidia-smi -pm 1` `/usr/bin/nvidia-smi -acp UNRESTRICTED` `/usr/bin/nvidia-smi -i 0 -pl 160` `/usr/bin/nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1" # 0=Adaptive, 1=Prefer Maximum Performance, 2=Auto` `/usr/bin/nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[3]=400" -a "[gpu:0]/GPUGraphicsClockOffset[3]=100"` |
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,348,595 RAC: 4,765,598 |
160 W seems too low for a 2080 Ti, IMO. You're really restricting performance at that point. For reference, my RTX 2070s run in about 17 hr @ 150 W (not far from the performance of your 2080 Ti, but the 2070 is much less expensive). My RTX 2080 Tis run in about 10 hr @ 225 W: 50% faster for 40% more power. |
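The trade-off in the post above can also be read as energy per work unit. The following is a minimal sketch, assuming each card draws its full power limit for the entire run (real draw is often lower, so these are rough upper bounds). The function name `energy_wh` is made up for illustration; the wattages and run times are the approximate figures quoted in the thread.

```shell
#!/bin/bash
# Energy per work unit in watt-hours: power limit (W) x run time (h).
# Assumption: the card sits at its power limit for the whole run.
energy_wh() {
    local watts=$1 hours=$2
    echo $(( watts * hours ))
}

# Figures quoted in the thread (approximate run times).
printf '%-22s %5s %4s %8s\n' "card" "watts" "hrs" "Wh/WU"
printf '%-22s %5d %4d %8d\n' "RTX 2080 Ti @ 160 W" 160 15 "$(energy_wh 160 15)"
printf '%-22s %5d %4d %8d\n' "RTX 2080 Ti @ 225 W" 225 10 "$(energy_wh 225 10)"
printf '%-22s %5d %4d %8d\n' "RTX 2070 @ 150 W"    150 17 "$(energy_wh 150 17)"
```

Under that assumption the 225 W setting comes out at 2250 Wh per WU against 2400 Wh at 160 W, so the lower power limit reduces instantaneous draw (which is what matters for breakers and heat) but not necessarily the energy spent per task.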
Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0 |
"One example: WU #27023500 has been reported by a GTX 750 Ti at one of my hosts in 446,845.54 seconds, more than 4 hours after deadline." This is an interesting opportunity to study how the server handles tardy but completed WUs. My GTX 750 Ti was also allowed to run past the deadline and received equal credit to yours. https://www.gpugrid.net/workunit.php?wuid=27023657 Could it be that the server can detect a WU being actively crunched and stop cancellation? Or is this a normal delay function we see? It had already created another task but had not sent it yet, and marked it as "didn't need" when my host reported. Run time was 408,214.47 sec; CPU time was 406,250.30 sec; GPU clock was 1366 MHz; memory clock was 2833 MHz. With my homespun fan intake mod it ran at 55C with 22C ambient room temp. Just can't kill it. |
Joined: 12 Jul 17 Posts: 404 Credit: 17,408,899,587 RAC: 2 |
"160W seems too low for a 2080ti" My circuit breakers are perfectly optimized. Not to mention heat management. I'm going to have to sit this race out. |
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,348,595 RAC: 4,765,598 |
"One example: WU #27023500 has been reported by a GTX 750 Ti at one of my hosts in 446,845.54 seconds, more than 4 hours after deadline." I think it's probably just first come, first served: whoever returns it first "wins" and the other task gets the axe. Another user reported that his task was cancelled off of his host (but he hadn't started crunching yet). I'm unsure what would happen to a task that was axed after crunching had already started. |
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,348,595 RAC: 4,765,598 |
"160W seems too low for a 2080ti" "My circuit breakers are perfectly optimized. Not to mention heat management. I'm going to have to sit this race out." I understand. I actually have my 6x 2080 Ti host split between 2 breakers to avoid overloading a single 20 A one (there are other systems on one of the circuits). I'm just saying that if you're going to restrict it that far, you might be better off with a cheaper card to begin with and save some money. |
Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0 |
Thanks Zoltan, I gave up and switched that GPU to FAH for now. I have received another WU for the GTX 750 Ti. The send glitch has been remediated. I'm the 5th user to receive this WU iteration. https://www.gpugrid.net/workunit.php?wuid=27024256 It is a batch_1 (the 2nd batch) and the 0_1 refers to it being number 0 of 1, hence it's the only iteration that will exist of this WU. [edit] It just dawned on me that these WUs might be an experiment in having the same host perform all the generations of the model simulation consecutively instead of on different hosts. Or am I way off? |
Joined: 13 Dec 17 Posts: 1416 Credit: 9,119,446,190 RAC: 614,515 |
"I'm not seeing many resends, mostly _0 and _1 original tasks." Uhh, Duh . . forgot where I'm at. |
Joined: 24 Sep 10 Posts: 592 Credit: 11,972,186,510 RAC: 998,578 |
"another user reported that his task was cancelled off of his host (but he hadn't started crunching yet). Unsure what would happen to a task that was axed while crunching had already started." When a resent task has already started processing and the overdue host then reports its result, the resent task is allowed to run to the end, and three scenarios are possible: 1) The deadline for the resent task is reached at the second host. Then, even if it is completed afterwards, it won't receive any credit, because there is already a valid result for it. The server will label it "Completed, too late to validate". 2) What I call a "credit paradox" happens when the resent task finishes at the second host in time for the full or mid bonus. It will still receive the standard credit amount without any bonus, to match the credit already granted to the overdue task. 3) When the resent task finishes after 48 hours but before its deadline, it also receives the standard (no bonus) credit amount. |
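The three scenarios above can be summarized as a small decision function. This is only a sketch of the behaviour as described in the post, not the server's actual logic. The function name `resend_outcome` is hypothetical, the 48-hour bonus boundary comes from the post, and the default 120-hour (5-day) deadline is an assumption.

```shell
#!/bin/bash
# Outcome for a resent task whose original (overdue) host has already
# reported a valid result, following the three scenarios in the post.
# Args: elapsed hours since the resend was sent, deadline in hours.
# The 120 h default deadline is an assumption, not stated in the post.
resend_outcome() {
    local elapsed=$1 deadline=${2:-120}
    if (( elapsed > deadline )); then
        echo "scenario 1: too late to validate, no credit"
    elif (( elapsed <= 48 )); then
        echo "scenario 2: standard credit, bonus not applied"
    else
        echo "scenario 3: standard credit, no bonus"
    fi
}

resend_outcome 30     # finished inside the bonus window
resend_outcome 90     # finished after 48 h, before the deadline
resend_outcome 130    # finished after the deadline
```

In every branch the second host ends up with at most the standard credit, which is the "credit paradox" the post refers to: finishing early earns nothing extra once an overdue result has already been accepted.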
Joined: 12 Jul 17 Posts: 404 Credit: 17,408,899,587 RAC: 2 |
"better off with a cheaper card to begin with and save some money." I guess you haven't been shopping for a GPU lately. Prices are really high. It's the perfect time to sell cards, not buy them. The best thing I did was switch computers over to 240 V 20 A circuits. The most frequent problem I used to have was the A-phase getting out of balance with the B-phase and tripping one leg of the main breaker. With 240 V both phases are exactly balanced and PSUs run 6-8% more efficiently. Now if I could just find a diagnostic tool to tell me when a PSU is on its last legs. E.g., this Inline PSU Tester for $590 is pricey but looks like the most comprehensive I've found: https://www.passmark.com/products/inline-psu-tester/index.php If anyone knows of other brands, please share. |
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,348,595 RAC: 4,765,598 |
"better off with a cheaper card to begin with and save some money." I'm aware of the situation, but if you sell a 2080 Ti and re-buy a 2070, you're still left with more money, no? Even at higher prices, everything just shifts up because of the market. You're restricting the 2080 Ti so much that it performs similarly to a 2070S, so why have the 2080 Ti? That was my only point. I agree about 240 V, and I normally run my systems in a remote location on 240 V, but due to renovations I have one system temporarily at my house. But if you're on 240 V, why restrict it so much? I use the voltage telemetry (via IPMI) to identify when a PSU might be failing. |
Joined: 22 May 20 Posts: 110 Credit: 115,525,136 RAC: 345 |
Are there prolonged CPU-heavy periods where GPU util drops nearly to zero? Otherwise, I probably have an issue with my card. Just ~2 hrs into a new WU and it stalled. BOINC Manager reports steadily increasing processor time since the last checkpoint, but GPU util has been at 0% for nearly 30 min. Is that normal? Weird, I just suspended/unsuspended, it jumped back to the latest checkpoint and immediately the GPU util spiked back to normal levels. I'll see if the same issue comes up again between now and the next checkpoint. |
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,348,595 RAC: 4,765,598 |
"Are there prolonged CPU-heavy periods where GPU util drops nearly to zero? Otherwise, I probably have an issue with my card. Just ~2 hrs into a new WU and it stalled. BOINC Manager reports steadily increasing processor time since the last checkpoint, but GPU util has been at 0% for nearly 30 min. Is that normal?" Not normal in my experience. Mine stay pegged at 98% GPU utilization for the entire run. |
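One way to catch a stall like the one described above is to sample GPU utilization a few times and flag the case where every reading is near zero. The sketch below assumes a Linux host with `nvidia-smi` on the PATH and checks GPU 0 only. The helper name `all_idle`, the 5% threshold, and the sampling interval are arbitrary illustration values, not anything recommended in the thread.

```shell
#!/bin/bash
# Succeed (exit status 0) when every sample is at or below the threshold,
# i.e. the GPU looked idle for the whole sampling window.
# Usage: all_idle THRESHOLD SAMPLE [SAMPLE...]
all_idle() {
    local threshold=$1; shift
    local sample
    (( $# > 0 )) || return 1            # no samples: make no claim
    for sample in "$@"; do
        (( sample > threshold )) && return 1
    done
    return 0
}

# Take three readings from GPU 0, five seconds apart, if nvidia-smi exists.
if command -v nvidia-smi >/dev/null 2>&1; then
    samples=()
    for _ in 1 2 3; do
        reading=$(nvidia-smi -i 0 --query-gpu=utilization.gpu \
                  --format=csv,noheader,nounits 2>/dev/null)
        # Keep only clean integer readings; skip errors or "[N/A]".
        [[ $reading =~ ^[0-9]+$ ]] && samples+=("$reading")
        sleep 5
    done
    if all_idle 5 "${samples[@]}"; then
        echo "GPU 0 looks stalled: ${samples[*]}"
    else
        echo "GPU 0 busy or no usable readings: ${samples[*]}"
    fi
fi
```

A healthy task as described in the reply would produce readings near 98, while a stalled one would show a run of zeros. A longer window than 15 seconds would be needed to rule out short CPU-bound phases.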
©2025 Universitat Pompeu Fabra