New D3RBanditTest workunits

Message boards : News : New D3RBanditTest workunits
Aurum
Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 2
Message 56591 - Posted: 17 Feb 2021, 0:33:18 UTC - in response to Message 56587.  

Are there prolonged CPU-heavy periods where GPU util drops nearly to zero? If not, I probably have an issue with my card. Just ~2 hrs into a new WU and it stalled. BOINC Manager reports steadily increasing processor time since the last checkpoint, but GPU util has been at 0% for nearly 30 min. Is that normal?

Weird, I just suspended/unsuspended, it jumped back to the latest checkpoint and immediately the GPU util spiked back to normal levels. I'll see if the same issue comes up again between now and the next checkpoint.

No, that's not normal. I haven't seen that behavior myself. Are you using an app_config.xml to, say, run 2 WUs on the same GPU or max out the CPUs?
This is mine:
<app_config>
    <app>
        <name>acemd3</name>
        <gpu_versions>
            <cpu_usage>1.00</cpu_usage>
            <gpu_usage>1.00</gpu_usage>
        </gpu_versions>
    </app>
    <project_max_concurrent>4</project_max_concurrent>
</app_config>
Might be something else you're running.
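For anyone unsure what those values do: as far as I understand BOINC's app_config.xml, gpu_usage is the fraction of a GPU that each task reserves, so 1.00 means one task per GPU and 0.50 would let two tasks share a card. A minimal Python sketch of that arithmetic, assuming the config is available as a string (the helper name tasks_per_gpu is made up for this example and is not part of BOINC):

```python
import math
import xml.etree.ElementTree as ET

# Same settings as the app_config.xml quoted in the post.
APP_CONFIG = """
<app_config>
    <app>
        <name>acemd3</name>
        <gpu_versions>
            <cpu_usage>1.00</cpu_usage>
            <gpu_usage>1.00</gpu_usage>
        </gpu_versions>
    </app>
    <project_max_concurrent>4</project_max_concurrent>
</app_config>
"""


def tasks_per_gpu(xml_text):
    """Hypothetical helper: map each app name to how many tasks fit on one GPU."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for app in root.findall("app"):
        name = app.findtext("name")
        usage = float(app.findtext("gpu_versions/gpu_usage"))
        # The small epsilon guards against float noise before rounding down.
        result[name] = math.floor(1.0 / usage + 1e-9)
    return result


print(tasks_per_gpu(APP_CONFIG))
```

With gpu_usage at 1.00 this reports one acemd3 task per GPU; changing the value to 0.50 in the string would report two.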
ID: 56591 · Rating: 0

Aurum
Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 2
Message 56593 - Posted: 17 Feb 2021, 1:11:59 UTC - in response to Message 56583.  

I'm aware of the situation, but if you sell a 2080ti and re-buy a 2070, you're still left with more money, no? Even at higher prices, everything just shifts up because of the market. You're restricting the 2080ti so much that it performs similarly to a 2070S, so why have the 2080ti? That was my only point.
I'm selling off my second string (1070s & 1080s) but keeping my 2080s. Waiting for RTC 3080s with the new design rules but that'll probably be April.
I agree about 240V, and I normally run my systems in a remote location on 240V, but due to renovations, I have one system temporarily at my house. But if you're on 240V, why restrict it so much?
This is funny since you gave me the script in one of the GG threads and I've been grateful since it cut my electric bill by a third. My 100 Amp load center is maxed out. No wiggle room left.
I use the voltage telemetry (via IPMI) to identify when a PSU might be failing.
Sounds good, but Dr Google threw so much stuff at me. Voltage telemetry can be so many different things. The IPMI article seemed to indicate that AMT might be better for me: https://en.wikipedia.org/wiki/Intel_Active_Management_Technology
Most of my PSUs have been running for about 7 years now and starting to die of old age. Hard failures are nice since they're easy to diagnose. It's the flaky ones with intermittent problems. I swap parts between a good computer and a bad actor and sometimes I get lucky and I can convince myself that a PSU is over the hill. I have one now that after a couple of hours randomly stops communicating while remaining powered up. I need to put a head on it and play swapsies.
Does the voltage telemetry you observe give you an unambiguous indication of the demise of a PSU???
ID: 56593 · Rating: 0

Keith Myers
Joined: 13 Dec 17
Posts: 1416
Credit: 9,119,446,190
RAC: 614,515
Message 56594 - Posted: 17 Feb 2021, 2:29:47 UTC

IPMI stands for Intelligent Platform Management Interface.
https://en.wikipedia.org/wiki/Intelligent_Platform_Management_Interface
Always found on server motherboards with a BMC (Baseboard Management Controller).
https://searchnetworking.techtarget.com/definition/baseboard-management-controller

You can look at the voltages coming out of the power supply on the system under load and spot issues with a power supply flaking out or on the edge of stability.

Set warnings about voltage levels and such. Very handy. All done remotely.
ID: 56594 · Rating: 0

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,909,595
RAC: 4,232,576
Message 56595 - Posted: 17 Feb 2021, 4:55:54 UTC - in response to Message 56593.  

As Keith said, IPMI is the remote management interface built into the ASRock Rack and Supermicro motherboards that I use. They have voltage monitoring built in for the most part. It just measures the voltages that it sees at the 24-pin and reports it out over the IPMI interface. I access the telemetry via the dedicated webGUI that is provided at the configurable IP address on the LAN.

Telltale signs of failure are usually sagging voltages, and not always where you expect. I have a PSU that I think is on the way out. It’s a 1200W PSU, only loaded to maybe 600W now, but it’s 10 years old and was previously run pretty hard at the limit, in hot temps, for a long time. I’ve noticed random system restarts and lots of warnings in the IPMI about the 3.3V rail reading below the 3.04V threshold. I’ll need to replace this soon.
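To make the threshold idea concrete, here is a minimal Python sketch of the kind of check a BMC performs. Only the 3.04V limit for the 3.3V rail comes from this post; the 5V and 12V limits are assumptions (ATX nominal minus 5%), the readings are invented, and sagging_rails is a hypothetical helper, not an IPMI call:

```python
# Lower warning limits per rail, in volts.
# 3.04 V for the 3.3 V rail is the threshold mentioned in the post.
# The 5 V and 12 V limits are ASSUMED here as ATX nominal minus 5%.
LOW_LIMITS = {"3.3V": 3.04, "5V": 4.75, "12V": 11.40}


def sagging_rails(readings, limits=LOW_LIMITS):
    """Return the rails whose measured voltage is below its lower limit."""
    return sorted(
        rail
        for rail, volts in readings.items()
        if rail in limits and volts < limits[rail]
    )


# Hypothetical readings, e.g. copied by hand from the BMC web GUI.
sample = {"3.3V": 3.02, "5V": 5.03, "12V": 12.10}
print(sagging_rails(sample))
```

A single low sample is not proof of a dying PSU; repeated warnings under load, as described above, are the more useful signal.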
ID: 56595 · Rating: 0

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,909,595
RAC: 4,232,576
Message 56596 - Posted: 17 Feb 2021, 5:00:07 UTC - in response to Message 56593.  

“RTC 3080s with new design rules” ? Huh?

It’s unfortunate that even existing 30-series cards still don’t work for GPUGRID. The app is holding them back. Right now the only options for 30-series are unoptimized OpenCL projects, FAH, or PrimeGrid (if you think finding big prime numbers is useful?)
ID: 56596 · Rating: 0

Jeffwy
Joined: 15 Jan 12
Posts: 5
Credit: 59,000,574
RAC: 0
Message 56599 - Posted: 17 Feb 2021, 9:49:39 UTC

Seems I did not get one yet, and I have a GeForce RTX 2060
ID: 56599 · Rating: 0

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 1
Message 56600 - Posted: 17 Feb 2021, 12:14:12 UTC - in response to Message 56599.  
Last modified: 17 Feb 2021, 12:14:32 UTC

Seems I did not get one yet, and I have a GeForce RTX 2060
According to the status page of your host, you have two, and you had 7 before.
ID: 56600 · Rating: 0

Jeffwy
Joined: 15 Jan 12
Posts: 5
Credit: 59,000,574
RAC: 0
Message 56602 - Posted: 17 Feb 2021, 12:39:02 UTC - in response to Message 56600.  

Ok, maybe I was looking for a different name than what is there, so what are the names of the two new ones then?
ID: 56602 · Rating: 0

tullio
Joined: 8 May 18
Posts: 190
Credit: 104,426,808
RAC: 0
Message 56603 - Posted: 17 Feb 2021, 13:16:19 UTC
Last modified: 17 Feb 2021, 14:11:02 UTC

GTX 1060 completed in 157,027.38 s
GTX 1650 completed in 175,814.54 s
Both on Windows 10 computers, the first with a Ryzen 5 1400 CPU, the second with an Intel i5 9400F.
Tullio
Sorry, I had exchanged the computers. The GTX 1060 has 3 GB of video RAM and was excluded from running Einstein@home gravitational wave tasks, which needed more than 3 GB. The GTX 1650 has 4 GB of video RAM.
ID: 56603 · Rating: 0

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 1
Message 56605 - Posted: 17 Feb 2021, 15:14:24 UTC - in response to Message 56602.  

Seems I did not get one yet, and I have a GeForce RTX 2060
According to the status page of your host, you have two, and you had 7 before.
Ok, maybe I was looking for a different name than what is there, so what are the names of the two new ones then?
Click on the link in my reply, and you'll see.
ID: 56605 · Rating: 0

Aurum
Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 2
Message 56606 - Posted: 17 Feb 2021, 17:49:12 UTC - in response to Message 56596.  

“RTC 3080s with new design rules” ? Huh?

They're switching some or all from Samsung 8 nm design rules to TSMC 7 nm design rules. Also an RTX 3080 Ti with more memory is expected.
https://hexus.net/tech/news/graphics/146170-nvidia-shift-rtx-30-gpus-tsmc-7nm-2021-says-report/
ID: 56606 · Rating: 0

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,909,595
RAC: 4,232,576
Message 56607 - Posted: 17 Feb 2021, 17:56:08 UTC - in response to Message 56606.  
Last modified: 17 Feb 2021, 17:58:43 UTC

“RTC 3080s with new design rules” ? Huh?

They're switching some or all from Samsung 8 nm design rules to TSMC 7 nm design rules. Also an RTX 3080 Ti with more memory is expected.
https://hexus.net/tech/news/graphics/146170-nvidia-shift-rtx-30-gpus-tsmc-7nm-2021-says-report/


Oh, it was a typo on the RTX/RTC.

They've had those rumors since pretty much launch (note the date on that article lol). Personally I wouldn't hold my breath. If they release anything, expect more paper launches with a few thousand cards available day one, then basically nothing for months again.

But it's all moot for GPUGRID anyway until they get around to making the app compatible with Ampere. There are 5 different Ampere models that I've seen attempted to be used here (A100, 3090, 3080, 3070, 3060 Ti), and the 3060 is set for launch in late February. More models are meaningless for us if we can't use them :(
ID: 56607 · Rating: 0

Aurum
Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 2
Message 56608 - Posted: 17 Feb 2021, 22:35:54 UTC - in response to Message 56607.  

its all moot for GPUGRID anyway until they get around to making the app compatible with Ampere.

I bet they don't have an Ampere GPU to do development and testing on.
ID: 56608 · Rating: 0

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,909,595
RAC: 4,232,576
Message 56609 - Posted: 18 Feb 2021, 2:28:06 UTC - in response to Message 56608.  

The thing is, they don’t even need to. They can just download the new CUDA toolkit, edit a few arguments in the config file or makefile, and re-compile the app as-is. It’s not a whole lot of work, and it’ll work. They basically just need to unlock the new architecture. Right now the app doesn’t even try to run; it fails at the architecture check.
ID: 56609 · Rating: 0

robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 769,991,668
RAC: 0
Message 56611 - Posted: 18 Feb 2021, 5:22:10 UTC - in response to Message 56539.  

I have a dual GPU system, GTX 980 and GTX 1080 Ti, and all of these work units have failed. Drivers are current as of December. I had to roll back the January update as it didn't play nice with Milkyway@Home while you folks were on holiday. Suddenly I can't complete a work unit without error.

Running for how many hours a day? The GTX 1080 Ti should be adequate if you run it 24 hours a day, but I'm not sure if the GTX 980 will be.
ID: 56611 · Rating: 0

Jeffwy
Joined: 15 Jan 12
Posts: 5
Credit: 59,000,574
RAC: 0
Message 56612 - Posted: 18 Feb 2021, 10:42:44 UTC - in response to Message 56605.  

I'm talking about the new test units that were being talked about by ADMIN, not WUs from ACEMD, and if they have similar names then I wouldn't know. But they are most certainly not taking 18 hours to crunch, they are taking the same amount of time to crunch so I think they are the same WUs that I have been getting for six months, not the new ones listed at the top of this thread.
ID: 56612 · Rating: 0

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 1
Message 56613 - Posted: 18 Feb 2021, 12:53:58 UTC - in response to Message 56612.  
Last modified: 18 Feb 2021, 12:56:45 UTC

I'm talking about the new test units that were being talked about by ADMIN,
These workunits are the same as the "test" batch.
not WUs from ACEMD,
ACEMD is the app that processes all of the GPUGrid workunits, regardless of their size.
and if they have similar names then I wouldn't know.
They are named like: e22s16_e2s343p0f9-ADRIA_D3RBandit_batch2-0-1-RND4443_1
But they are most certainly not taking 18 hours to crunch,
They take about 72,800 to 74,000 seconds on your host with an RTX 2060. That is 20h 13m to 20h 33m.
they are taking the same amount of time to crunch so I think they are the same WUs that I have been getting for six months, not the new ones listed at the top of this thread.
They are definitely not the same, as those took less than 2 hours on a similar GPU.
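The conversion above is easy to double-check. A small Python sketch (the function name hms is just for illustration) that turns run times in seconds into hours, minutes and seconds:

```python
def hms(seconds):
    """Format a run time given in seconds as hours, minutes and seconds."""
    total = int(seconds)  # drop the fractional part of a second
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


# Run times quoted for the RTX 2060 host.
print(hms(72800))  # 20h 13m 20s
print(hms(74000))  # 20h 33m 20s
```

The same function can be applied to the GTX 1060 and GTX 1650 times reported earlier in the thread (157,027.38 s and 175,814.54 s).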
ID: 56613 · Rating: 0

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,909,595
RAC: 4,232,576
Message 56614 - Posted: 18 Feb 2021, 13:31:47 UTC - in response to Message 56539.  

I have a dual GPU system, GTX 980 and GTX 1080 Ti, and all of these work units have failed. Drivers are current as of December. I had to roll back the January update as it didn't play nice with Milkyway@Home while you folks were on holiday. Suddenly I can't complete a work unit without error.


They have not all failed. You actually have a few that were submitted fine.

See your tasks for that system here: http://www.gpugrid.net/results.php?hostid=514949

Of your errors:
00:31:16 (11412): wrapper: running acemd3.exe (--boinc input --device 0)
ERROR: src\mdsim\context.cpp line 318: Cannot use a restart file on a different device!

00:28:21 (11224): wrapper: running acemd3.exe (--boinc input --device 0)
ERROR: src\mdsim\context.cpp line 318: Cannot use a restart file on a different device!

00:28:21 (3488): wrapper: running acemd3.exe (--boinc input --device 1)
ERROR: src\mdsim\context.cpp line 318: Cannot use a restart file on a different device!

23:35:51 (3068): wrapper: running acemd3.exe (--boinc input --device 0)
ERROR: src\mdsim\context.cpp line 318: Cannot use a restart file on a different device!

23:39:12 (1400): wrapper: running acemd3.exe (--boinc input --device 1)
ERROR: src\mdsim\context.cpp line 318: Cannot use a restart file on a different device!

23:39:12 (4088): wrapper: running acemd3.exe (--boinc input --device 0)
ERROR: src\mdsim\context.cpp line 318: Cannot use a restart file on a different device!



So it's clear what's causing your issue. You're either routinely starting and stopping BOINC computation, or you have some task switching with other projects going on due to the long run of these tasks, which sometimes results in the process restarting on a different card. It's fairly well known that this will result in an error for GPUGRID tasks. You should increase the time threshold for task switching to longer than the run of these tasks (24 hrs?), and avoid interrupting the computation. Do not turn off the computer or let it go to sleep. I would probably even avoid the use of the "suspend GPU while computer is in use" option in BOINC. Do anything you can to avoid interrupting these very long tasks.
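If you want to check your own task output for the same pattern, here is a rough Python sketch. It assumes the stderr text looks exactly like the lines quoted above (real output may contain other lines or differ between app versions), and failed_restarts is a name made up for this example:

```python
import re

# Patterns taken from the stderr lines quoted in this post.
WRAPPER_RE = re.compile(
    r"wrapper: running acemd3\.exe \(--boinc input --device (\d+)\)"
)
ERROR_TEXT = "Cannot use a restart file on a different device!"


def failed_restarts(stderr_text):
    """Return the device numbers of launches that hit the restart error."""
    devices = []
    last_device = None
    for line in stderr_text.splitlines():
        match = WRAPPER_RE.search(line)
        if match:
            # Remember which device this launch was assigned to.
            last_device = int(match.group(1))
        elif ERROR_TEXT in line and last_device is not None:
            devices.append(last_device)
            last_device = None
    return devices


# Two of the excerpts from above, used as sample input.
sample = """\
00:31:16 (11412): wrapper: running acemd3.exe (--boinc input --device 0)
ERROR: src\\mdsim\\context.cpp line 318: Cannot use a restart file on a different device!
00:28:21 (3488): wrapper: running acemd3.exe (--boinc input --device 1)
ERROR: src\\mdsim\\context.cpp line 318: Cannot use a restart file on a different device!
"""
print(failed_restarts(sample))
```

Any device number in the result marks a launch where the task resumed on a different card than the one that wrote the checkpoint.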

ID: 56614 · Rating: 0

Aurum
Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 2
Message 56615 - Posted: 18 Feb 2021, 16:38:45 UTC - in response to Message 56614.  

you should increase the time threshold for task switching to longer than the run of these tasks (24 hrs?), and avoid interrupting the computation.

Try setting the resource share to zero on the other project you wish to time-slice with. Then it should only send you its WUs when you have no GPUGRID WUs left.
ID: 56615 · Rating: 0

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,909,595
RAC: 4,232,576
Message 56616 - Posted: 18 Feb 2021, 16:46:39 UTC - in response to Message 56615.  

True, I do this.

But some people like to concurrently crunch multiple projects, giving some love to them all, not just prime/backup.
ID: 56616 · Rating: 0

©2025 Universitat Pompeu Fabra