Message boards : News : New D3RBanditTest workunits
Joined: 12 Jul 17 · Posts: 404 · Credit: 17,408,899,587 · RAC: 2
> Are there prolonged CPU-heavy periods where GPU util drops nearly to zero? Otherwise, I'll probably have an issue with my card. Just ~2 hrs into a new WU and it stalled. BOINC Manager reports steadily increasing processor time since the last checkpoint, but GPU util has been at 0% for nearly 30 min. Is that normal?

No, that's not normal. I haven't seen that behavior myself. Are you using an app_config.xml to, say, run 2 WUs on the same GPU or max out the CPUs? This is mine:

```
<app_config>
  <app>
    <name>acemd3</name>
    <gpu_versions>
      <cpu_usage>1.00</cpu_usage>
      <gpu_usage>1.00</gpu_usage>
    </gpu_versions>
  </app>
  <project_max_concurrent>4</project_max_concurrent>
</app_config>
```

Might be something else you're running.
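(Usage note on the snippet above: app_config.xml goes in the project's directory under the BOINC data directory, e.g. projects/www.gpugrid.net/, and BOINC picks up edits via Options → Read config files in BOINC Manager, or on a client restart.)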
Joined: 12 Jul 17 · Posts: 404 · Credit: 17,408,899,587 · RAC: 2
> I'm aware of the situation. but if you sell a 2080ti, and re-buy a 2070, you're still left with more money, no? even at higher prices, everything just shifts up because of the market. You're restricting the 2080ti so much that it performs similarly to a 2070S, so why have the 2080ti? that was my only point.

I'm selling off my second string (1070s & 1080s) but keeping my 2080s. Waiting for RTC 3080s with the new design rules, but that'll probably be April.

> I agree about 240V, and I normally run my systems in a remote location on 240V, but due to renovations, I have one system temporarily at my house. but if you're on 240V, why restrict it so much?

This is funny, since you gave me the script in one of the GG threads and I've been grateful ever since; it cut my electric bill by a third. My 100 Amp load center is maxed out. No wiggle room left.

> I use the voltage telemetry (via IPMI) to identify when a PSU might be failing.

Sounds good, but Dr Google threw so much stuff at me. Voltage telemetry can be so many different things. The IPMI article seemed to indicate that AMT might be better for me: https://en.wikipedia.org/wiki/Intel_Active_Management_Technology. Most of my PSUs have been running for about 7 years now and are starting to die of old age. Hard failures are nice since they're easy to diagnose; it's the flaky ones with intermittent problems that are the headache. I swap parts between a good computer and a bad actor, and sometimes I get lucky and can convince myself that a PSU is over the hill. I have one now that randomly stops communicating after a couple of hours while remaining powered up. I need to put a head on it and play swapsies. Does the voltage telemetry you observe give you an unambiguous indication of the demise of a PSU?
Joined: 13 Dec 17 · Posts: 1416 · Credit: 9,119,446,190 · RAC: 614,515
IPMI stands for Intelligent Platform Management Interface. https://en.wikipedia.org/wiki/Intelligent_Platform_Management_Interface It's always found on server motherboards with a BMC (Baseboard Management Controller). https://searchnetworking.techtarget.com/definition/baseboard-management-controller You can look at the voltages coming out of the power supply while the system is under load and spot a power supply that is flaking out or on the edge of stability. You can also set warnings on voltage levels and such. Very handy, and all done remotely.
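For anyone who'd rather poll those readings from a script than watch a webGUI, here is a minimal sketch, assuming `ipmitool` is installed and the BMC is reachable. The rail names in `NOMINALS` are placeholders to adjust to your board's sensor labels, and the ±5% band is the generic ATX tolerance, not your BMC's configured thresholds:

```python
import subprocess

# Nominal rails checked against the generic ATX +/-5% tolerance.
# These sensor-name keys are assumptions; match them to your board's labels.
NOMINALS = {"3.3V": 3.3, "5V": 5.0, "12V": 12.0}

def read_voltages():
    """Poll the BMC via `ipmitool sensor` and return {sensor_name: volts}."""
    out = subprocess.run(["ipmitool", "sensor"], capture_output=True,
                         text=True, check=True).stdout
    readings = {}
    for line in out.splitlines():
        fields = [f.strip() for f in line.split("|")]
        # ipmitool sensor rows look like: name | value | unit | status | ...
        if len(fields) >= 3 and fields[2] == "Volts":
            try:
                readings[fields[0]] = float(fields[1])
            except ValueError:
                pass  # reading was "na" or similar
    return readings

def check(readings, tolerance=0.05):
    """Print each matched rail and flag values outside the tolerance band."""
    for label, nominal in NOMINALS.items():
        for sensor, value in readings.items():
            if label in sensor:
                lo, hi = nominal * (1 - tolerance), nominal * (1 + tolerance)
                flag = "" if lo <= value <= hi else "  <-- out of spec"
                print(f"{sensor}: {value:.3f} V (expect {lo:.3f}-{hi:.3f}){flag}")

if __name__ == "__main__":
    check(read_voltages())
```

Run from cron every few minutes, a log of this output gives you the sag-over-time trend that the posts below describe watching for.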
Joined: 21 Feb 20 · Posts: 1114 · Credit: 40,838,909,595 · RAC: 4,232,576
As Keith said, IPMI is the remote management interface built into the ASRock Rack and Supermicro motherboards that I use. They have voltage monitoring built in for the most part: the BMC measures the voltages it sees at the 24-pin connector and reports them out over the IPMI interface. I access the telemetry via the dedicated webGUI served at a configurable IP address on the LAN. Telltale signs of failure are usually sagging voltages, and not always where you expect. I have a PSU that I think is on the way out. It's a 1200W unit, only loaded to maybe 600W now, but it's 10 years old and was previously run pretty hard at its limit, in hot temps, for a long time. I've noticed random system restarts, and lots of warnings in the IPMI about the 3.3V rail dropping below the 3.04V threshold. I'll need to replace this soon.
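For context, a quick worked check (assuming the board follows the generic ATX ±5% tolerance on the 3.3 V rail): the spec floor is 3.3 V × 0.95 ≈ 3.14 V, while that 3.04 V alarm threshold sits about 8% below nominal (1 − 3.04/3.3 ≈ 0.079). Readings that trip it are therefore well out of spec, genuine sag rather than sensor noise.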
Joined: 21 Feb 20 · Posts: 1114 · Credit: 40,838,909,595 · RAC: 4,232,576
> “RTC 3080s with new design rules”? Huh?

It's unfortunate that even existing 30-series cards still don't work for GPUGRID; the app is holding them back. Right now the only options for 30-series are unoptimized OpenCL projects, FAH, or PrimeGrid (if you think finding big prime numbers is useful?).
Joined: 15 Jan 12 · Posts: 5 · Credit: 59,000,574 · RAC: 0
Seems I did not get one yet, and I have a GeForce RTX 2060.
Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 1
> Seems I did not get one yet, and I have a GeForce RTX 2060.

According to the status page of your host, you have two, and you had 7 before.
Joined: 15 Jan 12 · Posts: 5 · Credit: 59,000,574 · RAC: 0
Ok, maybe I was looking for a different name than what is there, so what are the names of the two new ones then? |
Joined: 8 May 18 · Posts: 190 · Credit: 104,426,808 · RAC: 0
GTX 1060: completed in 157,027.38 s
GTX 1650: completed in 175,814.54 s

Both on Windows 10 computers, the first with a Ryzen 5 1400 CPU, the second with an Intel i5 9400F.
Tullio

Sorry, I had mixed up the two computers. The GTX 1060 has 3 GB of video RAM and was excluded from running Einstein@home gravitational-wave tasks, which needed more than 3 GB. The GTX 1650 has 4 GB of video RAM.
Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 1
> > > Seems I did not get one yet, and I have a GeForce RTX 2060.

> > According to the status page of your host, you have two, and you had 7 before.

> Ok, maybe I was looking for a different name than what is there, so what are the names of the two new ones then?

Click on the link in my reply, and you'll see.
Joined: 12 Jul 17 · Posts: 404 · Credit: 17,408,899,587 · RAC: 2
> “RTC 3080s with new design rules”? Huh?

They're switching some or all production from Samsung 8 nm design rules to TSMC 7 nm design rules. An RTX 3080 Ti with more memory is also expected. https://hexus.net/tech/news/graphics/146170-nvidia-shift-rtx-30-gpus-tsmc-7nm-2021-says-report/
Joined: 21 Feb 20 · Posts: 1114 · Credit: 40,838,909,595 · RAC: 4,232,576
> “RTC 3080s with new design rules”? Huh?

Oh, it was a typo on the RTX/RTC. They've had those rumors since pretty much launch (note the date on that article, lol). Personally I wouldn't hold my breath. If they release anything, expect more paper launches with a few thousand cards available day one, then basically nothing for months again. But it's all moot for GPUGRID anyway until they get around to making the app compatible with Ampere. There are 5 different Ampere models that I've seen attempted here (A100, 3090, 3080, 3070, 3060 Ti), and the 3060 is set to launch in late February. More models are meaningless for us if we can't use them :(
Joined: 12 Jul 17 · Posts: 404 · Credit: 17,408,899,587 · RAC: 2
> it's all moot for GPUGRID anyway until they get around to making the app compatible with Ampere.

I bet they don't have an Ampere GPU to do development and testing on.
Joined: 21 Feb 20 · Posts: 1114 · Credit: 40,838,909,595 · RAC: 4,232,576
The thing is, they don't even need to. They can just download the new CUDA toolkit, edit a few arguments in the config file or makefile, and re-compile the app as-is. It's not a whole lot of work, and it'll work. They basically just need to unlock the new architecture: Ampere cards report compute capability 8.x, which older CUDA toolkits don't support, so the app currently doesn't even try to run; it fails at the architecture check.
Joined: 16 Apr 09 · Posts: 503 · Credit: 769,991,668 · RAC: 0
> I have a dual GPU system, GTX 980 and GTX 1080 Ti, and all of these work units have failed. Drivers are current as of December. I had to roll back the January update as it didn't play nice with Milkyway@Home while you folks were on holiday. Suddenly I can't complete a work unit without error.

Running for how many hours a day? The GTX 1080 Ti should be adequate if you run it 24 hours a day, but I'm not sure the GTX 980 will be.
Joined: 15 Jan 12 · Posts: 5 · Credit: 59,000,574 · RAC: 0
I'm talking about the new test units that were being talked about by ADMIN, not WUs from ACEMD, and if they have similar names then I wouldn't know. But they are most certainly not taking 18 hours to crunch; they are taking the same amount of time to crunch as before, so I think they are the same WUs that I have been getting for six months, not the new ones listed at the top of this thread.
Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 1
> I'm talking about the new test units that were being talked about by ADMIN,

These workunits are the same as the "test" batch.

> not WUs from ACEMD,

ACEMD is the app that processes all of the GPUGrid workunits, regardless of their size.

> and if they have similar names then I wouldn't know.

They are named like: e22s16_e2s343p0f9-ADRIA_D3RBandit_batch2-0-1-RND4443_1

> But they are most certainly not taking 18 hours to crunch,

They take about 72,800~74,000 seconds on your host with an RTX 2060. That is 20h 13m ~ 20h 33m.

> they are taking the same amount of time to crunch so I think they are the same WUs that I have been getting for six months, not the new ones listed at the top of this thread.

They are definitely not the same, as those took less than 2 hours on a similar GPU.
Joined: 21 Feb 20 · Posts: 1114 · Credit: 40,838,909,595 · RAC: 4,232,576
> I have a dual GPU system, GTX 980 and GTX 1080 Ti, and all of these work units have failed. Drivers are current as of December. I had to roll back the January update as it didn't play nice with Milkyway@Home while you folks were on holiday. Suddenly I can't complete a work unit without error.

They have not all failed; you actually have a few that were submitted fine. See your tasks for that system here: http://www.gpugrid.net/results.php?hostid=514949

From your errors:

```
00:31:16 (11412): wrapper: running acemd3.exe (--boinc input --device 0)
00:28:21 (11224): wrapper: running acemd3.exe (--boinc input --device 0)
00:28:21 (3488): wrapper: running acemd3.exe (--boinc input --device 1)
23:35:51 (3068): wrapper: running acemd3.exe (--boinc input --device 0)
23:39:12 (1400): wrapper: running acemd3.exe (--boinc input --device 1)
23:39:12 (4088): wrapper: running acemd3.exe (--boinc input --device 0)
```

So it's clear what's causing your issue: you're either routinely starting and stopping BOINC computation, or the long run of these tasks is triggering task switching with other projects, which sometimes results in the process restarting on a different card. It's fairly well known that this will result in an error for GPUGRID tasks. You should increase the time threshold for task switching ("Switch between tasks every X minutes" in the computing preferences) to longer than the run of these tasks (24 hrs?), and avoid interrupting the computation. Do not turn off the computer or let it go to sleep. I would probably even avoid the "Suspend GPU computing while computer is in use" option in BOINC. Anything to avoid interrupting these very long tasks.
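If you want to scan a task's saved stderr output for this pattern yourself, here is a minimal sketch. The line format is taken from the wrapper output quoted above; the file path argument is just wherever you saved a copy of the task's stderr:

```python
import re
import sys

# Matches the wrapper restart lines shown above, e.g.
# "00:31:16 (11412): wrapper: running acemd3.exe (--boinc input --device 0)"
LINE = re.compile(r"wrapper: running acemd3\.exe \(.*--device (\d+)\)")

def devices_used(stderr_text):
    """Return the GPU device index for each (re)start of acemd3, in order."""
    return [int(m.group(1)) for m in LINE.finditer(stderr_text)]

if __name__ == "__main__":
    text = open(sys.argv[1]).read()  # e.g. a saved copy of the task's stderr
    devs = devices_used(text)
    print("starts on devices:", devs)
    if len(set(devs)) > 1:
        print("task restarted on a different GPU -- expect an error")
    elif len(devs) > 1:
        print("task restarted, but stayed on the same GPU")
```

More than one line means the task was interrupted and restarted; a change in the device number between lines is the failure mode described above.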
Joined: 12 Jul 17 · Posts: 404 · Credit: 17,408,899,587 · RAC: 2
> you should increase the time threshold for task switching to longer than the run of these tasks (24 hrs?), and avoid interrupting the computation.

Try setting the resource share to zero on the other project you wish to time-slice with. Then it should only send you its WUs when you have no GG WUs left.
Joined: 21 Feb 20 · Posts: 1114 · Credit: 40,838,909,595 · RAC: 4,232,576
True, I do this. But some people like to crunch multiple projects concurrently, giving some love to them all, not just a primary/backup arrangement.