
Message boards : Number crunching : RTX performance on Windows

Author Message
Azmodes
Joined: 7 Jan 17
Posts: 34
Credit: 1,371,429,518
RAC: 0
Message 53713 - Posted: 21 Feb 2020 | 11:55:01 UTC
Last modified: 21 Feb 2020 | 12:09:38 UTC

First of all, it's awesome that there's plenty of work again. I am happily crunching again on my favourite GPU project.

Now, I've noticed two things with the new WUs:

a) Interrupted WUs cannot be continued, I get an instant error. I guess this is nothing new, though.

b) My RTX 2080 can't seem to get maxed out, I'm lucky to make it above 80% (whereas a 1660 Ti and a 1060 3GB in the same system don't have that problem, although the other Turing also usually stays below 90%). I compared some tasks and it seems that it is outperformed by 25% by a 2070 in another host of mine running Linux. I have another 1060 3GB on Linux and it performs very closely to the one on Windows. For the record, SWAN_SYNC is enabled and a full thread reserved for each task.

Any way to fix? Any ideas, comments? Anyone else observe this on their rigs?

EDIT: Maybe move this to the GPU subforum, I dunno.

Keith Myers
Joined: 13 Dec 17
Posts: 1341
Credit: 7,656,423,724
RAC: 13,411,734
Message 53716 - Posted: 21 Feb 2020 | 18:27:59 UTC - in response to Message 53713.

All my hosts run Linux, and the most powerful card I have is an RTX 2080.

But all my currently running tasks show over 99% cpu usage and at least 92% gpu usage as shown in nvidia-smi. I have an app_config setting cpu usage to 1 core to support the gpu tasks.
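For anyone who wants to reproduce the app_config mentioned above, here is a minimal sketch that generates one. It assumes the BOINC app name is "acemd3" (check client_state.xml for the exact name on your host) and that one full CPU core per GPU task is what you want; the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, cpu_usage: float = 1.0, gpu_usage: float = 1.0) -> str:
    """Return the text of a BOINC app_config.xml for a single GPU application.

    app_name is an assumption: it must match the short app name the project uses.
    """
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    # gpu_usage 1.0 means one task per GPU; cpu_usage 1.0 budgets one core per task.
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9 on
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_app_config("acemd3"))
```

The output goes into app_config.xml in the project's folder inside the BOINC data directory; the client picks it up after "Read config files" or a restart.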

Swan sync is no longer used or even enabled for both Windows and Linux hosts running the acemd3 app. Not a factor.

The task runs faster on Linux simply because Linux is, in general, faster than Windows thanks to less overhead in the OS.

Azmodes
Joined: 7 Jan 17
Posts: 34
Credit: 1,371,429,518
RAC: 0
Message 53727 - Posted: 21 Feb 2020 | 22:52:40 UTC

I know Linux is generally faster; however, I don't remember the difference being that stark, and core load was always at least around 90% (admittedly my experience is limited to the old app). Also, as I said, the Pascal card performs more or less the same on Windows.

Right now the RTX is literally running at 70% core load. I tried doing two tasks in tandem, but that didn't contribute anything. Can't we do better? :)

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Message 53730 - Posted: 21 Feb 2020 | 23:28:13 UTC - in response to Message 53713.
Last modified: 21 Feb 2020 | 23:30:27 UTC

a) Interrupted WUs cannot be continued, I get an instant error. I guess this is nothing new, though.
It happens if you have different GPUs in the same system, and the task restarts on a different GPU than it was running before.
To avoid this, first suspend the queued GPUGrid tasks, then suspend the running GPUGrid tasks one by one, noting which GPU becomes idle before you suspend the next running task. After the restart, first resume the task that was running on device 0, then the task that was running on device 1, and so on, and finally the unprocessed tasks.
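The ordering in the procedure above can be sketched in a few lines. This only computes the order in which to click; it does not talk to the BOINC client. It assumes (the post implies this but does not state it) that the client hands out devices in ascending order as tasks are resumed. The task names are hypothetical.

```python
from typing import Dict, List


def suspend_order(running: Dict[str, int], queued: List[str]) -> List[str]:
    """Queued tasks are suspended first, then the running ones, one by one."""
    return list(queued) + sorted(running, key=running.get)


def resume_order(running: Dict[str, int], queued: List[str]) -> List[str]:
    """Resume by the device number noted before suspending, then the queued tasks."""
    return sorted(running, key=running.get) + list(queued)


if __name__ == "__main__":
    # Device numbers as noted down while suspending (hypothetical task names).
    noted = {"task_b": 1, "task_a": 0, "task_c": 2}
    waiting = ["task_d"]
    print("suspend:", suspend_order(noted, waiting))
    print("resume: ", resume_order(noted, waiting))
```

The only thing that matters is that each task gets back the device number it had before the restart; the order among queued tasks is irrelevant.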

b) My RTX 2080 can't seem to get maxed out, I'm lucky to make it above 80% (whereas a 1660 Ti and a 1060 3GB in the same system don't have that problem, although the other Turing also usually stays below 90%).
It's usually the result of an overcommitted CPU. Depending on the other (CPU+GPU) tasks running, it is advisable to reduce the number of simultaneous CPU tasks to the number of CPU cores (50% of CPUs in BOINC Manager / Computing settings), or even fewer. High core count CPUs (AMD Threadripper, Intel i9-9900, AMD Ryzen 9 3950X) can use up their memory bandwidth when many CPU tasks are running simultaneously, even with 4-channel memory. This results in increased runtime of the CPU app and reduced performance of the GPU app.
Reduced PCIe bandwidth can also be the cause of reduced GPUGrid performance. CPUs with 20 PCIe lanes can't provide PCIe 3.0 x16 for all GPUs in a multi-GPU setup.
The other factor is the atom count of the given simulation - we don't have any info on that with the new GPUGrid app - but I think the present batch has low atom count, and it can cause lower GPU utilization on high-end GPUs.
The third factor is the host OS: Linux tends to be faster (however the previous batch was just as fast on Windows). I see 89% GPU (RTX 2080Ti) usage under Windows 10, while 97-98% GPU (RTX 2080Ti) usage under Linux.

For the record, SWAN_SYNC is enabled and a full thread reserved for each task.
SWAN_SYNC is always on for the acemd3 app; the environment variable is ignored.

Azmodes
Joined: 7 Jan 17
Posts: 34
Credit: 1,371,429,518
RAC: 0
Message 53731 - Posted: 21 Feb 2020 | 23:36:46 UTC - in response to Message 53730.
Last modified: 21 Feb 2020 | 23:49:08 UTC

It happens if you have different GPUs in the same system, and the task restarts on a different GPU than it was running before.
To avoid this, first suspend the queued GPUGrid tasks, then suspend the running GPUGrid tasks one by one, noting which GPU becomes idle before you suspend the next running task. After the restart, first resume the task that was running on device 0, then the task that was running on device 1, and so on, and finally the unprocessed tasks.

ah right, gotcha. I remember it resuming on a different GPU being the problem now.

It's usually the result of an overcommitted CPU. Depending on the other (CPU+GPU) tasks running, it is advisable to reduce the number of simultaneous CPU tasks to the number of CPU cores (50% of CPUs in BOINC Manager / Computing settings), or even fewer. High core count CPUs (AMD Threadripper, Intel i9-9900, AMD Ryzen 9 3950X) can use up their memory bandwidth when many CPU tasks are running simultaneously, even with 4-channel memory. This results in increased runtime of the CPU app and reduced performance of the GPU app.

I set cores used to 50% and the RTX load went up by maybe 1-2%. Setting it even lower adds maybe another percent, adding up to a whoppin' 82% for this particular WU. I'm running Rosetta on the CPU right now, which I guess is quite memory intensive, but I was doing more forgiving stuff previously too and GPU usage was the same.

Reduced PCIe bandwidth can also be the cause of reduced GPUGrid performance. CPUs with 20 PCIe lanes can't provide PCIe 3.0 x16 for all GPUs in a multi-GPU setup.

I have PCI-E 2.0, three cards. Bus load is below 10% for the RTX anyway. The 1950X has 64 lanes, apparently?
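The lane question is simple arithmetic, sketched below. It assumes the CPU's lanes are split evenly between the GPUs and that links only come in power-of-two widths up to x16; real motherboards wire their slots in fixed ways and reserve lanes for other devices, so treat the result as an upper bound. The function name is made up for illustration.

```python
def lanes_per_gpu(total_lanes: int, gpu_count: int, max_width: int = 16) -> int:
    """Widest power-of-two link every GPU can get if lanes are shared evenly."""
    share = min(total_lanes // gpu_count, max_width)
    if share < 1:
        return 0
    width = 1
    while width * 2 <= share:
        width *= 2
    return width


if __name__ == "__main__":
    for lanes, gpus in [(20, 1), (20, 2), (64, 3)]:
        print(f"{lanes} lanes, {gpus} GPU(s): x{lanes_per_gpu(lanes, gpus)} each at best")
```

With 20 lanes a second GPU already drops both cards to x8 at best, while 64 lanes leave room for three cards at x16 in principle, which matches the low bus load reported above.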

The other factor is the atom count of the given simulation - we don't have any info on that with the new GPUGrid app - but I think the present batch has low atom count, and it can cause lower GPU utilization on high-end GPUs.
The third factor is the host OS: Linux tends to be faster (however the previous batch was just as fast on Windows). I see 89% GPU (RTX 2080Ti) usage under Windows 10, while 97-98% GPU (RTX 2080Ti) usage under Linux.

I think best I got so far was maybe 85%.

Azmodes
Joined: 7 Jan 17
Posts: 34
Credit: 1,371,429,518
RAC: 0
Message 53743 - Posted: 22 Feb 2020 | 16:18:29 UTC

Hm okay then, now I got a WU that's running around 90% on the RTX.

Nick Name
Joined: 3 Sep 13
Posts: 53
Credit: 1,533,531,731
RAC: 0
Message 53744 - Posted: 22 Feb 2020 | 20:23:38 UTC

I have two nearly identical systems: same CPU, almost identical motherboard, same GPU. The main hardware difference is RAM. One runs Windows 8 and the other Linux, and the Linux box has the lesser-quality RAM. GPU load under Linux is ~90% or higher; under Windows it's ~50-60%, depending on the task. I can get it marginally higher if I completely stop CPU work, but that's not a reasonable trade-off in my opinion. The Windows app just seems to be much less efficient, whatever the reason. This has been the case since the ACEMD3 release.
____________
Team USA forum | Team USA page
Join us and #crunchforcures. We are now also folding:join team ID 236370!

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,206,655,749
RAC: 261,147
Message 53749 - Posted: 23 Feb 2020 | 8:39:26 UTC - in response to Message 53744.

GPU load under Linux is ~90% or higher, under Windows it's ~50-60% depending.
This is way too low. How did you get this readout?
In Windows Task Manager, the main display for GPU usage doesn't show the right value. You should change one of the sub-displays to "CUDA" to get the right one.

Nick Name
Joined: 3 Sep 13
Posts: 53
Credit: 1,533,531,731
RAC: 0
Message 53753 - Posted: 23 Feb 2020 | 19:54:26 UTC - in response to Message 53749.

GPU load under Linux is ~90% or higher, under Windows it's ~50-60% depending.
This is way too low. How did you get this readout?
In Windows Task Manager, the main display for GPU usage doesn't show the right value. You should change one of the sub-displays to "CUDA" to get the right one.

System Information Viewer and Nvidia Inspector. I've verified these values with Nvidia-SMI too. There is quite a bit of variance in these jobs: since my previous post I've seen a job or two using ~80% of the card, but that's not usual. I've tried running two at a time, but it didn't really improve run times.
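For comparing numbers between hosts, the nvidia-smi readout mentioned above can be collected the same way on Windows and Linux. A minimal sketch, assuming nvidia-smi is on the PATH and reports a numeric utilization (some cards report "[N/A]", which this parser does not handle); the function names are made up for illustration.

```python
import subprocess
from typing import List, Tuple

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_utilization(csv_text: str) -> List[Tuple[int, str, int]]:
    """Turn 'index, name, utilization' lines into (index, name, percent) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        parts = [field.strip() for field in line.split(",")]
        # Index is the first field, utilization the last; the rest is the name.
        rows.append((int(parts[0]), ", ".join(parts[1:-1]), int(parts[-1])))
    return rows


def read_utilization() -> List[Tuple[int, str, int]]:
    """Run nvidia-smi once and return the parsed readout."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_utilization(result.stdout)
```

A single sample is noisy; calling read_utilization() every few seconds over a whole task and averaging gives a fairer comparison than a glance at one value.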

Azmodes
Joined: 7 Jan 17
Posts: 34
Credit: 1,371,429,518
RAC: 0
Message 53755 - Posted: 23 Feb 2020 | 22:58:36 UTC - in response to Message 53731.

I have PCI-E 2.0

Err, nevermind me, it's 3.0, durr

ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,780,262,024
RAC: 21,535,136
Message 54031 - Posted: 24 Mar 2020 | 21:14:38 UTC - in response to Message 53730.

a) Interrupted WUs cannot be continued, I get an instant error. I guess this is nothing new, though.

It happens if you have different GPUs in the same system, and the task restarts on a different GPU than it was running before.
To avoid this, first suspend the queued GPUGrid tasks, then suspend the running GPUGrid tasks one by one, noting which GPU becomes idle before you suspend the next running task. After the restart, first resume the task that was running on device 0, then the task that was running on device 1, and so on, and finally the unprocessed tasks.

Thank you very much, Retvari Zoltan.
Your ingenious method works fine for me on this triple mixed graphics card system.
I keep it in mind every time I reboot the system for any reason.

It is a known problem with the wrapper-based ACEMD3 tasks, already announced by Toni in a previous post:
Can I use it on multi-GPU systems?

In general yes, with one caveat: if you have DIFFERENT types of NVIDIA GPUs in the same PC, suspending a job in one and restarting it in the other will NOT be possible (errors on restart). Consider restricting the client to one GPU type only ("exclude_gpu", see here).
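As a rough illustration of the "exclude_gpu" option, the sketch below generates a cc_config.xml fragment that keeps the project off selected devices. The project URL and the device numbers are assumptions: use the URL exactly as your client lists it, and the device numbers from the startup messages in the event log. The function name is made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def build_cc_config(project_url: str, excluded_devices: Iterable[int]) -> str:
    """Return cc_config.xml text with one <exclude_gpu> entry per device number."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for device_num in excluded_devices:
        entry = ET.SubElement(options, "exclude_gpu")
        ET.SubElement(entry, "url").text = project_url
        ET.SubElement(entry, "device_num").text = str(device_num)
        ET.SubElement(entry, "type").text = "NVIDIA"
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9 on
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Assumed example: keep the project off devices 1 and 2 so it only uses device 0.
    print(build_cc_config("https://www.gpugrid.net/", [1, 2]))
```

If you already have a cc_config.xml, merge the exclude_gpu entries into its existing options section rather than overwriting the file, then restart the client.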
