New workunits
Author | Message |
---|---|
Joined: 4 Aug 14 Posts: 266 Credit: 2,219,935,054 RAC: 0 |
Driver updates complete, and 1 of my 2 GTX750ti has already received a task; it's running well. Good news! What I noticed, also on the other hosts (GTX980ti and GTX970), is that the GPU usage (as shown in NVIDIA Inspector and GPU-Z) is now up to 99% most of the time; this was not the case before, most probably due to the WDDM "brake" in Win7 and Win10 (it was at 99% in WinXP, which had no WDDM). The ACEMD3 performance is impressive. Toni did indicate that the performance using the wrapper would be better (here: http://gpugrid.net/forum_thread.php?id=4935&nowrap=true#51939), and he is right! Toni (and GPUgrid team) set out with a vision to make the app more portable and faster. They have delivered. Thank you Toni (and GPUgrid team). |
Joined: 30 Jun 14 Posts: 153 Credit: 129,654,684 RAC: 0 |
http://www.gpugrid.net/result.php?resultid=21502590 Crashed and burned after reaching 2% or more. Memory leaks reported. Updated my drivers and have another task in the queue. |
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 960 |
"Toni (and GPUgrid team) set out with a vision to make the app more portable and faster. They have delivered. Thank you Toni (and GPUgrid team)." +1 |
Joined: 4 Aug 14 Posts: 266 Credit: 2,219,935,054 RAC: 0 |
http://www.gpugrid.net/result.php?resultid=21502590 The memory leak messages do appear on startup and are probably not critical errors. The issue in your case is that ACEMD3 tasks cannot start on one GPU and be resumed on another. From your stderr output: ..... `04:26:56 (8564): wrapper: running acemd3.exe (--boinc input --device 0)` ..... `06:08:12 (16628): wrapper: running acemd3.exe (--boinc input --device 1)` `ERROR: src\mdsim\context.cpp line 322: Cannot use a restart file on a different device!` The task was started on Device 0 but failed when it was resumed on Device 1. Refer to this FAQ post by Toni for further clarification: http://www.gpugrid.net/forum_thread.php?id=5002 |
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
Thanks to all! To summarize some responses to the feedback above: * GPU occupation is high (100% on my Linux machine) * %/day is not an indication of performance because WU size differs between WU types * Minimum required drivers, failures on notebook cards: see the FAQ - thanks to those posting the links * Tasks apparently stuck: this may be an impression caused by the % being rounded (e.g. an 8 h task divided into 100 one-percent steps shows no apparent progress for minutes at a time) * "Memory leaks": ignore the message, it's always there. The actual error, if present, is at the top. |
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 960 |
Toni, since the new app is an obvious success - now the inevitable question: when will you send out the next batch of tasks? |
Joined: 4 Aug 14 Posts: 266 Credit: 2,219,935,054 RAC: 0 |
Hi Toni. Re: '"Memory leaks": ignore the message, it's always there. The actual error, if present, is at the top.' I am not seeing the error at the top; am I missing it? All I find is the generic wrapper error message stating there is an error in the client task. The task error is buried in the stderr output. Can the task error be passed to the wrapper error code? |
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
@rod4x4 Which error? The inability to resume on a different card is a known limitation; please see the FAQ. |
Joined: 16 Jan 17 Posts: 8 Credit: 27,984,427 RAC: 0 |
WAITING FOR WU's |
Joined: 30 Jun 14 Posts: 153 Credit: 129,654,684 RAC: 0 |
Oh, interesting. Then I guess I have to write a script to keep all your tasks on the 1050. That's my better GPU anyway. |
Joined: 30 Jun 14 Posts: 153 Credit: 129,654,684 RAC: 0 |
Why is CPU usage so high? I expect GPU usage to be high, but CPU? One thread is running at between 85 and 100+% CPU. |
Joined: 3 Jun 10 Posts: 4 Credit: 2,175,081,911 RAC: 37,696 |
The test is already finished; no errors on my 1050 Tis or on my 1080 Ti. |
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
"oh interesting." See the FAQ; you can restrict the usable GPUs. |
Joined: 13 Dec 17 Posts: 1416 Credit: 9,119,446,190 RAC: 678,713 |
http://www.gpugrid.net/result.php?resultid=21502590 To avoid a task stopping on one type of card and then attempting to finish on another, change the computing preference "switch between tasks every xx minutes" to a value larger than the default of 60. Choose a value that allows the task to finish on your slowest card. I suggest 360-640 minutes, depending on your hardware. |
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
I'm looking for confirmation that the app works on Windows machines with more than one device. I'm seeing some of these: `7:33:28 (10748): wrapper: running acemd3.exe (--boinc input --device 2)` `# Engine failed: Illegal value for DeviceIndex: 2` |
Joined: 13 Dec 17 Posts: 1416 Credit: 9,119,446,190 RAC: 678,713 |
"Why is CPU usage so high?" Because that is what the GPU application and wrapper require. The science application is faster and, because of the higher GPU utilization, needs a constant supply of data fed to it by the CPU thread. The tasks finish in 1/3 to 1/2 of the time that the old acemd2 app needed. |
Joined: 13 Dec 17 Posts: 1416 Credit: 9,119,446,190 RAC: 678,713 |
"Thanks to all! To summarize some responses of the feedback above:" Toni, new features are available for CUDA-MEMCHECK in CUDA 10.2. The CUDA-MEMCHECK tool seems useful. It can be called against the application with: `cuda-memcheck [memcheck_options] app_name [app_options]` https://docs.nvidia.com/cuda/cuda-memcheck/index.html#memcheck-tool |
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 960 |
"I'm looking for a confirmation that the app works on windows machine with > 1 device. I'm seeing some" In one of my hosts I have 2 GTX980Ti. However, I have excluded one of them from GPUGRID via cc_config.xml since one of its fans became defective, so I guess this host does not help with your request. At any rate, the other GPU processes the new app perfectly. |
Joined: 30 Jun 14 Posts: 153 Credit: 129,654,684 RAC: 0 |
http://www.gpugrid.net/result.php?resultid=21502590 It is already set to 360, since I also run LHC ATLAS, which does not like to be disturbed and usually finishes in 6 hrs. I added a cc_config file to force your project to use just the 1050. I will double-check my placement a bit later. |
Joined: 12 Jul 17 Posts: 404 Credit: 17,408,899,587 RAC: 2 |
The %Progress keeps resetting to zero on 2080 Ti's but seems normal on 1080 Ti's. |
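
The restart failure discussed earlier in the thread (a task started on device 0 and then resumed on device 1) can be spotted mechanically in a task's stderr output. What follows is only a minimal Python sketch and not part of GPUGRID or BOINC: the names `devices_used` and `resumed_on_other_device` are hypothetical, and it assumes the wrapper launch lines look exactly like the ones quoted in the thread.

```python
import re

# Matches wrapper launch lines of the form quoted in the thread, e.g.
#   04:26:56 (8564): wrapper: running acemd3.exe (--boinc input --device 0)
# Assumption: the device index always follows "--device" inside the parentheses.
LAUNCH_RE = re.compile(r"wrapper: running \S+ \(.*?--device (\d+)\)")


def devices_used(stderr_text):
    """Return the device index of every wrapper launch, in order of appearance."""
    return [int(m.group(1)) for m in LAUNCH_RE.finditer(stderr_text)]


def resumed_on_other_device(stderr_text):
    """Return True if the task was launched on more than one distinct device."""
    return len(set(devices_used(stderr_text))) > 1


# Sample lines copied from the stderr output quoted in the thread.
SAMPLE = """\
04:26:56 (8564): wrapper: running acemd3.exe (--boinc input --device 0)
06:08:12 (16628): wrapper: running acemd3.exe (--boinc input --device 1)
ERROR: src\\mdsim\\context.cpp line 322: Cannot use a restart file on a different device!
"""

if __name__ == "__main__":
    print(devices_used(SAMPLE))             # device index per launch
    print(resumed_on_other_device(SAMPLE))  # whether the device changed
```

For the sample taken from the thread, the launches are on devices 0 and 1, so the check reports a device switch. The remedies remain the ones suggested by the posters: restrict the project to one GPU or lengthen the task-switch interval.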
©2025 Universitat Pompeu Fabra