New workunits

rod4x4

Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level: Phe
Message 53034 - Posted: 22 Nov 2019, 8:48:13 UTC - in response to Message 53032.  
Last modified: 22 Nov 2019, 8:53:13 UTC

Driver updates are complete, and one of my two GTX 750 Ti cards has already received a task; it is running well.

Good News!

What I noticed, also on the other hosts (GTX 980 Ti and GTX 970), is that GPU usage (as shown in NVIDIA Inspector and GPU-Z) is now up to 99% most of the time. This was not the case before, most probably due to the WDDM "brake" in Win7 and Win10 (it was at 99% in WinXP, which had no WDDM).
The difference is noticeable; the new software seems to have overcome this problem.

The ACEMD3 performance is impressive. Toni did indicate that performance using the wrapper would be better (here:
http://gpugrid.net/forum_thread.php?id=4935&nowrap=true#51939) ... and he is right!
Toni (and the GPUgrid team) set out with a vision to make the app more portable and faster. They have delivered. Thank you, Toni (and the GPUgrid team).
ID: 53034
Greg _BE

Joined: 30 Jun 14
Posts: 153
Credit: 129,654,684
RAC: 0
Level: Cys
Message 53036 - Posted: 22 Nov 2019, 8:52:30 UTC

http://www.gpugrid.net/result.php?resultid=21502590

Crashed and burned after getting just past 2%.
Memory leaks.

Updated my drivers and have another task in queue.
ID: 53036
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 960
Level: Trp
Message 53037 - Posted: 22 Nov 2019, 9:00:41 UTC - in response to Message 53034.  

Toni (and the GPUgrid team) set out with a vision to make the app more portable and faster. They have delivered. Thank you, Toni (and the GPUgrid team).

+ 1
ID: 53037
rod4x4

Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level: Phe
Message 53038 - Posted: 22 Nov 2019, 9:03:05 UTC - in response to Message 53036.  
Last modified: 22 Nov 2019, 9:06:17 UTC

http://www.gpugrid.net/result.php?resultid=21502590

Crashed and burned after getting just past 2%.
Memory leaks.

Updated my drivers and have another task in queue.


The memory-leak messages do appear at startup; they are probably not critical errors.

The issue in your case is that ACEMD3 tasks cannot start on one GPU and be resumed on another.

From your STDerr Output:
.....
04:26:56 (8564): wrapper: running acemd3.exe (--boinc input --device 0)
.....
06:08:12 (16628): wrapper: running acemd3.exe (--boinc input --device 1)
ERROR: src\mdsim\context.cpp line 322: Cannot use a restart file on a different device!


The task was started on device 0 but failed when it was resumed on device 1.

Refer to this FAQ post by Toni for further clarification:
http://www.gpugrid.net/forum_thread.php?id=5002
ID: 53038
Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level: Ser
Message 53039 - Posted: 22 Nov 2019, 9:20:03 UTC - in response to Message 53038.  
Last modified: 22 Nov 2019, 9:21:18 UTC

Thanks to all! To summarize some responses to the feedback above:

* GPU occupation is high (100% on my Linux machine).
* %/day is not an indication of performance, because WU size differs between WU types.
* Minimum required drivers, and failures on notebook cards: see the FAQ - thanks to those posting the links.
* Tasks apparently stuck: this may be an impression caused by the progress % being rounded (e.g. an 8-hour task reported in 1% steps advances only about every 5 minutes, so there is no apparent progress for minutes at a time).
* "Memory leaks": ignore the message; it's always there. The actual error, if present, is at the top.
ID: 53039
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 960
Level: Trp
Message 53040 - Posted: 22 Nov 2019, 9:30:14 UTC

Toni, since the new app is an obvious success, the inevitable question: when will you send out the next batch of tasks?
ID: 53040
rod4x4

Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level: Phe
Message 53041 - Posted: 22 Nov 2019, 9:44:45 UTC - in response to Message 53039.  

Hi Toni

"Memory leaks": ignore the message, it's always there. The actual error, if present, is at the top.

I am not seeing the error at the top; am I missing it? All I find is the generic wrapper error message stating there is an error in the client task.
The task error is buried in the stderr output.
Can the task error be passed through as the wrapper's error code?
ID: 53041
Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level: Ser
Message 53042 - Posted: 22 Nov 2019, 11:04:23 UTC - in response to Message 53041.  

@rod4x4 which error? The no-resume-on-a-different-card behavior is known; please see the FAQ.
ID: 53042
San-Fernando-Valley

Joined: 16 Jan 17
Posts: 8
Credit: 27,984,427
RAC: 0
Level: Val
Message 53043 - Posted: 22 Nov 2019, 11:42:23 UTC

WAITING FOR WUs
ID: 53043
Greg _BE

Joined: 30 Jun 14
Posts: 153
Credit: 129,654,684
RAC: 0
Level: Cys
Message 53044 - Posted: 22 Nov 2019, 13:26:53 UTC - in response to Message 53038.  

Oh, interesting.
Then I guess I have to write a script to keep all your tasks on the 1050.
That's my better GPU anyway.
ID: 53044
Greg _BE

Joined: 30 Jun 14
Posts: 153
Credit: 129,654,684
RAC: 0
Level: Cys
Message 53045 - Posted: 22 Nov 2019, 13:28:46 UTC - in response to Message 53039.  

Why is CPU usage so high?
I expect GPU usage to be high, but CPU?
One thread is running at 85-100% of a CPU core.
ID: 53045
jp de malo

Joined: 3 Jun 10
Posts: 4
Credit: 2,175,081,911
RAC: 37,696
Level: Phe
Message 53046 - Posted: 22 Nov 2019, 14:05:37 UTC - in response to Message 53043.  
Last modified: 22 Nov 2019, 14:06:39 UTC

The test is already over; no errors on my 1050 Tis or on my 1080 Ti.
ID: 53046
Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level: Ser
Message 53047 - Posted: 22 Nov 2019, 14:09:04 UTC - in response to Message 53044.  

Oh, interesting.
Then I guess I have to write a script to keep all your tasks on the 1050.
That's my better GPU anyway.


See the FAQ; you can restrict which GPUs the client uses.
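For example, a minimal cc_config.xml sketch using the BOINC client's exclude_gpu option; the device number is illustrative and should be adjusted to the GPU you want excluded:

<cc_config>
   <options>
      <exclude_gpu>
         <url>http://www.gpugrid.net/</url>
         <!-- illustrative: exclude device 1 so GPUGRID tasks stay on device 0 -->
         <device_num>1</device_num>
      </exclude_gpu>
   </options>
</cc_config>

Restart the client (or re-read the config files) after editing for the exclusion to take effect.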
ID: 53047
Keith Myers

Joined: 13 Dec 17
Posts: 1416
Credit: 9,119,446,190
RAC: 678,713
Level: Tyr
Message 53048 - Posted: 22 Nov 2019, 14:32:35 UTC - in response to Message 53038.  

The issue in your case is that ACEMD3 tasks cannot start on one GPU and be resumed on another. [...] The task was started on device 0 but failed when it was resumed on device 1.

You can avoid a task being stopped on one type of card and resumed on another by raising the "switch between tasks every X minutes" value in your compute preferences above the default 60. Choose a value that allows a task to finish on your slowest card; I suggest 360-640 minutes, depending on your hardware.
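For reference, a minimal global_prefs_override.xml sketch (placed in the BOINC data directory, then re-read from the manager); as far as I know, cpu_scheduling_period_minutes is the preference behind "switch between tasks every X minutes":

<global_prefs_override>
   <!-- allow up to 6 hours before BOINC may switch to another task -->
   <cpu_scheduling_period_minutes>360</cpu_scheduling_period_minutes>
</global_prefs_override>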
ID: 53048
Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level: Ser
Message 53049 - Posted: 22 Nov 2019, 14:36:02 UTC

I'm looking for confirmation that the app works on Windows machines with more than one device. I'm seeing errors like:
7:33:28 (10748): wrapper: running acemd3.exe (--boinc input --device 2)
# Engine failed: Illegal value for DeviceIndex: 2
ID: 53049
Keith Myers

Joined: 13 Dec 17
Posts: 1416
Credit: 9,119,446,190
RAC: 678,713
Level: Tyr
Message 53051 - Posted: 22 Nov 2019, 14:48:22 UTC - in response to Message 53045.  

Why is CPU usage so high?
I expect GPU usage to be high, but CPU?
One thread is running at 85-100% of a CPU core.

Because that is what the GPU application and wrapper require. The science application is faster and, because of the higher GPU utilization, needs a constant supply of data fed to it by the CPU thread. Tasks finish in one third to one half of the time the old ACEMD2 app needed.
ID: 53051
Keith Myers

Joined: 13 Dec 17
Posts: 1416
Credit: 9,119,446,190
RAC: 678,713
Level: Tyr
Message 53052 - Posted: 22 Nov 2019, 15:14:44 UTC - in response to Message 53039.  

"Memory leaks": ignore the message; it's always there. The actual error, if present, is at the top.

Toni, new features are available for CUDA-MEMCHECK in CUDA 10.2, and the tool seems useful. It can be run against the application with:
cuda-memcheck [memcheck_options] app_name [app_options]

https://docs.nvidia.com/cuda/cuda-memcheck/index.html#memcheck-tool
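For instance, a hypothetical invocation borrowing the binary name and arguments visible in the wrapper log lines above, together with memcheck's documented leak-check option (illustrative only, not how the project actually runs the app):

cuda-memcheck --leak-check full acemd3.exe --boinc input --device 0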
ID: 53052
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 960
Level: Trp
Message 53053 - Posted: 22 Nov 2019, 15:25:17 UTC - in response to Message 53049.  

I'm looking for confirmation that the app works on Windows machines with more than one device. I'm seeing errors like:
7:33:28 (10748): wrapper: running acemd3.exe (--boinc input --device 2)
# Engine failed: Illegal value for DeviceIndex: 2


One of my hosts has two GTX 980 Tis; however, I have excluded one of them from GPUGRID via cc_config.xml since one of its fans became defective. I guess this does not matter with regard to your request.
At any rate, the other GPU processes the new app perfectly.
ID: 53053
Greg _BE

Joined: 30 Jun 14
Posts: 153
Credit: 129,654,684
RAC: 0
Level: Cys
Message 53054 - Posted: 22 Nov 2019, 15:55:46 UTC - in response to Message 53048.  

You can avoid a task being stopped on one type of card and resumed on another by raising the "switch between tasks every X minutes" value in your compute preferences above the default 60. Choose a value that allows a task to finish on your slowest card; I suggest 360-640 minutes, depending on your hardware.


It is already set to 360, since I also run LHC ATLAS, which does not like to be disturbed and usually finishes in 6 hours.

I added a cc_config.xml file to force your project to use just the 1050; I will double-check its placement a bit later.
ID: 53054
Aurum

Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 2
Level: Trp
Message 53055 - Posted: 22 Nov 2019, 18:57:10 UTC

The % progress keeps resetting to zero on 2080 Tis, but it seems normal on 1080 Tis.
ID: 53055