Windows GPU Applications broken

Message boards : News : Windows GPU Applications broken

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 50126 - Posted: 28 Jul 2018, 13:15:15 UTC - in response to Message 50125.  
Last modified: 28 Jul 2018, 13:16:14 UTC

Unfortunately, the rsc_fpops values can't be changed once a task has been created. I'm waiting for the newly created tasks to bring the flops estimate back to normal; after that, the old tasks should work as well.
ID: 50126 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile bcavnaugh

Joined: 8 Nov 13
Posts: 56
Credit: 1,002,640,163
RAC: 0
Message 50128 - Posted: 28 Jul 2018, 15:46:44 UTC
Last modified: 28 Jul 2018, 15:47:21 UTC

Thanks, Retvari Zoltan, for your fix; for the most part it worked for me.
Could the cause of this (197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED) have been the new version of the client software, 7.12.1?
I noticed that after the update the client ran benchmarks right away, and I do not recall older versions doing this.

Crunching@EVGA The Number One Team in the BOINC Community. Folding@EVGA The Number One Team in the Folding@Home Community.
ID: 50128 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
jjch

Joined: 10 Nov 13
Posts: 101
Credit: 15,773,211,122
RAC: 0
Message 50129 - Posted: 28 Jul 2018, 16:48:26 UTC

I have noticed that work units have been coming through since the 27th; however, they all seem to be failing. See this sample system: https://www.gpugrid.net/results.php?hostid=176801

It is not clear to me whether all the Windows GPU systems need intervention or whether the problem will sort itself out eventually. These machines are all at a remote location and I no longer have remote access. It will be a few days until I can get to them.
ID: 50129 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Message 50130 - Posted: 28 Jul 2018, 17:20:58 UTC - in response to Message 50128.  

> Could the cause of this (197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED) been the new version of the Client Software 7.12.1?

That's a firm NO.

I am currently closely involved in the preparation, testing, and releasing of new client versions. The new client was released well before this problem arose, and (in this respect) the new client works exactly the same as previous ones, going back several releases.

We've now got a pretty clear handle on the release of GPUGrid application 9.22 as the culprit, though I will still test my own machines as I start each of them back up (which will happen after the next transfusion of coffee - only just got back home).
ID: 50130 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Message 50131 - Posted: 28 Jul 2018, 18:35:40 UTC

OK, I've started the first.

Got (project) flops of 461,290,595,930 - 461 gigaflops. That's still too high (this machine had 243 GF for the previous version), but it's in the right ballpark and I'll let it run.
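
For anyone following the arithmetic, here is a minimal sketch of how the speed estimate feeds into the time figures. It assumes the client estimates runtime as rsc_fpops_est / flops and aborts a task with EXIT_TIME_LIMIT_EXCEEDED once elapsed time exceeds rsc_fpops_bound / flops. The rsc_fpops values are borrowed from the acemdlong workunit posted in Message 50209 and may not match the tasks on this machine; the function names are purely illustrative.

```python
# Sketch only: assumed relationship between the client's speed estimate
# (flops) and the two time figures derived from a workunit.

def estimated_runtime_s(rsc_fpops_est: float, flops: float) -> float:
    """Estimated runtime in seconds (assumed: rsc_fpops_est / flops)."""
    return rsc_fpops_est / flops


def time_limit_s(rsc_fpops_bound: float, flops: float) -> float:
    """Elapsed time after which the task is aborted (assumed: rsc_fpops_bound / flops)."""
    return rsc_fpops_bound / flops


# Values as posted in Message 50209; assumed to apply here for illustration.
RSC_FPOPS_EST = 5e15
RSC_FPOPS_BOUND = 2.5e17

FLOPS_NOW = 461_290_595_930   # the 461 GF reported above
FLOPS_PREVIOUS = 243e9        # the 243 GF reported for the previous version

for label, flops in (("now", FLOPS_NOW), ("previous", FLOPS_PREVIOUS)):
    est_h = estimated_runtime_s(RSC_FPOPS_EST, flops) / 3600
    limit_d = time_limit_s(RSC_FPOPS_BOUND, flops) / 86400
    print(f"{label}: estimate {est_h:.1f} h, limit {limit_d:.1f} days")
```

Under these assumptions, 461 GF gives an estimate of about 3.0 hours and a limit of about 6.3 days, while 243 GF gives about 5.7 hours and 11.9 days. A speed estimate that is far too high shrinks both figures, which is how a healthy task can hit the time limit.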
ID: 50131 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Message 50136 - Posted: 29 Jul 2018, 9:56:37 UTC

All my machines have now completed tasks without error and without manual intervention. I think we're out of the woods.

Host 477287 is useful. I left that running throughout: you can see that the 'time before error' slowly increased from 17Ksec to 23Ksec as the speed estimate normalised. The task which completed successfully would have crashed after about 90Ksec, but that was more than enough.

I have a slight concern about the short queue task it's working on now, which is running at a very erratic speed. But that could be the app, the task, or the machine. I'll keep an eye on it.
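
The 'time before error' figures can be read as a proxy for the speed estimate. A rough sketch, assuming the limit is rsc_fpops_bound / flops and that every task on the host carries the same rsc_fpops_bound (neither is confirmed here); the helper name is made up for illustration:

```python
def relative_speed_estimate(earlier_limit_s: float, later_limit_s: float) -> float:
    """Return the later speed estimate as a fraction of the earlier one.

    With limit = bound / flops and a constant bound, flops is proportional
    to 1 / limit, so flops_later / flops_earlier = earlier_limit / later_limit.
    """
    return earlier_limit_s / later_limit_s


# Limit grew from 17 Ksec to 23 Ksec while tasks were still failing.
print(f"{relative_speed_estimate(17_000, 23_000):.2f}")
# Limit of about 90 Ksec for the task that finally completed.
print(f"{relative_speed_estimate(17_000, 90_000):.2f}")
```

Under those assumptions the speed estimate had fallen to roughly 74% of its starting value by the time the limit reached 23 Ksec, and to roughly 19% for the task that completed.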
ID: 50136 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile bcavnaugh

Joined: 8 Nov 13
Posts: 56
Credit: 1,002,640,163
RAC: 0
Message 50209 - Posted: 5 Aug 2018, 3:10:21 UTC

It seems that we now need to remove the settings or reset the project.
Tasks are showing 9 days to complete even though they take less than 3 hours.
This is no longer needed:
<workunit>
<name>e17s86_e4s46p0f53-PABLO_2IDP_P01106_4_LEUP23P_IDP-0-1-RND2735</name>
<app_name>acemdlong</app_name>
<version_num>922</version_num>
<rsc_fpops_est>5000000000000000.000000</rsc_fpops_est>
<rsc_fpops_bound>250000000000000000.000000</rsc_fpops_bound>
<rsc_memory_bound>300000000.000000</rsc_memory_bound>
<rsc_disk_bound>4000000000.000000</rsc_disk_bound>
<file_ref>
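
To check these fields without reading them by eye, here is a minimal Python sketch that pulls the rsc_* numbers out of a fragment like the one above. It uses a regular expression rather than an XML parser because the fragment as posted is cut off at <file_ref> and is not well-formed. The function name is illustrative only.

```python
import re

# The workunit fragment exactly as posted (truncated at <file_ref>).
FRAGMENT = """<workunit>
<name>e17s86_e4s46p0f53-PABLO_2IDP_P01106_4_LEUP23P_IDP-0-1-RND2735</name>
<app_name>acemdlong</app_name>
<version_num>922</version_num>
<rsc_fpops_est>5000000000000000.000000</rsc_fpops_est>
<rsc_fpops_bound>250000000000000000.000000</rsc_fpops_bound>
<rsc_memory_bound>300000000.000000</rsc_memory_bound>
<rsc_disk_bound>4000000000.000000</rsc_disk_bound>
<file_ref>"""


def read_rsc_fields(fragment: str) -> dict:
    """Collect every numeric <rsc_*> element into a name -> float mapping."""
    pairs = re.findall(r"<(rsc_\w+)>([0-9.]+)</\1>", fragment)
    return {name: float(value) for name, value in pairs}


fields = read_rsc_fields(FRAGMENT)
bound_to_est = fields["rsc_fpops_bound"] / fields["rsc_fpops_est"]
print(fields)
print(f"bound / est = {bound_to_est:.0f}")
```

For the values shown, the bound is 50 times the estimate.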

Crunching@EVGA The Number One Team in the BOINC Community. Folding@EVGA The Number One Team in the Folding@Home Community.
ID: 50209 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
[AF>FAH-Addict.net]toTOW

Joined: 28 Oct 10
Posts: 9
Credit: 25,781,299
RAC: 0
Message 50211 - Posted: 5 Aug 2018, 8:42:45 UTC

All my GPU WUs are failing: -226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS

No more details in the log :(
Stderr output

<core_client_version>7.12.1</core_client_version>
<![CDATA[
<message>
too many exit(0)s</message>
]]>


Any ideas ?
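
A side note on reading these codes: the hexadecimal value in brackets is the same exit code viewed as an unsigned integer (two's complement), so -226 and 0xffffffffffffff1e are one number. A small sketch, with helper names invented for illustration:

```python
def as_unsigned_hex(code: int, bits: int) -> str:
    """Show a signed exit code as unsigned hex of the given width."""
    return hex(code & ((1 << bits) - 1))


def as_signed(value: int, bits: int) -> int:
    """Interpret an unsigned value of the given width as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


print(as_unsigned_hex(-226, 64))   # ERR_TOO_MANY_EXITS as reported here
print(as_unsigned_hex(-80, 32))    # the "unknown error" reported elsewhere in this thread
print(as_unsigned_hex(197, 32))    # EXIT_TIME_LIMIT_EXCEEDED
print(as_signed(0xFFFFFFB0, 32))
```

The width differs between reports in this thread (64-bit for -226, 32-bit for -80), which is why the number of leading f digits varies.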
ID: 50211 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
[AF>FAH-Addict.net]toTOW

Joined: 28 Oct 10
Posts: 9
Credit: 25,781,299
RAC: 0
Message 50214 - Posted: 6 Aug 2018, 12:36:23 UTC

If I try to run the app directly from a slot directory, I get this:
D:\BOINC\data\slots\9>acemd-922-80.exe
# ACEMD Molecular Dynamics Version [3212]
# CUDA Synchronisation mode: BLOCKING
# CUDA Synchronisation mode: BLOCKING
# SWAN: Cannot create context 0 on GPU 0 : [999]
# Could not create GPU contexts.
# SWAN swan_assert 0


Any ideas ?
ID: 50214 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
tullio

Joined: 8 May 18
Posts: 190
Credit: 104,426,808
RAC: 0
Message 50215 - Posted: 6 Aug 2018, 12:59:19 UTC

I am glad not to be the only one. This is what I was getting.
Stderr output

<core_client_version>7.12.1</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -80 (0xffffffb0)</message>
<stderr_txt>
# GPU [GeForce GTX 1050 Ti] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 0 :
# Name : GeForce GTX 1050 Ti
# ECC : Disabled
# Global mem : 4096MB
# Capability : 6.1
# PCI ID : 0000:01:00.0
# Device clock : 1392MHz
# Memory clock : 3504MHz
# Memory width : 128bit
# Driver version : r397_05 : 39764
# GPU 0 : 64C
# GPU 0 : 67C
# GPU 0 : 68C
# GPU 0 : 70C
# GPU 0 : 72C
# GPU 0 : 73C
# GPU 0 : 74C
# GPU 0 : 75C
# GPU 0 : 76C
# GPU 0 : 77C
# GPU 0 : 78C
# The simulation has become unstable. Terminating to avoid lock-up (1)
# Attempting restart (step 1755000)
called boinc_finish

</stderr_txt>
]]>
ID: 50215 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 1
Message 50216 - Posted: 6 Aug 2018, 15:48:07 UTC - in response to Message 50215.  

> # GPU 0 : 78C
> # The simulation has become unstable. Terminating to avoid lock-up (1)
> # Attempting restart (step 1755000)
> called boinc_finish

This message usually indicates that the GPU clocks are too high and/or the GPU is running too hot (yes, 78°C can be too high).
You should use a third-party GPU monitoring tool (such as MSI Afterburner) to:
1. increase the GPU fan speed,
2. reduce the power target of your GPU,
3. reduce the GPU clock frequency.
This error message has nothing to do with the new Windows app.
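
To see how far the temperature climbed before the failure, the readings can be pulled straight out of the stderr text. A minimal sketch, assuming the "# GPU n : xxC" line format shown in the quoted output; the function name and the excerpt variable are illustrative:

```python
import re

# Excerpt of the stderr output posted in Message 50215.
STDERR_EXCERPT = """\
# Driver version : r397_05 : 39764
# GPU 0 : 64C
# GPU 0 : 67C
# GPU 0 : 68C
# GPU 0 : 70C
# GPU 0 : 72C
# GPU 0 : 73C
# GPU 0 : 74C
# GPU 0 : 75C
# GPU 0 : 76C
# GPU 0 : 77C
# GPU 0 : 78C
# The simulation has become unstable. Terminating to avoid lock-up (1)
# Attempting restart (step 1755000)
"""


def gpu_temperatures(stderr_text: str, gpu_index: int = 0) -> list:
    """Return the temperature readings (in degrees C) logged for one GPU, in order."""
    pattern = rf"^# GPU {gpu_index} : (\d+)C$"
    return [int(t) for t in re.findall(pattern, stderr_text, re.MULTILINE)]


temps = gpu_temperatures(STDERR_EXCERPT)
print(f"first {temps[0]} C, peak {max(temps)} C, rise {max(temps) - temps[0]} C")
```

For this excerpt the card went from 64 C to a peak of 78 C just before the simulation became unstable.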
ID: 50216 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 1
Message 50217 - Posted: 6 Aug 2018, 15:51:26 UTC - in response to Message 50211.  
Last modified: 6 Aug 2018, 15:51:45 UTC

> All my GPU WUs are failing : -226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS
>
> No more details in log :( :
> Stderr output
>
> <core_client_version>7.12.1</core_client_version>
> <![CDATA[
> <message>
> too many exit(0)s</message>
> ]]>
>
> Any ideas ?

Have you installed BOINC Manager in "protected application execution" mode (that is, as a system service)?
If so, you should uninstall it and reinstall it without this setting.
ID: 50217 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
tullio

Joined: 8 May 18
Posts: 190
Credit: 104,426,808
RAC: 0
Message 50218 - Posted: 6 Aug 2018, 16:09:24 UTC - in response to Message 50216.  

> You should use some 3rd party GPU monitoring software (like MSI Afterburner) to:
> 1. increase the GPU fan speed,
> 2. reduce the power target of your GPU
> 3. reduce GPU clock frequency.
> This error message has nothing to do with the new Windows app.

The same app on my SUSE Linux box with a GTX 750 Ti board runs at 62 C.
Tullio
ID: 50218 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
[AF>FAH-Addict.net]toTOW

Joined: 28 Oct 10
Posts: 9
Credit: 25,781,299
RAC: 0
Message 50219 - Posted: 7 Aug 2018, 9:11:52 UTC - in response to Message 50217.  

>> All my GPU WUs are failing : -226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS
>>
>> No more details in log :( :
>> Stderr output
>>
>> <core_client_version>7.12.1</core_client_version>
>> <![CDATA[
>> <message>
>> too many exit(0)s</message>
>> ]]>
>>
>> Any ideas ?
>
> Have you installed BOINC manager in "protected application execution" mode? (as a system service?)
> If you did so, you should uninstall it, and reinstall without this setting.

No ... other projects are working fine.

See my post after the one you quoted with the real error ...
ID: 50219 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
tullio

Joined: 8 May 18
Posts: 190
Credit: 104,426,808
RAC: 0
Message 50221 - Posted: 7 Aug 2018, 13:58:12 UTC - in response to Message 50216.  


> 1. increase the GPU fan speed,
> 2. reduce the power target of your GPU
> 3. reduce GPU clock frequency.
> This error message has nothing to do with the new Windows app.

The same GPU board runs SETI@home GPU tasks at 71 C, fan speed 50%, clock 1695 MHz and no error.
Tullio
ID: 50221 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 1
Message 50222 - Posted: 7 Aug 2018, 14:56:21 UTC - in response to Message 50221.  
Last modified: 7 Aug 2018, 14:56:50 UTC

> The same GPU board runs SETI@home GPU tasks at 71 C, fan speed 50%, clock 1695 MHz and no error.

That's irrelevant. The GPUGrid app is much harder on GPUs than other apps, partly because it's based on CUDA 8.0, while the other apps are based on earlier CUDA versions.
ID: 50222 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Keith Myers
Joined: 13 Dec 17
Posts: 1416
Credit: 9,119,446,190
RAC: 614,515
Message 50223 - Posted: 7 Aug 2018, 20:08:52 UTC

Not necessarily true. The SETI Linux CUDA9 app runs GPUs a lot harder than the stock OpenCL application, and I don't see more than 62 °C on my air-cooled cards.
ID: 50223 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
tullio

Joined: 8 May 18
Posts: 190
Credit: 104,426,808
RAC: 0
Message 50224 - Posted: 8 Aug 2018, 2:32:05 UTC - in response to Message 50222.  

The SETI@home GPU tasks run on opencl_nvidia_SoG
ID: 50224 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Zalster
Joined: 26 Feb 14
Posts: 211
Credit: 4,496,324,562
RAC: 0
Message 50225 - Posted: 8 Aug 2018, 2:40:50 UTC - in response to Message 50224.  
Last modified: 8 Aug 2018, 2:41:09 UTC

That's OpenCL.

SoG is just the name of the app.
ID: 50225 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Keith Myers
Joined: 13 Dec 17
Posts: 1416
Credit: 9,119,446,190
RAC: 614,515
Message 50226 - Posted: 8 Aug 2018, 7:21:41 UTC

You don't have to run the stock SoG Linux apps at SETI. Most Linux users run the CUDA8 or CUDA9 GPU apps, which are about 10 times faster.
ID: 50226 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote

©2025 Universitat Pompeu Fabra