all WUs downloaded recently produce "computation error" right away

klepel

Joined: 23 Dec 09
Posts: 189
Credit: 4,798,881,008
RAC: 0
Level
Arg
Message 46905 - Posted: 15 Apr 2017, 12:53:54 UTC

Which driver version is necessary, and which driver version is safe?
ID: 46905 · Rating: 0
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Level
Trp
Message 46908 - Posted: 15 Apr 2017, 13:40:30 UTC - in response to Message 46904.  

... updating drivers should do it

which might be impossible, or at least risky, in the case of Windows XP. Zoltan, what's your opinion on this?
ID: 46908 · Rating: 0
John C MacAlister

Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Level
Cys
Message 46909 - Posted: 15 Apr 2017, 14:54:36 UTC
Last modified: 15 Apr 2017, 14:55:05 UTC

My drivers are locked to the versions that came with the devices: changing drivers causes failures.

I will return as soon as the current system issues have been resolved as I believe GPUGrid performs valuable work.

Now I am off to Folding and WCG.....
John
ID: 46909 · Rating: 0
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 46910 - Posted: 15 Apr 2017, 16:25:10 UTC - in response to Message 46904.  
Last modified: 15 Apr 2017, 16:55:56 UTC

For a more correct solution we will have to wait for Matt to update the old app next week. In the meanwhile as I said updating drivers should do it


What the crap, Stefan? :) I'm already using the latest drivers! My failures are on Windows 10, using 381.65 and 381.78. Please provide more details on what drivers you think should work, and also why failures still happen on 381.65 and 381.78.

Edit: I'm not 100% sure that I've been able to attempt a task using 381.78 yet.
ID: 46910 · Rating: 0
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Level
Trp
Message 46911 - Posted: 15 Apr 2017, 16:33:40 UTC - in response to Message 46910.  
Last modified: 15 Apr 2017, 16:35:11 UTC

... My failures are on Windows 10, using 381.65 and 381.78. Please provide more details on what drivers you think should work, and also why failures still happen on 381.65 and 381.78.

I was just going to ask here whether anyone had already tried the latest drivers; your post answers my question, although in the negative.
So Matt's assumption that the latest drivers should solve the current problem unfortunately seems to be wrong :-(
ID: 46911 · Rating: 0
MJH

Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 46912 - Posted: 15 Apr 2017, 19:23:49 UTC - in response to Message 46856.  

The problem should now be fixed for anyone with a CUDA 8-capable driver.


Matt
ID: 46912 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 318
Level
Trp
Message 46914 - Posted: 15 Apr 2017, 19:56:30 UTC - in response to Message 46912.  

I see you've deprecated v8.48 completely, but left v9.15 (superficially - as far as we can see) unchanged. I couldn't get it to work earlier, but I'll try again within the hour - test machine is busy with another project just at the moment.
ID: 46914 · Rating: 0
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Level
Trp
Message 46915 - Posted: 15 Apr 2017, 19:59:34 UTC - in response to Message 46912.  

The problem should now be fixed for anyone with a CUDA 8-capable driver.

Matt

which means that for Windows XP users, the problem is NOT solved yet, right?
When will this be the case?
ID: 46915 · Rating: 0
MJH

Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 46916 - Posted: 15 Apr 2017, 20:18:40 UTC - in response to Message 46915.  

I've changed the rules for issuing the 915 version. Any Windows machine that is 64 bit and reports CUDA 8.0 capability will get it now.

Matt
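
For illustration, the issuing rule described in this post can be read as a simple predicate. This is a minimal sketch only: the `Host` fields and the `gets_app_915` function are hypothetical names, and the real BOINC scheduler configuration is not shown in this thread.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical summary of what a host reports to the scheduler."""
    os_name: str          # e.g. "Windows 7" or "Windows XP"
    is_64bit: bool
    cuda_version: tuple   # CUDA capability reported by the driver, e.g. (8, 0)


def gets_app_915(host: Host) -> bool:
    """Return True if the host matches the rule quoted in the post:
    a Windows machine that is 64 bit and reports CUDA 8.0 capability."""
    return (
        host.os_name.startswith("Windows")
        and host.is_64bit
        and host.cuda_version >= (8, 0)
    )
```

A rule like this only decides which hosts are sent the app. As later posts in the thread show, a host can satisfy it and still fail to run the tasks.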
ID: 46916 · Rating: 0
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Level
Trp
Message 46917 - Posted: 15 Apr 2017, 20:20:32 UTC - in response to Message 46916.  
Last modified: 15 Apr 2017, 20:25:54 UTC

I've changed the rules for issuing the 915 version. Any Windows machine that is 64 bit and reports CUDA 8.0 capability will get it now.
Matt

So which steps will be taken next to enable older drivers for XP to work?

My XP with driver 368.81 did download version 915, the task did start, but was broken off after a few minutes with "too many exit(0)s"
ID: 46917 · Rating: 0
MJH

Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 46919 - Posted: 15 Apr 2017, 20:31:20 UTC - in response to Message 46917.  


So which steps will be taken next to enable older drivers for XP to work?

My XP with driver 368.81 did download version 915, the task did start, but was broken off after a few minutes with "too many exit(0)s"


I was expecting it to work on 64-bit XP, actually. Given that it doesn't, there's not a tremendous amount I can do to fix it immediately.

We haven't had an XP test platform for a long time: Microsoft ended support for it 3 years ago! You really should upgrade...

Matt
ID: 46919 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 318
Level
Trp
Message 46920 - Posted: 15 Apr 2017, 20:59:17 UTC

OK, let's put XP to bed - I think it's a red herring in this case.

I have two - well, three - identical Windows 7/64 machines, each with GTX 970 GPUs.

Two tests - first, with an older CUDA 7.0 driver: no tasks available, no tasks sent. That's the right answer after deprecating v8.48.

Second, the one which I upgraded with a CUDA 8.0 driver earlier today (specifically, 368.81). Task was sent, and along with it the v9.15 application - again, as intended. So far so good.

BUT - as reported earlier in this thread (but I appreciate you wouldn't want to read through the entire thing on a holiday Saturday), v9.15 isn't running on my Maxwell cards with the current batch of tasks. (It runs fine on a Pascal card in another machine)

Symptoms are:

Under BOINC, repeated iterations of

Task e4s7_e2s3p0f357-ADRIA_FOLDGREED10_crystal_ss_contacts_100_ubiquitin_4-1-2-RND7142_0 exited with zero status but no 'finished' file

until BOINC kills the task with the 'Too many exits' after 100 tries - exactly the message Erich got under XP. No difference between the OS versions - this difference applies to the hardware (different generations of GPU). It seems to have changed with this new batch of tasks, since the initial test release a week ago.

Running standalone in a terminal window, I get

D:\BOINCdata\slots\0>acemd.915-80
# ACEMD Molecular Dynamics Version [3212]
# CUDA Synchronisation mode: BLOCKING
# CUDA Synchronisation mode: BLOCKING
# SWAN: Created context 0 on GPU 0
SWAN : FATAL : Cuda driver error 35 in file 'swanlibnv2.cpp' in line 448.
# SWAN swan_assert 0

- that's the only diagnostic I've been able to capture. Nothing is written to the output or stderr files.

Test task is 16240262 - I'll let it run through its 100 exits and report it as soon as I've posted this, so you can compare my Windows 7 output with Erich's XP.
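
The BOINC symptom described above can be spotted in a saved copy of the event log by counting the repeated exit messages per task. The following is a rough sketch, not part of BOINC itself: the function names are made up for illustration, the message wording is taken from the line quoted in this post, and the limit of 100 is the figure reported in this thread.

```python
import re
from collections import Counter

# Message text as quoted in this thread; other BOINC versions may word it differently.
EXIT_PATTERN = re.compile(
    r"Task (?P<task>\S+) exited with zero status but no 'finished' file"
)

# Limit reported in this thread; treated here as an assumption, not a documented constant.
MAX_EXITS = 100


def count_zero_exits(log_lines):
    """Count 'exited with zero status' messages per task name."""
    counts = Counter()
    for line in log_lines:
        match = EXIT_PATTERN.search(line)
        if match:
            counts[match.group("task")] += 1
    return counts


def tasks_at_limit(log_lines, limit=MAX_EXITS):
    """Return the sorted names of tasks whose exit count reached the limit."""
    counts = count_zero_exits(log_lines)
    return sorted(task for task, n in counts.items() if n >= limit)
```

Usage would be along the lines of `tasks_at_limit(open("stdoutdae.txt"))`, assuming the event log has been saved to a text file of that name.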
ID: 46920 · Rating: 0
Retvari Zoltan

Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Level
Trp
Message 46921 - Posted: 15 Apr 2017, 21:02:31 UTC - in response to Message 46919.  


So which steps will be taken next to enable older drivers for XP to work?

My XP with driver 368.81 did download version 915, the task did start, but was broken off after a few minutes with "too many exit(0)s"


I was expecting it to work on 64-bit XP, actually. Given that it doesn't, there's not a tremendous amount I can do to fix it immediately.

We haven't had an XP test platform for a long time: Microsoft ended support for it 3 years ago! You really should upgrade...

Matt
Matt,
It's a bit off-topic, but let me explain:
These Windows XP x64 hosts are dedicated crunching boxes (so it does not matter that their OS is no longer supported). A lot of effort has been put into making the GTX 980Ti work under Windows XP: selecting the right MB, "hacking" the NV driver to recognize the top-end cards, etc. The reason *not* to upgrade them from Windows XP is to maximize their throughput (avoiding WDDM). The other path to achieve this is to use Linux, but you haven't put the SWAN_SYNC option into the latest Linux client (as far as my tests show, but please correct me if I'm wrong), which hinders the performance of the top-end cards under Linux too. So you could motivate us to use Linux instead of the deprecated Windows XP by putting that option into the Linux client; it could also increase the performance of the top-end cards by 10~15% under Linux. But for now, if you could make a fresh CUDA 6.5 client, that would be great (and it would save us a lot of work).
Thank you in advance!
ID: 46921 · Rating: 0
MJH

Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 46922 - Posted: 15 Apr 2017, 21:12:36 UTC - in response to Message 46920.  

Richard,

That error means "insufficient driver version".
According to the records that machine is running Windows 7 64b, not XP. Why are you running that driver version?

Matt
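
As a small aid for anyone capturing the same standalone output, the fatal line posted earlier in the thread can be picked apart with a regular expression. This is a sketch under stated assumptions: the pattern is modelled on that single quoted line, `parse_swan_fatal` is a hypothetical helper, and the only code-to-meaning mapping included is the one given in this post (35, "insufficient driver version").

```python
import re

# Pattern modelled on the one FATAL line quoted in this thread.
SWAN_FATAL = re.compile(
    r"SWAN\s*:\s*FATAL\s*:\s*Cuda driver error (?P<code>\d+) "
    r"in file '(?P<file>[^']+)' in line (?P<line>\d+)"
)

# Only the meaning stated in this thread is listed; other codes are left unknown.
KNOWN_CODES = {35: "insufficient driver version"}


def parse_swan_fatal(text):
    """Extract the error code, source file and line from acemd output.

    Returns None when no SWAN FATAL line is found.
    """
    match = SWAN_FATAL.search(text)
    if match is None:
        return None
    code = int(match.group("code"))
    return {
        "code": code,
        "file": match.group("file"),
        "line": int(match.group("line")),
        "meaning": KNOWN_CODES.get(code, "unknown"),
    }
```

Because nothing is written to the output or stderr files in this failure mode, the text would have to come from a terminal capture of the standalone run.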
ID: 46922 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 318
Level
Trp
Message 46924 - Posted: 15 Apr 2017, 21:21:24 UTC - in response to Message 46922.  
Last modified: 15 Apr 2017, 21:25:31 UTC

I was running the same CUDA 7.0 driver version on all machines until this morning - I upgraded this morning for testing only.

My experience is that each successive driver release is slower for general purpose computing (generalisation - YMMV). Since I'm not a gamer, I don't need or want the latest game patches. I just picked one that was the last in its particular sequence, so more likely to be stable and bug-free.

We have the benefit of Jacob Klein reporting into this thread as well (see message 46910) - he does test the latest drivers for the benefit of the wider BOINC community, and has persuaded NVidia to fix several bugs over the years. He reports the same as me.

Edit - Your Pascal app release post (message 44869) says simply "NVIDIA Driver 360+" - I thought I'd aimed high enough above that?
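
A stated minimum such as "360+" can be checked against an installed driver version by comparing the numbers as integer pairs rather than as strings or floats. The sketch below assumes driver versions of the form `major.minor` as they appear in this thread; both function names are hypothetical.

```python
def parse_driver_version(text):
    """Turn a version string such as '368.81' or '360' into (major, minor)."""
    major, _, minor = text.strip().partition(".")
    return (int(major), int(minor or 0))


def meets_minimum(driver, minimum):
    """Return True if the driver version is at least the stated minimum."""
    return parse_driver_version(driver) >= parse_driver_version(minimum)
```

For example, `meets_minimum("368.81", "360")` is true. Passing such a check only shows that the driver satisfies the stated requirement; as this thread shows, it does not guarantee that the app will run.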
ID: 46924 · Rating: 0
MJH

Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 46925 - Posted: 15 Apr 2017, 21:25:38 UTC - in response to Message 46924.  

Jacob was testing before I'd changed the issuing rules for 915 - he never even got the app to test, let alone saw any failures.

If you could try a later version I'd appreciate it.
ID: 46925 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 318
Level
Trp
Message 46926 - Posted: 15 Apr 2017, 21:27:53 UTC - in response to Message 46925.  
Last modified: 15 Apr 2017, 21:30:37 UTC

Sure, anything I can do to help. Supper has just beeped in the microwave, but I'll download while I eat, and install later.

Edit - 381.65 on its way.
ID: 46926 · Rating: 0
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 46927 - Posted: 15 Apr 2017, 21:29:48 UTC
Last modified: 15 Apr 2017, 21:38:09 UTC

MJH:

Can you please explain:
1) What caused the problem?
2) What solved the problem?
3) Why were "updated drivers" previously recommended as a solution?
ID: 46927 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 318
Level
Trp
Message 46928 - Posted: 15 Apr 2017, 21:33:10 UTC - in response to Message 46927.  

Specify *which* problem, please.

Version not downloading - server configuration (plan class specification, I suspect), fixed.

Current set of tasks failing on older cards - not fixed, under exploration.
ID: 46928 · Rating: 0
MJH

Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 46929 - Posted: 15 Apr 2017, 21:35:38 UTC - in response to Message 46927.  


1) What caused the problem?

The executables that we deploy expire after a year or so due to licensing issues.

2) What solved the problem?

I've reconfigured the scheduler to send the 915 app (supporting Kepler+) to all 64-bit hosts that report CUDA 8 support. This seems not to work on Windows XP, despite the last 368 driver seemingly reporting CUDA 8 support.

Matt
ID: 46929 · Rating: 0

©2025 Universitat Pompeu Fabra