
Message boards : Number crunching : all WUs downloaded recently produce "computation error" right away

Erich56
Joined: 1 Jan 15
Posts: 346
Credit: 1,460,919,427
RAC: 2,683,758
Level
Met
Message 46856 - Posted: 14 Apr 2017 | 14:51:32 UTC

This has been happening for about 20 minutes:
after earlier jobs were finished and uploaded, every newly downloaded one fails right at the beginning with "computation error" (runtime: 0; CPU time: 0), regardless of what type of WU it is.

The stderr output of these WUs always looks like this:

<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -44 (0xffffffd4)
</message>
]]>


Thus, my daily quota was used up within a short time :-(

Any idea what the problem is?

Bedrich Hajek
Joined: 28 Mar 09
Posts: 332
Credit: 3,760,508,309
RAC: 374,795
Level
Arg
Message 46857 - Posted: 14 Apr 2017 | 15:02:43 UTC - in response to Message 46856.
Last modified: 14 Apr 2017 | 15:10:01 UTC

Same here, all units are failing with this error message:


Stderr output

<core_client_version>7.6.33</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -44 (0xffffffd4)
</message>
]]>


This is happening on my Windows 10 computer. I disconnected my other computer from the network, so there are no errors on that one right now.


This same error is happening to units downloaded earlier that were previously running well, once the host contacts the server.

PappaLitto
Joined: 21 Mar 16
Posts: 241
Credit: 973,621,906
RAC: 3,418,769
Level
Glu
Message 46858 - Posted: 14 Apr 2017 | 15:04:44 UTC
Last modified: 14 Apr 2017 | 15:06:43 UTC

Same here: http://www.gpugrid.net/workunit.php?wuid=12499116 5 WUs failed on one computer; it has probably hit the daily quota and is disabled now.

Erich56
Joined: 1 Jan 15
Posts: 346
Credit: 1,460,919,427
RAC: 2,683,758
Level
Met
Message 46859 - Posted: 14 Apr 2017 | 15:17:56 UTC

thanks, folks, for the quick replies.
So there seems to be something wrong in general with the WUs that are being downloaded currently.

This happens at a very bad time, as I suspect that due to the Easter holidays there will be no one available at GPUGRID to rectify the problem :-(

Profile [PUGLIA] kidkidkid3
Joined: 23 Feb 11
Posts: 48
Credit: 331,401,967
RAC: 180,472
Level
Asp
Message 46860 - Posted: 14 Apr 2017 | 15:22:32 UTC - in response to Message 46858.

Hi,
same here .... (unknown error) - exit code -44 (0xffffffd4)... continuously!
No WUs now ... because the computer has used up its daily quota of 3 tasks!
After the damage ... now the trick!
Please suspend/cancel this daily quota, thanks in advance.
K.

____________
Dreams do not always come true. But not because they are too big or impossible. Why did we stop believing.
(Martin Luther King)

Finrond
Joined: 26 Jun 12
Posts: 5
Credit: 311,739,014
RAC: 36
Level
Asp
Message 46861 - Posted: 14 Apr 2017 | 15:33:24 UTC - in response to Message 46860.

Same thing here, quota maxed.

Hans Sveen
Joined: 29 Oct 08
Posts: 2
Credit: 267,083,504
RAC: 190,501
Level
Asn
Message 46862 - Posted: 14 Apr 2017 | 15:40:02 UTC

Hi!
This also happened on my two PCs:

one Win 7 Ultimate 64, Nvidia GTX 680, driver 381.65, BOINC 7.6.33, and
one Win 10 Core 64, Nvidia GTX 970, driver 381.65, BOINC 7.7.2.

I tried resetting the latter one, still the same error!

With regards,

Hans Sveen
Oslo, Norway

Jacob Klein
Joined: 11 Oct 08
Posts: 1049
Credit: 1,064,233,664
RAC: 957,892
Level
Met
Message 46863 - Posted: 14 Apr 2017 | 16:11:21 UTC
Last modified: 14 Apr 2017 | 16:11:46 UTC

I too am getting "Computation Errors" on all my GPUGrid tasks, currently.

Long runs (8-12 hours on fastest card) v8.48 (cuda65)

e3s12_e2s3p1f468-ADRIA_FOLDGREED50_crystal_ss_contacts_100_ubiquitin_6-1-2-RND2729_7
http://www.gpugrid.net/result.php?resultid=16219421

e4s6_e3s8p1f477-ADRIA_FOLDGREED50_crystal_ss_contacts_100_ubiquitin_8-0-2-RND8520_2
http://www.gpugrid.net/result.php?resultid=16219440

e16s215_e5s183p2f300-PABLO_contact_goal_KIX_CMYB-1-4-RND5301_2
http://www.gpugrid.net/result.php?resultid=16219443

e5s12_e4s5p0f45-ADRIA_FOLDGREED90_crystal_ss_contacts_20_ubiquitin_2-0-1-RND7117_3
http://www.gpugrid.net/result.php?resultid=16219446

Server state Over
Outcome Computation error
Client state Compute error
Exit status -44 (0xffffffffffffffd4) Unknown error number


Stderr output

<core_client_version>7.7.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -44 (0xffffffd4)
</message>
]]>
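
A side note on the two hex values shown here: 0xffffffd4 and 0xffffffffffffffd4 are the same -44, written as an unsigned two's-complement number that is 32 bits wide in stderr and 64 bits wide on the result page. A minimal Python sketch of the conversion (the helper names are made up for illustration and are not part of BOINC):

```python
def as_unsigned_hex(code: int, bits: int) -> str:
    """Render a signed exit code as unsigned two's-complement hex of the given width."""
    mask = (1 << bits) - 1
    width = bits // 4  # number of hex digits
    return "0x" + format(code & mask, "0{}x".format(width))


def as_signed(value: int, bits: int) -> int:
    """Interpret an unsigned value of the given width as a signed integer."""
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


print(as_unsigned_hex(-44, 32))   # form used in stderr
print(as_unsigned_hex(-44, 64))   # form used on the result page
print(as_signed(0xFFFFFFD4, 32))  # back to the signed exit code
```

So the hex value carries no extra information; the meaningful part is the -44 itself.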

Erich56
Joined: 1 Jan 15
Posts: 346
Credit: 1,460,919,427
RAC: 2,683,758
Level
Met
Message 46864 - Posted: 14 Apr 2017 | 16:19:06 UTC - in response to Message 46863.

I too am getting "Computation Errors" on all my GPUGrid tasks, currently.

yes, this is the remarkable thing - the error occurs on ANY type of task.
No idea why :-(

We can only hope that someone from GPUGRID notices this problem ASAP. Whether it can be solved quickly - that's another question.

Loohi
Joined: 27 Aug 16
Posts: 16
Credit: 43,745,875
RAC: 24
Level
Val
Message 46867 - Posted: 14 Apr 2017 | 16:52:58 UTC - in response to Message 46864.

Same issues as listed above, regardless of long/short WUs. Hope we can hear back from the team soon!

Richard Haselgrove
Joined: 11 Jul 09
Posts: 775
Credit: 1,317,539,795
RAC: 1,364,639
Level
Met
Message 46868 - Posted: 14 Apr 2017 | 17:03:10 UTC - in response to Message 46863.

Matt's FAQ: What do the application error codes signify? lists "-44 The computer's date is wrong." Mine still looks OK, but I'm getting the same errors as everybody else.

One machine still has a task running, so I'll grab a copy of the files and the workunit/result specifications from client_state.xml: then see what's different about the replacement when I need one.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 1814
Credit: 9,970,837,494
RAC: 6,545,179
Level
Tyr
Message 46869 - Posted: 14 Apr 2017 | 17:16:54 UTC
Last modified: 14 Apr 2017 | 17:17:33 UTC

I experience the same behavior.
I would like to add that from my experience the workunits in progress will also fail with this error if you restart your PC, right after the restart.
See task 16216692, 16216249, 16216581
I've sent a notification e-mail about this to the staff.

Helix Von Smelix
Joined: 13 Aug 08
Posts: 4
Credit: 74,618,325
RAC: 10,725
Level
Thr
Message 46872 - Posted: 14 Apr 2017 | 18:46:43 UTC

Same here :-)

[CSF] Thomas H.V. Dupont
Joined: 20 Jul 14
Posts: 525
Credit: 55,667,800
RAC: 71,872
Level
Thr
Message 46873 - Posted: 14 Apr 2017 | 19:04:21 UTC

Same here
____________
[CSF] Thomas H.V. Dupont
Founder of the team CRUNCHERS SANS FRONTIERES
www.crunchersansfrontieres.org

Tom Miller
Joined: 21 Nov 14
Posts: 5
Credit: 860,949,841
RAC: 464
Level
Glu
Message 46875 - Posted: 14 Apr 2017 | 19:09:07 UTC

50 failed tasks since 14:46 UTC on the 14th of April. They continue to come and fail.

If this again turns out to be a staff failure, I'm done.

Is this science or science fiction?

Once again, the weekend comes and the junk rolls out.

Why would you ever trust the science of a group who is so careless with their work?

Erich56
Joined: 1 Jan 15
Posts: 346
Credit: 1,460,919,427
RAC: 2,683,758
Level
Met
Message 46876 - Posted: 14 Apr 2017 | 19:42:01 UTC

Interestingly enough: on the Project Status page, the number of unsent tasks is going up continuously.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 1814
Credit: 9,970,837,494
RAC: 6,545,179
Level
Tyr
Message 46877 - Posted: 14 Apr 2017 | 20:00:57 UTC - in response to Message 46876.

Interestingly enough: on the Project Status page, the number of unsent tasks is going up continuously.
It's because all active hosts have used up their daily quota.

Erich56
Joined: 1 Jan 15
Posts: 346
Credit: 1,460,919,427
RAC: 2,683,758
Level
Met
Message 46878 - Posted: 14 Apr 2017 | 20:04:02 UTC - in response to Message 46877.

Interestingly enough: on the Project Status page, the number of unsent tasks is going up continuously.
It's because all active hosts have used up their daily quota.

so I guess these are the tasks that are being generated automatically, right?

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 1814
Credit: 9,970,837,494
RAC: 6,545,179
Level
Tyr
Message 46879 - Posted: 14 Apr 2017 | 20:08:36 UTC - in response to Message 46878.
Last modified: 14 Apr 2017 | 20:14:53 UTC

Interestingly enough: on the Project Status page, the number of unsent tasks is going up continuously.
It's because all active hosts have used up their daily quota.
so I guess these are the tasks that are being generated automatically, right?
In this case, no. These are simply the failed workunits waiting to be resent to another host, but there's none to send them to, because all hosts have used up their daily quota.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 1814
Credit: 9,970,837,494
RAC: 6,545,179
Level
Tyr
Message 46880 - Posted: 14 Apr 2017 | 20:14:38 UTC - in response to Message 46875.
Last modified: 14 Apr 2017 | 20:16:38 UTC

50 failed tasks since 14:46 UTC on the 14th of April. They continue to come and fail.

If this again turns out to be a staff failure, I'm done.

Is this science or science fiction?

Once again, the weekend comes and the junk rolls out.

Why would you ever trust the science of a group who is so careless with their work?

Keep calm, and set up a backup project (that is a project which has 0 resource share set on the project's webpage).
I suggest Einstein@home or SETI@home.

Erich56
Joined: 1 Jan 15
Posts: 346
Credit: 1,460,919,427
RAC: 2,683,758
Level
Met
Message 46881 - Posted: 14 Apr 2017 | 20:17:53 UTC - in response to Message 46879.

These are simply the failed workunits waiting to be resent to another host, but there's none to send to, because all have used up their daily quota.

Which means that all the WUs that were faulty to begin with will be "recycled", so to speak; and at some point, there will be several thousand faulty WUs in the queue :-(
So I am curious how this pile of junk will be successfully cleaned up :-)

Tom Miller
Joined: 21 Nov 14
Posts: 5
Credit: 860,949,841
RAC: 464
Level
Glu
Message 46882 - Posted: 14 Apr 2017 | 22:01:48 UTC

My first failure was at

14:21:01 UTC on the 14th.

50+ and counting.
____________

Richard Haselgrove
Joined: 11 Jul 09
Posts: 775
Credit: 1,317,539,795
RAC: 1,364,639
Level
Met
Message 46883 - Posted: 14 Apr 2017 | 22:06:43 UTC

Here's an interesting one: WU 12499196.

Three consecutive failures with exit status -44, as we're all seeing. All of those were with the v8.48, cuda65 application.

But the fourth has gone to my (one and only) GTX 1050 Ti running the v9.15, cuda80 application. And it's running just fine - even better than fine, blisteringly fast.

There was an announcement this week that v9.15 was now available to all supported GPU generations: my older ones haven't picked it up, probably because I haven't updated my drivers recently. But just maybe, the current tasks require v9.15? That's one to test in the morning.

Betting Slip
Joined: 5 Jan 09
Posts: 574
Credit: 1,912,707,875
RAC: 1,723,332
Level
His
Message 46884 - Posted: 14 Apr 2017 | 22:43:35 UTC - in response to Message 46883.

I have updated the drivers on my 980 Ti, Richard, but it won't pick up the new app. I have reset the project and still get 8.48 cuda 6.5.

Jacob Klein
Joined: 11 Oct 08
Posts: 1049
Credit: 1,064,233,664
RAC: 957,892
Level
Met
Message 46885 - Posted: 14 Apr 2017 | 22:58:34 UTC - in response to Message 46869.
Last modified: 14 Apr 2017 | 23:02:23 UTC

I experience the same behavior.
I would like to add that from my experience the workunits in progress will also fail with this error if you restart your PC, right after the restart.


I'd like to further clarify that.

If you suspend in-progress tasks, then resume them, they will fail. I just lost tons of work that way :) I smile, because it's all I can do. It happens. Just wanted to add that suspending and restarting the task itself, is also a problem.

Backup projects (attached with 0 resource share) are starting to kick in for me.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 1814
Credit: 9,970,837,494
RAC: 6,545,179
Level
Tyr
Message 46886 - Posted: 14 Apr 2017 | 23:12:53 UTC - in response to Message 46885.
Last modified: 14 Apr 2017 | 23:22:19 UTC

I experience the same behavior.
I would like to add that from my experience the workunits in progress will also fail with this error if you restart your PC, right after the restart.


I'd like to further clarify that.

If you suspend in-progress tasks, then resume them, they will fail. I just lost tons of work that way :) I smile, because it's all I can do. It happens. Just wanted to add that suspending and restarting the task itself, is also a problem.
I'm aware of that problem, but that gives a different error message in stderr.txt
EDIT: maybe I don't remember it right, and the error code / message is the same, but my tasks did not error out after a restart earlier.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 775
Credit: 1,317,539,795
RAC: 1,364,639
Level
Met
Message 46887 - Posted: 14 Apr 2017 | 23:15:22 UTC - in response to Message 46884.
Last modified: 14 Apr 2017 | 23:17:58 UTC

I have updated the drivers on my 980 Ti, Richard, but it won't pick up the new app. I have reset the project and still get 8.48 cuda 6.5.

Sampling through a few of the highest-RAC users on my way to bed, it looks as if all their 970/980 cards are erroring tasks, but all their 1070/1080 cards are working normally. There's a debug clue in there somewhere.

Edit - including Retvari's single active 1080, host 23631

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 1814
Credit: 9,970,837,494
RAC: 6,545,179
Level
Tyr
Message 46888 - Posted: 14 Apr 2017 | 23:18:10 UTC - in response to Message 46883.

Here's an interesting one: WU 12499196.

Three consecutive failures with exit status -44, as we're all seeing. All of those were with the v8.48, cuda65 application.

But the fourth has gone to my (one and only) GTX 1050 Ti running the v9.15, cuda80 application. And it's running just fine - even better than fine, blisteringly fast.

My GTX 1080 is working fine with the 9.15 app under Windows 10.

There was an announcement this week that v9.15 was now available to all supported GPU generations: my older ones haven't picked it up, probably because I haven't updated my drivers recently. But just maybe, the current tasks require v9.15? That's one to test in the morning.

I don't think so.
It's more likely that some dll stopped working after a given date, that is 04.14.2017.
It could be a licensing limitation, or some other time limit which has expired.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 1814
Credit: 9,970,837,494
RAC: 6,545,179
Level
Tyr
Message 46889 - Posted: 14 Apr 2017 | 23:43:29 UTC - in response to Message 46888.
Last modified: 14 Apr 2017 | 23:44:07 UTC

There was an announcement this week that v9.15 was now available to all supported GPU generations: my older ones haven't picked it up, probably because I haven't updated my drivers recently. But just maybe, the current tasks require v9.15? That's one to test in the morning.

I don't think so.
It's more likely that some dll stopped working after a given date, that is 04.14.2017.
It could be a licensing limitation, or some other time limit which has expired.

I've downloaded 4 new tasks with my main cruncher PC, then I've set the date on this PC to 04.13.2017, and I've started the GPUGrid tasks. Guess what? It's crunching! Yes, the 8.48 app. So there's a date limit somewhere in the 8.48 app.
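
What this experiment suggests can be sketched in a few lines of Python. This is only a guess at the shape of such a check, not the actual 8.48 code: the cutoff constant, the function name and the idea that the check runs at startup are all assumptions. Only the -44 exit code and the two tested dates come from this thread, and the sketch works at whole-day granularity although the first failures were reported partway through 14 April.

```python
from datetime import date

EXIT_DATE_WRONG = -44               # exit code seen in this thread
ASSUMED_CUTOFF = date(2017, 4, 14)  # assumption: ran on 13 Apr, failed on 14 Apr


def startup_check(today: date, cutoff: date = ASSUMED_CUTOFF) -> int:
    """Return 0 while the hypothetical time limit has not been reached, else -44."""
    return 0 if today < cutoff else EXIT_DATE_WRONG


print(startup_check(date(2017, 4, 13)))  # clock set back one day
print(startup_check(date(2017, 4, 14)))  # real date when the failures started
```

A check of this kind would explain why tasks die immediately with runtime 0 and why setting the clock back lets them run.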

Jacob Klein
Joined: 11 Oct 08
Posts: 1049
Credit: 1,064,233,664
RAC: 957,892
Level
Met
Message 46890 - Posted: 15 Apr 2017 | 0:03:20 UTC

Great find, Retvari! That should help the devs to solve it as quickly as they can!

On a lighter note, I found another easy workaround too, here:
https://www.youtube.com/watch?v=dQw4w9WgXcQ

Erich56
Joined: 1 Jan 15
Posts: 346
Credit: 1,460,919,427
RAC: 2,683,758
Level
Met
Message 46892 - Posted: 15 Apr 2017 | 6:39:10 UTC - in response to Message 46889.

I've downloaded 4 new tasks with my main cruncher PC, then I've set the date on this PC to 04.13.2017, and I've started the GPUGrid tasks. Guess what? It's crunching! Yes, the 8.48 app. So there's a date limit somewhere in the 8.48 app.

I've tried to do this, however, I got stuck with "the computer has finished the daily quota of 1 task" - HOW NICE :-(

Slowly but surely I am getting fed up with GPUGRID. More and more I am getting the impression (like one of the posters above) that they don't take their work seriously enough :-(

Profile [PUGLIA] kidkidkid3
Joined: 23 Feb 11
Posts: 48
Credit: 331,401,967
RAC: 180,472
Level
Asp
Message 46895 - Posted: 15 Apr 2017 | 7:38:56 UTC - in response to Message 46890.

Great find, Retvari! That should help the devs to solve it as quickly as they can!



Peace and love, thanks great Retvari, good Easter to all ... be patient !
K.

____________
Dreams do not always come true. But not because they are too big or impossible. Why did we stop believing.
(Martin Luther King)

Erich56
Joined: 1 Jan 15
Posts: 346
Credit: 1,460,919,427
RAC: 2,683,758
Level
Met
Message 46896 - Posted: 15 Apr 2017 | 8:03:17 UTC - in response to Message 46895.

... be patient !

I am afraid that my patience is overstretched by now - every month there is a major problem which makes GPUGRID crunching impossible for several days :-(((

Richard Haselgrove
Joined: 11 Jul 09
Posts: 775
Credit: 1,317,539,795
RAC: 1,364,639
Level
Met
Message 46897 - Posted: 15 Apr 2017 | 8:23:04 UTC - in response to Message 46889.

So there's a date limit somewhere in the 8.48 app.

My suspicion (unverified) is that the problem might lie with tcl84.dll

That's been replaced with tcl86.dll in v9.14/5, and https://www.activestate.com/activetcl seems to have a rather curious licensing regime:

Business and Enterprise Editions provide access to older Tcl versions:

Although non-production use is permitted for free using our latest Community Edition versions, use of legacy versions on non-production and/or production machines requires a Business Edition or Enterprise Edition license.

I'll play around with some options later.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 1814
Credit: 9,970,837,494
RAC: 6,545,179
Level
Tyr
Message 46898 - Posted: 15 Apr 2017 | 9:42:21 UTC - in response to Message 46897.

So there's a date limit somewhere in the 8.48 app.

My suspicion (unverified) is that the problem might lie with tcl84.dll

That's been replaced with tcl86.dll in v9.14/5...

I've tried to replace tcl84.dll with tcl86.dll by renaming the latter (and setting don't check file sizes in cc_config.xml), but then I got a different error:
There are no child processes to wait for. (0x80) - exit code 128 (0x80)

See this task.

Erich56
Joined: 1 Jan 15
Posts: 346
Credit: 1,460,919,427
RAC: 2,683,758
Level
Met
Message 46899 - Posted: 15 Apr 2017 | 9:53:54 UTC - in response to Message 46889.

Zoltan wrote:

I've downloaded 4 new tasks with my main cruncher PC, then I've set the date on this PC to 04.13.2017, and I've started the GPUGrid tasks. Guess what? It's crunching!

For me, this worked on the two Windows 10 PCs.

On my main crunching PC with two GTX980Ti and XP, I unfortunately had the "limit of daily tasks" problem (as mentioned earlier here). On the other one with the GTX750Ti and XP, after changing the date backwards (to 04.13.2017), none of the buttons on the left side of the BOINC manager reacted any more, so I could not do what I had intended.
Only after changing the date back to the real one did the BOINC manager work again. So no chance to apply this "date trick" on XP, at least not on mine :-(

Richard Haselgrove
Joined: 11 Jul 09
Posts: 775
Credit: 1,317,539,795
RAC: 1,364,639
Level
Met
Message 46900 - Posted: 15 Apr 2017 | 10:33:28 UTC - in response to Message 46898.

So there's a date limit somewhere in the 8.48 app.

My suspicion (unverified) is that the problem might lie with tcl84.dll

That's been replaced with tcl86.dll in v9.14/5...

I've tried to replace tcl84.dll with tcl86.dll by renaming the latter (and setting don't check file sizes in cc_config.xml), but then I got a different error:
There are no child processes to wait for. (0x80) - exit code 128 (0x80)

I had the same idea, but tried a different route: I wrapped up the existing files in an app_info.xml, and then changed the tcl file reference to supply a copy of tcl86.dll

No dice: instead, I got error

0xC000007B STATUS_INVALID_IMAGE_FORMAT

(task 16233669 - confirmed with some offline tests that this is related to the tcl change)

This machine is Windows 7 with a GTX 970 and (currently) a maximum cuda 7.0 driver. Next steps will be to try a cuda 8.0 driver and see what the project sends me: if it's still v8.48, I'll try putting v9.15 into an app_info.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 775
Credit: 1,317,539,795
RAC: 1,364,639
Level
Met
Message 46902 - Posted: 15 Apr 2017 | 11:55:39 UTC - in response to Message 46900.

Sad to report that both approaches failed. A normal work fetch got me v8.48 even with a cuda 8.0 driver, and it failed with the clock error as before.

A full v9.15 file set (copied from my GTX 1050Ti machine, also running Windows 7/64) under app_info.xml gave repeated iterations of

15/04/2017 12:31:21 | GPUGRID | [cpu_sched] Starting task e14s3_e11s4p0f35-ADRIA_FOLDGREED10_crystal_ss_contacts_20_ubiquitin_4-0-1-RND0892_0 using acemdlong version 915 (cuda80) in slot 1
15/04/2017 12:31:24 | GPUGRID | Task e14s3_e11s4p0f35-ADRIA_FOLDGREED10_crystal_ss_contacts_20_ubiquitin_4-0-1-RND0892_0 exited with zero status but no 'finished' file
15/04/2017 12:31:24 | GPUGRID | If this happens repeatedly you may need to reset the project.

- the app quits silently with no error number, and doesn't even have time to start writing a stderr.txt file or to write anything to the _0_0 result file (aka 'progress.log'). The only evidence that the app has even tried to run is a 'canary' file in the slot directory. The only diagnostics output I can get is from a command prompt:

D:\BOINCdata\slots\1>acemd.915-80.exe
# ACEMD Molecular Dynamics Version [3212]
# CUDA Synchronisation mode: BLOCKING
# CUDA Synchronisation mode: BLOCKING
# SWAN: Created context 0 on GPU 0
SWAN : FATAL : Cuda driver error 35 in file 'swanlibnv2.cpp' in line 448.
# SWAN swan_assert 0

Card data is

15/04/2017 12:28:16 | | CUDA: NVIDIA GPU 0: GeForce GTX 970 (driver version 368.81, CUDA version 8.0, compute capability 5.2, 4096MB, 3066MB available, 4087 GFLOPS peak)

I think I'm stuck until the staff are back in the lab.
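
For anyone who wants to check their own event log for the same symptom, here is a minimal Python sketch that counts the "exited with zero status but no 'finished' file" messages per task. It assumes the log text has already been read into a list of lines; the function name is made up for illustration, and the sample lines are the ones quoted in this post.

```python
import re
from collections import Counter

# Matches the client message quoted in this post and captures the task name.
NO_FINISHED = re.compile(
    r"Task (?P<task>\S+) exited with zero status but no 'finished' file"
)


def count_silent_exits(log_lines):
    """Count, per task name, how often the client logged a silent exit."""
    counts = Counter()
    for line in log_lines:
        match = NO_FINISHED.search(line)
        if match:
            counts[match.group("task")] += 1
    return counts


sample = [
    "15/04/2017 12:31:21 | GPUGRID | [cpu_sched] Starting task e14s3_e11s4p0f35-ADRIA_FOLDGREED10_crystal_ss_contacts_20_ubiquitin_4-0-1-RND0892_0 using acemdlong version 915 (cuda80) in slot 1",
    "15/04/2017 12:31:24 | GPUGRID | Task e14s3_e11s4p0f35-ADRIA_FOLDGREED10_crystal_ss_contacts_20_ubiquitin_4-0-1-RND0892_0 exited with zero status but no 'finished' file",
    "15/04/2017 12:31:24 | GPUGRID | If this happens repeatedly you may need to reset the project.",
]

for task, n in count_silent_exits(sample).items():
    print(task, n)
```

A count that keeps climbing for the same task name is the pattern described in this post: the app starts, quits without an error code, and is started again.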

Stefan
Volunteer moderator
Project developer
Project scientist
Joined: 5 Mar 13
Posts: 258
Credit: 0
RAC: 0
Level

Message 46903 - Posted: 15 Apr 2017 | 12:08:18 UTC

I talked with Matt. He says that it's probably the license that time-expired. Updating the drivers will get the cuda 8 app which should fix it.

Stefan
Volunteer moderator
Project developer
Project scientist
Joined: 5 Mar 13
Posts: 258
Credit: 0
RAC: 0
Level

Message 46904 - Posted: 15 Apr 2017 | 12:24:29 UTC

For a more correct solution we will have to wait for Matt to update the old app next week. In the meantime, as I said, updating drivers should do it.

klepel
Joined: 23 Dec 09
Posts: 126
Credit: 1,704,805,812
RAC: 1,863,487
Level
His
Message 46905 - Posted: 15 Apr 2017 | 12:53:54 UTC

Which driver version is necessary, and which driver version is safe?

Erich56
Joined: 1 Jan 15
Posts: 346
Credit: 1,460,919,427
RAC: 2,683,758
Level
Met
Message 46908 - Posted: 15 Apr 2017 | 13:40:30 UTC - in response to Message 46904.

... updating drivers should do it

which might be impossible, or at least risky, in the case of Windows XP; Zoltan, what's your opinion on this?

John C MacAlister
Joined: 17 Feb 13
Posts: 177
Credit: 131,725,186
RAC: 0
Level
Cys
Message 46909 - Posted: 15 Apr 2017 | 14:54:36 UTC
Last modified: 15 Apr 2017 | 14:55:05 UTC

My drivers are locked to the versions that came with the devices: changing drivers causes failures.

I will return as soon as the current system issues have been resolved as I believe GPUGrid performs valuable work.

Now I am off to Folding and WCG.....
____________
John

Jacob Klein
Joined: 11 Oct 08
Posts: 1049
Credit: 1,064,233,664
RAC: 957,892
Level
Met
Message 46910 - Posted: 15 Apr 2017 | 16:25:10 UTC - in response to Message 46904.
Last modified: 15 Apr 2017 | 16:55:56 UTC

For a more correct solution we will have to wait for Matt to update the old app next week. In the meantime, as I said, updating drivers should do it.


What the crap, Stefan? :) I'm already using the latest drivers! My failures are on Windows 10, using 381.65 and 381.78. Please provide more details on what drivers you think should work, and also why failures still happen on 381.65 and 381.78.

Edit: I'm not 100% sure that I've been able to attempt a task using 381.78 yet.

Erich56
Joined: 1 Jan 15
Posts: 346
Credit: 1,460,919,427
RAC: 2,683,758
Level
Met
Message 46911 - Posted: 15 Apr 2017 | 16:33:40 UTC - in response to Message 46910.
Last modified: 15 Apr 2017 | 16:35:11 UTC

... My failures are on Windows 10, using 381.65 and 381.78. Please provide more details on what drivers you think should work, and also why failures still happen on 381.65 and 381.78.

I was just going to ask here whether someone has already tried the latest drivers - your posting answers my question, although in the negative sense.
So Matt's assumption that the latest drivers should solve the current problem unfortunately seems to be wrong :-(

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 46912 - Posted: 15 Apr 2017 | 19:23:49 UTC - in response to Message 46856.

The problem should now be fixed for anyone with a CUDA 8-capable driver.


Matt

Richard Haselgrove
Joined: 11 Jul 09
Posts: 775
Credit: 1,317,539,795
RAC: 1,364,639
Level
Met
Message 46914 - Posted: 15 Apr 2017 | 19:56:30 UTC - in response to Message 46912.

I see you've deprecated v8.48 completely, but left v9.15 (superficially - as far as we can see) unchanged. I couldn't get it to work earlier, but I'll try again within the hour - test machine is busy with another project just at the moment.

Erich56
Joined: 1 Jan 15
Posts: 346
Credit: 1,460,919,427
RAC: 2,683,758
Level
Met
Message 46915 - Posted: 15 Apr 2017 | 19:59:34 UTC - in response to Message 46912.

The problem should now be fixed for anyone with a CUDA 8-capable driver.

Matt

which means that for Windows XP users, the problem is NOT solved yet, right?
When will this be the case?

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 46916 - Posted: 15 Apr 2017 | 20:18:40 UTC - in response to Message 46915.

I've changed the rules for issuing the 915 version. Any Windows machine that is 64 bit and reports CUDA 8.0 capability will get it now.

Matt
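
The rule described in this post can be written down as a small predicate. This is only an illustration of the stated conditions, not the project's scheduler code; the `Host` fields and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    os_name: str         # e.g. "Windows"
    is_64bit: bool
    cuda_version: tuple  # highest CUDA version the driver reports, e.g. (8, 0)


def gets_v915(host: Host) -> bool:
    """Rule as stated in the post: Windows, 64 bit, reports CUDA 8.0 capability."""
    return (
        host.os_name == "Windows"
        and host.is_64bit
        and host.cuda_version >= (8, 0)
    )


print(gets_v915(Host("Windows", True, (8, 0))))   # CUDA 8.0 driver
print(gets_v915(Host("Windows", True, (7, 0))))   # older CUDA 7.0 driver
print(gets_v915(Host("Windows", False, (8, 0))))  # 32-bit host
```

Note that the rule only decides which hosts are sent the 915 version; it says nothing about whether the app then runs on a given card.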

Erich56
Joined: 1 Jan 15
Posts: 346
Credit: 1,460,919,427
RAC: 2,683,758
Level
Met
Message 46917 - Posted: 15 Apr 2017 | 20:20:32 UTC - in response to Message 46916.
Last modified: 15 Apr 2017 | 20:25:54 UTC

I've changed the rules for issuing the 915 version. Any Windows machine that is 64 bit and reports CUDA 8.0 capability will get it now.
Matt

So which steps will be taken next to enable older drivers for XP to work?

My XP with driver 368.81 did download version 915, the task did start, but was broken off after a few minutes with "too many exit(0)s"

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Message 46919 - Posted: 15 Apr 2017 | 20:31:20 UTC - in response to Message 46917.


So which steps will be taken next to enable older drivers for XP to work?

My XP with driver 368.81 did download version 915, the task did start, but was broken off after a few minutes with "too many exit(0)s"


I was expecting it to work on 64 bit XP, actually. Given that it doesn't, there's not a tremendous amount I can do to fix it immediately.

We haven't had an XP test platform for a long time: Microsoft ended support for it 3 years ago! You really should upgrade...

Matt

Richard Haselgrove
Joined: 11 Jul 09
Posts: 775
Credit: 1,317,539,795
RAC: 1,364,639
Level
Met
Message 46920 - Posted: 15 Apr 2017 | 20:59:17 UTC

OK, let's put XP to bed - I think it's a red herring in this case.

I have two - well, three - identical Windows 7/64 machines, each with GTX 970 GPUs.

Two tests - first, with an older cuda 7.0 driver: no tasks available, no tasks sent. That's the right answer after deprecating v8.48

Second, the one which I upgraded with a cuda 8.0 driver earlier today (specifically, 368.81). Task was sent, and along with it the v9.15 application - again, as intended. So far so good.

BUT - as reported earlier in this thread (but I appreciate you wouldn't want to read through the entire thing on a holiday Saturday), v9.15 isn't running on my Maxwell cards with the current batch of tasks. (It runs fine on a Pascal card in another machine)

Symptoms are:

Under BOINC, repeated iterations of

Task e4s7_e2s3p0f357-ADRIA_FOLDGREED10_crystal_ss_contacts_100_ubiquitin_4-1-2-RND7142_0 exited with zero status but no 'finished' file

until BOINC kills the task with the 'Too many exits' after 100 tries - exactly the message Erich got under XP. No difference between the OS versions - this difference applies to the hardware (different generations of GPU). It seems to have changed with this new batch of tasks, since the initial test release a week ago.

Running standalone in a terminal window, I get

D:\BOINCdata\slots\0>acemd.915-80
# ACEMD Molecular Dynamics Version [3212]
# CUDA Synchronisation mode: BLOCKING
# CUDA Synchronisation mode: BLOCKING
# SWAN: Created context 0 on GPU 0
SWAN : FATAL : Cuda driver error 35 in file 'swanlibnv2.cpp' in line 448.
# SWAN swan_assert 0

- that's the only diagnostic I've been able to capture. Nothing is written to the output or stderr files.

Test task is 16240262 - I'll let it run through its 100 exits and report it as soon as I've posted this, so you can compare my Windows 7 output with Erich's XP.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 1814
Credit: 9,970,837,494
RAC: 6,545,179
Message 46921 - Posted: 15 Apr 2017 | 21:02:31 UTC - in response to Message 46919.


So which steps will be taken next to enable older drivers for XP to work?

My XP with driver 368.81 did download version 915, the task did start, but was broken off after a few minutes with "too many exit(0)s"


I was expecting it to work on 64 bit XP, actually. Given that it doesn't, there's not a tremendous amount I can do to fix it immediately.

We haven't had an XP test platform for a long time: Microsoft ended support for it 3 years ago! You really should upgrade...

Matt

Matt,
It's a bit off-topic, but let me explain:
These Windows XP x64 hosts are dedicated crunching boxes (therefore it does not matter if their OS is not supported anymore). A lot of effort has been put into them to make the GTX 980Ti work under Windows XP: selecting the right MB, "hacking" the NV driver to recognize the top-end cards, etc. The reason *not* to upgrade them from Windows XP is to maximize their throughput (avoiding WDDM). The other path to achieve this is to use Linux, but you haven't put the SWAN_SYNC option into the latest Linux client (as far as my tests showed, but please correct me if I'm wrong), which hinders the performance of the top-end cards under Linux too. So you could motivate us to use Linux instead of the deprecated Windows XP if you put that option in the Linux client; it could also increase the performance of the top-end cards by 10~15% under Linux. But for now, if you could make a fresh CUDA 6.5 client, that would be great (and it would save us a lot of work).
Thank you in advance!

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 46922 - Posted: 15 Apr 2017 | 21:12:36 UTC - in response to Message 46920.

Richard,

That error means "insufficient driver version".
According to the records that machine is running Windows 7 64b, not XP. Why are you running that driver version?

Matt

Richard Haselgrove
Joined: 11 Jul 09
Posts: 775
Credit: 1,317,539,795
RAC: 1,364,639
Message 46924 - Posted: 15 Apr 2017 | 21:21:24 UTC - in response to Message 46922.
Last modified: 15 Apr 2017 | 21:25:31 UTC

I was running the same cuda 7.0 driver version on all machines until this morning - I upgraded this morning for testing only.

My experience is that each successive driver release is slower for general purpose computing (generalisation - YMMV). Since I'm not a gamer, I don't need or want the latest game patches. I just picked one that was the last in its particular sequence, so more likely to be stable and bug-free.

We have the benefit of Jacob Klein reporting into this thread as well (see message 46910) - he does test the latest drivers for the benefit of the wider BOINC community, and has persuaded NVidia to fix several bugs over the years. He reports the same as me.

Edit - Your Pascal app release post (message 44869) says simply "NVIDIA Driver 360+" - I thought I'd aimed high enough above that?

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 46925 - Posted: 15 Apr 2017 | 21:25:38 UTC - in response to Message 46924.

Jacob was testing before I'd changed the issuing rules for 915 - he never even got the app to test, let alone see any failures.

If you could try a later version I'd appreciate it.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 775
Credit: 1,317,539,795
RAC: 1,364,639
Message 46926 - Posted: 15 Apr 2017 | 21:27:53 UTC - in response to Message 46925.
Last modified: 15 Apr 2017 | 21:30:37 UTC

Sure, anything I can do to help. Supper has just beeped in the microwave, but I'll download while I eat, and install later.

Edit - 381.65 on its way.

Jacob Klein
Joined: 11 Oct 08
Posts: 1049
Credit: 1,064,233,664
RAC: 957,892
Message 46927 - Posted: 15 Apr 2017 | 21:29:48 UTC
Last modified: 15 Apr 2017 | 21:38:09 UTC

MJH:

Can you please explain:
1) What caused the problem?
2) What solved the problem?
3) Why were "updated drivers" previously recommended as a solution?

Richard Haselgrove
Joined: 11 Jul 09
Posts: 775
Credit: 1,317,539,795
RAC: 1,364,639
Message 46928 - Posted: 15 Apr 2017 | 21:33:10 UTC - in response to Message 46927.

Specify *which* problem, please.

Version not downloading - server configuration (plan class specification, I suspect), fixed.

Current set of tasks failing on older cards - not fixed, under exploration.

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 46929 - Posted: 15 Apr 2017 | 21:35:38 UTC - in response to Message 46927.


1) What caused the problem?

The executables that we deploy are time-limited and expire after a year or so, due to licensing issues.

2) What solved the problem?

I've reconfigured the scheduler to send the 915 app (supporting kepler+) to all 64 bit hosts that report CUDA 8 support. This seems not to work on Windows XP, despite the last 368 seemingly reporting cuda 8 support.

Matt

Jacob Klein
Joined: 11 Oct 08
Posts: 1049
Credit: 1,064,233,664
RAC: 957,892
Message 46930 - Posted: 15 Apr 2017 | 21:38:21 UTC
Last modified: 15 Apr 2017 | 21:45:16 UTC

MJH:

Here's what I'm seeing with 9.15:
- My PC "Speed", that has GTX 980 Ti x2 .... appears to work fine with 9.15
- My PC "RacerX", that has GTX 970 and GTX 660 Ti x2 ... appears to have problems running 9.15 apps on the 2 GTX 660 Ti GPUs. See below.

I thought CC 3.0 GPUs were still supported.
Any ideas?

<core_client_version>7.7.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -59 (0xffffffc5)
</message>
<stderr_txt>
# GPU [GeForce GTX 660 Ti] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 1 :
# Name : GeForce GTX 660 Ti
# ECC : Disabled
# Global mem : 3072MB
# Capability : 3.0
# PCI ID : 0000:07:00.0
# Device clock : 1045MHz
# Memory clock : 3004MHz
# Memory width : 192bit
# Driver version : r381_64 : 38178
#SWAN: FATAL: cannot find image for module [.nonbonded.cu.] for device version 300

</stderr_txt>
]]>

Richard Haselgrove
Joined: 11 Jul 09
Posts: 775
Credit: 1,317,539,795
RAC: 1,364,639
Message 46931 - Posted: 15 Apr 2017 | 21:43:15 UTC - in response to Message 46930.

Interesting. The machines I'm having problems with also have dual GPUs of different vintages - mine have a secondary GTX 750Ti in both cases. But GPUGrid is excluded from the 750s, and only runs on the 970s.

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 46932 - Posted: 15 Apr 2017 | 21:43:26 UTC - in response to Message 46930.

For some reason the sm 3.0 support (and only that sm version) is broken.
That'll need a 9.16

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 46933 - Posted: 15 Apr 2017 | 21:47:24 UTC - in response to Message 46931.

Richard the problem with your machines is (at least) the driver version.

Jacob Klein
Joined: 11 Oct 08
Posts: 1049
Credit: 1,064,233,664
RAC: 957,892
Message 46934 - Posted: 15 Apr 2017 | 21:57:43 UTC

Thanks MJH.

I'll do my best to test 9.16 when it's released, though I sure wish I could get email notifications for my subscribed threads ... that's been broken for a while :/ Maybe you could take a peek.

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 46935 - Posted: 15 Apr 2017 | 22:08:54 UTC - in response to Message 46934.

9.16 should be along in about 15 mins.

Tom Miller
Joined: 21 Nov 14
Posts: 5
Credit: 860,949,841
RAC: 464
Message 46936 - Posted: 15 Apr 2017 | 22:11:16 UTC - in response to Message 46933.

Failing on all machines.

Windows 10x64 on all.

4 X GTX670's
3 X GTX680's
4 X GTX770's
2 X GTX780Ti's

They all have late driver versions.
____________

Richard Haselgrove
Joined: 11 Jul 09
Posts: 775
Credit: 1,317,539,795
RAC: 1,364,639
Message 46937 - Posted: 15 Apr 2017 | 22:13:01 UTC - in response to Message 46933.
Last modified: 15 Apr 2017 | 22:24:26 UTC

Richard the problem with your machines is (at least) the driver version.

I was just coming to that conclusion myself. Clean install of 381.65 completed, machine rebooted, and task e67s40_e47s2p0f68-PABLO_P04637_0_IDP-0-1-RND0199_4 is running normally. But that's an old task from 13 April, with three previous v9.15 failures. Too late to investigate whether they might have lower drivers too, or some other problem.

I'd like to verify that tasks like yesterday's ADRIA_FOLDGREED10_crystal_ss_contacts_100_ubiquitin batch run OK before we completely sign this one off, but that can wait until tomorrow (or later next week).

Apologies for interrupting your weekend - hope you can have a good break in what remains of it.


Edit - OK, I peeked :-)

Task 16239400 has the ERR_TOO_MANY_EXITS problem, and it's described as

GeForce GTX 1060 6GB (4095MB) driver: 368.81

We're going to have to work out where the break-point occurs in the 360+ driver sequence, and put out an APB to upgrade - or a min_version in the plan_class. Next week. G'night.

Profile MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 46938 - Posted: 15 Apr 2017 | 22:44:02 UTC - in response to Message 46937.

916 is out now: this should work with sm 300 GPUs

I've revised the scheduler to refuse work to anything with driver < 370.00 which was the previous minimum for CUDA 8. Seems the "supported version" reported by the driver is as unreliable as ever.
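
For illustration only, a minimum-driver gate of this kind could be sketched as below. This is not the actual GPUGrid scheduler code, which is not shown in this thread; `MIN_DRIVER`, `parse_driver` and `driver_ok` are hypothetical names, and the only value taken from the post is the 370.00 threshold.

```python
# Hypothetical sketch of a "minimum driver version" gate.
# Only the 370.00 threshold comes from the thread; everything else is assumed.
MIN_DRIVER = (370, 0)


def parse_driver(version):
    """Turn a driver string such as '368.81' into the tuple (368, 81)."""
    major, _, minor = version.partition(".")
    return int(major), int(minor or 0)


def driver_ok(version, minimum=MIN_DRIVER):
    """Return True if the reported driver is at least the minimum version."""
    return parse_driver(version) >= minimum


# Driver versions mentioned in this thread.
for v in ("368.81", "372.54", "381.65"):
    print(v, "eligible" if driver_ok(v) else "refused")
```

Comparing (major, minor) tuples avoids the pitfalls of comparing version strings or floats directly.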

Matt

Richard Haselgrove
Joined: 11 Jul 09
Posts: 775
Credit: 1,317,539,795
RAC: 1,364,639
Message 46939 - Posted: 15 Apr 2017 | 22:54:22 UTC - in response to Message 46938.

OK, when I start upgrading my other two tomorrow morning I'll start with 372.54, and if that works, probably stick at 373.06 (first and last of the 372 series respectively). If that doesn't work, rinse and repeat with 375.63 / 376.33 and so on.

Jacob Klein
Joined: 11 Oct 08
Posts: 1049
Credit: 1,064,233,664
RAC: 957,892
Message 46940 - Posted: 15 Apr 2017 | 23:18:22 UTC
Last modified: 15 Apr 2017 | 23:19:06 UTC

MJH:
9.16 tasks are still not working for my GTX 660 Ti GPUs.



Server state Over
Outcome Computation error
Client state Compute error
Exit status -52 (0xffffffffffffffcc) Unknown error number
Computer ID 153764
Report deadline 20 Apr 2017 | 23:01:28 UTC
Run time 2.88
CPU time 0.00
Validate state Invalid
Credit 0.00
Application version Long runs (8-12 hours on fastest card) v9.16 (cuda80)



Stderr output

<core_client_version>7.7.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -52 (0xffffffcc)
</message>
<stderr_txt>
# GPU [GeForce GTX 660 Ti] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 1 :
# Name : GeForce GTX 660 Ti
# ECC : Disabled
# Global mem : 3072MB
# Capability : 3.0
# PCI ID : 0000:07:00.0
# Device clock : 1045MHz
# Memory clock : 3004MHz
# Memory width : 192bit
# Driver version : r381_64 : 38178
SWAN : FATAL Unable to load module .nonbonded.cu. (300)

</stderr_txt>
]]>

Bedrich Hajek
Joined: 28 Mar 09
Posts: 332
Credit: 3,760,508,309
RAC: 374,795
Message 46941 - Posted: 16 Apr 2017 | 0:18:40 UTC

I updated my windows 10 computer to nvidia driver 381.65 from 359.06. Everything is running fine so far, but this driver is slightly slower.

Now, what is going to happen to windows xp? I would like to see it supported a little bit longer, and please don't tell me to upgrade.

If you are going to retire cuda 6.5, then please give us a warning beforehand and don't do it in this abrupt and amateurish manner.

And do keep track of your licenses' expiration dates and give us ample warning when we need to upgrade our software!

Thank you!!


PappaLitto
Joined: 21 Mar 16
Posts: 241
Credit: 973,621,906
RAC: 3,418,769
Message 46943 - Posted: 16 Apr 2017 | 2:21:23 UTC - in response to Message 46941.
Last modified: 16 Apr 2017 | 2:21:32 UTC

Now, what is going to happen to windows xp? I would like to see it supported a little bit longer, and please don't tell me to upgrade.

If you are going to retire cuda 6.5, then please give us a warning beforehand and don't do it in this abrupt and amateurish manner.

And do keep track of your licenses' expiration dates and give us ample warning when we need to upgrade our software!

Thank you!!

+1

Profile DrBob
Joined: 1 Sep 08
Posts: 3
Credit: 87,616,939
RAC: 491,716
Message 46944 - Posted: 16 Apr 2017 | 2:23:42 UTC

My GTX750Ti - driver 376.53 & GTX1050Ti - driver 378.92 are working fine now.

2 GTX460 running driver 378.92 are not getting any work even though they are above the minimum driver level and show CUDA version 8.0...

4/15/2017 9:06:31 PM | GPUGRID | Sending scheduler request: To fetch work.
4/15/2017 9:06:31 PM | GPUGRID | Requesting new tasks for Miner ASIC and NVIDIA GPU
4/15/2017 9:06:33 PM | GPUGRID | Scheduler request completed: got 0 new tasks
4/15/2017 9:06:33 PM | GPUGRID | No tasks sent
4/15/2017 9:06:33 PM | GPUGRID | No tasks are available for Short runs (2-3 hours on fastest card)
4/15/2017 9:06:33 PM | GPUGRID | No tasks are available for Long runs (8-12 hours on fastest card)

Jacob Klein
Joined: 11 Oct 08
Posts: 1049
Credit: 1,064,233,664
RAC: 957,892
Message 46947 - Posted: 16 Apr 2017 | 3:00:16 UTC
Last modified: 16 Apr 2017 | 3:17:59 UTC

9.18 is also not working for my GTX 660 Ti GPUs.
SWAN : FATAL Unable to load module .nonbonded.cu. (300)

Also, the 9.x tasks aren't responding to suspending very well: sometimes it takes 10-20 seconds after issuing the suspend command before they actually suspend!


Here you can see where it was fine on the GTX 970, but immediately failed on the GTX 660 Ti:
https://www.gpugrid.net/result.php?resultid=16242220

Note: I am running a pre-release version of Windows 10 - but I wouldn't think that would matter, would it?

Stderr output

<core_client_version>7.7.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -52 (0xffffffcc)
</message>
<stderr_txt>
# GPU [GeForce GTX 970] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 0 :
# Name : GeForce GTX 970
# ECC : Disabled
# Global mem : 4096MB
# Capability : 5.2
# PCI ID : 0000:09:00.0
# Device clock : 1367MHz
# Memory clock : 3505MHz
# Memory width : 256bit
# Driver version : r381_64 : 38178
# GPU 0 : 68C
# GPU 1 : 67C
# GPU 2 : 54C
# GPU 2 : 57C
# GPU 2 : 58C
# GPU 2 : 59C
# GPU 1 : 68C
# GPU 2 : 60C
# GPU 1 : 69C
# GPU 1 : 71C
# GPU 1 : 72C
# GPU [GeForce GTX 970] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 0 :
# Name : GeForce GTX 970
# ECC : Disabled
# Global mem : 4096MB
# Capability : 5.2
# PCI ID : 0000:09:00.0
# Device clock : 1367MHz
# Memory clock : 3505MHz
# Memory width : 256bit
# Driver version : r381_64 : 38178
# GPU 0 : 67C
# GPU 1 : 66C
# GPU 2 : 58C
# GPU 1 : 69C
# GPU 2 : 59C
# GPU [GeForce GTX 660 Ti] Platform [Windows] Rev [3212] VERSION [80]
# SWAN Device 1 :
# Name : GeForce GTX 660 Ti
# ECC : Disabled
# Global mem : 3072MB
# Capability : 3.0
# PCI ID : 0000:07:00.0
# Device clock : 1045MHz
# Memory clock : 3004MHz
# Memory width : 192bit
# Driver version : r381_64 : 38178
SWAN : FATAL Unable to load module .nonbonded.cu. (300)

</stderr_txt>
]]>

Erich56
Joined: 1 Jan 15
Posts: 346
Credit: 1,460,919,427
RAC: 2,683,758
Message 46950 - Posted: 16 Apr 2017 | 4:43:35 UTC - in response to Message 46921.

I was expecting it to work on 64 bit XP, actually. Given that it doesn't, there's not a tremendous amount I can do to fix it immediately.

We haven't had an XP test platform for a long time: Microsoft ended support for it 3 years ago! You really should upgrade...
Matt

Matt,
It's a bit off-topic, but let me explain:
These Windows XP x64 hosts are dedicated crunching boxes (therefore it does not matter if their OS is not supported anymore). A lot of effort has been put into them to make the GTX 980Ti work under Windows XP: selecting the right MB, "hacking" the NV driver to recognize the top-end cards, etc. The reason *not* to upgrade them from Windows XP is to maximize their throughput (avoiding WDDM).

... But for now, if you could make a fresh CUDA 6.5 client, that would be great (and it would save us a lot of work).
Thank you in advance!

I can only fully underline what Zoltan is saying, and hope that all the many crunchers using XP for good reason will be able to continue for a while.

Erich56
Joined: 1 Jan 15
Posts: 346
Credit: 1,460,919,427
RAC: 2,683,758
Message 46951 - Posted: 16 Apr 2017 | 6:15:40 UTC
Last modified: 16 Apr 2017 | 6:35:10 UTC

I now tried task
e4s8_e2s3p0f357-ADRIA_FOLDGREED10_crystal_ss_contacts_100_ubiquitin_4-1-2-RND4569_2

on my Windows 10 64-bit, driver 376.53, acemd 918.80.

It errored out after 1.30 seconds:
(unknown error) - exit code -1073741790 (0xc0000022).

Why so?
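
As a side note on reading these exit codes: the hexadecimal value in brackets is just the signed 32-bit exit code viewed as an unsigned number. The sketch below shows that conversion for codes quoted in this thread; `as_unsigned32` is a hypothetical helper name. For reference, 0xC0000022 is the Windows NTSTATUS value STATUS_ACCESS_DENIED, although nothing in this thread establishes what access was denied in this case.

```python
def as_unsigned32(code):
    """Format a signed 32-bit exit code as the 8-digit hex value shown in stderr."""
    # Masking with 0xFFFFFFFF reinterprets the two's-complement value as unsigned.
    return f"0x{code & 0xFFFFFFFF:08x}"


# Exit codes quoted in this thread.
for code in (-59, -52, -1073741790):
    print(code, as_unsigned32(code))
```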

Erich56
Joined: 1 Jan 15
Posts: 346
Credit: 1,460,919,427
RAC: 2,683,758
Message 46952 - Posted: 16 Apr 2017 | 7:05:06 UTC - in response to Message 46951.
Last modified: 16 Apr 2017 | 7:28:30 UTC

on my Windows 10 64-bit, driver 376.53, acemd 918.80.

I now updated the driver to 381.65 and downloaded

e5s6_e3s4p0f494-ADRIA_FOLDGREED50_crystal_ss_contacts_100_ubiquitin_3-0-2-RND2192_0

It's been running well for 10 minutes ... so let's keep our fingers crossed.
The card is a GTX970.

Still I hope that a solution can be found for XP.

[CSF] Thomas H.V. Dupont
Joined: 20 Jul 14
Posts: 525
Credit: 55,667,800
RAC: 71,872
Message 46956 - Posted: 16 Apr 2017 | 7:52:58 UTC

Windows 10/64-bit
GenuineIntel Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz
NVIDIA GeForce GTX 960M (2048MB) driver: 381.65

Running for 17 minutes
No problem for now
Fingers crossed
____________
[CSF] Thomas H.V. Dupont
Founder of the team CRUNCHERS SANS FRONTIERES
www.crunchersansfrontieres.org

Erich56
Joined: 1 Jan 15
Posts: 346
Credit: 1,460,919,427
RAC: 2,683,758
Message 46957 - Posted: 16 Apr 2017 | 9:30:29 UTC - in response to Message 46952.
Last modified: 16 Apr 2017 | 9:39:39 UTC

It's been running well for 10 minutes ... so let's keep our fingers crossed.
The card is a GTX970.

And, unfortunately, even with the new (latest) driver I am experiencing the same problem that I have been having for some time, and which I described in this thread:

http://www.gpugrid.net/forum_thread.php?id=4511#46686

After a while (today: after about 2 hours) the GPU clock automatically drops to the "default clock" value 1152 MHz (and power consumption dropping to about 58%). And this can only be changed back to a higher value (via NVIDIA Inspector) after a restart of the PC.
BTW, the same thing happens with the GTX750Ti in the other Windows10 PC.

This has never ever happened with my two Windows XP PCs.
So one more reason NOT to give up XP by switching to Windows10 !!!

Richard Haselgrove
Joined: 11 Jul 09
Posts: 775
Credit: 1,317,539,795
RAC: 1,364,639
Message 46960 - Posted: 16 Apr 2017 | 12:07:39 UTC - in response to Message 46939.

OK, when I start upgrading my other two tomorrow morning I'll start with 372.54, and if that works, probably stick at 373.06 (first and last of the 372 series respectively).

I can confirm that both these drivers allow v9.18 (cuda80) tasks to download and run on my GTX 970s under Windows 7. I'll settle on 373.06 - last bugfix for the series.

Technically speaking, these are major version 370 drivers, according to the release notes.

Jacob Klein
Joined: 11 Oct 08
Posts: 1049
Credit: 1,064,233,664
RAC: 957,892
Message 46961 - Posted: 16 Apr 2017 | 12:28:51 UTC
Last modified: 16 Apr 2017 | 12:30:07 UTC

But Richard,

Instead of playing with R370, you could be playing with R375, or R378, or even R381 --- the installer is 415 MB now, and they're only introducing about 2 new bugs for each one they fix! :)

I have every installer and release note, all the way back to 280.26. It's taking 37.3 GB of space.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 775
Credit: 1,317,539,795
RAC: 1,364,639
Message 46962 - Posted: 16 Apr 2017 | 12:31:07 UTC - in response to Message 46961.
Last modified: 16 Apr 2017 | 12:36:06 UTC

I'm happy just bumping along the bottom - I'll leave the stratosphere to you :)

It really does make it worth investing in a 50 Mbits internet connection and SSD system drives, doesn't it? I wonder who's got shares in who?

And I'm glad GPUGrid got the project internet connection sorted before we all had to upgrade to cuda 8.0 - I'm getting that 140 MB cufft64_80.dll in about 40 seconds.

Jacob Klein
Joined: 11 Oct 08
Posts: 1049
Credit: 1,064,233,664
RAC: 957,892
Message 46963 - Posted: 16 Apr 2017 | 12:36:16 UTC - in response to Message 46962.
Last modified: 16 Apr 2017 | 12:38:01 UTC

I'm happy just bumping along the bottom - I'll leave the stratosphere to you :)


Indeed. Just look at all these PRETTY numbers!
There's not a non-alpha version to be found! Just the way I like it! :)

4/15/2017 3:21:41 PM | | Starting BOINC client version 7.7.2 for windows_x86_64
4/15/2017 3:21:41 PM | | This a development version of BOINC and may not function properly
4/15/2017 3:21:41 PM | | log flags: file_xfer, sched_ops, task, scrsave_debug, unparsed_xml
4/15/2017 3:21:41 PM | | Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8
4/15/2017 3:21:41 PM | | Data directory: E:\BOINC Data
4/15/2017 3:21:41 PM | | Running under account jacob
4/15/2017 3:21:42 PM | | CUDA: NVIDIA GPU 0: GeForce GTX 980 Ti driver version (381.78, CUDA version 8.0, compute capability 5.2, 4096MB, 3962MB available, 7271 GFLOPS peak)
4/15/2017 3:21:42 PM | | CUDA: NVIDIA GPU 1: GeForce GTX 980 Ti driver version (381.78, CUDA version 8.0, compute capability 5.2, 4096MB, 3962MB available, 6060 GFLOPS peak)
4/15/2017 3:21:42 PM | | OpenCL: NVIDIA GPU 0: GeForce GTX 980 Ti driver version (381.78, device version OpenCL 1.2 CUDA, 6144MB, 3962MB available, 7271 GFLOPS peak)
4/15/2017 3:21:42 PM | | OpenCL: NVIDIA GPU 1: GeForce GTX 980 Ti driver version (381.78, device version OpenCL 1.2 CUDA, 6144MB, 3962MB available, 6060 GFLOPS peak)
4/15/2017 3:21:42 PM | | Host name: Speed
4/15/2017 3:21:42 PM | | Processor: 16 GenuineIntel Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz [Family 6 Model 63 Stepping 2]
4/15/2017 3:21:42 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx tm2 dca pbe fsgsbase bmi1 smep bmi2
4/15/2017 3:21:42 PM | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.16176.00)
4/15/2017 3:21:42 PM | | Memory: 63.90 GB physical, 73.40 GB virtual
4/15/2017 3:21:42 PM | | Disk: 300.00 GB total, 222.52 GB free
4/15/2017 3:21:42 PM | | Local time is UTC -4 hours
4/15/2017 3:21:42 PM | | VirtualBox version: 5.0.37

John
Joined: 15 Oct 11
Posts: 16
Credit: 73,362,928
RAC: 32,733
Message 46964 - Posted: 16 Apr 2017 | 13:13:57 UTC - in response to Message 46950.

I was expecting it to work on 64 bit XP, actually. Given that it doesn't, there's not a tremendous amount I can do to fix it immediately.

We haven't had an XP test platform for a long time: Microsoft ended support for it 3 years ago! You really should upgrade...
Matt

Matt,
It's a bit off-topic, but let me explain:
These Windows XP x64 hosts are dedicated crunching boxes (therefore it does not matter if their OS is not supported anymore). A lot of effort has been put into them to make the GTX 980Ti work under Windows XP: selecting the right MB, "hacking" the NV driver to recognize the top-end cards, etc. The reason *not* to upgrade them from Windows XP is to maximize their throughput (avoiding WDDM).

... But for now, if you could make a fresh CUDA 6.5 client, that would be great (and it would save us a lot of work).
Thank you in advance!

I can only fully underline what Zoltan is saying, and hope that all the many crunchers using XP for good reason will be able to continue for a while.



+ 1

Finrond
Joined: 26 Jun 12
Posts: 5
Credit: 311,739,014
RAC: 36
Message 46965 - Posted: 16 Apr 2017 | 13:42:22 UTC

Damn, guess this means I will have to move off the 359.06 drivers eh?

Jacob Klein
Joined: 11 Oct 08
Posts: 1049
Credit: 1,064,233,664
RAC: 957,892
Message 46966 - Posted: 16 Apr 2017 | 13:53:57 UTC

CC 3.0 ... still isn't working on 9.18 ...

Letunchik
Joined: 29 Dec 16
Posts: 2
Credit: 51,339,325
RAC: 250
Message 46978 - Posted: 17 Apr 2017 | 18:46:26 UTC - in response to Message 46856.

I have the same situation. I was told that the cause is an obsolete .dll file and bad WUs on the server. The problem should be solved within a week or so.

klepel
Joined: 23 Dec 09
Posts: 126
Credit: 1,704,805,812
RAC: 1,863,487
Message 46979 - Posted: 17 Apr 2017 | 18:53:59 UTC

I can confirm that driver version 373.06 solves the problem for my GTX 970 cards, but not for the GTX 670: that computer does not download any new WUs.

When will we receive a new app for this type of cards?

Letunchik
Joined: 29 Dec 16
Posts: 2
Credit: 51,339,325
RAC: 250
Message 46984 - Posted: 17 Apr 2017 | 21:37:22 UTC

After two days of useless requests my computer finally got a new job in GPUGrid. But the new `long run` task lasts three times longer than any previous `long run`! With the same amount of calculations (5 000 000 GFLOPs) on the same machine it is running for 27-29 hours instead of 9-10 before.
My old GTX680 handled the GPUGrid calculations perfectly until April 14, 2017. Now it has become practically useless in this project.
Although it still shows high results in Einstein@Home.
If these changes are the result of "code optimization", then the result is negative.
Please think about the owners of old Nvidia GPUs (with Compute Capability 3.0 and 2.1). Many people still use them.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 775
Credit: 1,317,539,795
RAC: 1,364,639
Message 46986 - Posted: 17 Apr 2017 | 22:11:02 UTC - in response to Message 46984.

Unfortunately, the "same amount of calculations (5 000 000 GFLOPs)" applies to all tasks assigned to the long queue, and isn't adjusted to reflect the complexity (duration) of the task - even though tasks are assessed before the run starts, so that a proportionate amount of credit can be issued.

It would be a significant improvement to the way this project runs alongside other projects under BOINC, if the task calculation estimate could be adjusted as well as the credit award.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 1814
Credit: 9,970,837,494
RAC: 6,545,179
Message 46987 - Posted: 17 Apr 2017 | 22:14:19 UTC - in response to Message 46984.
Last modified: 17 Apr 2017 | 22:15:44 UTC

After two days of fruitless requests, my computer finally got a new job from GPUGrid. But the new `long run` task lasts three times longer than any previous `long run`! With the same amount of calculation (5 000 000 GFLOPs) on the same machine, it is running for 27-29 hours instead of 9-10 as before.
The workunits won't take longer than before; only the estimate of the remaining time (what you see) is miscalculated. This estimate will normalize after a couple of workunits have completed. As this is a new app, the BOINC manager has to learn the duration correction ratio for this version. Alternatively, you can create an app_config.xml to instruct the BOINC manager to calculate the remaining time based on the fraction done and the time elapsed.
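
To illustrate what "based on the fraction done and the time elapsed" means, here is a minimal Python sketch of a linear extrapolation. It only shows the arithmetic; the function name and the sample numbers are made up for this example, and it is not the BOINC client's actual code.

```python
def estimate_remaining_seconds(elapsed_seconds: float, fraction_done: float) -> float:
    """Extrapolate the remaining run time linearly from the progress so far."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    # Projected total run time if progress continues at the same rate.
    projected_total = elapsed_seconds / fraction_done
    return projected_total - elapsed_seconds


# Hypothetical numbers: 3 hours elapsed with 30% of the task done.
remaining = estimate_remaining_seconds(3 * 3600, 0.30)
print(f"{remaining / 3600:.1f} hours remaining")
```

An estimate of this kind depends only on the task's own progress, so it does not have to wait for the client to learn how fast a new app version runs.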

Copy the following to the clipboard:
notepad c:\ProgramData\BOINC\projects\www.gpugrid.net\app_config.xml

Press Windows key + R, then paste and press [enter].
If you see an empty file then copy & paste the following text:
<app_config>
<app>
<name>acemdlong</name>
<fraction_done_exact/>
</app>
<app>
<name>acemdshort</name>
<fraction_done_exact/>
</app>
<app>
<name>acemdbeta</name>
<fraction_done_exact/>
</app>
</app_config>

If you already have an app_config.xml, then you should only insert the line
<fraction_done_exact/>
after each line containing the name of the application.

Click file -> save and click [save].
Open the BOINC manager, click Options -> read config files.

Erich56
Joined: 1 Jan 15
Posts: 346
Credit: 1,460,919,427
RAC: 2,683,758
Level
Met
Message 47000 - Posted: 18 Apr 2017 | 5:49:31 UTC

Zoltan, thanks for the "fraction_done_exact" app_config.xml, seems very useful.

On one of my PCs, there already is:

<app_config>
<app>
<name>acemdshort</name>
<gpu_versions>
<gpu_usage>0.5</gpu_usage>
<cpu_usage>1</cpu_usage>
</gpu_versions>
</app>
</app_config>

So, in this case, how do I include the "fraction_done_exact" part so that it works not only for acemdshort, but also for acemdlong?

Jacob Klein
Joined: 11 Oct 08
Posts: 1049
Credit: 1,064,233,664
RAC: 957,892
Level
Met
Message 47002 - Posted: 18 Apr 2017 | 6:24:14 UTC
Last modified: 18 Apr 2017 | 6:32:10 UTC

Erich56:

1) See here:
https://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration
... There's a pretty example of what an app_config.xml file is supposed to look like.

2) Learn to read XML :) Seriously, you can do it, I promise. Look carefully at his example - you'll see that the fraction_done_exact element lives within the app block. And, when declaring an element that has no content, you can put a slash at the end of it so that it doesn't need a separate closing tag, i.e.: <fraction_done_exact/>

So, if your XML file declares multiple app blocks (say x blocks), and you want all of them to use this setting, then you'll need to add the line x times, in the right places.

Looking at what you pasted, it seems you will need to create the acemdlong app block entirely.
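
For anyone who would rather script that edit than do it by hand, here is a rough Python sketch using only the standard library. It adds <fraction_done_exact/> to each named app block and creates a block for any application that doesn't have one yet. The function name is made up for this example, the input is the app_config.xml pasted above, and it is untested against a real BOINC installation, so try it on a copy of your file first.

```python
import xml.etree.ElementTree as ET


def add_fraction_done_exact(xml_text: str, app_names: list[str]) -> str:
    """Return app_config XML in which every named app has <fraction_done_exact/>."""
    root = ET.fromstring(xml_text)
    # Index the existing <app> blocks by the text of their <name> element.
    existing = {}
    for app in root.findall("app"):
        name = (app.findtext("name") or "").strip()
        if name:
            existing[name] = app

    for name in app_names:
        app = existing.get(name)
        if app is None:
            # No block for this application yet: create one from scratch.
            app = ET.SubElement(root, "app")
            ET.SubElement(app, "name").text = name
        if app.find("fraction_done_exact") is None:
            # Place the flag directly after <name>, inside the <app> block.
            position = list(app).index(app.find("name")) + 1
            app.insert(position, ET.Element("fraction_done_exact"))

    ET.indent(root)  # pretty-print; available from Python 3.9
    return ET.tostring(root, encoding="unicode")


# The existing file from the post above, with only an acemdshort block.
current = """<app_config>
<app>
<name>acemdshort</name>
<gpu_versions>
<gpu_usage>0.5</gpu_usage>
<cpu_usage>1</cpu_usage>
</gpu_versions>
</app>
</app_config>"""

print(add_fraction_done_exact(current, ["acemdshort", "acemdlong"]))
```

The existing gpu_versions settings are left untouched, and running it a second time adds nothing further.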

Erich56
Joined: 1 Jan 15
Posts: 346
Credit: 1,460,919,427
RAC: 2,683,758
Level
Met
Message 47004 - Posted: 18 Apr 2017 | 8:42:29 UTC - in response to Message 47002.

So, if your XML file declares multiple app blocks (say x blocks), and you want all of them to use this setting, then you'll need to add the line x times, in the right places.

Looking at what you pasted, it seems you will need to create the acemdlong app block entirely.

okay, all clear, thx, seems to work :-)

Loohi
Joined: 27 Aug 16
Posts: 16
Credit: 43,745,875
RAC: 24
Level
Val
Message 47008 - Posted: 18 Apr 2017 | 13:46:11 UTC
Last modified: 18 Apr 2017 | 13:47:51 UTC

Anyone having some weird GPU usage lately? It's stable at 90% for about 3 minutes, then at 0% (while still showing "Running" in BOINC) for a long time.

I aborted 2 units thinking they were faulty (contact goals), but I'm seeing the same behaviour with an ADRIA unit at the moment.

EDIT: running latest Nvidia drivers and 9.18 app

Erich56
Joined: 1 Jan 15
Posts: 346
Credit: 1,460,919,427
RAC: 2,683,758
Level
Met
Message 47017 - Posted: 18 Apr 2017 | 15:17:56 UTC - in response to Message 47008.

Anyone having some weird GPU usage lately?

Yes, this afternoon I happened to notice this behaviour on one of my Windows 10 machines - latest software, latest driver.
Since I was out for a while, I cannot tell how long the CPU usage (as seen in the Windows Task Manager) was at zero.
The task (still running) is a Pablo_contact_goal_KIX. But if you can observe this with an ADRIA task as well, then it seems that the fault may rather lie with the new software.

Anyway, what I have already noticed is that with the new software, crunching in Windows 10 is a bit slower, and overclocking is even less feasible than before.

Erich56
Joined: 1 Jan 15
Posts: 346
Credit: 1,460,919,427
RAC: 2,683,758
Level
Met
Message 47037 - Posted: 19 Apr 2017 | 19:54:13 UTC

This evening I again noticed the strange behaviour described above.
The interesting thing is that in the BOINC Manager, the task is shown as "active", although the progress bar does not advance and the CPU usage in the Windows Task Manager is zero.
And after a while, the task resumes crunching.

Variable
Joined: 20 Nov 13
Posts: 20
Credit: 151,699,255
RAC: 474,179
Level
Ile
Message 47059 - Posted: 21 Apr 2017 | 1:24:26 UTC

I'm noticing similar behavior on my machine. Running a GTX 1070. I look at GPU core load directly in HWinfo and it will just sit at 0% load for long periods, but from looking at the task history they seem to be completing eventually. Anyone know what's going on with this?

Loohi
Joined: 27 Aug 16
Posts: 16
Credit: 43,745,875
RAC: 24
Level
Val
Message 47120 - Posted: 27 Apr 2017 | 10:52:51 UTC

back at it again

Anyone else getting a bunch of faulty WU?

Richard Haselgrove
Joined: 11 Jul 09
Posts: 775
Credit: 1,317,539,795
RAC: 1,364,639
Level
Met
Message 47121 - Posted: 27 Apr 2017 | 11:20:20 UTC - in response to Message 47120.

back at it again

Anyone else getting a bunch of faulty WU?

No - the WUs seem to be fine at the moment, and your failures since 26 April come from a range of different WU types.

The output of your most recent successful task shows

Driver version : r376_38 : 37653

but your computer now shows

NVIDIA GeForce GTX 970 (4095MB) driver: 381.89

Since you're running Windows 10, I suspect you've suffered from the common 'automatic driver update by Microsoft'. Try updating your driver again, this time direct from the NVidia site.

Loohi
Joined: 27 Aug 16
Posts: 16
Credit: 43,745,875
RAC: 24
Level
Val
Message 47123 - Posted: 27 Apr 2017 | 12:56:08 UTC - in response to Message 47121.

Thanks for the detailed answer. I triggered both updates myself, for the Win10 Creators Update and for Nvidia. I'll try to reinstall these drivers now and see if it makes a difference tomorrow.

Jacob Klein
Joined: 11 Oct 08
Posts: 1049
Credit: 1,064,233,664
RAC: 957,892
Level
Met
Message 47125 - Posted: 27 Apr 2017 | 18:58:21 UTC
Last modified: 27 Apr 2017 | 18:58:55 UTC

Just to clarify ....
Work units are NOT FINE on CC3/SM3 GPUs like my GTX 660 Ti GPUs :(

Still waiting for MJH to give us more details on what went wrong, and who must fix it..

Richard Haselgrove
Joined: 11 Jul 09
Posts: 775
Credit: 1,317,539,795
RAC: 1,364,639
Level
Met
Message 47126 - Posted: 27 Apr 2017 | 19:28:15 UTC - in response to Message 47125.

Is it the workunits (some types? all types?) which fail on your GTX 660 Ti, or the new application?

Betting Slip
Joined: 5 Jan 09
Posts: 574
Credit: 1,912,707,875
RAC: 1,723,332
Level
His
Message 47127 - Posted: 27 Apr 2017 | 20:05:48 UTC - in response to Message 47125.

Just to clarify ....
Work units are NOT FINE on CC3/SM3 GPUs like my GTX 660 Ti GPUs :(

Still waiting for MJH to give us more details on what went wrong, and who must fix it..


My 660 Ti is still running on the 359.06 driver with the CUDA 6.5 app, and it works fine.

Jacob Klein
Joined: 11 Oct 08
Posts: 1049
Credit: 1,064,233,664
RAC: 957,892
Level
Met
Message 47128 - Posted: 28 Apr 2017 | 3:41:25 UTC - in response to Message 47127.

Is it the workunits (some types? all types?) which fail on your GTX 660 Ti, or the new application?



Just to clarify ....
Work units are NOT FINE on CC3/SM3 GPUs like my GTX 660 Ti GPUs :(

Still waiting for MJH to give us more details on what went wrong, and who must fix it..


I've got my 660ti still running on the 359.06 driver with cuda 6.5 app and works fine.



The 9.18 (cuda80) app crashes on my GTX 660 Ti GPUs that are in the same PC as my GTX 970. To my knowledge, this machine is intentionally and correctly given 9.18 (cuda80) tasks, but there's a problem with the app.

MJH said:

15 Apr 2017 | 21:43:26 UTC
http://www.gpugrid.net/forum_thread.php?id=4545&nowrap=true#46932
For some reason the sm 3.0 support (and only that sm version) is broken.


17 Apr 2017 | 19:49:15 UTC
http://www.gpugrid.net/forum_thread.php?id=4551&nowrap=true#46981
The peculiar exception for sm 3.0 devices is due to a compiler problem with CUDA 80 that affects only that hardware version. When that's fixed, hosts with a non-XP Windows will get 918.


.....
But I don't know what that means!

Is it a problem that GPUGrid must fix, or is it a problem that NVIDIA must fix?
I feel like nobody is trying to fix it.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 775
Credit: 1,317,539,795
RAC: 1,364,639
Level
Met
Message 47130 - Posted: 28 Apr 2017 | 10:12:13 UTC - in response to Message 47128.

17 Apr 2017 | 19:49:15 UTC
http://www.gpugrid.net/forum_thread.php?id=4551&nowrap=true#46981
The peculiar exception for sm 3.0 devices is due to a compiler problem with CUDA 80 that affects only that hardware version. When that's fixed, hosts with a non-XP Windows will get 918.

.....
But I don't know what that means!

Is it a problem that GPUGrid must fix, or is it a problem that NVIDIA must fix?
I feel like nobody is trying to fix it.

A compiler is an integral part of the development software used by computer programmers to create useful applications.

In this case, the CUDA 8.0 compiler is maintained and distributed by NVidia to facilitate sales of their hardware products (GPUs). It would be difficult-to-impossible to do anything with a GPU without NVidia's compiler.

The CUDA compiler comprises two parts: the first part, which resides in the 'CUDA toolkit' on Matt's machine, produces intermediate code. The second part, which resides in the drivers on all our machines, converts the universal intermediate code into machine code instructions tailored to the specific hardware found in the target computer.

Matt hasn't identified (in public, at least) which of the two components he believes to be at fault. Since it's hardware-specific, my personal opinion is that it's likely to be the driver-level component - but I've been wrong before.

Either way, both components are the responsibility of NVidia. Any change would have to be implemented and distributed by them.

But you've encountered an age-old problem, previously described in terms of putting new wine into old bottles, or teaching old dogs new tricks. When a complex system relies on two symbiotic components (hardware and software, in this case), to what extent is it realistic to expect that every new pairing will work together ad infinitum?

Personally, I feel it's advantageous to keep computer systems 'balanced' - with hardware and software of a comparable vintage. My trusty and long-serving 9800 GTs have joined my Windows 3 computers in the museum - I haven't tried to convert them to run Cuda 8 or Windows 10. I suggest that, if you feel GTX 660 Ti cards are still energy-efficient enough to be useful, you put them into a chassis with a similar vintage of operating system and a Cuda 6.5 driver.

Jacob Klein
Joined: 11 Oct 08
Posts: 1049
Credit: 1,064,233,664
RAC: 957,892
Level
Met
Message 47131 - Posted: 28 Apr 2017 | 11:28:46 UTC - in response to Message 47130.
Last modified: 28 Apr 2017 | 11:35:42 UTC

Thanks Richard, but ...

My GTX 660 Ti GPUs are supported by the driver version that I use, the OS that I use, the Cuda version the application was built for, and the application that I'm trying to run.

I expect this to work. It sounds like GPUGrid also expects this to work. It does not work.

My very simple question remains unanswered:
Is it a problem that GPUGrid must fix, or is it a problem that NVIDIA must fix?

I feel like nobody is trying to fix it.
If it is something NVIDIA must fix, and if GPUGrid gave me enough info to identify the problem, then I could urge my NVIDIA contacts to look at it.

But MJH hasn't released details.

MJH?

Jim1348
Joined: 28 Jul 12
Posts: 446
Credit: 1,104,179,502
RAC: 1,902,860
Level
Met
Message 47132 - Posted: 28 Apr 2017 | 12:05:12 UTC

The fix I would prefer is for MJH to limit the relevant GPU application to the newer cards; i.e., Maxwell and later. Whatever "fix" he might come up with may limit the performance of the newer cards, or at least require a lot of his time and effort that might be spent in better ways on new apps.

There will be the usual moaning and groaning, and people will leave. But there are plenty of volunteers anyway, and even more problems. So reduce both.

JoergF
Joined: 20 Apr 15
Posts: 189
Credit: 224,482,586
RAC: 311,412
Level
Leu
Message 47133 - Posted: 28 Apr 2017 | 13:58:36 UTC - in response to Message 47132.
Last modified: 28 Apr 2017 | 14:00:48 UTC

The fix I would prefer is for MJH to limit the relevant GPU application to the newer cards; i.e., Maxwell and later.


My two cents... I would agree for the long runs, as it doesn't make sense to run them on an old GTX 660 anyway. But not as a general measure for long and short runs. Do we have any statistics on how many Kepler cards are still in use at GPUGRID? I reckon that there are a great many... and therefore we shouldn't jump the gun by excluding them.

There will be the usual moaning and groaning, and people will leave. But there are plenty of volunteers anyway, and even more problems. So reduce both.


Well, if there are as many as I suspect (650 Ti, 660, 660 Ti, 670, 680), it would be very difficult to compensate for that loss of crunching power. I have my doubts.
____________
Die Liebe allein versteht das Geheimnis, andere zu beschenken und dabei selbst reich zu werden. [Clemens von Brentano]
Only love understands the secret of giving and getting richer at the same time [Clemens of Brentano]

Jim1348
Joined: 28 Jul 12
Posts: 446
Credit: 1,104,179,502
RAC: 1,902,860
Level
Met
Message 47134 - Posted: 28 Apr 2017 | 16:55:21 UTC - in response to Message 47133.

OK, that makes sense. I forgot about the short runs, but the Keplers would be quite nice for that.

PappaLitto
Joined: 21 Mar 16
Posts: 241
Credit: 973,621,906
RAC: 3,418,769
Level
Glu
Message 47135 - Posted: 28 Apr 2017 | 18:07:48 UTC

Kepler is not nearly old enough to drop support for, nor is it inefficient enough, as it's still on 28nm like Maxwell. I'm glad they dropped Fermi because of its larger process node and inefficient architecture.

Loohi
Joined: 27 Aug 16
Posts: 16
Credit: 43,745,875
RAC: 24
Level
Val
Message 47138 - Posted: 29 Apr 2017 | 4:23:18 UTC

Despite re-installing the Nvidia drivers, I'm still facing immediate computation errors since the Win10 Creators Update. Since this is not going to change, do you have any recommendations for getting me crunching again?

Erich56
Joined: 1 Jan 15
Posts: 346
Credit: 1,460,919,427
RAC: 2,683,758
Level
Met
Message 47139 - Posted: 29 Apr 2017 | 5:34:16 UTC - in response to Message 47134.

OK, that makes sense. I forgot about the short runs, but the Keplers would be quite nice for that.

Except that the availability of short runs has dropped quite a bit lately :-(

Myself, I have already considered switching to short runs with my two GTX 750 Ti cards, since after the latest crunching software (acemd_918.80) was introduced, the crunching times have increased considerably, up to almost 60 hours (as also noticed by other members).

Richard Haselgrove
Joined: 11 Jul 09
Posts: 775
Credit: 1,317,539,795
RAC: 1,364,639
Level
Met
Message 47141 - Posted: 29 Apr 2017 | 8:57:35 UTC - in response to Message 47138.

Despite re-installing nvidia drivers, i'm still facing immediate computation errors since win10 creator's update - since this is not going to change, do you have any recommendations for me to try to start crunching again?

It's beginning to look as if there might be a problem with that 381.89 driver, isn't it? It was only released on 25 April, and I haven't heard about anybody else trying to use it yet.

Maybe other users could post their observations, either way - and while we're waiting, you could try reverting to an older driver to see if that helps. Go to http://www.nvidia.com/Download/Find.aspx, fill in your card and operating system details, and choose from the search result list - anything between 372.54 and 381.65 should be fine. When you run the installer, choose 'custom' installation and check the 'clean install' box just to be on the safe side.
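
If you want to check a version number against that suggested range, a small Python sketch along these lines would do. The function names are made up for this example, and the 372.54 to 381.65 bounds are only the rough suggestion given above, not an official compatibility list.

```python
def parse_driver_version(text: str) -> tuple[int, int]:
    """Split a version string such as '381.89' into (381, 89)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def in_suggested_range(version: str, low: str = "372.54", high: str = "381.65") -> bool:
    """True if the version lies between the two suggested bounds (inclusive)."""
    # Comparing integer pairs avoids floating-point surprises such as 373.06 vs 373.6.
    return parse_driver_version(low) <= parse_driver_version(version) <= parse_driver_version(high)


# Versions mentioned in this thread.
for candidate in ["373.06", "381.89", "382.05"]:
    print(candidate, in_suggested_range(candidate))
```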

Loohi
Joined: 27 Aug 16
Posts: 16
Credit: 43,745,875
RAC: 24
Level
Val
Message 47142 - Posted: 29 Apr 2017 | 12:05:45 UTC - in response to Message 47141.
Last modified: 29 Apr 2017 | 12:06:01 UTC

Yeah, maybe I'll try that, but it's hard since after 2 faulty WUs I have to wait another 24 hours to get the next ones.

Loohi
Joined: 27 Aug 16
Posts: 16
Credit: 43,745,875
RAC: 24
Level
Val
Message 47143 - Posted: 30 Apr 2017 | 7:39:51 UTC - in response to Message 47142.

I went back and saw that successful WUs were completed with the latest Nvidia drivers (the same ones I have now), so it's fair to assume that the Win 10 Creators Update is the culprit, since nothing else changed. Does that basically mean that I'm not going to be able to do any work until GPUGrid makes its app compatible with the Win 10 Creators Update? I fear this might take a long time...

Jacob Klein
Joined: 11 Oct 08
Posts: 1049
Credit: 1,064,233,664
RAC: 957,892
Level
Met
Message 47153 - Posted: 1 May 2017 | 18:34:36 UTC - in response to Message 47131.
Last modified: 1 May 2017 | 18:41:13 UTC

Thanks Richard, but ...

My GTX 660 Ti GPUs are supported by the driver version that I use, the OS that I use, the Cuda version the application was built for, and the application that I'm trying to run.

I expect this to work. It sounds like GPUGrid also expects this to work. It does not work.

My very simple question, remains unanswered:
Is it a problem that GPUGrid must fix, or is it a problem that NVIDIA must fix?

I feel like nobody is trying to fix it.
If it is something NVIDIA must fix, and if GPUGrid gave me enough info to identify the problem, then I could urge my NVIDIA contacts to look at it.

But MJH hasn't released details.

MJH?




Request for users affected by "9.18 (cuda80)" app instantly failing:

My NVIDIA contact has a request:

Please fill out the Driver Feedback survey below, if you are affected by the GPUGrid "9.18 (cuda80)" app immediately failing with "Computation Error" on your GPU. This helps them assign priority when fixing issues. Be thorough when filling it out, please.

http://surveys.nvidia.com/index.jsp?pi=6e7ea6bb4a02641fa8f07694a40f8ac6

Thanks,
Jacob

Betting Slip
Joined: 5 Jan 09
Posts: 574
Credit: 1,912,707,875
RAC: 1,723,332
Level
His
Message 47154 - Posted: 1 May 2017 | 20:27:23 UTC - in response to Message 47153.

I have read somewhere else that scientists have a major problem communicating with ordinary people (I mean thick), and all the problems with this project seem to bear this out.

That's why science and the majority will never meet and, more darkly, science will be rejected by the majority.

Jacob Klein
Joined: 11 Oct 08
Posts: 1049
Credit: 1,064,233,664
RAC: 957,892
Level
Met
Message 47164 - Posted: 3 May 2017 | 20:03:57 UTC

Guess who's going to download the 1.2 GB Cuda 8.0 toolkit, and install the 8 GB Visual Studio 2015 Community Edition IDE, in an attempt to repro the SM3/CC3 compiler issues using the Cuda Toolkit samples?

Yeah. Me. I'm hardcore sometimes.

Jacob Klein
Joined: 11 Oct 08
Posts: 1049
Credit: 1,064,233,664
RAC: 957,892
Level
Met
Message 47166 - Posted: 4 May 2017 | 5:28:35 UTC - in response to Message 47164.
Last modified: 4 May 2017 | 5:37:30 UTC

MJH (et al.):

I have concluded my exhaustive Cuda 8.0 SDK testing. On my Win10 x64 Build 16184 PC (with GTX970, GTX660Ti, GTX660Ti), I installed VS2015 Community, installed the Cuda 8.0 Toolkit and samples, installed the DirectX SDK, then built all of the Cuda solutions.

There are 155 Cuda samples that I was able to compile and test with. And I went through them, 2 times:
1) 381.89 - GTX970, GTX660Ti, GTX660Ti
2) 381.89 - GTX660Ti, GTX660Ti (I pulled the GTX970 out of the system)

Out of the 155 samples, they all passed on both runs.. except VFlockingD3D10 did not look correct on my GTX660Ti but looked fine on my GTX970. All other calculations and samples worked fine, even on a GTX660Ti.

This leads me to believe that the GPUGrid problem with the "9.18 (cuda80)" app, where it errors out immediately on a system that has a CC3.0/SM3 GPU .... might not be an NVIDIA problem. It might be a problem with your app.

Is it possible you are calling some method or function, that isn't supported by CC3.0/SM3?

I desperately want you to provide more info. I'm spending considerable effort to help you solve this, yet my questions to you go unanswered. I hope you're making progress with a fix - please consider chiming in with your findings.

Jacob Klein

Richard Haselgrove
Joined: 11 Jul 09
Posts: 775
Credit: 1,317,539,795
RAC: 1,364,639
Level
Met
Message 47176 - Posted: 6 May 2017 | 8:42:04 UTC - in response to Message 47141.

It's beginning to look as if there might be a problem with that 381.89 driver, isn't it?

A user on the BOINC message boards says that BSOD problems with driver 381.89 stopped after updating to 382.05

(he also says he's upgraded from BOINC v7.6.33 to v7.7.2, but I'd caution against that - v7.7.2 was a highly experimental test build. v7.6.33 has been around for a long time, and is very unlikely to be implicated in recent changes to GPU behaviour)

Jacob Klein
Joined: 11 Oct 08
Posts: 1049
Credit: 1,064,233,664
RAC: 957,892
Level
Met
Message 47177 - Posted: 6 May 2017 | 12:55:05 UTC - in response to Message 47176.
Last modified: 6 May 2017 | 12:55:41 UTC

That same guy posts a ton in the NVIDIA Driver Feedback threads on their forums, about GPU apps crashing whenever he closes BOINC. I've tried to help him a few times before, but it seems he doesn't know how to isolate problems and troubleshoot them very well. I don't think he tries very hard to reliably reproduce the problems that he has.

So ... I'd be cautious: his words may be more "noise" than confirmation of problems or solutions.
