GTX 770 won't get work

Michael H.W. Weber
Joined: 9 Feb 16
Posts: 78
Credit: 656,229,684
RAC: 0
Level
Lys
Message 47321 - Posted: 25 May 2017, 20:05:34 UTC - in response to Message 47318.  

Do you have the "Use NVidia GPU" and the "Use Graphics Processing Unit (GPU) if available" selected in GPUGrid preferences?

Yes.

Do you have at least 8 GB disk space in the partition the BOINC data directory resides?

Will check, but would quite certainly say yes.

How many other GPU projects is this host attached to?

Just Primegrid and GPUGRID but even if I suspend Primegrid, the machine won't fetch work for GPUGRID.

You could try to increase the work buffer (it is set to 1 day now) for testing.

I did, but it won't change the situation even if I set the work buffer to 10/10 days.

Michael.
President of Rechenkraft.net - Germany's first and largest distributed computing organization.
ID: 47321 · Rating: 0

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 261
Level
Trp
Message 47322 - Posted: 25 May 2017, 22:17:34 UTC - in response to Message 47321.  

Getting new work is a two-part collaboration between your computer and the project server.

The first necessary condition is that your computer requests new work. The work fetch log has confirmed that your machine is requesting work for your NVidia GPU - job done. You can turn that logging off now, and save some disk space and processing cycles.
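A minimal sketch of switching that logging off again, assuming it was enabled through the `<work_fetch_debug>` entry under `<log_flags>` in the client's cc_config.xml (the thread does not say how it was turned on). The helper name `set_log_flag` and the sample document are illustrative only; the function works on the file's text, so reading and writing the real file is left to the user.

```python
import xml.etree.ElementTree as ET


def set_log_flag(cc_config_xml: str, flag: str, enabled: bool) -> str:
    """Return cc_config.xml text with one <log_flags> entry set to 1 or 0."""
    root = ET.fromstring(cc_config_xml)
    log_flags = root.find("log_flags")
    if log_flags is None:
        # No <log_flags> section yet: create one under the root element.
        log_flags = ET.SubElement(root, "log_flags")
    node = log_flags.find(flag)
    if node is None:
        node = ET.SubElement(log_flags, flag)
    node.text = "1" if enabled else "0"
    return ET.tostring(root, encoding="unicode")


# Illustrative input, not taken from the poster's machine.
sample = (
    "<cc_config><log_flags>"
    "<work_fetch_debug>1</work_fetch_debug>"
    "</log_flags></cc_config>"
)
updated = set_log_flag(sample, "work_fetch_debug", False)
print(updated)
```

After the file is changed, the client has to re-read its config files (or be restarted) before the new setting applies.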

The second necessary condition is that the server responds by allocating new work - which it isn't. The question is - why not?

One more to check - are you allowing 'ACEMD long runs' (project preferences)? Short run jobs are as rare as hen's teeth these days.

After that, it's a question of verifying that your GPU's 'compute capability' and graphics driver together match the current minimum project requirements.
ID: 47322 · Rating: 0

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Level
Trp
Message 47323 - Posted: 25 May 2017, 23:13:48 UTC - in response to Message 47322.  
Last modified: 25 May 2017, 23:41:34 UTC

After that, it's a question of verifying that your GPU's 'compute capability' and graphics driver together match the current minimum project requirements.
It's been done. Moreover, this host has already successfully completed a single CUDA8.0 task, but the project has sent it no more.

I experienced the same behavior once when I was trying my GTX 1080 under Linux. I thought then that I'd messed something up while trying to make SWAN_SYNC work under Linux (well, I couldn't). That host stopped receiving work, even though GPUGrid was the only project on it the whole time. Then I installed Windows 10 on the same hardware, and it received work, and it didn't stop receiving work after I set SWAN_SYNC on.
ID: 47323 · Rating: 0

Michael H.W. Weber
Joined: 9 Feb 16
Posts: 78
Credit: 656,229,684
RAC: 0
Level
Lys
Message 47324 - Posted: 26 May 2017, 3:38:05 UTC - in response to Message 47322.  

One more to check - are you allowing 'ACEMD long runs' (project preferences)?

As said above: Yes, ALL GPUGRID subprojects are allowed for this machine.

Michael.
President of Rechenkraft.net - Germany's first and largest distributed computing organization.
ID: 47324 · Rating: 0

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 47325 - Posted: 26 May 2017, 7:56:13 UTC - in response to Message 47324.  

I would detach from everything and reattach only to GPUGrid as Retvari suggests. If that doesn't work, you have found the one hardware/software configuration that just doesn't obey the rules as we know them. It happens as you know.
ID: 47325 · Rating: 0

Michael H.W. Weber
Joined: 9 Feb 16
Posts: 78
Credit: 656,229,684
RAC: 0
Level
Lys
Message 47326 - Posted: 26 May 2017, 12:58:31 UTC - in response to Message 47325.  
Last modified: 26 May 2017, 13:00:36 UTC

I would detach from everything and reattach only to GPUGrid as Retvari suggests.

I detached from GPUGRID, rebooted the system and re-attached to GPUGRID. No improvement.

If that doesn't work, you have found the one hardware/software configuration that just doesn't obey the rules as we know them. It happens as you know.

No, I do not know or accept that. This is science, not homeopathy (although homeopathy at least offers the placebo effect (for those of us who are believers), which can't in principle be ruled out as something scientifically accessible, too - although we still have no clue how that might be possible).

Michael.
President of Rechenkraft.net - Germany's first and largest distributed computing organization.
ID: 47326 · Rating: 0

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 47327 - Posted: 26 May 2017, 13:51:02 UTC - in response to Message 47326.  
Last modified: 26 May 2017, 13:58:49 UTC

I am not sure you followed the instructions. Homeopathy is your idea, not mine.

However, if you're keen to spend more time, I would try earlier drivers (still CUDA 8). Nvidia may have introduced problems in the later ones. I have seen it on other projects on occasion.
ID: 47327 · Rating: 0

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 47328 - Posted: 27 May 2017, 1:55:26 UTC
Last modified: 27 May 2017, 1:55:47 UTC

It's frustrating that the server doesn't give more details in its reply.

I think your problem can only be investigated further by the project admins, who really should throw us more bones in the server replies in the Event Log, to further explain WHY tasks were not given.
ID: 47328 · Rating: 0

Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Level
Phe
Message 47329 - Posted: 27 May 2017, 7:11:49 UTC - in response to Message 47326.  

I would detach from everything and reattach only to GPUGrid as Retvari suggests.

I detached from GPUGRID, rebooted the system and re-attached to GPUGRID. No improvement.

If that doesn't work, you have found the one hardware/software configuration that just doesn't obey the rules as we know them. It happens as you know.

No, I do not know or accept that. This is science, not homeopathy (although homeopathy at least offers the placebo effect (for those of us who are believers), which can't in principle be ruled out as something scientifically accessible, too - although we still have no clue how that might be possible).

Michael.


Are you sure it's not your work buffer or some other config in BOINC?
ID: 47329 · Rating: 0

Michael H.W. Weber
Joined: 9 Feb 16
Posts: 78
Credit: 656,229,684
RAC: 0
Level
Lys
Message 47330 - Posted: 27 May 2017, 13:32:39 UTC - in response to Message 47329.  

Are you sure it's not your work buffer or some other config in BOINC?

Yes, I am sure about that.
This machine simply stopped receiving work from one day to the next, without my having altered any of the BOINC or project settings.

Michael.
President of Rechenkraft.net - Germany's first and largest distributed computing organization.
ID: 47330 · Rating: 0

mindcrime
Joined: 27 Feb 14
Posts: 4
Credit: 121,376,887
RAC: 0
Level
Cys
Message 47332 - Posted: 27 May 2017, 21:20:22 UTC

I have two 750 Ti cards in different machines at different physical locations, using the same BOINC and GPUGrid settings/prefs. I noticed one was getting GPUGrid work and the other wasn't. After a couple of days of not getting work I began to investigate. I found this thread, did some reading, and noticed the driver difference between the two. I updated the driver to the newest (382.33) and got work.

TL;DR update your drivers.
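The comparison described here can be sketched as a small check across hosts. This is only an illustration: the host names and the older version string are made up, 382.33 is the one version mentioned in this post, and nothing here says which driver version the project actually requires.

```python
def parse_driver_version(text: str) -> tuple:
    """Turn a dotted driver version such as '382.33' into a comparable tuple."""
    return tuple(int(part) for part in text.strip().split("."))


def hosts_behind_newest(drivers: dict) -> list:
    """Return the names of hosts whose driver is older than the newest one seen."""
    newest = max(parse_driver_version(v) for v in drivers.values())
    return sorted(
        name for name, v in drivers.items() if parse_driver_version(v) < newest
    )


# Hypothetical inventory: only "382.33" comes from the thread.
drivers = {"host_getting_work": "382.33", "host_without_work": "376.53"}
print(hosts_behind_newest(drivers))
```

Comparing parsed tuples rather than raw strings avoids mistakes such as treating "99.1" as newer than "382.33".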
ID: 47332 · Rating: 0

Michael H.W. Weber
Joined: 9 Feb 16
Posts: 78
Credit: 656,229,684
RAC: 0
Level
Lys
Message 47333 - Posted: 28 May 2017, 5:21:58 UTC - in response to Message 47323.  

Moreover this host has already successfully completed a single CUDA8.0 task...

How did you actually find out about that?
I couldn't see any of the completed WUs in my client's history even before I started posting this thread.

Michael.
President of Rechenkraft.net - Germany's first and largest distributed computing organization.
ID: 47333 · Rating: 0

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 47334 - Posted: 28 May 2017, 6:46:18 UTC

First post has a link to a host. Then on that page, you can click Application Details, to see application details for that host.
ID: 47334 · Rating: 0

Michael H.W. Weber
Joined: 9 Feb 16
Posts: 78
Credit: 656,229,684
RAC: 0
Level
Lys
Message 47335 - Posted: 28 May 2017, 10:54:17 UTC - in response to Message 47334.  

First post has a link to a host. Then on that page, you can click Application Details, to see application details for that host.

Indeed. Never checked that link.
One question though: Why aren't all the tasks completed using CUDA 6.5 indicated as valid (although they all were valid)?

Michael.
President of Rechenkraft.net - Germany's first and largest distributed computing organization.
ID: 47335 · Rating: 0

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 47336 - Posted: 28 May 2017, 11:14:07 UTC - in response to Message 47335.  
Last modified: 28 May 2017, 11:14:33 UTC

First post has a link to a host. Then on that page, you can click Application Details, to see application details for that host.

Indeed. Never checked that link.
One question though: Why aren't all the tasks completed using CUDA 6.5 indicated as valid (although they all were valid)?

Michael.


Your assumption, that they were all valid, seems invalid :)

From my experience, if a task is suspended+resumed, or stopped+resumed, then it has a chance of being invalid, even if you watched it complete without error. Something in the validator must not like the output, sometimes, when those scenarios happen.

Getting back on topic, I'm sure that GPUGrid changed their logic for deciding when to give hosts work, and I'm fairly certain that "driver version detected" has a hand in those criteria. I wonder if they screwed something up in the app version criteria for the 700-series GPUs on Linux?

Also, can you see if you can upgrade your driver (I looked briefly and there might be a minor update available to you).
ID: 47336 · Rating: 0

Michael H.W. Weber
Joined: 9 Feb 16
Posts: 78
Credit: 656,229,684
RAC: 0
Level
Lys
Message 47338 - Posted: 29 May 2017, 12:56:23 UTC - in response to Message 47336.  
Last modified: 29 May 2017, 13:02:57 UTC

Your assumption, that they were all valid, seems invalid :)

Not really. They generated at least 73 billion credits, so a few should have been OK. :)

The point is that not a single valid task is listed (and no invalid ones, either).

Getting back on topic, I'm sure that GPUGrid changed their logic for deciding when to give hosts work, and I'm fairly certain that "driver version detected" has a hand in those criteria. I wonder if they screwed something up in the app version criteria for the 700-series GPUs on Linux?

Also, can you see if you can upgrade your driver (I looked briefly and there might be a minor update available to you).

Two things:

(1) the NVIDIA proprietary driver is updated from time to time using auto-update of Ubuntu. I actually do not like to change this manually as everything except for GPUGRID works perfectly.
(2) This GTX 770 machine uses the same driver as my GTX 970 machine. The latter receives tasks on a daily basis; the former does not. So I don't really see why the current driver should be the problem, especially since, as stated above, even the GTX 770 completed a CUDA 8 WU successfully.

But why should I care?
It is not my project, and the GTX 770 now contributes to some other project until the GPUGRID team decides to do something to keep or increase its number of contributors.
I find it really rather strange that - if I understand correctly - this topic has so far been discussed exclusively by volunteers. Thank you guys, I think you did your best.

Michael.
President of Rechenkraft.net - Germany's first and largest distributed computing organization.
ID: 47338 · Rating: 0

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 261
Level
Trp
Message 47339 - Posted: 29 May 2017, 13:53:00 UTC - in response to Message 47338.  

The point is that not a single valid task is listed (and no invalid ones, either).

Don't worry about that. Task data is kept in a short-term 'transactional' database, and purged (to save space and processing time) when no longer needed - usually after 10 days or so.

The important scientific data is transferred to a long-term scientific database and kept indefinitely.

But from the same 'application details' link for your machine, we can see for cuda65 (long tasks):

Number of tasks completed	249
Max tasks per day	1
Number of tasks today	0
Consecutive valid tasks	0

'Max tasks per day' and 'consecutive valid tasks' together imply that your machine produced a considerable number of invalid tasks at some point: no shame in that, we all did the same thing when the cuda65 licence expired, but it shows the sort of inferences you can draw.
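To illustrate that inference, here is a toy model of a per-application daily quota. It is a simplified sketch under stated assumptions, not GPUGrid's actual server logic: it assumes every errored or invalid task lowers the quota by one (never below 1) and resets the consecutive-valid counter, while every valid task raises the counter and doubles the quota up to a project maximum. The maximum of 50 is invented for the example.

```python
from dataclasses import dataclass


@dataclass
class AppQuota:
    project_max: int = 50        # hypothetical project-wide ceiling
    max_tasks_per_day: int = 50  # starts at the ceiling
    consecutive_valid: int = 0

    def record(self, valid: bool) -> None:
        """Update the quota after one returned task (assumed rules, see above)."""
        if valid:
            self.consecutive_valid += 1
            self.max_tasks_per_day = min(
                self.project_max, self.max_tasks_per_day * 2
            )
        else:
            self.consecutive_valid = 0
            self.max_tasks_per_day = max(1, self.max_tasks_per_day - 1)


quota = AppQuota()
for _ in range(60):              # a long run of failed tasks
    quota.record(valid=False)
print(quota.max_tasks_per_day, quota.consecutive_valid)

quota.record(valid=True)         # a single valid task starts the recovery
print(quota.max_tasks_per_day, quota.consecutive_valid)
```

Under these assumed rules, seeing "Max tasks per day 1" together with "Consecutive valid tasks 0" means the most recent results for that application failed, and enough of them failed to wear the quota all the way down.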

Two things:

(1) the NVIDIA proprietary driver is updated from time to time using auto-update of Ubuntu. I actually do not like to change this manually as everything except for GPUGRID works perfectly.
...

I'm not a Linux user, but I have read comments that Linux GPU drivers tend to be compiled against a specific Linux kernel. If your kernel also auto-updates, you may need to take precautions to ensure that your kernel and driver updates are kept in sync.
ID: 47339 · Rating: 0

Michael H.W. Weber
Joined: 9 Feb 16
Posts: 78
Credit: 656,229,684
RAC: 0
Level
Lys
Message 47342 - Posted: 30 May 2017, 10:28:50 UTC - in response to Message 47339.  
Last modified: 30 May 2017, 10:36:55 UTC

But from the same 'application details' link for your machine, we can see for cuda65 (long tasks):

Number of tasks completed	249
Max tasks per day	1
Number of tasks today	0
Consecutive valid tasks	0

'Max tasks per day' and 'consecutive valid tasks' together imply that your machine produced a considerable number of invalid tasks at some point: no shame in that, we all did the same thing when the cuda65 licence expired, but it shows the sort of inferences you can draw.

Hm, I do not understand how that conclusion can be drawn. The number of tasks is limited to two per day by the GPUGRID server anyway. The GTX 770 mostly got long runs, so it can rarely complete more than one per day. Moreover, I checked the machine and its output on a virtually daily basis during the whole of 2016 and early 2017. Rarely have I seen an invalid task, and when it happened, I caused it by accidentally updating the system, including the NVIDIA drivers, during full DC operation.
I must confess, though, that around the time when GPUGRID stopped sending tasks to my system, I had not checked regularly for probably a few weeks.

IF there had been many, many consecutive errors at the transition from CUDA 6.5 to 8.0, wouldn't it be possible that some information flag is stored somewhere locally on my machine (or on the GPUGRID server) that causes my system to be marked as permanently unreliable? And that this flag somehow has not yet been removed and now causes WUs not to be sent? Hm, probably also not the case as it completed a CUDA 8.0 task...


I'm not a Linux user, but I have read comments that Linux GPU drivers tend to be compiled against a specific Linux kernel. If your kernel also auto-updates, you may need to take precautions to ensure that your kernel and driver updates are kept in sync.

See, that is exactly why I am hesitant to manually install a more recent NVIDIA driver: when you use the console to update the whole system, everything is brought to the most recent state in a coordinated (!) way. A new kernel plus the corresponding, tested GPU driver will be delivered.

For now, I will just wait and see whether GPUGRID will again retrieve WUs for my GTX 770 after a future system update with even more recent drivers than the ones I currently have in use. Until then, other DC projects will be supported.

Michael.
President of Rechenkraft.net - Germany's first and largest distributed computing organization.
ID: 47342 · Rating: 0

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Level
Trp
Message 47345 - Posted: 30 May 2017, 20:04:07 UTC - in response to Message 47342.  
Last modified: 30 May 2017, 20:05:15 UTC

IF there had been many, many consecutive errors at the transition from CUDA 6.5 to 8.0, wouldn't it be possible that some information flag is stored somewhere locally on my machine (or on the GPUGRID server) that causes my system to be marked as permanently unreliable? And that this flag somehow has not yet been removed and now causes WUs not to be sent? Hm, probably also not the case as it completed a CUDA 8.0 task...
This came to my mind too. Perhaps you should try to force the BOINC manager to request a new host ID for your host. You can do it by stopping the BOINC manager, editing client_state.xml, searching for <hostid>342877</hostid>, replacing the number with the number of a previous host of yours (or a random number, if you don't have an older host), saving client_state.xml, and restarting the BOINC manager.
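The edit described above can be sketched as a text substitution. This is a hedged illustration rather than a tested procedure: the function name `replace_hostid`, the replacement ID 123456, and the sample fragment are hypothetical, and a real client_state.xml contains far more than this. It should only be applied to a backed-up file while BOINC is stopped, as the post says.

```python
import re


def replace_hostid(client_state_xml: str, old_id: int, new_id: int) -> str:
    """Swap one <hostid> value in client_state.xml text; refuse if ambiguous."""
    pattern = re.compile(rf"<hostid>\s*{old_id}\s*</hostid>")
    updated, count = pattern.subn(f"<hostid>{new_id}</hostid>", client_state_xml)
    if count != 1:
        # Zero or several matches: leave the decision to a human.
        raise ValueError(
            f"expected exactly one <hostid>{old_id}</hostid>, found {count}"
        )
    return updated


# Tiny stand-in for the relevant part of the file (not a complete document).
sample = "<project>\n    <hostid>342877</hostid>\n</project>\n"
print(replace_hostid(sample, 342877, 123456))
```

Requiring exactly one match is a deliberate safety choice: with several attached projects there may be several `<hostid>` entries, and silently rewriting the wrong one would be worse than stopping.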
ID: 47345 · Rating: 0

Michael H.W. Weber
Joined: 9 Feb 16
Posts: 78
Credit: 656,229,684
RAC: 0
Level
Lys
Message 47346 - Posted: 31 May 2017, 8:27:39 UTC

Thanks for this suggestion.

A second idea of mine was that the client reports a GPU memory of 1998 MB instead of the expected 2048 MB.
What is the minimum VRAM that GPUGRID requires in order to send tasks? Is this value stored somewhere in the BOINC system files, and can it be modified without causing trouble?

Michael.
President of Rechenkraft.net - Germany's first and largest distributed computing organization.
ID: 47346 · Rating: 0

©2025 Universitat Pompeu Fabra