
Message boards : Number crunching : RTX 2070 Super is not progressing

GWGeorge007
Joined: 4 Mar 23
Posts: 10
Credit: 2,767,158,000
RAC: 7,728,822
Message 60030 - Posted: 7 Mar 2023 | 16:44:53 UTC

Hello,

I have two desktop computers. One runs Linux Ubuntu 20.04.5 LTS [5.15.0-60-generic|libc 2.31] and BOINC, with two RTX 2070 Supers, one of which is set up to run GPUGRID. That card has 8GB of VRAM. The task on it is running, but not making any progress.

I also have an RTX 3080-Ti that is running GPUGRID, and it is working just fine. It is running Linux Ubuntu 22.04.1 LTS [6.1.4-060104-generic|libc 2.35] and also running BOINC.

I thought that I had the two desktop systems set up correctly and in the same way. Apparently, either I don't or there is something amiss with the RTX 2070 Super.

Any help would be greatly appreciated.

GWGeorge007

Keith Myers
Joined: 13 Dec 17
Posts: 1341
Credit: 7,690,286,770
RAC: 13,249,233
Message 60032 - Posted: 7 Mar 2023 | 21:30:04 UTC - in response to Message 60030.

How many tasks per card are you running? 1X or 2X?

Is the issue that only one task is progressing?

I see you have returned many tasks today on that host and after your post.

Have you looked at the properties of the task in the Manager?

Do you have any checkpoints listed for the "stalled" task?

Have you looked in the "stalled" task slot yet for the progress indicator files?

GWGeorge007
Message 60040 - Posted: 8 Mar 2023 | 20:03:44 UTC - in response to Message 60032.

"How many tasks per card are you running? 1X or 2X?"
I am only running 1X tasks.

"Is the issue that only one task is progressing?"
Yes, only a single task is progressing.

"I see you have returned many tasks today on that host and after your post."
https://www.gpugrid.net/result.php?resultid=33342546 - Only one done on 3950X.

"Have you looked at the properties of the task in the Manager?"
Yes, I do have permissions.

"Do you have any checkpoints listed for the "stalled" task?"
Not that I can tell, but I don't see any in my completed task either.

"Have you looked in the "stalled" task slot yet for the progress indicator files?"
?????? "stalled" task slot ??????

Keith Myers
Message 60044 - Posted: 9 Mar 2023 | 2:27:36 UTC - in response to Message 60040.
Last modified: 9 Mar 2023 | 2:36:56 UTC

Click on the stalled task in the Manager and in the left panel, select Properties.

Identify which slot the task is running in and navigate to that slot.

Look for the wrapper_checkpoint.txt file and see if it has any value in it other than 0.00000 seconds.

Then do the same for the progress.chk file.
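The two file checks above could be scripted along these lines. This is only a rough sketch: the function names are made up for illustration, the slot path in the usage note is an assumption (the usual location for the Ubuntu BOINC package, so yours may differ), and since the exact layout of these files is not described here, it simply looks for any non-zero number in wrapper_checkpoint.txt and reports the size and age of both files.

```python
import re
import time
from pathlib import Path

# File names taken from the post above; their internal format is not assumed.
CHECKPOINT_FILES = ("wrapper_checkpoint.txt", "progress.chk")


def checkpoint_has_progress(text):
    """Return True if any number found in the text is non-zero."""
    numbers = re.findall(r"[-+]?\d+(?:\.\d+)?", text)
    return any(float(n) != 0.0 for n in numbers)


def inspect_slot(slot_dir):
    """Report size and age of the checkpoint files in one slot directory.

    A missing file is reported as None. The key "wrapper_nonzero" is only
    present when wrapper_checkpoint.txt exists.
    """
    slot = Path(slot_dir)
    report = {}
    for name in CHECKPOINT_FILES:
        path = slot / name
        if not path.exists():
            report[name] = None
            continue
        stat = path.stat()
        report[name] = {
            "size": stat.st_size,
            "age_s": time.time() - stat.st_mtime,  # seconds since last write
        }
    wrapper = slot / "wrapper_checkpoint.txt"
    if wrapper.exists():
        text = wrapper.read_text(errors="replace")
        report["wrapper_nonzero"] = checkpoint_has_progress(text)
    return report
```

For example, `inspect_slot("/var/lib/boinc-client/slots/0")` would report on slot 0 if that is where your BOINC data directory lives. A file whose age keeps growing while the task is "running" would suggest no new checkpoints are being written.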

Open a Terminal. Run nvidia-smi. Look for the card with the stalled task in the list and note the utilization, amount of memory being used and power being used.

Are they all above zero?
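If you would rather not read the nvidia-smi table by eye, the same three numbers can be pulled with its CSV query mode. A minimal sketch follows; the helper names are hypothetical, and any value that is not a number (such as "[N/A]") is treated as not above zero.

```python
import subprocess

# Fields passed to nvidia-smi --query-gpu, in the order they come back.
FIELDS = ["index", "name", "utilization.gpu", "memory.used", "power.draw"]


def query_gpus():
    """Run nvidia-smi and return its CSV output as text."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_gpu_csv(text):
    """Turn the CSV text into one dict per GPU, keyed by field name."""
    rows = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != len(FIELDS):
            continue  # skip anything that does not look like a data row
        rows.append(dict(zip(FIELDS, parts)))
    return rows


def all_above_zero(row):
    """True if utilization, memory used and power draw are all above zero."""
    for key in ("utilization.gpu", "memory.used", "power.draw"):
        try:
            value = float(row[key])
        except ValueError:
            return False  # e.g. "[N/A]"
        if value <= 0.0:
            return False
    return True
```

On the host itself, something like `for row in parse_gpu_csv(query_gpus()): print(row["index"], row["name"], all_above_zero(row))` prints one line per card. Note that some memory is usually in use even on an idle card, so utilization and power draw are the more telling numbers.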

How many tasks do you have on the host? 1 or 2? I see only one task listed in progress on the website. You have returned one task already, so the website is allowed to send you 4 tasks under the limit of two tasks per card.

Do you have pandora_config set for two tasks or just one?

Look in the processes section of nvidia-smi. Do you see a bin/python process listed for the card with the stalled task?

If you had one task running on each card, you would see two /bin/python processes listed, one for each card.
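The process check can also be done with the compute-apps query of nvidia-smi, which lists one row per compute process along with the UUID of the card it runs on. The sketch below counts rows whose process name contains "bin/python"; the function names are hypothetical, and matching on that substring is an assumption based on the process name mentioned above.

```python
import subprocess
from collections import Counter


def query_compute_apps():
    """Run nvidia-smi and return the compute process list as CSV text."""
    cmd = [
        "nvidia-smi",
        "--query-compute-apps=gpu_uuid,pid,process_name",
        "--format=csv,noheader",
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def count_python_procs(csv_text):
    """Count bin/python compute processes per GPU UUID."""
    counts = Counter()
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue  # not a gpu_uuid, pid, process_name row
        gpu_uuid, _pid, process_name = parts
        if "bin/python" in process_name:
            counts[gpu_uuid] += 1
    return dict(counts)
```

Calling `count_python_procs(query_compute_apps())` on the host should give one entry per card that has a Python task on it. A card missing from the result has no such process, which would point to the task never having started on the GPU.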
