Project restarted

Message boards : News : Project restarted

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 1
Level
Trp
Message 56415 - Posted: 11 Feb 2021, 12:28:40 UTC - in response to Message 56413.  

> You missed that he was trying to run the executable directly (outside of BOINC), which is likely why he received that error message. All of my machines have devices on dev 1+, and one is even in the same situation, with an unusable card at dev 0 (which has been excluded), and only runs on the card at dev 1.

You're right, I really missed that. But then that explains the strange licensing error.
It is not a good idea to run the GPUGrid app directly, as it needs a wrapper. Perhaps the wrapper contains the appropriate license, or it tells the app where to look for it. We don't know how it does this, so we can't use this method to debug the error.
Perhaps he installed the BOINC manager as a service?
ID: 56415
Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Message 56416 - Posted: 11 Feb 2021, 12:39:05 UTC - in response to Message 56415.  
Last modified: 11 Feb 2021, 12:46:06 UTC

> Perhaps he installed the BOINC manager as a service?

The Manager always runs in user space, but the client can run as a service.

My Linux machines do run as a service, without GPU problems. Windows machines can't run GPU apps on a service install, because of Microsoft driver security protocols.

Edit: programagor's Debian install on host 576641 looks OK from the outside. I'd suspect a driver problem - something like using a nouveau driver without the extra CUDA (computation) libraries provided through a manufacturer driver install.
ID: 56416
Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 1
Level
Trp
Message 56417 - Posted: 11 Feb 2021, 13:50:31 UTC - in response to Message 56416.  
Last modified: 11 Feb 2021, 13:53:20 UTC

> Edit: programagor's Debian install on host 576641 looks OK from the outside. I'd suspect a driver problem - something like using a nouveau driver without the extra CUDA (computation) libraries provided through a manufacturer driver install.
I installed a fresh Ubuntu 18.04 two days ago, and it downloaded the 460.32 driver on its own; that driver works with both FAH and GPUGrid. I upgraded it to 20.04 today.
EDIT: The 460.39 driver on his host is probably from ppa:graphics-drivers/ppa (it works on my other host).
ID: 56417
Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Message 56419 - Posted: 11 Feb 2021, 14:38:46 UTC - in response to Message 56417.  

The new Linux Mint (v20.1) offers me a driver manager.

It defaulted to the open-source driver, but for computation I think the proprietary driver is better.
ID: 56419
programagor
Joined: 3 Feb 21
Posts: 5
Credit: 1,046,250
RAC: 0
Level
Ala
Message 56432 - Posted: 11 Feb 2021, 20:23:20 UTC
Last modified: 11 Feb 2021, 20:31:34 UTC

When I ran the binary directly, I tried supplying the `--device 1` parameter, but to no avail: due to the basic license, the binary always uses device id 0. Also, I can't find `license.dat[.*]` anywhere on my system. And my drivers are straight from NVIDIA, not nouveau. I can compile and run CUDA programs/kernels without any issue. For completeness' sake I reinstalled my drivers, but the issue persists.

EDIT: I also looked inside the wrapper, and there is no string `license.dat`, which leads me to believe that the license file is missing, preventing me from running on GPU id 1.

EDIT 2: I just noticed that the wrapper is launching acemd3 with device id 0:
wrapper: running acemd3 (--boinc input --device 0)
ID: 56432
clych
Joined: 13 Nov 19
Posts: 5
Credit: 8,496,529
RAC: 0
Level
Ser
Message 56433 - Posted: 11 Feb 2021, 20:28:24 UTC - in response to Message 56403.  

7.333% after 24 hours.
It is a pity; GPUGRID is the only project that does not cause my computer to lag.
ID: 56433
Ian&Steve C.
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,909,595
RAC: 4,232,576
Level
Trp
Message 56435 - Posted: 11 Feb 2021, 21:17:23 UTC - in response to Message 56432.  

> When I ran the binary directly, I tried supplying the `--device 1` parameter, but to no avail: due to the basic license, the binary always uses device id 0. Also, I can't find `license.dat[.*]` anywhere on my system. And my drivers are straight from NVIDIA, not nouveau. I can compile and run CUDA programs/kernels without any issue. For completeness' sake I reinstalled my drivers, but the issue persists.
>
> EDIT: I also looked inside the wrapper, and there is no string `license.dat`, which leads me to believe that the license file is missing, preventing me from running on GPU id 1.
>
> EDIT 2: I just noticed that the wrapper is launching acemd3 with device id 0:
> wrapper: running acemd3 (--boinc input --device 0)


Look in your BOINC event log at startup. Is your NVIDIA GPU device id 0? When you see "device [id]" in BOINC, it's always the BOINC order, not the system order, which can vary due to the way BOINC decides which device is best.
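To check this without eyeballing the whole log, you could scan the startup lines for the GPU entries. A minimal sketch in Python, assuming the lines look like the ones programagor posted in this thread (any timestamp or `[---]` prefix is tolerated because the pattern is searched rather than anchored); `parse_gpus` and `BoincGpu` are hypothetical helpers, not part of BOINC.

```python
from __future__ import annotations

import re
from typing import NamedTuple

# Hypothetical pattern for BOINC startup lines such as
# "CUDA: NVIDIA GPU 0: GeForce GTX 1060 (driver version 460.39, ...)".
# re.search (not re.match) lets it skip any timestamp/project prefix.
GPU_LINE = re.compile(r"(CUDA|OpenCL): NVIDIA GPU (\d+): ([^(]+?) \(")


class BoincGpu(NamedTuple):
    api: str       # "CUDA" or "OpenCL"
    boinc_id: int  # BOINC's device number, which may differ from the system order
    name: str      # model name as reported in the log line


def parse_gpus(log_text: str) -> list[BoincGpu]:
    """Return every NVIDIA GPU entry found in a chunk of BOINC event log."""
    gpus = []
    for line in log_text.splitlines():
        m = GPU_LINE.search(line)
        if m:
            gpus.append(BoincGpu(m.group(1), int(m.group(2)), m.group(3)))
    return gpus


if __name__ == "__main__":
    sample = (
        "CUDA: NVIDIA GPU 0: GeForce GTX 1060 (driver version 460.39, CUDA version 11.2, "
        "compute capability 6.1, 4096MB, 3974MB available, 4276 GFLOPS peak)\n"
        "OpenCL: NVIDIA GPU 0: GeForce GTX 1060 (driver version 460.39, device version "
        "OpenCL 1.2 CUDA, 6078MB, 3974MB available, 4276 GFLOPS peak)"
    )
    for gpu in parse_gpus(sample):
        print(f"{gpu.api}: BOINC device {gpu.boinc_id} = {gpu.name}")
```

The number printed is the BOINC ordering Ian describes, not necessarily the ordinal the driver or `nvidia-smi` would show.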

ID: 56435
Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Message 56438 - Posted: 11 Feb 2021, 21:48:02 UTC - in response to Message 56433.  

> 7.333% after 24 hours.
> It is a pity; GPUGRID is the only project that does not cause my computer to lag.


These WUs are too large for many GPUs to complete in the 5 day (120 hour) window before they expire.

It would be best to abort them on hosts that cannot meet the deadline, as running them would waste time and electricity. The same goes for having a spare WU waiting in your queue if your GPU takes 60 or more hours to complete one: the spare will expire before completion, yielding no credit.
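The point about a spare WU can be checked with simple arithmetic. A minimal sketch, assuming both tasks are downloaded at the same moment, share the 120-hour deadline stated above, and run back to back 24/7 on a single GPU (in practice BOINC may fetch the spare later, which helps); `finish_hours` and `meets_deadline` are hypothetical helpers, and the run times are the ones reported in this thread.

```python
DEADLINE_H = 120.0  # 5-day window quoted in this thread


def finish_hours(hours_per_task: float, tasks: int) -> list[float]:
    """Hours after download at which each queued task completes,
    assuming they all arrive together and run one after another 24/7."""
    return [hours_per_task * (i + 1) for i in range(tasks)]


def meets_deadline(hours_per_task: float, tasks: int,
                   deadline_h: float = DEADLINE_H) -> list[bool]:
    # Landing exactly on the deadline is treated as a miss: it leaves no
    # slack for reboots, throttling or upload time.
    return [t < deadline_h for t in finish_hours(hours_per_task, tasks)]


if __name__ == "__main__":
    # Run times reported in this thread (one task running plus one spare).
    for label, hours in [("GTX 1070", 26), ("GTX 1060", 38), ("laptop 1050 Ti", 66)]:
        print(label, finish_hours(hours, 2), meets_deadline(hours, 2))
```

With a 66-hour card the spare would finish at hour 132, past the deadline, which is the case described above.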

I recommend a longer period of time before these "extra-long runs" expire. I think it will get them back quicker in the long run.
ID: 56438
programagor
Joined: 3 Feb 21
Posts: 5
Credit: 1,046,250
RAC: 0
Level
Ala
Message 56439 - Posted: 11 Feb 2021, 22:00:30 UTC - in response to Message 56435.  

> Look in your BOINC event log at startup. Is your NVIDIA GPU device id 0? When you see "device [id]" in BOINC, it's always the BOINC order, not the system order, which can vary due to the way BOINC decides which device is best.


Right, my apologies: BOINC has my GPU at id 0.
CUDA: NVIDIA GPU 0: GeForce GTX 1060 (driver version 460.39, CUDA version 11.2, compute capability 6.1, 4096MB, 3974MB available, 4276 GFLOPS peak)
OpenCL: NVIDIA GPU 0: GeForce GTX 1060 (driver version 460.39, device version OpenCL 1.2 CUDA, 6078MB, 3974MB available, 4276 GFLOPS peak)

So licensing is likely not the culprit in my case.
ID: 56439
mmonnin
Joined: 2 Jul 16
Posts: 338
Credit: 7,987,341,558
RAC: 178,897
Level
Tyr
Message 56443 - Posted: 11 Feb 2021, 23:24:22 UTC

The app has not changed.
https://www.gpugrid.net/apps.php

2x GPUs are running on one of my PCs w/o issue.
ID: 56443
clych
Joined: 13 Nov 19
Posts: 5
Credit: 8,496,529
RAC: 0
Level
Ser
Message 56461 - Posted: 12 Feb 2021, 18:25:42 UTC - in response to Message 56438.  

I have aborted this WU; the new one is much faster.
ID: 56461
Ian&Steve C.
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,909,595
RAC: 4,232,576
Level
Trp
Message 56462 - Posted: 12 Feb 2021, 18:36:27 UTC - in response to Message 56461.  

> I have aborted this WU; the new one is much faster.


Wait until it runs for a few hours. You will see that the initial percentage increase is not accurate; it is only an estimate from BOINC until the task hits a real checkpoint. The % will rise fast until it hits the checkpoint, then it will reset to 0.333% or 0.666% and go very slowly from that point.
ID: 56462
Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Message 56465 - Posted: 12 Feb 2021, 21:08:01 UTC - in response to Message 56462.  

> > I have aborted this WU; the new one is much faster.
>
> Wait until it runs for a few hours. You will see that the initial percentage increase is not accurate; it is only an estimate from BOINC until the task hits a real checkpoint. The % will rise fast until it hits the checkpoint, then it will reset to 0.333% or 0.666% and go very slowly from that point.


And... (sorry to butt in) once you get to a checkpoint, highlight the task in the Tasks tab and click the Properties button. Check the progress rate near the bottom of the list. If it is less than 0.9% per hour, the GPU is too slow to make the 120-hour window, even crunching 24/7. Best to send it on to someone else before it expires.
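The 0.9% threshold is consistent with the deadline: finishing 100% within 120 hours needs at least 100/120 ≈ 0.83% per hour, and 0.9% per hour (about 111 hours) leaves a little slack. A minimal sketch, assuming a steady progress rate, 24/7 crunching, and a task that starts running as soon as it is downloaded; the function names are hypothetical.

```python
DEADLINE_H = 120.0  # 5-day window quoted in this thread


def hours_for_task(rate_pct_per_hour: float) -> float:
    """Total crunching hours for a whole task at a steady progress rate."""
    if rate_pct_per_hour <= 0:
        raise ValueError("progress rate must be positive")
    return 100.0 / rate_pct_per_hour


def minimum_rate(deadline_h: float = DEADLINE_H) -> float:
    """Slowest steady rate (% per hour) that still finishes by the deadline."""
    return 100.0 / deadline_h


def worth_keeping(rate_pct_per_hour: float, deadline_h: float = DEADLINE_H) -> bool:
    """True if a task started at download time would finish before the deadline."""
    return hours_for_task(rate_pct_per_hour) < deadline_h


if __name__ == "__main__":
    print(f"minimum rate: {minimum_rate():.3f} %/h")
    for rate in (0.9, 0.7):
        print(f"{rate} %/h -> {hours_for_task(rate):.1f} h, keep: {worth_keeping(rate)}")
```

Read the rate only after the first real checkpoint; before that, the percentage is BOINC's estimate and overstates progress.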


ID: 56465
Clive
Joined: 2 Jul 19
Posts: 21
Credit: 90,744,164
RAC: 0
Level
Thr
Message 56466 - Posted: 12 Feb 2021, 21:17:14 UTC

Toni:

I have removed my laptop from crunching for GPUGRID. The laptop has a GTX 660M GPU which is inadequate for these large files. In my desktop there is a GTX 1060 which seems to have enough muscle to crunch these large files.

I hope all this crunching will benefit humanity in some way.

Clive
ID: 56466
goldfinch
Joined: 5 May 19
Posts: 36
Credit: 711,308,218
RAC: 41,661
Level
Lys
Message 56467 - Posted: 12 Feb 2021, 21:37:40 UTC - in response to Message 56466.  

> Toni:
>
> I have removed my laptop from crunching for GPUGRID. The laptop has a GTX 660M GPU which is inadequate for these large files. In my desktop there is a GTX 1060 which seems to have enough muscle to crunch these large files.
>
> I hope all this crunching will benefit humanity in some way.
>
> Clive

I have a discrete GTX 1060 in my laptop, but after 2 days of crunching a single task, BOINC is still estimating 8 more days... I'm crunching 24x7, and the GPU temperature is above 90 °C, so the card has to be throttled. Are all the new tasks that big? If so, not only will I be unable to finish tasks within 24 hours to get the bonus, but I also won't be able to complete them within the allocated timeframe.
ID: 56467
Keith Myers
Joined: 13 Dec 17
Posts: 1416
Credit: 9,119,446,190
RAC: 614,515
Level
Tyr
Message 56468 - Posted: 12 Feb 2021, 23:40:12 UTC - in response to Message 56467.  

These are, I believe, the largest (longest) tasks in the history of the project.

The previous longest were around 12 hours, back in the days of the acemd2 (long-runs) application.

If these are to become the nominal type of task in the future, they really need to increase the deadlines.

Or restrict them to adequate hardware like discrete GTX 1060 or better.

The estimated task size of 5,000,000 GFLOPs seems to be roughly correct.
ID: 56468
goldfinch
Joined: 5 May 19
Posts: 36
Credit: 711,308,218
RAC: 41,661
Level
Lys
Message 56470 - Posted: 13 Feb 2021, 4:09:19 UTC - in response to Message 56468.  

I agree that such long-running tasks should have longer deadlines. Otherwise, we gradually go back to supercomputers that no one can afford, whereas the purpose of crowd computing is that many can participate.
As for limiting tasks to certain GPUs, that's not quite adequate either. As I said, my GTX 1060 can't handle these tasks, so it's not only the card that matters but also where it's installed and what kind of cooling is used. Unfortunately, my laptop isn't great at cooling, so both CPU and GPU heat up to 90-93 °C. Putting the laptop in a fridge is not an option... And taking all the parameters into account (cooling, power supply, throttling, even the manufacturer!) isn't feasible.
ID: 56470
Ian&Steve C.
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,909,595
RAC: 4,232,576
Level
Trp
Message 56471 - Posted: 13 Feb 2021, 4:29:58 UTC - in response to Message 56470.  

A normal full 1060 should be capable of processing these tasks in 5 days; it must be because yours is a laptop version.
ID: 56471
RockLr
Joined: 14 Mar 20
Posts: 7
Credit: 11,283,596
RAC: 0
Level
Pro
Message 56472 - Posted: 13 Feb 2021, 6:00:02 UTC - in response to Message 56470.  

The 1050 Ti in my laptop can finish a task in 66 hours, so something must be wrong.
ID: 56472
Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 56473 - Posted: 13 Feb 2021, 13:13:27 UTC

My GTX 1060 finished one work unit in 38 hours.
http://www.gpugrid.net/results.php?hostid=512821

The GTX 1070 took 26 hours.
http://www.gpugrid.net/results.php?hostid=524425

But another GTX 1070 failed twice.
http://www.gpugrid.net/results.php?hostid=528983
The first time was due to a reboot, and then the next one failed immediately thereafter.

They are all on Ubuntu 18.04/20.04.
I think they are better used on Folding.
ID: 56473

©2025 Universitat Pompeu Fabra