Whatever

Zalster
Message 54077 - Posted: 26 Mar 2020, 2:56:15 UTC

Target in sight....

Locking on to his tail pipes!!

Firing bananas!!

Rear gunner to pilot... fast-moving object on our 6

Pilot to rear gunner... Let loose with the oil slick and smoke!!
Keith Myers
Message 54101 - Posted: 26 Mar 2020, 23:39:11 UTC - in response to Message 54077.  

Go get Bob.
Mr. Kevvy
Message 54190 - Posted: 1 Apr 2020, 12:53:51 UTC

Am I correct that this project runs with no cache at all? Even after increasing the resource share to 100 because SETI@Home is gone, I get:

Wed 01 Apr 2020 08:35:55 AM EDT | GPUGRID | [sched_op] NVIDIA GPU work request: 340500.27 seconds; 0.00 devices
Wed 01 Apr 2020 08:35:56 AM EDT | GPUGRID | Scheduler request completed: got 0 new tasks
Wed 01 Apr 2020 08:35:56 AM EDT | GPUGRID | [sched_op] Server version 613
Wed 01 Apr 2020 08:35:56 AM EDT | GPUGRID | No tasks sent
Wed 01 Apr 2020 08:35:56 AM EDT | GPUGRID | This computer has reached a limit on tasks in progress


This is with only two active tasks, a queue of one, and one uploading, with the cache set to two days.

Another issue that is going to become prevalent with the influx of new, powerful hosts is the size of the uploaded result files (3 MB for one of them) choking the upload server.
Ian&Steve C.

Message 54191 - Posted: 1 Apr 2020, 13:11:42 UTC - in response to Message 54190.  

This project has a limit of 2 WUs per GPU, and a max of 16 tasks in progress in total.

You might be able to get around that with Pandora's Box, but Toni has said before not to try to get around the limits, so maybe we should be nice here.
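
For reference, a minimal sketch (in Python) of the in-progress cap as described above; the per-GPU limit of 2 and the overall cap of 16 are taken from this post, and the function name is just illustrative:

def gpugrid_in_progress_cap(n_gpus, per_gpu=2, hard_cap=16):
    # Tasks the server will keep "in progress" on one host, per the limits above.
    return min(per_gpu * n_gpus, hard_cap)

# A 1-GPU host can hold 2 tasks, a 4-GPU host 8, and anything with 8 or more
# GPUs tops out at the 16-task hard cap.
print(gpugrid_in_progress_cap(1), gpugrid_in_progress_cap(4), gpugrid_in_progress_cap(8))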
Mr. Kevvy
Message 54192 - Posted: 1 Apr 2020, 13:23:06 UTC - in response to Message 54191.  
Last modified: 1 Apr 2020, 13:23:28 UTC

Perfect... thanks, Ian.

I wonder if using the previous spoofed client to give all the computers 8 GPUs, for the maximum of 16 cached each, would be frowned upon... lol. Spoofing within policy. :^)
Ian&Steve C.

Message 54193 - Posted: 1 Apr 2020, 13:30:03 UTC

It does work. When I first attached to the project I was still running the old spoofed client with 64 GPUs, and it gave me the max of 16.

I don't think anyone would be the wiser if you set the GPU count to 8. I have a legitimate 10-GPU system running (but only 8 are assigned to GPUGRID), and the stderr.txt file doesn't report how many GPUs the system has like SETI's did.
Zalster
Message 54197 - Posted: 2 Apr 2020, 1:00:18 UTC

I'm not used to seeing any others here, lol. Well, I guess it's something to get used to. I just read the entire thread. Welcome, everyone...
Keith Myers
Message 54213 - Posted: 3 Apr 2020, 16:37:21 UTC - in response to Message 54197.  

I just set the Pandora config to 2X the number of GPUs in the host, the same as the project default.

I sometimes had issues avoiding EDF (earliest deadline first) on tasks when I was spoofing GPUs and carrying 16 tasks in the cache.

Basically the project runs on a "turn one in, get one" mechanism. I do want to stay at the 6-or-8-task cache level, because there have been many times when stuck uploads prevented replenishing the cache, and I still want to crunch tasks while waiting for the server disk congestion to clear the uploads.
Keith Myers
Message 54228 - Posted: 4 Apr 2020, 16:43:42 UTC

Except it stopped working overnight: my cache fell and wasn't being replenished because the client never asked for work.
?????
Freewill

Message 54558 - Posted: 3 May 2020, 16:21:42 UTC

Hi guys,
I've just started one PC here and processed a few tasks. It looks like the GTX 1070 Ti gets more points per unit time than the RTX 2070 Super running "New version of ACEMD v2.10". Neither card is starved for PCIe throughput. Has anyone else noticed that? If so, it seems like I should put my slower GPUs on this project and the faster ones on E@H.
Ian&Steve C.

Message 54560 - Posted: 3 May 2020, 17:38:14 UTC - in response to Message 54558.  

You've only submitted a handful of tasks, and the tasks being distributed now can be a bit variable in runtime and credit received. I would give it more time and then check the averages after both cards have submitted a couple hundred tasks. What motherboard are you running in that system? Which slots are the cards in?

The two cards also have similar CUDA core counts: the 2070S has only 128 more cores than the 1070 Ti, though the RTX cards in general have roughly 2x the SM count, since Turing runs 64 cores per SM vs. the 128 cores per SM that Pascal had. It can also come down to the application code. It's possible the acemd3 app scales more linearly with raw core count than with SM count, which might be why they perform similarly, whereas Petri's SETI code seemed to scale more with SM count, which is why his code ran better on the 2070 (36 SMs, 2304 cores) than on a 1080 Ti (28 SMs, 3584 cores).

Just some things to keep in mind.
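
To make that core/SM arithmetic concrete, a small worked example: the cores-per-SM figures and the 2070/1080 Ti numbers are the ones quoted above, while the SM counts for the 1070 Ti and 2070 Super are assumed from their public specs:

# Cores per SM: 128 on Pascal, 64 on Turing (as noted above).
cards = {
    "GTX 1070 Ti":    {"sms": 19, "cores_per_sm": 128},
    "RTX 2070":       {"sms": 36, "cores_per_sm": 64},
    "RTX 2070 Super": {"sms": 40, "cores_per_sm": 64},
    "GTX 1080 Ti":    {"sms": 28, "cores_per_sm": 128},
}
for name, c in cards.items():
    cores = c["sms"] * c["cores_per_sm"]
    print(f"{name}: {c['sms']} SMs x {c['cores_per_sm']} cores/SM = {cores} CUDA cores")

# The 2070 Super ends up only 2560 - 2432 = 128 cores ahead of the 1070 Ti,
# even though it has roughly twice as many SMs.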
Freewill

Message 54567 - Posted: 3 May 2020, 22:57:44 UTC - in response to Message 54560.  

Thanks for the points, Ian. I am seeing that each new task has a different run time on the same card. And the 2070 Super was finishing much faster, but the first cases on each were 4.6 pts/sec for the 2070 and 8.0 pts/sec for the 1070 Ti. I'll keep watching to improve the signal-to-noise. :)
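
One way to get a cleaner comparison than eyeballing single tasks is to average credit per second over a batch of results for each card. A minimal sketch, assuming (credit, runtime) pairs are copied from each host's task list; the numbers shown are placeholders, not real task data:

def credit_rate(results):
    # Average points per second over a list of (credit, runtime_seconds) results.
    total_credit = sum(credit for credit, _ in results)
    total_time = sum(runtime for _, runtime in results)
    return total_credit / total_time

# Hypothetical (credit, runtime_seconds) pairs -- replace with real values
# taken from the project web site's task lists for each card.
card_a = [(127500, 16000), (120000, 15200)]
card_b = [(127500, 27500), (120000, 26800)]
print(f"card A: {credit_rate(card_a):.2f} pts/s   card B: {credit_rate(card_b):.2f} pts/s")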
Ian&Steve C.

Message 54588 - Posted: 6 May 2020, 0:19:27 UTC

I'm dabbling a bit with some further power reductions and efficiency boosts.

I took my 7x2070 system, which was already running all cards power limited to 165W (stock 175W).

Old settings:
7x RTX 2070
power limited all cards to 165W
+75 core OC
+300 mem OC

New settings:
power limited all cards to 150W (9.1% reduction)
+100 core OC
+400 mem OC

Comparing the averaged data from valid results (discarding statistical outliers in both cases), it looks like overall production dropped by only 2%, while I reduced power draw by about 9%, and card temps also dropped by about 5-6C across the board, which will be welcome as we move into the summer months. I'm testing +125 core right now with the same 150W PL, and tomorrow I'll try to squeeze +600 mem on top of that to try to claw back that 2%, if I can.

I'll probably dabble with this on the 10x2070 system as well, after I receive the 2 special risers I need (in the mail from China, ETA unknown). That system is also power limited a tiny bit (higher-end cards, 185W TDP stock, PL'd to 175W atm).

More efficiency can be had by power limiting more deeply, but I don't really want to give up too much raw performance.
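
For what it's worth, plugging the quoted numbers into a simple work-per-watt comparison suggests the 150W setting is roughly 8% more efficient than the 165W one. A quick sketch of that arithmetic (the power limits and the 2% production drop are the figures from this post):

old_power, new_power = 165.0, 150.0   # per-card power limits (W), from the post
old_prod, new_prod = 1.00, 0.98       # relative production (a 2% drop), from the post

power_saving = 1 - new_power / old_power                               # ~9.1% per card
efficiency_gain = (new_prod / new_power) / (old_prod / old_power) - 1  # work per watt
print(f"power saved per card: {power_saving:.1%}")        # 9.1%
print(f"work per watt gained: {efficiency_gain:.1%}")     # about 7.8%
print(f"saved across 7 cards: {7 * (old_power - new_power):.0f} W")    # 105 W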

Keith Myers
Message 54594 - Posted: 6 May 2020, 17:22:34 UTC

I'm still waiting on another flow meter I ordered from China over a month ago. Other than China Post saying it is in the system, no further progress.

I sure hope this new one lasts longer than all the other ones I've tried, which failed very quickly.
Ian&Steve C.

Message 54602 - Posted: 6 May 2020, 19:28:48 UTC - in response to Message 54588.  
Last modified: 6 May 2020, 19:30:22 UTC

I'm testing +125 core right now with the same 150W PL, and tomorrow I'll try to squeeze +600 mem on top of that to try to claw back that 2%, if I can.


+125 core/+400 mem got me back that 2%, so now it's performing the same at 150W as it did at 165W (x7), with cooler temps, and it cuts about 100W off the system power draw. A win-win if it can stay stable. It's run for 2 days now at 150W, so at least the +100/+400 and the +125/+400 settings seem stable.

I run static fan speeds at 75% on all cards; temps range from about 50C on the coolest card to 60C on the hottest.

Trying +125 core/+600 mem now to see whether it speeds up or not. Memory speeds aren't really throttled by the power limit, but the extra power required by the memory OC might cause the core clocks to drop, which could reduce performance. I'll evaluate the results tomorrow.
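
If you'd rather not judge the average core clock visually, something like the sketch below can log the mean SM clock and power draw while a given offset is under test. It assumes a Linux host with nvidia-smi on the PATH and uses standard nvidia-smi query fields; adjust the GPU index and sampling window to taste:

import statistics
import subprocess
import time

def sample(gpu_index=0):
    # One nvidia-smi sample of SM clock (MHz) and power draw (W) for one GPU.
    out = subprocess.check_output([
        "nvidia-smi", "-i", str(gpu_index),
        "--query-gpu=clocks.sm,power.draw",
        "--format=csv,noheader,nounits",
    ], text=True)
    clock, power = (float(x) for x in out.strip().split(","))
    return clock, power

clocks, powers = [], []
for _ in range(60):          # ~5 minutes at one sample every 5 seconds
    c, p = sample(0)
    clocks.append(c)
    powers.append(p)
    time.sleep(5)

print(f"avg SM clock: {statistics.mean(clocks):.0f} MHz, avg power: {statistics.mean(powers):.1f} W")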
Ian&Steve C.

Message 54610 - Posted: 7 May 2020, 17:08:58 UTC

+125/+600 showed a slight (very slight) decrease in production, probably due to the power situation I mentioned in my previous post. I did see a very slight bump in average core clock speeds (visually) when I reduced the mem OC from 600 back to 400. It doesn't seem that GPUGRID benefits much from memory OC.

So I think PL 150W with +125 core/+400 mem is a nice setting for these 2070 cards.
Ian&Steve C.

Message 55091 - Posted: 3 Jul 2020, 19:37:44 UTC

Low credits yesterday?

Looking at most hosts, they showed about a 20-30% reduction in credits yesterday compared to the past few weeks. Was the project down for part of the day yesterday, or did they have a string of low-paying WUs? I don't check my systems as diligently since they have been so stable, but I don't see that any of them had any issues yesterday.
Zalster
Message 55100 - Posted: 9 Jul 2020, 17:07:33 UTC - in response to Message 55091.  

Don't know. I have all computers down until I find a new job. Hope all is well. TTYL
Zalster
Message 56191 - Posted: 29 Dec 2020, 19:19:54 UTC

So I restarted a computer back here, only a 2-GPU machine. It tried to run the Python tasks and failed miserably, so now it's running ACEMD. The temp on the top GPU is 52 C; I will need to keep an eye on that. I have Einstein set as backup. It's putting out a small amount of heat; hope it will help move the cold air out of the main room. We'll see.

Z
Keith Myers
Message 56192 - Posted: 29 Dec 2020, 19:58:19 UTC

I would avoid the experimental Python tasks for now.

Maybe in a few months, the admins and scientists will figure out a workable configuration set for all hosts.

You can use the Pandora client configuration file to bump your cache up to the maximum of 16 allowed.

This is my pandora_config file snippet for this project. Courtesy of Ian.

project: https://www.gpugrid.net/
gpu_serverside_limit: 2
gpu_spoof_tasks: 16
gpu_limit: 16
request_min_cooldown: 180