Message boards : GPU Users Group message board : Whatever

---

Joined: 26 Feb 14 | Posts: 211 | Credit: 4,496,324,562 | RAC: 0

Target in sight... Locking on to his tail pipes!! Firing bananas!!
Reargunner to pilot... fast moving object on our 6.
Pilot to reargunner... Let loose with the oil slick and smoke!!

---

Joined: 13 Dec 17 | Posts: 1421 | Credit: 9,147,196,190 | RAC: 2,053,242

Go get Bob.

---

Joined: 1 Jun 16 | Posts: 15 | Credit: 4,527,023,774 | RAC: 0

Am I correct that this project runs with no cache at all? Even after increasing the resource share to 100 because SETI@Home is gone, I get:

    Wed 01 Apr 2020 08:35:55 AM EDT | GPUGRID | [sched_op] NVIDIA GPU work request: 340500.27 seconds; 0.00 devices
    Wed 01 Apr 2020 08:35:56 AM EDT | GPUGRID | Scheduler request completed: got 0 new tasks
    Wed 01 Apr 2020 08:35:56 AM EDT | GPUGRID | [sched_op] Server version 613
    Wed 01 Apr 2020 08:35:56 AM EDT | GPUGRID | No tasks sent
    Wed 01 Apr 2020 08:35:56 AM EDT | GPUGRID | This computer has reached a limit on tasks in progress

This is with only two active tasks, a queue of one, and one uploading, set for a two-day cache.

Another issue that is going to become prevalent with the influx of new power hosts is the size of the uploaded result files (3 MB for one of them) choking the upload server.

---

Joined: 21 Feb 20 | Posts: 1116 | Credit: 40,839,470,595 | RAC: 2,634

This project has a limit of 2 WUs per GPU, and a max of 16 total in progress. You might be able to get around that with Pandora's box, but Toni has said before not to try to get around the limits, so maybe we can be nice here.
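For reference, the in-progress ceiling described here is just the smaller of the two limits, so 8 reported GPUs already saturate the 16-task cap. A minimal sketch (limit values taken from this post; the function name is only illustrative):

```python
def max_in_progress(gpus, per_gpu=2, host_cap=16):
    """Server-side in-progress task limit, per the limits quoted in the thread."""
    return min(gpus * per_gpu, host_cap)

for n in (1, 2, 8, 64):
    print(f"{n} GPU(s) -> up to {max_in_progress(n)} tasks in progress")
```

So whether a client reports 8 or 64 GPUs, the server hands out at most 16 tasks.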

---

Joined: 1 Jun 16 | Posts: 15 | Credit: 4,527,023,774 | RAC: 0

Perfect... thanks, Ian. I wonder if using the previous spoof client to give all the computers 8 GPUs, for the maximum of 16 cached each, would be frowned upon... lol. Spoofing within policy. :^)

---

Joined: 21 Feb 20 | Posts: 1116 | Credit: 40,839,470,595 | RAC: 2,634

It does work. When I first attached to the project I was still running the old spoofed client with 64 GPUs, and it gave me the max of 16. I don't think anyone would be the wiser if you put the GPU count to 8. I have a legitimate 10-GPU system running (but only 8 are assigned to GPUGRID), and the stderr.txt file doesn't report how many GPUs the system has like SETI does.

---

Joined: 26 Feb 14 | Posts: 211 | Credit: 4,496,324,562 | RAC: 0

Not used to seeing any others here, lol. Well, guess it's something to get used to. Just read the entire thread. Welcome everyone...

---

Joined: 13 Dec 17 | Posts: 1421 | Credit: 9,147,196,190 | RAC: 2,053,242

I just set the Pandora config to 2x the number of GPUs in the host, same as the project default. I sometimes had trouble avoiding EDF (earliest-deadline-first) mode on tasks when I was spoofing GPUs and carrying 16 tasks in the cache. Basically the project runs on a "turn one in, get one" mechanism. I want to stay at the 6-8 cache level because there have been many times when stuck uploads prevented replenishing the cache, and I still want to crunch tasks while waiting for the server disk congestion to clear the uploads.

---

Joined: 13 Dec 17 | Posts: 1421 | Credit: 9,147,196,190 | RAC: 2,053,242

Except it stopped working overnight: my cache fell and wasn't being replenished because the client never asked for work. ?????

---

Joined: 18 Mar 10 | Posts: 28 | Credit: 41,935,087,419 | RAC: 9,038,262

Hi guys, I've just started one PC on here and processed a few tasks. It looks like the GTX 1070 Ti card gets more points/time than the RTX 2070 Super running "New version of ACEMD v2.10". Neither card is starved on PCIe interface throughput. Has anyone else noticed that? If so, it seems like I should put my slower GPUs on this project and faster ones on E@H.

---

Joined: 21 Feb 20 | Posts: 1116 | Credit: 40,839,470,595 | RAC: 2,634

You've only submitted a handful of tasks, and the tasks being distributed now can be quite variable in runtime and credit received. I would give it more time and then check the averages after both cards have submitted a couple hundred tasks. What motherboard are you running in that system? Which slots are the cards in?

The two cards also have similar CUDA core counts; the 2070S only has 128 more cores than the 1070 Ti, though RTX cards in general have 2x the SM count, since they run 64 cores per SM vs. the 128 cores per SM that Pascal had. It can come down to the application code too: it's possible the acemd3 app scales more linearly with raw core count than with SM count, which might be why they perform similarly, whereas Petri's SETI code seemed to scale more with SM count, which is why his code ran better on the 2070 (36 SMs, 2304 cores) than on a 1080 Ti (28 SMs, 3584 cores). Just some things to keep in mind.
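A quick sanity check of the core and SM arithmetic in that comparison (the per-card figures are the publicly listed specs, taken on trust here):

```python
# Cores and SM counts per card (published specs, assumed here)
cards = {
    "GTX 1070 Ti":    (2432, 19),  # Pascal: 128 cores/SM
    "RTX 2070 Super": (2560, 40),  # Turing: 64 cores/SM
    "RTX 2070":       (2304, 36),
    "GTX 1080 Ti":    (3584, 28),
}
for name, (cores, sms) in cards.items():
    print(f"{name}: {cores} cores / {sms} SMs = {cores // sms} cores per SM")

# The 2070S really does have only 128 more cores than the 1070 Ti:
print(2560 - 2432)  # -> 128
```

So by raw core count the two cards are nearly twins, while by SM count the 2070S has roughly double, which is why the scaling behaviour of the app matters.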

---

Joined: 18 Mar 10 | Posts: 28 | Credit: 41,935,087,419 | RAC: 9,038,262

Thanks for the points, Ian. I am seeing that each new task has a different run time on the same card. And the 2070 Super was finishing much faster, but the first cases on each were 4.6 pts/sec for the 2070 and 8.0 pts/sec for the 1070 Ti. I'll keep watching to get more signal/noise. :)

---

Joined: 21 Feb 20 | Posts: 1116 | Credit: 40,839,470,595 | RAC: 2,634

I'm dabbling a bit with some further power reductions and efficiency boosts on my 7x 2070 system, which was already running with all cards power limited to 165W (stock 175W).

Old settings: 7x RTX 2070, all cards power limited to 165W, +75 core OC, +300 mem OC.
New settings: all cards power limited to 150W (a 9.1% reduction), +100 core OC, +400 mem OC.

Comparing the averaged data from valid results (discarding statistical outliers in both cases), it looks like overall production dropped by only 2%, while I reduced power draw by about 9%, and temps of the cards dropped by about 5-6C across the board, which will be welcome as we move into the summer months.

I'm testing +125 core right now with the same 150W PL, and tomorrow I'll try to squeeze +600 mem on top of that to try to claw back that 2%, if I can.

I'll probably dabble with this on the 10x 2070 system too, after I receive the 2 special risers I need (in the mail from China, ETA unknown). That system is also power limited a tiny bit (higher-end cards, 185W TDP stock, PL'd to 175W at the moment). More efficiency can be had by power limiting deeper, but I don't really want to give up too much raw performance.
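The trade described above can be quantified as production per watt. A quick back-of-the-envelope check (numbers from this post, normalizing old production to 1.0, and assuming per-card behaviour is uniform across the 7 cards):

```python
old_w, new_w = 165, 150            # per-card power limits (W)
old_prod, new_prod = 1.00, 0.98    # normalized production; ~2% drop reported

power_cut = 1 - new_w / old_w                            # fractional power reduction
eff_gain = (new_prod / new_w) / (old_prod / old_w) - 1   # per-watt production gain

print(f"power cut:       {power_cut:.1%}")          # 9.1%
print(f"efficiency gain: {eff_gain:.1%}")           # 7.8%
print(f"system saving:   {(old_w - new_w) * 7} W")  # 105 W across 7 cards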

---

Joined: 13 Dec 17 | Posts: 1421 | Credit: 9,147,196,190 | RAC: 2,053,242

I'm still waiting on another flowmeter I ordered from China over a month ago. Other than China post saying it is in the system, no further progress. Sure hope that this new one lasts longer than all the other ones I've tried and had fail very fast. |

---

Joined: 21 Feb 20 | Posts: 1116 | Credit: 40,839,470,595 | RAC: 2,634

> testing +125 core right now with the same 150W PL, and tomorrow I'll try to squeeze +600mem on top of that to try to claw back that 2%, if i can.

+125 core / +400 mem got me back that 2%. So now it's performing the same at 150W as it did at 165W (x7), with cooler temps, and it cuts about 100W off the system power draw. Win-win if it can stay stable. It's run for 2 days now at 150W, so at least the +100/+400 and +125/+400 settings seem stable. I run fan speeds static at 75% for all cards; temps range from about 50C on the coolest card to 60C on the hottest.

Trying +125 core / +600 mem now to see if it speeds up or not. Memory speeds aren't really throttled by the power limit, but the extra power the mem OC requires might cause the core clocks, and performance, to drop. I'll evaluate the results tomorrow.

---

Joined: 21 Feb 20 | Posts: 1116 | Credit: 40,839,470,595 | RAC: 2,634

+125/+600 showed a slight decrease in production (very slight), probably due to the power situation I mentioned in my previous post. I did see a very slight bump in average core clock speeds (visually) when I reduced the mem OC from 600 to 400. It doesn't seem that GPUGRID benefits much from memory OC, so I think PL 150W, +125 core / +400 mem is a nice setting for these 2070 cards.

---

Joined: 21 Feb 20 | Posts: 1116 | Credit: 40,839,470,595 | RAC: 2,634

Low credits yesterday? Looking at most hosts, they showed about a 20-30% reduction in credits yesterday compared to the past few weeks. Was the project down for part of the day, or did they have a string of low-paying WUs? I don't check my systems as diligently now that they have been so stable, but I don't see that any of them had any issues yesterday.

---

Joined: 26 Feb 14 | Posts: 211 | Credit: 4,496,324,562 | RAC: 0

Don't know. I have all computers down until I find a new job. Hope all is well. TTYL

---

Joined: 26 Feb 14 | Posts: 211 | Credit: 4,496,324,562 | RAC: 0

So I restarted a computer back here, only a 2-GPU machine. It tried to run Python and failed miserably, so now it's running ACEMD. Temps on the top GPU are 52C; will need to keep an eye on that. I have Einstein set as backup. It's putting out a small amount of heat; hope it will help move the cold air out of the main room. Will see. Z

---

Joined: 13 Dec 17 | Posts: 1421 | Credit: 9,147,196,190 | RAC: 2,053,242

I would avoid the experimental Python tasks for now. Maybe in a few months the admins and scientists will figure out a workable configuration for all hosts.

You can use the Pandora client configuration file to up your cache to the maximum 16 allowed. This is my pandora_config file snippet for this project, courtesy of Ian:

    project: https://www.gpugrid.net/
    gpu_serverside_limit: 2
    gpu_spoof_tasks: 16
    gpu_limit: 16
    request_min_cooldown: 180
©2025 Universitat Pompeu Fabra