Estimated time over 100 days is something configured wdrong?
| Author | Message |
|---|---|
JStateson Joined: 31 Oct 08 Posts: 186 Credit: 3,578,903,157 RAC: 0
|
I just started crunching here again after a long break (the summer was too hot to crunch), and I am seeing really long completion times for what is roughly the equivalent of a GTX 1080 Ti. I looked at the applications page, and it shows the following requirement: Linux running on an AMD x86_64 or Intel EM64T CPU, version 2.18 (cuda1121), 14 Sep 2021, 11:44:42 UTC. I then checked `clinfo`, and I seem to be covered: Platform Vendor: NVIDIA Corporation; Platform Version: OpenCL 1.2 CUDA 11.2.162; ... Platform Name: NVIDIA CUDA; Number of devices: 3; Device Name: P102-100; Device Vendor: NVIDIA Corporation; Device Vendor ID: 0x10de; Device Version: OpenCL 1.2 CUDA; Driver Version: 460.91.03. The following times seem unusually long, and I am also concerned about the 82°C temperature, as that is probably the cutoff for the board. If the boards were not crunching, I would expect temperatures around 35°C, not 82°C, so it looks like they are doing something useful. Thanks for looking! [edit] It seems I cannot fix the typo in the subject. That is definitely wrong! Running GLIBC 2.27 and BOINC 7.16.11 under Ubuntu 18.04.5. |
|
Joined: 10 Nov 13 Posts: 101 Credit: 15,773,211,122 RAC: 0
|
These are not being cooled properly, and you are cooking your GPUs. I'm pretty sure they are thermal throttling to protect themselves. They shouldn't run above 75°C; 60-65°C would be better. I don't know whether these have blower-style or open-fan coolers, but you definitely need to turn up the fan speed with a more aggressive fan curve and get them cooled down. It looks like you actually have three of them, so depending on the case they are in, that could be a problem if there isn't enough airflow. If they happen to be server-grade cards with a bare heatsink and no fans, they need significant airflow to be cooled properly; they are designed for servers built around them. All my systems run Windows, so I'm not an expert on setting up fan curves and monitoring on Linux. If you need help with that, one of the Linux gurus will need to pipe up with some recommendations. |
JStateson Joined: 31 Oct 08 Posts: 186 Credit: 3,578,903,157 RAC: 0
|
I put a large fan on the system. It is an open rack, and the temps dropped to reasonable values, as shown. The Time Left seems to be dropping by 4 days for every hour of elapsed time, so I am guessing a completion time of about 30 hours. Hopefully the WUs will all be valid. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
The estimated completion time won’t be correct until you complete a bunch of tasks; I think it takes 10 or 11 tasks to get a baseline. This is because the app is new and your system doesn’t know its performance yet. 30 hrs for a P102-100 sounds about right.
|
|
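A minimal Python sketch of the two ideas in the post above, under stated assumptions. Before a baseline exists, only the project's initial guess is available; once about ten tasks have completed (the threshold constant is an assumption based on the "10 or 11 tasks" figure, not BOINC's actual code), an average of measured runtimes replaces it. The second function is a generic linear extrapolation from a running task's fraction done, one plausible reason the Time Left shrinks quickly once a task is underway; it is not BOINC's exact formula. `estimate_hours`, `remaining_hours` and `BASELINE_TASKS` are hypothetical names.

```python
from statistics import mean

# Hypothetical threshold, taken from the "10 or 11 tasks" estimate in the post;
# BOINC's real rule may differ.
BASELINE_TASKS = 10


def estimate_hours(initial_guess_h: float, completed_runtimes_h: list[float]) -> float:
    """Estimate a new task's runtime for one host and app version.

    Until enough tasks have completed, fall back to the project's initial
    guess; afterwards, use the mean of the measured runtimes.
    """
    if len(completed_runtimes_h) < BASELINE_TASKS:
        return initial_guess_h
    return mean(completed_runtimes_h)


def remaining_hours(elapsed_h: float, fraction_done: float) -> float:
    """Linearly extrapolate the time left from a running task's progress.

    Assumes progress is reported at a roughly constant rate, so
    total runtime ~= elapsed / fraction_done.
    """
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_h / fraction_done - elapsed_h


# A new app with little history keeps the (possibly huge, here 100-day) guess...
print(estimate_hours(2400.0, [30.0] * 3))
# ...and switches to measured runtimes once the baseline is reached.
print(estimate_hours(2400.0, [30.0] * 10))
# 7.5 h in and 25% done: about 22.5 h left, ~30 h total.
print(remaining_hours(7.5, 0.25))
```

The linear extrapolation is the simplest choice; a real client may blend it with the static estimate, especially early in a task when the progress figure is noisy.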
Joined: 10 Nov 13 Posts: 101 Credit: 15,773,211,122 RAC: 0
|
Those temps are definitely better. It looks like you have the fan blowing on your D1 card but not as much on the others. If you could even it out and get the other two cards under 70°C, that would be ideal. Open-rack systems create a whole different cooling challenge. Do the best you can to keep cool air flowing to the GPUs and CPU, and make sure the hot air isn't recirculating directly back into the system. My GTX 1080 Ti's can complete the tasks in about 30hrs. If yours are cool enough to run at max boost clock, they should be close to that. As Ian&Steve mentioned, if there is enough work to keep your cards busy, the estimates should look more reasonable soon. |
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
Quote: "My GTX 1080 Ti's can complete the tasks in about 30hrs." This is too much. My GTX 1080 Ti completed one in under 21h 15m. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
Retvari Zoltan wrote: "Quote: 'My GTX 1080 Ti's can complete the tasks in about 30hrs.' This is too much. My GTX 1080 Ti completed one in under 21h 15m." The system shows an i7-6700 CPU, which tells us it's most likely on a Z170 or Z270 motherboard. Since he has three GPUs, it's possible that one or more of them is connected via a USB riser. Given GPUGRID's new ACEMD3 app's rather heavy reliance on PCIe bandwidth, the slowdown could make sense. The P102-100 only has PCIe 3.0 x4 available anyway (it's a mining card), which is JUST enough to keep it from slowing down on GPUGRID; slowdowns can be expected if it runs at PCIe 2.0 and/or on fewer lanes (USB risers carry only a single lane).
|
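To put rough numbers on the bandwidth argument, here is a small Python calculation based on each PCIe generation's per-lane signalling rate and line-encoding overhead (8b/10b for 1.0 and 2.0, 128b/130b for 3.0). The dictionary and function names are illustrative. The figures are theoretical one-direction maxima that ignore packet overhead, and the calculation says nothing about how much bandwidth ACEMD3 actually needs.

```python
# Per-lane signalling rate (GT/s) and line-code efficiency by PCIe generation.
# Gen 1/2 use 8b/10b encoding; Gen 3 uses 128b/130b.
PCIE = {
    "1.0": (2.5, 8 / 10),
    "2.0": (5.0, 8 / 10),
    "3.0": (8.0, 128 / 130),
}


def link_gb_per_s(gen: str, lanes: int) -> float:
    """Approximate usable bandwidth of a PCIe link, one direction, in GB/s.

    Ignores packet/protocol overhead, so real throughput is somewhat lower.
    """
    gt_per_s, efficiency = PCIE[gen]
    # Each transfer moves one bit per lane; divide by 8 to get bytes.
    return gt_per_s * efficiency * lanes / 8


for gen, lanes, label in [
    ("3.0", 4, "P102-100 native link"),
    ("3.0", 1, "x1 riser on a PCIe 3.0 lane"),
    ("2.0", 1, "x1 riser on a PCIe 2.0 lane"),
]:
    print(f"PCIe {gen} x{lanes} ({label}): {link_gb_per_s(gen, lanes):.2f} GB/s")
```

This works out to roughly 3.9, 1.0 and 0.5 GB/s, so an x1 riser leaves a card with about a quarter (PCIe 3.0) to an eighth (PCIe 2.0) of what the P102-100's own x4 link offers.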
JStateson Joined: 31 Oct 08 Posts: 186 Credit: 3,578,903,157 RAC: 0
|
Thanks, I was not aware of the bandwidth requirements. I may put one of the cards in the x16 slot and leave the other two on the x1 risers. That would be a good benchmarking test. Around noon I suspended two of the cards (d2 and d0) because the temps were back in the 82s, and I resumed them at 10 pm (it cools down around 10). I made sure to resume the "D0 app" first so that it got the d0 card, then resumed the remaining one so it got d2. The cards have identical specs, but I did not want to take a chance, so I resumed them in the order they were suspended so that each task got the same card back. [edit] I may sell these cards. I unloaded my 106-90 and 106-100 and a pair of 1070 Tis on eBay, even a 102-100 "parts only". Most sold within hours of posting, and I still have one item unsold. I probably will not get back what I paid for them. I never got them to work on BOINC. https://www.ebay.com/itm/174959923252 |
©2025 Universitat Pompeu Fabra