New Gianni tasks take loooong time... a warning (8-12-16)
| Author | Message |
|---|---|
|
Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0
|
> Even put 1 on my fridge after a surge took out a $600 control board but that's another story.

Surges are a problem too. After a lightning strike a few years ago, I put Zero Surge filters on all my equipment, even the ones with a UPS. The surge filter plugs into the wall first, then the UPS into that. I am loaded for bear. |
Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
> A quality UPS would solve that issue if you could do it. We have momentary glitches and surges where I live also. I bit the bullet and put UPSs on all 8 of my DC machines 1 at a time. Even put 1 on my fridge after a surge took out a $600 control board but that's another story.

Thanks guys for the suggestions. I used to have a UPS on every machine but it was expensive to buy them and after a year or two it got to be ridiculous trying to keep the batteries replaced (currently 12 machines). Now they have quality surge protectors: much less headache, but also no protection against outages.

I think I mentioned this before and it's just my personal experience, but I used to have an even mix of AMD and Intel boxes. All had APC sine wave UPSs at the time. After a lightning strike on the house (lightning rod BTW), every Intel system either failed immediately or within the next month. All the AMD systems were still running years later. Go figure. Why, I don't know. Maybe better MB components, maybe something in the basic design. Maybe just dumb luck. Since then I've used mainly AMD and have never had a CPU or MB failure. Maybe other people's experiences are different... |
Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
The Gianni finally finished on my 750Ti: https://www.gpugrid.net/result.php?resultid=15236288 It took just under 62 hours, not a good fit. Interesting, it previously failed on someone's 980Ti that's usually ok for the Gerards: https://www.gpugrid.net/workunit.php?wuid=11694702 For some reason GPUGrid thinks that my two 650Ti cards are the best candidates for the Gianni WUs. Just had to abort another one. They run the Gerard_FX WUs in an average of about 34 hours so always made the 2 day deadline but they'd be ridiculous on the Giannis. |
caffeineyellow5 Joined: 30 Jul 14 Posts: 225 Credit: 2,658,976,345 RAC: 0
|
> The one thing I noticed is your CPU time is a lot lower than the run time:

I figured out the problem when I tried this in the past: I was not limiting my tasks for GPU and CPU. I tried this again and it froze as soon as BOINC started each time I rebooted. So I opened in safe mode and changed the cc_config and app_config files to limit WCG on 2 systems and also GPUGRID on one system, based on the number of cores and task total. The one system with 4 AMD cores is running 3 GPUGRID tasks and has bettered previous similar tasks by hours, with a whole core (25%) of CPU usage for each task. On the other system, with 12 Intel cores and 6 GPUGRID tasks possible, I reduced the WCG tasks to 5, and GPUGRID tasks are at 6 still. That leaves one core free for tasks and OS and fills the rest with BOINC tasks. The 2 tasks that have completed are almost equal on GPU and CPU time, but only saved about 50 minutes compared with similar tasks, with significantly more CPU time. I am happy to let this run as such, though it still is slower on other tasks running even though the CPU usage is not 100% now.

Would it help a bit to change the swan_sync setting to an intermediate value like .8 or .7 instead of 1? Or would just reducing the WCG tasks to 4 be my option? It is odd that no one else seems to have run into this same thing with the swan_sync setting, but there should be another tutorial added somewhere for this setting. I searched online and found bits and pieces, but nothing complete or that answered this for me. Experimenting, time, and some logic were what got me here. |
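The core budgeting described in this post can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `cpu_only_task_limit` and its parameters are hypothetical (they are not BOINC settings), each GPU task is assumed to occupy one full core, and one core is held back for the OS.

```python
import math


def cpu_only_task_limit(cores, gpu_tasks, cores_per_gpu_task=1.0, reserved_cores=1):
    """Return how many CPU-only tasks (e.g. WCG) fit alongside the GPU tasks.

    Hypothetical helper: assumes every GPU task ties up `cores_per_gpu_task`
    cores and that `reserved_cores` cores are left free for the OS.
    """
    free = cores - gpu_tasks * cores_per_gpu_task - reserved_cores
    return max(0, math.floor(free))


# The two hosts described in the post.
print(cpu_only_task_limit(12, 6))  # 12 Intel cores, 6 GPUGRID tasks
print(cpu_only_task_limit(4, 3))   # 4 AMD cores, 3 GPUGRID tasks
```

For the 12-core host this gives 5, which matches the WCG limit chosen in the post. For the 4-core host it gives 0; the post does not say what WCG limit was actually used there.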
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
There have been several WUs by Gianni in the recent past. They are really huge and result in a nice credit, but no GPU below a 980Ti can crunch them within 24 hours and get the 20% extra credit. One of my hosts has a GTX970, which took some 36 hours at ~1360MHz. Hence, a change in this 24hrs rule would be desirable :-) |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
> There have been several WUs by Gianni in the recent past. They are really huge and result in a nice credit, but no GPU below a 980Ti can crunch them within 24 hours and get the 20% extra credit.

My Linux system is crunching a Gianni now and it's at 50% after 15h 45min. My GPU clock (according to NV X Server) is 1278MHz and I'm using the 361.42 NV driver. X Server also says I'm using 4% PCIE bandwidth on a PCIE2.0 x16 slot. GPU utilization is around 67% but varies from 62% to 70%. My CPU is an AMD A6-3500 APU (2.1/2.4GHz).

So it looks like a GTX970 (at stock on a weak system) will take 31 to 32h to crunch these on Linux-x64. That suggests the WDDM overhead for these is at least 12.5% but probably closer to 16%. A GTX980 is ~17% faster (stock) than a GTX970 so would still take over 24h to complete on Linux (over 26h). If it was overclocked by ~10% then it might be able to just about complete inside 24h if the system was tuned to do so (SWAN_SYNC used, high CPU clock and fast RAM...).

Note that the bonus is +25% for finishing (and reporting) inside 48h or +50% for finishing and reporting inside 24h.

FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
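The estimate in this post can be reproduced with a short Python sketch. It assumes progress is linear in time and uses the bonus tiers exactly as stated in the post; the function names are hypothetical, and run time alone ignores any time a task spends waiting in the queue before it is reported.

```python
def projected_total_hours(elapsed_hours, fraction_done):
    """Extrapolate total run time, assuming progress is linear in time."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


def scaled_hours(hours, speedup):
    """Run time on a card that is `speedup` (e.g. 0.17 for 17%) faster."""
    return hours / (1.0 + speedup)


def bonus_fraction(turnaround_hours):
    """Credit bonus tiers as stated in the post: +50% inside 24h, +25% inside 48h."""
    if turnaround_hours < 24:
        return 0.50
    if turnaround_hours < 48:
        return 0.25
    return 0.0


gtx970 = projected_total_hours(15.75, 0.50)   # 50% done after 15h 45min
gtx980 = scaled_hours(gtx970, 0.17)           # ~17% faster at stock
print(f"GTX970: {gtx970:.1f} h, bonus +{bonus_fraction(gtx970):.0%}")
print(f"GTX980: {gtx980:.1f} h, bonus +{bonus_fraction(gtx980):.0%}")
```

Both projections land between 24h and 48h, so under these assumptions both cards would earn the +25% tier rather than the +50% one.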
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
> So it looks like a GTX970 (at stock on a weak system) will take 31 to 32h to crunch these on Linux-x64. That suggests the WDDM overhead for these is at least

The CPU on that host is an "old" Intel Core 2 Duo E8400, which may account for at least part of the longer crunching time. And, of course, WDDM OH as well (Win10 64-bit).

> Note that the bonus is +25% for finishing (and reporting) inside 48h or +50% for finishing and reporting inside 24h.

Oh sorry, I missed that :-( |
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
> So it looks like a GTX970 (at stock on a weak system) will take 31 to 32h to crunch these on Linux-x64. That suggests the WDDM overhead for these is at least 12.5% but probably closer to 16%.

GTX 980 @ 1388MHz, GDDR5 @ 3505 MHz, i3-4160, WinXP, SWAN_SYNC on, no other tasks: 19h 24m 26s

It almost missed the 24h bonus (by ~8m), as it spent 5h 28m in the queue. |
caffeineyellow5 Joined: 30 Jul 14 Posts: 225 Credit: 2,658,976,345 RAC: 0
|
I got one on my laptop. Windows 8.1 64bit, i7-4900MQ, 32GB RAM, NVIDIA Quadro K2100M @802Mhz mem @ 2504, swan_sync off. At 13.5% now and started as soon as it downloaded, it looks like 15:45 has passed and it might make the 5 day deadline by just squeezing through! We shall see, but it looks good at this point. I've never had a WU fail on this laptop except for downloading errors or crashes related to other programs or my own dumb experimentation with things like swan_sync (lol) |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
> So it looks like a GTX970 (at stock on a weak system) will take 31 to 32h to crunch these on Linux-x64. That suggests the WDDM overhead for these is at least 12.5% but probably closer to 16%.

GIANNI_D3C36bCHL from Performance:
1 Retvari Zoltan 15236101 14.49 NVIDIA GeForce GTX 980 Ti (4095MB) driver: 368.22

14h 30min isn't much over the app description, "Long runs (8-12 hours on fastest card)", and there is a good chance the GTX1080 (when the CUDA 8 dev kit goes on public release) will manage it within that 12h time frame (on Linux). |
|
Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0
|
There is also a good chance that after this summer shake-down cruise, the real work units in the fall won't be so long. I am hoping, and expecting, that a GTX 970 under Linux can handle them, though maybe not a 960. Otherwise, there will be some discontented people around here. |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
Well, what we also hope - I guess - is that there will be enough WUs available anytime around the clock. For the past several months, the situation was far away from that. |
|
Joined: 13 Jan 14 Posts: 21 Credit: 15,415,926,517 RAC: 0
|
These are some observations with systems I have working the GIANNI work units - from GTX 770, 780 and 970

| Task name | Work unit | Computer | ComputerName | Specs | RunTime (h:m:ss) | CPUTime (h:m:ss) | ElapsedTime | Credit/Sec | BatchName |
|---|---|---|---|---|---|---|---|---|---|
| e4s27_e1s26p0f453-GIANNI_D3C36bCHL1-0-1-RND7578_0 | 11694844 | 319927 | sr71-w10 | W10, SWAN=1, GTX 970 | 30:07:55 | 28:38:40 | 30:37:40 | 4.049311534 | GIANNI_D3C36bCHL1 |
| e4s2_e1s26p0f434-GIANNI_D3C36bCHL1-0-1-RND0457_1 | 11694819 | 319927 | sr71-w10 | W10, SWAN=1, GTX 970 | 30:50:54 | 29:15:05 | 33:33:38 | 3.955295125 | GIANNI_D3C36bCHL1 |
| e5s26_e2s33p0f456-GIANNI_D3C36bCHL1-0-1-RND5827_0 | 11695686 | 319927 | sr71-w10 | W10, SWAN=1, GTX 970 | 30:14:00 | 28:41:17 | 33:43:58 | 4.035734975 | GIANNI_D3C36bCHL1 |
| e8s171_e2s15p0f614-GIANNI_D3C36bCHL1-0-1-RND4754_1 | 11697165 | 289414 | GridBench-w10 | W10, SWAN=1, GTX 980 | 25:25:38 | 25:11:28 | 25:47:06 | 4.798538404 | GIANNI_D3C36bCHL1 |
| e8s37_e3s57p0f691-GIANNI_D3C36bCHL1-0-1-RND4952_0 | 11697031 | 289414 | GridBench-w10 | W10, SWAN=1, GTX 980 | 25:19:42 | 25:05:20 | 34:52:23 | 4.817271594 | GIANNI_D3C36bCHL1 |
| e9s11_e3s104p0f660-GIANNI_D3C36bCHL1-0-1-RND4267_0 | 11697896 | 176528 | stealth-mint | Linux, SWAN=0, GTX 770 | 30:39:17 | 3:52:32 | 30:51:13 | 3.980252871 | GIANNI_D3C36bCHL1 |
| e8s142_e3s69p0f433-GIANNI_D3C36bCHL1-0-1-RND5710_0 | 11697136 | 187252 | rahl588-v81 | W10, SWAN=0, GTX 770 | 35:09:10 | 5:54:14 | 52:49:35 | 2.776770928 | GIANNI_D3C36bCHL1 |
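
One way to read the RunTime and CPUTime columns is as a ratio: how much of the run a CPU core was kept busy. The Python sketch below computes that for one SWAN=1 row and one SWAN=0 row taken from the listing above. The helper names are hypothetical, and it assumes the times are in h:m:ss as the column headers say.

```python
def hms_to_seconds(text):
    """Convert an 'h:m:ss' string such as '30:07:55' to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_share(run_time, cpu_time):
    """Fraction of the run time during which the CPU was busy with the task."""
    return hms_to_seconds(cpu_time) / hms_to_seconds(run_time)


# (specs, RunTime, CPUTime) copied from two rows of the listing.
rows = [
    ("W10, SWAN=1, GTX 970", "30:07:55", "28:38:40"),
    ("Linux, SWAN=0, GTX 770", "30:39:17", "3:52:32"),
]
for specs, run, cpu in rows:
    print(f"{specs}: CPU time is {cpu_share(run, cpu):.0%} of run time")
```

With SWAN=1 the CPU time is roughly 95% of the run time, while with SWAN=0 it is roughly 13%. The hosts differ in GPU and operating system too, so this only illustrates the CPU-usage difference, not a like-for-like speed comparison.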
|
|
Joined: 28 Mar 09 Posts: 490 Credit: 11,731,645,728 RAC: 51
|
> Well, what we also hope - I guess - is that there will be enough WUs available anytime around the clock.

That would be very, very nice! May it happen soon. |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
very annoying: after some 20 hrs on my GTX750Ti client, a Gianni WU broke off, indicating a "computation error". I was kind of suspicious anyway when I saw that this host had caught a Gianni WU. Since at that time it had run for several hours already, I decided not to stop it. However, next time I will definitely do so. I guess that the Gianni tasks are no good for GPUs below a GTX970 (or maybe 960). Somehow, these WUs should be programmed for NOT being downloaded on a GTX750Ti or below. |
|
Joined: 28 Mar 09 Posts: 490 Credit: 11,731,645,728 RAC: 51
|
> I just finished one of these units on my windows 10 computer:

Sunday I had several of these WUs download on my computers. On my xp computer, I ran these two WUs simultaneously (1 CPU + .5 GPU):

- e17s52_e1s50p0f278-GIANNI_D3C36bCHL1-0-1-RND0542_0, WU 11700247, sent 22 Aug 2016 2:50:27 UTC, reported 23 Aug 2016 17:31:11 UTC, completed and validated, run time 137,031.21 s, CPU time 129,248.40 s, credit 439,250.00, Long runs (8-12 hours on fastest card) v8.48 (cuda65): http://www.gpugrid.net/result.php?resultid=15245061
- e17s51_e4s53p0f693-GIANNI_D3C36bCHL1-0-1-RND8402_0, WU 11700246, sent 22 Aug 2016 2:50:27 UTC, reported 23 Aug 2016 16:51:07 UTC, completed and validated, run time 134,710.64 s, CPU time 128,933.70 s, credit 439,250.00, Long runs (8-12 hours on fastest card) v8.48 (cuda65): http://www.gpugrid.net/result.php?resultid=15245060

The average run time is (137,031.21 + 134,710.64)/2 = 135,870.92 s, and 135,870.92/3600 = 37.74 hours. With 2 WUs running at once, that is 37.74/2 = 18.87 hours per WU. Compare that with (1 CPU + 1 GPU): 104,399.86/3600 = 29.00 hours (see above). That translates into 1-(18.87/29.00) = .35, or approximately a 35% improvement in productivity. The GPU usage is 98% max and power usage is 81% in 1 CPU + .5 GPU mode, while 1 CPU + 1 GPU mode yields GPU usage of 70% max and power usage of 67%.

For my windows 10 computer, when I ran 1 CPU + .5 GPU for a few hours, the progress rate (from the boinc manager, task tab, properties button) was 3.6% per hour, which is 100/3.6 = 27.78 hours; divided by 2 WUs, that is 13.89 hours per WU computing time. When running 1 CPU + 1 GPU, the computing time per WU is 63,577.01/3600 = 17.66 hours (see above). That translates into 1-(13.89/17.66) = .21, or approximately a 21% improvement in productivity. I guess that's one way to beat WDDM lag! The GPU usage is 92% max and power usage is 80% in 1 CPU + .5 GPU mode, while 1 CPU + 1 GPU mode yields GPU usage of 80% max and power usage of 72%.

Those are my results. I hope you understand my logic. |
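The xp-computer arithmetic in this post can be checked with a small Python sketch. The function names are hypothetical; the run times are the ones quoted in the post, and the calculation assumes both concurrent tasks share the GPU for their whole run.

```python
def hours_per_wu(run_seconds, concurrent_tasks):
    """Effective wall-clock hours per work unit when tasks share one GPU."""
    mean_seconds = sum(run_seconds) / len(run_seconds)
    return mean_seconds / 3600 / concurrent_tasks


def time_saving(shared_hours, single_hours):
    """Fractional reduction in time per WU relative to running one at a time."""
    return 1 - shared_hours / single_hours


# Run times (seconds) quoted in the post for the xp computer.
shared = hours_per_wu([137031.21, 134710.64], concurrent_tasks=2)
single = hours_per_wu([104399.86], concurrent_tasks=1)

print(f"2 at once: {shared:.2f} h per WU")
print(f"1 at once: {single:.2f} h per WU")
print(f"time saved per WU: {time_saving(shared, single):.0%}")
print(f"throughput gain:   {single / shared - 1:.0%}")
```

The 35% figure is the reduction in time per WU. Expressed as throughput (WUs finished per day), the same numbers work out to roughly 54% more, which is simply the other way of stating the same result.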
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
It's ironic that running the longest tasks simultaneously would be the most beneficial in terms of throughput (for some). |
|
Joined: 28 Mar 09 Posts: 490 Credit: 11,731,645,728 RAC: 51
|
> It's ironic that running the longest tasks simultaneously would be the most beneficial in terms of throughput (for some).

Being long was a coincidence. These tasks have a relatively high CPU dependence, which yields a relatively low GPU usage. With WDDM lag on the windows 10 computer and a relatively old and slow CPU on the xp computer, that is where the bottleneck is. By running 1 CPU feeding .5 GPU, you are doubling up the CPU capacity, and so productivity increases. It's all simple mathematics.

I remember a few years ago, we were doing beta testing on multi core CPU tasks. So, if the trend continues with high CPU dependent tasks, then having 2 or more CPUs feed 1 GPU would be the logical step to mitigate this bottleneck. I think this was mentioned in 1 of the threads before, and someone said it might be impossible. I don't think it's impossible, maybe difficult, but not impossible. |
caffeineyellow5 Joined: 30 Jul 14 Posts: 225 Credit: 2,658,976,345 RAC: 0
|
> I got one on my laptop. Windows 8.1 64bit, i7-4900MQ, 32GB RAM, NVIDIA Quadro K2100M @802Mhz mem @ 2504, swan_sync off. At 13.5% now and started as soon as it downloaded, it looks like 15:45 has passed and it might make the 5 day deadline by just squeezing through! We shall see, but it looks good at this point. I've never had a WU fail on this laptop except for downloading errors or crashes related to other programs or my own dumb experimentation with things like swan_sync (lol)

OK, so it finished with a few hours to spare on the 5 day deadline! By the time it said it had 1 day left, it had already run 3 days and 8 hours, but the remaining time was counting down faster than real time. Here is the result: http://www.gpugrid.net/result.php?resultid=15244431 4 days 14.5 hours was the total time. I cut off all WCG, my antivirus, and most regular activity, and turned on swan_sync to push the finish, because I thought it would come a lot closer to missing the deadline. Now to reboot to turn everything back on, but I am glad I could prove myself wrong on this fear of GIANNI.

I do however see a trend that will overwhelm the weaker, older GPUs that are still very abundant throughout the community of crunchers. I don't like the trend. If we could get people to set their systems to not accept short tasks on powerhouse GPUs and then get more short run units, we could have a balance of long runs on strong GPUs and short runs on the others like this laptop and weaker.

1 Corinthians 9:16 "For though I preach the gospel, I have nothing to glory of: for necessity is laid upon me; yea, woe is unto me, if I preach not the gospel!" Ephesians 6:18-20, please ;-) http://tbc-pa.org |
|
Joined: 28 Mar 09 Posts: 490 Credit: 11,731,645,728 RAC: 51
|
I would agree that these tasks are more fragile than most of the other tasks. So far I have had 2 fail on my computers:

- e24s139_e8s176p0f481-GIANNI_D3C36bCHL1-0-1-RND3252_0, WU 11708469, sent 26 Aug 2016 21:50:04 UTC, reported 27 Aug 2016 10:56:02 UTC, error while computing, run time 45,236.39 s, CPU time 45,090.61 s, credit ---, Long runs (8-12 hours on fastest card) v8.48 (cuda65): http://www.gpugrid.net/result.php?resultid=15256435
- e8s162_e3s57p0f378-GIANNI_D3C36bCHL1-0-1-RND5354_0, WU 11697156, sent 17 Aug 2016 9:52:12 UTC, reported 17 Aug 2016 11:40:21 UTC, error while computing, run time 2,516.73 s, CPU time 2,505.53 s, credit ---, Long runs (8-12 hours on fastest card) v8.48 (cuda65): http://www.gpugrid.net/result.php?resultid=15239786

In both cases it was this error:

ERROR: file force.cpp line 513: TCL evaluation of [calcforces]
07:42:49 (5856): called boinc_finish

I was running both failed tasks in 1 CPU and 1 GPU mode, at the same speeds as the other tasks. Nothing was different. That said, I do have, so far, 15 completed and valid, and 2 more still crunching. |
©2025 Universitat Pompeu Fabra