Message boards : News : WU: OPM simulations
Joined: 11 Jul 09 | Posts: 1639 | Credit: 10,159,968,649 | RAC: 326,008
I crunched a WU lasting 18.1 hours for 203,850 credits on my GTX970. Similar experience here with:
4gbrR2-SDOERR_opm996-0-1-RND1953
3oaxR8-SDOERR_opm996-0-1-RND1378
Joined: 5 Mar 13 | Posts: 348 | Credit: 0 | RAC: 0
Seems like I might have underestimated the real runtime. We use a script that calculates the projected runtime on my 780s, so it seems it's a bit too optimistic in its estimates :/ Longer WUs (projected time over 18 hours) should give 4x credits, though.

I am sorry if it's not comparable to previous WUs; it's the best we can do given the tools. But I think it's not a huge issue, since the WU group should be finishing in a day or two. I will take the underestimation into account next time I send out WUs. Thanks for pointing it out, and I hope it's not too much of a bother.

Edit: Gerard might be right. He mentioned that since these are equilibrations, they also use the CPU, so the difference between estimated time and real time could be due to that. Only Noelia and Nate, I think, had some experience with equilibrations here on GPUGrid, but they are not around anymore. I will keep it in mind when I send more jobs.
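The post above describes projecting a WU's runtime from a benchmark GPU and granting a 4x credit bonus for very long projections. A minimal sketch of that kind of calculation, with all function names, the millisecond-per-step figure, and thresholds being assumptions rather than the actual project script (only the 18-hour/4x rule comes from the post):

```python
# Hypothetical sketch of a runtime-projection script like the one described.
# The 18-hour threshold and 4x multiplier come from the post; everything
# else (names, example numbers) is illustrative.

def project_runtime_hours(seconds_per_step: float, total_steps: int) -> float:
    """Projected wall-clock runtime on the benchmark GPU (a GTX 780 here)."""
    return seconds_per_step * total_steps / 3600.0

def credit_multiplier(projected_hours: float) -> int:
    """4x credit for WUs projected to run over 18 hours, per the post."""
    return 4 if projected_hours > 18.0 else 1

# Example: 10 million steps at 8 ms/step projects to ~22.2 hours -> 4x credit.
hours = project_runtime_hours(0.008, 10_000_000)
print(round(hours, 1), credit_multiplier(hours))
```

The complaints later in the thread suggest the real per-step timing was optimistic, which is exactly where a projection like this goes wrong: the multiplier is decided from the projected hours, not the actual ones.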
Joined: 30 Oct 08 | Posts: 47 | Credit: 669,991,028 | RAC: 157,210
OK, thanks Stefan
Joined: 26 Feb 12 | Posts: 184 | Credit: 222,376,233 | RAC: 0
> OPM996 is classed as a Short Run, but using a 980 Ti it seems it will take around 7 hours.

I received several "short" tasks on a machine set to run only short tasks. They took 14 hours on a 750 Ti; short tasks in the past only took 4-5 hours on the 750. May I suggest that these be reclassified as long tasks and not sent to short-task-only machines.
Joined: 23 Sep 15 | Posts: 1 | Credit: 13,348,148 | RAC: 0
I'm computing the WU 2kogR2-SDOERR_opm996-0-1-RND7448, but after 24hrs it is still at 15% (750 Ti)... Please don't tell me I have to abort it!
Joined: 19 Feb 16 | Posts: 19 | Credit: 140,656,383 | RAC: 0
Some of them are pretty big. I had one task take 20 hours. For reference, long tasks usually finish in around 6.5-7 hours.
Joined: 25 Sep 13 | Posts: 293 | Credit: 1,897,601,978 | RAC: 0
> [...] I have 2 WU's (2b5fR8 & 2b5fR3) running, one on each of my GTX970's. [...]

With no other CPU WUs running, I've recorded ~10% CPU usage for each of the two GTX970 WUs, while the GTX750 WU uses ~6% on my quad-core 3.2GHz Haswell system (3 WUs = ~26% total CPU usage).

My GPUs' current (opm996 long WU) estimated total runtimes, with completion rates based on 12-24 hours of real-time crunching:

2r83R1 (GTX970) = 27hr 45min (3.600% per 1hr @ 70% GPU usage / 1501MHz)
1bakR0 (GTX970) = 23hr 30min (4.320% per 1hr @ 65% / 1501MHz)
1u27R2 (GTX750) = 40hr (2.520% per 1hr @ 80% / 1401MHz)
2I35R5 (GT650m) = 70hr (1.433% per 1hr @ 75% / 790MHz)

Newer (beta) BOINC clients introduced an accurate per-minute/per-hour progress rate feature, available in the advanced view under task properties in the commands bar.
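The runtime figures above follow directly from the measured progress rates. A small sketch of the arithmetic, using one of the posted rates as the example (function names are illustrative):

```python
# Convert a measured completion rate (percent per hour, as reported by newer
# BOINC clients) into a projected total runtime and time remaining.

def total_runtime_hours(percent_per_hour: float) -> float:
    """Projected total runtime if the rate holds for the whole task."""
    return 100.0 / percent_per_hour

def hours_remaining(percent_done: float, percent_per_hour: float) -> float:
    """Time left from the current progress at the current rate."""
    return (100.0 - percent_done) / percent_per_hour

# 2r83R1 on a GTX970: 3.600%/hr -> ~27.8h total, matching the 27hr45min estimate.
print(round(total_runtime_hours(3.6), 1))
```

This assumes a constant rate; as noted elsewhere in the thread, GPU load (and therefore the rate) varies with these WUs, so the projection drifts.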
Joined: 11 Jul 09 | Posts: 1639 | Credit: 10,159,968,649 | RAC: 326,008
> Seems like I might have underestimated the real runtime. We use a script that calculates the projected runtime on my 780s. So it seems it's a bit too optimistic in its estimates :/ Longer WUs (projected time over 18 hours) though should give 4x credits.

It would be hugely appreciated if you could find a way of hooking the projections of that script up to the <rsc_fpops_est> field of the associated workunits. With the BOINC server version in use here, a single mis-estimated task (I have one which has been running for 29 hours already) can mess up the BOINC client's scheduling - for other projects, as well as this one - for the next couple of weeks.
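To make the concern above concrete: the BOINC client's initial duration estimate is, roughly, the workunit's estimated floating-point operations divided by the device's effective speed (this is a simplification; the real client also applies per-host correction factors, which is exactly what a bad estimate skews). The numbers below are illustrative, not taken from any actual workunit:

```python
# Rough sketch of why <rsc_fpops_est> matters for client-side scheduling.
# Simplified model: estimated runtime = estimated fpops / effective FLOPS.

def estimated_runtime_hours(rsc_fpops_est: float, effective_flops: float) -> float:
    return rsc_fpops_est / effective_flops / 3600.0

# Hypothetical example: a WU tagged with 5e15 fpops on a GPU that is
# effectively 1 TFLOPS shows ~1.4h estimated. If the task really needs
# ~3x that many operations, the client's duration correction learns the
# wrong lesson and mis-schedules every project on the host for a while.
print(round(estimated_runtime_hours(5e15, 1e12), 1))
```

Feeding the projection script's per-batch estimates into this field would keep the client's estimate and the real runtime in the same ballpark.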
Joined: 29 Jun 14 | Posts: 5 | Credit: 29,718,557 | RAC: 0
I have noticed that the last batch of WUs is worth a lot less credit per time spent crunching, but I also wanted to report that we might have some bad WUs going out. I spent quite a lot of time on this workunit and noticed that the only returned tasks for this WU are "Error while computing." The workunit in question: https://gpugrid.net/workunit.php?wuid=11593942

It is possible that my GPU is unstable, but considering that I was crunching WUs to completion on GPUGrid before, as well as on Asteroids/Milkyway without error, I believe there is a good chance that the WU sent out was corrupt or bad. I will adjust my overclock if need be, but I have been crunching for weeks at these clocks with no problems until now.
Joined: 11 Oct 08 | Posts: 1127 | Credit: 1,901,927,545 | RAC: 0
> I have noticed that the last batch of WUs are worth a lot less credit per time spent crunching, but I also wanted to report that we might have some bad WUs going out. I spent quite a lot of time on this workunit, and noticed that the only returned tasks for this WU is "Error while computing." The workunit in question: https://gpugrid.net/workunit.php?wuid=11593942

You've been crashing tasks for a couple of weeks, including the older 'BestUmbrella_chalcone' ones; see below. You need to lower your GPU overclock if you want to be stable with GPUGrid tasks! Downclock it until you never see "The simulation has become unstable."

https://gpugrid.net/results.php?hostid=319330

https://gpugrid.net/result.php?resultid=15096591 - 10 May 2016 | 7:15:45 UTC - 2b6oR6-SDOERR_opm996-0-1-RND0942_1
https://gpugrid.net/result.php?resultid=15086372 - 28 Apr 2016 | 11:34:16 UTC - e45s20_e17s22p1f138-GERARD_CXCL12_BestUmbrella_chalcone3441-0-1-RND8729_0
https://gpugrid.net/result.php?resultid=15084884 - 27 Apr 2016 | 23:52:34 UTC - e44s17_e43s20p1f173-GERARD_CXCL12_BestUmbrella_chalcone2212-0-1-RND5557_0
https://gpugrid.net/result.php?resultid=15084715 - 26 Apr 2016 | 23:30:01 UTC - e43s16_e31s21p1f321-GERARD_CXCL12_BestUmbrella_chalcone4131-0-1-RND6256_0
https://gpugrid.net/result.php?resultid=15084712 - 26 Apr 2016 | 23:22:44 UTC - e43s13_e20s7p1f45-GERARD_CXCL12_BestUmbrella_chalcone4131-0-1-RND8139_0
https://gpugrid.net/result.php?resultid=15082560 - 25 Apr 2016 | 18:49:58 UTC - e42s11_e31s18p1f391-GERARD_CXCL12_BestUmbrella_chalcone2731-0-1-RND8654_0
Joined: 29 Jun 14 | Posts: 5 | Credit: 29,718,557 | RAC: 0
> I have noticed that the last batch of WUs are worth a lot less credit per time spent crunching, but I also wanted to report that we might have some bad WUs going out. I spent quite a lot of time on this workunit, and noticed that the only returned tasks for this WU is "Error while computing." The workunit in question: https://gpugrid.net/workunit.php?wuid=11593942

Yeah, I was crashing for weeks until I found what I thought were stable clocks. I ran a few WUs without crashing, and then GPUGrid was temporarily out of work. From that point I was crunching Asteroids/Milkyway without any errors. I'll lower my GPU clocks or raise the voltage and report back. Thanks!
Joined: 19 Feb 16 | Posts: 19 | Credit: 140,656,383 | RAC: 0
I will agree that the credit is much lower, but I'm pretty sure they can't change what tasks are worth after the fact, so I'm okay with it.
Joined: 28 Mar 09 | Posts: 490 | Credit: 11,731,645,728 | RAC: 52,725
For future reference, I have a couple of suggestions:

For WUs running 18+ hours, there should be a separate category: "super long runs". I believe this was mentioned in past posts.

The future WU application version should be made less CPU-dependent. The WUs are getting longer and GPUs are getting faster, but CPU speed is stagnant. Something has got to give, and with the Pascal cards coming out soon, you'll have to put out a new version anyway. So why not do both? My GPU usage on these latest SDOERR WUs is between 70% and 80%, compared to 85% to 95% for the GERARD BestUmbrella units. I don't think this is reinventing the wheel, just updating it. This will have to be done sooner or later; the Volta cards are coming out in only a few short years.
Joined: 23 Apr 09 | Posts: 3968 | Credit: 1,995,359,260 | RAC: 0
> For future reference, I have a couple of suggestions:

Generally that's the case, but ultimately this is a different type of research, and it simply requires that some work be performed on the CPU (you can't do it on the GPU).

> The WUs are getting longer, GPUs are getting faster, but CPU speed is stagnant. Something's got to give, and with the Pascal cards coming out soon, you have to put out a new version anyway.

Over the years WUs have remained about the same length overall; occasionally there are extra-long tasks, but such batches are rare. GPUs are getting faster and more adept, the number of shaders (CUDA cores here) is increasing, and CUDA development continues. The problem isn't just CPU frequency (though an AMD Athlon 64 X2 5000+ is going to struggle a bit): WDDM is an 11%+ bottleneck (increasing with GPU performance), and when the GPUGrid app needs the CPU to perform a calculation it comes down to the PCIe bridge, and perhaps to some extent to the app not being multi-threaded on the CPU. Note that it's the same ACEMD app (not updated since January), just a different batch of tasks (batches vary depending on the research type).

> My GPU usage on these latest SDOERR WUs is between 70% to 80%, compared to 85% to 95% for the GERARD BestUmbrella units. I don't think this is reinventing the wheel, just updating it. This does have to be done sooner or later. The Volta cards are coming out in only a few short years.

I'm seeing ~75% GPU usage on my 970's too; on Linux and XP I would expect it to be around 90%. I've increased my memory clock to 3505MHz to reduce the MCU load (~41% @1345MHz [power 85%], ~37% @1253MHz). I've watched the GPU load, and it varies continuously with these WUs, as does the power draw.

FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
Joined: 25 Sep 13 | Posts: 293 | Credit: 1,897,601,978 | RAC: 0
The current (opm996 long) batch are 10-million-step simulations with a large "nAtom" variation (30k to 90k). The per-step compute time varies with nAtom, which creates runtime differences on the same GPUs, so expect variable results due to OPM996's changeable atom count. The WU's atom count is shown in the <stderr> file upon task completion.

Generally, the number of atoms determines GPU usage: the more atoms a simulation has, the higher the GPU usage. So far, prior OPM996 WUs on my 970's showed a 10% GPU-usage difference (60~70%, 63k atoms vs. 88k) for a ~5hr completion-time variation. The difference between my current two opm996 tasks will be nearly ~7hr of total runtime (19 & 26hr, at 69% & 62% core usage) on 970's. The GPU with lower usage (fewer atoms) finishes a WU faster.

Completed OPM996 (long) WUs show >90k-atom tasks (>24hr runtime) being rewarded with over 630,000 credits, while an 88,478-atom task (<24hr runtime) is in the 336,000 range. A 63,016-atom task earns around 294,000 credits; 52,602 atoms = 236,700.
Joined: 20 Jan 09 | Posts: 2380 | Credit: 16,897,957,044 | RAC: 1
4n6hR2-SDOERR_opm996-0-1-RND7021_0: 100,440 Natoms, 16h 36m 46s (59,806s), 903,900 credits (GTX980Ti)

I see 85%~89%~90% GPU usage on my WinXPx64 hosts.

BTW, there's no distinction of these workunits on the performance page, which makes me pretty sad.
Joined: 23 Apr 09 | Posts: 3968 | Credit: 1,995,359,260 | RAC: 0
If you have a small GPU, it's likely you will not finish some of these tasks inside 5 days, which 'I think' is still the cut-off for credit [correct me if I'm wrong]! I suggest people with small cards look at the progress % and time taken in BOINC Manager and work out from those whether the task will finish inside 5 days. If it's going to take over 120h, consider aborting.

It's also worth taking some extra measures to ensure these extra-long tasks complete. For me that means configuring MSI Afterburner/NVIDIA Inspector to prioritise temperature and setting it to something sensible such as 69C. I also freed up another CPU thread to help it along and set the memory to 3505MHz.

FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
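The abort check suggested above is simple arithmetic: divide the elapsed time by the fraction done to project the total runtime, then compare against the cut-off. A minimal sketch, assuming the 120-hour (5-day) cut-off the post itself hedges on:

```python
# Project total runtime from BOINC Manager's progress % and elapsed time,
# and flag tasks unlikely to finish inside the assumed credit cut-off.

CUTOFF_HOURS = 120.0  # 5 days; the post is not certain this is still current

def projected_total_hours(elapsed_hours: float, percent_done: float) -> float:
    return elapsed_hours * 100.0 / percent_done

def should_abort(elapsed_hours: float, percent_done: float) -> bool:
    return projected_total_hours(elapsed_hours, percent_done) > CUTOFF_HOURS

# 44% after 13 hours (a figure from later in the thread) -> ~29.5h, keep it.
# 15% after 24 hours (the 750 Ti report above) -> 160h, past the cut-off.
print(round(projected_total_hours(13, 44), 1), should_abort(13, 44))
```

As with any rate-based projection, this assumes progress stays roughly linear; a throttling or downclocked card will finish later than projected.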
Joined: 14 Apr 14 | Posts: 8 | Credit: 57,034,536 | RAC: 0
This one looks like it's going to take 30hrs :) 44% after 13 hours.
https://www.gpugrid.net/result.php?resultid=15094957
Joined: 23 Apr 09 | Posts: 3968 | Credit: 1,995,359,260 | RAC: 0
My last 2 valid tasks (970's) took around 31h also. Presently running 2 tasks that should take 25.5h and 29h on 970's.

FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
Joined: 18 Apr 14 | Posts: 43 | Credit: 1,192,135,172 | RAC: 0
> though should give 4x credits.

Where is my 4x credit? I see only at least twice as much running time with less credit.

Regards, Josef
©2025 Universitat Pompeu Fabra