WU: OPM simulations

Message boards : News : WU: OPM simulations

Richard Haselgrove
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 326,008
Message 43348 - Posted: 10 May 2016, 10:10:24 UTC - in response to Message 43347.  

I crunched a WU lasting 18.1 hours for 203,850 credits on my GTX970.

This credit ratio is lower than previous "Long run" WUs.

Previously I completed "Long run" WUs in 12 hours for 255,000 credits.

Similar experience here with

4gbrR2-SDOERR_opm996-0-1-RND1953
3oaxR8-SDOERR_opm996-0-1-RND1378
ID: 43348
Stefan (Project administrator · Project developer · Project tester · Project scientist)
Joined: 5 Mar 13 · Posts: 348 · Credit: 0 · RAC: 0
Message 43349 - Posted: 10 May 2016, 10:33:14 UTC
Last modified: 10 May 2016, 10:36:38 UTC

Seems like I might have underestimated the real runtime. We use a script that calculates the projected runtime based on my 780s, and it seems it's a bit too optimistic in its estimates :/ Longer WUs (projected time over 18 hours) should give 4x credits, though.

I am sorry if it's not comparable to previous WUs. It's the best we can do given the tools. But I think it's not a huge issue, since the WU group should be finishing in a day or two. I will take the underestimation into account the next time I send out WUs, though. Thanks for pointing it out. I hope it's not too much of a bother.

Edit: Gerard might be right. He mentioned that since these are equilibrations they also use the CPU, so the difference between estimated time and real time could be due to that. Only Noelia and Nate, I think, had some experience with equilibrations here on GPUGrid, but they are not around anymore. I will keep it in mind when I send more jobs.
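As a rough illustration of the credit rule mentioned above (a flat multiplier once the projected runtime exceeds 18 hours), here is a minimal sketch. The base rate and the function are invented for the example; this is not the project's actual script.

```python
# Hypothetical sketch of a credit-assignment rule: base credit scales with the
# runtime projected on a reference GPU, and anything projected over 18 h gets 4x.
# The numbers and names below are assumptions, not GPUGrid's real script.

BASE_CREDIT_PER_PROJECTED_HOUR = 15_000   # assumed flat rate, for illustration only
LONG_WU_THRESHOLD_HOURS = 18.0            # threshold mentioned in the post
LONG_WU_MULTIPLIER = 4.0                  # "4x credits" for WUs projected over 18 h

def projected_credit(projected_hours: float) -> float:
    """Return the credit a WU would be granted under this toy rule."""
    credit = projected_hours * BASE_CREDIT_PER_PROJECTED_HOUR
    if projected_hours > LONG_WU_THRESHOLD_HOURS:
        credit *= LONG_WU_MULTIPLIER
    return credit

if __name__ == "__main__":
    for hours in (12.0, 17.5, 18.1, 24.0):
        print(f"projected {hours:5.1f} h -> {projected_credit(hours):>10,.0f} credits")
```

The catch the thread describes is that the projection is made on the reference GPU, so a too-optimistic projection keeps a long-running task below the multiplier threshold.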
ID: 43349
zioriga
Joined: 30 Oct 08 · Posts: 47 · Credit: 669,991,028 · RAC: 157,210
Message 43351 - Posted: 10 May 2016, 12:10:50 UTC

OK, Thanks Stefan
ID: 43351
nanoprobe
Joined: 26 Feb 12 · Posts: 184 · Credit: 222,376,233 · RAC: 0
Message 43352 - Posted: 10 May 2016, 12:44:54 UTC - in response to Message 43329.  
Last modified: 10 May 2016, 12:47:28 UTC

OPM996 is classed as a Short Run, but on a 980 Ti it looks like it will take around 7 hours.

Yet I have it on 2 other machines as a Long Run.

Only completed 6% in 30 minutes at 61% GPU utilization.

I received several "short" tasks on a machine set to run only short tasks. They took 14 hours on a 750 Ti. Short tasks in the past only took 4-5 hours on the 750. May I suggest that these be reclassified as long tasks and not sent to short-task-only machines.
ID: 43352
manalog
Joined: 23 Sep 15 · Posts: 1 · Credit: 13,348,148 · RAC: 0
Message 43353 - Posted: 10 May 2016, 13:18:31 UTC - in response to Message 43352.  

I'm computing the WU 2kogR2-SDOERR_opm996-0-1-RND7448, but after 24 hrs it is still at 15% (750 Ti)... Please don't tell me I have to abort it!
ID: 43353
Skyler Baker
Joined: 19 Feb 16 · Posts: 19 · Credit: 140,656,383 · RAC: 0
Message 43354 - Posted: 10 May 2016, 17:31:08 UTC

Some of them are pretty big. I had one task take 20 hours. Long tasks usually finish in around 6.5-7 hours, for reference.
ID: 43354
eXaPower
Joined: 25 Sep 13 · Posts: 293 · Credit: 1,897,601,978 · RAC: 0
Message 43355 - Posted: 10 May 2016, 17:57:34 UTC

[...] I have 2 WUs (2b5fR8 & 2b5fR3) running, one on each of my GTX 970s. [...]
Will drop the CPU usage to ~60% and give it a spring clean before starting up, to see if that improves things.
- update - That allowed the cards to run at slightly higher clocks (and normally), but it's now only looking like 31.5 h. That might yet drop a few hours, but these are still extra-long tasks.
- update - Still looking like about 31.5 h (55% after 17 h and 51% after ~16 h).

With no other CPU WUs running, I've recorded ~10% CPU usage for each of the two GTX 970 WUs, while a GTX 750 WU uses ~6% of my quad-core 3.2 GHz Haswell system (3 WUs = ~26% total CPU usage).

My GPUs' current estimated total runtimes for the opm996 long WUs, with completion rates based on 12-24 hours of real-time crunching:

2r83R1 (GTX 970) = 27 hr 45 min (3.600% per hour @ 70% GPU usage / 1501 MHz)
1bakR0 (GTX 970) = 23 hr 30 min (4.320% per hour @ 65% / 1501 MHz)
1u27R2 (GTX 750) = 40 hr (2.520% per hour @ 80% / 1401 MHz)
2I35R5 (GT 650M) = 70 hr (1.433% per hour @ 75% / 790 MHz)

Newer (beta) BOINC clients introduced an accurate per-minute/per-hour progress rate feature, available in the advanced view under task properties.
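The arithmetic behind those projections is simply 100 divided by the observed progress rate. A small sketch using the rates quoted above (purely illustrative):

```python
# Convert an observed progress rate (% per hour) into an estimated total runtime.
# Rates below are the ones quoted in the post above; any GPU works the same way.
observed_rates = {
    "2r83R1 (GTX 970)": 3.600,   # % per hour
    "1bakR0 (GTX 970)": 4.320,
    "1u27R2 (GTX 750)": 2.520,
    "2I35R5 (GT 650M)": 1.433,
}

for task, pct_per_hour in observed_rates.items():
    total_hours = 100.0 / pct_per_hour          # time to reach 100%
    hours = int(total_hours)
    minutes = round((total_hours - hours) * 60)
    print(f"{task}: ~{hours} h {minutes:02d} min estimated total runtime")
```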

ID: 43355
Richard Haselgrove
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 326,008
Message 43356 - Posted: 10 May 2016, 18:05:46 UTC - in response to Message 43349.  
Last modified: 10 May 2016, 18:27:45 UTC

Seems like I might have underestimated the real runtime. We use a script that calculates the projected runtime based on my 780s, and it seems it's a bit too optimistic in its estimates :/ Longer WUs (projected time over 18 hours) should give 4x credits, though.

It would be hugely appreciated if you could find a way of hooking up the projections of that script to the <rsc_fpops_est> field of the associated workunits. With the BOINC server version in use here, a single mis-estimated task (I have one which has been running for 29 hours already) can mess up the BOINC client's scheduling - for other projects, as well as this one - for the next couple of weeks.
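For context, <rsc_fpops_est> is the workunit's estimated floating-point operation count, which the BOINC client divides by a device's speed estimate to predict runtime. Below is a minimal sketch of how a submission script could derive that value from a projected runtime and hand it to BOINC's create_work tool; the reference throughput, application name and file names are assumptions for illustration, not GPUGrid's actual pipeline.

```python
# Sketch: derive <rsc_fpops_est> from a projected runtime and pass it to BOINC's
# create_work tool. Reference throughput, app name and file name are assumptions.

REFERENCE_FLOPS = 5.0e12   # ~5 TFLOPS, an illustrative GTX 780-class figure

def rsc_fpops_est(projected_hours: float) -> float:
    """Estimated total floating-point operations for the workunit."""
    return projected_hours * 3600.0 * REFERENCE_FLOPS

def create_work_command(wu_name: str, projected_hours: float) -> list[str]:
    est = rsc_fpops_est(projected_hours)
    return [
        "bin/create_work",
        "--appname", "acemdlong",               # placeholder application name
        "--wu_name", wu_name,
        "--rsc_fpops_est", f"{est:.3e}",        # the field Richard is asking about
        "--rsc_fpops_bound", f"{est * 10:.3e}", # generous bound so long tasks aren't killed
        "input_file",                           # placeholder input file
    ]

print(" ".join(create_work_command("example_opm996_wu", 18.0)))
```

With an accurate estimate per workunit, the client's runtime prediction (and hence its scheduling) would track the real variation in this batch instead of being thrown off by a single long task.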
ID: 43356
TyphooN [Gridcoin]
Joined: 29 Jun 14 · Posts: 5 · Credit: 29,718,557 · RAC: 0
Message 43357 - Posted: 10 May 2016, 19:16:55 UTC

I have noticed that the last batch of WUs are worth a lot less credit per time spent crunching, but I also wanted to report that we might have some bad WUs going out. I spent quite a lot of time on this workunit and noticed that the only returned tasks for this WU are "Error while computing." The workunit in question: https://gpugrid.net/workunit.php?wuid=11593942

It is possible that my GPU is unstable, but considering that I was crunching WUs to completion on GPUGrid before, as well as Asteroids/Milkyway without error, I believe there is a chance that the WU that was sent out was corrupt or bad. I will adjust my overclock if need be, but I have been crunching for weeks at these clocks with no problems until now.
ID: 43357
Jacob Klein
Joined: 11 Oct 08 · Posts: 1127 · Credit: 1,901,927,545 · RAC: 0
Message 43358 - Posted: 10 May 2016, 19:24:57 UTC - in response to Message 43357.  
Last modified: 10 May 2016, 19:28:30 UTC

I have noticed that the last batch of WUs are worth a lot less credit per time spent crunching, but I also wanted to report that we might have some bad WUs going out. I spent quite a lot of time on this workunit and noticed that the only returned tasks for this WU are "Error while computing." The workunit in question: https://gpugrid.net/workunit.php?wuid=11593942

It is possible that my GPU is unstable, but considering that I was crunching WUs to completion on GPUGrid before, as well as Asteroids/Milkyway without error, I believe there is a chance that the WU that was sent out was corrupt or bad. I will adjust my overclock if need be, but I have been crunching for weeks at these clocks with no problems until now.


You've been crashing tasks for a couple weeks, including the older 'BestUmbrella_chalcone' ones. See below.
You need to lower your GPU overclock, if you want to be stable with GPUGrid tasks!
Downclock it until you never see "The simulation has become unstable."

https://gpugrid.net/results.php?hostid=319330

https://gpugrid.net/result.php?resultid=15096591
2b6oR6-SDOERR_opm996-0-1-RND0942_1
10 May 2016 | 7:15:45 UTC

https://gpugrid.net/result.php?resultid=15086372
28 Apr 2016 | 11:34:16 UTC
e45s20_e17s22p1f138-GERARD_CXCL12_BestUmbrella_chalcone3441-0-1-RND8729_0

https://gpugrid.net/result.php?resultid=15084884
27 Apr 2016 | 23:52:34 UTC
e44s17_e43s20p1f173-GERARD_CXCL12_BestUmbrella_chalcone2212-0-1-RND5557_0

https://gpugrid.net/result.php?resultid=15084715
26 Apr 2016 | 23:30:01 UTC
e43s16_e31s21p1f321-GERARD_CXCL12_BestUmbrella_chalcone4131-0-1-RND6256_0

https://gpugrid.net/result.php?resultid=15084712
26 Apr 2016 | 23:22:44 UTC
e43s13_e20s7p1f45-GERARD_CXCL12_BestUmbrella_chalcone4131-0-1-RND8139_0

https://gpugrid.net/result.php?resultid=15082560
25 Apr 2016 | 18:49:58 UTC
e42s11_e31s18p1f391-GERARD_CXCL12_BestUmbrella_chalcone2731-0-1-RND8654_0
ID: 43358
TyphooN [Gridcoin]
Joined: 29 Jun 14 · Posts: 5 · Credit: 29,718,557 · RAC: 0
Message 43359 - Posted: 10 May 2016, 19:44:24 UTC - in response to Message 43358.  

I have noticed that the last batch of WUs are worth a lot less credit per time spent crunching, but I also wanted to report that we might have some bad WUs going out. I spent quite a lot of time on this workunit and noticed that the only returned tasks for this WU are "Error while computing." The workunit in question: https://gpugrid.net/workunit.php?wuid=11593942

It is possible that my GPU is unstable, but considering that I was crunching WUs to completion on GPUGrid before, as well as Asteroids/Milkyway without error, I believe there is a chance that the WU that was sent out was corrupt or bad. I will adjust my overclock if need be, but I have been crunching for weeks at these clocks with no problems until now.


You've been crashing tasks for a couple weeks, including the older 'BestUmbrella_chalcone' ones. See below.
You need to lower your GPU overclock, if you want to be stable with GPUGrid tasks!
Downclock it until you never see "The simulation has become unstable."

https://gpugrid.net/results.php?hostid=319330

https://gpugrid.net/result.php?resultid=15096591
2b6oR6-SDOERR_opm996-0-1-RND0942_1
10 May 2016 | 7:15:45 UTC

https://gpugrid.net/result.php?resultid=15086372
28 Apr 2016 | 11:34:16 UTC
e45s20_e17s22p1f138-GERARD_CXCL12_BestUmbrella_chalcone3441-0-1-RND8729_0

https://gpugrid.net/result.php?resultid=15084884
27 Apr 2016 | 23:52:34 UTC
e44s17_e43s20p1f173-GERARD_CXCL12_BestUmbrella_chalcone2212-0-1-RND5557_0

https://gpugrid.net/result.php?resultid=15084715
26 Apr 2016 | 23:30:01 UTC
e43s16_e31s21p1f321-GERARD_CXCL12_BestUmbrella_chalcone4131-0-1-RND6256_0

https://gpugrid.net/result.php?resultid=15084712
26 Apr 2016 | 23:22:44 UTC
e43s13_e20s7p1f45-GERARD_CXCL12_BestUmbrella_chalcone4131-0-1-RND8139_0

https://gpugrid.net/result.php?resultid=15082560
25 Apr 2016 | 18:49:58 UTC
e42s11_e31s18p1f391-GERARD_CXCL12_BestUmbrella_chalcone2731-0-1-RND8654_0


Yeah, I was crashing for weeks until I found what I thought were stable clocks. I ran a few WUs without crashing, and then GPUGrid was temporarily out of work. From that point I was crunching Asteroids/Milkyway without any errors. I'll lower my GPU clocks or raise the voltage and report back. Thanks!
ID: 43359
Skyler Baker
Joined: 19 Feb 16 · Posts: 19 · Credit: 140,656,383 · RAC: 0
Message 43360 - Posted: 10 May 2016, 22:46:48 UTC


I will agree the credit is much lower, but I'm pretty sure they can't change what tasks are worth after the fact, so I'm okay with it.
ID: 43360
Bedrich Hajek
Joined: 28 Mar 09 · Posts: 490 · Credit: 11,731,645,728 · RAC: 52,725
Message 43361 - Posted: 10 May 2016, 23:47:45 UTC - in response to Message 43349.  
Last modified: 10 May 2016, 23:50:00 UTC

For future reference, I have a couple of suggestions:

For WUs running 18+ hours, there should be a separate category: "super long runs". I believe this was mentioned in past posts.

The future WU application version should be made less CPU-dependent. The WUs are getting longer and GPUs are getting faster, but CPU speed is stagnant. Something has to give, and with the Pascal cards coming out soon you will have to put out a new version anyway. So why not do both? My GPU usage on these latest SDOERR WUs is between 70% and 80%, compared to 85% to 95% for the GERARD BestUmbrella units. I don't think this is reinventing the wheel, just updating it. It has to be done sooner or later; the Volta cards are coming out in only a few short years.
ID: 43361
skgiven (Volunteer moderator · Volunteer tester)
Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
Message 43362 - Posted: 11 May 2016, 9:58:59 UTC - in response to Message 43361.  
Last modified: 11 May 2016, 10:01:55 UTC

For future reference, I have a couple of suggestions:

For WUs running 18+ hours, there should be a separate category: "super long runs". I believe this was mentioned in past posts.

This has been suggested before; however, the issue here was just an underestimate of the runtimes by the researchers, and they would normally increase the credit ratio awarded for extra-long tasks. The real cost would be another queue on the server and setting that up and maintaining it; the short queue is often empty, never mind an extra-long queue.

The future WU application version should be made less CPU-dependent.

Generally that's the case, but ultimately this is a different type of research and it simply requires that some work be performed on the CPU (you can't do it on the GPU).

The WUs are getting longer and GPUs are getting faster, but CPU speed is stagnant. Something has to give, and with the Pascal cards coming out soon you will have to put out a new version anyway.

Over the years WUs have remained about the same length overall. Occasionally there are extra-long tasks, but such batches are rare.
GPUs are getting faster and more adept, the number of shaders (CUDA cores here) is increasing, and CUDA development continues. The problem isn't just CPU frequency (though an AMD Athlon 64 X2 5000+ is going to struggle a bit): WDDM is an 11%+ bottleneck (increasing with GPU performance), and when the GPUGrid app needs the CPU to perform a calculation it comes down to the PCIe bridge and perhaps, to some extent, to the app not being multi-threaded on the CPU.
Note that it's the same ACEMD app (not updated since January), just a different batch of tasks (batches vary depending on the research type).

My GPU usage on these latest SDOERR WUs is between 70% and 80%, compared to 85% to 95% for the GERARD BestUmbrella units. I don't think this is reinventing the wheel, just updating it. It has to be done sooner or later; the Volta cards are coming out in only a few short years.

I'm seeing ~75% GPU usage on my 970s too. On Linux and XP I would expect it to be around 90%. I've increased my memory clock to 3505 MHz to reduce the MCU load (~41% @ 1345 MHz [85% power], ~37% @ 1253 MHz). I've watched the GPU load, and it varies continuously with these WUs, as does the power.
ID: 43362
eXaPower
Joined: 25 Sep 13 · Posts: 293 · Credit: 1,897,601,978 · RAC: 0
Message 43364 - Posted: 11 May 2016, 15:29:22 UTC

The current (opm996 long) batch consists of 10-million-step simulations with a large variation in atom count ("nAtom", roughly 30k to 90k).
Because per-step compute time depends on nAtom, this variation creates runtime differences even on identical GPUs.
Expect variable results due to OPM996's changing atom counts.
A WU's atom count is shown in the <stderr> file after task completion.

Generally the number of atoms determines GPU usage: the more atoms a simulation has, the higher the GPU usage.
So far, prior OPM996 WUs on my 970s have shown a 10% spread in GPU usage (60-70%, 63k atoms vs. 88k) and a ~5 hr variation in completion time.
My two current opm996 tasks will differ by roughly 7 hr in total runtime (19 and 26 hr, at 69% and 62% core usage) on 970s. The GPU with lower usage (fewer atoms) finishes its WU faster.

Completed OPM996 long WUs with >90k atoms (>24 hr runtime) have been rewarded with over 630,000 credits, while an 88,478-atom task was in the 336,000 range (<24 hr runtime), a 63,016-atom task earned around 294,000 credits, and a 52,602-atom task 236,700.
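A quick check of the credit-per-atom ratio implied by those reported figures (reader-side arithmetic only; this is not a credit formula published by the project):

```python
# Credits-per-atom for the completed OPM996 long WUs quoted above.
# Purely illustrative arithmetic on the reported figures; GPUGrid's actual
# credit formula is not published here and is not reproduced by this snippet.
reported = [
    (88_478, 336_000),
    (63_016, 294_000),
    (52_602, 236_700),
]

for atoms, credits in reported:
    print(f"{atoms:>6} atoms -> {credits:>8,} credits "
          f"({credits / atoms:.1f} credits/atom)")
# The ratio is not constant, which is consistent with credit tracking projected
# runtime (and the >18 h multiplier) rather than atom count alone.
```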



ID: 43364
Retvari Zoltan
Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 1
Message 43365 - Posted: 11 May 2016, 16:17:46 UTC - in response to Message 43364.  
Last modified: 11 May 2016, 16:20:53 UTC

4n6hR2-SDOERR_opm996-0-1-RND7021_0: 100,440 atoms, 16h 36m 46s (59,806 s), 903,900 credits (GTX 980 Ti).
I see 85%/89%/90% GPU usage on my WinXP x64 hosts.
BTW, there's no way to distinguish these workunits on the performance page, which makes me pretty sad.
ID: 43365
skgiven (Volunteer moderator · Volunteer tester)
Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
Message 43366 - Posted: 11 May 2016, 17:22:08 UTC - in response to Message 43365.  
Last modified: 11 May 2016, 17:27:44 UTC

If you have a small GPU, it's likely you will not finish some of these tasks inside 5 days, which 'I think' is still the cut-off for credit [correct me if I'm wrong]! I suggest people with small cards look at the progress % and elapsed time in BOINC Manager and from that work out whether the task will finish inside 5 days. If it's going to take over 120 h, consider aborting.
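That check is a one-line extrapolation; a small sketch with made-up elapsed/progress numbers, using the 120-hour figure quoted above (with the same caveat that the cut-off may differ):

```python
# Estimate whether a running task can finish inside the (reported) 5-day cut-off,
# from the elapsed time and progress % shown in BOINC Manager.
# The example elapsed/progress numbers are hypothetical.

CUTOFF_HOURS = 120.0   # 5 days, as quoted above ("correct me if I'm wrong")

def projected_total_hours(elapsed_hours: float, percent_done: float) -> float:
    """Linear extrapolation of total runtime from current progress."""
    return elapsed_hours * 100.0 / percent_done

elapsed, done = 17.0, 14.5            # e.g. 14.5% complete after 17 hours
total = projected_total_hours(elapsed, done)
print(f"Projected total: {total:.1f} h "
      f"({'within' if total <= CUTOFF_HOURS else 'over'} the {CUTOFF_HOURS:.0f} h cut-off)")
```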

It's also worth taking some extra measures to ensure these extra-long tasks complete. For me that means configuring MSI Afterburner/NVIDIA Inspector to prioritise temperature, set to something sensible such as 69°C. I also freed up another CPU thread to help it along and set the memory clock to 3505 MHz.
ID: 43366
frederikhk
Joined: 14 Apr 14 · Posts: 8 · Credit: 57,034,536 · RAC: 0
Message 43367 - Posted: 11 May 2016, 17:49:05 UTC

This one looks like it's going to take 30hrs :) 44% after 13 hours.

https://www.gpugrid.net/result.php?resultid=15094957
ID: 43367
skgiven (Volunteer moderator · Volunteer tester)
Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
Message 43368 - Posted: 11 May 2016, 21:05:14 UTC - in response to Message 43367.  

My last 2 valid tasks (on 970s) also took around 31 h.
Presently running 2 tasks that should take 25.5 h and 29 h on 970s.
ID: 43368
MrJo
Joined: 18 Apr 14 · Posts: 43 · Credit: 1,192,135,172 · RAC: 0
Message 43369 - Posted: 11 May 2016, 21:10:50 UTC - in response to Message 43349.  

should give 4x credits, though.

Where is my 4x credit? All I see is at least twice the running time with less credit...

Regards, Josef

ID: 43369