WU: OPM simulations

Message boards : News : WU: OPM simulations
Betting Slip

Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Level
Phe
Message 43429 - Posted: 15 May 2016, 12:40:31 UTC - in response to Message 43428.  

The Top Average Performers chart is misleading and ill-conceived because it is based on the average performance of a user across ALL of his hosts rather than on any particular host.
Retvari has a lot of hosts with a mixture of cards, and arguably the hosts with the fastest return and throughput on GPUGrid. This mixture of hosts and cards puts him well in front on WUs completed but, because times are averaged, behind on performance in hours.
Bedrich has only 2 hosts with at least 2 980Tis, and possibly 3. Because he has no slower cards, his return time in hours, averaged over all his hosts and cards, puts him at the top of the chart despite his completing fewer than half as many WUs as Retvari.

Got one of the loser files again: https://www.gpugrid.net/result.php?resultid=15103135. Only 171,150 credits for a runtime of 64,394.64

Now I've got this one: https://www.gpugrid.net/result.php?resultid=15104507 How can I see if it's a good or a bad one?


There are no good or bad ones, there are just some you get more or less credit for.
ID: 43429
eXaPower

Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Message 43430 - Posted: 15 May 2016, 12:43:17 UTC - in response to Message 43424.  

@Retvari

Congrats to No. 1 of the TOP Crunchers ;-)
By the way: Does anybody know what happened to Stoneageman? Sunk silently into the ground?

Got one of the loser files again: https://www.gpugrid.net/result.php?resultid=15103135. Only 171,150 credits for a runtime of 64,394.64

Now I've got this one: https://www.gpugrid.net/result.php?resultid=15104507 How can I see if it's a good or a bad one?

The Natom count is only known after a WU validates cleanly (it is reported in the <stderr> file). One way to gauge Natom size is GPU memory usage. The really big models use nearly 1.5GB, while smaller models use 1.2GB or less; credit and Natom seem to confirm that some OPM WUs are the largest ACEMD models ever crunched. Natom for long OPM WUs varies so much (29k to 120k) that the cruncher doesn't really know what to expect for credit. (I like this new feature, since no credit amount is fixed.)

Waiting on two formerly timed-out OPM WUs to finish on my 970s (23 hr and 25 hr estimated runtime). The OPM WUs were sent after a hot day followed by a cool evening ocean breeze; a GERARD_FXCXCL WU (GTX970 at 50C) had bent the knee to the May sun with a -97 error.
Your GPUs are too hot. Your GT 630 reaches 80°C (176°F), while in your laptop your GT650M reaches 93°C (199°F) which is crazy.
Your host with 4 GPUs has two GTX970s, a GTX 750 and a GT630.
There's no point in risking the stability of the simulations running on your fast GPUs by putting low-end GPUs in the same host.
Packing 4 GPUs into a single PC for 24/7 crunching requires water cooling (or PCIe riser cards to make breathing space between the cards).
Crunching on laptops is not recommended. But if you do, you should place your laptop on its side while not in use, to make the air outlet facing up and the bottom of the laptop vertical (so the fan could take more air in). You should also regularly clean the fan & the fins with compressed air.


Heed the good advice!

Note that 93C is the GPU's temperature cut-off point. The GPU self-throttles to protect itself because it's dangerously hot. It doesn't have a cut-off point to protect the rest of the system, and GPUs are not designed to run at high temps continuously. Use temperature and fan control apps such as NVIDIA Inspector and MSI Afterburner to protect your hardware.

Heeding the GPU advice, I will reconfigure. I've found a WinXP Home Edition ULCPC key plus its SATA1 hard drive. Will a ULCPC key copied onto a USB drive work with a desktop system? I also have a USB drive with Debian Linux (Tails 2.3), as well as Parrot 3.0, which I could set up for a Win8.1 dual boot. Though the grapevine says their graphics card performance is non-existent compared to mainline 4.* Linux.

I'd really like to lose the WDDM choke point so my future Pascal cards are efficient as possible.
ID: 43430
sis651

Joined: 25 Nov 13
Posts: 66
Credit: 282,724,028
RAC: 69
Level
Asn
Message 43432 - Posted: 15 May 2016, 19:06:11 UTC

I got one of these long runs on my notebook with a GT740M. As it was slow, I stopped all other CPU projects; it was still slow. After about 60 - 70 hours it was still at about 50%. Anyway, it ended up with errors, and I'm not getting any long runs anymore. It seems they won't finish on time...
Waiting for short runs now.
ID: 43432
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 960
Level
Trp
Message 43433 - Posted: 15 May 2016, 19:24:26 UTC

Both of the WUs below were crunched with a GTX980Ti:

e4s15_e2s1p0f633-GERARD_FXCXCL12R_2189739_1-0-1-RND7197_0
22,635.45 / 22,538.23 / 249,600.00

1bakR6-SDOERR_opm996-0-1-RND6740_2
41,181.13 / 40,910.27 / 236,250.00

The second one took almost double the crunching time, but earned fewer points.
What explains this big difference?
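
To put the gap in numbers, here is a minimal sketch that converts the two results above into credits per hour. It assumes the three figures on each line are run time in seconds, CPU time in seconds, and credit (the usual task-list columns, but not stated in the post); the function name is made up for illustration.

```python
# Sketch only: assumes each "a / b / c" triple above means
# run time (s) / CPU time (s) / credit. That reading is an assumption.

def credits_per_hour(run_time_s: float, credit: float) -> float:
    """Credit earned per hour of wall-clock run time."""
    return credit / run_time_s * 3600.0

# Run time and credit copied from the two results quoted above.
tasks = {
    "GERARD_FXCXCL12R": (22635.45, 249600.00),
    "SDOERR_opm996": (41181.13, 236250.00),
}

for name, (run_time_s, credit) in tasks.items():
    rate = credits_per_hour(run_time_s, credit)
    print(f"{name}: {rate:,.0f} credits/hour")
```

Under that reading, the OPM task pays roughly half as much per hour as the GERARD task on the same card.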
ID: 43433
skgiven
Volunteer moderator
Volunteer tester

Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 43434 - Posted: 15 May 2016, 20:26:08 UTC - in response to Message 43433.  
Last modified: 15 May 2016, 20:31:30 UTC

GERARD_FXCXCL12R is a typical work unit in terms of credits awarded.
The SDOERR_opm tasks vary unpredictably in size/runtime. The credits awarded were guesstimates. However, these are probably one-off primer work units that will hopefully feed future runs (where potentially interesting results have been observed). Another way to look at it is that you are doing cutting edge theoretical/proof of concept science, never done before - it's bumpy.
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help
ID: 43434
Retvari Zoltan

Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 1
Level
Trp
Message 43435 - Posted: 15 May 2016, 21:57:54 UTC - in response to Message 43428.  
Last modified: 15 May 2016, 21:58:26 UTC

By the way: Does anybody know what happened to Stoneageman?
He has been crunching Einstein@home for some time now. He is ranked #8 regarding the total credits earned and #4 regarding RAC at the moment.
ID: 43435
Retvari Zoltan

Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 1
Level
Trp
Message 43436 - Posted: 15 May 2016, 22:09:33 UTC - in response to Message 43430.  
Last modified: 15 May 2016, 22:10:09 UTC

The Natom count is only known after a WU validates cleanly (it is reported in the <stderr> file).
The number of atoms of a running task can be found in the project's folder, in a file named after the task with _0 appended.
Though it has no .txt extension, it is a plain text file, so if you open it with Notepad you will find a line (the 5th) which contains this number:
# Topology reports 32227 atoms
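
For anyone who would rather not open the file by hand, a minimal sketch follows. It assumes only what is described above: a plain text file containing a line of the form "# Topology reports N atoms". The function names are hypothetical, and it searches every line rather than relying on the line being the 5th.

```python
import re
from pathlib import Path
from typing import Optional

# Matches the line format shown above, e.g. "# Topology reports 32227 atoms".
ATOM_LINE = re.compile(r"#\s*Topology reports\s+(\d+)\s+atoms")

def atoms_from_text(text: str) -> Optional[int]:
    """Return the atom count from the first matching line, or None if absent."""
    for line in text.splitlines():
        match = ATOM_LINE.search(line)
        if match:
            return int(match.group(1))
    return None

def atoms_from_task_file(path: str) -> Optional[int]:
    """Read a task file (hypothetical path supplied by the caller) and parse it."""
    return atoms_from_text(Path(path).read_text(errors="replace"))

sample = "line 1\nline 2\nline 3\nline 4\n# Topology reports 32227 atoms\n"
print(atoms_from_text(sample))
```

Pass atoms_from_task_file the full path of the file in your own project folder; the location differs per installation, so none is hard-coded here.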
ID: 43436
MrJo

Joined: 18 Apr 14
Posts: 43
Credit: 1,192,135,172
RAC: 0
Level
Met
Message 43438 - Posted: 16 May 2016, 7:30:34 UTC - in response to Message 43435.  
Last modified: 16 May 2016, 7:32:39 UTC

He has been crunching Einstein@home for some time now.


The Natom count is only known after a WU validates cleanly (it is reported in the <stderr> file).
The number of atoms of a running task can be found in the project's folder, in a file named after the task with _0 appended.
Though it has no .txt extension, it is a plain text file, so if you open it with Notepad you will find a line (the 5th) which contains this number:
# Topology reports 32227 atoms


Thank you for the explanation.
Regards, Josef

ID: 43438
MrJo

Joined: 18 Apr 14
Posts: 43
Credit: 1,192,135,172
RAC: 0
Level
Met
Message 43439 - Posted: 16 May 2016, 7:34:29 UTC - in response to Message 43434.  

Another way to look at it is that you are doing cutting edge theoretical/proof of concept science, never done before - it's bumpy.

I'll look at it from this angle. ;-)

Regards, Josef

ID: 43439
Betting Slip

Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Level
Phe
Message 43440 - Posted: 16 May 2016, 12:14:15 UTC
Last modified: 16 May 2016, 12:21:42 UTC

This one took over 5 days to get to me https://www.gpugrid.net/workunit.php?wuid=11595181

Completed in just under 24hrs for 1,095,000

Come on admins, do something about the "5 Day Timeout" and the continually erroring machines. The next WU took over 6 days to get to me (https://www.gpugrid.net/workunit.php?wuid=11595161). Also, stop people caching WUs for more than an hour.
ID: 43440
Stefan
Project administrator
Project developer
Project tester
Project scientist

Joined: 5 Mar 13
Posts: 348
Credit: 0
RAC: 0
Message 43441 - Posted: 16 May 2016, 12:29:12 UTC

Ah you guys actually reminded me of the obvious fact that the credit calculations might be off in respect to the runtime if the system does not fit into GPU memory. Afaik if the system does not fully fit in the GPU (which might happen with quite a few of the OPM systems) it will simulate quite a bit slower.
I think this is not accounted for in the credit calculation.

On the other hand, the exact same credit calculation was used for my WUs as for Gerard's. The difference is that Gerard's are just one system and not 350 different ones like mine, so it's easy to be consistent in credits when the number of atoms doesn't change ;)

In any case I would like to thank you all for pushing through with this. It's nearly finished now so I can get to looking at the results.

Many thanks for the great work :)
ID: 43441
skgiven
Volunteer moderator
Volunteer tester

Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 43442 - Posted: 16 May 2016, 22:10:20 UTC - in response to Message 43441.  
Last modified: 16 May 2016, 22:17:31 UTC

I expect the problem was predominantly the varying number of atoms - the more atoms the longer the runtime. You would have needed to factor the atom count variable into the credit model for it to work perfectly. As any subsequent runs will likely have fixed atom counts (but varying per batch) I expect they can be calibrated as normal. If further primer runs are needed it would be good to factor the atom count into the credits.

The largest amount of GDDR I've seen being used is 1.5GB, but based on reported atom counts some tasks might have been a little higher. Not all of the tasks use as much; many used <1GB, so this was only a problem for some tasks that tried to run on GPUs with small amounts of GDDR (1GB mostly, but possibly a few [rare] 1.5GB cards [GT640s, 660 OEMs, 670M/670MX, the 192-bit GTX760 or some of the even rarer 400/500 series cards], or people trying to run 2 tasks on a 2GB card simultaneously).
Most cards have 2GB or more GDDR, and most of the 1GB cards failed immediately when the tasks required more than 1GB GDDR. The 1GB cards that did complete tasks probably finished tasks that didn't require >1GB GDDR; otherwise they would have been heavily restricted, as you suggest, and experienced even greater PCIE bus usage, which was already higher with this batch.
ID: 43442
Bedrich Hajek

Joined: 28 Mar 09
Posts: 490
Credit: 11,731,645,728
RAC: 52,725
Level
Trp
Message 43443 - Posted: 16 May 2016, 23:00:23 UTC - in response to Message 43442.  

I expect the problem was predominantly the varying number of atoms - the more atoms the longer the runtime. You would have needed to factor the atom count variable into the credit model for it to work perfectly. As any subsequent runs will likely have fixed atom counts (but varying per batch) I expect they can be calibrated as normal. If further primer runs are needed it would be good to factor the atom count into the credits.

The largest amount of GDDR I've seen being used is 1.5GB, but based on reported atom counts some tasks might have been a little higher. Not all of the tasks use as much; many used <1GB, so this was only a problem for some tasks that tried to run on GPUs with small amounts of GDDR (1GB mostly, but possibly a few [rare] 1.5GB cards [GT640s, 660 OEMs, 670M/670MX, the 192-bit GTX760 or some of the even rarer 400/500 series cards], or people trying to run 2 tasks on a 2GB card simultaneously).
Most cards have 2GB or more GDDR, and most of the 1GB cards failed immediately when the tasks required more than 1GB GDDR. The 1GB cards that did complete tasks probably finished tasks that didn't require >1GB GDDR; otherwise they would have been heavily restricted, as you suggest, and experienced even greater PCIE bus usage, which was already higher with this batch.



More atoms also mean higher GPU usage. I am currently crunching a WU with 107,436 atoms. My GPU usage is 83%, compared to 71% for the low-atom WUs in this batch. This is on a Windows 10 computer with WDDM lag. My current GPU memory usage is 1692 MB.


By comparison, the GERARD_FXCXCL WU that I am running concurrently on the other card in this machine is at 80% GPU usage and 514 MB GPU memory usage, with 31,718 atoms.

The power usage is the same, 75%, for each WU, each running on a 980Ti card.


ID: 43443
Beyond

Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 43449 - Posted: 19 May 2016, 16:17:44 UTC - in response to Message 43396.  

717000 credits (with 25% bonus) was the highest I received. Would have been 860700 if returned inside 24h, but would require a bigger card.
On Linux or Win XP I'm sure a GTX970 could return some of these inside 24h.
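
The bonus arithmetic in the quote above can be sketched as follows. The 25% figure comes from the quote; the 50% bonus for returning inside 24h and the 48h cut-off for the 25% tier are assumptions here, and the function name is made up.

```python
# Sketch only. The 25% bonus is from the quoted post; the 50% bonus
# (under 24h) and the 48h limit for the 25% tier are assumptions.

def credit_with_bonus(base_credit: float, turnaround_hours: float) -> float:
    """Apply the assumed turnaround bonus to a base credit value."""
    if turnaround_hours < 24:
        factor = 1.50  # assumed bonus for a return inside 24h
    elif turnaround_hours < 48:
        factor = 1.25  # 25% bonus, as stated in the quote
    else:
        factor = 1.00
    return base_credit * factor

# Back out the base credit from the 717000 figure (which included 25%).
base = 717000 / 1.25
print(f"base credit:  {base:,.0f}")
print(f"inside 24h:   {credit_with_bonus(base, 20):,.0f}")
```

With those assumed tiers the sub-24h figure comes out at 860,400, close to but not exactly the 860700 quoted, so either the quoted number is slightly off or the real bonus rule differs from the assumption.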

The OPMs were hopeless on all but the fastest cards. Even the Gerards lately seem to be sized to cut out the large base of super-clocked 750 Ti cards, at least on the dominant WDDM-based machines (the 750 Tis are still some of the most efficient GPUs that NV has ever produced). In the meantime file sizes have increased and much time is used just in the upload process. I wonder just how important it is to keep the bonus deadlines so tight, considering the larger file sizes and the fact that the admins don't even seem to be able to follow up on the WUs we're crunching by keeping new ones in the queues. It wasn't long ago that the WU times doubled, not sure why.

Seems a few are gaining a bit of speed by running XP. Is that safe, considering the lack of support from MS? I've also been wanting to try running a Linux image (perhaps even from USB), but the image here hasn't been updated in years. I even sent one of the users a new GPU so he could work on a new Linux image for GPUGrid, but nothing ever came of it. Are any of the Linux experts up to this job?
ID: 43449
skgiven
Volunteer moderator
Volunteer tester

Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 43450 - Posted: 19 May 2016, 17:07:18 UTC - in response to Message 43449.  

Most of the issues are due to lack of personnel at GPUGrid. The research is mostly performed by the research students and several have just finished.

If you only use XP to crunch then you are limiting the risk. Anti virus packages and firewalls still work on XP.

Ubuntu 16.04 has been released recently. I'm looking to try it soon and see if there is a simple way to get it up and running for here; repository drivers + Boinc from the repository. If I can I will write it up. Alas, with every version so many commands change and new problems pop up that it's always a learning process.
ID: 43450
Beyond

Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 43451 - Posted: 19 May 2016, 18:06:02 UTC - in response to Message 43450.  

Most of the issues are due to lack of personnel at GPUGrid. The research is mostly performed by the research students and several have just finished.

If you only use XP to crunch then you are limiting the risk. Anti virus packages and firewalls still work on XP.

Ubuntu 16.04 has been released recently. I'm looking to try it soon and see if there is a simple way to get it up and running for here; repository drivers + Boinc from the repository. If I can I will write it up. Alas, with every version so many commands change and new problems pop up that it's always a learning process.

Thanks SK. Hope that you can get the Linux info updated. It would be much appreciated. I'm leery about XP at this point. Please keep us posted.

I've been doing a little research into the 1 and 2 day bonus deadlines mostly by looking at a lot of different hosts. It's interesting. By moving WUs just past the 1 day deadline for a large number of GPUs, the work return may actually be getting slower. The users with the very fast GPUs generally cache as many as allowed and return times end up being close to 1 day anyway. On the other hand for instance most of my GPUs are the factory OCed 750 Ti (very popular on this project). When they were making the 1 day deadline, I set them as the only NV project and at 0 project priority. The new WU would be fetched when the old WU was returned. Zero lag. Now since I can't quite make the 1 day cutoff anyway, I set the queue for 1/2 day. Thus the turn around time is much slower (but still well inside the 2 day limit) and I actually get significantly more credit (especially when WUs are scarce). This too tight turnaround strategy by the project can actually be harmful to their overall throughput.
ID: 43451
Skyler Baker

Joined: 19 Feb 16
Posts: 19
Credit: 140,656,383
RAC: 0
Level
Cys
Message 43452 - Posted: 20 May 2016, 1:59:06 UTC

Some of the new Gerards are definitely a bit long as well; they seem to run at about 12.240% per hour, which wouldn't be very much except that's with an overclocked 980ti, nearly the best possible scenario until Pascal arrives later this month. Like others have said, it doesn't affect me, but it would take a long time with a slower card.
ID: 43452
Betting Slip

Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Level
Phe
Message 43454 - Posted: 20 May 2016, 11:26:24 UTC
Last modified: 20 May 2016, 11:48:31 UTC

This one took 10 Days 8 hours to get to me https://www.gpugrid.net/workunit.php?wuid=11595052

This work and all other work could be done much more quickly and efficiently if the project addressed this problem.

I imagine it would also increase the amount of work GPUGrid could accomplish and scientists might have higher confidence in the results.

TO ADD

One of Gerard's took 3 and a 1/2 days to get to my slowest machine https://www.gpugrid.net/workunit.php?wuid=11600399
ID: 43454
Beyond

Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 43456 - Posted: 20 May 2016, 15:16:17 UTC - in response to Message 43454.  

This one took 10 Days 8 hours to get to me https://www.gpugrid.net/workunit.php?wuid=11595052

This work and all other work could be done much more quickly and efficiently if the project addressed this problem.

I imagine it would also increase the amount of work GPUGrid could accomplish and scientists might have higher confidence in the results.

TO ADD

One of Gerard's took 3 and a 1/2 days to get to my slowest machine https://www.gpugrid.net/workunit.php?wuid=11600399

Interesting that most of the failures were from fast GPUs, even 3x 980Ti and a Titan among others. Are people OCing too much? In the "research" I mentioned above I've noticed MANY 980Ti, Titan and Titan X cards throwing constant failures. Surprised me, to say the least.
ID: 43456
Retvari Zoltan

Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 1
Level
Trp
Message 43457 - Posted: 20 May 2016, 15:21:49 UTC - in response to Message 43454.  

ID: 43457

©2025 Universitat Pompeu Fabra