Message boards : News : WU: OPM simulations
Joined: 5 Jan 09 · Posts: 670 · Credit: 2,498,095,550 · RAC: 0
Top Average Performers is a misleading and ill-conceived chart because it is based on the average performance of a user across ALL his hosts rather than on each host individually. Retvari has a lot of hosts with a mixture of cards, and arguably the hosts with the fastest return and throughput on GPUGrid. This mixture of hosts/cards puts him well in front on WUs completed, but because times are averaged, it leaves him behind on performance in hours. Bedrich has only 2 hosts with at least 2 (possibly 3) 980 Tis, so, because he doesn't have any slower cards, his return time in hours averaged over all his hosts/cards puts him at the top of the chart despite completing less than half as many WUs as Retvari.

Got one of the loser files again: https://www.gpugrid.net/result.php?resultid=15103135. Only 171,150 credits for a runtime of 64,394.64 seconds.

There are no good or bad ones, there are just some you get more or less credit for.
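The averaging problem described above can be illustrated with a toy calculation (all numbers hypothetical, not taken from the real stats pages): a user with a mixed fleet completes far more WUs, yet ranks lower because the chart averages hours per WU across all of his hosts.

```python
# Hypothetical (hours per WU, WUs completed) per host for two users.
mixed_user = [(2.0, 300), (9.0, 400)]   # one fast rig plus slower rigs, 700 WUs total
fast_user  = [(3.0, 250)]               # fast cards only, 250 WUs total

def total_wus(hosts):
    return sum(n for _, n in hosts)

def avg_hours(hosts):
    # Per-user average: total hours spent divided by total WUs, across ALL hosts.
    return sum(h * n for h, n in hosts) / total_wus(hosts)

# The mixed-fleet user completes far more WUs...
print(total_wus(mixed_user), total_wus(fast_user))   # 700 vs 250
# ...yet shows a worse average hours-per-WU, so ranks lower on the chart.
print(avg_hours(mixed_user), avg_hours(fast_user))   # 6.0 vs 3.0
```

A per-host chart would avoid this distortion, since a slow host could not drag down a fast one belonging to the same user.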
Joined: 25 Sep 13 · Posts: 293 · Credit: 1,897,601,978 · RAC: 0
@Retvari The Natom amount is only known after a WU validates cleanly (in the <stderr> file). One way to gauge Natom size is the GPU memory usage. The really big models (credit-wise, and the Natom counts seem to confirm some OPM are the largest ACEMD has ever crunched) near 1.5 GB, while smaller models are <1.2 GB or less. The long OPM WU Natom (29K to 120K) varies to the point where the cruncher doesn't really know what credit to expect. (I like this new feature, since no credit amount is fixed.) Waiting on 2 formerly timed-out OPM to finish up on my 970s (23 hr and 25 hr estimated runtime). An OPM WU was sent after a hot day followed by a cool evening ocean breeze; a -97 error GERARD_FXCXCL (50°C GTX 970) bent the knee to the May sun.

Your GPUs are too hot. Your GT 630 reaches 80°C (176°F), while in your laptop your GT 650M reaches 93°C (199°F), which is crazy.

Tending to the GPU advice - I will reconfigure. I've found a WinXP Home Edition ULCPC key plus its SATA hard drive - will a ULCPC key copied onto a USB work with a desktop system? I also have a USB-drive Linux Debian (Tails 2.3) OS, as well as Parrot 3.0, that I could set up as a Win8.1 dual-boot. Though the grapevine chirps that graphics-card performance there is non-existent compared to mainline 4.* Linux kernels. I'd really like to lose the WDDM choke point so my future Pascal cards are as efficient as possible.
Joined: 25 Nov 13 · Posts: 66 · Credit: 282,724,028 · RAC: 62
I got one of these long runs on my notebook with a GT 740M. As it was slow, I stopped all other CPU projects; it was still slow. After about 60-70 hours it was still at about 50%. Anyway, it ended up with errors, and I'm not getting any long runs anymore. It seems they won't finish on time... Waiting for short runs now.
Joined: 1 Jan 15 · Posts: 1166 · Credit: 12,260,898,501 · RAC: 869
Both of the below WUs were crunched with a GTX 980 Ti:

e4s15_e2s1p0f633-GERARD_FXCXCL12R_2189739_1-0-1-RND7197_0 22,635.45 / 22,538.23 / 249,600.00
1bakR6-SDOERR_opm996-0-1-RND6740_2 41,181.13 / 40,910.27 / 236,250.00

The second one took almost double the crunching time, but earned fewer points. What explains this big difference?
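Dividing the quoted credits by the quoted runtimes makes the gap concrete: the GERARD task paid nearly twice as many credits per hour as the OPM task.

```python
# Credits per hour for the two results quoted above (credit / runtime-in-seconds * 3600).
gerard_rate = 249600.00 / 22635.45 * 3600   # ≈ 39,700 credits/h
opm_rate    = 236250.00 / 41181.13 * 3600   # ≈ 20,650 credits/h
print(round(gerard_rate), round(opm_rate))
```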
Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
GERARD_FXCXCL12R is a typical work unit in terms of credits awarded. The SDOERR_opm tasks vary unpredictably in size/runtime, so the credits awarded were guesstimates. However, these are probably one-off primer work units that will hopefully feed future runs (where potentially interesting results have been observed). Another way to look at it is that you are doing cutting-edge theoretical/proof-of-concept science, never done before - it's bumpy.

FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 1
By the way: Does anybody know what happened to Stoneageman?

He has been crunching Einstein@home for some time now. He is ranked #8 there by total credit and #4 by RAC at the moment.
Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 1
The Natom amount is only known after a WU validates cleanly (in the <stderr> file).

The number of atoms of a running task can be found in the project's folder, in a file named as the task plus a _0 attached to the end. Though it has no .txt extension, this is a plain-text file, so if you open it with Notepad you will find a line (the 5th) which contains this number:

# Topology reports 32227 atoms
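For anyone who wants to read that line programmatically, here is a small sketch. The function just scans lines for the "# Topology reports N atoms" pattern; the filename in the usage comment is illustrative - substitute a real task name (plus the _0 suffix) from your own project folder.

```python
import re

def atom_count(lines):
    """Return the atom count from a task file's '# Topology reports N atoms'
    line, or None if no such line is present."""
    for line in lines:
        m = re.match(r"#\s*Topology reports\s+(\d+)\s+atoms", line)
        if m:
            return int(m.group(1))
    return None

# Usage (hypothetical filename - use a real task name from the project folder):
# with open("1bakR6-SDOERR_opm996-0-1-RND6740_2_0") as f:
#     print(atom_count(f))
```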
Joined: 18 Apr 14 · Posts: 43 · Credit: 1,192,135,172 · RAC: 0
He has been crunching Einstein@home for some time now.

The number of atoms of a running task can be found in the project's folder, in a file named as the task plus a _0 attached to the end.

Thank you for the explanations.

Regards, Josef
Joined: 18 Apr 14 · Posts: 43 · Credit: 1,192,135,172 · RAC: 0
Another way to look at it is that you are doing cutting-edge theoretical/proof-of-concept science, never done before - it's bumpy.

I'll look at it from this angle. ;-)

Regards, Josef
Joined: 5 Jan 09 · Posts: 670 · Credit: 2,498,095,550 · RAC: 0
This one took over 5 days to get to me: https://www.gpugrid.net/workunit.php?wuid=11595181 - completed in just under 24 hrs for 1,095,000. Come on, admins, do something about the "5 Day Timeout" and the continual-error machines. The next WU took over 6 days to get to me: https://www.gpugrid.net/workunit.php?wuid=11595161. And also stop people caching WUs for more than an hour.
Joined: 5 Mar 13 · Posts: 348 · Credit: 0 · RAC: 0
Ah, you guys actually reminded me of the obvious fact that the credit calculations might be off with respect to the runtime if the system does not fit into GPU memory. AFAIK, if the system does not fully fit in the GPU (which might happen with quite a few of the OPM systems) it will simulate quite a bit slower. I think this is not accounted for in the credit calculation. On the other hand, the exact same credit calculation was used for my WUs as for Gerard's. The difference is that Gerard's are just one system and not 350 different ones like mine, so it's easy to be consistent in credits when the number of atoms doesn't change ;)

In any case I would like to thank you all for pushing through with this. It's nearly finished now, so I can get to looking at the results. Many thanks for the great work :)
Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
I expect the problem was predominantly the varying number of atoms - the more atoms, the longer the runtime. You would have needed to factor the atom-count variable into the credit model for it to work perfectly. As any subsequent runs will likely have fixed atom counts (but varying per batch), I expect they can be calibrated as normal. If further primer runs are needed, it would be good to factor the atom count into the credits.

The largest amount of GDDR I've seen being used is 1.5 GB, but based on reported atom counts some tasks might have been a little higher. Not all of the tasks use as much; many used <1 GB, so this was only a problem for some tasks that tried to run on GPUs with small amounts of GDDR (1 GB mostly, but possibly a few [rare] 1.5 GB cards [GT 640s, 660 OEMs, 670M/670MX, the 192-bit GTX 760, or some of the even rarer 400/500-series cards], or people trying to run 2 tasks simultaneously on a 2 GB card). Most cards have 2 GB or more GDDR, and most of the 1 GB cards failed immediately when the tasks required more than 1 GB GDDR. The 1 GB cards that did complete tasks probably finished tasks that didn't require >1 GB GDDR; otherwise they would have been heavily restricted, as you suggest, and would have experienced even greater PCIe bus usage, which was already higher with this batch.

FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
Joined: 28 Mar 09 · Posts: 490 · Credit: 11,731,645,728 · RAC: 47,738
I expect the problem was predominantly the varying number of atoms - the more atoms the longer the runtime. You would have needed to factor the atom count variable into the credit model for it to work perfectly. As any subsequent runs will likely have fixed atom counts (but varying per batch) I expect they can be calibrated as normal. If further primer runs are needed it would be good to factor the atom count into the credits.

More atoms also mean higher GPU usage. I am currently crunching a WU with 107,436 atoms. My GPU usage is 83%, compared to 71% on the low-atom WUs in this batch - and this is on a Windows 10 computer with WDDM lag. My current GPU memory usage is 1,692 MB. The GERARD_FXCXCL WU that I am running concurrently on the other card in this machine, by comparison, shows 80% GPU usage and 514 MB GPU memory usage with 31,718 atoms. The power usage is the same 75% for each WU, each running on a 980 Ti card.
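Taking the two observations above at face value (514 MB at 31,718 atoms; 1,692 MB at 107,436 atoms), a crude linear fit suggests roughly how large a system a 1 GB card could hold. This assumes memory scales linearly with atom count, which is only an approximation:

```python
# Two (atoms, GPU memory in MB) observations from the post above.
a1, m1 = 31_718, 514
a2, m2 = 107_436, 1_692

slope = (m2 - m1) / (a2 - a1)        # ≈ 0.0156 MB per atom
intercept = m1 - slope * a1          # fixed overhead in MB, ≈ 20 MB

# Largest system a 1 GB (1024 MB) card could hold under this toy model:
max_atoms_1gb = (1024 - intercept) / slope
print(round(slope, 4), round(intercept), round(max_atoms_1gb))
```

Under this sketch a 1 GB card tops out somewhere around 64K atoms, which is consistent with the report that many 1 GB cards failed on the bigger OPM systems.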
Joined: 23 Nov 08 · Posts: 1112 · Credit: 6,162,416,256 · RAC: 0
717,000 credits (with the 25% bonus) was the highest I received. It would have been 860,700 if returned inside 24 h, but that would require a bigger card. The OPMs were hopeless on all but the fastest cards. Even the Gerards lately seem to be sized to cut out the large base of super-clocked 750 Ti cards, at least on the dominant WDDM-based machines (the 750 Tis are still some of the most efficient GPUs that NV has ever produced). In the meantime, file sizes have increased and much time is spent just in the upload process. I wonder how important it is to keep the bonus deadlines so tight, considering the larger file sizes and the fact that the admins don't even seem to be able to follow up on the WUs we're crunching by keeping new ones in the queues. It wasn't long ago that the WU times doubled; I'm not sure why.

Seems a few are gaining a bit of speed by running XP. Is that safe, considering the lack of support from MS? I've also been wanting to try running a Linux image (perhaps even from USB), but the image here hasn't been updated in years. I even sent one of the users a new GPU so he could work on a new Linux image for GPUGrid, but nothing ever came of it. Any of the Linux experts up to this job?
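For reference, GPUGrid's return-time bonus is commonly described as +50% inside 24 h and +25% inside 48 h. Assuming those tiers, the 717,000 figure implies a base award of 573,600, and a sub-24 h return would pay 860,400 - close to the 860,700 quoted above:

```python
def awarded(base, return_hours):
    """Credit after the return-time bonus (assumed tiers: +50% <24h, +25% <48h)."""
    if return_hours <= 24:
        return base * 1.5
    if return_hours <= 48:
        return base * 1.25
    return base

base = 717000 / 1.25          # implied base credit: 573,600
print(awarded(base, 30))      # the +25% figure quoted above
print(awarded(base, 20))      # what a sub-24h return would have paid
```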
Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
Most of the issues are due to lack of personnel at GPUGrid. The research is mostly performed by the research students, and several have just finished.

If you only use XP to crunch, then you are limiting the risk. Antivirus packages and firewalls still work on XP.

Ubuntu 16.04 was released recently. I'm looking to try it soon and see if there is a simple way to get it up and running for here: repository drivers + BOINC from the repository. If I can, I will write it up. Alas, with every version so many commands change and new problems pop up that it's always a learning process.

FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
Joined: 23 Nov 08 · Posts: 1112 · Credit: 6,162,416,256 · RAC: 0
Most of the issues are due to lack of personnel at GPUGrid. The research is mostly performed by the research students and several have just finished.

Thanks SK. Hope that you can get the Linux info updated; it would be much appreciated. I'm leery about XP at this point. Please keep us posted.

I've been doing a little research into the 1- and 2-day bonus deadlines, mostly by looking at a lot of different hosts. It's interesting. By moving WUs just past the 1-day deadline for a large number of GPUs, the work return may actually be getting slower. The users with the very fast GPUs generally cache as many WUs as allowed, and their return times end up close to 1 day anyway. On the other hand, most of my GPUs, for instance, are factory-OCed 750 Tis (very popular on this project). When they were making the 1-day deadline, I set this as the only NV project with a project priority of 0: a new WU would be fetched when the old WU was returned - zero lag. Now, since I can't quite make the 1-day cutoff anyway, I set the queue to half a day. The turnaround time is much slower (but still well inside the 2-day limit), and I actually get significantly more credit (especially when WUs are scarce). This too-tight turnaround strategy by the project can actually be harmful to their overall throughput.
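The trade-off described above can be sketched as a toy model (all numbers hypothetical): a small cache hides the gap between returning one WU and fetching the next, at the cost of a later return time. When the 24 h cutoff would be missed either way, the cached host earns more per day.

```python
def credits_per_day(base, crunch_h, queue_wait_h, fetch_gap_h):
    # The bonus tier depends on total turnaround (assumed: +50% <24h, +25% <48h).
    turnaround = crunch_h + queue_wait_h
    if turnaround <= 24:
        credit = base * 1.5
    elif turnaround <= 48:
        credit = base * 1.25
    else:
        credit = base
    # With a cache the next WU is already on hand; without one, the host
    # sits idle for fetch_gap_h whenever work is scarce.
    cycle = crunch_h + (fetch_gap_h if queue_wait_h == 0 else 0)
    return credit / cycle * 24

no_cache = credits_per_day(100_000, 25, 0, 3)    # 25 h crunch already misses the 24 h cutoff
half_day = credits_per_day(100_000, 25, 12, 3)   # still inside 48 h, but no idle time
print(round(no_cache), round(half_day))
```

Under these assumptions the half-day queue yields more credit per day even though each individual WU comes back half a day later - matching the observation in the post above.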
Joined: 19 Feb 16 · Posts: 19 · Credit: 140,656,383 · RAC: 0
Some of the new Gerards are definitely a bit long as well; they seem to run at about 12.240% per hour, which wouldn't be much of a problem except that's with an overclocked 980 Ti - nearly the best possible scenario until Pascal arrives later this month. Like others have said, it doesn't affect me, but it would take a long time with a slower card.
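As a quick sanity check on that figure, 12.240% per hour implies a total runtime of roughly 8.2 hours on that card; a hypothetical card half as fast would need over 16:

```python
# Total runtime implied by a 12.240 %-per-hour progress rate.
hours_980ti = 100 / 12.240            # ≈ 8.17 h on the overclocked 980 Ti
# A card half as fast (hypothetical) would take about twice as long:
hours_half_speed = hours_980ti * 2    # ≈ 16.3 h
print(round(hours_980ti, 2), round(hours_half_speed, 2))
```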
Joined: 5 Jan 09 · Posts: 670 · Credit: 2,498,095,550 · RAC: 0
This one took 10 days 8 hours to get to me: https://www.gpugrid.net/workunit.php?wuid=11595052

This work, and all other work, could be done much more quickly and efficiently if the project addressed this problem. I imagine it would also increase the amount of work GPUGrid could accomplish, and scientists might have higher confidence in the results.

To add: one of the Gerards took 3 and a half days to get to my slowest machine: https://www.gpugrid.net/workunit.php?wuid=11600399
Joined: 23 Nov 08 · Posts: 1112 · Credit: 6,162,416,256 · RAC: 0
This one took 10 days 8 hours to get to me: https://www.gpugrid.net/workunit.php?wuid=11595052

Interesting that most of the failures were from fast GPUs - even 3x 980 Ti and a Titan, among others. Are people OCing too much? In the "research" I mentioned above, I've noticed MANY 980 Ti, Titan and Titan X cards throwing constant failures. That surprised me, to say the least.
Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 1
I have some similar experiences:

e5s22_e1s14p0f264-GERARD_FXCXCL12R_2189739_2-0-1-RND1099 - 5 days:
1. Jonny's desktop with an i7-3930K and two GTX 780s: it has 4 successive timeouts

e5s7_e3s79p0f564-GERARD_FXCXCL12R_1406742_1-0-1-RND7782 - 5 days:
1. Jozef J's desktop with an i7-5960X and a GTX 980 Ti: it has a lot of errors
2. i-kami's desktop with an i7-3770K and a GTX 650: it has 1 timeout, and the other GERARD WU took 2 days

2kytR9-SDOERR_opm996-0-1-RND3899 - 10 days and 6 hours:
1. Remix's laptop with a GeForce 610M: it has only 1 task, which has timed out (probably the user realized that this GPU is insufficient)
2. John C MacAlister's desktop with an AMD FX-8350 and a GTX 660 Ti: it has errors & user aborts
3. Alexander Knerlein's laptop with a GTX 780M: it has only 1 task, which has timed out (probably the user realized that this GPU is insufficient)

1hh4R8-SDOERR_opm996-0-1-RND5553 - 10 days and 2 hours:
1. mcilfone's brand-new i7-6700K with a very hot GTX 980 Ti: it has errors, timeouts and some successful tasks
2. MintberryCrunch's desktop with a Core2 Quad 8300 and a GTX 560 Ti (1024 MB): it has a timeout and a successful task
©2025 Universitat Pompeu Fabra