Old Noelia WUs

Author	Message
GoodFodder Send message Joined: 4 Oct 12 Posts: 53 Credit: 333,467,496 RAC: 0 Level Scientific publications	Message 31584 - Posted: 17 Jul 2013, 12:35:24 UTC - in response to Message 31583. If you have a look at the acellera website which I understand was the commercial spin off of GpuGrid they state otherwise: "ACEMD uses a mixed-precision scheme, in which different parts of the MD computations are performed with different levels of precision depending on their numerical requirements. Double precision is used where required for numerical stability. Because many GPUs have much higher performance for single-precision floating-point arithmetic than double precision, this mixed-precision scheme allows ACEMD to fully exploit the GPU without sacrificing the quality of simulation. The validation tests for ACEMD demonstrate that its simulations converge to results comparable to that of an exclusively double-precision code such as NAMD. Other codes use a similar scheme. In their work describing the design of special purpose MD hardware, DE Shaw et al detail how numerical precision can be safely varied throughout the MD computations [link]." ID: 31584 · Rating: 0 · rate: /
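To make the quoted "mixed-precision" idea concrete, here is a minimal sketch of why an accumulation step can need more precision than the individual terms feeding it. It is only an illustration and assumes nothing about how ACEMD is actually implemented: the helper names (`to_f32`, `sum_single`, `sum_mixed`) and the choice of 100,000 terms of 0.1 are hypothetical.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_single(terms):
    """Accumulate in single precision: the running total is rounded after each add."""
    total = 0.0
    for t in terms:
        total = to_f32(total + t)
    return total


def sum_mixed(terms):
    """Mixed precision: single-precision terms, double-precision accumulator."""
    total = 0.0
    for t in terms:
        total += t
    return total


# Hypothetical workload: many small single-precision contributions.
N = 100_000
term = to_f32(0.1)
terms = [term] * N

# Reference value computed directly in double precision.
reference = term * N

err_single = abs(sum_single(terms) - reference)
err_mixed = abs(sum_mixed(terms) - reference)

print(f"single-precision accumulator error: {err_single:.6g}")
print(f"double-precision accumulator error: {err_mixed:.6g}")
```

The terms are identical in both runs; only the accumulator changes. That is the sense in which a code can do most of its arithmetic in single precision and reserve double precision for the steps where rounding error would otherwise build up.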

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 31585 - Posted: 17 Jul 2013, 12:51:46 UTC - in response to Message 31584. Last modified: 17 Jul 2013, 12:54:03 UTC If you have a look at the acellera website which I understand was the commercial spin off of GpuGrid they state otherwise: "ACEMD uses a mixed-precision scheme, in which different parts of the MD computations are performed with different levels of precision depending on their numerical requirements. Double precision is used where required for numerical stability. That makes sense. I have noticed that of the three Noelia failures on my GTX 660s that have been successfully completed by others, the ones that completed successfully were all on higher-level cards (GTX 670, GTX 680 and a GTX 690). Those very likely have higher-performance floating point units than the GTX 660. I think the light bulb is beginning to go on. ID: 31585 · Rating: 0 · rate: /

dskagcommunity Send message Joined: 28 Apr 11 Posts: 463 Credit: 958,266,958 RAC: 31 Level Scientific publications	Message 31586 - Posted: 17 Jul 2013, 13:14:37 UTC Huh, OK, thanks. That must be very new info; not long ago there was no use of DP here O.o DSKAG Austria Research Team: http://www.research.dskag.at ID: 31586 · Rating: 0 · rate: /

5pot Send message Joined: 8 Mar 12 Posts: 411 Credit: 2,083,882,218 RAC: 0 Level Scientific publications	Message 31588 - Posted: 17 Jul 2013, 14:20:46 UTC Just taking a wild guess, and saying the DP, if any, could be sent to the CPU. ID: 31588 · Rating: 0 · rate: /

GoodFodder Send message Joined: 4 Oct 12 Posts: 53 Credit: 333,467,496 RAC: 0 Level Scientific publications	Message 31592 - Posted: 17 Jul 2013, 16:23:57 UTC If DP is a bottleneck for Noelias then I would have expected the Fermi-based cards to equal or outperform their Kepler equivalents. I have noticed a 650ti is typically equal in performance to a 560ti with non-Noelias, but I am not sure about Noelias. I guess it is possible ACEMD executes DP on the CPU 'core', though I would have thought the GPU would still be faster even if its DP has been cut down in Kepler. Still, I am not convinced DP is at fault here, as we know the Noelias are working on very large molecules, which logically increases the likelihood of execution stalls; hence I suspect cache size may be so important. ID: 31592 · Rating: 0 · rate: /

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 31594 - Posted: 17 Jul 2013, 16:48:43 UTC - in response to Message 31592. Still I am not convinced DP is at fault here as we know the Noelias are working on very large molecules - logically increasing the likelihood of execution stalls and hence why I suspect cache size may be so important. We users need a "super-size" category to know what to put on the project. Otherwise, the stalls will cause bonus points (and more importantly the work itself) to go out the window. ID: 31594 · Rating: 0 · rate: /

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 31595 - Posted: 17 Jul 2013, 17:38:17 UTC - in response to Message 31594. As far as I know, GPUGrid has not and does not use fp64 (double precision). What the ACEMD application can do is a different question. For research groups that need cuda and double precision they could use ACEMD along with GK110 Titans or 780's (at least in theory), as they have superior double precision over the GK10x cards. Was running a NOELIA_1MB for >14h on a GTX660Ti, but it was only at 22%. Saw that the GPU temperature was too low. Looked in GPUZ and found that the memory controller load was once again at 1%. Suspended the WU and forced it to run on the other GPU. After ~1min I got a driver restart and after that there was no progress. Suspended and resumed again and while the WU is progressing, GPUZ still says 1% MCU. The time remaining keeps rising, and going by the progress increase (about 1/5th of a normal WU), it was likely to take another 2.3 days to complete. I aborted the WU because this is abnormal behavior. FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help ID: 31595 · Rating: 0 · rate: /
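As a rough cross-check of the remaining-time figure above, a straight-line extrapolation from the reported progress can be sketched as follows. This is an assumption-laden estimate, not how BOINC computes its own "time remaining": the function name `linear_eta_hours` is hypothetical, and the inputs are the 14 h and 22% quoted in the post (the post says ">14h", so the true figure would be somewhat higher).

```python
def linear_eta_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Estimate hours remaining, assuming progress continues at the average rate so far."""
    if not 0.0 < fraction_done < 1.0:
        raise ValueError("fraction_done must be strictly between 0 and 1")
    projected_total = elapsed_hours / fraction_done
    return projected_total - elapsed_hours


# Figures from the post: about 14 hours elapsed, 22% complete.
remaining_h = linear_eta_hours(14.0, 0.22)
print(f"remaining: {remaining_h:.1f} h ({remaining_h / 24:.2f} days)")
```

A constant-rate projection comes out a little under the 2.3 days quoted in the post, which is consistent with the task having slowed down relative to its average rate rather than running steadily.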

Carlesa25 Send message Joined: 13 Nov 10 Posts: 328 Credit: 72,619,453 RAC: 0 Level Scientific publications	Message 31596 - Posted: 17 Jul 2013, 20:27:52 UTC - in response to Message 31595. Hello. Over an hour ago I started a new NATHAN long job on Ubuntu 13.04, and it is working perfectly on the GTX 770 at 89% CPU. As has been said repeatedly, it is clear that the NOELIAs (beyond the Nvidia driver etc.) have a problem that requires proper analysis if there is any interest in keeping the work stable; otherwise time and interest in the project are lost. ID: 31596 · Rating: 0 · rate: /

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 31597 - Posted: 17 Jul 2013, 21:12:54 UTC - in response to Message 31596. As has been said repeatedly, it is clear that the NOELIAs (beyond the Nvidia driver etc.) have a problem that requires proper analysis if there is any interest in keeping the work stable; otherwise time and interest in the project are lost. Quite so. It is not clear whether this is a random problem with this batch of work, or whether a lesson has been learned that will prevent it from happening again. In fact, it is not even clear whether GPUGrid considers it to be a problem at all, or merely an acceptable cost of doing business. But for those of us who do not want to baby-sit our rigs, it would be of interest to know the answers. ID: 31597 · Rating: 0 · rate: /

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 Level Scientific publications	Message 31598 - Posted: 17 Jul 2013, 21:15:25 UTC - in response to Message 31595. SK wrote: As far as I know, GPUGrid has not and does not use fp64 (double precision). What the ACEMD application can do is a different question. I agree. And as long as GT240 and similar cards can still run the code (although too slowly nowadays) we can be sure that not a single DP instruction is needed from the hardware (those chips don't have any such units). Jim1348 wrote: Those (GTX 670, GTX 680 and a GTX 690) very likely have higher-performance floating point units than the GTX 660. No, they don't. They're using the exact same SMXs as building blocks, down to the smallest Kepler. What differs is just the number and clock speed of these units. The exception is GK110 (Titan and GTX780), which did indeed get more DP units (but again of the same type). MrS Scanning for our furry friends since Jan 2002 ID: 31598 · Rating: 0 · rate: /

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 31599 - Posted: 17 Jul 2013, 21:53:31 UTC - in response to Message 31598. Last modified: 17 Jul 2013, 21:54:10 UTC And as long as GT240 and similar cards can still run the code (although too slowly nowadays) we can be sure that not a single DP instruction is needed from the hardware (those chips don't have any such units). We haven't checked the latest batch; the Keplers can't run them, so there is no guarantee about the GT 240. But the cause of the difference is not so important as the fact that it is there, in beta-test form, or maybe even alpha. ID: 31599 · Rating: 0 · rate: /

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 31600 - Posted: 17 Jul 2013, 22:24:40 UTC - in response to Message 31599. My guess is that Noelia went back to a previous app, for scientific/testing/reassessment reasons. If at first you don't succeed... GL FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help ID: 31600 · Rating: 0 · rate: /

Betting Slip Send message Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0 Level Scientific publications	Message 31601 - Posted: 17 Jul 2013, 22:28:41 UTC Nice to see the interaction between project managers and the contributors that make that project possible. Oh, sorry, there hasn't been any! Radio Caroline, the world's most famous offshore pirate radio station. Great music since April 1964. Support Radio Caroline Team - Radio Caroline ID: 31601 · Rating: 0 · rate: /

5pot Send message Joined: 8 Mar 12 Posts: 411 Credit: 2,083,882,218 RAC: 0 Level Scientific publications	Message 31607 - Posted: 17 Jul 2013, 23:36:21 UTC - in response to Message 31599. And as long as GT240 and similar cards can still run the code (although too slowly nowadays) we can be sure that not a single DP instruction is needed from the hardware (those chips don't have any such units). We haven't checked the latest batch; the Keplers can't run them, so there is no guarantee about the GT 240. But the cause of the difference is not so important as the fact that it is there, in beta-test form, or maybe even alpha. My 680s ran all the Noelia's, save several that failed within 10-30s. However, I've checked the ones that failed I've successfully completed, just different tasks, but same WU type. As there were multiple Noelia WUs out and about. ID: 31607 · Rating: 0 · rate: /

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 31609 - Posted: 18 Jul 2013, 0:04:50 UTC - in response to Message 31607. My 680s ran all the Noelia's, save several that failed within 10-30s. However, I've checked the ones that failed I've successfully completed, just different tasks, but same WU type. As there were multiple Noelia WUs out and about. I just completed a 2-NOELIA_2HRUN (a new type for me) in 25 hours 31 minutes. There is nothing necessarily wrong with that; you don't get so many bonus points, but so what. I just think they need to alert the users about the different requirements, so that you can base your GPU purchasing decisions accordingly. It seems to be a new era; maybe when the dust settles, they will offer some guidance. Otherwise, it is just hit-or-miss as to what cards will work on what work units. ID: 31609 · Rating: 0 · rate: /

TJ Send message Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0 Level Scientific publications	Message 31610 - Posted: 18 Jul 2013, 0:07:47 UTC - in response to Message 31609. Seems to be a matter of pure luck :-) I just got a Santi SR error after 5000 seconds on a GTX660. So it's not only Noelia. Greetings from TJ ID: 31610 · Rating: 0 · rate: /

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 31614 - Posted: 18 Jul 2013, 8:42:46 UTC - in response to Message 31601. Last modified: 18 Jul 2013, 8:44:37 UTC Nice to see the interaction between project managers and the contributors that make that project possible. Oh, sorry, there hasn't been any! Communications are limited to say the least, but at least 2 of the researchers are on leave and that means the others have to keep the project running, which is a challenge in itself. I just think they need to alert the users about the different requirements, so that you can base your GPU purchasing decisions accordingly. It seems to be a new era; maybe when the dust settles, they will offer some guidance. Otherwise, it is just hit-or-miss as to what cards will work on what work units. It's always been the case that the more expensive cards are faster and usually more reliable, however last time I looked they were not the best bang for buck. I don't know how the 670's and above are performing relative to the more mid-range cards such as the 650Ti and 660, but I'm seeing similar issues on my 660 and my 660Ti (mostly on Windows). On my Linux rig (650TiBoost) I've had no issues (using 304.88 drivers) but I have not run enough NOELIA WU's to say for sure... The one thing we do know is that 1GB GPU's struggle with WU's that require over 1GB GDDR. That's been noted and is stipulated in the Recommended GPU list, which does get updated when new GPU's arrive, and when we learn the hard way about task requirements. FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help ID: 31614 · Rating: 0 · rate: /

GoodFodder Send message Joined: 4 Oct 12 Posts: 53 Credit: 333,467,496 RAC: 0 Level Scientific publications	Message 31616 - Posted: 18 Jul 2013, 9:26:37 UTC re: NOELIA_2HRUN some positive news: GTX 650ti (stable clock 1084, 1525) - using 714MB GPU mem (of 1024MB): http://www.gpugrid.net/result.php?resultid=7053079 108,240.23 secs - which is about 25% longer than a 660 (appears to be shader bound). For comparison Jim1348 mentioned his GTX 660 was using 1406 MB GPU mem (of 2048MB). http://www.gpugrid.net/result.php?resultid=7053163 86,491.77 secs Thus GPU memory allocation appears to be working and the performance is in line (unlike the *MG). - It would be interesting to see what a 560ti runs them in. Skgiven: I think we all appreciate the great work you moderators are doing; however, I can't help thinking the sentiments on this forum would be a lot more positive if the researchers were a little more proactive in their communication. All it would take is a small announcement in a dedicated thread when a new type of WU comes online - a very brief summary in layman's terms of what the WU is related to would be nice, so that volunteers feel a part of the project rather than just being 'used'; together with the expected running time for a particular benchmark card or two - we are not all running this project for credits! ID: 31616 · Rating: 0 · rate: /
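The "about 25% longer" figure and the memory usage can be checked with a few lines of arithmetic. This is only a sketch using the numbers quoted in the post; the function names `percent_longer` and `percent_used` are hypothetical helpers, not part of any BOINC or GPUGrid tool.

```python
def percent_longer(slow_secs: float, fast_secs: float) -> float:
    """How much longer the slower run took, as a percentage of the faster run."""
    return (slow_secs / fast_secs - 1.0) * 100.0


def percent_used(used_mb: float, total_mb: float) -> float:
    """Share of the card's memory in use, as a percentage."""
    return used_mb / total_mb * 100.0


# Run times and memory figures quoted in the post for NOELIA_2HRUN.
gtx650ti_secs, gtx660_secs = 108_240.23, 86_491.77

slowdown = percent_longer(gtx650ti_secs, gtx660_secs)
mem_650ti = percent_used(714, 1024)
mem_660 = percent_used(1406, 2048)

print(f"GTX 650 Ti took {slowdown:.1f}% longer than the GTX 660")
print(f"memory in use: 650 Ti {mem_650ti:.1f}%, 660 {mem_660:.1f}%")
```

Both cards end up with roughly the same share of their memory in use, which fits the observation that the allocation appears to scale with the card rather than demanding a fixed amount.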

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 31623 - Posted: 18 Jul 2013, 11:03:08 UTC - in response to Message 31614. It's always been the case that the more expensive cards are faster and usually more reliable, however last time I looked they were not the best bang for buck. I don't know how the 670's and above are performing relative to the more mid-range cards such as the 650Ti and 660, but I'm seeing similar issues on my 660 and my 660Ti (mostly on Windows). On my Linux rig (650TiBoost) I've had no issues (using 304.88 drivers) but I have not run enough NOELIA WU's to say for sure... The one thing we do know is that 1GB GPU's struggle with WU's that require over 1GB GDDR. That's been noted and is stipulated in the Recommended GPU list, which does get updated when new GPU's arrive, and when we learn the hard way about task requirements. My conclusion is that it now takes at least a GTX 670 on the Longs in order to avoid the great slowdown we see. Even the 660s with 2 GB memory are not enough; I have learned that the hard way, and am going to put mine on Shorts. There is no point spending 20 hours or more grinding away when they could be doing more productive work. Even the higher-level cards may still have problems, but I think those will be worked out over time. It is all in the way of scientific progress, which is fine with me, but they could have mentioned it to us. ID: 31623 · Rating: 0 · rate: /

John C MacAlister Send message Joined: 17 Feb 13 Posts: 181 Credit: 144,871,276 RAC: 0 Level Scientific publications	Message 31625 - Posted: 18 Jul 2013, 11:55:07 UTC - in response to Message 31623. Last modified: 18 Jul 2013, 11:55:56 UTC Hi, Jim: I agree with your conclusion. With my 650 Tis I will only process shorts from now on. I want to share these with Alzheimer's processing at Folding and this combination works for me. Regards, John ID: 31625 · Rating: 0 · rate: /