Old Noelia WUs

Message boards : News : Old Noelia WUs

GoodFodder
Joined: 4 Oct 12
Posts: 53
Credit: 333,467,496
RAC: 0
Level
Asp
Message 31584 - Posted: 17 Jul 2013, 12:35:24 UTC - in response to Message 31583.  

If you have a look at the Acellera website (which I understand is the commercial spin-off of GPUGrid), they state otherwise:

"ACEMD uses a mixed-precision scheme, in which different parts of the MD computations are performed with different levels of precision depending on their numerical requirements. Double precision is used where required for numerical stability.
Because many GPUs have much higher performance for single-precision floating-point arithmetic than double precision, this mixed-precision scheme allows ACEMD to fully exploit the GPU without sacrificing the quality of simulation.

The validation tests for ACEMD demonstrate that its simulations converge to results comparable to that of an exclusively double-precision code such as NAMD. Other codes use a similar scheme. In their work describing the design of special purpose MD hardware, DE Shaw et al detail how numerical precision can be safely varied throughout the MD computations [link]."
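To illustrate the idea in the quote, here is a minimal Python sketch of why a mixed scheme can help. It is not ACEMD code and says nothing about which parts of ACEMD use which precision; the helper names (to_f32, sum_single, sum_mixed) and the test values are made up for the illustration. Single precision is emulated with the standard struct module, so no GPU is needed.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_single(values):
    """Accumulate entirely in emulated single precision."""
    total = 0.0
    for v in values:
        # Round both the term and the running total to single precision.
        total = to_f32(total + to_f32(v))
    return total


def sum_mixed(values):
    """Single-precision terms, but a double-precision accumulator."""
    total = 0.0
    for v in values:
        total += to_f32(v)
    return total


# Hypothetical workload: add 0.1 one million times (exact answer: 100000).
values = [0.1] * 1_000_000
exact = 100_000.0

err_single = abs(sum_single(values) - exact)
err_mixed = abs(sum_mixed(values) - exact)

print(f"single-precision accumulator error: {err_single:.4f}")
print(f"mixed-precision accumulator error:  {err_mixed:.6f}")
```

On this input the all-single-precision sum drifts away from the exact value by much more than 1, while the version that keeps only the accumulator in double precision stays within a small fraction of 1. That is the general pattern the quote describes: cheap single precision for the bulk of the work, double precision only where rounding errors pile up.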
ID: 31584
Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 31585 - Posted: 17 Jul 2013, 12:51:46 UTC - in response to Message 31584.  
Last modified: 17 Jul 2013, 12:54:03 UTC

If you have a look at the Acellera website (which I understand is the commercial spin-off of GPUGrid), they state otherwise:

"ACEMD uses a mixed-precision scheme, in which different parts of the MD computations are performed with different levels of precision depending on their numerical requirements. Double precision is used where required for numerical stability."

That makes sense. I have noticed that of the three Noelia failures on my GTX 660s that have been successfully completed by others, the ones that completed successfully were all on higher-level cards (GTX 670, GTX 680 and a GTX 690). Those very likely have higher-performance floating point units than the GTX 660.

I think the light bulb is beginning to go on.
ID: 31585
dskagcommunity
Joined: 28 Apr 11
Posts: 463
Credit: 958,266,958
RAC: 31
Level
Glu
Message 31586 - Posted: 17 Jul 2013, 13:14:37 UTC

Huh, OK, thanks. That must be very new info; it's not long ago that there was no use of DP here O.o
DSKAG Austria Research Team: http://www.research.dskag.at



ID: 31586
5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 31588 - Posted: 17 Jul 2013, 14:20:46 UTC

Just taking a wild guess, and saying the DP, if any, could be sent to the CPU.
ID: 31588
GoodFodder
Joined: 4 Oct 12
Posts: 53
Credit: 333,467,496
RAC: 0
Level
Asp
Message 31592 - Posted: 17 Jul 2013, 16:23:57 UTC

If DP is a bottleneck for Noelias then I would have expected the Fermi-based cards to equal or outperform their Kepler equivalents. I have noticed that a 650ti is typically equal in performance to a 560ti on non-Noelias, but I am not sure about Noelias.
I guess it is possible that ACEMD executes DP on the CPU 'core', though I would have thought the GPU would still be faster even though its DP has been cut down in Kepler.
Still, I am not convinced DP is at fault here: we know the Noelias are working on very large molecules, which logically increases the likelihood of execution stalls, and that is why I suspect cache size may be so important.
ID: 31592
Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 31594 - Posted: 17 Jul 2013, 16:48:43 UTC - in response to Message 31592.  

Still, I am not convinced DP is at fault here: we know the Noelias are working on very large molecules, which logically increases the likelihood of execution stalls, and that is why I suspect cache size may be so important.

We users need a "super-size" category to know what to put on the project. Otherwise, the stalls will cause the bonus points (and, more importantly, the work itself) to go out the window.
ID: 31594
skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 31595 - Posted: 17 Jul 2013, 17:38:17 UTC - in response to Message 31594.  

As far as I know, GPUGrid has not used and does not use fp64 (double precision). What the ACEMD application can do is a different question. Research groups that need CUDA and double precision could use ACEMD along with GK110 Titans or 780s (at least in theory), as these have superior double precision compared with the GK10x cards.

Was running a NOELIA_1MB for >14h on a GTX660Ti, but it was only at 22%. Saw that the GPU temperature was too low. Looked in GPUZ and found that the memory controller load was once again at 1%.
Suspended the WU and forced it to run on the other GPU. After ~1min I got a driver restart and after that there was no progress.
Suspended and resumed again and while the WU is progressing, GPUZ still says 1% MCU. The time remaining keeps rising, and going by the progress increase (about 1/5th of a normal WU), it was likely to take another 2.3 days to complete.
I aborted the WU because this is abnormal behavior.
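For anyone who would rather not watch GPU-Z by hand, the same kind of check can be sketched in Python around nvidia-smi. This is only a rough sketch under assumptions: nvidia-smi must be on the PATH, its utilization.memory figure is not guaranteed to match GPU-Z's memory controller load exactly, the 5% threshold and the function names are made up, and the sample line at the end is invented for illustration rather than taken from a real card.

```python
import subprocess

# Query fields understood by nvidia-smi's --query-gpu option.
QUERY_FIELDS = "name,utilization.gpu,utilization.memory,memory.used,memory.total"


def parse_gpu_line(line: str) -> dict:
    """Parse one line produced with --format=csv,noheader,nounits.

    Assumes every field is reported as a number; some cards or drivers
    report a field as not supported, which this sketch does not handle.
    """
    name, gpu_util, mem_util, mem_used, mem_total = (
        part.strip() for part in line.rsplit(",", 4)
    )
    return {
        "name": name,
        "gpu_util_pct": int(gpu_util),
        "mem_ctrl_util_pct": int(mem_util),
        "mem_used_mib": int(mem_used),
        "mem_total_mib": int(mem_total),
    }


def looks_stalled(gpu: dict, min_mem_ctrl_pct: int = 5) -> bool:
    """Hypothetical heuristic: memory controller almost idle while a WU runs."""
    return gpu["mem_ctrl_util_pct"] < min_mem_ctrl_pct


def query_gpus() -> list:
    """Run nvidia-smi and return one parsed entry per GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return [parse_gpu_line(l) for l in result.stdout.splitlines() if l.strip()]


# Invented sample line, for illustration only.
sample = parse_gpu_line("GeForce GTX 660 Ti, 60, 1, 900, 2048")
print(sample["name"], "stalled?", looks_stalled(sample))
```

Calling query_gpus() every few minutes and flagging any card for which looks_stalled() stays true would catch the 1% situation described above, although a low reading on its own does not prove a WU has gone wrong.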
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help
ID: 31595
Carlesa25
Joined: 13 Nov 10
Posts: 328
Credit: 72,619,453
RAC: 0
Level
Thr
Message 31596 - Posted: 17 Jul 2013, 20:27:52 UTC - in response to Message 31595.  

Hello, over an hour ago I started a new NATHAN long job in Ubuntu 13.04, and it is working perfectly on the GTX 770 with 89% CPU usage.

As has been said repeatedly, it is clear that the NOELIAs (beyond the Nvidia driver, etc.) have a problem that requires proper analysis if there is any interest in keeping the work stable; otherwise we lose time and interest in the project.
ID: 31596
Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 31597 - Posted: 17 Jul 2013, 21:12:54 UTC - in response to Message 31596.  

As has been said repeatedly, it is clear that the NOELIAs (beyond the Nvidia driver, etc.) have a problem that requires proper analysis if there is any interest in keeping the work stable; otherwise we lose time and interest in the project.

Quite so. It is not clear whether this is a random problem with this batch of work, or whether a lesson has been learned that will prevent it from happening again. In fact, it is not even clear whether GPUGrid considers it to be a problem at all, or merely an acceptable cost of doing business. But for those of us who do not want to baby-sit our rigs, it would be of interest to know the answers.
ID: 31597
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 31598 - Posted: 17 Jul 2013, 21:15:25 UTC - in response to Message 31595.  

SK wrote:
As far as I know, GPUGrid has not used and does not use fp64 (double precision). What the ACEMD application can do is a different question.

I agree. And as long as GT240 and similar cards can still run the code (although too slowly nowadays), we can be sure that not a single DP instruction is needed from the hardware (those chips don't have any such units).

Jim1348 wrote:
Those (GTX 670, GTX 680 and a GTX 690) very likely have higher-performance floating point units than the GTX 660.

No, they don't. They're using the exact same SMXs as building blocks, down to the smallest Kepler. What differs is just the number and clock speed of these units. The exception is GK110 (Titan and GTX780), which did indeed get more DP units (but again of the same type).

MrS
Scanning for our furry friends since Jan 2002
ID: 31598
Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 31599 - Posted: 17 Jul 2013, 21:53:31 UTC - in response to Message 31598.  
Last modified: 17 Jul 2013, 21:54:10 UTC

And as long as GT240 and similar cards can still run the code (although too slowly nowadays), we can be sure that not a single DP instruction is needed from the hardware (those chips don't have any such units).

We haven't checked the latest batch; the Keplers can't run them, so there is no guarantee about the GT 240. But the cause of the difference is not so important as the fact that it is there, in beta-test form, or maybe even alpha.
ID: 31599
skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 31600 - Posted: 17 Jul 2013, 22:24:40 UTC - in response to Message 31599.  

My guess is that Noelia went back to a previous app, for scientific/testing/reassessment reasons.

If at first you don't succeed...

GL
ID: 31600
Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Level
Phe
Message 31601 - Posted: 17 Jul 2013, 22:28:41 UTC

Nice to see the interaction between project managers and the contributors that make that project possible.

Oh, sorry, there hasn't been any!
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline
ID: 31601
5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 31607 - Posted: 17 Jul 2013, 23:36:21 UTC - in response to Message 31599.  

And as long as GT240 and similar cards can still run the code (although too slowly nowadays), we can be sure that not a single DP instruction is needed from the hardware (those chips don't have any such units).

We haven't checked the latest batch; the Keplers can't run them, so there is no guarantee about the GT 240. But the cause of the difference is not so important as the fact that it is there, in beta-test form, or maybe even alpha.


My 680s ran all the Noelias, save several that failed within 10-30 s. However, I've checked, and I have successfully completed other tasks of the same WU type as the ones that failed, as there were multiple Noelia WUs out and about.
ID: 31607
Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 31609 - Posted: 18 Jul 2013, 0:04:50 UTC - in response to Message 31607.  

My 680s ran all the Noelias, save several that failed within 10-30 s. However, I've checked, and I have successfully completed other tasks of the same WU type as the ones that failed, as there were multiple Noelia WUs out and about.

I just completed a 2-NOELIA_2HRUN (a new type for me) in 25 hours 31 minutes. There is nothing necessarily wrong with that; you don't get so many bonus points, but so what. I just think they need to alert the users about the different requirements, so that you can base your GPU purchasing decisions accordingly. It seems to be a new era; maybe when the dust settles, they will offer some guidance. Otherwise, it is just hit-or-miss as to what cards will work on what work units.
ID: 31609
TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 31610 - Posted: 18 Jul 2013, 0:07:47 UTC - in response to Message 31609.  

Seems to be a matter of pure luck :-) I just got a Santi SR error after 5000 seconds on a GTX660. So it's not only Noelia.
Greetings from TJ
ID: 31610
skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 31614 - Posted: 18 Jul 2013, 8:42:46 UTC - in response to Message 31601.  
Last modified: 18 Jul 2013, 8:44:37 UTC

Nice to see the interaction between project managers and the contributors that make that project possible.

Oh, sorry, there hasn't been any!

Communications are limited to say the least, but at least 2 of the researchers are on leave and that means the others have to keep the project running, which is a challenge in itself.

I just think they need to alert the users about the different requirements, so that you can base your GPU purchasing decisions accordingly. It seems to be a new era; maybe when the dust settles, they will offer some guidance. Otherwise, it is just hit-or-miss as to what cards will work on what work units.

It's always been the case that the more expensive cards are faster and usually more reliable, however last time I looked they were not the best bang for buck. I don't know how the 670's and above are performing relative to the more mid-range cards such as the 650Ti and 660, but I'm seeing similar issues on my 660 and my 660Ti (mostly on Windows). On my Linux rig (650TiBoost) I've had no issues (using 304.88 drivers) but I have not run enough NOELIA WU's to say for sure... The one thing we do know is that 1GB GPU's struggle with WU's that require over 1GB GDDR. That's been noted and is stipulated in the Recommended GPU list, which does get updated when new GPU's arrive, and when we learn the hard way about task requirements.
ID: 31614
GoodFodder
Joined: 4 Oct 12
Posts: 53
Credit: 333,467,496
RAC: 0
Level
Asp
Message 31616 - Posted: 18 Jul 2013, 9:26:37 UTC

re: NOELIA_2HRUN some positive news:

GTX 650ti (stable clock 1084, 1525) - using 714MB GPU mem (of 1024MB):
http://www.gpugrid.net/result.php?resultid=7053079

108,240.23 secs - which is about 25% longer than a 660 (appears to be shader bound).

For comparison Jim1348 mentioned his GTX 660 was using 1406 MB GPU mem (of 2048MB).
http://www.gpugrid.net/result.php?resultid=7053163
86,491.77 secs

Thus GPU memory allocation appears to be working and the performance is in line (unlike the *MG).
- Be interesting to see what a 560ti runs them in.
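For anyone who wants to check the arithmetic behind those two results, here is a short Python sketch. The numbers are the ones posted above; the variable names are only for this example.

```python
# Run times and memory figures as posted above.
runtime_650ti_s = 108_240.23
runtime_660_s = 86_491.77

mem_used_650ti_mb, mem_total_650ti_mb = 714, 1024
mem_used_660_mb, mem_total_660_mb = 1406, 2048

# How much longer the 650 Ti took, as a percentage of the 660's run time.
slowdown_pct = (runtime_650ti_s / runtime_660_s - 1.0) * 100.0

# Fraction of on-board memory each card was using.
mem_frac_650ti = mem_used_650ti_mb / mem_total_650ti_mb
mem_frac_660 = mem_used_660_mb / mem_total_660_mb

print(f"650 Ti took {slowdown_pct:.1f}% longer than the 660")
print(f"650 Ti memory in use: {mem_frac_650ti:.1%}")
print(f"660 memory in use:    {mem_frac_660:.1%}")
```

The slowdown comes out at roughly 25%, and both cards were using a little under 70% of their memory, which fits the impression that the allocation is scaling with the card rather than hitting a fixed limit.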

Skgiven: I think we all appreciate the great work you moderators are doing; however, I can't help thinking the sentiment on this forum would be a lot more positive if the researchers were a little more proactive in their communication.
All it would take is a small announcement in a dedicated thread when a new type of WU comes online: a very brief summary in layman's terms of what the WU relates to, so that volunteers feel a part of the project rather than just being 'used', together with the expected running time for a benchmark card or two. We are not all running this project for credits!
ID: 31616
Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 31623 - Posted: 18 Jul 2013, 11:03:08 UTC - in response to Message 31614.  

It's always been the case that the more expensive cards are faster and usually more reliable, however last time I looked they were not the best bang for buck. I don't know how the 670's and above are performing relative to the more mid-range cards such as the 650Ti and 660, but I'm seeing similar issues on my 660 and my 660Ti (mostly on Windows). On my Linux rig (650TiBoost) I've had no issues (using 304.88 drivers) but I have not run enough NOELIA WU's to say for sure... The one thing we do know is that 1GB GPU's struggle with WU's that require over 1GB GDDR. That's been noted and is stipulated in the Recommended GPU list, which does get updated when new GPU's arrive, and when we learn the hard way about task requirements.

My conclusion is that it now takes at least a GTX 670 on the Longs in order to avoid the great slowdown we see. Even the 660s with 2 GB memory are not enough; I have learned that the hard way, and am going to put mine on Shorts. There is no point spending 20 hours or more grinding away when they could be doing more productive work. Even the higher-level cards may still have problems, but I think those will be worked out over time.

It is all in the way of scientific progress, which is fine with me, but they could have mentioned it to us.
ID: 31623
John C MacAlister
Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Level
Cys
Message 31625 - Posted: 18 Jul 2013, 11:55:07 UTC - in response to Message 31623.  
Last modified: 18 Jul 2013, 11:55:56 UTC

Hi, Jim:

I agree with your conclusion. With my 650 Tis I will only process shorts from now on. I want to share these with Alzheimer's processing at Folding and this combination works for me.

Regards,

John


It's always been the case that the more expensive cards are faster and usually more reliable, however last time I looked they were not the best bang for buck. I don't know how the 670's and above are performing relative to the more mid-range cards such as the 650Ti and 660, but I'm seeing similar issues on my 660 and my 660Ti (mostly on Windows). On my Linux rig (650TiBoost) I've had no issues (using 304.88 drivers) but I have not run enough NOELIA WU's to say for sure... The one thing we do know is that 1GB GPU's struggle with WU's that require over 1GB GDDR. That's been noted and is stipulated in the Recommended GPU list, which does get updated when new GPU's arrive, and when we learn the hard way about task requirements.

My conclusion is that it now takes at least a GTX 670 on the Longs in order to avoid the great slowdown we see. Even the 660s with 2 GB memory are not enough; I have learned that the hard way, and am going to put mine on Shorts. There is no point spending 20 hours or more grinding away when they could be doing more productive work. Even the higher-level cards may still have problems, but I think those will be worked out over time.

It is all in the way of scientific progress, which is fine with me, but they could have mentioned it to us.
ID: 31625

©2025 Universitat Pompeu Fabra