GPUGRID and Fermi

Author	Message
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0	Message 18771 - Posted: 27 Sep 2010, 16:55:21 UTC - in response to Message 18722.

"The developer version of CUDA 3.2 is presently in Beta and the latest Beta driver supports CUDA 3.2. Hopefully by the end of the month these will bring further improvements, particularly to GTX460’s."

Actually it's more than a beta, it's a Release Candidate.

"...The new Asus ENGTX470 is now in my i7-920 system, along with an older ENGTX470. ... (crunching 6 CPU tasks and 2 Fermi tasks)."

You should take into account that an i7-920 actually has 4 cores, and Hyper-Threading doubles them. But there is only one FP unit in each core, which is what most projects use. Therefore HT won't double the number of simultaneous FP tasks. That means if you run 4 tasks (using FP) on this i7-920, you'd get the same performance as with 8 (tasks will finish in half the time, or will do twice as many calculations during the same time - it depends on the project). (Correct me if you have different experience.) Considering how GPUGRID works, the best performance you could achieve would be running 4 CPU tasks and 2 GPUGRID tasks. But I haven't experimented with an i7 and GPUGRID.

ID: 18771

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0	Message 18772 - Posted: 27 Sep 2010, 19:54:21 UTC - in response to Message 18771.

"That means if you run 4 tasks (using FP) on this i7-920, you'd get the same performance as with 8 (tasks will finish in half the time, or will do twice as many calculations during the same time - it depends on the project). (Correct me if you have different experience.)"

To achieve this you'd need either
- 100% regular code: no mispredicted branches, no data which must be fetched from caches or memory
- an infinitely large L1 cache as main memory with an access latency of 1 cycle

Sadly neither applies to the real world; the CPU often has to wait for something. That's why higher memory speeds, larger and faster caches etc. all help performance. HT can be used to keep a core busy if one thread runs into such a situation, even if both threads are purely using FP.

Personally I have seen very good BOINC performance of i5 / i7 CPUs with HT on.

MrS
Scanning for our furry friends since Jan 2002

ID: 18772
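The stall argument above can be put into a toy model. The sketch below is only an illustration under stated assumptions: a single thread keeps a core's FP unit busy for a fraction `busy_fraction` of the cycles, and two HT threads on the same core stall independently of each other. The function name and the independence assumption are hypothetical and not taken from any measurement in this thread.

```python
def ht_throughput_gain(busy_fraction: float) -> float:
    """Toy estimate of the throughput gain from running two threads per core.

    busy_fraction: share of cycles in which ONE thread keeps the FP unit busy
                   (1.0 means the thread never stalls).
    Assumption (not from the thread): the two threads stall independently,
    so the unit is idle only when both threads are stalled at once.
    """
    if not 0.0 < busy_fraction <= 1.0:
        raise ValueError("busy_fraction must be in (0, 1]")
    idle = 1.0 - busy_fraction
    # Probability that at least one of the two threads has work for the unit.
    combined_busy = 1.0 - idle * idle
    # Gain relative to running a single thread on that core.
    return combined_busy / busy_fraction


if __name__ == "__main__":
    for u in (1.0, 0.9, 0.7, 0.5):
        print(f"busy fraction {u:.1f}: estimated HT gain x{ht_throughput_gain(u):.2f}")
```

With `busy_fraction = 1.0` (pure FP work that never waits) the model predicts no gain at all, which is the case described in Message 18771. Any stalling lowers the busy fraction and leaves room for the second thread, which is the point made in Message 18772.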

Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0	Message 18773 - Posted: 27 Sep 2010, 22:05:15 UTC - in response to Message 18772. Last modified: 27 Sep 2010, 22:25:23 UTC

"Sadly neither applies to the real world, the CPU often has to wait for something. That's why higher memory speeds, larger and faster caches etc. all help performance. HT can be used to keep a core busy if one thread runs into such a situation, even if both threads would be purely using FP. Personally I have seen very good BOINC performance of i5 / i7 CPUs with HT on. MrS"

You're right about that; that's why I wrote "it depends on the project": the more FP utilization, the less gain with HT turned on. Crunching tasks use far more FP than "real world" applications; they behave more like a benchmark. Also, an FP operation takes much longer than an integer or data-moving op and uses a different part of the execution unit, so the core can run ahead with code execution and prefetch while the FP operation lasts.

When the i7 was released, I was experimenting with rosetta@home (the other project I crunch for), and I made three observations:
1. Core i7's FP is not any faster than the Core2duo's or Core2quad's at the same clock speed (while integer performance and overall system performance is significantly better, and i7 is available at higher stock speeds than C2D or C2Q)
2. HT doesn't double Rosetta's performance, nor does it make a significant improvement (I don't remember the exact numbers, but it was under about 10%)
3. Rosetta's RAC (and BOINC's CPU benchmark results) increase almost in direct proportion to the C2Q's and i7's clock speed, regardless of FSB and RAM speed, cache size or even the motherboard. (I had a Core2Quad 6600 for a long time, first in a G965 chipset MB, later in a P45 chipset motherboard, and even later I got a C2Q 9550 in another P45 chipset MB. So I came to the conclusion that the tasks are limited only by FP speed.)

I'm really curious how much performance gain you can get with HT turned on, and in which project.

ID: 18773

ExtraTerrestrial Apes Volunteer moderator Volunteer tester Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0	Message 18780 - Posted: 29 Sep 2010, 8:20:58 UTC - in response to Message 18773.

I haven't turned HT off yet (as there was no reason), but I encourage you to try it with Einstein CPU WUs. Also, back in the SETI classic days the original Northwood with HT gained 50% throughput. Granted, the current app is probably more optimized and will thus see less gain, but it shows the potential of HT for BOINC.

And somewhat surprisingly: it didn't matter if I threw some ABC (pure integer) into the mix, the Einstein times did not really change.

MrS
Scanning for our furry friends since Jan 2002

ID: 18780

Fred J. Verster Joined: 1 Apr 09 Posts: 58 Credit: 35,833,978 RAC: 0	Message 18782 - Posted: 29 Sep 2010, 11:02:32 UTC - in response to Message 18780. Last modified: 29 Sep 2010, 11:03:23 UTC

When looking at the last validated results, I noticed that ACEMD2: GPU molecular dynamics v6.11 (cuda31) errors out (too) often on the 200 series NVidia cards, while the Fermi cards, in this case a GTX480 & GTX470 (on an ASUS P5E Mobo and QX9650 CPU), all running 'stock', do the job without errors.

200 vs 400 series. Different CUDA version.

Also, I do have 3 cards from the 200 series and older: 8500GT, 8900GTX+ and GTS250. Can I put them to use on GPUGrid, and which app should I use?

Knight Who Says Ni N!

ID: 18782