GPUGRID and Fermi
| Author | Message |
|---|---|
Retvari Zoltan · Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0
|
"The developer version of CUDA 3.2 is presently in Beta and the latest Beta driver supports CUDA 3.2." Actually it's more than a beta, it's a Release Candidate. "...The new Asus ENGTX470 is now in my i7-920 system, along with an older ENGTX470. ... (crunching 6 CPU tasks and 2 Fermi tasks)." You should take into account that an i7-920 actually has 4 cores, and Hyper-Threading doubles them to 8 logical ones. But there is only one FP unit in each core, which is what most projects use. Therefore HT won't double the number of simultaneous FP tasks. That means if you run 4 tasks (using FP) on this i7-920, you'd get the same performance (tasks will finish in half the time, or will do twice as many calculations during the same time - it depends on the project). (Correct me if you have different experience.) Considering how GPUGRID works, the best performance you could achieve would be running 4 CPU tasks and 2 GPUGRID tasks. But I haven't experimented with i7 and GPUGRID. |
|
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0 |
"That means if you run 4 tasks (using FP) on this i7-920, you'd get the same performance (tasks will finish in half the time, or will do twice as many calculations during the same time - it depends on the project). (Correct me if you have different experience.)" To achieve this you'd need either (1) 100% regular code, with no mispredicted branches and no data which must be fetched from caches or memory, or (2) an infinitely large L1 cache serving as main memory with an access latency of 1 cycle. Sadly neither applies to the real world, the CPU often has to wait for something. That's why higher memory speeds, larger and faster caches etc. all help performance. HT can be used to keep a core busy if one thread runs into such a situation, even if both threads would be purely using FP. Personally I have seen very good BOINC performance from i5 / i7 CPUs with HT on. MrS Scanning for our furry friends since Jan 2002 |
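The argument about stalls can be pictured with a toy model. This is a sketch under stated assumptions, not a measurement: it assumes, purely for illustration, that a thread keeps the core's FP unit busy for a fixed share of the time and is stalled (waiting on memory or a mispredicted branch) for the rest, and that two threads on one core interleave perfectly. The names `ht_speedup` and `stall_fraction` are hypothetical.

```python
def ht_speedup(stall_fraction: float) -> float:
    """Idealized throughput gain from running two threads on one core.

    stall_fraction is the assumed share of time a single thread spends
    waiting instead of using the FP unit (a made-up model parameter).
    """
    if not 0.0 <= stall_fraction < 1.0:
        raise ValueError("stall_fraction must be in [0, 1)")
    busy = 1.0 - stall_fraction           # FP unit utilization with one thread
    two_threads = min(1.0, 2.0 * busy)    # the FP unit cannot exceed 100% busy
    return two_threads / busy


if __name__ == "__main__":
    for s in (0.0, 0.1, 0.25, 0.5):
        print(f"stall fraction {s:.2f}: HT speedup x{ht_speedup(s):.2f}")
```

With a stall fraction of zero the model gives no gain at all, which is the "perfectly regular code" case; any real waiting leaves room for the second thread. Real cores share more than the FP unit, so actual gains will differ.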
Retvari Zoltan · Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0
|
"Sadly neither applies to the real world, the CPU often has to wait for something. That's why higher memory speeds, larger and faster caches etc. all help performance. HT can be used to keep a core busy if one thread runs into such a situation, even if both threads would be purely using FP." You're right about that; that's why I wrote "it depends on the project": the more FP utilization, the less gain with HT turned on. Crunching tasks use far more FP than "real world" applications do, much as a benchmark does. Also, an FP operation takes much longer than an integer or data-moving op and uses a different part of the execution unit, so the core can run ahead with code execution and prefetching while the FP operation lasts. When the i7 was released, I was experimenting with rosetta@home (the other project I crunch for), and I made three observations: 1. The Core i7's FP is not any faster than the Core 2 Duo's or Core 2 Quad's at the same clock speed (while integer performance and overall system performance are significantly better, and the i7 is available at higher stock speeds than the C2D or C2Q). 2. HT doesn't double Rosetta's performance, nor does it make a significant improvement (I don't remember the exact numbers, but it was somewhere under 10%). 3. Rosetta's RAC (and BOINC's CPU benchmark results) increase almost in direct proportion to the C2Q's and i7's clock speed, regardless of FSB and RAM speed, cache size or even the motherboard. (I had a Core2Quad 6600 for a long time, starting in a G965 chipset MB, later in a P45 chipset motherboard, and even later I got a C2Q 9550 in another P45 chipset MB. So I came to the conclusion that the tasks are limited only by FP speed.) I'm really curious how much performance gain you can get with HT turned on, and in which project. |
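To put an HT gain of "under 10%" into task terms, here is a minimal sketch. The 3-hour task length and the 10% gain are placeholder inputs, not measurements from this thread, and `compare_ht` is a hypothetical helper. It assumes every physical core runs one task with HT off and two tasks with HT on, and that HT raises per-core throughput by `ht_gain`.

```python
def compare_ht(base_hours: float, cores: int, ht_gain: float) -> dict:
    """Compare HT off (one task per core) with HT on (two tasks per core).

    base_hours: assumed run time of one task with a core to itself
    cores:      number of physical cores
    ht_gain:    assumed per-core throughput gain from HT (0.10 means 10%)
    """
    if base_hours <= 0 or cores <= 0 or ht_gain < 0:
        raise ValueError("base_hours and cores must be positive, ht_gain >= 0")

    # HT off: each task has a physical core to itself.
    off_hours = base_hours
    off_per_day = cores * 24.0 / off_hours

    # HT on: two tasks share a core, so each takes about twice as long,
    # shortened by whatever throughput HT recovers.
    on_hours = 2.0 * base_hours / (1.0 + ht_gain)
    on_per_day = 2 * cores * 24.0 / on_hours

    return {
        "off_hours_per_task": off_hours,
        "off_tasks_per_day": off_per_day,
        "on_hours_per_task": on_hours,
        "on_tasks_per_day": on_per_day,
    }


if __name__ == "__main__":
    result = compare_ht(base_hours=3.0, cores=4, ht_gain=0.10)
    for name, value in result.items():
        print(f"{name}: {value:.2f}")
```

In this model each task takes longer with HT on, yet the host still finishes slightly more tasks per day. With `ht_gain` set to 0, the 4-task and 8-task configurations come out identical in throughput.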
|
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0 |
I haven't turned HT off yet (as there was no reason), but I encourage you to try it with Einstein CPU WUs. Also, back in the SETI classic days the original Northwood with HT gained 50% throughput. Granted, the current app is probably more optimized and will thus see less gain, but it shows the potential of HT for BOINC. And somewhat surprisingly, it didn't matter if I threw some ABC (pure integer) into the mix; the Einstein times did not really change. MrS Scanning for our furry friends since Jan 2002 |
Fred J. Verster · Joined: 1 Apr 09 · Posts: 58 · Credit: 35,833,978 · RAC: 0
|
When looking at the last validated results, I noticed that ACEMD2: GPU molecular dynamics v6.11 (cuda31) errors out (too) often on the 200 series NVidia cards, while the Fermis, in this case a GTX480 and a GTX470 (on an ASUS P5E mobo with a QX9650 CPU), all running 'stock', do the job without errors. 200 vs 400 series. Different CUDA version. Also, I do have 3 cards from the 200 series and older: 8500GT, 8900GTX+ and GTS250. Can I put them to use on GPUGrid, and which app should I use? Knight Who Says Ni N! |