GPUGRID and Fermi

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Message 18771 - Posted: 27 Sep 2010, 16:55:21 UTC - in response to Message 18722.  

The developer version of CUDA 3.2 is presently in Beta and the latest Beta driver supports CUDA 3.2.
Hopefully by the end of the month these will bring further improvements, particularly to GTX460’s.

Actually it's more than a beta, it's a Release Candidate.

...The new Asus ENGTX470 is now in my i7-920 system, along with an older ENGTX470. ... (crunching 6 CPU tasks and 2 Fermi tasks).

You should take into account that an i7-920 actually has 4 physical cores, and Hyper-Threading presents them as 8 logical cores. But there is only one FP unit in each core, and that is what most projects use. Therefore HT won't double the number of simultaneous FP tasks. That means if you run 4 tasks (using FP) on this i7-920, you'd get the same performance (tasks will finish in half the time, or will do twice as many calculations in the same time; it depends on the project). (Correct me if your experience differs.) Considering how GPUGRID works, the best performance you could achieve would be running 4 CPU tasks and 2 GPUGRID tasks. But I haven't experimented with an i7 and GPUGRID.
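To put rough numbers on the argument above, here is a minimal Python sketch. It assumes the FP unit is the only bottleneck unless an `ht_gain` is given; the function names, the `ht_gain` parameter, and the one-hour task length are all made up for illustration, not measurements from any project.

```python
def per_task_time(base_time, tasks, cores, ht_gain=0.0):
    """Wall-clock time of one task when `tasks` identical tasks share `cores` physical cores.

    base_time: seconds one task needs when it has a core to itself.
    ht_gain:   assumed extra throughput a core gets from running two threads
               (0.0 models the "FP unit is the only bottleneck" claim).
    """
    threads_per_core = tasks / cores
    if threads_per_core <= 1:
        return base_time
    core_throughput = 1.0 + ht_gain  # relative to one thread per core
    return base_time * threads_per_core / core_throughput


def tasks_per_hour(base_time, tasks, cores, ht_gain=0.0):
    """Total completed tasks per hour for the whole CPU."""
    return tasks * 3600.0 / per_task_time(base_time, tasks, cores, ht_gain)


if __name__ == "__main__":
    # Hypothetical 1-hour task on a 4-core CPU.
    print(tasks_per_hour(3600, 4, 4))        # 4 tasks, HT unused
    print(tasks_per_hour(3600, 8, 4))        # 8 tasks, no HT benefit assumed
    print(tasks_per_hour(3600, 8, 4, 0.1))   # 8 tasks, assumed 10% HT benefit
```

With `ht_gain=0.0` the 4-task and 8-task cases give the same hourly throughput, which is the claim in words; any real gain from HT shows up only through a nonzero `ht_gain`.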
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 18772 - Posted: 27 Sep 2010, 19:54:21 UTC - in response to Message 18771.  

That means if you run 4 tasks (using FP) on this i7-920, you'd get the same performance (tasks will finish in half the time, or will do twice as many calculations in the same time; it depends on the project). (Correct me if your experience differs.)


To achieve this you'd need either
- 100% regular code: no mispredicted branches, no data which must be fetched from caches or memory
- an infinitely large L1 cache as main memory with an access latency of 1 cycle

Sadly neither applies to the real world: the CPU often has to wait for something. That's why higher memory speeds, larger and faster caches etc. all help performance. HT can be used to keep a core busy if one thread runs into such a situation, even if both threads use purely FP.
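A toy model of that effect, as a hedged Python sketch: assume each thread is stalled (waiting on memory, a mispredicted branch, etc.) for a fixed fraction of cycles, that stalls of the two threads are independent, and that the core does useful work whenever at least one thread is not stalled. These assumptions and the names `core_utilization` and `ht_gain` are illustrative only; real CPUs are far more complicated.

```python
def core_utilization(stall_fraction, threads):
    """Fraction of cycles in which at least one of `threads` threads can issue work.

    Assumes each thread is independently stalled with probability `stall_fraction`.
    """
    if not 0.0 <= stall_fraction < 1.0:
        raise ValueError("stall_fraction must be in [0, 1)")
    return 1.0 - stall_fraction ** threads


def ht_gain(stall_fraction):
    """Relative throughput gain of two threads per core over one, under this toy model."""
    one = core_utilization(stall_fraction, 1)
    two = core_utilization(stall_fraction, 2)
    return two / one - 1.0


if __name__ == "__main__":
    for s in (0.0, 0.1, 0.3, 0.5):
        print(f"stall fraction {s:.1f}: HT gain {ht_gain(s):.0%}")
```

In this model the gain works out to the stall fraction itself, so code that never stalls gains nothing from HT, and the more a thread waits, the more a second thread can fill in.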

Personally I have seen very good BOINC performance of i5 / i7 CPUs with HT on.

MrS
Scanning for our furry friends since Jan 2002
Retvari Zoltan
Message 18773 - Posted: 27 Sep 2010, 22:05:15 UTC - in response to Message 18772.  
Last modified: 27 Sep 2010, 22:25:23 UTC

Sadly neither applies to the real world: the CPU often has to wait for something. That's why higher memory speeds, larger and faster caches etc. all help performance. HT can be used to keep a core busy if one thread runs into such a situation, even if both threads use purely FP.

Personally I have seen very good BOINC performance of i5 / i7 CPUs with HT on.

MrS

You're right about that; that's why I wrote "it depends on the project": the more FP utilization, the less gain with HT turned on. Crunching tasks use far more FP than "real world" applications, more like a benchmark does. Also, an FP operation takes much longer than an integer or data-moving op and uses a different part of the execution unit, so the core can run ahead with code execution and prefetching while the FP operation is in flight.
When the i7 was released, I was experimenting with Rosetta@home (the other project I crunch for), and I made three observations:
1. The Core i7's FP is not any faster than the Core 2 Duo's or Core 2 Quad's at the same clock speed (while integer performance and overall system performance are significantly better, and the i7 is available at higher stock speeds than the C2D or C2Q).
2. HT doesn't double Rosetta's performance, nor does it make a significant improvement (I don't remember the exact numbers, but the gain was under about 10%).
3. Rosetta's RAC (and BOINC's CPU benchmark results) increases almost in direct proportion to the C2Q's and i7's clock speed, regardless of FSB and RAM speed, cache size, or even the motherboard. (I had a Core2Quad 6600 for a long time, first in a G965 chipset MB, later in a P45 chipset MB, and even later I got a C2Q 9550 in another P45 chipset MB. So I came to the conclusion that the tasks are limited only by FP speed.)
I'm really curious how much performance gain you can have with HT turned on and in which project.
ExtraTerrestrial Apes
Message 18780 - Posted: 29 Sep 2010, 8:20:58 UTC - in response to Message 18773.  

I haven't turned HT off yet (as there was no reason to), but I encourage you to try it with Einstein CPU WUs. Also, back in the SETI classic days the original Northwood with HT gained 50% throughput. Granted, the current app is probably more optimized and thus sees less gain, but it shows the potential of HT for BOINC.
And somewhat surprisingly, it didn't matter if I threw some ABC (pure integer) into the mix: the Einstein times did not really change.

MrS
Scanning for our furry friends since Jan 2002
Fred J. Verster
Joined: 1 Apr 09
Posts: 58
Credit: 35,833,978
RAC: 0
Message 18782 - Posted: 29 Sep 2010, 11:02:32 UTC - in response to Message 18780.  
Last modified: 29 Sep 2010, 11:03:23 UTC

When looking at the latest validated results, I noticed that the ACEMD2: GPU molecular dynamics v6.11 (cuda31) app (too) often errors out on the 200 series NVIDIA cards, while the Fermi cards, in this case a GTX480 & GTX470 (on an ASUS P5E mobo with a QX9650 CPU), all running at stock, do the job without errors.

200 vs 400 series

Different CUDA version.
Also .

I do have 3 cards from the 200 series and older: an 8500GT, 8900GTX+ and GTS250. Can I put them to use on GPUGRID, and which app should I use?

Knight Who Says Ni N!

©2026 Universitat Pompeu Fabra