Quadro 1700 Discrepancy
| Author | Message |
|---|---|
|
Joined: 15 Jan 09 Posts: 7 Credit: 1,766,054 RAC: 0
|
Hi all, I've got a peculiar issue here. I'm running 4 x Quadro 1700 cards on 4 x almost identical systems, yet only 2 of my boxes regularly report GG WUs on time, while the other two always fail to finish by the deadline. All 4 cards are identical in manufacture/make. The CPUs are all Core 2 Duos (some faster than others) with the same amount of RAM. I'm also using the same NVIDIA drivers (latest) and BOINC 6.6.15. Any reason why this might be the case? Cheers |
|
Joined: 21 Oct 08 Posts: 144 Credit: 2,973,555 RAC: 0
|
Everything looks identical except for the OS. All the machines with these cards run XP 32-bit, but there is one difference on the machines whose cards aren't working. They list the OS as: Microsoft Windows XP Professional x86 Edition, Service Pack 3, v.3311, (05.01.2600.00). The v.3311 portion is not in the OS listing for the machines whose cards are working. Indeed, the working cards are doing the same work about 20 hours or more faster than the cards in the machines with the v.3311 addition. This is the only difference I can see that might be causing some slowdown in the cards. What is the difference in the XP installs? |
Bender10 Joined: 3 Dec 07 Posts: 167 Credit: 8,368,897 RAC: 0
|
"Microsoft Windows XP" This looks like a new build or Release Candidate for XP SP3... maybe. Did you just patch your XP boxes? Consciousness: That annoying time between naps...... Experience is a wonderful thing: it enables you to recognize a mistake every time you repeat it. |
|
Joined: 15 Jan 09 Posts: 7 Credit: 1,766,054 RAC: 0
|
Good pickup, fellas. I totally forgot to check the service pack numbers, and never thought they would make a 20-hour difference per WU. I will update the boxes that have the RC version of SP3 to the proper release and see what happens. Cheers |
Michael Goetz Joined: 2 Mar 09 Posts: 124 Credit: 124,873,744 RAC: 0
|
Another thing to check is what those computers are doing ('real' work, not BOINC). While the configurations might be the same, if some boxes are heavily loaded and others aren't, that will drastically affect the run time, since BOINC tasks run at the lowest priority. The CPU needs to funnel data to the GPU, so if the CPU isn't running GG because something else is running instead, the GPU will get starved for work, resulting in longer run times. Mike |
|
Joined: 15 Jan 09 Posts: 7 Credit: 1,766,054 RAC: 0
|
Well, all 4 are running the same BOINC projects and nothing else, i.e. they are dedicated crunchers. Another peculiar thing is that the 2 comps that are finishing WUs on time are using 0.03% cpu + 1 cuda, whereas the 2 slow comps are using 0.02% cpu + 1 cuda. Is there any way to force the two slow ones to use the extra 0.01 cpu? |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
"another peculiar thing is that the 2 comps that are finishing wus on time are using 0.03% cpu + 1 cuda, whereas the 2 slow comps are using 0.02% cpu + 1 cuda." Not sure why the numbers would be different, but they should not make any difference, i.e. they don't affect how the BOINC client runs the apps. Interesting: on my machine with 6.5.0 it shows 0.14 CPU. MrS Scanning for our furry friends since Jan 2002 |
|
Joined: 1 Feb 09 Posts: 139 Credit: 575,023 RAC: 0
|
Can you check with CPU-Z what sort of CPU they have, and especially look at the instruction set extensions like MMX, SSE, SSE2, SSE3 and SSSE3? The more of them, the faster. I get the feeling that even though huge amounts are calculated on the GPU, some parts are still done by the CPU before being fed to the GPU, so those extensions can make a huge difference. That's my 2 cents :D |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
"I get the feeling that even though huge amounts are calculated on the gpu some parts still are done by the cpu before being fed to the gpu." These instruction set extensions are no magic bullets. You'd need to carefully hand-optimize and vectorize your code, something compilers are not very good at and which is not always possible. Or use libraries where other people already did that job for you. Anyway, the CPU part of GPU-Grid has so far been very insensitive to CPU speed. And his cards are not very fast (-> need less CPU), plus his CPUs are fast: they're all C2Ds with at least SSE3. Thinking outside the box is nice, but I'd be very surprised if you found the right direction here ;) It looks more like a software problem; maybe some debug options were enabled in that pre-release SP3. MrS Scanning for our furry friends since Jan 2002 |
Zydor Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0
|
Hi all, The reason for the difference is that you don't have identical CPUs; they are markedly different. There are two Core2 Duo E-series and two Core2 6400s:
Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz [x86 Family 6 Model 15 Stepping 11] 5640.97 million ops/sec
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz [x86 Family 6 Model 23 Stepping 6] 6804.65 million ops/sec
Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz [x86 Family 6 Model 15 Stepping 6] 4690.81 million ops/sec
Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz [x86 Family 6 Model 15 Stepping 2] 4623.15 million ops/sec
The integer processing speed is the key. If you look at the slower results, they are from the boxes with the slower integer ratings, which is "normal" in that integer processing is the key part of crunching. They also don't have the same RAM: two have 3 GB and two have 2 GB. Regards Zy |
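To put the benchmark figures quoted above side by side, here is a minimal Python sketch that expresses each host's integer rating as a multiple of the slowest one. The host labels and the `relative_speeds` helper are made up for illustration; only the million ops/sec numbers come from the post. It says nothing about whether this spread actually affects GPU task run times, which is disputed further down the thread.

```python
# Integer benchmark figures (million ops/sec) as quoted in the post above.
# The dictionary keys are informal labels chosen for this sketch.
benchmarks = {
    "Core2 Duo E6750 @ 2.66GHz": 5640.97,
    "Core2 Duo E8500 @ 3.16GHz": 6804.65,
    "Core2 6400 @ 2.13GHz (stepping 6)": 4690.81,
    "Core2 6400 @ 2.13GHz (stepping 2)": 4623.15,
}


def relative_speeds(figures):
    """Return each host's benchmark as a ratio of the slowest host."""
    slowest = min(figures.values())
    return {name: value / slowest for name, value in figures.items()}


ratios = relative_speeds(benchmarks)

# Print hosts from slowest to fastest with their relative rating.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.2f}x")
```

On these numbers the fastest CPU rates a little under 1.5 times the slowest, and the two 6400s are within about 2% of each other.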
Zydor Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0
|
Looked a bit closer at the results. The slower boxes do crunch slower, as expected; they do, however, complete in time. Looking across all four boxes, there are a large number of User Aborts after a few hours / after 24 hours, particularly on the faster machines, paradoxically. What issues have you faced that caused the User Aborts well before the 4 day completion time? Another thought: have you been overclocking any of them, particularly the two faster boxes? There are a significant number of compute errors well before the 4 day completion time. Are the boxes used for other tasks that have, maybe, crashed, taking the GPU WU with them? You can get compute errors when the BOINC Manager is closed down before stopping the local host link (e.g. closing down or rebooting the PC without properly closing BOINC/Models). What version of BOINC Manager do you use? Regards Zy |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
"The reason for the difference is you don't have identical CPUs, they are markedly different." How do you know that? Sorry, but this sounds just plain wrong. Even on GTX 280-class cards the CPU speed does not matter (within sane limits), and those cards need the CPU every ~25 ms. His cards have times per step of 400 - 500 ms; that's one CPU interaction every half second! "They also don't have the same RAM - two have 3 GB and two have 2 GB" The amount of memory is only important if you run out of it. On my machine the acemd task uses 24 MB of main memory. How likely is it that these 24 MB are just too much for a machine with 3 GBs and Windows cannot swap anything else to disk? Just let the OP sort out this software question first, i.e. install the proper SP3. Edit: as Zydor just mentioned clock speeds: I've seen 400 MHz shader clock in one result instead of the usual 900 MHz. That would cause a significant slow-down. MrS Scanning for our furry friends since Jan 2002 |
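The arithmetic behind the reply above can be sketched in a few lines of Python. The helper names are made up for illustration, 1 GB is taken as 1024 MB, and the clock ratio assumes run time scales inversely with shader clock, which is a simplification rather than something the thread establishes.

```python
def interactions_per_second(step_ms):
    """CPU-GPU interactions per second if the CPU is needed once per step."""
    return 1000.0 / step_ms


def memory_share(task_mb, total_gb):
    """Fraction of main memory used by the task (assumes 1 GB = 1024 MB)."""
    return task_mb / (total_gb * 1024.0)


def clock_slowdown(normal_mhz, observed_mhz):
    """Slowdown factor, ASSUMING run time is inversely proportional to clock."""
    return normal_mhz / observed_mhz


# Figures quoted in the post: ~25 ms per step on GTX 280-class cards,
# 400 - 500 ms per step on the Quadro cards.
print(f"GTX 280-class: {interactions_per_second(25):.0f} interactions/s")
print(f"Quadro (450 ms midpoint): {interactions_per_second(450):.1f} interactions/s")

# 24 MB task on the smaller (2 GB) and larger (3 GB) machines.
print(f"Memory share on 2 GB: {memory_share(24, 2):.1%}")
print(f"Memory share on 3 GB: {memory_share(24, 3):.1%}")

# 400 MHz shader clock seen in one result versus the usual 900 MHz.
print(f"Shader clock slowdown: {clock_slowdown(900, 400):.2f}x")
```

Under those assumptions the Quadro cards call on the CPU roughly 2 times per second against about 40 for a GTX 280-class card, the task occupies around 1% of main memory, and a 400 MHz shader clock would more than double the run time.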
|
Joined: 21 Oct 08 Posts: 144 Credit: 2,973,555 RAC: 0
|
|
©2025 Universitat Pompeu Fabra