On new fatty WUs

Message boards : News : On new fatty WUs
Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 14 Mar 07
Posts: 1958
Credit: 629,356
RAC: 0
Message 18477 - Posted: 30 Aug 2010, 21:17:46 UTC - in response to Message 18428.  
Last modified: 30 Aug 2010, 21:20:50 UTC

I've completed two new 'fatty' WUs on two different computers.
The first one finished in 9 hours 11 minutes (13.23 ms/step) (GTX 480 and C2Q 9550).
The second one finished in 9 hours 51 minutes (14.2 ms/step) (GTX 480 and C2Q 6600).
I realized how CPU-dependent these WUs are. (I use the SWAN_SYNC=0 setting on both computers.)
Both tasks were 2.5 million steps.
But I don't understand how these could get the same credit as Snow Crash's 1.82-million-step WU, which took only 6 hours 54 minutes (9.1 ms/step) (GTX 480 and i7 920).
An i7 920 is faster than a C2Q 9550, but I don't think by that much (especially since the FP units are almost equally fast).
I also don't understand the different GPU clock rates reported for Snow Crash's task (1.59 GHz, 1.40 GHz, 1.74 GHz, 1.40 GHz, and 1.40 GHz). This computer has only one GPU, so maybe the GPU clock changed while crunching this task and it was restarted 4 times?


This is not 1.8M steps. It's still 2.5M steps, but the last segment, from the final restart until it finished, covered 1.8M steps. (Maybe the message could be clearer.)

gdf
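The ms/step figures in this exchange are just wall-clock runtime divided by step count, so the confusion comes down to which step count you divide by. A minimal Python sketch of that arithmetic; the function names are hypothetical, and the runtimes are the rounded hours/minutes quoted above, so results differ slightly from the logged ms/step values:

```python
def ms_per_step(hours: int, minutes: int, steps: int) -> float:
    """Average wall-clock milliseconds per step over the whole task."""
    seconds = hours * 3600 + minutes * 60
    return seconds * 1000.0 / steps


def runtime_hours(ms_step: float, steps: int) -> float:
    """Wall-clock hours needed for `steps` steps at `ms_step` ms per step."""
    return ms_step * steps / 1000.0 / 3600.0


FATTY_STEPS = 2_500_000  # step count GDF gives for these fatty WUs

if __name__ == "__main__":
    runs = [
        ("GTX 480 + C2Q 9550", 9, 11),
        ("GTX 480 + C2Q 6600", 9, 51),
        ("GTX 480 + i7 920", 6, 54),
    ]
    for label, h, m in runs:
        print(f"{label}: {ms_per_step(h, m, FATTY_STEPS):.2f} ms/step over 2.5M steps")
    # At the logged 9.1 ms/step, the final 1.82M-step segment alone accounts
    # for only part of the 6 h 54 min runtime; the rest went on earlier steps.
    print(f"final 1.82M-step segment at 9.1 ms/step: {runtime_hours(9.1, 1_820_000):.2f} h")
```

Averaged over the full 2.5M steps, the three hosts come out at about 13.22, 14.18 and 9.94 ms/step, and the final segment alone accounts for only about 4.6 h, which fits GDF's explanation that the i7 920 task also ran the full 2.5M steps.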
ID: 18477
MarkJ
Volunteer moderator
Volunteer tester

Joined: 24 Dec 08
Posts: 738
Credit: 200,909,904
RAC: 0
Message 18478 - Posted: 30 Aug 2010, 21:35:01 UTC - in response to Message 18455.  

OK, thank you Mark!

What about OpenCL support, to bring ATI's power to the project?

Here is a post about OpenCL from Timo Strunk (POEM@home developer):


Hi everybody,

So, first of all: we will not use CUDA in our app. OpenCL can do everything CUDA can do, and there's really no need to use it anymore (though that's my opinion). Apart from that, we are working hard to get everything to work in single precision. So far our force fields are exact in single precision, with the exception of our Solvent Accessible Surface Area (SASA) term, which will be deployed on the CPU for the moment. The new SASA force field is already about 7 times faster than the old one on the CPU, though, and soon someone will be working on deploying it on GPUs as well.

So the GPU part will be single precision and OpenCL; there's also no question of whether we release for ATI or NVIDIA first. These two releases will certainly be no more than 1-2 days apart, because the code is the same apart from optimization parameters. Personally, I usually use the ATI Stream SDK on the CPU for debugging.

As to release dates: our estimate that POEM++ would be done by the end of CASP was a bit off, so at the moment we are hesitant to give an immediate release date. The CPU part, however, is done, and we are thinking of releasing it first, because it already gives quite a speedup.

Best,
Timo


The main problem with OpenCL is that it doesn't have the library support and developer tools that CUDA has; CUDA has been around longer. Eventually that will change. There is an FFT library supplied with CUDA that a number of projects make use of. From what I gather, there isn't an equivalent one for OpenCL.

From the project's point of view, OpenCL would be the way to go, as you only need one app (however, you have to compile with each manufacturer's compiler for it to work on that brand). The coding is different, so the apps that have already been developed would need to be rewritten to work under OpenCL (which all takes time and effort).

It will get there eventually; it just takes a long time for OpenCL to catch up to CUDA.
BOINC blog
ID: 18478
Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Message 18479 - Posted: 30 Aug 2010, 21:36:03 UTC - in response to Message 18477.  

This is not 1.8M steps. It's still 2.5M steps, but the last segment, from the final restart until it finished, covered 1.8M steps. (Maybe the message could be clearer.)

gdf


Oh, now it's perfectly clear.
Thank you.
I have overclocked my CPU by 20% since then, and the GPU usage has risen by 10%.
So I guess it would be better to upgrade my CPU (plus motherboard and RAM, of course) to achieve higher GPU usage.
ID: 18479
Speedy

Joined: 19 Aug 07
Posts: 46
Credit: 45,339,082
RAC: 38
Message 18488 - Posted: 1 Sep 2010, 5:07:53 UTC

Have all the fatty WUs been sent out, and how big are the files that get uploaded?
We are talking of 10 nanoseconds of simulation per WU

I don't understand. 10 nanoseconds of simulation per WU isn't very long at all. How come the fatty units are taking twice as long?
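Why 10 ns is a lot of work: molecular dynamics advances in femtosecond-scale steps. Assuming the 10 ns per WU quoted above and the 2.5 million steps per fatty WU mentioned earlier in this thread describe the same tasks (an assumption, not something stated together in one place), the implied timestep is 4 fs. A minimal Python sketch of that arithmetic; the names are illustrative:

```python
from fractions import Fraction

FS_PER_NS = 1_000_000  # 1 ns = 10^6 fs


def timestep_fs(simulated_ns: int, steps: int) -> Fraction:
    """Implied integration timestep in femtoseconds."""
    return Fraction(simulated_ns * FS_PER_NS, steps)


def steps_needed(simulated_ns: int, dt_fs: Fraction) -> int:
    """Number of steps required to cover `simulated_ns` at timestep `dt_fs`."""
    return int(Fraction(simulated_ns * FS_PER_NS) / dt_fs)


dt = timestep_fs(10, 2_500_000)
print(f"timestep: {dt} fs")                      # timestep: 4 fs
print(f"steps per ns: {steps_needed(1, dt):,}")  # steps per ns: 250,000
```

So each simulated nanosecond costs about 250,000 steps; at the 13-14 ms/step reported above, that is close to an hour of GPU time per nanosecond.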
ID: 18488
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 18489 - Posted: 1 Sep 2010, 7:04:09 UTC - in response to Message 18488.  
Last modified: 1 Sep 2010, 7:57:17 UTC

I thought most of the 500 had been sent and returned, but there are a few still available/running.

The tasks are about 2 to 4 times as long as normal tasks, though there is quite a variety of task lengths at the present time.
ID: 18489
Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 14 Mar 07
Posts: 1958
Credit: 629,356
RAC: 0
Message 18490 - Posted: 1 Sep 2010, 9:14:47 UTC - in response to Message 18489.  

Each of those 500 will be reissued 10 times to achieve 100 ns of total simulation time each.

gdf
ID: 18490
ftpd

Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Message 18496 - Posted: 2 Sep 2010, 16:41:02 UTC - in response to Message 18476.  
Last modified: 2 Sep 2010, 16:41:27 UTC

Ignasi,

It has been a few days now, and there is still no answer to this question.

Can you please give an explanation for this error?

Thanks!
Ton (ftpd) Netherlands
ID: 18496
Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 14 Mar 07
Posts: 1958
Credit: 629,356
RAC: 0
Message 18498 - Posted: 2 Sep 2010, 17:39:10 UTC - in response to Message 18496.  

Ignasi was abroad, but what error are you referring to?

gdf
ID: 18498
ftpd

Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Message 18501 - Posted: 3 Sep 2010, 8:41:45 UTC - in response to Message 18498.  

gdf,

Thanks for the reply. See message 18476 from skgiven!


Ton (ftpd) Netherlands
ID: 18501
jjwhalen

Joined: 23 Nov 09
Posts: 29
Credit: 17,591,899
RAC: 0
Message 18523 - Posted: 3 Sep 2010, 23:25:31 UTC

I'm curious. I'm currently crunching TRYP WU 1859786, which I got at the end of the server outage. Its info page says it's not a resend but a first-time download. Is this part of the original batch of 500 or a new batch?

Oh...belated congratulations on getting the servers back up again: you were missed, but I'm sure Collatz appreciated the overflow ;D
ID: 18523
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 18526 - Posted: 4 Sep 2010, 10:23:15 UTC - in response to Message 18523.  

To quote GDF, "Each of those 500 will be reissued 10 times to achieve 100 ns of total simulation time each".

Basically a batch of 500 was created, sent out, returned, and used to build another 500. When these are returned they will create another 500 and so on for 10 times.

This highlights the importance of fast turnaround.
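With this chaining, each trajectory's wall-clock time is the sum of its segments' turnaround times rather than the maximum, because segment k+1 cannot be built until segment k comes back. A minimal Python sketch under that assumption; only the 500 WUs, 10 reissues and 10 ns per WU come from this thread, and the turnaround figures are hypothetical:

```python
WUS_PER_BATCH = 500   # trajectories in the fatty batch
SEGMENTS = 10         # each WU is reissued 10 times
NS_PER_SEGMENT = 10   # ns simulated per WU


def trajectory_ns(segments_done: int) -> int:
    """Simulated time accumulated by one trajectory so far."""
    return segments_done * NS_PER_SEGMENT


def chain_days(turnaround_hours: list[float]) -> float:
    """Wall-clock days for one trajectory: segments run strictly one after another."""
    return sum(turnaround_hours) / 24.0


print(f"per trajectory: {trajectory_ns(SEGMENTS)} ns")
print(f"whole batch: {WUS_PER_BATCH * trajectory_ns(SEGMENTS) / 1000:.0f} µs")

# Hypothetical turnarounds, measured from download to upload of each segment.
fast = [12.0] * SEGMENTS   # returned within 12 h every time
slow = [48.0] * SEGMENTS   # sits in a host's queue for ~2 days every time
print(f"fast hosts: {chain_days(fast):.1f} days, slow hosts: {chain_days(slow):.1f} days")
```

With these made-up turnarounds, the same 100 ns trajectory takes 5 days on fast hosts but 20 days on slow ones, which is why turnaround matters more here than raw speed on any single segment.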
ID: 18526
ftpd

Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Message 18546 - Posted: 6 Sep 2010, 10:53:12 UTC - in response to Message 18476.  

Skgiven & Ignasi,

Again, two WUs were aborted after many hours of processing on the same machine/card.

Can you please take a look and reply?

Thanks


Ton (ftpd) Netherlands
ID: 18546
ignasi

Joined: 10 Apr 08
Posts: 254
Credit: 16,836,000
RAC: 0
Message 18547 - Posted: 6 Sep 2010, 13:03:46 UTC - in response to Message 18546.  

@ftpd

The errors you are referring to do not come exclusively from WUs in the 'fatty' batch. This and this are HIVPR, and only this one is a fatty WU.
Similar errors are reported on 9800 GTs here.

App developers are already aware of it.

thanks
ID: 18547
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 18550 - Posted: 6 Sep 2010, 18:30:08 UTC - in response to Message 18547.  

Motivated to make a systemic suggestion here. Thanks,
ID: 18550
M J Harvey

Joined: 12 Feb 07
Posts: 9
Credit: 0
RAC: 0
Message 18553 - Posted: 6 Sep 2010, 21:00:01 UTC - in response to Message 18501.  

That's a funny error; it looks like a driver bug affecting the 9800 GTX (which is what a GTS 250 is, too). What driver version do you have?

Matt
ID: 18553
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 18554 - Posted: 7 Sep 2010, 7:50:12 UTC - in response to Message 18553.  
Last modified: 7 Sep 2010, 7:58:58 UTC

Well, if it's a driver bug, then the same bug present in driver 19713 is still there in driver 25896, and it affects both the 9800 GT (511 MB) and the 1 GB version of the GTS 250, suggesting it is not related to memory size.
It would be interesting to see whether the bug is also present with the 6.11 tasks, i.e. with the CUDA 3.1-based app rather than the CUDA 2.3 one.
The 3.1 app might be slower on the older cards, but if it produced fewer errors it could be faster overall.

ftpd, at the time of the errors did you have other (non-GPUGrid) GPU tasks running?
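skgiven's point that a slower app could still be faster overall if it errors less can be made concrete with expected throughput. A minimal Python sketch, assuming (pessimistically, in line with the reports here of tasks failing after many hours) that a failed task wastes its full runtime; the runtimes and failure rates are hypothetical, not measurements of the CUDA 2.3 or 3.1 apps:

```python
def valid_tasks_per_day(hours_per_task: float, failure_rate: float) -> float:
    """Expected valid results per day on one GPU.

    Assumes every attempt, failed or not, occupies the GPU for the full
    `hours_per_task`, and that failures happen independently at `failure_rate`.
    """
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    return 24.0 / hours_per_task * (1.0 - failure_rate)


# Hypothetical: the older app takes 10 h per task but fails 30% of the time on
# an affected card; the newer app takes 11 h but fails only 5% of the time.
older = valid_tasks_per_day(10.0, 0.30)
newer = valid_tasks_per_day(11.0, 0.05)
print(f"older app: {older:.2f} valid tasks/day")
print(f"newer app: {newer:.2f} valid tasks/day")
```

With these made-up numbers the newer app yields about 2.07 valid tasks per day against 1.68, despite being slower per task.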
ID: 18554
ftpd

Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Message 18556 - Posted: 7 Sep 2010, 11:23:48 UTC - in response to Message 18554.  

@skgiven,

At that time only GPUGrid was running; the GTS 250 can only do one job.
RNA World was running on the CPU.
Later that week two other WUs were also aborted. HIVPR?

Enough information?

Good luck
Ton (ftpd) Netherlands
ID: 18556
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 18570 - Posted: 8 Sep 2010, 11:16:57 UTC - in response to Message 18556.  
Last modified: 8 Sep 2010, 11:32:55 UTC

Hi Ton,
At that time only GPUGrid was running; the GTS 250 can only do one job.
Thanks. I was just trying to determine whether Collatz or other GPU tasks were overwriting your GPUGrid WU's memory, which can occasionally happen if you run two GPU projects on the same computer. They don't run at the same time, but BOINC can stop one and suspend it in memory to let the other run. Sometimes when this happens it can corrupt the task, but I doubt that is the case here (you only ran one Collatz task on that GPU, probably when there was a task shortage).

RNA World was running on the CPU.
Unless it was running Beta tasks, it would not mess up the GPUGrid task. If you had lots of RNA failures on the same system, then it would be fair to say it could destabilize the system and cause problems for GPU tasks (as they also use the CPU).

Later that week two other WUs were also aborted. HIVPR?
Enough information?
Well, I expect you were not video editing or playing a computer game at the time, or you would have said. So I think so.

I see you are having to abort these tasks (User Aborted), and I can fully understand why: they all crashed on that card. It's a real pain for the cruncher to have to sit over a system like that. I'm not keen on this situation.

As Matt said, there is a driver "related" bug:
(SWAN : FATAL : Failure executing kernel sync [transpose_float2] [700]
Assertion failed: 0, file swanlib_nv.cpp, line 121)

I just wanted to make sure there was not something else. However, the same problem has been raised in two other threads:
http://www.gpugrid.net/forum_thread.php?id=2274
http://www.gpugrid.net/forum_thread.php?id=2278

We can't do any more than ignasi did: inform the developers and throw in a few more suggestions, if only to treat the symptoms.
The researchers have been working on new applications for a while now (5% faster, last I heard). Hopefully they can find a workaround for this, as well as make the app much faster for GTX 460s and similar cards.
ID: 18570
ftpd

Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Message 18571 - Posted: 8 Sep 2010, 12:21:12 UTC - in response to Message 18570.  

@skgiven,

Hi Kev,

I never aborted WUs myself. No games are ever played on this machine,
just Outlook and Internet Explorer.
RNA World: no Beta jobs. There were Linux Beta jobs, with no failures.

Success!
Ton (ftpd) Netherlands
ID: 18571
Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Message 19719 - Posted: 30 Nov 2010, 17:51:08 UTC

I've discovered a new type of fatty WU on one of my computers, called *_IBUCH_1_pYEEI_long_*. It has been running for 4 hours 15 minutes and has completed 55% (GTX 480 @ 800 MHz, 63% GPU usage, SWAN_SYNC=0, C2Q 9550 @ 3.71 GHz).
I'm a little surprised.
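A rough estimate of this task's total runtime from the progress reported above, assuming progress is linear in wall-clock time (BOINC's own estimate may differ, and a change in GPU usage mid-run would break the assumption). A minimal Python sketch with illustrative names:

```python
def projected_minutes(elapsed_min: float, fraction_done: float) -> tuple[float, float]:
    """Return (total, remaining) minutes, extrapolating linearly from progress so far."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    total = elapsed_min / fraction_done
    return total, total - elapsed_min


def as_hm(minutes: float) -> str:
    """Format minutes as 'H h MM min', rounded to the nearest minute."""
    h, m = divmod(round(minutes), 60)
    return f"{h} h {m:02d} min"


total, remaining = projected_minutes(4 * 60 + 15, 0.55)
print(f"projected total: {as_hm(total)}, remaining: {as_hm(remaining)}")
# projected total: 7 h 44 min, remaining: 3 h 29 min
```

Under that linear assumption the IBUCH task would need about 7 h 44 min in total on this host.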
ID: 19719

©2025 Universitat Pompeu Fabra