Problem of misassignment of cuda4.2 vs cuda3.1 tasks
| Author | Message |
|---|---|
GDF Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 |
I have made some changes to the server to add debugging code, along with some other smaller changes. Let me know if you have been given a cuda3.1 workunit and should not have received one. gdf |
|
Joined: 26 Dec 10 Posts: 115 Credit: 416,576,946 RAC: 0
|
Thank you! When should we expect the change to be fully effective? Should we wait a day to make sure any older 3.1 tasks have cleared the queue? This will make many crunchers very happy! Thx - Paul Note: Please don't use driver version 295 or 296! Recommended versions are 266 - 285. |
GDF Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 |
It's in effect now for all new requests. gdf |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 351
|
On a sample of one (http://www.gpugrid.net/results.php?hostid=93580), last week's 3.1 allocation has been replaced by 4.2. It will be interesting to see if this little 420M laptop can complete it within 24 hours. |
GDF Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 |
Good for now. gdf |
|
Joined: 3 Oct 11 Posts: 100 Credit: 5,879,292,399 RAC: 0
|
|
GDF Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 |
The problem seems to be that your machine is marked as unreliable with the cuda4.2 application, so the server decides to give it the cuda3.1 one, with which it is marked as reliable. I'll contact Berkeley about it. gdf |
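The fallback described in the post above can be sketched roughly as follows. This is only an illustration of the behaviour as described, not the actual scheduler code: the `AppVersion` class, its fields, and `pick_app_version` are all hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass
class AppVersion:
    plan_class: str      # e.g. "cuda42" or "cuda31"
    version_rank: int    # higher means newer / normally preferred (assumed)
    host_reliable: bool  # hypothetical flag: is the host reliable with this version?


def pick_app_version(candidates):
    """Prefer the newest version the host is marked reliable with.

    If the host is not reliable with any version, fall back to the
    newest version overall. Returns None when there are no candidates.
    """
    if not candidates:
        return None
    reliable = [v for v in candidates if v.host_reliable]
    pool = reliable if reliable else candidates
    return max(pool, key=lambda v: v.version_rank)


# A host marked unreliable with cuda42 but reliable with cuda31
host_versions = [
    AppVersion("cuda42", version_rank=2, host_reliable=False),
    AppVersion("cuda31", version_rank=1, host_reliable=True),
]
chosen = pick_app_version(host_versions)
```

Under these assumptions `chosen` is the cuda31 version, which would match the mix of tasks reported in this thread: a few early cuda4.2 failures are enough to push a host back onto the older application.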
|
Joined: 3 Oct 11 Posts: 100 Credit: 5,879,292,399 RAC: 0
|
This host also gets 4.2 tasks. All of my GTX5xx hosts get a mixture of tasks. My one GTX680 host gets only 4.2 tasks. |
Stoneageman Joined: 25 May 09 Posts: 224 Credit: 34,057,374,498 RAC: 0
|
Still getting a mix, e.g. http://www.gpugrid.net/results.php?hostid=124305 |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Is a project reset needed following this morning's update? FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
GDF Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 |
It should not be required, but you never know. gdf |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 351
|
"The problem seems to be that your machine is marked as unreliable with the cuda4.2 application, so the server decides to give the cuda3.1 one which is reliable." Could this be the result of the high error count with "ERROR: file deven.cpp line 1106: # Energies have become nan", which some people got with the cuda4.2 app? I had several myself with my GTX 470 (host 43404). That's not a good host to generalise from, because I run it under app_info.xml, but in case it helps, here are my observations. For over 3 months, I was running the cuda3.1 app with a count of 0.5, with tasks from other projects running alongside GPUGrid on the same GPU (see thread 2897). A few tasks failed, but no more than usual. Then I swapped to cuda4.2 in the same configuration. The failure rate soared - to over 50%, by eye - and all errors were of the type 'Energies have become nan'. Finally, I set count=1 in app_info (so that GPUGrid has sole use of the GPU while running, although it is swapped out periodically so other projects can run). Since making that change, I haven't had a single error. So, perhaps, other apps in GPU memory cause a problem? I see someone else was talking about memory being a possible suspect in the news threads. All of which leads me to suspect a buffer overflow, or use of uninitialised memory, in the cuda4.2 app. I recently helped a developer on another project pin down an error which was causing invalid data to be processed. His comments after he'd found the bug were: "I recall I always got some junk at the end of arrays (array size can be any but processing is vectorized to float4) ...." The test which let us track that one down was: "If the host is regularly producing errors, perform a complete cold restart (to zero GPU RAM), and then allow tasks to run while avoiding any application which might load large amounts of data into VRAM" - so no games, video playback, photo editing etc. If the errors go away when VRAM is kept 'clean', that might be a pointer. |
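To illustrate the kind of bug suspected above, here is a minimal sketch in plain Python of how float4-style processing can pick up junk from the end of an array. It is not taken from the GPUGrid application or from the other project's code. `LANES`, `make_buffer`, and `vector_sum` are hypothetical names, and NaN is used only as a stand-in for whatever leftover data might sit in uninitialised memory.

```python
import math

LANES = 4  # float4: four floats are processed per vector step


def make_buffer(values, fill):
    """Round the buffer length up to a multiple of LANES.

    `fill` models whatever happens to sit in the unused tail elements.
    """
    padded_len = -(-len(values) // LANES) * LANES  # ceiling to a multiple of LANES
    return list(values) + [fill] * (padded_len - len(values))


def vector_sum(buffer):
    """Sum the buffer LANES elements at a time, tail included."""
    total = 0.0
    for i in range(0, len(buffer), LANES):
        total += sum(buffer[i:i + LANES])
    return total


energies = [1.0, 2.0, 3.0, 4.0, 5.0]         # 5 values, padded to 8 slots
dirty = make_buffer(energies, float("nan"))  # tail holds leftover junk
clean = make_buffer(energies, 0.0)           # tail explicitly zeroed

dirty_total = vector_sum(dirty)  # junk in the tail poisons the result
clean_total = vector_sum(clean)  # zeroed tail leaves the sum correct
```

If something like this were the cause, a kernel that reads the padded tail would only fail when that memory holds stale data, which would fit errors that disappear after a cold restart or when nothing else shares the GPU. That remains speculation about the cuda4.2 app, not a diagnosis.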
|
Joined: 21 Dec 08 Posts: 51 Credit: 26,320,167 RAC: 0
|
I got this error a few times; I solved it by raising the voltage a bit. Not overclocking as much would probably help too. |
Marty Joined: 8 Nov 08 Posts: 3 Credit: 241,804,865 RAC: 0
|
This host is also getting a mix of cuda31 and cuda42 tasks. It hasn't had an error since I installed the GTX560 in it and started running GPUGRID again. |
Stoneageman Joined: 25 May 09 Posts: 224 Credit: 34,057,374,498 RAC: 0
|
Not had a 3.1 task since my last post, so looking promising. |
GDF Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 |
We have now implemented a correction in the scheduler suggested by David A., which according to him should fix the problem. Let me know. gdf |
GDF Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 |
Any comment? Is the problem solved? gdf |
|
Joined: 3 Oct 11 Posts: 100 Credit: 5,879,292,399 RAC: 0
|
Just checked. Looks good. No new mixed tasks for me. |
dskagcommunity Joined: 28 Apr 11 Posts: 463 Credit: 958,266,958 RAC: 34
|
3 Jul 2012 | 16:41:51 UTC. That's the date my last 31 task was sent. It's after your 10 o'clock. But I must wait for more WUs; the current one is 42, but this means nothing ^^ The GTX 285 is slowing barely down on 42 apps, so I need more time to wait :/ DSKAG Austria Research Team: http://www.research.dskag.at
|
|
Joined: 6 Sep 10 Posts: 8 Credit: 3,478,997,495 RAC: 74
|
This computer has not received any cuda 4.2 work units since I updated the driver on 6/30/2012. The last one, downloaded a few minutes ago, was also cuda 3.1. Any suggestions? http://www.gpugrid.net/show_host_detail.php?hostid=79921 Thanks, Jim |
©2025 Universitat Pompeu Fabra