Message boards : Graphics cards (GPUs) : Run times
**Paul D. Buck** · Joined: 9 Jun 08 · Posts: 1050 · Credit: 37,321,185 · RAC: 0
Has the run time been creeping up while I was not looking? I recall run times of about 6:30 on most of my cards (with some plus or minus slop), but now it seems I am seeing 7:30 to 8:00 run times. I can even see some that have projected run times of over 9 hours...
**GDF** · Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0
There is a new bug in BOINC (6.6.36) which assigns all WUs to the same GPU if you have multiple GPUs. As they time-share it, they take much longer. I am not sure that this is your case.

Can anyone suggest a good BOINC version which is new, but without major flaws?

gdf
**Paul D. Buck** · Joined: 9 Jun 08 · Posts: 1050 · Credit: 37,321,185 · RAC: 0
> There is a new bug in BOINC (6.6.36) which assigns all WUs to the same GPU if you have multiple GPUs. As they time-share it, they take much longer.

GDF,

As far as I know, the bug that assigns all tasks to GPU 0 is Linux-only... which, of course, is what you run... :)

At the moment I am running mostly 6.6.3x versions, but I have been pretty happy with 6.10.3, which does not have the two major issues of 6.10.4 and .5. 6.10.6 fixed a couple of issues but still leaves uncorrected some problems with the order in which it processes GPU tasks for some people (introduced in 6.10.4).

Also note that I think I just uncovered a new bug / situation with task ordering on the GPU with multiple projects that, in essence, will cause Resource Share to be ignored. I do not know how far back in versions this bug extends. For me it is new, in that until now there was no pressure to run multiple projects, for the simple reason that effectively there were no projects to run... Now that we are ramping up more and more projects with GPU capabilities... well...

Anyway, my suggestions are still 6.5.0, 6.6.36, or 6.10.3; and as I said, these are versions I have run extensively or am running now...
**Paul D. Buck** · Joined: 9 Jun 08 · Posts: 1050 · Credit: 37,321,185 · RAC: 0
I just realized that you neatly sidestepped my original question... :)

Are the tasks longer now, or is it my imagination? I am not talking about longer run times caused by bugs but just normal run times...
Joined: 12 Jul 07 · Posts: 100 · Credit: 21,848,502 · RAC: 0
> There is a new bug in BOINC (6.6.36) which assigns all WUs to the same GPU if you have multiple GPUs. As they time-share it, they take much longer.

Aha, that'll be why my GTX295 has started working, albeit slowly.
Joined: 7 Jun 09 · Posts: 40 · Credit: 24,377,383 · RAC: 0
> There is a new bug in BOINC (6.6.36) which assigns all WUs to the same GPU if you have multiple GPUs. As they time-share it, they take much longer.

GPU scheduling seems to be fubar'd for Linux in one way or another with pretty much all releases. There is the everything-gets-assigned-to-'--device 0' bug in the 6.6.3x series (because coproc_cmdline() is called post fork()) and the preempt problems with 6.10.x.

I'm running the 6_6a branch (which is equivalent to an unreleased 6.6.39) plus the following patch (r18836 from trunk), which resolves the '--device 0' issue. It seems pretty solid.
```diff
--- boinc_core_release_6_6_39/client/app_start.cpp.orig  2009-09-15 11:18:45.000000000 +0100
+++ boinc_core_release_6_6_39/client/app_start.cpp       2009-09-15 11:52:34.000000000 +0100
@@ -104,8 +104,10 @@
 }
 #endif
 
-// for apps that use coprocessors, reserve the instances,
-// and append "--device x" to the command line
+// For apps that use coprocessors, reserve the instances,
+// and append "--device x" to the command line.
+// NOTE: on Linux, you must call this before the fork(), not after.
+// Otherwise the reservation is a no-op.
 //
 static void coproc_cmdline(
     COPROC* coproc, ACTIVE_TASK* atp, int ninstances, char* cmdline
@@ -793,6 +795,13 @@
     getcwd(current_dir, sizeof(current_dir));
 
+    sprintf(cmdline, "%s %s",
+        wup->command_line.c_str(), app_version->cmdline
+    );
+    if (coproc_cuda && app_version->ncudas) {
+        coproc_cmdline(coproc_cuda, this, app_version->ncudas, cmdline);
+    }
+
     // Set up core/app shared memory seg if needed
     //
     if (!app_client_shm.shm) {
@@ -924,10 +933,6 @@
         }
     }
 #endif
-    sprintf(cmdline, "%s %s", wup->command_line.c_str(), app_version->cmdline);
-    if (coproc_cuda && app_version->ncudas) {
-        coproc_cmdline(coproc_cuda, this, app_version->ncudas, cmdline);
-    }
     sprintf(buf, "../../%s", exec_path );
     if (g_use_sandbox) {
         char switcher_path[100];
```
Send me a PM if you want a link to the RPMs and SRPM for a Fedora 11 build.
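For readers wondering why calling coproc_cmdline() after the fork() produces the '--device 0' symptom: the reservation ends up recorded only in the child's copy of the client's data structures, and changes made after fork() never propagate back to the parent, so the scheduling process still believes no GPU instance is in use. A minimal, self-contained C++ sketch of that fork() behaviour (the variable name is illustrative, not taken from the BOINC source):

```cpp
// Illustration only (not BOINC code): state changed in the child after fork()
// is invisible to the parent, so a per-GPU "reservation" made there is
// effectively a no-op for the process that schedules tasks.
#include <cstdio>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int reserved_instances = 0;   // hypothetical stand-in for the client's coproc usage count

int main() {
    pid_t pid = fork();
    if (pid == 0) {
        // Child: its address space is a copy; this increment stays local.
        reserved_instances++;
        _exit(0);
    }
    waitpid(pid, nullptr, 0);
    // Parent still sees 0, so every new task would again be handed "--device 0".
    std::printf("parent sees reserved_instances = %d\n", reserved_instances);
    return 0;
}
```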
**Jet** · Joined: 14 Jun 09 · Posts: 25 · Credit: 5,835,455 · RAC: 0
I have to agree with you, Paul. Unfortunately I couldn't find records earlier than 31 July (when the software was updated to CUDA 2.2 capability) to confirm it, but in any case I am sure you are right: WUs have become longer. Going by my impression, new WUs take longer to complete (from 6-6:30 hours up to 7:30+ hours). Previously my station was able to complete at least 3.5-4 WUs per day per GPU; right now I'm happy with 3 WUs per day.

OK, to keep the station more stable I downclocked the GPUs a bit (from 1.63 GHz to 1.57 GHz), but that wasn't the main reason. Running three GTX 260s, damn stable, under BOINC 6.10.0 with the 190.38 driver on Win Server 2008.

BTW, do you run other projects on the CPUs of your farms? If yes, it could be that a neighbouring project running on the CPUs consumes a bit of the CPU power the GPUs need to be served (data feed & output, etc.). That could be one of the reasons as well, I think. Right now I'm checking this on my system.
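Those per-day figures are consistent with the quoted run times, assuming a card crunches around the clock (a rough back-of-the-envelope check, not from the post itself):

$$
\frac{24\ \text{h/day}}{6.5\ \text{h/WU}} \approx 3.7\ \text{WU/day},
\qquad
\frac{24\ \text{h/day}}{8\ \text{h/WU}} = 3\ \text{WU/day}
$$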
**GDF** · Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0
No, WUs are not longer. They are designed to last 1/4 of a day on a fast card.

gdf
**GDF** · Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0
I have updated the recommended client to 6.10.3.

Thanks,
gdf
Joined: 24 Dec 08 · Posts: 738 · Credit: 200,909,904 · RAC: 0
> No, WUs are not longer. They are designed to last 1/4 of a day on a fast card.

Looking through my last lot of results, the shortest seem to be 7 hours 30 mins and the majority seem to be around 8 hours 30 mins. That was taken from the "approx elapsed time" shown in the WU results. These were run on a GTX295 and a GTX275, so by no means slow cards, although they do run at standard speeds. One machine (the GTX275) has BOINC 6.6.37 and the other is currently running 6.10.3 under Windows.

BOINC blog
**Paul D. Buck** · Joined: 9 Jun 08 · Posts: 1050 · Credit: 37,321,185 · RAC: 0
> No, WUs are not longer. They are designed to last 1/4 of a day on a fast card.

Well, your design is bent... I used to get timings in the range of 6 hours and change on my GTX295 cards... sadly I cannot prove this, as the task list is truncated at about the first of September and I am thinking back to much earlier.

If your intent is to run for about 1/4 of a day, or 6 hours, well, you are overshooting that on GTX260, GTX285, and GTX295 cards... the more common time seems to be up at 28,000 seconds rather than down at 21K seconds. This does seem task dependent.

I am only pointing this out because it seems strange that before, most tasks did come in under 7 hours, and now more and more are running up to 9 hours. And you don't seem to be aware of the increase in run times...

A minor point then becomes that you are shading the credit grant... :) But most importantly to me, you are not aware that you are overrunning your execution time targets...

Off to see the football...
**GDF** · Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0
OK, let's say that it is between 1/4 and 1/3 of a day. The calculation is approximate; it is not designed to be exact.

gdf
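For reference, that target range in seconds works out as follows (a straightforward unit conversion, not from the post):

$$
\tfrac{1}{4}\ \text{day} = 6\ \text{h} = 21{,}600\ \text{s},
\qquad
\tfrac{1}{3}\ \text{day} = 8\ \text{h} = 28{,}800\ \text{s}
$$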
**Paul D. Buck** · Joined: 9 Jun 08 · Posts: 1050 · Credit: 37,321,185 · RAC: 0
> OK, let's say that it is between 1/4 and 1/3 of a day.

OK... But I am not sure that you are seeing the point of my question... Are you aware that the time is growing?

The only reason I really noticed it is that for the last couple of months I was not able to pay attention to GPU Grid (notice the lack of posting), and it was a little bit of a shock to see that my run times are almost always over 7 hours now and running as high as 9, where before my run times were very consistently clustered around 6.5 hours...

Not to put too fine a point on it, but if this is the case, the low-end recommendation for hardware needs revision...
Joined: 12 Feb 09 · Posts: 57 · Credit: 23,376,686 · RAC: 0
I had a similar increase in run time just after I upgraded to the 190 drivers. Completely removing them and reinstalling them fixed it for me!
Joined: 18 Aug 08 · Posts: 8 · Credit: 127,707,074 · RAC: 0
The increased time may be due to a bug in the 190.xx drivers, which puts the GPU in 2D mode; more information in this post.
**Paul D. Buck** · Joined: 9 Jun 08 · Posts: 1050 · Credit: 37,321,185 · RAC: 0
> The increased time may be due to a bug in the 190.xx drivers, which puts the GPU in 2D mode; more information in this post.

I could have sworn we were told we needed to update to the 190 series drivers. Did I misunderstand? I mean, I think I have all my systems running the 190.62 drivers now... no, two are on 190.62 and one is on 190.38...

The thing is that since I don't turn my systems off and they run 24/7, I don't see how they would get back into 3D mode if the issue is down-shifting to 2D mode... I would think that once one shifted down it could not, or at least would not, re-adjust up on the next task. That is why I have trouble thinking this is that kind of problem.

I have not done a survey, though my quick look seemed to hint that it is more likely task-type dependent... that is, some of the tasks (by task name class) are now running longer than the norms...
Joined: 11 Dec 08 · Posts: 43 · Credit: 2,216,617 · RAC: 0
With the new Linux CUDA 2.2 application, my work units are running faster and producing more credit per day. I am using the 190.32 Nvidia drivers. When I was running the 190 drivers in Windows, the work units were not running faster, but they were more stable and used less CPU time.
**Jet** · Joined: 14 Jun 09 · Posts: 25 · Credit: 5,835,455 · RAC: 0
I don't think that a sudden switch to 2D mode could be the reason. I run GPU-Z almost all the time to monitor the core/memory frequencies as well as the core temps. All three GTX 260s run at full load. Additionally, the main details, including the core frequency, are shown in the `<stderr_txt>` file. Here is a sample:

```
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1.59 GHz
# Total amount of global memory: 939524096 bytes
# Number of multiprocessors: 27
# Number of cores: 216
# Driver version 2030
# Runtime version 2020
# Device 1: "GeForce GTX 260"
# Clock rate: 1.59 GHz
# Total amount of global memory: 939524096 bytes
# Number of multiprocessors: 27
# Number of cores: 216
# Driver version 2030
# Runtime version 2020
# Device 2: "GeForce GTX 260"
# Clock rate: 1.59 GHz
# Total amount of global memory: 939524096 bytes
# Number of multiprocessors: 27
# Number of cores: 216
# Driver version 2030
# Runtime version 2020
MDIO ERROR: cannot open file "restart.coor"
# Time per step: 51.394 ms
# Approximate elapsed time for entire WU: 32121.105 s
called boinc_finish
</stderr_txt>
```

No sign of a fall to 2D mode.
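For anyone who wants to reproduce that per-device listing outside the application, the fields shown above correspond to what the CUDA runtime reports through cudaGetDeviceProperties. A small host-side C++ sketch follows; the "cores" line assumes 8 CUDA cores per multiprocessor, which holds for compute-capability 1.x parts such as the GTX 260. This is not the GPUGRID application's code, just an illustration of the same query.

```cpp
// Sketch: print the same per-device fields that appear in the stderr output
// above, using the CUDA runtime API from host code.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 1;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        std::printf("# Device %d: \"%s\"\n", dev, prop.name);
        // clockRate is reported in kHz, so divide by 1e6 to get GHz.
        std::printf("# Clock rate: %.2f GHz\n", prop.clockRate / 1.0e6);
        std::printf("# Total amount of global memory: %zu bytes\n",
                    static_cast<size_t>(prop.totalGlobalMem));
        std::printf("# Number of multiprocessors: %d\n", prop.multiProcessorCount);
        // Assumption: 8 cores per multiprocessor on compute-capability 1.x GPUs.
        std::printf("# Number of cores: %d\n", prop.multiProcessorCount * 8);
    }
    return 0;
}
```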
Joined: 5 Jan 09 · Posts: 670 · Credit: 2,498,095,550 · RAC: 0
Another consideration is that the amount of CPU time has risen sharply, which slows down other projects. GPU Grid was my project of choice for the GPU; however, it appears to consume a hefty amount of CPU, more than I would expect, and I am aware it needs to use some CPU time.

Radio Caroline, the world's most famous offshore pirate radio station. Great music since April 1964. Support Radio Caroline Team - Radio Caroline
Joined: 7 Jun 09 · Posts: 40 · Credit: 24,377,383 · RAC: 0
> Another consideration is that the amount of CPU time has risen sharply, which slows down other projects. GPU Grid was my project of choice for the GPU; however, it appears to consume a hefty amount of CPU, more than I would expect, and I am aware it needs to use some CPU time.

I have noticed that v670 of the Linux app uses approx 10% more CPU than v666 used to. I wonder whether that is by design or an unwelcome side effect?
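One way to put a number on that kind of comparison under Linux is to read the cumulative user and system CPU time of the running app from /proc/&lt;pid&gt;/stat. A rough sketch, assuming you look up the app's process ID yourself (e.g. with ps); the field positions are standard procfs, everything else here is illustrative:

```cpp
// Rough sketch: report the CPU time (user + system) consumed so far by a
// process given its PID. Fields 14 and 15 of /proc/<pid>/stat are utime and
// stime, measured in clock ticks.
#include <cstdio>
#include <fstream>
#include <string>
#include <unistd.h>

int main(int argc, char** argv) {
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    std::ifstream stat("/proc/" + std::string(argv[1]) + "/stat");
    std::string field;
    long utime = 0, stime = 0;
    // Note: field 2 (comm) can contain spaces; for typical single-word
    // process names a plain whitespace split is good enough for a sketch.
    for (int i = 1; stat >> field; ++i) {
        if (i == 14) utime = std::stol(field);
        if (i == 15) { stime = std::stol(field); break; }
    }
    double secs = double(utime + stime) / sysconf(_SC_CLK_TCK);
    std::printf("CPU time used so far: %.1f s\n", secs);
    return 0;
}
```

Running it against the same task at two points in time (or against tasks from the two app versions) gives a direct comparison of CPU consumption per unit of elapsed time.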