Message boards : Graphics cards (GPUs) : Run times
**Paul D. Buck** · Joined: 9 Jun 08 · Posts: 1050 · Credit: 37,321,185 · RAC: 0
Has the run time been creeping up while I was not looking? I recall run times of about 6:30 on most of my cards (with some plus or minus slop), but now it seems I am seeing 7:30 to 8:00 run times. I can even see some that have projected run times of over 9 hours...
**GDF** · Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0
There is a new bug in BOINC (6.6.36) which assigns all WUs to the same GPU if you have multiple GPUs. As they time-share it, they take much longer. I am not sure that this is your case.

Can anyone suggest a good BOINC version which is new, but without major flaws?

gdf
**Paul D. Buck** · Joined: 9 Jun 08 · Posts: 1050 · Credit: 37,321,185 · RAC: 0
> There is a new bug in BOINC (6.6.36) which assigns all WUs to the same GPU if you have multiple GPUs. As they time-share it, they take much longer.

GDF,

As far as I know, the bug that assigns all tasks to GPU 0 is Linux-only... which, of course, is what you run... :)

At the moment I am running mostly 6.6.3x versions, but I have been pretty happy with 6.10.3, which does not have the two major issues of 6.10.4 and .5. 6.10.6 fixed a couple of issues but still leaves uncorrected some problems with the order in which it processes GPU tasks for some people (introduced in 6.10.4).

Also note that I think I just uncovered a new bug / situation with task ordering on the GPU with multiple projects that, in essence, will cause Resource Share to be ignored. I do not know how far back in versions this bug extends. For me it is new, in that until now there was no pressure to run multiple projects, for the simple reason that effectively there were no projects to run... Now that we are ramping up more and more projects with GPU capabilities... well...

Anyway, my suggestions are still 6.5.0, 6.6.36, or 6.10.3; and as I said, these are versions I have run extensively or am running now...
**Paul D. Buck** · Joined: 9 Jun 08 · Posts: 1050 · Credit: 37,321,185 · RAC: 0
I just realized that you neatly sidestepped my original question... :)

Are the tasks longer now, or is it my imagination? I am not talking about longer run times caused by bugs but just normal run times...
Joined: 12 Jul 07 · Posts: 100 · Credit: 21,848,502 · RAC: 0
> There is a new bug in BOINC (6.6.36) which assigns all WUs to the same GPU if you have multiple GPUs. As they time-share it, they take much longer.

Aha, that'll be why my GTX295 has started working, albeit slowly.
Joined: 7 Jun 09 · Posts: 40 · Credit: 24,377,383 · RAC: 0
> There is a new bug in BOINC (6.6.36) which assigns all WUs to the same GPU if you have multiple GPUs. As they time-share it, they take much longer.

GPU scheduling seems to be fubar'd for Linux in one way or another with pretty much all releases. There is the everything-gets-assigned-to-'--device 0' bug in the 6.6.3x series (because coproc_cmdline() is called post fork()) and the preempt problems with 6.10.x.

I'm running the 6_6a branch (which is equivalent to an unreleased 6.6.39) plus the following patch (r18836 from trunk), which resolves the '--device 0' issue. It seems pretty solid.
```diff
--- boinc_core_release_6_6_39/client/app_start.cpp.orig  2009-09-15 11:18:45.000000000 +0100
+++ boinc_core_release_6_6_39/client/app_start.cpp       2009-09-15 11:52:34.000000000 +0100
@@ -104,8 +104,10 @@
 }
 #endif
 
-// for apps that use coprocessors, reserve the instances,
-// and append "--device x" to the command line
+// For apps that use coprocessors, reserve the instances,
+// and append "--device x" to the command line.
+// NOTE: on Linux, you must call this before the fork(), not after.
+// Otherwise the reservation is a no-op.
 //
 static void coproc_cmdline(
     COPROC* coproc, ACTIVE_TASK* atp, int ninstances, char* cmdline
@@ -793,6 +795,13 @@
     getcwd(current_dir, sizeof(current_dir));
 
+    sprintf(cmdline, "%s %s",
+        wup->command_line.c_str(), app_version->cmdline
+    );
+    if (coproc_cuda && app_version->ncudas) {
+        coproc_cmdline(coproc_cuda, this, app_version->ncudas, cmdline);
+    }
+
     // Set up core/app shared memory seg if needed
     //
     if (!app_client_shm.shm) {
@@ -924,10 +933,6 @@
         }
     }
 #endif
-    sprintf(cmdline, "%s %s", wup->command_line.c_str(), app_version->cmdline);
-    if (coproc_cuda && app_version->ncudas) {
-        coproc_cmdline(coproc_cuda, this, app_version->ncudas, cmdline);
-    }
     sprintf(buf, "../../%s", exec_path );
     if (g_use_sandbox) {
         char switcher_path[100];
```
Send me a PM if you want a link to the RPMs and SRPM for a Fedora 11 build.
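For readers wondering why calling coproc_cmdline() after the fork() produces the '--device 0' symptom: the reservation ends up recorded only in the child's copy of the client's data structures, and changes made after fork() never propagate back to the parent, so the scheduling process still believes no GPU instance is in use. A minimal, self-contained C++ sketch of that fork() behaviour (the variable name is illustrative, not taken from the BOINC source):

```cpp
// Illustration only (not BOINC code): state changed in the child after fork()
// is invisible to the parent, so a per-GPU "reservation" made there is
// effectively a no-op for the process that schedules tasks.
#include <cstdio>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int reserved_instances = 0;   // hypothetical stand-in for the client's coproc usage count

int main() {
    pid_t pid = fork();
    if (pid == 0) {
        // Child: its address space is a copy; this increment stays local.
        reserved_instances++;
        _exit(0);
    }
    waitpid(pid, nullptr, 0);
    // Parent still sees 0, so every new task would again be handed "--device 0".
    std::printf("parent sees reserved_instances = %d\n", reserved_instances);
    return 0;
}
```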
**Jet** · Joined: 14 Jun 09 · Posts: 25 · Credit: 5,835,455 · RAC: 0
I have to agree with you, Paul. Unfortunately I couldn't find records earlier than 31 July (when the software was updated to CUDA 2.2 capability) to confirm it, but in any case I am sure you are right: WUs have become longer. Going by my impression, new WUs take longer to complete (from 6-6:30 hours up to 7:30+ hours). Previously my station was able to complete at least 3.5-4 WUs per day per GPU; right now I'm happy with 3 WUs per day.

OK, to keep the station more stable I downclocked the GPUs a bit (from 1.63 GHz to 1.57 GHz), but that wasn't the main reason. Running three GTX 260s, damn stable, under BOINC 6.10.0 with the 190.38 driver on Win Server 2008.

BTW, do you run other projects on the CPUs of your farms? If yes, it could be that a neighbouring project running on the CPUs consumes a bit of the CPU power the GPUs need to be served (data feed & output, etc.). That could be one of the reasons as well, I think. Right now I'm checking this on my system.
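Those per-day figures are consistent with the quoted run times, assuming a card crunches around the clock (a rough back-of-the-envelope check, not from the post itself):

$$
\frac{24\ \text{h/day}}{6.5\ \text{h/WU}} \approx 3.7\ \text{WU/day},
\qquad
\frac{24\ \text{h/day}}{8\ \text{h/WU}} = 3\ \text{WU/day}
$$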
**GDF** · Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0
No, WUs are not longer. They are designed to last 1/4 of a day on a fast card.

gdf
**GDF** · Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0
I have updated the recommended client to 6.10.3.

Thanks,
gdf
Joined: 24 Dec 08 · Posts: 738 · Credit: 200,909,904 · RAC: 0
> No, WUs are not longer. They are designed to last 1/4 of a day on a fast card.

Looking through my last lot of results, the shortest seem to be 7 hours 30 mins and the majority seem to be around 8 hours 30 mins. That was taken from the "approx elapsed time" shown in the WU results. These were run on a GTX295 and a GTX275, so by no means slow cards, although they do run at standard speeds. One machine (the GTX275) has BOINC 6.6.37 and the other is currently running 6.10.3 under Windows.

BOINC blog
**Paul D. Buck** · Joined: 9 Jun 08 · Posts: 1050 · Credit: 37,321,185 · RAC: 0
> No, WUs are not longer. They are designed to last 1/4 of a day on a fast card.

Well, your design is bent... I used to get timings in the range of 6 hours and change on my GTX295 cards... sadly I cannot prove this, as the task list is truncated at about the first of September and I am thinking back to much earlier.

If your intent is to run for about 1/4 of a day, or 6 hours, well, you are overshooting that on GTX260, GTX285, and GTX295 cards... the more common time seems to be up at 28,000 seconds rather than down at 21K seconds. This does seem task dependent.

I am only pointing this out because it seems strange that before, most tasks did come in under 7 hours, and now more and more are running up to 9 hours. And you don't seem to be aware of the increase in run times...

A minor point then becomes that you are shading the credit grant... :) But most importantly to me, you are not aware that you are overrunning your execution time targets...

Off to see the football...
**GDF** · Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0
OK, let's say that it is between 1/4 and 1/3 of a day. The calculation is approximate; it is not designed to be exact.

gdf
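For reference, that target range in seconds works out as follows (a straightforward unit conversion, not from the post):

$$
\tfrac{1}{4}\ \text{day} = 6\ \text{h} = 21{,}600\ \text{s},
\qquad
\tfrac{1}{3}\ \text{day} = 8\ \text{h} = 28{,}800\ \text{s}
$$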
**Paul D. Buck** · Joined: 9 Jun 08 · Posts: 1050 · Credit: 37,321,185 · RAC: 0
> OK, let's say that it is between 1/4 and 1/3 of a day.

OK... But I am not sure that you are seeing the point of my question... Are you aware that the time is growing?

The only reason I really noticed it is that for the last couple of months I was not able to pay attention to GPU Grid (notice the lack of posting), and it was a little bit of a shock to see that my run times are almost always over 7 hours now and running as high as 9, where before my run times were very consistently clustered around 6.5 hours...

Not to put too fine a point on it, but if this is the case, the low-end recommendation for hardware needs revision...
Joined: 12 Feb 09 · Posts: 57 · Credit: 23,376,686 · RAC: 0
I had a similar increase in run time just after I upgraded to the 190 drivers. Completely removing them and reinstalling them fixed it for me!
Joined: 18 Aug 08 · Posts: 8 · Credit: 127,707,074 · RAC: 0
The increased time may be due to a bug in the 190.xx drivers, which puts the GPU in 2D mode; more information in this post.
**Paul D. Buck** · Joined: 9 Jun 08 · Posts: 1050 · Credit: 37,321,185 · RAC: 0
> The increased time may be due to a bug in the 190.xx drivers, which puts the GPU in 2D mode; more information in this post.

I could have sworn we were told we needed to update to the 190 series drivers. Did I misunderstand? I mean, I think I have all my systems running the 190.62 drivers now... no, two are on 190.62 and one is on 190.38...

The thing is that since I don't turn my systems off and they run 24/7, I don't see how they would get back into 3D mode if the issue is down-shifting to 2D mode... I would think that once one shifted down it could not, or at least would not, re-adjust up on the next task. That is why I have trouble thinking this is that kind of problem.

I have not done a survey, though my quick look seemed to hint that it is more likely task-type dependent... that is, some of the tasks (by task name class) are now running longer than the norms...
Joined: 11 Dec 08 · Posts: 43 · Credit: 2,216,617 · RAC: 0
With the new Linux CUDA 2.2 application, my work units are running faster and producing more credit per day. I am using the 190.32 Nvidia drivers. When I was running the 190 drivers in Windows, the work units were not running faster, but they were more stable and used less CPU time.
**Jet** · Joined: 14 Jun 09 · Posts: 25 · Credit: 5,835,455 · RAC: 0
I don't think that a sudden switch to 2D mode could be the reason. I run GPU-Z almost all the time to monitor the core/memory frequencies as well as the core temps. All three GTX 260s run at full load. Additionally, the main details, including the core frequency, are shown in the `<stderr_txt>` file. Here is a sample:

```
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1.59 GHz
# Total amount of global memory: 939524096 bytes
# Number of multiprocessors: 27
# Number of cores: 216
# Driver version 2030
# Runtime version 2020
# Device 1: "GeForce GTX 260"
# Clock rate: 1.59 GHz
# Total amount of global memory: 939524096 bytes
# Number of multiprocessors: 27
# Number of cores: 216
# Driver version 2030
# Runtime version 2020
# Device 2: "GeForce GTX 260"
# Clock rate: 1.59 GHz
# Total amount of global memory: 939524096 bytes
# Number of multiprocessors: 27
# Number of cores: 216
# Driver version 2030
# Runtime version 2020
MDIO ERROR: cannot open file "restart.coor"
# Time per step: 51.394 ms
# Approximate elapsed time for entire WU: 32121.105 s
called boinc_finish
</stderr_txt>
```

No sign of a fall to 2D mode.
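For anyone who wants to reproduce that per-device listing outside the application, the fields shown above correspond to what the CUDA runtime reports through cudaGetDeviceProperties. A small host-side C++ sketch follows; the "cores" line assumes 8 CUDA cores per multiprocessor, which holds for compute-capability 1.x parts such as the GTX 260. This is not the GPUGRID application's code, just an illustration of the same query.

```cpp
// Sketch: print the same per-device fields that appear in the stderr output
// above, using the CUDA runtime API from host code.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 1;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        std::printf("# Device %d: \"%s\"\n", dev, prop.name);
        // clockRate is reported in kHz, so divide by 1e6 to get GHz.
        std::printf("# Clock rate: %.2f GHz\n", prop.clockRate / 1.0e6);
        std::printf("# Total amount of global memory: %zu bytes\n",
                    static_cast<size_t>(prop.totalGlobalMem));
        std::printf("# Number of multiprocessors: %d\n", prop.multiProcessorCount);
        // Assumption: 8 cores per multiprocessor on compute-capability 1.x GPUs.
        std::printf("# Number of cores: %d\n", prop.multiProcessorCount * 8);
    }
    return 0;
}
```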
Joined: 5 Jan 09 · Posts: 670 · Credit: 2,498,095,550 · RAC: 0
Another consideration is that the amount of CPU time has risen sharply, which slows down other projects. GPU Grid was my project of choice for the GPU; however, it appears to consume a hefty amount of CPU, more than I would expect, and I am aware it needs to use some CPU time.

Radio Caroline, the world's most famous offshore pirate radio station. Great music since April 1964. Support Radio Caroline Team - Radio Caroline
Joined: 7 Jun 09 · Posts: 40 · Credit: 24,377,383 · RAC: 0
> Another consideration is that the amount of CPU time has risen sharply, which slows down other projects. GPU Grid was my project of choice for the GPU; however, it appears to consume a hefty amount of CPU, more than I would expect, and I am aware it needs to use some CPU time.

I have noticed that v670 of the Linux app uses approx 10% more CPU than v666 used to. I wonder whether that is by design or an unwelcome side effect?
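One way to put a number on that kind of comparison under Linux is to read the cumulative user and system CPU time of the running app from /proc/&lt;pid&gt;/stat. A rough sketch, assuming you look up the app's process ID yourself (e.g. with ps); the field positions are standard procfs, everything else here is illustrative:

```cpp
// Rough sketch: report the CPU time (user + system) consumed so far by a
// process given its PID. Fields 14 and 15 of /proc/<pid>/stat are utime and
// stime, measured in clock ticks.
#include <cstdio>
#include <fstream>
#include <string>
#include <unistd.h>

int main(int argc, char** argv) {
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    std::ifstream stat("/proc/" + std::string(argv[1]) + "/stat");
    std::string field;
    long utime = 0, stime = 0;
    // Note: field 2 (comm) can contain spaces; for typical single-word
    // process names a plain whitespace split is good enough for a sketch.
    for (int i = 1; stat >> field; ++i) {
        if (i == 14) utime = std::stol(field);
        if (i == 15) { stime = std::stol(field); break; }
    }
    double secs = double(utime + stime) / sysconf(_SC_CLK_TCK);
    std::printf("CPU time used so far: %.1f s\n", secs);
    return 0;
}
```

Running it against the same task at two points in time (or against tasks from the two app versions) gives a direct comparison of CPU consumption per unit of elapsed time.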