Mysterious effects with 6.3.14
| Author | Message |
|---|---|
Kokomiko, Joined: 18 Jul 08, Posts: 190, Credit: 24,093,690, RAC: 0
|
I had some mysterious effects under 6.3.14 with tasks running in parallel; I'm not sure if this is a problem with 6.3.14 itself. LHC WUs can't run with PS3Grid: PS3Grid stops if LHC is working. PS3Grid can't run with Magnetism either: the PS3Grid WU ends with a compute error. Has anybody else observed similar side effects?
|
|
Joined: 25 Sep 08, Posts: 111, Credit: 10,352,599, RAC: 0
|
Here's mine: Need 6.3.14 Multi-GPU Help..... The short version is that everything seems to be running fine on my single-GPU rigs, but the multi-GPU rig has issues.... HTH |
Venturini Dario[VENETO], Joined: 26 Jul 08, Posts: 44, Credit: 4,832,360, RAC: 0
|
I don't know if these things are connected, but I had some problems with LHC after I started running PS3Grid with the 6.3.14 version. All of my LHC WUs completed successfully, but ALL of them came back invalid. I'll investigate further... |
DoctorNow, Joined: 18 Aug 07, Posts: 83, Credit: 135,208,752, RAC: 4
|
"LHC-WUs can't run with PS3Grid, PS3Grid stops if LHC is working." I discovered that this depends on how many LHC WUs are running. On my X2 the PS3Grid task only stopped when both cores were crunching LHC WUs. If only one core was busy with LHC, PS3Grid kept running. (I encountered this with 6.3.10; maybe 6.3.14 handles it differently.) "All of my LHC-WUs completed succesfully but ALL resulted invalid. I'll investigate further.." The LHC WUs I finished in combination with PS3Grid all validated correctly. Maybe it also depends on the client; are you using 6.3.14? Member of BOINC@Heidelberg and ATA!
|
Stefan Ledwina, Joined: 16 Jul 07, Posts: 464, Credit: 298,573,998, RAC: 0
|
I don't think the failing LHC WUs have anything to do with PS3GRID/GPUGRID, but rather with the 6.x.x clients.. From the LHC News - 05.09.2008 10:40 BST - pixelicious.at - my little photoblog |
Krunchin-Keith [USA], Joined: 17 May 07, Posts: 512, Credit: 111,288,061, RAC: 0
|
I've found many small glitches in the scheduler and have been reporting them to the developer. You need to turn on debug logging to see some of what is going on. It may not affect everyone running here. I'm testing some of this on the GPU alpha project, where we use less than 1 CPU. It's not a big problem, but the client does things like start 2 CPU tasks + 1 CPU/CUDA task on a 2-CPU host. Other times, when a CUDA task ends, it will not start another until maybe 10 minutes later; it does reserve the GPU, it just doesn't start any work. I spent 5-6 hours debugging yesterday to find these problems. We are close to a fully, properly behaving client, but not quite there. So please, everyone, be patient; it will happen soon. We will get a better client, and you will not need to use ncpus+1 to use all CPUs and CUDA processing. Another thing I found: I cannot run MalariaControl's optimizer app at the same time as a CUDA task. It stopped the CUDA task dead when I had 2 Malaria tasks running along with a CPU/CUDA task. That app runs as a wrapper app running some Java code, and this seems to be the problem. Once I stopped running it, my GPU ms/step improved. Example: test tasks that are supposed to run 8 minutes took 8 CPU minutes, spread out over 1, 3 or 4-plus wall-clock hours. The rest of the time they were held up, sitting idle. BOINC would show them as running even though CPU time did not increment. If only one Malaria task was running, CUDA would run some, but not quite at full speed. I'm not sure if it affects tasks running here now. I have to run some more, now with Malaria turned off, to see whether it does. You can turn the Malaria optimizer app off in your preferences at MalariaControl so you don't get work for it. If anyone else is running it alongside CUDA: take a sample of your last 5 to 10 CUDA tasks, most importantly the ms/step recorded in the task detail. Then stop the optimizer and, when none is left on your computer, let more CUDA tasks run. You need to run at least 3 to get a sample. See if your ms/step improves. |
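The before/after comparison suggested in the post above can be sketched in a few lines of Python. This is only an illustration: the ms/step values are made-up placeholders to be replaced with figures from your own task details, and the function names (`mean_ms_per_step`, `improvement_percent`, `cpu_efficiency`) are hypothetical helpers, not part of BOINC or GPUGRID.

```python
from statistics import mean


def mean_ms_per_step(samples):
    """Average ms/step over a list of per-task values."""
    if not samples:
        raise ValueError("need at least one sample")
    return mean(samples)


def improvement_percent(before, after, min_after=3):
    """Percent drop in mean ms/step after stopping the optimizer app.

    A positive result means the CUDA tasks got faster. `min_after`
    mirrors the advice to run at least 3 tasks before comparing.
    """
    if len(after) < min_after:
        raise ValueError(f"need at least {min_after} samples after the change")
    b = mean_ms_per_step(before)
    a = mean_ms_per_step(after)
    return (b - a) / b * 100.0


def cpu_efficiency(cpu_minutes, wall_minutes):
    """Fraction of wall-clock time in which the task actually got CPU."""
    return cpu_minutes / wall_minutes


# Placeholder numbers, NOT measurements from this thread.
before = [80.0, 82.0, 78.0, 81.0, 79.0]  # last 5 tasks with the optimizer running
after = [60.0, 61.0, 59.0]               # first 3 tasks after stopping it

print(f"ms/step change: {improvement_percent(before, after):.1f}% lower")
# The 8-CPU-minute test task that took about 3 wall-clock hours:
print(f"CPU efficiency: {cpu_efficiency(8, 180):.1%}")
```

A low CPU efficiency combined with a task that BOINC still reports as running is the pattern described in the post: the task is scheduled but not actually getting CPU time.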
ayQue, Joined: 6 Sep 08, Posts: 18, Credit: 806,771, RAC: 0
|
"One found many small glitches in the scheduler. I've been reporting these to the developer. You need to turn on debug to see some of what is going on. It may not affect everyone running here. I'm testing some on the GPU alpha project where we use less than 1 CPU. It's not a big problem, but client does thing like start 2 CPU tasks + 1 CPU/CUDA on a 2 CPU host. Other times when a CUDA end it will not start another until maybe 10 minutes later, It does however reserve the GPU, just not start any work. I spent 5-6 hours doing some debugging yesterday to find these problems." ..that sounds great!! :) Thank you in advance for losing your nerves, or something like that, while debugging for us... :) |
|
Joined: 17 Aug 08, Posts: 2705, Credit: 1,311,122,549, RAC: 0 |
Concerning the Malaria / GPU-Grid interaction: generally there are some problems with priorities and task scheduling in Windows. This is not to say Linux is necessarily better; I just have more experience with Windows. An example: if you run Matlab at "lowest priority" on all cores, there are plenty of programs that cannot get CPU time anymore, even if they run at "normal". It could be that the Malaria app is considerably more aggressive than its task priority suggests. MrS Scanning for our furry friends since Jan 2002 |
Krunchin-Keith [USA], Joined: 17 May 07, Posts: 512, Credit: 111,288,061, RAC: 0
|
I have found out that it will be possible for the client to run 1 more task than there are CPUs, even without ncpus+1 (which should not be used anyway). This case will occur when there are two CPU tasks in deadline trouble: the client will run those in addition to a CPU/CUDA task in order to keep the GPU in full use too. Everyone should read this new wiki note, which explains the new behavior of 6.3 clients. The new behavior will be to make maximum use of CPUs and GPUs, even if it sometimes means using more CPU than you physically have or have allocated. So don't panic when your dual core appears to become a tri core. |
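As a rough illustration of the behavior described in the post above, here is a toy model in Python. It is not the BOINC client's scheduler: the `Task` fields and the `pick_tasks` function are hypothetical names invented for this sketch. It also assumes, beyond what the post states, that in the normal case a CPU/CUDA task counts against one CPU slot, and that only CPU tasks in deadline trouble can push the total above the CPU count.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    uses_gpu: bool = False
    deadline_trouble: bool = False


def pick_tasks(tasks, ncpus, ngpus=1):
    """Toy selection rule (an assumption, not BOINC's real algorithm)."""
    urgent = [t for t in tasks if not t.uses_gpu and t.deadline_trouble]
    normal = [t for t in tasks if not t.uses_gpu and not t.deadline_trouble]
    gpu = [t for t in tasks if t.uses_gpu][:ngpus]

    running = urgent[:ncpus]  # CPU tasks in deadline trouble get cores first
    running += gpu            # CPU/CUDA tasks start anyway to keep the GPU busy
    free = ncpus - len(running)
    if free > 0:              # remaining cores go to ordinary CPU tasks
        running += normal[:free]
    return running


# Dual core, two CPU tasks in deadline trouble, one CPU/CUDA task:
tasks = [
    Task("lhc_1", deadline_trouble=True),
    Task("lhc_2", deadline_trouble=True),
    Task("gpugrid_1", uses_gpu=True),
]
print([t.name for t in pick_tasks(tasks, ncpus=2)])
```

With two deadline-troubled CPU tasks the model starts three tasks on two cores, the "tri core" case from the post; with ordinary CPU tasks it stays at two.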
©2025 Universitat Pompeu Fabra