Really low Run Times, but still Completed and Successful?

Message boards : Graphics cards (GPUs) : Really low Run Times, but still Completed and Successful?
Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level: Tyr
Message 29419 - Posted: 11 Apr 2013, 23:46:21 UTC - in response to Message 29417.  

But in the meantime why do something that's causing corrupted results?
ID: 29419
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level: His
Message 29420 - Posted: 11 Apr 2013, 23:57:39 UTC - in response to Message 29419.  
Last modified: 12 Apr 2013, 0:11:51 UTC

Not all results are corrupted, and the admins still have a way of finding the corrupted ones after they've been erroneously validated.

So, to answer your question "why keep doing it", the answer is "the science" (Overall I do more science with a higher throughput, running 2-at-a-time, even if I get some that immediately error out).

If the admins make a request for me to change, I will of course honor it. But, so far, I've been encouraged to test (find problems) and report results (and problems).

I just hope they fix the problems I find.

Regards,
Jacob
ID: 29420
skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level: His
Message 29423 - Posted: 12 Apr 2013, 12:22:25 UTC - in response to Message 29420.  

The apps at GPUGrid were not designed to run more than one task at a time on the same GPU. If running several tasks at once causes failures it might be massively detrimental to the project, so don't be surprised if the use of app_config gets banned. Getting lots of credits for producing failed tasks won't endear you to anyone and typically results in having your credit reduced or account suspended.

I don't think projects have to facilitate crunchers or BOINC add-ons.

app_config was designed for all that can use it but is better used at other projects than here. At GPUGrid it seems the improvement is limited to but a few task types, on cards with more RAM and on Vista/W7/W8.
ID: 29423
Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level: Ser
Message 29424 - Posted: 12 Apr 2013, 12:53:18 UTC - in response to Message 29423.  
Last modified: 12 Apr 2013, 13:00:48 UTC

JK is the only one affected by the problem, as far as I can tell, and it's a nasty one. My guess is that the client does not keep the two WUs separate. Disable as soon as possible.

(Unless there is something more odd like read-only filesystems, file permissions, or the like).
ID: 29424
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level: His
Message 29425 - Posted: 12 Apr 2013, 13:37:58 UTC - in response to Message 29424.  
Last modified: 12 Apr 2013, 14:06:56 UTC

JK is the only one affected by the problem, as far as I can tell, and it's a nasty one. My guess is that the client does not keep the two WUs separate. Disable as soon as possible.

(Unless there is something more odd like read-only filesystems, file permissions, or the like).


Toni:
Per your request (as a project administrator), I will stop running 2-at-a-time on the same GPU.

Toni / Nate / GDF:
Could you please do more research into the problem (hopefully not just guessing and giving up)... and fix the apps so that they can run 2-at-a-time consistently, and fix the validator so it does proper additional validation checks before granting credits? Running the tasks 2-at-a-time does increase throughput, when they work properly, and thus supporting it would be beneficial to your science.

I am available for any testing, and am eager to be allowed to run tasks 2-at-a-time, again.

Thanks,
Jacob Klein
ID: 29425
Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level: Ser
Message 29428 - Posted: 12 Apr 2013, 19:47:17 UTC - in response to Message 29425.  
Last modified: 12 Apr 2013, 19:48:00 UTC

Hi JK, thanks for your fix. Your results now seem to work properly. My guess is that it is a bug in the client (not so much the application) which makes the two tasks not isolated from each other.
ID: 29428
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level: His
Message 29430 - Posted: 12 Apr 2013, 19:52:08 UTC - in response to Message 29428.  

Toni,

There are times when GPUGrid-2-at-a-time works just fine for me. In fact, I'd say most of the time, it works correctly.

If it's a bug in the client, as you suggest, then let's get it fixed! I'm a beta tester, testing 7.0.62, and if you can isolate the reason you think it's a client bug, we can contact David Anderson and he can fix it. What makes you think it's a client bug?

I'm much more likely to believe that it's an application problem, though. I'm able to run x-at-a-time successfully on all of my other GPU projects.

If you determine it's an application problem, could you please fix it, as well as the validator?

I look forward to your reply,
Jacob
ID: 29430
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level: Met
Message 29431 - Posted: 12 Apr 2013, 20:16:22 UTC

JK's got a point in that running multiple concurrent WUs works for many projects. In fact, GPU-Grid is the first one I am aware of where it doesn't work. Generally the others don't officially support or encourage it, but they didn't have to adjust their code either.

Running multiple WUs per system, on multiple GPUs, works. Since GPU-Grid is rather complex and surely uses quite a few CUDA libraries, I could imagine the following: maybe some of these libraries contain code / variables / initializations that are global per GPU, so that multiple WUs share them (unintentionally, in this case). An error could then appear if 2 WUs conflict in their use of the resource; otherwise it will run just fine.

MrS
Scanning for our furry friends since Jan 2002
ID: 29431
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level: His
Message 29460 - Posted: 15 Apr 2013, 13:35:53 UTC - in response to Message 29430.  

Toni,

There are times when GPUGrid-2-at-a-time works just fine for me. In fact, I'd say most of the time, it works correctly.

If it's a bug in the client, as you suggest, then let's get it fixed! I'm a beta tester, testing 7.0.62, and if you can isolate the reason you think it's a client bug, we can contact David Anderson and he can fix it. What makes you think it's a client bug?

I'm much more likely to believe that it's an application problem, though. I'm able to run x-at-a-time successfully on all of my other GPU projects.

If you determine it's an application problem, could you please fix it, as well as the validator?

I look forward to your reply,
Jacob


Toni / Nate / GDF:

I contacted David Anderson. He said that he'd be willing to help with any client problem.

I'd like to make progress on fixing the issues within this thread.
Have you guys begun an investigation yet into the cause?

It seems like the stderr output only had a single line of info, showing the BOINC version number. Maybe the code could be changed to indicate how far along in the execution it got, before crashing/completing?

I'd like to resume processing GPUGrid tasks x-at-a-time on the same GPU, like I can with all of my other GPU projects.

If there's anything I can do to help test a change or expedite the fix, I am at your command.

Thank you,
Jacob
ID: 29460
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level: His
Message 29461 - Posted: 15 Apr 2013, 15:37:48 UTC - in response to Message 29460.  
Last modified: 15 Apr 2013, 16:07:40 UTC

Toni / Nate / GDF:

I disabled x-at-a-time for GPUGrid tasks, per Toni's request, on 4/12/2013.

Yet, I had 2 more results that were marked "Completed and validated", with Run time less than 4 seconds, granted full+bonus credit of 70,800, on 4/14/2013!

This leads me to believe that the problem is unrelated to running x-at-a-time.
Again, I ask you, have you begun your investigation??

Task 6755331 (work unit 4362872, computer 126725)
  Sent: 14 Apr 2013 | 7:01:18 UTC
  Time reported: 14 Apr 2013 | 10:38:35 UTC
  Status: Completed and validated
  Run time (sec): 2.40
  CPU time (sec): 0.84
  Credit: 70,800.00
  Application: Long runs (8-12 hours on fastest card) v6.18 (cuda42)

Task 6754567 (work unit 4362305, computer 126725)
  Sent: 14 Apr 2013 | 2:29:48 UTC
  Time reported: 14 Apr 2013 | 10:38:01 UTC
  Status: Completed and validated
  Run time (sec): 3.19
  CPU time (sec): 0.84
  Credit: 70,800.00
  Application: Long runs (8-12 hours on fastest card) v6.18 (cuda42)
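
For anyone who wants to scan their own task list for results like the two above, here is a minimal sketch in Python. It is not the project's validator; the `TaskResult` record, the `suspicious` helper and the 60-second cutoff are all assumptions made for illustration, and the two sample rows are copied from the list above.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    # Hypothetical record; field names are chosen for this sketch only.
    task_id: int
    status: str
    run_time_sec: float
    credit: float


def suspicious(results, min_run_time_sec=60.0):
    """Return validated results whose run time is below the cutoff.

    The 60 s default is an arbitrary assumption; a long-run task is
    expected to take hours, so any small cutoff would catch these.
    """
    return [
        r for r in results
        if r.status == "Completed and validated"
        and r.run_time_sec < min_run_time_sec
    ]


# The two results reported in this post.
results = [
    TaskResult(6755331, "Completed and validated", 2.40, 70800.00),
    TaskResult(6754567, "Completed and validated", 3.19, 70800.00),
]

for r in suspicious(results):
    print(r.task_id, r.run_time_sec, r.credit)
```

Both sample rows are flagged, since each ran for only a few seconds yet was validated with full credit.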

Specs:
Windows 8 x64, BOINC v7.0.62 x64 Beta, nVidia GeForce 314.22 WHQL, eVGA GTX 660 TI 3GB FTW, eVGA GTX 460 1GB, GPUGrid app_config using the following settings for all 4 apps:
<max_concurrent>9999</max_concurrent>
<gpu_versions>
    <gpu_usage>1</gpu_usage>
    <cpu_usage>0.001</cpu_usage>
</gpu_versions>
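
The settings above are only the per-app fragment. As a rough sketch, the Python below wraps them in the usual app_config.xml layout (an `<app_config>` root with one `<app>` element per application, each carrying a `<name>`). The `build_app_config` helper and the app name `example_app` are placeholders invented for this sketch; the real GPUGrid app names are not given in this post.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_names, gpu_usage, cpu_usage, max_concurrent):
    """Build an app_config.xml document as a string.

    app_names are placeholders here; the real names come from the
    project, not from this sketch.
    """
    root = ET.Element("app_config")
    for name in app_names:
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = name
        ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
        gpu = ET.SubElement(app, "gpu_versions")
        # gpu_usage is the fraction of a GPU reserved per task:
        # 1 means one task per GPU, 0.5 would mean two per GPU.
        ET.SubElement(gpu, "gpu_usage").text = str(gpu_usage)
        ET.SubElement(gpu, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")


# Same values as posted above, for a single hypothetical app.
xml_text = build_app_config(
    ["example_app"], gpu_usage=1, cpu_usage=0.001, max_concurrent=9999
)
print(xml_text)
```

With `gpu_usage=1` this describes the 1-at-a-time setup; the 2-at-a-time setup discussed earlier in the thread would correspond to `gpu_usage=0.5`.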
ID: 29461
Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level: Tyr
Message 29462 - Posted: 15 Apr 2013, 15:49:16 UTC - in response to Message 29461.  

<cpu_usage>.001</cpu_usage>

Jacob, could you try running without the app_config and see if you still get these glitches?
ID: 29462
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level: His
Message 29463 - Posted: 15 Apr 2013, 15:55:30 UTC - in response to Message 29462.  
Last modified: 15 Apr 2013, 16:08:19 UTC

Maybe. Could you also try running WITH the app_config to see what results you get?
I'm quite frustrated -- it really feels like I'm the only one trying to solve this problem.
ID: 29463
Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level: Tyr
Message 29464 - Posted: 15 Apr 2013, 16:20:31 UTC - in response to Message 29463.  

Maybe. Could you also try running WITH the app_config to see what results you get?
I'm quite frustrated -- it really feels like I'm the only one trying to solve this problem.

I don't see it as a problem. Many projects support running multiple instances (although not necessarily officially), this one does not. GPUGrid is extremely demanding compared to most projects. So far you've been seeing only the easiest of the long WUs compared to what we had not too long ago. I wouldn't want to try running multiple long NOELIA or GIANNI WUs. The NATHAN WUs previous to these would not even run properly 1x on my GTX 460/768 cards due to too large a memory footprint. I know you're trying to squeeze every last bit of performance out of your GTX 660 TI 3GB and that's commendable. Maybe the developers can find a solution for you, but I wouldn't be recommending either running 2x or limiting CPU reservation at this point.
ID: 29464
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level: His
Message 29465 - Posted: 15 Apr 2013, 16:27:13 UTC - in response to Message 29464.  
Last modified: 15 Apr 2013, 16:29:19 UTC

I don't think you understand the cpu limitation portion.
I removed the GPUGrid.net project, re-added it, and started working on some long-run NATHAN tasks (without any app_config override). By default, they use "0.73 CPUs + 1 NVIDIA GPU" on my machine. Having an app_config that sets it to 0.001 does not make any difference if only 1 task is running. That number is used to determine how many CPU jobs to add.

For instance:

Without app_config, my system runs:
1 GPUGrid.net task (0.73 CPUs + 1 NVIDIA GPU)
2 WCG HCC GPU tasks (each: 1 CPUs + 0.5 NVIDIA GPUs)
6 CPU jobs

With app_config, my system runs:
1 GPUGrid.net task (0.001 CPUs + 1 NVIDIA GPU)
2 WCG HCC GPU tasks (each: 1 CPUs + 0.5 NVIDIA GPUs)
6 CPU jobs

See? I don't see how the app_config could possibly make a difference, when running only 1 GPUGrid task, but I'm testing it anyways, since I'm losing hope that the admins care.
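
The comparison above can be sketched as a toy model in Python. This is not the BOINC client's actual scheduling code. It assumes a machine with 8 logical CPUs (not stated in this post, but consistent with the job counts listed) and assumes the client keeps starting CPU jobs while the total reserved CPU count is still below the number of CPUs. `cpu_jobs_started` is a name made up for this sketch.

```python
def cpu_jobs_started(ncpus, gpu_task_cpu_reservations):
    """Toy model: count CPU jobs started next to the GPU tasks.

    Assumption: one more CPU job is started as long as the CPUs
    reserved so far (GPU task reservations plus one per CPU job)
    are still below ncpus.
    """
    reserved = sum(gpu_task_cpu_reservations)
    jobs = 0
    while reserved + jobs < ncpus:
        jobs += 1
    return jobs


# Without app_config: GPUGrid task reserves 0.73, two WCG tasks reserve 1 each.
without_override = cpu_jobs_started(8, [0.73, 1, 1])

# With app_config: GPUGrid task reserves 0.001 instead.
with_override = cpu_jobs_started(8, [0.001, 1, 1])

print(without_override, with_override)  # both are 6 under these assumptions
```

Under this model the 0.73 versus 0.001 reservation only changes the result once the reservations cross a whole-CPU boundary, which is why both setups end up with 6 CPU jobs here.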

You may not see it as a problem, but it affects their science results. Tasks are failing immediately, their validator is erroneously marking those invalid results as valid... and if I can prove (to myself) that running x-at-a-time is not the cause, then I will likely switch back to running x-at-a-time, for increased performance.

The admins need to begin an investigation, if they care.
ID: 29465
Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level: Tyr
Message 29466 - Posted: 15 Apr 2013, 17:27:03 UTC - in response to Message 29465.  

See? I don't see how the app_config could possibly make a difference, when running only 1 GPUGrid task, but I'm testing it anyways

That's all I asked, since it seems like these failures are unique to you and you're probably the only one running the above configuration. The only way to know for sure is to test it.

You may not see it as a problem, but it affects their science results.

Exactly, and if the config is the cause the admins should disable the use of app_info and app_config until it's sorted out.
ID: 29466
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level: His
Message 29467 - Posted: 15 Apr 2013, 17:41:23 UTC - in response to Message 29466.  

And how do you propose they disable those things? Seriously. I don't believe it's possible. <sigh>
Man, I am SICK of all of these "ifs" and "let's do this and that."

I'm the only one testing to find the root cause, so far as I know. It's frustrating, to say the least; it feels like I'm in a hole. But if I'm the only one having the issue, then sure, I can accept being the only one doing the testing. If anyone else truly cares, they should be trying to reproduce the problem, instead of guessing at causes/fixes.

These proposals, some of which are ludicrous (making validation based on CPU time??), some of which are impossible (ban app-config??), and some of which are detrimental (limit to 1 gpu task per person??)... they're all the wrong approaches, in my opinion.

I said it before, and I'll say it again:

What they should do is 2 things:
Priority 1) Fix the validator to stop marking these results valid, and thus stop issuing credits for invalid results
Priority 2) Fix the workunits so they do not error under whatever conditions they are erroring

If you want to help me test, then help me test.
Otherwise, please don't guess at causes/solutions.

Thanks,
Jacob
ID: 29467
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level: Met
Message 29468 - Posted: 15 Apr 2013, 19:50:00 UTC - in response to Message 29467.  

Jacob.. relax. It's only been 1 work day since Toni replied the last time. He could be on vacation or babysitting as far as we know.. or something else got in his way.

I agree the validator should be fixed, and this doesn't sound too hard. Whether they can fix "your" problem, on the other hand, is up in the air as long as we have no further insight into why this is happening.

One more possibility (I'm not calling it a guess ;) would be some weird interaction with WCG@GPU, but then you probably wouldn't be the only one affected.

MrS
Scanning for our furry friends since Jan 2002
ID: 29468
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level: His
Message 29469 - Posted: 15 Apr 2013, 20:05:56 UTC - in response to Message 29468.  

I'll try to relax.

If I knew that an investigation was proceeding, or that a fix was being worked on, then I could relax a lot easier.

It's been 2 full weeks since the problem was reported, and the admin responses thus far have been "oh wow that's odd and beyond me", "wait and see if it happens again", and "probably a client issue".
ID: 29469
Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level: Tyr
Message 29470 - Posted: 15 Apr 2013, 20:50:57 UTC - in response to Message 29467.  

some of which are impossible (ban app-config??)

It was my understanding that there is a server setting to allow/disallow the anonymous platform (app_info.xml). Not positive that's true and also don't know if there's a similar way to disallow the app_config. Maybe someone can enlighten us on that point.
ID: 29470
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level: His
Message 29509 - Posted: 19 Apr 2013, 13:22:55 UTC - in response to Message 29425.  
Last modified: 19 Apr 2013, 13:23:43 UTC

Toni / Nate / GDF:

I have been doing additional testing. Since removing and re-adding the project, I have not had the issue, even when using a custom app_config for 0.001 CPU and 1-at-a-time task processing.

In order to test whether the remove/re-add fixed the problem, I was wondering...
Could I please turn on 2-at-a-time task processing, to continue my testing?

Awaiting your reply,
Jacob
ID: 29509

©2025 Universitat Pompeu Fabra