*_pYEEI_* information and issues

ftpd

Message 15913 - Posted: 22 Mar 2010, 14:19:33 UTC

Today 6 out of 6 tasks were cancelled after a few hours of processing! That includes the long WU on 6.71.

What can we do about it???
Ton (ftpd) Netherlands
Snow Crash

Message 15914 - Posted: 22 Mar 2010, 14:36:41 UTC - in response to Message 15913.  

1. Are you connecting to this machine remotely?
2. Are you crunching anything else on this machine?
3. Can you suspend one of the WUs that are currently running to see if the other one will finish properly?
Thanks - Steve
skgiven
Volunteer moderator
Volunteer tester
Message 15915 - Posted: 22 Mar 2010, 14:43:40 UTC - in response to Message 15913.  
Last modified: 22 Mar 2010, 14:45:09 UTC

Your GTX295 is getting about 7K points per day on average. On that system it should be getting about 49K! It must be particularly annoying to have 4 tasks all fail after going more than 50% through a task; one task must have been about 20min from finishing!

RTM it, try it in a different system, or edit your config file to run only 1 task at a time on your GTX295 (28500 would be a good bit better than 7000, if that worked). Or try Snow Crash's suggestion: suspend one task and let the other finish before beginning the second (you need to select No New Tasks before starting the second task).

By the way, one of the tasks that failed on your GTX295 also failed for me on a card that very rarely fails, and also failed for someone else. So it is possible that that particular task was problematic.

At least your new GTS250 is running well!
ftpd

Message 15921 - Posted: 22 Mar 2010, 17:41:13 UTC - in response to Message 15914.  

I am not connected remotely. It is my office machine. It was crunching over the weekend.

It is also crunching Milkyway, Collatz and SETI, all CUDA GPU jobs.

I also run 1 GPUGRID job alongside 1 job from SETI or another project.

I also have a GTX 260 and a GTS 250 (in other machines), with no problems on those cards.


Ton (ftpd) Netherlands
ftpd

Message 15974 - Posted: 25 Mar 2010, 12:54:52 UTC

25-3-2010 13:12:57 GPUGRID Computation for task p20-IBUCH_025a_pYEEI_100309-13-20-RND9969_0 finished
25-3-2010 13:12:57 GPUGRID Output file p20-IBUCH_025a_pYEEI_100309-13-20-RND9969_0_1 for task p20-IBUCH_025a_pYEEI_100309-13-20-RND9969_0 absent
25-3-2010 13:12:57 GPUGRID Output file p20-IBUCH_025a_pYEEI_100309-13-20-RND9969_0_2 for task p20-IBUCH_025a_pYEEI_100309-13-20-RND9969_0 absent
25-3-2010 13:12:57 GPUGRID Output file p20-IBUCH_025a_pYEEI_100309-13-20-RND9969_0_3 for task p20-IBUCH_025a_pYEEI_100309-13-20-RND9969_0 absent
25-3-2010 13:12:57 GPUGRID Starting p16-IBUCH_2_PQpYEEIPI_long_100319-2-4-RND1703_0
25-3-2010 13:12:58 GPUGRID Starting task p16-IBUCH_2_PQpYEEIPI_long_100319-2-4-RND1703_0 using acemd version 671
25-3-2010 13:13:34 GPUGRID Computation for task p16-IBUCH_2_PQpYEEIPI_long_100319-2-4-RND1703_0 finished
25-3-2010 13:13:34 GPUGRID Output file p16-IBUCH_2_PQpYEEIPI_long_100319-2-4-RND1703_0_1 for task p16-IBUCH_2_PQpYEEIPI_long_100319-2-4-RND1703_0 absent
25-3-2010 13:13:34 GPUGRID Output file p16-IBUCH_2_PQpYEEIPI_long_100319-2-4-RND1703_0_2 for task p16-IBUCH_2_PQpYEEIPI_long_100319-2-4-RND1703_0 absent
25-3-2010 13:13:34 GPUGRID Output file p16-IBUCH_2_PQpYEEIPI_long_100319-2-4-RND1703_0_3 for task p16-IBUCH_2_PQpYEEIPI_long_100319-2-4-RND1703_0 absent

1 job cancelled after 3 hours 12 minutes and 1 job cancelled after 22 secs.
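
To check run times like these against the client log, a small script can pair each "Starting task" line with its "Computation ... finished" line and count the "absent" output files. This is only a sketch: it assumes the log format shown in the excerpt above (day-month-year timestamps, project name, then the message), and the function and field names are made up for illustration. It reports wall-clock time between the two log lines, which can differ from the run time BOINC shows for the task.

```python
import re
from datetime import datetime

# Assumed line shape, based on the excerpt above:
#   "25-3-2010 13:12:58 GPUGRID Starting task <name> using acemd version 671"
LINE = re.compile(r"^(\d{1,2}-\d{1,2}-\d{4} \d{2}:\d{2}:\d{2}) (\S+) (.*)$")
START = re.compile(r"^Starting task (\S+) using")
DONE = re.compile(r"^Computation for task (\S+) finished$")
ABSENT = re.compile(r"^Output file \S+ for task (\S+) absent$")


def summarize(log_text):
    """Return {task_name: {"elapsed_s": float or None, "absent": int}}."""
    seen = {}

    def entry(name):
        # Create the per-task record on first sight of the task name.
        return seen.setdefault(name, {"started": None, "finished": None, "absent": 0})

    for raw in log_text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue  # skip anything that is not a timestamped log line
        when = datetime.strptime(m.group(1), "%d-%m-%Y %H:%M:%S")
        msg = m.group(3).strip()
        if (s := START.match(msg)):
            entry(s.group(1))["started"] = when
        elif (d := DONE.match(msg)):
            entry(d.group(1))["finished"] = when
        elif (a := ABSENT.match(msg)):
            entry(a.group(1))["absent"] += 1

    result = {}
    for name, rec in seen.items():
        elapsed = None
        # Elapsed time is only known when both start and finish were logged.
        if rec["started"] and rec["finished"]:
            elapsed = (rec["finished"] - rec["started"]).total_seconds()
        result[name] = {"elapsed_s": elapsed, "absent": rec["absent"]}
    return result
```

A task whose start line is not in the pasted log (like the first one above) gets `elapsed_s` of `None` rather than a guessed value.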

Any reasons?

Yesterday 4 jobs - all OK!!!
Ton (ftpd) Netherlands
Snow Crash

Message 15975 - Posted: 25 Mar 2010, 15:04:24 UTC - in response to Message 15974.  

I was very stable until I started running both versions of the apps. Then I started to get failures on the old 6.71, which made my system unstable, and the new version 6.03 would start to crash as well. I would restart my computer, a couple of 6.03 tasks would run and all was good until I ran a 6.71 task, which errored and again made my system unstable.

Last night in BOINC Manager I told it "No New Tasks" for GPUGrid.
Then I went to my GPUGrid preferences here on the website and told it to only send me ACEMD 2 (this is the new app version and is much faster).
Back in BOINC Manager I "Reset" GPUGrid.
Then I told it to accept new work.
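
The client-side steps above ("No New Tasks", "Reset", allow new work) can also be issued from the command line with `boinccmd`, the tool that ships with the BOINC client. A rough sketch, assuming `boinccmd` is on the PATH; the project URL is a placeholder assumption and must match what your own client lists, and the helper names are made up for illustration. The preference change (ACEMD 2 only) still has to be made on the website before the reset.

```python
import subprocess

# Assumption: replace with the project URL exactly as your BOINC client shows it.
PROJECT_URL = "http://www.gpugrid.net/"


def reset_sequence(project_url):
    """Build the boinccmd calls for: no new tasks, reset project, allow new work."""
    return [
        ["boinccmd", "--project", project_url, op]
        for op in ("nomorework", "reset", "allowmorework")
    ]


def run(commands, dry_run=True):
    """Print the commands by default; execute them only when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True stops the sequence if any step fails.
            subprocess.run(argv, check=True)
```

Calling `run(reset_sequence(PROJECT_URL))` only prints the three commands, so they can be reviewed first; pass `dry_run=False` to execute them. Note that a reset discards any tasks in progress for that project.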

So far everything looks good with no errors. I have a vague suspicion that one of the DLLs distributed with the apps differs between versions but is not being replaced, and that this is what causes problems on otherwise stable machines.

Thanks - Steve
ignasi

Message 16041 - Posted: 29 Mar 2010, 9:03:59 UTC - in response to Message 15975.  

Actually, we shouldn't be distributing the old app anymore.
Some WUs were sent out last week with the old app, but that was a mistake.
In principle, all new WUs are going to come with the new app.

Let's see if that puts an end to the weird failures.

cheers,
i

©2026 Universitat Pompeu Fabra