*_pYEEI_* information and issues
| Author | Message |
|---|---|
|
Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0
|
Today 6 out of 6 were cancelled after a few hours of processing! The long WU (6.71) too. What can we do about it? Ton (ftpd) Netherlands |
|
Joined: 4 Apr 09 Posts: 450 Credit: 539,316,349 RAC: 0
|
1. Are you connecting to this machine remotely? 2. Are you crunching anything else on this machine? 3. Can you suspend one of the WUs that are currently running to see if the other one will finish properly? Thanks - Steve |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Your GTX295 is getting about 7K points per day on average. On that system it should be getting about 49K! It must be particularly annoying to have 4 tasks all fail after getting more than 50% of the way through; one task must have been about 20min from finishing! RTM it, try it in a different system, or edit your config file to run only 1 task at a time on your GTX295 (28500 would be a good bit better than 7000, if that worked), or try Snow Crash's suggestion: suspend one task and let the other finish before beginning the second (you need to select No New Tasks before starting the second task). By the way, one of the tasks that failed on your GTX295 also failed for me on a card that very rarely fails, and also failed for someone else. So it is possible that that particular task was problematic. At least your new GTS250 is running well! |
|
Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0
|
I am not connected remotely. It is my office machine. It was crunching over the weekend. It is also crunching Milkyway, Collatz and Seti, all CUDA GPU jobs. I run 1 job for GPUGRID and 1 job for Seti or anything else. I also have a GTX 260 and a GTS 250 (in other machines), with no problems on those cards. Ton (ftpd) Netherlands |
|
Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0
|
25-3-2010 13:12:57 GPUGRID Computation for task p20-IBUCH_025a_pYEEI_100309-13-20-RND9969_0 finished
25-3-2010 13:12:57 GPUGRID Output file p20-IBUCH_025a_pYEEI_100309-13-20-RND9969_0_1 for task p20-IBUCH_025a_pYEEI_100309-13-20-RND9969_0 absent
25-3-2010 13:12:57 GPUGRID Output file p20-IBUCH_025a_pYEEI_100309-13-20-RND9969_0_2 for task p20-IBUCH_025a_pYEEI_100309-13-20-RND9969_0 absent
25-3-2010 13:12:57 GPUGRID Output file p20-IBUCH_025a_pYEEI_100309-13-20-RND9969_0_3 for task p20-IBUCH_025a_pYEEI_100309-13-20-RND9969_0 absent
25-3-2010 13:12:57 GPUGRID Starting p16-IBUCH_2_PQpYEEIPI_long_100319-2-4-RND1703_0
25-3-2010 13:12:58 GPUGRID Starting task p16-IBUCH_2_PQpYEEIPI_long_100319-2-4-RND1703_0 using acemd version 671
25-3-2010 13:13:34 GPUGRID Computation for task p16-IBUCH_2_PQpYEEIPI_long_100319-2-4-RND1703_0 finished
25-3-2010 13:13:34 GPUGRID Output file p16-IBUCH_2_PQpYEEIPI_long_100319-2-4-RND1703_0_1 for task p16-IBUCH_2_PQpYEEIPI_long_100319-2-4-RND1703_0 absent
25-3-2010 13:13:34 GPUGRID Output file p16-IBUCH_2_PQpYEEIPI_long_100319-2-4-RND1703_0_2 for task p16-IBUCH_2_PQpYEEIPI_long_100319-2-4-RND1703_0 absent
25-3-2010 13:13:34 GPUGRID Output file p16-IBUCH_2_PQpYEEIPI_long_100319-2-4-RND1703_0_3 for task p16-IBUCH_2_PQpYEEIPI_long_100319-2-4-RND1703_0 absent

1 job cancelled after 3 hours 12 minutes and 1 job cancelled after 22 secs. Any reasons? Yesterday 4 jobs - all OK!!! Ton (ftpd) Netherlands |
|
Joined: 4 Apr 09 Posts: 450 Credit: 539,316,349 RAC: 0
|
I was very stable until I started running both versions of the apps. Then I started to get failures on the old 6.71, which made my system unstable, and the new version 6.03 would start to crash. I would restart my computer and a couple of 6.03 tasks would run and all was good until I ran a 6.71 and it errored and again made my system unstable. Last night in BOINC Manager I told it "No New Tasks" for GPUGrid. Then I went to my GPUGrid preferences here on the website and told it to only send me ACEMD 2 (this is the new app version and is much faster). Back in BOINC Manager I "Reset" GPUGrid. Then I told it to accept new work. So far everything looks good with no errors. I have a vague suspicion that one of the DLLs distributed with the apps is different but is not being replaced, and that this is what causes problems on otherwise stable machines. Thanks - Steve |
|
Joined: 10 Apr 08 Posts: 254 Credit: 16,836,000 RAC: 0
|
Actually, we shouldn't be distributing the old app anymore. There were, though, some WUs sent last week with the old app, but that was a mistake. In principle all new WUs are going to come with the new app. Let's see if that ends up with weird failures. cheers, i |
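
For anyone comparing failures like the ones Ton pasted above, here is a rough Python sketch that groups such log lines by task, lists the output files reported as absent, and works out the wall-clock time between "Starting task" and "finished". The line layout it expects is assumed from the excerpt in this thread only, and the names `summarize` and `SAMPLE` are made up for illustration; other BOINC versions or locales may print the log differently.

```python
import re
from datetime import datetime

# Assumed line shape, copied from the excerpt in this thread:
#   "<d-m-yyyy> <hh:mm:ss> GPUGRID <message>"
LINE_RE = re.compile(r"^(\d{1,2}-\d{1,2}-\d{4} \d{1,2}:\d{2}:\d{2}) GPUGRID (.*)$")
START_RE = re.compile(r"^Starting task (\S+) using (\S+) version (\d+)$")
FINISH_RE = re.compile(r"^Computation for task (\S+) finished$")
ABSENT_RE = re.compile(r"^Output file (\S+) for task (\S+) absent$")


def summarize(log_text):
    """Return {task_name: info} for every task mentioned in the log text."""
    tasks = {}

    def entry(name):
        return tasks.setdefault(name, {
            "app": None, "started": None, "finished": None,
            "absent": [], "elapsed_s": None,
        })

    for raw in log_text.splitlines():
        line = LINE_RE.match(raw.strip())
        if not line:
            continue  # skip anything that is not a GPUGRID log line
        stamp = datetime.strptime(line.group(1), "%d-%m-%Y %H:%M:%S")
        msg = line.group(2)

        started = START_RE.match(msg)
        finished = FINISH_RE.match(msg)
        absent = ABSENT_RE.match(msg)
        if started:
            info = entry(started.group(1))
            info["started"] = stamp
            info["app"] = started.group(2) + " " + started.group(3)
        elif finished:
            entry(finished.group(1))["finished"] = stamp
        elif absent:
            entry(absent.group(2))["absent"].append(absent.group(1))

    # Wall-clock time is only known when both ends are in the excerpt.
    for info in tasks.values():
        if info["started"] and info["finished"]:
            delta = info["finished"] - info["started"]
            info["elapsed_s"] = int(delta.total_seconds())
    return tasks


# Lines taken from the post above (one absent-file line per task kept).
SAMPLE = """\
25-3-2010 13:12:57 GPUGRID Computation for task p20-IBUCH_025a_pYEEI_100309-13-20-RND9969_0 finished
25-3-2010 13:12:57 GPUGRID Output file p20-IBUCH_025a_pYEEI_100309-13-20-RND9969_0_1 for task p20-IBUCH_025a_pYEEI_100309-13-20-RND9969_0 absent
25-3-2010 13:12:58 GPUGRID Starting task p16-IBUCH_2_PQpYEEIPI_long_100319-2-4-RND1703_0 using acemd version 671
25-3-2010 13:13:34 GPUGRID Computation for task p16-IBUCH_2_PQpYEEIPI_long_100319-2-4-RND1703_0 finished
25-3-2010 13:13:34 GPUGRID Output file p16-IBUCH_2_PQpYEEIPI_long_100319-2-4-RND1703_0_1 for task p16-IBUCH_2_PQpYEEIPI_long_100319-2-4-RND1703_0 absent
"""

if __name__ == "__main__":
    for name, info in summarize(SAMPLE).items():
        print(name, info["app"], info["elapsed_s"], len(info["absent"]))
```

A task that "finishes" with its output files absent is the pattern reported in this thread for the failed WUs, so sorting the summary by `elapsed_s` is a quick way to see whether a task died at startup or hours in. The elapsed value is wall-clock time from the log timestamps, which need not match the run time BOINC itself reports for the task.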
©2026 Universitat Pompeu Fabra