Message boards : Graphics cards (GPUs) : *_pYEEI_* information and issues
Joined: 27 Jan 09 · Posts: 4 · Credit: 582,988,184 · RAC: 0
[quote]Thanks for the comments. I looked in my GPUGrid preferences and did not notice anything saying Beta. I did see "Run test applications? This helps us develop applications, but may cause jobs to fail on your computer", which was already set to no.[/quote] I have found the answer in another thread. So unless someone has switched "Run test applications" off for me in the last 2 days, I have never accepted beta applications. I have re-attached a 275 and will leave that running for a few days. The 295s will stay on F@H for now; they run F@H (and used to run S@H) fine, it was only GPUGrid causing problems. Also, FYI, it was me who aborted the work units after seeing this thread and relating it to the problems I had been having. After seeing work units process for hours and then show a computation error, I was not in the mood to waste any more time.
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
Other users cannot see whether you have Betas enabled or not; I just suggest you turn them off if you are having problems. There are many things that can cause errors, and we can only guess as we do not have all the info. I can't tell if your system has automatic updates turned on, or if your local BOINC client is set to use the GPU while the computer is in use. All I can do is suggest you disable automatic updates, as these force restarts and crash tasks, and turn off "Use GPU while computer is in use" if you watch any video on that system. GL
Joined: 27 Jan 09 · Posts: 4 · Credit: 582,988,184 · RAC: 0
Thanks for the advice. The PCs are all part of a crunching farm I have, all headless and controlled over VNC. Only 4 of them have 9-series or higher Nvidia cards suitable for GPUGrid; the rest are simple quads with built-in graphics running Rosetta and WCG. Either way, I will leave a single 275 running on GPUGrid for now. The rest can stay on F@H. Andy
[AF>Libristes>Jip] Elgrande71 · Joined: 16 Jul 08 · Posts: 45 · Credit: 78,618,001 · RAC: 0
Computation error with a GTX295 GPU on this computer.
X-Files 27 · Joined: 11 Oct 08 · Posts: 95 · Credit: 68,023,693 · RAC: 0
I got a weird WU (1949860): it errored out but then ended up a success?
Joined: 4 Apr 09 · Posts: 450 · Credit: 539,316,349 · RAC: 0
I've seen a recent handful of errors on my GTX295, and I know a teammate of mine has seen a few also. TONI WUs (which I think are more computationally intensive) process fine, so I think our OC is OK. Are you seeing a higher failure rate on these WUs between last night and early this morning? Thanks - Steve
Joined: 10 Apr 08 · Posts: 254 · Credit: 16,836,000 · RAC: 0
Still happening? Could you post some of these failed results so I can double-check that they are right? Thanks.
Joined: 4 Apr 09 · Posts: 450 · Credit: 539,316,349 · RAC: 0
[quote]Still happening?[/quote] No, everything looks good now :-) Thanks - Steve
Joined: 6 Jun 08 · Posts: 152 · Credit: 328,250,382 · RAC: 0
16-3-2010 10:40:54 GPUGRID Restarting task p34-IBUCH_chall_pYEEI_100301-15-40-RND6745_1 using acemd version 671
16-3-2010 10:40:55 GPUGRID Restarting task p31-IBUCH_21_pYEEI_100301-13-40-RND4121_0 using acemd2 version 603
16-3-2010 10:58:32 GPUGRID Computation for task p31-IBUCH_21_pYEEI_100301-13-40-RND4121_0 finished
16-3-2010 10:58:32 GPUGRID Output file p31-IBUCH_21_pYEEI_100301-13-40-RND4121_0_1 for task p31-IBUCH_21_pYEEI_100301-13-40-RND4121_0 absent
16-3-2010 10:58:32 GPUGRID Output file p31-IBUCH_21_pYEEI_100301-13-40-RND4121_0_2 for task p31-IBUCH_21_pYEEI_100301-13-40-RND4121_0 absent
16-3-2010 10:58:32 GPUGRID Output file p31-IBUCH_21_pYEEI_100301-13-40-RND4121_0_3 for task p31-IBUCH_21_pYEEI_100301-13-40-RND4121_0 absent
16-3-2010 10:58:32 GPUGRID Starting p9-IBUCH_201_pYEEI_100301-13-40-RND6673_0
16-3-2010 10:58:34 GPUGRID Starting task p9-IBUCH_201_pYEEI_100301-13-40-RND6673_0 using acemd2 version 603
16-3-2010 11:29:43 GPUGRID Computation for task p9-IBUCH_201_pYEEI_100301-13-40-RND6673_0 finished
16-3-2010 11:29:43 GPUGRID Output file p9-IBUCH_201_pYEEI_100301-13-40-RND6673_0_1 for task p9-IBUCH_201_pYEEI_100301-13-40-RND6673_0 absent
16-3-2010 11:29:43 GPUGRID Output file p9-IBUCH_201_pYEEI_100301-13-40-RND6673_0_2 for task p9-IBUCH_201_pYEEI_100301-13-40-RND6673_0 absent
16-3-2010 11:29:43 GPUGRID Output file p9-IBUCH_201_pYEEI_100301-13-40-RND6673_0_3 for task p9-IBUCH_201_pYEEI_100301-13-40-RND6673_0 absent
16-3-2010 11:29:43 GPUGRID Starting a33-TONI_HERG79a-3-100-RND6672_0
16-3-2010 11:29:44 GPUGRID Starting task a33-TONI_HERG79a-3-100-RND6672_0 using acemd2 version 603

I am also using a GTX 295; both of these jobs were cancelled after 45 minutes on device 1. Yesterday 3 jobs out of 4 were cancelled after almost 5 hours of processing. I can use some help! See also the "errors after 7 hours" thread. Ton (ftpd) Netherlands
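A quick way to see how often this pattern occurs is to scan the BOINC event log for tasks whose output files were reported absent. Below is a minimal sketch only; the log filename and the exact message wording are assumptions taken from the excerpt above and may differ between BOINC versions and installations.

```python
# Minimal sketch: count GPUGRID tasks whose output files were reported absent.
# LOG_PATH is an assumption; point it at your BOINC event log (often stdoutdae.txt
# in the BOINC data directory, or a saved copy of the Manager's Messages tab).
import re
from collections import defaultdict

LOG_PATH = "stdoutdae.txt"

absent = defaultdict(int)   # task name -> number of "Output file ... absent" lines
finished = set()            # task names that reported "Computation ... finished"

with open(LOG_PATH, errors="replace") as log:
    for line in log:
        done = re.search(r"Computation for task (\S+) finished", line)
        if done:
            finished.add(done.group(1))
        miss = re.search(r"Output file \S+ for task (\S+) absent", line)
        if miss:
            absent[miss.group(1)] += 1

for task in sorted(finished):
    status = "outputs absent (likely failed)" if absent.get(task) else "ok"
    print(f"{task}: {status}")
```

Run against the log excerpt above, this would flag both p31-IBUCH_21_pYEEI_100301-13-40-RND4121_0 and p9-IBUCH_201_pYEEI_100301-13-40-RND6673_0 as having all of their output files missing.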
Joined: 6 Jun 08 · Posts: 152 · Credit: 328,250,382 · RAC: 0
Please help! Today, again, 4 out of 5 jobs were cancelled after more than 4 hours of processing. GTX 295 - Windows XP. Ton (ftpd) Netherlands
Joined: 6 Jun 08 · Posts: 152 · Credit: 328,250,382 · RAC: 0
Again today, 6 out of 6 were cancelled after 45 seconds. Windows XP - GTX 295 - driver 197.13. Also running: Windows XP - GTS 250 - driver 197.13 - no problems in about 10 hours. Any ideas? Ton (ftpd) Netherlands
Joined: 10 Apr 08 · Posts: 254 · Credit: 16,836,000 · RAC: 0
@ftpd I see all your errors, yes. Your case is one of the hardest to debug. All the WUs you received had already been started by somebody else, so it is not input file corruption; nor would corrupted inputs explain why they fail only after some execution time. We also see no major failure attributable to the application alone, at least. What I do observe in your case is that none of the other cards have such a high rate of failure with similar or even identical WUs and app versions. Have you considered that the source might be the card itself? What brand is the card? Can you monitor the temperature while it is running? Is that your video output card? Do you experience that sort of failure in other projects?
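On the temperature question, something like the sketch below can log GPU temperatures alongside the crunching so spikes can be matched against failure times. It assumes nvidia-smi is installed and the driver supports the --query-gpu interface (newer drivers); on a 2010-era Windows XP host a GUI monitor such as GPU-Z is the more practical option.

```python
# Minimal temperature-logging sketch (assumes nvidia-smi with --query-gpu support).
import subprocess
import time

def log_gpu_temps(interval_s: int = 30, samples: int = 120) -> None:
    """Print one timestamped line per sample with the temperature of every GPU."""
    for _ in range(samples):
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=index,temperature.gpu", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        print(time.strftime("%Y-%m-%d %H:%M:%S"), out.replace("\n", " | "))
        time.sleep(interval_s)

if __name__ == "__main__":
    log_gpu_temps()
```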
Joined: 6 Jun 08 · Posts: 152 · Credit: 328,250,382 · RAC: 0
Dear Ignasi, Since yesterday I also have problems with MilkyWay WUs on the same card. The temperature is OK, about 65 °C. With the 6.71 CUDA application there are no problems; only with acemd2? This computer is working 24/7 and nothing else uses the card (except for the monitor). The card is 6 months old. Regards, Ton. PS: Now processing device 1 = Collatz and device 0 = GPUGrid. Ton (ftpd) Netherlands
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
You may want to consider returning it to the seller or manufacturer (RTM) if it is under warranty. If you have tried it in more than one system with the same result, I think it is fair to say the issue is with the card. As you are now getting errors with other projects and the error rate is rising, the card might actually fail soon.
Joined: 4 Apr 09 · Posts: 450 · Credit: 539,316,349 · RAC: 0
Looks like this WU is bad ... p25-IBUCH_101b_pYEEI_100304-11-80-RND0419_5. I will be starting it in a few hours, so we'll see if the string of errors continues for this WU. Thanks - Steve
Joined: 6 Jun 08 · Posts: 152 · Credit: 328,250,382 · RAC: 0
Last night, on the same machine (GTX 295), 3 out of 4 were OK! Ton (ftpd) Netherlands
Joined: 10 Apr 08 · Posts: 254 · Credit: 16,836,000 · RAC: 0
[quote]looks like this WU is bad ...[/quote] Thanks Snow Crash, certainly some WUs seem to be condemned to die. We have been discussing it internally; it can be either that a result is corrupted by chance when saved/uploaded/etc., or that particular cards corrupt results from time to time. Anyway, please let us know if you detect any pattern of failure regarding 'condemned WUs'. cheers, i
Joined: 4 Apr 09 · Posts: 450 · Credit: 539,316,349 · RAC: 0
You guys do such a good job that I have not seen another "one off" WU error. I just finished my first *long* WU and it took precisely twice as long as the previous pYEEI WUs, which I bet is exactly what you planned. Excellent work. Can you tell us anything about the number of atoms and how much time these WUs model? Thanks - Steve
Joined: 10 Apr 08 · Posts: 254 · Credit: 16,836,000 · RAC: 0
[quote]Can you tell us anything about the number of atoms and how much time these WUs model?[/quote] Sure. These *long* WUs are exactly twice as long as the previous ones with similar names. They model exactly 5 nanoseconds (ns) of a ~36,000-atom system (*pYEEI* & *PQpYEEIPI*). In these systems we have a protein (our good old friend, the SH2 domain) and a ligand (phosphoTYR-GLU-GLU-ILE and PRO-GLN-phosphoTYR-GLU-GLU-ILE-PRO-ILE; amino acids) for which we are computing 'free energies of binding', basically the strength of their interaction. We wanted to increase the size for one main reason: our 'optimal' simulation time for analysis is currently no shorter than 5 ns. That means the waiting time is made up of a normal WU (2.5 ns) + queuing + a normal WU (2.5 ns), and this times 50, which is the number of WUs for one of these experiments. As you can see, the time-to-answer can vary greatly. With WUs twice as long, we omit that queuing time, and with the faster application it shouldn't be much of a hassle. However, it is still a test, and we want your feedback on them. thanks, ignasi
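As a rough illustration of the queuing argument above, the sketch below compares the time-to-answer for an experiment of 50 five-nanosecond segments delivered either as normal (2.5 ns) WUs with a queuing gap inside each segment, or as long (5 ns) WUs. The crunch-time and queuing figures are illustrative assumptions, not project numbers, and hand-off time between segments is ignored.

```python
# Back-of-the-envelope time-to-answer comparison; all inputs are illustrative guesses.
SEGMENTS = 50          # 5 ns segments per experiment (from the post above)
CRUNCH_NORMAL_H = 4.5  # assumed hours to crunch one 2.5 ns (normal) WU
QUEUE_H = 12.0         # assumed hours a finished WU waits before its successor is sent

# Normal WUs: each 5 ns segment = WU + queuing + WU.
normal_total_h = SEGMENTS * (2 * CRUNCH_NORMAL_H + QUEUE_H)
# Long WUs: each 5 ns segment = one WU of twice the crunch time, no mid-segment queue.
long_total_h = SEGMENTS * (2 * CRUNCH_NORMAL_H)

print(f"normal WUs: {normal_total_h / 24:.0f} days per experiment")
print(f"long WUs:   {long_total_h / 24:.0f} days per experiment")
```

With these guesses the long WUs remove about 25 days of pure queuing from each experiment; the real saving depends entirely on how long results actually sit in the queue.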
Joined: 4 Apr 09 · Posts: 450 · Credit: 539,316,349 · RAC: 0
Looking at 9 hours of processing on a current, state-of-the-art GPU to return 5 ns worth of simulated time puts into perspective just how important it is for all of us to pull together. I've read some of the papers you guys have published, and not that I can follow any of it, but I always knew you were working at the atomic level (seriously cool, you rock!). Also, knowing that with the normal size you ultimately need to put together 100 WUs back to back before you have anything that even makes sense to start looking at highlights why we need to turn these WUs around as quickly as possible. Best case scenario, you don't get a finished run for more than 3 months ... and that's best case. I imagine it is more common to have to wait at least 4 months. Running stable with a small cache will reduce the overall runtime of an experiment much more than any one GPU running fast. So everyone: get those cards stable, turn your cache down low, and keep on crunching! Thanks - Steve