KASHIF_HIVPR Errors?
| Author | Message |
|---|---|
|
Joined: 2 Jun 09 Posts: 10 Credit: 21,969,126 RAC: 0
|
|
|
Joined: 23 Feb 09 Posts: 39 Credit: 144,654,294 RAC: 0
|
I reported it 4 days ago for G92 cards (compute capability 1.1) like 9800GT, 8800 GT (G92)... http://www.gpugrid.net/forum_thread.php?id=2274 |
|
Joined: 24 Jan 09 Posts: 42 Credit: 16,676,387 RAC: 0
|
Here is another one: http://www.gpugrid.net/result.php?resultid=2935402 My card is a GTX 460. stderr out: <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # Using device 0 # There is 1 device supporting CUDA # Device 0: "GeForce GTX 460" # Clock rate: 1.55 GHz # Total amount of global memory: 804847616 bytes # Number of multiprocessors: 7 # Number of cores: 56 MDIO ERROR: cannot open file "restart.coor" </stderr_txt> ]]> |
|
Joined: 10 Apr 08 Posts: 254 Credit: 16,836,000 RAC: 0
|
What drivers are you using? |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
DigitalDingus is using two 9600 GSO (767MB) cards with driver 19745 (Q9450, XP x86). The failure times look random:

| Task | Work unit | Sent | Reported or deadline | Status | Run time (sec) | CPU time (sec) | Claimed credit | Granted credit | Application |
|---|---|---|---|---|---|---|---|---|---|
| 2935235 | 1870438 | 8 Sep 2010 7:06:12 UTC | 8 Sep 2010 7:32:16 UTC | Error while computing | 1,496.16 | 11.69 | 6,409.23 | --- | ACEMD2: GPU molecular dynamics v6.05 (cuda) |
| 2934119 | 1869838 | 8 Sep 2010 2:53:40 UTC | 8 Sep 2010 7:02:58 UTC | Error while computing | 14,446.09 | 23.11 | 6,409.23 | --- | ACEMD2: GPU molecular dynamics v6.05 (cuda) |
| 2934086 | 1869814 | 8 Sep 2010 1:40:10 UTC | 8 Sep 2010 2:53:40 UTC | Error while computing | 2,728.41 | 11.33 | 6,322.41 | --- | ACEMD2: GPU molecular dynamics v6.05 (cuda) |
| 2931920 | 1868719 | 7 Sep 2010 12:15:59 UTC | 8 Sep 2010 0:21:49 UTC | Error while computing | 20,453.97 | 14.77 | 6,322.41 | --- | ACEMD2: GPU molecular dynamics v6.05 (cuda) |
| 2930618 | 1868078 | 7 Sep 2010 4:36:49 UTC | 12 Sep 2010 4:36:49 UTC | In progress | --- | --- | --- | --- | ACEMD2: GPU molecular dynamics v6.05 (cuda) |
| 2930026 | 1867745 | 7 Sep 2010 4:03:02 UTC | 7 Sep 2010 4:36:49 UTC | Error while computing | 1,912.63 | 12.89 | 6,409.23 | --- | ACEMD2: GPU molecular dynamics v6.05 (cuda) |
| 2928799 | 1867124 | 6 Sep 2010 19:51:29 UTC | 6 Sep 2010 22:04:25 UTC | Error while computing | 7,864.14 | 8.88 | 6,016.70 | --- | ACEMD2: GPU molecular dynamics v6.05 (cuda) |
| 2928286 | 1866896 | 6 Sep 2010 15:19:01 UTC | 7 Sep 2010 18:40:38 UTC | Completed and validated | 72,823.73 | 1,372.66 | 4,535.61 | 5,669.51 | ACEMD2: GPU molecular dynamics v6.05 (cuda) |
| 2927745 | 1866582 | 6 Sep 2010 15:19:01 UTC | 6 Sep 2010 16:51:46 UTC | Error while computing | 5,424.13 | 41.77 | 6,016.70 | --- | ACEMD2: GPU molecular dynamics v6.05 (cuda) |
| 2925177 | 1865300 | 5 Sep 2010 21:53:33 UTC | 6 Sep 2010 15:19:01 UTC | Error while computing | 36,642.95 | 80.09 | 6,409.23 | --- | ACEMD2: GPU molecular dynamics v6.05 (cuda) |
| 2924932 | 1865162 | 5 Sep 2010 20:14:39 UTC | 6 Sep 2010 15:19:01 UTC | Error while computing | 42,419.78 | 43.20 | 6,322.41 | --- | ACEMD2: GPU molecular dynamics v6.05 (cuda) |

I would suggest you try the latest drivers, 25896 (258.96). If you keep getting failures, try to find out what else is running when these tasks crash (if anything).
Tapio, your task failed after 4sec GPU time. Some tasks seem to fail within 20sec. These are not very significant and do not reduce your contribution by much. Your card seems to be running well. |
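To put numbers on "the fail times look random", here is a minimal Python sketch that tallies the rows from the listing above. The task IDs, statuses and times are copied from that listing; reading the first numeric column as run time in seconds is an assumption, and `summarize` is just an illustrative helper name.

```python
from collections import Counter

# (task ID, status, run time in seconds) copied from the listing above.
# Treating the first numeric column as run time is an assumption.
TASKS = [
    (2935235, "Error while computing", 1496.16),
    (2934119, "Error while computing", 14446.09),
    (2934086, "Error while computing", 2728.41),
    (2931920, "Error while computing", 20453.97),
    (2930618, "In progress", None),
    (2930026, "Error while computing", 1912.63),
    (2928799, "Error while computing", 7864.14),
    (2928286, "Completed and validated", 72823.73),
    (2927745, "Error while computing", 5424.13),
    (2925177, "Error while computing", 36642.95),
    (2924932, "Error while computing", 42419.78),
]


def summarize(tasks):
    """Count tasks per status and return the sorted run times of the failed ones."""
    counts = Counter(status for _, status, _ in tasks)
    failed = sorted(rt for _, status, rt in tasks if status == "Error while computing")
    return counts, failed


counts, failed = summarize(TASKS)
print(dict(counts))
print(f"failed after {failed[0]:.0f} s to {failed[-1]:.0f} s of run time")
```

On these rows that is nine errors against one validated task, with failures anywhere from roughly 25 minutes to nearly 12 hours in, which fits the "random" description.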
|
Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0
|
@skgiven, I had the same problems with Windows XP Pro + GTS 250 + 258.96 driver after many hours of processing. See the other thread. Good luck, Ton (ftpd) Netherlands |
|
Joined: 2 Jun 09 Posts: 10 Credit: 21,969,126 RAC: 0
|
Will try the newer nVidia drivers, if any exist. Just upgraded to the latest BOINC in case it made a difference, but it did not. Other than that, I'll be crunching Collatz for a while I think. |
|
Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0
|
Driver 258.96 exists for this card. Please try it! Good luck Ton (ftpd) Netherlands |
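Driver versions appear in this thread both as NVIDIA's dotted form (258.96) and as the bare integer BOINC prints (25896, 19745, 25721). A small Python sketch for converting between them, assuming the integer is simply major × 100 + minor; `format_driver` is a hypothetical helper, not part of BOINC.

```python
def format_driver(boinc_version: int) -> str:
    """Turn a BOINC-style driver integer into NVIDIA's dotted form.

    Assumes the integer encodes major * 100 + minor.
    """
    major, minor = divmod(boinc_version, 100)
    return f"{major}.{minor:02d}"


for reported in (19745, 25721, 25896):
    print(reported, "->", format_driver(reported))
```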
Olivier Joined: 12 Jun 09 Posts: 1 Credit: 2,063,022 RAC: 0
|
Same problem here, unfortunately. There's something wrong with those KASHIF units ... |
|
Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0
|
@skgiven Hi Kev, Again, processing aborted after several hours (6). Windows XP Pro, GTS 250, driver 258.96. It also produces a Windows error message that waits for an answer, so nothing further was processed during the night. I do not like this kind of error. Please do not send these tasks to this type of GPU card anymore. Good luck. Ton (ftpd) Netherlands |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
The HIVPR_n1_bound tasks seem very troublesome on CC1.1 cards. I made suggestions to allow crunchers to opt out of crunching some task types. It would involve some work for the scientists on the project design and server layout. If GDF can get it implemented it would allow crunchers to deselect troublesome projects, which would make it useful for other problems too. Did an update try to automatically install on your system overnight? I think the issue primarily relates to crunching those tasks, and only occasionally appears for other tasks, so perhaps this can be worked around by the programmers; you managed to crunch two revlo_TRYP work units in the last couple of days, so the card is still a useful, working card. We just need you to crunch the good tasks for that type of card. |
|
Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0
|
The error from GPUGrid (HIVPR) causes a Windows error message, which was waiting for a reply (send or don't send to Microsoft). So all GPU tasks were waiting during the night. Keep on crunching! Ton (ftpd) Netherlands |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
I expect the Microsoft error was along the lines of "acemd2_6.05_windows_intelx86__cuda *32 has stopped working". If you are sitting at the computer and see this error message pop up, sometimes you can press the system restart button (on the computer case), and when it restarts the task is often able to pick up from the last checkpoint, so you don't lose the task. That would not work after a minute, never mind sometime overnight. I'm guessing you have already restarted the system. Do you know from the logs whether a system update occurred at the time of the error message (error logs), or whether some backup, defrag or other heavy CPU app ran, just in case something other than the task/driver is at fault here? |
|
Joined: 6 Jun 08 Posts: 152 Credit: 328,250,382 RAC: 0
|
Hi Kev, I use this machine only for crunching 24/7, so no backups, no updates, etc. Just GPUGrid and RNA or Ibercivis or FreeHAL. I do not have to restart this system. Good luck! Ton (ftpd) Netherlands |
|
Joined: 12 Feb 09 Posts: 57 Credit: 23,376,686 RAC: 0
|
I have the same problems with this card: NVIDIA GPU 0: GeForce 9600 GT (driver version 25721, CUDA version 3010, compute capability 1.1, 496MB, 218 GFLOPS peak). Here's an example: MDIO ERROR: cannot open file "restart.coor" SWAN : FATAL : Failure executing kernel sync [PmeRealSpace_compute_forces_kernel] [999] Assertion failed: 0, file swanlib_nv.cpp, line 121 This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Thanks for reporting the error. The same error has been posted several times now, and the developers are aware of it. A driver bug is catching out the applications when they run on CC1.1 cards. It does not always occur, but it is a concern. With long, complex GPU calculations the odd error is always expected, but these tasks are more problematic than others. Several suggestions and potential workarounds have been made. |
|
Joined: 23 Feb 09 Posts: 39 Credit: 144,654,294 RAC: 0
|
skgiven wrote: "I expect the Microsoft Error was along the lines of, ..." I did this trick several times over the last month (four 9800GT cards). A system restart without clicking away the error message pop-up mostly worked for me, even hours after the error happened. With the current KASHIF_HIVPR_*_bound* (*_unbound*) errors it never worked. |
Fred J. Verster Joined: 1 Apr 09 Posts: 58 Credit: 35,833,978 RAC: 0
|
Computer ID 78963 Report deadline 15 Sep 2010 15:54:10 UTC Run time 11402.593746 CPU time 736.2813 stderr out <core_client_version>6.10.58</core_client_version> <![CDATA[ <stderr_txt> # Using device 0 # There is 1 device supporting CUDA # Device 0: "GeForce GTX 480" # Clock rate: 1.40 GHz # Total amount of global memory: 1610153984 bytes # Number of multiprocessors: 15 # Number of cores: 120 MDIO ERROR: cannot open file "restart.coor" # Using device 0 # There is 1 device supporting CUDA # Device 0: "GeForce GTX 480" # Clock rate: 1.40 GHz # Total amount of global memory: 1610153984 bytes # Number of multiprocessors: 15 # Number of cores: 120 # Using device 0 # There is 1 device supporting CUDA # Device 0: "GeForce GTX 480" # Clock rate: 1.40 GHz # Total amount of global memory: 1610153984 bytes # Number of multiprocessors: 15 # Number of cores: 120 # Using device 0 # There is 1 device supporting CUDA # Device 0: "GeForce GTX 480" # Clock rate: 1.40 GHz # Total amount of global memory: 1610153984 bytes # Number of multiprocessors: 15 # Number of cores: 120 # Time per step (avg over 275000 steps): 11.463 ms # Approximate elapsed time for entire WU: 11462.898 s called boinc_finish </stderr_txt> ]]> Validate state Valid Claimed credit 6322.41203703704 Granted credit 9483.61805555556 application version ACEMD2: GPU molecular dynamics v6.11 (cuda31) With a 9800GTX+, it didn't work either. Knight Who Says Ni N! |
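The stderr outputs posted so far suggest which lines actually separate a crash from a good run: the `MDIO ERROR: cannot open file "restart.coor"` line shows up both in failed tasks and in the GTX 480 run above that validated, so on its own it does not seem to mark a failure, whereas `SWAN : FATAL` and `Assertion failed` only appear in the crashed 9600 GT task. A rough Python sketch of that reading follows; the marker strings are taken from the posts, while the function name, the returned keys and the idea of counting `# Using device` banners as restarts are assumptions for illustration.

```python
def classify_stderr(text: str) -> dict:
    """Rough triage of an ACEMD2 stderr dump based on markers seen in this thread."""
    crashed = "SWAN : FATAL" in text or "Assertion failed" in text
    finished = "called boinc_finish" in text
    # Assumption: each "# Using device" banner marks one (re)start of the app.
    starts = text.count("# Using device")
    return {
        "outcome": "crashed" if crashed else "finished" if finished else "unknown",
        "restarts": max(starts - 1, 0),
        "restart_coor_missing": 'cannot open file "restart.coor"' in text,
    }


# Shortened excerpts of two outputs posted in this thread.
GOOD = (
    "# Using device 0\n"
    'MDIO ERROR: cannot open file "restart.coor"\n'
    "# Using device 0\n# Using device 0\n# Using device 0\n"
    "called boinc_finish\n"
)
BAD = (
    'MDIO ERROR: cannot open file "restart.coor"\n'
    "SWAN : FATAL : Failure executing kernel sync "
    "[PmeRealSpace_compute_forces_kernel] [999]\n"
    "Assertion failed: 0, file swanlib_nv.cpp, line 121\n"
)

print(classify_stderr(GOOD))
print(classify_stderr(BAD))
```

A dump that contains only the `restart.coor` line, like the GTX 460 one earlier in the thread, comes back as "unknown": the stderr alone does not say why that task exited.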
|
Joined: 4 Apr 09 Posts: 450 Credit: 539,316,349 RAC: 0
|
Fred ... you posted results from a good run on a 480, and it does not look like you are even running a 9800 anymore, so I'm not sure where you were going with that. Thanks - Steve |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Fred used to have a GTX470, and is now using a GTX480. That task completed on his 480 but failed on a GTX460 (not a 9800GTX+). I did see a 9800 failure against one of his GTX470 successes. Fred, keep your good cards hooked up to GPUGrid; a GTX480 would be wasted anywhere else. |
©2025 Universitat Pompeu Fabra