KASHIF_HIVPR Errors?

DigitalDingus

Message 18563 - Posted: 8 Sep 2010, 2:58:53 UTC

I've had several of these give an Error While Computing. Anyone else? These WUs seem to be estimated at almost twice my usual computing time.
Siegfried Niklas
Message 18564 - Posted: 8 Sep 2010, 7:59:04 UTC

I reported it 4 days ago for G92 cards (compute capability 1.1) such as the 9800 GT and 8800 GT:

http://www.gpugrid.net/forum_thread.php?id=2274
Old man

Message 18565 - Posted: 8 Sep 2010, 8:21:34 UTC

Here is another one:

http://www.gpugrid.net/result.php?resultid=2935402

My card is a GTX 460.

stderr out

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 460"
# Clock rate: 1.55 GHz
# Total amount of global memory: 804847616 bytes
# Number of multiprocessors: 7
# Number of cores: 56
MDIO ERROR: cannot open file "restart.coor"

</stderr_txt>
]]>
ignasi

Message 18566 - Posted: 8 Sep 2010, 8:49:56 UTC - in response to Message 18565.  

What drivers are you using?
skgiven
Volunteer moderator
Message 18567 - Posted: 8 Sep 2010, 9:38:02 UTC - in response to Message 18566.  
Last modified: 8 Sep 2010, 9:48:28 UTC

DigitalDingus is using two 9600 GSO (767MB) cards with driver 19745, i.e. 197.45 (Q9450, XP x86).
The fail times look random:

Task ID | Work unit ID | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Claimed credit | Granted credit | Application
2935235 | 1870438 | 8 Sep 2010 7:06:12 UTC | 8 Sep 2010 7:32:16 UTC | Error while computing | 1,496.16 | 11.69 | 6,409.23 | --- | ACEMD2: GPU molecular dynamics v6.05 (cuda)
2934119 | 1869838 | 8 Sep 2010 2:53:40 UTC | 8 Sep 2010 7:02:58 UTC | Error while computing | 14,446.09 | 23.11 | 6,409.23 | --- | ACEMD2: GPU molecular dynamics v6.05 (cuda)
2934086 | 1869814 | 8 Sep 2010 1:40:10 UTC | 8 Sep 2010 2:53:40 UTC | Error while computing | 2,728.41 | 11.33 | 6,322.41 | --- | ACEMD2: GPU molecular dynamics v6.05 (cuda)
2931920 | 1868719 | 7 Sep 2010 12:15:59 UTC | 8 Sep 2010 0:21:49 UTC | Error while computing | 20,453.97 | 14.77 | 6,322.41 | --- | ACEMD2: GPU molecular dynamics v6.05 (cuda)
2930618 | 1868078 | 7 Sep 2010 4:36:49 UTC | 12 Sep 2010 4:36:49 UTC | In progress | --- | --- | --- | --- | ACEMD2: GPU molecular dynamics v6.05 (cuda)
2930026 | 1867745 | 7 Sep 2010 4:03:02 UTC | 7 Sep 2010 4:36:49 UTC | Error while computing | 1,912.63 | 12.89 | 6,409.23 | --- | ACEMD2: GPU molecular dynamics v6.05 (cuda)
2928799 | 1867124 | 6 Sep 2010 19:51:29 UTC | 6 Sep 2010 22:04:25 UTC | Error while computing | 7,864.14 | 8.88 | 6,016.70 | --- | ACEMD2: GPU molecular dynamics v6.05 (cuda)
2928286 | 1866896 | 6 Sep 2010 15:19:01 UTC | 7 Sep 2010 18:40:38 UTC | Completed and validated | 72,823.73 | 1,372.66 | 4,535.61 | 5,669.51 | ACEMD2: GPU molecular dynamics v6.05 (cuda)
2927745 | 1866582 | 6 Sep 2010 15:19:01 UTC | 6 Sep 2010 16:51:46 UTC | Error while computing | 5,424.13 | 41.77 | 6,016.70 | --- | ACEMD2: GPU molecular dynamics v6.05 (cuda)
2925177 | 1865300 | 5 Sep 2010 21:53:33 UTC | 6 Sep 2010 15:19:01 UTC | Error while computing | 36,642.95 | 80.09 | 6,409.23 | --- | ACEMD2: GPU molecular dynamics v6.05 (cuda)
2924932 | 1865162 | 5 Sep 2010 20:14:39 UTC | 6 Sep 2010 15:19:01 UTC | Error while computing | 42,419.78 | 43.20 | 6,322.41 | --- | ACEMD2: GPU molecular dynamics v6.05 (cuda)
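
As a rough check of the "fail times look random" remark, here is a minimal Python sketch that summarizes the run times of the nine failed tasks listed above and compares them with the one validated task. The numbers are copied from the list; the function name `summarize` and the choice of statistics are only illustrative assumptions, not anything the project itself uses.

```python
from statistics import median

# Run times (seconds) of the nine "Error while computing" rows listed above.
failed_run_times = [
    1496.16, 14446.09, 2728.41, 20453.97, 1912.63,
    7864.14, 5424.13, 36642.95, 42419.78,
]
# Run time (seconds) of the one task that completed and validated.
validated_run_time = 72823.73


def summarize(times, reference):
    """Return count, min, median, max, and both bounds as a fraction of reference."""
    return {
        "count": len(times),
        "min": min(times),
        "median": median(times),
        "max": max(times),
        "min_fraction": min(times) / reference,
        "max_fraction": max(times) / reference,
    }


summary = summarize(failed_run_times, validated_run_time)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```

The failures are spread from a few percent to more than half of the validated task's run time, which fits the impression that there is no fixed point at which these tasks die.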

I would suggest you try the latest drivers, 258.96. If you keep getting failures, try to find out what else (if anything) is running when these tasks crash.

Tapio, your task failed after 4 sec of GPU time. Some tasks seem to fail within 20 sec. These failures are not very significant and do not reduce your contribution by much. Your card seems to be running well.
ftpd

Message 18568 - Posted: 8 Sep 2010, 10:43:59 UTC - in response to Message 18567.  

@skgiven,

I had the same problems with Windows XP Pro + GTS 250 + driver 258.96 after many hours of processing. See the other thread.

Success
Ton (ftpd) Netherlands
DigitalDingus

Message 18572 - Posted: 8 Sep 2010, 13:53:11 UTC - in response to Message 18568.  
Last modified: 8 Sep 2010, 13:54:16 UTC

Will try the newer NVIDIA drivers, if any exist. Just upgraded to the latest BOINC in case it made a difference, but it did not. Other than that, I'll be crunching Collatz for a while, I think.
ftpd

Message 18573 - Posted: 8 Sep 2010, 16:22:43 UTC - in response to Message 18572.  

Driver 258.96 exists for this card.
Please try it!

Good luck
Ton (ftpd) Netherlands
Olivier
Message 18588 - Posted: 9 Sep 2010, 18:33:01 UTC - in response to Message 18563.  

Same problem here, unfortunately. There's something wrong with those KASHIF units ...
ftpd

Message 18602 - Posted: 10 Sep 2010, 8:57:20 UTC

@skgiven

Hi Kev,

Again, processing aborted after several (6) hours. Windows XP Pro, GTS 250, driver 258.96.
It also raises a Windows error message that waits for an answer, so there was no further processing during the night. I do not like this kind of error. Please do not send these tasks to this type of GPU card anymore.
Good luck.


Ton (ftpd) Netherlands
skgiven
Volunteer moderator
Message 18605 - Posted: 10 Sep 2010, 10:29:07 UTC - in response to Message 18602.  

The HIVPR_n1_bound tasks seem very troublesome on CC1.1 cards. I made suggestions to allow crunchers to opt out of crunching some task types. It would involve some work for the scientists on the project design and server layout. If GDF can get it implemented it would allow crunchers to deselect troublesome projects, which would make it useful for other problems too.
Did an update try to automatically install on your system overnight?
I think the issue primarily relates to crunching those tasks, and only occasionally appears for other tasks, so perhaps the programmers can work around it. You managed to crunch two revlo_TRYP work units in the last couple of days, so the card is still a useful, working card. We just need you to crunch the tasks that run well on that type of card.
ftpd

Message 18607 - Posted: 10 Sep 2010, 11:21:03 UTC - in response to Message 18605.  

The error from GPUGrid (HIVPR) caused a Windows error message, which was waiting for a reply (send or don't send to Microsoft). So all GPU tasks were waiting during the night.
Keep on crunching!
Ton (ftpd) Netherlands
skgiven
Volunteer moderator
Message 18608 - Posted: 10 Sep 2010, 14:39:07 UTC - in response to Message 18607.  

I expect the Microsoft error was along the lines of,
acemd2_6.05_windows_intelx86__cuda *32 has stopped working.
If you are sitting at the computer and see this error message pop up, you can sometimes press the system restart button (on the computer case); when the system restarts, the task is often able to pick up from the last checkpoint, so you don't lose the task. That would not work after a minute, never mind sometime overnight.
I'm guessing you have already restarted the system.

Do you know from the logs whether a system update occurred at the time of the error message, or whether some backup, defrag or other heavy CPU app ran? Just in case something other than the task/driver is at fault here.
ftpd

Message 18609 - Posted: 10 Sep 2010, 14:47:29 UTC - in response to Message 18608.  

Hi Kev,

I use this machine only for crunching 24/7, so no backups, no updates, etc.
Just GPUGrid and RNA or Ibercivis or FreeHAL. I do not have to restart this system.

Success!
Ton (ftpd) Netherlands
Tom Philippart

Message 18624 - Posted: 11 Sep 2010, 10:15:18 UTC

I have the same problems with this card:

NVIDIA GPU 0: GeForce 9600 GT (driver version 25721, CUDA version 3010, compute capability 1.1, 496MB, 218 GFLOPS peak)

here's an example:
MDIO ERROR: cannot open file "restart.coor"
SWAN : FATAL : Failure executing kernel sync [PmeRealSpace_compute_forces_kernel] [999]
Assertion failed: 0, file swanlib_nv.cpp, line 121

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
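
For anyone collecting these logs, a minimal Python sketch for spotting the messages quoted in this thread might look like the following. The patterns are copied from the stderr excerpts above; the names `SIGNATURES`, `scan_stderr` and `looks_fatal` are hypothetical, and treating the SWAN and assertion lines as the fatal ones is an assumption. The `restart.coor` message appears in both failed and successful runs quoted in this thread, so by itself it does not seem to distinguish them.

```python
import re

# Patterns copied from the stderr excerpts quoted in this thread.
SIGNATURES = {
    "missing_restart_file": re.compile(r'MDIO ERROR: cannot open file "restart\.coor"'),
    "swan_kernel_failure": re.compile(r"SWAN : FATAL : Failure executing kernel sync \[(\w+)\]"),
    "assertion_failed": re.compile(r"Assertion failed:"),
    "finished": re.compile(r"called boinc_finish"),
}


def scan_stderr(text):
    """Return the set of signature names found in a stderr dump."""
    return {name for name, pattern in SIGNATURES.items() if pattern.search(text)}


def looks_fatal(found):
    # Assumption: only the SWAN kernel failure and the assertion count as fatal.
    return bool(found & {"swan_kernel_failure", "assertion_failed"})


sample = '''MDIO ERROR: cannot open file "restart.coor"
SWAN : FATAL : Failure executing kernel sync [PmeRealSpace_compute_forces_kernel] [999]
Assertion failed: 0, file swanlib_nv.cpp, line 121
'''

found = scan_stderr(sample)
print(sorted(found))
print("fatal:", looks_fatal(found))

# Name of the kernel that failed, taken from the SWAN line.
kernel = SIGNATURES["swan_kernel_failure"].search(sample).group(1)
print("kernel:", kernel)
```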

skgiven
Volunteer moderator
Message 18628 - Posted: 11 Sep 2010, 11:01:43 UTC - in response to Message 18624.  

Thanks for reporting the error. The same error has been posted several times now, and the developers are aware of it.
A driver bug is catching out the applications when they run on CC1.1 cards. It does not always occur but is a concern. With long, complex GPU calculations the odd error is always expected, but these tasks are more problematic than others.
Several suggestions and potential workarounds have been made.
Siegfried Niklas
Message 18630 - Posted: 11 Sep 2010, 13:51:24 UTC - in response to Message 18608.  

skgiven wrote:
I expect the Microsoft error was along the lines of,
acemd2_6.05_windows_intelx86__cuda *32 has stopped working.
If you are sitting at the computer and see this error message pop up, you can sometimes press the system restart button (on the computer case); when the system restarts, the task is often able to pick up from the last checkpoint, so you don't lose the task. That would not work after a minute, never mind sometime overnight.


I did this trick several times over the last month (four 9800GT cards).
Restarting the system without clicking away the error message pop-up mostly worked for me, even hours after the error happened.

With the current KASHIF_HIVPR_*_bound* (*_unbound*) errors it never worked.
Fred J. Verster
Message 18632 - Posted: 11 Sep 2010, 20:53:51 UTC - in response to Message 18630.  

Computer ID 78963
Report deadline 15 Sep 2010 15:54:10 UTC
Run time 11402.593746
CPU time 736.2813
stderr out

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 480"
# Clock rate: 1.40 GHz
# Total amount of global memory: 1610153984 bytes
# Number of multiprocessors: 15
# Number of cores: 120
MDIO ERROR: cannot open file "restart.coor"
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 480"
# Clock rate: 1.40 GHz
# Total amount of global memory: 1610153984 bytes
# Number of multiprocessors: 15
# Number of cores: 120
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 480"
# Clock rate: 1.40 GHz
# Total amount of global memory: 1610153984 bytes
# Number of multiprocessors: 15
# Number of cores: 120
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 480"
# Clock rate: 1.40 GHz
# Total amount of global memory: 1610153984 bytes
# Number of multiprocessors: 15
# Number of cores: 120
# Time per step (avg over 275000 steps): 11.463 ms
# Approximate elapsed time for entire WU: 11462.898 s
called boinc_finish

</stderr_txt>
]]>

Validate state Valid
Claimed credit 6322.41203703704
Granted credit 9483.61805555556
application version ACEMD2: GPU molecular dynamics v6.11 (cuda31)
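
The numbers in this task report can be cross-checked with a little arithmetic. The sketch below is only an illustration: the helper `count_starts` is hypothetical, and reading each "# Using device" header as one application start is an assumption about how the log is written. The step count and the credit ratio are derived from the figures quoted above, not stated by the project.

```python
import re


def count_starts(stderr_text):
    """Count '# Using device N' headers, assumed to be printed once per start."""
    return len(re.findall(r"^# Using device \d+", stderr_text, flags=re.MULTILINE))


# Four headers, as in the stderr output quoted above (other lines omitted).
stderr_excerpt = "\n".join(["# Using device 0"] * 4 + ["called boinc_finish"])
starts = count_starts(stderr_excerpt)
print("starts:", starts, "restarts:", starts - 1)

# Figures copied from the report above.
time_per_step_ms = 11.463
estimated_total_s = 11462.898
implied_steps = estimated_total_s / (time_per_step_ms / 1000.0)
print(f"implied steps for the whole WU: about {implied_steps:,.0f}")

claimed_credit = 6322.41203703704
granted_credit = 9483.61805555556
credit_ratio = granted_credit / claimed_credit
print(f"granted / claimed credit: {credit_ratio:.3f}")
```

Under these assumptions the work unit comes out at roughly one million steps, and the estimated total agrees well with the reported run time of about 11,400 s.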


With a 9800GTX+, it didn't work either.


Knight Who Says Ni N!
Snow Crash

Message 18636 - Posted: 12 Sep 2010, 10:15:59 UTC - in response to Message 18632.  

Fred ... you posted results from a good run on a 480, and it does not look like you are even running a 9800 anymore, so I'm not sure where you were going with that.
Thanks - Steve
skgiven
Volunteer moderator
Message 18637 - Posted: 12 Sep 2010, 11:49:15 UTC - in response to Message 18636.  

Fred used to have a GTX470 and is now using a GTX480. That task completed on his 480 but failed on a GTX460 (not a 9800GTX+). I did see a 9800 failure against one of his GTX470 successes.

Fred, keep your good cards hooked up to GPUGrid, a GTX480 would be wasted anywhere else.

©2025 Universitat Pompeu Fabra