Computation Error
| Author | Message |
|---|---|
|
Joined: 1 Jan 09 Posts: 1 Credit: 0 RAC: 0
|
I have tried to run the Nvidia client several times. I have 1 x 8800 GTX and 1 x 8800 GT, non-SLI (obviously), running under Vista 64-bit with the latest WHQL drivers. I have tried several BOINC Windows clients. The first time, the WU ran for several hours and then errored in the final few percent. Now every WU I run errors. How do I get this to work? Seti@Home runs great with CUDA, so why not GPUGrid? This is an example of what is posted against the WU: <core_client_version>6.6.12</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # Using CUDA device 0 # Device 0: "GeForce 8800 GTX" # Clock rate: 1350000 kilohertz # Total amount of global memory: 805306368 bytes # Number of multiprocessors: 16 # Number of cores: 128 # Device 1: "GeForce 8800 GT" # Clock rate: 1500000 kilohertz # Total amount of global memory: 268435456 bytes # Number of multiprocessors: 14 # Number of cores: 112 Cuda error in file 'nonbonded.cu' in line 189 : invalid device symbol. </stderr_txt> ]]> |
Stefan Ledwina Joined: 16 Jul 07 Posts: 464 Credit: 298,573,998 RAC: 0
|
See the GPU FAQ: "Overview of cards that run Cuda 2.0 compiled applications". The 8800 GTX is not supported by GPUGRID; the 8800 GT is supported. It seems BOINC switched between the two cards during computation of the task, and that's probably why it errored out. pixelicious.at - my little photoblog |
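As a rough illustration of how to read the stderr header in the first post, the Python sketch below picks out which device the application reports using. The sample text is abridged from that post; the function name `selected_device` is hypothetical, and the regular expressions assume the header format shown in this thread.

```python
import re

# Abridged from the stderr output quoted in the first post.
STDERR = (
    '# Using CUDA device 0 '
    '# Device 0: "GeForce 8800 GTX" '
    '# Device 1: "GeForce 8800 GT" '
    "Cuda error in file 'nonbonded.cu' in line 189 : invalid device symbol."
)

def selected_device(stderr_text):
    """Return (index, name) of the device the app says it uses, or None."""
    used = re.search(r"# Using CUDA device (\d+)", stderr_text)
    if used is None:
        return None
    index = int(used.group(1))
    # Map every listed device index to its quoted name.
    names = {
        int(i): name
        for i, name in re.findall(r'# Device (\d+): "([^"]+)"', stderr_text)
    }
    return index, names.get(index)

print(selected_device(STDERR))
```

For this output the result points at device 0, the 8800 GTX, which is the card the FAQ lists as unsupported.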
|
Joined: 1 Feb 09 Posts: 139 Credit: 575,023 RAC: 0
|
Well, I have the same error and I only have 1 card, so that's not it. Until now I never had an error on the CUDA applications, but today I got my first ever. <core_client_version>6.5.0</core_client_version> <![CDATA[ <message> Onjuiste functie. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # Using CUDA device 0 # Device 0: "GeForce 9600 GT" # Clock rate: 1800000 kilohertz # Total amount of global memory: 536543232 bytes # Number of multiprocessors: 8 # Number of cores: 64 MDIO ERROR: cannot open file "restart.coor" # Using CUDA device 0 # Device 0: "GeForce 9600 GT" # Clock rate: 1800000 kilohertz # Total amount of global memory: 536543232 bytes # Number of multiprocessors: 8 # Number of cores: 64 </stderr_txt> ]]> |
Stefan Ledwina Joined: 16 Jul 07 Posts: 464 Credit: 298,573,998 RAC: 0
|
Actually it's not the same error. ;) Yours is also "incorrect function (0x1) - exit code 1", but in Fatbob's stderr.out there's also: "Cuda error in file 'nonbonded.cu' in line 189 : invalid device symbol." pixelicious.at - my little photoblog |
|
Joined: 8 Oct 08 Posts: 15 Credit: 29,603,934 RAC: 0
|
Same problem on my new GTX260-216. <core_client_version>6.4.7</core_client_version> <![CDATA[ <message> Unzulässige Funktion. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # Using CUDA device 0 # Device 0: "GeForce GTX 260" # Clock rate: 1350000 kilohertz # Total amount of global memory: 939196416 bytes # Number of multiprocessors: 27 # Number of cores: 216 MDIO ERROR: cannot open file "restart.coor" </stderr_txt> ]]> BOINC client: 6.4.7 OS: MS Windows XP Pro/32Bit, SP3 (05.01.2600.00) Coprocessor: Gainward GeForce GTX260-216/895MB (620MHz Core, 1242MHz Shader Clock, 896MB 2200MHz GDDR3 Memory) Nvidia driver: 182.08 ALL WUs crashed on this machine. On the others, with GTX280 cards, there is currently no problem. The machine has no problems when playing games. |
|
Joined: 1 Feb 09 Posts: 139 Credit: 575,023 RAC: 0
|
OK, agreed, but now it seems every unit ends in an error right after it starts. Does anyone have any idea what is going on or how to fix it?! <core_client_version>6.5.0</core_client_version> <![CDATA[ <message> Onjuiste functie. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # Using CUDA device 0 # Device 0: "GeForce 9600 GT" # Clock rate: 1800000 kilohertz # Total amount of global memory: 536543232 bytes # Number of multiprocessors: 8 # Number of cores: 64 MDIO ERROR: cannot open file "restart.coor" </stderr_txt> ]]> |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
Reboot? And/or power the machine off and remove the power cord for >10 min. Your computers are hidden, so I can't check it myself. Did you post the entire error message or just part of it? The line "Onjuiste functie. (0x1) - exit code 1 (0x1)" is the general error category and doesn't tell us what's happening. For example, in fatbob's case "Cuda error in file 'nonbonded.cu' in line 189 : invalid device symbol." was the actual error message. MrS Scanning for our furry friends since Jan 2002 |
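To illustrate the point about the generic 0x1 line versus the actual error message, here is a minimal Python sketch that skips the `<message>` part of a task output and pulls the specific messages out of `<stderr_txt>`. The function name `specific_errors` and the pattern list are hypothetical; the patterns cover only the two messages quoted in this thread and are not a complete list of what the application can print.

```python
import re

# Abridged task output from the first post.
FATBOB = (
    "<message> Incorrect function. (0x1) - exit code 1 (0x1) </message> "
    '<stderr_txt> # Using CUDA device 0 # Device 0: "GeForce 8800 GTX" '
    "Cuda error in file 'nonbonded.cu' in line 189 : invalid device symbol. "
    "</stderr_txt>"
)

# Illustrative patterns for the two specific messages seen in this thread.
SPECIFIC = [
    re.compile(r"Cuda error in file '[^']+' in line \d+ : [^.#<]+\."),
    re.compile(r'MDIO ERROR: [^#<]+?"[^"]+"'),
]

def specific_errors(task_output):
    """Return the specific error messages found inside <stderr_txt>."""
    block = re.search(r"<stderr_txt>(.*?)</stderr_txt>", task_output, re.S)
    body = block.group(1) if block else task_output
    found = []
    for pattern in SPECIFIC:
        found.extend(match.strip() for match in pattern.findall(body))
    return found

print(specific_errors(FATBOB))
```

The generic "exit code 1 (0x1)" line never shows up in the result, because it sits in `<message>` rather than in `<stderr_txt>`.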
|
Joined: 1 Feb 09 Posts: 139 Credit: 575,023 RAC: 0
|
Changed the setting back to show :) I'll try a reboot and see what happens. |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
OK, you did post all relevant information and there's nothing else in the task output. But it was worth taking a look anyway. MrS Scanning for our furry friends since Jan 2002 |
|
Joined: 1 Feb 09 Posts: 139 Credit: 575,023 RAC: 0
|
I guess Stefan is right; I think that somehow it tried to switch to the nonbonded device. It's probably like switching between different computers. Meanwhile I took the advice, rebooted, and shut down my machine for a few minutes to see if that solves the issue. If so, we must reboot once every few days, probably because of memory leaks in the applications. Which application is not clear to me: it could be either one, or it could be the combination of GPUGrid and SETI. Strangely, I haven't had any problems on any previous units and never needed a reboot, other than about 2 months ago; my machine normally runs 24/7. Thanks for trying to help anyway, but I fear it's out of our hands now. PS: sadly no solution; the newly received unit crashed within 30 seconds. So it's something other than my machine or BOINC, since all other projects run without errors. |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
Normally we don't need to reboot for GPU-Grid; it runs just fine. But if *something* happened and WUs error out in a row, it could be that the PC went into some strange state (which the reboot / power off would solve). "I guess Stefan is right i think that somehow it tried to switch to the nonbonded device, its probably like switching between different computers" Sorry to tell you, but you're on the completely wrong track here. The error message involving file "nonbonded.cu" appeared because fatbob's BOINC tried to run GPU-Grid on a card which does not support the features it needs, i.e. the card cannot recognize some commands which the GPU-Grid team put into nonbonded.cu. This is not something you could trigger (even if you wanted to), or which just happens; it only happens if you use *incapable* hardware. That's why your errors are completely different from fatbob's, except for the general error code 0x1. MrS Scanning for our furry friends since Jan 2002 |
|
Joined: 1 Feb 09 Posts: 139 Credit: 575,023 RAC: 0
|
Lol, you misunderstood me. I probably stink at English, but you said exactly what I meant. As for my own problem with GPUGrid, I did something nasty :( I downclocked my GPU speeds drastically and will see what happens with the new units, now that my perfect record of running without errors is over. I think the admin from GPUGrid could not stand me being error-free on the runs ;) Anyway, no clue why it keeps crashing. If it crashes again I'll leave for a while for projects that run well, until the problems are solved. I don't want to spend another 24 hours and then get nothing because it errors out again. |
|
Joined: 1 Feb 09 Posts: 139 Credit: 575,023 RAC: 0
|
Not sure if anybody wants to know, but when I had both SETI and GPUGrid active as CUDA projects, GPUGrid crashed, and only 3 cores out of 4 were active in BOINC. I now have only GPUGrid active on CUDA and it seems to run without errors. So let's see if it stays running. |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
Did they try to run at the same time on your single card, or was it just that you had both projects activated and BOINC would switch between them normally? In that case we can say for sure that SETI is not leaving the machine in a "clean" state. MrS Scanning for our furry friends since Jan 2002 |
|
Joined: 1 Feb 09 Posts: 139 Credit: 575,023 RAC: 0
|
"Did they try to run at the same time on your single card or was it just that you had projects activated and BOINC would switch between them normally?" Well, after testing and removing SETI from my projects (also because I am not very happy with that project (probably fake results)), I must admit that GPUGrid runs nicely again, although I have to admit I also changed some hardware settings. First of all, I slowed down my DRAM: it was set to run at 4-4-4-12 CR1, but I fear that may be too fast, so I switched it back to CR2; I am not sure if this was needed. Second, I downclocked my video card to below this card's default OC settings and now run it at 100% stock speeds for the given 9600GT model. By default the card was overclocked by EVGA to 675 MHz. I also saw that sometimes 2 or 3 SETI CUDA units ran simultaneously with GPUGrid, but for a while this gave NO errors. After all, I have 4 cores ;) After I reset SETI CUDA and installed the KWSN optimized app, it ran 1 CUDA unit together with GPUGrid, also without any errors. But then suddenly all the errors appeared. I cleaned everything up and set it back to low/standard settings, and now the project runs "normal" (slow) again, finishing units like it should. I am not sure whether all of these changes were needed, because the weird thing is that it worked for weeks without problems and then all of a sudden everything started to fail. It's of course kind of hard to find the real culprit in these situations. Maybe the given units were nasty; I have no clue yet, but I am glad it runs normally again. Or maybe the damn Windows updates were ;) Who knows :D |
|
Joined: 1 Sep 08 Posts: 37 Credit: 5,864,088 RAC: 0
|
I have a new Win XP installation with all updates and the newest Nvidia driver for my GTX 295, but I can't complete any WU with GPUGrid. I tried resetting the project and using the GTX in SLI and in two-core mode... nothing works. It's the same error every time: MDIO ERROR: cannot open file "restart.coor" <core_client_version>6.6.15</core_client_version> <![CDATA[ <message> Unzulässige Funktion. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # Using CUDA device 0 # Device 0: "GeForce GTX 295" # Clock rate: 1242000 kilohertz # Total amount of global memory: 939261952 bytes # Number of multiprocessors: 30 # Number of cores: 240 # Device 1: "GeForce GTX 295" # Clock rate: 1242000 kilohertz # Total amount of global memory: 939196416 bytes # Number of multiprocessors: 30 # Number of cores: 240 MDIO ERROR: cannot open file "restart.coor" </stderr_txt> ]]> The GTX is OK; in a Vista system I can complete the WUs. Can you help me with this problem, please? |
|
Joined: 8 Sep 08 Posts: 63 Credit: 1,696,957,181 RAC: 0
|
This is actually not a problem. The message [MDIO ERROR: cannot open file "restart.coor"] always occurs at the start of a new WU, for the simple reason that the WU has to start from scratch and cannot fall back on a previously saved restart.coor, as it would after a shutdown and restart of the PC in the middle of crunching a WU. Your machine also reports two devices, as it should for a GTX 295, so two GPUGRID WUs will be running together. Please check your result status and you should see that everything is fine. PS: if you want any more help, try unhiding your PCs so that fellow crunchers can have a look at them. Kind regards. Alain |
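The restart behaviour described above follows a common checkpoint pattern, sketched here in Python. This is not GPUGRID's actual code: the function name `load_start_state` and the returned labels are hypothetical, and only the file name `restart.coor` comes from the thread.

```python
import os
import tempfile

def load_start_state(workdir, restart_name="restart.coor"):
    """Return ("resumed", data) if a checkpoint exists, else ("fresh", None)."""
    path = os.path.join(workdir, restart_name)
    try:
        with open(path, "rb") as handle:
            return "resumed", handle.read()
    except FileNotFoundError:
        # Expected on the first start of a work unit: nothing was saved yet,
        # so the message is informational rather than fatal.
        print(f'note: cannot open file "{restart_name}", starting from scratch')
        return "fresh", None

with tempfile.TemporaryDirectory() as workdir:
    first = load_start_state(workdir)    # no checkpoint yet
    with open(os.path.join(workdir, "restart.coor"), "wb") as handle:
        handle.write(b"checkpoint")      # simulate a saved checkpoint
    second = load_start_state(workdir)   # checkpoint found

print(first, second)
```

Under this pattern a missing restart file on its own does not explain a failed task; the cause has to be found elsewhere in the output.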
|
Joined: 1 Sep 08 Posts: 37 Credit: 5,864,088 RAC: 0
|
Hi Alain, this is the machine: http://www.gpugrid.net/results.php?hostid=29092 It's unhidden now. The PC runs the whole day; there is no restart. Sometimes the error comes a few seconds after a new WU starts, sometimes after a few hours of work... The result ends with a client and computing error... Hope you can help me a little more... Kind regards Joe PS: Here are some facts about the machine: 13.03.2009 14:37:09 Starting BOINC client version 6.6.15 for windows_intelx86 13.03.2009 14:37:09 log flags: task, file_xfer, sched_ops 13.03.2009 14:37:09 Libraries: libcurl/7.19.4 OpenSSL/0.9.8j zlib/1.2.3 13.03.2009 14:37:09 Data directory: D:\BOINC\Data 13.03.2009 14:37:09 Running under account Jörg 13.03.2009 14:37:09 Milkyway@home Found app_info.xml; using anonymous platform 13.03.2009 14:37:09 SETI@home Found app_info.xml; using anonymous platform 13.03.2009 14:37:09 Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz [x86 Family 6 Model 15 Stepping 11] 13.03.2009 14:37:09 Processor features: fpu tsc sse sse2 mmx 13.03.2009 14:37:09 OS: Microsoft Windows XP: Professional x86 Editon, Service Pack 3, (05.01.2600.00) 13.03.2009 14:37:09 Memory: 2.00 GB physical, 3.85 GB virtual 13.03.2009 14:37:09 Disk: 107.42 GB total, 100.79 GB free 13.03.2009 14:37:09 Local time is UTC +1 hours 13.03.2009 14:37:09 CUDA device: GeForce GTX 295 (driver version 18208, CUDA version 1.3, 896MB, est. 106GFLOPS) 13.03.2009 14:37:09 Not using a proxy 13.03.2009 14:37:09 GPUGRID URL: http://www.gpugrid.net/; Computer ID: 29092; location: (none); project prefs: default 13.03.2009 14:37:09 Reading preferences override file 13.03.2009 14:37:09 Preferences limit memory usage when active to 1023.46MB 13.03.2009 14:37:09 Preferences limit memory usage when idle to 1842.23MB 13.03.2009 14:37:09 Preferences limit disk usage to 53.71GB ... 13.03.2009 14:45:13 GPUGRID Sending scheduler request: To fetch work.
13.03.2009 14:45:13 GPUGRID Requesting new tasks 13.03.2009 14:45:41 GPUGRID Computation for task sM24328-SH2_US_8-0-10-SH2_US_8620000_0 finished 13.03.2009 14:45:41 GPUGRID Output file sM24328-SH2_US_8-0-10-SH2_US_8620000_0_1 for task sM24328-SH2_US_8-0-10-SH2_US_8620000_0 absent 13.03.2009 14:45:41 GPUGRID Output file sM24328-SH2_US_8-0-10-SH2_US_8620000_0_2 for task sM24328-SH2_US_8-0-10-SH2_US_8620000_0 absent 13.03.2009 14:45:41 GPUGRID Output file sM24328-SH2_US_8-0-10-SH2_US_8620000_0_3 for task sM24328-SH2_US_8-0-10-SH2_US_8620000_0 absent ... |
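The "Output file ... absent" lines in this log are what mark the task as failed on the client side. A minimal Python sketch for collecting them per task follows; the function name `absent_outputs` is hypothetical, the sample is abridged from the log above, and the pattern assumes the message wording shown there.

```python
import re
from collections import defaultdict

# Abridged from the BOINC log quoted above.
LOG = """\
13.03.2009 14:45:41 GPUGRID Computation for task sM24328-SH2_US_8-0-10-SH2_US_8620000_0 finished
13.03.2009 14:45:41 GPUGRID Output file sM24328-SH2_US_8-0-10-SH2_US_8620000_0_1 for task sM24328-SH2_US_8-0-10-SH2_US_8620000_0 absent
13.03.2009 14:45:41 GPUGRID Output file sM24328-SH2_US_8-0-10-SH2_US_8620000_0_2 for task sM24328-SH2_US_8-0-10-SH2_US_8620000_0 absent
13.03.2009 14:45:41 GPUGRID Output file sM24328-SH2_US_8-0-10-SH2_US_8620000_0_3 for task sM24328-SH2_US_8-0-10-SH2_US_8620000_0 absent
"""

ABSENT = re.compile(r"Output file (\S+) for task (\S+) absent")

def absent_outputs(log_text):
    """Map each task name to the list of output files reported absent."""
    missing = defaultdict(list)
    for file_name, task_name in ABSENT.findall(log_text):
        missing[task_name].append(file_name)
    return dict(missing)

for task, files in absent_outputs(LOG).items():
    print(task, "is missing", len(files), "output files")
```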
|
Joined: 8 Sep 08 Posts: 63 Credit: 1,696,957,181 RAC: 0
|
OK, there is indeed a problem. All your results have the error code [Unzulässige Funktion. (0x1) - exit code 1 (0x1)]. Unfortunately this tells us little and gives no real clues. Things worth trying in such cases: 1. Check the version of your video driver. Make sure you have the latest one, currently 180.29 if I am not mistaken. 2. Verify your GPU temperature. 3. Did you overclock your video card? If so, try easing back. 4. Also, did you try a simple restart? Hope one of these helps; sorry I cannot be more specific. Kind regards Alain |
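For the driver check in item 1, the BOINC startup log posted earlier in the thread reports "driver version 18208". The Python sketch below assumes that this integer encodes the dotted version with the last two digits as the minor part (18208 read as 182.08); that reading is an assumption, and both function names are hypothetical.

```python
def boinc_driver_to_dotted(raw):
    """Convert e.g. 18208 to "182.08" (assumed encoding, see note above)."""
    return f"{raw // 100}.{raw % 100:02d}"

def is_at_least(installed, required):
    """Compare dotted versions numerically, part by part."""
    def as_tuple(version):
        return tuple(int(part) for part in version.split("."))
    return as_tuple(installed) >= as_tuple(required)

installed = boinc_driver_to_dotted(18208)
print(installed, is_at_least(installed, "180.29"))
```

If the assumed encoding holds, the driver on that machine is already at least as new as 180.29, which would leave temperature, overclocking, and a restart as the remaining items to check.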
|
Joined: 25 Nov 08 Posts: 51 Credit: 980,186 RAC: 0
|
I'd add another question: 5. Do you have GPU tasks ticked in the SETI options on your account at SETI? The default is yes. In theory it should be possible to get GPU projects to share resources with other GPU projects on the same machine, but I've not been able to get that to work yet. Phoneman1 |
©2025 Universitat Pompeu Fabra