Computation Error

Fatbob

Message 7267 - Posted: 8 Mar 2009, 12:34:39 UTC
Last modified: 8 Mar 2009, 12:45:07 UTC

I have tried to run the Nvidia client several times.

I have 1 x 8800 GTX and 1 x 8800 GT, non-SLI (obviously),
running under Vista 64-bit with the latest WHQL drivers.

I have tried several BOINC Windows clients.

The first time, the WU ran for several hours and then errored in the final few percent. Now every WU I run errors.

How do I get this to work?
SETI@home runs great with CUDA, so why not GPUGRID?

This is an example of what is posted against the WU:

<core_client_version>6.6.12</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce 8800 GTX"
# Clock rate: 1350000 kilohertz
# Total amount of global memory: 805306368 bytes
# Number of multiprocessors: 16
# Number of cores: 128
# Device 1: "GeForce 8800 GT"
# Clock rate: 1500000 kilohertz
# Total amount of global memory: 268435456 bytes
# Number of multiprocessors: 14
# Number of cores: 112
Cuda error in file 'nonbonded.cu' in line 189 : invalid device symbol.

</stderr_txt>
]]>

Stefan Ledwina
Message 7269 - Posted: 8 Mar 2009, 12:50:54 UTC - in response to Message 7267.  

GPU FAQ: Overview of cards that run Cuda 2.0 compiled applications
The 8800GTX is not supported by GPUGRID - the 8800GT is supported...
It seems BOINC switched between the two cards during computation of the task and that's probably why it errored out...

pixelicious.at - my little photoblog

uBronan
Message 7271 - Posted: 8 Mar 2009, 13:40:33 UTC

Well, I have the same error and I only have 1 card, so that's not it.
Until now I never had an error on the CUDA applications, but today I got my first ever.

<core_client_version>6.5.0</core_client_version>
<![CDATA[
<message>
Onjuiste functie. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce 9600 GT"
# Clock rate: 1800000 kilohertz
# Total amount of global memory: 536543232 bytes
# Number of multiprocessors: 8
# Number of cores: 64
MDIO ERROR: cannot open file "restart.coor"
# Using CUDA device 0
# Device 0: "GeForce 9600 GT"
# Clock rate: 1800000 kilohertz
# Total amount of global memory: 536543232 bytes
# Number of multiprocessors: 8
# Number of cores: 64

</stderr_txt>
]]>

Stefan Ledwina
Message 7274 - Posted: 8 Mar 2009, 14:05:19 UTC - in response to Message 7271.  

Actually it's not the same error. ;)

Yours is also "Incorrect function (0x1) - exit code 1", but Fatbob's stderr output also contains:
Cuda error in file 'nonbonded.cu' in line 189 : invalid device symbol.



[boinc.at] Fireman69
Message 7275 - Posted: 8 Mar 2009, 14:52:32 UTC

Same problem on my new GTX260-216.

<core_client_version>6.4.7</core_client_version>
<![CDATA[
<message>
Unzulässige Funktion. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1350000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 27
# Number of cores: 216
MDIO ERROR: cannot open file "restart.coor"

</stderr_txt>
]]>

BOINC client: 6.4.7
OS: MS Windows XP Pro/32Bit, SP3 (05.01.2600.00)
Coprocessor: Gainward GeForce GTX260-216/895MB (620MHz Core, 1242MHz Shader Clock, 896MB 2200MHz GDDR3 Memory)
Nvidia driver: 182.08

All WUs crash on this machine. On my other machines with a GTX 280 there is currently no problem. The machine has no problems when playing games.

uBronan
Message 7277 - Posted: 8 Mar 2009, 16:14:43 UTC

OK, agreed, but it seems that now every unit ends in an error right after it starts. Does anyone have any idea what is going on, or how to fix it?

<core_client_version>6.5.0</core_client_version>
<![CDATA[
<message>
Onjuiste functie. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce 9600 GT"
# Clock rate: 1800000 kilohertz
# Total amount of global memory: 536543232 bytes
# Number of multiprocessors: 8
# Number of cores: 64
MDIO ERROR: cannot open file "restart.coor"

</stderr_txt>
]]>

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 7278 - Posted: 8 Mar 2009, 16:45:45 UTC - in response to Message 7277.  

Reboot? And/or power the machine off and remove the power cord for >10 min.

Your computers are hidden, so I can't check it myself. Did you post the entire error message or just part of it? The line "Onjuiste functie. (0x1) - exit code 1 (0x1)" (Dutch for "Incorrect function") is only the general error category and doesn't tell us what's happening. For example, in Fatbob's case "Cuda error in file 'nonbonded.cu' in line 189 : invalid device symbol." was the actual error message.
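
To illustrate the distinction between the general error category and the actual error message, here is a minimal Python sketch that filters a pasted task output down to the lines carrying a real diagnostic. The function name `diagnostic_lines` and the regular expression are assumptions made for this example; they are not part of BOINC or GPUGRID, and the pattern is only derived from the outputs quoted in this thread.

```python
import re

# Assumption: the generic wrapper line always has the shape
# "<localized text> (0x1) - exit code 1 (0x1)", as in the outputs quoted above.
GENERIC_EXIT = re.compile(r"\(0x[0-9a-fA-F]+\) - exit code -?\d+ \(0x[0-9a-fA-F]+\)")


def diagnostic_lines(task_output):
    """Return the lines of a task output that are neither markup,
    device information, nor the generic exit-code line."""
    hits = []
    for raw in task_output.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank line
        if line.startswith("<") or line.startswith("]]>"):
            continue  # XML / CDATA wrapper
        if line.startswith("#"):
            continue  # device listing printed by the application
        if GENERIC_EXIT.search(line):
            continue  # general error category only
        hits.append(line)
    return hits
```

Applied to Fatbob's output this leaves only the 'nonbonded.cu' line, while the 9600 GT outputs leave only the "restart.coor" line, which is why the two cases are not the same error.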

MrS
Scanning for our furry friends since Jan 2002

uBronan
Message 7281 - Posted: 8 Mar 2009, 19:03:21 UTC

Changed the setting back to show my computers :)
I'll try a reboot and see what happens.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 7283 - Posted: 8 Mar 2009, 19:46:43 UTC - in response to Message 7281.  

OK, you did post all relevant information and there's nothing else in the task output. But it was worth taking a look anyway.

MrS

uBronan
Message 7285 - Posted: 8 Mar 2009, 21:54:58 UTC
Last modified: 8 Mar 2009, 22:00:56 UTC

I guess Stefan is right. I think that somehow it tried to switch to the nonbonded device; it's probably like switching between different computers.
Meanwhile I took the advice, rebooted, and shut my machine down for a few minutes to see if that solves the issue.
If so, we would have to reboot once every few days, probably because of memory leaks in the applications.
But which one is not clear to me; it could be either, or it could be the combination of GPUGRID and SETI.

Strangely, I haven't had any problems on any previous units and never needed a reboot, other than about 2 months ago; my machine normally runs 24/7.
Thanks for trying to help anyway, but I fear it's out of our hands now.


PS: sadly no solution; the newly received unit crashed within 30 seconds.
So it's something other than my machine or BOINC, since all other projects run without errors.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 7287 - Posted: 8 Mar 2009, 22:05:39 UTC - in response to Message 7285.  

Normally we don't need to reboot for GPU-Grid, it runs just fine. But if *something* happened and WUs error out in a row, it could be that the PC went into some strange state (which the reboot / power off would solve).

I guess Stefan is right. I think that somehow it tried to switch to the nonbonded device; it's probably like switching between different computers.


Sorry to tell you, but you're on the completely wrong track here. The error message involving the file "nonbonded.cu" appeared because Fatbob's BOINC tried to run GPU-Grid on a card which does not support the features it needs, i.e. the card cannot recognize some commands which the GPU-Grid team put into nonbonded.cu. This is not something you could trigger (even if you wanted to), or which just happens; it only happens if you use *incapable* hardware. That's why your errors are completely different from Fatbob's, except for the general error code 0x1.
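
As a rough illustration of that diagnosis, the sketch below reads the device listing from a task's stderr text and reports whether the application selected a card that this thread describes as unsupported. It is only a sketch under stated assumptions: the helper names are hypothetical, and the `UNSUPPORTED` set contains just the one card named in this thread, not the project's actual compatibility list (see the GPU FAQ for that).

```python
import re

# Assumption: only the card reported as unsupported in this thread.
# This is NOT the project's full compatibility list.
UNSUPPORTED = {"GeForce 8800 GTX"}


def selected_device(stderr_text):
    """Return the name of the device named by '# Using CUDA device N',
    or None if the text does not contain that information."""
    used = re.search(r"# Using CUDA device (\d+)", stderr_text)
    if used is None:
        return None
    # Map device index (as a string) to the quoted device name.
    devices = dict(re.findall(r'# Device (\d+): "([^"]+)"', stderr_text))
    return devices.get(used.group(1))


def runs_on_unsupported_card(stderr_text):
    """True if the task was started on a card listed in UNSUPPORTED."""
    return selected_device(stderr_text) in UNSUPPORTED
```

For Fatbob's output the selected device is device 0, the 8800 GTX, so the check reports an unsupported card even though a supported 8800 GT is present as device 1.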

MrS

uBronan
Message 7288 - Posted: 8 Mar 2009, 22:24:34 UTC
Last modified: 8 Mar 2009, 22:32:27 UTC

Lol, you misunderstood me. I probably stink at English, but you said exactly what I meant.

As for my own problem with GPUGRID, I did something nasty :(
I downclocked my GPU speeds drastically to see what happens with the new units. My perfect record of error-free runs is over now.
I think the admin from GPUGRID could not stand me being error-free on the runs ;)
Anyway, I have no clue why it keeps crashing. If it crashes again, I'll switch to well-running projects for a while until the problems are solved.
I don't want to spend another 24 hours and then get nothing because it errors out again.

uBronan
Message 7321 - Posted: 10 Mar 2009, 15:32:21 UTC

Not sure if anybody wants to know, but when I had both SETI and GPUGRID active as CUDA projects, GPUGRID crashed.
And only 3 cores out of 4 were active in BOINC.
I now have only GPUGRID active on CUDA and it seems to run without errors.
So let's see if it stays running.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 7327 - Posted: 10 Mar 2009, 19:22:44 UTC - in response to Message 7321.  

Did they try to run at the same time on your single card, or was it just that you had both projects activated and BOINC would switch between them normally?

In that case we can say for sure that seti is not leaving the machine in a "clean" state.

MrS

uBronan
Message 7378 - Posted: 12 Mar 2009, 15:46:40 UTC - in response to Message 7327.  

Did they try to run at the same time on your single card, or was it just that you had both projects activated and BOINC would switch between them normally?

In that case we can say for sure that seti is not leaving the machine in a "clean" state.

MrS


Well, after testing, and after removing SETI from my projects (also because I am not very happy with that project (probably fake results)),

I must admit that GPUGRID runs nicely again, although I have to admit I also changed some hardware settings.
First of all, I slowed down my DRAM. It was set to run at 4-4-4-12 CR1, but I fear that may be too fast, so I switched it back to CR2. I am not sure if this was needed.
Second, I downclocked my video card to below its factory overclock and now run the card at 100% stock speeds for the 9600 GT model.
By default the card was overclocked by EVGA to 675 MHz.

I also saw that sometimes 2 or 3 SETI CUDA units ran simultaneously with GPUGRID, but for a while this gave NO errors.
After all, I have 4 cores ;)
After I reset SETI CUDA and installed the KWSN optimized app, it ran 1 CUDA unit together with GPUGRID, also without giving any errors.

But then suddenly all the errors started. I cleaned everything up and set it back to low/standard settings, and now the project runs "normally" (slowly) again.

It finishes units like it should, but I am not sure whether all these changes were needed, because the weird thing is that it worked for weeks without problems and then all of a sudden everything started to fail.

It's of course kind of hard to find the real culprit in these situations. Maybe the given units were nasty; I have no clue yet, but I am glad it runs normally again.
Or maybe the damn Windows updates were to blame ;) Who knows :D

Joe
Message 7415 - Posted: 13 Mar 2009, 11:19:49 UTC

I have a new Win XP install with all updates and the newest Nvidia driver for my GTX 295, but I can't complete any WU with GPUGRID. I tried resetting the project and using the GTX both in SLI and in two-core mode... nothing works. It's the same error every time:

MDIO ERROR: cannot open file "restart.coor"

<core_client_version>6.6.15</core_client_version>
<![CDATA[
<message>
Unzulässige Funktion. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce GTX 295"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 939261952 bytes
# Number of multiprocessors: 30
# Number of cores: 240
# Device 1: "GeForce GTX 295"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 30
# Number of cores: 240
MDIO ERROR: cannot open file "restart.coor"

</stderr_txt>
]]>

The GTX itself is OK; in a Vista system I can complete the WUs.

Can you help me with this problem, please?

Alain Maes
Message 7416 - Posted: 13 Mar 2009, 11:54:44 UTC - in response to Message 7415.  

This is actually not a problem. The message [MDIO ERROR: cannot open file "restart.coor"] always occurs at the start of a new WU, for the simple reason that the WU has to start from scratch and cannot fall back on a previously saved restart.coor, as it would after a shutdown and restart of the PC in the middle of crunching a WU.
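
The behaviour described here is the usual "resume from checkpoint if one exists, otherwise start fresh" pattern. The following Python sketch shows that pattern in general terms; it is a hypothetical illustration, not the GPUGRID application's code, and the function name `load_start_state` and its return values are assumptions made for the example.

```python
import os


def load_start_state(workdir, restart_name="restart.coor"):
    """Try to resume from a saved restart file in workdir.

    Returns ("resumed", data) if the file exists, otherwise
    ("fresh", note), where note resembles the harmless message
    seen at the start of every new work unit.
    """
    path = os.path.join(workdir, restart_name)
    try:
        with open(path, "rb") as fh:
            return ("resumed", fh.read())
    except FileNotFoundError:
        # Expected for a brand-new work unit: nothing has been saved yet,
        # so the run simply starts from the initial input files.
        note = 'cannot open file "%s"' % restart_name
        return ("fresh", note)
```

In such a design the "cannot open file" note is informational: a missing restart file only means there is no earlier progress to continue from.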

Your machine also reports two devices as it should be for a GTX 295. So two GPUGRID WUs will be running together.

Please check your result status and you should see that everything is fine.

PS - and if you want any more help, try unhiding your PCs so that fellow crunchers can have a look at them.

Kind regards.

Alain

Joe
Message 7418 - Posted: 13 Mar 2009, 14:02:15 UTC - in response to Message 7416.  

Hi Alain,

This is the machine: http://www.gpugrid.net/results.php?hostid=29092 It's unhidden now.

The PC is running the whole day; there is no restart. Sometimes the error comes a few seconds after starting a new WU, sometimes after a few hours of work. The result ends with a client and computation error.

Hope you can help me a little more...

Kind regards

Joe

PS: Here are some facts about the machine:

13.03.2009 14:37:09 Starting BOINC client version 6.6.15 for windows_intelx86
13.03.2009 14:37:09 log flags: task, file_xfer, sched_ops
13.03.2009 14:37:09 Libraries: libcurl/7.19.4 OpenSSL/0.9.8j zlib/1.2.3
13.03.2009 14:37:09 Data directory: D:\BOINC\Data
13.03.2009 14:37:09 Running under account Jörg
13.03.2009 14:37:09 Milkyway@home Found app_info.xml; using anonymous platform
13.03.2009 14:37:09 SETI@home Found app_info.xml; using anonymous platform
13.03.2009 14:37:09 Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz [x86 Family 6 Model 15 Stepping 11]
13.03.2009 14:37:09 Processor features: fpu tsc sse sse2 mmx
13.03.2009 14:37:09 OS: Microsoft Windows XP: Professional x86 Editon, Service Pack 3, (05.01.2600.00)
13.03.2009 14:37:09 Memory: 2.00 GB physical, 3.85 GB virtual
13.03.2009 14:37:09 Disk: 107.42 GB total, 100.79 GB free
13.03.2009 14:37:09 Local time is UTC +1 hours
13.03.2009 14:37:09 CUDA device: GeForce GTX 295 (driver version 18208, CUDA version 1.3, 896MB, est. 106GFLOPS)
13.03.2009 14:37:09 Not using a proxy
13.03.2009 14:37:09 GPUGRID URL: http://www.gpugrid.net/; Computer ID: 29092; location: (none); project prefs: default
13.03.2009 14:37:09 Reading preferences override file
13.03.2009 14:37:09 Preferences limit memory usage when active to 1023.46MB
13.03.2009 14:37:09 Preferences limit memory usage when idle to 1842.23MB
13.03.2009 14:37:09 Preferences limit disk usage to 53.71GB
...

13.03.2009 14:45:13 GPUGRID Sending scheduler request: To fetch work.
13.03.2009 14:45:13 GPUGRID Requesting new tasks
13.03.2009 14:45:41 GPUGRID Computation for task sM24328-SH2_US_8-0-10-SH2_US_8620000_0 finished
13.03.2009 14:45:41 GPUGRID Output file sM24328-SH2_US_8-0-10-SH2_US_8620000_0_1 for task sM24328-SH2_US_8-0-10-SH2_US_8620000_0 absent
13.03.2009 14:45:41 GPUGRID Output file sM24328-SH2_US_8-0-10-SH2_US_8620000_0_2 for task sM24328-SH2_US_8-0-10-SH2_US_8620000_0 absent
13.03.2009 14:45:41 GPUGRID Output file sM24328-SH2_US_8-0-10-SH2_US_8620000_0_3 for task sM24328-SH2_US_8-0-10-SH2_US_8620000_0 absent

...

Alain Maes
Message 7419 - Posted: 13 Mar 2009, 14:34:22 UTC - in response to Message 7418.  

OK, there is indeed a problem. All your results have the error code [Unzulässige Funktion. (0x1) - exit code 1 (0x1)]. Unfortunately this tells us little and gives no real clues.

Things worth trying in such cases:
1. Check the version of your video driver. Make sure you have the latest one, currently 180.29 if I am not mistaken.
2. Verify your GPU temperature.
3. Did you overclock your video card? If so, try easing back.
4. Also, did you try a simple restart?
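
For the first check, the driver version can be read from the BOINC startup log quoted earlier in the thread. The Python sketch below extracts it from the "CUDA device:" line. It assumes that the five-digit value in the log (18208) encodes the version as major * 100 + minor, which matches the "182.08" driver reported elsewhere in this thread but is not confirmed by any documentation quoted here; the function names are hypothetical.

```python
import re


def driver_version(log_line):
    """Extract (major, minor) from a BOINC 'CUDA device:' log line.

    Assumption: the logged integer is major * 100 + minor,
    e.g. 18208 -> (182, 8), i.e. driver 182.08.
    Returns None if the line carries no driver version.
    """
    m = re.search(r"driver version (\d+)", log_line)
    if m is None:
        return None
    return divmod(int(m.group(1)), 100)


def format_version(version):
    """Render (182, 8) as '182.08'."""
    major, minor = version
    return "%d.%02d" % (major, minor)


def at_least(version, minimum):
    """Compare two (major, minor) tuples."""
    return version >= minimum
```

With the log line from Joe's machine this yields 182.08, which would already be newer than the 180.29 mentioned in step 1.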

Hope one of these helps; sorry I cannot be more specific.

Kind regards

Alain

Phoneman1
Message 7420 - Posted: 13 Mar 2009, 14:41:37 UTC - in response to Message 7419.  

I'd add another question:

5. Do you have GPU tasks ticked in your Seti options on your account @ Seti? The default is yes.

In theory it should be possible to get GPU projects to share resources with other GPU projects on the same machine, but I've not been able to get that to work yet.

Phoneman1

©2025 Universitat Pompeu Fabra