Workaround for the "CUDA3.1 client sent to CUDA4.2 capable hosts" problem

Message boards : Number crunching : Workaround for the "CUDA3.1 client sent to CUDA4.2 capable hosts" problem

Snow Crash
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Level: Lys
Message 25973 - Posted: 28 Jun 2012, 16:12:55 UTC
Last modified: 28 Jun 2012, 16:14:55 UTC

Trying to round out the conversation ...

RH: app_info is easier but you won't automatically get new versions.
My take = If I'm smart enough to even apply the app_info someone else creates I *should* be smart enough to keep track of the projects I am involved in to know when there are upgrades.

RZ: With the dont_check_file_sizes option added (thx MH), the "swap file" step only has to happen once (well, until you re-attach the project).
My take = Using a bat file to make a copy of a file and rename it seems overly complex, but mapping out what needed to happen has been extremely valuable!
Please also note that if you installed BOINC to a custom location, the script needs to be modified accordingly.

Overall ... I'm sticking with the one-time manual copy/rename and have made an extra copy in a separate folder, along with notes to reproduce it if I need to re-attach.

Perhaps a new thread would be in order to investigate which WUs still process faster with 3.1 on 2XX series and how do we switch app versions on the fly. (I think this might just be a variation on RZ's script?)
Thanks - Steve
ID: 25973 · Rating: 0
Mark Henderson
Joined: 21 Dec 08
Posts: 51
Credit: 26,320,167
RAC: 0
Level: Val
Message 25975 - Posted: 28 Jun 2012, 16:34:09 UTC
Last modified: 28 Jun 2012, 16:36:38 UTC

It would be beneficial if there was some way to tell after the fact that a WU labeled 3.1 was processed using 4.2 for the user and the project in case troubleshooting was required for some problems with results or whatever. Hopefully no problems will arise.
ID: 25975 · Rating: 0
Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Level: Trp
Message 25978 - Posted: 28 Jun 2012, 17:48:29 UTC - in response to Message 25975.  

Mark, thanks for pointing out the solution for the flaw of my workaround!

It would be beneficial if there was some way to tell after the fact that a WU labeled 3.1 was processed using 4.2 for the user and the project in case troubleshooting was required for some problems with results or whatever.

The CUDA4.2 and the CUDA3.1 clients produce very different stderr output files, so if you see CUDA4.2-like stderr output for a CUDA3.1 task, then it's a converted one. The runtimes are also easy to tell apart.

Hopefully no problems will arise.

I am confident that no problems will arise from converted workunits. The CUDA3.1 and the CUDA4.2 clients have the same version (and sub-version) number (v6.16). My guess is that the only difference between them is the version of the CUDA compiler used to build them.

Using a bat file to make a copy of a file and rename it seems overly complex but mapping out what needed to happen has been extremely valuable!

Yes, it can be done with Total Commander in a single step, which is how I did it myself, but I had to provide a workaround that works for everyone on every host. V3.0 of the workaround also operated on the slot directories, but that seems unnecessary, since the BOINC manager runs the client from the project's directory. I don't know why the BOINC manager copies the executables to the slot directory if it won't use them afterwards.
ID: 25978 · Rating: 0
tnmjr99a
Joined: 7 Aug 09
Posts: 8
Credit: 138,167,128
RAC: 0
Level: Cys
Message 25982 - Posted: 28 Jun 2012, 22:09:17 UTC

Looks like you need to have already run a 4.2 task in order to use the batch program?
ID: 25982 · Rating: 0
Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Level: Trp
Message 25983 - Posted: 28 Jun 2012, 22:43:01 UTC - in response to Message 25982.  

Looks like you need to have already run a 4.2 task in order to use the batch program?

Yes.
ID: 25983 · Rating: 0
K1atOdessa
Joined: 25 Feb 08
Posts: 249
Credit: 444,646,963
RAC: 0
Level: Gln
Message 25984 - Posted: 29 Jun 2012, 2:46:07 UTC

Looks like after a system reboot, the old cuda3.1 executable is downloaded again. So, it appears the batch file needs to be run after every reboot.


6/28/2012 10:42:30 PM | GPUGRID | Started download of acemd.win.2352
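
One way to confirm whether the swap has been undone (for example after a reboot, as reported above) is to compare the two files byte for byte. Below is a minimal Python sketch, not part of the original workaround. It assumes the two file names quoted in this thread; the function names file_digest and swap_is_in_place are made up for illustration, and the project directory has to be passed in by the caller.

```python
import hashlib
from pathlib import Path

# File names as quoted in this thread.
CUDA31_NAME = "acemd.win.2352"
CUDA42_NAME = "acemd.2562.cuda42"


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def swap_is_in_place(project_dir: Path) -> bool:
    """Return True if the CUDA3.1 file name currently holds the CUDA4.2 binary.

    Returns False if either file is missing or if their contents differ,
    which is what you would expect after BOINC re-downloads the old client.
    """
    old_client = project_dir / CUDA31_NAME
    new_client = project_dir / CUDA42_NAME
    if not (old_client.is_file() and new_client.is_file()):
        return False
    return file_digest(old_client) == file_digest(new_client)
```

If swap_is_in_place returns False for the GPUGRID project directory, the copy step most likely has to be repeated.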
ID: 25984 · Rating: 0
Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 351
Level: Trp
Message 25996 - Posted: 29 Jun 2012, 10:13:08 UTC - in response to Message 25971.  

---the default location of the cc_config.xml file is
---on Windows XP: c:\Documents and Settings\All Users\Application Data\BOINC\projects\www.gpugrid.net\
---on Windows 7: c:\ProgramData\BOINC\projects\www.gpugrid.net\

cc_config.xml is a BOINC global file, not a project specific file. It belongs in the root of the BOINC data folder structure.

The easiest way to verify whether your own installation is using the default location is to look at the BOINC Manager message/event log: the working BOINC data directory is listed at around the fourth line after every BOINC restart.
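
For anyone who prefers to pull that path out of a saved copy of the log rather than read it by eye, here is a rough Python sketch. It rests on an assumption that is not confirmed anywhere in this thread: that the startup message has the form "Data directory: <path>", possibly preceded by timestamp and project columns. The function name find_data_directory is hypothetical.

```python
import re
from typing import Optional

# Assumption: the startup message looks like "Data directory: <path>",
# optionally preceded by a timestamp and a project column.
_DATA_DIR = re.compile(r"Data directory:\s*(.+?)\s*$")


def find_data_directory(log_text: str) -> Optional[str]:
    """Return the path from the last 'Data directory:' line, or None if absent.

    The last match is used because the line is logged again on every
    BOINC restart, so the final one reflects the most recent start.
    """
    found = None
    for line in log_text.splitlines():
        match = _DATA_DIR.search(line)
        if match:
            found = match.group(1)
    return found
```

If the function returns None, the assumed message format probably does not match your BOINC version, and the log should be checked manually.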
ID: 25996 · Rating: 0
Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Level: Trp
Message 26002 - Posted: 29 Jun 2012, 12:23:05 UTC - in response to Message 25996.  

You are right. Sorry, it's a copy-paste bug. Let's blame it on the heat.
ID: 26002 · Rating: 0
Mark Henderson
Joined: 21 Dec 08
Posts: 51
Credit: 26,320,167
RAC: 0
Level: Val
Message 26011 - Posted: 29 Jun 2012, 16:54:53 UTC
Last modified: 29 Jun 2012, 16:55:32 UTC

The 3.1 DLLs are being pulled to the slot directory along with the renamed 4.2 acemd file. Is this correct?
ID: 26011 · Rating: 0
Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Level: Trp
Message 26025 - Posted: 30 Jun 2012, 7:20:52 UTC - in response to Message 26011.  
Last modified: 30 Jun 2012, 7:50:10 UTC

Yes. It's because the BOINC manager doesn't know that it's a CUDA 4.2 client.
If it's causing any trouble, you should simply copy the CUDA4.2 DLLs from the GPUGrid project directory into the slot directory with my previous batch program (make backups before doing so). But it won't cause problems, because the BOINC manager runs the client from the project's directory, not from the slot's (and doesn't copy it back from the slot directory).
ID: 26025 · Rating: 0
nanoprobe
Joined: 26 Feb 12
Posts: 184
Credit: 222,376,233
RAC: 0
Level: Leu
Message 26026 - Posted: 30 Jun 2012, 8:14:42 UTC - in response to Message 25947.  

It would be quicker and simpler to create an app_info.xml file from the information already available in client_state

I'm very new to GPU crunching so would you be willing to help a noob and post a copy of the app_info file you're using to make the 3.1 run like a 4.2?
Thanks.
ID: 26026 · Rating: 0
nanoprobe
Joined: 26 Feb 12
Posts: 184
Credit: 222,376,233
RAC: 0
Level: Leu
Message 26048 - Posted: 30 Jun 2012, 18:06:07 UTC

I had my first two 4.2 tasks complete and validate. Now everything is failing with the message below. What happened?

Stderr output

<core_client_version>7.0.27</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 470"
# Clock rate: 1.40 GHz
# Total amount of global memory: 1341718528 bytes
# Number of multiprocessors: 14
# Number of cores: 112
MDIO: cannot open file "restart.coor"
ERROR: # Energies have become nan

called boinc_finish

</stderr_txt>
]]>

ID: 26048 · Rating: 0
Mark Henderson
Joined: 21 Dec 08
Posts: 51
Credit: 26,320,167
RAC: 0
Level: Val
Message 26056 - Posted: 1 Jul 2012, 1:11:10 UTC

From the posts I have kept up with that error is often caused by overclocking.
ID: 26056 · Rating: 0
nanoprobe
Joined: 26 Feb 12
Posts: 184
Credit: 222,376,233
RAC: 0
Level: Leu
Message 26059 - Posted: 1 Jul 2012, 2:08:20 UTC - in response to Message 26056.  

From the posts I have kept up with that error is often caused by overclocking.

The card is an MSI GTX465 GE unlocked to a 470. I bought it used and had been running it at the settings it had when I bought it: 1.025 vcore, 700 MHz core clock, 1400 shader clock, 1848 memory clock. I thought after the first 2 tasks completed I was good to go, but everything else after that failed. This is my first real dive into Nvidia GPUs and I'm finding that what works on ATI doesn't work on Nvidia. I've been trying different things, but it seems that all the card needed was a little bump in the vcore voltage. I'm 3 1/2 hours into a long run task so we'll see how that goes. If it and the 1 in cache make it to validation, then I'll try the 3.1 to 4.2 workaround, or maybe someone can point me to an app_info that will do the trick. Thanks for the help.
ID: 26059 · Rating: 0
Bedrich Hajek
Joined: 28 Mar 09
Posts: 490
Credit: 11,731,645,728
RAC: 57
Level: Trp
Message 26076 - Posted: 1 Jul 2012, 14:24:45 UTC

I managed to get this running on a Windows XP computer with one video card, with no problem, by following the instructions. On a Windows 7 computer with 2 video cards, it doesn't work for me. I followed the instructions, made sure the project and slot directories were correct (mine were in a different location than listed in the postings, so I adjusted the commands to the appropriate locations), and ran as administrator. Nothing!
ID: 26076 · Rating: 0
SMTB1963
Joined: 27 Jun 10
Posts: 38
Credit: 524,420,921
RAC: 0
Level: Lys
Message 26081 - Posted: 1 Jul 2012, 20:48:59 UTC

Works for me on 2 XP machines and a W7 machine. Great post Retvari.

Thanks !
ID: 26081 · Rating: 0
Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Level: Trp
Message 26085 - Posted: 2 Jul 2012, 9:44:19 UTC - in response to Message 25971.  
Last modified: 2 Jul 2012, 9:45:58 UTC

It's working!
I'm re-posting it to sum up some answers and the corrections in a single post:

You have to have CUDA4.2-capable NVIDIA drivers (v295.73 or later; v301.42 is recommended at the moment).
You have to have the CUDA4.2 application (acemd.2562.cuda42) and the CUDA4.2 runtime DLLs (cudart32_42_9.dll, cufft32_42_9.dll) on your machine for this to work, i.e. your host should have downloaded at least one CUDA4.2 workunit before you can apply this workaround.
If you are not sure about this, you can check if your host already has the 3 required files in the project directory.
The default location of these files is:
on Windows XP: c:\Documents and Settings\All Users\Application Data\BOINC\projects\www.gpugrid.net\
on Windows 7: c:\ProgramData\BOINC\projects\www.gpugrid.net\
The easiest way to verify whether your own installation is using the default location is to look at the BOINC Manager message/event log: the working BOINC data directory is listed at around the fourth line after every BOINC restart. (you can get the correct path from that line)
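
The prerequisite check described above can also be scripted. This is only a sketch in Python, assuming the three file names listed in this post; the function name missing_cuda42_files is hypothetical, and the project directory must be supplied by the caller (one of the default paths above, or your own).

```python
from pathlib import Path
from typing import List

# The three files named in the post: the CUDA4.2 client and its runtime DLLs.
REQUIRED_FILES = (
    "acemd.2562.cuda42",
    "cudart32_42_9.dll",
    "cufft32_42_9.dll",
)


def missing_cuda42_files(project_dir: Path) -> List[str]:
    """Return the required CUDA4.2 file names that are absent from project_dir.

    An empty list means all three files are present, i.e. the host has
    already downloaded a CUDA4.2 workunit and the workaround can be applied.
    """
    return [name for name in REQUIRED_FILES if not (project_dir / name).is_file()]
```

For example, calling it with Path(r"c:\ProgramData\BOINC\projects\www.gpugrid.net") on a default Windows 7 installation should return an empty list once a CUDA4.2 workunit has been downloaded.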

Workaround V4.1:

1a. Put <dont_check_file_sizes>1</dont_check_file_sizes> in the options section in your cc_config.xml file
1b. If you don't have a cc_config.xml file, create a simple text file named cc_config.xml with the following content:

<cc_config>
  <options>
    <report_results_immediately>1</report_results_immediately>
    <dont_check_file_sizes>1</dont_check_file_sizes>
  </options>
</cc_config>


---the default location of the cc_config.xml file is
---on Windows XP: c:\Documents and Settings\All Users\Application Data\BOINC\
---on Windows 7: c:\ProgramData\BOINC\

2. Re-read configuration in BOINC manager

3. Pause all CUDA3.1 tasks

4. If you've installed the BOINC manager to its default location, you can run one of the following batch programs, depending on your version of Windows.
Or you can manually copy the executable file of the CUDA4.2 client over the CUDA3.1 client: acemd.2562.cuda42 -> acemd.win.2352
(the batch program works with a non-default location too, if the path in the SET line is corrected)

4a. On Windows XP run this batch program:

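REM Project directory; edit this path if your BOINC data directory is elsewhere.
SET GPUGRIDDIR=c:\Documents and Settings\All Users\Application Data\BOINC\projects\www.gpugrid.net\
REM Overwrite the CUDA3.1 client file with a copy of the CUDA4.2 client.
COPY "%GPUGRIDDIR%acemd.2562.cuda42" "%GPUGRIDDIR%acemd.win.2352" /y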


4b. On Windows 7 run this batch program (right click) as an administrator:

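REM Project directory; edit this path if your BOINC data directory is elsewhere.
SET GPUGRIDDIR=c:\ProgramData\BOINC\projects\www.gpugrid.net\
REM Overwrite the CUDA3.1 client file with a copy of the CUDA4.2 client.
COPY "%GPUGRIDDIR%acemd.2562.cuda42" "%GPUGRIDDIR%acemd.win.2352" /y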


5. Restart all CUDA3.1 tasks.
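
For hosts where a batch file is inconvenient, steps 1 and 4 above can be sketched in Python. This is an illustration under stated assumptions, not a replacement for the batch programs: the function names enable_dont_check_file_sizes and swap_client are hypothetical, all paths must be passed in by the caller, and the backup folder is an addition along the lines of the "extra copy in a separate folder" mentioned earlier in the thread. Unlike step 1b, the sketch only sets dont_check_file_sizes and leaves any other options as they are. Steps 2, 3 and 5 still have to be done in the BOINC manager.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def enable_dont_check_file_sizes(cc_config: Path) -> None:
    """Steps 1a/1b: set <dont_check_file_sizes>1</dont_check_file_sizes>.

    Creates cc_config.xml if it does not exist; existing options are kept.
    """
    if cc_config.is_file():
        tree = ET.parse(str(cc_config))
        root = tree.getroot()
    else:
        root = ET.Element("cc_config")
        tree = ET.ElementTree(root)
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")
    flag = options.find("dont_check_file_sizes")
    if flag is None:
        flag = ET.SubElement(options, "dont_check_file_sizes")
    flag.text = "1"
    tree.write(str(cc_config))


def swap_client(project_dir: Path, backup_dir: Path) -> Path:
    """Step 4: copy the CUDA4.2 client over the CUDA3.1 file name.

    The current CUDA3.1 file is first saved to backup_dir (a hypothetical
    separate folder), unless a backup with that name already exists.
    """
    source = project_dir / "acemd.2562.cuda42"
    target = project_dir / "acemd.win.2352"
    backup = backup_dir / target.name
    backup_dir.mkdir(parents=True, exist_ok=True)
    if target.is_file() and not backup.exists():
        shutil.copy2(target, backup)
    shutil.copyfile(source, target)
    return target
```

On Windows 7 the script would need to be run from an elevated prompt, for the same reason the batch program needs "run as administrator".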

After that, every CUDA3.1 task will run as CUDA4.2, while the BOINC manager will still show them as CUDA3.1. That's because the BOINC manager doesn't know that it's a CUDA4.2 client, so it won't copy the CUDA4.2 DLLs into the slot directory. This doesn't matter, because the BOINC manager runs the client from the project's directory, not from the slot's (and doesn't copy it back from the slot directory). If it causes any trouble, simply copy the CUDA4.2 DLLs from the GPUGrid project directory into the slot directory with my previous batch program (make backups before doing so).
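
The earlier batch program is not repeated in this post, so the following Python sketch is only an approximation of the fallback described above, not the original script. It assumes the two DLL names listed in the prerequisites, takes the slot directory as a parameter (normally a numbered folder under "slots" in the BOINC data directory, as far as I know), and uses a hypothetical ".bak" suffix for the backups.

```python
import shutil
from pathlib import Path
from typing import List

# CUDA4.2 runtime DLL names as listed in the prerequisites above.
CUDA42_DLLS = ("cudart32_42_9.dll", "cufft32_42_9.dll")


def copy_cuda42_dlls(project_dir: Path, slot_dir: Path) -> List[Path]:
    """Copy the CUDA4.2 runtime DLLs from the project directory into one slot.

    If a file with the same name already exists in the slot, it is first
    saved next to it with a ".bak" suffix (a made-up naming choice).
    """
    copied = []
    for name in CUDA42_DLLS:
        source = project_dir / name
        target = slot_dir / name
        if target.exists():
            shutil.copy2(target, target.with_name(name + ".bak"))
        shutil.copy2(source, target)
        copied.append(target)
    return copied
```

As the post says, this should normally not be needed, because the client runs from the project's directory.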

Thanks to every contributor!
ID: 26085 · Rating: 0
Ebonydogx
Joined: 19 Jun 12
Posts: 11
Credit: 51,704,550
RAC: 0
Level: Thr
Message 26288 - Posted: 11 Jul 2012, 15:37:53 UTC

Retvari, Thank you!
I received a long 31 wu two days ago. Est time to complete was over 15.5 hours. No way for me to complete within 24 hour bonus period so I aborted before it started. Yesterday I was assigned another long 31 wu with about 14.5 hours est to complete. It was going to sneak in under the 24 hour mark so I let it run. It started about 1:00 am today my time. When I checked this morning, the 31 wu had been running for over 8 hours and still had 11 hours est to complete. I was about to abort, thought I'd check the forums first, and found your workaround.
Wow! After implementing and restarting the task, the wu is humming along. Time to completion is reduced by 30 mins after just 14 mins of processing.

Of course, the wu still has to validate, but I am optimistic that it will do so and will qualify for 24 hr bonus.

I'm a noob at any kind of programming, and really far better with hardware than software, so I really appreciate the detail in your instructions.
I'll let you know if the task validates.
Thanks again!
ID: 26288 · Rating: 0
Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Level: Trp
Message 26293 - Posted: 11 Jul 2012, 22:28:44 UTC - in response to Message 26288.  

Of course, the wu still has to validate, but I am optimistic that it will do so and will qualify for 24 hr bonus.

All of my converted workunits have been validated. I'm sure that yours will validate too.

I'm a noob at any kind of programming, and really far better with hardware than software, so I really appreciate the detail in your instructions.

You're welcome! It's my pleasure if my workaround helps you.

I'll let you know if the task validates.

I'll keep an eye on it.
ID: 26293 · Rating: 0
Ebonydogx
Joined: 19 Jun 12
Posts: 11
Credit: 51,704,550
RAC: 0
Level: Thr
Message 26297 - Posted: 12 Jul 2012, 3:48:27 UTC - in response to Message 26293.  

The wu validated and I feel really good about my little programming venture, even though all I did was follow your instructions.

Thanks again!
ID: 26297 · Rating: 0