Workaround for the "CUDA3.1 client sent to CUDA4.2 capable hosts" problem

Retvari Zoltan
Message 25924 - Posted: 27 Jun 2012, 11:09:56 UTC
Last modified: 27 Jun 2012, 11:19:01 UTC

Aborting CUDA3.1 tasks is easy (though detrimental to the project), but what can you do if you have a CUDA3.1 workunit that has been running for hours and you don't want to waste it?
I've figured out that I can convert CUDA3.1 tasks to CUDA4.2 tasks.
So if you have some file management skills, you can do it too in no time:
(My instructions are for Windows only) V2.0

1. Pause all running CUDA3.1 tasks in BOINC manager
2. Locate the slot directory of the CUDA3.1 task by searching through the slots for the CUDA3.1 executable (acemd.win.2352)
---(on Windows XP the slots are located in c:\Documents and Settings\All Users\Application Data\BOINC\slots\)
---(on Windows 7 the slots are located in: c:\ProgramData\BOINC\slots\)
3. Copy the CUDA4.2 executable files to the slot directory of the CUDA3.1 task located in step 2.
---(the CUDA4.2 client files (acemd.2562.cuda42, cudart32_42_9.dll, cufft32_42_9.dll) can be found here:)
---(on Windows XP: c:\Documents and Settings\All Users\Application Data\BOINC\projects\www.gpugrid.net\)
---(on Windows 7: c:\ProgramData\BOINC\projects\www.gpugrid.net\)
4. Delete the CUDA3.1 client (acemd.win.2352) from the directory located in step 2.
5. Rename the CUDA4.2 client executable file to the name of the CUDA3.1 client in the slot directory
---(acemd.2562.cuda42 -> acemd.win.2352)
---(do not rename the dll's)
6. Rename the CUDA3.1 client executable file in the project directory. (acemd.win.2352 -> acemd.win.2352.bak)
---(on Windows XP: c:\Documents and Settings\All Users\Application Data\BOINC\projects\www.gpugrid.net\)
---(on Windows 7: c:\ProgramData\BOINC\projects\www.gpugrid.net\)
7. Make a copy of the CUDA4.2 client in the project directory, and rename it to the CUDA3.1 client's name. (copy of acemd.2562.cuda42 -> acemd.win.2352)
8. Restart the CUDA3.1 tasks in BOINC manager (they will still be displayed as CUDA3.1 tasks)

I hope it will work for you too.
Windows Explorer should be set to show hidden and system folders if you want to browse to these directories, or you can paste the paths directly from my text.
Also, the "Hide extensions for known file types" option should be disabled.
I recommend a file manager tool like Total Commander (steps 4 and 5 can be done in a single step with this utility).
I'd ask a Linux expert to try a similar method and share it here if it works on Linux too.
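
For anyone who prefers a script to a file manager, steps 2 to 7 above can be sketched in Python. This is only an illustration of the manual procedure, not a tested tool: the function names are made up for this example, the directories are passed in by the caller, and the tasks still have to be paused and restarted by hand in BOINC manager (steps 1 and 8).

```python
import shutil
from pathlib import Path

# File names taken from the steps above.
CUDA31_EXE = "acemd.win.2352"
CUDA42_EXE = "acemd.2562.cuda42"
CUDA42_DLLS = ("cudart32_42_9.dll", "cufft32_42_9.dll")


def find_cuda31_slots(slots_dir):
    """Step 2: return the slot directories that contain the CUDA3.1 executable."""
    return sorted(p.parent for p in Path(slots_dir).glob("*/" + CUDA31_EXE))


def convert_slot(project_dir, slot_dir):
    """Steps 3-5: put the CUDA4.2 files into the slot under the CUDA3.1 name."""
    project_dir, slot_dir = Path(project_dir), Path(slot_dir)
    for dll in CUDA42_DLLS:
        shutil.copy2(project_dir / dll, slot_dir / dll)  # the DLLs keep their names
    # Copying over the old executable covers delete + copy + rename in one go.
    shutil.copy2(project_dir / CUDA42_EXE, slot_dir / CUDA31_EXE)


def convert_project_dir(project_dir):
    """Steps 6-7: back up the CUDA3.1 client, then replace it with a CUDA4.2 copy."""
    project_dir = Path(project_dir)
    original = project_dir / CUDA31_EXE
    backup = project_dir / (CUDA31_EXE + ".bak")
    # Keep an existing backup so a second run cannot overwrite the real CUDA3.1 file.
    if original.exists() and not backup.exists():
        original.replace(backup)
    shutil.copy2(project_dir / CUDA42_EXE, original)
```

Typical use would be to call convert_slot() for every directory returned by find_cuda31_slots(), then convert_project_dir() once.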

Retvari Zoltan
Message 25925 - Posted: 27 Jun 2012, 11:16:28 UTC - in response to Message 25923.  
Last modified: 27 Jun 2012, 11:17:33 UTC

Hopefully this will not impact on the building of subsequent tasks?

I will see it soon, my first converted tasks will finish in two hours.

I will try your workaround if I get any more CUDA 4.2 tasks on Windows, but I fully expect it to work.

You don't have to wait, the CUDA4.2 files are there in the directory of the project.

K1atOdessa
Message 25926 - Posted: 27 Jun 2012, 11:23:14 UTC

Am I correct in that you need to do this every time you notice a cuda3.1 WU running?

Very cool that this appears to work, but quite a bit of monitoring of the system.

skgiven
Volunteer moderator
Volunteer tester
Message 25927 - Posted: 27 Jun 2012, 12:40:14 UTC - in response to Message 25926.  

Am I correct in that you need to do this every time you notice a cuda3.1 WU running?

Not if the next task runs in the same slot (and I think it does).

Might be a coincidence but I have not received any 3.1 tasks since resetting yesterday, just 4.2. acemd.win.2352 therefore doesn't exist, so you can't do this in advance (and after a project reset).

Snow Crash
Message 25928 - Posted: 27 Jun 2012, 13:07:18 UTC
Last modified: 27 Jun 2012, 13:20:35 UTC

Nice work Retvari!!!

I had been thinking along these lines, or maybe using an app_info.xml to pull a switcheroo, but thank you very much for working through the details.

I think making the 4.2 version files named as the 3.1 version in the project directory *might* fix this even if the next WU does not run in the same slot, because isn't that where the appropriate slot files are copied from, so they don't have to be downloaded every time?

I'm also trying this now and will report back what happens. Seeing as we are getting a mix of 3.1 and 4.2 it may be quite a while.
Thanks - Steve

Retvari Zoltan
Message 25929 - Posted: 27 Jun 2012, 13:45:35 UTC - in response to Message 25926.  

Am I correct in that you need to do this every time you notice a cuda3.1 WU running?

The bad news is that the BOINC manager notices that I've overwritten the cuda3.1 client, and downloads the original one. (so you have to do it every time)
2012. 06. 27. 15:08:15 GPUGRID [error] File acemd.win.2352 has wrong size: expected 2349568, got 3454464
2012. 06. 27. 15:09:29 GPUGRID Started download of acemd.win.2352

The good news is I'm writing a little batch program to do the job. Stay tuned.

Snow Crash
Message 25930 - Posted: 27 Jun 2012, 13:48:57 UTC

Does anyone know how it notices? Is it smart enough to read the file's header, or is it only looking at the timestamp? If it is only the timestamp, I have code (at work) that uses the Win32 API to set the stamp to whatever you tell it to be.
Thanks - Steve

Retvari Zoltan
Message 25933 - Posted: 27 Jun 2012, 14:20:41 UTC - in response to Message 25930.  

Does anyone know how it notices ... is it smart enough to read the file's header or is it only looking at the timestamp. If it is only the time stamp I have code (at work) that uses the win32 api to set the stamp to whatever you tell it to be.

According to the error message, it compares the size of the files.
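
To illustrate what a size-only check looks like, here is a minimal Python sketch. It is not BOINC's code, and the client may well verify more than the size; the function name is hypothetical and the expected size is simply passed in by the caller.

```python
import os


def size_mismatch(path, expected_size):
    """Return (expected, actual) if the file size differs, otherwise None."""
    actual = os.path.getsize(path)
    if actual == expected_size:
        return None
    return (expected_size, actual)
```

With the numbers from the log quoted earlier, a 3454464-byte CUDA4.2 executable stored under the CUDA3.1 name would be flagged against the expected 2349568 bytes, whatever its timestamp says.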

Retvari Zoltan
Message 25934 - Posted: 27 Jun 2012, 14:59:21 UTC
Last modified: 27 Jun 2012, 15:05:17 UTC

Workaround V3.0 :)

For Windows XP:

1. Create a new folder for the files of the workaround.
2. Create a new text document in this folder called no.txt, and put a single n character in it.
3. Create a new text document called check.bat and put the following text in it:

SET GPUGRIDDIR=c:\Documents and Settings\All Users\Application Data\BOINC\projects\www.gpugrid.net\
SET SLOTDIR=c:\Documents and Settings\All Users\Application Data\BOINC\slots\
COMP "%GPUGRIDDIR%acemd.2562.cuda42" "%GPUGRIDDIR%acemd.win.2352" <no.txt
IF errorlevel 1 COPY "%GPUGRIDDIR%acemd.2562.cuda42" "%GPUGRIDDIR%acemd.win.2352" /y

FOR /L %%i IN (1,1,10) DO CALL slotcheck %%i


4. Create a new text document called slotcheck.bat and put the following text in it:

IF NOT EXIST "%SLOTDIR%%1\acemd.win.2352" GOTO slotnotcuda31
COMP "%GPUGRIDDIR%acemd.2562.cuda42" "%SLOTDIR%%1\acemd.win.2352" <no.txt
IF errorlevel 1 GOTO slotcopy
GOTO slotnotcuda31
:slotcopy
COPY "%GPUGRIDDIR%acemd.2562.cuda42" "%SLOTDIR%%1\acemd.win.2352" /y
COPY "%GPUGRIDDIR%cudart32_42_9.dll" "%SLOTDIR%%1\" /y
COPY "%GPUGRIDDIR%cufft32_42_9.dll" "%SLOTDIR%%1\" /y
:slotnotcuda31


5. Pause all CUDA3.1 tasks, and run check.bat. Restart all CUDA3.1 tasks.
6. Create a scheduled task to run check.bat every 5 minutes, or run it manually while no CUDA3.1 tasks are running.

For Windows 7 the first two lines of check.bat should be:
SET GPUGRIDDIR=c:\ProgramData\BOINC\projects\www.gpugrid.net\
SET SLOTDIR=c:\ProgramData\BOINC\slots\

In Windows 7 you need the highest access rights to run the batch file, so right-click it and choose "Run as administrator", or if you create a shortcut for it, check "Run as administrator" on the Compatibility tab.
It checks slots 1-10; if you need more (or fewer), change the third number in the line beginning with "FOR".
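
For readers who want to follow the logic of check.bat and slotcheck.bat without the COMP/errorlevel mechanics, here is a rough Python equivalent. It is a sketch under assumptions, not a replacement for the batch files: the function names are invented for this example, the two directories are parameters rather than SET variables, and filecmp.cmp with shallow=False stands in for COMP.

```python
import filecmp
import shutil
from pathlib import Path

# File names taken from the batch files above.
CUDA31_EXE = "acemd.win.2352"
CUDA42_EXE = "acemd.2562.cuda42"
CUDA42_DLLS = ("cudart32_42_9.dll", "cufft32_42_9.dll")


def differs(source, target):
    """True if target is missing or its bytes differ from source (the COMP step)."""
    return not target.exists() or not filecmp.cmp(source, target, shallow=False)


def check(project_dir, slots_dir, slot_numbers=range(1, 11)):
    """Overwrite CUDA3.1 executables with the CUDA4.2 client; return what was replaced."""
    project_dir, slots_dir = Path(project_dir), Path(slots_dir)
    source = project_dir / CUDA42_EXE
    replaced = []

    # check.bat: make the project directory's CUDA3.1 file a copy of the CUDA4.2 client.
    target = project_dir / CUDA31_EXE
    if differs(source, target):
        shutil.copyfile(source, target)
        replaced.append(target)

    # slotcheck.bat: only slots that already hold a CUDA3.1 executable are touched.
    for number in slot_numbers:
        slot = slots_dir / str(number)
        target = slot / CUDA31_EXE
        if target.exists() and differs(source, target):
            shutil.copyfile(source, target)
            for dll in CUDA42_DLLS:
                shutil.copyfile(project_dir / dll, slot / dll)
            replaced.append(target)
    return replaced
```

Running check() a second time with nothing changed returns an empty list, which is the same "already converted, skip it" behaviour the batch files get from COMP.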

K1atOdessa
Message 25942 - Posted: 27 Jun 2012, 20:11:40 UTC
Last modified: 27 Jun 2012, 20:12:09 UTC

I tried your batch files. Seemed to run fine on the running task (cuda3.1) and made it run like a cuda4.2.

After that task ended, the next task was a cuda4.2 and errored out immediately.

Task Error

However, once the error task uploaded, I got another cuda4.2 task crunching fine. Who knows if it was just some weird issue.

Retvari Zoltan
Message 25943 - Posted: 27 Jun 2012, 20:34:04 UTC - in response to Message 25942.  
Last modified: 27 Jun 2012, 20:39:45 UTC

Thank you for your report. This error could be a coincidence.
I have a different problem with the scheduling of this batch program:
When a new CUDA3.1 task is downloaded, it receives an "error while downloading".
I couldn't catch it while it happened; I will keep an eye on it.
I think I can't cheat the BOINC manager, because it runs the client from the project's directory, not the slot's.
However, if a host does not receive a new CUDA3.1 task while a converted one is running, it could be fine.
So use my workaround with caution, and simply do not run it if it's causing problems. I will try to improve it. Until then, I suggest you do not use it as a scheduled task.
Can anyone tell me if there is a way to make the BOINC manager run the tasks from the slot's directory?

Paul Raney
Message 25945 - Posted: 27 Jun 2012, 22:30:39 UTC - in response to Message 25943.  

It sounds like all of you are receiving a combination of CUDA 3.1 and 4.2 tasks. I have the same issue. My real concern is that all of my cards are overclocked, and CUDA 4.2 tasks may require a different hardware config than CUDA 3.1 tasks. It would be great to get only 4.2 work units going forward.
Thx - Paul

Note: Please don't use driver version 295 or 296! Recommended versions are 266 - 285.

Richard Haselgrove
Message 25947 - Posted: 27 Jun 2012, 23:40:25 UTC

It would be quicker and simpler to create an app_info.xml file from the information already available in client_state, as I did for Running multiple tasks per GPU - count=0.5.

The downside is that you then have no chance of receiving new applications automatically - and it took until today for me to get round to finding, downloading, and implementing cuda42 on host 43404. Looks to have been well worth the effort.

K1atOdessa
Message 25949 - Posted: 28 Jun 2012, 1:19:44 UTC - in response to Message 25943.  

I have a different problem with the scheduling of this batch program:
When a new CUDA3.1 task is downloaded, it is receive an "error while downloading".
I couldn't catch it while it happened, I will keep an eye on it.


I think I've had a cuda3.1 download after a cuda4.2, without an "error while downloading". I'll let your batch run for 24 hours and see what other errors it generates. So far, having a single WU error out immediately while running a cuda3.1 WU in only 60% of the time (with another well on its way to completion) has been worth the effort.

Definitely an interesting way to try to solve the issue, though probably not the silver bullet a separate queue selection would be.

K1atOdessa
Message 25951 - Posted: 28 Jun 2012, 2:16:22 UTC - in response to Message 25949.  

I think I've had a cuda3.1 download after a cuda4.2, without an "error while downloading".


OK. On my second system, I definitely just had this scenario. No issues and it is 20 minutes in.

One interesting thing is that the slot appears to have changed from 8 (this afternoon) to 9 (right now). Looking into the slot 9 folder, it definitely has a file with the cuda4.2 file size under the cuda3.1 filename. However, it has the cuda3.1 .dll files (no sign of the cuda4.2 dll's).

This doesn't seem to impact the running WU (it's been plugging along for 20+ minutes now), but I'm not sure why I don't see the cuda4.2 dll's. Speed seems on par with the cuda4.2 WU's.

I'm not an expert, so does it make sense that the cuda4.2 executable can run with the cuda3.1 dll's?

Richard Haselgrove
Message 25954 - Posted: 28 Jun 2012, 8:36:19 UTC - in response to Message 25951.  

I'm not an expert, so does it makes sense that the cuda4.2 executable can run with the cuda3.1 dll's?

No. Use Dependency Walker to see which support files an application executable needs - the cuda42 app here needs, as you would expect, the cu...32_42_9 DLLs.

Use Process Explorer to see which DLLs a running application is using - and where it's loading them from. My guess is that the application is finding the right DLLs somewhere else in the path, and loading those in preference to the ones in the slot directory - I've been caught that way in the past.

If the project is doing a copy-rename on those DLLs, then it's wasting a lot of time and disk access on something which is going to be no use at all. Somebody should have a look at the <app_version> section of client_state.xml to see what's going on.
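
A quick way to look at those sections is a small Python script like the one below. Treat it as a sketch under assumptions: it assumes client_state.xml parses as ordinary XML (it usually does, but BOINC does not guarantee it), and that each <app_version> lists its files as <file_ref> entries with <file_name>, an optional <open_name>, and empty <main_program/> or <copy_file/> markers. The function name is made up for this example.

```python
import xml.etree.ElementTree as ET


def app_version_files(client_state_xml):
    """Summarise every <app_version> and the files it references."""
    root = ET.fromstring(client_state_xml)
    versions = []
    for app_version in root.iter("app_version"):
        files = []
        for ref in app_version.findall("file_ref"):
            files.append({
                "file_name": (ref.findtext("file_name") or "").strip(),
                "open_name": (ref.findtext("open_name") or "").strip(),
                # Assumed markers: present as empty elements when set.
                "main_program": ref.find("main_program") is not None,
                "copy_file": ref.find("copy_file") is not None,
            })
        versions.append({
            "app_name": (app_version.findtext("app_name") or "").strip(),
            "version_num": (app_version.findtext("version_num") or "").strip(),
            "plan_class": (app_version.findtext("plan_class") or "").strip(),
            "files": files,
        })
    return versions
```

Feeding it the text of client_state.xml and printing the result shows at a glance which DLLs each application version asks for and whether they are flagged to be copied into the slot.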

Mark Henderson
Message 25965 - Posted: 28 Jun 2012, 13:38:14 UTC

There is a BOINC flag to help with the earlier problem:

<dont_check_file_sizes>0|1</dont_check_file_sizes>

Retvari Zoltan
Message 25967 - Posted: 28 Jun 2012, 13:50:42 UTC

Here is my first converted workunit, which ran from the start to completion as a converted wu.
The second one is completed, but still uploading.

Retvari Zoltan
Message 25968 - Posted: 28 Jun 2012, 13:52:05 UTC - in response to Message 25965.  

There is a Boinc flag to help with the earlier problem..

<dont_check_file_sizes>0|1</dont_check_file_sizes>

Wow!
I put this in my cc_config.xml. If it works, then we only have to apply my batch program once.

Retvari Zoltan
Message 25971 - Posted: 28 Jun 2012, 14:54:46 UTC - in response to Message 25968.  
Last modified: 28 Jun 2012, 15:00:08 UTC

It's working!
Workaround V4.0:

1a. Put <dont_check_file_sizes>1</dont_check_file_sizes> in the options section in your cc_config.xml file
1b. If you don't have a cc_config.xml file, create a simple text file named cc_config.xml with the following content:

<cc_config>
<options>
<report_results_immediately>1</report_results_immediately>
<dont_check_file_sizes>1</dont_check_file_sizes>
</options>
</cc_config>


---the default location of the cc_config.xml file is the BOINC data directory:
---on Windows XP: c:\Documents and Settings\All Users\Application Data\BOINC\
---on Windows 7: c:\ProgramData\BOINC\

2. Re-read configuration in BOINC manager

3. Pause all CUDA3.1 tasks

4a. On Windows XP run this batch program:

SET GPUGRIDDIR=c:\Documents and Settings\All Users\Application Data\BOINC\projects\www.gpugrid.net\
COPY "%GPUGRIDDIR%acemd.2562.cuda42" "%GPUGRIDDIR%acemd.win.2352" /y


4b. On Windows 7 run this batch program (right click) as an administrator:

SET GPUGRIDDIR=c:\ProgramData\BOINC\projects\www.gpugrid.net\
COPY "%GPUGRIDDIR%acemd.2562.cuda42" "%GPUGRIDDIR%acemd.win.2352" /y


5. Restart all CUDA3.1 tasks.

After that, every CUDA3.1 task will be running as CUDA4.2.
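
If you would rather not edit cc_config.xml by hand, step 1 can be sketched in Python as below. This is an illustration only: the function name is hypothetical, the path to cc_config.xml is supplied by the caller, and rewriting the file this way drops any XML comments it contained. You still need to re-read the configuration in BOINC manager afterwards.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def enable_dont_check_file_sizes(config_path):
    """Set <dont_check_file_sizes>1</dont_check_file_sizes>, creating the file if needed."""
    path = Path(config_path)
    if path.exists():
        tree = ET.parse(str(path))
        root = tree.getroot()
    else:
        # Step 1b: start from an empty <cc_config> document.
        root = ET.Element("cc_config")
        tree = ET.ElementTree(root)

    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")

    flag = options.find("dont_check_file_sizes")
    if flag is None:
        flag = ET.SubElement(options, "dont_check_file_sizes")
    flag.text = "1"

    tree.write(str(path))
```

Existing options in the file, such as report_results_immediately, are left as they are; only the one flag is added or updated.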

©2025 Universitat Pompeu Fabra