Workaround for the "CUDA3.1 client sent to CUDA4.2 capable hosts" problem
| Author | Message |
|---|---|
|
Joined: 4 Apr 09 · Posts: 450 · Credit: 539,316,349 · RAC: 0
|
Trying to round out the conversation ... RH: app_info is easier but you won't automatically get new versions. My take = If I'm smart enough to even apply the app_info someone else creates, I *should* be smart enough to keep track of the projects I am involved in and know when there are upgrades. RZ: With the no-check-file-size option added (thx MH), the "swap file" step only has to happen 1 time (well, until you re-attach the project). My take = Using a bat file to make a copy of a file and rename it seems overly complex, but mapping out what needed to happen has been extremely valuable! Please also make sure that if you installed BOINC to a custom location, you modify the script accordingly. Overall ... I'm sticking with the 1 time manual copy/rename and have made an extra copy in a separate folder along with notes to reproduce it if I need to re-attach. Perhaps a new thread would be in order to investigate which WUs still process faster with 3.1 on 2XX series and how we switch app versions on the fly. (I think this might just be a variation on RZ's script?) Thanks - Steve |
|
Joined: 21 Dec 08 · Posts: 51 · Credit: 26,320,167 · RAC: 0
|
It would be beneficial if there was some way to tell after the fact that a WU labeled 3.1 was processed using 4.2 for the user and the project in case troubleshooting was required for some problems with results or whatever. Hopefully no problems will arise. |
Retvari Zoltan · Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0
|
Mark, thanks for pointing out the solution for the flaw of my workaround! Quote: "It would be beneficial if there was some way to tell after the fact that a WU labeled 3.1 was processed using 4.2 for the user and the project in case troubleshooting was required for some problems with results or whatever." The CUDA4.2 and the CUDA3.1 clients produce very different stderr output files, so if you see CUDA4.2-like stderr output for a CUDA3.1 task, then it's a converted one. The runtimes are also clearly distinguishable. Quote: "Hopefully no problems will arise." I am confident that no problems will arise from converted workunits. The CUDA3.1 and the CUDA4.2 clients have the same version (and sub-version) numbers (v6.16). My guess is that the only difference between them is the version of the CUDA compiler used to build them. Quote: "Using a bat file to make a copy of a file and rename it seems overly complex but mapping out what needed to happen has been extremely valuable!" Yes, it can be done with Total Commander in a single step. I did it that way myself, but I had to provide a workaround that works for everyone and every host. Also, V3.0 operated on the slot directories too, but that seems unnecessary, since the BOINC manager runs the client from the project's directory. I don't know why the BOINC manager copies the executables to the slot directory if it won't use them afterwards. |
|
Joined: 7 Aug 09 · Posts: 8 · Credit: 138,167,128 · RAC: 0
|
Looks like you need to have already run a 4.2 task in order to use the batch program? |
Retvari Zoltan · Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0
|
Quote: "Looks like you need to have already run a 4.2 task in order to use the batch program?" Yes. |
K1atOdessa · Joined: 25 Feb 08 · Posts: 249 · Credit: 444,646,963 · RAC: 0
|
Looks like after a system reboot, the old cuda3.1 executable is downloaded again. So, it appears the batch file needs to be run after every reboot. 6/28/2012 10:42:30 PM | GPUGRID | Started download of acemd.win.2352 |
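The re-download reported above suggests checking, after each restart, whether the CUDA3.1-named file still matches the CUDA4.2 executable before deciding to run the copy again. A minimal Python sketch of that check, assuming the two file names quoted in this thread; the helper name `needs_recopy` is hypothetical and not part of BOINC:

```python
import filecmp
from pathlib import Path

# File names as quoted in this thread; adjust if your project uses other versions.
CUDA42_NAME = "acemd.2562.cuda42"
CUDA31_NAME = "acemd.win.2352"


def needs_recopy(project_dir):
    """Return True if the CUDA3.1-named file is missing or differs
    byte-for-byte from the CUDA4.2 executable in project_dir."""
    project_dir = Path(project_dir)
    source = project_dir / CUDA42_NAME
    target = project_dir / CUDA31_NAME
    if not source.is_file():
        # The workaround requires that a CUDA4.2 task was downloaded earlier.
        raise FileNotFoundError(f"{source} not found")
    if not target.is_file():
        return True
    # shallow=False forces a content comparison instead of comparing metadata.
    return not filecmp.cmp(str(source), str(target), shallow=False)
```

Calling `needs_recopy` with the GPUGRID project directory would report whether the swap has been undone; it only reads the files and changes nothing.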
|
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 351
|
Quote: "---the default location of the cc_config.xml file is" cc_config.xml is a BOINC global file, not a project-specific file. It belongs in the root of the BOINC data folder structure. The easiest way to verify whether your own installation is using the default location is to look at the BOINC Manager message/event log: the working BOINC data directory is listed at around the fourth line after every BOINC restart. |
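The log lookup described above can be sketched in Python for anyone who has saved the event log as text. This is only an illustration: it assumes the startup line contains the text "Data directory:", which should be checked against your own log, and both function names are hypothetical.

```python
from pathlib import PureWindowsPath

# Assumed wording of the startup line; verify it against your own event log.
MARKER = "Data directory:"


def find_data_directory(log_text):
    """Return the path that follows the marker on the first matching
    log line, or None if no line contains the marker."""
    for line in log_text.splitlines():
        if MARKER in line:
            return line.split(MARKER, 1)[1].strip()
    return None


def cc_config_path(data_dir):
    """cc_config.xml belongs in the root of the BOINC data directory."""
    return PureWindowsPath(data_dir) / "cc_config.xml"
```

`PureWindowsPath` is used so the path joining follows Windows rules even when the sketch is run on another operating system.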
Retvari Zoltan · Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0
|
You are right. Sorry, it's a copy-paste bug. Let's blame it on the heat. |
|
Joined: 21 Dec 08 · Posts: 51 · Credit: 26,320,167 · RAC: 0
|
The 3.1 DLLs are being pulled to the slot directory with the renamed 4.2 acemd file; is this correct? |
Retvari Zoltan · Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0
|
Yes. It's because the BOINC manager doesn't know that it's a CUDA4.2 client. If it's causing any trouble, you can simply copy the CUDA4.2 DLLs from the GPUGrid project directory into the slot directory with my previous batch program (make backups before doing so). But it won't cause problems, because the BOINC manager runs the client from the project's directory, not from the slot's (and doesn't copy it back from the slot directory). |
|
Joined: 26 Feb 12 · Posts: 184 · Credit: 222,376,233 · RAC: 0
|
Quote: "It would be quicker and simpler to create an app_info.xml file from the information already available in client_state" I'm very new to GPU crunching, so would you be willing to help a noob and post a copy of the app_info file you're using to make the 3.1 run like a 4.2? Thanks. |
|
Joined: 26 Feb 12 · Posts: 184 · Credit: 222,376,233 · RAC: 0
|
I had my first two 4.2 tasks complete and validate. Now everything is failing with the message below. What happened? Stderr output <core_client_version>7.0.27</core_client_version> <![CDATA[ <message> - exit code 98 (0x62) </message> <stderr_txt> # Using device 0 # There is 1 device supporting CUDA # Device 0: "GeForce GTX 470" # Clock rate: 1.40 GHz # Total amount of global memory: 1341718528 bytes # Number of multiprocessors: 14 # Number of cores: 112 MDIO: cannot open file "restart.coor" ERROR: # Energies have become nan called boinc_finish </stderr_txt> ]]>
|
|
Joined: 21 Dec 08 · Posts: 51 · Credit: 26,320,167 · RAC: 0
|
From the posts I have kept up with, that error is often caused by overclocking. |
|
Joined: 26 Feb 12 · Posts: 184 · Credit: 222,376,233 · RAC: 0
|
Quote: "From the posts I have kept up with that error is often caused by overclocking." The card is an MSI GTX465 GE unlocked to a 470. I bought it used and had been running it at the settings it had when I bought it: 1.025 vcore, 700 MHz core clock, 1400 shader clock, 1848 memory clock. I thought after the first 2 tasks completed I was good to go, but everything else after that failed. This is my first real dive into Nvidia GPUs and I'm finding that what works on ATI doesn't work on Nvidia. I've been trying different things, but it seems that all the card needed was a little bump in the vcore voltage. I'm 3 1/2 hours into a long run task, so we'll see how that goes. If it and the 1 in cache make it to validation, then I'll try the 3.1 to 4.2 workaround, or maybe someone can point me to an app_info that will do the trick. Thanks for the help. |
|
Joined: 28 Mar 09 · Posts: 490 · Credit: 11,731,645,728 · RAC: 57
|
I managed to get this running on a Windows XP computer with one video card, with no problem, by following the instructions. On a Windows 7 computer with 2 video cards, it doesn't work for me. I followed the instructions, made sure the project and slot directories were correct (mine were in a different location than listed in the postings, so I adjusted the commands to the appropriate locations) and ran as administrator. Nothing! |
SMTB1963 · Joined: 27 Jun 10 · Posts: 38 · Credit: 524,420,921 · RAC: 0
|
Works for me on 2 XP machines and a W7 machine. Great post Retvari. Thanks ! |
Retvari Zoltan · Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0
|
It's working! I'm re-posting it to sum up some answers and the corrections in a single post:
You have to have CUDA4.2 capable NVidia drivers (v295.73 or later, v301.42 is recommended at the moment).
You have to have the CUDA4.2 application (acemd.2562.cuda42) and the CUDA4.2 runtime DLLs (cudart32_42_9.dll, cufft32_42_9.dll) on your machine for this to work, i.e. your host should have downloaded at least one CUDA4.2 workunit before you can apply this workaround. If you are not sure about this, you can check whether your host already has the 3 required files in the project directory. The default location of these files is:
on Windows XP: c:\Documents and Settings\All Users\Application Data\BOINC\projects\www.gpugrid.net\
on Windows 7: c:\ProgramData\BOINC\projects\www.gpugrid.net\
The easiest way to verify whether your own installation is using the default location is to look at the BOINC Manager message/event log: the working BOINC data directory is listed at around the fourth line after every BOINC restart (you can get the correct path from that line).
Workaround V4.1:
1a. Put <dont_check_file_sizes>1</dont_check_file_sizes> in the options section of your cc_config.xml file.
1b. If you don't have a cc_config.xml file, create a simple text file named cc_config.xml with the following content:
<cc_config>
<options>
<report_results_immediately>1</report_results_immediately>
<dont_check_file_sizes>1</dont_check_file_sizes>
</options>
</cc_config>
---the default location of the cc_config.xml file is
---on Windows XP: c:\Documents and Settings\All Users\Application Data\BOINC\
---on Windows 7: c:\ProgramData\BOINC\
2. Re-read the configuration in the BOINC manager.
3. Pause all CUDA3.1 tasks.
4. If you've installed the BOINC manager to its default location, you can run one of the following batch programs according to your type of Windows. Or you can manually copy the executable file of the CUDA4.2 client over the CUDA3.1 client: acemd.2562.cuda42 -> acemd.win.2352 (the batch program can work on a non-default location too, if the path in the first line is corrected).
4a. On Windows XP run this batch program:
SET GPUGRIDDIR=c:\Documents and Settings\All Users\Application Data\BOINC\projects\www.gpugrid.net\
COPY "%GPUGRIDDIR%acemd.2562.cuda42" "%GPUGRIDDIR%acemd.win.2352" /y
4b. On Windows 7 run this batch program (right click) as an administrator:
SET GPUGRIDDIR=c:\ProgramData\BOINC\projects\www.gpugrid.net\
COPY "%GPUGRIDDIR%acemd.2562.cuda42" "%GPUGRIDDIR%acemd.win.2352" /y
5. Restart all CUDA3.1 tasks.
After that, every CUDA3.1 task will run as CUDA4.2, while the BOINC manager will still show them as CUDA3.1. It's because the BOINC manager doesn't know that it's a CUDA4.2 client, so it won't copy the CUDA4.2 DLLs into the slot directory, but it doesn't matter, because the BOINC manager runs the client from the project's directory, not from the slot's (and doesn't copy it back from the slot directory). If it's causing any trouble, you can simply copy the CUDA4.2 DLLs from the GPUGrid project directory into the slot directory with my previous batch program (make backups before doing so). Thanks to every contributor! |
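For readers who prefer a script to manual editing, steps 1 and 4 of the workaround above could be sketched in Python as follows. This is a hedged illustration, not the author's batch program: the function names are hypothetical, the backup step is an addition prompted by the advice elsewhere in this thread to keep copies, and the sketch only sets dont_check_file_sizes (it does not add report_results_immediately).

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def enable_dont_check_file_sizes(cc_config_file):
    """Step 1: set <dont_check_file_sizes>1</dont_check_file_sizes> in the
    <options> section, creating cc_config.xml if it does not exist."""
    path = Path(cc_config_file)
    if path.exists():
        tree = ET.parse(str(path))
        root = tree.getroot()
    else:
        root = ET.Element("cc_config")
        tree = ET.ElementTree(root)
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")
    flag = options.find("dont_check_file_sizes")
    if flag is None:
        flag = ET.SubElement(options, "dont_check_file_sizes")
    flag.text = "1"
    tree.write(str(path), encoding="utf-8")


def swap_executable(project_dir, backup_dir):
    """Step 4: copy acemd.2562.cuda42 over acemd.win.2352, keeping a backup
    of the old file first (the backup is this sketch's own addition)."""
    project_dir, backup_dir = Path(project_dir), Path(backup_dir)
    source = project_dir / "acemd.2562.cuda42"
    target = project_dir / "acemd.win.2352"
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found")
    backup_dir.mkdir(parents=True, exist_ok=True)
    if target.is_file():
        shutil.copy2(str(target), str(backup_dir / target.name))
    shutil.copyfile(str(source), str(target))
```

Steps 2, 3 and 5 (re-reading the configuration, pausing and restarting the tasks) would still be done in the BOINC manager, and on Windows 7 the script would presumably need the same administrator rights as the batch program.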
|
Joined: 19 Jun 12 · Posts: 11 · Credit: 51,704,550 · RAC: 0
|
Retvari, Thank you! I received a long 31 wu two days ago. Est time to complete was over 15.5 hours. No way for me to complete within 24 hour bonus period so I aborted before it started. Yesterday I was assigned another long 31 wu with about 14.5 hours est to complete. It was going to sneak in under the 24 hour mark so I let it run. It started about 1:00 am today my time. When I checked this morning, the 31 wu had been running for over 8 hours and still had 11 hours est to complete. I was about to abort, thought I'd check the forums first, and found your workaround. Wow! After implementing and restarting the task, the wu is humming along. Time to completion is reduced by 30 mins after just 14 mins of processing. Of course, the wu still has to validate, but I am optimistic that it will do so and will qualify for 24 hr bonus. I'm a noob at any kind of programming, and really far better with hardware than software, so I really appreciate the detail in your instructions. I'll let you know if the task validates. Thanks again! |
Retvari Zoltan · Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0
|
Quote: "Of course, the wu still has to validate, but I am optimistic that it will do so and will qualify for 24 hr bonus." All of my converted workunits have been validated. I'm sure that yours will validate too. Quote: "I'm a noob at any kind of programming, and really far better with hardware than software, so I really appreciate the detail in your instructions." You're welcome! It's my pleasure if my workaround helps you. Quote: "I'll let you know if the task validates." I'll keep an eye on it. |
|
Joined: 19 Jun 12 · Posts: 11 · Credit: 51,704,550 · RAC: 0
|
The wu validated and I feel really good about my little programming venture, even though all I did was follow your instructions. Thanks again! |
©2025 Universitat Pompeu Fabra