ACEMD2 6.12 cuda and 6.13 cuda31 for windows and linux
Beyond · Joined: 23 Nov 08 · Posts: 1112 · Credit: 6,162,416,256 · RAC: 0
Running CUDA 3.1 on GT240s does not presently offer high resource utilization or credit. On a quad GT240 system I have been running tasks on the CUDA 3.1 app for several days. Run times vary between 70,000 and 107,000 seconds. Only the KASHIF_HIVPR tasks have a reasonably short runtime, finishing between 70,000 and 75,000 seconds (within 24 h). Even these tasks, however, are about 38% slower than under the 6.05 app, where KASHIF_HIVPR took between 51,000 and 54,000 seconds. And that's the best case. Does this slowdown already include using SWAN_SYNC and losing a CPU core with 6.11 & 6.12?
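The quoted slowdown can be sanity-checked with shell integer arithmetic. Taking the fastest runtime on each app version from the figures above (51,000 s on 6.05, 70,000 s on CUDA 3.1) gives a number close to the "about 38%" claim:

```shell
# Best-case KASHIF_HIVPR runtimes from the post above:
old=51000   # seconds under the 6.05 app
new=70000   # seconds under the CUDA 3.1 app
pct=$(( (new - old) * 100 / old ))   # integer percent slower
echo "${pct}% slower"                # prints: 37% slower
```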
Bikermatt · Joined: 8 Apr 10 · Posts: 37 · Credit: 4,423,207,619 · RAC: 72,335
In Linux, app 6.12, driver 260.19.12, GT 240 at stock clocks:

KASHIF_HIVPR tasks: ~42 ms per step
IBUCH_*_pYEEI tasks: ~38 ms per step

My first 6.13 task just finished in Linux on a GTX 470, also on driver 260.19.12. It was an IBUCH and it ran about 800 seconds longer than under the 6.06 app, so not good for this one, but we will see after a few different tasks have run.
[AF>Libristes>GNU-Linux] xipeh... · Joined: 30 Nov 08 · Posts: 2 · Credit: 3,479,719 · RAC: 0
The 6.13 release seems to have corrected the problem I had on my Linux box. Thanks! But the new WUs are taking 100+ hours on my 9800 GTX+, which doesn't give me any leeway with the 5-day deadline. Even if I leave my computer running 24/7 (which I usually do), I'm not sure I'll meet that deadline every time, as I sometimes have to dual-boot into Windows to let my children play a couple of games. Any chance of having shorter WUs for the low-end graphics cards?
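For scale: a 5-day deadline is 120 hours of wall time, so a 100-hour task leaves only about 20 hours of slack even on a card crunching 24/7, and less whenever the machine is booted into Windows:

```shell
deadline_h=$(( 5 * 24 ))   # 5-day deadline, in hours
task_h=100                 # approximate runtime on a 9800 GTX+
echo "slack: $(( deadline_h - task_h )) h"   # prints: slack: 20 h
```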
Saenger · Joined: 20 Jul 08 · Posts: 134 · Credit: 23,657,183 · RAC: 0
The 260 drivers are not in a repository, and everything I've read about manual installation of NVIDIA drivers on various Linux boards says you have to babysit your machine very, very closely, and losing the screen completely after a kernel update seems quite likely. But I've found a way to install the new drivers via a package manager and DKMS (thank you, Ralf Recker). This is the address of a PPA (Personal Package Archive), so nothing official, but it worked: https://launchpad.net/~ubuntu-x-swat/+archive/x-updates

I had to restart the computer afterwards, as the update included some stuff close to the kernel, but so what. Now I've got 260.19.12 installed, but the WU isn't running any faster. Perhaps I have to get a new one, as this one had already started; I'll let you know.

Greetings from Saenger

For questions about Boinc look in the BOINC-Wiki
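For reference, the PPA route described above boils down to a few commands on Ubuntu-family systems. This is a sketch only: the `nvidia-current` package name is an assumption for that era's packaging (check the PPA page for the actual name), and an unofficial PPA can break with later distribution updates.

```shell
# Add the (unofficial) x-updates PPA, then install the packaged NVIDIA driver.
# DKMS rebuilds the kernel module automatically after kernel updates, which is
# the main advantage over the manual .run installer.
sudo add-apt-repository ppa:ubuntu-x-swat/x-updates
sudo apt-get update
sudo apt-get install nvidia-current   # package name assumed, see note above
sudo reboot                           # the update touches kernel modules
```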
GDF · Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0
Tomorrow I will run benchmarks on our G200 systems; we practically run only on Fermi in the lab now. Every time there is a change in the application it costs a lot of effort to tune the systems, for you guys and for us, but it is a necessary step to move the application forward and to keep pace with new drivers and cards. gdf
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
I would like to have an on-site option whereby crunchers could select specific applications to run, per system, in their member profile. The researchers could still set the default app by driver, as now, but allow an override that the cruncher selects online. That way we could use whatever driver we deem useful for our individual needs (within the app requirements); it would, for example, allow some people to use a 260.99 driver and still run the CUDA (6.12 or 6.13) app. There are many normal computer-usage reasons to run different (mostly newer) drivers (than 195, for example), and there are obvious benefits to the project: crunchers who know what they are doing naturally want to optimize for performance. It would certainly be handier for me to use an up-to-date driver and run the 6.12 app than to use a 4-port KVM, and better for the project; a 43% increase in that system's productivity.
Beyond · Joined: 23 Nov 08 · Posts: 1112 · Credit: 6,162,416,256 · RAC: 0
> I would like to have an onsite option whereby the cruncher could select to run specific applications according to their set systems member profile. [...]

Seconded.
Bikermatt · Joined: 8 Apr 10 · Posts: 37 · Credit: 4,423,207,619 · RAC: 72,335
GTX 470, driver 260.19.12, IBUCH_*_pYEEI tasks:

6.06 app: ~11.2 ms per step
6.13 app: ~11.8 ms per step

The one input_*-TONI task that I have run was 0.7 ms per step slower as well. On a positive note, I now have a GTX 460 running in Linux; I can't imagine it could perform worse than it did with the 6.11 app in Win7.
Saenger · Joined: 20 Jul 08 · Posts: 134 · Credit: 23,657,183 · RAC: 0
> Now I've got the 260.19.12 installed, but the WU isn't running any faster. Perhaps I have to get a new one, as this one had already started, I'll let you know.

It looks like it went considerably faster nevertheless: p249-IBUCH_1_pYEEI_101109-0-20-RND9042_0

What I did besides installing a new driver was, after another hint from Ralf via PM, to set the nice value to 0, and the temperature went up from 40° to 55°. I'm at work at the moment and will say more once I'm at my puter again in the evening.

Greetings from Saenger

For questions about Boinc look in the BOINC-Wiki
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
Bikermatt, a week or so ago nobody could even use a GTX 460 with Linux. Let's hope some of our slowdown is offset by new Linux crunchers. Your GTX 460 takes 21 ms per step for an IBUCH task under Linux and 32 ms under Win7. Although there might be some WU-to-WU difference, most of your Win7 tasks take around 29 to 32 ms per step. Anyone tried a GTS 450 on Linux?
Joined: 29 Oct 10 · Posts: 4 · Credit: 2,675,358 · RAC: 0
I think I may know why I am seeing frequent errors with the acemd2_6.11_windows_intelx86__cuda31.exe client. Every now and then one of my GPUs becomes very slow: the GTS 250 in a computer that has a 250 and a GTX 460. I think it may be overclocked too much; I have slowed it down, and it did work well with Seti@Home. I can tell it is not doing anything because the fraction done advances only about 0.010% per minute, whereas normal progress is about 0.08% per minute, and the GPU temperature is only about 36 degrees C, versus a normal of about 46. The solution is to reboot the computer, and a reset may help also. However, when the system restarts, the workunit always errors out. The last time, I watched it in Windows XP Task Manager and Boincmgr. It appears that what "may" have happened is that BOINC tried to start the app on device 0 (the 460) several times, and then tried device 1 (the 250), which already had a task running. Here is the <stderr> section:

```
<stderr_out>
<![CDATA[
<message>
The system cannot find the path specified.
 (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
# Using device 1
# There are 2 devices supporting CUDA
# Device 0: "GeForce GTX 460"
#   Clock rate: 2.05 GHz
#   Total amount of global memory: 1073283072 bytes
#   Number of multiprocessors: 7
#   Number of cores: 56
# Device 1: "GeForce GTS 250"
#   Clock rate: 1.78 GHz
#   Total amount of global memory: 1073545216 bytes
#   Number of multiprocessors: 16
#   Number of cores: 128
MDIO ERROR: cannot open file "restart.coor"
# Using device 0
# There are 2 devices supporting CUDA
# Device 0: "GeForce GTX 460"
#   Clock rate: 2.05 GHz
#   Total amount of global memory: 1073283072 bytes
#   Number of multiprocessors: 7
#   Number of cores: 56
# Device 1: "GeForce GTS 250"
#   Clock rate: 1.78 GHz
#   Total amount of global memory: 1073545216 bytes
#   Number of multiprocessors: 16
#   Number of cores: 128
# Using device 0
# There are 2 devices supporting CUDA
# Device 0: "GeForce GTX 460"
#   Clock rate: 2.05 GHz
#   Total amount of global memory: 1073283072 bytes
#   Number of multiprocessors: 7
#   Number of cores: 56
# Device 1: "GeForce GTS 250"
#   Clock rate: 1.78 GHz
#   Total amount of global memory: 1073545216 bytes
#   Number of multiprocessors: 16
#   Number of cores: 128
SWAN : FATAL : Failure executing kernel sync [swan_fast_fill] [700]
Assertion failed: 0, file swanlib_nv.c, line 124
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
</stderr_txt>
```

I see this error message often: 'MDIO ERROR: cannot open file "restart.coor"', and as far as I can tell, restart.coor is always where it should be. Is it possible that the app times out trying to read restart.coor when the hard disk is busy?
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0
The restart message is not an actual error, just ignore it. 2.05 GHz is quite high for a GTX 460, though. Are you running at elevated temperatures, or with extreme cooling? If you have problems with that setup, I suggest lowering the clock as a first step.

MrS
Scanning for our furry friends since Jan 2002
Joined: 26 Aug 08 · Posts: 183 · Credit: 10,085,929,375 · RAC: 0
I noticed the Linux 6.13 app has a default nice value of 10. At nice = 10, my GTX 460's GPU temp is 44 C and the app's %CPU is close to 0. If I manually set nice to 5, I see the same low temp and %CPU. If I set the nice value to 4, GPU temp jumps to 58 C and %CPU = 12. Would it be possible to distribute the app with a default nice value of 4?

64-bit Ubuntu 10.04 · BOINC 6.10.58 · GTX 460 · NVIDIA driver 260.19.14

Thanks much for developing a Linux app that works with Fermi cards!!
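The nice value of an already-running task can also be changed without waiting for a new app release. A minimal demonstration, using a `sleep` as a stand-in for the acemd2 process (on a real host you would find the PID with something like `pgrep -f acemd`, name pattern assumed). Note the asymmetry: an unprivileged user can only raise a process's nice value; lowering it, e.g. from the shipped 10 down to 4, requires root.

```shell
# Start a stand-in background process (substitute the real acemd2 PID).
sleep 30 &
pid=$!

renice -n 10 -p "$pid" >/dev/null   # mimic the app's shipped default of nice=10
ps -o ni= -p "$pid"                 # prints the current nice value: 10

# Lowering it again (what the poster wants as the default) needs root:
#   sudo renice -n 4 -p "$pid"
kill "$pid"
```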
Beyond · Joined: 23 Nov 08 · Posts: 1112 · Credit: 6,162,416,256 · RAC: 0
On my Windows machines I also boost the priority (via eFMer Priority64), generally to "high". GPUGRID runs much faster and still uses only a small portion of one CPU; no SWAN_SYNC needed.
Bikermatt · Joined: 8 Apr 10 · Posts: 37 · Credit: 4,423,207,619 · RAC: 72,335
The 6.13 app is running slower on my GTX 460 in Win7 than the 6.11 app did. The GTX 460 is running well in Linux on the 6.13 app; so far I am seeing around 21 ms per step for the IBUCH tasks.

```
p2-IBUCH_15_PQpYEEIPI_101019-14-40-RND7762_2
# Time per step (avg over 1250000 steps): 33.258 ms
# Approximate elapsed time for entire WU: 41573.039 s
application version ACEMD2: GPU molecular dynamics v6.13 (cuda31)

p25-IBUCH_3_PQpYEEIPI_101019-14-40-RND3646_1
# Time per step (avg over 275000 steps): 31.505 ms
# Approximate elapsed time for entire WU: 39381.489 s
application version ACEMD2: GPU molecular dynamics v6.11 (cuda31)
```
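As a cross-check, the "approximate elapsed time" lines in those reports are simply steps multiplied by time-per-step. For the v6.13 task above:

```shell
# 1,250,000 steps at 33.258 ms/step, converted to seconds:
awk 'BEGIN { printf "%.1f s\n", 1250000 * 33.258 / 1000 }'   # prints: 41572.5 s
```

which matches the reported 41573.039 s to within a second.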
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
Tested my GTX260 (at factory settings) on Kubuntu. While running a nice fast GIANNI task without setting SWAN_SYNC to zero it was very slow: after 12 h it had not reached 50% complete (47.5%), so it would not have finished within 24 h. I then freed up a CPU core, configured SWAN_SYNC=0, restarted, and the task sped up considerably. It finished in about 15½ h, suggesting the task would have finished in around 7 h if I had used SWAN_SYNC from the start. Just under 12 ms per step. I'm now running one of the slower IBUCH tasks, but it should still finish in around 9 h 40 min. The faster tasks used to take my GTX260 around 6½ h on XP, when the card was overclocked and a CPU core freed up.

Bikermatt, have you left a CPU core free for your GTX460 on Win7? You might also want to try Beyond's method of increasing priority using eFMer Priority64.
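The "around 7 h" estimate follows directly from the numbers given: the final 52.5% of the task took 15.5 minus 12 = 3.5 h with SWAN_SYNC enabled, so a whole task at that rate works out to:

```shell
# Last 52.5% took 3.5 h with SWAN_SYNC=0; extrapolate to a full task:
awk 'BEGIN { printf "%.1f h\n", (15.5 - 12.0) / 0.525 }'   # prints: 6.7 h
```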
Beyond · Joined: 23 Nov 08 · Posts: 1112 · Credit: 6,162,416,256 · RAC: 0
> Tested my GTX260 (at factory settings) on Kubuntu.

Have you tried setting the nice level as suggested above by several people? It would be interesting to compare this to using SWAN_SYNC. I know of no other DC projects that act this way; I suspect it's a programming issue that needs to be addressed.
GDF · Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0
> Tested my GTX260 (at factory settings) on Kubuntu.

The GIANNI tasks are using the new algorithm for faster speed. It is a test; hopefully all the simulations will soon use it. It should be quite a bit faster on every card. gdf
Joined: 2 Mar 10 · Posts: 1 · Credit: 140,175,416 · RAC: 0
I've been having a problem starting with 6.12 where my GPU utilization is only around 10%. I run Rosetta and WCG on my four CPU cores, and previously I never had a problem with GPUGRID; it'd have healthy GPU utilization and I'd speed along nicely through tasks. Starting with 6.12, and including 6.13 (cuda31), I need to tell BOINC to allocate only 75% of my CPU cores (i.e. 3 of the 4); then my GPU utilization jumps to around 80%.

Here's my uname:

```
borghunter@apollo ~ $ uname -a
Linux apollo 2.6.35-ARCH #1 SMP PREEMPT Sat Oct 30 21:22:26 CEST 2010 x86_64 Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz GenuineIntel GNU/Linux
```

Here's what 75% CPU allocation in BOINC looks like in nvidia-smi:

```
borghunter@apollo ~ $ nvidia-smi -a

==============NVSMI LOG==============

Timestamp                : Sat Nov 13 19:18:28 2010
Driver Version           : 260.19.21

GPU 0:
    Product Name         : GeForce GTX 275
    PCI Device/Vendor ID : 5e610de
    PCI Location ID      : 0:1:0
    Board Serial         : 212899432077126
    Display              : Connected
    Temperature          : 70 C
    Fan Speed            : 40%
    Utilization
        GPU              : 78%
        Memory           : 16%
```

And this is with 100% CPU allocation:

```
borghunter@apollo ~ $ nvidia-smi -a

==============NVSMI LOG==============

Timestamp                : Sat Nov 13 19:26:30 2010
Driver Version           : 260.19.21

GPU 0:
    Product Name         : GeForce GTX 275
    PCI Device/Vendor ID : 5e610de
    PCI Location ID      : 0:1:0
    Board Serial         : 212899432077126
    Display              : Connected
    Temperature          : 63 C
    Fan Speed            : 40%
    Utilization
        GPU              : 9%
        Memory           : 2%
```
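A small script can pull the utilization figure out of a dump in that 260-era plain-text format, which is handy for logging it over time. Here it is fed a here-document copied from the output above; on a live system you would pipe `nvidia-smi -a` in instead (the field layout is taken from the paste and is not guaranteed across driver versions):

```shell
# Extract the "Utilization / GPU" percentage from 260-era `nvidia-smi -a` text:
# take the first GPU line that appears after the Utilization header.
gpu_util=$(awk -F: '
    seen && /GPU/ { gsub(/[ %]/, "", $2); print $2; exit }
    /Utilization/ { seen = 1 }
' <<'EOF'
    Temperature          : 63 C
    Fan Speed            : 40%
    Utilization
        GPU              : 9%
        Memory           : 2%
EOF
)
echo "$gpu_util"   # prints: 9
```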
GDF · Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0
> I've been having a problem starting with 6.12 where my GPU utilization is only around 10%. [...]

Hi, this is because from this version on, the default is not to use a full CPU core to drive the GPU. If you want the old behaviour back, just add `export SWAN_SYNC=0` to your .bashrc. gdf
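GDF's suggestion amounts to the following idempotent snippet. One caveat, stated as an assumption: a variable exported in ~/.bashrc only reaches BOINC if the client is started from a login shell; if BOINC runs as an init/system service, the variable has to be set in that service's startup script instead.

```shell
# Add SWAN_SYNC=0 to ~/.bashrc once (skip if the line is already there):
grep -qx 'export SWAN_SYNC=0' ~/.bashrc 2>/dev/null || \
    echo 'export SWAN_SYNC=0' >> ~/.bashrc

# Also set it in the current shell, then restart the BOINC client from here
# so the running apps pick it up:
export SWAN_SYNC=0
```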
©2025 Universitat Pompeu Fabra