Message boards : Graphics cards (GPUs) : ACEMD2 6.12 cuda and 6.13 cuda31 for windows and linux
Joined: 6 Jun 08 · Posts: 152 · Credit: 328,250,382 · RAC: 0
Hi Gianni,

My GTX 295 uses at most 50-56% of the GPU. My GTX 480 uses at most 48%. Temperatures are OK.

All machines: Windows XP Pro, 260.99 driver, BOINC 6.10.58, running application 6.13 (cuda31). Very slow!! Is this correct?

Ton (ftpd), Netherlands
Joined: 26 Aug 08 · Posts: 183 · Credit: 10,085,929,375 · RAC: 0
I've had 2 of the KASHIF WUs fail with this output:

    <core_client_version>6.10.58</core_client_version>
    <![CDATA[
    <message>
    process exited with code 193 (0xc1, -63)
    </message>
    <stderr_txt>
    # Using device 0
    # There is 1 device supporting CUDA
    # Device 0: "GeForce GTX 460"
    # Clock rate: 1.53 GHz
    # Total amount of global memory: 804454400 bytes
    # Number of multiprocessors: 7
    # Number of cores: 56
    SIGABRT: abort called
    Stack trace (13 frames):
    ../../projects/www.gpugrid.net/acemd2_6.13_x86_64-pc-linux-gnu__cuda31(boinc_catch_signal+0x4d)[0x47d48d]
    /lib/libc.so.6(+0x33af0)[0x7f9be5b11af0]
    /lib/libc.so.6(gsignal+0x35)[0x7f9be5b11a75]
    /lib/libc.so.6(abort+0x180)[0x7f9be5b155c0]
    ../../projects/www.gpugrid.net/acemd2_6.13_x86_64-pc-linux-gnu__cuda31[0x48abeb]
    ../../projects/www.gpugrid.net/acemd2_6.13_x86_64-pc-linux-gnu__cuda31[0x433d50]
    ../../projects/www.gpugrid.net/acemd2_6.13_x86_64-pc-linux-gnu__cuda31[0x430246]
    ../../projects/www.gpugrid.net/acemd2_6.13_x86_64-pc-linux-gnu__cuda31[0x42f957]
    ../../projects/www.gpugrid.net/acemd2_6.13_x86_64-pc-linux-gnu__cuda31[0x41480d]
    ../../projects/www.gpugrid.net/acemd2_6.13_x86_64-pc-linux-gnu__cuda31[0x407ae0]
    ../../projects/www.gpugrid.net/acemd2_6.13_x86_64-pc-linux-gnu__cuda31[0x408346]
    /lib/libc.so.6(__libc_start_main+0xfd)[0x7f9be5afcc4d]
    ../../projects/www.gpugrid.net/acemd2_6.13_x86_64-pc-linux-gnu__cuda31[0x407849]
    Exiting...
    </stderr_txt>
    ]]>

One KASHIF WU finished, though:

    <core_client_version>6.10.58</core_client_version>
    <![CDATA[
    <stderr_txt>
    # Using device 0
    # There is 1 device supporting CUDA
    # Device 0: "GeForce GTX 460"
    # Clock rate: 1.53 GHz
    # Total amount of global memory: 804454400 bytes
    # Number of multiprocessors: 7
    # Number of cores: 56
    MDIO ERROR: cannot open file "restart.coor"
    # Time per step (avg over 1000000 steps): 28.011 ms
    # Approximate elapsed time for entire WU: 28010.660 s
    22:21:57 (4713): called boinc_finish
    </stderr_txt>
    ]]>

64-bit Ubuntu 10.04, BOINC 6.10.58, GTX 460, NVIDIA driver version 260.19.14
GDF · Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0
On Linux, at normal priority (0) and with SWAN_SYNC=0, GPU usage should be around 95%.

gdf
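In practice this means exporting the variable in the environment the BOINC client runs under. A minimal sketch for Linux, assuming the Debian/Ubuntu boinc-client packaging; the /etc/default/boinc-client path, the init-script restart and the ~/.bashrc fallback are assumptions about that setup, not project instructions:

```bash
#!/bin/bash
# Sketch: make SWAN_SYNC=0 visible to the BOINC client on Linux.
# Paths assume the Debian/Ubuntu boinc-client package; adjust for other setups.

# Packaged daemon: append the variable to the file the init script sources.
echo 'export SWAN_SYNC=0' | sudo tee -a /etc/default/boinc-client

# Manually started client: export it in the shell that launches boinc instead.
echo 'export SWAN_SYNC=0' >> ~/.bashrc

# Restart the client so new ACEMD tasks inherit the variable.
sudo /etc/init.d/boinc-client restart
```

Whether the daemon actually passes the variable through depends on the init script, so it is worth confirming the effect on one task before relying on it.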
Joined: 12 Feb 09 · Posts: 57 · Credit: 23,376,686 · RAC: 0
Hi Gianni, around 50% usage on a GTX 260 under Win7 too!
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0
If you use one CPU core by default, people complain that they want their cores back. If you try not to use it, GPU performance suffers and people complain. Personally I'd go with the latter, but one thing is clear: whichever behaviour you pick as the default will never satisfy everyone.

GDF, you said you're talking with David about adding a feature to BOINC so that this is easier to configure than with an environment variable. Couldn't you use methods that already exist in BOINC? For example, at Rosetta under "my account / Rosetta@Home settings" they've added a "Target CPU run time" setting where I can choose any number of hours. Somehow this gets passed to the application, and it's tied to the 4 profiles, so one can make different choices for different PCs. I think that's just what we need. And probably set the default to "use one core", as SWAN_SYNC=1 appears to have a heavy impact on GPU performance.

MrS
Scanning for our furry friends since Jan 2002
Joined: 2 Jul 10 · Posts: 7 · Credit: 28,599,565 · RAC: 0
I have a GTS 450 with 1024 MB GDDR5.

    Processor: 1 AuthenticAMD AMD Athlon(tm) 64 Processor 3200+ [Family 15 Model 47 Stepping 0]
    Processor: 512.00 KB cache
    Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow up rep_good pni lahf_lm
    OS: Linux: 2.6.35-22-generic
    Memory: 1.96 GB physical, 5.74 GB virtual
    Disk: 682.02 GB total, 639.43 GB free

It doesn't seem to be recognised in GPUGRID.
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
BobR, you will need the latest NVIDIA driver.
Joined: 2 Jul 10 · Posts: 7 · Credit: 28,599,565 · RAC: 0
Latest drivers installed and working:

    OpenGL version string: 4.1.0 NVIDIA 260.19.21

Any other clues?
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 261
> Latest drivers installed and working.

Ensure that all graphics drivers have finished loading before BOINC starts. Other Linux users (I'm not one, I just read the posts on the boards) sometimes suggest adding a delay to the BOINC startup script.
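One way to add such a delay, as a sketch only; the SysV init path is an assumption about Debian/Ubuntu-style packaging, and other distros start BOINC differently:

```bash
#!/bin/bash
# Sketch: delay BOINC at boot so the NVIDIA driver is fully loaded first.
# Run this from a late-boot hook such as /etc/rc.local instead of letting the
# boinc-client service start at its default position.

sleep 30                          # crude delay; adjust as needed
/etc/init.d/boinc-client start    # start the client once the driver is up
```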
Joined: 2 Jul 10 · Posts: 7 · Credit: 28,599,565 · RAC: 0
The drivers were installed last week, and the system, including BOINC, has been rebooted since.

    # nvidia-xconfig: X configuration file generated by nvidia-xconfig
    # nvidia-xconfig: version 260.19.21 (buildmeister@builder101) Thu Nov 4 21:47:28 PDT 2010

Did I miss anything?
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 261
> The drivers were installed last week, and the system including BOINC was rebooted since.

Your host 84614 is not reporting any coprocessors visible to BOINC. You probably need to wait for a Linux BOINC specialist to advise you on that problem.
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
BobR, your GPU is not listed here, and there might be a problem with that, unless you just took the card out before posting, or uninstalled the driver.

Looking at your task list, I can see that you did download and run many tasks while the GPU was present. The problem is that they all failed. Most ran and immediately failed on the v6.06 (cuda30) application, which does not work on a GTS 450, giving your host a bad rating. As a result you will only get one new task a day until you start completing tasks:
http://www.gpugrid.net/results.php?hostid=84614
Maximum daily WU quota per CPU: 1/day, http://www.gpugrid.net/show_host_detail.php?hostid=84614

One task ran for 31 h before failing with the error message "process got signal 11." The cuda3.1 app should work, but that task time is far too long. To give yourself a good chance of finishing a task, leave a CPU core free in BOINC Manager and use swan_sync=0. This will significantly reduce the run time; it should cut it at least in half.
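On the "leave a CPU core free" part: the usual route is the processor-usage preference in BOINC Manager, but the same thing can be done with a local preferences override. A sketch, assuming the stock /var/lib/boinc-client data directory; the 75% value is an example for a quad-core (three of four cores), and on a single-core CPU the simpler option is just to stop running CPU projects:

```bash
#!/bin/bash
# Sketch: tell BOINC to leave one core free via a local preferences override.
# The data directory and the 75% figure are example assumptions.

sudo tee /var/lib/boinc-client/global_prefs_override.xml > /dev/null <<'EOF'
<global_preferences>
    <max_ncpus_pct>75</max_ncpus_pct>
</global_preferences>
EOF

# Ask the running client to re-read the override without restarting.
boinccmd --read_global_prefs_override
```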
Joined: 2 Jul 10 · Posts: 7 · Credit: 28,599,565 · RAC: 0
Thank you for the explanation; it does explain why the host was not doing so well. I purchased this card specifically to provide more computing power, as it was touted as a great double-precision capable card. It should have increased the power to produce more results, especially in Linux. I understand this card has compute capability 2.3. Also, according to 2 other BOINC projects (Einstein and Milkyway), this card doesn't show up in the summary either.

How do I determine which projects to run to take best advantage of the card? I'm not worried about points or credits, just being able to work on WUs as efficiently as possible. Thank you.
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
Hi BobR,

As your GPU is not listed here, at Einstein, or at MW, I would say it is not being recognised, but perhaps this is a server reporting issue (it is a new card type, GF106). What does the 13th line of the BOINC Manager messages say about the card? (Post back.) It should look something like this:

    14/11/2010 15:50:57 NVIDIA GPU 0: GeForce GTS 450 (driver version 260.19, CUDA version 3100, compute capability 2.1, 1024MB, 602 GFLOPS peak)

I thought the GTS 450 is a CC 2.1 card, but if it is a CC 2.2 card there might need to be an application update before it works. I cannot confirm this, and I am not sure whether BOINC identifies card types by compute capability; apps are selected according to CC at Folding.

On Einstein your tasks mostly failed, and even the ones that appeared to run actually had issues: "Error writing shared memory data (size limit exceeded)!" was listed repeatedly in the task details. This suggests the project does not support your card as yet, but you would need to confirm that over at Einstein. If I am correct, the GTS 450 can use up to 2 GB of system memory to back up the card's 1 GB of memory.

You have not run any tasks at MW. I suggest you ask whether the GTS 450 works there before trying.

As you have a 3200+ CPU, I would reiterate: do not use it to crunch alongside the GPU; just crunch with the GPU. The CPU is needed to support the GPU and run the system, and that CPU would not do anywhere near the work of the GPU.

I still think your GTS 450 might work at GPUGrid now, but you would need to set swan_sync to zero and not use the CPU core to crunch for other projects; the fact that a task ran for 31 h suggests it can run. Of course, because this is a new GF106 design, you might need to wait for a CUDA 3.2 based app. I think you should set swan_sync to zero in your .bashrc, stop crunching CPU projects, restart, and give it a go again here. There is no public GPUGrid app that needs over 500 MB of GDDR space to run.

As for which projects to crunch on, that's up to you to decide. Do you prefer astrophysics, nuclear research, maths, or projects such as this one, which advances science, computer modelling and medical research?

PS. So that nobody falls for the NVIDIA OEM trap: DO NOT buy an OEM GTS 450 – they only have 144 shaders (not 192) and 3 SMs, not 4. It is basically a GTS440 with DDR5 and reasonable clock rates. As for the GTS430, it would not be any better than a GT240, so it is not worth buying to crunch with.
Joined: 2 Jul 10 · Posts: 7 · Credit: 28,599,565 · RAC: 0
Line 13 reads:

    Sat 13 Nov 2010 08:00:09 PM EST    No usable GPUs found

Interestingly enough, that was the message a few days ago, after the install and before I updated the drivers. The messages from Einstein and GPUGRID are:

    Sat 13 Nov 2010 08:00:09 PM EST    Einstein@Home    Application uses missing NVIDIA GPU
    Sat 13 Nov 2010 08:00:10 PM EST    GPUGRID    Application uses missing NVIDIA GPU

which is confusing. I will get in touch with Einstein@Home as well.

> If I am correct, the GTS 450 can use up to 2 GB of system memory to back up the card's 1 GB of memory.

I may have to up my RAM! As for MW, it hasn't run any tasks since before the new card was installed, which is also interesting. As for the CPU, I'll work on changing the setup as recommended. As for the projects, I was enquiring as to which would support the GTS 450 best, not which to crunch; I'll sort that out as well. Thank you for your support and keep up the good work.

I saw an article at Einstein@Home asking for suggestions on how to increase the number of participants. In Canada, and I'm sure across most of the participating nations, libraries have computers sitting idle or mostly idle. They would be a good source for expanding the systems. I will be bringing this up with my local library, and I encourage anyone else reading this to do so also.
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
I guess you need to install the drivers again.
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 261
> I guess you need to install the drivers again.

And, as I said a few posts ago, ensure that the drivers have time to load and fully initialise before BOINC tries to use them. This is clearly a BOINC/driver issue, since it affects all projects equally.
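A quick way to check the driver side before blaming BOINC, sketched below; nvidia-smi ships with the driver package, and the /dev/nvidia* nodes only appear once the driver has initialised:

```bash
#!/bin/bash
# Sketch: confirm the NVIDIA driver is actually up before BOINC starts.

lsmod | grep -i nvidia     # kernel module loaded?
ls -l /dev/nvidia*         # device nodes present and readable by the BOINC user?
nvidia-smi                 # does the driver answer queries about the GPU?

# If any of these fail, BOINC cannot see the card either: reinstall the driver
# or delay the BOINC start until they succeed.
```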
Beyond · Joined: 23 Nov 08 · Posts: 1112 · Credit: 6,162,416,256 · RAC: 0
The GIANNI tasks are using the new algorithm for faster speed. It is a test.

GDF, the GIANNI_DHFR500 tasks are running well here with boosted priority and no SWAN_SYNC, even on my GT 240 cards.

SK, I tested the GIANNI_DHFR500 WUs with no SWAN_SYNC myself. My GT 240 card, with a modest shader OC to 1500 MHz, ran these 2 GIANNI_DHFR500 WUs in 41772.922 and 41906.609 seconds. GPU usage was a steady 89%. This was without setting SWAN_SYNC, just boosting the GPUGRID app's priority to high. All 4 CPU projects ran at normal speeds, as did AQUA. No cores wasted.
http://www.gpugrid.net/result.php?resultid=3293056
http://www.gpugrid.net/result.php?resultid=3277797

You have one machine that has also finished 2 GIANNI_DHFR500 WUs, I assume using SWAN_SYNC. The times are considerably slower even though you are using a higher OC, to 1600 MHz: 49102.764 and 46696.179 seconds.
http://www.gpugrid.net/result.php?resultid=3287168
http://www.gpugrid.net/result.php?resultid=3283754

While SWAN_SYNC obviously works, I think it's akin to hunting mice with an elephant gun. Boosting priority and/or regulating polling via the application seems a more elegant solution. For instance, Collatz uses the following method to control polling:

> Polling behavior for the GPU within the Brook runtime: b (default 1)
> See the option w for starters. If that time has elapsed, the GPU polling starts. This can be done
> by continuously checking if the task has finished (b-1), enabling the fastest runtimes, but potentially
> creating a high CPU load (a bit dependent on driver version). Second possibility is to release the time
> slice allotted by the OS, so other apps can run (b0). The catch is that there is some interaction with
> the priority. The time slice is only released to other tasks of the same priority. So raising the priority
> effectively disables the release and the behavior is virtually identical to setting this parameter
> to -1. If a raised priority and a low CPU time is wanted, one should leave it at the default of 1. This
> suspends the task for at least 1 millisecond, enabling also tasks of lower priority to use the CPU in the
> meantime. One can use also b2 or b3 if one wants a smoother system behavior.
> Possible values:
> b-1: busy waiting
> b0: release time slice to other tasks of same priority
> b1, b2 or b3: release time slice for at least 1, 2, or 3 milliseconds respectively

Seems this would have the same effect as SWAN_SYNC without using an entire CPU core.
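For what it is worth, the Linux analogue of boosting the app's priority can be sketched with renice; the acemd2 process-name pattern is taken from the stderr output earlier in this thread, and the -5 nice value is only an illustrative choice (raising priority needs root, and on Windows the same idea is usually done through Task Manager or a tool that sets the process class to High):

```bash
#!/bin/bash
# Sketch: raise the priority of running ACEMD2 tasks instead of using SWAN_SYNC.
# The process-name pattern and the nice value are illustrative assumptions.

pgrep -f acemd2 | xargs -r sudo renice -n -5 -p
```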
GDF · Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0
> Seems this would have the same effect as SWAN_SYNC without using an entire CPU core.

The longest kernel in the DHFR workunit is of the order of a couple of milliseconds, but maybe we can make the priority higher instead of changing the polling method.
Beyond · Joined: 23 Nov 08 · Posts: 1112 · Credit: 6,162,416,256 · RAC: 0
> The longest kernel in the DHFR workunit is of the order of a couple of milliseconds, but maybe we can make the priority higher instead of changing the polling method.

I've been setting the priority to High, but maybe it could be a bit lower and still get the same boost.