Message boards : Graphics cards (GPUs) : GT240 and Linux: niceness and overclocking
Lem Novantotto · Joined: 11 Feb 11 · Posts: 18 · Credit: 377,139 · RAC: 0
> As you know you can add export SWAN_SYNC=0 to your .bashrc file

That wouldn't work, skgiven. A good place to set an *environment* variable is /etc/environment:

    $ sudo cp /etc/environment /etc/environment.backup
    $ echo 'SWAN_SYNC=0' | sudo tee -a /etc/environment

The *next* time BOINC runs something, it will do it... "zeroswansyncing". ;) It should, at least. Bye.
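If you want to confirm the variable actually reaches new processes, here is a minimal sketch (it assumes a Linux /proc filesystem; note that a process's environment is fixed at exec time, so you check a freshly started child, not the current shell):

```shell
# Set the variable in the current shell (a stand-in for /etc/environment,
# which only takes effect for sessions started after the edit).
export SWAN_SYNC=0

# A child process inherits it; inspect the child's /proc environ the same
# way you would inspect a running acemd task's environment:
sh -c 'tr "\0" "\n" < /proc/$$/environ' | grep '^SWAN_SYNC='
# prints: SWAN_SYNC=0
```

The same `tr`/`grep` pipeline pointed at a real task's `/proc/PID/environ` tells you whether that task was started with SWAN_SYNC set.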
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
Yes, my mistake. I'm working blind here (no Linux) and, as you can tell, Linux is not my forte; it's been months since I used any version. I found swan_sync fairly easy to use on Kubuntu 10.04 but struggled badly with Ubuntu 10.10. The commands are very different, even from Ubuntu 10.04; I had to use Nautilus to get anywhere and change lots of security settings. Your entries look close to what I used, but I would have to dig out a notebook to confirm.

I'm reluctant to install 10.10 again, because I want to use the 6.12 app with my GT240 cards, and I found Ubuntu 10.10 too difficult to work with (security, swan_sync and driver issues, lots of updates). Although I could use it with my GTX470 cards, I need to control the fan speed accurately, and if I'm not mistaken it is either automatic or 100% (too hot or too loud), with nothing in between? When I managed to use a 195.x driver with 10.10 (no idea how) I ended up with a 640x400 screen. An update attempt killed the system and a recovery attempt failed. Hence I'm back on Win.

The possibility of overclocking my GT240 for Linux is very tempting, but at present I don't have the time to try this. Thanks for the posts. Linux expertise greatly appreciated.
Saenger · Joined: 20 Jul 08 · Posts: 134 · Credit: 23,657,183 · RAC: 0
I still fail to grasp why this extremely nerdy stuff isn't simply put in the app, especially as GPUGrid worked nearly fine and smooth until the last change of app. It just had to acknowledge its use of a whole core, like Einstein does now, and everything would have been fine. Now it's a project for nerds or windoze.

Gruesse vom Saenger
For questions about Boinc look in the BOINC-Wiki
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
GDF did ask that some such configurations be facilitated via BOINC Manager. I guess the differences between the various distributions would make it difficult.

At the minute I'm running my quad GT240 system without using swan_sync, on Vista x64. I'm running 3 greedy CPU tasks on a quad and using eFMer Priority x64 with reasonable success; after upping the shaders again I am now only 7.5 to 9.5% less efficient than using swan_sync and freeing up one CPU core per card. I want to increase CPU usage for another project for a while, so for now this is acceptable to me. eFMer is more like changing the nice value than using swan_sync.

While there are a few "how to use Linux" threads, there is not a sufficient "how to optimize for Linux" thread. If I get the time I will try to put one together, but such things are difficult when you are not a Linux guru.
Carlesa25 · Joined: 13 Nov 10 · Posts: 328 · Credit: 72,619,453 · RAC: 0
> While there are a few how to use Linux threads, there is not a sufficient how to optimize for Linux thread. If I get the time I will try to put one together, but such things are difficult when you are not a Linux guru.

Hi. The truth is that I am not very knowledgeable in Linux, but I'm using Ubuntu 10.10 (other versions before that, for a year) and it works very well, with better performance than Windows 7. The current NVIDIA driver is 270.18 and my GTX295 works perfectly: it does not exceed 62 °C and the fan control works (from 40% to 65%). Well-ventilated box.

As I said in another thread, just changing the process priority (from 10 down to -10, or whatever suits me) allows extensive control, and I get good yields. For these types of jobs I have found it a much better choice than Windows. Greetings.
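Changing the priority of a running task, as described above, amounts to a `renice` call. A small sketch on a throwaway process (a real acemd PID works the same way; note that lowering niceness below 0 requires root):

```shell
# Spawn a disposable process to act on:
sleep 60 &
pid=$!

# Raising the nice value (being "nicer") needs no privileges:
renice -n 5 -p "$pid"

# Verify the new nice value:
ps -p "$pid" -o ni=

# Lowering it, e.g. to -10 as in the post, would need root:
#   sudo renice -n -10 -p PID

kill "$pid"
```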
Kirby54925 · Joined: 21 Jan 11 · Posts: 31 · Credit: 70,061,988 · RAC: 0
I tried changing the niceness of the GPUGrid task to -10 (it defaulted to 19). Then I set BOINC to use 100% of the processors. I wanted to see if the priority change would allow Rosetta@Home and GPUGrid to share CPU time on the fourth core. It still seems like Rosetta@Home is being greedy with the CPU, causing GPUGrid to slow down drastically. The Rosetta@Home task on the fourth core was using 99-100% of that particular core. The kicker is that the niceness for Rosetta@Home tasks is set at 19!

It really appears that swan_sync doesn't do anything at all. It certainly isn't showing up in the stderr section when I look at my completed tasks. Just to reiterate, I'm using Linux Mint 10, which is based on Ubuntu 10.10.
Lem Novantotto · Joined: 11 Feb 11 · Posts: 18 · Credit: 377,139 · RAC: 0
> It still seems like Rosetta@Home is being greedy with the CPU, causing GPUGrid to slow down drastically. [...] It really appears that swan_sync doesn't do anything at all.

Kirby, I'm running the 6.12 app, so I cannot faithfully replicate your environment. Please open a terminal and run these commands:

1) top -u boinc

Would you please cut and paste the output? Looking at the rightmost column, you'll immediately identify the gpugrid task. Read its "pid" (the leftmost value on its line). Let's call this number PID. Now press Q to exit top.

2) ps -p PID -o comm= && chrt -p PID && taskset -p PID

(changing PID to the number). Cut and paste this second output, too. Now repeat point 2 for a Rosetta task, and cut and paste once again. We'll be able to have a look at how things are going. Maybe we'll find something. Bye.
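The three commands in step 2 can be wrapped in a small helper (a sketch; `ps`, `chrt` and `taskset` are the standard procps/util-linux tools, and the function name `schedinfo` is made up here):

```shell
# Print name, scheduling policy/priority, and CPU affinity for a PID.
schedinfo() {
    pid="$1"
    ps -p "$pid" -o comm=   # process name
    chrt -p "$pid"          # scheduling policy and static priority
    taskset -p "$pid"       # CPU affinity mask
}

# Usage: schedinfo <PID>, e.g. on the current shell itself:
schedinfo $$
```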
Kirby54925 · Joined: 21 Jan 11 · Posts: 31 · Credit: 70,061,988 · RAC: 0
acemd2_6.13_x86
pid 2993's current scheduling policy: SCHED_IDLE
pid 2993's current scheduling priority: 0
pid 2993's current affinity mask: f

minirosetta_2.1
pid 2181's current scheduling policy: SCHED_IDLE
pid 2181's current scheduling priority: 0
pid 2181's current affinity mask: f

As you can see, they're exactly the same. At this point, all four cores are being used, and GPUGrid has a niceness of -10.

EDIT: The percent completion is still incrementing on GPUGrid; it's just moving at a glacial pace. Normally, when I have only three cores working on CPU tasks, GPUGrid tasks take about 4.5 hours to finish. With four cores enabled, this looks like it will take 2x-2.5x longer.
Lem Novantotto · Joined: 11 Feb 11 · Posts: 18 · Credit: 377,139 · RAC: 0
> acemd2_6.13_x86
> pid 2993's current scheduling policy: SCHED_IDLE
> pid 2993's current scheduling priority: 0
> pid 2993's current affinity mask: f

Here is the problem! :) See my outputs, with different tasks from different projects:

acemd2_6.12_x86
pid 15279's current scheduling policy: SCHED_OTHER
pid 15279's current scheduling priority: 0
pid 15279's current affinity mask: 3

wcg_faah_autodo
pid 29777's current scheduling policy: SCHED_BATCH
pid 29777's current scheduling priority: 0
pid 29777's current affinity mask: 3

simap_5.10_x86_
pid 15996's current scheduling policy: SCHED_BATCH
pid 15996's current scheduling priority: 0
pid 15996's current affinity mask: 3

minirosetta_2.1
pid 16527's current scheduling policy: SCHED_BATCH
pid 16527's current scheduling priority: 0
pid 16527's current affinity mask: 3

You see it, don't you? The problem is your SCHED_IDLE, mostly on the *gpugrid* app. Niceness is not priority itself: niceness is intended to affect priority (under certain circumstances). If you want to read something about priority:

    $ man 2 sched_setscheduler

Try changing the scheduling policy of your *gpugrid* app to SCHED_OTHER:

    $ sudo chrt --other -p 0 PID

(using the right PID - check with top). Remember that, if it works, you have to do it every time a new task begins (you could set up a cronjob to do it, as we've seen for niceness). Let me know. Bye.
Kirby54925 · Joined: 21 Jan 11 · Posts: 31 · Credit: 70,061,988 · RAC: 0
It works! Now I can run four CPU tasks and a GPUGrid task at the same time! Thank you very much! This is much better than the swan_sync method that is so often spoken of here.

Another thing: does this need to be in rc.local as well, or would crontab suffice? Additionally, does the chrt command need the terminal output suppression thingy at the end in crontab? (... > /dev/null 2>&1)
Lem Novantotto · Joined: 11 Feb 11 · Posts: 18 · Credit: 377,139 · RAC: 0
> It works!

I'm glad. :) You're welcome.

> Another thing: does this need to be in rc.local as well, or would crontab suffice? Additionally, does the chrt command need the terminal output suppression thingy at the end in crontab? (... > /dev/null 2>&1)

Using a cronjob, we can forget about rc.local (even for the niceness thing). However, it doesn't hurt: rc.local is executed every time the runlevel is changed, so basically at boot (and at shutdown). Our cronjob runs every 5 minutes, so without rc.local we lose at most 5 minutes (as we obviously lose at most five minutes every time a new task starts), which is not much with workunits that last many hours. But we can make it run more frequently if we like - every three minutes, for example. This entry takes care of both the scheduling policy and the niceness:

    */3 * * * * root chrt --other -p 0 $(pidof acemd_whatever_is_your_app) > /dev/null 2>&1 ; renice -1 $(pidof acemd_whatever_is_your_app) > /dev/null 2>&1

Bye.

P.S. The above works with no more than one gpugrid task being crunched at the same time. The renice part actually works with more, but the chrt part doesn't: you can renice many tasks at once, but you cannot change the scheduling policy of more than one task per invocation. Let's generalize for any number of simultaneous gpugrid tasks:

    */3 * * * * root for p in $(pidof acemd_whatever_is_your_app) ; do chrt --other -p 0 $p > /dev/null 2>&1 ; done ; renice -1 $(pidof acemd_whatever_is_your_app) > /dev/null 2>&1
Saenger · Joined: 20 Jul 08 · Posts: 134 · Credit: 23,657,183 · RAC: 0
I've got something to compare. After the post by Lem Novantotto I tried another way of using SWAN_SYNC - nothing with that non-existent .bashrc stuff that project fanboys tried to impose, but the /etc/environment stuff. It had some grave consequences. I've got two similar WUs, or at least they should be similar according to the credit settings by the project: one from before the change and one crunched yesterday.

Old one, 76-KASHIF_HIVPR_n1_bound_so_ba2-92-100-RND2370_0:
- crunched with nice factor -1
- crunched with SWAN_SYNC_0 according to fanboy instructions, but obviously not in reality according to stderr.out
- Time per step (avg over 575000 steps): 61.307 ms
- Run time 77,243.59 seconds
- CPU time 1,577.01 seconds

New one, 16-KASHIF_HIVPR_n1_bound_so_ba2-94-100-RND9931_0:
- crunched with nice factor 10
- crunched with SWAN_SYNC=0 according to Lem, this time mentioned in stderr.out
- Time per step (avg over 325000 steps): 82.817 ms
- Run time 102,772.46 seconds
- CPU time 102,095.10 seconds

As you can see, the main difference is the usage of massive CPU power, with the result of significantly reduced crunching speed. It behaved like before the app change, i.e. it pretended to be 0.15 CPU + 1 GPU while it actually used 1 CPU + 1 GPU, leaving 3 cores for the concurrent 4 WUs of other projects. I started with both nice forced to -1 and the new swan_sync, but that left one core idling: somehow it gave the other 4 projects running in parallel no more than 2 cores according to System Monitor, so I commented that line out in crontab.

This new method is definitely not useful; I will never try it again. It's a massive waste of resources. The next try, after Einstein gets its usual share again, will be with a forced nice factor of 19, so the 4 cores will be divided evenly among the 5 WUs, as worked fine with the old app.

Gruesse vom Saenger
For questions about Boinc look in the BOINC-Wiki
Saenger · Joined: 20 Jul 08 · Posts: 134 · Credit: 23,657,183 · RAC: 0
This time I got a Gianni, but I still have one of those with 7,491.18 credits in my list from before as well. Here's the data:

Old: 251-GIANNI_DHFR1000-34-99-RND0842_1
- crunched with nice factor -1 or -3, I don't remember exactly
- crunched with SWAN_SYNC_0 according to fanboy instructions, but obviously not in reality according to stderr.out
- Time per step (avg over 25000 steps): 98.671 ms
- Run time 81,994.45 seconds
- CPU time 2,106.21 seconds

New: 800-GIANNI_DHFR1000-37-99-RND6435_1
- crunched with nice factor 19
- crunched with SWAN_SYNC=0 according to Lem, this time mentioned in stderr.out
- Time per step (avg over 1905000 steps): 41.600 ms
- Run time 83,551.62 seconds
- CPU time 66,219.42 seconds

So again no speed-up, just a waste of CPU power, but at least not slower than the old one ;) I think I'll bugger this swan_sync rubbish and stick with the priority alone. It's simple, it works, and it's far more effective than wasting precious CPU time on a slow-down.

Edith says: I don't have a clue what these ms per time step figures are; they are obviously something completely different for the two WUs, and time steps don't seem to relate to work done. Credits are defined by the project, so both WUs did the same amount of work according to the project; the new one just needed more time steps for the same work done.

Gruesse vom Saenger
For questions about Boinc look in the BOINC-Wiki
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
You have to leave a CPU core free when using swan_sync, otherwise it's not going to be faster for the GPU.

There is no getting away from the fact that the 6.13 app is slower for GT240 cards, which is why I use the 6.12 app, albeit driver dependent. Your drivers are for Einstein, not GPUGrid. Linux is generally faster than Windows XP, and Vista is >11% slower than XP, yet I can finish a task using a GT240 on Vista in less than half the time you can.

Without using swan_sync, using eFMer Priority and running 3 CPU tasks on a quad core CPU:
506-GIANNI_DHFR1000-38-99-RND4572_1 - Run time 47567.632002, CPU time 5514.854

Using swan_sync=0:
597-GIANNI_DHFR1000-33-99-RND7300_1 - Run time 40999.983999, CPU time 35797.14

Both: Claimed credit 7491.18171296296, Granted credit 11236.7725694444.

Clearly using swan_sync is still faster (16%), if done correctly, and your Linux setup is poor (half the speed it could be).
Lem Novantotto · Joined: 11 Feb 11 · Posts: 18 · Credit: 377,139 · RAC: 0
> There is no getting away from the fact that the 6.13 app is slower for GT240 cards, which is why I use the 6.12 app, albeit driver dependent.

Yesterday I decided to give the 270.18 driver a try. The cuda workunit I had in cache went like a charm with the good old 6.12 app; then a cuda31 workunit was downloaded, and it was a no-go (even though *both* apps, the former and the latter, almost saturated the GPU - the new driver can show this kind of info - and took their right slice of CPU time). I had to go back to 195.36 in the end.

The problem - if we can call it a problem - is that every time BOINC asks for new work, it first uses a function to retrieve CPU and GPU specs on the fly, which seems appropriate. These specs are part of the request (they can be read in /var/lib/boinc-client/sched_request_www.gpugrid.net.xml). Among these specs there is the cudaVersion, which is "3000" with older drivers and "4000" with newer ones. I'm pretty sure the gpugrid server sends back a cuda31 WU (and the 6.13 app if needed) if it reads 4000, and a cuda WU (6.12) otherwise. Since the specs aren't stored in a file, but rather read from the driver on the fly, feigning a 3000 cudaVersion is not so easy: you would have to modify the BOINC sources and recompile to hide the newer driver's response.

Sorry for possible mistakes and for my bad English, I'm a bit tired today. Goodnight (it's midnight here). :)
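The value BOINC reported can be pulled out of that request file with a one-line helper (a sketch: the `<cudaVersion>` element name and the file path are taken from the post above, so treat both as assumptions about your BOINC version):

```shell
# Extract the cudaVersion figure from a BOINC scheduler-request XML file.
cuda_version() {
    grep -o '<cudaVersion>[0-9][0-9]*</cudaVersion>' "$1" | grep -o '[0-9][0-9]*'
}

# Usage (path as given in the post):
#   cuda_version /var/lib/boinc-client/sched_request_www.gpugrid.net.xml
```

If it prints 4000 you can expect cuda31 work (6.13 app); 3000 should get you the 6.12 app, per the reasoning above.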
Saenger · Joined: 20 Jul 08 · Posts: 134 · Credit: 23,657,183 · RAC: 0
> You have to leave a CPU core free when using swan_sync, otherwise its not going to be faster for the GPU.

This answer is totally b***s***. It took a whole core in my first comparison example, and it was extremely slow.

> There is no getting away from the fact that the 6.13app is slower for GT240 cards, which is why I use the 6.12app, albeit driver dependent. Your drivers are for Einstein, not GPUGrid.

The project team is giving me those WUs although it knows my setup. So it's their fault, and only their fault, for giving 6.13 to a GT240 instead of 6.12. They know my card; they actively decided to give me 6.13, so they are saying it's better for my machine. If they are too stupid to give me the right app, it's because of their lack of interest, not mine. As I said before: they only care about people who buy a new card for several hundred €/$ every few months.

Gruesse vom Saenger
For questions about Boinc look in the BOINC-Wiki
zombie67 [MM] · Joined: 16 Jul 07 · Posts: 209 · Credit: 5,508,110,456 · RAC: 1,070,836
I have to agree with Saenger on this one, which is a pretty rare thing. I have noticed no difference with swan_sync + a free core. This is on a Win7 machine with a 580, which is a different setup from this thread's subject, but my impression of these tweaks that are supposed to speed things up is similar.

Reno, NV
Team: SETI.USA
Kirby54925 · Joined: 21 Jan 11 · Posts: 31 · Credit: 70,061,988 · RAC: 0
Yep, I agree. There is no difference whatsoever with swan_sync on or off for my GTX 570; it will still run for about 4.5-5 hours.

On another note, it seems as if the server is running low on workunits to send out. I see only one workunit ready to send. Could this be in preparation for the upcoming long workunits?
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
Did any of you restart (even just the X server, not so handy in 10.10) after adding swan_sync=0? If you didn't, that would explain your observations.

zombie67 [MM], you might have a bad OC:

3705076 - 17 Feb 2011 4:22:13 UTC - 17 Feb 2011 15:27:33 UTC - Completed and validated - 16,526.66 - 16,764.16 - 7,491.18 - 11,236.77 - ACEMD2: GPU molecular dynamics v6.13 (cuda31)
3703901 - 16 Feb 2011 23:16:48 UTC - 17 Feb 2011 4:26:51 UTC - Completed and validated - 14,691.17 - 13,531.52 - 7,491.18 - 11,236.77 - ACEMD2: GPU molecular dynamics v6.13 (cuda31)

Two identical tasks, but an 11% difference in completion time. Some other task times are also wayward. Your card is probably throttling back at times (it's a feature designed to stop it failing).

Lem Novantotto, do you think the 270.18 driver caused the card to run in low power/clock mode (a common issue on Win with recent drivers)?

Cuda 3.0 (3000) = 6.12 app
Cuda 3.1 (3010) or above (for now) = 6.13 app; "above" can be 3.2 or 4.0

I would not expect too much from the 270.18 driver for a GT240; it's mainly for the latest and next versions of Fermi.

Kirby54925,

> There is no difference whatsoever with swan_sync on or off for my GTX 570. It will still run for about 4.5-5 hours.

You don't have swan_sync enabled, and none of your tasks going back as far as 5 Feb have actually used swan_sync!

ACEMD2 is at 70 WUs available. I don't know why, but perhaps they are letting them run down so they can start to use ACEMDLONG and ACEMD2 tasks, and/or they need to remove tasks in batches from the server. Thanks for the warning; I will keep an eye out, and if I think I will run dry on my Fermis I will allow MW tasks again.
Lem Novantotto · Joined: 11 Feb 11 · Posts: 18 · Credit: 377,139 · RAC: 0
The software showed the card was running at max clock, maximum performance, 95% GPU occupation. And it was running as hot as with the 195 driver. So I think we can regard that as a fact; the reason for the degraded performance must lie elsewhere.

Yep. The 270 driver reports cudaVersion 4000: since 4000 >= 3010, the gpugrid server sends a cuda31 WU (which will be run by the 6.13 app).

I tried it just for the sake of curiosity, and indeed it was really too slow: a no-go. Bye.
©2025 Universitat Pompeu Fabra