New acemd beta

Author	Message
GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 21333 - Posted: 7 Jun 2011 \| 15:35:23 UTC Last modified: 7 Jun 2011 \| 15:43:57 UTC
	I have uploaded a new acemdbeta application for Linux and some workunits to test. Mainly bug fixes. gdf
	ID: 21333 \| Rating: 0 \| rate: / Reply Quote

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1626 Credit: 9,379,166,723 RAC: 18,990,592 Level Scientific publications	Message 21334 - Posted: 7 Jun 2011 \| 15:43:55 UTC - in response to Message 21333.
	Does this new app resolve the Cuda4/downclocking bug discussed in http://www.gpugrid.net/forum_thread.php?id=2534? If not, may I refer you to http://boinc.berkeley.edu/trac/changeset/23649/, and the new paragraph in http://boinc.berkeley.edu/trac/wiki/AppCoprocessor:
	Cleanup on premature exit
	The BOINC client may kill your application in the middle. This may leave the GPU in a bad state. To prevent this, call
	    boinc_begin_critical_section();
	before using the GPU, and between GPU kernels do
	    if (boinc_status.quit_request || boinc_status.abort_request) {
	        // cudaThreadSynchronize(); or whatever is needed
	        boinc_end_critical_section();
	        while (1) boinc_sleep(1);
	    }
	ID: 21334 \| Rating: 0 \| rate: / Reply Quote

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 21335 - Posted: 7 Jun 2011 \| 15:49:28 UTC - in response to Message 21254.
	No. This is a cuda3.1 app. Yet I don't understand what that means. In the middle of what? gdf
	ID: 21335 \| Rating: 0 \| rate: / Reply Quote

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1626 Credit: 9,379,166,723 RAC: 18,990,592 Level Scientific publications	Message 21336 - Posted: 7 Jun 2011 \| 15:59:00 UTC - in response to Message 21335.
	Quote: No. This is a cuda3.1 app. yet I don't understand what that means. In the middle of what? gdf
	The BOINC API library code is not threadsafe. If BOINC calls for the application to quit or suspend during computation, BOINC may terminate threads in an unsafe way. The new nVidia drivers which can handle Cuda4 apps are much more sensitive to this behaviour, even if the app that's running is only using a lower CUDA level. In self-protection, nVidia has written the new drivers - everything strictly *later than* 266.58, from memory - to down-clock the card into a protective state when the abnormal thread termination is detected. That's my layman's interpretation - I'll try and get you the full report quickly.
	ID: 21336 \| Rating: 0 \| rate: / Reply Quote

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1626 Credit: 9,379,166,723 RAC: 18,990,592 Level Scientific publications	Message 21337 - Posted: 7 Jun 2011 \| 16:06:15 UTC
	Oops, it's just been pointed out to me that this is a Linux beta app, and my remarks have been concentrating on the Windows API - so probably not important in this case. But, since the new API code was only posted last night, it's still worth knowing about in preparation for the next Windows application test.
	ID: 21337 \| Rating: 0 \| rate: / Reply Quote

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 21347 - Posted: 8 Jun 2011 \| 7:27:11 UTC - in response to Message 21337.
	The first results are fine. We are going to produce a beta for Windows today. If all goes well, this application will replace all production apps as early as this week. gdf
	ID: 21347 \| Rating: 0 \| rate: / Reply Quote

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 21351 - Posted: 9 Jun 2011 \| 10:05:54 UTC - in response to Message 21347.
	Windows application is out. gdf
	ID: 21351 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2356 Credit: 16,377,028,840 RAC: 3,486,991 Level Scientific publications	Message 21354 - Posted: 9 Jun 2011 \| 13:24:40 UTC - in response to Message 21351.
	Quote: Windows application is out. gdf
	I got two of these beta WUs. Both of them failed immediately with exit code -1073741819 (0xc0000005). Maybe they can't stand overclocking?
	ID: 21354 \| Rating: 0 \| rate: / Reply Quote
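Note on the exit code reported above: the two numbers are the same 32-bit value, printed once as a signed integer (-1073741819) and once as unsigned hex (0xc0000005). On Windows, 0xC0000005 is the status code STATUS_ACCESS_VIOLATION, which usually means the process touched memory it was not allowed to access. The following C sketch shows the conversion. The function names are illustrative only; they are not part of BOINC or acemd.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a signed 32-bit exit code as the unsigned status value
 * that Windows prints in hex. Conversion to an unsigned type is
 * defined in C as reduction modulo 2^32. */
uint32_t exit_code_to_status(int32_t exit_code) {
    return (uint32_t)exit_code;
}

/* True if the status equals STATUS_ACCESS_VIOLATION (0xC0000005). */
int is_access_violation(uint32_t status) {
    return status == 0xC0000005u;
}

/* Small self-check using the value reported in this thread. */
void check_reported_exit_code(void) {
    assert(exit_code_to_status(-1073741819) == 0xC0000005u);
}
```

Calling exit_code_to_status(-1073741819) gives 0xC0000005, so both forms in the error report describe the same fault.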

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 21358 - Posted: 9 Jun 2011 \| 14:49:15 UTC - in response to Message 21354. Last modified: 9 Jun 2011 \| 17:55:13 UTC
	acemdbeta_6.38_windows_intelx86__cuda31 - Application Error
	The exception unknown software exception (0xc0000005) occurred in the application at location 0x0040258c.
	Click OK to terminate the program
	Click on CANCEL to debug the program
	System: 2003 Server x64, i7-2600K, 8GB DDR3, 2TB, GTX470 (native clocks, increased fan speeds)
	Task ran for 1h45min but stayed at 0% complete, apparently going through a loop.
	ID: 21358 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 21359 - Posted: 9 Jun 2011 \| 15:50:16 UTC - in response to Message 21358.
	My only other two Betas also failed, after 7 and 15 sec. Worth noting that if there is a pop-up error message and you don't choose to end the task, it will continue running indefinitely. So if anyone has such a message, do something about it or you will just keep running the same erroneous Beta task.
	ID: 21359 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2356 Credit: 16,377,028,840 RAC: 3,486,991 Level Scientific publications	Message 21360 - Posted: 9 Jun 2011 \| 16:00:53 UTC - in response to Message 21359.
	I got two more of these 6.38 betas; both of them failed immediately, just like the previous ones. There was a pop-up application error message:
	Application Failure acemdbeta_6.38_windows_intelx86__cuda31 0.0.0.0 in acemdbeta_6.38_windows_intelx86__cuda31 0.0.0.0 at offset 00002c58
	These 6.38 beta WUs seem to fail on every computer, so I guess I shouldn't blame the overclocking. :)
	ID: 21360 \| Rating: 0 \| rate: / Reply Quote

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 21361 - Posted: 9 Jun 2011 \| 16:47:55 UTC - in response to Message 21360.
	No. I will try a quick change in a few minutes to see if it works. Otherwise, it will take more time to debug. gdf
	ID: 21361 \| Rating: 0 \| rate: / Reply Quote

nenym Send message Joined: 31 Mar 09 Posts: 137 Credit: 1,308,230,581 RAC: 0 Level Scientific publications	Message 21362 - Posted: 9 Jun 2011 \| 17:02:31 UTC
	The same here.
	Run time 66.171875
	CPU time 0
	Interesting: Swan_sync is set and works with standard/long run tasks.
	<core_client_version>6.10.60</core_client_version>
	<![CDATA[
	<message>
	 - exit code -1073741819 (0xc0000005)
	</message>
	]]>
	Win XP 64bit, GTX 560.
	ID: 21362 \| Rating: 0 \| rate: / Reply Quote

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 21363 - Posted: 9 Jun 2011 \| 17:04:11 UTC - in response to Message 21362.
	acemdbeta_6.39 replaces acemdbeta_6.38, hopefully with better results... gdf
	ID: 21363 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 21364 - Posted: 9 Jun 2011 \| 17:18:48 UTC - in response to Message 21363. Last modified: 9 Jun 2011 \| 17:49:43 UTC
	This 6.39 task reached 10% complete in 17min 43sec on a stock GTX470 (System), so estimated run time is 3h. 98% GPU Utilization, 315MB video memory usage. Looks good so far... 20%
	ID: 21364 \| Rating: 0 \| rate: / Reply Quote

nenym Send message Joined: 31 Mar 09 Posts: 137 Credit: 1,308,230,581 RAC: 0 Level Scientific publications	Message 21365 - Posted: 9 Jun 2011 \| 19:04:40 UTC Last modified: 9 Jun 2011 \| 19:08:56 UTC
	A strange application, this 6.39 beta. The 6.39 CPU process started with high priority. The system GUI was sluggish, with 1 - 2 minutes response time. The Boinc GUI froze and the Boinc core restarted (CPU tasks without a checkpoint started again from zero progress). After setting the priority of the 6.39 CPU process to low with Process Tamer - the response of the PT GUI was about 2 minutes (high priority set for the PT process!) - the Boinc core restarted again, and now things seem to be OK. After 12 min, 8% progress. Win XP 64bit, GTX 560Ti, 925MHz.
	ID: 21365 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 21366 - Posted: 9 Jun 2011 \| 20:20:36 UTC - in response to Message 21364. Last modified: 9 Jun 2011 \| 22:38:37 UTC
	4069069 2520078 9 Jun 2011 17:12:15 UTC 9 Jun 2011 20:10:53 UTC Completed and validated 10,510.59 10,444.27 7,491.18 11,236.77 ACEMD beta version v6.39 (cuda31)
	Well, this task ran and finished (2h 55min) without error on the 6.39 app. Hopefully other results are just as positive.
	ID: 21366 \| Rating: 0 \| rate: / Reply Quote

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 21367 - Posted: 9 Jun 2011 \| 21:07:19 UTC - in response to Message 21366.
	Any problem? gdf
	ID: 21367 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2356 Credit: 16,377,028,840 RAC: 3,486,991 Level Scientific publications	Message 21368 - Posted: 9 Jun 2011 \| 21:21:47 UTC - in response to Message 21367. Last modified: 9 Jun 2011 \| 21:38:29 UTC
	I got one 6.39 beta WU. It has been running for 16 minutes now, 15% completed, 98% GPU usage (i7-950 @ 3.56GHz, GTX 580 @ 890MHz, WinXP, Above average priority). Edit: I moved this WU to my GTX 590 @ 700MHz (my monitor is connected to this card) and tried it on both GPUs of the 590, and I don't experience a sluggish Windows GUI. After 32 minutes, 28.2% completed.
	ID: 21368 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2356 Credit: 16,377,028,840 RAC: 3,486,991 Level Scientific publications	Message 21369 - Posted: 9 Jun 2011 \| 23:14:03 UTC - in response to Message 21368.
	This 6.39 beta WU completed fine in 6438s (1h47m), 3.219 ms/step. I got another one; its processing will begin in 3 hours, because I'm going to sleep now, so I can't micromanage the processing order of the WUs. :)
	ID: 21369 \| Rating: 0 \| rate: / Reply Quote
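Note on the timings quoted in this thread: a run time of 6438 s at 3.219 ms/step implies roughly 2,000,000 simulation steps, assuming the ms/step figure is an average over the whole run. Likewise, the earlier report of 10% after 17 min 43 s (Message 21364) extrapolates linearly to about 10,630 s, or just under 3 h. A minimal C sketch of both calculations follows; the function names are made up for illustration.

```c
#include <assert.h>

/* Number of steps implied by a total run time in seconds and an
 * average per-step time in milliseconds. */
double steps_from_runtime(double run_seconds, double ms_per_step) {
    assert(ms_per_step > 0.0);
    return run_seconds * 1000.0 / ms_per_step;
}

/* Linear extrapolation of the total run time from the elapsed
 * seconds and the fraction of the task completed so far (0..1]. */
double estimate_total_seconds(double elapsed_seconds, double fraction_done) {
    assert(fraction_done > 0.0 && fraction_done <= 1.0);
    return elapsed_seconds / fraction_done;
}
```

For example, steps_from_runtime(6438.0, 3.219) is about 2.0e6, and estimate_total_seconds(1063.0, 0.10) is about 10630. A linear estimate like this ignores start-up overhead and any change in GPU load during the run.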

nenym Send message Joined: 31 Mar 09 Posts: 137 Credit: 1,308,230,581 RAC: 0 Level Scientific publications	Message 21370 - Posted: 9 Jun 2011 \| 23:17:24 UTC
	No problem now; the second task runs without any oddness.
	The first task, 48-GIANNI_TESTDHFR10-2-5-RND0377: 13,000.61 / 13,131.97 / 7,491.18 / 11,236.77, ACEMD beta version v6.39 (cuda31). XP 64bit, GTX560 @ 925MHz, driver 257.33.
	Interesting stderr at the beginning (maybe when the system was sluggish):
	<stderr_txt>
	# Using device 0
	# There is 1 device supporting CUDA
	# Device 0: "GeForce GTX 560 Ti"
	# Clock rate: 1.85 GHz
	# Total amount of global memory: 1073283072 bytes
	# Number of multiprocessors: 8
	# Number of cores: 64
	SWAN: Using synchronization method 0
	MDIO: cannot open file "restart.coor"
	No heartbeat from core client for 30 sec - exiting
	Other strangeness: CPU time > Run time.
	ID: 21370 \| Rating: 0 \| rate: / Reply Quote

Bedrich Hajek Send message Joined: 28 Mar 09 Posts: 486 Credit: 11,386,082,230 RAC: 8,982,437 Level Scientific publications	Message 21371 - Posted: 10 Jun 2011 \| 1:56:51 UTC
	So far, all the 6.38 and 6.39 Betas crashed with computational errors on my computers, and yes, the computers were sluggish when running these units.
	ID: 21371 \| Rating: 0 \| rate: / Reply Quote

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 21372 - Posted: 10 Jun 2011 \| 7:49:21 UTC - in response to Message 21371.
	Uhmm, I don't really know why. It seems to be a driver thing, as it crashes quickly.
	Quote: So far, all the 6.38 and 6.39 Betas crashed with computational errors on my computers, and yes, the computers were sluggish when running these units.
	ID: 21372 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 21374 - Posted: 10 Jun 2011 \| 10:25:50 UTC - in response to Message 21372. Last modified: 10 Jun 2011 \| 10:52:44 UTC
	nenym's "No heartbeat from core client for 30 sec - exiting" message can be caused by the Boinc client, the app, another CPU project, or something else running on the system. What else are you crunching/running? This is common when the CPU is over-taxed, often by other CPU projects. Are you using the recommended settings (freeing up at least 1 CPU core/thread per GPU and using SWAN_SYNC=0)?
	It may be worth noting that GDF's latest tasks use 98% of the GPU, so previous overclocks (Bedrich 755MHz GTX480) that never encountered such high utilization may no longer be reliable. It's best to Beta test with default/factory settings and recommended GPUGrid settings. Also worth keeping an eye on temperature - I saw a rise of a couple of degrees, but in some cases that might be enough to push the card over the edge.
	ID: 21374 \| Rating: 0 \| rate: / Reply Quote

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 21375 - Posted: 10 Jun 2011 \| 10:27:57 UTC - in response to Message 21374.
	Any problem with the higher priority? gdf
	ID: 21375 \| Rating: 0 \| rate: / Reply Quote

nenym Send message Joined: 31 Mar 09 Posts: 137 Credit: 1,308,230,581 RAC: 0 Level Scientific publications	Message 21377 - Posted: 10 Jun 2011 \| 10:46:32 UTC - in response to Message 21374.
	- running CPU projects when the issue occurred: Spinhenge, Ibercivis Wilson,
	- one core free (4CPU Xeon, set maximum 99% cores),
	- Swan_sync has been set for a long time,
	- factory clock is 900 MHz; the OC to 925 MHz runs without any issue for any long run task,
	- I did some work in a simple Excel spreadsheet.
	The second 6.39 task ran and finished with the same OC of 925 MHz and the same CPU applications without issues, but the priority of the 6.39 CPU process was set to low by PT. I did not try it without PT "renicing". I know "no heartbeat from core"; it happened when running Ibercivis (not sure which subproject) or CUDA AQUA many months ago.
	ID: 21377 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 21378 - Posted: 10 Jun 2011 \| 10:49:44 UTC - in response to Message 21375.
	Not personally, but I suppose if the GPUGrid priority was too high it might, under some circumstances, cause the Boinc client to crash. That said, the system would still need to be heavily taxed by other CPU processes.
	ID: 21378 \| Rating: 0 \| rate: / Reply Quote

TylerChris Send message Joined: 12 Feb 10 Posts: 11 Credit: 50,020,466 RAC: 0 Level Scientific publications	Message 21379 - Posted: 10 Jun 2011 \| 12:04:13 UTC Last modified: 10 Jun 2011 \| 12:05:49 UTC
	One failure, as expected, with v6.38. One success with v6.39. Did not notice any issues; it ran a little slower than most here, taking 4.8 hours on a GTX275, Windows 7 64. :)
	ID: 21379 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2356 Credit: 16,377,028,840 RAC: 3,486,991 Level Scientific publications	Message 21380 - Posted: 10 Jun 2011 \| 12:36:33 UTC - in response to Message 21378. Last modified: 10 Jun 2011 \| 12:38:41 UTC
	I've read somewhere that the priority of any process should not be raised to "high", because it will interfere with csrss.exe, which is an essential subsystem (running intentionally at "high" priority by default) responsible for console UI operations and thread management. So I recommend "above normal" as the maximum priority level for any application (including priority changer tools). So far I've completed seven 6.39 beta WUs fine at this priority level.
	ID: 21380 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 21381 - Posted: 10 Jun 2011 \| 12:41:13 UTC - in response to Message 21379.
	Just after doing a system restart I got this 6.38 error message:
	Event Type: Information
	Event Source: Application Error
	Event Category: (100)
	Event ID: 1004
	Date: 10/06/2011
	Time: 13:15:29
	User: N/A
	Computer: S
	Description: Reporting queued error: faulting application acemdbeta_6.38_windows_intelx86__cuda31, version 0.0.0.0, faulting module acemdbeta_6.38_windows_intelx86__cuda31, version 0.0.0.0, fault address 0x00002c58.
	For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
	Data:
	0000: 41 70 70 6c 69 63 61 74   Applicat
	0008: 69 6f 6e 20 46 61 69 6c   ion Fail
	0010: 75 72 65 20 20 61 63 65   ure  ace
	0018: 6d 64 62 65 74 61 5f 36   mdbeta_6
	0020: 2e 33 38 5f 77 69 6e 64   .38_wind
	0028: 6f 77 73 5f 69 6e 74 65   ows_inte
	0030: 6c 78 38 36 5f 5f 63 75   lx86__cu
	0038: 64 61 33 31 20 30 2e 30   da31 0.0
	0040: 2e 30 2e 30 20 69 6e 20   .0.0 in
	0048: 61 63 65 6d 64 62 65 74   acemdbet
	0050: 61 5f 36 2e 33 38 5f 77   a_6.38_w
	0058: 69 6e 64 6f 77 73 5f 69   indows_i
	0060: 6e 74 65 6c 78 38 36 5f   ntelx86_
	0068: 5f 63 75 64 61 33 31 20   _cuda31
	0070: 30 2e 30 2e 30 2e 30 20   0.0.0.0
	0078: 61 74 20 6f 66 66 73 65   at offse
	0080: 74 20 30 30 30 30 32 63   t 00002c
	0088: 35 38                     58
	I was running a TONI long task, no Betas. The TONI task is still running after I closed the Windows pop-up error message.
	Boinc log:
	10/06/2011 13:15:48 | | Starting BOINC client version 6.12.28 for windows_x86_64
	10/06/2011 13:15:48 | | Config: report completed tasks immediately
	10/06/2011 13:15:48 | | Config: use all coprocessors
	10/06/2011 13:15:48 | | Config: zero long-term debts on startup
	10/06/2011 13:15:48 | | log flags: file_xfer, sched_ops, task, coproc_debug, cpu_sched_debug, dcf_debug
	10/06/2011 13:15:48 | | log flags: debt_debug, rr_simulation, sched_op_debug
	10/06/2011 13:15:48 | | Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.5
	10/06/2011 13:15:48 | | Data directory: C:\Documents and Settings\All Users\Application Data\BOINC
	10/06/2011 13:15:48 | | Running under account Administrator
	10/06/2011 13:15:48 | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz [Family 6 Model 42 Stepping 7]
	10/06/2011 13:15:48 | | Processor: 256.00 KB cache
	10/06/2011 13:15:48 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 syscall nx lm vmx tm2 popcnt aes pbe
	10/06/2011 13:15:48 | | OS: Microsoft Windows Server 2003 "R2": Standard Server x64 Edition, Service Pack 2, (05.02.3790.00)
	10/06/2011 13:15:48 | | Memory: 7.98 GB physical, 9.56 GB virtual
	10/06/2011 13:15:48 | | Disk: 1.82 TB total, 1.43 TB free
	10/06/2011 13:15:48 | | Local time is UTC +1 hours
	10/06/2011 13:15:48 | | NVIDIA GPU 0: GeForce GTX 470 (driver version 27533, CUDA version 4000, compute capability 2.0, 1280MB, 1089 GFLOPS peak)
	10/06/2011 13:15:48 | | NVIDIA library reports 1 GPU
	10/06/2011 13:15:48 | | No ATI library found.
	10/06/2011 13:15:48 | GPUGRID | URL http://www.gpugrid.net/; Computer ID 91249; resource share 1000
	10/06/2011 13:15:48 | | General prefs: using separate prefs for home
	10/06/2011 13:15:48 | | Reading preferences override file
	10/06/2011 13:15:48 | | Preferences:
	10/06/2011 13:15:48 | | max memory usage when active: 7357.62MB
	10/06/2011 13:15:48 | | max memory usage when idle: 7357.62MB
	10/06/2011 13:15:48 | | max disk usage: 100.00GB
	10/06/2011 13:15:48 | | max CPUs used: 6
	10/06/2011 13:15:48 | | (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
	10/06/2011 13:15:48 | | [cpu_sched] Request CPU reschedule: Prefs update
	10/06/2011 13:15:48 | | [cpu_sched] Request CPU reschedule: Startup
	10/06/2011 13:15:48 | | Not using a proxy
	10/06/2011 13:16:19 | | [cpu_sched] Request CPU reschedule: Idle state change
	10/06/2011 13:16:19 | | [cpu_sched] Request CPU reschedule: periodic CPU scheduling
	10/06/2011 13:16:19 | GPUGRID | [debt] CPU ineligible; LTD 0.00
	10/06/2011 13:16:19 | | [debt] CPU LTD: adding offset 0.000000
	10/06/2011 13:16:19 | GPUGRID | [debt] NVIDIA GPU LTD 0.00 delta 0.00 (0.50*0.00 - 0.00)/1
	10/06/2011 13:16:19 | | [debt] NVIDIA GPU LTD: adding offset 0.000000
	10/06/2011 13:16:19 | | [cpu_sched] schedule_cpus(): start
	10/06/2011 13:16:19 | | [rr_sim] rr_sim start: work_buf_total 4320.86 on_frac 0.975 active_frac 0.897
	10/06/2011 13:16:19 | GPUGRID | [rr_sim] 0.00: starting A429-TONI_AGGdense1-8-100-RND0011_0 (0.14 CPU + 1.00 NV)
	10/06/2011 13:16:19 | GPUGRID | [rr_sim] 26954.51: A429-TONI_AGGdense1-8-100-RND0011_0 finishes after 3489.82 (151482.68G/43.41G)
	10/06/2011 13:16:19 | GPUGRID | [cpu_sched] scheduling A429-TONI_AGGdense1-8-100-RND0011_0 (coprocessor job, FIFO)
	10/06/2011 13:16:19 | | [cpu_sched_debug] reserving 1.000000 of coproc CUDA
	10/06/2011 13:16:19 | | [cpu_sched] enforce_schedule(): start
	10/06/2011 13:16:19 | | [cpu_sched] preliminary job list:
	10/06/2011 13:16:19 | GPUGRID | [cpu_sched] 0: A429-TONI_AGGdense1-8-100-RND0011_0 (MD: no; UTS: no)
	10/06/2011 13:16:19 | | [cpu_sched] final job list:
	10/06/2011 13:16:19 | GPUGRID | [cpu_sched] 6: A429-TONI_AGGdense1-8-100-RND0011_0 (MD: no; UTS: no)
	10/06/2011 13:16:19 | GPUGRID | [coproc] Assigning CUDA instance 0 to A429-TONI_AGGdense1-8-100-RND0011_0
	10/06/2011 13:16:19 | GPUGRID | [cpu_sched] scheduling A429-TONI_AGGdense1-8-100-RND0011_0
	10/06/2011 13:16:19 | GPUGRID | [cpu_sched] A429-TONI_AGGdense1-8-100-RND0011_0 sched state 1 next 2 task state 0
	10/06/2011 13:16:19 | GPUGRID | Restarting task A429-TONI_AGGdense1-8-100-RND0011_0 using acemdlong version 613
	10/06/2011 13:16:19 | | [cpu_sched] app startup took 21.218748 secs*
	10/06/2011 13:16:19 | | [cpu_sched] Request CPU reschedule: slow app startup
	10/06/2011 13:16:19 | | [cpu_sched] enforce_schedule: end
	10/06/2011 13:16:42 | GPUGRID | [debt] CPU ineligible; LTD 0.00
	10/06/2011 13:16:42 | | [debt] CPU LTD: adding offset -21.776753
	10/06/2011 13:16:42 | GPUGRID | [debt] NVIDIA GPU LTD -11.11 delta -11.11 (0.50*22.22 - 22.22)/1
	10/06/2011 13:16:42 | | [debt] NVIDIA GPU LTD: adding offset -11.109374
	10/06/2011 13:16:42 | | [cpu_sched] schedule_cpus(): start
	10/06/2011 13:16:42 | | [rr_sim] rr_sim start: work_buf_total 4320.86 on_frac 0.975 active_frac 0.897
	10/06/2011 13:16:42 | GPUGRID | [rr_sim] 0.00: starting A429-TONI_AGGdense1-8-100-RND0011_0 (0.14 CPU + 1.00 NV)
	10/06/2011 13:16:42 | GPUGRID | [rr_sim] 26954.51: A429-TONI_AGGdense1-8-100-RND0011_0 finishes after 3489.82 (151482.68G/43.41G)
	10/06/2011 13:16:42 | GPUGRID | [cpu_sched] scheduling A429-TONI_AGGdense1-8-100-RND0011_0 (coprocessor job, FIFO)
	ID: 21381 \| Rating: 0 \| rate: / Reply Quote

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 21382 - Posted: 10 Jun 2011 \| 13:13:09 UTC - in response to Message 21381. Last modified: 10 Jun 2011 \| 14:21:33 UTC
	New application coming up with ABOVE_NORMAL_PRIORITY_CLASS. acemdbeta_6.40 is out. gdf
	ID: 21382 \| Rating: 0 \| rate: / Reply Quote

Betting Slip Send message Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0 Level Scientific publications	Message 21383 - Posted: 10 Jun 2011 \| 13:29:01 UTC - in response to Message 21382.
	Don't really see the need to raise priority above low myself. I have done it myself and it makes little difference on a machine running other BOINC tasks and will certainly not make any difference on a machine using SWAN synchronization. As for GPU utilization you run the risk of sucking so much use of the GPU that the machine can't perform without turning GPUGrid off. WARNING: don't try to suck all the juice out of the apple. ____________ Radio Caroline, the world's most famous offshore pirate radio station. Great music since April 1964. Support Radio Caroline Team - Radio Caroline
	ID: 21383 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 21384 - Posted: 10 Jun 2011 \| 13:47:47 UTC - in response to Message 21383.
	I think the intention is to improve stability on some systems. A well optimized system is unlikely to see any benefit, but there are so many poorly configured/optimized systems around that such a potential for overall project improvement must be investigated. In the last year or so I have come across several projects (5 or 6) that have at some stage impacted on other Boinc projects, so some Kevlar skin might help. I do think that such bullet-proofing changes would be best organized through Berkeley but as soon as just one project bypasses a default setting (and many have) it messes with other projects.
	ID: 21384 \| Rating: 0 \| rate: / Reply Quote

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 21385 - Posted: 10 Jun 2011 \| 14:26:23 UTC - in response to Message 21384.
	The point of slightly increasing the priority is that we sometimes see people with a GTX580 performing at less than half the expected speed. We assume that this is because they are overloading the CPU. Nowadays the CPU is naturally overloaded due to hyper-threading; hopefully this change will not impact the performance of their system but will guarantee decent responsiveness for the GPU. If you are using SWAN_SYNC, this is not going to change anything, but using SWAN_SYNC as the default is not practicable. gdf
	ID: 21385 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 21386 - Posted: 10 Jun 2011 \| 14:44:50 UTC - in response to Message 21385. Last modified: 11 Jun 2011 \| 7:19:26 UTC
	Running this 29-GIANNI_TESTDHFR10-0-5-RND5720_2 task on the 6.40 (cuda31) app. No problems so far. Looks like it will take about 3h on a GTX470. It's a fairly safe system; only using 6 of 8 threads for CPU crunching (various projects) and SWAN_SYNC=0 in use. I did suspend it and start another GPUGrid task before continuing the Beta, so it seems robust enough. Again 98% GPU usage and 322MB GDDR. - Completed and validated 10,414.70 10,378.75 7,491.18 11,236.77 ACEMD beta version v6.40 (cuda31)
	ID: 21386 \| Rating: 0 \| rate: / Reply Quote

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 21387 - Posted: 11 Jun 2011 \| 8:41:54 UTC - in response to Message 21386.
	So far so good. If there is nothing else, I will pass it to production today. gdf
	ID: 21387 \| Rating: 0 \| rate: / Reply Quote

Norman_RKN Send message Joined: 22 Dec 09 Posts: 16 Credit: 23,522,575 RAC: 0 Level Scientific publications	Message 21543 - Posted: 26 Jun 2011 \| 16:26:13 UTC - in response to Message 21387.
	I use swan_sync and manually switch the app to the lowest priority, because otherwise the system is very laggy. Without swan_sync there is no problem, but the GPU usage is about 10% lower. Only the "toni-WUs" run with a more acceptable GPU load, but 90%+ would be better for the project. Energy is not as cheap as water. With the newest nvidia beta driver the applications run fine. No more losing the effort after a reboot/crash; now I can reboot or crash the system without any WU damage. ;) ____________ http://www.rechenkraft.net
	ID: 21543 \| Rating: 0 \| rate: / Reply Quote

Post to thread

Message boards : Graphics cards (GPUs) : New acemd beta

	About	Science	Volunteers	Performance	Forum	Join us	Donate

Author	Message
GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 21333 - Posted: 7 Jun 2011 \| 15:35:23 UTC Last modified: 7 Jun 2011 \| 15:43:57 UTC
	I have uploaded a new acemdbeta application for Linux and some workunits to test. Mainly bug fixes. gdf
	ID: 21333 \| Rating: 0 \| rate: / Reply Quote

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1626 Credit: 9,379,166,723 RAC: 18,990,592 Level Scientific publications	Message 21334 - Posted: 7 Jun 2011 \| 15:43:55 UTC - in response to Message 21333.
	Does this new app resolve the Cuda4/downclocking bug discussed in http://www.gpugrid.net/forum_thread.php?id=2534? If not, may I refer you to http://boinc.berkeley.edu/trac/changeset/23649/, and the new paragraph in http://boinc.berkeley.edu/trac/wiki/AppCoprocessor: Cleanup on premature exit The BOINC client may kill your application in the middle. This may leave the GPU in a bad state. To prevent this, call boinc_begin_critical_section(); before using the GPU, and between GPU kernels do if (boinc_status.quit_request \|\| boinc_status.abort_request) { // cudaThreadSynchronize(); or whatever is needed boinc_end_critical_section(); while (1) boinc_sleep(1); }
	ID: 21334 \| Rating: 0 \| rate: / Reply Quote

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 21335 - Posted: 7 Jun 2011 \| 15:49:28 UTC - in response to Message 21254.
	No. This is a cuda3.1 app. yet I don't understand what that means. In the middle of what? gdf
	ID: 21335 \| Rating: 0 \| rate: / Reply Quote

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1626 Credit: 9,379,166,723 RAC: 18,990,592 Level Scientific publications	Message 21336 - Posted: 7 Jun 2011 \| 15:59:00 UTC - in response to Message 21335.
	No. This is a cuda3.1 app. yet I don't understand what that means. In the middle of what? gdf The BOINC API library code is not threadsafe. If BOINC calls for the application to quit or suspend during computation, BOINC may terminate threads in an unsafe way. The new nVidia drivers which can handle Cuda4 apps are much more sensitive to this behaviour, even if the app that's running is only using a lower CUDA level. In self-protection, nVidia has written the new drivers - eveything strictly *later than* 266.58, from memory - to down-clock the card into a protective state when the abnormal thread termination is detected. That's my layman's interpretation - I'll try and get you the full report quickly.
	ID: 21336 \| Rating: 0 \| rate: / Reply Quote

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1626 Credit: 9,379,166,723 RAC: 18,990,592 Level Scientific publications	Message 21337 - Posted: 7 Jun 2011 \| 16:06:15 UTC
	Oops, it's just been pointed out to me that this is a Linux beta app, and my remarks have been concentrating on the Windows API - so probably not important in this case. But, since the new API code was only posted last night, it's still worth you knowing about it in preparation for the next Windows application test.
	ID: 21337 \| Rating: 0 \| rate: / Reply Quote

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 21347 - Posted: 8 Jun 2011 \| 7:27:11 UTC - in response to Message 21337.
	The first results are fine. We are going to produce a beta for Windows today. If all goes well, this application will replace all production apps as early as this week. gdf
	ID: 21347 \| Rating: 0 \| rate: / Reply Quote

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 21351 - Posted: 9 Jun 2011 \| 10:05:54 UTC - in response to Message 21347.
	Windows application is out. gdf
	ID: 21351 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2356 Credit: 16,377,028,840 RAC: 3,486,991 Level Scientific publications	Message 21354 - Posted: 9 Jun 2011 \| 13:24:40 UTC - in response to Message 21351.
	I got two of these beta WUs. Both of them failed immediately with exit code -1073741819 (0xc0000005). Maybe they can't stand overclocking?
	ID: 21354 \| Rating: 0 \| rate: / Reply Quote
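
The two forms of the exit code in the post above are the same 32-bit value, read once as a signed integer and once as unsigned hex. A minimal sketch of the conversion follows. The helper name `to_ntstatus` and the one-entry lookup table are made up for illustration; 0xC0000005 is the Windows status code for an access violation.

```python
# Hypothetical lookup table with the single code seen in this thread.
KNOWN_STATUS = {0xC0000005: "STATUS_ACCESS_VIOLATION"}


def to_ntstatus(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF


code = to_ntstatus(-1073741819)
print(hex(code), KNOWN_STATUS.get(code, "unknown"))
```

An access violation means the process touched memory it was not allowed to use.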

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 21358 - Posted: 9 Jun 2011 \| 14:49:15 UTC - in response to Message 21354. Last modified: 9 Jun 2011 \| 17:55:13 UTC
	acemdbeta_6.38_windows_intelx86__cuda31 - Application Error: "The exception unknown software exception (0xc0000005) occurred in the application at location 0x0040258c. Click OK to terminate the program. Click on CANCEL to debug the program." System: 2003 Server x64, i7-2600K, 8GB DDR3, 2TB, GTX470 (native clocks, increased fan speeds). Task ran for 1h45min but stayed at 0% complete, apparently going through a loop.
	ID: 21358 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 21359 - Posted: 9 Jun 2011 \| 15:50:16 UTC - in response to Message 21358.
	My only other two Betas also failed, after 7 and 15 sec. Worth noting that if there is a pop-up error message and you don't select to end the task, it will continue running indefinitely. So if anyone has such a message, do something about it or you will just keep running the same erroneous Beta task.
	ID: 21359 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2356 Credit: 16,377,028,840 RAC: 3,486,991 Level Scientific publications	Message 21360 - Posted: 9 Jun 2011 \| 16:00:53 UTC - in response to Message 21359.
	I got two more of these 6.38 beta WUs; both of them failed immediately, just like the previous ones. There was a pop-up application error message: "Application Failure acemdbeta_6.38_windows_intelx86__cuda31 0.0.0.0 in acemdbeta_6.38_windows_intelx86__cuda31 0.0.0.0 at offset 00002c58". These 6.38 beta WUs seem to fail on every computer, so I guess I shouldn't blame the overclocking. :)
	ID: 21360 \| Rating: 0 \| rate: / Reply Quote

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 21361 - Posted: 9 Jun 2011 \| 16:47:55 UTC - in response to Message 21360.
	No, I am trying a quick change in a few minutes to see if it works. Otherwise, it will take more time to debug it. gdf
	ID: 21361 \| Rating: 0 \| rate: / Reply Quote

nenym Send message Joined: 31 Mar 09 Posts: 137 Credit: 1,308,230,581 RAC: 0 Level Scientific publications	Message 21362 - Posted: 9 Jun 2011 \| 17:02:31 UTC
	The same here: run time 66.171875, CPU time 0. Interesting: Swan_sync is set and works with standard/long-run tasks. <core_client_version>6.10.60</core_client_version> <![CDATA[ <message> - exit code -1073741819 (0xc0000005) </message> ]]> Win XP 64bit, GTX 560.
	ID: 21362 \| Rating: 0 \| rate: / Reply Quote

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 21363 - Posted: 9 Jun 2011 \| 17:04:11 UTC - in response to Message 21362.
	acemdbeta_6.39 replaces acemdbeta_6.38, hopefully with better results... gdf
	ID: 21363 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 21364 - Posted: 9 Jun 2011 \| 17:18:48 UTC - in response to Message 21363. Last modified: 9 Jun 2011 \| 17:49:43 UTC
	This 6.39 task reached 10% complete in 17min 43sec on a stock GTX470 (System), so estimated run time is 3h. 98% GPU Utilization, 315MB video memory usage. Looks good so far... 20%
	ID: 21364 \| Rating: 0 \| rate: / Reply Quote
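
The 3h estimate in the post above is a straight-line extrapolation from progress so far. A minimal sketch, assuming progress is linear in time (the function name is made up for illustration):

```python
def estimate_total_seconds(elapsed_seconds: float, fraction_done: float) -> float:
    """Extrapolate total run time, assuming progress is linear in time."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_seconds / fraction_done


elapsed = 17 * 60 + 43                      # 17 min 43 s = 1063 s
total = estimate_total_seconds(elapsed, 0.10)
print(f"{total:.0f} s, about {total / 3600:.2f} h")
```

For comparison, the task later reported a run time of 10,510.59 s, so the linear assumption held up well here.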

nenym Send message Joined: 31 Mar 09 Posts: 137 Credit: 1,308,230,581 RAC: 0 Level Scientific publications	Message 21365 - Posted: 9 Jun 2011 \| 19:04:40 UTC Last modified: 9 Jun 2011 \| 19:08:56 UTC
	A strange application, this 6.39 beta. The 6.39 CPU process started with high priority. The system GUI was sluggish, with a response time of 1 - 2 minutes. The Boinc GUI froze and the Boinc core restarted (CPU tasks without a checkpoint started again from zero progress). After I set the priority of the 6.39 CPU process to low with Process Tamer (the PT GUI took about 2 minutes to respond, even with high priority set for the PT process!), the Boinc core restarted again, and now things seem to be OK. After 12 min, 8% progress. Win XP 64bit, GTX 560Ti, 925MHz.
	ID: 21365 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 21366 - Posted: 9 Jun 2011 \| 20:20:36 UTC - in response to Message 21364. Last modified: 9 Jun 2011 \| 22:38:37 UTC
	4069069 2520078 9 Jun 2011 17:12:15 UTC 9 Jun 2011 20:10:53 UTC Completed and validated 10,510.59 10,444.27 7,491.18 11,236.77 ACEMD beta version v6.39 (cuda31). Well, this task ran and finished (2h 55min) without error on the 6.39 app. Hopefully other results are just as positive.
	ID: 21366 \| Rating: 0 \| rate: / Reply Quote

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 21367 - Posted: 9 Jun 2011 \| 21:07:19 UTC - in response to Message 21366.
	Any problem? gdf
	ID: 21367 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2356 Credit: 16,377,028,840 RAC: 3,486,991 Level Scientific publications	Message 21368 - Posted: 9 Jun 2011 \| 21:21:47 UTC - in response to Message 21367. Last modified: 9 Jun 2011 \| 21:38:29 UTC
	I got one 6.39 beta WU; it has been running for 16 minutes now, 15% completed, 98% GPU usage (i7-950 @ 3.56GHz, GTX 580 @ 890MHz, WinXP, Above average priority). Edit: I moved this WU to my GTX 590 @ 700MHz (my monitor is connected to this card) and tried it on both GPUs of the 590, and I don't experience a sluggish Windows GUI. After 32 minutes, 28.2% completed.
	ID: 21368 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2356 Credit: 16,377,028,840 RAC: 3,486,991 Level Scientific publications	Message 21369 - Posted: 9 Jun 2011 \| 23:14:03 UTC - in response to Message 21368.
	This 6.39 beta WU completed fine in 6438s (1h47m), 3.219 ms/step. I got another one; its processing will begin in 3 hours, because I'm going to sleep now, so I can't micromanage the processing order of the WUs. :)
	ID: 21369 \| Rating: 0 \| rate: / Reply Quote
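
The two figures in the post above give a rough step count for the workunit. A minimal sketch, assuming the ms/step value is an average over the whole reported run time (the function name is made up for illustration):

```python
def steps_from_runtime(run_time_s: float, ms_per_step: float) -> int:
    """Number of simulation steps implied by a run time and an average step time."""
    return round(run_time_s / (ms_per_step / 1000.0))


steps = steps_from_runtime(6438, 3.219)
print(steps)
```

If the ms/step figure excludes start-up or checkpointing overhead, the true step count would be somewhat lower than this.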

nenym Send message Joined: 31 Mar 09 Posts: 137 Credit: 1,308,230,581 RAC: 0 Level Scientific publications	Message 21370 - Posted: 9 Jun 2011 \| 23:17:24 UTC
	No problem now; the second task runs without any oddness. The first task, 48-GIANNI_TESTDHFR10-2-5-RND0377: 13,000.61/ 13,131.97/ 7,491.18/ 11,236.77, ACEMD beta version v6.39 (cuda31). XP 64bit, GTX560@ 925MHz, driver 257.33. Interesting stderr at the beginning (maybe from when the system was sluggish): <stderr_txt> # Using device 0 # There is 1 device supporting CUDA # Device 0: "GeForce GTX 560 Ti" # Clock rate: 1.85 GHz # Total amount of global memory: 1073283072 bytes # Number of multiprocessors: 8 # Number of cores: 64 SWAN: Using synchronization method 0 MDIO: cannot open file "restart.coor" No heartbeat from core client for 30 sec - exiting Other strangeness: CPU time > Run time.
	ID: 21370 \| Rating: 0 \| rate: / Reply Quote

Bedrich Hajek Send message Joined: 28 Mar 09 Posts: 486 Credit: 11,386,082,230 RAC: 8,982,437 Level Scientific publications	Message 21371 - Posted: 10 Jun 2011 \| 1:56:51 UTC
	So far, all the 6.38 and 6.39 Betas crashed with computational errors on my computers, and yes, the computers were sluggish when running these units.
	ID: 21371 \| Rating: 0 \| rate: / Reply Quote

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 21372 - Posted: 10 Jun 2011 \| 7:49:21 UTC - in response to Message 21371.
	Uhmm, I don't really know why. It seems to be a driver thing, as it crashes quickly.
	ID: 21372 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 21374 - Posted: 10 Jun 2011 \| 10:25:50 UTC - in response to Message 21372. Last modified: 10 Jun 2011 \| 10:52:44 UTC
	Regarding nenym's "No heartbeat from core client for 30 sec - exiting": this can be caused by the Boinc client, the app, another CPU project, or something else running on the system. What else are you crunching/running? This is common when the CPU is over-taxed, often by other CPU projects. Are you using the recommended settings (freeing up at least 1 CPU core/thread per GPU and using SWAN_SYNC=0)? It may be worth noting that GDF's latest tasks use 98% of the GPU, so previous overclocks (Bedrich 755MHz GTX480) that never encountered such high utilization may no longer be reliable. It's best to Beta test with default/factory settings and recommended GPUGrid settings. Also worth keeping an eye on temperature - I saw a rise of a couple of degrees, but in some cases that might be enough to push the card over the edge.
	ID: 21374 \| Rating: 0 \| rate: / Reply Quote

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 21375 - Posted: 10 Jun 2011 \| 10:27:57 UTC - in response to Message 21374.
	Any problem with the higher priority? gdf
	ID: 21375 \| Rating: 0 \| rate: / Reply Quote

nenym Send message Joined: 31 Mar 09 Posts: 137 Credit: 1,308,230,581 RAC: 0 Level Scientific publications	Message 21377 - Posted: 10 Jun 2011 \| 10:46:32 UTC - in response to Message 21374.
	- CPU projects running when the issue occurred: Spinhenge, Ibercivis Wilson, - one core free (4-CPU Xeon, maximum set to 99% of cores), - Swan_sync has been set for a long time, - factory clock is 900 MHz; the OC to 925 runs without any issue on any long-run task, - I did some work in a simple Excel spreadsheet. The second 6.39 task ran and finished with the same OC of 925 MHz and the same CPU applications without issues, but the priority of the 6.39 CPU process had been set to low by PT. I did not try it without PT "renicing". I know the "no heartbeat from core" message: it happened when running Ibercivis (not sure which subproject) or CUDA AQUA many months ago.
	ID: 21377 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 21378 - Posted: 10 Jun 2011 \| 10:49:44 UTC - in response to Message 21375.
	Not personally, but I suppose that if the GPUGrid priority were too high, it might under some circumstances cause the Boinc client to crash. That said, the system would still need to be heavily taxed by other CPU processes.
	ID: 21378 \| Rating: 0 \| rate: / Reply Quote

TylerChris Send message Joined: 12 Feb 10 Posts: 11 Credit: 50,020,466 RAC: 0 Level Scientific publications	Message 21379 - Posted: 10 Jun 2011 \| 12:04:13 UTC Last modified: 10 Jun 2011 \| 12:05:49 UTC
	One failure, as expected, with v6.38. One success with v6.39. Did not notice any issues; it ran a little slower than most here, taking 4.8 hours on a GTX275, Windows 7 64. :)
	ID: 21379 \| Rating: 0 \| rate: / Reply Quote

Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2356 Credit: 16,377,028,840 RAC: 3,486,991 Level Scientific publications	Message 21380 - Posted: 10 Jun 2011 \| 12:36:33 UTC - in response to Message 21378. Last modified: 10 Jun 2011 \| 12:38:41 UTC
	I've read somewhere that the priority of any process should not be raised to "high", because it will interfere with csrss.exe, which is an essential subsystem (running intentionally at "high" priority by default) responsible for console UI operations and thread management. So I recommend "above normal" as the maximum priority level for any application (including priority changer tools). So far I've completed seven 6.39 beta WUs fine at this priority level.
	ID: 21380 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 21381 - Posted: 10 Jun 2011 \| 12:41:13 UTC - in response to Message 21379.
	Just after doing a system restart, I got this 6.38 error message: Event Type: Information Event Source: Application Error Event Category: (100) Event ID: 1004 Date: 10/06/2011 Time: 13:15:29 User: N/A Computer: S Description: Reporting queued error: faulting application acemdbeta_6.38_windows_intelx86__cuda31, version 0.0.0.0, faulting module acemdbeta_6.38_windows_intelx86__cuda31, version 0.0.0.0, fault address 0x00002c58. For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp. Data: 0000: 41 70 70 6c 69 63 61 74 Applicat 0008: 69 6f 6e 20 46 61 69 6c ion Fail 0010: 75 72 65 20 20 61 63 65 ure ace 0018: 6d 64 62 65 74 61 5f 36 mdbeta_6 0020: 2e 33 38 5f 77 69 6e 64 .38_wind 0028: 6f 77 73 5f 69 6e 74 65 ows_inte 0030: 6c 78 38 36 5f 5f 63 75 lx86__cu 0038: 64 61 33 31 20 30 2e 30 da31 0.0 0040: 2e 30 2e 30 20 69 6e 20 .0.0 in 0048: 61 63 65 6d 64 62 65 74 acemdbet 0050: 61 5f 36 2e 33 38 5f 77 a_6.38_w 0058: 69 6e 64 6f 77 73 5f 69 indows_i 0060: 6e 74 65 6c 78 38 36 5f ntelx86_ 0068: 5f 63 75 64 61 33 31 20 _cuda31 0070: 30 2e 30 2e 30 2e 30 20 0.0.0.0 0078: 61 74 20 6f 66 66 73 65 at offse 0080: 74 20 30 30 30 30 32 63 t 00002c 0088: 35 38 58 I was running a TONI long task, no Betas. The TONI task is still running after I closed the Windows pop-up Error message.
Boinc log: 10/06/2011 13:15:48 \| \| Starting BOINC client version 6.12.28 for windows_x86_64 10/06/2011 13:15:48 \| \| Config: report completed tasks immediately 10/06/2011 13:15:48 \| \| Config: use all coprocessors 10/06/2011 13:15:48 \| \| Config: zero long-term debts on startup 10/06/2011 13:15:48 \| \| log flags: file_xfer, sched_ops, task, coproc_debug, cpu_sched_debug, dcf_debug 10/06/2011 13:15:48 \| \| log flags: debt_debug, rr_simulation, sched_op_debug 10/06/2011 13:15:48 \| \| Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.5 10/06/2011 13:15:48 \| \| Data directory: C:\Documents and Settings\All Users\Application Data\BOINC 10/06/2011 13:15:48 \| \| Running under account Administrator 10/06/2011 13:15:48 \| \| Processor: 8 GenuineIntel Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz [Family 6 Model 42 Stepping 7] 10/06/2011 13:15:48 \| \| Processor: 256.00 KB cache 10/06/2011 13:15:48 \| \| Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 syscall nx lm vmx tm2 popcnt aes pbe 10/06/2011 13:15:48 \| \| OS: Microsoft Windows Server 2003 "R2": Standard Server x64 Edition, Service Pack 2, (05.02.3790.00) 10/06/2011 13:15:48 \| \| Memory: 7.98 GB physical, 9.56 GB virtual 10/06/2011 13:15:48 \| \| Disk: 1.82 TB total, 1.43 TB free 10/06/2011 13:15:48 \| \| Local time is UTC +1 hours 10/06/2011 13:15:48 \| \| NVIDIA GPU 0: GeForce GTX 470 (driver version 27533, CUDA version 4000, compute capability 2.0, 1280MB, 1089 GFLOPS peak) 10/06/2011 13:15:48 \| \| NVIDIA library reports 1 GPU 10/06/2011 13:15:48 \| \| No ATI library found. 
10/06/2011 13:15:48 \| GPUGRID \| URL http://www.gpugrid.net/; Computer ID 91249; resource share 1000 10/06/2011 13:15:48 \| \| General prefs: using separate prefs for home 10/06/2011 13:15:48 \| \| Reading preferences override file 10/06/2011 13:15:48 \| \| Preferences: 10/06/2011 13:15:48 \| \| max memory usage when active: 7357.62MB 10/06/2011 13:15:48 \| \| max memory usage when idle: 7357.62MB 10/06/2011 13:15:48 \| \| max disk usage: 100.00GB 10/06/2011 13:15:48 \| \| max CPUs used: 6 10/06/2011 13:15:48 \| \| (to change preferences, visit the web site of an attached project, or select Preferences in the Manager) 10/06/2011 13:15:48 \| \| [cpu_sched] Request CPU reschedule: Prefs update 10/06/2011 13:15:48 \| \| [cpu_sched] Request CPU reschedule: Startup 10/06/2011 13:15:48 \| \| Not using a proxy 10/06/2011 13:16:19 \| \| [cpu_sched] Request CPU reschedule: Idle state change 10/06/2011 13:16:19 \| \| [cpu_sched] Request CPU reschedule: periodic CPU scheduling 10/06/2011 13:16:19 \| GPUGRID \| [debt] CPU ineligible; LTD 0.00 10/06/2011 13:16:19 \| \| [debt] CPU LTD: adding offset 0.000000 10/06/2011 13:16:19 \| GPUGRID \| [debt] NVIDIA GPU LTD 0.00 delta 0.00 (0.50*0.00 - 0.00)/1 10/06/2011 13:16:19 \| \| [debt] NVIDIA GPU LTD: adding offset 0.000000 10/06/2011 13:16:19 \| \| [cpu_sched] schedule_cpus(): start 10/06/2011 13:16:19 \| \| [rr_sim] rr_sim start: work_buf_total 4320.86 on_frac 0.975 active_frac 0.897 10/06/2011 13:16:19 \| GPUGRID \| [rr_sim] 0.00: starting A429-TONI_AGGdense1-8-100-RND0011_0 (0.14 CPU + 1.00 NV) 10/06/2011 13:16:19 \| GPUGRID \| [rr_sim] 26954.51: A429-TONI_AGGdense1-8-100-RND0011_0 finishes after 3489.82 (151482.68G/43.41G) 10/06/2011 13:16:19 \| GPUGRID \| [cpu_sched] scheduling A429-TONI_AGGdense1-8-100-RND0011_0 (coprocessor job, FIFO) 10/06/2011 13:16:19 \| \| [cpu_sched_debug] reserving 1.000000 of coproc CUDA 10/06/2011 13:16:19 \| \| [cpu_sched] enforce_schedule(): start 10/06/2011 13:16:19 \| \| [cpu_sched] preliminary
job list: 10/06/2011 13:16:19 \| GPUGRID \| [cpu_sched] 0: A429-TONI_AGGdense1-8-100-RND0011_0 (MD: no; UTS: no) 10/06/2011 13:16:19 \| \| [cpu_sched] final job list: 10/06/2011 13:16:19 \| GPUGRID \| [cpu_sched] 6: A429-TONI_AGGdense1-8-100-RND0011_0 (MD: no; UTS: no) 10/06/2011 13:16:19 \| GPUGRID \| [coproc] Assigning CUDA instance 0 to A429-TONI_AGGdense1-8-100-RND0011_0 10/06/2011 13:16:19 \| GPUGRID \| [cpu_sched] scheduling A429-TONI_AGGdense1-8-100-RND0011_0 10/06/2011 13:16:19 \| GPUGRID \| [cpu_sched] A429-TONI_AGGdense1-8-100-RND0011_0 sched state 1 next 2 task state 0 10/06/2011 13:16:19 \| GPUGRID \| Restarting task A429-TONI_AGGdense1-8-100-RND0011_0 using acemdlong version 613 10/06/2011 13:16:19 \| \| [cpu_sched] app startup took 21.218748 secs* 10/06/2011 13:16:19 \| \| [cpu_sched] Request CPU reschedule: slow app startup 10/06/2011 13:16:19 \| \| [cpu_sched] enforce_schedule: end 10/06/2011 13:16:42 \| GPUGRID \| [debt] CPU ineligible; LTD 0.00 10/06/2011 13:16:42 \| \| [debt] CPU LTD: adding offset -21.776753 10/06/2011 13:16:42 \| GPUGRID \| [debt] NVIDIA GPU LTD -11.11 delta -11.11 (0.50*22.22 - 22.22)/1 10/06/2011 13:16:42 \| \| [debt] NVIDIA GPU LTD: adding offset -11.109374 10/06/2011 13:16:42 \| \| [cpu_sched] schedule_cpus(): start 10/06/2011 13:16:42 \| \| [rr_sim] rr_sim start: work_buf_total 4320.86 on_frac 0.975 active_frac 0.897 10/06/2011 13:16:42 \| GPUGRID \| [rr_sim] 0.00: starting A429-TONI_AGGdense1-8-100-RND0011_0 (0.14 CPU + 1.00 NV) 10/06/2011 13:16:42 \| GPUGRID \| [rr_sim] 26954.51: A429-TONI_AGGdense1-8-100-RND0011_0 finishes after 3489.82 (151482.68G/43.41G) 10/06/2011 13:16:42 \| GPUGRID \| [cpu_sched] scheduling A429-TONI_AGGdense1-8-100-RND0011_0 (coprocessor job, FIFO)
	ID: 21381 \| Rating: 0 \| rate: / Reply Quote
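
The "Data:" section of the event log above is a hex dump of the same text that appeared in the pop-up. A minimal sketch of decoding it, using the first three rows with the ASCII column left out (the helper name is made up for illustration):

```python
def decode_event_data(rows):
    """Decode rows of the form 'OFFSET: hh hh hh ...' into ASCII text."""
    raw = bytearray()
    for row in rows:
        # Drop the offset before the colon; keep only the hex byte pairs.
        _, _, hex_part = row.partition(":")
        raw.extend(bytes.fromhex(hex_part))
    return raw.decode("ascii")


rows = [
    "0000: 41 70 70 6c 69 63 61 74",
    "0008: 69 6f 6e 20 46 61 69 6c",
    "0010: 75 72 65 20 20 61 63 65",
]
text = decode_event_data(rows)
print(repr(text))
```

Feeding in the remaining rows the same way spells out the rest of the "Application Failure ... at offset 00002c58" message quoted earlier in the thread.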

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 21382 - Posted: 10 Jun 2011 \| 13:13:09 UTC - in response to Message 21381. Last modified: 10 Jun 2011 \| 14:21:33 UTC
	New application coming up with ABOVE_NORMAL_PRIORITY_CLASS. acemdbeta_6.40 is out. gdf
	ID: 21382 \| Rating: 0 \| rate: / Reply Quote
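
ABOVE_NORMAL_PRIORITY_CLASS is one of the Win32 process priority class constants. The sketch below shows the "no higher than above normal" rule suggested earlier in the thread. It is not the ACEMD code: `cap_priority` and `apply_to_current_process` are hypothetical helpers, and the second one only does anything on Windows.

```python
import sys

# Win32 process priority class constants (values from the Windows SDK headers).
IDLE_PRIORITY_CLASS = 0x00000040
BELOW_NORMAL_PRIORITY_CLASS = 0x00004000
NORMAL_PRIORITY_CLASS = 0x00000020
ABOVE_NORMAL_PRIORITY_CLASS = 0x00008000
HIGH_PRIORITY_CLASS = 0x00000080

# Lowest to highest; the numeric values are flags, not an ordering.
ORDER = [
    IDLE_PRIORITY_CLASS,
    BELOW_NORMAL_PRIORITY_CLASS,
    NORMAL_PRIORITY_CLASS,
    ABOVE_NORMAL_PRIORITY_CLASS,
    HIGH_PRIORITY_CLASS,
]


def cap_priority(requested, ceiling=ABOVE_NORMAL_PRIORITY_CLASS):
    """Return the requested class, lowered to the ceiling if it exceeds it."""
    return ORDER[min(ORDER.index(requested), ORDER.index(ceiling))]


def apply_to_current_process(priority_class):
    """Set the priority class of this process. Returns False off Windows."""
    if sys.platform != "win32":
        return False
    import ctypes
    from ctypes import wintypes
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetCurrentProcess.restype = wintypes.HANDLE
    kernel32.SetPriorityClass.argtypes = [wintypes.HANDLE, wintypes.DWORD]
    kernel32.SetPriorityClass.restype = wintypes.BOOL
    return bool(kernel32.SetPriorityClass(kernel32.GetCurrentProcess(), priority_class))


capped = cap_priority(HIGH_PRIORITY_CLASS)
print(hex(capped))
```

A request for "high" is lowered to "above normal", while lower requests pass through unchanged.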

Betting Slip Send message Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0 Level Scientific publications	Message 21383 - Posted: 10 Jun 2011 \| 13:29:01 UTC - in response to Message 21382.
	I don't really see the need to raise priority above low. I have done it myself and it makes little difference on a machine running other BOINC tasks, and it will certainly not make any difference on a machine using SWAN synchronization. As for GPU utilization, you run the risk of taking so much of the GPU that the machine can't perform without turning GPUGrid off. WARNING: don't try to suck all the juice out of the apple. ____________ Radio Caroline, the world's most famous offshore pirate radio station. Great music since April 1964. Support Radio Caroline Team - Radio Caroline
	ID: 21383 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 21384 - Posted: 10 Jun 2011 \| 13:47:47 UTC - in response to Message 21383.
	I think the intention is to improve stability on some systems. A well optimized system is unlikely to see any benefit, but there are so many poorly configured/optimized systems around that such a potential for overall project improvement must be investigated. In the last year or so I have come across several projects (5 or 6) that have at some stage impacted on other Boinc projects, so some Kevlar skin might help. I do think that such bullet-proofing changes would be best organized through Berkeley but as soon as just one project bypasses a default setting (and many have) it messes with other projects.
	ID: 21384 \| Rating: 0 \| rate: / Reply Quote

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 21385 - Posted: 10 Jun 2011 \| 14:26:23 UTC - in response to Message 21384.
	The point of slightly increasing the priority is that we sometimes see people with a GTX580 performing at less than half the speed. We assume that this is because they are overloading the CPU. Nowadays the CPU is naturally overloaded due to hyper-threading; hopefully this change will not impact the performance of their system but will guarantee decent responsiveness for the GPU. If you are using SWAN_SYNC, this is not going to change anything, but using SWAN_SYNC as the default is not practicable. gdf
	ID: 21385 \| Rating: 0 \| rate: / Reply Quote

skgiven Volunteer moderator Volunteer tester Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level Scientific publications	Message 21386 - Posted: 10 Jun 2011 \| 14:44:50 UTC - in response to Message 21385. Last modified: 11 Jun 2011 \| 7:19:26 UTC
	Running this 29-GIANNI_TESTDHFR10-0-5-RND5720_2 task on the 6.40 (cuda31) app. No problems so far. Looks like it will take about 3h on a GTX470. It's a fairly safe system; only using 6 of 8 threads for CPU crunching (various projects) and SWAN_SYNC=0 in use. I did suspend it and start another GPUGrid task before continuing the Beta, so it seems robust enough. Again 98% GPU usage and 322MB GDDR. - Completed and validated 10,414.70 10,378.75 7,491.18 11,236.77 ACEMD beta version v6.40 (cuda31)
	ID: 21386 \| Rating: 0 \| rate: / Reply Quote

GDF Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0 Level Scientific publications	Message 21387 - Posted: 11 Jun 2011 \| 8:41:54 UTC - in response to Message 21386.
	So far so good. If there is nothing else, I will pass it to production today. gdf
	ID: 21387 \| Rating: 0 \| rate: / Reply Quote

Norman_RKN Send message Joined: 22 Dec 09 Posts: 16 Credit: 23,522,575 RAC: 0 Level Scientific publications	Message 21543 - Posted: 26 Jun 2011 \| 16:26:13 UTC - in response to Message 21387.
	I use swan_sync and manually switch the app to the lowest priority, because otherwise the system is very laggy. Without swan_sync there is no problem, but GPU usage is about 10% lower. Only the "toni-WUs" run with a more acceptable GPU load, but 90%+ would be better for the project. Energy is not as cheap as water. With the newest nvidia beta driver the applications run fine: a reboot or crash no longer means the effort is lost. Now I can reboot or crash the system without any WU damage. ;) ____________ http://www.rechenkraft.net
	ID: 21543 \| Rating: 0 \| rate: / Reply Quote