New acemd beta
| Author | Message |
|---|---|
nenym Joined: 31 Mar 09 Posts: 137 Credit: 1,429,587,071 RAC: 0
|
No problem now; the second task runs without any oddness. The first task, 48-GIANNI_TESTDHFR10-2-5-RND0377: 13,000.61 / 13,131.97 / 7,491.18 / 11,236.77, ACEMD beta version v6.39 (cuda31). XP 64-bit, GTX560 @ 925 MHz, driver 257.33. Interesting stderr at the beginning (maybe when the system was sluggish): <stderr_txt> # Using device 0 # There is 1 device supporting CUDA # Device 0: "GeForce GTX 560 Ti" # Clock rate: 1.85 GHz # Total amount of global memory: 1073283072 bytes # Number of multiprocessors: 8 # Number of cores: 64 SWAN: Using synchronization method 0 MDIO: cannot open file "restart.coor" No heartbeat from core client for 30 sec - exiting. Other strangeness: CPU time > Run time. |
|
Joined: 28 Mar 09 Posts: 490 Credit: 11,732,395,728 RAC: 71,755
|
So far, all the 6.38 and 6.39 Betas crashed with computational errors on my computers, and yes, the computers were sluggish when running these units. |
GDF Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 |
In reply to "So far, all the 6.38 and 6.39 Betas crashed with computational errors on my computers, and yes, the computers were sluggish when running these units.": Uhmm, don't really know why. It seems to be a driver thing, as it crashes quickly. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Regarding nenym's "No heartbeat from core client for 30 sec - exiting": this can be caused by the BOINC client, the app, another CPU project, or something else running on the system. What else are you crunching/running? This is common when the CPU is over-taxed, often by other CPU projects. Are you using the recommended settings (freeing up at least 1 CPU core/thread per GPU and using SWAN_SYNC=0)? It may be worth noting that GDF's latest tasks use 98% of the GPU, so previous overclocks (Bedrich 755MHz GTX480) that never encountered such high utilization may no longer be reliable. It's best to Beta test with default/factory settings and recommended GPUGrid settings. It is also worth keeping an eye on temperature: I saw a rise of a couple of degrees, and in some cases that might be enough to push the card over the edge. |
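As a rough illustration of the heartbeat message discussed in this post, here is a minimal Python sketch. It assumes the science app simply records when it last heard from the BOINC client and gives up after 30 seconds of silence. The class and method names are hypothetical and this is not BOINC's actual implementation; only the 30-second figure comes from the quoted stderr text.

```python
import time

HEARTBEAT_TIMEOUT_S = 30.0  # from the stderr message; everything else is assumed


class HeartbeatWatchdog:
    """Hypothetical watchdog that tracks the last heartbeat from the core client."""

    def __init__(self, timeout_s=HEARTBEAT_TIMEOUT_S, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock  # injectable so the logic can be checked without sleeping
        self._last_beat = clock()

    def beat(self):
        """Record that a heartbeat arrived from the client."""
        self._last_beat = self._clock()

    def expired(self):
        """True once no heartbeat has been seen for longer than the timeout."""
        return self._clock() - self._last_beat > self._timeout_s
```

Under this model, an over-taxed CPU could keep the client from being scheduled in time to send a heartbeat, so `expired()` would become true even though nothing has actually crashed.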
GDF Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 |
Any problem with the higher priority? gdf |
nenym Joined: 31 Mar 09 Posts: 137 Credit: 1,429,587,071 RAC: 0
|
- CPU projects running when the issue occurred: Spinhenge, Ibercivis Wilson; - one core free (4-CPU Xeon, maximum set to 99% of cores); - SWAN_SYNC has been set for a long time; - factory clock is 900 MHz, and the OC to 925 MHz runs without any issue for any long-run task; - I did some work in a simple Excel spreadsheet. The second 6.39 task ran and finished with the same 925 MHz OC and the same CPU applications without issues, but the priority of the 6.39 CPU process had been set to low by PT. I did not try it without PT "renicing". I know "no heartbeat from core": it happened while running Ibercivis (not sure which subproject) or CUDA AQUA many months ago. |
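The "99% of cores leaves one core free" setting can be sketched as simple arithmetic. This minimal Python example assumes the client truncates the product of core count and percentage to a whole number (with a floor of one core). That is consistent with the one free core reported above, but the function name and the exact rounding rule are assumptions, not something taken from the BOINC source.

```python
def usable_cpus(total_cpus, max_pct):
    """Cores available to CPU tasks, assuming the result is truncated (minimum 1)."""
    return max(1, int(total_cpus * max_pct / 100))


# 4 cores at 99% -> 3.96, truncated to 3, so one core stays free for the GPU task.
print(usable_cpus(4, 99))   # 3
print(usable_cpus(4, 100))  # 4
```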
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Not personally, but I suppose that if the GPUGrid priority were too high it might, under some circumstances, cause the BOINC client to crash. That said, the system would still need to be heavily taxed by other CPU processes. |
|
Joined: 12 Feb 10 Posts: 11 Credit: 50,020,466 RAC: 0
|
One failure, as expected, with v6.38. One success with v6.39. Did not notice any issues; it ran a little slower than most here, taking 4.8 hours on a GTX275, Windows 7 64-bit. :) |
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
I've read somewhere that the priority of any process should not be raised to "high", because it will interfere with csrss.exe, an essential subsystem (intentionally running at "high" priority by default) that is responsible for console UI operations and thread management. So I recommend "above normal" as the maximum priority level for any application (including priority changer tools). So far I've completed seven 6.39 beta WUs fine at this priority level. |
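The "above normal at most" recommendation can be sketched in Python as a cap on the Win32 priority class. The numeric values are the standard Win32 priority class constants, and `SetPriorityClass` / `GetCurrentProcess` are real kernel32 functions. The helper names and the choice of ceiling are illustrative assumptions, and the actual call is only attempted on Windows.

```python
import ctypes
import sys

# Win32 priority class values, ordered from lowest to highest.
PRIORITY_CLASSES = [
    ("IDLE_PRIORITY_CLASS", 0x00000040),
    ("BELOW_NORMAL_PRIORITY_CLASS", 0x00004000),
    ("NORMAL_PRIORITY_CLASS", 0x00000020),
    ("ABOVE_NORMAL_PRIORITY_CLASS", 0x00008000),
    ("HIGH_PRIORITY_CLASS", 0x00000080),
    ("REALTIME_PRIORITY_CLASS", 0x00000100),
]
NAMES = [name for name, _ in PRIORITY_CLASSES]
CEILING = "ABOVE_NORMAL_PRIORITY_CLASS"  # assumed cap, following the advice above


def capped_priority(requested):
    """Return the requested class name, lowered to CEILING if it ranks higher."""
    rank = min(NAMES.index(requested), NAMES.index(CEILING))
    return NAMES[rank]


def apply_to_current_process(requested):
    """Set the capped class on this process; returns False off Windows or on failure."""
    if sys.platform != "win32":
        return False
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetCurrentProcess.restype = ctypes.c_void_p  # HANDLE is pointer-sized
    kernel32.SetPriorityClass.argtypes = [ctypes.c_void_p, ctypes.c_uint32]
    value = dict(PRIORITY_CLASSES)[capped_priority(requested)]
    return bool(kernel32.SetPriorityClass(kernel32.GetCurrentProcess(), value))
```

With this sketch, a request for `HIGH_PRIORITY_CLASS` is quietly reduced to `ABOVE_NORMAL_PRIORITY_CLASS`, while lower classes pass through unchanged.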
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Just after a system restart I got this 6.38 error message:
Event Type: Information
Event Source: Application Error
Event Category: (100)
Event ID: 1004
Date: 10/06/2011
Time: 13:15:29
User: N/A
Computer: S
Description: Reporting queued error: faulting application acemdbeta_6.38_windows_intelx86__cuda31, version 0.0.0.0, faulting module acemdbeta_6.38_windows_intelx86__cuda31, version 0.0.0.0, fault address 0x00002c58. For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 41 70 70 6c 69 63 61 74 Applicat
0008: 69 6f 6e 20 46 61 69 6c ion Fail
0010: 75 72 65 20 20 61 63 65 ure ace
0018: 6d 64 62 65 74 61 5f 36 mdbeta_6
0020: 2e 33 38 5f 77 69 6e 64 .38_wind
0028: 6f 77 73 5f 69 6e 74 65 ows_inte
0030: 6c 78 38 36 5f 5f 63 75 lx86__cu
0038: 64 61 33 31 20 30 2e 30 da31 0.0
0040: 2e 30 2e 30 20 69 6e 20 .0.0 in
0048: 61 63 65 6d 64 62 65 74 acemdbet
0050: 61 5f 36 2e 33 38 5f 77 a_6.38_w
0058: 69 6e 64 6f 77 73 5f 69 indows_i
0060: 6e 74 65 6c 78 38 36 5f ntelx86_
0068: 5f 63 75 64 61 33 31 20 _cuda31
0070: 30 2e 30 2e 30 2e 30 20 0.0.0.0
0078: 61 74 20 6f 66 66 73 65 at offse
0080: 74 20 30 30 30 30 32 63 t 00002c
0088: 35 38 58
I was running a TONI long task, no Betas. The TONI task is still running after I closed the Windows pop-up error message.
Boinc log:
10/06/2011 13:15:48 | | Starting BOINC client version 6.12.28 for windows_x86_64
10/06/2011 13:15:48 | | Config: report completed tasks immediately
10/06/2011 13:15:48 | | Config: use all coprocessors
10/06/2011 13:15:48 | | Config: zero long-term debts on startup
10/06/2011 13:15:48 | | log flags: file_xfer, sched_ops, task, coproc_debug, cpu_sched_debug, dcf_debug
10/06/2011 13:15:48 | | log flags: debt_debug, rr_simulation, sched_op_debug
10/06/2011 13:15:48 | | Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.5
10/06/2011 13:15:48 | | Data directory: C:\Documents and Settings\All Users\Application Data\BOINC
10/06/2011 13:15:48 | | Running under account Administrator
10/06/2011 13:15:48 | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz [Family 6 Model 42 Stepping 7]
10/06/2011 13:15:48 | | Processor: 256.00 KB cache
10/06/2011 13:15:48 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 syscall nx lm vmx tm2 popcnt aes pbe
10/06/2011 13:15:48 | | OS: Microsoft Windows Server 2003 "R2": Standard Server x64 Edition, Service Pack 2, (05.02.3790.00)
10/06/2011 13:15:48 | | Memory: 7.98 GB physical, 9.56 GB virtual
10/06/2011 13:15:48 | | Disk: 1.82 TB total, 1.43 TB free
10/06/2011 13:15:48 | | Local time is UTC +1 hours
10/06/2011 13:15:48 | | NVIDIA GPU 0: GeForce GTX 470 (driver version 27533, CUDA version 4000, compute capability 2.0, 1280MB, 1089 GFLOPS peak)
10/06/2011 13:15:48 | | NVIDIA library reports 1 GPU
10/06/2011 13:15:48 | | No ATI library found.
10/06/2011 13:15:48 | GPUGRID | URL http://www.gpugrid.net/; Computer ID 91249; resource share 1000
10/06/2011 13:15:48 | | General prefs: using separate prefs for home
10/06/2011 13:15:48 | | Reading preferences override file
10/06/2011 13:15:48 | | Preferences:
10/06/2011 13:15:48 | | max memory usage when active: 7357.62MB
10/06/2011 13:15:48 | | max memory usage when idle: 7357.62MB
10/06/2011 13:15:48 | | max disk usage: 100.00GB
10/06/2011 13:15:48 | | max CPUs used: 6
10/06/2011 13:15:48 | | (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
10/06/2011 13:15:48 | | [cpu_sched] Request CPU reschedule: Prefs update
10/06/2011 13:15:48 | | [cpu_sched] Request CPU reschedule: Startup
10/06/2011 13:15:48 | | Not using a proxy
10/06/2011 13:16:19 | | [cpu_sched] Request CPU reschedule: Idle state change
10/06/2011 13:16:19 | | [cpu_sched] Request CPU reschedule: periodic CPU scheduling
10/06/2011 13:16:19 | GPUGRID | [debt] CPU ineligible; LTD 0.00
10/06/2011 13:16:19 | | [debt] CPU LTD: adding offset 0.000000
10/06/2011 13:16:19 | GPUGRID | [debt] NVIDIA GPU LTD 0.00 delta 0.00 (0.50*0.00 - 0.00)/1
10/06/2011 13:16:19 | | [debt] NVIDIA GPU LTD: adding offset 0.000000
10/06/2011 13:16:19 | | [cpu_sched] schedule_cpus(): start
10/06/2011 13:16:19 | | [rr_sim] rr_sim start: work_buf_total 4320.86 on_frac 0.975 active_frac 0.897
10/06/2011 13:16:19 | GPUGRID | [rr_sim] 0.00: starting A429-TONI_AGGdense1-8-100-RND0011_0 (0.14 CPU + 1.00 NV)
10/06/2011 13:16:19 | GPUGRID | [rr_sim] 26954.51: A429-TONI_AGGdense1-8-100-RND0011_0 finishes after 3489.82 (151482.68G/43.41G)
10/06/2011 13:16:19 | GPUGRID | [cpu_sched] scheduling A429-TONI_AGGdense1-8-100-RND0011_0 (coprocessor job, FIFO)
10/06/2011 13:16:19 | | [cpu_sched_debug] reserving 1.000000 of coproc CUDA
10/06/2011 13:16:19 | | [cpu_sched] enforce_schedule(): start
10/06/2011 13:16:19 | | [cpu_sched] preliminary job list:
10/06/2011 13:16:19 | GPUGRID | [cpu_sched] 0: A429-TONI_AGGdense1-8-100-RND0011_0 (MD: no; UTS: no)
10/06/2011 13:16:19 | | [cpu_sched] final job list:
10/06/2011 13:16:19 | GPUGRID | [cpu_sched] 6: A429-TONI_AGGdense1-8-100-RND0011_0 (MD: no; UTS: no)
10/06/2011 13:16:19 | GPUGRID | [coproc] Assigning CUDA instance 0 to A429-TONI_AGGdense1-8-100-RND0011_0
10/06/2011 13:16:19 | GPUGRID | [cpu_sched] scheduling A429-TONI_AGGdense1-8-100-RND0011_0
10/06/2011 13:16:19 | GPUGRID | [cpu_sched] A429-TONI_AGGdense1-8-100-RND0011_0 sched state 1 next 2 task state 0
10/06/2011 13:16:19 | GPUGRID | Restarting task A429-TONI_AGGdense1-8-100-RND0011_0 using acemdlong version 613
10/06/2011 13:16:19 | | [cpu_sched] app startup took 21.218748 secs
10/06/2011 13:16:19 | | [cpu_sched] Request CPU reschedule: slow app startup
10/06/2011 13:16:19 | | [cpu_sched] enforce_schedule: end
10/06/2011 13:16:42 | GPUGRID | [debt] CPU ineligible; LTD 0.00
10/06/2011 13:16:42 | | [debt] CPU LTD: adding offset -21.776753
10/06/2011 13:16:42 | GPUGRID | [debt] NVIDIA GPU LTD -11.11 delta -11.11 (0.50*22.22 - 22.22)/1
10/06/2011 13:16:42 | | [debt] NVIDIA GPU LTD: adding offset -11.109374
10/06/2011 13:16:42 | | [cpu_sched] schedule_cpus(): start
10/06/2011 13:16:42 | | [rr_sim] rr_sim start: work_buf_total 4320.86 on_frac 0.975 active_frac 0.897
10/06/2011 13:16:42 | GPUGRID | [rr_sim] 0.00: starting A429-TONI_AGGdense1-8-100-RND0011_0 (0.14 CPU + 1.00 NV)
10/06/2011 13:16:42 | GPUGRID | [rr_sim] 26954.51: A429-TONI_AGGdense1-8-100-RND0011_0 finishes after 3489.82 (151482.68G/43.41G)
10/06/2011 13:16:42 | GPUGRID | [cpu_sched] scheduling A429-TONI_AGGdense1-8-100-RND0011_0 (coprocessor job, FIFO) |
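The Data block of the Windows event entry earlier in this post is simply the ASCII text of the error message, dumped as hex bytes. A small Python sketch shows how to decode it by hand; the helper name is hypothetical, and the byte strings are copied from the first two rows and the last two rows of the dump.

```python
def decode_hex_bytes(hex_text):
    """Turn space-separated hex byte pairs into the ASCII text they encode."""
    return bytes.fromhex(hex_text).decode("ascii")


# Rows 0000 and 0008 of the dump.
print(decode_hex_bytes("41 70 70 6c 69 63 61 74 69 6f 6e 20 46 61 69 6c"))
# Application Fail

# Rows 0080 and 0088: the tail of "at offset 00002c58", matching the fault address.
print(decode_hex_bytes("74 20 30 30 30 30 32 63 35 38"))
# t 00002c58
```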
GDF Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 |
New application coming up with ABOVE_NORMAL_PRIORITY_CLASS. acemdbeta_6.40 is out. gdf |
|
Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0
|
Don't really see the need to raise priority above low myself. I have done it myself and it makes little difference on a machine running other BOINC tasks and will certainly not make any difference on a machine using SWAN synchronization. As for GPU utilization you run the risk of sucking so much use of the GPU that the machine can't perform without turning GPUGrid off. WARNING: don't try to suck all the juice out of the apple. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
I think the intention is to improve stability on some systems. A well optimized system is unlikely to see any benefit, but there are so many poorly configured/optimized systems around that such a potential for overall project improvement must be investigated. In the last year or so I have come across several projects (5 or 6) that have at some stage impacted on other Boinc projects, so some Kevlar skin might help. I do think that such bullet-proofing changes would be best organized through Berkeley but as soon as just one project bypasses a default setting (and many have) it messes with other projects. |
GDF Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 |
The point of slightly increasing the priority is that we sometimes see people with a GTX580 performing at less than half the expected speed. We assume this is because they are overloading the CPU. Nowadays the CPU is naturally overloaded due to hyper-threading. Hopefully the change will not impact the performance of their system, but it should guarantee decent responsiveness for the GPU. If you are using SWAN_SYNC, this is not going to change anything, but using SWAN_SYNC as the default is not practicable. gdf |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Running this 29-GIANNI_TESTDHFR10-0-5-RND5720_2 task on the 6.40 (cuda31) app. No problems so far. Looks like it will take about 3h on a GTX470. It's a fairly safe system; only using 6 of 8 threads for CPU crunching (various projects) and SWAN_SYNC=0 in use. I did suspend it and start another GPUGrid task before continuing the Beta, so it seems robust enough. Again 98% GPU usage and 322MB GDDR. - Completed and validated 10,414.70 10,378.75 7,491.18 11,236.77 ACEMD beta version v6.40 (cuda31) |
GDF Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 |
So far so good. If there is nothing else, I will pass it to production today. gdf |
|
Joined: 22 Dec 09 Posts: 16 Credit: 23,522,575 RAC: 0
|
I use SWAN_SYNC and manually switch the app to the lowest priority, because otherwise the system is very laggy. Without SWAN_SYNC there is no problem, but GPU usage is 10% lower. Only the "TONI" WUs run with a more acceptable GPU load, but 90%+ would be better for the project. Energy is not as cheap as water. With the newest NVIDIA beta driver the applications run fine: no more reboots or crashes with the work lost. Now I can reboot or crash the system without any WU damage. ;) http://www.rechenkraft.net |