New acemd beta

nenym
Message 21370 - Posted: 9 Jun 2011, 23:17:24 UTC

No problems now; the second task is running without any oddness.
The first task, 48-GIANNI_TESTDHFR10-2-5-RND0377: 13,000.61 / 13,131.97 / 7,491.18 / 11,236.77, ACEMD beta version v6.39 (cuda31). XP 64-bit, GTX 560 @ 925 MHz, driver 257.33.
Interesting stderr at the beginning (maybe from when the system was sluggish):
<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 560 Ti"
# Clock rate: 1.85 GHz
# Total amount of global memory:                 1073283072 bytes
# Number of multiprocessors:                     8
# Number of cores:                               64
SWAN: Using synchronization method 0
MDIO: cannot open file "restart.coor"
No heartbeat from core client for 30 sec - exiting
Other strangeness: CPU time > run time.
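The "CPU time > run time" remark can be checked against the four figures quoted for the first task. The snippet below is only a sketch: it assumes the first two figures are run time and CPU time in seconds, which fits the remark but is not spelled out in the post, and `parse_task_figures` is a hypothetical helper name.

```python
def parse_task_figures(text):
    """Split 'a / b / c / d' into floats, dropping thousands separators."""
    return [float(part.strip().replace(",", "")) for part in text.split("/")]

# Figures copied from the post; the meaning of each column is an assumption.
run_time, cpu_time, third, fourth = parse_task_figures(
    "13,000.61 / 13,131.97 / 7,491.18 / 11,236.77"
)

print(cpu_time > run_time)            # is CPU time larger than run time?
print(round(cpu_time - run_time, 2))  # difference in seconds
```

Under that assumption the CPU time exceeds the run time by a little over two minutes.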

Bedrich Hajek
Message 21371 - Posted: 10 Jun 2011, 1:56:51 UTC

So far, all the 6.38 and 6.39 Betas crashed with computational errors on my computers, and yes, the computers were sluggish when running these units.

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Message 21372 - Posted: 10 Jun 2011, 7:49:21 UTC - in response to Message 21371.  

Hmm, I don't really know why. It looks like a driver issue, as it crashes quickly.

Quoted from Message 21371: "So far, all the 6.38 and 6.39 Betas crashed with computational errors on my computers, and yes, the computers were sluggish when running these units."


skgiven
Volunteer moderator
Volunteer tester
Message 21374 - Posted: 10 Jun 2011, 10:25:50 UTC - in response to Message 21372.  
Last modified: 10 Jun 2011, 10:52:44 UTC

Regarding nenym's "No heartbeat from core client for 30 sec - exiting":

This can be caused by the BOINC client, the app, another CPU project, or something else running on the system.
What else are you crunching/running? This is common when the CPU is over-taxed, often by other CPU projects.
Are you using the recommended settings (freeing up at least 1 CPU core/thread per GPU and using SWAN_SYNC=0)?

It may be worth noting that GDF's latest tasks use 98% of the GPU, so previous overclocks (Bedrich's 755 MHz GTX 480) that never encountered such high utilization may no longer be reliable.
It's best to Beta test with default/factory settings and the recommended GPUGrid settings. It's also worth keeping an eye on temperature: I saw a rise of a couple of degrees, and in some cases that might be enough to push the card over the edge.
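The two recommended settings can be sanity-checked with a short script. This is a minimal sketch rather than anything the project ships: it only looks at whether an environment variable named SWAN_SYNC is set to 0 (how the app itself reads the variable is not shown in this thread), and both function names are made up for illustration.

```python
import os

def swan_sync_is_zero(env=None):
    """Return True if SWAN_SYNC is set to "0" in the given environment mapping."""
    env = os.environ if env is None else env
    return env.get("SWAN_SYNC", "").strip() == "0"

def cpu_threads_for_other_projects(total_threads, gpus):
    """Threads left for CPU projects after reserving one thread per GPU."""
    return max(total_threads - gpus, 0)

# Example: a 4-thread machine with one GPU and SWAN_SYNC=0 set.
print(swan_sync_is_zero({"SWAN_SYNC": "0"}))
print(cpu_threads_for_other_projects(4, 1))
```

Calling `swan_sync_is_zero()` with no argument checks the environment of the current process, which may differ from the one the BOINC client was started with.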

GDF
Message 21375 - Posted: 10 Jun 2011, 10:27:57 UTC - in response to Message 21374.  

Any problem with the higher priority?

gdf

nenym
Message 21377 - Posted: 10 Jun 2011, 10:46:32 UTC - in response to Message 21374.  

- CPU projects running when the issue occurred: Spinhenge, Ibercivis Wilson,
- one core free (4-CPU Xeon, maximum set to 99% of cores),
- Swan_sync has been set for a long time,
- factory clock is 900 MHz; the OC to 925 MHz has run without any issue on every long-run task,
- I was doing some work in a simple Excel spreadsheet.
The second 6.39 task ran and finished with the same 925 MHz OC and the same CPU applications without issues, but the priority of the 6.39 CPU process had been set to low by PT. I did not try it without PT "renicing".
I have seen "no heartbeat from core" before: it happened while running Ibercivis (not sure which subproject) or CUDA AQUA many months ago.

skgiven
Message 21378 - Posted: 10 Jun 2011, 10:49:44 UTC - in response to Message 21375.  

Not personally, but I suppose that if the GPUGrid priority were too high it might, under some circumstances, cause the BOINC client to crash. That said, the system would still need to be heavily taxed by other CPU processes.

TylerChris
Message 21379 - Posted: 10 Jun 2011, 12:04:13 UTC
Last modified: 10 Jun 2011, 12:05:49 UTC

One failure, as expected, with v6.38.
One success with v6.39.
Did not notice any issues; it ran a little slower than most here, taking 4.8 hours on a GTX 275, Windows 7 64-bit. :)

Retvari Zoltan
Message 21380 - Posted: 10 Jun 2011, 12:36:33 UTC - in response to Message 21378.  
Last modified: 10 Jun 2011, 12:38:41 UTC

I've read somewhere that the priority of any process should not be raised to "high", because it will interfere with csrss.exe, an essential subsystem (intentionally running at "high" priority by default) that is responsible for console UI operations and thread management. So I recommend "above normal" as the maximum priority level for any application (including priority changer tools). So far I've completed seven 6.39 beta WUs without problems at this priority level.
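The "above normal as a ceiling" rule can be written down as a small lookup. This is a sketch under assumptions: the flag values are the Windows priority-class constants as documented for SetPriorityClass, the ordering is lowest to highest, and `capped_priority` is a hypothetical helper that only picks a class; it does not change the priority of any process.

```python
# Windows priority classes, ordered from lowest to highest, with the flag
# values documented for SetPriorityClass.
PRIORITY_CLASSES = [
    ("idle", 0x00000040),          # IDLE_PRIORITY_CLASS
    ("below_normal", 0x00004000),  # BELOW_NORMAL_PRIORITY_CLASS
    ("normal", 0x00000020),        # NORMAL_PRIORITY_CLASS
    ("above_normal", 0x00008000),  # ABOVE_NORMAL_PRIORITY_CLASS
    ("high", 0x00000080),          # HIGH_PRIORITY_CLASS
    ("realtime", 0x00000100),      # REALTIME_PRIORITY_CLASS
]
ORDER = [name for name, _ in PRIORITY_CLASSES]
FLAGS = dict(PRIORITY_CLASSES)

def capped_priority(requested, ceiling="above_normal"):
    """Return (name, flag) for the requested class, never above the ceiling."""
    level = min(ORDER.index(requested), ORDER.index(ceiling))
    name = ORDER[level]
    return name, FLAGS[name]

print(capped_priority("high"))  # capped down to above_normal
print(capped_priority("idle"))  # left unchanged
```

A priority changer tool following this rule would pass the returned flag to the operating system instead of the value the user asked for.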

skgiven
Message 21381 - Posted: 10 Jun 2011, 12:41:13 UTC - in response to Message 21379.  

Just did a system restart and got this 6.38 error message:

Event Type: Information
Event Source: Application Error
Event Category: (100)
Event ID: 1004
Date: 10/06/2011
Time: 13:15:29
User: N/A
Computer: S
Description:
Reporting queued error: faulting application acemdbeta_6.38_windows_intelx86__cuda31, version 0.0.0.0, faulting module acemdbeta_6.38_windows_intelx86__cuda31, version 0.0.0.0, fault address 0x00002c58.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 41 70 70 6c 69 63 61 74 Applicat
0008: 69 6f 6e 20 46 61 69 6c ion Fail
0010: 75 72 65 20 20 61 63 65 ure ace
0018: 6d 64 62 65 74 61 5f 36 mdbeta_6
0020: 2e 33 38 5f 77 69 6e 64 .38_wind
0028: 6f 77 73 5f 69 6e 74 65 ows_inte
0030: 6c 78 38 36 5f 5f 63 75 lx86__cu
0038: 64 61 33 31 20 30 2e 30 da31 0.0
0040: 2e 30 2e 30 20 69 6e 20 .0.0 in
0048: 61 63 65 6d 64 62 65 74 acemdbet
0050: 61 5f 36 2e 33 38 5f 77 a_6.38_w
0058: 69 6e 64 6f 77 73 5f 69 indows_i
0060: 6e 74 65 6c 78 38 36 5f ntelx86_
0068: 5f 63 75 64 61 33 31 20 _cuda31
0070: 30 2e 30 2e 30 2e 30 20 0.0.0.0
0078: 61 74 20 6f 66 66 73 65 at offse
0080: 74 20 30 30 30 30 32 63 t 00002c
0088: 35 38 58
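
The "Data:" dump above is just the description text encoded as ASCII bytes, and it can be decoded back with a few lines of Python. This is a rough sketch that assumes the layout seen in this paste (an offset, up to eight hex bytes, then an ASCII column whose spaces were collapsed); `decode_dump_line` and `decode_dump` are illustrative names, not part of any Windows tool.

```python
import re

HEX_BYTE = re.compile(r"^[0-9a-fA-F]{2}$")

def decode_dump_line(line):
    """Decode the hex bytes of one 'offset: xx xx ... text' line to ASCII."""
    _offset, _, rest = line.partition(":")
    tokens = rest.split()
    # Count the leading tokens that look like hex bytes (at most eight per line).
    n = 0
    while n < len(tokens) and n < 8 and HEX_BYTE.match(tokens[n]):
        n += 1
    # The ASCII column can itself look like hex (e.g. "58"), so prefer the
    # split where the decoded bytes agree with the trailing text.
    for k in range(n, 0, -1):
        text = bytes(int(t, 16) for t in tokens[:k]).decode("ascii", errors="replace")
        if "".join(text.split()) == "".join(tokens[k:]):
            return text
    # Fall back to decoding every hex-looking token.
    return bytes(int(t, 16) for t in tokens[:n]).decode("ascii", errors="replace")

def decode_dump(dump):
    """Decode a multi-line dump into a single string."""
    return "".join(decode_dump_line(l) for l in dump.splitlines() if l.strip())

SAMPLE = """0000: 41 70 70 6c 69 63 61 74 Applicat
0008: 69 6f 6e 20 46 61 69 6c ion Fail"""

print(decode_dump(SAMPLE))  # prints: Application Fail
```

Run over the full dump, this reproduces the faulting-application text from the event description, so the data block carries no extra information.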


I was running a TONI long task, no Betas. The TONI task is still running after I closed the Windows pop-up error message.

BOINC log:
10/06/2011 13:15:48 | | Starting BOINC client version 6.12.28 for windows_x86_64
10/06/2011 13:15:48 | | Config: report completed tasks immediately
10/06/2011 13:15:48 | | Config: use all coprocessors
10/06/2011 13:15:48 | | Config: zero long-term debts on startup
10/06/2011 13:15:48 | | log flags: file_xfer, sched_ops, task, coproc_debug, cpu_sched_debug, dcf_debug
10/06/2011 13:15:48 | | log flags: debt_debug, rr_simulation, sched_op_debug
10/06/2011 13:15:48 | | Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.5
10/06/2011 13:15:48 | | Data directory: C:\Documents and Settings\All Users\Application Data\BOINC
10/06/2011 13:15:48 | | Running under account Administrator
10/06/2011 13:15:48 | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz [Family 6 Model 42 Stepping 7]
10/06/2011 13:15:48 | | Processor: 256.00 KB cache
10/06/2011 13:15:48 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 syscall nx lm vmx tm2 popcnt aes pbe
10/06/2011 13:15:48 | | OS: Microsoft Windows Server 2003 "R2": Standard Server x64 Edition, Service Pack 2, (05.02.3790.00)
10/06/2011 13:15:48 | | Memory: 7.98 GB physical, 9.56 GB virtual
10/06/2011 13:15:48 | | Disk: 1.82 TB total, 1.43 TB free
10/06/2011 13:15:48 | | Local time is UTC +1 hours
10/06/2011 13:15:48 | | NVIDIA GPU 0: GeForce GTX 470 (driver version 27533, CUDA version 4000, compute capability 2.0, 1280MB, 1089 GFLOPS peak)
10/06/2011 13:15:48 | | NVIDIA library reports 1 GPU
10/06/2011 13:15:48 | | No ATI library found.
10/06/2011 13:15:48 | GPUGRID | URL http://www.gpugrid.net/; Computer ID 91249; resource share 1000
10/06/2011 13:15:48 | | General prefs: using separate prefs for home
10/06/2011 13:15:48 | | Reading preferences override file
10/06/2011 13:15:48 | | Preferences:
10/06/2011 13:15:48 | | max memory usage when active: 7357.62MB
10/06/2011 13:15:48 | | max memory usage when idle: 7357.62MB
10/06/2011 13:15:48 | | max disk usage: 100.00GB
10/06/2011 13:15:48 | | max CPUs used: 6
10/06/2011 13:15:48 | | (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
10/06/2011 13:15:48 | | [cpu_sched] Request CPU reschedule: Prefs update
10/06/2011 13:15:48 | | [cpu_sched] Request CPU reschedule: Startup
10/06/2011 13:15:48 | | Not using a proxy
10/06/2011 13:16:19 | | [cpu_sched] Request CPU reschedule: Idle state change
10/06/2011 13:16:19 | | [cpu_sched] Request CPU reschedule: periodic CPU scheduling
10/06/2011 13:16:19 | GPUGRID | [debt] CPU ineligible; LTD 0.00
10/06/2011 13:16:19 | | [debt] CPU LTD: adding offset 0.000000
10/06/2011 13:16:19 | GPUGRID | [debt] NVIDIA GPU LTD 0.00 delta 0.00 (0.50*0.00 - 0.00)/1
10/06/2011 13:16:19 | | [debt] NVIDIA GPU LTD: adding offset 0.000000
10/06/2011 13:16:19 | | [cpu_sched] schedule_cpus(): start
10/06/2011 13:16:19 | | [rr_sim] rr_sim start: work_buf_total 4320.86 on_frac 0.975 active_frac 0.897
10/06/2011 13:16:19 | GPUGRID | [rr_sim] 0.00: starting A429-TONI_AGGdense1-8-100-RND0011_0 (0.14 CPU + 1.00 NV)
10/06/2011 13:16:19 | GPUGRID | [rr_sim] 26954.51: A429-TONI_AGGdense1-8-100-RND0011_0 finishes after 3489.82 (151482.68G/43.41G)
10/06/2011 13:16:19 | GPUGRID | [cpu_sched] scheduling A429-TONI_AGGdense1-8-100-RND0011_0 (coprocessor job, FIFO)
10/06/2011 13:16:19 | | [cpu_sched_debug] reserving 1.000000 of coproc CUDA
10/06/2011 13:16:19 | | [cpu_sched] enforce_schedule(): start
10/06/2011 13:16:19 | | [cpu_sched] preliminary job list:
10/06/2011 13:16:19 | GPUGRID | [cpu_sched] 0: A429-TONI_AGGdense1-8-100-RND0011_0 (MD: no; UTS: no)
10/06/2011 13:16:19 | | [cpu_sched] final job list:
10/06/2011 13:16:19 | GPUGRID | [cpu_sched] 6: A429-TONI_AGGdense1-8-100-RND0011_0 (MD: no; UTS: no)
10/06/2011 13:16:19 | GPUGRID | [coproc] Assigning CUDA instance 0 to A429-TONI_AGGdense1-8-100-RND0011_0
10/06/2011 13:16:19 | GPUGRID | [cpu_sched] scheduling A429-TONI_AGGdense1-8-100-RND0011_0
10/06/2011 13:16:19 | GPUGRID | [cpu_sched] A429-TONI_AGGdense1-8-100-RND0011_0 sched state 1 next 2 task state 0
10/06/2011 13:16:19 | GPUGRID | Restarting task A429-TONI_AGGdense1-8-100-RND0011_0 using acemdlong version 613
10/06/2011 13:16:19 | | [cpu_sched] app startup took 21.218748 secs
10/06/2011 13:16:19 | | [cpu_sched] Request CPU reschedule: slow app startup
10/06/2011 13:16:19 | | [cpu_sched] enforce_schedule: end
10/06/2011 13:16:42 | GPUGRID | [debt] CPU ineligible; LTD 0.00
10/06/2011 13:16:42 | | [debt] CPU LTD: adding offset -21.776753
10/06/2011 13:16:42 | GPUGRID | [debt] NVIDIA GPU LTD -11.11 delta -11.11 (0.50*22.22 - 22.22)/1
10/06/2011 13:16:42 | | [debt] NVIDIA GPU LTD: adding offset -11.109374
10/06/2011 13:16:42 | | [cpu_sched] schedule_cpus(): start
10/06/2011 13:16:42 | | [rr_sim] rr_sim start: work_buf_total 4320.86 on_frac 0.975 active_frac 0.897
10/06/2011 13:16:42 | GPUGRID | [rr_sim] 0.00: starting A429-TONI_AGGdense1-8-100-RND0011_0 (0.14 CPU + 1.00 NV)
10/06/2011 13:16:42 | GPUGRID | [rr_sim] 26954.51: A429-TONI_AGGdense1-8-100-RND0011_0 finishes after 3489.82 (151482.68G/43.41G)
10/06/2011 13:16:42 | GPUGRID | [cpu_sched] scheduling A429-TONI_AGGdense1-8-100-RND0011_0 (coprocessor job, FIFO)
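
For anyone sifting through a log like this, the lines can be split into timestamp, project and message with a regular expression. The sketch below assumes the "date time | project | message" layout seen in this paste, with an empty project field for client-wide messages; `parse_log_line` and `startup_seconds` are hypothetical helpers, not BOINC functions.

```python
import re

LOG_LINE = re.compile(
    r"^(?P<stamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) \|"
    r"\s*(?P<project>.*?)\s*\|\s*(?P<message>.*)$"
)

def parse_log_line(line):
    """Return (timestamp, project, message), or None if the line does not match."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    return m.group("stamp"), m.group("project"), m.group("message")

def startup_seconds(lines):
    """Return the first 'app startup took N secs' value found, else None."""
    for line in lines:
        parsed = parse_log_line(line)
        if parsed:
            found = re.search(r"app startup took ([\d.]+) secs", parsed[2])
            if found:
                return float(found.group(1))
    return None

# Two lines copied from the log above.
LINES = [
    "10/06/2011 13:16:19 | GPUGRID | Restarting task A429-TONI_AGGdense1-8-100-RND0011_0 using acemdlong version 613",
    "10/06/2011 13:16:19 | | [cpu_sched] app startup took 21.218748 secs",
]

print(parse_log_line(LINES[0]))
print(startup_seconds(LINES))
```

The second helper picks out the roughly 21-second app startup that the client itself flags as "slow app startup" a line later.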

GDF
Message 21382 - Posted: 10 Jun 2011, 13:13:09 UTC - in response to Message 21381.  
Last modified: 10 Jun 2011, 14:21:33 UTC

New application coming up with ABOVE_NORMAL_PRIORITY_CLASS.
acemdbeta_6.40 is out.

gdf

Betting Slip
Message 21383 - Posted: 10 Jun 2011, 13:29:01 UTC - in response to Message 21382.  

I don't really see the need to raise the priority above low myself.

I have done it myself, and it makes little difference on a machine running other BOINC tasks and will certainly not make any difference on a machine using SWAN synchronization.

As for GPU utilization, you run the risk of taking up so much of the GPU that the machine becomes unusable unless GPUGrid is turned off.

WARNING: don't try to suck all the juice out of the apple.


Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline

skgiven
Message 21384 - Posted: 10 Jun 2011, 13:47:47 UTC - in response to Message 21383.  

I think the intention is to improve stability on some systems. A well optimized system is unlikely to see any benefit, but there are so many poorly configured/optimized systems around that such a potential for overall project improvement must be investigated. In the last year or so I have come across several projects (5 or 6) that have at some stage impacted on other Boinc projects, so some Kevlar skin might help. I do think that such bullet-proofing changes would be best organized through Berkeley but as soon as just one project bypasses a default setting (and many have) it messes with other projects.

GDF
Message 21385 - Posted: 10 Jun 2011, 14:26:23 UTC - in response to Message 21384.  

The point of slightly increasing the priority is that we sometimes see people with a GTX 580 performing at less than half the expected speed. We assume this is because they are overloading the CPU. Nowadays the CPU is naturally overloaded due to hyper-threading; hopefully this change will not affect the performance of their systems but will guarantee decent responsiveness for the GPU.

If you are using SWAN_SYNC, this is not going to change anything, but using SWAN_SYNC as default is not practicable.

gdf

skgiven
Message 21386 - Posted: 10 Jun 2011, 14:44:50 UTC - in response to Message 21385.  
Last modified: 11 Jun 2011, 7:19:26 UTC

Running this 29-GIANNI_TESTDHFR10-0-5-RND5720_2 task on the 6.40 (cuda31) app.
No problems so far. Looks like it will take about 3h on a GTX470.
It's a fairly safe system; only using 6 of 8 threads for CPU crunching (various projects) and SWAN_SYNC=0 in use. I did suspend it and start another GPUGrid task before continuing the Beta, so it seems robust enough.
Again 98% GPU usage and 322MB GDDR.

- Completed and validated: 10,414.70 / 10,378.75 / 7,491.18 / 11,236.77, ACEMD beta version v6.40 (cuda31)

GDF
Message 21387 - Posted: 11 Jun 2011, 8:41:54 UTC - in response to Message 21386.  

So far so good.
If there is nothing else, I will pass it to production today.

gdf

Norman_RKN
Message 21543 - Posted: 26 Jun 2011, 16:26:13 UTC - in response to Message 21387.  

I use SWAN_SYNC and manually switch the app to the lowest priority, because otherwise the system is very laggy.
Without SWAN_SYNC there is no problem, but GPU usage is 10% lower.
Only the "TONI" WUs run with a more acceptable GPU load, but 90%+ would be better for the project.
Energy is not as cheap as water.

With the newest NVIDIA beta driver the applications run fine.
No more losing the work after a reboot or crash.
Now I can reboot or crash the system without any WU damage. ;)



http://www.rechenkraft.net

©2025 Universitat Pompeu Fabra