
Message boards : Graphics cards (GPUs) : New acemd beta

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 21333 - Posted: 7 Jun 2011 | 15:35:23 UTC
Last modified: 7 Jun 2011 | 15:43:57 UTC

I have uploaded a new acemdbeta application for Linux and some workunits to test.

Mainly bug fixes.

gdf

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1626
Credit: 9,384,566,723
RAC: 19,075,423
Level
Tyr
Message 21334 - Posted: 7 Jun 2011 | 15:43:55 UTC - in response to Message 21333.

Does this new app resolve the Cuda4/downclocking bug discussed in http://www.gpugrid.net/forum_thread.php?id=2534?

If not, may I refer you to http://boinc.berkeley.edu/trac/changeset/23649/, and the new paragraph in http://boinc.berkeley.edu/trac/wiki/AppCoprocessor:

Cleanup on premature exit
The BOINC client may kill your application in the middle. This may leave the GPU in a bad state. To prevent this, call

boinc_begin_critical_section();

before using the GPU, and between GPU kernels do

if (boinc_status.quit_request || boinc_status.abort_request) {
    // cudaThreadSynchronize(); or whatever is needed
    boinc_end_critical_section();
    while (1) boinc_sleep(1);
}
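A minimal, self-contained sketch of the control flow in the quoted pattern. The BOINC calls (boinc_begin_critical_section, boinc_end_critical_section, boinc_status) are replaced by hypothetical stubs so it compiles without boinc_api.h, and the kernel launch is a placeholder. It only shows the ordering: enter the critical section before touching the GPU, check the quit/abort flags between kernels, and release the critical section once the GPU is idle.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the BOINC API named in the quote; a real
 * app would include boinc_api.h and use boinc_status and
 * boinc_begin/end_critical_section() instead. */
struct fake_status { bool quit_request; bool abort_request; };
static struct fake_status status;     /* set by the client in real BOINC */
static int critical_depth = 0;        /* > 0 while the client must not kill us */
static int quit_at_step = -1;         /* test hook: step at which the fake client asks us to quit */

static void begin_critical(void) { critical_depth++; }
static void end_critical(void)   { if (critical_depth > 0) critical_depth--; }

/* Simulates the client raising the quit flag asynchronously. */
static void fake_client_poll(int step)
{
    if (step == quit_at_step) status.quit_request = true;
}

/* Placeholder for "launch kernel, then synchronise"
 * (e.g. cudaThreadSynchronize() in CUDA 3.x); does nothing here. */
static void run_kernel_and_sync(int step) { (void)step; }

/* Runs up to n_steps GPU steps and returns how many were completed. */
static int run_steps(int n_steps)
{
    begin_critical();                  /* GPU in use: don't allow a hard kill */
    for (int step = 0; step < n_steps; step++) {
        fake_client_poll(step);
        if (status.quit_request || status.abort_request) {
            /* The previous kernel was synchronised, so the GPU is idle. */
            end_critical();
            return step;               /* real app: while (1) boinc_sleep(1); */
        }
        run_kernel_and_sync(step);
    }
    end_critical();
    return n_steps;
}
```

Returning instead of sleeping forever is a simplification so the sketch can be tested; in the quoted pattern the app parks in boinc_sleep() and the client terminates it after the critical section has been released.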

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 21335 - Posted: 7 Jun 2011 | 15:49:28 UTC - in response to Message 21254.

No. This is a cuda3.1 app.

Yet I don't understand what "kill your application in the middle" means. In the middle of what?

gdf

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1626
Credit: 9,384,566,723
RAC: 19,075,423
Level
Tyr
Message 21336 - Posted: 7 Jun 2011 | 15:59:00 UTC - in response to Message 21335.

The BOINC API library code is not threadsafe. If BOINC calls for the application to quit or suspend during computation, BOINC may terminate threads in an unsafe way. The new nVidia drivers, which can handle CUDA 4 apps, are much more sensitive to this behaviour, even if the running app only uses a lower CUDA level. In self-protection, nVidia has written the new drivers (everything strictly *later than* 266.58, from memory) to down-clock the card into a protective state when the abnormal thread termination is detected.

That's my layman's interpretation - I'll try and get you the full report quickly.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1626
Credit: 9,384,566,723
RAC: 19,075,423
Level
Tyr
Message 21337 - Posted: 7 Jun 2011 | 16:06:15 UTC

Oops, it's just been pointed out to me that this is a Linux beta app, and my remarks have been concentrating on the Windows API, so they're probably not important in this case.

But, since the new API code was only posted last night, it's still worth you knowing about it in preparation for the next Windows application test.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 21347 - Posted: 8 Jun 2011 | 7:27:11 UTC - in response to Message 21337.

The first results are fine. We are going to produce a beta for Windows today.
This application will replace all production apps this week, if all goes well.
gdf

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 21351 - Posted: 9 Jun 2011 | 10:05:54 UTC - in response to Message 21347.

Windows application is out.

gdf

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2356
Credit: 16,377,532,759
RAC: 3,459,749
Level
Trp
Message 21354 - Posted: 9 Jun 2011 | 13:24:40 UTC - in response to Message 21351.

I got two of these beta WUs. Both of them failed immediately with exit code -1073741819 (0xc0000005). Maybe they can't stand overclocking?

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 21358 - Posted: 9 Jun 2011 | 14:49:15 UTC - in response to Message 21354.
Last modified: 9 Jun 2011 | 17:55:13 UTC

acemdbeta_6.38_windows_intelx86__cuda31 - Application Error

The exception unknown software exception (0xc0000005) occurred in the application at location 0x0040258c.

Click OK to terminate the program
Click on CANCEL to debug the program

System: 2003 Server x64 i7-2600K, 8GB DDR3, 2TB, GTX470 (native clocks, increased fan speeds)

Task ran for 1h45min but stayed at 0% complete, apparently going through a loop.

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 21359 - Posted: 9 Jun 2011 | 15:50:16 UTC - in response to Message 21358.

My only two other Betas also failed, after 7 and 15 sec.

Worth noting: if there is a pop-up error message and you don't choose to end the task, it will continue running indefinitely.
So if anyone has such a message, do something about it, or you will just keep running the same erroneous Beta task.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2356
Credit: 16,377,532,759
RAC: 3,459,749
Level
Trp
Message 21360 - Posted: 9 Jun 2011 | 16:00:53 UTC - in response to Message 21359.

I got two more of these 6.38 betas; both of them failed immediately, just like the previous ones. There was a pop-up application error message.

Application Failure acemdbeta_6.38_windows_intelx86__cuda31 0.0.0.0 in acemdbeta_6.38_windows_intelx86__cuda31 0.0.0.0 at offset 00002c58

These 6.38 beta WUs seem to fail on every computer, so I guess I shouldn't blame the overclocking. :)

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 21361 - Posted: 9 Jun 2011 | 16:47:55 UTC - in response to Message 21360.

No, I am trying a quick change in a few minutes to see if it works. Otherwise, it will take more time to debug it.

gdf

Profile nenym
Joined: 31 Mar 09
Posts: 137
Credit: 1,308,230,581
RAC: 0
Level
Met
Message 21362 - Posted: 9 Jun 2011 | 17:02:31 UTC

The same here.
Run time 66.171875
CPU time 0
Interesting: Swan_sync is set and works with standard/long run tasks.

<core_client_version>6.10.60</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
]]>
Win XP 64bit, GTX 560.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 21363 - Posted: 9 Jun 2011 | 17:04:11 UTC - in response to Message 21362.

acemdbeta_6.39 replaces acemdbeta_6.38, hopefully with better results...

gdf

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 21364 - Posted: 9 Jun 2011 | 17:18:48 UTC - in response to Message 21363.
Last modified: 9 Jun 2011 | 17:49:43 UTC

This 6.39 task reached 10% complete in 17min 43sec on a stock GTX470 (System), so estimated run time is 3h.
98% GPU Utilization, 315MB video memory usage.

Looks good so far... 20%
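The 3h figure is a linear extrapolation: 17 min 43 sec is 1,063 s, and 1,063 s / 0.10 is about 10,630 s, roughly 2h 57min. A small C helper doing the same sum; it assumes progress advances linearly with elapsed time, which is an approximation rather than something the app guarantees (the function name is hypothetical).

```c
/* Linear extrapolation of total run time from elapsed seconds and the
 * fraction done (0..1). Returns -1.0 when no progress has been made yet. */
static double estimate_total_seconds(double elapsed_s, double fraction_done)
{
    if (fraction_done <= 0.0)
        return -1.0;              /* no estimate possible at 0% */
    return elapsed_s / fraction_done;
}
```

estimate_total_seconds(17 * 60 + 43, 0.10) gives about 10,630 s; the task finished in 2h 55min, so the extrapolation was close.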

Profile nenym
Joined: 31 Mar 09
Posts: 137
Credit: 1,308,230,581
RAC: 0
Level
Met
Message 21365 - Posted: 9 Jun 2011 | 19:04:40 UTC
Last modified: 9 Jun 2011 | 19:08:56 UTC

A strange application, that 6.39 beta.
The 6.39 CPU process started with high priority. The system GUI was sluggish, with 1-2 minute response times. The Boinc GUI froze and the Boinc core restarted (CPU tasks without a checkpoint restarted from zero progress). I set the priority of the 6.39 CPU process to low with Process Tamer (the PT GUI took about 2 minutes to respond, even with high priority set for the PT process!), the Boinc core restarted again, and now things seem to be OK.
After 12 min: 8% progress.
Win XP 64bit, GTX 560Ti, 925MHz.

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 21366 - Posted: 9 Jun 2011 | 20:20:36 UTC - in response to Message 21364.
Last modified: 9 Jun 2011 | 22:38:37 UTC

Task 4069069, work unit 2520078: sent 9 Jun 2011 17:12:15 UTC, reported 9 Jun 2011 20:10:53 UTC, Completed and validated; run time 10,510.59 s, CPU time 10,444.27 s, claimed credit 7,491.18, granted credit 11,236.77; ACEMD beta version v6.39 (cuda31)

Well, this task ran and finished (2h 55min) without error on the 6.39 app. Hopefully other results are just as positive.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 21367 - Posted: 9 Jun 2011 | 21:07:19 UTC - in response to Message 21366.

Any problem?

gdf

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2356
Credit: 16,377,532,759
RAC: 3,459,749
Level
Trp
Message 21368 - Posted: 9 Jun 2011 | 21:21:47 UTC - in response to Message 21367.
Last modified: 9 Jun 2011 | 21:38:29 UTC

I got one 6.39 beta WU; it has been running for 16 minutes now, 15% completed, 98% GPU usage (i7-950 @ 3.56GHz, GTX 580 @ 890MHz, WinXP, Above average priority).

Edit:

I moved this WU to my GTX 590 @ 700MHz (my monitor is connected to this card) and tried it on both GPUs of the 590; I don't experience a sluggish Windows GUI.

After 32 minutes 28.2% completed.

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2356
Credit: 16,377,532,759
RAC: 3,459,749
Level
Trp
Message 21369 - Posted: 9 Jun 2011 | 23:14:03 UTC - in response to Message 21368.

This 6.39 beta WU completed fine in 6438s (1h47m), 3.219 ms/step.
I got another one; its processing will begin in 3 hours, because I'm going to sleep now, so I can't micromanage the processing order of the WUs. :)
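Run time and ms/step together give the length of the run: 6,438 s / 3.219 ms per step = 2,000,000 steps. A sketch of that conversion, assuming the ms/step figure is simply run time divided by the number of steps (startup time ignored); the function name is hypothetical.

```c
/* Number of MD steps implied by a run time (s) and a time per step (ms),
 * rounded to the nearest whole step. */
static long steps_from_runtime(double run_time_s, double ms_per_step)
{
    return (long)(run_time_s * 1000.0 / ms_per_step + 0.5);
}
```

Dividing the other way (run time in ms / steps) turns a step count back into ms/step, which is a fairer way to compare cards than total run time when WUs differ in length.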

Profile nenym
Joined: 31 Mar 09
Posts: 137
Credit: 1,308,230,581
RAC: 0
Level
Met
Message 21370 - Posted: 9 Jun 2011 | 23:17:24 UTC

No problem now; the second task runs without any oddness.
The first task 48-GIANNI_TESTDHFR10-2-5-RND0377: run time 13,000.61 s, CPU time 13,131.97 s, claimed credit 7,491.18, granted credit 11,236.77, ACEMD beta version v6.39 (cuda31). XP 64bit, GTX560 @ 925MHz, driver 257.33.
Interesting stderr at beginning (maybe when system was sluggish):

<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 560 Ti"
# Clock rate: 1.85 GHz
# Total amount of global memory: 1073283072 bytes
# Number of multiprocessors: 8
# Number of cores: 64
SWAN: Using synchronization method 0
MDIO: cannot open file "restart.coor"
No heartbeat from core client for 30 sec - exiting

Other strangeness: CPU time > Run time.

Bedrich Hajek
Joined: 28 Mar 09
Posts: 486
Credit: 11,389,394,370
RAC: 9,104,127
Level
Trp
Message 21371 - Posted: 10 Jun 2011 | 1:56:51 UTC

So far, all the 6.38 and 6.39 Betas crashed with computational errors on my computers, and yes, the computers were sluggish when running these units.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 21372 - Posted: 10 Jun 2011 | 7:49:21 UTC - in response to Message 21371.

Uhmm, I don't really know why. It seems to be a driver thing, as it crashes quickly.

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 21374 - Posted: 10 Jun 2011 | 10:25:50 UTC - in response to Message 21372.
Last modified: 10 Jun 2011 | 10:52:44 UTC

nenym's "No heartbeat from core client for 30 sec - exiting":

This can be caused by the Boinc client, the app, another CPU project, or something else running on the system.
What else are you crunching/running? This is common when the CPU is over-taxed, often by other CPU projects.
Are you using recommended settings (freeing up at least 1 CPU core/thread per GPU and using SWAN_SYNC=0)?

It may be worth noting that GDF's latest tasks use 98% of the GPU, so previous overclocks (Bedrich 755MHz GTX480) that never encountered such high utilization may no longer be reliable.
It's best to Beta test with default/factory settings and recommended GPUGrid settings. Also worth keeping an eye on temperature - I saw a rise of a couple of degrees, but in some cases that might be enough to push the card over the edge.
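The "No heartbeat" exit behaves like a watchdog on the app side: if the app hears nothing from the client for 30 seconds, it assumes the client is gone and quits. A sketch of that check; the 30 s limit is taken from the log message, while the struct and function names are hypothetical and timestamps are passed in explicitly so the logic can be tested. It shows why an over-taxed CPU can trigger the exit: a starved client simply fails to refresh the heartbeat in time.

```c
#include <stdbool.h>

#define HEARTBEAT_LIMIT_S 30.0   /* from "No heartbeat from core client for 30 sec" */

/* Hypothetical app-side watchdog state. */
struct heartbeat_watchdog {
    double last_heartbeat_s;     /* time the last heartbeat arrived */
};

/* Called whenever a heartbeat from the client is received. */
static void note_heartbeat(struct heartbeat_watchdog *w, double now_s)
{
    w->last_heartbeat_s = now_s;
}

/* True if the client has been silent for longer than the limit,
 * i.e. the app would give up and exit. */
static bool heartbeat_expired(const struct heartbeat_watchdog *w, double now_s)
{
    return now_s - w->last_heartbeat_s > HEARTBEAT_LIMIT_S;
}
```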

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 21375 - Posted: 10 Jun 2011 | 10:27:57 UTC - in response to Message 21374.

Any problem with the higher priority?

gdf

Profile nenym
Joined: 31 Mar 09
Posts: 137
Credit: 1,308,230,581
RAC: 0
Level
Met
Message 21377 - Posted: 10 Jun 2011 | 10:46:32 UTC - in response to Message 21374.

- CPU projects running when the issue occurred: Spinhenge, Ibercivis Wilson,
- one core free (4-CPU Xeon, set to use at most 99% of cores),
- Swan_sync has been set for a long time,
- factory clock is 900 MHz; the OC to 925 runs without any issue for any long-run task,
- I did some work in a simple Excel spreadsheet.
The second 6.39 task ran and finished with the same 925 MHz OC and the same CPU applications without issues, but the priority of the CPU 6.39 process had been set to low by PT. I did not try it without PT "renicing".
I know "no heartbeat from core"; it happened when running Ibercivis (not sure which subproject) or CUDA AQUA many months ago.

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 21378 - Posted: 10 Jun 2011 | 10:49:44 UTC - in response to Message 21375.

Not personally, but I suppose that if the GPUGrid priority was too high, it might, under some circumstances, cause a Boinc client to crash. That said, the system would still need to be heavily taxed by other CPU processes.

TylerChris
Joined: 12 Feb 10
Posts: 11
Credit: 50,020,466
RAC: 0
Level
Thr
Message 21379 - Posted: 10 Jun 2011 | 12:04:13 UTC
Last modified: 10 Jun 2011 | 12:05:49 UTC

One failure, as expected, with v6.38
One success with v6.39
Did not notice any issues; it ran a little slower than most here, taking 4.8 hours on a GTX 275, Windows 7 64-bit. :)

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2356
Credit: 16,377,532,759
RAC: 3,459,749
Level
Trp
Message 21380 - Posted: 10 Jun 2011 | 12:36:33 UTC - in response to Message 21378.
Last modified: 10 Jun 2011 | 12:38:41 UTC

I've read somewhere that the priority of any process should not be raised to "high", because it will interfere with csrss.exe, which is an essential subsystem (intentionally running at "high" priority by default) responsible for console UI operations and thread management. So I recommend "above normal" as the maximum priority level for any application (including priority changer tools). So far I've completed seven 6.39 beta WUs fine at this priority level.
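The "never above above-normal" rule of thumb can be enforced with a clamp before the priority is applied. A minimal sketch: the enum and clamp function are hypothetical helpers, and only the Win32 part uses real API (SetPriorityClass, GetCurrentProcess and the *_PRIORITY_CLASS constants), compiled only on Windows.

```c
/* Windows priority classes, lowest to highest (hypothetical portable enum). */
enum prio {
    PRIO_IDLE, PRIO_BELOW_NORMAL, PRIO_NORMAL,
    PRIO_ABOVE_NORMAL, PRIO_HIGH, PRIO_REALTIME
};

/* Never return anything above "above normal". */
static enum prio clamp_priority(enum prio requested)
{
    return requested > PRIO_ABOVE_NORMAL ? PRIO_ABOVE_NORMAL : requested;
}

#ifdef _WIN32
#include <windows.h>
/* Applies the clamped priority class to the current process. */
static BOOL apply_priority(enum prio requested)
{
    static const DWORD classes[] = {
        IDLE_PRIORITY_CLASS, BELOW_NORMAL_PRIORITY_CLASS, NORMAL_PRIORITY_CLASS,
        ABOVE_NORMAL_PRIORITY_CLASS, HIGH_PRIORITY_CLASS, REALTIME_PRIORITY_CLASS
    };
    return SetPriorityClass(GetCurrentProcess(), classes[clamp_priority(requested)]);
}
#endif
```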

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 21381 - Posted: 10 Jun 2011 | 12:41:13 UTC - in response to Message 21379.

Just after doing a system restart, I got this 6.38 error message:

Event Type: Information
Event Source: Application Error
Event Category: (100)
Event ID: 1004
Date: 10/06/2011
Time: 13:15:29
User: N/A
Computer: S
Description:
Reporting queued error: faulting application acemdbeta_6.38_windows_intelx86__cuda31, version 0.0.0.0, faulting module acemdbeta_6.38_windows_intelx86__cuda31, version 0.0.0.0, fault address 0x00002c58.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 41 70 70 6c 69 63 61 74 Applicat
0008: 69 6f 6e 20 46 61 69 6c ion Fail
0010: 75 72 65 20 20 61 63 65 ure ace
0018: 6d 64 62 65 74 61 5f 36 mdbeta_6
0020: 2e 33 38 5f 77 69 6e 64 .38_wind
0028: 6f 77 73 5f 69 6e 74 65 ows_inte
0030: 6c 78 38 36 5f 5f 63 75 lx86__cu
0038: 64 61 33 31 20 30 2e 30 da31 0.0
0040: 2e 30 2e 30 20 69 6e 20 .0.0 in
0048: 61 63 65 6d 64 62 65 74 acemdbet
0050: 61 5f 36 2e 33 38 5f 77 a_6.38_w
0058: 69 6e 64 6f 77 73 5f 69 indows_i
0060: 6e 74 65 6c 78 38 36 5f ntelx86_
0068: 5f 63 75 64 61 33 31 20 _cuda31
0070: 30 2e 30 2e 30 2e 30 20 0.0.0.0
0078: 61 74 20 6f 66 66 73 65 at offse
0080: 74 20 30 30 30 30 32 63 t 00002c
0088: 35 38 58


I was running a Tony long task, no Betas. The Tony task is still running after I closed the Windows pop-up error message.

Boinc log:
10/06/2011 13:15:48 | | Starting BOINC client version 6.12.28 for windows_x86_64
10/06/2011 13:15:48 | | Config: report completed tasks immediately
10/06/2011 13:15:48 | | Config: use all coprocessors
10/06/2011 13:15:48 | | Config: zero long-term debts on startup
10/06/2011 13:15:48 | | log flags: file_xfer, sched_ops, task, coproc_debug, cpu_sched_debug, dcf_debug
10/06/2011 13:15:48 | | log flags: debt_debug, rr_simulation, sched_op_debug
10/06/2011 13:15:48 | | Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.5
10/06/2011 13:15:48 | | Data directory: C:\Documents and Settings\All Users\Application Data\BOINC
10/06/2011 13:15:48 | | Running under account Administrator
10/06/2011 13:15:48 | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz [Family 6 Model 42 Stepping 7]
10/06/2011 13:15:48 | | Processor: 256.00 KB cache
10/06/2011 13:15:48 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 syscall nx lm vmx tm2 popcnt aes pbe
10/06/2011 13:15:48 | | OS: Microsoft Windows Server 2003 "R2": Standard Server x64 Edition, Service Pack 2, (05.02.3790.00)
10/06/2011 13:15:48 | | Memory: 7.98 GB physical, 9.56 GB virtual
10/06/2011 13:15:48 | | Disk: 1.82 TB total, 1.43 TB free
10/06/2011 13:15:48 | | Local time is UTC +1 hours
10/06/2011 13:15:48 | | NVIDIA GPU 0: GeForce GTX 470 (driver version 27533, CUDA version 4000, compute capability 2.0, 1280MB, 1089 GFLOPS peak)
10/06/2011 13:15:48 | | NVIDIA library reports 1 GPU
10/06/2011 13:15:48 | | No ATI library found.
10/06/2011 13:15:48 | GPUGRID | URL http://www.gpugrid.net/; Computer ID 91249; resource share 1000
10/06/2011 13:15:48 | | General prefs: using separate prefs for home
10/06/2011 13:15:48 | | Reading preferences override file
10/06/2011 13:15:48 | | Preferences:
10/06/2011 13:15:48 | | max memory usage when active: 7357.62MB
10/06/2011 13:15:48 | | max memory usage when idle: 7357.62MB
10/06/2011 13:15:48 | | max disk usage: 100.00GB
10/06/2011 13:15:48 | | max CPUs used: 6
10/06/2011 13:15:48 | | (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
10/06/2011 13:15:48 | | [cpu_sched] Request CPU reschedule: Prefs update
10/06/2011 13:15:48 | | [cpu_sched] Request CPU reschedule: Startup
10/06/2011 13:15:48 | | Not using a proxy
10/06/2011 13:16:19 | | [cpu_sched] Request CPU reschedule: Idle state change
10/06/2011 13:16:19 | | [cpu_sched] Request CPU reschedule: periodic CPU scheduling
10/06/2011 13:16:19 | GPUGRID | [debt] CPU ineligible; LTD 0.00
10/06/2011 13:16:19 | | [debt] CPU LTD: adding offset 0.000000
10/06/2011 13:16:19 | GPUGRID | [debt] NVIDIA GPU LTD 0.00 delta 0.00 (0.50*0.00 - 0.00)/1
10/06/2011 13:16:19 | | [debt] NVIDIA GPU LTD: adding offset 0.000000
10/06/2011 13:16:19 | | [cpu_sched] schedule_cpus(): start
10/06/2011 13:16:19 | | [rr_sim] rr_sim start: work_buf_total 4320.86 on_frac 0.975 active_frac 0.897
10/06/2011 13:16:19 | GPUGRID | [rr_sim] 0.00: starting A429-TONI_AGGdense1-8-100-RND0011_0 (0.14 CPU + 1.00 NV)
10/06/2011 13:16:19 | GPUGRID | [rr_sim] 26954.51: A429-TONI_AGGdense1-8-100-RND0011_0 finishes after 3489.82 (151482.68G/43.41G)
10/06/2011 13:16:19 | GPUGRID | [cpu_sched] scheduling A429-TONI_AGGdense1-8-100-RND0011_0 (coprocessor job, FIFO)
10/06/2011 13:16:19 | | [cpu_sched_debug] reserving 1.000000 of coproc CUDA
10/06/2011 13:16:19 | | [cpu_sched] enforce_schedule(): start
10/06/2011 13:16:19 | | [cpu_sched] preliminary job list:
10/06/2011 13:16:19 | GPUGRID | [cpu_sched] 0: A429-TONI_AGGdense1-8-100-RND0011_0 (MD: no; UTS: no)
10/06/2011 13:16:19 | | [cpu_sched] final job list:
10/06/2011 13:16:19 | GPUGRID | [cpu_sched] 6: A429-TONI_AGGdense1-8-100-RND0011_0 (MD: no; UTS: no)
10/06/2011 13:16:19 | GPUGRID | [coproc] Assigning CUDA instance 0 to A429-TONI_AGGdense1-8-100-RND0011_0
10/06/2011 13:16:19 | GPUGRID | [cpu_sched] scheduling A429-TONI_AGGdense1-8-100-RND0011_0
10/06/2011 13:16:19 | GPUGRID | [cpu_sched] A429-TONI_AGGdense1-8-100-RND0011_0 sched state 1 next 2 task state 0
10/06/2011 13:16:19 | GPUGRID | Restarting task A429-TONI_AGGdense1-8-100-RND0011_0 using acemdlong version 613
10/06/2011 13:16:19 | | [cpu_sched] app startup took 21.218748 secs
10/06/2011 13:16:19 | | [cpu_sched] Request CPU reschedule: slow app startup
10/06/2011 13:16:19 | | [cpu_sched] enforce_schedule: end
10/06/2011 13:16:42 | GPUGRID | [debt] CPU ineligible; LTD 0.00
10/06/2011 13:16:42 | | [debt] CPU LTD: adding offset -21.776753
10/06/2011 13:16:42 | GPUGRID | [debt] NVIDIA GPU LTD -11.11 delta -11.11 (0.50*22.22 - 22.22)/1
10/06/2011 13:16:42 | | [debt] NVIDIA GPU LTD: adding offset -11.109374
10/06/2011 13:16:42 | | [cpu_sched] schedule_cpus(): start
10/06/2011 13:16:42 | | [rr_sim] rr_sim start: work_buf_total 4320.86 on_frac 0.975 active_frac 0.897
10/06/2011 13:16:42 | GPUGRID | [rr_sim] 0.00: starting A429-TONI_AGGdense1-8-100-RND0011_0 (0.14 CPU + 1.00 NV)
10/06/2011 13:16:42 | GPUGRID | [rr_sim] 26954.51: A429-TONI_AGGdense1-8-100-RND0011_0 finishes after 3489.82 (151482.68G/43.41G)
10/06/2011 13:16:42 | GPUGRID | [cpu_sched] scheduling A429-TONI_AGGdense1-8-100-RND0011_0 (coprocessor job, FIFO)

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 21382 - Posted: 10 Jun 2011 | 13:13:09 UTC - in response to Message 21381.
Last modified: 10 Jun 2011 | 14:21:33 UTC

New application coming up with ABOVE_NORMAL_PRIORITY_CLASS.
acemdbeta_6.40 is out.

gdf

Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Level
Phe
Message 21383 - Posted: 10 Jun 2011 | 13:29:01 UTC - in response to Message 21382.

Don't really see the need to raise priority above low myself.

I have done it myself and it makes little difference on a machine running other BOINC tasks and will certainly not make any difference on a machine using SWAN synchronization.

As for GPU utilization, you run the risk of taking so much of the GPU that the machine can't perform without turning GPUGrid off.

WARNING: don't try to suck all the juice out of the apple.


____________
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 21384 - Posted: 10 Jun 2011 | 13:47:47 UTC - in response to Message 21383.

I think the intention is to improve stability on some systems. A well optimized system is unlikely to see any benefit, but there are so many poorly configured/optimized systems around that such a potential for overall project improvement must be investigated. In the last year or so I have come across several projects (5 or 6) that have at some stage impacted on other Boinc projects, so some Kevlar skin might help. I do think that such bullet-proofing changes would be best organized through Berkeley, but as soon as just one project bypasses a default setting (and many have), it messes with other projects.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 21385 - Posted: 10 Jun 2011 | 14:26:23 UTC - in response to Message 21384.

The point of slightly increasing the priority is that we sometimes see people with a GTX580 performing at less than half the expected speed. We assume that this is because they are overloading the CPU. Nowadays, the CPU is naturally overloaded due to hyper-threading; hopefully this change will not impact the performance of their system but will guarantee decent responsiveness to the GPU.

If you are using SWAN_SYNC, this is not going to change anything, but using SWAN_SYNC as default is not practicable.

gdf
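The SWAN_SYNC trade-off can be illustrated with a toy cost model. It assumes (the thread does not describe ACEMD's internals) that with SWAN_SYNC the CPU thread spins on the GPU, while without it the thread sleeps and re-checks at fixed intervals. Spinning notices a finished kernel at once but keeps a core busy for the whole kernel; sleeping frees the core but can leave the GPU idle for up to one interval per wait. All names are hypothetical.

```c
#include <stdbool.h>

struct wait_cost {
    double gpu_idle_us;   /* delay between kernel end and the CPU noticing */
    double cpu_busy_us;   /* CPU time burnt while waiting */
};

/* Cost of waiting for one kernel lasting kernel_us microseconds.
 * spin:  poll continuously (idealised: zero detection delay).
 * sleep: wake every interval_us and check (wake-up cost ignored).
 * A non-positive interval is treated as spinning to avoid an endless loop. */
static struct wait_cost simulate_wait(double kernel_us, bool spin, double interval_us)
{
    struct wait_cost c = { 0.0, 0.0 };
    if (spin || interval_us <= 0.0) {
        c.cpu_busy_us = kernel_us;
        return c;
    }
    double t = 0.0;
    while (t < kernel_us)
        t += interval_us;        /* first wake-up at or after the kernel ends */
    c.gpu_idle_us = t - kernel_us;
    return c;
}
```

With an illustrative 3,219 µs wait (borrowed from the 3.219 ms/step reported earlier in the thread) and a 1 ms wake-up interval, the sleeping waiter leaves the GPU idle for 781 µs, roughly a quarter of the wait, while the spinning waiter costs a full core. That matches the direction of the reports here (lower GPU usage without SWAN_SYNC, a busier CPU with it), though real numbers depend on how often the app actually waits.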

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 21386 - Posted: 10 Jun 2011 | 14:44:50 UTC - in response to Message 21385.
Last modified: 11 Jun 2011 | 7:19:26 UTC

Running this 29-GIANNI_TESTDHFR10-0-5-RND5720_2 task on the 6.40 (cuda31) app.
No problems so far. Looks like it will take about 3h on a GTX470.
It's a fairly safe system; only using 6 of 8 threads for CPU crunching (various projects) and SWAN_SYNC=0 in use. I did suspend it and start another GPUGrid task before continuing the Beta, so it seems robust enough.
Again 98% GPU usage and 322MB GDDR.

- Completed and validated; run time 10,414.70 s, CPU time 10,378.75 s, claimed credit 7,491.18, granted credit 11,236.77; ACEMD beta version v6.40 (cuda31)

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 21387 - Posted: 11 Jun 2011 | 8:41:54 UTC - in response to Message 21386.

So far so good.
If there is nothing else, I will pass it to production today.

gdf

Norman_RKN
Joined: 22 Dec 09
Posts: 16
Credit: 23,522,575
RAC: 0
Level
Pro
Message 21543 - Posted: 26 Jun 2011 | 16:26:13 UTC - in response to Message 21387.

I use swan_sync and manually switch the app to the lowest priority, because the system is very laggy.
Without swan_sync there is no problem, but GPU usage is 10% lower.
Only the "toni" WUs run with a more acceptable GPU load, but 90%+ would be better for the project.
Energy is not as cheap as water.

With the newest nvidia beta driver the applications run fine.
No more reboots/crashes where the work done is lost.
Now I can reboot or crash the system without any WU damage. ;)



____________
http://www.rechenkraft.net