Message boards : News : *CXCL12_chalcone_umbrella* batch

Gerard
Joined: 26 Mar 14
Posts: 101
Credit: 0
RAC: 0
Message 42853 - Posted: 28 Feb 2016 | 17:52:41 UTC
Last modified: 28 Feb 2016 | 17:53:17 UTC

Hi everyone,

Yesterday we launched a little over 13,000 short WUs called *CXCL12_chalcone_umbrella*. They are fairly small WUs of 8 ns each (compared with ~40 ns for a normal long WU), and we hope they are easy and fun for you to crunch. Please post any problems you encounter.

Scientists have been using a technique called Umbrella Sampling for some time now, with relative success in determining what we call binding free energy (which indicates how strongly a drug can bind its protein target). It is a pretty straightforward and much less expensive technique compared to the one we regularly use in our lab (adaptive sampling). However, it is particularly error-prone if the assumptions we make are wrong.

We are particularly excited about these WUs because, while most scientific effort has been focused on reproducing free energies for single particular models or (most of the time) very simple toy models, this is the first time to our knowledge that, thanks to the fantastic community we have built together in GPUGRID, we can use this technique in a real case to screen potential drugs binding to CXCL12, a chemokine related to cancer metastasis.

Thanks to everyone for your contribution. I will be happy to assist you if you run into any problems along the way. :)

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Message 42858 - Posted: 28 Feb 2016 | 19:37:34 UTC

Hi Gerard,

These workunits, just like the "GERARD_A2AR_NUL1D" (long) workunits, are running with very low GPU load (32~53%), especially when the CPU is crunching too. Can you comment on this?

[CSF] Thomas H.V. DUPONT
Joined: 20 Jul 14
Posts: 732
Credit: 100,630,366
RAC: 0
Message 42861 - Posted: 29 Feb 2016 | 7:26:36 UTC - in response to Message 42853.

Thanks for the heads-up Gerard! :)
____________
[CSF] Thomas H.V. Dupont
Founder of the team CRUNCHERS SANS FRONTIERES 2.0
www.crunchersansfrontieres

Gerard
Joined: 26 Mar 14
Posts: 101
Credit: 0
RAC: 0
Message 42862 - Posted: 29 Feb 2016 | 9:35:08 UTC - in response to Message 42858.
Last modified: 22 Mar 2016 | 11:22:58 UTC

Hi Retvari,

I've been reading this same question a lot in the forums, so I think you deserve a concrete answer. The workunits that you refer to include "extra" forces that, because of the simulator implementation, cannot be calculated on the GPU and must be calculated on the CPU. I guess that the communication between CPU and GPU at every simulation step is definitely a bottleneck that decreases GPU performance.

I've made some figures for the curious ones, explaining these "extra forces".

*A2AR* batch




These are membrane systems. You can see the protein in yellow, embedded in a membrane (sticks) and solvated in a water box (blue cage). In these models, we place a drug (marked as "ligand") in the extracellular space, where it has to bind with its receptor (the protein).
Because of the way we simulate, these systems have a property we call "periodic boundary conditions", meaning that molecules at the edges of the box interact with the opposite side and can actually cross over to it. This has the nice effect that the system behaves as if it were solvated in a "non-finite" box. A side effect is that the ligand can jump from one side of the box to the other. However, this doesn't make any biological sense, because drugs placed in the extracellular space can't freely access the intracellular space (they can't jump the membrane!). To ensure that the ligand stays in the extracellular phase, for each simulation frame we calculate the position of the ligand, and if it is higher than the red line we apply a downward force to make it stay. This arbitrary "force" is considered extra and must be calculated on the CPU.
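
For the curious, the restraint described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code the simulator actually runs: the function names, the harmonic form of the push-back, the force constant `k`, and the box dimensions are all hypothetical.

```python
def wrap_periodic(z, box_length):
    """Map a coordinate back into [0, box_length), as periodic boundaries do."""
    return z % box_length


def upper_wall_force(z, z_limit, k=10.0):
    """Force along z from a one-sided restraint.

    Below z_limit the ligand feels nothing; above it, a push back down
    that grows with the distance past the limit.
    k is a hypothetical force constant in arbitrary units.
    """
    excess = z - z_limit
    if excess <= 0.0:
        return 0.0
    return -k * excess


box = 100.0      # hypothetical box height
red_line = 80.0  # hypothetical position of the upper limit ("red line")

# Without the restraint, a ligand drifting past the top of the box
# reappears at the bottom, i.e. on the wrong side of the membrane.
print(wrap_periodic(103.0, box))

# With the restraint, it is pushed back before it can get that far.
for z in (50.0, 79.0, 82.0):
    print(z, upper_wall_force(z, red_line))
```

The check `excess <= 0.0` is what makes the force one-sided: the ligand moves freely below the limit and is only pushed once it crosses it.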

*umbrella* batch



In this case, we use a technique called "Umbrella Sampling" to calculate the binding free energy of the ligand (in yellow) to the protein (in blue). This technique consists of assuming that the unbinding (or the binding) occurs along a linear path (green line), and what we do is simulate the ligand at different positions along this pathway (ideally every 0.5 angstroms). After the simulation, we calculate how stable the ligand was at each position, and using a mathematical framework called WHAM (the Weighted Histogram Analysis Method) we can calculate the probabilities that the ligand goes from unbound to bound. Now comes the "extra force": in order to force the ligand to sample the desired position along the pathway, we apply a force (in red) to make it stay there. This force increases as the ligand moves further away from the initial position, giving the potential the profile of an inverted umbrella (in black). This is why it is called "Umbrella sampling". Again, this "extra force" must be calculated on the CPU...
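
As a minimal sketch of the bias described above, assuming the usual harmonic form for the umbrella potential (the function names, the force constant `k`, and the path length are hypothetical, and this is not the simulator's actual code):

```python
def window_centers(start, stop, spacing=0.5):
    """Restraint positions along the pathway, one per simulation window.

    Distances are in angstroms; 0.5 is the spacing mentioned in the text.
    """
    n = int(round((stop - start) / spacing)) + 1
    return [start + i * spacing for i in range(n)]


def umbrella_energy(x, x0, k):
    """Harmonic bias energy: zero at the window center x0, rising on both sides."""
    return 0.5 * k * (x - x0) ** 2


def umbrella_force(x, x0, k):
    """Restoring force: the negative derivative of the bias energy."""
    return -k * (x - x0)


centers = window_centers(0.0, 2.0)  # hypothetical 2-angstrom stretch of the path
print(centers)

# A ligand that has drifted 0.5 angstroms past its window center is pulled back.
print(umbrella_energy(1.5, 1.0, k=2.0), umbrella_force(1.5, 1.0, k=2.0))
```

Each window is a separate simulation with its own `x0`, which is part of why a batch like this splits naturally into many small work units.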

I hope you found it useful or interesting! If you have any questions, please do not hesitate to ask. :)

EDIT
I have revised the A2AR_1D batch because it was indeed looking much slower than the other A2AR batches. I've made a change that should speed up any new 1D WUs you get. I'll be sending the *1Dx* batch soon; tell me if it is faster for you guys.

Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Message 42863 - Posted: 29 Feb 2016 | 11:12:28 UTC - in response to Message 42862.

Thanks for the explanation and your time, Gerard. It is what makes me want to crunch GPUGrid, unlike other projects that explain nothing.

kain
Joined: 3 Sep 14
Posts: 152
Credit: 641,182,245
RAC: 0
Message 42864 - Posted: 29 Feb 2016 | 11:16:09 UTC

Great news! Thank you!

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 42865 - Posted: 29 Feb 2016 | 14:09:11 UTC

Very good explanation Gerard, now we know what we crunch.
Thank you.
____________
Greetings from TJ

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Message 42866 - Posted: 29 Feb 2016 | 23:51:30 UTC - in response to Message 42862.

Thank you, Gerard

[CSF] Thomas H.V. DUPONT
Joined: 20 Jul 14
Posts: 732
Credit: 100,630,366
RAC: 0
Message 42867 - Posted: 1 Mar 2016 | 6:12:39 UTC - in response to Message 42862.

Thanks for your time Gerard! Really appreciated! :)

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 42868 - Posted: 1 Mar 2016 | 8:47:33 UTC

My statistics are still rather thin, but with one GERARD_CXCL12_chalcone_umbrella completed on each of four Maxwell cards:

GTX 960: GPU Load 34% (supported by 1 core of i7-4790)
MCL: 16%
Power: 33.3% TDP = 40 watts
Time: 4 hours 11 minutes (ave between the two cards for 2 work units)

GTX 750 Ti: GPU Load 49% (supported by 1 core of i7-4770)
MCL: 10%
Power: 31.3% TDP = 19 watts
Time: 5 hours 6 minutes (ave between the two cards for 2 work units)

None of the cards were overclocked by me, and only minimally factory-overclocked. They were so lightly loaded that they usually ran at less than their maximum GPU clock settings.

So it seems that the GTX 750 Ti is more efficient, running at 80% of the speed of the GTX 960, but using only 50% of the power.
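
To put numbers on that comparison, here is a small sketch that turns the reported power draw and run times into energy per work unit. The function name is hypothetical, and it assumes the reported wattage stays constant over the whole run and counts GPU power only, neither of which the figures above guarantee.

```python
def energy_per_wu_wh(power_watts, hours, minutes):
    """Energy for one work unit in watt-hours, assuming constant power draw."""
    return power_watts * (hours + minutes / 60.0)


# Figures taken from the measurements above.
gtx960_wh = energy_per_wu_wh(40, 4, 11)   # about 167 Wh
gtx750ti_wh = energy_per_wu_wh(19, 5, 6)  # about 97 Wh

# Relative speed: a shorter run time means a faster card.
speed_ratio = (4 * 60 + 11) / (5 * 60 + 6)  # 750 Ti vs 960, about 0.82
power_ratio = 19 / 40                       # about 0.48
energy_ratio = gtx750ti_wh / gtx960_wh      # about 0.58

print(f"GTX 960:    {gtx960_wh:.1f} Wh per WU")
print(f"GTX 750 Ti: {gtx750ti_wh:.1f} Wh per WU")
print(f"speed {speed_ratio:.2f}, power {power_ratio:.2f}, energy {energy_ratio:.2f}")
```

Under those assumptions the GTX 750 Ti needs roughly 58% of the energy the GTX 960 uses for each work unit.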

Trotador
Joined: 25 Mar 12
Posts: 103
Credit: 9,769,314,893
RAC: 39,662
Message 42872 - Posted: 1 Mar 2016 | 19:27:29 UTC

Thanks for the explanation Gerard!

This one fails at upload on all hosts, after completing correctly, with:

<error_code>-131 (file size too big)</error_code>

https://www.gpugrid.net/workunit.php?wuid=11490316

Bedrich Hajek
Joined: 28 Mar 09
Posts: 467
Credit: 8,187,896,966
RAC: 10,567,140
Message 42873 - Posted: 2 Mar 2016 | 1:24:31 UTC - in response to Message 42862.



EDIT
I have revised the A2AR_1D batch because it was indeed looking much slower than the other A2AR batches. I've made a change that should speed up any new 1D WUs you get. I'll be sending the *1Dx* batch soon; tell me if it is faster for you guys.



It looks like your fix is working:


name: e1s18_1-GERARD_A2AR_NUL1Dx2-0-2-RND6828
application: Long runs (8-12 hours on fastest card)
created: 29 Feb 2016 | 11:06:27 UTC
canonical result: 14973218
granted credit: 227,850.00
minimum quorum: 1
initial replication: 1
max # of error/total/success tasks: 7, 10, 6

Task: 14973218
Computer: 263612
Sent: 29 Feb 2016 | 22:01:58 UTC
Time reported: 1 Mar 2016 | 7:13:49 UTC
Status: Completed and validated
Run time (sec): 26,058.87
CPU time (sec): 25,948.73
Credit: 227,850.00
Application: Long runs (8-12 hours on fastest card) v8.48 (cuda65)


https://www.gpugrid.net/workunit.php?wuid=11503823



name: e1s2_1-GERARD_A2AR_luf6632_b_1Dx2-1-2-RND8928
application: Long runs (8-12 hours on fastest card)
created: 1 Mar 2016 | 8:34:30 UTC
canonical result: 14975190
granted credit: 227,850.00
minimum quorum: 1
initial replication: 1
max # of error/total/success tasks: 7, 10, 6

Task: 14975190
Computer: 263612
Sent: 1 Mar 2016 | 11:12:20 UTC
Time reported: 2 Mar 2016 | 0:13:09 UTC
Status: Completed and validated
Run time (sec): 26,523.80
CPU time (sec): 26,430.66
Credit: 227,850.00
Application: Long runs (8-12 hours on fastest card) v8.48 (cuda65)




https://www.gpugrid.net/workunit.php?wuid=11504861


Gerard
Joined: 26 Mar 14
Posts: 101
Credit: 0
RAC: 0
Message 42875 - Posted: 2 Mar 2016 | 12:23:11 UTC - in response to Message 42872.

We've changed the size limit for these WUs. I hope this fixes the problem in the new WUs. Sorry for the inconvenience!

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 42886 - Posted: 4 Mar 2016 | 16:02:25 UTC - in response to Message 42875.

We've changed the size limit for these WUs. I hope this fixes the problem in the new WUs. Sorry for the inconvenience!


Which problem is this intended to fix?

This workunit:
https://www.gpugrid.net/workunit.php?wuid=11490978

... had 3 task failures for:

<message>
upload failure: <file_xfer_error>
<file_name>chalcone537x1x47-GERARD_CXCL12_chalcone_umbrella-0-1-RND4302_2_9</file_name>
<error_code>-131 (file size too big)</error_code>
</file_xfer_error>
</message>


https://www.gpugrid.net/result.php?resultid=14972428
https://www.gpugrid.net/result.php?resultid=14973031
https://www.gpugrid.net/result.php?resultid=14975154

Why are these failing?

Bedrich Hajek
Joined: 28 Mar 09
Posts: 467
Credit: 8,187,896,966
RAC: 10,567,140
Message 42896 - Posted: 6 Mar 2016 | 9:11:19 UTC - in response to Message 42895.

Correction: it should be like this.

For these low GPU usage and high CPU usage WUs, if you use this:

<app_config>
  <app>
    <name>acemdshort</name>
    <gpu_versions>
      <gpu_usage>.5</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>


in your app_config.xml file, you can increase GPU usage from the 30-40% range to the 60-70% range, depending on your hardware and OS, because BOINC then runs two tasks on each GPU at once.

That's what is happening on my computers.
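
As a rough sketch of what those two settings mean in practice: `gpu_usage` is the fraction of a GPU budgeted to each task, and `cpu_usage` is the number of CPU cores budgeted per task. The function names below are hypothetical, and this is plain arithmetic on the config values, not BOINC's actual scheduler.

```python
import math


def tasks_per_gpu(gpu_usage):
    """How many tasks fit on one GPU when each is budgeted `gpu_usage` of it."""
    # The small epsilon guards against floating-point round-off in 1 / gpu_usage.
    return math.floor(1.0 / gpu_usage + 1e-9)


def cpu_cores_budgeted(n_gpus, gpu_usage, cpu_usage):
    """CPU cores set aside for GPU tasks across all GPUs in the host."""
    return n_gpus * tasks_per_gpu(gpu_usage) * cpu_usage


# The values from the app_config.xml above, on a hypothetical two-GPU host.
print(tasks_per_gpu(0.5))
print(cpu_cores_budgeted(n_gpus=2, gpu_usage=0.5, cpu_usage=1))
```

So with these settings a two-GPU host would run four short tasks at once and set aside four CPU cores for them; it is worth making sure that many cores are actually free.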


Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Message 42897 - Posted: 6 Mar 2016 | 9:23:17 UTC - in response to Message 42896.
Last modified: 6 Mar 2016 | 9:25:24 UTC

Also note that this app_config.xml should be placed in the project's folder under the BOINC folder.
For example on Windows Vista, 7, 8, 8.1, 10:

c:\ProgramData\BOINC\projects\www.gpugrid.net\

on Windows XP in the following folder:
c:\Documents and Settings\All Users\Application Data\BOINC\projects\www.gpugrid.net\

These short workunits generate very large output files (60~140MB), sometimes larger than a long run's output, so it's no wonder that the server runs out of space and that some contributors' ADSL connections (like mine) get congested by continuous uploads. So if these workunits become the "standard", the limit of 2 workunits per GPU should be raised to 3 per GPU.

Bedrich Hajek
Joined: 28 Mar 09
Posts: 467
Credit: 8,187,896,966
RAC: 10,567,140
Message 42898 - Posted: 6 Mar 2016 | 11:26:25 UTC - in response to Message 42897.

The website access is also slow at times.


I hope these WUs do not become the "standard". That would be a total waste of high end GPU video cards.




Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 42899 - Posted: 6 Mar 2016 | 11:47:18 UTC - in response to Message 42897.
Last modified: 6 Mar 2016 | 12:01:03 UTC

2 umbrella models running on GTX970’s (1 task per GPU):
GPU power @ 83% & 84%
GPU0 @ 1316MHz, GPU1 @ 1050MHz (it downclocked itself)
Temps @ 59°C and 50°C (with power limited to 83% - now at 90%)
GPU usage was 43% and 33%
~600MB GDDR each.

A few suggestions:
If trying to run 2 Umbrella tasks at a time, free up more CPU headroom and stop running long tasks (might hog the GPU).
Using NVIDIA Inspector (NVI) could help to set/fix the clocks (GPU and Memory) and raising the Memory from 3005MHz to 3505MHz might help a bit too (partially reduce the bottlenecks; CPU usage/communication supposedly higher with these tasks).
- When these WU’s dry up you might need to revert/change settings again.

If you’re running a mix of long and short tasks and the GPU frequency drops when running a short task, try changing/fixing the GPU settings using NVI and suspend any short tasks on slow GPUs to start a long WU. In theory, when going back to the short WU it might run at the new settings (say, 1316MHz rather than 1050MHz). If not, try again with a restart (or preferably a cold restart) after suspending the short task. Existing tasks might want to keep their GPU settings, but new short tasks should use your NVI-defined settings.

Website slow for me too.
43% utilization is low, but running 2 tasks would be about the same as one normal task in terms of GPU usage. It's a challenge but it's doable and these WU's won't be around for ever. There are also 'normal' GPU utilizing long WU's.

Upload file sizes are set to prevent upload of massive erroneous data.

Good luck,
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Trotador
Joined: 25 Mar 12
Posts: 103
Credit: 9,769,314,893
RAC: 39,662
Message 42900 - Posted: 6 Mar 2016 | 14:56:21 UTC

I'm now getting the "transient upload error" and "server is out of disk space" messages for three units that are failing to upload.

Bedrich Hajek
Joined: 28 Mar 09
Posts: 467
Credit: 8,187,896,966
RAC: 10,567,140
Message 42901 - Posted: 6 Mar 2016 | 15:13:27 UTC - in response to Message 42900.

I'm now getting the "transient upload error" and "server is out of disk space" messages for three units that are failing to upload.



Same here. I will soon finish crunching all my GPUGRID tasks and won't be able to download any more. Good thing I have a backup project.





Erich56
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 21,893,126
Message 42905 - Posted: 6 Mar 2016 | 16:22:15 UTC

same problem with my 4 hosts: no upload of finished WUs possible, no download of new WUs for crunching.

Server breakdown at GPUGRID ?

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 42906 - Posted: 6 Mar 2016 | 16:44:02 UTC - in response to Message 42905.

It means the server is out of disk space, again!

Erich56
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 21,893,126
Message 42907 - Posted: 6 Mar 2016 | 17:00:40 UTC - in response to Message 42906.

It means the server is out of disk space, again!

any rough idea when this will be fixed ?

Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Message 42908 - Posted: 6 Mar 2016 | 17:03:42 UTC - in response to Message 42907.

How could anyone on here and not involved with the servers possibly answer that question?

Erich56
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 21,893,126
Message 42909 - Posted: 6 Mar 2016 | 17:05:44 UTC - in response to Message 42898.

... I hope these WUs do not become the "standard". That would be a total waste of high end GPU video cards.

I bought 3 of those recently, just for sake of GPUGRID computing


Erich56
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 21,893,126
Message 42910 - Posted: 6 Mar 2016 | 17:09:02 UTC - in response to Message 42908.

How could anyone on here and not involved with the servers possibly answer that question?

is it impossible that anyone involved with the servers is also reading in the Forum?

Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Message 42911 - Posted: 6 Mar 2016 | 17:11:27 UTC - in response to Message 42910.

How could anyone on here and not involved with the servers possibly answer that question?

is it impossible that anyone involved with the servers is also reading in the Forum?


I'm sure that if they were aware of the problem and able to do something they would stick another floppy disk in.

Erich56
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 21,893,126
Message 42912 - Posted: 6 Mar 2016 | 17:35:46 UTC - in response to Message 42911.

... they would stick another floppy disk in.

yes, indeed :-)

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Message 42913 - Posted: 6 Mar 2016 | 18:59:57 UTC - in response to Message 42912.

... they would stick another floppy disk in.

yes, indeed :-)
Perhaps we should donate an 8TB HDD for the project. Who's in?

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 42914 - Posted: 6 Mar 2016 | 19:10:18 UTC

Yep, I have 3 GPUs, and all 6 of my tasks are completed with uploads stalled. I suppose this is a good test of GPU backup projects (they are working on Asteroids, Einstein, and SETI)... but I hope the GPUGrid admins get their upload/space issues resolved :)

Profile Dave GPU
Joined: 21 May 14
Posts: 12
Credit: 1,175,961,380
RAC: 0
Message 42915 - Posted: 6 Mar 2016 | 19:30:22 UTC

Think I will give this a rest till long wu are back.

Vagelis Giannadakis
Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Message 42916 - Posted: 6 Mar 2016 | 19:41:57 UTC - in response to Message 42913.

... they would stick another floppy disk in.

yes, indeed :-)
Perhaps we should donate an 8TB HDD for the project. Who's in?


Count me in. Also, why not a couple of them?

Erich56
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 21,893,126
Message 42917 - Posted: 6 Mar 2016 | 20:10:27 UTC - in response to Message 42913.
Last modified: 6 Mar 2016 | 20:10:52 UTC

... they would stick another floppy disk in.

yes, indeed :-)
Perhaps we should donate an 8TB HDD for the project. Who's in?

I am in :-)

Nick Name
Joined: 3 Sep 13
Posts: 53
Credit: 1,533,531,731
RAC: 0
Message 42918 - Posted: 6 Mar 2016 | 20:18:24 UTC - in response to Message 42913.

... they would stick another floppy disk in.

yes, indeed :-)
Perhaps we should donate an 8TB HDD for the project. Who's in?

First, I'd like to know how much space job records like these are eating.

22 Jul 2014 invalid: https://www.gpugrid.net/workunit.php?wuid=9910195
5 Mar 2015 invalid: https://www.gpugrid.net/workunit.php?wuid=10721767


____________
Team USA forum | Team USA page
Join us and #crunchforcures. We are now also folding:join team ID 236370!

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Message 42919 - Posted: 6 Mar 2016 | 20:38:41 UTC - in response to Message 42918.

... they would stick another floppy disk in.

yes, indeed :-)
Perhaps we should donate an 8TB HDD for the project. Who's in?

First, I'd like to know how much space job records like these are eating.

22 Jul 2014 invalid: https://www.gpugrid.net/workunit.php?wuid=9910195
5 Mar 2015 invalid: https://www.gpugrid.net/workunit.php?wuid=10721767
Those have no results, so I don't think they consume much.
However, the recent 10,000 short runs consume 60~120MB each, so all together they could take 600~1200GB, or twice as much if they are two-step (if I'm decoding the '0-1' suffix in their names correctly).
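
A quick back-of-the-envelope version of that estimate, as a sketch. The function name is hypothetical, decimal units (1 GB = 1000 MB) are assumed, and the per-result sizes are the rough figures quoted above rather than measured server data.

```python
def batch_storage_gb(n_workunits, mb_per_result, steps=1):
    """Total output size of a batch in GB, assuming 1 GB = 1000 MB."""
    return n_workunits * mb_per_result * steps / 1000.0


# Single-step estimate for 10,000 work units at 60 to 120 MB each.
low = batch_storage_gb(10_000, 60)
high = batch_storage_gb(10_000, 120)
print(f"one step:  {low:.0f} to {high:.0f} GB")

# Same batch if every work unit really has two steps.
print(f"two steps: {batch_storage_gb(10_000, 60, steps=2):.0f} "
      f"to {batch_storage_gb(10_000, 120, steps=2):.0f} GB")

# Using the "a bit more than 13,000" figure from the batch announcement instead.
print(f"13,000 WU: {batch_storage_gb(13_000, 60):.0f} "
      f"to {batch_storage_gb(13_000, 120):.0f} GB")
```

Whichever count is right, the total lands somewhere between several hundred GB and a couple of TB.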

sis651
Joined: 25 Nov 13
Posts: 66
Credit: 193,925,538
RAC: 0
Message 42930 - Posted: 7 Mar 2016 | 8:17:31 UTC

I think these are the work units to run on notebooks. As notebooks are thin devices, most don't have enough ventilation to stay cool under loads such as GPUGRID work units.

However, my notebook with a GT740M runs at about 85-90% GPU usage with these units, and at 65°C with 3 WCG work units running on the CPU. This is much cooler than with other units, which reach 80°C or more in the same situation. :)

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 42934 - Posted: 7 Mar 2016 | 12:43:13 UTC - in response to Message 42914.

Yep, I have 3 GPUs, and all 6 of my tasks are completed with uploads stalled. I suppose this is a good test of GPU backup projects (they are working on Asteroids, Einstein, and SETI)... but I hope the GPUGrid admins get their upload/space issues resolved :)

Also, POEM@home and Milkyway@home

Emailed Gianni regarding the disks full/upload problem.

Matt
Joined: 11 Jan 13
Posts: 216
Credit: 846,538,252
RAC: 0
Message 42937 - Posted: 7 Mar 2016 | 13:00:57 UTC

The team is aware of the problem. They're working on it.

wiyosaya
Joined: 22 Nov 09
Posts: 114
Credit: 589,114,683
RAC: 0
Message 42938 - Posted: 7 Mar 2016 | 13:06:19 UTC - in response to Message 42900.

I have been getting the "server is out of disk space" message for the past 24 hours.

Gerard
Joined: 26 Mar 14
Posts: 101
Credit: 0
RAC: 0
Message 42957 - Posted: 8 Mar 2016 | 16:15:19 UTC - in response to Message 42938.

Should be fixed by now. Please let us know if you are still having this error!

fractal
Joined: 16 Aug 08
Posts: 87
Credit: 1,248,879,715
RAC: 0
Message 42959 - Posted: 8 Mar 2016 | 17:08:46 UTC - in response to Message 42934.

Yep, I have 3 GPUs, and all 6 of my tasks are completed with uploads stalled. I suppose this is a good test of GPU backup projects (they are working on Asteroids, Einstein, and SETI)... but I hope the GPUGrid admins get their upload/space issues resolved :)

Also, POEM@home and Milkyway@home

Emailed Gianni regarding the disks full/upload problem.

Don't forget Moo. I like it because the work units are short ( 20 minutes ) so my machine can get back to GPUgrid quicker. Though, that doesn't help when we go from 2-3 units/day to 2-3 days between units like I have seen lately.

Oh, and no disc full messages for me. Just HTTP errors. Still seeing those even now.

Erich56
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 21,893,126
Message 42970 - Posted: 9 Mar 2016 | 21:21:19 UTC

no more "Umbrella" WUs coming ?

Bedrich Hajek
Joined: 28 Mar 09
Posts: 467
Credit: 8,187,896,966
RAC: 10,567,140
Message 42971 - Posted: 9 Mar 2016 | 23:19:21 UTC - in response to Message 42970.

no more "Umbrella" WUs coming ?



I hope not. They were a pain!





John C MacAlister
Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Message 42972 - Posted: 10 Mar 2016 | 5:25:31 UTC - in response to Message 42971.

Oh, I dunno: I processed 28 of these WUs successfully!


no more "Umbrella" WUs coming ?



I hope not. They were a pain!






Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 42974 - Posted: 10 Mar 2016 | 9:23:54 UTC - in response to Message 42972.
Last modified: 10 Mar 2016 | 9:41:17 UTC

Got through 17 brollies without error but while the fastest took 8,187sec (2h 16min) the slowest took 13,463 sec (3h 44min) - the GPU downclocked - it 'thought' it wasn't using enough resources to justify remaining at a high frequency (boosters off).

As I like a bit of a challenge I don't mind troubleshooting and tuning specifically for these tasks, forcing the clocks to remain high and might have run 2 tasks at a time if work continued to flow, but even doing this didn't yield great performances.

Some of the performances look horrific, especially for those hampered by the WDDM overhead. I've seen a GTX970 on Linux finish in half the time of my 970 (the card that didn't downclock) and it looks like lots of devices down-clocked to barely functional levels - 28Ksec vs 4Ksec.

The big issue was the output file size. That's what caused the "server is out of disk space" errors and stopped people uploading results, until the batch was withdrawn and disk space freed up.

Releasing as Beta might have been a better option.

Jozef J
Joined: 7 Jun 12
Posts: 112
Credit: 1,118,845,172
RAC: 0
Message 42975 - Posted: 10 Mar 2016 | 12:05:54 UTC

All (71) · In progress (2) · Pending (0) · Valid (21) · Invalid (1) · Error (47) ... 47 errors out of 71. I'm now crunching only on an old GTX 680 and a laptop GTX 960 4GB.
I haven't checked my stats or the message board here for a long time, and now all I can say is wow :-)) I hope they fix it soon for the new gf.. :-)

Trotador
Joined: 25 Mar 12
Posts: 103
Credit: 9,769,314,893
RAC: 39,662
Message 42982 - Posted: 11 Mar 2016 | 19:40:00 UTC

141 WUs crunched, with only a few errors at the beginning due to the incorrect upload size limit in the WU; smooth after that.

Sure, I have a good internet connection.

Bedrich Hajek
Joined: 28 Mar 09
Posts: 467
Credit: 8,187,896,966
RAC: 10,567,140
Message 43006 - Posted: 15 Mar 2016 | 0:00:01 UTC
Last modified: 15 Mar 2016 | 0:00:20 UTC

I like these new GERARD_CXCL12_BestUmbrella WUs. They have good GPU usage and the output files were not too big. I successfully completed 3 so far.

Good work!

Can I have some more of them?

Gerard
Joined: 26 Mar 14
Posts: 101
Credit: 0
RAC: 0
Message 43012 - Posted: 15 Mar 2016 | 10:31:38 UTC - in response to Message 43006.

I am sorry for the Umbrella runs, they were somewhat experimental. However, the results look promising so far.

I can try to tune some parameters for future releases like the file size (we may be able to reduce it without big impact on the analysis results) but unfortunately they will still be quite CPU consuming because of the aforementioned reasons. Could you post the main reasons why they were a "pain"? :) We can try to find a common solution.

I am now sending plenty of classic long WU that should push your GPUs to the limit! :) However, I plan to launch more Umbrella short WU in the future, which to me seem ideal for old GPUs.

I'll keep you posted!

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Message 43015 - Posted: 15 Mar 2016 | 12:20:03 UTC - in response to Message 43012.
Last modified: 15 Mar 2016 | 12:20:22 UTC

Could you post the main reasons why they were a "pain"? :) We can try to find a common solution.
1. High CPU usage -> low GPU usage -> need of 2 simultaneous short tasks per GPU -> need of 3 short tasks per GPU in the queue
(presently the limit is 2 per GPU)
2. Large output files combined with short runtimes -> upload congestion on the user's side, and your server runs out of space
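
The arithmetic behind both points can be sketched roughly as follows. This is only an illustration in Python, not anything taken from the project: the function names are hypothetical, the numbers in the usage lines are made-up placeholders rather than measurements from this batch, and it assumes that a finished task keeps occupying its slot in the per-GPU limit until its upload has completed.

```python
import math


def upload_seconds(file_mb: float, uplink_mbps: float) -> float:
    """Seconds needed to upload one result file, ignoring protocol overhead."""
    return file_mb * 8 / uplink_mbps


def queue_slots_needed(concurrent: int, runtime_s: float, upload_s: float) -> int:
    """Tasks per GPU needed in the queue so that the GPU never runs dry.

    Assumption: a finished task still counts against the per-GPU limit
    while it is uploading. With `concurrent` tasks running at once, about
    concurrent * upload_s / runtime_s slots are tied up by uploads on
    average, so that many extra slots (rounded up) are needed.
    """
    blocked_by_uploads = concurrent * upload_s / runtime_s
    return concurrent + math.ceil(blocked_by_uploads)


# Placeholder values only: a 100 MB result, a 1 Mbps uplink, 1 hour per task.
upload_time = upload_seconds(100, 1)
slots = queue_slots_needed(2, 3600, upload_time)
```

With these placeholder values the upload takes 800 seconds and 3 slots are needed, while with instant uploads a limit of 2 would be enough for 2 simultaneous tasks.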

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 43018 - Posted: 15 Mar 2016 | 20:13:27 UTC - in response to Message 43015.

Also, the lack of testing, communication, advice on settings/setup, and GPU clocks dropping off to non-boost rates.
Larger output files catch people out with contention, bandwidth limiting, peak hours throttling and possibly disk space and RAM for some.
Not enough tasks to go around means we have to add other projects or run dry.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Bedrich Hajek
Joined: 28 Mar 09
Posts: 467
Credit: 8,187,896,966
RAC: 10,567,140
Level
Tyr
Message 43020 - Posted: 15 Mar 2016 | 23:47:33 UTC - in response to Message 43012.

I am sorry for the Umbrella runs, they were somewhat experimental. However, the results look promising so far.

I can try to tune some parameters for future releases like the file size (we may be able to reduce it without big impact on the analysis results) but unfortunately they will still be quite CPU consuming because of the aforementioned reasons. Could you post the main reasons why they were a "pain"? :) We can try to find a common solution.

I am now sending plenty of classic long WU that should push your GPUs to the limit! :) However, I plan to launch more Umbrella short WU in the future, which to me seem ideal for old GPUs.

I'll keep you posted!


I agree pretty much with what Retvari Zoltan and skgiven mentioned in their posts.

Though I don't think having 3 short tasks per GPU would have made much of a difference (with 2 or 3 finished WUs per GPU unable to upload and no new WUs arriving, I still would have spent most of that Sunday crunching my backup project).

What would have made a difference is not releasing "somewhat experimental" WUs on weekends, when staffing is limited. Release them during the week, when almost everybody is at work to deal with potential problems.




Erich56
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 21,893,126
Level
Tyr
Message 43021 - Posted: 16 Mar 2016 | 9:26:44 UTC - in response to Message 43015.

[quote]need of 3 short tasks per GPU in the queue (presently the limit is 2 per GPU)[/quote]


Once these WUs are distributed again, it really would make a lot of sense to raise the limit from 2 to 3 per GPU!

Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Level
Phe
Message 43022 - Posted: 16 Mar 2016 | 9:46:58 UTC - in response to Message 43021.
Last modified: 16 Mar 2016 | 10:09:55 UTC

[quote][quote]need of 3 short tasks per GPU in the queue (presently the limit is 2 per GPU)[/quote]

Once these WUs are distributed again, it really would make a lot of sense to raise the limit from 2 to 3 per GPU![/quote]



I disagree: it would mean even more units held up for 5 days by the hosts that never return a completed WU and those that error after a long period of time.

We need different queues: one for the really fast cards such as 980ti, 980, Titan, 970, 780ti, with a 2-day deadline. Mid cards could have a 3-day deadline. Slow cards could remain on 5 days, with an adjusted percentage of WUs allocated to each queue.
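
As a rough sketch of what such tiering could look like (in Python, purely for illustration): only the fast-card list and the 2/3/5-day values come from the paragraph above. The mid-tier entries, the function name, and the simple substring matching are assumptions, and this is not how the project's scheduler actually works.

```python
# Fast cards are the ones named above; 2 / 3 / 5 days are the proposed deadlines.
FAST_CARDS = ("980ti", "980", "titan", "970", "780ti")
# Hypothetical mid-tier entries, chosen only so the example has a middle tier.
MID_CARDS = ("960", "770")


def deadline_days(gpu_model: str) -> int:
    """Return the proposed deadline in days for a GPU model string."""
    # Normalise so that "GTX 980 Ti" and "gtx980ti" compare the same way.
    name = gpu_model.lower().replace(" ", "")
    if any(card in name for card in FAST_CARDS):
        return 2
    if any(card in name for card in MID_CARDS):
        return 3
    return 5  # everything else stays on the current 5-day deadline


fast = deadline_days("GeForce GTX 980 Ti")
slow = deadline_days("GTX 750 Ti")
```

Here the 980 Ti falls into the 2-day queue and the 750 Ti stays on 5 days. A real implementation would need a maintained list of models, or a measured host speed, rather than name matching.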

We could also reduce more quickly the number of WUs available to hosts that don't return results or that consistently error.

However, as Gerard has said in another post, he is already overrun with work and probably does not have the time to do any of these things.

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 43024 - Posted: 16 Mar 2016 | 13:38:58 UTC
Last modified: 16 Mar 2016 | 13:39:26 UTC

To offer a minority view, I was quite happy to get them. I don't normally run the shorts, but the science looks very interesting, and my GTX 750 Tis are not that good on the longs anymore. If they don't use much GPU power, but more CPU power, that is OK if that is what the calculations call for. You can't change the science or math just to heat up the cards more. Also, I have large enough upload bandwidth (4 Mbps) that I did not notice any problems there, or with memory, etc. But a warning about all of this would undoubtedly be a good idea, since it may push many machines over the edge in one way or another.

Erich56
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 21,893,126
Level
Tyr
Message 43025 - Posted: 16 Mar 2016 | 14:32:59 UTC - in response to Message 43022.

However, as Gerard has said in another post, he is already overrun with work and probably does not have the time to do any of these things.

That's why we should all hope that the new students who were expected in January will finally come on board.

The amount of work which Gerard is doing, all by himself, is terrific! I guess, at some point he deserves rest and recreation :-)

Erich56
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 21,893,126
Level
Tyr
Message 43072 - Posted: 24 Mar 2016 | 8:41:06 UTC - in response to Message 43015.

High CPU usage -> low GPU usage -> need of 2 simultaneous short tasks per GPU -> need of 3 short tasks per GPU in the queue
(presently the limit is 2 per GPU)


The current short runs "Enamine_Umbrella" use some 50-60% of a high-end GPU.
As already said above by one of our power crunchers: the limit of 2 such WUs per GPU should be increased to 3, if not to 4.

Erich56
Joined: 1 Jan 15
Posts: 1090
Credit: 6,603,906,926
RAC: 21,893,126
Level
Tyr
Message 43143 - Posted: 4 Apr 2016 | 4:50:55 UTC - in response to Message 43072.

the current short runs "Enamine_Umbrella" use some 50-60% of a high-end GPU.
As already said above by one of our power crunchers: the limit of 2 such WUs per GPU should be increased to 3, if not to 4.

Any news on this?

Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Level
Trp
Message 43148 - Posted: 4 Apr 2016 | 23:07:04 UTC - in response to Message 43143.

the current short runs "Enamine_Umbrella" use some 50-60% of a high-end GPU.
As already said above by one of our power crunchers: the limit of 2 such WUs per GPU should be increased to 3, if not to 4.

any news on this?

The recent workunits are not *that* problematic, so this is not that important right now.

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 43156 - Posted: 6 Apr 2016 | 13:51:47 UTC - in response to Message 43148.

There are no short runs at present. When there are short runs, there aren't always many WUs and the batches don't last as long - Server Status
