Long WUs are out - 50% bonus

ignasi

Joined: 10 Apr 08
Posts: 254
Credit: 16,836,000
RAC: 0
Message 19912 - Posted: 14 Dec 2010, 19:05:15 UTC

Double-sized WUs are out: *variant*_long*. They include a 50% bonus on the credits.
ID: 19912
Saenger

Joined: 20 Jul 08
Posts: 134
Credit: 23,657,183
RAC: 0
Message 19913 - Posted: 14 Dec 2010, 19:48:47 UTC - in response to Message 19912.  

Double-sized WUs are out: *variant*_long*. They include a 50% bonus on the credits.

Is there a way to opt out of those monsters, as they would probably miss the 2-day deadline on normal computers?
Or do we have to abort them manually? And how do we recognize them?
Greetings from Saenger

For questions about Boinc look in the BOINC-Wiki
ID: 19913
Retvari Zoltan

Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 193,866
Message 19916 - Posted: 15 Dec 2010, 6:20:05 UTC - in response to Message 19912.  
Last modified: 15 Dec 2010, 6:22:25 UTC

The first of these completed in 7h32m (27,116.891 s).
Time per step (avg over 2,500,000 steps): 10.847 ms
Claimed credit: 23,878 (a 50% increase compared to a *_IBUCH_?_pYEEI_long_* WU)
Granted credit: 35,817 (standard 50% fast-return bonus)
That works out to 1.32 credits/sec, which is not bad at all :) My fastest GIANNI_DHFR1000 gives 1.5 credits/sec.
GPU usage is 62-64% on GTX 580, and 62-67% on GTX 480.
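As a sanity check, the credit arithmetic above can be reproduced in a few lines. The values are the ones quoted in this post; this is just an illustration, not project code:

```python
# Values quoted in the post above.
claimed = 23878          # claimed credit for the long WU
runtime_s = 27116.891    # 7h32m runtime, in seconds

# GPUGRID grants a standard 50% bonus for fast returns.
granted = claimed * 1.5

# Effective credit rate for this task.
rate = granted / runtime_s

print(round(granted), round(rate, 2))
```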
ID: 19916
ftpd

Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Message 19917 - Posted: 15 Dec 2010, 9:08:18 UTC
Last modified: 15 Dec 2010, 9:10:09 UTC

10-IBUCH_8_variantP_long-0-2-RND0787_0
Workunit 2166428
Created 14 Dec 2010 19:08:59 UTC
Sent 14 Dec 2010 22:55:38 UTC
Received 15 Dec 2010 9:05:29 UTC
Server state Over
Outcome Success
Client state None
Exit status 0 (0x0)
Computer ID 35174
Report deadline 19 Dec 2010 22:55:38 UTC
Run time 35497.448383
CPU time 35436.44
stderr out <core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 480"
# Clock rate: 1.40 GHz
# Total amount of global memory: 1610153984 bytes
# Number of multiprocessors: 15
# Number of cores: 120
SWAN: Using synchronization method 0
MDIO ERROR: cannot open file "restart.coor"
# Time per step (avg over 2500000 steps): 14.200 ms
# Approximate elapsed time for entire WU: 35500.370 s
called boinc_finish

</stderr_txt>
]]>


Validate state Valid
Claimed credit 23878.0787037037
Granted credit 35817.1180555555
application version ACEMD2: GPU molecular dynamics v6.13 (cuda31)

--------------------------------------------------------------------------------
Sent 14 Dec 2010 22:55:38 UTC · Received 15 Dec 2010 9:05:29 UTC · Completed and validated · Run time 35,497.45 s · CPU time 35,436.44 s · Claimed 23,878.08 · Granted 35,817.12 · ACEMD2: GPU molecular dynamics v6.13 (cuda31)

--------------------------------------------------------------------------------
My first long WU!

Give me more please!
Ton (ftpd) Netherlands
ID: 19917
ignasi

Joined: 10 Apr 08
Posts: 254
Credit: 16,836,000
RAC: 0
Message 19919 - Posted: 15 Dec 2010, 11:12:20 UTC - in response to Message 19917.  

Well,

It seems that long WUs aren't, after all, that much of a gain for everybody. We have dropped by 1,000 WUs in progress in a single day. We may be pushing the computations too far here. The last thing we want is to scare people away from GPUGRID.

We are going to reconsider the strategy for "fast-track" WUs.

What do you think?

cheers,
ignasi
ID: 19919
ftpd

Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Message 19920 - Posted: 15 Dec 2010, 11:17:32 UTC - in response to Message 19919.  
Last modified: 15 Dec 2010, 11:18:36 UTC

Ignasi,

If I had to choose between very long WUs and small quick WUs, I'd take the quick ones, but then a greater number of them.

I am NOT scared, but give some cards (like the gts250) something to crunch too.
Ton (ftpd) Netherlands
ID: 19920
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,053,468,649
RAC: 1,308,024
Message 19921 - Posted: 15 Dec 2010, 11:48:47 UTC - in response to Message 19919.  

It seems that long WUs aren't, after all, that much of a gain for everybody. We have dropped by 1,000 WUs in progress in a single day. We may be pushing the computations too far here. The last thing we want is to scare people away from GPUGRID.

You may have to consider external factors, as well as your own internal choices.

Do you have a medium/long term record of that "WUs in progress" figure? I suspect that you may have been affected by that other big NVidia beast in the BOINC jungle - SETI@home.

They have been effectively out of action since the end of October. You may well have been benefitting from extra resources during that time. SETI has been (slowly and intermittently) getting back up to speed over the last week or so, and as they do so, you will inevitably lose some volunteers (or some share of their machines).

Having said that, I've got IBUCH_*_variantP_long running on all four hosts at the moment - I'll comment on your other questions when they've finished, in 5 - 38 hours from now.
ID: 19921
ignasi

Joined: 10 Apr 08
Posts: 254
Credit: 16,836,000
RAC: 0
Message 19922 - Posted: 15 Dec 2010, 11:53:41 UTC - in response to Message 19920.  

Sure.

I am submitting non-long WUs at the moment.

There's plenty of work for everybody always.

i
ID: 19922
skgiven
Volunteer moderator
Volunteer tester

Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 19923 - Posted: 15 Dec 2010, 12:12:53 UTC - in response to Message 19922.  

We have dropped by 1,000 WUs in progress in a single day.

"In progress" includes both running tasks and queued tasks.
As running tasks now take longer, there are fewer queued tasks; that's why fewer are in progress. You will need to wait a few days for things to equilibrate.

Don't hit the panic button.
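A toy sketch of why the "in progress" count falls without anyone leaving: assuming each client keeps a fixed buffer of work (the one-day buffer below is an invented example, not a project statistic), doubling the per-WU runtime roughly halves the number of queued tasks.

```python
BUFFER_HOURS = 24.0  # hypothetical client work buffer: one day

def queued_tasks(wu_runtime_hours):
    """Number of tasks a client fetches to fill its work buffer."""
    return int(BUFFER_HOURS // wu_runtime_hours)

# Normal WUs vs double-sized long WUs: half as many sit in the queue.
print(queued_tasks(6.0), queued_tasks(12.0))
```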
ID: 19923
Retvari Zoltan

Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 193,866
Message 19926 - Posted: 15 Dec 2010, 13:47:39 UTC - in response to Message 19919.  
Last modified: 15 Dec 2010, 14:16:20 UTC

Well,

It seems that long WUs aren't, after all, that much of a gain for everybody. We have dropped by 1,000 WUs in progress in a single day. We may be pushing the computations too far here. The last thing we want is to scare people away from GPUGRID.

We are going to reconsider the strategy for "fast-track" WUs.

What do you think?

cheers,
ignasi

Ignasi, you don't read the other topics in your forum, do you? :)

I think you should separate the _long_ workunits from the normal WUs, or even create _short_ WUs for crunchers with older cards. If you can't develop an automated separation process, you should make it possible for users to do it on their own (but not by aborting long WUs one by one manually). Some computers are equipped with multiple, very different cards, so this is a complicated problem.

The best solution would be limiting the running time of a WU instead of using a fixed simulation timeframe. (As far as I know, this is almost impossible for GPUGRID to implement.) My other project (rosetta@home) gives the user the opportunity to set a desired WU running time; you should offer this option in some way too. My computers are on 24/7, so I could (and would) do even a 100ns simulation if it were up to me, and if the credit/time ratio were the same (or higher).

Another way around this problem: you could create a new BOINC project for these _long_ WUs - let's say it's called FermiGRID - and encourage users with faster cards to join FermiGRID and set GPUGRID as a backup project. You should contact NVidia about the naming of the new project before it starts; maybe they would consider it advertisement (and give you something in exchange), or maybe they would consider it a trademark infringement and send their lawyers to sue you for it. :)
ID: 19926
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,053,468,649
RAC: 1,308,024
Message 19927 - Posted: 15 Dec 2010, 14:15:39 UTC - in response to Message 19921.  

...
Having said that, I've got IBUCH_*_variantP_long running on all four hosts at the moment - I'll comment on your other questions when they've finished, in 5 - 38 hours from now.

Well, one has now failed - task 3445790.

Exit status -40 (0xffffffffffffffd8) 
SWAN: FATAL : swanBindToTexture1D failed -- texture not found

I hope that isn't a low memory outcome on a 512MB card - if it is, the others are going to go the same way.

And I hope the new shorter batch isn't going to be all *_HIVPR_n1_[un]bound_* - those are just as difficult on my cards.
ID: 19927
ignasi

Joined: 10 Apr 08
Posts: 254
Credit: 16,836,000
RAC: 0
Message 19928 - Posted: 15 Dec 2010, 14:16:00 UTC - in response to Message 19926.  

@skgiven
Correct.
It's a matter of a few days, as you say.

@Retvari Zoltan
I do my best, thank you.

The *long* WUs were never meant to be something regular; we have always said that.
The problem is that I have overused them in the rush to get key results back for publication. The solution is not just extending WU length; it has to be well thought out, and the degrees of freedom to adjust pinpointed. Classifying WUs by card compute capacity is certainly an option.

Back to science,
i
ID: 19928
Retvari Zoltan

Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 193,866
Message 19930 - Posted: 15 Dec 2010, 15:33:13 UTC - in response to Message 19928.  
Last modified: 15 Dec 2010, 15:37:28 UTC

I do my best, thank you.

I'm sorry, I didn't mean to offend you.

The *long* WUs were never meant to be something regular; we have always said that.
The problem is that I have overused them in the rush to get key results back for publication.

We crunchers, on the other end of the project, see this problem from a very different viewpoint. When a WU fails, the cruncher is disappointed, and if many WUs keep failing, the cruncher will leave this project for a more successful one.
We don't see the progress of the subprojects you are working on, and cannot choose the appropriate subproject for our GPUs.

The solution is not just extending WU length; it has to be well thought out, and the degrees of freedom to adjust pinpointed. Classifying WUs by card compute capacity is certainly an option.

I'm just guessing (and I suppose every cruncher on this forum is) what the best solution for the project could be, because I don't have the information needed to pick (or invent) the right one. But you (I mean GPUGRID) have that information: the precise number of crunchers, and their WU return times. You just have to process that information wisely.

You should create a little application for simulating GPUGRID (if you don't have one already). This application has to have some variables in it, for example WU length, subprojects, and processing reliability. If you play with those variables a little from time to time, you can choose the best thing to change in the whole project. You can even simulate a very different GPUGRID, to see if the new one is worth the hassle - just like when SETI was transformed to BOINC.
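A Monte-Carlo simulator of the kind suggested here could start as small as the sketch below. Every number in it (WU length, failure rate, host speed spread) is invented for illustration; none of it is GPUGRID data.

```python
import random

def simulate(wu_hours, failure_rate, n_hosts=1000, seed=42):
    """Toy model: how many results come back, and how fast, on average."""
    rng = random.Random(seed)
    returned, total_time = 0, 0.0
    for _ in range(n_hosts):
        speed = rng.uniform(0.5, 2.0)    # relative host speed (made up)
        if rng.random() < failure_rate:
            continue                     # WU errored out, no result
        returned += 1
        total_time += wu_hours / speed   # wall-clock hours on this host
    return returned, total_time / max(returned, 1)

# Play with the variables: short reliable WUs vs long, failure-prone ones.
print(simulate(6.0, 0.05))
print(simulate(12.0, 0.15))
```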
ID: 19930
ftpd

Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Message 19933 - Posted: 15 Dec 2010, 17:04:55 UTC - in response to Message 19922.  

Ignasi,

My gts250 has NOT received any downloads for 24 hrs now.

All Kashiv WUs are failing after several hours on this card.

Is that the reason?


Ton (ftpd) Netherlands
ID: 19933
dataman

Joined: 18 Sep 08
Posts: 36
Credit: 100,352,867
RAC: 0
Message 19935 - Posted: 15 Dec 2010, 17:28:20 UTC

I just cruised by to say I'm lovin' the new wu's. ~21 hours on GTX260 but only ~55% utilization of the card. ??? Nice credits too :)

Good job.

ID: 19935
Saenger

Joined: 20 Jul 08
Posts: 134
Credit: 23,657,183
RAC: 0
Message 19936 - Posted: 15 Dec 2010, 18:39:28 UTC - in response to Message 19928.  

The problem is that I have overused them in the rush to get key results back for publication. The solution is not just extending WU length; it has to be well thought out, and the degrees of freedom to adjust pinpointed. Classifying WUs by card compute capacity is certainly an option.

That's what I asked for in this thread.

At the moment, especially by sending such monsters to everyone participating (even normal crunchers who run their cards only 8h per day, and not on the latest, most expensive cards), you are alienating those crunchers. I had to abort one of these monsters after 15h because it would never have made the 48h deadline; better to waste just 15h than 48h.

As you seem to know quite well beforehand how demanding your WUs will be (the fixed credits are a good clue to that), you could adjust by only allowing certain types of WUs for each cruncher, e.g. normally nothing above a 5,000-credit claim for mine, or up to an 8,000-credit claim if nothing else is available.

I think you could even do that in the scheduler. I'm no programmer, but it shouldn't be that hard to put the GPUs in our computers into classes and send WUs according to their capabilities; those capabilities are known to BOINC.
Greetings from Saenger

For questions about Boinc look in the BOINC-Wiki
ID: 19936
ftpd

Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Message 19937 - Posted: 15 Dec 2010, 18:57:35 UTC

Computer ID 47762
Report deadline 19 Dec 2010 23:11:00 UTC
Run time 63982.28125
CPU time 17871.09
stderr out <core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
# Using device 0
# There are 2 devices supporting CUDA
# Device 0: "GeForce GTX 295"
# Clock rate: 1.24 GHz
# Total amount of global memory: 939327488 bytes
# Number of multiprocessors: 30
# Number of cores: 240
# Device 1: "GeForce GTX 295"
# Clock rate: 1.24 GHz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 30
# Number of cores: 240
MDIO ERROR: cannot open file "restart.coor"
ERROR: file tclutil.cpp line 31: get_Dvec() element 0 (b)
called boinc_finish

</stderr_txt>
]]>


Validate state

This one failed after almost 18 hrs. Windows XP - GTX 295.

Next one, please
Ton (ftpd) Netherlands
ID: 19937
ftpd

Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Message 19939 - Posted: 15 Dec 2010, 20:58:58 UTC

Computer ID 47762
Report deadline 19 Dec 2010 23:11:00 UTC
Run time 71377.171875
CPU time 18345.22
stderr out <core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
# Using device 1
# There are 2 devices supporting CUDA
# Device 0: "GeForce GTX 295"
# Clock rate: 1.24 GHz
# Total amount of global memory: 939327488 bytes
# Number of multiprocessors: 30
# Number of cores: 240
# Device 1: "GeForce GTX 295"
# Clock rate: 1.24 GHz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 30
# Number of cores: 240
MDIO ERROR: cannot open file "restart.coor"
# Using device 1
# There are 2 devices supporting CUDA
# Device 0: "GeForce GTX 295"
# Clock rate: 1.24 GHz
# Total amount of global memory: 939327488 bytes
# Number of multiprocessors: 30
# Number of cores: 240
# Device 1: "GeForce GTX 295"
# Clock rate: 1.24 GHz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 30
# Number of cores: 240
# Time per step (avg over 65000 steps): 34.528 ms
# Approximate elapsed time for entire WU: 86320.913 s
called boinc_finish

</stderr_txt>
]]>


Validate state Valid
Claimed credit 23878.0787037037
Granted credit 35817.1180555555
application version ACEMD2: GPU molecular dynamics v6.13 (cuda31)

--------------------------------------------------------------------------------


--------------------------------------------------------------------------------
This one was successful.

Next one is processing!
Ton (ftpd) Netherlands
ID: 19939
Werkstatt

Joined: 23 May 09
Posts: 121
Credit: 397,300,664
RAC: 7,295
Message 19940 - Posted: 15 Dec 2010, 21:11:52 UTC

Ignasi,
is there a way to sort out the long WUs with an app_info.xml? If, for example, the 6.13 app were only for the shorter WUs and a (suggested) 6.14 app for the longer WUs, everyone could decide for himself which type he wants to crunch. And without an app_info he would get whatever is available.
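For context, selecting a specific application version via BOINC's anonymous platform involves an app_info.xml roughly like the sketch below. The app name and executable filename here are guesses based on the version string quoted in this thread (ACEMD2 v6.13); the real filenames would have to be taken from the project directory.

```xml
<app_info>
  <app>
    <name>acemd2</name>
  </app>
  <file_info>
    <name>acemd_6.13.exe</name>  <!-- hypothetical filename -->
    <executable/>
  </file_info>
  <app_version>
    <app_name>acemd2</app_name>
    <version_num>613</version_num>
    <coproc>
      <type>CUDA</type>
      <count>1</count>
    </coproc>
    <file_ref>
      <file_name>acemd_6.13.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
```

Note that an app_version selects an application, not a WU batch, so this only helps if the project actually ties long WUs to a separate version number.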
Alexander
ID: 19940
skgiven
Volunteer moderator
Volunteer tester

Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 19948 - Posted: 16 Dec 2010, 10:44:57 UTC - in response to Message 19940.  

Presently, tasks are allocated according to the CUDA capability of your card, which in turn is determined by your driver:

Use a full driver after 197.45 and you will get the 6.13 app to run tasks on.
Use 197.45 or earlier (down to 195) and you will use the 6.12 app to run tasks.

So if we extended the existing system with another app, we would still be asking crunchers to uninstall and reinstall drivers to crunch small tasks.
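The allocation rule described above amounts to a small lookup. This sketch only restates the post's thresholds in code; it is an illustration, not the actual scheduler logic:

```python
def app_for_driver(driver_version):
    """Map an NVidia driver version to the app it gets, per the post above."""
    if driver_version < 195.0:
        return None        # below the supported minimum: no tasks
    if driver_version <= 197.45:
        return "6.12"      # 197.45 or earlier (down to 195)
    return "6.13"          # full driver after 197.45

print(app_for_driver(196.21), app_for_driver(258.96))
```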

Some of the recent suggestions are starting to go round in circles; many of the suggestions we have seen in the last day or so have been suggested before, some several times.

The best place for suggestions is the Wish List.
ID: 19948

©2025 Universitat Pompeu Fabra