WARNING/CHALLENGE: VERY LONG WU (VERYLONG_CXCL12_confAna)

Beyond
Message 39651 - Posted: 23 Jan 2015, 18:39:50 UTC - in response to Message 39636.  
Last modified: 23 Jan 2015, 18:47:47 UTC

OK, so here are the KISS ('Keep it simple, stupid') instructions.

1) Check if you are running a GPUGrid task with 'GERARD_VERYLONG_CXCL12_confAna' in the task name - or if you have one waiting to start. If you can't see one anywhere, relax and do nothing (except maybe open a beer).

2) If you have one of these tasks, read to the end of these instructions. If you feel confident about following them and carrying out the (very simple) edit required - carry on. If you don't feel confident, abort the task - running without editing it is a waste of time.

3) OK, you've decided to edit. First, shut down BOINC, making sure that the science applications shut down as well.

4) Find the file 'client_state.xml' in your BOINC Data folder. Under Windows, if you accepted the default installation setting, this is likely to be C:\ProgramData\BOINC; under Linux, it might be /var/lib/boinc.

5) Open client_state.xml for editing, using a plain-text editor. Under Windows, Notepad or one of its replacements like Notepad++ is recommended. Linux users are on their own here, but probably know their own toolsets.

6) Search the file for that GERARD_VERYLONG_CXCL12_confAna text we started with. There will be many search hits: we are looking for a <file>...</file> section like this:

<file>
    <name>2x23-GERARD_VERYLONG_CXCL12_confAna-0-1-RND3250_0_9</name>
    <nbytes>0.000000</nbytes>
    <max_nbytes>128000000.000000</max_nbytes>
    <status>0</status>
    <upload_url>http://www.gpugrid.org/PS3GRID_cgi/file_upload_handler</upload_url>
</file>

7) Make sure you have exactly the right section: the last number before </name> should be _9, and there should be an <upload_url> line.

8) Change the first three digits after <max_nbytes> from 128 to 256. Just those three digits - don't accidentally delete any punctuation, change the number of zeroes, or make any other change.

9) Repeat steps (6), (7) and (8) for each separate VERYLONG task that you have on the system.

10) Save the file, restart BOINC, and relax. All done.
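
For anyone who would rather script steps (6) to (8) than edit by hand, here is a minimal sketch. It is not part of Richard's instructions: the function name raise_upload_limit is made up for illustration, and the pattern assumes the <file> or <file_info> layout shown in this thread. BOINC must still be shut down first, and keeping a backup copy of client_state.xml is strongly advised.

```python
import re

# One <file>...</file> or <file_info>...</file_info> block; both spellings
# appear in this thread, depending on the client version.
FILE_BLOCK = re.compile(r"<(file|file_info)>.*?</\1>", re.DOTALL)


def raise_upload_limit(xml_text, marker="GERARD_VERYLONG_CXCL12_confAna",
                       old="128000000", new="256000000"):
    """Return (patched_text, number_of_blocks_changed).

    Only blocks whose <name> contains `marker`, ends in _9, and which carry
    an upload URL line are touched, mirroring the checks in step (7).
    """
    changed = 0

    def patch(match):
        nonlocal changed
        block = match.group(0)
        name = re.search(r"<name>(.*?)</name>", block)
        is_target = (
            name is not None
            and marker in name.group(1)
            and name.group(1).endswith("_9")
            and ("<upload_url>" in block or "<url>" in block)
        )
        if not is_target:
            return block
        # Swap only the three leading digits; the zeroes and the decimal
        # point that follow them are left exactly as they were.
        patched = block.replace(f"<max_nbytes>{old}.", f"<max_nbytes>{new}.")
        if patched != block:
            changed += 1
        return patched

    return FILE_BLOCK.sub(patch, xml_text), changed
```

One would read client_state.xml into a string, pass it through raise_upload_limit, check that the returned count matches the number of VERYLONG tasks on the host, and only then write the text back.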

Thanks Richard. I found 3 of the verylong WUs and executed your fix. One thing that may make this easier: I simply searched for _0_9 and in each case the first instance found was the correct one. Just make sure it's a verylong WU and not a Noelia that you're editing. All 3 of these are on 750Ti cards and completion looks to be around 50 hours.

Edit: Hmm, from the example above it looks like some of the WUs may have _1_9 instead of the _0_9 that all of mine had:

<file>
<name>2x10-GERARD_VERYLONG_CXCL12_confAna-0-1-RND3907_1_9</name>
<nbytes>0.000000</nbytes>
<max_nbytes>256000000.000000</max_nbytes>
<status>0</status>
<upload_url>http://www.gpugrid.org/PS3GRID_cgi/file_upload_handler</upload_url>
</file>

Gerard
Message 39652 - Posted: 23 Jan 2015, 18:40:34 UTC

The BOINC administrator just raised the upload limit to 512 MB; please let us know if you can upload the WU now.

Richard Haselgrove
Message 39653 - Posted: 23 Jan 2015, 18:49:14 UTC - in response to Message 39652.  

The BOINC administrator just raised the upload limit to 512 MB; please let us know if you can upload the WU now.

It should be fine for newly created WUs, but I'm not sure whether the change will propagate to automatically-generated replacements for tasks which fail - we'll need to keep an eye on those.

It certainly won't be passed to tasks which are already 'out in the field' - on volunteers' computers. They will have to be modified manually, or allowed to fail.

Retvari Zoltan
Message 39654 - Posted: 23 Jan 2015, 18:55:31 UTC

The upload of the first result is finished.
Here are the details: 3x17-GERARD_VERYLONG_CXCL12_confAna-0-1-RND9026_1

[AF>Amis des Lapins] Phil1966
Message 39655 - Posted: 23 Jan 2015, 18:55:31 UTC - in response to Message 39653.  

The BOINC administrator just raised the upload limit to 512 MB; please let us know if you can upload the WU now.

It should be fine for newly created WUs, but I'm not sure whether the change will propagate to automatically-generated replacements for tasks which fail - we'll need to keep an eye on those.

It certainly won't be passed to tasks which are already 'out in the field' - on volunteers' computers. They will have to be modified manually, or allowed to fail.


Even if you "update" the project?

Richard Haselgrove
Message 39656 - Posted: 23 Jan 2015, 18:55:32 UTC - in response to Message 39650.  

My GTX 980 host is uploading the first result.
The final file size is 186.980.704 bytes.
The total processing time is 15h 39m 51s.

Credit 600,000.00

Congratulations on the home run - and thanks for the confirmation that the file edit is effective.

Retvari Zoltan
Message 39657 - Posted: 23 Jan 2015, 19:08:54 UTC - in response to Message 39655.  

The BOINC administrator just raised the upload limit to 512 MB; please let us know if you can upload the WU now.

It should be fine for newly created WUs, but I'm not sure whether the change will propagate to automatically-generated replacements for tasks which fail - we'll need to keep an eye on those.

It certainly won't be passed to tasks which are already 'out in the field' - on volunteers' computers. They will have to be modified manually, or allowed to fail.


Even if you "update" the project?

I've received a 1894-NOELIA_BI3_unbind-1-10-RND9593_0 just now, and the file info has the old size limit:
<file_info>
    <name>1894-NOELIA_BI3_unbind-1-10-RND9593_0_8</name>
    <nbytes>0.000000</nbytes>
    <max_nbytes>256000000.000000</max_nbytes>
    <generated_locally/>
    <status>0</status>
    <upload_when_present/>
    <url>http://www.gpugrid.org/PS3GRID_cgi/file_upload_handler</url>
</file_info>
<file_info>
    <name>1894-NOELIA_BI3_unbind-1-10-RND9593_0_9</name>
    <nbytes>0.000000</nbytes>
    <max_nbytes>128000000.000000</max_nbytes>
    <generated_locally/>
    <status>0</status>
    <upload_when_present/>
    <url>http://www.gpugrid.org/PS3GRID_cgi/file_upload_handler</url>
</file_info>

Retvari Zoltan
Message 39658 - Posted: 23 Jan 2015, 19:12:45 UTC - in response to Message 39656.  

Congratulations on the home run - and thanks for the confirmation that the file edit is effective.

Thank you!
We had similar upload size problems before, and the solution was the same back then.

Richard Haselgrove
Message 39659 - Posted: 23 Jan 2015, 19:21:12 UTC - in response to Message 39658.  

Congratulations on the home run - and thanks for the confirmation that the file edit is effective.

Thank you!
We had similar upload size problems before, and the solution was the same back then.

And we recently had the same thing at CPDN, which was why I checked - it had been bumped back to the top of my list of "things project administrators forget to do" when they're excited by an interesting bit of research.

Which reminds me.....

@ Gerard,
If you find yourself having to re-generate all or part of this batch of 'verylong' tasks, could you please adjust <rsc_fpops_est> proportionately, so that our BOINC clients show a fair estimate of the task runtime from the beginning, and the task doesn't mess up DCF when it finishes?
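
The proportional adjustment being asked for could be sketched roughly as follows. This is only an illustration: the helper name scaled_fpops_est, the baseline value and the 8-hour figure are hypothetical placeholders rather than project numbers; only the roughly 24-hour runtime comes from the batch announcement quoted in this thread.

```python
def scaled_fpops_est(base_fpops_est, base_hours, new_hours):
    """Scale a workunit's rsc_fpops_est in proportion to its expected runtime."""
    if base_hours <= 0:
        raise ValueError("base_hours must be positive")
    return base_fpops_est * (new_hours / base_hours)


# Hypothetical baseline: a normal 'long' task estimated at 5e15 floating-point
# operations that runs about 8 hours, versus a 'verylong' task expected to run
# about 24 hours on the same card.
print(scaled_fpops_est(5e15, 8, 24))
```

The idea is simply that a task expected to run three times as long should advertise three times the estimated operations, so the client's initial runtime estimate starts out in the right range.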
Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Message 39660 - Posted: 23 Jan 2015, 19:27:24 UTC - in response to Message 39659.  
Last modified: 23 Jan 2015, 19:28:07 UTC

I've raised the limit in the DB for VERYLONG WUs. I'm not sure whether such changes propagate to clients at some point.

Retvari Zoltan
Message 39661 - Posted: 23 Jan 2015, 19:52:05 UTC

My second very long workunit is uploading.
7x6-GERARD_VERYLONG_CXCL12_confAna-0-1-RND0829_0

skydivingnerd
Message 39662 - Posted: 23 Jan 2015, 23:18:20 UTC

I'm glad I checked this thread again. Two of my clients have one of the very long WUs. I implemented the fix as described by Richard. I'll keep an eye on it and check the status on completion.
http://www.gpugrid.net/result.php?resultid=13737049 (GTX 770)
http://www.gpugrid.net/result.php?resultid=13737185 (GTX 670)

biodoc
Message 39663 - Posted: 23 Jan 2015, 23:55:24 UTC

Thanks for the detailed guidance Richard and Retvari!

I just got home from work and checked this thread. My WU was 90%+ done so just in time.

caffeineyellow5
Message 39664 - Posted: 24 Jan 2015, 0:05:16 UTC - in response to Message 39662.  

4x6-GERARD_VERYLONG_CXCL12_confAna-0-1-RND1754_0_0 working now. It started out at about 18.5 hours till completion and kept counting down, then at 48.8% finished and about 13 hours in, it jumped to 13.5 hours left. Hah! Anyway, with just around 10 hours showing left I am changing the xml according to Richard's instructions. I only have one, since I only have the one machine I run longs on. Good thing it has the 3 780s in it. I'll keep an eye out to see if I get any more in the near future also. Right now I have 3 queued and 3 working and only 1 of these.
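
The jump in the remaining-time display is roughly what a straight-line extrapolation from progress would give. A small sketch, reading the post's "about 13 hours" as elapsed time (an assumption) and using a made-up helper name:

```python
def linear_remaining_hours(elapsed_hours, fraction_done):
    """Extrapolate remaining runtime assuming progress is linear in time."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    total_hours = elapsed_hours / fraction_done
    return total_hours - elapsed_hours


# 48.8% done after about 13 hours, as reported in the post above.
print(round(linear_remaining_hours(13, 0.488), 1))  # 13.6
```

That lands close to the 13.5 hours reported, which suggests the client switched from its initial estimate to one based on actual progress.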

wiyosaya
Message 39665 - Posted: 24 Jan 2015, 0:14:53 UTC - in response to Message 39600.  

We just launched 400 very long WU (they will take about 24h in a 780GTX) named VERYLONG_CXCL12_confAna whose results we need as soon as possible (we are in a hurry).

I've got two of them on my GTX 780Ti host.
According to my linear approximation made at 12.3% progress, the total computing time will be about 18 hours and 15 minutes.

They come with a credit+bonus of 400K.

That's nice.

Please, if you don't have a good graphic card, reject them.

That's an inappropriate way to arrange such batches.
You should set up a third queue for these purposes.
I couldn't receive one of these workunits on my GTX980 host. I'm sure I'm not alone with that.
EDIT: I had to abort 10 other workunits to receive one of these on my GTX980 host.
That's why your method is dangerous: it propagates failed workunits (by encouraging user intervention)

For the brave ones, take it as a challenge and see you on the performance tab ;)

Challenge accepted. :)

Absolutely agree that this is an inappropriate way to handle these large work units. Either add another "very long" queue, as Retvari says, or have the server automatically figure out which computers should get them based on the installed graphics cards.

Betting Slip
Message 39666 - Posted: 24 Jan 2015, 0:17:14 UTC

Finally got one, but on my GTX 560 Ti, so I aborted it. Still nothing for my GTX 970.

Need to get a better system, no doubt about that.

caffeineyellow5
Message 39667 - Posted: 24 Jan 2015, 1:26:21 UTC - in response to Message 39665.  
Last modified: 24 Jan 2015, 1:29:56 UTC

Absolutely agree that this is an inappropriate way to handle these large work units. Either add another "very long" queue, as Retvari says, or have the server automatically figure out which computers should get them based on the installed graphics cards.

As far as that goes, the Notice that went out and started this thread is clear that they had a limited number of work units that needed immediate release and ASAP completion. The fact that they are very long is secondary to the fact that they are needed ASAP. With the priority on ASAP, adding a different queue for them means either relying on end users to opt in to that queue voluntarily, maybe in response to a notice calling for them, or forcing everyone onto that queue. The latter ends in exactly what you have right now: the tasks go out first come, first served, whether or not those machines can complete them.

I don't think either of these is appropriate for an on-the-spot addition of a longer task that needs to be completed ASAP. That leaves the other option: having the servers determine whether a machine can run the task in the time needed before assigning it to that machine. I suppose that could be done, but if it could have been done immediately and was not, that was just a bad judgment call. Based on that, I would assume that their side of the system currently has no such ability beyond what the user tells them the machine can do, via the queues you choose for your machines, i.e. Short, Long, Test, CPU, etc. I think assigning these tasks to the machines that are set to receive "normal" long work units and then sending out an official BOINC Notice to flash on the client IS the right way to have done this this time. And then, based on finances and manpower, they can work on adding more functionality to their back-end to determine which machines can do which tasks, to fine-tune what is already in place in the voluntary queues.

It seems very clear that they were not expecting these VERYLONG work units far enough in advance to have done this any better, which makes the way it was done the best way it could have been done. For the future, if this is only to be done on occasion, not much manpower and time needs to go into "correcting" the process; but if VERYLONG work units are to become a regular thing for the grid, then time should be invested to add a queue or to help their servers better determine whether machines can finish them in the time needed.

All in all, people who have no better solutions but only want to share frustrations do better, for everyone involved, to state that there is an issue and what they think the issue is, and then trust that someone saw the statement and will work to fix it if it needs fixing. We don't need to get emotionally involved unless we are singled out as overtly ignored. And then, there are always other projects, or more official channels than the fellow user base and volunteer workers on a forum board. Not flaming, just always want to see solution makers making solutions and agitators making quiet. Life works better that way around. :-)

Jamaar@Siam
Message 39668 - Posted: 24 Jan 2015, 3:51:49 UTC

I see that after the first VERYLONG units errored out, it has been posted that it was necessary to edit the xml before finishing the task.

By the time I read that, my GTX 770 had already spent 31 hours 24 minutes completing the task.
Not amused to see the log:

24/01/2015 08:46:57 | GPUGRID | Output file 6x13-GERARD_VERYLONG_CXCL12_confAna-0-1-RND0906_0_9 for task 6x13-GERARD_VERYLONG_CXCL12_confAna-0-1-RND0906_0 exceeds size limit.
24/01/2015 08:46:57 | GPUGRID | File size: 186797160.000000 bytes. Limit: 128000000.000000 bytes
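
The two numbers in that log line can be pulled out and compared against both limits with a short sketch like the one below. The helper name and the regular expression are illustrative and only assume the message wording shown in this log excerpt.

```python
import re

# Matches the "File size ... Limit ..." message format quoted in the log above.
LIMIT_LINE = re.compile(r"File size: (\d+)\.\d+ bytes\. Limit: (\d+)\.\d+ bytes")


def parse_size_and_limit(log_line):
    """Return (file_size, limit) in bytes, or None if the line does not match."""
    match = LIMIT_LINE.search(log_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


line = ("24/01/2015 08:46:57 | GPUGRID | File size: 186797160.000000 bytes. "
        "Limit: 128000000.000000 bytes")
size, limit = parse_size_and_limit(line)
print(size > limit)         # True: too large for the original 128000000 limit
print(size <= 256_000_000)  # True: it would have fit under the edited 256000000 limit
```

In other words, this result is the same size class as the ones that uploaded successfully after the edit.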


31 hours of wasted time, which I could have used for 1 abandoned and 1 suspended NOELIA, just because I responded to the notice in my BOINC Manager which asked for help to finish those VERYLONG WUs ASAP. :-(

Viktor Svantner
Message 39670 - Posted: 24 Jan 2015, 5:35:30 UTC

Hello.

I got the same unhappy result.

http://www.gpugrid.net/result.php?resultid=13737081

It took my Titan Black roughly 25 hours to complete the task.

Next time, please be more careful about what you prepare for crunching. I am very keen to help, but this is a waste of time that could have been used for other tasks.