Message boards : News : New workunits
| Author | Message |
|---|---|
|
Joined: 4 Aug 14 Posts: 266 Credit: 2,219,935,054 RAC: 0
|
"Ok, on point 1, it was set for 360 already because that's a good time for LHC ATLAS to run complete. I moved it up to 480 now to try and deal with this stuff in GPUGRID." As your GPU is taking 728 minutes to complete the current batch of tasks, this setting needs to be MORE than 728 to have a positive effect. Times for other projects don't suit GPUGRID's requirements, as tasks here can be longer. |
|
Joined: 30 Jun 14 Posts: 153 Credit: 129,654,684 RAC: 0
|
"Ok, on point 1, it was set for 360 already because that's a good time for LHC ATLAS to run complete. I moved it up to 480 now to try and deal with this stuff in GPUGRID." Oh? That's interesting. Changed to 750 minutes. |
|
Joined: 30 Jun 14 Posts: 153 Credit: 129,654,684 RAC: 0
|
Just suffered a DPC_WATCHDOG_VIOLATION on my system. Will be offline for a few days. |
Retvari Zoltan · Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
These workunits have failed on all 8 hosts with the error "# Engine failed: Particle coordinate is nan": initial_1923-ELISA_GSN4V1-12-100-RND5980 and initial_1086-ELISA_GSN0V1-2-100-RND9613. Perhaps these workunits inherited a NaN (= Not a Number) from their previous stage. I don't think this could be solved by a reboot. I'm eagerly waiting to see how many batches will survive through all 100 stages. |
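To illustrate why an inherited NaN would not be cured by a reboot, here is a minimal sketch. It assumes each stage starts from the coordinates the previous stage produced; `run_stage` and `first_bad_stage` are hypothetical stand-ins, not the project's actual engine.

```python
import math

def run_stage(coords, step=0.1):
    """Hypothetical stand-in for one simulation stage: shift every coordinate."""
    return [x + step for x in coords]

def first_bad_stage(coords, n_stages):
    """Return the 1-based stage at which a NaN coordinate is first seen, or None."""
    for stage in range(1, n_stages + 1):
        coords = run_stage(coords)
        if any(math.isnan(x) for x in coords):
            return stage
    return None

clean = [0.0, 1.5, -2.0]
poisoned = [0.0, float("nan"), -2.0]  # NaN carried over from the previous stage

print(first_bad_stage(clean, 100))     # None
print(first_bad_stage(poisoned, 100))  # 1
```

Any arithmetic on a NaN yields NaN again, so if the bad value is in the input files, every host that starts from them would hit the same failure at once.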
|
Joined: 28 Mar 09 Posts: 490 Credit: 11,731,645,728 RAC: 69
|
I ran the following unit: 1_7-GERARD_pocket_discovery_d89241c4_7afa_4928_b469_bad3dc186521-0-2-RND1542_1, which ran well and would have finished as valid if the following error had not occurred: </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>1_7-GERARD_pocket_discovery_d89241c4_7afa_4928_b469_bad3dc186521-0-2-RND1542_1_9</file_name> <error_code>-131 (file size too big)</error_code> </file_xfer_error> </message> ]]> Scroll to the bottom of this page: http://www.gpugrid.net/result.php?resultid=21553962 It looks like you need to increase the size limits of the output files for them to upload. This should be done for all the subsequent WUs. |
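For anyone checking their own results, the following is a rough sketch of pulling the file name and error code out of an upload-failure message like the one quoted above. The regular expression and the `upload_errors` helper are assumptions based only on the text shown in this thread, not on any documented BOINC format.

```python
import re

# Pattern inferred from the error text quoted in this thread (an assumption).
XFER_ERROR = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)\s*\((?P<reason>.*?)\)</error_code>"
)

def upload_errors(stderr_text):
    """Return (file name, numeric code, reason) for each upload failure found."""
    return [
        (m["name"], int(m["code"]), m["reason"])
        for m in XFER_ERROR.finditer(stderr_text)
    ]

sample = """
<message>
upload failure: <file_xfer_error>
  <file_name>1_7-GERARD_pocket_discovery_d89241c4_7afa_4928_b469_bad3dc186521-0-2-RND1542_1_9</file_name>
  <error_code>-131 (file size too big)</error_code>
</file_xfer_error>
</message>
"""

for name, code, reason in upload_errors(sample):
    print(code, reason, name)
```

Comparing the numeric code makes it easy to tell a file-size failure (-131 here) apart from other upload failures that carry a different code.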
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
I must have squeaked in under the wire by just this much with this GERARD_pocket_discovery task. https://www.gpugrid.net/result.php?resultid=21551650 |
|
Joined: 28 Mar 09 Posts: 490 Credit: 11,731,645,728 RAC: 69
|
"I must have squeaked in under the wire by just this much with this GERARD_pocket_discovery task." Apparently, these units vary in length. Here is another one with the same problem: http://www.gpugrid.net/workunit.php?wuid=16894092 |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
I've got one running from 1_5-GERARD_pocket_discovery_d89241c4_7afa_4928_b469_bad3dc186521-0-2-RND2573 - I'll try to catch some figures to see how bad the problem is. Edit - the _9 upload file (the one named in previous error messages) is set to allow <max_nbytes>256000000.000000</max_nbytes> or 256,000,000 bytes. You'd have thought that was enough. |
|
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
The 256 MB is the new limit - I raised it today. There are only a handful of WUs like that. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
I put precautions in place, but you beat me to it - final file size was 155,265,144 bytes. Plenty of room. Uploading now. |
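As a quick check of the "plenty of room" arithmetic, here is a small sketch using the two figures quoted in this thread: the `<max_nbytes>` value of 256000000.000000 and the final file size of 155,265,144 bytes. The helper name `upload_headroom` is made up for illustration.

```python
def upload_headroom(max_nbytes_text, file_size_bytes):
    """Bytes left under the limit; a negative result means the upload would be rejected."""
    limit = int(float(max_nbytes_text))  # the limit is written as a decimal string
    return limit - file_size_bytes

headroom = upload_headroom("256000000.000000", 155_265_144)
print(headroom)        # 100734856
print(headroom >= 0)   # True
```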
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
What I also noticed with the GERARD tasks (currently running 0_2-GERARD_pocket_discovery ...): the GPU utilization oscillates between 76% and 95% (in contrast to the ELISA tasks, where it was permanently close to or even at 100%). |
God is Love, JC proves it. I t... · Joined: 24 Nov 11 Posts: 30 Credit: 201,648,059 RAC: 0
|
I am getting upload errors too, on most but not all (4 of 6) WUs... but only on my 950M, not on my 1660 Ti, ... or EVEN my GeForce 640!! "need to increase the size limits of the output files" So, how is this done? Under Options, Computing preferences, Network, the default values are not shown (that I can see). I WOULD have assumed that BOINC Manager would have these limited only by system constraints unless tighter limits are desired. AND only the download rate, upload rate, and usage limits can be set. Again, how should output file size limits be increased? It would have been VERY polite of GpuGrid to post some notice about this with the new WU releases. I am very miffed, and justifiably so, at having wasted so much of my GPU time and energy, and effort on my part to hunt down the problem. Indeed, there was NO feedback from GpuGrid on this at all; I only noticed that my RAC kept falling even though I was running WUs pretty much nonstop. I realize that getting research done is the primary goal, but if GpuGrid is asking people to donate their PC time and GPU time, then please be more polite to your donors. LLP, PhD |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
You can't control the result output file. That is set by the science application under control of the project administrators. The quote you referenced was from Toni acknowledging that he needed to increase the size of the upload server input buffer to handle the larger result files that a few tasks were producing. That is not the norm for the work we have processed so far. Cases where the result files exceed 250 MB should be rare. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
Neither of those two. The maximum file size is specified in the job specification associated with the task in question. You can (as I did) increase the maximum size by careful editing of the file 'client_state.xml', but it needs a steady hand, some knowledge, and is not for the faint of heart. It shouldn't be needed now, after Toni's correction at source. |
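For readers curious what such an edit amounts to, the sketch below raises a `<max_nbytes>` value inside an XML string. Everything about the sample is an assumption: the `<file>` element layout, the file names, and the old limit of 100000000 are invented for illustration, and a real 'client_state.xml' may be laid out differently. The code works on an in-memory copy only; presumably the BOINC client should be stopped and the file backed up before anyone attempts the real thing.

```python
import xml.etree.ElementTree as ET

# Invented sample; the real client_state.xml layout may differ.
SAMPLE = """<client_state>
  <file>
    <name>example_task_0_9</name>
    <max_nbytes>100000000.000000</max_nbytes>
  </file>
  <file>
    <name>example_task_0_0</name>
    <max_nbytes>100000000.000000</max_nbytes>
  </file>
</client_state>"""

def raise_limit(xml_text, file_name, new_limit):
    """Raise max_nbytes for the named file; return (new XML text, entries changed)."""
    root = ET.fromstring(xml_text)
    changed = 0
    for elem in root.iter():
        name = elem.find("name")
        limit = elem.find("max_nbytes")
        if name is None or limit is None:
            continue  # not an entry that carries a size limit
        if (name.text or "").strip() == file_name and float(limit.text) < new_limit:
            limit.text = f"{new_limit:.6f}"  # keep the six-decimal style
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed

updated, count = raise_limit(SAMPLE, "example_task_0_9", 256_000_000)
print(count)  # 1
```

Matching on any element that has both a `<name>` and a `<max_nbytes>` child avoids hard-coding the parent tag, which is one of the details assumed here.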
God is Love, JC proves it. I t... · Joined: 24 Nov 11 Posts: 30 Credit: 201,648,059 RAC: 0
|
Hm, Toni's message (53295) was posted on the 7th. Toni used the past tense on the 7th ("I raised"); yet, https://gpugrid.net/result.php?resultid=21553648 ended on the 8th and still had the same frustrating error. After running for hours, the results were nonetheless lost: upload failure: <file_xfer_error> <file_name>initial_1497-ELISA_GSN4V1-20-100-RND8978_0_0</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> Also, I must be just extremely unlucky. Toni says this came up on 'only a handful' of WUs, yet this happened to at least five of the WUs my GPUs ran. I am holding off on running any GpuGrid WUs for a while, until this problem is more fully corrected. Just for full disclosure... Industrial Engineers hate waste. LLP MS and PhD in Industrial & Systems Engineering. Registered Prof. Engr. (Industrial Engineering) |
God is Love, JC proves it. I t... · Joined: 24 Nov 11 Posts: 30 Credit: 201,648,059 RAC: 0
|
Besides the upload errors, a couple of results, resultid=21544426 and resultid=21532174, had said: "Detected memory leaks!" So I ran extensive memory diagnostics, but no errors were reported by windoze (extensive as in some eight hours of diagnostics). BOINC did not indicate whether these were RAM or GPU 'memory leaks'. In fact, now I am wondering whether these 'memory leaks' were on my end at all, or on the GpuGrid servers... LLP I think ∴ I THINK I am My thinking neither is the source of my being NOR proves it to you God Is Love, Jesus proves it! ∴ we are |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
Hm, That's a different error. Toni's post was about a file size error. |
|
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
Besides the upload errors, Such messages are always present in Windows. They are not related to whether the task terminated successfully. If an error message is present, it's elsewhere in the output. |
|
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
Also, slow and mobile cards should not be used for crunching for the reasons you mention. |
|
Joined: 24 Jul 19 Posts: 1 Credit: 112,924,891 RAC: 0
|
Hi, I have not received any new WUs in some 30-40 days. Why? Are there no WUs available for anyone, or could it be bad settings on my side? My PCs are starving... Br Thomas |
©2025 Universitat Pompeu Fabra