WU invalid because of an upload issue at GPUGRID's server end?
| Author | Message |
|---|---|
Michael H.W. Weber (Joined: 9 Feb 16, Posts: 78, Credit: 656,229,684, RAC: 0)
|
Any idea why this task was marked as invalid after approx. 40,000 seconds of precious run time on my RTX 3080? So far this machine has not produced a single error, and the end of the log appears to point to an upload issue. Task: https://www.gpugrid.net/workunit.php?wuid=27081741
The results are about 500 MB in size for each of these tasks: not at all a problem at my end, but judging by the snail-paced data transfer speed, apparently a MAJOR problem at Barcelona's end?
Name e5s122_e2s172p0f91-ADRIA_AdB_KIXCMYB_HIP-1-2-RND2100_3
Workunit 27081741
Created 12 Oct 2021 | 11:45:25 UTC
Sent 12 Oct 2021 | 11:45:51 UTC
Received 13 Oct 2021 | 1:06:53 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 0 (0x0)
Computer ID 584499
Report deadline 17 Oct 2021 | 11:45:51 UTC
Run time 40,157.41
CPU time 39,016.81
Validate state Invalid
Credit 0.00
Application version New version of ACEMD v2.18 (cuda1121)
Stderr output
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<stderr_txt>
15:52:17 (23288): wrapper (7.9.26016): starting
15:52:17 (23288): wrapper: running bin/acemd3.exe (--boinc --device 0)
03:01:09 (23288): bin/acemd3.exe exited; CPU time 39016.812500
03:01:20 (23288): called boinc_finish(0)
0 bytes in 0 Free Blocks.
186 bytes in 4 Normal Blocks.
1144 bytes in 1 CRT Blocks.
0 bytes in 0 Ignore Blocks.
0 bytes in 0 Client Blocks.
Largest number used: 0 bytes.
Total allocations: 824084403 bytes.
Dumping objects ->
{389617} normal block at 0x0000028AAECC3BC0, 85 bytes long.
Data: <<project_prefere> 3C 70 72 6F 6A 65 63 74 5F 70 72 65 66 65 72 65
..\api\boinc_api.cpp(309) : {389614} normal block at 0x0000028AAECC4620, 8 bytes long.
Data: <  ®Š > 00 00 A0 AE 8A 02 00 00
{388969} normal block at 0x0000028AAECC3C60, 85 bytes long.
Data: <<project_prefere> 3C 70 72 6F 6A 65 63 74 5F 70 72 65 66 65 72 65
{388355} normal block at 0x0000028AAECC48F0, 8 bytes long.
Data: < Î®Š > 10 9D CE AE 8A 02 00 00
..\zip\boinc_zip.cpp(122) : {146} normal block at 0x0000028AAECC3090, 260 bytes long.
Data: < > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
{133} normal block at 0x0000028AAECC4670, 16 bytes long.
Data: <PâË®Š > 50 E2 CB AE 8A 02 00 00 00 00 00 00 00 00 00 00
{132} normal block at 0x0000028AAECBE250, 40 bytes long.
Data: <pFÌ®Š conda-pa> 70 46 CC AE 8A 02 00 00 63 6F 6E 64 61 2D 70 61
{125} normal block at 0x0000028AAECBE480, 48 bytes long.
Data: <--boinc --device> 2D 2D 62 6F 69 6E 63 20 2D 2D 64 65 76 69 63 65
{124} normal block at 0x0000028AAECC4030, 16 bytes long.
Data: <XNÌ®Š > 58 4E CC AE 8A 02 00 00 00 00 00 00 00 00 00 00
{123} normal block at 0x0000028AAECC48A0, 16 bytes long.
Data: <0NÌ®Š > 30 4E CC AE 8A 02 00 00 00 00 00 00 00 00 00 00
{122} normal block at 0x0000028AAECC4CB0, 16 bytes long.
Data: < NÌ®Š > 08 4E CC AE 8A 02 00 00 00 00 00 00 00 00 00 00
{121} normal block at 0x0000028AAECC4440, 16 bytes long.
Data: <àMÌ®Š > E0 4D CC AE 8A 02 00 00 00 00 00 00 00 00 00 00
{120} normal block at 0x0000028AAECC43A0, 16 bytes long.
Data: <¸MÌ®Š > B8 4D CC AE 8A 02 00 00 00 00 00 00 00 00 00 00
{119} normal block at 0x0000028AAECC4530, 16 bytes long.
Data: < MÌ®Š > 90 4D CC AE 8A 02 00 00 00 00 00 00 00 00 00 00
{118} normal block at 0x0000028AAECC3D60, 16 bytes long.
Data: <pMÌ®Š > 70 4D CC AE 8A 02 00 00 00 00 00 00 00 00 00 00
{117} normal block at 0x0000028AAECC4B20, 16 bytes long.
Data: <HMÌ®Š > 48 4D CC AE 8A 02 00 00 00 00 00 00 00 00 00 00
{116} normal block at 0x0000028AAECC4760, 16 bytes long.
Data: < MÌ®Š > 20 4D CC AE 8A 02 00 00 00 00 00 00 00 00 00 00
{115} normal block at 0x0000028AAECC4D20, 496 bytes long.
Data: <`GÌ®Š bin/acem> 60 47 CC AE 8A 02 00 00 62 69 6E 2F 61 63 65 6D
{65} normal block at 0x0000028AAECB3280, 16 bytes long.
Data: < 굤ö > 80 EA B5 A4 F6 7F 00 00 00 00 00 00 00 00 00 00
{64} normal block at 0x0000028AAECB3230, 16 bytes long.
Data: <@鵤ö > 40 E9 B5 A4 F6 7F 00 00 00 00 00 00 00 00 00 00
{63} normal block at 0x0000028AAECB2FB0, 16 bytes long.
Data: <øW²¤ö > F8 57 B2 A4 F6 7F 00 00 00 00 00 00 00 00 00 00
{62} normal block at 0x0000028AAECB3190, 16 bytes long.
Data: <ØW²¤ö > D8 57 B2 A4 F6 7F 00 00 00 00 00 00 00 00 00 00
{61} normal block at 0x0000028AAECB2BF0, 16 bytes long.
Data: <P ²¤ö > 50 04 B2 A4 F6 7F 00 00 00 00 00 00 00 00 00 00
{60} normal block at 0x0000028AAECB2BA0, 16 bytes long.
Data: <0 ²¤ö > 30 04 B2 A4 F6 7F 00 00 00 00 00 00 00 00 00 00
{59} normal block at 0x0000028AAECB3780, 16 bytes long.
Data: <à ²¤ö > E0 02 B2 A4 F6 7F 00 00 00 00 00 00 00 00 00 00
{58} normal block at 0x0000028AAECB2B00, 16 bytes long.
Data: < ²¤ö > 10 04 B2 A4 F6 7F 00 00 00 00 00 00 00 00 00 00
{57} normal block at 0x0000028AAECB2EC0, 16 bytes long.
Data: <p ²¤ö > 70 04 B2 A4 F6 7F 00 00 00 00 00 00 00 00 00 00
{56} normal block at 0x0000028AAECB3910, 16 bytes long.
Data: < À°¤ö > 18 C0 B0 A4 F6 7F 00 00 00 00 00 00 00 00 00 00
Object dump complete.
</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>e5s122_e2s172p0f91-ADRIA_AdB_KIXCMYB_HIP-1-2-RND2100_3_0</file_name>
<error_code>-240 (stat() failed)</error_code>
</file_xfer_error>
</message>
]]>
Technical data transfer issues caused by poor server performance have persisted with this project for many, many years and should be resolved quickly.
Michael. President of Rechenkraft.net, Germany's first and largest distributed computing organization. |
|
Joined: 13 Dec 17, Posts: 1419, Credit: 9,119,446,190, RAC: 891
|
Yes, this is a problem with the project: its server can't accept large files. https://www.gpugrid.net/forum_thread.php?id=5261 |
|
Joined: 11 Jul 09, Posts: 1639, Credit: 10,159,968,649, RAC: 428
|
Actually, that one had an upload error on e5s122_e2s172p0f91-ADRIA_AdB_KIXCMYB_HIP-1-2-RND2100_3_0, not on the _9 file, which usually grows to ~500 MB and sometimes more. The error code was -240 (stat() failed): that seems to mean that BOINC had a problem creating the file in the first place, before even trying to upload it. https://www.gpugrid.net/result.php?resultid=32653943 |
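For context, a "stat() failed" error means the client could not even read the file's metadata (size, timestamps), for example because the file does not exist or cannot be accessed at the expected path. Below is a minimal Python sketch of that kind of pre-upload check, assuming a directory and a list of expected output file names; `find_unstatable_files` is a hypothetical helper, not part of BOINC, and the real client's logic may differ.

```python
import os
from pathlib import Path


def find_unstatable_files(directory: str, file_names: list[str]) -> dict[str, str]:
    """Return {file_name: reason} for every expected file whose stat() fails.

    Illustration only: this mirrors the *kind* of check behind a
    "stat() failed" error, not BOINC's actual client code.
    """
    failures: dict[str, str] = {}
    for name in file_names:
        path = Path(directory) / name
        try:
            os.stat(path)  # raises OSError if the file is missing or inaccessible
        except OSError as exc:
            # strerror is e.g. "No such file or directory"; fall back to the class name
            failures[name] = exc.strerror or type(exc).__name__
    return failures
```

A file reported here would fail before any network transfer is attempted, which is consistent with the reading that the problem was local rather than on the upload server.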
Michael H.W. Weber (Joined: 9 Feb 16, Posts: 78, Credit: 656,229,684, RAC: 0)
|
Quote: "Yes, problem with the project that can't accept large file sizes."
OMG. I can't believe this... Check out this one.
Quote: "Actually, that one had an upload error on e5s122_e2s172p0f91-ADRIA_AdB_KIXCMYB_HIP-1-2-RND2100_3_0, not the _9 file, which usually grows to ~500 MB and sometimes more."
I have taken a look at some of the tasks I completed successfully. None of them had a _9 ending in the task name, yet they still had upload files of approx. 500 MB. Hence, as far as I can see, the file name is not a reliable hint of the result file size.
Michael. President of Rechenkraft.net, Germany's first and largest distributed computing organization. |
|
Joined: 11 Jul 09, Posts: 1639, Credit: 10,159,968,649, RAC: 428
|
The _9 doesn't refer to the task name; it refers to the upload file name. Each task generates multiple upload files. |
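The naming scheme described here can be made concrete. Below is a minimal Python sketch, assuming (from the file names quoted in this thread, not from project documentation) that every upload file is named `<task name>_<index>`; `parse_upload_name` and `UploadFile` are hypothetical helpers. The regular expression splits on the last underscore that is followed only by digits, so the underscores inside the task name are left intact.

```python
import re
from typing import NamedTuple


class UploadFile(NamedTuple):
    task_name: str
    index: int


# Assumed convention, inferred from this thread: <task name>_<upload file index>.
# The greedy ".+" backtracks only as far as the final "_<digits>" suffix.
_UPLOAD_RE = re.compile(r"(?P<task>.+)_(?P<index>\d+)")


def parse_upload_name(file_name: str) -> UploadFile:
    """Split an upload file name into its task name and file index."""
    match = _UPLOAD_RE.fullmatch(file_name)
    if match is None:
        raise ValueError(f"not an indexed upload file name: {file_name!r}")
    return UploadFile(match.group("task"), int(match.group("index")))
```

Applied to e2s67_e1s44p0f1240-ADRIA_AdB_KIXCMYB_HIP-0-2-RND1963_3_9, it yields the task e2s67_e1s44p0f1240-ADRIA_AdB_KIXCMYB_HIP-0-2-RND1963_3 and the file index 9.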
|
Joined: 11 Jul 09, Posts: 1639, Credit: 10,159,968,649, RAC: 428
|
Quote: "Check out this one."
It contains the line "Temporarily failed upload of e2s67_e1s44p0f1240-ADRIA_AdB_KIXCMYB_HIP-0-2-RND1963_3_9". |
|
Joined: 14 Feb 20, Posts: 16, Credit: 27,395,983, RAC: 0
|
A partial cross-post, IN THE HOPES THAT SOME ADMIN WILL NOTICE: the WU ran for 33 HOURS, 38 min.
10/14/2021 12:59PM Computation for task e9s158_e7s140p0f143-ADRIA_AdB_KIXCMYB_HIP-0-2-RND0938_5 finished
e9s158_e7s140p0f143-ADRIA_AdB_KIXCMYB_HIP-0-2-RND0938_5_9, 501MB (525,918,872 bytes), is the one showing in the Transfers tab. WHAT can I do so as NOT to waste these WU results or the 33 1/2 hours of GPU time? |
|
Joined: 21 Feb 20, Posts: 1116, Credit: 40,839,470,595, RAC: 6,423
|
Quote: "a partial cross-post"
Unfortunately, there is nothing you can do at the moment.
|
|
Joined: 14 Feb 20, Posts: 16, Credit: 27,395,983, RAC: 0
|
OK, I have made (multiple) backup copies of the entire GPU project folder and have suspended transfers for now. If the upload does abort, how can I get BOINC to retry the WU output file uploads from the backups? |
|
Joined: 21 Feb 20, Posts: 1116, Credit: 40,839,470,595, RAC: 6,423
|
Quote: "OK, I have made (multiple) backup copies of the entire GPU project folder, and have for now suspended transfers."
Even if you restart the task from a backup, you will have the same issue. The problem is that the output file is too big: it exceeds the maximum file size allowed by the project's server, so it cannot be uploaded. Restarting the computation will just generate the same file, which will still be too big. Only the project can solve this problem.
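The reply above reduces to a simple comparison: a result whose output file exceeds the server's maximum accepted size can never upload, no matter how often the task is recomputed. Below is a minimal Python sketch, assuming you already know each output file's size in bytes; the server's real limit is not stated anywhere in this thread, so `max_bytes` must be supplied by the caller and any value used with it here is hypothetical. `to_mib` and `oversized_uploads` are illustrative helpers.

```python
def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 * 1024)


def oversized_uploads(sizes: dict[str, int], max_bytes: int) -> list[str]:
    """Return the names of files larger than max_bytes, largest first.

    max_bytes stands for the server's upload limit; its real value is not
    known here, so callers must supply it.
    """
    too_big = [name for name, size in sizes.items() if size > max_bytes]
    return sorted(too_big, key=lambda name: sizes[name], reverse=True)
```

For example, `to_mib(525_918_872)` is about 501.6, which matches the "501MB" quoted for the _9 file; with a purely hypothetical 500 MiB limit (`500 * 1024 * 1024`), `oversized_uploads` would flag that file.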
|
|
Joined: 14 Feb 20, Posts: 16, Credit: 27,395,983, RAC: 0
|
It may be a moot point, but I am NOT in the least interested in rerunning the WU (or, frankly, running ANY GpuGrid WUs for the foreseeable future). I made a backup of the OUTPUT files. My question is: can I get BOINC to redo the UPLOAD attempt (for what it's worth), rather than rerun another 33 1/2 hours of GPU usage?
LLP, PhD PE |
|
Joined: 21 Feb 20, Posts: 1116, Credit: 40,839,470,595, RAC: 6,423
|
You said there was a file in your Transfers tab. That's the file that won't upload because it's too large. BOINC will already keep retrying it indefinitely, and re-transferring the files that have already been uploaded won't make any difference. Each GPUGRID task produces several output files that all need to be uploaded; when the _9 file is too big, you run into this problem. Short of gaining control of the project's upload server and changing its settings yourself, there's really nothing you can do at this point.
|
|
Joined: 22 May 24, Posts: 2, Credit: 106,860,125, RAC: 0
|
15-12-2024 08:58:22 | GPUGRID | Temporarily failed upload of 1eb6A00_300_1-ANTONIOM_MDCATH300r1se-9-50-RND1180_1_10: transient upload error
The upload file is large; I think that is the problem. |
|
Joined: 22 May 24, Posts: 2, Credit: 106,860,125, RAC: 0
|
We can upload the file after some time, sometimes days, by clicking "Retry Now" on the Transfers tab in the Advanced View.
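The same "Retry Now" action can also be triggered from the command line. Below is a minimal Python sketch, assuming the `boinccmd` tool that ships with BOINC is on your PATH and that `PROJECT_URL` matches the project URL exactly as your client lists it (the value below is an assumption; check it with `boinccmd --get_project_status`). Depending on your setup, `boinccmd` may also need `--passwd` or to be run from the BOINC data directory. This only asks the client to retry a pending transfer now; it cannot get an oversized file past the server's limit.

```python
import subprocess

# Assumption: replace with the project URL exactly as your BOINC client lists it.
PROJECT_URL = "https://www.gpugrid.net/"


def retry_upload_command(project_url: str, file_name: str) -> list[str]:
    """Build the boinccmd call that asks the client to retry one file transfer now."""
    return ["boinccmd", "--file_op", project_url, file_name, "retry"]


def retry_upload(project_url: str, file_name: str) -> None:
    """Run the command; raises CalledProcessError if boinccmd exits non-zero."""
    subprocess.run(retry_upload_command(project_url, file_name), check=True)
```

For example, `retry_upload(PROJECT_URL, "1eb6A00_300_1-ANTONIOM_MDCATH300r1se-9-50-RND1180_1_10")` would request an immediate retry of the file from the log line above; `boinccmd --get_file_transfers` lists the transfers that are still pending.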
©2025 Universitat Pompeu Fabra