Cause of quantum chemistry task failures: md5sum errors
| Author | Message |
|---|---|
Michael H.W. Weber Joined: 9 Feb 16 Posts: 78 Credit: 656,229,684 RAC: 0
|
I have a single Ubuntu Linux machine participating in GPUGRID using its CPU. Apart from a few correctly completed QC tasks, this machine has so far produced 28 "compute errors", each after just a few seconds of run time (0 secs CPU time). The error logs show the following message for all 28 tasks:
WARNING: md5sum mismatch of tar archive
expected: 75a9f0faa822a01dfe0e0e5c43400ed0
got: dfc9f09eb6b6771c69d6cf10b91bc6c9 -
bunzip2: Data integrity error when decompressing.
I noticed that WU download (and communication in general) is extremely slow. Could this be causing byte hiccups, so that the WU archives arrive corrupted and fail their checksum on extraction? As a result, this machine is blocked from downloading additional tasks for 24 hours, which makes it rather pointless to continue participating in the current GPUGRID team challenge and in GPUGRID QC task computation in general. Maybe an upgrade of the GPUGRID server infrastructure would help improve the situation?
Michael. President of Rechenkraft.net - Germany's first and largest distributed computing organization. |
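One way to test the corruption hypothesis locally is to hash the downloaded archive and check whether it decompresses at all. The following is a minimal sketch, not the installer's own check: the function names are hypothetical, and the reference digest is assumed to come from a known-good copy of the same file. The trailing `-` in the log's "got" line suggests the installer hashed data read from standard input, so the digest in the log may cover only part of the installer file and need not equal the digest of a whole file on disk.

```python
import bz2
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bz2_is_intact(path, chunk_size=1 << 20):
    """Return True if the whole .bz2 file decompresses without error."""
    try:
        with bz2.open(path, "rb") as handle:
            # Read to the end; truncated or damaged streams raise here.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, ValueError):
        return False
    return True


def check_archive(path, reference_md5):
    """Return (digest_matches, decompresses_cleanly) for one archive.

    reference_md5 is an assumption: a digest taken from a copy of the
    same file that is known to be good.
    """
    return (md5_of_file(path) == reference_md5.lower(), bz2_is_intact(path))
```

For example, `check_archive` could be pointed at the `preconda.tar.bz2` path named in the stderr output. If the file fails to decompress on every run without being downloaded again, the damaged copy is probably sitting on disk rather than being corrupted afresh in transit.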
|
Joined: 26 Feb 14 Posts: 211 Credit: 4,496,324,562 RAC: 0
|
I seem to remember something about those in the past, but I can't recall the details. Your computers are hidden, so we cannot check the error you report. If you unhide them, we can look at the entire stderr report and hopefully get an idea of what is occurring.
|
Michael H.W. Weber Joined: 9 Feb 16 Posts: 78 Credit: 656,229,684 RAC: 0
|
Here is an example stderr log:
Name m0000040872_65a1af79_n00050-SDOERR_QMML50_4-0-1-RND5714_1
Workunit 15707811
Created 27 Dec 2018 | 17:23:04 UTC
Sent 27 Dec 2018 | 18:30:32 UTC
Received 27 Dec 2018 | 18:32:06 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 195 (0xc3) EXIT_CHILD_FAILED
Computer ID 428878
Report deadline 1 Jan 2019 | 18:30:32 UTC
Run time 2.72
CPU time 0.00
Validate state Invalid
Credit 0.00
Application version Quantum Chemistry v3.31 (mt)
Stderr output
<core_client_version>7.9.3</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
19:30:37 (6677): wrapper (7.7.26016): starting
19:30:37 (6677): wrapper (7.7.26016): starting
19:30:37 (6677): wrapper: running /usr/bin/flock (/var/lib/boinc-client/projects/www.gpugrid.net/miniconda.lock -c "/bin/bash ./miniconda-installer.sh -b -u -p /var/lib/boinc-client/projects/www.gpugrid.net/miniconda &&
/var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/conda install -m -y -p qmml3 --override-channels -c defaults -c gpugrid --file requirements.txt ")
WARNING: md5sum mismatch of tar archive
expected: 75a9f0faa822a01dfe0e0e5c43400ed0
got: dfc9f09eb6b6771c69d6cf10b91bc6c9 -
bunzip2: Data integrity error when decompressing.
Input file = /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/preconda.tar.bz2, output file = (stdout)
It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.
You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
19:30:38 (6677): /usr/bin/flock exited; CPU time 0.185019
19:30:38 (6677): app exit status: 0x1
19:30:38 (6677): called boinc_finish(195)
</stderr_txt>
]]>
Michael. President of Rechenkraft.net - Germany's first and largest distributed computing organization. |
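With dozens of failed tasks, it can help to pull the expected and reported digests out of each stderr log and count how often each reported digest appears. The sketch below assumes every log uses the same "expected:" / "got:" layout as the one above; the function names are hypothetical. If every failure reports the same "got" digest, one plausible reading is that a single damaged file on disk is being reused, while varying digests would point more toward repeated transfer errors.

```python
import re
from collections import Counter

# Matches the two digest lines printed after the md5sum warning.
MISMATCH = re.compile(
    r"expected:\s*(?P<expected>[0-9a-f]{32})\s+got:\s*(?P<got>[0-9a-f]{32})"
)


def find_md5_mismatches(stderr_text):
    """Return a list of (expected, got) digest pairs found in one log."""
    return [(m["expected"], m["got"]) for m in MISMATCH.finditer(stderr_text)]


def count_got_digests(stderr_texts):
    """Count how often each reported ("got") digest occurs across logs."""
    counts = Counter()
    for text in stderr_texts:
        for _expected, got in find_md5_mismatches(text):
            counts[got] += 1
    return counts
```

The input to `count_got_digests` would be the stderr text of each failed task, for example copied from the task pages into local text files.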
Michael H.W. Weber Joined: 9 Feb 16 Posts: 78 Credit: 656,229,684 RAC: 0
|
44 tasks are now affected... Michael. President of Rechenkraft.net - Germany's first and largest distributed computing organization. |
Michael H.W. Weber Joined: 9 Feb 16 Posts: 78 Credit: 656,229,684 RAC: 0
|
Is this issue resolved? Michael. President of Rechenkraft.net - Germany's first and largest distributed computing organization. |
|
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
Not something we can fix from here. Try resetting the project, which should clear local files. |
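On a headless Linux host, the suggested reset can also be triggered from the command line with `boinccmd --project URL reset`. The sketch below only wraps that call. The project URL passed in is an assumption and has to match the URL the client itself reports (for instance via `boinccmd --get_project_status`), and the helper names are hypothetical.

```python
import subprocess


def build_reset_command(project_url, boinccmd="boinccmd"):
    """Build the argument list for resetting one BOINC project."""
    return [boinccmd, "--project", project_url, "reset"]


def reset_project(project_url, boinccmd="boinccmd"):
    """Run the reset and return boinccmd's exit code.

    Assumes boinccmd is on PATH and is allowed to talk to the local
    client (it may need to run from the BOINC data directory or with
    suitable permissions).
    """
    command = build_reset_command(project_url, boinccmd)
    return subprocess.run(command, check=False).returncode
```

A reset discards the project's downloaded files and any tasks in progress, so it is best done when no valid work is queued.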
©2025 Universitat Pompeu Fabra