Author |
Message |
|
Hello,
I just had a trypsin_lig_1161_3-NOELIA_RL3_run-0-1-RND4573_0 task crash immediately. Here's the stderr output:
<core_client_version>7.0.64</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -98 (0xffffff9e)
</message>
<stderr_txt>
# GPU [GeForce GT 650M] Platform [Windows] Rev [3203] VERSION [55]
# SWAN Device 0 :
# Name : GeForce GT 650M
# ECC : Disabled
# Global mem : 2048MB
# Capability : 3.0
# PCI ID : 0000:01:00.0
# Device clock : 950MHz
# Memory clock : 900MHz
# Memory width : 128bit
# Driver version : r325_00 : 32723
ERROR: file pme.cpp line 85: PME NX too small
21:24:17 (4392): called boinc_finish
</stderr_txt>
]]>
After that I got a 6x9-SANTI_MARwtdim-16-25- which is running fine so far. |
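For anyone puzzled by the two forms of the exit code in the log above: `0xffffff9e` is simply -98 written as an unsigned 32-bit value. A minimal Python sketch of the conversion (the helper name `to_signed32` is made up for illustration and is not part of BOINC):

```python
def to_signed32(value: int) -> int:
    """Interpret an unsigned 32-bit value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF          # keep only the low 32 bits
    if value & 0x80000000:       # sign bit set, so the number is negative
        return value - 0x100000000
    return value


print(to_signed32(0xFFFFFF9E))   # the hex form of the exit code quoted in the log
```

Both spellings therefore refer to the same exit status; the hex form is just how the client prints it.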
|
|
John
Joined: 15 Oct 11 Posts: 17 Credit: 81,085,378 RAC: 0
|
Same issue here, on two workunits:
trypsin_lig_1316_4-NOELIA_RL3_run-0-1-RND5763_0
trypsin_lig_1097_4-NOELIA_RL3_run-0-1-RND4667_0
Stderr output
<core_client_version>7.0.64</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -98 (0xffffff9e)
</message>
<stderr_txt>
# GPU [GeForce GTS 450] Platform [Windows] Rev [3203] VERSION [55]
# SWAN Device 0 :
# Name : GeForce GTS 450
# ECC : Disabled
# Global mem : 1024MB
# Capability : 2.1
# PCI ID : 0000:01:00.0
# Device clock : 1760MHz
# Memory clock : 1840MHz
# Memory width : 128bit
# Driver version : r331_54 : 33158
ERROR: file pme.cpp line 85: PME NX too small
19:11:34 (4304): called boinc_finish
</stderr_txt> |
|
|
skgiven
Volunteer moderator, Volunteer tester
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
|
The aforementioned batch of WUs contains a bug. All 13 WUs I received failed (after 2 or 3 seconds) on all my systems (Linux and Windows).
Example Stderr output
<core_client_version>7.2.23</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -98 (0xffffff9e)
</message>
<stderr_txt>
# GPU [GeForce GTX 660] Platform [Windows] Rev [3203] VERSION [55]
# SWAN Device 2 :
# Name : GeForce GTX 660
# ECC : Disabled
# Global mem : 2048MB
# Capability : 3.0
# PCI ID : 0000:02:00.0
# Device clock : 1032MHz
# Memory clock : 3004MHz
# Memory width : 192bit
# Driver version : r331_00 : 33140
ERROR: file pme.cpp line 85: PME NX too small
09:38:11 (6728): called boinc_finish
</stderr_txt>
]]>
The same WU also failed on other systems:
Task     Computer  Sent                        Time reported               Status                 Run time (sec)  CPU time (sec)  Credit  Application
7412562  151335    30 Oct 2013 | 23:00:12 UTC  30 Oct 2013 | 23:06:23 UTC  Error while computing  2.27            0.08            ---     Short runs (2-3 hours on fastest card) v8.14 (cuda42)
7415514  143807    31 Oct 2013 | 0:50:38 UTC   31 Oct 2013 | 0:56:44 UTC   Error while computing  2.43            0.14            ---     Short runs (2-3 hours on fastest card) v8.14 (cuda42)
7415906  152255    31 Oct 2013 | 2:16:17 UTC   31 Oct 2013 | 2:22:13 UTC   Error while computing  2.02            0.08            ---     Short runs (2-3 hours on fastest card) v8.14 (cuda42)
7416291  131405    31 Oct 2013 | 4:21:57 UTC   31 Oct 2013 | 4:26:02 UTC   Error while computing  1.62            0.11            ---     Short runs (2-3 hours on fastest card) v8.14 (cuda42)
7416888  127801    31 Oct 2013 | 6:33:24 UTC   31 Oct 2013 | 6:42:57 UTC   Error while computing  2.09            0.42            ---     Short runs (2-3 hours on fastest card) v8.14 (cuda55)
7417449  139265    31 Oct 2013 | 8:00:12 UTC   31 Oct 2013 | 9:43:26 UTC   Error while computing  2.22            0.20            ---     Short runs (2-3 hours on fastest card) v8.14 (cuda55)
7418231  139502    31 Oct 2013 | 11:44:17 UTC  31 Oct 2013 | 11:50:05 UTC  Error while computing  2.03            0.11            ---     Short runs (2-3 hours on fastest card) v8.14 (cuda55)
7418699  ---       ---                         ---                         Unsent                 ---             ---             ---
http://www.gpugrid.net/result.php?resultid=7417225
http://www.gpugrid.net/result.php?resultid=7416175...
All failed with: ERROR: file pme.cpp line 85: PME NX too small
Too many errors (may have bug)
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
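Since every failure reported in this thread ends in the same line, a small script can pull that line out of a saved stderr dump. This is only a sketch: the regular expression is modelled on the `ERROR: file ... line ...:` lines quoted above, and the function name `find_errors` is hypothetical, not part of BOINC or the GPUGRID application.

```python
import re

# Pattern modelled on the lines quoted in this thread, e.g.
# "ERROR: file pme.cpp line 85: PME NX too small"
ERROR_RE = re.compile(
    r"^ERROR: file (?P<file>\S+) line (?P<line>\d+): (?P<msg>.*)$",
    re.MULTILINE,
)


def find_errors(stderr_text: str) -> list[tuple[str, int, str]]:
    """Return (source file, line number, message) for each ERROR line found."""
    return [
        (m["file"], int(m["line"]), m["msg"].strip())
        for m in ERROR_RE.finditer(stderr_text)
    ]


sample = """# Driver version : r331_00 : 33140
ERROR: file pme.cpp line 85: PME NX too small
09:38:11 (6728): called boinc_finish
"""
print(find_errors(sample))
```

Running it over the stderr of several failed tasks makes it easy to confirm they all stop at the same place.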
|
|
|
Oh, and I thought it could be my old card, which doesn't want to run this batch O.o 30 errors :/ I didn't look into the stderr, but now I see they all fail on multiple machines.
____________
DSKAG Austria Research Team: http://www.research.dskag.at
|
|
|
Tin Man
Joined: 1 Sep 09 Posts: 2 Credit: 214,365,451 RAC: 0
|
Same error message for me as well!! |
|
|
skgiven
Volunteer moderator, Volunteer tester
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
|
The latest effort seems to work well.
Linux system (GTX670 FOC and GTX770):
trypsin_lig_1298_4x1-NOELIA_RL3run-0-1-RND2976_0 4888061 1 Nov 2013 | 4:27:41 UTC 1 Nov 2013 | 6:07:34 UTC Completed and validated 1,716.63 1,658.44 1,500.00 Short runs (2-3 hours on fastest card) v8.00 (cuda42)
trypsin_lig_1089_2x1-NOELIA_RL3run-0-1-RND9336_0 4887950 31 Oct 2013 | 21:55:42 UTC 1 Nov 2013 | 0:47:14 UTC Completed and validated 1,485.76 1,429.36 1,500.00 Short runs (2-3 hours on fastest card) v8.00 (cuda55)
Windows 7 (GTX770):
trypsin_lig_1706_1x1-NOELIA_RL3run-0-1-RND8824_0 4888310 1 Nov 2013 | 5:15:45 UTC 1 Nov 2013 | 7:32:23 UTC Completed and validated 2,028.98 2,007.25 1,500.00 Short runs (2-3 hours on fastest card) v8.14 (cuda55)
They seem to be much faster on Linux (about 35%, rather than the typical 11%). I expect they are small simulations.
|
|
|
32 crashes from 30.10. to 31.10.2013. But the GPU is overclocked; maybe these workunits were more sensitive to the OC.
|
|
|
Werkstatt
Joined: 23 May 09 Posts: 121 Credit: 333,451,807 RAC: 357,200
|
I added my new GT630 a few days ago. Many, many bad results, always after just a few seconds.
Host: http://www.gpugrid.net/results.php?userid=25200
Error -52, SWAN : FATAL Unable to load module .mshake_kernel.cu. (702)
Error -98, ERROR: file pme.cpp line 85: PME NX too small
One fault caused the system to crash. |
|
|
skgiven
Volunteer moderator, Volunteer tester
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
|
Werkstatt, other crunchers, mods and even scientists get 'No access' for 'your' link; you need to link to the individual system (it's a BOINC server/site template security thing):
http://www.gpugrid.net/results.php?hostid=161299&offset=0&show_names=1&state=0&appid=
Fortunately you haven't hidden your systems.
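For anyone wanting to share a host link of the same shape, here is a minimal Python sketch that rebuilds it. The query parameters are copied from the link above; the helper name `host_results_url` is purely illustrative, and the meaning of each parameter is not documented in this thread.

```python
from urllib.parse import urlencode

BASE = "http://www.gpugrid.net/results.php"


def host_results_url(host_id: int) -> str:
    """Build a per-host results link using the query parameters seen in the link above."""
    query = urlencode({
        "hostid": host_id,
        "offset": 0,
        "show_names": 1,
        "state": 0,
        "appid": "",     # left empty, as in the quoted link
    })
    return f"{BASE}?{query}"


print(host_results_url(161299))
```

Swap in your own host ID to get a link that others can open.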
You had the unfortunate experience of encountering a bad batch of WUs. These failed on everyone's cards.
trypsin_lig_1298_1-NOELIA_RL3_run-0-1-RND8370_6 had already failed 6 times before being sent to you. As they failed in about 2 or 3 seconds, numerous tasks would have failed for everyone before anyone noticed.
You seem to be having success now,
trypsin_lig_9_4x1-NOELIA_RC3run-0-1-RND4317_0 4889572 2 Nov 2013 | 0:13:18 UTC 2 Nov 2013 | 6:28:49 UTC Completed and validated 22,149.52 22,125.89 1,500.00 Short runs (2-3 hours on fastest card) v8.14 (cuda55)
trypsin_lig_1355_3x1-NOELIA_RC3run-0-1-RND4109_0 4889060 1 Nov 2013 | 17:10:25 UTC 2 Nov 2013 | 0:19:32 UTC Completed and validated 22,070.07 22,049.27 1,500.00 Short runs (2-3 hours on fastest card) v8.14 (cuda55)
trypsin_lig_458_2x1-NOELIA_RL3run-0-1-RND5559_0 4888617 1 Nov 2013 | 15:50:44 UTC 1 Nov 2013 | 17:11:01 UTC Completed and validated 4,769.82 4,769.82 1,500.00 Short runs (2-3 hours on fastest card) v8.14 (cuda55)
but that credit!
Either it's a new batch with more complicated molecules (8.827 ms vs 1.9 ms per step) and poor credit, or your GPU downclocked, or your system was busy doing other things.
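To put a number on the credit complaint, the run-time and credit figures from the three task rows above can be turned into a credit rate per hour. A quick sketch (the helper `credit_per_hour` is just illustrative, and the short labels stand in for the full task names):

```python
def credit_per_hour(credit: float, run_time_s: float) -> float:
    """Credit earned per hour of run time."""
    return credit / run_time_s * 3600


# (run time in seconds, credit) taken from the three task rows quoted above
tasks = {
    "trypsin_lig_9_4x1 (RC3)": (22149.52, 1500.00),
    "trypsin_lig_1355_3x1 (RC3)": (22070.07, 1500.00),
    "trypsin_lig_458_2x1 (RL3)": (4769.82, 1500.00),
}
for name, (run_time, credit) in tasks.items():
    print(f"{name}: {credit_per_hour(credit, run_time):.0f} credits/hour")
```

The two RC3 tasks come out at a fraction of the hourly rate of the RL3 task, since the credit is the same while the run time is several times longer.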
Noelia, if it's a new batch perhaps you could up the credits for the next batch?
|
|
Werkstatt
Joined: 23 May 09 Posts: 121 Credit: 333,451,807 RAC: 357,200
|
"You had the unfortunate experience of encountering a bad batch of WUs. These failed on everyone's cards."
"Noelia, if it's a new batch perhaps you could up the credits for the next batch?"
Hi skgiven,
Thanks for the reply. WUs failing right after they start is not really a problem for me; I have an internet flat rate. I just wanted to keep the admins informed that there is a problem somewhere, maybe related to the discussion at Einstein about a fault in CUDA 5 that they ran into, which is why they still use CUDA 3.2.
I want to test my new card with different projects. It's a 'Kepler' card driven by the GK208 chip; it adds < 19 W to the power budget and is a passively cooled, single-slot, slim card.
Cheers
Alexander |
|
|
|
Wow, I haven't seen credits this low before, that's harsh O.o
____________
DSKAG Austria Research Team: http://www.research.dskag.at
|
|
|
Werkstatt
Joined: 23 May 09 Posts: 121 Credit: 333,451,807 RAC: 357,200
|
last one was much better ... :)) |
|
|