Major SNAFU in Effect
| Author | Message |
|---|---|
|
Joined: 3 Sep 13 Posts: 53 Credit: 1,533,531,731 RAC: 0
|
I noticed a ton of errors on a previously 100% reliable host tonight. It looks like a bad batch of WUs got pushed out; both IDP and KIX jobs are affected. IDP: http://www.gpugrid.net/workunit.php?wuid=16483464 http://www.gpugrid.net/workunit.php?wuid=16480175 http://www.gpugrid.net/workunit.php?wuid=16480417 http://www.gpugrid.net/workunit.php?wuid=16453242 KIX: http://www.gpugrid.net/workunit.php?wuid=16483553 http://www.gpugrid.net/workunit.php?wuid=16474311 http://www.gpugrid.net/workunit.php?wuid=16483548 I have 25 bad jobs in total, and they have also failed on numerous other hosts. Edit: I should have said that mine is a Linux host, and I just noticed that most of the other hosts where the work failed are also Linux machines. Team USA forum | Team USA page Join us and #crunchforcures. We are now also folding: join team ID 236370! |
|
Joined: 21 Mar 16 Posts: 513 Credit: 4,673,458,277 RAC: 0
|
|
|
Joined: 28 Mar 09 Posts: 490 Credit: 11,731,645,728 RAC: 57
|
http://www.gpugrid.net/results.php?hostid=490728 Did someone forget to renew a license? |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 731
|
I'm also getting nothing but computation errors on these new tasks. |
|
Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0
|
Same here, of course. But I haven't seen anyone from the project around here for a while. Is anyone at home? |
|
Joined: 17 Feb 09 Posts: 91 Credit: 1,603,303,394 RAC: 0
|
Same here as well: error 212 on WUs that were running fine up to 4-5 hours ago. It sounds like a license issue to me as well. I have suspended the project until the issue is resolved. |
|
Joined: 23 Feb 17 Posts: 21 Credit: 5,528,199,475 RAC: 0
|
I have the same issue on two Linux machines, so I'm not sure this is a license thing. |
|
Joined: 4 Aug 14 Posts: 266 Credit: 2,219,935,054 RAC: 0
|
For the last 2 years, the license error has usually come after July 1st. I am assuming it's a 12-month license. |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 731
|
Every task I had in my cache on 4 hosts errored out today. Since I don't run a very high resource allotment, some tasks had been running for a couple of hours a day with no issues until today. The hosts are processing other projects without any errors during this time. My guess is that a license expired today. |
|
Joined: 7 Jan 17 Posts: 34 Credit: 1,371,429,518 RAC: 0
|
Same. I have two Ubuntu machines that throw up nothing but immediate errors now. My two Windows crunchers are fine, though. |
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
The Linux app is broken (most probably its license has expired). All of my Linux hosts immediately run into this error with every single workunit: `<core_client_version>7.9.3</core_client_version> <![CDATA[ <message> process exited with code 212 (0xd4, -44)</message> <stderr_txt> </stderr_txt> ]]>` However, my Windows hosts are crunching happily, so I have switched my Linux hosts back to Windows. The GPUGrid staff need to act on this without delay. |
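The three numbers in the error line above are one exit status shown three ways. The sketch below assumes the third value is the low byte of the status read as a signed 8-bit integer, which is what the figures are consistent with (212 - 256 = -44); it is an illustration, not BOINC's own code.

```python
def decode_exit_code(code: int) -> tuple[int, str, int]:
    """Return an exit status as decimal, hex, and signed 8-bit value.

    Assumption: the signed value is the low byte interpreted as int8.
    """
    low_byte = code & 0xFF
    signed = low_byte - 256 if low_byte >= 128 else low_byte
    return code, hex(code), signed


if __name__ == "__main__":
    print(decode_exit_code(212))  # (212, '0xd4', -44)
```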
Michael H.W. Weber Joined: 9 Feb 16 Posts: 78 Credit: 656,229,684 RAC: 0
|
Same over here: http://www.gpugrid.net/forum_thread.php?id=4909&nowrap=true#51794 Michael. President of Rechenkraft.net - Germany's first and largest distributed computing organization. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 351
|
The Linux ACEMD v9.19 apps were deployed on 13/14 February 2018, so this may well be a 15-month licence expiry. The Windows v9.22 apps were deployed on 26 July 2018, so with luck we have until late October for those. (Source: the Applications page.) |
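The arithmetic behind that estimate can be checked with a small calendar-month helper. This is a sketch that assumes the 15-month licence term guessed above (not a confirmed figure) and uses the deployment dates from the post; `add_months` is a hypothetical helper, not a project tool.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole calendar months, clamping the day
    to the last day of the target month when needed."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


# Assumption: a 15-month licence term, as guessed in the post above.
LICENCE_MONTHS = 15
deployments = {
    "Linux v9.19": date(2018, 2, 13),
    "Windows v9.22": date(2018, 7, 26),
}

for app, deployed in deployments.items():
    # Linux lands in mid-May 2019, Windows on 26 October 2019.
    print(app, "-> estimated expiry", add_months(deployed, LICENCE_MONTHS))
```

Under that assumption the Windows date matches the "late October" estimate.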
|
Joined: 4 Aug 14 Posts: 266 Credit: 2,219,935,054 RAC: 0
|
A temporary fix for Linux users is to set your system date back 1 year. EDIT: Setting the time back 1 year caused certificate errors with other projects, so I have now set the time back 1 month instead. This seems to work better and has allowed me to start GPUGrid jobs successfully. You may need to stop time-sync services so the system does not reset the clock to the current time. For systemd-based distros (e.g. Ubuntu), `sudo timedatectl set-ntp 0` will turn time sync off. EDIT: You will need to reissue this command and reset the time after each reboot. If this licensing issue persists, I will post a more permanent time-sync fix. This was the temporary fix last year when license issues occurred. |
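Why 1 month works where 1 year failed can be modelled as two constraints on the faked date: it must fall before the licence expiry, and it must not be earlier than the notBefore date of any TLS certificate other projects present (a certificate is rejected as "not yet valid" before that date). In this sketch every date is a hypothetical placeholder, not real project or certificate data, and `rollback_ok` is an illustrative helper.

```python
from datetime import date, timedelta


def rollback_ok(today: date, offset: timedelta, licence_expiry: date,
                cert_not_before: list[date]) -> bool:
    """Check whether a rolled-back clock is before the licence expiry
    while still not earlier than any certificate's notBefore date."""
    fake_today = today - offset
    before_expiry = fake_today < licence_expiry
    certs_valid = all(fake_today >= nb for nb in cert_not_before)
    return before_expiry and certs_valid


# All dates below are hypothetical placeholders.
TODAY = date(2019, 5, 14)
LICENCE_EXPIRY = date(2019, 5, 13)     # assumed expiry
CERT_NOT_BEFORE = [date(2019, 1, 1)]   # assumed certificate start date

for label, offset in [("1 month", timedelta(days=30)),
                      ("1 year", timedelta(days=365))]:
    print(label, rollback_ok(TODAY, offset, LICENCE_EXPIRY, CERT_NOT_BEFORE))
```

With these placeholder dates, the 1-month rollback satisfies both constraints, while the 1-year rollback lands before the certificate's start date.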
|
Joined: 16 Apr 09 Posts: 7 Credit: 3,568,270,438 RAC: 0
|
Is project leadership aware of the licensing expiration? It seems like someone should be keeping a tickler file for this, so that renewals happen before WUs start erroring out. |
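A tickler file can be as simple as a table of expiry dates plus a check that flags anything close to running out. The sketch below is a hypothetical illustration: the expiry dates are estimates from the 15-month guess earlier in the thread, and the 30-day warning window is an arbitrary choice.

```python
from datetime import date


def renewals_due(expiries: dict[str, date], today: date,
                 warn_days: int = 30) -> list[str]:
    """Return apps whose licence expires within warn_days of today.

    Licences that have already expired (negative days left) are included.
    """
    return sorted(app for app, expiry in expiries.items()
                  if (expiry - today).days <= warn_days)


# Estimated expiry dates (assumption: 15-month licence term).
EXPIRIES = {
    "Linux v9.19": date(2019, 5, 13),
    "Windows v9.22": date(2019, 10, 26),
}

if __name__ == "__main__":
    print(renewals_due(EXPIRIES, today=date(2019, 4, 20)))  # ['Linux v9.19']
```

Running such a check daily (for example from cron) would give weeks of warning instead of a wave of failed WUs.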
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
> Is project leadership aware of the licensing expiration?

Apparently not. That's why we have this SNAFU.

> Seems like someone should be keeping a tickler file for this so that renewals could happen before WU's start erroring out.

True. |
|
Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0
|
There wasn't any notification of the pending shutdown of the Quantum Chemistry (CPU) work units either, or of when they might be restarted. I am not sure that there is any project leadership at the moment. |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 731
|
I'm going to just suspend the project on all my hosts. The fact that I have to exclude my Turing cards makes it difficult to work with the project anyway. I'll check back occasionally to see whether a new Linux app with current licensing is available. |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
> Seems like someone should be keeping a tickler file for this so that renewals could happen before WU's start erroring out.

Also in the past, license renewals were not done in time and tasks failed. Too bad, but it really seems that the people at GPUGRID simply forget about these things. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 351
|
Just in case anyone is still wondering: I've been sent WU 16485663. It failed three times on Linux v9.19 hosts and is now running normally under Windows v9.22. That confirms it's an application problem, not a data problem. |
©2025 Universitat Pompeu Fabra