Message boards : Number crunching : Python app V4.03 fails on windows servers

jjch
Joined: 10 Nov 13
Posts: 101
Credit: 15,583,700,388
RAC: 4,166,616
Message 58882 - Posted: 6 Jun 2022 | 0:42:29 UTC

I'm trying to get a few Windows 2012 servers with 32 GB or more of memory to run the Python app, and they all consistently fail at import cv2:

File "T:\Program Data\BOINC\slots\5\lib\site-packages\pytorchrl\envs\atari\wrappers.py", line 8, in <module>
import cv2
ImportError: DLL load failed while importing cv2: The specified module could not be found.

https://www.gpugrid.net/result.php?resultid=32907162
https://www.gpugrid.net/result.php?resultid=32907179
https://www.gpugrid.net/result.php?resultid=32907165
https://www.gpugrid.net/result.php?resultid=32907038

Is this part of the Python app that is being downloaded and extracted, or is it something else that's missing?

I'm wondering if it could be part of the Microsoft Visual C++ Redistributable that needs to be updated.

These currently have Microsoft Visual C++ Redistributable 2015-2019. Do they need to be updated to 2022?

If anyone has a Windows system that is successfully running Python apps, it would be interesting to see which version of the Visual C++ Redistributable is installed.

Anything else you could think of would be appreciated.
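
One way to narrow down a "DLL load failed" error like the one above is to check which system DLLs cannot be loaded on the server. The sketch below is only an illustration under stated assumptions: the DLL list is a guess (the Visual C++ runtime plus the Media Foundation libraries, which as far as I know are an optional feature on Windows Server and which OpenCV builds commonly depend on), and find_missing_dlls is a hypothetical helper, not part of the GPUGRID app.

```python
import ctypes
import sys

# Assumed candidates, not confirmed for this app: the Visual C++ runtime
# and the Media Foundation DLLs that OpenCV builds commonly link against.
CANDIDATE_DLLS = [
    "vcruntime140.dll",
    "msvcp140.dll",
    "mf.dll",
    "mfplat.dll",
    "mfreadwrite.dll",
]


def find_missing_dlls(names, loader=None):
    """Return the DLL names that the loader fails to load."""
    if loader is None:
        loader = ctypes.WinDLL  # only available on Windows
    missing = []
    for name in names:
        try:
            loader(name)
        except OSError:
            # Windows reports a missing or unloadable DLL as OSError
            missing.append(name)
    return missing


if __name__ == "__main__" and sys.platform == "win32":
    print("Could not load:", find_missing_dlls(CANDIDATE_DLLS) or "none")
```

If the mf*.dll names show up as missing, installing the "Media Foundation" feature through Server Manager would be the first thing to try; if the runtime DLLs are missing, the redistributable is the more likely suspect.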


jjch
Message 58897 - Posted: 10 Jun 2022 | 18:58:08 UTC
Last modified: 10 Jun 2022 | 18:59:01 UTC

Update - Installing the Microsoft Visual C++ 2015-2022 Redistributable on these did not resolve the issue. They are still failing with the same error.

Since it seems that the missing module is not included in the Python app, I will have to try finding it elsewhere and see if it can be installed separately.

Any thoughts on what's needed to resolve the Windows application errors would be appreciated.

Keith Myers
Joined: 13 Dec 17
Posts: 1341
Credit: 7,681,721,308
RAC: 13,169,240
Message 58898 - Posted: 10 Jun 2022 | 19:02:20 UTC - in response to Message 58882.


> If anyone has a Windows system that is successfully running Python apps it would be interesting to see what version of the VS is installed.
>
> Anything else you could think of would be appreciated.

You should shoot Richard Haselgrove a PM and ask how he is able to run the Python tasks in Windows.

Richard knows a ton about Windows and BOINC.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 8,873,012,138
RAC: 19,819,746
Message 58905 - Posted: 11 Jun 2022 | 19:13:04 UTC - in response to Message 58898.

> You should shoot Richard Haselgrove a PM and ask how he is able to run the Python tasks in Windows.
>
> Richard knows a ton about Windows and BOINC.

I can't - I'm only running Python on Linux at the moment.

My Windows machines have good enough GPUs, but they don't have enough system memory. And I haven't been able to get the "guaranteed compatible" RAM upgrade I bought to POST, let alone boot.

It's a low priority project - they can run ACEMD and Einstein for the time being.

Keith Myers
Message 58909 - Posted: 11 Jun 2022 | 20:57:52 UTC

Read this post of mine for help on Windows.

https://www.gpugrid.net/forum_thread.php?id=5322&nowrap=true#58908

Richard Haselgrove
Message 58910 - Posted: 11 Jun 2022 | 21:22:08 UTC - in response to Message 58909.

This one is hardware. The Crucial system scanner detects that the OEM system builder (local gaming specialists) has installed Kingston KHX1866C10D3/4G RAM - note speed 1866.

It goes on to say that a Crucial 8GB DDR3L-1600 UDIMM upgrade is compatible - but I get nada, zilch, on power-up.

Motherboard is a Gigabyte GA-Z97P-D3, again as detected by Crucial (but it corresponds with system documentation and markings on the circuit board).

Keith Myers
Message 58911 - Posted: 11 Jun 2022 | 21:39:42 UTC - in response to Message 58910.

Can you get into the BIOS or is even that not possible?

I'd try setting the memory back down to the base 800 MHz.

Then try putting in the new memory and see if it boots.

If it reads the new memory, try increasing the clock speed.

Richard Haselgrove
Message 58913 - Posted: 12 Jun 2022 | 10:27:52 UTC - in response to Message 58911.

With apologies to the thread starter for taking it off topic:

I can get into the BIOS when the original build RAM (only) is present, but not if the 'compatible' upgrade RAM is present - either in conjunction with the original RAM, or by itself.

The working assumption has to be that the builder tuned the RAM timings for the 1866 RAM, and they would need de-tuning for the new stuff. But I'd need to set that 'flying blind', with only the old RAM installed, and I'd risk bricking the whole setup if I got that wrong. The alternatives could be:

Source some 1866 RAM for the upgrade - but nobody seems to be making/selling it now?

Contact the system builder - they're normally a friendly bunch, but tech support retreated behind a chatbot during the pandemic, and they hadn't emerged last time I looked. (and they're not selling 1866 RAM any more, either)

As I said, it's a low-priority job. If this project (or any project) offers applications which require me to deplete even more of the earth's resources in order to run them - that's their problem, not mine. My machines can carry on running other work, and being available for testing, as they have been since the last major refit.

ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,810,412,024
RAC: 20,644,706
Message 58916 - Posted: 12 Jun 2022 | 17:44:44 UTC - in response to Message 58913.

> With apologies to the thread starter for taking it off topic:

(With apologies for the same reason ;-)

> I can get into the BIOS when the original build RAM (only) is present, but not if the 'compatible' upgrade RAM is present

I'd try updating the BIOS to the latest version for your GA-Z97P-D3 motherboard.
The latest is the F9b beta BIOS, and its version description mentions only "Fix memory compatibility".
Please be sure to check and choose the right update for your motherboard revision:
GA-Z97P-D3 (rev. 1.0)
GA-Z97P-D3 (rev. 1.1)

Richard Haselgrove
Message 58917 - Posted: 12 Jun 2022 | 18:08:56 UTC - in response to Message 58916.

Yup, saw that. I'm on F7 at the moment, which isn't too bad - F8 just improved support for 5th. gen Intel CPUs, which I haven't got.

Had a poke around the BIOS and memory settings today. Biggest warning flag seems to be that the builder's Kingston RAM (it's HyperX Fury) is running at 1.5V, whereas the Crucial is specc'd for 1.35V. It seems to be using SPD, but not XMP, and the configurable settings are all on auto.

I can move things around, so one machine is all Kingston, and another is all Crucial - avoiding mixing seems like a good idea.

No indication via software (or on the invoice) whether it's 1.0 or 1.1 - I'll have to open her up and shine a torch around.

Keith Myers
Message 58918 - Posted: 12 Jun 2022 | 18:33:42 UTC

I'd definitely update the BIOS.

Also I'd try a BIOS reset or clear to get rid of any builder's settings.

Richard Haselgrove
Message 58921 - Posted: 13 Jun 2022 | 15:06:25 UTC

Well, I've got her up on the bench and under an inspection lamp. I was just about to conclude that "if it doesn't say rev 1.1, it must be 1.0" when I spotted the rev 1.1 mark deep in the darkest corner, behind 2 GPUs. Oops.

So, 1.1 it is - and it comes with an autoexec.bat file...

I'll try that at a command prompt under Windows 7, and see how I get on. I've just picked up an ACEMD 3 job, so I'll let that run overnight, and try swapping the RAM round tomorrow.

Richard Haselgrove
Message 58922 - Posted: 13 Jun 2022 | 15:30:32 UTC
Last modified: 13 Jun 2022 | 15:41:48 UTC

Guess what.


OK, that worked. Running the ACEMD now.

Keith Myers
Message 58923 - Posted: 13 Jun 2022 | 16:59:39 UTC

I think all the efiflash utilities are 16-bit and DOS-based.

Put a copy of FreeDOS on a USB stick along with the utility and firmware image and go to it.

Richard Haselgrove
Message 58924 - Posted: 13 Jun 2022 | 17:04:42 UTC - in response to Message 58923.

That's what I did, basically.

There's a Q-Flash utility in the BIOS, which can read a USB drive.

Richard Haselgrove
Message 58925 - Posted: 14 Jun 2022 | 11:56:48 UTC

Yay! Something's worked - probably the BIOS update. I've now got a machine running with twice the memory size - according to BIOS, Windows, and BOINC. It'll be a bit slower, because it's using the 1600 compatible RAM: but that means I've got two spare sticks of 1866 RAM to put in the next machine.

If I can squeeze it past the knuckle-grater of a heat sink. It's fitted with heat-spreaders, so I'll probably have to remove the CPU fan to gain access, and re-position it slightly higher up the heatsink to provide clearance.

Richard Haselgrove
Message 58926 - Posted: 14 Jun 2022 | 14:24:35 UTC

And it seems to be attempting to run. It's got to 2.980%, and

13:38:39 (3360): .\7za.exe exited; CPU time 8.626855
13:38:39 (3360): wrapper: running python.exe (run.py)
Windows fix!!
Created CWorker with worker_index 0
Created GWorker with worker_index 0
Created UWorker with worker_index 0
Created training scheme.
Created Learner.

But that's in 1 hour 50 minutes.

CPU is over-committed (I haven't added Python to the app_config yet, so it's still allocating <1 CPU in BOINC's scheduler. I'll finesse that next.)
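
For anyone unsure what the app_config entry would look like, here is a minimal sketch that generates one. The element names follow BOINC's documented app_config.xml layout, but the app name "PythonGPU" and the cpu_usage value of 3.0 are assumptions used for illustration; check the real app name in client_state.xml and pick a CPU figure that matches what the task actually uses on your host. build_app_config is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, cpu_usage, gpu_usage=1.0):
    """Build app_config.xml content that reserves CPUs for a GPU app."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")


# "PythonGPU" and 3.0 are placeholders, not values confirmed in this thread.
print(build_app_config("PythonGPU", cpu_usage=3.0))
```

The output goes into app_config.xml in the project's folder under the BOINC data directory, and the client picks it up after "Read config files" or a restart.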

We can watch task 32918800 together.

Keith Myers
Message 58927 - Posted: 14 Jun 2022 | 18:28:32 UTC

Progress!

Richard Haselgrove
Message 58928 - Posted: 14 Jun 2022 | 21:37:41 UTC

The plucky little thing has progressed to 19.640% in 9 hours - might even get an intermediate (1-2 day) bonus! It's not brilliant, but for a January 2016 build, with a 4-core, 4th. gen. i5, Windows 7, it's doing its best to keep up. I'm shutting down as much else as I can, and I'll post the virtual memory settings when/if this task succeeds.

Richard Haselgrove
Message 58933 - Posted: 16 Jun 2022 | 11:55:17 UTC

We have a home run for task 32918800

From stderr:
13:38:39 (3360): wrapper: running python.exe (run.py) [Tuesday]
02:40:32 (3360): python.exe exited; CPU time 301601.725332 [Thursday]

So Python ran for a little over 37 hours (about 133,300 seconds of wall-clock time), keeping an average of 2.26 CPU cores busy.
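
The arithmetic behind those figures can be reproduced from the two stderr lines. This is just a worked check, assuming the calendar dates are 14 and 16 June 2022 (inferred from the [Tuesday]/[Thursday] notes and the posting dates) and that both timestamps are in the same time zone.

```python
from datetime import datetime

# Timestamps copied from stderr; the dates are inferred, not logged.
start = datetime(2022, 6, 14, 13, 38, 39)  # wrapper: running python.exe
end = datetime(2022, 6, 16, 2, 40, 32)     # python.exe exited
cpu_seconds = 301601.725332                # "CPU time" from the exit line

elapsed_seconds = (end - start).total_seconds()
elapsed_hours = elapsed_seconds / 3600
avg_cores = cpu_seconds / elapsed_seconds  # CPU time per wall-clock second

print(f"elapsed: {elapsed_seconds:.0f} s ({elapsed_hours:.2f} h)")
print(f"average CPU cores in use: {avg_cores:.2f}")
```

This prints an elapsed time of 133313 s (37.03 h) and an average of 2.26 cores.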

This machine is fitted with a GTX 1660 Ti, the same as Linux host 508381. That host completes the tasks in about 14 hours. So either the operating system or, I suspect, the host CPU makes the Windows machine significantly slower than the Linux one.

The Windows host has a 4-core i5-4690 CPU @ 3.50GHz
The Linux host has a 6-core i5-9600K CPU @ 3.70GHz
Both machines (now!) have 16 GB of RAM

Richard Haselgrove
Message 58934 - Posted: 16 Jun 2022 | 12:03:48 UTC

The memory settings for the run just reported were




C: is a (small - 128 GB) SSD for the OS - quick boot.
D: is a mechanical 1 TB HDD, hosting BOINC and most user data folders.

I hope that gives some useful pointers for other Windows users.

Keith Myers
Message 58935 - Posted: 16 Jun 2022 | 18:27:32 UTC

Great news Richard.

Yes, the size of the custom paging file for the Python GPU tasks is great information for those Windows users still struggling to complete a task.
