
Message boards : News : Experimental Python tasks (beta)

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Message 55588 - Posted: 13 Oct 2020 | 6:07:19 UTC

I'm creating some experimental tasks for the Python app (made Beta). They are Linux and CUDA specific and serve in preparation for future batches.

They may use a relatively large amount of disk space (order of 1-10 GB) which persists between runs, and is cleared if you reset the project.

rod4x4
Message 55590 - Posted: 13 Oct 2020 | 7:44:18 UTC - in response to Message 55588.
Last modified: 13 Oct 2020 | 8:24:54 UTC

I'm creating some experimental tasks for the Python app (made Beta). They are Linux and CUDA specific and serve in preparation for future batches.

They may use a relatively large amount of disk space (order of 1-10 GB) which persists between runs, and is cleared if you reset the project.



Preference Ticked, ready and waiting...

EDIT: Received some already
https://www.gpugrid.net/result.php?resultid=29466771
https://www.gpugrid.net/result.php?resultid=29466770

Conda warnings reported. Will you push out an update to the app, or are they safe to ignore?

Also Warnings about path not found:
WARNING conda.core.envs_manager:register_env(50): Unable to register environment. Path not writable or missing. environment location: /var/lib/boinc-client/projects/www.gpugrid.net/miniconda registry file: /root/.conda/environments.txt

The registry file location ( /root/ ) will not be accessible to the boinc user unless conda is already installed on the host (by the root user) and the conda file is world-readable.

Otherwise the task status is Completed and Validated

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Message 55591 - Posted: 13 Oct 2020 | 9:25:38 UTC - in response to Message 55590.

Looks harmless, thanks for reporting. It's because the "boinc" user doesn't have a HOME directory I think.

rod4x4
Message 55592 - Posted: 13 Oct 2020 | 11:14:14 UTC - in response to Message 55591.
Last modified: 13 Oct 2020 | 11:17:49 UTC

Looks harmless, thanks for reporting. It's because the "boinc" user doesn't have a HOME directory I think.


Agreed

Perhaps adding a "./envs" prefix option to the end of the command:

/var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/conda install

may help with setting up the environment.

This option should place the environment files in the current directory from which the command is executed.
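As a rough sketch of that suggestion (the --prefix form, the environment path and the package name are assumptions for illustration, not the project's actual command):

# Hypothetical sketch: point the bundled conda at a prefix inside the working directory,
# so the environment lives with the task rather than under /root or $HOME.
import subprocess

CONDA = "/var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/conda"

subprocess.run([CONDA, "create", "--yes", "--prefix", "./envs", "python=3.8"], check=True)
subprocess.run([CONDA, "install", "--yes", "--prefix", "./envs", "numpy"], check=True)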

Keith Myers
Message 55724 - Posted: 12 Nov 2020 | 1:59:01 UTC

I got one of these tasks which confused me as I have not set "accept beta applications" in my project preferences.

Failed after 1200 seconds.

Any idea why I got this task even when I have not accepted the app through beta settings?

https://www.gpugrid.net/result.php?resultid=30508976

Ian&Steve C.
Message 55920 - Posted: 9 Dec 2020 | 19:42:43 UTC

What is the difference between these test Python apps and the standard one? Is it just that this application is coded in Python? What language are the default apps coded in?
____________

Keith Myers
Message 55926 - Posted: 9 Dec 2020 | 23:40:13 UTC - in response to Message 55920.

Both apps are wrappered. One is the stock acemd3 and I assume is written in some form of C.

The new Anaconda Python task is a conda application. And Python.

I think Toni is going to have to explain what these new tasks are and how the new application works.

Very strange behavior. I think the conda and python parts run first and communicate with the project doing some intermediary calculation/configuration/formatting or something. Lots of upstream network activity and nothing going on in the client transfers screen.

I saw the tasks get to 100% progress and no time remaining and then stall out. No upload of the finished task.

Looked away from the machine and looked again and now both tasks have reset their progress and now have 3 hours to run.

I first saw conda show up in the process list; now that has disappeared, replaced by an acemd3 and a python process for each task.

Must be doing something other than insta-failing like the previous tries did.

sph
Message 55933 - Posted: 10 Dec 2020 | 5:22:30 UTC

CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://conda.anaconda.org/conda-forge/linux-64/_libgcc_mutex-0.1-conda_forge.tar.bz2>
Elapsed: -

An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.


I am receiving this error in STDerr Output for Experimental Python tasks on all my hosts.

This is probably due to the fact all my PCs are behind a proxy. Can you please set the Python tasks to use the Proxy defined in the Boinc Client?

Work Units here:
https://www.gpugrid.net/result.php?resultid=31672354
https://www.gpugrid.net/result.php?resultid=31668427
https://www.gpugrid.net/result.php?resultid=31665961

Keith Myers
Message 55936 - Posted: 10 Dec 2020 | 8:30:18 UTC

Boy, mixing both regular acemd3 and the python anaconda tasks sure F*s up the APR for both tasks. The insanely low APR for the Python tasks is forcing all GPUGrid tasks into High Priority.

The regular acemd3 tasks are getting 3-6 day estimated completions.

Ian&Steve C.
Message 55945 - Posted: 10 Dec 2020 | 15:25:26 UTC - in response to Message 55936.
Last modified: 10 Dec 2020 | 15:41:38 UTC

Boy, mixing both regular acemd3 and the python anaconda tasks sure F*s up the APR for both tasks. The insanely low APR for the Python tasks is forcing all GPUGrid tasks into High Priority.

The regular acemd3 tasks are getting 3-6 day estimated completions.


I'm seeing that too lol. but it doesnt seem to be causing too much trouble for me since I don't run more than one GPU project concurrently. Only have Prime and backup.

Copying my message from another thread with my observations about these tasks, for Toni to see in case he doesn't check the other threads:

Looks like I have 11 successful tasks, and 2 failures.

the two failures both failed with "196 (0xc4) EXIT_DISK_LIMIT_EXCEEDED" after a few mins and on different hosts.
https://www.gpugrid.net/result.php?resultid=31680145
https://www.gpugrid.net/result.php?resultid=31678136

curious, since both systems have plenty of free space, and I've allowed BOINC to use 90% of it.

these tasks also have much different behavior compared to the default new version acemd tasks. and they don't seem well optimized yet.
-less reliance on PCIe bandwidth, seeing 2-8% PCIe 3.0 bus utilization
-more reliance on GPU VRAM, seeing 2-3GB memory used
-less GPU utilization, seeing 65-85% GPU utilization. (maybe more dependent on a fast CPU/mem subsystem. my 3900X system gets better GPU% than my slower EPYC systems)

contrast that with the default acemd3 tasks:
-25-50% PCIe 3.0 bus utilization
-about 500MB GPU VRAM used
-95+% GPU utilization

Thinking about the GPU utilization being dependent on CPU speed: it could also have to do with the relative speed between the GPU and the CPU. Just something I observed on my systems; slower GPUs seem to tolerate slower CPUs better, which makes sense if the CPU speed is a limiting factor.

Ryzen 3900X @4.20GHz w/ 2080ti = 85% GPU Utilization
EPYC 7402P @3.30GHz w/ 2080ti = 65% GPU Utilization
EPYC 7402P @3.30GHz w/ 2070 = 76% GPU Utilization
EPYC 7642 @2.80GHz w/ 1660Super = 71% GPU Utilization

needs more optimization IMO. the default app sees much better performance keeping the GPU fully loaded.

____________

Richard Haselgrove
Message 55946 - Posted: 10 Dec 2020 | 16:03:34 UTC - in response to Message 55936.

Boy, mixing both regular acemd3 and the python anaconda tasks sure F*s up the APR for both tasks. The insanely low APR for the Python tasks is forcing all GPUGrid tasks into High Priority.

The regular acemd3 tasks are getting 3-6 day estimated completions.

Actually, that won't be the cause. The APRs are kept separately for each application, and once you have an 'active' APR (11 or more 'completions' - validated tasks for that app), they should keep out of each other's way.

What will F* things up is that this project still allows DCF to run free - and that's a single value which is applied to both task types.

Keith Myers
Message 55947 - Posted: 10 Dec 2020 | 16:07:55 UTC - in response to Message 55946.

Yeah, after I wrote that I realized I meant the DCF is what is messing up the runtime estimations.

I wonder if the regular acemd3 tasks will ever get their DCFs back to normal.

I haven't run ANY of my other GPU project tasks since these Anaconda Python tasks showed up. I will eventually, when the other projects' deadlines approach, of course.

Ian&Steve C.
Message 55948 - Posted: 10 Dec 2020 | 16:09:51 UTC - in response to Message 55946.

what's DCF?
____________

Keith Myers
Message 55949 - Posted: 10 Dec 2020 | 16:29:16 UTC - in response to Message 55948.

what's DCF?

Task Duration Correction Factor.
The older BOINC server versions use it like Einstein.
It messes up gpu tasks of different apps there too.

Richard Haselgrove
Message 55951 - Posted: 10 Dec 2020 | 17:11:20 UTC - in response to Message 55947.

You can't talk about 'their DCFs' - there is only one (there could have been more than one, but that's the way David chose to play it)

You can see it in BOINC Manager, on the Projects | Properties dialog. If it gets really, really high (above 90), it'll inch downwards at 1% per task. Below 90, it'll speed up to 10% per task. The standard advice used to be "two weeks to stabilise", but with modern machines (multi-core, multi-GPU, and faster), the tasks fly by, and it should be quicker.
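For illustration, a rough Python sketch of the behaviour described above (a paraphrase of this post, not the actual client code; the rule that DCF jumps straight up when a task overruns is an assumption):

# Rough sketch only - illustrates the description above, not the real BOINC client logic.
def next_dcf(dcf, ratio):
    """ratio = actual runtime / raw estimate (rsc_fpops_est / device speed)."""
    if ratio > dcf:
        return ratio                        # assumption: underestimates pull DCF up quickly
    step = 0.01 if dcf > 90 else 0.10       # "1% per task" above 90, "10% per task" below
    return dcf + step * (ratio - dcf)

dcf = 100.0
for _ in range(50):                         # fifty accurately-estimated tasks in a row
    dcf = next_dcf(dcf, 1.0)
print(round(dcf, 1))                        # creeps down slowly above 90, then converges faster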

Keith Myers
Message 55953 - Posted: 10 Dec 2020 | 17:28:15 UTC - in response to Message 55951.

What is also messed up is the size of the Anaconda Python task estimated computation size shown in the task properties.

The ones I crunched were only set for 3,000 GFLOPS.

The regular acemd3 tasks are set for 5,000,000 GFLOPS.

This also probably influenced the wildly inaccurate DCFs for the new Python tasks.
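For a sense of scale (rounded numbers; the GPU speed is an assumed figure), the client's raw runtime estimate is roughly <rsc_fpops_est> divided by the device's estimated speed:

# Rough illustration of why a 3,000 GFLOP size estimate wrecks the runtime predictions.
rsc_fpops_est_python = 3_000 * 1e9        # 3e12 operations, per the task properties above
rsc_fpops_est_acemd3 = 5_000_000 * 1e9    # 5e15 operations for the regular tasks
device_speed = 14e12                      # assumed ~14 TFLOPS (roughly a 2080 Ti)

print(rsc_fpops_est_python / device_speed)   # ~0.2 s raw estimate for a Python task
print(rsc_fpops_est_acemd3 / device_speed)   # ~360 s raw estimate for an acemd3 task
# A Python task that actually runs for hours then drags the shared DCF sky-high,
# which in turn inflates the acemd3 estimates to days.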

Keith Myers
Message 55954 - Posted: 10 Dec 2020 | 17:33:17 UTC - in response to Message 55951.

You can't talk about 'their DCFs' - there is only one (there could have been more than one, but that's the way David chose to play it)

You can see it in BOINC Manager, on the Projects|properties dialog. If it gets really, really high (above 90), it'll inch downwards at 1% per task. Below 90, it'll speed up to 10% par task. The standard advice used to be "two weeks to stabilise", but with modern machines (multi-core, multi-GPU, and faster), the tasks fly by, and it should be quicker.

This daily driver has GPUGrid DCF Project properties currently at 85 and change.

Ian&Steve C.
Message 55955 - Posted: 10 Dec 2020 | 17:33:49 UTC - in response to Message 55953.
Last modified: 10 Dec 2020 | 17:37:11 UTC

What is also messed up is the size of the Anaconda Python task estimated computation size shown in the task properties.

The ones I crunched were only set for 3,000 GFLOPS.

The regular acemd3 tasks are set for 5,000,000 GFLOPS.

This also probably influenced the wildly inaccurate DCF's for the new python tasks.

can confirm.

could this be why the credit reward is so high too?

I wonder what the flop estimate was on this one from Kevvy:
https://www.gpugrid.net/result.php?resultid=31679003
he got wrecked on this one, over 5hrs on a 2080ti, and got a mere 20 credits lol.
____________

biodoc
Message 55956 - Posted: 10 Dec 2020 | 18:20:14 UTC

I've got one running now on an RTX 2070S and the only real issue is low GPU utilization (60-70%). The current task is using ~2 GB of VRAM and ~3 GB of system RAM. I have one thread free on a Ryzen 3900X to support the GPU and that thread is running at 100%. This computer has completed 3 of the new Python tasks successfully.

Linux Mint 20; Driver Version: 440.95.01; CUDA Version: 10.2

Ian&Steve C.
Message 55957 - Posted: 10 Dec 2020 | 18:25:08 UTC - in response to Message 55956.

I've got one running now on an RTX 2070S and the only real issue is low GPU utilization (60-70%). The current task is using ~2 GB of VRAM and ~3 GB of system RAM. I have one thread free on a ryzen 3900X to support the GPU and that thread is running at 100%. This computer has complete 3 of the new python tasks successfully.

Linux Mint 20; Driver Version: 440.95.01; CUDA Version: 10.2


what kind of BOINC install do you have? does it run as a service? or a standalone install that runs from an executable?

what is the clock speed of your 3900X and memory speed as well?

Try leaving 2 threads free (so you have one doing nothing) to avoid maxing out the CPU at 100% utilization on all threads; that is known to slow down GPU work. This might increase your GPU utilization a bit.

____________

Keith Myers
Message 55958 - Posted: 10 Dec 2020 | 19:05:05 UTC - in response to Message 55955.

There's an explanation for 20 credit tasks over at Rosetta.
Has to do with a task being interrupted in calculation and restarted if I remember correctly.

Keith Myers
Message 55959 - Posted: 10 Dec 2020 | 19:07:58 UTC - in response to Message 55957.
Last modified: 10 Dec 2020 | 19:15:47 UTC

what kind of BOINC install do you have? does it run as a service? or a standalone install that runs from an executable?


That was one of the questions I wanted to ask Mr. Kevvy, since he seems to be the first cruncher to successfully crunch a ton of them without errors.

I wondered if his BOINC was a service install or a standalone.

[Edit] OK, so Mr. Kevvy is still using the AIO. I wondered since a lot of our team seem to have dropped the AIO and gone back to the service install.

So, then likely the main difference is that Mr. Kevvy is using the older glibc 2.29 instead of the glibc 2.31 that we Ubuntu 20 users are running.

Ian&Steve C.
Message 55961 - Posted: 10 Dec 2020 | 19:17:41 UTC - in response to Message 55959.

I'm almost positive he's running a standalone install.
____________

biodoc
Message 55962 - Posted: 10 Dec 2020 | 19:28:54 UTC - in response to Message 55957.

I've got one running now on an RTX 2070S and the only real issue is low GPU utilization (60-70%). The current task is using ~2 GB of VRAM and ~3 GB of system RAM. I have one thread free on a ryzen 3900X to support the GPU and that thread is running at 100%. This computer has complete 3 of the new python tasks successfully.

Linux Mint 20; Driver Version: 440.95.01; CUDA Version: 10.2


what kind of BOINC install do you have? does it run as a service? or a standalone install that runs from an executable?

what is the clock speed of your 3900X and memory speed as well?

try letting there be 2 spare free threads (so you have one doing nothing) to avoid maxing out the CPU to 100% utilization on all threads. this is known to slow down GPU work. this might increase your GPU utilization a bit.


Boinc runs as a service and was installed from the Mint repository (version 17.16.6). The CPU clock speed is 3.9 GHz and the RAM is DDR4 3200 CL16. I did free up another thread but I didn't see an obvious difference in GPU utilization.

Ian&Steve C.
Message 55963 - Posted: 10 Dec 2020 | 19:31:00 UTC - in response to Message 55959.
Last modified: 10 Dec 2020 | 19:39:47 UTC

So, then likely the main difference is that Mr. Kevvy is using the older glibc 2.29 instead of the glibc 2.31 that we Ubuntu 20 users are running.


difference in what sense?

you and I both have glibc 2.31 and we both have a bunch of successful completions. looks like Kevvy's Ubuntu 20 systems also have 2.31. all of us with these Ubuntu 20.04 systems have successful completions.

but of all of his Linux Mint (based on Ubuntu 19) systems, none have completed a single Python task successfully. I'm not sure if it's a problem with Linux Mint or what. I'm not sure its necessarily anything to do with the GLIBC since his error messages are varied, and none mention GLIBC as being the cause. It could just be that the app has some bugs to work out when running in different environments. I also don't know if he's using service installs on his Mint systems, he's got a lot of different BOINC versions across all his systems.
____________

Ian&Steve C.
Message 55964 - Posted: 10 Dec 2020 | 19:36:51 UTC - in response to Message 55962.

Boinc runs as a service and was installed from the Mint repository (version 17.16.6). The CPU clock speed is 3.9 GHz and the RAM is DDR4 3200 CL16. I did free up another thread but I didn't see an obvious difference in GPU utilization.


thanks for the clarification. it was worth a shot on the GPU utilization with the free thread, low hanging fruit.

I run my memory at 3600 CL14, but I've never seen memory matter that much even for CPU tasks on other projects, let alone GPU tasks. (I saw no difference when changing from 3200CL16 to 3600CL14), but anything's possible I guess.

____________

biodoc
Message 55965 - Posted: 10 Dec 2020 | 19:44:14 UTC - in response to Message 55963.

So, then likely the main difference is that Mr. Kevvy is using the older glibc 2.29 instead of the glibc 2.31 that we Ubuntu 20 users are running.


difference in what sense?

you and I both have glibc 2.31 and we both have a bunch of successful completions. looks like Kevvy's Ubuntu 20 systems also have 2.31. all of us with these Ubuntu 20.04 systems have successful completions.

but of all of his Linux Mint (based on Ubuntu 19) systems, none have completed a single Python task successfully. I'm not sure if it's a problem with Linux Mint or what. I'm not sure its necessarily anything to do with the GLIBC since his error messages are varied, and none mention GLIBC as being the cause. It could just be that the app has some bugs to work out when running in different environments.


Mint 20 is based on Ubuntu 20.04 and has glibc 2.31. The 2 computers I have running GPUGrid have Mint 20 installed and the RTX cards on those computers are completing the new python tasks successfully.

Ian&Steve C.
Message 55966 - Posted: 10 Dec 2020 | 20:01:37 UTC - in response to Message 55965.
Last modified: 10 Dec 2020 | 20:02:13 UTC

Mint 20 is based on Ubuntu 20.04 and has glibc 2.31. The 2 computers I have running GPUGrid have Mint 20 installed and the RTX cards on those computers are completing the new python tasks successfully.


Yes, I know. But my point was that there are many differences between Mint 19 and 20, not just the GLIBC version, and usually when GLIBC is an issue it shows up as the reason for the error in the task results, but that hasn't been the case.

and conversely we have several examples of tasks hitting Ubuntu 20.04 systems with GLIBC of 2.31 and they still fail.

I think it's just buggy.
____________

Keith Myers
Message 55969 - Posted: 10 Dec 2020 | 22:05:44 UTC - in response to Message 55966.

Yes, I had over a half dozen failed tasks before the first successful task.
That's why I was wondering if the failed tasks report the failed configuration upstream and change the future task configuration.

Pretty sure lots of prerequisite software is downloaded first from conda and configured on the system before finally actually starting real crunching.

And the configuration downloads happen for each task I think.

Not just some initial download with all the files static from then on.

Ian&Steve C.
Message 55996 - Posted: 12 Dec 2020 | 20:10:00 UTC

FYI, these tasks don't checkpoint properly.

if you need to stop BOINC or the system experiences a power outage, the tasks restart from the beginning (10%) but the task timer still tracks from where it left off even though the task restarted. if the tasks were short like MDAD (but MDAD checkpoints properly) it wouldn't be a huge problem. but when they run for 4-5hrs and need to start over for any interruption, it's a bit of a kick in the pants. even worse when these restarted tasks only get 20cred for up to 2x total run time. not worth finishing it at that point.

additionally, as has been mentioned in the other thread, these tasks wreak havoc on the system's DCF since it seems to be set incorrectly for these tasks. you get these tasks that make BOINC think they will complete in 10 seconds, and they end up taking 4hrs, so BOINC counters by inflating the run time of normal tasks to 10+ days when they only take 20-40 min lol. and it swings wildly back and forth depending how many of each type you've completed.

and credit reward, other than being about 10x normal for tasks of this runtime, seems only tied to FLOPS and runtime without accounting for efficiency at all.

my 3900X/2080ti completes tasks on average much faster than my EPYC/2080ti system, since the 3900X system runs at higher GPU utilization allowing faster run times. but the 3900X system earns proportionally less credit, so both systems end up earning the same amount of credit per card. the 3900X/2080ti should be earning more credit since it's doing more tasks. reward is being overinflated for tasks that have longer run times due to inefficiency; it seems tied only to raw runtime and estimated flops. I understand that tasks can have varying run times, but if you won't account for efficiency you need to have a static reward not dependent on runtime at all. for reference, a static reward of about 175,000 would, on average, bring these tasks near the MDAD for cred/unit-time.
____________

Greger
Message 55997 - Posted: 12 Dec 2020 | 22:03:58 UTC
Last modified: 12 Dec 2020 | 22:06:56 UTC

My host switched to another project's task, then resumed, and after a while I had to update the system and restart. It did indeed fail to resume from its last state, so it looks like the checkpoint was far behind, or there was no checkpoint at all. The elapsed time stayed at around 2 hours, which was hours behind, and the estimated percentage was locked at 10%.

I aborted it the next day as it reached 14 hours.

https://www.gpugrid.net/result.php?resultid=31701824

I would expect it not to be fully working yet, with checkpointing added later on. There is a lot of testing going on here but still little information for us, so we need to take it for what it is and deal with it if the tasks don't work.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Message 56007 - Posted: 15 Dec 2020 | 15:52:05 UTC - in response to Message 55997.

The Python app runs ACEMD, but uses additional libraries to compute additional force terms. These libraries are distributed as Conda (Python) packages.

For this to work, I had to make an App which installs a self-contained Conda install in the project dir. The installation is re-used from one run to the other.

This is rather finicky (for example, downloads are large, and I have to be careful with concurrent installs).

Two outstanding issues are over-crediting (I am using some default BOINC formula) and, as far as I understand, the flops estimate (?).
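Purely as an illustrative sketch of the "install once, reuse, and guard against concurrent installs" pattern described above (the file names, installer invocation and lock file are assumptions, not the project's actual wrapper code):

# Illustrative sketch only - not the project's actual wrapper.
import os, subprocess, fcntl

MINICONDA_DIR = "miniconda"                        # persistent dir inside the project directory
INSTALLER = "Miniconda3-latest-Linux-x86_64.sh"    # assumed installer file name

with open("install.lock", "w") as lock:            # hypothetical lock file
    fcntl.flock(lock, fcntl.LOCK_EX)               # serialise tasks that start at the same time
    if not os.path.isdir(MINICONDA_DIR):           # only the first run pays for the big install
        subprocess.run(["bash", INSTALLER, "-b", "-p", MINICONDA_DIR], check=True)
    # later runs fall through here and simply reuse the existing install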

zombie67 [MM]
Message 56008 - Posted: 15 Dec 2020 | 17:04:43 UTC - in response to Message 56007.

Two outstanding issues are over-crediting (I am using some default BOINC formula) and, as far as i understand, the flops estimate (?).


Over-crediting? I am seeing the opposite problem.

https://www.gpugrid.net/result.php?resultid=31902208

20.83 credits for 4.5 hours of run time on an RTX 2080 Ti. That is practically nothing. And this is not a one-off. All my tasks so far are similar.
____________
Reno, NV
Team: SETI.USA

Richard Haselgrove
Message 56009 - Posted: 15 Dec 2020 | 17:14:18 UTC - in response to Message 56007.

Thanks for the details.

The flops estimate

Yes, the "size" of the tasks, as expressed by <rsc_fpops_est> in the workunit template. The current value is 3,000 GFLOPS: all other GPUGrid task types are are 5,000,000 GFLOPS.

An App which installs a self-contained Conda install

We are encountering an unfortunate clash with the security of BOINC running as a systemd service under Linux. Useful bits of BOINC (pausing computation when the computer's user is active on the mouse or keyboard) rely on having access to the public /tmp/ folder structure. The conda installer wants to make use of a temporary folder.

systemd allows us to have either public tmp folders (read only, for security), or private tmp folders (write access). But not both at the same time. We're exploring how to get the best of both worlds...

Discussions in
https://www.gpugrid.net/forum_thread.php?id=5204
https://github.com/BOINC/boinc/issues/4125

over-crediting

We're enjoying it while it lasts!

Richard Haselgrove
Message 56010 - Posted: 15 Dec 2020 | 17:19:51 UTC - in response to Message 56008.

Over-crediting?

OK, make that 'inconsistent crediting'. Mine are all in the 600,000 - 900,000 range, for much the same runtime on a 1660 Ti.

Host 508381

Ian&Steve C.
Message 56011 - Posted: 15 Dec 2020 | 17:50:02 UTC - in response to Message 56010.
Last modified: 15 Dec 2020 | 18:13:03 UTC

Over-crediting?

OK, make that 'inconsistent crediting'. Mine are all in the 600,000 - 900,000 range, for much the same runtime on a 1660 Ti.

Host 508381


the 20 credits thing seems to only happen with restarted tasks from what I've seen. not sure if anything else triggers it.

but I can say with certainty that the credit allocation is "questionable", and only appears to be related to the flops of device 0 in BOINC, as well as runtime. slow devices masked behind a fast device0 will earn credit at the rate of the faster device...
____________

Ian&Steve C.
Message 56012 - Posted: 15 Dec 2020 | 17:53:40 UTC - in response to Message 56008.

Two outstanding issues are over-crediting (I am using some default BOINC formula) and, as far as i understand, the flops estimate (?).


Over-crediting? I am seeing the opposite problem.

https://www.gpugrid.net/result.php?resultid=31902208

20.83 credits for 4.5 hours of run time on an RTX 2080 Ti. That is practically nothing. And this is not a one-off. All my tasks so far are similar.


this happens when the task is interrupted - started and resumed. you can't interrupt these tasks at all.
____________

Richard Haselgrove
Message 56013 - Posted: 15 Dec 2020 | 18:09:15 UTC

We should perhaps mention the lack of effective checkpointing while we have Toni's attention. Even though the tasks claim to checkpoint every 0.9% (after the initial 10% allowed for the setup), the apps are unable to resume from the point previously reached.

zombie67 [MM]
Message 56014 - Posted: 15 Dec 2020 | 18:22:52 UTC - in response to Message 56012.

Over-crediting? I am seeing the opposite problem.

https://www.gpugrid.net/result.php?resultid=31902208

20.83 credits for 4.5 hours of run time on an RTX 2080 Ti. That is practically nothing. And this is not a one-off. All my tasks so far are similar.


this happens when the task is interrupted. started and resumed. you can't interrupt these tasks at all.


I'll check that out. But I have not suspended or otherwise interrupted any tasks. Unless BOINC is doing that without my knowledge. But I don't think so.
____________
Reno, NV
Team: SETI.USA

Ian&Steve C.
Message 56015 - Posted: 15 Dec 2020 | 18:28:46 UTC - in response to Message 56014.

Over-crediting? I am seeing the opposite problem.

https://www.gpugrid.net/result.php?resultid=31902208

20.83 credits for 4.5 hours of run time on an RTX 2080 Ti. That is practically nothing. And this is not a one-off. All my tasks so far are similar.


this happens when the task is interrupted. started and resumed. you can't interrupt these tasks at all.


I'll check that out. But I have not suspended or otherwise interrupted any tasks. Unless BOINC is doing that without my knowledge. But I don't think so.


you also appear to have your hosts setup to ONLY crunch these beta tasks. is there a reason for that?

does your system process the normal tasks fine? maybe it's something going on with your system as a whole.

____________

zombie67 [MM]
Message 56016 - Posted: 15 Dec 2020 | 18:57:24 UTC - in response to Message 56015.

you also appear to have your hosts setup to ONLY crunch these beta tasks. is there a reason for that?

I have reached my wuprop goals for the other apps. So I am interested in only this particular app (for now).

does your system process the normal tasks fine? maybe it's something going on with your system as a whole.

Yep, all the other apps run fine, both here and on other projects.
____________
Reno, NV
Team: SETI.USA

Ian&Steve C.
Message 56017 - Posted: 15 Dec 2020 | 20:40:18 UTC - in response to Message 56016.
Last modified: 15 Dec 2020 | 21:09:19 UTC

you also appear to have your hosts setup to ONLY crunch these beta tasks. is there a reason for that?

I have reached my wuprop goals for the other apps. So I am interested in only this particular app (for now).

does your system process the normal tasks fine? maybe it's something going on with your system as a whole.

Yep, all the other apps run fine, both here and on other projects.


I have a theory, but not sure if it's correct or not.

can you tell me the peak_flops value reported in your coproc_info.xml file for the 2080ti?

basically, you are using such an old version of BOINC (7.9.3) that it pre-dates the fixes implemented in 7.14.2 to properly calculate the peak flops of Turing cards. So I'm willing to bet that your version of BOINC is over-estimating your peak flops by a factor of 2: a 2080ti should read somewhere between 13.5 and 15 TFlops, and I'm guessing your old version of BOINC thinks it's closer to double that (25-30 TFlops).

the second half of the theory is that there is some kind of hard limit (maybe an anti-cheat mechanism?) that prevents a credit reward above somewhere around 2,000,000. maybe 1.8 million, maybe 1.9 million? but I haven't observed ANYONE getting a task earning that much, and all tasks that would reach that level based on runtime seem to get this 20-credit value.

that's my theory, I could be wrong. if you try a newer version of BOINC that properly measures the flops on a Turing card, and you start getting real credit, then it might hold water.
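Rough numbers behind that theory (all figures approximate, and the doubling factor is exactly the assumption being tested):

# Back-of-envelope version of the theory above - approximate figures, not measurements.
turing_2080ti_flops = 14e12                 # what a fixed (7.14.2+) client reports, ~14 TFLOPS
old_client_flops = 2 * turing_2080ti_flops  # ~28 TFLOPS if the pre-fix client doubles it

# Runtime-based credit claims scale with the reported peak flops, so the same 4-5 hour task
# claims roughly twice the credit on the old client - enough, per this theory, to cross a
# suspected ~1.9-2 million cutoff and fall back to the 20.83-credit award.
print(old_client_flops / 1e12, "TFLOPS")    # 28.0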
____________

sph
Message 56018 - Posted: 15 Dec 2020 | 23:13:08 UTC - in response to Message 56007.
Last modified: 15 Dec 2020 | 23:15:51 UTC

Two outstanding issues are over-crediting (I am using some default BOINC formula) and, as far as i understand, the flops estimate (?).


Toni, One more issue to add to the list.

The download from the Anaconda website does not work for hosts behind a proxy. Can you please make the tasks pick up the proxy settings from the BOINC client so the external software can be downloaded?
I have other hosts that are not behind a proxy and they download and run the Experimental tasks fine.

Issue here:
CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://conda.anaconda.org/conda-forge/linux-64/_libgcc_mutex-0.1-conda_forge.tar.bz2>
Elapsed: -

An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.

This error repeats itself until it eventually gives up after 5 minutes and fails the task.

Happens on 2 hosts sitting behind a Web Proxy (Squid)
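One possible interim direction, assuming conda honours the standard proxy environment variables (the value shown is a placeholder, and how the task would read the client's proxy settings is an assumption):

# Sketch of what the task setup could do before invoking conda.
import os

proxy = "http://my-squid-host:3128"      # placeholder - would come from the BOINC client's proxy settings
os.environ["HTTP_PROXY"] = proxy
os.environ["HTTPS_PROXY"] = proxy
# conda launched from this environment should then route its package downloads through the proxy.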

zombie67 [MM]
Message 56019 - Posted: 16 Dec 2020 | 1:19:31 UTC - in response to Message 56017.

A second, identical machine, except it has dual RTX 1660 Ti cards, finally got some work. The tasks reported and were awarded the large credits. So that rules out the question WRT BOINC version. FWIW, that version of BOINC is the latest available from the repository.

So maybe it is due to interruptions after all, and I am just unaware? I am running some more tasks now, and will check again in the morning.
____________
Reno, NV
Team: SETI.USA

Ian&Steve C.
Message 56020 - Posted: 16 Dec 2020 | 2:57:24 UTC - in response to Message 56019.
Last modified: 16 Dec 2020 | 3:01:29 UTC

A second, identical machine, except it has dual RTX 1660 Ti cards, finally got some work. The tasks reported and were awarded the large credits. So that rules out the question WRT BOINC version. FWIW, that version of BOINC is the latest available from the repository.

So maybe it is due to interruptions after all, and I am just unaware? I am running some more tasks now, and will check again in the morning.


it doesn't rule it out, because a 1660ti has a much lower flops value, like 5.5 TFlop. so with the old BOINC version it's estimating ~11 TFlop, and that's not high enough to trigger the issue. you're only seeing it on the 2080ti because it's a much higher performing card: ~14 TFlop by default, and the old BOINC version is scaling it all the way up to 28+ TFlop. this causes the calculated credit to be MUCH higher than that of the 1660ti, and hence triggers the 20-cred issue, according to my theory of course. but your 1660ti tasks are well below the 2,000,000 credit threshold that I'm estimating. the highest I've seen is ~1.7 million, so the line can't be much higher. I'm willing to bet that if one of your tasks on that 1660ti system runs for ~30,000-40,000 seconds, it gets hit with 20 credits. ¯\_(ツ)_/¯

you really should try to get your hands on a newer version of BOINC. I use a version of BOINC that was compiled custom, and have usually used custom compiled versions from newer versions of the source code. maybe one of the other guys here can point you to a different repository that has a newer version of BOINC that can properly manage the Turing cards.
____________

Ian&Steve C.
Message 56021 - Posted: 16 Dec 2020 | 3:13:29 UTC - in response to Message 56020.

I also verified that restarting ALONE won't necessarily trigger the 20-credit reward.

it depends WHEN you restart it. if you restart the task early enough that the combined runtime won't come close to the 2 million credit mark, you'll get the normal points.

this task here: https://www.gpugrid.net/result.php?resultid=31934720

I restarted this task about 10-15mins into it. and it started over from the 10% mark, ran to completion, and still got normal crediting. and well below the threshold.
____________

Ian&Steve C.
Message 56023 - Posted: 16 Dec 2020 | 14:36:25 UTC - in response to Message 56019.

A second, identical machine, except it has dual RTX 1660 Ti cards, finally got some work. The tasks reported and were awarded the large credits. So that rules out the question WRT BOINC version. FWIW, that version of BOINC is the latest available from the repository.

So maybe it is due to interruptions after all, and I am just unaware? I am running some more tasks now, and will check again in the morning.


I see you changed BOINC to 7.17.0.

another thing I noticed is that the change didn't take effect until new tasks were downloaded after it, so tasks that were already there and tagged with the overinflated flops value will probably still get 20 credits. only the tasks downloaded after the change should work better.

____________

Ian&Steve C.
Message 56027 - Posted: 16 Dec 2020 | 18:10:19 UTC - in response to Message 56023.

aaaand your 2080ti just completed a task and got credit with the new BOINC version. called it.

http://www.gpugrid.net/result.php?resultid=31951281
____________

Ian&Steve C.
Message 56028 - Posted: 16 Dec 2020 | 18:13:53 UTC - in response to Message 56020.

I'm willing to bet that if one of your tasks on that 1660ti system runs for ~30,000-40,000 seconds, it gets hit with 20 credits. ¯\_(ツ)_/¯


looks like just 25,000s was enough to trigger it.

http://www.gpugrid.net/result.php?resultid=31946707

it'll even out over time, since your other tasks have been earning 2x as much credit as they should, because the old version of BOINC was doubling your peak_flops value.

____________

zombie67 [MM]
Message 56030 - Posted: 17 Dec 2020 | 0:43:46 UTC

After upgrading all the BOINC clients, the tasks are erroring out. Ugh.
____________
Reno, NV
Team: SETI.USA

Ian&Steve C.
Message 56031 - Posted: 17 Dec 2020 | 0:54:19 UTC - in response to Message 56030.

they were working fine on your 2080ti system when you had 7.17.0. why change it?

but the issue you're having now looks like the same issue that richard was dealing with here: https://www.gpugrid.net/forum_thread.php?id=5204

that thread has the steps they took to fix it. it's a permissions issue.
____________

zombie67 [MM]
Message 56033 - Posted: 17 Dec 2020 | 4:47:44 UTC - in response to Message 56031.

they were working fine on your 2080ti system when you had 7.17.0. why change it?

but the issue you're having now looks like the same issue that richard was dealing with here: https://www.gpugrid.net/forum_thread.php?id=5204

that thread has the steps they took to fix it. it's a permissions issue.


That was a kludge. There is no such thing as 7.17.0. =;^) Once I verified that the newer version worked, I updated all my machines with the latest repository version, so it would be clean and updated going forward.
____________
Reno, NV
Team: SETI.USA

Ian&Steve C.
Message 56036 - Posted: 17 Dec 2020 | 5:05:48 UTC - in response to Message 56033.

There is such a thing. It’s the development branch. All of my systems use a version of BOINC based on 7.17.0 :)
____________

zombie67 [MM]
Message 56037 - Posted: 17 Dec 2020 | 5:23:58 UTC

Well sure. I meant a released version.
____________
Reno, NV
Team: SETI.USA

mmonnin
Message 56046 - Posted: 18 Dec 2020 | 11:24:17 UTC
Last modified: 18 Dec 2020 | 11:24:46 UTC

So long start-to-end run times cause the 20-credit issue, not the fact that they were restarted. But tasks that are interrupted restart at 0, and thus have a longer start-to-end run time.

1070 or 1070Ti
27,656.18s received 1,316,998.40
42,652.74 received 20.83

1080Ti
21,508.23 received 1,694,500.25
25,133.86, 29,742.04, 38,297.41 tasks received 20.83

I doubt they were interrupted with the tasks being High Priority and nothing else but GPUGrid in the BOINC queue.

Ian&Steve C.
Message 56049 - Posted: 18 Dec 2020 | 14:57:21 UTC - in response to Message 56046.

yup, I confirmed this. I manually restarted a task that didn't run very long, and it didn't have the issue.

the issue only happens if your credit reward will be greater than about 1.9 million.

take some of your completed tasks and divide the total credit by the runtime in seconds to figure out how much credit you earn per second. then figure out how many seconds you need to hit 1.9 million; that's the runtime limit for your system. anything over that and you get the 20-credit bug.
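In code, using the 1070/1070Ti figures mmonnin posted above (the 1.9 million cutoff is the suspected value, not a confirmed limit):

# The rule of thumb above, applied to the figures quoted earlier in the thread.
credit_awarded = 1_316_998.40      # credit for the 27,656.18 s task
runtime_seconds = 27_656.18

credit_per_second = credit_awarded / runtime_seconds      # ~47.6 credits/s on that host
runtime_limit = 1_900_000 / credit_per_second             # ~39,900 s before the suspected cutoff
print(round(runtime_limit))
# the 42,652 s task on the same card blew past that limit - and got 20.83 credits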
____________

zombie67 [MM]
Message 56148 - Posted: 24 Dec 2020 | 15:33:20 UTC

Why is the number of tasks in progress dwindling? Are no new tasks being issued?
____________
Reno, NV
Team: SETI.USA

Ian&Steve C.
Message 56149 - Posted: 24 Dec 2020 | 15:48:21 UTC - in response to Message 56148.
Last modified: 24 Dec 2020 | 15:49:07 UTC

most of the Python tasks I've received in the last 3 days have been "_0", so that indicates brand new. and a few resends here and there.

the rate at which they are creating them has likely slowed, and demand is high since points chasers have come to try to snatch them up. it's also possible that the recent new (_0) ones are only recreations of earlier failed tasks that had some bug that needed fixing. it does seem that this run is concluding.
____________

Profile trigggl
Message 56151 - Posted: 25 Dec 2020 | 16:41:49 UTC - in response to Message 55590.

...
Also Warnings about path not found:
WARNING conda.core.envs_manager:register_env(50): Unable to register environment. Path not writable or missing. environment location: /var/lib/boinc-client/projects/www.gpugrid.net/miniconda registry file: /root/.conda/environments.txt

Registry file location ( /root/ ) will not be accessible to boinc user unless conda is already installed on the host (by root user) and conda file is world readable
...

I had the same error message except that mine was trying to go to
/opt/boinc/.conda/environments.txt

Profile trigggl
Message 56152 - Posted: 25 Dec 2020 | 16:43:36 UTC - in response to Message 55590.
Last modified: 25 Dec 2020 | 16:59:59 UTC

...
Also Warnings about path not found:
WARNING conda.core.envs_manager:register_env(50): Unable to register environment. Path not writable or missing. environment location: /var/lib/boinc-client/projects/www.gpugrid.net/miniconda registry file: /root/.conda/environments.txt

Registry file location ( /root/ ) will not be accessible to boinc user unless conda is already installed on the host (by root user) and conda file is world readable
...

I had the same error message except that mine was trying to go to...
/opt/boinc/.conda/environments.txt
Looks harmless, thanks for reporting. It's because the "boinc" user doesn't have a HOME directory I think.

Gentoo put the home for boinc at /opt/boinc.
I updated the user file to change it to /var/lib/boinc.

ALAIN_13013
Message 56177 - Posted: 29 Dec 2020 | 6:50:04 UTC - in response to Message 55588.

I'm creating some experimental tasks for the Python app (made Beta). They are Linux and CUDA specific and serve in preparation for future batches.

They may use a relatively large amount of disk space (order of 1-10 GB) which persists between runs, and is cleared if you reset the project.



What is the minimum type of card for this app? My 980Ti doesn't load any WUs.
____________

rod4x4
Message 56181 - Posted: 29 Dec 2020 | 13:08:38 UTC - in response to Message 56177.
Last modified: 29 Dec 2020 | 13:10:12 UTC

I'm creating some experimental tasks for the Python app (made Beta). They are Linux and CUDA specific and serve in preparation for future batches.

They may use a relatively large amount of disk space (order of 1-10 GB) which persists between runs, and is cleared if you reset the project.



What type of card minimum for this app. My 980Ti don't load WU.

In "GPUGRID Preferences", ensure you select "Python Runtime (beta)" and "Run test applications?"
Your GPU, driver and OS should run these tasks fine

ALAIN_13013
Message 56182 - Posted: 29 Dec 2020 | 13:32:58 UTC - in response to Message 56181.
Last modified: 29 Dec 2020 | 13:33:30 UTC

I'm creating some experimental tasks for the Python app (made Beta). They are Linux and CUDA specific and serve in preparation for future batches.

They may use a relatively large amount of disk space (order of 1-10 GB) which persists between runs, and is cleared if you reset the project.



What type of card minimum for this app. My 980Ti don't load WU.

In "GPUGRID Preferences", ensure you select "Python Runtime (beta)" and "Run test applications?"
Your GPU, driver and OS should run these tasks fine


Thanks, I just forgot "Run test applications" :)
____________

jiipee
Message 56183 - Posted: 29 Dec 2020 | 13:35:30 UTC

All of these now seem to error out after the computation has finished, on several computers:

<message>
upload failure: <file_xfer_error>
<file_name>2p95312000-RAIMIS_NNPMM-0-1-RND8920_1_0</file_name>
<error_code>-131 (file size too big)</error_code>
</file_xfer_error>

</message>


What causes this and how can it be fixed?

Richard Haselgrove
Message 56185 - Posted: 29 Dec 2020 | 14:24:17 UTC - in response to Message 56183.

What causes this and how it can be fixed?

I've just posted instructions in the Anaconda Python 3 Environment v4.01 failures thread (Number Crunching).

Read through the whole post. If you don't understand anything, or you don't know how to do any of the steps I've described - back away. Don't even attempt it until you're sure. You have to edit a very important, protected, file - and that needs care and experience.

Ian&Steve C.
Message 56186 - Posted: 29 Dec 2020 | 14:33:52 UTC - in response to Message 56185.

What causes this and how it can be fixed?

I've just posted instructions in the Anaconda Python 3 Environment v4.01 failures thread (Number Crunching).

Read through the whole post. If you don't understand anything, or you don't know how to do any of the steps I've described - back away. Don't even attempt it until you're sure. You have to edit a very important, protected, file - and that needs care and experience.


really needs to be fixed server side (or it would be nice if it were configurable via cc_config, but that doesn't look to be the case either).

stopping and starting the client is a recipe for instant errors, and where it is successful, this process will need to be repeated every time you download new tasks. not really a viable option unless you want to babysit the system all day.
____________

Richard Haselgrove
Message 56187 - Posted: 29 Dec 2020 | 14:45:32 UTC - in response to Message 56186.

Stopping and starting the client is a recipe for instant errors, and where successful, this process will need to be repeated for every time you download new tasks. not really a viable option unless you want to babysit the system all day.

By itself, it's fairly safe - provided you know and understand the software on your own system well enough. But you do need to have that experience and knowledge, which is why I put the caveats in.

I agree about having to re-do it for every new task, but I'd like to get my APR back up to something reasonable - and I'm happy to help nudge the admins one more step along the way to a fully-working, 'set and forget', application.

Richard Haselgrove
Message 56189 - Posted: 29 Dec 2020 | 16:39:50 UTC - in response to Message 56187.

They're working on something...

WU 26917726

jiipee
Message 56208 - Posted: 31 Dec 2020 | 8:59:22 UTC - in response to Message 56186.

What causes this and how it can be fixed?

I've just posted instructions in the Anaconda Python 3 Environment v4.01 failures thread (Number Crunching).

Read through the whole post. If you don't understand anything, or you don't know how to do any of the steps I've described - back away. Don't even attempt it until you're sure. You have to edit a very important, protected, file - and that needs care and experience.


really needs to be fixed server side (or would be nice if it were configurable via cc_config but that doesnt look to be the case either).

stopping and starting the client is a recipe for instant errors, and where successful, this process will need to be repeated for every time you download new tasks. not really a viable option unless you want to babysit the system all day.

Exactly so. I don't know about others, but I have no time to sit and watch my hosts working. A host works for 10 hours to get the task done, and then everything turns out to be just a waste of time and energy because of this file size limitation. This is somewhat frustrating.

Richard Haselgrove
Message 56209 - Posted: 31 Dec 2020 | 10:16:49 UTC - in response to Message 56208.

Opt out of the Beta test programme if you don't want to encounter those problems.

But as it happens, I haven't had a single over-run since they cancelled the one I highlighted in the post before yours.

jiipee
Message 56210 - Posted: 31 Dec 2020 | 12:02:22 UTC - in response to Message 56209.

Opt out of the Beta test programme if you don't want to encounter those problems.

But as it happens, I haven't had a single over-run since they cancelled the one I highlighted in the post before yours.

Yes, I agree - something has changed.

It looks like the last full-length (successful) computation on my hosts that produced an oversized output file was WU 26900019, which ended 29 Dec 2020 | 15:00:52 UTC after 31,056 seconds of run time.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1074
Credit: 40,231,533,983
RAC: 161
Level
Trp
Scientific publications
wat
Message 56864 - Posted: 7 May 2021 | 12:33:53 UTC
Last modified: 7 May 2021 | 12:46:38 UTC

I see some new Python tasks have gone out. However, they seem to be erroring out for everyone.

https://www.gpugrid.net/results.php?userid=552015&offset=0&show_names=0&state=0&appid=31

They always seem to error out with this "os" not defined error, with GPU load at 0%:

Environment
Traceback (most recent call last):
File "run.py", line 5, in <module>
for key, value in os.environ.items():
NameError: name 'os' is not defined
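
For reference, the top of run.py presumably looks something like the sketch below (a hypothetical reconstruction from the traceback, not the project's actual file); the loop is fine, it is simply missing its import:

import os  # without this, line 5 raises NameError: name 'os' is not defined

# Dump the task's environment variables, as the traceback suggests run.py does.
for key, value in os.environ.items():
    print(key, "=", value)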

____________

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1074
Credit: 40,231,533,983
RAC: 161
Level
Trp
Scientific publications
wat
Message 56865 - Posted: 7 May 2021 | 14:14:09 UTC - in response to Message 56864.
Last modified: 7 May 2021 | 14:16:10 UTC

now seeing this:


==> WARNING: A newer version of conda exists. <==
current version: 4.8.3
latest version: 4.10.1

Please update conda by running

$ conda update -n base -c defaults conda


10:07:30 (341141): /usr/bin/flock exited; CPU time 42.091445
application ./gpugridpy/bin/python missing


and this:

09:57:32 (340085): wrapper (7.7.26016): starting
[input.zip]
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of input.zip or
input.zip.zip, and cannot find input.zip.ZIP, period.
boinc_unzip() error: 9
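
That "End-of-central-directory signature not found" message usually means input.zip arrived truncated or is not a zip archive at all. A quick, hypothetical check, run from the slot directory holding the file (not part of the task itself):

# zipfile.is_zipfile() looks for the same end-of-central-directory signature
# that unzip complains about above; a truncated or corrupt download returns False.
import zipfile

print(zipfile.is_zipfile("input.zip"))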

____________

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1074
Credit: 40,231,533,983
RAC: 161
Level
Trp
Scientific publications
wat
Message 56866 - Posted: 7 May 2021 | 14:30:42 UTC - in response to Message 56865.

Just had my first two successful completions. It doesn't look like they ran any GPU work though; the GPU was never loaded. They just unpacked the WU, ran the setup, then exited. Marked as complete with no error, after only about 45 seconds.

https://www.gpugrid.net/result.php?resultid=32570561
____________

klepel
Send message
Joined: 23 Dec 09
Posts: 189
Credit: 4,727,470,259
RAC: 1,163,025
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56867 - Posted: 7 May 2021 | 15:04:46 UTC - in response to Message 56866.

Just had my first two successful completions. It doesn't look like they ran any GPU work though; the GPU was never loaded. They just unpacked the WU, ran the setup, then exited. Marked as complete with no error, after only about 45 seconds.

https://www.gpugrid.net/result.php?resultid=32570561

Did you have to update conda for the two successful tasks? I received a few new WUs, but all errored. I will not have access to this computer until tomorrow.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1074
Credit: 40,231,533,983
RAC: 161
Level
Trp
Scientific publications
wat
Message 56868 - Posted: 7 May 2021 | 15:09:16 UTC - in response to Message 56867.

Just had my first two successful completions. It doesn't look like they ran any GPU work though; the GPU was never loaded. They just unpacked the WU, ran the setup, then exited. Marked as complete with no error, after only about 45 seconds.

https://www.gpugrid.net/result.php?resultid=32570561

Did you have to update conda for the two successful tasks? I received a few new WUs, but all errored. I will not have access to this computer until tomorrow.


I didn't make any changes to my system between the failed tasks and the successful tasks. AFAIK the project packages conda into these WUs, so it doesn't matter what you have installed; the WU contains everything you should need.

It looks like roughly testrun93 and later are OK, but test runs in the 80s and lower all fail with some form of the errors I listed above.
____________

jiipee
Send message
Joined: 4 Jun 15
Posts: 19
Credit: 8,511,945,439
RAC: 4,802,976
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 56874 - Posted: 19 May 2021 | 12:03:14 UTC
Last modified: 19 May 2021 | 12:05:01 UTC

All of these Python WUs seem to fail. A pair of examples with different problems:

http://www.gpugrid.net/result.php?resultid=32583864

http://www.gpugrid.net/result.php?resultid=32583210

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1074
Credit: 40,231,533,983
RAC: 161
Level
Trp
Scientific publications
wat
Message 56875 - Posted: 19 May 2021 | 12:39:26 UTC - in response to Message 56874.

Some succeed, but very few. Out of the 94 Python tasks I've received recently, only 4 succeeded.
____________

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1074
Credit: 40,231,533,983
RAC: 161
Level
Trp
Scientific publications
wat
Message 56876 - Posted: 19 May 2021 | 15:11:55 UTC

i see some new tasks going out.

still broken.

https://www.gpugrid.net/result.php?resultid=32584011

11:06:39 (1387708): /usr/bin/flock exited; CPU time 281.233647
11:06:39 (1387708): wrapper: running ./gpugridpy/bin/python (run.py)
WARNING: ray 1.3.0 does not provide the extra 'debug'
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.4.1 requires flatbuffers~=1.12.0, but you have flatbuffers 20210226132247 which is incompatible.
tensorflow 2.4.1 requires gast==0.3.3, but you have gast 0.4.0 which is incompatible.
tensorflow 2.4.1 requires grpcio~=1.32.0, but you have grpcio 1.36.1 which is incompatible.
tensorflow 2.4.1 requires opt-einsum~=3.3.0, but you have opt-einsum 3.1.0 which is incompatible.
/home/icrum/BOINC/slots/41/gpugridpy/lib/python3.7/site-packages/ray/autoscaler/_private/cli_logger.py:61: FutureWarning: Not all Ray CLI dependencies were found. In Ray 1.4+, the Ray CLI, autoscaler, and dashboard will only be usable via `pip install 'ray[default]'`. Please update your install command.
"update your install command.", FutureWarning)
Traceback (most recent call last):
File "run.py", line 296, in <module>
main()
File "run.py", line 35, in main
args = get_args()
File "run.py", line 283, in get_args
config_file = open(config_path, 'rt', encoding='utf8')
FileNotFoundError: [Errno 2] No such file or directory: '/home/icrum/BOINC/slots/41/data/conf.yaml'
11:07:04 (1387708): ./gpugridpy/bin/python exited; CPU time 20.831556
11:07:04 (1387708): app exit status: 0x1
11:07:04 (1387708): called boinc_finish(195)
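
From that traceback, get_args() apparently expects a conf.yaml inside the slot's data/ directory, so the failure just means that file was never unpacked. A rough sketch of what that code path presumably does (hypothetical; the yaml import is an assumption based on the file extension):

import os
import yaml  # assumed to come from the task's bundled conda environment

def get_args(slot_dir="."):
    # If data/conf.yaml is missing from the slot, open() raises exactly the
    # FileNotFoundError shown in the stderr output above.
    config_path = os.path.join(slot_dir, "data", "conf.yaml")
    with open(config_path, "rt", encoding="utf8") as config_file:
        return yaml.safe_load(config_file)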

____________

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,991,975,256
RAC: 18,889,125
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56878 - Posted: 19 May 2021 | 17:57:38 UTC - in response to Message 56875.

Some succeed, but very few. Out of the 94 Python tasks I've received recently, only 4 succeeded.

65 received / 64 errored / 1 successful is my current balance

Greger
Send message
Joined: 6 Jan 15
Posts: 76
Credit: 24,369,033,523
RAC: 16,483,368
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwat
Message 56879 - Posted: 19 May 2021 | 21:08:09 UTC

204 failed and 5 succeeded.

Got one that ran for a while but then hit a runtime error:

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

https://www.gpugrid.net/result.php?resultid=32584418

I did some reading on the BOINC Discord today: MLC@Home is also testing PyTorch, and it looks like it causes some issues.
PyTorch uses SIGALRM internally, which seems to conflict with the libboinc API's usage of SIGALRM.


I hope Toni gets this working soon; it looks to be a complex setup.
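
For anyone curious about the mechanism: a process has exactly one SIGALRM handler, so whichever side registers last silently replaces the other. A tiny illustrative sketch (not project code):

import signal
import time

def boinc_style_handler(signum, frame):
    print("wrapper/API timer tick")

def library_handler(signum, frame):
    print("library timer tick")

signal.signal(signal.SIGALRM, boinc_style_handler)
signal.signal(signal.SIGALRM, library_handler)  # silently replaces the first handler

signal.alarm(1)   # the next SIGALRM now reaches only library_handler
time.sleep(2)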

Greger
Send message
Joined: 6 Jan 15
Posts: 76
Credit: 24,369,033,523
RAC: 16,483,368
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwat
Message 56880 - Posted: 20 May 2021 | 15:34:27 UTC

Most of the Anaconda Python 3 tasks worked well today. Some changes have been made.

e1a1-ABOU_testzip13-0-1-RND2694_0 and higher appear to be good.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1074
Credit: 40,231,533,983
RAC: 161
Level
Trp
Scientific publications
wat
Message 56881 - Posted: 20 May 2021 | 16:37:34 UTC

It seems that these Python tasks are being used to train some kind of AI/Machine Learning model.

can any of the admins or researchers comment on this? I'd like to know more about the work being done.
____________

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1074
Credit: 40,231,533,983
RAC: 161
Level
Trp
Scientific publications
wat
Message 56882 - Posted: 20 May 2021 | 16:59:41 UTC - in response to Message 56880.

Most of the Anaconda Python 3 tasks worked well today. Some changes have been made.

e1a1-ABOU_testzip13-0-1-RND2694_0 and higher appear to be good.


Side note: you should set "no new tasks" or remove GPUGRID from your RTX 30-series hosts. The applications here do not work with RTX 30-series Ampere cards and always produce errors.
____________

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1352
Credit: 7,771,365,518
RAC: 10,389,448
Level
Tyr
Scientific publications
watwatwatwatwat
Message 56883 - Posted: 21 May 2021 | 1:06:02 UTC

Looks like he only let one acemd3 task slip through to an Ampere card.

I don't think the Python tasks care much about the gpu architecture.

If the tasks are formatted correctly they appear to run fine on Ampere cards.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1074
Credit: 40,231,533,983
RAC: 161
Level
Trp
Scientific publications
wat
Message 56884 - Posted: 21 May 2021 | 4:50:05 UTC - in response to Message 56883.

Looks like he only let one acemd3 task slip through to an Ampere card.

I don't think the Python tasks care much about the gpu architecture.

If the tasks are formatted correctly they appear to run fine on Ampere cards.


They care. They are still CUDA 10.0, and were compiled without the proper configuration for Ampere. They will all still fail on an Ampere card.

The Python tasks they’ve been pushing out recently never actually run any work on the GPU. They do a little bit of CPU processing and then complete or error. Even the few that succeed never touch the GPU.
____________

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1074
Credit: 40,231,533,983
RAC: 161
Level
Trp
Scientific publications
wat
Message 56926 - Posted: 2 Jun 2021 | 20:58:15 UTC
Last modified: 2 Jun 2021 | 21:30:03 UTC

I see a bunch of Python tasks went out again.

I allowed my hosts to pick one up. I don't have high hopes for it though: it's a _6 already, with constant errors from all the hosts before, so I'm expecting it'll fail as well.

anyone have a successful run?

Maybe an admin comment on why they keep sending out tasks that mostly fail and never seem to use the GPU?

-edit-
I was right: the Python task failed at right around 2 minutes and never ran anything on the GPU. It's like they aren't even bothering to test these tasks before sending them out.
____________

mmonnin
Send message
Joined: 2 Jul 16
Posts: 337
Credit: 7,738,265,233
RAC: 10,260,369
Level
Tyr
Scientific publications
watwatwatwatwat
Message 56927 - Posted: 2 Jun 2021 | 21:37:13 UTC
Last modified: 2 Jun 2021 | 21:38:07 UTC

All junk for me. None have completed, though I'm pretty sure some have before. All fail at around 525-530 seconds. Nice ETA of 646 days, so BOINC freaks out.

CPU usage reported in BOINCTasks goes up to 4 threads' worth before leveling off a bit. BOINC reports CPU time equal to run time, even though that doesn't match what I see; the run time is half of what is reported.

Profile trigggl
Send message
Joined: 6 Mar 09
Posts: 25
Credit: 102,324,681
RAC: 0
Level
Cys
Scientific publications
watwatwatwatwatwatwatwat
Message 56934 - Posted: 6 Jun 2021 | 13:49:38 UTC

I have 3 of these valid over the past couple of days. None of them used the GPU. Did they complete any work?
https://www.gpugrid.net/result.php?resultid=32619357

mmonnin
Send message
Joined: 2 Jul 16
Posts: 337
Credit: 7,738,265,233
RAC: 10,260,369
Level
Tyr
Scientific publications
watwatwatwatwat
Message 56935 - Posted: 6 Jun 2021 | 18:51:11 UTC

This one worked for me after that same PC failed earlier in the week
https://www.gpugrid.net/result.php?resultid=32619337

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1074
Credit: 40,231,533,983
RAC: 161
Level
Trp
Scientific publications
wat
Message 56936 - Posted: 6 Jun 2021 | 19:07:10 UTC - in response to Message 56934.

I have 3 of these valid over the past couple of days. None of them used the GPU. Did they complete any work?
https://www.gpugrid.net/result.php?resultid=32619357


I agree. It's weird that these tasks are marked as GPU tasks with CUDA 10.0, making the GPU unavailable for other tasks in BOINC, yet they never touch the GPU.

According to the stderr.txt, they seem to spend most of their time extracting and installing packages, then do "something" for a few seconds and complete. It's obvious that they are exploring some kind of machine learning approach, based on the packages used (PyTorch, TensorFlow, etc.) and the references to model training. Maybe they are still working out how to properly package the WUs so they have the right configuration for future real tasks.
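
For what it's worth, a PyTorch script typically decides whether to use the GPU with a one-line check, so "never touches the GPU" could simply mean the bundled build cannot see the card (illustrative only, not the project's code):

import torch

# If the bundled CUDA/PyTorch build doesn't match the card or driver,
# is_available() returns False and everything silently runs on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
print("training device:", device)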

Would be cool to hear what they are actually trying to accomplish with these tasks.
____________

Erich56
Send message
Joined: 1 Jan 15
Posts: 1132
Credit: 10,511,497,676
RAC: 26,098,279
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwat
Message 56938 - Posted: 9 Jun 2021 | 4:46:24 UTC - in response to Message 56936.

Would be cool to hear what they are actually trying to accomplish with these tasks.

I guess you will never hear any details from them.
As we know, the GPUGRID people are very taciturn about everything.

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,991,975,256
RAC: 18,889,125
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56939 - Posted: 9 Jun 2021 | 5:48:30 UTC - in response to Message 56938.
Last modified: 9 Jun 2021 | 5:49:21 UTC

Would be cool to hear what they are actually trying to accomplish with these tasks.

I guess you will never hear any details from them.
As we know, the GPUGRID people are very taciturn about everything.

In other times, when the GPUGRID project ran smoothly, they used to be better about returning feedback to contributors.
I guess there must be weighty reasons for the current lack of communication.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Scientific publications
watwatwatwat
Message 56940 - Posted: 10 Jun 2021 | 10:12:43 UTC - in response to Message 56939.

For the time being we are perfecting the WU machinery so as to support ML packages + CUDA. All tasks are Linux beta for now. Thanks!

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,991,975,256
RAC: 18,889,125
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56941 - Posted: 10 Jun 2021 | 10:28:56 UTC - in response to Message 56940.

Thank you for this pearl!
Nice to know that things are still moving along...

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1074
Credit: 40,231,533,983
RAC: 161
Level
Trp
Scientific publications
wat
Message 56942 - Posted: 10 Jun 2021 | 12:59:40 UTC - in response to Message 56940.

For the time being we are perfecting the WU machinery so to support ML packages + CUDA. All tasks are linux beta for now. Thanks!


Thanks, Toni. Can you explain why these tasks are not using the GPU at all? They only run on the CPU; GPU utilization stays at 0%.
____________

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1352
Credit: 7,771,365,518
RAC: 10,389,448
Level
Tyr
Scientific publications
watwatwatwatwat
Message 56943 - Posted: 11 Jun 2021 | 17:19:49 UTC

I would like to know whether we are supposed to do the things requested in the output file. Things like updating the various packages that are called out.

Or are we supposed to do nothing and let the app/task packagers sort it out before generation?

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1074
Credit: 40,231,533,983
RAC: 161
Level
Trp
Scientific publications
wat
Message 56944 - Posted: 11 Jun 2021 | 17:44:37 UTC - in response to Message 56943.

I would like to know whether we are supposed to do the things requested in the output file. Things like updating the various packages that are called out.

Or are we supposed to do nothing and let the app/task packagers sort it out before generation?


I'm relatively sure these tasks are sandboxed. The packages being referenced are part of the WU itself (TensorFlow included); they are installed by the extraction phase at the beginning of the WU. If you check your system, you will most likely find that you do not have TensorFlow installed at all.

The package updates need to happen on the project side before distribution to us.
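
If you want to confirm that on a running task, something like the following (run with the interpreter the WU ships, e.g. ./gpugridpy/bin/python in the slot directory; illustrative only) shows that both the interpreter and the packages resolve inside the bundled environment rather than the host install:

import sys
import tensorflow as tf  # resolves inside the WU's bundled environment, if present

print("interpreter:", sys.executable)   # expected: .../slots/N/gpugridpy/bin/python
print("tensorflow :", tf.__file__)      # expected: .../gpugridpy/lib/python3.7/site-packages/...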
____________

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1352
Credit: 7,771,365,518
RAC: 10,389,448
Level
Tyr
Scientific publications
watwatwatwatwat
Message 56945 - Posted: 11 Jun 2021 | 18:23:02 UTC - in response to Message 56944.

I wonder if I should add the project to my Nvidia Nano. It has Tensorflow installed by default in the distro.

But I wonder if the app would even run on the Maxwell card, even though it is mainly a CPU application for the time being and never seems to touch the GPU.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1074
Credit: 40,231,533,983
RAC: 161
Level
Trp
Scientific publications
wat
Message 56946 - Posted: 11 Jun 2021 | 18:53:07 UTC - in response to Message 56945.
Last modified: 11 Jun 2021 | 18:54:07 UTC

I wonder if I should add the project to my Nvidia Nano. It has Tensorflow installed by default in the distro.

But I wonder if the app would even run on the Maxwell card, even though it is mainly a CPU application for the time being and never seems to touch the GPU.


You can try, but I don't think it'll run because of the ARM CPU. There's no app for that here.
____________

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1352
Credit: 7,771,365,518
RAC: 10,389,448
Level
Tyr
Scientific publications
watwatwatwatwat
Message 56947 - Posted: 11 Jun 2021 | 19:15:29 UTC - in response to Message 56946.

Ah, yes . . . . forgot about that small matter . . . .

Profile trigggl
Send message
Joined: 6 Mar 09
Posts: 25
Credit: 102,324,681
RAC: 0
Level
Cys
Scientific publications
watwatwatwatwatwatwatwat
Message 56949 - Posted: 12 Jun 2021 | 17:09:44 UTC - in response to Message 56944.

I would like to know whether we are supposed to do the things requested in the output file. Things like updating the various packages that are called out.

Or are we supposed to do nothing and let the app/task packagers sort it out before generation?


I'm relatively sure these tasks are sandboxed. The packages being referenced are part of the WU itself (TensorFlow included); they are installed by the extraction phase at the beginning of the WU. If you check your system, you will most likely find that you do not have TensorFlow installed at all.

The package updates need to happen on the project side before distribution to us.

Furthermore, I checked on my Gentoo system what it would take to install TensorFlow. The only version available to me required Python 3.8. I didn't even bother to check it out because my system is using Python 3.9 stable.

Things may become easier with the app if they are able to upgrade to Python 3.8. I don't know how this will work with Python 3.7. Is it just Gentoo taking the 3.7 option away?

emerge -v1p tensorflow

These are the packages that would be merged, in order:

Calculating dependencies .....

!!! Problem resolving dependencies for sci-libs/tensorflow
... done!

!!! The ebuild selected to satisfy "tensorflow" has unmet requirements.
- sci-libs/tensorflow-2.4.0::gentoo USE="cuda python -mpi -xla" ABI_X86="(64)" CPU_FLAGS_X86="avx avx2 fma3 sse sse2 sse3 sse4_1 sse4_2 -fma4" PYTHON_TARGETS="-python3_8"

The following REQUIRED_USE flag constraints are unsatisfied:
python? ( python_targets_python3_8 )

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1352
Credit: 7,771,365,518
RAC: 10,389,448
Level
Tyr
Scientific publications
watwatwatwatwat
Message 56950 - Posted: 12 Jun 2021 | 21:38:59 UTC

My Ubuntu 20.04.2 LTS distro has Python 3.8.5 installed so should satisfy the tensorflow requirements.

I'm curious enough to experiment and install tensorflow to see if the tasks will actually do something other than unpack the packages.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1620
Credit: 9,013,493,931
RAC: 16,193,293
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56958 - Posted: 15 Jun 2021 | 11:11:25 UTC

I was trying to catch some ACEMD tasks to test the oversized upload file report, but got a block of Pythons instead.

http://www.gpugrid.net/result.php?resultid=32623625

Can anyone advise on the multitude of gcc failures, starting with

gcc: fatal error: cannot execute ‘cc1plus’: execvp: No such file or directory
compilation terminated.

Machine is Linux Mint 20.1, freshly updated today (including BOINC v7.16.17, which is an auto-build test for the Mac release last week - not relevant here, but useful to keep an eye on to make sure they haven't broken anything else).

I have a couple of spare tasks suspended - I'll look through the actual runtime packaging to see what they're trying to achieve.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1352
Credit: 7,771,365,518
RAC: 10,389,448
Level
Tyr
Scientific publications
watwatwatwatwat
Message 56960 - Posted: 15 Jun 2021 | 13:58:15 UTC - in response to Message 56958.

Richard, I haven't had any GCC errors with any of the Python tasks on my hosts.

Never see it invoked.

Just see lots of unpacking and attempts to run other ML programs.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1620
Credit: 9,013,493,931
RAC: 16,193,293
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56962 - Posted: 15 Jun 2021 | 14:54:08 UTC - in response to Message 56960.

Interestingly, the task that failed to run gcc was re-issued to a computer on ServicEnginIC's account - and ran successfully. That gives me a completely different stderr_txt file to compare with mine. I'll make a permanent copy of both for reference, and try to work out what went wrong.

Profile ServicEnginIC
Avatar
Send message
Joined: 24 Sep 10
Posts: 581
Credit: 9,991,975,256
RAC: 18,889,125
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56963 - Posted: 15 Jun 2021 | 15:27:38 UTC - in response to Message 56962.

Interestingly, the task that failed to run gcc was re-issued to a computer on ServicEnginIC's account - and ran successfully. That gives me a completely different stderr_txt file to compare with mine. I'll make a permanent copy of both for reference, and try to work out what went wrong.

I remember that I applied to all my hosts a kind remedy suggested by you at your message #55967.
I related it at message #55986
Thank you again.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1620
Credit: 9,013,493,931
RAC: 16,193,293
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56964 - Posted: 15 Jun 2021 | 15:48:43 UTC - in response to Message 56963.

Thanks for the kind words. Yes, that's necessary, but not sufficient. My host 132158 got a block of four tasks when I re-enabled work fetch this morning.

The first task I ran - ID _621 - got

[1008937] INTERNAL ERROR: cannot create temporary directory!
11:23:17 (1008908): /usr/bin/flock exited; CPU time 0.132604

- that same old problem. I stopped the machine, did a full update and restart, and verified that the new .service file had the fix for that bug.

Then I fired off task ID _625 - that's the one with the cpp errors.

Unfortunately, we only get the last 64 KB of the file, and it's not enough in this case - we can't see what stage it's reached. But since the first task only ran 3 seconds, and the second lasted for 190 seconds, I assume we fell at the second hurdle.

My second Linux machine has just picked up two of the ADRIA tasks I was hunting for - I'll sort those out next.

valterc
Send message
Joined: 21 Jun 10
Posts: 21
Credit: 8,728,689,672
RAC: 25,354,237
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56975 - Posted: 16 Jun 2021 | 12:11:07 UTC - in response to Message 56958.

I was trying to catch some ACEMD tasks to test the oversized upload file report, but got a block of Pythons instead.

http://www.gpugrid.net/result.php?resultid=32623625

Can anyone advise on the multitude of gcc failures, starting with

gcc: fatal error: cannot execute ‘cc1plus’: execvp: No such file or directory
compilation terminated.

Machine is Linux Mint 20.1, freshly updated today (including BOINC v7.16.17, which is an auto-build test for the Mac release last week - not relevant here, but useful to keep an eye on to make sure they haven't broken anything else).

I have a couple of spare tasks suspended - I'll look through the actual runtime packaging to see what they're trying to achieve.

I got a similar error some time ago. A memory module was faulty, and I started to get segmentation fault errors. Eventually my compiling environment (gcc etc.) became messed up. I solved the situation by removing the bad module and completely reinstalling the compiling environment. What I would suggest is to verify that gcc/g++ are actually working by compiling something of your choice.
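
For example, this small sketch (hypothetical, nothing shipped with the task) exercises g++ the same way a package build would; a broken toolchain fails at the compile step with errors like the cc1plus one above:

import os
import subprocess
import tempfile
import textwrap

src = textwrap.dedent("""
    #include <iostream>
    int main() { std::cout << "g++ OK" << std::endl; return 0; }
""")

with tempfile.TemporaryDirectory() as tmp:
    cpp = os.path.join(tmp, "check.cpp")
    exe = os.path.join(tmp, "check")
    with open(cpp, "w") as f:
        f.write(src)
    # Compile and run; check=True raises CalledProcessError if either step fails.
    subprocess.run(["g++", cpp, "-o", exe], check=True)
    subprocess.run([exe], check=True)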


Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1620
Credit: 9,013,493,931
RAC: 16,193,293
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56980 - Posted: 17 Jun 2021 | 16:29:54 UTC

Finally got time to google my gcc error. Simples: turns out the app requires g++, and it's not installed by default on Ubuntu - and, one assumes, derivatives like mine.

All it needed was

sudo apt-get install g++

No restart needed, of either BOINC or Linux, and task 32623619 completed successfully.

Not much sign of any checkpointing: one update at 10%, then nothing until the end.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56981 - Posted: 17 Jun 2021 | 16:40:25 UTC - in response to Message 56980.

Finally got time to google my gcc error. Simples: turns out the app requires g++, and it's not installed by default on Ubuntu - and, one assumes, derivatives like mine.

Hummm...
I have run a few on Ubuntu 20.04.2 and did not do anything special, unless maybe something else I was working on required it.
http://www.gpugrid.net/results.php?hostid=452287

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1074
Credit: 40,231,533,983
RAC: 161
Level
Trp
Scientific publications
wat
Message 56982 - Posted: 17 Jun 2021 | 17:47:38 UTC - in response to Message 56980.
Last modified: 17 Jun 2021 | 17:50:31 UTC

Finally got time to google my gcc error. Simples: turns out the app requires g++, and it's not installed by default on Ubuntu - and, one assumes, derivatives like mine.

All it needed was

sudo apt-get install g++

No restart needed, of either BOINC or Linux, and task 32623619 completed successfully.

Not much sign of any checkpointing: one update at 10%, then nothing until the end.


I think this just might be your distribution. I never installed this (specifically) on my Ubuntu 20.04 systems; if it's there, it was there by default or as a dependency of some other package.
____________

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1620
Credit: 9,013,493,931
RAC: 16,193,293
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 56983 - Posted: 17 Jun 2021 | 18:20:34 UTC - in response to Message 56982.

This was a fairly recent (February 2021) clean installation of Linux Mint 20.1 'Ulyssa' - I decided to throw away my initial fumblings with Mint 19.1, and start afresh. So, let this be a warning: not every distro is as complete as you might expect.

Anyway, the solution is in public now, in case anyone else needs it.

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1074
Credit: 40,231,533,983
RAC: 161
Level
Trp
Scientific publications
wat
Message 56999 - Posted: 22 Jun 2021 | 12:11:05 UTC

errors on the Python tasks again.

https://www.gpugrid.net/result.php?resultid=32626713
____________

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57001 - Posted: 22 Jun 2021 | 15:39:11 UTC - in response to Message 56999.

errors on the Python tasks again.

I see them too.

http://www.gpugrid.net/results.php?hostid=452287
UnsatisfiableError: The following specifications were found to be incompatible with each other:

That will give them something to work on.

Profile robertmiles
Send message
Joined: 16 Apr 09
Posts: 503
Credit: 762,585,692
RAC: 492,038
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57006 - Posted: 23 Jun 2021 | 0:51:34 UTC

I'm now using my GPU preferentially for tasks related to medical research. Could you mention whether the Python tasks are related to medical research and whether they use the GPU?

Ian&Steve C.
Avatar
Send message
Joined: 21 Feb 20
Posts: 1074
Credit: 40,231,533,983
RAC: 161
Level
Trp
Scientific publications
wat
Message 57007 - Posted: 23 Jun 2021 | 3:07:49 UTC - in response to Message 57006.

I'm now using my GPU preferentially for tasks related to medical research. Could you mention whether the Python tasks are related to medical research and whether they use the GPU?


Right now these Python tasks are using machine learning to do what we assume is some kind of medical research, but the admins haven't given many specifics on exactly how, or what type of medical research is being done. GPUGRID as a whole does various types of medical research; see more info in the other thread here: https://www.gpugrid.net/forum_thread.php?id=5233

The tasks are labelled as GPU tasks, and they do reserve the GPU in BOINC (i.e., other tasks won't run on it); however, in reality the GPU is not actually used. It sits idle and all the computation happens on the CPU thread that's assigned to the job. The admins have stated that these early tasks are still in testing and that they will use the GPU in the future, but right now they don't.

The other thing to keep in mind: the Python application is Linux-only (at least right now), so you won't be able to get these tasks on your Windows system.
____________

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 1352
Credit: 7,771,365,518
RAC: 10,389,448
Level
Tyr
Scientific publications
watwatwatwatwat
Message 57036 - Posted: 29 Jun 2021 | 22:56:36 UTC

Just finished a new Python task that didn't error out. Hope this is the start of a trend.
