Experimental Python tasks (beta)

Author	Message
zombie67 [MM] Joined: 16 Jul 07 Posts: 209 Credit: 5,616,860,456 RAC: 6,599	Message 56016 - Posted: 15 Dec 2020, 18:57:24 UTC - in response to Message 56015.
Quote: You also appear to have your hosts set up to ONLY crunch these beta tasks. Is there a reason for that?
I have reached my WUProp goals for the other apps, so I am interested in only this particular app (for now).
Quote: Does your system process the normal tasks fine? Maybe it's something going on with your system as a whole.
Yep, all the other apps run fine, both here and on other projects.
Reno, NV Team: SETI.USA

Ian&Steve C. Joined: 21 Feb 20 Posts: 1116 Credit: 40,876,970,595 RAC: 6,618	Message 56017 - Posted: 15 Dec 2020, 20:40:18 UTC - in response to Message 56016. Last modified: 15 Dec 2020, 21:09:19 UTC
I have a theory, but I'm not sure if it's correct. Can you tell me the peak_flops value reported in your coproc_info.xml file for the 2080ti?
You are using an old version of BOINC (7.9.3) that pre-dates the fixes implemented in 7.14.2 to properly calculate the peak flops of Turing cards. So I'm willing to bet that your version of BOINC is over-estimating your peak flops by a factor of 2. A 2080ti should read somewhere between 13.5 and 15 TFlops, and I'm guessing your old version of BOINC thinks it's closer to double that (25-30 TFlops).
The second half of the theory is that there is some kind of hard limit (maybe an anti-cheat mechanism?) that prevents a credit reward above roughly 2,000,000. Maybe 1.8 million, maybe 1.9 million? I haven't observed ANYONE getting a task earning that much, and all tasks that would reach that level based on runtime seem to get this 20-credit value.
That's my theory; I could be wrong. If you try a newer version of BOINC that properly measures the flops on a Turing card and you start getting real credit, then it might hold water.
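
One way to answer the peak_flops question is to read the value straight out of coproc_info.xml. The following is a minimal sketch, not a BOINC tool. It assumes the file is well-formed XML that contains one or more <peak_flops> elements holding a value in FLOPS; the sample document and the helper names (peak_tflops, classify_2080ti) are made up for illustration. The 13.5-15 and 25-30 TFlops bands are the ones quoted in the post above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; a real coproc_info.xml has many more fields
# and its exact layout is an assumption here.
SAMPLE = """<coprocs>
  <coproc_cuda>
    <name>GeForce RTX 2080 Ti</name>
    <peak_flops>28300000000000.000000</peak_flops>
  </coproc_cuda>
</coprocs>"""


def peak_tflops(xml_text):
    """Return every <peak_flops> value in the XML, converted to TFLOPS."""
    root = ET.fromstring(xml_text)
    values = []
    for element in root.iter("peak_flops"):
        if element.text and element.text.strip():
            values.append(float(element.text) / 1e12)
    return values


def classify_2080ti(tflops):
    """Compare a reported value with the ranges quoted in the thread."""
    if 13.5 <= tflops <= 15.0:
        return "expected range"
    if 25.0 <= tflops <= 30.0:
        return "about double"
    return "outside both ranges"


for value in peak_tflops(SAMPLE):
    print(f"{value:.1f} TFlops: {classify_2080ti(value)}")
```

To try it on a real host, replace SAMPLE with the contents of the coproc_info.xml file from the BOINC data directory.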

sph Joined: 22 Oct 20 Posts: 4 Credit: 34,434,982 RAC: 0	Message 56018 - Posted: 15 Dec 2020, 23:13:08 UTC - in response to Message 56007. Last modified: 15 Dec 2020, 23:15:51 UTC
Quote: Two outstanding issues are over-crediting (I am using some default BOINC formula) and, as far as i understand, the flops estimate (?).
Toni,
One more issue to add to the list. The download from the Anaconda website does not work for hosts behind a proxy. Can you please add a check for the proxy settings in the BOINC client so external software can be downloaded? I have other hosts that are not behind a proxy, and they download and run the Experimental tasks fine.
Issue here:
CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://conda.anaconda.org/conda-forge/linux-64/_libgcc_mutex-0.1-conda_forge.tar.bz2>
Elapsed: -
An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.
This error repeats until the task eventually gives up after 5 minutes and fails. It happens on 2 hosts sitting behind a web proxy (Squid).
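
For anyone debugging the same proxy failure, a first step is to see which proxy settings a process on the host can actually see. This is only a sketch under assumptions: it looks at the conventional proxy environment variables, which conda can normally pick up, but whether the GPUGrid task wrapper passes them (or the BOINC client's own proxy configuration) through to conda is not known from this thread. The helper name visible_proxy_settings and the example proxy URL are hypothetical.

```python
import os

# Conventional variable names; both upper and lower case are in common use.
PROXY_VARS = (
    "HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY",
    "http_proxy", "https_proxy", "no_proxy",
)


def visible_proxy_settings(env=None):
    """Return the proxy-related variables that are set and non-empty."""
    env = os.environ if env is None else env
    return {name: env[name] for name in PROXY_VARS if env.get(name)}


settings = visible_proxy_settings()
if settings:
    for name, value in settings.items():
        print(f"{name}={value}")
else:
    print("No proxy variables are set in this environment.")
```

If nothing is reported when this runs under the same account and environment as the BOINC client, a download that has to go through Squid would have no way to find the proxy.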

zombie67 [MM]	Message 56019 - Posted: 16 Dec 2020, 1:19:31 UTC - in response to Message 56017.
A second, identical machine, except that it has dual GTX 1660 Ti cards, finally got some work. The tasks reported and were awarded the large credits. So that rules out the question WRT BOINC version. FWIW, that version of BOINC is the latest available from the repository.
So maybe it is due to interruptions after all, and I am just unaware? I am running some more tasks now, and will check again in the morning.

Ian&Steve C.	Message 56020 - Posted: 16 Dec 2020, 2:57:24 UTC - in response to Message 56019. Last modified: 16 Dec 2020, 3:01:29 UTC
It doesn't rule it out, because a 1660ti has a much lower flops value, around 5.5 TFlop. With the old BOINC version it's estimated at ~11 TFlop, and that's not high enough to trigger the issue.
You're only seeing it on the 2080ti because it's a much higher performing card: ~14 TFlop by default, and the old BOINC version is scaling it all the way up to 28+ TFlop. This makes the calculated credit MUCH higher than that of the 1660ti, which triggers the 20-credit issue, according to my theory of course.
Your 1660ti tasks are well below the 2,000,000 credit threshold that I'm estimating. The highest I've seen is ~1.7 million, so the line can't be much higher. I'm willing to bet that if one of your tasks on that 1660ti system runs for ~30,000-40,000 seconds, it gets hit with 20 credits. ¯\_(ツ)_/¯
You really should try to get your hands on a newer version of BOINC. I use a custom-compiled version of BOINC, and have usually used custom builds from newer versions of the source code. Maybe one of the other guys here can point you to a different repository that has a newer version of BOINC that can properly manage the Turing cards.

Ian&Steve C.	Message 56021 - Posted: 16 Dec 2020, 3:13:29 UTC - in response to Message 56020.
I also verified that restarting ALONE won't necessarily trigger the 20-credit reward. It depends WHEN you restart. If you restart the task early enough that the combined runtime won't come close to the 2 million credit mark, you'll get the normal points.
Take this task: https://www.gpugrid.net/result.php?resultid=31934720
I restarted it about 10-15 minutes in. It started over from the 10% mark, ran to completion, and still got normal crediting, well below the threshold.

Ian&Steve C.	Message 56023 - Posted: 16 Dec 2020, 14:36:25 UTC - in response to Message 56019.
I see you changed BOINC to 7.17.0.
Another thing I noticed was that the change didn't take effect until new tasks were downloaded after it, so tasks that were already there and tagged with the overinflated flops value will probably still get 20 credits. Only tasks downloaded after the change should work better.

Ian&Steve C.	Message 56027 - Posted: 16 Dec 2020, 18:10:19 UTC - in response to Message 56023.
Aaaand your 2080ti just completed a task and got credit with the new BOINC version. Called it.
http://www.gpugrid.net/result.php?resultid=31951281

Ian&Steve C.	Message 56028 - Posted: 16 Dec 2020, 18:13:53 UTC - in response to Message 56020.
Quote: I'm willing to bet that if one of your tasks on that 1660ti system runs for ~30,000-40,000 seconds, it gets hit with 20 credits. ¯\_(ツ)_/¯
Looks like just 25,000 s was enough to trigger it.
http://www.gpugrid.net/result.php?resultid=31946707
It'll even out over time, since your other tasks are earning 2x as much credit as they should because the old version of BOINC is doubling your peak_flops value.

zombie67 [MM]	Message 56030 - Posted: 17 Dec 2020, 0:43:46 UTC
After upgrading all the BOINC clients, the tasks are erroring out. Ugh.

Ian&Steve C.	Message 56031 - Posted: 17 Dec 2020, 0:54:19 UTC - in response to Message 56030.
They were working fine on your 2080ti system when you had 7.17.0. Why change it?
The issue you're having now looks like the same one Richard was dealing with here: https://www.gpugrid.net/forum_thread.php?id=5204
That thread has the steps they took to fix it. It's a permissions issue.

zombie67 [MM]	Message 56033 - Posted: 17 Dec 2020, 4:47:44 UTC - in response to Message 56031.
That was a kludge. There is no such thing as 7.17.0. =;^) Once I verified that the newer version worked, I updated all my machines with the latest repository version, so it would be clean and updated going forward.

Ian&Steve C.	Message 56036 - Posted: 17 Dec 2020, 5:05:48 UTC - in response to Message 56033.
There is such a thing. It’s the development branch. All of my systems use a version of BOINC based on 7.17.0 :)

zombie67 [MM]	Message 56037 - Posted: 17 Dec 2020, 5:23:58 UTC
Well sure. I meant a released version.

mmonnin Joined: 2 Jul 16 Posts: 339 Credit: 7,990,341,558 RAC: 69	Message 56046 - Posted: 18 Dec 2020, 11:24:17 UTC Last modified: 18 Dec 2020, 11:24:46 UTC
So long start-to-end run times cause the 20-credit issue, not the restart itself. But interrupted tasks restart at 0 and therefore have a longer start-to-end run time.
1070 or 1070Ti
27,656.18 s received 1,316,998.40
42,652.74 s received 20.83
1080Ti
21,508.23 s received 1,694,500.25
25,133.86 s, 29,742.04 s and 38,297.41 s tasks received 20.83
I doubt they were interrupted, with the tasks being High Priority and nothing else but GPUGrid in the BOINC queue.

Ian&Steve C.	Message 56049 - Posted: 18 Dec 2020, 14:57:21 UTC - in response to Message 56046.
Yup, I confirmed this. I manually restarted a task that hadn't run very long, and it didn't have the issue. The issue only happens if your credit reward would be greater than about 1.9 million.
Take some of your completed tasks and divide the total credit by the runtime in seconds to figure out how much credit you earn per second. Then figure out how many seconds you need to hit 1.9 million. That's the runtime limit for your system; anything over that and you get the 20-credit bug.
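
The arithmetic described above can be sketched in a few lines. The credit and runtime figures are the normally credited tasks reported in Message 56046; the 1.9 million threshold is only the poster's estimate, not a documented project limit, and the function names are made up for this example.

```python
# Estimated cut-off from the thread; an assumption, not a documented value.
CREDIT_THRESHOLD = 1_900_000


def credit_per_second(credit, runtime_s):
    """Credit rate observed on a normally credited task."""
    return credit / runtime_s


def runtime_limit_s(credit, runtime_s, threshold=CREDIT_THRESHOLD):
    """Runtime at which a task on this host would reach the threshold."""
    return threshold / credit_per_second(credit, runtime_s)


# (credit awarded, runtime in seconds) for tasks that were credited normally.
samples = {
    "1070 or 1070Ti": (1_316_998.40, 27_656.18),
    "1080Ti": (1_694_500.25, 21_508.23),
}

for card, (credit, runtime_s) in samples.items():
    rate = credit_per_second(credit, runtime_s)
    limit = runtime_limit_s(credit, runtime_s)
    print(f"{card}: {rate:.2f} credit/s, limit about {limit:,.0f} s")
```

With these inputs the limit works out to roughly 39,900 s for the 1070-class card and roughly 24,100 s for the 1080Ti. That matches the reports in Message 56046, where the 42,652.74 s task and the 25,133.86 s, 29,742.04 s and 38,297.41 s tasks all received 20.83 credits.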

zombie67 [MM]	Message 56148 - Posted: 24 Dec 2020, 15:33:20 UTC
Why is the number of tasks in progress dwindling? Are no new tasks being issued?

Ian&Steve C.	Message 56149 - Posted: 24 Dec 2020, 15:48:21 UTC - in response to Message 56148. Last modified: 24 Dec 2020, 15:49:07 UTC
Most of the Python tasks I've received in the last 3 days have been "_0", which indicates brand new, plus a few resends here and there. The rate at which they are creating them has likely slowed, and demand is high since points chasers have come to snatch them up.
It's also possible that the recent new (_0) ones are only recreations of earlier failed tasks that had some bug that needed fixing. It does seem that this run is concluding.

trigggl Joined: 6 Mar 09 Posts: 25 Credit: 102,324,681 RAC: 0	Message 56151 - Posted: 25 Dec 2020, 16:41:49 UTC - in response to Message 55590.
Quote: ... Also Warnings about path not found:
    WARNING conda.core.envs_manager:register_env(50): Unable to register environment. Path not writable or missing.
    environment location: /var/lib/boinc-client/projects/www.gpugrid.net/miniconda
    registry file: /root/.conda/environments.txt
    Registry file location ( /root/ ) will not be accessible to boinc user unless conda is already installed on the host (by root user) and conda file is world readable ...
I had the same error message, except that mine was trying to go to /opt/boinc/.conda/environments.txt

trigggl	Message 56152 - Posted: 25 Dec 2020, 16:43:36 UTC - in response to Message 55590. Last modified: 25 Dec 2020, 16:59:59 UTC
Quote: ... Also Warnings about path not found:
    WARNING conda.core.envs_manager:register_env(50): Unable to register environment. Path not writable or missing.
    environment location: /var/lib/boinc-client/projects/www.gpugrid.net/miniconda
    registry file: /root/.conda/environments.txt
    Registry file location ( /root/ ) will not be accessible to boinc user unless conda is already installed on the host (by root user) and conda file is world readable ...
I had the same error message except that mine was trying to go to... /opt/boinc/.conda/environments.txt
Quote: Looks harmless, thanks for reporting. It's because the "boinc" user doesn't have a HOME directory I think.
Gentoo put the home for boinc at /opt/boinc. I updated the user file to change it to /var/lib/boinc.
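
Since the warning comes down to where the boinc account's home directory points, a small check like the one below can show what conda would run into. It is a sketch under assumptions: the account is assumed to be named "boinc", the registry file is assumed to live at .conda/environments.txt under that account's home (as the paths in the warning suggest), and both helper names are invented here. The writability test reflects the user running the script, so it should be run as the boinc user to be meaningful.

```python
import os


def home_of(user):
    """Return the home directory recorded for a user, or None if unknown."""
    import pwd  # Unix-only module, imported lazily for that reason
    try:
        return pwd.getpwnam(user).pw_dir
    except KeyError:
        return None


def conda_registry_status(home_dir):
    """Say whether a .conda/environments.txt could be created under home_dir.

    os.access answers for the user running this script, not for "boinc".
    """
    if not os.path.isdir(home_dir):
        return "home directory missing"
    if not os.access(home_dir, os.W_OK):
        return "home directory not writable"
    return "ok"


home = home_of("boinc")  # assumed account name
if home is None:
    print("No 'boinc' account found on this host.")
else:
    registry = os.path.join(home, ".conda", "environments.txt")
    print(f"{registry}: {conda_registry_status(home)}")
```

On the Gentoo host described above, this would have reported on /opt/boinc before the change and on /var/lib/boinc afterwards.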