More Acemd3 tests
Author | Message |
---|---|
Joined: 31 Oct 08 Posts: 186 Credit: 3,578,903,157 RAC: 1 |
I have a request that gave me 7.15.0; is it supposed to be 7.16.2? The systems I have that run GPUGRID on Windows have matched GPUs. |
Joined: 16 Jul 07 Posts: 209 Credit: 5,496,860,456 RAC: 8,582,660 |
"There are two different GT 1030s. One is significantly slower than the other; otherwise they are identical. The newer versions are crippled. I was wondering if the pair you have together are matched. Just a guess, as that could cause unexpected timing values if the app simply checks the name and does not bother to recalculate parameters." These two 1030s are identical: same brand and model, bought at the same time. It seems like a clue that only the CUDA 100 tasks fail, and not the CUDA 101. Note that another of my machines has a single, identical 1030 (also purchased at the same time). It does not fail on either 101 or 100. Perhaps there is something about CUDA 100 and dual-card machines. Just a guess. Reno, NV Team: SETI.USA |
Joined: 13 Dec 17 Posts: 1416 Credit: 9,119,446,190 RAC: 614,515 |
It is still from the master branch, which is the development version 7.15.0. Or at least it still has the version number from the master branch. It may have more commits from further upstream too. If the version.h and version.log files aren't updated, the build will still report whatever version those files are set to. But it has the commit I referenced in it, with a fix for wrapper apps. |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172 |
"I have a request for help from Windows users. Does anyone want to try a development branch of the client that may be able to handle the pause/suspend issues on the acemd3 wrapper apps?" I'm a little worried by that. The changes in PR #3307 were made in the wrapper app itself (only). You could indeed download the win-apps bundle from AppVeyor and extract wrapper_26014_windows_x86_64.exe, but it would be hard to deploy if Toni is issuing an earlier version from the server. If the client downloaded from that link has improvements, they'll come from the cumulative set of changes made both before and after the 7.16 branch was split. We urgently need to work out which change was the beneficial one, and whether it happened before or after the fork. If it was made later, it needs to be cherry-picked into the new release. |
Joined: 15 Sep 19 Posts: 4 Credit: 485,304,520 RAC: 0 |
Hi, I'm new on this forum. Here is what I've got with my 1050 Ti on Linux x64. Current task: ADRIA_FOLDUBQ_BANDIT_ss_contacts_50_ubiquitin_4-0-2; resources: 0.909 CPUs + 1 NVIDIA GPU; task size: 5000000 GFLOPs; elapsed time: 08:54:01; remaining time: 09:09:51; progress: 13.800 %. 14% done after 50% elapsed time??? |
Joined: 31 Oct 08 Posts: 186 Credit: 3,578,903,157 RAC: 1 |
7.2.42 is really old (but it is the latest on the Berkeley download page). Very likely the client is estimating wrongly, in addition to misidentifying the CPU. apt-get under Ubuntu 18.04 got me BOINC version 7.16.1. |
Joined: 13 Dec 17 Posts: 1416 Credit: 9,119,446,190 RAC: 614,515 |
I never thought about where the wrapper app originated. If it is issued by the server, it still controls the show if the new one doesn't get put into play. I just thought the description of the fix dovetailed perfectly into what we are seeing with the Windows acemd3 app runs and their inability to be suspended without failing. I was hoping you might see this post and contribute, Richard, as you know far more about how releases are handled. Are you saying that the wrapper app needs to be updated in the server code? Like in the new 1.20 server release? |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172 |
"Are you saying that the wrapper app needs to be updated in the server code? Like in the new 1.20 server release?" Not really either of those. The wrapper is a self-contained application, built from code in the \samples\ folder on GitHub. I would imagine that most projects that need to use it would compile their own copy from that source. I see from your most recent stderr.txt that your machine is using Toni's "wrapper (7.7.26016)". I'm not sure exactly how the version number is generated: that sounds like a combination of old-ish server source code (7.7) and a possibly auto-incrementing value seeded from the old SVN repository (26016). Given that the AppVeyor version I downloaded from your link this morning was 26014, it looks like Toni has possibly been updating his own local copy along the way, and getting ahead of BOINC Central. If so, I hope he pushes back any useful changes to GitHub when he's got it all working. But that's all just guesswork. Only Toni could tell you for certain. |
Joined: 16 Apr 09 Posts: 503 Credit: 769,991,668 RAC: 0 |
"Hi I'm new on this forum." They've done at least a partial new version lately to handle the newest Nvidia cards. The calculations for estimated remaining time tend to give rather inaccurate values under new versions until at least ten other tasks with the new version have run on the same computer. I'm also seeing rather inaccurate values with my 1080 under Windows 10 x64. |
Joined: 13 Dec 17 Posts: 1416 Credit: 9,119,446,190 RAC: 614,515 |
"Given that the Appveyor version I downloaded from your link this morning was 26014, it looks like Toni has possibly been updating his own local copy along the way, and getting ahead of BOINC Central. If so, I hope he pushes back any useful changes to GitHub when he's got it all working." Yes, I hope Toni reads the thread and finds something useful in PR #3307 to incorporate, if he is in fact updating the wrapper app on his own. Thanks for the insight about the versioning. |
Joined: 15 Sep 19 Posts: 4 Credit: 485,304,520 RAC: 0 |
"The calculations for estimated remaining time tend to give rather inaccurate values under new versions until at least ten other tasks with the new version have run on the same computer." Thank you. I see what you mean. Now the remaining time is growing by 1 second every 3 seconds. I estimate that this task will really take about 60 hours to finish. |
Joined: 22 Oct 10 Posts: 42 Credit: 1,752,050,315 RAC: 39,148 |
9/29/2019 9:55:41 AM | GPUGRID | Computation for task e16s9_e14s4p0f17-ADRIA_FOLDUBQ_BANDIT_crystal_ss_contacts_50_ubiquitin_0-0-2-RND4379_1 finished. This 2.07 task failed 8.75 seconds after startup, following a management activity on my part. I had NOT suspended BOINC Manager activity before the shutdown. The machine is an i7, Windows 10, RTX 2080. Again, it is frustrating that at this point the problem has not been solved. Again, Toni, do you want these individual reports? |
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
Dears, sorry for the slow progress, but I have identified (at least) a restart problem, and it is not related to the wrapper. It is Windows-only and CUDA 10 only, as far as I can tell from your reports, and manifests itself with the "The periodic box size has decreased to less than twice the nonbonded cutoff." message. Unfortunately the root cause is hard to identify (it may be external to our code). I have compiled the wrapper myself (the binaries on the BOINC page are old and had one important bug in variable substitution), but for now the failures seem unrelated. It's a bit frustrating because everything else seems to work nicely. |
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
"9/29/2019 9:55:41 AM | GPUGRID | Computation for task e16s9_e14s4p0f17-ADRIA_FOLDUBQ_BANDIT_crystal_ss_contacts_50_ubiquitin_0-0-2-RND4379_1 finished" That seems to be a faulty WU. It failed elsewhere. |
Joined: 15 Sep 19 Posts: 4 Credit: 485,304,520 RAC: 0 |
No task has failed on Linux. 1st unit: i7 with 1x 1050 Ti (cuda80 tasks). 2nd unit: i5 with 2x 1060 (cuda100 tasks). |
Joined: 13 Feb 14 Posts: 6 Credit: 1,068,161,100 RAC: 0 |
"Dears, sorry for the slow progress but I determined (at least) a restart problem, and it is not related to the wrapper. It is Windows-only, CUDA 10 only, as far as I can tell from your reports, and manifests itself with the 'The periodic box size has decreased to less than twice the nonbonded cutoff.' message." Any chance the Linux app could be released now, since the Linux community has been without steady work for months and the Linux app seems to be working fine? Please, please, please. Edit: I forgot about the problem reported by Keith Myers involving suspend/resume on different types of cards. I guess this will need to be fixed before it can be released. No issues here with suspending and resuming tasks under Linux. I just suspended a WU and it resumed on the other GPU in that box without issue (both GPUs are RTX 2080s). |
Joined: 13 Dec 17 Posts: 1416 Credit: 9,119,446,190 RAC: 614,515 |
"Edit - I forgot about the problem reported by Keith Myers involving suspend/resume on different types of cards. I guess this will need to be fixed before it can be released." I solved that issue by changing my preference for rotating between projects to 360 minutes instead of the stock 60 minutes. The task stays on the card it starts on until it finishes. The longest task so far has only run for just shy of 3 hours. |
Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 1 |
"Any chance the Linux app could be released now, since the Linux community has been without steady work for months and the Linux app seems to be working fine? Please, please please." There has not been enough work even for the Windows-based hosts in the past few months. There would be many more complaints about the lack of work if the Linux community could also crunch them. BTW I am in both groups, but I prefer Linux for the higher performance due to the lack of WDDM. |
Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 |
"There is not enough work even for the Windows based hosts in the past few months. There would be much more complaints for the lack of work if the Linux community could also crunch them. BTW I am in both groups, but I prefer Linux for the higher performance due to the lack of WDDM." But that could be because all their new work is for Acemd3, and they are just letting the old stuff complete. I would state it the other way: they could do all the work they need with just the Linux machines. They can work on the Windows app later, and have it working when they need it. Complaints? Have they ever stopped? |
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 869 |
"Complaints? Have they ever stopped?" :-) :-) :-) |
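
The progress question raised earlier in the thread (13.8% done after 08:54:01 elapsed, with the client showing about 9 hours remaining) can be sanity-checked with a plain linear extrapolation. The sketch below is only an illustration and is not how the BOINC client computes its estimate; the function names `parse_hms` and `estimate_total_seconds` are hypothetical, and it assumes the task progresses at a constant rate.

```python
def parse_hms(text: str) -> int:
    """Convert an 'HH:MM:SS' string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def estimate_total_seconds(elapsed_seconds: float, fraction_done: float) -> float:
    """Linear extrapolation: total run time = elapsed time / fraction done.

    Assumes a constant progress rate, which is an assumption of this
    sketch and not something the thread confirms for acemd3 tasks.
    """
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in the interval (0, 1]")
    return elapsed_seconds / fraction_done


if __name__ == "__main__":
    elapsed = parse_hms("08:54:01")   # elapsed time quoted in the thread
    fraction = 0.138                  # 13.8 % progress quoted in the thread
    total = estimate_total_seconds(elapsed, fraction)
    remaining = total - elapsed
    print(f"estimated total run time: {total / 3600:.1f} h")
    print(f"estimated remaining time: {remaining / 3600:.1f} h")
```

With the figures quoted in the thread this gives roughly 64.5 hours in total, which is in the same range as the poster's later estimate of about 60 hours and far from the 9 hours the client displayed at the time.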
©2025 Universitat Pompeu Fabra