Message boards : News : Update acemd3 app
I deployed the new app, which now requires CUDA 11.2 and hopefully supports all the latest cards. Touching the CUDA versions is always a nightmare in the BOINC scheduler, so expect problems. | |
ID: 57041 | Rating: 0 | rate: / Reply Quote | |
YES! Thank you so much! | |
ID: 57042 | Rating: 0 | rate: / Reply Quote | |
I noticed the plan class is listed as "cuda1121" on the Applications page. is this a typo? will it cause any issues with getting work or running the application? | |
ID: 57043 | Rating: 0 | rate: / Reply Quote | |
Great news! So far it's only Linux, right? | |
ID: 57045 | Rating: 0 | rate: / Reply Quote | |
So far. | |
ID: 57046 | Rating: 0 | rate: / Reply Quote | |
Just so people are aware: CUDA 11.2 (I assume the "1121" means CUDA 11.2.1, "update 1") requires at least driver 460.32 on Linux. | |
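For anyone unsure what they are running, a quick way to check the installed driver against that 460.32 minimum (assuming nvidia-smi is on the PATH):
nvidia-smi --query-gpu=driver_version --format=csv,noheader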
ID: 57047 | Rating: 0 | rate: / Reply Quote | |
Can someone confirm that the Linux cuda100 app is still sent out (and likely fails)? | |
ID: 57048 | Rating: 0 | rate: / Reply Quote | |
Can someone confirm that the Linux cuda100 app is still sent out (and likely fails)? Is this the reason that the Linux tasks have been failing recently? Do they need this new app? Did you remove the Linux cuda100 app? ____________ | |
ID: 57049 | Rating: 0 | rate: / Reply Quote | |
I just got a couple of tasks on my RTX 3080 Ti host, and it got the new app. It failed in 2 seconds. It looks like you're missing a file, or you forgot to statically link boost into the app:
16:50:34 (15968): wrapper (7.7.26016): starting
https://www.gpugrid.net/result.php?resultid=32631384
But it's promising that I didn't get the "invalid architecture" error ____________ | |
ID: 57050 | Rating: 0 | rate: / Reply Quote | |
Looks like Ubuntu 20.04.2 LTS has libboost-all-dev 1.71 installed. | |
ID: 57051 | Rating: 0 | rate: / Reply Quote | |
i think these are sandboxed in the wrapper. so packages on the system in theory shouldn't matter, right? | |
ID: 57052 | Rating: 0 | rate: / Reply Quote | |
Just failed a couple more acemd3 tasks. What a waste . . . . as hard as they are to snag. | |
ID: 57053 | Rating: 0 | rate: / Reply Quote | |
Just failed a couple more acemd3 tasks. What a waste . . . . as hard as they are to snag. did you get the new app? do you have that newer version of boost installed? ____________ | |
ID: 57054 | Rating: 0 | rate: / Reply Quote | |
No I just have the normal CUDA 10.0 app installed. I am just investigating what would be needed to install the missing libraries. | |
ID: 57055 | Rating: 0 | rate: / Reply Quote | |
Great to see this progress as prices of GPUs are beginning to fall and Ampere GPUs are currently dominating market availability. I hope China's ban on mining becomes a budgetary boon for crunchers and gamers worldwide. | |
ID: 57056 | Rating: 0 | rate: / Reply Quote | |
No I just have the normal CUDA 10.0 app installed. I am just investigating what would be needed to install the missing libraries. looks like you're actually getting the new app now: http://www.gpugrid.net/result.php?resultid=32631755 ____________ | |
ID: 57057 | Rating: 0 | rate: / Reply Quote | |
No I just have the normal CUDA 10.0 app installed. I am just investigating what would be needed to install the missing libraries. Huh, hadn't noticed. So maybe the New version of ACEMD v2.12 (cuda1121) is going to be the default app even for the older cards. | |
ID: 57058 | Rating: 0 | rate: / Reply Quote | |
I think you'll only get the 11.2 app if you have a driver that's compatible, greater than 460.32. Just my guess. I'll need to see if systems with an older driver will still get the cuda100 app | |
ID: 57059 | Rating: 0 | rate: / Reply Quote | |
I'm still trying to figure out the best way to distribute the app. The current way has hard-coded minimum-maximum driver versions for each CUDA version and it's too cumbersome to maintain. | |
ID: 57060 | Rating: 0 | rate: / Reply Quote | |
I'm still trying to figure out the best way to distribute the app. The current way has hard-coded minimum-maximum driver versions for each CUDA version and it's too cumbersome to maintain. Here is an idea: How about distribution by card type? That would exclude the really slow cards, like 740M. BTW: What driver version do we need for this? | |
ID: 57061 | Rating: 0 | rate: / Reply Quote | |
Toni, I think the first thing that needs to be fixed is the problem with the boost 1.74 library not being included in the app distribution. the app is failing right away because it's not there. you either need to distribute the .so file or statically link it into the acemd3 app so it's not needed separately. | |
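As a quick check of which shared libraries the app actually expects, a generic ldd sketch (run it from whatever directory holds the unpacked acemd3 binary, e.g. the project or slot directory depending on how the app is delivered; the path below is just the Ubuntu package default and is an assumption):
cd /var/lib/boinc-client/projects/www.gpugrid.net
ldd ./acemd3 | grep "not found"
Any line printed by the grep is a library the app cannot load at start-up, which matches the 2-second failures described above.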
ID: 57062 | Rating: 0 | rate: / Reply Quote | |
after manually installing the required boost to get past that error, I now get this error on my 3080 Ti system:
09:55:10 (4806): wrapper (7.7.26016): starting
Task: https://www.gpugrid.net/result.php?resultid=32632410
I tried purging and reinstalling the nvidia drivers, but no change. it looks like this same error popped up when you first released acemd3 2 years ago: http://www.gpugrid.net/forum_thread.php?id=4935#51970
biodoc wrote: Multiple failures of this task on both windows and linux
you then updated the app, which fixed the problem at that time, but you didn't post exactly what was changed: http://www.gpugrid.net/forum_thread.php?id=4935&nowrap=true#52022
Toni wrote: It was a cryptic bug in the order of loading shared libraries, or something like that. Otherwise unexplainably system-dependent.
so whatever change you made between v2.02 and v2.03 seems to be what needs fixing again. ____________ | |
ID: 57063 | Rating: 0 | rate: / Reply Quote | |
I deployed the new app, which now requires CUDA 11.2 and hopefully supports all the latest cards. Touching the CUDA versions is always a nightmare in the BOINC scheduler, so expect problems.
Thank you so much. Those efforts are for noble reasons. Regarding persistent errors: I also manually installed boost as a workaround on one of my Ubuntu 20.04 hosts, by means of the following commands:
sudo add-apt-repository ppa:mhier/libboost-latest
But a new task downloaded after that still failed: e3s644_e1s419p0f770-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-1-2-RND9285_4
Then I reset the GPUGrid project, and it seems that did the trick. A new task is currently running on this host, instead of failing after a few seconds as before: e4s126_e3s248p0f238-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-0-2-RND6347_7
49 minutes, 1.919% progress by now. | |
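For reference, the full install sequence was presumably something like the following (only the first line survives in the post above; the update/install lines and the libboost1.74 package name are assumptions based on the correction a few posts below):
sudo add-apt-repository ppa:mhier/libboost-latest
sudo apt update
sudo apt install libboost1.74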
ID: 57064 | Rating: 0 | rate: / Reply Quote | |
I deployed the new app, which now requires cuda 11.2 and hopefully support all the latest cards. Touching the cuda versions is always a nightmare in boinc scheduler so expect problems. Thanks, I'll try a project reset. though I had already done a project reset after the new app was announced. I guess it can't hurt. ____________ | |
ID: 57065 | Rating: 0 | rate: / Reply Quote | |
nope, even after the project reset, still the same error: process exited with code 195 (0xc3, -61) https://www.gpugrid.net/result.php?resultid=32632487 ____________ | |
ID: 57066 | Rating: 0 | rate: / Reply Quote | |
sudo add-apt-repository ppa:mhier/libboost-latest small correction here. it's "libboost1.74", not just "boost1.74" ____________ | |
ID: 57067 | Rating: 0 | rate: / Reply Quote | |
Maybe your problem is an Ampere-specific one (?). | |
ID: 57068 | Rating: 0 | rate: / Reply Quote | |
Maybe your problem is an Ampere-specific one (?). I had this thought. I put my old 2080ti into the problem host, and will see if it starts processing, or if it's really a problem with the host-specific configuration. this isn't the first time this has happened though, and Toni previously fixed it with an app update. so it looks like that will be needed again even if it's Ampere-specific. I think the difference in install commands comes down to the use of apt vs. apt-get. although apt-get still works, transitioning to just apt will be better in the long term. Difference between apt and apt-get ____________ | |
ID: 57069 | Rating: 0 | rate: / Reply Quote | |
Maybe your problem is an Ampere-specific one (?). well, it seems it's not Ampere-specific. it failed in the same way on my 2080ti here: https://www.gpugrid.net/result.php?resultid=32632521 still the CUDA compiler error. unfortunately I can't easily move the 3080ti to another system since it's a watercooled model that requires a custom water loop. ____________ | |
ID: 57070 | Rating: 0 | rate: / Reply Quote | |
I just used the ppa method on my other two hosts. But I did not reboot. | |
ID: 57071 | Rating: 0 | rate: / Reply Quote | |
I think I finally solved the issue! it's running on the 3080ti finally! first I removed the manual installation of boost and installed the PPA version. I don't think this was the issue though. while poking around in my OS installs, I discovered that I had the CUDA 11.1 toolkit installed (likely from my previous attempts at building some apps to run on Ampere). I removed this old toolkit, cleaned up any leftover files, rebooted, reset the project and waited for a task to show up. so now it's finally running. now to see how long it'll take a 3080ti ;). it has over 10,000 CUDA cores so I'm hoping for a fast time. a 2080ti runs about 12hrs, so it'll be interesting to see how fast I can knock it out. using about 310 watts right now. but with the caveat that ever since I've had this card, I've noticed some weird power limiting behavior. I'm waiting on an RMA now for a new card, and I'm hoping it can really stretch its legs, though I plan to still power limit it to about 320W. ____________ | |
ID: 57072 | Rating: 0 | rate: / Reply Quote | |
Congratulations! | |
ID: 57073 | Rating: 0 | rate: / Reply Quote | |
Well, it will be your problem, not mine. Even having decent hard- and software I am pretty astonished how folks like cosmology and the likes seemingly do not understand how VM and the like works. I am pretty pissed as an amateur, though... | |
ID: 57074 | Rating: 0 | rate: / Reply Quote | |
Just seen my first failures with libboost errors on Linux Mint 20.1, driver 460.80, GTX 1660 super. | |
ID: 57075 | Rating: 0 | rate: / Reply Quote | |
Well, it will be your problem, not mine. Even having decent hard- and software I am pretty astonished how folks like cosmology and the likes seemingly do not understand how VM and the like works. I am pretty pissed as an amateur, though... what problem are you having specifically? this project has nothing to do with cosmology, and this project does not use VMs. ____________ | |
ID: 57076 | Rating: 0 | rate: / Reply Quote | |
I think I finally solved the issue! it's running on the 3080ti finally! This is the moment of truth we're all waiting for. My bet is 9h 15m. | |
ID: 57077 | Rating: 0 | rate: / Reply Quote | |
I think I finally solved the issue! it's running on the 3080ti finally! I'm not sure it'll be so simple. When I checked earlier, it was tracking a 12.5hr completion time. But the 2080ti was tracking a 14.5hr completion time. Either the new run of tasks is longer, or the CUDA 11.2 app is slower? We'll have to see. ____________ | |
ID: 57078 | Rating: 0 | rate: / Reply Quote | |
I'm curious how you have a real estimated time remaining calculated for a brand new application. | |
ID: 57079 | Rating: 0 | rate: / Reply Quote | |
I didn’t use the time remaining estimate from BOINC. I estimated it myself based on % complete and elapsed time, assuming a linear completion rate. | |
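A minimal sketch of that extrapolation (placeholder numbers; total time is roughly elapsed divided by the fraction complete):
# elapsed seconds and percent complete as read from BOINC Manager (values here are just examples)
elapsed=45000; pct=50.0
awk -v e="$elapsed" -v p="$pct" 'BEGIN { printf "estimated total: %.0f s\n", e * 100 / p }'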
ID: 57080 | Rating: 0 | rate: / Reply Quote | |
When I checked earlier, it was tracking a 12.5hr completion time. But the 2080ti was tracking a 14.5hr completion time.
If the new tasks are longer, the awarded credit should be higher. The present ADRIA_New_KIXcMyb_HIP_AdaptiveBandit workunits are "worth" 675,000 credits, while the previous ADRIA_D3RBandit_batch_nmax5000 were "worth" 523,125 credits, so the present ones are longer. My estimation was 12h/1.3=9h15m (based on my optimistic 30% performance improvement expectation). Nevertheless, we can use the completion times to estimate the actual performance improvement (3080Ti vs 2080Ti): the 3080Ti completed the task in 44368s (12h 19m 28s), the 2080Ti completed the task in 52642s (14h 37m 22s), so the 3080Ti is "only" 18.65% faster. So the number of usable CUDA cores in the 30xx series is half of the advertised number (just as I expected): 10240/2=5120, and 5120/4352=1.1765 (so the 3080Ti has 17.65% more CUDA cores than the 2080Ti has), so the CUDA cores of the 3080Ti are only about 1% faster than those of the 2080Ti. | |
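For anyone checking the arithmetic, the speedup figure follows directly from the two runtimes quoted above:
awk 'BEGIN { printf "3080Ti vs 2080Ti: %.2f%% faster\n", (52642/44368 - 1) * 100 }'   # prints ~18.65%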
ID: 57081 | Rating: 0 | rate: / Reply Quote | |
The PPA-reset trick worked - I have a new task running now. Another satisfied customer. | |
ID: 57082 | Rating: 0 | rate: / Reply Quote | |
Some of the tasks were even longer. I have two more 2080ti reports that were 55,583s and 56,560s respectively, on identical GPUs running the same clocks. There seems to be some variability. If you use the slower one, it puts it closer to 30%. This exposes the flaw of using a single sample to form a conclusion. More data is required. | |
ID: 57083 | Rating: 0 | rate: / Reply Quote | |
I looked back into the 'job_log_www.gpugrid.net.txt' in the BOINC data folder to get my comparison times. I haven't run many AdaptiveBandits yet, but I think the 'D3RBandit_batch*' time was a robust average over the many sub-types. | |
ID: 57084 | Rating: 0 | rate: / Reply Quote | |
After cross-referencing runtimes for various Windows hosts, I think the new app is just slower. Windows hosts haven't experienced an app change (yet) and haven't shown any sudden or recent change in run time with the KIX AdaptiveBandit jobs. This suggests that the tasks haven't really changed, leaving the only other cause of the longer run times to be a slower 11.2 app. | |
ID: 57085 | Rating: 0 | rate: / Reply Quote | |
GTX 1080 | |
ID: 57086 | Rating: 0 | rate: / Reply Quote | |
Where did you get the ns/day numbers from? | |
ID: 57087 | Rating: 0 | rate: / Reply Quote | |
Go to the slot folder and cat progress.log | |
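For example (paths are assumptions: /var/lib/boinc-client is the Ubuntu package default data directory, and the slot number varies per host, so pick the slot that is running the GPUGrid task):
cd /var/lib/boinc-client/slots/0
cat progress.log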
ID: 57088 | Rating: 0 | rate: / Reply Quote | |
It’s not real. I’ve manipulated the coproc_info file to report what I want. | |
ID: 57089 | Rating: 0 | rate: / Reply Quote | |
RTX 3070
The WU you linked had one wingman run it as cuda 10.1 and the other as 11.21, with 155,037 seconds versus 68,000. Isn't that faster? https://www.gpugrid.net/workunit.php?wuid=27075862 What does ns mean? | |
ID: 57090 | Rating: 0 | rate: / Reply Quote | |
I didn't use the time remaining estimate from BOINC. I estimated it myself based on % complete and elapsed time, assuming a linear completion rate. I usually employ the same method, since the Progress % shown by BOINC Manager is quite linear. On my low-end GPUs, I'm still waiting for the first task to complete :-) Evaluating the small sample of tasks that I've received, tasks for this new version are taking longer to complete than previous ones (let's say "for the moment"). Estimated completion times for the 5 GPUs that I'm monitoring are as follows: The last three GPUs are Turing GTX 1650 ones, but different graphics card models and clock frequencies. An editable version of the spreadsheet used can be downloaded from this link | |
ID: 57091 | Rating: 0 | rate: / Reply Quote | |
RTX 3070
The WU you linked had one wingman run it as cuda 10.1 and the other as 11.21, with 155,037 seconds versus 68,000. Isn't that faster?
nanosecond https://en.wikipedia.org/wiki/Nanosecond#:~:text=A%20nanosecond%20(ns)%20is%20an,or%201%E2%81%841000%20microsecond. Yes, there is a big gap in runtime to the other host, but it was also using an NVIDIA GeForce GTX 1070 | |
ID: 57092 | Rating: 0 | rate: / Reply Quote | |
It's not real. I've manipulated the coproc_info file to report what I want. OK. The reason I ask is that the device name shows as unknown for my 3080Ti, and I had some hope that the driver you used would fix that, so I could go into the coproc file and edit it instead. | |
ID: 57093 | Rating: 0 | rate: / Reply Quote | |
It’s not real. I’ve manipulated the coproc_info file to report what I want. What driver are you using? The 3080ti won’t be detected until driver 460.84. Anything older will not know what GPU that is. ____________ | |
ID: 57094 | Rating: 0 | rate: / Reply Quote | |
Greger, I just can't get my head around what it means. So out of the 8.64E13 ns in a day you only calculate for 159 ns??? I'm not familiar with that figure of merit. | |
ID: 57095 | Rating: 0 | rate: / Reply Quote | |
The nanoseconds will be the biochemical reaction time that we're modelling - very, very, slowly - in a digital simulation. | |
ID: 57096 | Rating: 0 | rate: / Reply Quote | |
Greger, I just can't get my head around what it means. So out of the 8.64E13 ns in a day you only calculate for 159 ns??? I'm not familiar with that figure of merit. Aren’t you big into folding? ns/day is a very common metric for measuring computation speed in molecular modeling. ____________ | |
ID: 57097 | Rating: 0 | rate: / Reply Quote | |
It's not real. I've manipulated the coproc_info file to report what I want.
NVIDIA-SMI 465.27 Driver Version: 465.27 CUDA Version: 11.3
I could not use 460 for the 3080Ti, so I had to move to the latest driver Ubuntu provided, and it was this version. boinc-client detects the name as: Coprocessors NVIDIA NVIDIA Graphics Device (4095MB) driver: 465.27
I edited coproc_info.xml, but it does not change when I update the project, and if I restart boinc-client it gets wiped even if I change the driver version inside the file. Maybe I could lock the file to root only to remove BOINC's write permission, but I'd better not. | |
ID: 57098 | Rating: 0 | rate: / Reply Quote | |
You need driver 460.84 for 3080ti. You can use that one. | |
ID: 57099 | Rating: 0 | rate: / Reply Quote | |
Greger, I just can't get my head around what it means. So out of the 8.64E13 ns in a day you only calculate for 159 ns??? I'm not familiar with that figure of merit. As mentioned before, it is the amount of simulated time the device can generate per day, but you need to take into account that the complexity of the fold and the number of atoms have a big effect on it, along with other parameters of the modelling. Think of it as a box with x, y and z dimensions: the protein is built up from atoms and then folded, and in total the simulated event is very, very short. There was a free tool (possibly still available today) that you could use to open the data directly after a run was done; users have done this at Folding@home and posted in the forums. Not sure if that is free for acemd | |
ID: 57100 | Rating: 0 | rate: / Reply Quote | |
You need driver 460.84 for 3080ti. You can use that one. ok thanks | |
ID: 57101 | Rating: 0 | rate: / Reply Quote | |
Yea, snagged a WU and it's running. My guesstimate is 19:44:13 on my 3080 dialed down to 230 Watts. Record breaking long heat wave here and summer peak Time-of-Use electric rates (8.5x higher) have started. Summer is not BOINC season in The Great Basin. | |
ID: 57102 | Rating: 0 | rate: / Reply Quote | |
Linux Mint repository offers 465.31 and 460.84. Is it actually worth reverting to 460.84??? I wouldn't do it until after this WU completes anyway. probably won't matter if the driver you have is working. I don't expect any performance difference between the two. I was just saying that I would use a more recent non-beta driver if I were updating, unless you need some feature in the 465 branch specifically. ____________ | |
ID: 57103 | Rating: 0 | rate: / Reply Quote | |
second 3080ti task completed in 11hrs | |
ID: 57104 | Rating: 0 | rate: / Reply Quote | |
peak of 28.9°C here today, so I'll suspend during the daytime after 2 tasks are done. | |
ID: 57105 | Rating: 0 | rate: / Reply Quote | |
Just keep in mind that any change in Nvidia driver version while a GPUGrid task is in progress will cause it to fail when computing is restarted. | |
ID: 57106 | Rating: 0 | rate: / Reply Quote | |
peak of 28.9°C here today, so I'll suspend during the daytime after 2 tasks are done. The VRAM being reported wrong is not because of the driver. It's a problem with BOINC. BOINC uses a detection technique that is only 32-bit (4GB). This can only be fixed by fixing the code in BOINC. ____________ | |
ID: 57107 | Rating: 0 | rate: / Reply Quote | |
I went back to my host and the driver had crashed: nvidia-smi was unable to open, and a task failed on another project. Restarted it and it's back on track. A few minutes later it fetched a new task from GPUGrid. Let's hope it does not crash again. | |
ID: 57108 | Rating: 0 | rate: / Reply Quote | |
GPU detection is handled by BOINC, not any individual projects. | |
ID: 57109 | Rating: 0 | rate: / Reply Quote | |
Finally, my first result of a new version 2.12 task came out on my fastest card: | |
ID: 57110 | Rating: 0 | rate: / Reply Quote | |
Compare old and new app on 2070S | |
ID: 57111 | Rating: 0 | rate: / Reply Quote | |
My guesstimate is 19:44:13 on my 3080 dialed down to 230 Watts.
16:06:54 https://www.gpugrid.net/workunit.php?wuid=27077289 | |
ID: 57112 | Rating: 0 | rate: / Reply Quote | |
At this moment, every one of my 7 currently working GPUs has a new version 2.12 task in process.
acemd3: error while loading shared libraries: libboost_filesystem.so.1.74.0: cannot open shared object file: No such file or directory
A good chance to remember that there is a remedy for this problem, commented at message #57064 in this same thread. One last update for estimated times to completion on my GPUs:
An editable version of the spreadsheet used can be downloaded from this link
Changes since previous version:
- Lines for two more GPUs are added.
- A new cell is added for seconds to D:H:M:S conversion | |
ID: 57113 | Rating: 0 | rate: / Reply Quote | |
So the number of usable CUDA cores in the 30xx series is half of the advertised number (just as I expected), as 10240/2=5120, 5120/4352=1.1765 (so the 3080Ti has 17.65% more CUDA cores than the 2080Ti has), so the CUDA cores of the 3080Ti are only about 1% faster than those of the 2080Ti. Does using half of the CUDA cores have implications for BOINCing? GG+OPNG at <cpu_usage>1.0</cpu_usage> & <gpu_usage>0.5</gpu_usage> works fine. GG+DaggerHashimoto crashes GG instantly. I hope to try 2xGG today. | |
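For reference, a minimal sketch of how that <gpu_usage>0.5</gpu_usage> arrangement is normally expressed in an app_config.xml placed in the GPUGrid project directory (the data-directory path is an assumption based on the Ubuntu package default; the app name acemd3 matches the client_state excerpt quoted later in this thread):
cat > /var/lib/boinc-client/projects/www.gpugrid.net/app_config.xml <<'EOF'
<app_config>
  <app>
    <name>acemd3</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
EOF
After creating the file, use BOINC Manager's "read config files" option (or restart the client) for it to take effect.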
ID: 57114 | Rating: 0 | rate: / Reply Quote | |
Does using half of the CUDA cores have implications for BOINCing?
You can't utilize the "extra" CUDA cores by running a second task (regardless of the project). The 30xx series improved the gaming experience much more than the crunching performance. | |
ID: 57115 | Rating: 0 | rate: / Reply Quote | |
So the number of usable CUDA cores in the 30xx series is half of the advertised number (just as I expected), as 10240/2=5120, 5120/4352=1.1765 (so the 3080Ti has 17.65% more CUDA cores than the 2080Ti has), so the CUDA cores of the 3080Ti are only about 1% faster than those of the 2080Ti.
I think you misunderstand what's happening. running 2x GPUGRID tasks concurrently won't make it "use more". it'll just slow both down, probably slower than half speed due to the constant resource fighting. if GPUGRID isn't seeing the effective 2x FP32 benefit of Ampere over Turing, that tells me one of two things (or maybe some combination of both):
1. the app isn't as FP32-heavy as some have implied, and maybe has a decent amount of INT32 instructions. the INT32 setup of Ampere is the same as Turing
2. there is some additional optimization that needs to be applied to the ACEMD3 app to better take advantage of the extra FP32 cores on Ampere. ____________ | |
ID: 57116 | Rating: 0 | rate: / Reply Quote | |
if GPUGRID isn't seeing the effective 2x FP32 benefit of Ampere over Turing, that tells me one of two things (or maybe some combination of both):
The way Ampere works is that half the cores are FP32, and the other half are either FP32 or INT32 depending on need. On Turing (and older), the INT32 half was always INT32. So you're probably right - either GPUGRID has some INT32 load that is using the cores instead, or some kind of application change is required to get it to use the other half. | |
ID: 57118 | Rating: 0 | rate: / Reply Quote | |
I'm not convinced that the extra cores "aren't being used" at all, ie, the cores are sitting idle 100% of the time as a direct result of the architecture or something like that. I think both the application and the hardware are fully aware of the available cores/SMs. just that the application is coded in such a way that it can't take advantage of the extra resources, either in optimization or in the number of INT instructions required. | |
ID: 57119 | Rating: 0 | rate: / Reply Quote | |
The 30xx series improved the gaming experience much more than the crunching performance.
I'm thoroughly unimpressed by my 3080. Its performance does not scale with price, making it much more expensive for doing calculations. I'll probably test it for a few more days and then sell it. I like to use some metric that's proportional to calculations and optimize calcs/Watt. In the past my experience has been that reducing max power improves performance. But since Nvidia eliminated the nvidia-settings options -a [gpu:0]/GPUGraphicsClockOffset & -a [gpu:0]/GPUMemoryTransferRateOffset that I used, I haven't found a good way to do it using Linux.
nvidia-settings -q all
It seems Nvidia chooses a performance level but I can't see how to force it to a desired level:
sudo DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 nvidia-settings -q '[gpu:0]/GPUPerfModes'
3080: 0, 1, 2, 3 & 4
Attribute 'GPUPerfModes' (Rig-05:0[gpu:0]):
perf=0, nvclock=210, nvclockmin=210, nvclockmax=420, nvclockeditable=1, memclock=405, memclockmin=405, memclockmax=405, memclockeditable=1, memTransferRate=810, memTransferRatemin=810, memTransferRatemax=810, memTransferRateeditable=1 ;
perf=1, nvclock=210, nvclockmin=210, nvclockmax=2100, nvclockeditable=1, memclock=810, memclockmin=810, memclockmax=810, memclockeditable=1, memTransferRate=1620, memTransferRatemin=1620, memTransferRatemax=1620, memTransferRateeditable=1 ;
perf=2, nvclock=240, nvclockmin=240, nvclockmax=2130, nvclockeditable=1, memclock=5001, memclockmin=5001, memclockmax=5001, memclockeditable=1, memTransferRate=10002, memTransferRatemin=10002, memTransferRatemax=10002, memTransferRateeditable=1 ;
perf=3, nvclock=240, nvclockmin=240, nvclockmax=2130, nvclockeditable=1, memclock=9251, memclockmin=9251, memclockmax=9251, memclockeditable=1, memTransferRate=18502, memTransferRatemin=18502, memTransferRatemax=18502, memTransferRateeditable=1 ;
perf=4, nvclock=240, nvclockmin=240, nvclockmax=2130, nvclockeditable=1, memclock=9501, memclockmin=9501, memclockmax=9501, memclockeditable=1, memTransferRate=19002, memTransferRatemin=19002, memTransferRatemax=19002, memTransferRateeditable=1
Nvidia has said, "The -a and -g arguments are now deprecated in favor of -q and -i, respectively. However, the old arguments still work for this release." Sounds like they're planning to reduce or eliminate customers' ability to control the products they buy. Nvidia also eliminated GPULogoBrightness so the baby-blinkie lights never turn off. | |
ID: 57120 | Rating: 0 | rate: / Reply Quote | |
running 2x GPUGRID tasks concurrently won't make it "use more". it'll just slow both down, probably slower than half speed due to the constant resource fighting.
At less than 5% complete, with two WUs running simultaneously and having started within minutes of each other:
WU1: 4840 sec at 4.7% implies 102978 sec total
WU2: 5409 sec at 4.6% implies 117587 sec total
From yesterday's singleton: 2 x 58014 sec = 116028 sec total if independent. | |
ID: 57121 | Rating: 0 | rate: / Reply Quote | |
running 2x GPUGRID tasks concurrently won't make it "use more". it'll just slow both down, probably slower than half speed due to the constant resource fighting. my point exactly. showing roughly half speed, with no real benefit to running multiples. pushing your completion time to 32 hours will only reduce your credit reward since you'll be bumped out of the +50% bonus for returning in 24hrs. ____________ | |
ID: 57122 | Rating: 0 | rate: / Reply Quote | |
Aurum wrote: But since Nvidia eliminated the nvidia-settings options -a [gpu:0]/GPUGraphicsClockOffset & -a [gpu:0]/GPUMemoryTransferRateOffset that I used I haven't found a good way to do it using Linux.
these options still work. I use them for my 3080Ti. not sure what you mean? this is exactly what I use for my 3080Ti (same on my Turing hosts):
/usr/bin/nvidia-smi -pm 1
/usr/bin/nvidia-smi -acp UNRESTRICTED
/usr/bin/nvidia-smi -i 0 -pl 320
/usr/bin/nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1"
/usr/bin/nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[4]=500" -a "[gpu:0]/GPUGraphicsClockOffset[4]=100"
it works as desired.
Aurum wrote: It seems Nvidia chooses a performance level but I can't see how to force it to a desired level:
what do you mean by "performance level"? if you mean forcing a certain P-state, no you can't do that. and these cards will not allow getting into P0 state unless you're running a 3D application. any compute application will get at best P2 state. this has been the case ever since Maxwell. workarounds to force P0 state stopped working since Pascal, so this isn't new. if you mean the PowerMizer preferred mode (which is analogous to the power settings in Windows) you can select that easily in Linux too. I always run mine at "prefer max performance". do this with the following command:
/usr/bin/nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1"
I'm unsure if this really makes much difference though except increasing idle power consumption (forcing higher clocks). the GPU seems to detect loads properly and clock up even when left on the default "Auto" selection.
Aurum wrote: Nvidia also eliminated GPULogoBrightness so the baby-blinkie lights never turn off.
I'm not sure this was intentional, probably something that fell through the cracks that not enough people have complained about for them to dedicate resources to fixing. there's no gain for nvidia in disabling this function. but again, this stopped working with Turing, so it's been this way for like 3 years, not something new. I have mostly EVGA cards, so when I want to mess with the lighting, I just throw the card on my test bench, boot into Windows, change the LED settings there, and then put it back in the crunching rig. the settings are preserved internally to the card (for my cards) so it stays at whatever I left it as. you can probably do the same ____________ | |
ID: 57123 | Rating: 0 | rate: / Reply Quote | |
It sure does not look like running multiple GG WUs on the same GPU has any benefit. | |
ID: 57124 | Rating: 0 | rate: / Reply Quote | |
How do you prove to yourself they work? They don't even exist any more. Run nvidia-settings -q all | grep -C 10 -i GPUMemoryTransferRateOffset and you will not find either of them. | |
ID: 57125 | Rating: 0 | rate: / Reply Quote | |
but all the slightly off-topic discussion aside. | |
ID: 57126 | Rating: 0 | rate: / Reply Quote | |
I prove they work by opening Nvidia X Server Settings and observing that the clock speed offsets have been changed in accordance with the commands, which don't give any error when run. and they have. the commands work 100%. I see you're referencing some other command. I have no idea of the function of the command you're trying to use. but my command works. see for yourself: https://i.imgur.com/UFHbhNt.png ____________ | |
ID: 57127 | Rating: 0 | rate: / Reply Quote | |
I'm still getting the CUDA compiler permission denied error. I've added the PPA and installed libboost1.74 as above, and reset the project multiple times. But every downloaded task fails after 2 seconds. | |
ID: 57139 | Rating: 0 | rate: / Reply Quote | |
I'm still getting the CUDA compiler permission denied error. I've added the PPA and installed libboost1.74 as above, and reset the project multiple times. But every downloaded task fails after 2 seconds. How did you install the drivers? Have you ever installed the CUDA toolkit? This was my problem. If you have a CUDA toolkit installed, remove it. I would also be safe and totally purge your nvidia drivers and re-install fresh. ____________ | |
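A hedged sketch of that cleanup on Ubuntu, assuming the toolkit and driver both came from apt (a runfile-based toolkit instead ships its own uninstaller, e.g. cuda-uninstaller under /usr/local/cuda-*/bin/, which is the "removal script" mentioned a couple of replies below):
sudo apt purge nvidia-cuda-toolkit        # Ubuntu-packaged toolkit, if present
sudo apt purge '^cuda.*'                  # NVIDIA apt-repo toolkit packages, if present
sudo apt purge '^nvidia-.*'               # wipe the driver packages
sudo apt autoremove
sudo ubuntu-drivers autoinstall           # or install a specific nvidia-driver-XXX branch >= 460.32
sudo reboot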
ID: 57140 | Rating: 0 | rate: / Reply Quote | |
Ian&Steve C wrote: It was a great first step to getting the app working for Ampere. it's been long awaited and the new app is much appreciated and now many more cards can help contribute to the project, especially with these newer long running tasks lately. we need powerful cards to handle these tasks. and last, but not least: an app for Windows would be nice :-) | |
ID: 57142 | Rating: 0 | rate: / Reply Quote | |
I'm still getting the CUDA compiler permission denied error. I've added the PPA and installed libboost1.74 as above, and reset the project multiple times. But every downloaded task fails after 2 seconds. Thanks for the quick reply. I had the CUDA toolkit ver 10 installed, but after seeing your previous post about your problem, I had already removed it. I'll try purging and reinstalling my nvidia drivers, thanks. | |
ID: 57143 | Rating: 0 | rate: / Reply Quote | |
did you use the included removal script to remove the toolkit? or did you manually delete some files? definitely try the removal script if you haven't already. good luck! | |
ID: 57145 | Rating: 0 | rate: / Reply Quote | |
... For those of us who are using the python app, the correct version is installed in the miniconda folder.
locate libboost_filesystem
/usr/lib64/libboost_filesystem-mt.so
/usr/lib64/libboost_filesystem.so
/usr/lib64/libboost_filesystem.so.1.76.0
/usr/lib64/cmake/boost_filesystem-1.76.0/libboost_filesystem-variant-shared.cmake
/var/lib/boinc/projects/www.gpugrid.net/miniconda/lib/libboost_filesystem.so
/var/lib/boinc/projects/www.gpugrid.net/miniconda/lib/libboost_filesystem.so.1.74.0
/var/lib/boinc/projects/www.gpugrid.net/miniconda/lib/cmake/boost_filesystem-1.74.0/libboost_filesystem-variant-shared.cmake
/var/lib/boinc/projects/www.gpugrid.net/miniconda/pkgs/boost-cpp-1.74.0-h312852a_4/lib/libboost_filesystem.so
/var/lib/boinc/projects/www.gpugrid.net/miniconda/pkgs/boost-cpp-1.74.0-h312852a_4/lib/libboost_filesystem.so.1.74.0
/var/lib/boinc/projects/www.gpugrid.net/miniconda/pkgs/boost-cpp-1.74.0-h312852a_4/lib/cmake/boost_filesystem-1.74.0/libboost_filesystem-variant-shared.cmake
I definitely don't want to downgrade my system version to run a project. Perhaps gpugrid could include the libboost that they already supply for a different app. Could the miniconda folder be somehow included in the app? | |
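An untested sketch of a possible workaround along those lines: make just the 1.74 filesystem library that the project's miniconda already ships visible to the dynamic linker, instead of downgrading the system boost (the path is the one reported by the locate output above; any other missing 1.74 boost libraries would need the same treatment):
sudo ln -s /var/lib/boinc/projects/www.gpugrid.net/miniconda/lib/libboost_filesystem.so.1.74.0 /usr/local/lib/
sudo ldconfig
# if /usr/local/lib is not on your linker path, add the directory via a file in /etc/ld.so.conf.d/ instead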
ID: 57147 | Rating: 0 | rate: / Reply Quote | |
Richard Haselgrove said at Message #57177: Look at that timeout: host 528201. Oh, Mr. Kevvy, where art thou? 156 libboost errors? You can fix that... Finally, Mr. Kevvy's host #537616 successfully processed these two tasks today:
e4s113_e1s796p0f577-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-1-2-RND7908_0
e5s9_e3s99p0f334-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-0-2-RND8007_4
If it was due to your fix, congratulations Mr. Kevvy, you've found the right way. Or perhaps it was some fix to the tasks on the server side? Hard to know till there are plenty of new tasks ready to send. Currently (7:51:20 UTC) there are 0 tasks ready to send and 28 tasks in progress, as the Server status page shows. | |
ID: 57192 | Rating: 0 | rate: / Reply Quote | |
I got a note back from Mr. K - he saw the errors, and was going to check his machines. I imagine he's applied Ian's workaround. | |
ID: 57193 | Rating: 0 | rate: / Reply Quote | |
On July 3rd 2021, Ian&Steve C. wrote at Message #57087: But it's not just 3000-series being slow. All cards seem to be proportionally slower with 11.2 vs 10.0, by about 30%
While organizing screenshots on one of my hosts, I happened to find comparative images for tasks of the old Linux APP V2.11 (CUDA 10.0) and the new APP V2.12 (CUDA 11.2):
* ACEMD V2.11 tasks on 14/06/2021:
* ACEMD V2.12 task on 20/07/2021:
Pay attention to device 0, the only comparable one.
- ACEMD V2.11 task: 08:10:18 = 29418 seconds elapsed to process 15.04%. Extrapolating, this leads to 195598 seconds of total processing time (2d 06:19:58)
- ACEMD V2.12 task: 3d 02:51:01 = 269461 seconds elapsed to process 96.48%. Extrapolating, this leads to 279292 seconds of total processing time (3d 05:34:52)
That is, about 42.8% excess processing time for this particular host and device 0 (GTX 1650 GPU) | |
ID: 57222 | Rating: 0 | rate: / Reply Quote | |
Also bear in mind that your first screenshot shows a D3RBandit task, and your second shows an AdaptiveBandit task. | |
ID: 57223 | Rating: 0 | rate: / Reply Quote | |
Also bear in mind that your first screenshot shows a D3RBandit task, and your second shows an AdaptiveBandit task. A bright observation and a sharp point, as always. I agree that the tasks probably aren't fully comparable, but they are the most comparable I found: same host, same device, same ADRIA WU family, same base credit amount granted: 450000... Now I'm waiting for the next move, and wondering what it will consist of: an amended V2.12 APP? a new V2.13 APP? a "superstition-proof" new V2.14 APP? ... ;-) | |
ID: 57225 | Rating: 0 | rate: / Reply Quote | |
Is GPUGrid still doing anything? I haven't gotten any work in like a month or more. And before that it was just sporadic. I used to always have work units. Now, nothing. | |
ID: 57230 | Rating: 0 | rate: / Reply Quote | |
I am not receiving Windows tasks anymore. My configuration is | |
ID: 57231 | Rating: 0 | rate: / Reply Quote | |
there hasn't been an appreciable amount of work available for over a month. | |
ID: 57232 | Rating: 0 | rate: / Reply Quote | |
there hasn't been an appreciable amount of work available for over a month. :-( :-( :-( | |
ID: 57233 | Rating: 0 | rate: / Reply Quote | |
there hasn't been an appreciable amount of work available for over a month. Currently it's as if the GPUGrid project were hibernating. From time to time, when tasks in progress reach zero, some automatism (?) launches 20 more CRYPTICSCOUT_pocket_discovery WUs. But lately only for Linux systems, and with these known problems unsolved. Waiting for everything to awaken again soon... | |
ID: 57234 | Rating: 0 | rate: / Reply Quote | |
Yes, that is all I've been getting lately. I had 4 CRYPTICSCOUT_pocket_discovery tasks 4 days ago and I got 2 more today. | |
ID: 57235 | Rating: 0 | rate: / Reply Quote | |
Another two more today. | |
ID: 57236 | Rating: 0 | rate: / Reply Quote | |
what I don't understand is that there is no word whatsoever from the project team about even a tentative schedule :-( | |
ID: 57237 | Rating: 0 | rate: / Reply Quote | |
still no tasks on the two hosts that have been having some issue getting work since the new app was released. I've set NNT on all other hosts in order to try to funnel any available work to these two hosts. | |
ID: 57238 | Rating: 0 | rate: / Reply Quote | |
Well I stepped out on a limb and side emailed the Principal Investigator listed for the project and the University regarding the lack of any communications. | |
ID: 57239 | Rating: 0 | rate: / Reply Quote | |
It worked! And Texas is still there, the last time I checked. | |
ID: 57243 | Rating: 0 | rate: / Reply Quote | |
Moved to libboost thread. | |
ID: 57247 | Rating: 0 | rate: / Reply Quote | |
Hello; | |
ID: 57427 | Rating: 0 | rate: / Reply Quote | |
Hello; Try once more, but let it run for five minutes after the restart. You may find the progress display jumps back up to what it was showing before. I had a total electricity blackout a week ago, while everything was running. I was seeing 0% as the machine started up, but it jumped back up to 50% or whatever was appropriate, and completed normally. | |
ID: 57430 | Rating: 0 | rate: / Reply Quote | |
Why are so many Acemd3 tasks failing on Windows host(s)? Mine has not succeeded on any task lately. Same errors can be seen on many other hosts too, like WU 27085364. | |
ID: 57759 | Rating: 0 | rate: / Reply Quote | |
There's another Acemd3 test ongoing today: | |
ID: 57780 | Rating: 0 | rate: / Reply Quote | |
all of mine failed, because they sent only the CUDA101 app to my Ampere host. | |
ID: 57784 | Rating: 0 | rate: / Reply Quote | |
all of mine failed, because they sent only the CUDA101 app to my Ampere host. I am surprised that this problem has not yet been solved :-( There should have been enough time in the meantime. | |
ID: 57787 | Rating: 0 | rate: / Reply Quote | |
I am running my 1st New version of ACEMD 2.19 (cuda1121) task and it has reached a percentage of 35.833 and shows a Running status but the Elapsed time and Time remaining are not incrementing ? | |
ID: 58012 | Rating: 0 | rate: / Reply Quote | |
I am running my 1st New version of ACEMD 2.19 (cuda1121) task and it has reached a percentage of 35.833 and shows a Running status but the Elapsed time and Time remaining are not incrementing ? https://www.gpugrid.net/forum_thread.php?id=5297&nowrap=true#58008 Ya need MS Visual C++ 2015 or later. | |
ID: 58014 | Rating: 0 | rate: / Reply Quote | |
Yes, I updated my MS Visual C++ to the 2015 or newer level and my old NVIDIA 1060 took right off !!! | |
ID: 58020 | Rating: 0 | rate: / Reply Quote | |
Anybody get any new acemd3 tasks and notice if the application name changed? | |
ID: 58385 | Rating: 0 | rate: / Reply Quote | |
Yes, I've had one running on a Linux machine for the last 10 hours or so. Still says v2.19 and cuda 1121. It's approaching 40%, so much the same speed as usual for a GTX 1660 Ti | |
ID: 58386 | Rating: 0 | rate: / Reply Quote | |
That's interesting. | |
ID: 58388 | Rating: 0 | rate: / Reply Quote | |
The task I mentioned has now completed and reported - visible on the link I posted last night. The actual binary executable is still acemd3, dated 28 September 2021 - you can see the name in stderr.txt | |
ID: 58390 | Rating: 0 | rate: / Reply Quote | |
That's what I see from the client_state file:
<app>
    <name>acemd3</name>
    <user_friendly_name>Advanced molecular dynamics simulations for GPUs</user_friendly_name>
    <non_cpu_intensive>0</non_cpu_intensive>
</app> | |
ID: 58391 | Rating: 0 | rate: / Reply Quote | |
Then the next 2 WUs show up as 1.0 and:
<app>
    <name>acemd4</name>
    <user_friendly_name>Advanced molecular dynamics simulations for GPUs</user_friendly_name>
    <non_cpu_intensive>0</non_cpu_intensive>
</app> | |
ID: 58392 | Rating: 0 | rate: / Reply Quote | |
That's interesting. You'd think they'd have a link to the Apps page, but no. The first 5 of those new acemd4 WUs failed within a few minutes.
Stderr output
<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message> process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
07:45:46 (99083): wrapper (7.7.26016): starting
07:45:46 (99083): wrapper (7.7.26016): starting
07:45:46 (99083): wrapper: running /bin/tar (xf x86_64-pc-linux-gnu__cuda1121.tar.bz2)
07:52:57 (99083): /bin/tar exited; CPU time 424.196280
07:52:57 (99083): wrapper: running bin/python (pre_run.py)
File "/var/lib/boinc-client/slots/36/pre_run.py", line 1
<soft_link>../../projects/www.gpugrid.net/T1_3-RAIMIS_TEST-0-pre_run</soft_link>
^
SyntaxError: invalid syntax
07:52:58 (99083): bin/python exited; CPU time 0.137151
07:52:58 (99083): app exit status: 0x1
07:52:58 (99083): called boinc_finish(195)
</stderr_txt>
]]> | |
ID: 58393 | Rating: 0 | rate: / Reply Quote | |
Then the next 2 WUs show up as 1.0 and:
Just had one of those run through to completion: T5_5-RAIMIS_TEST-1-3-RND1908_0
I think that's the first I've seen from RAIMIS which both:
* Was explicitly designated as a GPU task (cuda 1121)
* Ran right through to validation
Congratulations! It was a very quick test run - under 7 minutes - but all the moving parts seem to have been assembled into the right order. The actual binary (as listed in stderr_txt) is 'acemd', not acemd4: that might be worth tidying up in the future. | |
ID: 58394 | Rating: 0 | rate: / Reply Quote | |
Ok. | |
ID: 58610 | Rating: 0 | rate: / Reply Quote | |
I had 2 WUs running today. They both made it up to 66.666% complete, then they stayed there for a few hours doing nothing; there was no CPU or GPU usage. So I aborted both of them. How long was I supposed to keep them "running" like that? | |
ID: 58611 | Rating: 0 | rate: / Reply Quote | |
You should have either exited BOINC and restarted or suspend/resume the tasks to get them moving again. | |
ID: 58612 | Rating: 0 | rate: / Reply Quote | |
ALL my tasks finish with | |
ID: 58630 | Rating: 0 | rate: / Reply Quote | |
ALL my tasks finish with
you have several more specific errors:
"ACEMD failed: Error invoking kernel: CUDA_ERROR_ILLEGAL_ADDRESS (700)"
"ACEMD failed: Error invoking kernel: CUDA_ERROR_LAUNCH_FAILED (719)"
"ACEMD failed: Particle coordinate is nan"
it's possibly a driver issue for the CUDA errors. use a program called DDU (Display Driver Uninstaller) to totally wipe out the drivers, then re-install fresh from the nvidia package. In my opinion, the 470-series drivers are most stable for crunching. the newer drivers will also get you the slightly different CUDA 11.21 app. "particle coordinate is nan" (nan = not a number) is usually an overclocking issue, or a bad WU. ____________ | |
ID: 58631 | Rating: 0 | rate: / Reply Quote | |
Good afternoon everyone, | |
ID: 58875 | Rating: 0 | rate: / Reply Quote | |
Good afternoon everyone, I would be surprised if you receive a reply :-( | |
ID: 58877 | Rating: 0 | rate: / Reply Quote | |
Good afternoon everyone, so, did I promise too much? | |
ID: 59080 | Rating: 0 | rate: / Reply Quote | |
The preponderance of work lately has been 99:1 for the Python tasks. | |
ID: 59081 | Rating: 0 | rate: / Reply Quote | |
I've had eight ACEMD 3 tasks since Friday - six 'ADRIA' (the long-running ones), and two 'CRYPTICSCOUT' (significantly shorter). One oddity is that the credit for 'ADRIA' tasks has been substantially reduced, but the credit for 'CRYPTICSCOUT' hasn't. Was that deliberate, I wonder? | |
ID: 59083 | Rating: 0 | rate: / Reply Quote | |
Richard, maybe you can answer this puzzle. | |
ID: 59088 | Rating: 0 | rate: / Reply Quote | |
The difference in behaviour will be down to the client scheduler (not a separate program - an integral part of the client code). | |
ID: 59091 | Rating: 0 | rate: / Reply Quote | |
OK, thanks for the comment. As usual, I overthought the problem. | |
ID: 59092 | Rating: 0 | rate: / Reply Quote | |
I am not sure if I am wording this question properly, but does acemd3 use single- or double-precision floating point? This is more just out of curiosity than anything else. Also, is there a way to tell what a program/app is using by looking within the OS that is running it? | |
ID: 59094 | Rating: 0 | rate: / Reply Quote | |
Yes, the running tasks can be identified by their science applications in the running processes on a host. | |
ID: 59095 | Rating: 0 | rate: / Reply Quote | |
Is there a way to tell if it is single or double precision by looking at/inspecting the process in the task manager (Windows)? | |
ID: 59097 | Rating: 0 | rate: / Reply Quote | |
You would have to ask Toni whether the acemd3 application uses single or double precision. | |
ID: 59098 | Rating: 0 | rate: / Reply Quote | |
I'm going to guess that the vast majority is FP32 and INT32. I have not observed any correlation with FP64 across devices on GPUGRID tasks, so if any FP64 operations are being done, the percentage of compute time should be so small to be only marginal. | |
ID: 59100 | Rating: 0 | rate: / Reply Quote | |
I have received 5 acemd tasks and they all failed. | |
ID: 59448 | Rating: 0 | rate: / Reply Quote | |
I have several of those on two machines:
Stderr output
<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message> process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
14:52:08 (347837): wrapper (7.7.26016): starting
14:52:29 (347837): wrapper (7.7.26016): starting
14:52:29 (347837): wrapper: running bin/acemd3 (--boinc --device 0)
14:52:30 (347837): bin/acemd3 exited; CPU time 0.003638
14:52:30 (347837): app exit status: 0x1
14:52:30 (347837): called boinc_finish(195)
</stderr_txt>
]]>
Nice to have ACEMD back, but I'd consider this even nicer if the ACEMDs didn't crash. ;-) ____________ - - - - - - - - - - Greetings, Jens | |
ID: 59629 | Rating: 0 | rate: / Reply Quote | |
Looks like I received 37 work units overnight, all failed after about 10-30 seconds. | |
ID: 59630 | Rating: 0 | rate: / Reply Quote | |
same. all the acemd3 tasks failed without any informative error message. on Linux | |
ID: 59631 | Rating: 0 | rate: / Reply Quote | |
3skh-ADRIA_KDeepMD_100ns_2489-0-1-RND9110_5 Stderr output I have over 40 of these so far. Host OS (win10) and drivers are up-to-date. Are these ACEMD apps also failing under Linux? ____________ "Together we crunch To check out a hunch And wish all our credit Could just buy us lunch" Piasa Tribe - Illini Nation | |
ID: 59632 | Rating: 0 | rate: / Reply Quote | |
Yes, per Ian&Steve C. | |
ID: 59633 | Rating: 0 | rate: / Reply Quote | |
I just had 13 of the apps crash with computation error (195)
5 12/20/2022 2:03:07 PM CUDA: NVIDIA GPU 0: NVIDIA GeForce GTX 1660 Ti (driver version 526.86, CUDA version 12.0, compute capability 7.5, 6144MB, 6144MB available, 5530 GFLOPS peak)
6 12/20/2022 2:03:07 PM OpenCL: NVIDIA GPU 0: NVIDIA GeForce GTX 1660 Ti (driver version 526.86, device version OpenCL 3.0 CUDA, 6144MB, 6144MB available, 5530 GFLOPS peak) | |
ID: 59634 | Rating: 0 | rate: / Reply Quote | |
Linux: Took almost 30 minutes to download but only seconds to error out
GPUGRID x86_64-pc-linux-gnu__cuda1121.zip.b4692e2ec3b7e128830af5c05a9f0037 98.225 1013587.50 K 00:28:10 587.44 KBps Downloading dual-linux
GPUGRID 2.19 ACEMD 3: molecular dynamics simulations for GPUs (cuda1121) 3sni-ADRIA_KDeepMD_100ns_3150-0-1-RND1427_7 00:00:24 (-) 0.00 100.000 - 12/26/2022 11:33:32 AM 0.993C + 1NV Computation error
12/21/2022 9:59:43 AM CUDA: NVIDIA GPU 0: NVIDIA P102-100 (driver version 470.99, CUDA version 11.4, compute capability 6.1, 5060MB, 5060MB available, 10771 GFLOPS peak)
12/21/2022 9:59:43 AM OS: Linux Ubuntu: Ubuntu 20.04.5 LTS [5.4.0-135-generic|libc 2.31]
____________ try my performance program, the BoincTasks History Reader. Find and read about it here: https://forum.efmer.com/index.php?topic=1355.0 | |
ID: 59636 | Rating: 0 | rate: / Reply Quote | |
will there be more acemd3 tasks in the near future? | |
ID: 59645 | Rating: 0 | rate: / Reply Quote | |
will there be more acemd3 tasks in the near future? I would hope so. Would be nice to return to quick running acemd3 tasks that run only on the gpu. Let's hope the developer can rework the parameters for these new tasks so that they don't fail instantly on everyone's hosts. | |
ID: 59647 | Rating: 0 | rate: / Reply Quote | |
Let's hope the developer can rework the parameters for these new tasks so that they don't fail instantly on everyone's hosts. + 1 | |
ID: 59649 | Rating: 0 | rate: / Reply Quote | |
Here's an alternate perspective: looking at the time it took to reach the error is probably a valid way to compare the speed of the various hosts that ran the same scripts. Assuming that the error is in the script, of course. | |
ID: 59650 | Rating: 0 | rate: / Reply Quote | |
Well, since the wrapper hasn't changed and the app hasn't changed, the issue with the tasks is that the configuration of the task parameters is doing something rude. | |
ID: 59651 | Rating: 0 | rate: / Reply Quote | |
I am surprised that the Server Status page still shows some 140 tasks "in process". | |
ID: 59654 | Rating: 0 | rate: / Reply Quote | |
I would put my money on the bet that the task generation configuration script is simply generating some values that are "out of bounds" in memory access. This might well be the case, but I have a different shot at an explanation: IIRC the certificates usually had to be renewed sometime in summer or early autumn, and I think it may be possible there's no working certificate in place for ACEMD at all. ____________ - - - - - - - - - - Greetings, Jens | |
ID: 59660 | Rating: 0 | rate: / Reply Quote | |
... and I think it may be possible there's no working certificate in place for ACEMD at all. this has happened numerous times in the past :-( So it would be no surprise if this is the reason for the problem now as well. | |
ID: 59668 | Rating: 0 | rate: / Reply Quote | |
I managed to get one of the ACEMD3 tasks that came out and it ran fine on my old GTX 1060. | |
ID: 59717 | Rating: 0 | rate: / Reply Quote | |
sad to see that they didn't update the CUDA version to support Ada | |
ID: 59718 | Rating: 0 | rate: / Reply Quote | |
Yea :/ My 4090 is also waiting for work. | |
ID: 59729 | Rating: 0 | rate: / Reply Quote | |
Either the new acemd tasks are smaller or the new version is faster. | |
ID: 59756 | Rating: 0 | rate: / Reply Quote | |
Either the new acemd tasks are smaller or the new version is faster. my experience from the past few days is that they differ in size. Also the credit points earned differ accordingly. | |
ID: 59779 | Rating: 0 | rate: / Reply Quote | |