What is happening and what will happen at GPUGRID, update for 2021

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Message 57240 - Posted: 7 Aug 2021, 6:44:07 UTC

As you know, GPUGRID was the first BOINC project to run GPU applications; in fact, we helped create the infrastructure for that. This was many years ago, and many things have changed since then. In particular, we have recently not had a constant stream of workunits. I would like to explain the present and expected future of GPUGRID here.

In the last few years, we have moved from doing science by running very many simulations to developing new methods at the boundary between physical simulation and machine learning/artificial intelligence. These new methods did not require many simulations, and most of the PhD students in the research group did not use GPUGRID daily. We still had some long-term projects running on GPUGRID, for which you will shortly see results in the form of new scientific publications.

Among other things, ACEMD, the application behind GPUGRID, is now built partially on OpenMM, of which I am also principal investigator. As you might know, OpenMM is also used in Folding@Home. We very recently received a grant to develop OpenMM, with one or two people starting before the end of the year. This will be good for GPUGRID because it means that we will be using GPUGRID a lot more.

Furthermore, we recently found a way to run AI simulations in GPUGRID. We have only run very few test cases, but there is a PhD student in the lab with a thesis on cooperative intelligence, where very many AI agents collaborate to solve tasks. The goal is to understand how cooperative intelligence works. We are also looking for a postdoc in cooperative intelligence, in case you know somebody.

https://www.compscience.org

I hope that this clarifies the current situation. On the practical side, we expect to have the ACEMD application fixed for RTX30xx within a few weeks, as the developer of ACEMD is now also doing the deployment on GPUGRID, which makes everything simpler.

GDF

bozz4science
Message 57241 - Posted: 7 Aug 2021, 11:13:43 UTC

Thanks for the much anticipated update! Appreciate that you provide a roadmap for the future. Hopefully there aren't too many roadblocks ahead with the development of OpenMM. The future project direction sounds very exciting :) I'll take that as an opportunity to upgrade my host by the end of the year to contribute more intensively next year!

Keep up the good work

baffoni
Message 57242 - Posted: 7 Aug 2021, 14:54:01 UTC - in response to Message 57240.  

Are there any plans to add support for AMD GPUs now that ACEMD3 supports OpenCL? https://software.acellera.com/docs/latest/acemd3/capabilities.html This would increase participation.

[CSF] Aleksey Belkov
Message 57244 - Posted: 7 Aug 2021, 20:11:45 UTC
Last modified: 7 Aug 2021, 20:18:19 UTC

Good news, everyone!

mikey
Message 57245 - Posted: 8 Aug 2021, 3:44:03 UTC - in response to Message 57240.  

Great News, Thanks!!!!

[AF] fansyl
Message 57246 - Posted: 8 Aug 2021, 8:53:02 UTC

Thanks for the news, I hope the work goes well. I am looking forward to making new calculations.

Go ahead !

erotemic
Message 57249 - Posted: 9 Aug 2021, 23:15:01 UTC

I've got a 3090 and 2 1080ti's waiting for some work. Looking forward to the new updates.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 57250 - Posted: 10 Aug 2021, 13:24:16 UTC

Thanks for the update, GDF, it's very much appreciated!

MrS
Scanning for our furry friends since Jan 2002

Erich56
Message 57251 - Posted: 14 Aug 2021, 16:44:29 UTC - in response to Message 57240.  
Last modified: 14 Aug 2021, 16:44:54 UTC

On the practical side, we expect to have the ACEMD application fixed for RTX30xx within a few weeks, as the developer of ACEMD is now also doing the deployment on GPUGRID, which makes everything simpler.

one of my hosts with two RTX3070 inside will be pleased :-)

luckdragon2000
Message 57252 - Posted: 20 Aug 2021, 6:46:21 UTC - in response to Message 57242.  

Are there any plans to add support for AMD GPUs now that ACEMD3 supports OpenCL? https://software.acellera.com/docs/latest/acemd3/capabilities.html This would increase participation.


I would also like to know if AMD will finally be supported. I have a water-cooled Radeon RX 6800 XT and am ready to utilize its full capacity for cancer and COVID research, as well as other projects as they may come.

AMD Ryzen 9 5950X
AMD Radeon RX 6800 XT
32GB 3200MHz CAS-14 RAM
NVMe 4th Gen storage
Custom water cooling

GDF
Message 57257 - Posted: 2 Sep 2021, 12:47:05 UTC - in response to Message 57252.  

An initial new version of ACEMD has been deployed on Linux and it is working, but we are still testing.

gdf

Ian&Steve C.
Message 57259 - Posted: 2 Sep 2021, 18:06:25 UTC - in response to Message 57257.  
Last modified: 2 Sep 2021, 18:44:40 UTC

An initial new version of ACEMD has been deployed on Linux and it is working, but we are still testing.

gdf


What are the criteria for sending the cuda101 app vs. the cuda1121 app?

I see that both apps exist, and new drivers on even older cards will support both. For example, if you have CUDA 11.2 drivers on a Turing card, you can run either the 11.2 app or the 10.1 app. So what criteria does the server use to determine which app to send to my Turing cards?

Of course, Ampere cards should only get the 11.2 app.

It also looks like the Windows apps are missing for New ACEMD. Are you dropping Windows support?
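
To make the question concrete, here is a minimal sketch of how a scheduler could decide which app versions a host is eligible for. It is not GPUGRID's actual server logic: the `AppVersion` type, the `eligible_plan_classes` function, and the numeric limits are assumptions chosen for illustration, and it assumes the decision depends only on the CUDA version the driver supports and the card's compute capability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppVersion:
    plan_class: str         # app name as used in this thread
    min_driver_cuda: tuple  # lowest CUDA version the driver must support
    min_compute_cap: tuple  # lowest supported compute capability (assumed)
    max_compute_cap: tuple  # highest supported compute capability (assumed)


# Hypothetical limits for illustration; the real server configuration
# is not shown anywhere in this thread.
APP_VERSIONS = [
    AppVersion("cuda101", (10, 1), (3, 5), (7, 5)),
    AppVersion("cuda1121", (11, 2), (3, 5), (8, 6)),
]


def eligible_plan_classes(driver_cuda, compute_cap):
    """Return the names of all app versions the host could run."""
    return [
        v.plan_class
        for v in APP_VERSIONS
        if driver_cuda >= v.min_driver_cuda
        and v.min_compute_cap <= compute_cap <= v.max_compute_cap
    ]


turing = eligible_plan_classes((11, 2), (7, 5))
ampere = eligible_plan_classes((11, 2), (8, 6))
```

Under these assumed limits, the Turing host is eligible for both apps while the Ampere host is eligible only for cuda1121, so some extra tie-break rule is still needed for Turing. What that rule is remains the open question.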

WR-HW95
Message 57260 - Posted: 2 Sep 2021, 19:04:21 UTC

Support for AMD cards would be good for the project.
At the moment I'm mostly running Milkyway on my 6900XT, and it's doing 3 units in 1:50, which takes a 1080Ti about 6:00. I haven't checked times for WCG because GPU units come up so rarely, but I imagine it would do fine there too.

Ian&Steve C.
Message 57261 - Posted: 2 Sep 2021, 19:38:16 UTC - in response to Message 57257.  
Last modified: 2 Sep 2021, 19:48:46 UTC

An initial new version of ACEMD has been deployed on Linux and it is working, but we are still testing.

gdf


There seems to be a problem with the new 2.17 app. It's always trying to run on GPU0, even when BOINC assigns it to another GPU.

I have had this happen on two separate hosts now. The host picks up a new task and BOINC assigns it to some other GPU (like device 6 or device 3), but the acemd process spins up on GPU0 anyway, even though that GPU is already occupied by a BOINC process from another project. I think something is off, in that the BOINC device assignment isn't being communicated to the app properly. This results in multiple processes running on a single GPU, and no process running on the device that BOINC assigned the GPUGRID task to.

Restarting the BOINC client brings it back to "OK", since it prioritizes the GPUGRID task to GPU0 on startup (probably due to resource share). But I feel this will keep happening.

needs an update ASAP.
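
For readers unfamiliar with how the assignment reaches the app: the BOINC client conventionally tells a GPU app which device to use through a `--device N` command-line argument. The sketch below is only an illustration of that hand-off and of how the symptom described above could arise. The `parse_device` helper is hypothetical and is not taken from ACEMD, and the actual cause of the 2.17 behavior is not stated in this thread.

```python
import sys


def parse_device(argv, default=0):
    """Return the GPU index given as '--device N', or `default` if absent.

    If an app (or a wrapper around it) drops or ignores this argument,
    every task silently falls back to the default device, which would
    look like the "everything runs on GPU0" symptom.
    """
    for i, arg in enumerate(argv):
        if arg == "--device" and i + 1 < len(argv):
            return int(argv[i + 1])
    return default


if __name__ == "__main__":
    print("selected device:", parse_device(sys.argv))
```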

Richard Haselgrove
Message 57262 - Posted: 2 Sep 2021, 21:20:18 UTC

Haven't snagged one of the new ones as yet (I was out all day), but I'll watch out for them, and try to find out where the device allocation is failing.

ServicEnginIC
Message 57264 - Posted: 2 Sep 2021, 22:54:46 UTC - in response to Message 57261.  

There seems to be a problem with the new 2.17 app. It's always trying to run on GPU0, even when BOINC assigns it to another GPU.

I have had this happen on two separate hosts now. The host picks up a new task and BOINC assigns it to some other GPU (like device 6 or device 3), but the acemd process spins up on GPU0 anyway, even though that GPU is already occupied by a BOINC process from another project. I think something is off, in that the BOINC device assignment isn't being communicated to the app properly. This results in multiple processes running on a single GPU, and no process running on the device that BOINC assigned the GPUGRID task to.

First of all: congratulations! Both of your multi-GPU systems that weren't getting work from previous app versions seem to have that problem solved with this new one. Welcome back to the field!

I'm experiencing the same behavior, and I can go even further:
I caught six WUs of the new app version 2.17 on my triple 1650 GPU system.
Then I aborted three of these WUs, and two of them were picked up again by my twin 1650 GPU system.
On the triple GPU system: while all three WUs seem to be progressing normally from the BOINC Manager point of view, Psensor shows that only GPU #0 (first PCIe slot) is working and GPUs #1 and #2 are inactive. It's as if GPU #0 is carrying the whole workload for the three WUs. All three WUs show the same 63% fraction done after 8.25 hours. However, CPU usage is consistent with three WUs running concurrently on this system.
On the twin GPU system: while both WUs seem to be progressing normally from the BOINC Manager point of view, Psensor shows that only GPU #0 (first PCIe slot) is working and GPU #1 is inactive. It's as if GPU #0 is carrying the whole workload for both WUs. Both WUs show the same 89% fraction done after 8 hours. Here too, CPU usage is consistent with two WUs running concurrently on this system.
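
The same check can be scripted instead of read off Psensor. This is a minimal sketch, assuming an NVIDIA driver with `nvidia-smi` on the PATH; the helper names `group_by_gpu` and `current_compute_processes` are my own. It groups compute processes by the GPU they actually run on, so several acemd processes listed under one GPU UUID would confirm the behavior described above.

```python
import subprocess
from collections import defaultdict

# Ask nvidia-smi for one CSV line per compute process: GPU UUID, PID, name.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=gpu_uuid,pid,process_name",
    "--format=csv,noheader",
]


def group_by_gpu(csv_text):
    """Map each GPU UUID to the list of (pid, process name) running on it."""
    by_gpu = defaultdict(list)
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split at most twice so a comma inside the process name is kept.
        gpu_uuid, pid, name = [field.strip() for field in line.split(",", 2)]
        by_gpu[gpu_uuid].append((int(pid), name))
    return dict(by_gpu)


def current_compute_processes():
    """Run nvidia-smi and return the grouping for the local machine."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return group_by_gpu(result.stdout)
```

Calling `current_compute_processes()` on an affected host and comparing the result with the device numbers shown in BOINC Manager should make the mismatch visible.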

ServicEnginIC
Message 57265 - Posted: 3 Sep 2021, 6:20:59 UTC - in response to Message 57261.  

There seems to be a problem with the new 2.17 app. It's always trying to run on GPU0, even when BOINC assigns it to another GPU.

Confirmed:
While BOINC Manager was saying that Task #32640074, Task #32640075 and Task #32640080 were running on devices #0, #1 and #2 of this triple GTX 1650 GPU system, they were actually all being processed concurrently on the same device #0.

Ian&Steve C.
Message 57266 - Posted: 3 Sep 2021, 15:05:45 UTC - in response to Message 57264.  


First of all: congratulations! Both of your multi-GPU systems that weren't getting work from previous app versions seem to have that problem solved with this new one. Welcome back to the field!


Yeah, it was actually partially caused by some settings on my end, combined with the fact that when the cuda1121 app was released on July 1st, they deleted/retired/removed the cuda100 app. Had they left the cuda100 app in place, I would at least have still received that one. I'll post more details about that issue in the original thread.

GDF
Message 57298 - Posted: 14 Sep 2021, 13:15:02 UTC

The device problem should be fixed now.
Windows versions are on their way.

GDF
Message 57299 - Posted: 14 Sep 2021, 16:54:23 UTC - in response to Message 57298.  

Windows version deployed