ATM

Message boards : News : ATM
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Message 60095 - Posted: 15 Mar 2023, 15:23:16 UTC - in response to Message 60093.  

GPUgrid is set to only DL 2 WUs per computer.


it's actually 2 per GPU, for up to 8 GPUs. 16 per computer/host.

ACEMD WUs take around 12ish hours and have approximately 50% GPU utilization


acemd3 has always used nearly 100% utilization with a single task on every GPU I've ever run. if you're only seeing 50%, sounds like you're hitting some other kind of bottleneck preventing the GPU from working to its full potential.

Aurum
Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 2
Message 60096 - Posted: 15 Mar 2023, 17:53:15 UTC

I just started using nvitop for Linux and it gives a very different image of GPU utilization while running ATM: https://github.com/XuehaiPan/nvitop
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Message 60097 - Posted: 15 Mar 2023, 18:06:14 UTC - in response to Message 60096.  
Last modified: 15 Mar 2023, 18:09:46 UTC

i would probably give more trust to nvidia's own tools.

watch -n 1 nvidia-smi

or
watch -n 1 nvidia-smi --query-gpu=temperature.gpu,name,pci.bus_id,utilization.gpu,utilization.memory,clocks.current.sm,clocks.current.memory,power.draw,memory.used,pcie.link.gen.current,pcie.link.width.current --format=csv


but you said "acemd3" uses 50%. not ATM. overall I'd agree that ATM is closer to 50% effective or a little higher. it cycles between like 90 seconds @95+% and 30 seconds @0% and back and forth for the majority of the run.
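For anyone who wants a single number out of the nvidia-smi readings, here is a minimal Python sketch that averages sampled utilization. The function names are made up for illustration; the only nvidia-smi options it relies on are --query-gpu=utilization.gpu and --format=csv,noheader,nounits.

```python
import subprocess
import time
from statistics import mean

# One field only, with the CSV header and the "%" unit stripped.
QUERY = ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"]


def parse_utilization(output):
    """Turn nvidia-smi output (one percentage per GPU, one per line) into floats."""
    return [float(line) for line in output.splitlines() if line.strip()]


def average_utilization(samples):
    """Time-averaged utilization for equally spaced samples (0.0 if empty)."""
    return mean(samples) if samples else 0.0


def sample_first_gpu(n_samples, interval_s=1.0):
    """Poll nvidia-smi n_samples times and return the readings for the first GPU."""
    readings = []
    for _ in range(n_samples):
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
        readings.append(parse_utilization(result.stdout)[0])
        time.sleep(interval_s)
    return readings
```

Calling average_utilization(sample_first_gpu(120)) covers roughly one full cycle. An idealised cycle of 90 one-second samples at 95% and 30 at 0% averages to 71.25% for that portion of the run.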
Aurum
Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 2
Message 60098 - Posted: 15 Mar 2023, 18:09:25 UTC - in response to Message 60094.  
Last modified: 15 Mar 2023, 18:10:23 UTC

I'm running Linux Mint 19 (a bit out of date)
I just retired my last Linux Mint 19 computer yesterday and it had been running ATM, ACEMD & Python WUs on a 2080 Ti (12/7.5) fine. BTW, I tried the LM 21.1 upgrade from LM 20.3 and can't do things like open BOINC folder as admin. I can't see any advantage to 21.1 so I'm going to do a fresh install and revert back to 20.3.

My machine has a gtx-950, so cuda tasks are OK.
Is there a minimum requirement for CUDA and Compute Capability for ATM WUs?
https://www.techpowerup.com/gpu-specs/geforce-gtx-950.c2747 says CUDA 5.2 and https://developer.nvidia.com/cuda-gpus says 5.2.
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Message 60099 - Posted: 15 Mar 2023, 18:14:54 UTC - in response to Message 60098.  

Is there a minimum requirement for CUDA and Compute Capability for ATM WUs?
https://www.techpowerup.com/gpu-specs/geforce-gtx-950.c2747 says CUDA 5.2 and https://developer.nvidia.com/cuda-gpus says 5.2.


very likely the min CC is 5.0 (Maxwell) since Kepler cards seem to be erroring with the message that the card is too old.

all cuda 11.x apps are supported by CUDA 11.1+ drivers. with CUDA 11.1, Nvidia introduced forward compatibility of minor versions. so as long as you have 450+ drivers you should be able to run any CUDA app up to 11.8. CUDA 12+ will require moving to CUDA 12+ compatible drivers.
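As a rough illustration of the two thresholds mentioned here, the sketch below checks a card's compute capability and driver version against them. The 5.0 minimum is only the guess made in this post ("very likely"), not a confirmed project requirement, and the function and constant names are hypothetical.

```python
# Thresholds taken from the post above; the 5.0 minimum is a guess, not a documented limit.
MIN_COMPUTE_CAPABILITY = (5, 0)  # Maxwell
MIN_DRIVER = (450,)              # "450+ drivers"


def parse_version(text):
    """Split a dotted version string such as '5.2' or '470.161.03' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def gpu_looks_supported(compute_cap, driver_version):
    """Return True if both values meet the assumed minimums."""
    return (
        parse_version(compute_cap) >= MIN_COMPUTE_CAPABILITY
        and parse_version(driver_version) >= MIN_DRIVER
    )
```

With these assumptions a GTX 950 (compute capability 5.2) on a 470-series driver passes, while a Kepler card at 3.5 does not.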
Aurum
Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 2
Message 60100 - Posted: 15 Mar 2023, 18:16:47 UTC - in response to Message 60095.  

GPUgrid is set to only DL 2 WUs per computer.

it's actually 2 per GPU, for up to 8 GPUs. 16 per computer/host.
I'm sure you're right, it's been years since I put more than one GPU on a computer.

ACEMD WUs take around 12ish hours and have approximately 50% GPU utilization
acemd3 has always used nearly 100% utilization with a single task on every GPU I've ever run. if you're only seeing 50%, sounds like you're hitting some other kind of bottleneck preventing the GPU from working to its full potential.
Let me rephrase that since it's been a long time since there was a steady flow of ACEMD. I always run 2 ACEMD WUs per GPU with no other GPU projects running. I can't remember what ACEMD utilization was but I don't recall that they slowed down much by running 2 WUs together.
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Message 60101 - Posted: 15 Mar 2023, 18:19:02 UTC - in response to Message 60100.  

maybe not much slower, but also not faster.
Aurum
Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 2
Message 60102 - Posted: 15 Mar 2023, 18:20:10 UTC - in response to Message 60097.  

i would probably give more trust to nvidia's own tools.

watch -n 1 nvidia-smi

nvitop does that but graphs it.
Aurum
Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 2
Message 60103 - Posted: 15 Mar 2023, 18:22:37 UTC - in response to Message 60101.  

maybe not much slower, but also not faster.

But it has an advantage: running a single ACEMD WU while the second GG sits idle waiting for it to finish, and then not getting the quick turnaround bonus, feels like getting robbed :-) But who's counting?
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Message 60104 - Posted: 15 Mar 2023, 18:26:29 UTC - in response to Message 60103.  
Last modified: 15 Mar 2023, 18:28:30 UTC

until your 12h task turns into two 25hr tasks when running two at once, and you get robbed anyway. robbed of the bonus for two tasks instead of just one.

you can set your machine to not download excess tasks by setting a smaller cache size or playing with resource share. that way it won't download the second task until the first one is nearly finished. there are lots of options you can tweak to get the desired behavior.
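One way to get the smaller cache is a global_prefs_override.xml in the BOINC data directory. Here is a small Python sketch that builds one; the element names work_buf_min_days and work_buf_additional_days are the standard BOINC work buffer preferences, while the function name and the example values are just placeholders to tune for your own host.

```python
import xml.etree.ElementTree as ET


def build_prefs_override(min_days, additional_days):
    """Return the text of a global_prefs_override.xml that sets only the work buffer."""
    root = ET.Element("global_preferences")
    # "Store at least N days of work"
    ET.SubElement(root, "work_buf_min_days").text = f"{min_days:.2f}"
    # "Store up to an additional N days of work"
    ET.SubElement(root, "work_buf_additional_days").text = f"{additional_days:.2f}"
    return ET.tostring(root, encoding="unicode")


# Example values only: a very small buffer so a new task is fetched late.
override_text = build_prefs_override(0.01, 0.0)
```

Write override_text to global_prefs_override.xml and have the client re-read its local preferences (or restart it) for the change to take effect.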
Keith Myers
Joined: 13 Dec 17
Posts: 1416
Credit: 9,119,446,190
RAC: 678,713
Message 60105 - Posted: 15 Mar 2023, 21:13:38 UTC
Last modified: 15 Mar 2023, 21:13:58 UTC

Picked up another ATM task but not holding much hope that it will run correctly based on the previous wingmen output files. Looks like the configuration is not correct again.

Had hope since the task mentions new in the name.

T_CDK2_new_2_edit_26_1h1q_T4_2_1-QUICO_TEST_ATM-0-1-RND2833_2

[Errno 2] No such file or directory

openmm.OpenMMException: Illegal value for DeviceIndex: 1

Guess I will be the next guinea pig.
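The cause of the DeviceIndex error is not established in this thread. One plausible reading, assumed here purely for illustration, is that the task asks OpenMM for GPU index 1 on a host where that index does not exist. A minimal Python sketch of a guard against that case follows; the helper name is hypothetical, and only the "DeviceIndex" platform property is an actual OpenMM name.

```python
def cuda_platform_properties(requested_index, visible_gpus):
    """Build the OpenMM platform property dict, rejecting an out-of-range GPU index.

    requested_index: the GPU number handed to the task (assumed to come from BOINC).
    visible_gpus: how many GPUs the process can actually see.
    """
    if visible_gpus < 1:
        raise RuntimeError("no GPU visible to this process")
    if not 0 <= requested_index < visible_gpus:
        raise ValueError(
            f"DeviceIndex {requested_index} is out of range for {visible_gpus} GPU(s)"
        )
    # OpenMM expects platform property values as strings.
    return {"DeviceIndex": str(requested_index)}
```

The resulting dict is the kind of thing passed as platform properties when creating an OpenMM Simulation or Context. A check like this would turn the OpenMM exception into a clearer message, but it would not fix a wrong index being handed to the task in the first place.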
zombie67 [MM]

Joined: 16 Jul 07
Posts: 209
Credit: 5,496,860,456
RAC: 8,582,660
Message 60106 - Posted: 16 Mar 2023, 1:28:51 UTC

Does the ATM app work with RTX 4000 series?
Reno, NV
Team: SETI.USA
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Message 60107 - Posted: 16 Mar 2023, 2:12:42 UTC - in response to Message 60106.  

Does the ATM app work with RTX 4000 series?


Maybe. The Python app does, and the ATM is a similar kind of setup. You’ll have to try it and see.

Not sure how much progress the project has made for Windows though.
KAMasud

Joined: 27 Jul 11
Posts: 138
Credit: 539,953,398
RAC: 0
Message 60108 - Posted: 16 Mar 2023, 8:06:10 UTC - in response to Message 60098.  

I'm running Linux Mint 19 (a bit out of date)
I just retired my last Linux Mint 19 computer yesterday and it had been running ATM, ACEMD & Python WUs on a 2080 Ti (12/7.5) fine. BTW, I tried the LM 21.1 upgrade from LM 20.3 and can't do things like open BOINC folder as admin. I can't see any advantage to 21.1 so I'm going to do a fresh install and revert back to 20.3.

My machine has a gtx-950, so cuda tasks are OK.
Is there a minimum requirement for CUDA and Compute Capability for ATM WUs?
https://www.techpowerup.com/gpu-specs/geforce-gtx-950.c2747 says CUDA 5.2 and https://developer.nvidia.com/cuda-gpus says 5.2.



Glad to know someone else also has the same problem with Mint 21.1. I will shift to some other flavour.
KAMasud

Joined: 27 Jul 11
Posts: 138
Credit: 539,953,398
RAC: 0
Message 60111 - Posted: 18 Mar 2023, 6:30:31 UTC

Got my first ATM Beta. Completed and validated.
Quico
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 28 Feb 23
Posts: 35
Credit: 0
RAC: 0
Message 60120 - Posted: 20 Mar 2023, 14:45:24 UTC - in response to Message 60091.  

My observations show the GPU switching from periods of high utilization (~96-98%) to periods of idle (0%). About every minute or two.

i think the current size of the ATM are pretty good. about 4hrs on a 3080Ti and about 5hrs on a 2080Ti.

I'll second Richard's comment that you should put some effort into checkpointing and fixing the completion reporting (add weights to the job.xml file)


That sounds like how ATM is intended to work for now. The idle GPU periods correspond to writing coordinates.

Happy to know that the size of the jobs is good!
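On the job.xml weights mentioned in the quote: the idea is that each task in the wrapper's job file carries a weight, and overall progress is the weighted share of work finished so far. The Python sketch below shows that arithmetic only. It is not the wrapper's actual code, and the function name and example weights are invented for illustration.

```python
def overall_fraction_done(weights, tasks_completed, current_task_fraction):
    """Weighted progress across a sequence of tasks.

    weights: relative cost of each task, in run order.
    tasks_completed: number of tasks that have fully finished.
    current_task_fraction: progress (0.0 to 1.0) of the task now running.
    """
    total = sum(weights)
    done = sum(weights[:tasks_completed])
    if tasks_completed < len(weights):
        done += weights[tasks_completed] * current_task_fraction
    return done / total
```

For example, with weights of 1, 8 and 1 for a short setup step, the long simulation and a short packaging step, being halfway through the simulation reports 50% overall instead of jumping straight from a low value to 100%.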


Picked up another ATM task but not holding much hope that it will run correctly based on the previous wingmen output files. Looks like the configuration is not correct again.

Had hope since the task mentions new in the name.

T_CDK2_new_2_edit_26_1h1q_T4_2_1-QUICO_TEST_ATM-0-1-RND2833_2

[Errno 2] No such file or directory

openmm.OpenMMException: Illegal value for DeviceIndex: 1

Guess I will be the next guinea pig.


I have seen your errors but I'm not sure why it's happening since I got several jobs running smoothly right now. I'll ask around.

The "new" tag is a legacy leftover on my end, related to receptor naming.
Quico
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 28 Feb 23
Posts: 35
Credit: 0
RAC: 0
Message 60121 - Posted: 20 Mar 2023, 14:46:25 UTC

Another heads-up, it seems that the Windows app will be available soon! That way we'll be able to look into the progress reporting issue.
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 960
Message 60123 - Posted: 20 Mar 2023, 19:54:13 UTC - in response to Message 60121.  

...it seems that the Windows app will be available soon!

that's good news - I'm looking forward to receiving ATM tasks :-)
zombie67 [MM]

Joined: 16 Jul 07
Posts: 209
Credit: 5,496,860,456
RAC: 8,582,660
Message 60126 - Posted: 22 Mar 2023, 6:52:36 UTC
Last modified: 22 Mar 2023, 6:53:46 UTC

I see that there is a windows app for ATM. But I have never received an app on any of my win machines, even with an updater. And yes, I have all the right project preferences set (everything checked). So, has anyone received an ATM task on a windows machine?
Quico
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 28 Feb 23
Posts: 35
Credit: 0
RAC: 0
Message 60128 - Posted: 22 Mar 2023, 11:15:48 UTC - in response to Message 60126.  

I see that there is a windows app for ATM. But I have never received an app on any of my win machines, even with an updater. And yes, I have all the right project preferences set (everything checked). So, has anyone received an ATM task on a windows machine?


As far as I know, we are doing the final tests.
I'll let you know once it's fully ready and I have the green light to send jobs through there.

©2025 Universitat Pompeu Fabra