Large scale experiment: MDAD

Trotador
Message 53496 - Posted: 26 Jan 2020, 8:30:54 UTC

An important issue I've noticed after crunching these GPUGrid units on my Ubuntu 16.04 hosts (not the 18.04 ones) is that all the other BOINC GPU projects (and Folding@home) fail with errors when they try to crunch. I tested with Amicable, Einstein and FAH.

I've had to reinstall the NVIDIA drivers and restart to get things working again. A matter of libraries and links, I guess.
adrianxw
Message 53498 - Posted: 26 Jan 2020, 9:53:56 UTC
Last modified: 26 Jan 2020, 10:20:32 UTC

I added GPUGrid to the projects list on one of my machines two years ago, and I've never received a single work unit. Other GPU projects are having no trouble. Removed now.
biodoc
Message 53499 - Posted: 26 Jan 2020, 10:26:48 UTC - in response to Message 53498.  

> I added GPUGrid to the projects list on one of my machines two years ago, and I've never received a single work unit. Other GPU projects are having no trouble. Removed now.

I checked your computer, and it appears it has an AMD GPU, which is not supported; only Nvidia cards are. Here's the FAQ for the new app:

http://www.gpugrid.net/forum_thread.php?id=5002#52865
Werkstatt
Message 53500 - Posted: 26 Jan 2020, 11:02:10 UTC

Some years ago there was an AMD application, and it is still possible to check the box for AMD WUs in the GPUGRID preferences.
Maybe there would be less confusion if this check-box were removed.
biodoc
Message 53501 - Posted: 26 Jan 2020, 11:28:43 UTC - in response to Message 53492.  

> I believe the limit is 16 per host. That is what I got on my 3 hosts. After that I received the "you have reached the limit of tasks in progress" message.

The limit is 2 per GPU. I see your computers are set up to run Seti, where it is common to "spoof" the server into "thinking" you have 32 coprocessors/GPUs per rig.
adrianxw
Message 53502 - Posted: 26 Jan 2020, 13:18:11 UTC - in response to Message 53499.  
Last modified: 26 Jan 2020, 13:20:36 UTC

It is a little ironic that a project specifically for GPUs supports fewer GPUs than other projects. Einstein, Milky Way, Seti, etc. have no problem.
BladeD
Message 53503 - Posted: 26 Jan 2020, 15:16:00 UTC

Any idea when new workunits will be released?
Werkstatt
Message 53504 - Posted: 26 Jan 2020, 15:50:44 UTC

> I see your computers are set up to run Seti, where it is common to "spoof" the server into "thinking" you have 32 coprocessors/GPUs per rig.

Tell me more!
Seti has the problem of not always being available and not always having WUs on hand, but the allowed runtime is quite long. So it makes sense to have a larger buffer, though this should only affect the Seti WUs.
Keith Myers
Message 53505 - Posted: 26 Jan 2020, 17:07:46 UTC - in response to Message 53501.  
Last modified: 26 Jan 2020, 17:12:20 UTC

> > I believe the limit is 16 per host. That is what I got on my 3 hosts. After that I received the "you have reached the limit of tasks in progress" message.
>
> The limit is 2 per GPU. I see your computers are set up to run Seti, where it is common to "spoof" the server into "thinking" you have 32 coprocessors/GPUs per rig.

I didn't think that was the issue. I never received more than two tasks per gpu on the previous run of work units.

It depends on the project whether they recognize the spoofed gpus. Seti does, which is why I use it to keep the gpus fed during the ever longer Seti outages.

It may be that this run of work did recognize the spoofed gpus, but the math doesn't add up for the 4 hosts. Each host got 16 WUs. I have three 3-card hosts and one 4-card host. One 3-card host got nothing because it is primarily an Einstein machine, and all I got was "gpu cache is full" for each GPUGrid request.

Except for the Einstein host, all the hosts are spoofed to either 21 or 32 gpus. By your math I should have received either only 8 tasks on the 4-card host or 64 tasks; I got neither. The limit appears to have been fixed at 16 for each host. As I returned work, my cache kept being refilled to a 16 count on each host. I figure that was more likely from my global cache setting.
Keith Myers
Message 53506 - Posted: 26 Jan 2020, 17:10:20 UTC - in response to Message 53504.  

> > I see your computers are set up to run Seti, where it is common to "spoof" the server into "thinking" you have 32 coprocessors/GPUs per rig.
>
> Tell me more!
> Seti has the problem of not always being available and not always having WUs on hand, but the allowed runtime is quite long. So it makes sense to have a larger buffer, though this should only affect the Seti WUs.

The coproc_info.xml file that is created by the client controls the number of gpus detected.

Manipulate that file and you can tell BOINC that you have as many as 64 gpus. But you can't exceed 64, as that is a hard limit in the server-side code.
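
As a minimal sketch of what that detection file drives, the snippet below counts the GPUs the client recorded. It assumes coproc_info.xml holds one <coproc_cuda> element per detected NVIDIA device and sits in the default Ubuntu data directory; both the schema and the path vary by client version and platform, so treat it as illustrative only.

    # Minimal sketch: count how many NVIDIA GPUs the BOINC client recorded.
    # Assumptions: one <coproc_cuda> element per detected device, and the
    # Ubuntu/Debian default data directory. The exact schema varies by
    # client version, so this is illustrative only.
    import xml.etree.ElementTree as ET

    COPROC_FILE = "/var/lib/boinc-client/coproc_info.xml"

    def detected_nvidia_gpus(path: str = COPROC_FILE) -> int:
        root = ET.parse(path).getroot()
        return len(root.findall(".//coproc_cuda"))

    print("Client sees", detected_nvidia_gpus(), "NVIDIA GPU(s)")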
Werkstatt
Message 53509 - Posted: 26 Jan 2020, 20:52:12 UTC

> The coproc_info.xml file that is created by the client controls the number of gpus detected.

Got it, thanks!
Toni (Project administrator)
Message 53513 - Posted: 27 Jan 2020, 8:37:25 UTC - in response to Message 53509.  

This was the first piece of a larger batch of 14k WUs. It's (amazingly!) already complete. I'll need to process it to create new WUs. The purpose of the work is, broadly speaking, methods development, i.e. building a dataset to improve the foundation of future MD-based research (not just GPUGRID). More details may come if it works ;)

Thanks to everybody for contributing. Special thanks also to those taking care of answering the BOINC questions.


Aurum
Message 53515 - Posted: 27 Jan 2020, 14:17:28 UTC

For a serial process like this, the optimum would be to send only one WU per GPU.
Erich56
Message 53516 - Posted: 27 Jan 2020, 15:29:02 UTC - in response to Message 53515.  

> For a serial process like this, the optimum would be to send only one WU per GPU.

Not really, because what would happen then is that there would always be some idle time between uploading/reporting the result of one task and downloading the next.
That means the GPU cools off for a (short) while and heats up again once the new task starts being crunched.
If this happens several times per day over a lengthy period, this so-called "thermal cycling" definitely shortens the lifetime of the GPU.
Hence it's definitely better to have another task already waiting to start immediately after the previous one finishes.
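
For what it's worth, the usual empirical model for thermal-cycling fatigue is the Coffin-Manson relation; the constants below are generic material parameters, not GPU-specific measurements:

    % Coffin-Manson relation for thermal-cycling fatigue (empirical):
    %   N_f       cycles to failure
    %   \Delta T  temperature swing per cycle
    %   C, q      material constants (q is often around 2-3 for solder joints)
    N_f = C \, (\Delta T)^{-q}

With q near 2, halving the temperature swing per cycle roughly quadruples the expected number of cycles to failure.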

klepel
Message 53517 - Posted: 27 Jan 2020, 18:13:05 UTC - in response to Message 53516.  

> Not really, because what would happen then is that there would always be some idle time between uploading/reporting the result of one task and downloading the next.
> That means the GPU cools off for a (short) while and heats up again once the new task starts being crunched.
> If this happens several times per day over a lengthy period, this so-called "thermal cycling" definitely shortens the lifetime of the GPU.
> Hence it's definitely better to have another task already waiting to start immediately after the previous one finishes.

+1
Aurum
Message 53519 - Posted: 28 Jan 2020, 13:52:38 UTC - in response to Message 53516.  

> ...the GPU cools off for a (short) while and heats up again once the new task starts being crunched.
> If this happens several times per day over a lengthy period, this so-called "thermal cycling" definitely shortens the lifetime of the GPU.

The degradation process for electronics is called electromigration: flowing current while hot actually moves atoms. Where a conductor necks down, e.g. turning a sharp corner or passing over bumps, the current density increases and hence so does the electromigration. This is an irreversible process that accelerates as the conductor chokes down and ultimately results in a broken line and failure.
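
The standard empirical model here is Black's equation; its strong dependence on current density J is why the necked-down spots fail first (the constants below are generic, not GPU-specific):

    % Black's equation for median time-to-failure under electromigration:
    %   A    process-dependent constant
    %   J    current density, with exponent n typically 1-2
    %   E_a  activation energy, k Boltzmann's constant, T absolute temperature
    \mathrm{MTTF} = A \, J^{-n} \exp\!\left(\frac{E_a}{k T}\right)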

Since GPUGrid is supply-limited, one WU per GPU would ensure that more hosts get a WU before any host starts getting additional ones. Now that the WUs run in less than half the time, two per GPU works well, but folks still get left out.

The GPUGrid server is notoriously slow. If it were fast and more than 10,000 WUs were continuously available, then one per GPU would be optimum.
Toni (Project administrator)
Message 53520 - Posted: 28 Jan 2020, 14:53:20 UTC - in response to Message 53506.  

> Manipulate that file and you can tell BOINC that you have as many as 64 gpus. But you can't exceed 64, as that is a hard limit in the server-side code.

Please don't "fake" gpus, as it will create WU "hoarding": it deprives other users of work and slows down our analysis (we sometimes have to wait for batches to complete).
Retvari Zoltan
Message 53521 - Posted: 28 Jan 2020, 15:51:21 UTC - in response to Message 53520.  

> > Manipulate that file and you can tell BOINC that you have as many as 64 gpus. But you can't exceed 64, as that is a hard limit in the server-side code.
>
> Please don't "fake" gpus, as it will create WU "hoarding": it deprives other users of work and slows down our analysis (we sometimes have to wait for batches to complete).

Fortunately, simple manipulation doesn't work, as this file is overwritten by the BOINC manager at startup.
pututu
Message 53522 - Posted: 28 Jan 2020, 16:00:12 UTC - in response to Message 53521.  

> > > Manipulate that file and you can tell BOINC that you have as many as 64 gpus. But you can't exceed 64, as that is a hard limit in the server-side code.
> >
> > Please don't "fake" gpus, as it will create WU "hoarding": it deprives other users of work and slows down our analysis (we sometimes have to wait for batches to complete).
>
> Fortunately, simple manipulation doesn't work, as this file is overwritten by the BOINC manager at startup.

You can prevent the coproc file from being overwritten by BOINC.
Toni (Project administrator)
Message 53525 - Posted: 28 Jan 2020, 16:46:41 UTC - in response to Message 53522.  
Last modified: 28 Jan 2020, 16:47:17 UTC

> > > > Manipulate that file and you can tell BOINC that you have as many as 64 gpus. But you can't exceed 64, as that is a hard limit in the server-side code.
> > >
> > > Please don't "fake" gpus, as it will create WU "hoarding": it deprives other users of work and slows down our analysis (we sometimes have to wait for batches to complete).
> >
> > Fortunately, simple manipulation doesn't work, as this file is overwritten by the BOINC manager at startup.
>
> You can prevent the coproc file from being overwritten by BOINC.


Which may explain tasks failing with

# Engine failed: Illegal value for DeviceIndex: 2

i.e. they attempt to run on non-existent gpus.
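
As a toy illustration of that failure mode (the names and the round-robin assignment below are hypothetical, not BOINC's actual scheduler code): the client hands out device indices based on the GPU count it believes in, but the science app can only open devices that physically exist.

    # Toy illustration only; hypothetical names, not BOINC's real scheduler.
    REAL_GPUS = 2      # devices the driver can actually open
    SPOOFED_GPUS = 32  # devices a manipulated coproc_info.xml reports

    def assign_device(task_slot: int) -> int:
        """Hand out a device index from the (spoofed) pool."""
        device = task_slot % SPOOFED_GPUS
        if device >= REAL_GPUS:
            # The app rejects an index with no physical device behind it,
            # analogous to "Engine failed: Illegal value for DeviceIndex: 2".
            raise RuntimeError(f"Illegal value for DeviceIndex: {device}")
        return device

    print(assign_device(0), assign_device(1))  # 0 1: these would run fine
    assign_device(2)  # raises: a third concurrent task targets a missing GPU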