Message boards :
News :
What is happening and what will happen at GPUGRID, update for 2021
**Joined:** 13 Dec 17 · **Posts:** 1419 · **Credit:** 9,119,446,190 · **RAC:** 891

Great news! Now to just have some tasks ready to send.
**Joined:** 23 Dec 18 · **Posts:** 12 · **Credit:** 50,868,500 · **RAC:** 0

Good news! :-)
**Joined:** 18 Jun 12 · **Posts:** 2 · **Credit:** 100,396,087 · **RAC:** 0

Got 1 WU. BOINC shows a runtime of 6 days on my 2080 Ti with a 60% power target. Hopefully that's not the real time. Edit: 1% after 10 minutes, so probably something like 16 hours.
**Joined:** 13 Dec 17 · **Posts:** 1419 · **Credit:** 9,119,446,190 · **RAC:** 891

Picked up a task apiece on two hosts. Some 800 tasks in progress now. Hope this means the project is getting back to releasing steady work.
**Joined:** 21 Feb 20 · **Posts:** 1116 · **Credit:** 40,839,470,595 · **RAC:** 6,423

It seems that even though the cuda1121 app is available and works fine on Ampere cards, there's nothing preventing the cuda101 app from being sent to an Ampere host. These tasks will always fail. Example: https://gpugrid.net/result.php?resultid=32643471

The project-side scheduler needs to be adjusted so that the cuda101 app is not sent to Ampere hosts. This can be achieved by checking the compute capability reported by the host: in addition to the CUDA version checks, the cuda101 app should be limited to hosts with compute capability less than 8.0, and hosts with 8.0 or greater should only get the cuda1121 app. Or simply remove the cuda101 app and require all users to update their video drivers so they can use the 1121 app.
**Joined:** 13 Dec 17 · **Posts:** 1419 · **Credit:** 9,119,446,190 · **RAC:** 891

This is correct. I had the exact same failure with the CUDA101 app sent to my Ampere RTX 3080: it was unable to compile the CUDA kernel because it was expecting a different architecture. https://www.gpugrid.net/result.php?resultid=32642922
**Joined:** 9 Dec 08 · **Posts:** 1006 · **Credit:** 5,068,599 · **RAC:** 0

Technical question: does anybody know whether the CC is available in the scheduler?
**Joined:** 11 Jul 09 · **Posts:** 1639 · **Credit:** 10,159,968,649 · **RAC:** 428

> Technical question: does anybody know whether the CC is available in the scheduler?

I'm sure it is. I'll start digging out some references, if you want. Sorry, you caught me in the middle of a driver update. Try https://boinc.berkeley.edu/trac/wiki/AppPlanSpec#NVIDIAGPUapps:

`<min_nvidia_compcap>MMmm</min_nvidia_compcap>`
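For illustration, a hypothetical plan_class_spec.xml entry built from the fields on that page could cap the cuda101 plan class below Ampere. The field names are taken from the AppPlanSpec documentation; the plan-class name and all numeric thresholds here are assumptions, not GPUGRID's actual configuration:

```xml
<plan_class>
    <!-- Hypothetical entry: send cuda101 work only to pre-Ampere cards.
         Compute capability is encoded as MMmm (1.0 -> 100, 8.0 -> 800);
         the CUDA version encoding follows the 2.3 -> 2030 convention. -->
    <name>cuda101</name>
    <gpu_type>nvidia</gpu_type>
    <cuda/>
    <min_cuda_version>10010</min_cuda_version>
    <min_nvidia_compcap>300</min_nvidia_compcap>
    <max_nvidia_compcap>799</max_nvidia_compcap>
</plan_class>
```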
**Joined:** 21 Feb 20 · **Posts:** 1116 · **Credit:** 40,839,470,595 · **RAC:** 6,423

To expand on what Richard wrote, I'm sure it's available. Einstein@Home uses this metric in its scheduler to gatekeep some of its apps to certain generations of Nvidia GPUs, so it's definitely information that's provided from the host to the project via BOINC.
**Joined:** 11 Jul 09 · **Posts:** 1639 · **Credit:** 10,159,968,649 · **RAC:** 428

> So it's definitely information that's provided from the host to the project via BOINC.

From the most recent sched_request file sent from this computer to your server:

```xml
<coproc_cuda>
    ...
    <major>7</major>
    <minor>5</minor>
```
**Joined:** 9 Dec 08 · **Posts:** 1006 · **Credit:** 5,068,599 · **RAC:** 0

Uhm... yes, but I was wondering how to retrieve it in the C++ code.
Send message Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428 Level ![]() Scientific publications ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]()
|
Uhm... yes but I was wondering how to retrieve it in the C++ code. Same principle. Start with Specifying plan classes in C++, third example. ...
if (!strcmp(plan_class, "cuda23")) {
if (!cuda_check(c, hu,
100, // minimum compute capability (1.0)
200, // max compute capability (2.0)
2030, // min CUDA version (2.3)
19500, // min display driver version (195.00)
384*MEGA, // min video RAM
1., // # of GPUs used (may be fractional, or an integer > 1)
.01, // fraction of FLOPS done by the CPU
.21 // estimated GPU efficiency (actual/peak FLOPS)
)) {
return false;
}
} |
**ServicEnginIC** · **Joined:** 24 Sep 10 · **Posts:** 592 · **Credit:** 11,972,186,510 · **RAC:** 1,447

> Uhm... yes, but I was wondering how to retrieve it in the C++ code.

I guess there might be an easier workaround, with no need to touch the current code. It would consist of splitting the ACEMD3 app on the Project Preferences page into ACEMD3 (cuda 101) and ACEMD3 (cuda 1121). This way, Ampere users would be able to untick the ACEMD3 (cuda 101) app, manually preventing themselves from receiving tasks that are sure to fail.
**Joined:** 13 Dec 17 · **Posts:** 1419 · **Credit:** 9,119,446,190 · **RAC:** 891

That won't work for multi-GPU users who have both Turing and Ampere cards installed in one host. I have a 2080 and a 3080 together in a host.
**ServicEnginIC** · **Joined:** 24 Sep 10 · **Posts:** 592 · **Credit:** 11,972,186,510 · **RAC:** 1,447

> Won't work for multi-GPU users who have both Turing and Ampere cards installed in one host.

It should work as a manual selection in Project Preferences for receiving ACEMD3 (cuda 1121) tasks only. Your RTX 3080 (Ampere, device 0) can't process ACEMD3 (cuda 101), as seen in your failed task e1s627_I757-ADRIA_AdB_KIXCMYB_HIP-1-2-RND0972_0, but it can process ACEMD3 (cuda 1121), as seen in your succeeded task e1s385_I477-ADRIA_AdB_KIXCMYB_HIP-0-2-RND6281_2. And your RTX 2080 (Turing, device 1) on the same host can also process ACEMD3 (cuda 1121) tasks, as seen in your succeeded task e1s667_I831-ADRIA_AdB_KIXCMYB_HIP-1-2-RND8282_1.

Therefore, restricting preferences in a particular venue for your host #462662 to receiving only ACEMD3 (cuda 1121) tasks would work for both cards. The exception is the general limitation of the ACEMD3 app on all kinds of mixed multi-GPU systems when tasks restart on a different device. It was described by Toni in his Message #52865, dated Oct 17 2019, in the paragraph "Can I use it on multi-GPU systems?"
**Joined:** 11 Jul 09 · **Posts:** 1639 · **Credit:** 10,159,968,649 · **RAC:** 428

There are two different aspects to this debate:

1) What will a project server send to a mixed-GPU client?
2) Which card will the client choose to allocate a task to?

The project server will allocate work solely on the basis of Keith's 3080. BOINC has been developed, effectively, to hide his 2080 from the server.

Keith has some degree of control over the behaviour of his client. He can exclude certain applications from particular cards (using cc_config.xml), but he can't exclude particular versions of the same application - the control structure is too coarse. He can also control certain behaviours of applications at the plan_class level (using app_config.xml), but that control structure is too fine - it doesn't contain any device-level controls.

Other projects have been able to develop general-purpose GPU applications which are at least compatible with mixed-device hosts - tasks assigned to the 'wrong' or 'changed' device at least run, even if efficiency is downgraded. If this project is unable to follow that design criterion (and I don't know why it is unable at this moment), then I think the only available solution at this time is to divide the versions into separate applications - analogous to the old short/long tasks - so that the limited range of available client options can be leveraged.
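For reference, the device-level exclusion mentioned above uses the `exclude_gpu` element of cc_config.xml. The element itself is standard BOINC client configuration; the device number and app name below are illustrative, and, as the post notes, it can only target a whole application, not one plan-class version of it:

```xml
<cc_config>
  <options>
    <!-- Illustrative: keep the acemd3 app off GPU device 1 for this
         project. There is no per-plan-class (cuda101 vs cuda1121)
         granularity at this level. -->
    <exclude_gpu>
      <url>https://www.gpugrid.net/</url>
      <device_num>1</device_num>
      <app>acemd3</app>
    </exclude_gpu>
  </options>
</cc_config>
```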
**Joined:** 21 Feb 20 · **Posts:** 1116 · **Credit:** 40,839,470,595 · **RAC:** 6,423

I haven't seen enough compelling evidence to justify keeping the cuda101 app. The cuda1121 app works on all hosts and is basically the same speed. Removing the cuda101 app would solve all problems.
**Joined:** 13 Dec 17 · **Posts:** 1419 · **Credit:** 9,119,446,190 · **RAC:** 891

I agree. Simply remove the CUDA101 app and restrict sending tasks to any host that hasn't updated its drivers to the CUDA 11.2 level.
**Joined:** 11 Jul 09 · **Posts:** 1639 · **Credit:** 10,159,968,649 · **RAC:** 428

What's with the new ACEMD beta version 9.17, introduced today? What are we testing? I got a couple of really short tasks on Linux host 508381. That risks really messing up DCF.
**Joined:** 22 Aug 19 · **Posts:** 7 · **Credit:** 168,393,363 · **RAC:** 0

I received 15 of the test WUs. No problems on Ampere; all crunched without issue. Just want more :)
©2025 Universitat Pompeu Fabra