PYSCFbeta: Quantum chemistry calculations on GPU
| Author | Message |
|---|---|
|
Joined: 12 Jul 17 Posts: 404 Credit: 17,408,899,587 RAC: 0
|
Here is a tale of 2 computers, one that was getting units, and the other was not. At first I did. But including <fraction_done_exact/> seems to heal that fairly quickly.
<app>
  <name>PYSCFbeta</name>
  <!-- Quantum chemistry calculations on GPU -->
  <plan_class>cuda1121</plan_class>
  <gpu_versions>
    <cpu_usage>1</cpu_usage>
    <gpu_usage>1</gpu_usage>
  </gpu_versions>
  <fraction_done_exact/>
</app> |
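For anyone who wants the whole file rather than the fragment: an `<app>` block like this one normally sits inside an `app_config.xml` file in the project's folder, wrapped in a root `<app_config>` element. Below is a minimal Python sketch that prints such a file. The helper name `build_app_config` is made up for illustration. The `<plan_class>` line from the post is left out on the assumption that it belongs under `<app_version>` rather than `<app>`, which is worth checking against the BOINC client configuration documentation.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, cpu_usage: float, gpu_usage: float) -> str:
    """Return the text of an app_config.xml holding a single <app> entry.

    Hypothetical helper: the element names mirror the fragment in the post,
    and the <app_config> root is assumed from the BOINC client documentation.
    """
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name

    # CPU and GPU share reserved per running GPU task.
    gpu_versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)

    # Empty flag element: treat the app's reported fraction done as exact.
    ET.SubElement(app, "fraction_done_exact")

    if hasattr(ET, "indent"):  # pretty-printing is available on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


xml_text = build_app_config("PYSCFbeta", 1, 1)
print(xml_text)
```

After saving the output as `app_config.xml`, the client has to re-read its config files (or be restarted) before the change takes effect.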
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
When you send out WUs with 0.991C + 1NV BOINC does not assign a CPU core to that task. You should designate them 1C. the errors have nothing to do with the CPU resource allocation setting. they all errored because they ran on GPUs that are too old; the app needs cards with a compute capability (CC) of at least 6.0 (Pascal and up). if someone is running the CPU full out at 100% and not leaving spare CPU cycles available (as they should), the worst that happens is that the GPU task might run a little more slowly. but it won't fail. I believe that the issue of "0.991" CPUs or whatever is a byproduct of the BOINC serverside software. from what I've read elsewhere, this value is not intentionally set by the researchers, it is automatically selected by the BOINC server somewhere along the way, and the researchers here have previously commented that they are not aware of any way to override this serverside. so competent users can just override it themselves if they prefer. setting your CPU use in BOINC to something like 99 or 98% has the same effect overall though.
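The CC 6.0 cutoff is easy to check by eye, but comparing the version strings directly goes wrong once two-digit major versions show up ("10.0" sorts before "6.0" as text). Here is a small sketch that compares them as number pairs instead. The function names are illustrative, the 6.0 minimum is the one stated in this post, and the sample cards are just well-known examples from three generations.

```python
from typing import Tuple

MIN_COMPUTE_CAPABILITY = (6, 0)  # Pascal and up, as stated in the post


def parse_compute_cap(text: str) -> Tuple[int, int]:
    """Turn a string such as '6.1' into the pair (6, 1)."""
    major, _, minor = text.strip().partition(".")
    return int(major), int(minor or 0)


def meets_minimum(cap_text: str,
                  minimum: Tuple[int, int] = MIN_COMPUTE_CAPABILITY) -> bool:
    """Compare as (major, minor) tuples so '10.0' is not treated as older than '6.0'."""
    return parse_compute_cap(cap_text) >= minimum


# Example cards and their compute capabilities.
for name, cap in [("GTX 980", "5.2"), ("GTX 1080", "6.1"), ("RTX 3080", "8.6")]:
    status = "ok" if meets_minimum(cap) else "too old"
    print(f"{name} (CC {cap}): {status}")
```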
|
|
Joined: 28 Mar 09 Posts: 490 Credit: 11,731,645,728 RAC: 69
|
Here is a tale of 2 computers, one that was getting units, and the other was not. Thanks for this information. I updated my computers. Now, I remember this <fraction_done_exact/> from a post several years ago. I can't remember the thread. In the past I didn't need this, because the tasks would correct themselves eventually, even the ATMbetas. The Quantum Chemistry on GPU does the complete opposite. I wonder if this is connected to the observation of "upwards of 30 threads utilized per task" as posted by Ian&Steve C.? |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
nah the multi thread issue has already been fixed. the app only uses a single thread now.
|
|
Joined: 16 Jul 07 Posts: 209 Credit: 5,496,860,456 RAC: 12,111
|
The work-units require a lot of GPU memory. How much is "a lot" exactly? I have a Pascal card, so it meets the compute capability requirement. But it has only 2 GB of VRAM. Without knowing the amount of VRAM required, I am not sure if it will work. Reno, NV Team: SETI.USA |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
The work-units require a lot of GPU memory. It requires more than 2 GB.
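A rough way to rule a card out: read the total memory with `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits` (one number in MiB per GPU) and compare it with the only bound given here, "more than 2GB". The sketch below parses output of that shape. The function names and the sample output are made up. Clearing the 2 GB floor does not guarantee that a task will fit, since the exact requirement is not stated.

```python
from typing import List

# "More than 2GB" is the only figure given; the real requirement is not stated.
STATED_FLOOR_MIB = 2048


def parse_memory_total(smi_output: str) -> List[int]:
    """Parse one MiB value per line, as printed with csv,noheader,nounits."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def clears_stated_floor(total_mib: int, floor_mib: int = STATED_FLOOR_MIB) -> bool:
    """True only if the card has strictly more memory than the stated floor."""
    return total_mib > floor_mib


# Hypothetical sample output: a 2 GB card followed by an 8 GB card.
sample_output = "2048\n8192\n"
for index, total in enumerate(parse_memory_total(sample_output)):
    verdict = "may be enough" if clears_stated_floor(total) else "too small"
    print(f"GPU {index}: {total} MiB -> {verdict}")
```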
|
|
Joined: 16 Jul 07 Posts: 209 Credit: 5,496,860,456 RAC: 12,111
|
|
|
Joined: 21 Dec 23 Posts: 51 Credit: 0 RAC: 0 |
This is all correct I believe. It seems that the jobs have enough retry attempts that all work units eventually succeed. The scheduler has an inbuilt mechanism to classify hosts as "reliable", and it also has a mechanism to send work units that have failed a few times only to hosts that are "reliable". This is not ideal of course. We will try to get the CC requirements honoured, but these are project-wide scheduler settings which are rather complex to fix without breaking everything else that is currently working. The download limitation is something I will not be able to change easily. A potential reason I can guess for the current settings is to stop a failing host acting as a black hole of failed jobs or something similar. The large file download should happen just once. The app is deployed in the same way as the ATM app. It is a 2GB zip file that contains a Python environment and some CUDA libraries. Each work unit only requires downloading a small file (<1MB I think). This last large-scale run has been rather impressive. The throughput was very high! Especially considering that it is only on Linux hosts and not Windows. We will be sending some similar batches over the next few weeks. |
|
Joined: 30 May 14 Posts: 9 Credit: 4,248,480,843 RAC: 835
|
Hello Steeve. I would say: this is certainly for that! :D PC are like air conditioning, they become useless when you open Windows (L.T) In a world without walls and fences, who needs windows and gates? |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
... Especially considering that it is only on Linux hosts and not Windows. We will be sending some similar batches over the next few weeks. Is there a plan to come up with a Windows version too? |
|
Joined: 23 Feb 22 Posts: 1 Credit: 657,194,329 RAC: 0
|
Still no work for Windows 11 operating systems? I see the occasional task that failed but nothing processed. It worked well for months and then just stopped before Xmas. All my software is up to date. I have a dedicated GPU for this project. Where is the best place to find an update on GPUGRID's software migration?
Tasks completed: 134
Tasks failed: 55
Credit
User: 491,814,968 total, 13,657.85 average
Host: 150,562,500 total, 13,650.92 average
Scheduling
Scheduling priority: -0.93
Don't request tasks for CPU: Project has no apps for CPU
NVIDIA GPU task request deferred for: 00:03:35
NVIDIA GPU task request deferral interval: 00:10:00
Last scheduler reply: 2024-01-21 1:55:15 PM |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
Most of the work released lately has been the Quantum Chemistry tasks. The researcher said that since most educational and research labs run Linux OSes, Windows applications are a second thought. The only tasks with a Windows app that have appeared somewhat regularly are the acemd tasks. You will have to try and snag one of those when they show up. |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
The researcher said that since most educational and research labs run Linux OSes, Windows applications are a second thought. It's really too bad that GPUGRID obviously tends more and more to exclude Windows crunchers :-( When I joined this project 8 years ago, and for many years thereafter, there was no lack of Windows tasks. On the other hand: with so few tasks available since last year, it might be the case that the number of Linux crunchers is sufficient for processing them, and the Windows crunchers from before are not needed any longer :-( At least, this is the impression one is bound to get. |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
The lack of current Windows applications has more to do with the type of applications and APIs being used currently. The latest and current sub-projects are all Python based. Python runs much better on Linux compared to Windows since most development is done in Linux to begin with. Even Microsoft advises that Python application development should be done in Linux rather than Windows. |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
So - in short - bad times for Windows crunchers. Now and in the future :-( |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
So - in short - bad times for Windows crunchers. Now and in the future :-( Pretty much so. Windows had it best back with the original release of the acemd app. Remember it was a simple, single executable file of modest size, derived from source code that could be compiled for Windows or for Linux. But, if you were paying attention lately, the recent acemd tasks no longer use an executable. They are using Python. The Python-based tasks are NOT a single executable; they consist of a complete packaged Python environment of many gigabytes. The nature of the tasks has changed for the project to using complex, state-of-the-art discovery calculations using cutting-edge technology. The QChem tasks are even using the Tensor cores of our Nvidia cards now. This is something we asked about several years ago in the forum and were told, maybe, in the future. The future has come and our desires have been answered. But the hardware and software of our hosts now have to rise to meet those challenges. Sadly, the Windows environment is still waiting in the wings. |
ServicEnginIC Joined: 24 Sep 10 Posts: 592 Credit: 11,972,186,510 RAC: 1,447
|
...But including <fraction_done_exact/> seems to heal that fairly quickly. Nice advice, thank you! It resulted quickly in an accurate remaining time estimation, so I applied it to ATMbeta tasks also. |
|
Joined: 27 May 21 Posts: 54 Credit: 1,004,151,720 RAC: 0
|
Choosing not to release Windows apps is a choice they can make, obviously. And maybe their use cases warrant taking the tradeoff inherent in that. If there are often large volumes of work to process in a short time (i.e. you'd ideally need something like a supercomputer if it didn't cost that much), then you'd want to design your apps for what BOINC was intended to be all along. Meaning you try to get them ported to as many platforms as you possibly can in order to reach maximum compute power. Or you leverage the power of VBox for non-native platforms. If however the volumes are never going to be that large, where basically any single platform user group can easily provide the necessary compute power, then indeed why bother. Although it would be nice of them to make that choice public and explicit so all non-Linux users can gracefully detach instead of posting frustrated "why no work" messages across the forums. Or indeed spend hours trying to help fix Windows apps ;-) |
|
Joined: 12 Jul 17 Posts: 404 Credit: 17,408,899,587 RAC: 0
|
I believe that the issue of "0.991" CPUs or whatever is a byproduct of the BOINC serverside software. from what I've read elsewhere, this value is not intentionally set by the researchers, it is automatically selected by the BOINC server somewhere along the way, and the researchers here have previously commented that they are not aware of any way to override this serverside. I didn't know that. It's probably a sloppy BOINC design like using percentage to determine the number of CPU threads to use instead of integers. |
|
Joined: 12 Jul 17 Posts: 404 Credit: 17,408,899,587 RAC: 0
|
The work-units require a lot of GPU memory. The highest being used today on my Pascal cards is 795 MB. |
©2025 Universitat Pompeu Fabra