PYSCFbeta: Quantum chemistry calculations on GPU

Aurum
Message 61028 - Posted: 18 Jan 2024, 15:51:15 UTC - in response to Message 61019.  

Here is a tale of 2 computers, one that was getting units, and the other was not.

https://www.gpugrid.net/hosts_user.php?userid=19626

They both have the same GPUGRID preferences.



I am getting tasks on both computers, now. So far, all tasks are completing successfully.



After running these tasks successfully for almost a day on both of my computers, the Remaining (estimated) time in the BOINC Manager's Tasks tab now shows approximately 24 days to complete on one computer and 62 days on the other at the start of each task, and counts down incrementally from there. The task actually completes successfully in a little over an hour. A few hours ago, both were showing the correct times to complete.

Everything else is working fine, but this is definitely unusual. Did anyone else observe this?

At first I did. But including <fraction_done_exact/> seems to heal that fairly quickly.
    <app>
        <name>PYSCFbeta</name>
        <!-- Quantum chemistry calculations on GPU -->
        <plan_class>cuda1121</plan_class>
        <gpu_versions>
            <cpu_usage>1</cpu_usage>
            <gpu_usage>1</gpu_usage>
        </gpu_versions>
        <fraction_done_exact/>
    </app>
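
For anyone copying the snippet above: as far as I understand the BOINC client configuration documentation, the `<app>` block has to sit inside an `<app_config>` root element, in a file named app_config.xml in the GPUGRID project folder, and `<plan_class>` is documented as a child of `<app_version>` rather than `<app>`. A minimal sketch of a complete file under those assumptions (the values are the ones posted above, not project recommendations):

```xml
<!-- app_config.xml: goes in the GPUGRID project directory (sketch only) -->
<app_config>
    <app>
        <name>PYSCFbeta</name>
        <!-- Quantum chemistry calculations on GPU -->
        <gpu_versions>
            <!-- budget one full CPU thread and one full GPU per task -->
            <cpu_usage>1</cpu_usage>
            <gpu_usage>1</gpu_usage>
        </gpu_versions>
        <!-- base the remaining-time estimate on the fraction done reported by the app -->
        <fraction_done_exact/>
    </app>
</app_config>
```

The client should pick the file up after Options → Read config files in the BOINC Manager, or after a client restart.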

Ian&Steve C.

Message 61029 - Posted: 18 Jan 2024, 16:11:41 UTC - in response to Message 61026.  

When you send out WUs with 0.991C + 1NV BOINC does not assign a CPU core to that task. You should designate them 1C.
I've been changing my Use At Most N-2 CPUs to accommodate these tasks. If not they slow down significantly.

That GTX 960 I pointed out also has BOINC 7.7 installed and may be a Science United member so many failures can be expected. But with 7 errors allowed they'll probably find a qualified cruncher before they die.

You can always override that with an app_config.xml file in the project folder and assign 1.0 cpu threads to the task.

I know I can. What about the many people that leave BOINC on autopilot?
I've seen multiple instances of 5 errors before a WU got to me. It's in Steve's best interest.


the errors have nothing to do with the CPU resource allocation setting. they all errored because they ran on GPUs that are too old; the app needs cards with a compute capability (CC) of at least 6.0 (Pascal and up).

at worst, if someone is running the CPU flat out at 100% and not leaving spare CPU cycles available (as they should), the GPU task might run a little more slowly. but it won't fail.

I believe that the issue of "0.991" CPUs or whatever is a byproduct of the BOINC serverside software. from what I've read elsewhere, this value is not intentionally set by the researchers, it is automatically selected by the BOINC server somewhere along the way, and the researchers here have previously commented that they are not aware of any way to override this serverside. so competent users can just override it themselves if they prefer. setting your CPU use in BOINC to like 99 or 98% has the same effect overall though.
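
The percentage route can also be set locally instead of through the web preferences. A minimal sketch, assuming the standard `global_prefs_override.xml` file in the BOINC data directory and its `<max_ncpus_pct>` element; the value 98 is only the example figure from this post:

```xml
<!-- global_prefs_override.xml in the BOINC data directory (sketch only) -->
<global_preferences>
    <!-- use at most 98% of the CPU threads, leaving headroom to feed the GPU task -->
    <max_ncpus_pct>98.0</max_ncpus_pct>
</global_preferences>
```

How the percentage maps onto whole threads depends on the client's rounding, so it is worth checking the event log to see how many CPUs the client reports using afterwards.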
Bedrich Hajek

Message 61032 - Posted: 18 Jan 2024, 23:46:55 UTC - in response to Message 61028.  

Here is a tale of 2 computers, one that was getting units, and the other was not.

https://www.gpugrid.net/hosts_user.php?userid=19626

They both have the same GPUGRID preferences.



I am getting tasks on both computers, now. So far, all tasks are completing successfully.



After running these tasks successfully for almost a day on both of my computers, the Remaining (estimated) time in the BOINC Manager's Tasks tab now shows approximately 24 days to complete on one computer and 62 days on the other at the start of each task, and counts down incrementally from there. The task actually completes successfully in a little over an hour. A few hours ago, both were showing the correct times to complete.

Everything else is working fine, but this is definitely unusual. Did anyone else observe this?

At first I did. But including <fraction_done_exact/> seems to heal that fairly quickly.
    <app>
        <name>PYSCFbeta</name>
        <!-- Quantum chemistry calculations on GPU -->
        <plan_class>cuda1121</plan_class>
        <gpu_versions>
            <cpu_usage>1</cpu_usage>
            <gpu_usage>1</gpu_usage>
        </gpu_versions>
        <fraction_done_exact/>
    </app>




Thanks for this information. I updated my computers.

Now, I remember this <fraction_done_exact/> from a post several years ago. I can't remember the thread. In the past I didn't need this, because the tasks would correct themselves eventually, even the ATMbetas.

The Quantum Chemistry on GPU does the complete opposite. I wonder if this is connected to the observation of "upwards of 30 threads utilized per task" as posted by Ian&Steve C.?

Ian&Steve C.

Message 61033 - Posted: 19 Jan 2024, 0:07:25 UTC - in response to Message 61032.  

nah the multi thread issue has already been fixed. the app only uses a single thread now.
zombie67 [MM]

Message 61034 - Posted: 19 Jan 2024, 2:30:50 UTC - in response to Message 60963.  

The work-units require a lot of GPU memory.


How much is "a lot" exactly? I have a Pascal card, so it meets the compute capability requirement, but it has only 2 GB of VRAM. Without knowing the amount of VRAM required, I am not sure if it will work.
Reno, NV
Team: SETI.USA
Ian&Steve C.

Message 61035 - Posted: 19 Jan 2024, 3:41:52 UTC - in response to Message 61034.  

The work-units require a lot of GPU memory.


How much is "a lot" exactly? I have a Pascal card, so it meets the compute capability requirement, but it has only 2 GB of VRAM. Without knowing the amount of VRAM required, I am not sure if it will work.


It requires more than 2GB
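
On a host that mixes a small card like that with a larger one, the small card can be kept away from these tasks with an `<exclude_gpu>` entry in cc_config.xml. A minimal sketch; the project URL must match the one your client shows for GPUGRID, and the device number 1 is purely hypothetical:

```xml
<!-- cc_config.xml in the BOINC data directory (sketch only) -->
<cc_config>
    <options>
        <exclude_gpu>
            <!-- assumed project URL; copy the exact one from your client -->
            <url>https://www.gpugrid.net/</url>
            <!-- hypothetical device number of the 2 GB card -->
            <device_num>1</device_num>
            <!-- only exclude this app; other apps may still use the card -->
            <app>PYSCFbeta</app>
        </exclude_gpu>
    </options>
</cc_config>
```

The device numbers are the ones the client prints for each GPU in the event log at startup.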
zombie67 [MM]

Message 61036 - Posted: 19 Jan 2024, 4:24:35 UTC - in response to Message 61035.  

It requires more than 2GB


Good to know. Thanks!
Reno, NV
Team: SETI.USA
Steve
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Message 61037 - Posted: 19 Jan 2024, 14:07:54 UTC - in response to Message 61029.  



the errors have nothing to do with the CPU resource allocation setting. they all errored because they ran on GPUs that are too old; the app needs cards with a compute capability (CC) of at least 6.0 (Pascal and up).

at worst, if someone is running the CPU flat out at 100% and not leaving spare CPU cycles available (as they should), the GPU task might run a little more slowly. but it won't fail.

I believe that the issue of "0.991" CPUs or whatever is a byproduct of the BOINC serverside software. from what I've read elsewhere, this value is not intentionally set by the researchers, it is automatically selected by the BOINC server somewhere along the way, and the researchers here have previously commented that they are not aware of any way to override this serverside. so competent users can just override it themselves if they prefer. setting your CPU use in BOINC to like 99 or 98% has the same effect overall though.


This is all correct I believe.

It seems that the jobs have enough retry attempts that all work units eventually succeed. The scheduler has a built-in mechanism to classify hosts as "reliable", and another to send work units that have failed a few times only to "reliable" hosts. This is not ideal, of course. We will try to get the CC requirements honoured, but these are project-wide scheduler settings that are rather complex to fix without breaking everything else that is currently working.

The download limitations are something I will not be able to change easily. A potential reason I can guess for the current settings is to stop a failing host from acting as a black hole for failed jobs, or something similar.

The large file download should happen just once. The app is deployed in the same way as the ATM app: it is a 2 GB zip file that contains a Python environment and some CUDA libraries. Each work unit only requires downloading a small file (<1 MB, I think).


This last large scale run has been rather impressive. The throughput was very high! Especially considering that it is only on Linux hosts and not Windows. We will be sending some similar batches over the next few weeks.
[AF>Libristes] alain65
Message 61039 - Posted: 20 Jan 2024, 3:25:58 UTC - in response to Message 61037.  

Hello Steve.

The throughput was very high! Especially considering that it is only on Linux hosts and not Windows.


I would say that is exactly why! :D
PCs are like air conditioning: they become useless when you open Windows (L.T)

In a world without walls and fences, who needs windows and gates?
Erich56

Message 61040 - Posted: 21 Jan 2024, 8:59:42 UTC - in response to Message 61037.  

... Especially considering that it is only on Linux hosts and not Windows. We will be sending some similar batches over the next few weeks.

Is there a plan to come up with a Windows version too?
Philip Nicholson

Message 61041 - Posted: 21 Jan 2024, 19:01:49 UTC

Still no work for Windows 11 operating systems?
I see the occasional task that failed, but nothing processed.
It worked well for months and then just stopped before Xmas.
All my software is up to date.
I have a dedicated GPU for this project.
Where is the best place to find an update on GPUgrid's software migration?

Tasks completed: 134
Tasks failed: 55
Credit
    User: 491,814,968 total, 13,657.85 average
    Host: 150,562,500 total, 13,650.92 average
Scheduling
    Scheduling priority: -0.93
    Don't request tasks for CPU: Project has no apps for CPU
    NVIDIA GPU task request deferred for: 00:03:35
    NVIDIA GPU task request deferral interval: 00:10:00
    Last scheduler reply: 2024-01-21 1:55:15 PM
Keith Myers
Message 61042 - Posted: 21 Jan 2024, 21:53:29 UTC

Most of the work released lately has been the Quantum Chemistry tasks. The researcher said that since most educational and research labs run Linux, Windows applications are an afterthought.

The only tasks with a Windows app that have appeared somewhat regularly are the acemd tasks.

You will have to try and snag one of those when they show up.
Erich56

Message 61043 - Posted: 22 Jan 2024, 7:52:56 UTC - in response to Message 61042.  
Last modified: 22 Jan 2024, 7:56:36 UTC

The researcher said that since most educational and research labs run Linux, Windows applications are an afterthought.

It's really too bad that GPUGRID seems to be excluding Windows crunchers more and more :-(
When I joined this project 8 years ago, and for many years thereafter, there was no lack of Windows tasks.
On the other hand, with so few tasks available since last year, it may be that the number of Linux crunchers is sufficient to process them, and the Windows crunchers from before are no longer needed :-(
At least, this is the impression one is bound to get.
Keith Myers
Message 61044 - Posted: 22 Jan 2024, 17:27:39 UTC

The lack of current Windows applications has more to do with the type of applications and APIs currently being used.

The latest sub-projects are all Python-based. Python runs much better on Linux than on Windows, since most development is done on Linux to begin with.

Even Microsoft advises that Python application development should be done in Linux rather than Windows.

Erich56

Message 61045 - Posted: 23 Jan 2024, 8:06:09 UTC

So - in short - bad times for Windows crunchers. Now and in the future :-(
Keith Myers
Message 61046 - Posted: 23 Jan 2024, 9:05:47 UTC - in response to Message 61045.  

So - in short - bad times for Windows crunchers. Now and in the future :-(

Pretty much so.

Windows had it best back with the original release of the acemd app. Remember, it was a simple, single executable file of modest size, derived from source code that could be compiled for Windows or for Linux.

But, if you were paying attention lately, the recent acemd tasks no longer use an executable. They are using Python.

The Python-based tasks are NOT a single executable; they consist of a complete packaged Python environment of many gigabytes.

The nature of the project's tasks has changed to complex, state-of-the-art discovery calculations using cutting-edge technology.

The QChem tasks are even using the Tensor cores of our Nvidia cards now. This is something we asked about several years ago in the forum and were told, maybe, in the future.

The future has come and our desires have been answered.

But the hardware and software of our hosts now have to rise to meet those challenges. Sadly, the Windows environment is still waiting in the wings.
ServicEnginIC
Message 61082 - Posted: 25 Jan 2024, 18:16:11 UTC - in response to Message 61028.  

...But including <fraction_done_exact/> seems to heal that fairly quickly.

Nice advice, thank you!
It quickly resulted in an accurate remaining time estimate, so I applied it to ATMbeta tasks as well.
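
A minimal sketch of what covering both apps in one app_config.xml might look like. The app name `ATMbeta` is an assumption taken from how the tasks are referred to in this thread; check the exact app names your client lists for the project before using it:

```xml
<!-- app_config.xml in the GPUGRID project directory (sketch only) -->
<app_config>
    <app>
        <name>PYSCFbeta</name>
        <fraction_done_exact/>
    </app>
    <app>
        <!-- assumed app name; verify against the names shown by the client -->
        <name>ATMbeta</name>
        <fraction_done_exact/>
    </app>
</app_config>
```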
[BAT] Svennemans

Message 61084 - Posted: 25 Jan 2024, 19:24:22 UTC - in response to Message 61046.  

Choosing not to release Windows apps is a choice they can make, obviously. And maybe their use cases warrant the tradeoff inherent in that.

If there are often large volumes of work to process in a short time (i.e. you'd ideally need something like a supercomputer if it didn't cost so much), then you'd want to design your apps for what BOINC was intended to be all along. That means trying to port them to as many platforms as you possibly can in order to reach maximum compute power, or leveraging VBox for non-native platforms.

If, however, the volumes are never going to be that large, so that basically any single platform's user group can easily provide the necessary compute power, then indeed why bother.

It would be nice, though, if they made that choice public and explicit, so all non-Linux users can gracefully detach instead of posting frustrated "why no work" messages across the forums.
Or indeed spend hours trying to help fix Windows apps ;-)
Aurum
Message 61096 - Posted: 26 Jan 2024, 15:51:20 UTC - in response to Message 61029.  

I believe that the issue of "0.991" CPUs or whatever is a byproduct of the BOINC serverside software. from what I've read elsewhere, this value is not intentionally set by the researchers, it is automatically selected by the BOINC server somewhere along the way, and the researchers here have previously commented that they are not aware of any way to override this serverside.

I didn't know that. It's probably sloppy BOINC design, like using a percentage instead of an integer to determine the number of CPU threads to use.
Aurum
Message 61097 - Posted: 26 Jan 2024, 15:56:14 UTC - in response to Message 61034.  

The work-units require a lot of GPU memory.


How much is "a lot" exactly? I have a Pascal card, so it meets the compute capability requirement, but it has only 2 GB of VRAM. Without knowing the amount of VRAM required, I am not sure if it will work.

The highest being used today on my Pascal cards is 795 MB.