LLMs crashing
| Author | Message |
|---|---|
|
Joined: 30 Apr 13 Posts: 106 Credit: 3,805,237,860 RAC: 65
|
After a decent run of mostly successes, I've had a couple of LLM tasks error out. For example, Workunit 31492014: Outcome Computation error Client state Compute error Exit status 195 (0x000000C3) EXIT_CHILD_FAILED <message> The operating system cannot run (null). (0xc3) - exit code 195 (0xc3)</message> The behavior was strange: the task loaded 12 GB into GPU memory, but the GPU remained basically idle until the task errored. The CPU was used for a while, but not the GPU. I ran an Einstein task to satisfy myself that the GPU works normally on that project. Of course, LLMs are different. Help appreciated. |
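For anyone comparing several failed tasks, the status line quoted above can be pulled out of a pasted task report with a few lines of Python. This is only a sketch: the function name `parse_exit_status` is made up for illustration, and the pattern assumes the exact `Exit status 195 (0x000000C3) EXIT_CHILD_FAILED` layout shown in this post, which other client versions may not use. The `The operating system cannot run (null)` text appears to be the generic Windows message for system error 195 rather than a diagnosis, so the numeric code and its name are the parts worth comparing.

```python
import re

# Matches a line such as "Exit status 195 (0x000000C3) EXIT_CHILD_FAILED".
# The layout is copied from the post above and is an assumption, not a documented format.
STATUS_RE = re.compile(r"Exit status\s+(-?\d+)\s+\((0x[0-9A-Fa-f]+)\)\s*(\w+)?")


def parse_exit_status(text):
    """Return the decimal code, hex value, and symbolic name found in `text`, or None."""
    match = STATUS_RE.search(text)
    if match is None:
        return None
    code = int(match.group(1))
    hex_value = int(match.group(2), 16)
    return {
        "code": code,
        "hex": hex_value,
        "name": match.group(3),
        # Negative codes are shown as 32-bit two's complement in the hex column.
        "consistent": (code & 0xFFFFFFFF) == hex_value,
    }


if __name__ == "__main__":
    report = "Exit status 195 (0x000000C3) EXIT_CHILD_FAILED"
    print(parse_exit_status(report))
```

If `consistent` comes back false, the decimal and hex columns disagree and the report was probably garbled when it was copied.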
|
Joined: 30 Apr 13 Posts: 106 Credit: 3,805,237,860 RAC: 65
|
After many more errors, one completed successfully, and another |
98J_SSG Joined: 15 Jun 09 Posts: 12 Credit: 729,477,756 RAC: 84
|
Good morning! I've received 3 of the small LLM tasks, and all of them failed. For 2 of them, the Stderr output included the following: C:\ProgramData\BOINC\slots\17\Lib\site-packages\huggingface_hub\file_download.py:144: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\ProgramData\BOINC\slots\.cache\hub\models--Acellera--gpugrid. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations. To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development warnings.warn(message) The Hub Python library reference (https://huggingface.co/docs/huggingface_hub/v0.31.2/guides/manage-cache#limitations) says that symlink support on Windows requires either enabling Developer Mode or running Python as an administrator. I am not a programmer, and I don't know whether I've opened a whole new can of worms, but I set my PC to Developer Mode to see whether it makes any difference if I get more of the small LLM tasks. Since I don't know what is actually causing the failures and am only responding to that one detail, I might not be helping anything at all. Any insight from those who know what's happening here would be greatly appreciated.
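One way to see whether enabling Developer Mode changed anything for symlinks is to try creating one directly. The sketch below uses only the Python standard library; `can_create_symlinks` is a hypothetical helper name, and the directory it probes is whatever you pass in (the system temp directory here, not the BOINC slot directory from the warning). It says nothing about why the tasks fail, only whether the symlink condition named in the warning still applies.

```python
import os
import tempfile


def can_create_symlinks(directory):
    """Return True if this process can create a file symlink inside `directory`."""
    # Work in a throwaway subdirectory so nothing is left behind.
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        target = os.path.join(scratch, "target.txt")
        link = os.path.join(scratch, "link.txt")
        with open(target, "w") as handle:
            handle.write("probe")
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError):
            # On Windows this typically means neither Developer Mode nor admin rights apply.
            return False
        return os.path.islink(link)


if __name__ == "__main__":
    probe_dir = tempfile.gettempdir()  # assumption: any writable directory will do
    print(f"Symlinks usable in {probe_dir}: {can_create_symlinks(probe_dir)}")
```

Note that setting `HF_HUB_DISABLE_SYMLINKS_WARNING`, as the warning suggests, only hides the message; according to the warning text itself, caching still works without symlinks, just with more disk space used.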
|
98J_SSG Joined: 15 Jun 09 Posts: 12 Credit: 729,477,756 RAC: 84
|
It might have been luck, but the last small LLM task I got completed successfully. It's the only task I've received since, so I don't know whether enabling Developer Mode helped or I was just lucky. If more succeed, it might be more than luck. Does anyone know whether Developer Mode increases or frees up GPU memory available for these tasks? Thanks!
|
©2025 Universitat Pompeu Fabra