Experimental Python tasks (beta)
Joined: 6 Jan 15 Posts: 76 Credit: 25,499,534,331 RAC: 0
204 failed and 5 succeeded. I got one that ran for a while but then failed with RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)` https://www.gpugrid.net/result.php?resultid=32584418

I was reading on the BOINC Discord today that MLC@Home is also testing PyTorch, and it looks like it causes some issues: PyTorch uses SIGALRM internally, which seems to conflict with the libboinc API's usage of SIGALRM. I hope Toni gets this working soon; it looks to be a complex setup.
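Why two components using SIGALRM would clash: a POSIX process has exactly one handler per signal, so whoever installs theirs last wins. A minimal Python sketch (Linux only), where `runtime_tick` and `library_timeout` are hypothetical stand-ins for a BOINC-style API layer and an ML library; neither project's real code is shown here.

```python
import signal

# Two hypothetical handlers standing in for two components of one process.
def runtime_tick(signum, frame):
    print("runtime: periodic SIGALRM work")

def library_timeout(signum, frame):
    print("library: SIGALRM-based timeout")

def install_conflicting_handlers():
    """Install two SIGALRM handlers in turn; report what the second one displaced."""
    original = signal.signal(signal.SIGALRM, runtime_tick)
    # signal.signal() returns the handler it replaced: the runtime's is now gone.
    displaced = signal.signal(signal.SIGALRM, library_timeout)
    active = signal.getsignal(signal.SIGALRM)
    # Restore the previous disposition (None means it was set outside Python).
    signal.signal(signal.SIGALRM, original if original is not None else signal.SIG_DFL)
    return displaced, active

displaced, active = install_conflicting_handlers()
print(displaced is runtime_tick)   # True
print(active is library_timeout)   # True
```

A common way for two components to coexist is for the later one to keep the handler returned by `signal.signal()` and call it from its own; whether that is practical for BOINC and PyTorch is not something this thread establishes.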
Joined: 6 Jan 15 Posts: 76 Credit: 25,499,534,331 RAC: 0
Most of the tasks for Anaconda Python 3 worked well today. Some changes have been made. e1a1-ABOU_testzip13-0-1-RND2694_0 and higher appear to be good.
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,348,595 RAC: 4,765,598
It seems that these Python tasks are being used to train some kind of AI/machine learning model. Can any of the admins or researchers comment on this? I'd like to know more about the work being done.
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,348,595 RAC: 4,765,598
> Most of the tasks for Anaconda Python 3 worked well today. Some changes have been made.

Side note: you should set "No new tasks" for GPUGRID or remove it from your RTX 30-series hosts. The applications here do not work with RTX 30-series (Ampere) cards and always produce errors.
Joined: 13 Dec 17 Posts: 1416 Credit: 9,119,446,190 RAC: 614,515
Looks like he only let one acemd3 task slip through to an Ampere card. I don't think the Python tasks care much about the GPU architecture. If the tasks are formatted correctly, they appear to run fine on Ampere cards.
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,348,595 RAC: 4,765,598
> Looks like he only let one acemd3 task slip through to an Ampere card.

They care. They are still CUDA 10.0 and were compiled without the proper configuration for Ampere, so they will all still fail on an Ampere card. The Python tasks they've been pushing out recently never actually run any work on the GPU. They do a little bit of CPU processing and then complete or error out. Even the few that succeed never touch the GPU.
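The "compiled without the proper configuration for Ampere" point follows from CUDA's compatibility rules: SASS (cubin) built for compute capability X.y runs only on X.z with z ≥ y, while embedded PTX can be JIT-compiled by the driver for newer GPUs. CUDA 10.0 can target at most sm_75 (Turing), and consumer RTX 30-series cards are 8.6. A simplified sketch; the architecture lists below are hypothetical, since acemd3's actual build flags are not given in this thread.

```python
from typing import Iterable, Tuple

# Compute capability as (major, minor); RTX 30-series consumer cards are (8, 6).
CC = Tuple[int, int]

def can_run(device: CC, sass: Iterable[CC], ptx: Iterable[CC]) -> bool:
    """Simplified CUDA compatibility check for a fat binary.

    - SASS (cubin) built for X.y runs only on devices X.z with z >= y.
    - PTX built for X.y can be JIT-compiled by the driver for devices >= X.y.
    """
    if any(major == device[0] and minor <= device[1] for major, minor in sass):
        return True
    return any(tuple(p) <= tuple(device) for p in ptx)

# Hypothetical CUDA 10.0 builds, with and without embedded PTX.
print(can_run((8, 6), sass=[(6, 1), (7, 0), (7, 5)], ptx=[]))        # False
print(can_run((8, 6), sass=[(6, 1), (7, 0), (7, 5)], ptx=[(7, 5)]))  # True
```

This ignores bundled libraries such as cuBLAS, which also need code usable on the device, so a True here is necessary but not sufficient.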
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,348,595 RAC: 4,765,598
I see a bunch of Python tasks went out again. I allowed my hosts to pick one up. I don't have high hopes for it, though: it's a _6 already, with constant errors from all the hosts before, so I'm expecting it to fail as well. Has anyone had a successful run? Maybe an admin can comment on why they keep sending out tasks that mostly fail and never seem to use the GPU?

-edit- I was right: the Python task failed at around 2 minutes and never ran anything on the GPU. It's like they aren't even bothering to test whether these tasks will fail before sending them out.
Joined: 2 Jul 16 Posts: 338 Credit: 7,987,341,558 RAC: 178,897
All junk for me. None have completed, although I'm pretty sure some did for me before. All ran for around 525-530 seconds. Nice ETA of 646 days, so BOINC freaks out. CPU usage reported in BOINCTasks goes up to 4 threads' worth before leveling off a bit. BOINC reports CPU time equal to run time, even though that doesn't match what I see: the run time is half of what is reported.
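For a multithreaded task, CPU time summed over all threads can exceed wall-clock run time, so "CPU time = run time" understates usage. A sketch comparing the two for one process; the 2100 s / 525 s figures are illustrative arithmetic based on the numbers above, not measured task data, and the demo relies on CPython's `hashlib` releasing the GIL for large inputs so the threads can run in parallel.

```python
import hashlib
import threading
import time

def effective_threads(cpu_seconds: float, wall_seconds: float) -> float:
    """Average number of busy threads: total CPU time over elapsed wall time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return cpu_seconds / wall_seconds

def measure(n_threads: int = 4, rounds: int = 5) -> tuple:
    """Hash a buffer on several threads; return (process CPU s, wall-clock s)."""
    data = b"x" * (4 * 1024 * 1024)  # large buffer so hashlib releases the GIL

    def work() -> None:
        for _ in range(rounds):
            hashlib.sha256(data).digest()

    cpu0, wall0 = time.process_time(), time.perf_counter()
    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.process_time() - cpu0, time.perf_counter() - wall0

# A ~525 s task keeping ~4 threads busy would accumulate roughly 2100 s of CPU time.
print(effective_threads(2100, 525))  # 4.0
cpu, wall = measure()
print(f"cpu={cpu:.2f}s wall={wall:.2f}s threads~{effective_threads(cpu, wall):.1f}")
```

On a multi-core machine the measured ratio should rise above 1; on a single core it stays near 1.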
Joined: 6 Mar 09 Posts: 25 Credit: 102,324,681 RAC: 0
I have 3 of these marked valid over the past couple of days. None of them used the GPU. Did they complete any work? https://www.gpugrid.net/result.php?resultid=32619357
Joined: 2 Jul 16 Posts: 338 Credit: 7,987,341,558 RAC: 178,897
This one worked for me after that same PC failed earlier in the week: https://www.gpugrid.net/result.php?resultid=32619337
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,348,595 RAC: 4,765,598
> I have 3 of these marked valid over the past couple of days. None of them used the GPU. Did they complete any work?

I agree. It's weird that these tasks are marked as GPU tasks with CUDA 10.0, which makes the GPU unavailable to other tasks in BOINC, yet they never touch the GPU. According to the stderr.txt, they spend most of their time extracting and installing packages, then do "something" for a few seconds and complete. Based on the packages used (PyTorch, TensorFlow, etc.) and the references to model training, it's obvious they are exploring some kind of machine learning approach. Maybe they are still working out how to package the WUs properly so they have the right configuration for future real tasks. It would be cool to hear what they are actually trying to accomplish with these tasks.
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 869
> Would be cool to hear what they are actually trying to accomplish with these tasks.

I guess you will never hear any details from them. As we know, the GPUGRID people are very taciturn about everything.
Joined: 24 Sep 10 Posts: 592 Credit: 11,972,186,510 RAC: 998,578
> Would be cool to hear what they are actually trying to accomplish with these tasks.

In earlier times, when the GPUGRID project ran smoothly, they used to be more considerate and give contributors some feedback. I guess there must be serious reasons for the current lack of communication.
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0
For the time being, we are perfecting the WU machinery to support ML packages + CUDA. All tasks are Linux beta for now. Thanks!
Joined: 24 Sep 10 Posts: 592 Credit: 11,972,186,510 RAC: 998,578
Thank you for this pearl! Nice to know that work is still going on...
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,348,595 RAC: 4,765,598
> For the time being, we are perfecting the WU machinery to support ML packages + CUDA. All tasks are Linux beta for now. Thanks!

Thanks, Toni. Can you explain why these tasks are not using the GPU at all? They only run on the CPU; GPU utilization stays at 0%.
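One way to back up the "GPU utilization stays at 0%" observation is to poll `nvidia-smi` while a task runs and record the peak per GPU. A sketch using the real `--query-gpu=index,utilization.gpu --format=csv,noheader,nounits` query; the sampling count and interval are arbitrary choices, and the helper names are hypothetical.

```python
import subprocess
import time

# One "index, utilization" CSV line per GPU.
QUERY = ["nvidia-smi", "--query-gpu=index,utilization.gpu",
         "--format=csv,noheader,nounits"]

def parse_utilization(text: str) -> dict:
    """Turn lines like '0, 35' into {0: 35}; skip GPUs reporting [N/A]."""
    result = {}
    for line in text.strip().splitlines():
        index, util = (field.strip() for field in line.split(",", 1))
        if util.isdigit():
            result[int(index)] = int(util)
    return result

def peak_utilization(samples: int = 30, interval_s: float = 2.0) -> dict:
    """Poll nvidia-smi and return the highest utilization seen on each GPU."""
    peak = {}
    for _ in range(samples):
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        for gpu, util in parse_utilization(out).items():
            peak[gpu] = max(util, peak.get(gpu, 0))
        time.sleep(interval_s)
    return peak
```

Run `print(peak_utilization())` while a Python task is executing; a peak of 0 on the GPU BOINC assigned to it would match the reports in this thread.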
Joined: 13 Dec 17 Posts: 1416 Credit: 9,119,446,190 RAC: 614,515
I would like to know whether we are supposed to do the things requested in the output file, such as updating the various packages that are called out. Or are we supposed to do nothing and let the app/task packagers sort it out before generating the tasks?
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,348,595 RAC: 4,765,598
> I would like to know whether we are supposed to do the things requested in the output file. Things like updating the various packages that are called out.

I'm relatively sure these tasks are sandboxed. The packages being referenced (TensorFlow) are part of the WU itself; they are installed during the extraction phase at the beginning of the WU. If you check your system, you will most likely find that you do not have TensorFlow installed. The package updates need to happen on the project side before distribution to us.
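A quick way to do the check suggested above is to ask your system Python whether it can find the packages, without importing them. Note the assumption: this only inspects the interpreter you run it with, so a separate environment (such as the one a WU extracts into its BOINC slot directory) will not show up here.

```python
import importlib.util

def is_installed(module_name: str) -> bool:
    """True if this Python interpreter can locate the module (without importing it)."""
    return importlib.util.find_spec(module_name) is not None

for name in ("tensorflow", "torch"):
    print(f"{name}: {'installed' if is_installed(name) else 'not installed'}")
```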
Joined: 13 Dec 17 Posts: 1416 Credit: 9,119,446,190 RAC: 614,515
I wonder if I should add the project to my Nvidia Nano. It has TensorFlow installed by default in the distro. But I wonder whether the app would even run on the Maxwell GPU, even though for the time being it is mainly a CPU application and never seems to touch the GPU.
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,348,595 RAC: 4,765,598
> I wonder if I should add the project to my Nvidia Nano. It has TensorFlow installed by default in the distro.

You can try, but I don't think it'll run because of the ARM CPU. There's no app for that here.
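BOINC only sends work for platforms a project has an app for, and an ARM board reports a different platform than an x86-64 PC. A sketch mapping `platform.machine()` to BOINC's Linux platform names; `PROJECT_PLATFORMS` is an assumption taken from the reply above (x86-64 Linux only), not data read from the project's server.

```python
import platform

# platform.machine() values mapped to BOINC platform names (Linux only).
BOINC_LINUX_PLATFORMS = {
    "x86_64": "x86_64-pc-linux-gnu",
    "aarch64": "aarch64-unknown-linux-gnu",
}

# Hypothetical: assumes the project offers only an x86-64 Linux app.
PROJECT_PLATFORMS = {"x86_64-pc-linux-gnu"}

def boinc_platform(machine: str = "") -> str:
    """BOINC platform name for the given (or current) machine type."""
    machine = machine or platform.machine()
    return BOINC_LINUX_PLATFORMS.get(machine, "unknown")

def host_has_app(machine: str = "") -> bool:
    """True if the (assumed) project app list covers this machine type."""
    return boinc_platform(machine) in PROJECT_PLATFORMS

print(boinc_platform("aarch64"), host_has_app("aarch64"))  # aarch64-unknown-linux-gnu False
```

The authoritative list is the project's Applications page; check it rather than trusting the hard-coded set.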
©2025 Universitat Pompeu Fabra