Experimental Python tasks (beta)

Greger

Joined: 6 Jan 15
Posts: 76
Credit: 25,499,534,331
RAC: 0
Message 56879 - Posted: 19 May 2021, 21:08:09 UTC

204 failed and 5 succeeded.

One ran for a while but then hit a RuntimeError:
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

https://www.gpugrid.net/result.php?resultid=32584418

I read on the BOINC Discord today that MLC@Home is also testing PyTorch, and it looks like it causes some issues.
PyTorch uses SIGALRM internally, which seems to conflict with the libboinc API's usage of SIGALRM.
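
To illustrate why two users of SIGALRM in one process can clash, here is a minimal sketch of the general mechanism only. It is not PyTorch or libboinc code; the handler names are made up for the example. A process has a single handler slot per signal, so whoever installs a handler last silently replaces the earlier one (Unix only, must run in the main thread):

```python
import signal
import time

fired = []  # records which handler actually ran

def boinc_like_handler(signum, frame):
    # Hypothetical stand-in for a handler installed first (e.g. by a client API).
    fired.append("boinc_like")

def library_handler(signum, frame):
    # Hypothetical stand-in for a handler installed later by another library.
    fired.append("library")

def demo_handler_conflict():
    """Install two SIGALRM handlers in turn and report which one fires."""
    original = signal.getsignal(signal.SIGALRM)
    fired.clear()
    try:
        signal.signal(signal.SIGALRM, boinc_like_handler)
        # signal.signal() returns the handler it just replaced.
        previous = signal.signal(signal.SIGALRM, library_handler)
        # Ask for one SIGALRM in 50 ms, then wait long enough for it to arrive.
        signal.setitimer(signal.ITIMER_REAL, 0.05)
        time.sleep(0.3)
        return previous is boinc_like_handler, list(fired)
    finally:
        # Cancel any pending timer and restore whatever was there before.
        signal.setitimer(signal.ITIMER_REAL, 0)
        if original is not None:
            signal.signal(signal.SIGALRM, original)

if __name__ == "__main__":
    print(demo_handler_conflict())
```

Only the second handler ever runs; the first one never sees the alarm it may be relying on. Whether this is exactly what happens between PyTorch and libboinc is an assumption based on the report above.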


I hope Toni gets this working soon; it looks like a complex setup.
ID: 56879 · Rating: 0
Greger

Joined: 6 Jan 15
Posts: 76
Credit: 25,499,534,331
RAC: 0
Message 56880 - Posted: 20 May 2021, 15:34:27 UTC

Most of the tasks for Anaconda Python 3 worked well today. Some changes have been made.

e1a1-ABOU_testzip13-0-1-RND2694_0 and higher appear to be good.
ID: 56880 · Rating: 0
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,722,595
RAC: 4,266,994
Message 56881 - Posted: 20 May 2021, 16:37:34 UTC

It seems that these Python tasks are being used to train some kind of AI/Machine Learning model.

Can any of the admins or researchers comment on this? I'd like to know more about the work being done.
ID: 56881 · Rating: 0
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,722,595
RAC: 4,266,994
Message 56882 - Posted: 20 May 2021, 16:59:41 UTC - in response to Message 56880.  

Most of the tasks for Anaconda Python 3 worked well today. Some changes have been made.

e1a1-ABOU_testzip13-0-1-RND2694_0 and higher appear to be good.


Side note: you should set no new tasks for GPUGRID, or remove it, on your RTX 30-series hosts. The applications here do not work with RTX 30-series Ampere cards and always produce errors.
ID: 56882 · Rating: 0
Keith Myers

Joined: 13 Dec 17
Posts: 1416
Credit: 9,119,446,190
RAC: 614,515
Message 56883 - Posted: 21 May 2021, 1:06:02 UTC

Looks like he only let one acemd3 task slip through to an Ampere card.

I don't think the Python tasks care much about the gpu architecture.

If the tasks are formatted correctly they appear to run fine on Ampere cards.
ID: 56883 · Rating: 0
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,722,595
RAC: 4,266,994
Message 56884 - Posted: 21 May 2021, 4:50:05 UTC - in response to Message 56883.  

Looks like he only let one acemd3 task slip through to an Ampere card.

I don't think the Python tasks care much about the gpu architecture.

If the tasks are formatted correctly they appear to run fine on Ampere cards.


They care. They are still CUDA 10.0 and were compiled without the proper configuration for Ampere. They will all still fail on an Ampere card.
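
As a rough sketch of the reasoning: each GPU generation needs a minimum CUDA toolkit release before native code can be built for it. The table below lists only a few well-known pairs and is not exhaustive; the function name is made up for illustration. A build that embeds PTX can sometimes still be JIT-compiled by a newer driver, so this check covers native support only:

```python
# Minimum CUDA toolkit release able to build native code for a given
# compute capability. Deliberately incomplete: only a few known pairs.
MIN_TOOLKIT = {
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere (data-center parts)
    (8, 6): (11, 1),  # Ampere (RTX 30-series)
}

def has_native_support(toolkit, capability):
    """Return True if toolkit (major, minor) can natively target capability."""
    required = MIN_TOOLKIT.get(capability)
    if required is None:
        raise KeyError(f"compute capability {capability} is not in the table")
    # Tuples compare element by element, so (10, 0) < (11, 1).
    return toolkit >= required

if __name__ == "__main__":
    # CUDA 10.0 application on an RTX 30-series card (compute capability 8.6).
    print(has_native_support((10, 0), (8, 6)))
```

For a CUDA 10.0 build and an RTX 30-series card this reports no native support, which matches the failures described here.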

The Python tasks they’ve been pushing out recently never actually run any work on the GPU. They do a little bit of CPU processing and then complete or error. Even the few that succeed never touch the GPU.
ID: 56884 · Rating: 0
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,722,595
RAC: 4,266,994
Message 56926 - Posted: 2 Jun 2021, 20:58:15 UTC
Last modified: 2 Jun 2021, 21:30:03 UTC

I see a bunch of Python tasks went out again.

I allowed my hosts to pick one up. I don't have high hopes for it, though. It's a _6 already, with constant errors from all the hosts before, so I'm expecting it to fail as well.

Has anyone had a successful run?

Maybe an admin could comment on why they keep sending out tasks that mostly fail and never seem to use the GPU?

-edit-
I was right: the Python task failed at right around 2 minutes and never ran anything on the GPU. It's like they aren't even bothering to test whether these tasks will fail before sending them out.
ID: 56926 · Rating: 0
mmonnin

Joined: 2 Jul 16
Posts: 338
Credit: 7,987,341,558
RAC: 178,897
Message 56927 - Posted: 2 Jun 2021, 21:37:13 UTC
Last modified: 2 Jun 2021, 21:38:07 UTC

All junk for me. None have completed, though I'm pretty sure some have before. All run for around 525-530 seconds. Nice ETA of 646 days, so BOINC freaks out.

CPU usage reported in BOINCTasks goes up to 4 threads' worth before leveling off a bit. BOINC only reports CPU time = run time, even though that doesn't match what I see. Run time is half of what is reported.
ID: 56927 · Rating: 0
trigggl

Joined: 6 Mar 09
Posts: 25
Credit: 102,324,681
RAC: 0
Message 56934 - Posted: 6 Jun 2021, 13:49:38 UTC

I have 3 of these valid over the past couple of days. None of them used the GPU. Did they complete any work?
https://www.gpugrid.net/result.php?resultid=32619357
ID: 56934 · Rating: 0
mmonnin

Joined: 2 Jul 16
Posts: 338
Credit: 7,987,341,558
RAC: 178,897
Message 56935 - Posted: 6 Jun 2021, 18:51:11 UTC

This one worked for me after that same PC failed earlier in the week:
https://www.gpugrid.net/result.php?resultid=32619337
ID: 56935 · Rating: 0
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,722,595
RAC: 4,266,994
Message 56936 - Posted: 6 Jun 2021, 19:07:10 UTC - in response to Message 56934.  

I have 3 of these valid over the past couple of days. None of them used the GPU. Did they complete any work?
https://www.gpugrid.net/result.php?resultid=32619357


I agree. It’s weird that these tasks are marked as GPU tasks with CUDA 10.0 and make the GPU unavailable for other tasks in BOINC, yet they never touch the GPU.

According to the stderr.txt, they seem to spend most of their time extracting and installing packages, then do “something” for a few seconds and complete. It’s obvious that they are exploring some kind of machine learning approach, based on the packages used (PyTorch, TensorFlow, etc.) and the references to model training. Maybe they are still working out how to properly package the WUs so they have the right configuration for future real tasks.

Would be cool to hear what they are actually trying to accomplish with these tasks.
ID: 56936 · Rating: 0
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 869
Message 56938 - Posted: 9 Jun 2021, 4:46:24 UTC - in response to Message 56936.  

Would be cool to hear what they are actually trying to accomplish with these tasks.

I guess you will never hear any details from them.
As we know, the GPUGRID people are very taciturn on everything.
ID: 56938 · Rating: 0
ServicEnginIC

Joined: 24 Sep 10
Posts: 592
Credit: 11,972,186,510
RAC: 998,578
Message 56939 - Posted: 9 Jun 2021, 5:48:30 UTC - in response to Message 56938.  
Last modified: 9 Jun 2021, 5:49:21 UTC

Would be cool to hear what they are actually trying to accomplish with these tasks.

I guess you will never hear any details from them.
As we know, the GPUGRID people are very taciturn on everything.

In other times, when the GPUGRID project ran smoothly, they used to be more courteous and returned some feedback to contributors.
I guess that there must be weighty reasons for this current lack of communication.
ID: 56939 · Rating: 0
Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 56940 - Posted: 10 Jun 2021, 10:12:43 UTC - in response to Message 56939.  

For the time being we are perfecting the WU machinery so as to support ML packages + CUDA. All tasks are Linux beta for now. Thanks!
ID: 56940 · Rating: 0
ServicEnginIC

Joined: 24 Sep 10
Posts: 592
Credit: 11,972,186,510
RAC: 998,578
Message 56941 - Posted: 10 Jun 2021, 10:28:56 UTC - in response to Message 56940.  

Thank you for this pearl!
Nice to know that everything is going on...
ID: 56941 · Rating: 0
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,722,595
RAC: 4,266,994
Message 56942 - Posted: 10 Jun 2021, 12:59:40 UTC - in response to Message 56940.  

For the time being we are perfecting the WU machinery so as to support ML packages + CUDA. All tasks are Linux beta for now. Thanks!


Thanks, Toni. Can you explain why these tasks are not using the GPU at all? They only run on the CPU; GPU utilization stays at 0%.
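
For anyone who wants to check the same thing on their own host, here is a small sketch that polls utilization through nvidia-smi. It assumes nvidia-smi is on the PATH; the helper names are made up for the example, and entries that are not plain numbers (such as "[N/A]") are skipped:

```python
import subprocess

# Standard nvidia-smi query: one utilization percentage per GPU, no header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu",
    "--format=csv,noheader,nounits",
]

def parse_utilization(output):
    """Convert nvidia-smi CSV output (one value per line) to a list of ints."""
    values = []
    for line in output.splitlines():
        text = line.strip()
        if text.isdigit():  # ignore blank lines and non-numeric entries
            values.append(int(text))
    return values

def read_gpu_utilization():
    """Run nvidia-smi once and return the utilization of each GPU in percent."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_utilization(result.stdout)

if __name__ == "__main__":
    print(read_gpu_utilization())
```

Running it a few times while a Python task is active should show whether the reading ever moves off 0.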
ID: 56942 · Rating: 0
Keith Myers

Joined: 13 Dec 17
Posts: 1416
Credit: 9,119,446,190
RAC: 614,515
Message 56943 - Posted: 11 Jun 2021, 17:19:49 UTC

I would like to know whether we are supposed to do the things requested in the output file. Things like updating the various packages that are called out.

Or are we supposed to do nothing and let the app/task packagers sort it out before generation?
ID: 56943 · Rating: 0
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,722,595
RAC: 4,266,994
Message 56944 - Posted: 11 Jun 2021, 17:44:37 UTC - in response to Message 56943.  

I would like to know whether we are supposed to do the things requested in the output file. Things like updating the various packages that are called out.

Or are we supposed to do nothing and let the app/task packagers sort it out before generation?


I'm relatively sure these tasks are sandboxed. The packages being referenced (e.g. TensorFlow) are part of the WU itself and are installed during the extraction phase at the beginning of the WU. If you check your system, you will most likely find that you do not have TensorFlow installed.

The package updates need to happen on the project side before distribution to us.
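
A quick way to do that check, as a minimal sketch: ask the system Python whether it can locate the package. The helper name is made up for the example, and it only inspects the interpreter you run it with, not the environment bundled inside the WU:

```python
import importlib.util

def is_installed(module_name):
    """Return True if this interpreter can find a top-level module by name."""
    return importlib.util.find_spec(module_name) is not None

if __name__ == "__main__":
    for name in ("tensorflow", "torch"):
        status = "installed" if is_installed(name) else "not installed"
        print(f"{name}: {status}")
```

If both come back as not installed while the tasks still reference them, that supports the idea that the WU carries its own copies.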
ID: 56944 · Rating: 0
Keith Myers

Joined: 13 Dec 17
Posts: 1416
Credit: 9,119,446,190
RAC: 614,515
Message 56945 - Posted: 11 Jun 2021, 18:23:02 UTC - in response to Message 56944.  

I wonder if I should add the project to my Nvidia Nano. It has Tensorflow installed by default in the distro.

But I wonder if the app would even run on the Maxwell card, even though it is mainly a CPU application for the time being and never seems to touch the GPU.
ID: 56945 · Rating: 0
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,722,595
RAC: 4,266,994
Message 56946 - Posted: 11 Jun 2021, 18:53:07 UTC - in response to Message 56945.  
Last modified: 11 Jun 2021, 18:54:07 UTC

I wonder if I should add the project to my Nvidia Nano. It has Tensorflow installed by default in the distro.

But I wonder if the app would even run on the Maxwell card, even though it is mainly a CPU application for the time being and never seems to touch the GPU.


You can try, but I don't think it'll run because of the ARM CPU. There's no app for that here.
ID: 56946 · Rating: 0

©2025 Universitat Pompeu Fabra