Experimental Python tasks (beta)

Message boards : News : Experimental Python tasks (beta)

Previous · 1 · 2 · 3 · 4 · 5 · 6 · Next

ALAIN_13013
Joined: 11 Sep 08
Posts: 18
Credit: 1,551,929,462
RAC: 0
Message 56177 - Posted: 29 Dec 2020, 6:50:04 UTC - in response to Message 55588.  

> I'm creating some experimental tasks for the Python app (made Beta). They are Linux- and CUDA-specific and serve in preparation for future batches.
>
> They may use a relatively large amount of disk space (on the order of 1-10 GB), which persists between runs and is cleared if you reset the project.

What is the minimum card for this app? My 980Ti doesn't receive any WUs.
rod4x4
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Message 56181 - Posted: 29 Dec 2020, 13:08:38 UTC - in response to Message 56177.  
Last modified: 29 Dec 2020, 13:10:12 UTC

>> I'm creating some experimental tasks for the Python app (made Beta). They are Linux- and CUDA-specific and serve in preparation for future batches.
>>
>> They may use a relatively large amount of disk space (on the order of 1-10 GB), which persists between runs and is cleared if you reset the project.
>
> What is the minimum card for this app? My 980Ti doesn't receive any WUs.

In "GPUGRID Preferences", make sure you select "Python Runtime (beta)" and "Run test applications?".
Your GPU, driver and OS should run these tasks fine.
ALAIN_13013
Joined: 11 Sep 08
Posts: 18
Credit: 1,551,929,462
RAC: 0
Message 56182 - Posted: 29 Dec 2020, 13:32:58 UTC - in response to Message 56181.  
Last modified: 29 Dec 2020, 13:33:30 UTC

>>> I'm creating some experimental tasks for the Python app (made Beta). They are Linux- and CUDA-specific and serve in preparation for future batches.
>>>
>>> They may use a relatively large amount of disk space (on the order of 1-10 GB), which persists between runs and is cleared if you reset the project.
>>
>> What is the minimum card for this app? My 980Ti doesn't receive any WUs.
>
> In "GPUGRID Preferences", make sure you select "Python Runtime (beta)" and "Run test applications?".
> Your GPU, driver and OS should run these tasks fine.

Thanks, I had just forgotten "Run test applications" :)
jiipee
Joined: 4 Jun 15
Posts: 19
Credit: 8,813,058,416
RAC: 78,330
Message 56183 - Posted: 29 Dec 2020, 13:35:30 UTC

All of these now seem to error out after the computation has finished, on several computers:

<message>
upload failure: <file_xfer_error>
  <file_name>2p95312000-RAIMIS_NNPMM-0-1-RND8920_1_0</file_name>
  <error_code>-131 (file size too big)</error_code>
</file_xfer_error>

</message>


What causes this, and how can it be fixed?
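Error -131 in the log above is BOINC's "file size too big" code. An output file grew past the size limit declared for it, so the client refused to upload it. A read-only way to see which files are affected is sketched below in Python. It assumes the client keeps its state in `client_state.xml` in the BOINC data directory. It also assumes that each file is described by a `<file>` element (or an older `<file_info>` element) with `<name>` and `<max_nbytes>` children, and that finished output files sit in the project directory. `find_oversize` and `oversize_outputs` are hypothetical helper names, and the sketch modifies nothing.

```python
import os
import xml.etree.ElementTree as ET


def find_oversize(root, size_of):
    """Return (name, size, limit) for every file larger than its <max_nbytes>.

    `root` is the parsed client state; `size_of(name)` returns the file's
    size in bytes, or None if the file isn't present.
    """
    problems = []
    # Accept both element names (assumption: newer clients write <file>,
    # older ones <file_info>).
    for tag in ("file", "file_info"):
        for entry in root.iter(tag):
            name = entry.findtext("name")
            limit_text = entry.findtext("max_nbytes")
            if not name or not limit_text:
                continue
            limit = float(limit_text)
            if limit <= 0:
                # Treat zero or negative as "no limit declared" (assumption).
                continue
            size = size_of(name)
            if size is not None and size > limit:
                problems.append((name, size, limit))
    return problems


def oversize_outputs(state_path, project_dir):
    """Parse a client_state.xml file and check the files in `project_dir`."""
    root = ET.parse(state_path).getroot()

    def size_of(name):
        path = os.path.join(project_dir, name)
        return os.path.getsize(path) if os.path.isfile(path) else None

    return find_oversize(root, size_of)
```

On a typical packaged Linux install you might call `oversize_outputs("/var/lib/boinc-client/client_state.xml", "/var/lib/boinc-client/projects/www.gpugrid.net")`. Both paths are assumptions and differ for user-directory installs. The parsing and the filesystem lookup are kept separate so that the comparison logic can be checked without a live BOINC client.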
Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Message 56185 - Posted: 29 Dec 2020, 14:24:17 UTC - in response to Message 56183.  

> What causes this, and how can it be fixed?

I've just posted instructions in the "Anaconda Python 3 Environment v4.01 failures" thread (Number Crunching).

Read through the whole post. If you don't understand anything, or you don't know how to do any of the steps I've described, back away. Don't even attempt it until you're sure. You have to edit a very important, protected file, and that needs care and experience.
Ian&Steve C.
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Message 56186 - Posted: 29 Dec 2020, 14:33:52 UTC - in response to Message 56185.  

>> What causes this, and how can it be fixed?
>
> I've just posted instructions in the "Anaconda Python 3 Environment v4.01 failures" thread (Number Crunching).
>
> Read through the whole post. If you don't understand anything, or you don't know how to do any of the steps I've described, back away. Don't even attempt it until you're sure. You have to edit a very important, protected file, and that needs care and experience.

This really needs to be fixed server-side. It would also be nice if it were configurable via cc_config, but that doesn't look to be the case either.

Stopping and starting the client is a recipe for instant errors. Even where it succeeds, this process will need to be repeated every time you download new tasks. It's not really a viable option unless you want to babysit the system all day.
Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Message 56187 - Posted: 29 Dec 2020, 14:45:32 UTC - in response to Message 56186.  

> Stopping and starting the client is a recipe for instant errors. Even where it succeeds, this process will need to be repeated every time you download new tasks. It's not really a viable option unless you want to babysit the system all day.

By itself, it's fairly safe, provided you know and understand the software on your own system well enough. But you do need that experience and knowledge, which is why I put the caveats in.

I agree about having to redo it for every new task. Still, I'd like to get my APR back up to something reasonable, and I'm happy to help nudge the admins one more step along the way to a fully working, 'set and forget' application.
Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Message 56189 - Posted: 29 Dec 2020, 16:39:50 UTC - in response to Message 56187.  

They're working on something...

WU 26917726
jiipee
Joined: 4 Jun 15
Posts: 19
Credit: 8,813,058,416
RAC: 78,330
Message 56208 - Posted: 31 Dec 2020, 8:59:22 UTC - in response to Message 56186.  

>>> What causes this, and how can it be fixed?
>>
>> I've just posted instructions in the "Anaconda Python 3 Environment v4.01 failures" thread (Number Crunching).
>>
>> Read through the whole post. If you don't understand anything, or you don't know how to do any of the steps I've described, back away. Don't even attempt it until you're sure. You have to edit a very important, protected file, and that needs care and experience.
>
> This really needs to be fixed server-side. It would also be nice if it were configurable via cc_config, but that doesn't look to be the case either.
>
> Stopping and starting the client is a recipe for instant errors. Even where it succeeds, this process will need to be repeated every time you download new tasks. It's not really a viable option unless you want to babysit the system all day.

Exactly so. I don't know about others, but I have no time to sit and watch my hosts working. A host works for 10 hours to get the task done, and then it all turns out to be a waste of time and energy because of this file size limit. This is somewhat frustrating.
Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Message 56209 - Posted: 31 Dec 2020, 10:16:49 UTC - in response to Message 56208.  

Opt out of the Beta test programme if you don't want to encounter those problems.

But as it happens, I haven't had a single over-run since they cancelled the one I highlighted in the post before yours.
jiipee
Joined: 4 Jun 15
Posts: 19
Credit: 8,813,058,416
RAC: 78,330
Message 56210 - Posted: 31 Dec 2020, 12:02:22 UTC - in response to Message 56209.  

> Opt out of the Beta test programme if you don't want to encounter those problems.
>
> But as it happens, I haven't had a single over-run since they cancelled the one I highlighted in the post before yours.

Yes, I agree, something has changed.

It looks like the last full-length (otherwise successful) computation on my hosts that produced a too-large output file was WU 26900019. It ended on 29 Dec 2020, 15:00:52 UTC, after 31,056 seconds of run time.
Ian&Steve C.
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Message 56864 - Posted: 7 May 2021, 12:33:53 UTC
Last modified: 7 May 2021, 12:46:38 UTC

I see some new Python tasks have gone out. However, they seem to be erroring for everyone.

https://www.gpugrid.net/results.php?userid=552015&offset=0&show_names=0&state=0&appid=31

They always seem to fail with this "os is not defined" error, and GPU load stays at 0%.

Environment
Traceback (most recent call last):
  File "run.py", line 5, in <module>
    for key, value in os.environ.items():
NameError: name 'os' is not defined
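The traceback shows that run.py uses `os.environ` at line 5 without having imported the `os` module. The script itself isn't shown here, so the following is only a hypothetical Python sketch of an environment dump like the one the log suggests. It assumes that the "Environment" line is that dump's header, and `dump_environment` is an illustrative name, not the project's code. The relevant part is the `import os` at the top.

```python
import os  # without this import, os.environ raises NameError: name 'os' is not defined


def dump_environment(environ=None):
    """Print a header, then each variable as KEY=VALUE (sorted); return the lines."""
    if environ is None:
        environ = os.environ
    lines = [f"{key}={value}" for key, value in sorted(environ.items())]
    print("Environment")
    for line in lines:
        print(line)
    return lines
```

Passing a plain dict, as in `dump_environment({"B": "2", "A": "1"})`, makes the output predictable. Calling it with no argument dumps the real process environment.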

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Message 56865 - Posted: 7 May 2021, 14:14:09 UTC - in response to Message 56864.  
Last modified: 7 May 2021, 14:16:10 UTC

Now seeing this:


==> WARNING: A newer version of conda exists. <==
current version: 4.8.3
latest version: 4.10.1

Please update conda by running

$ conda update -n base -c defaults conda


10:07:30 (341141): /usr/bin/flock exited; CPU time 42.091445
application ./gpugridpy/bin/python missing


And this:

09:57:32 (340085): wrapper (7.7.26016): starting
[input.zip]
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of input.zip or
input.zip.zip, and cannot find input.zip.ZIP, period.
boinc_unzip() error: 9
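The unzip error means input.zip was not a valid zip archive. unzip could not find the end-of-central-directory record that a complete zip file ends with. A truncated or non-zip download would produce exactly this, although the log doesn't say which it was. A minimal Python sketch of checking an archive before extracting it with the standard library's `zipfile` module follows. `check_archive` is a hypothetical helper, not part of the BOINC wrapper.

```python
import io
import zipfile


def check_archive(source):
    """Return None if `source` is a readable zip archive, else a short reason.

    `source` may be a filesystem path or the archive's raw bytes.
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    # is_zipfile() looks for the end-of-central-directory record that unzip
    # complained about; it also returns False for a missing file.
    if not zipfile.is_zipfile(source):
        return "not a zip archive (no end-of-central-directory record found)"
    with zipfile.ZipFile(source) as archive:
        bad_member = archive.testzip()  # first member with a bad CRC, or None
    if bad_member is not None:
        return f"corrupt member: {bad_member}"
    return None
```

Failing early with a specific reason like this makes a bad download distinguishable from a problem on the host. In the log above, the same condition only surfaces as `boinc_unzip() error: 9`.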

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Message 56866 - Posted: 7 May 2021, 14:30:42 UTC - in response to Message 56865.  

Just had my first two successful completions. It doesn't look like they ran any GPU work, though; the GPU was never loaded. They just unpacked the WU, ran the setup and exited, and were marked as complete with no error. They only ran for about 45 seconds.

https://www.gpugrid.net/result.php?resultid=32570561
klepel
Joined: 23 Dec 09
Posts: 189
Credit: 4,798,881,008
RAC: 311
Message 56867 - Posted: 7 May 2021, 15:04:46 UTC - in response to Message 56866.  

> Just had my first two successful completions. It doesn't look like they ran any GPU work, though; the GPU was never loaded. They just unpacked the WU, ran the setup and exited, and were marked as complete with no error. They only ran for about 45 seconds.
>
> https://www.gpugrid.net/result.php?resultid=32570561

Did you have to update conda for the two successful tasks? I received a few new WUs, but they all errored. I won't have access to this computer until tomorrow.
Ian&Steve C.
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Message 56868 - Posted: 7 May 2021, 15:09:16 UTC - in response to Message 56867.  

>> Just had my first two successful completions. It doesn't look like they ran any GPU work, though; the GPU was never loaded. They just unpacked the WU, ran the setup and exited, and were marked as complete with no error. They only ran for about 45 seconds.
>>
>> https://www.gpugrid.net/result.php?resultid=32570561
>
> Did you have to update conda for the two successful tasks? I received a few new WUs, but they all errored. I won't have access to this computer until tomorrow.

I didn't make any changes to my system between the failed and the successful tasks. AFAIK the project ships conda packaged inside these WUs, so it doesn't matter what you have installed; the WU contains everything you should need.

It looks like testrun93-ish and later are OK, but test runs in the 80s and lower all fail with some form of error like the ones I listed above.
jiipee
Joined: 4 Jun 15
Posts: 19
Credit: 8,813,058,416
RAC: 78,330
Message 56874 - Posted: 19 May 2021, 12:03:14 UTC
Last modified: 19 May 2021, 12:05:01 UTC

All of these Python WUs seem to fail. Here are two examples with different problems:

http://www.gpugrid.net/result.php?resultid=32583864

http://www.gpugrid.net/result.php?resultid=32583210
Ian&Steve C.
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Message 56875 - Posted: 19 May 2021, 12:39:26 UTC - in response to Message 56874.  

Some succeed, but very few: out of the 94 Python tasks I've received recently, only 4 succeeded.
Ian&Steve C.
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Message 56876 - Posted: 19 May 2021, 15:11:55 UTC

I see some new tasks going out.

Still broken.

https://www.gpugrid.net/result.php?resultid=32584011

11:06:39 (1387708): /usr/bin/flock exited; CPU time 281.233647
11:06:39 (1387708): wrapper: running ./gpugridpy/bin/python (run.py)
WARNING: ray 1.3.0 does not provide the extra 'debug'
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.4.1 requires flatbuffers~=1.12.0, but you have flatbuffers 20210226132247 which is incompatible.
tensorflow 2.4.1 requires gast==0.3.3, but you have gast 0.4.0 which is incompatible.
tensorflow 2.4.1 requires grpcio~=1.32.0, but you have grpcio 1.36.1 which is incompatible.
tensorflow 2.4.1 requires opt-einsum~=3.3.0, but you have opt-einsum 3.1.0 which is incompatible.
/home/icrum/BOINC/slots/41/gpugridpy/lib/python3.7/site-packages/ray/autoscaler/_private/cli_logger.py:61: FutureWarning: Not all Ray CLI dependencies were found. In Ray 1.4+, the Ray CLI, autoscaler, and dashboard will only be usable via `pip install 'ray[default]'`. Please update your install command.
  "update your install command.", FutureWarning)
Traceback (most recent call last):
  File "run.py", line 296, in <module>
    main()
  File "run.py", line 35, in main
    args = get_args()
  File "run.py", line 283, in get_args
    config_file = open(config_path, 'rt', encoding='utf8')
FileNotFoundError: [Errno 2] No such file or directory: '/home/icrum/BOINC/slots/41/data/conf.yaml'
11:07:04 (1387708): ./gpugridpy/bin/python exited; CPU time 20.831556
11:07:04 (1387708): app exit status: 0x1
11:07:04 (1387708): called boinc_finish(195)
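This time the failure comes from run.py itself. get_args() opens data/conf.yaml inside the slot directory, and that file isn't there. That suggests the task's input data did not include, or did not unpack, the config file, though this is an inference from the traceback rather than something the log confirms. A minimal Python sketch of a check that reports what the directory does contain is below. `read_config` is a hypothetical name, not the project's code. It returns the raw text only, because parsing YAML would need a third-party library.

```python
from pathlib import Path


def read_config(config_path):
    """Return the config file's text, or raise FileNotFoundError with context."""
    path = Path(config_path)
    if not path.is_file():
        # Report what *is* in the directory so a missing input is obvious.
        if path.parent.is_dir():
            present = sorted(p.name for p in path.parent.iterdir())
        else:
            present = "directory does not exist"
        raise FileNotFoundError(f"config not found: {path} (found: {present})")
    return path.read_text(encoding="utf8")
```

A bare `open()` gives a plain `Errno 2`, which looks the same whether the file or the whole `data/` directory is missing. Listing the directory contents in the error separates those two cases.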

ServicEnginIC
Joined: 24 Sep 10
Posts: 592
Credit: 11,972,186,510
RAC: 998,578
Message 56878 - Posted: 19 May 2021, 17:57:38 UTC - in response to Message 56875.  

> Some succeed, but very few: out of the 94 Python tasks I've received recently, only 4 succeeded.

65 received / 64 errored / 1 successful is my current balance.

©2025 Universitat Pompeu Fabra