Experimental Python tasks (beta) - task description

bozz4science
Message 59461 - Posted: 17 Oct 2022, 16:36:51 UTC
Last modified: 17 Oct 2022, 16:37:31 UTC

I have seen tasks failing continuously since today. According to the stderr_txt file, I reckon there are at least two, possibly related, errors.

  File "C:\ProgramData\BOINC\slots\5\python_dependencies\buffer.py", line 794, in insert_transition
    state_embeds = [i["StateEmbeddings"] for i in sample[prl.INFO]]
  File "C:\ProgramData\BOINC\slots\5\python_dependencies\buffer.py", line 794, in <listcomp>
    state_embeds = [i["StateEmbeddings"] for i in sample[prl.INFO]]
KeyError: 'StateEmbeddings'
Traceback (most recent call last):
  File "C:\ProgramData\BOINC\slots\5\lib\site-packages\pytorchrl\scheme\gradients\g_worker.py", line 196, in get_data
    self.next_batch = self.batches.__next__()
AttributeError: 'GWorker' object has no attribute 'batches'


    *KeyError: 'StateEmbeddings'
    *AttributeError: 'GWorker' object has no attribute 'batches'
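
One way the two errors could be related is sketched below. This is only an illustration under assumptions: `ToyWorker`, `prepare_batches` and the sample dictionary are hypothetical names, not the real pytorchrl code. The idea is that if the `KeyError` is raised before `self.batches` is ever assigned, any later access to `self.batches` fails with an `AttributeError`.

```python
class ToyWorker:
    """Hypothetical stand-in for a worker; not the real pytorchrl GWorker."""

    def prepare_batches(self, infos):
        # A missing key raises KeyError here, so the assignment below never runs.
        state_embeds = [info["StateEmbeddings"] for info in infos]
        self.batches = iter(state_embeds)

    def get_data(self):
        # Fails with AttributeError if prepare_batches never completed.
        return next(self.batches)


worker = ToyWorker()
errors = []

try:
    worker.prepare_batches([{"Reward": 1.0}])  # sample without "StateEmbeddings"
except KeyError as exc:
    errors.append(f"KeyError: {exc}")

try:
    worker.get_data()
except AttributeError as exc:
    errors.append(f"AttributeError: {exc}")

print("\n".join(errors))
```

If that is what happens in the failing tasks, the `AttributeError` would be a follow-on symptom and the `KeyError` the root cause.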

Erich56
Message 59462 - Posted: 17 Oct 2022, 17:00:22 UTC - in response to Message 59461.  

*KeyError: 'StateEmbeddings'
*AttributeError: 'GWorker' object has no attribute 'batches'

exactly the same thing I notice on all my failed tasks.

GS
Message 59463 - Posted: 17 Oct 2022, 17:12:31 UTC

Same here.

AttributeError: 'GWorker' object has no attribute 'batches'

mrchips
Message 59464 - Posted: 17 Oct 2022, 17:39:44 UTC

my latest WU ended with a computation error

Keith Myers
Message 59465 - Posted: 17 Oct 2022, 17:49:13 UTC - in response to Message 59460.  

Your first task link shows 4 attempts at retrieving the necessary python libraries and failing.

But instead of just stopping right there, it looks like it tried to compute anyway with the missing 'batches' library, and all the subsequent tasks also failed because of the missing batches element.

It seems that the error flow is not branching to a proper halt early enough in the task to stop the computation and avoid wasting any more time.
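
A minimal sketch of the kind of early halt described above, assuming the dependency retrieval can be wrapped in a callable. The names `fetch_with_retries`, `run_task`, `fetch` and `compute` are hypothetical and do not come from the GPUGRID application; the default of four attempts mirrors the four attempts seen in the task log.

```python
def fetch_with_retries(fetch, attempts=4):
    """Call fetch() up to `attempts` times; raise RuntimeError if all fail."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except OSError as exc:
            last_error = exc
            print(f"attempt {attempt}/{attempts} failed: {exc}")
    raise RuntimeError(
        f"dependencies unavailable after {attempts} attempts"
    ) from last_error


def run_task(fetch, compute):
    """Return 0 on success, 1 if the task halted before computing."""
    try:
        deps = fetch_with_retries(fetch)
    except RuntimeError as exc:
        # Stop here instead of computing with missing dependencies.
        print(f"halting before compute: {exc}")
        return 1
    compute(deps)
    return 0
```

With this shape, a task whose downloads all fail returns a non-zero status right away and the compute step is never entered.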

KAMasud
Message 59466 - Posted: 17 Oct 2022, 17:57:48 UTC
Last modified: 17 Oct 2022, 18:11:29 UTC

Six tasks, all in a row. Errored out. Seven now and another in the works.

Erich56
Message 59467 - Posted: 17 Oct 2022, 18:16:48 UTC

now the same problem on another host :-(
https://www.gpugrid.net/result.php?resultid=33101249

so, as seen by other members, too: all tasks which were downloaded within the past several hours seem to be faulty.

GS
Message 59468 - Posted: 17 Oct 2022, 19:31:40 UTC

I joined yesterday and have had 13 tasks fail in a row, all with the
AttributeError: 'GWorker' object has no attribute 'batches'.

Is this a failed installation? Should I try to reinstall this BOINC project from scratch?

Erich56
Message 59469 - Posted: 17 Oct 2022, 19:39:44 UTC - in response to Message 59468.  


Is this a failed installation? Should I try to reinstall this BOINC project from scratch?

in view of what was said above, the current tasks are probably faulty.
No need to reinstall, I guess

Richard Haselgrove
Message 59470 - Posted: 17 Oct 2022, 21:09:31 UTC

Yes - just received and returned result 33101290, on a machine which regularly returns good results.

That was replication _6 of a WU which everyone else had failed - a sure sign that the problem was with the workunit, not the host processing it.
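
The reasoning can be written down as a small heuristic. This is a sketch only: `likely_bad_workunit`, the `(host_id, succeeded)` pairs and the `min_hosts` threshold are hypothetical and not part of BOINC. It also assumes replications are numbered from _0, so that _6 is the seventh attempt.

```python
def likely_bad_workunit(results, min_hosts=3):
    """results: list of (host_id, succeeded) pairs for one workunit.

    If every replica failed, and the failures come from several different
    hosts, the workunit is a more likely culprit than any single machine.
    """
    hosts = {host for host, _ in results}
    all_failed = all(not succeeded for _, succeeded in results)
    return all_failed and len(hosts) >= min_hosts


# Seven replicas (_0 to _6), each on a different host, all failed.
replicas = [(f"host{i}", False) for i in range(7)]
print(likely_bad_workunit(replicas))
```

A single success among the replicas would instead point towards host-specific problems on the machines that failed.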

KAMasud
Message 59471 - Posted: 18 Oct 2022, 5:33:13 UTC

Forty-six failed WUs? Please stop sending them until the problem is resolved.

Erich56
Message 59472 - Posted: 18 Oct 2022, 6:40:14 UTC - in response to Message 59471.  

Forty-six failed WUs? Please stop sending them until the problem is resolved.

+ 1

KAMasud
Message 59473 - Posted: 18 Oct 2022, 7:14:37 UTC - in response to Message 59472.  

Forty-six failed WUs? Please stop sending them until the problem is resolved.

+ 1


Sorry. After writing the post I looked at the other computer and it had downloaded another. It lasted three minutes or so. It was still in the unzipping process. I cannot understand the txt files, so could someone who can please check the files to see what is going on?

GS
Message 59474 - Posted: 18 Oct 2022, 7:26:12 UTC - in response to Message 59472.  

+1

33 fails in a row. I'll set this project to inactive and wait for a solution.

Erich56
Message 59475 - Posted: 18 Oct 2022, 7:28:58 UTC - in response to Message 59473.  

I cannot understand the txt files, so could someone who can please check the files to see what is going on?

the tasks are wrongly configured. Don't download them for the time being.
I guess we will get some kind of "go ahead" here once the problem is solved on the project-side.

abouh
Message 59477 - Posted: 18 Oct 2022, 7:44:11 UTC
Last modified: 18 Oct 2022, 7:54:40 UTC

Hello, thank you for reporting the job errors. Sorry to all, there was an error on my side when setting up a batch of experiment agents. The error is due to the specific Python script of this batch and is not related to the application itself. I have just fixed it, and the new jobs should be running correctly. Unfortunately, some already submitted jobs are bound to fail… I apologise for the inconvenience. They will fail shortly after starting, as reported, so not a lot of compute will be wasted.

Erich56
Message 59478 - Posted: 18 Oct 2022, 9:00:27 UTC

abouh,

could you also please make an adjustment (downwards) to the free disk space requirement of 33GB when downloading a Python task?

see my above Message 59449.

Many thanks :-)

abouh
Message 59479 - Posted: 18 Oct 2022, 9:21:50 UTC - in response to Message 59478.  

Hello! I have checked and the disk space used by the jobs is set to 35e9 bytes.

<rsc_disk_bound>35e9</rsc_disk_bound>


I will change it first to 20e9, let me know if it helps. I can decrease it further in the future if necessary.
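
For reference, the two bounds can be converted to binary gigabytes and compared against the free space on a host. This is a rough sketch: the helper names `to_gib` and `has_room` are made up for illustration, and it assumes the client compares the bound against free space on the drive holding the BOINC data directory.

```python
import shutil

GIB = 1024 ** 3  # bytes per binary gigabyte


def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / GIB


def has_room(path, bound_bytes):
    """True if the filesystem containing `path` has at least bound_bytes free."""
    return shutil.disk_usage(path).free >= bound_bytes


for bound in (35e9, 20e9):
    print(f"{bound:,.0f} bytes = {to_gib(bound):.1f} GiB")
```

35e9 bytes is about 32.6 GiB, which may be why the requirement shows up as roughly 33GB, while 20e9 bytes is about 18.6 GiB.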


Erich56
Message 59480 - Posted: 18 Oct 2022, 9:36:09 UTC - in response to Message 59479.  

Hello! I have checked and the disk space used by the jobs is set to 35e9 bytes.

<rsc_disk_bound>35e9</rsc_disk_bound>


I will change it first to 20e9, let me know if it helps. I can decrease it further in the future if necessary.


Thanks, Abouh, for your quick reaction. The change will definitely help - at least in my case with limited disk space due to Ramdisk.

GS
Message 59481 - Posted: 18 Oct 2022, 19:52:56 UTC - in response to Message 59477.  

Hello, thank you for reporting the job errors. Sorry to all, there was an error [...] I have just fixed it, and the new jobs should be running correctly. Unfortunately, some already submitted jobs are bound to fail…


The problem is not fixed, I still get tasks that fail:
AttributeError: 'GWorker' object has no attribute 'batches'

©2025 Universitat Pompeu Fabra