ATM

KAMasud

Joined: 27 Jul 11
Posts: 138
Credit: 539,953,398
RAC: 0
Message 60373 - Posted: 6 May 2023, 16:27:29 UTC - in response to Message 60372.  

atmbeta likely has nothing to do with it.

but ATMbeta uses CUDA, Einstein uses OpenCL. does BOINC still report OpenCL support in the startup log? you might need to reinstall your drivers.

_____________________________

Thank you. I have just finished reinstalling Windows and now the drivers.
KAMasud

Joined: 27 Jul 11
Posts: 138
Credit: 539,953,398
RAC: 0
Message 60375 - Posted: 8 May 2023, 3:47:12 UTC

Clean Windows install and drivers install.
task 27488709
I am at my wit's end. Now what's wrong?
Quico
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 28 Feb 23
Posts: 35
Credit: 0
RAC: 0
Message 60376 - Posted: 8 May 2023, 10:34:26 UTC - in response to Message 60358.  

And a similar batch configuration error with today's BACE run, like

BACE_m24_m7e_5-QUICO_ATM_Sage_xTB-0-5-RND7993_0

08:05:32 (386384): wrapper: running bin/bash (run.sh)
bin/bash: run.sh: No such file or directory

(five so far)

Edit - now wasted 20 of the things, and switched to Python to avoid quota errors. I should have dropped in to give you a hand when passing through Barcelona at the weekend!


Yes, big mess-up on my end. More painful since it happened to two of the sets with the most runs. I just forgot to run the script that copies the run.sh and run.bat files to the batch folders. It happened to 2/8 batches but yeah, big whoop. Apologies for that. The "fixed" runs should be sent soon. The "missing *0.xml" errors should not happen anymore either.
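
A pre-submission check along the following lines could catch a batch folder that is missing its launcher scripts. This is only a sketch: the one-folder-per-batch layout and the `missing_launchers` helper are assumptions made for illustration, not the project's actual tooling. Only the file names run.sh and run.bat come from the post above.

```python
from pathlib import Path

# File names taken from the post; the folder layout is a hypothetical assumption.
REQUIRED = ("run.sh", "run.bat")


def missing_launchers(batch_root):
    """Map each batch folder name to the launcher scripts it lacks."""
    problems = {}
    for folder in sorted(Path(batch_root).iterdir()):
        if not folder.is_dir():
            continue
        absent = [name for name in REQUIRED if not (folder / name).is_file()]
        if absent:
            problems[folder.name] = absent
    return problems
```

An empty result would mean every batch folder has both scripts; anything else lists the folders to fix before the batch is sent out.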

Regarding checkpointing, I at least cannot do much more than pass the message along, which I have done several times.

Again, sorry for this. I can understand it to be very annoying.
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 351
Message 60378 - Posted: 8 May 2023, 10:52:27 UTC - in response to Message 60376.  

Thanks for reporting back.

The good news is that task BACE_m4m_m4n_3_FIX-QUICO_ATM_Sage_xTB-0-5-RND4596_0 is running as it should.
ServicEnginIC

Joined: 24 Sep 10
Posts: 592
Credit: 11,972,186,510
RAC: 1,187
Message 60379 - Posted: 8 May 2023, 20:00:41 UTC

Extrapolated execution times for several of my currently running "BACE_" and "MCL1_" WUs point to longer runs than in previous batches.
I hope this doesn't lead to result files bigger than the server can handle...
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 351
Message 60380 - Posted: 8 May 2023, 20:11:32 UTC

Agreed. My first BACE of the current batch ran for 20 minutes per sample, compared with previous batches which ran at speeds down as low as 5 minutes per sample. It's touch and go whether they will complete within 24 hours (GTX 1660 Ti/super).

But in spite of the apparent speed, it finished and was accepted after 7 hours, as before.
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 60381 - Posted: 9 May 2023, 3:44:44 UTC - in response to Message 60379.  
Last modified: 9 May 2023, 3:45:30 UTC

Extrapolated execution times for several of my currently running "BACE_" and "MCL1_" WUs point to longer runs than in previous batches.
I hope this doesn't lead to result files bigger than the server can handle...

I am afraid I am now confronted with just such a case: the file has a size of 719 MB, and it does not upload, it just keeps backing off :-(
WTF is this? Did it run 15 hours on an RTX 3070 for nothing?
Keith Myers

Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 731
Message 60382 - Posted: 9 May 2023, 7:05:23 UTC

I'm in the same unfortunate situation with too large an upload.

GPUGRID BACE_m7i_m7a_3_FIX-QUICO_ATM_Sage_xTB-0-5-RND9648_1_0 0.000 736531.62 K 00:00:22 - 01:53:11 0.00 Kbps Upload pending (Retry in: 01:14:27), retried: 6 Numbskull
Stoneageman

Joined: 25 May 09
Posts: 224
Credit: 34,057,374,498
RAC: 0
Message 60383 - Posted: 9 May 2023, 8:47:07 UTC

I now have 15 BACE tasks backed up because the server is not accepting the file size.

Will this be fixed or should all BACE tasks be aborted?
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 351
Message 60384 - Posted: 9 May 2023, 9:04:17 UTC - in response to Message 60383.  

I'd hang on to them for a day or two - it can be fixed, if the right person pulls their finger out.

I have one heading that way, and I'll post the debug messages when it's cooked. At the moment, my suspicion is the Apache configuration, but we need proof.
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 60385 - Posted: 9 May 2023, 11:23:44 UTC - in response to Message 60384.  

I'd hang on to them for a day or two - it can be fixed, if the right person pulls their finger out.

Although I doubt that this will happen :-(
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 5,269
Message 60386 - Posted: 9 May 2023, 11:58:44 UTC

in previous instances of this problem you could abort the large upload and the task would still report fine, and you'd still get credit most of the time.
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 351
Message 60387 - Posted: 9 May 2023, 12:23:26 UTC - in response to Message 60386.  

I'd like to think that all that bandwidth carries something of value to the researchers - that would be the main point of it.
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 5,269
Message 60388 - Posted: 9 May 2023, 12:37:21 UTC - in response to Message 60387.  

i thought one of the researchers said they don't need this file.

but if you don't ever upload it they won't get anything and you won't get credit anyway. not really anything to lose.

they've never been able to raise this limit on their server.
Stoneageman

Joined: 25 May 09
Posts: 224
Credit: 34,057,374,498
RAC: 0
Message 60389 - Posted: 9 May 2023, 12:57:29 UTC

I just aborted such a completed task which then showed as ready to report. Reported it, but zero credit :-(
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 351
Message 60390 - Posted: 9 May 2023, 13:02:56 UTC - in response to Message 60388.  

i thought one of the researchers said they don't need this file.

Perhaps Quico could confirm that, since we seem to have his attention?
Stoneageman

Joined: 25 May 09
Posts: 224
Credit: 34,057,374,498
RAC: 0
Message 60391 - Posted: 9 May 2023, 13:27:07 UTC

As some compensation for the BACE tasks, I'm seeing the MCL1 sage seasoned tasks reporting after just a minute running and yet getting full credit :-)
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 351
Message 60392 - Posted: 9 May 2023, 14:21:24 UTC

OK, confirmed - it is still the Apache problem.

Tue 09 May 2023 15:06:15 BST | GPUGRID | [http] [ID#15383] Received header from server: HTTP/1.1 413 Request Entity Too Large
Tue 09 May 2023 15:06:15 BST | GPUGRID | [http] [ID#15383] Received header from server: Date: Tue, 09 May 2023 14:06:15 GMT
Tue 09 May 2023 15:06:15 BST | GPUGRID | [http] [ID#15383] Received header from server: Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips mod_auth_gssapi/1.5.1 mod_auth_kerb/5.4 mod_fcgid/2.3.9 PHP/5.4.16 mod_wsgi/3.4 Python/2.7.5
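
HTTP 413 means the server refused the request because its body was larger than the server is willing to accept. Under Apache that limit is commonly set with the `LimitRequestBody` directive, though which setting is responsible on this particular server is not established in this thread. As a rough sketch, the status code can be pulled out of `[http]` debug lines like the ones above; the helper names here are hypothetical and the pattern only assumes the line format shown in the log excerpt.

```python
import re

# Matches the status line as it appears in the [http] debug lines quoted above.
STATUS_RE = re.compile(r"Received header from server: HTTP/\d(?:\.\d)? (\d{3})")


def upload_status_codes(log_lines):
    """Return the HTTP status codes found in BOINC [http] debug lines."""
    codes = []
    for line in log_lines:
        match = STATUS_RE.search(line)
        if match:
            codes.append(int(match.group(1)))
    return codes


def rejected_as_too_large(log_lines):
    """True if any response carried HTTP 413 (request body too large)."""
    return 413 in upload_status_codes(log_lines)
```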

File (the larger of two) is 754.1 MB (Linux decimal), 719.15 MB (Boinc binary).
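
The two figures describe the same file in different units: decimal megabytes (10^6 bytes) and binary megabytes (2^20 bytes). A minimal sketch of the conversion, with function names chosen only for illustration:

```python
BYTES_PER_DECIMAL_MB = 1_000_000      # 10**6, the "Linux decimal" unit above
BYTES_PER_BINARY_MB = 1024 * 1024     # 2**20, the "Boinc binary" unit above


def decimal_mb_to_binary_mb(size_mb):
    """Convert a size in decimal MB to binary MB."""
    return size_mb * BYTES_PER_DECIMAL_MB / BYTES_PER_BINARY_MB


def binary_kb_to_decimal_mb(size_kb):
    """Convert a size in binary KB (1024 bytes) to decimal MB."""
    return size_kb * 1024 / BYTES_PER_DECIMAL_MB
```

Converting the rounded 754.1 MB figure gives roughly 719.2 binary MB, which agrees with the 719.15 MB BOINC reports once the rounding of the input is allowed for.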

At this end, we have two choices:

1) Abort the data transfer, as Ian suggests.
2) Wait 90 days for somebody to find the key to the server closet.

Quico?
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 60393 - Posted: 9 May 2023, 14:41:54 UTC

Is this problem only with "BACE..." tasks, or has anyone seen it with other types of task as well?
Stoneageman

Joined: 25 May 09
Posts: 224
Credit: 34,057,374,498
RAC: 0
Message 60395 - Posted: 9 May 2023, 16:28:52 UTC

Just BACE tasks affected. I've now aborted 14 tasks and half were credited.

©2025 Universitat Pompeu Fabra