Message boards : News : ATM
| Author | Message |
|---|---|
|
Joined: 19 Aug 07 Posts: 46 Credit: 45,339,082 RAC: 28
|
I totally agree. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 261
|
> Quico has said multiple times that he doesn't know how to fix it (the runtime/% and checkpointing). complaining more won't get it fixed. at this point, it's your own choice to run this or not. if you don't like it, don't do it.

Quico is a research scientist, and at least he communicates with us (thank you). I wouldn't expect him to be an expert in project administration. That's why my comment was explicitly directed at the (silent) administrators. |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
> That's why my comment was explicitly directed at the (silent) administrators.

Yes, they are very silent; and obviously they don't care whether or not we volunteers are confronted with annoyingly faulty tasks :-( |
|
Joined: 13 Apr 15 Posts: 11 Credit: 3,003,712,606 RAC: 1,912
|
> Quico has said multiple times that he doesn't know how to fix it (the runtime/% and checkpointing). complaining more won't get it fixed. at this point, it's your own choice to run this or not. if you don't like it, don't do it.

Yes, exactly. My comments are about the Admins/Devs, not Quico. And as Richard has said, at least he communicates with us and does what he can. It's a shame the others can't, or won't. |
Stoneageman Joined: 25 May 09 Posts: 224 Credit: 34,057,374,498 RAC: 0
|
As I understand it, GPUgrid is now just one of several projects under the computational science lab and the developers are mostly involved with Acellera |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
The next problem I have been faced with for several days: the download of a task takes forever. Speed is about 10 kB/s :-( This is nothing new, though. I remember that this kind of server problem comes up on a pretty regular basis :-( |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
> the next problem I have been faced with for several days: the download of a task takes forever. Speed is about 10 kB/s :-(

Right now, the download of a task has been running for 1:40 hrs so far and the progress is about 55 %. That's ridiculous :-( What's going on at GPUGRID? Are the servers breaking down? |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 543
|
Lots of tasks going out to hosts and lots of results returning. Network speed has decreased under the increased congestion. We've seen this before when we had tons of acemd3 work. |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
> Lots of tasks going out to hosts and lots of results returning.

Currently only 171 users are receiving and sending tasks, with several hours between receiving and sending. So we are definitely not talking about outrageously high network traffic. Something seems to be wrong with their servers. |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 543
|
The download times you mentioned are very long and not at all what I am experiencing. Don't know if your network connection speed is very slow or whether your ISP is having issues routing your traffic from the project to you. My download speeds are mainly in the range of 50-100 Mb/s according to the Transfers page in the Manager when I download new work. |
|
Joined: 11 May 10 Posts: 68 Credit: 12,293,503,875 RAC: 3,253
|
I had previously reported that ATMbeta fails after about 40 seconds on my RTX4080 under Windows 11, while I see other users getting valid results on different RTX40x0s. Yesterday I installed Linux on the same machine and ATMbeta delivered valid results. The known bugs can again be observed, of course: Energy is NaN (some WU), progress bar jumps to 100% (except the 0-5 units), no checkpoints. So ATMbeta seems to have a particular problem on Windows. |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
> The download times you mentioned are very long and not at all what I am experiencing.

Here, some downloads get done rather quickly, some others take forever, and sometimes they error out after a long time. STDERR then says the following:

    <message>
    WU download error: couldn't get input files:
    <file_xfer_error>
    <file_name>cmet_m16_m20_3-QUICO_ATM_Mck_GAFF2_v4-3-cmet_m16_m20_3-QUICO_ATM_Mck_GAFF2_v4-2-5-RND1222_1</file_name>
    <error_code>-119 (md5 checksum failed for file)</error_code>
    </file_xfer_error>
    </message>

The download speed of my ISP is 300 Mbit/s, which normally works well as long as the download server at the other end has no problems. |
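For anyone who wants to check by hand whether a slow download arrived damaged, which is what the "md5 checksum failed" error above is about, here is a minimal Python sketch. It only shows the general idea of an MD5 comparison; the helper names are made up for this example, and how to obtain the expected checksum for a GPUGRID input file is not covered here and is left as an assumption.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_is_intact(path, expected_md5):
    """Compare the file's MD5 with an expected value (hypothetical helper).

    The expected value is assumed to come from a trusted source;
    case and surrounding whitespace are ignored.
    """
    return md5_of_file(path) == expected_md5.strip().lower()
```

A mismatch only says the local copy differs from what was expected; it does not say whether the server, the route, or the local disk is to blame.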
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
Since yesterday, I face a new problem: a task fails after some time, but there is no stderr, so one cannot see what the problem was. Example: https://www.gpugrid.net/result.php?resultid=33612295 The task failed after 2.731 seconds. |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
> The download times you mentioned are very long and not at all what I am experiencing.

Here is an example of the download problem which I keep facing: https://www.gpugrid.net/result.php?resultid=33613115

Created 3 Sep 2023 | 11:20:22 UTC
Sent 3 Sep 2023 | 12:29:54 UTC
Received 3 Sep 2023 | 12:43:06 UTC

Since the download still had not finished after almost 70 minutes, it broke off :-( I think GPUGRID needs to work on their servers quickly. |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 543
|
What do you have for transfers in your cc_config.xml file? Just the basic 2 connections? Any rate limiting? I think the default should be 8 connections per project and 32 per host. Especially if there are other BOINC projects running besides GPUGrid. |
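To see what a given cc_config.xml actually sets for transfers, a small Python sketch like the one below can read it. It assumes the two option names `max_file_xfers` and `max_file_xfers_per_project` inside `<options>`, and it assumes client defaults of 8 and 2 when they are absent; both assumptions should be checked against the BOINC client configuration documentation for your version. The example XML and the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed client defaults when an option is not present (verify for your BOINC version).
ASSUMED_DEFAULTS = {"max_file_xfers": 8, "max_file_xfers_per_project": 2}


def transfer_limits(cc_config_xml):
    """Return the transfer limits found in a cc_config.xml string."""
    root = ET.fromstring(cc_config_xml)  # root element is <cc_config>
    limits = dict(ASSUMED_DEFAULTS)
    for name in ASSUMED_DEFAULTS:
        node = root.find(f"./options/{name}")
        if node is not None and node.text and node.text.strip():
            limits[name] = int(node.text.strip())
    return limits


# Hypothetical config matching the "8 per project and 32 per host" suggestion.
EXAMPLE = """<cc_config>
  <options>
    <max_file_xfers>32</max_file_xfers>
    <max_file_xfers_per_project>8</max_file_xfers_per_project>
  </options>
</cc_config>"""

print(transfer_limits(EXAMPLE))
```

To use it on a real host, read the contents of cc_config.xml from the BOINC data directory and pass the text to `transfer_limits`.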
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
> What do you have for transfers in your cc_config.xml file?

It's 8 connections per project, and the downloads are getting even worse now. Several times now downloads have stopped after proceeding extremely slowly, with "download failed" in the BOINC manager :-( |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
the BOINC event log keeps saying: project servers may be temporarily down. And the pending upload jumps to "repeat in ... hours" immediately. So there is definitely something wrong with the servers over there. P.S.: even sending this posting out took almost 1 minute |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 543
|
Still believe the issue is local to you. In all the time you have been reporting issues with the downloads, I have not experienced any issues or backoffs. The project is working normally for me, though the speeds have degraded from what I experienced a month or so ago. Still no issues keeping all the hosts crunching the ATMbeta tasks. |
Stoneageman Joined: 25 May 09 Posts: 224 Credit: 34,057,374,498 RAC: 0
|
I regularly get backoffs on transfers, so it's not just you. Their server has issues for sure. I have to use a different IP address than that which my hosts use, just to access this site.

Some ideas:

- Reboot your router at least every 24 hrs. If you are not on a fixed IP, this is likely to get you a new IP address. This helps me greatly with maintaining good transfer speeds.
- Use a VPN and set the location as Spain.
- Use a script on each host to keep tickling their server, such as:

      :top
      "C:\Program Files\BOINC\boinccmd" --host 127.0.0.1:31416 --passwd "yourpasswordhere" --network_available
      TIMEOUT /T 300
      goto top

Create a text file with this script. Edit to suit your install. Save as a batch file, then double-click to run it. |
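For hosts where a batch file is not an option (Linux, for instance), roughly the same loop can be sketched in Python. This is only an illustration of the script above, not something taken from the project: the boinccmd path, host, and password are placeholders to edit, the function names are invented, and the `run`/`sleep` parameters exist only so the loop can be tried without a real client.

```python
import subprocess
import time

# Placeholder path from the batch script; on Linux this is usually just "boinccmd".
BOINCCMD = r"C:\Program Files\BOINC\boinccmd"


def build_command(password, host="127.0.0.1:31416", boinccmd=BOINCCMD):
    """Build the boinccmd call that asks the client to retry deferred transfers."""
    return [boinccmd, "--host", host, "--passwd", password, "--network_available"]


def keep_retrying(password, interval_seconds=300, max_rounds=None,
                  run=subprocess.run, sleep=time.sleep):
    """Call boinccmd every interval_seconds; loops forever unless max_rounds is set."""
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        try:
            run(build_command(password), check=False)
        except FileNotFoundError:
            print("boinccmd not found; edit BOINCCMD to match your install")
        rounds += 1
        sleep(interval_seconds)


# Usage (runs until interrupted with Ctrl+C):
# keep_retrying("yourpasswordhere")
```

The 300-second interval mirrors `TIMEOUT /T 300` in the batch version.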
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 3,915
|
i started processing ATM again on my known stable host (Linux Ubuntu LTS, EPYC + 4x A4000). out of 160 tasks that have processed, 9 had an error (this excludes 9 tasks that had download errors and never wasted any processing time; download errors are just a "cost of doing business" with GPUGRID, IMO). that's roughly a 5% error rate, and reasonable IMO. yeah, some failed after a decent processing time, but I'm not gonna get upset about it since the vast majority of tasks that touch my system complete successfully. if anyone is having a significantly higher error rate, you might need to look into the stability of the system itself, or switch to Linux, or re-examine how you are operating (don't stop the tasks for any reason if you can help it, don't reboot, don't run other projects, etc), or any combination of the three. when set up properly and accounting for project-specific idiosyncrasies, these tasks mostly run fine.
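As a quick check of the figure above, the error rate is just failed tasks over processed tasks. The sketch below assumes the 160 processed tasks do not include the 9 download errors, as the post describes; the function name is made up for this example.

```python
def error_rate(failed, total):
    """Fraction of processed tasks that ended in an error."""
    if total <= 0:
        raise ValueError("total must be positive")
    return failed / total


# Numbers from the post: 9 compute errors out of 160 processed tasks.
rate = error_rate(9, 160)
print(f"{rate:.3%}")
```

That comes out a little under 6%, in line with the "roughly 5%" quoted.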
|
©2025 Universitat Pompeu Fabra