NO TASKS AVAILABLE
| Author | Message |
|---|---|
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
You're welcome! |
|
Joined: 1 Jan 15 Posts: 1171 Credit: 12,662,148,501 RAC: 1,014,572
|
It is really remarkable how quickly the roughly 15,000 WUs available in the second half of last week were downloaded (even with the 2-day server outage). Right now, there are no tasks available any more :-( |
|
Joined: 21 Mar 16 Posts: 513 Credit: 4,673,458,277 RAC: 0
|
It appears that they removed the 4500+ ubiquitin WUs that were in the queue. Perhaps there was something wrong with them. Remember, people, this is real science, not just mindless work for your computer. |
|
Joined: 1 Jan 15 Posts: 1171 Credit: 12,662,148,501 RAC: 1,014,572
|
A few minutes ago, three SDOERR_BNB tasks that were running on three of my hosts were "aborted by server". Why so? Funny things have been happening at GPUGRID lately :-( |
|
Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0
|
Quoted: "A few minutes ago, three SDOERR_BNB tasks which were running on three of my hosts were 'aborted by server'." Me too. I had one that had run for 46,000 seconds and was cancelled by the server. I wonder if we will get an explanation of why thousands of WUs have been withdrawn and running WUs cancelled. https://gpugrid.net/workunit.php?wuid=12273672 |
|
Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 |
I am really, really sorry everyone. http://gpugrid.net/forum_thread.php?id=4488#46125 |
|
Joined: 17 Feb 13 Posts: 181 Credit: 144,871,276 RAC: 0
|
I cannot get work because the file cufft32_65.dll will not finish downloading. John |
|
Joined: 17 Feb 13 Posts: 181 Credit: 144,871,276 RAC: 0
|
Resolved: it was a communications issue at my end. John |
|
Joined: 1 Jan 15 Posts: 1171 Credit: 12,662,148,501 RAC: 1,014,572
|
At this moment, the project status page shows 290 BNBS WUs left for download and no other WUs available. These 290 WUs will be used up within the next few hours. I am wondering if new ones will be made available shortly. |
|
Joined: 1 Jan 15 Posts: 1171 Credit: 12,662,148,501 RAC: 1,014,572
|
Quoted: "... I am wondering if new ones will be made available shortly." Thanks, Stefan, for feeding us with more WUs in the meantime :-) |
|
Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 |
BNBS2 simulations are chained (i.e. when one completes, it sends the next step of the simulation), so I don't expect to run out yet. Next week...maybe?...hopefully :D? You are totally wrecking it though; I can barely keep up with retrieving and processing them, haha. 3600 simulations running in parallel is pretty nuts. |
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
Quoted: "3600 simulations running in parallel is pretty nuts." That's the point of grid computing :) That's the power of a large community. |
|
Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0
|
Quoted: "BNBS2 simulations are chained (i.e. when one completes it sends the next step of the simulation) so I don't expect to run out yet. Next week...maybe?...hopefully :D? You are totally wrecking it though, I can barely keep up with retrieving and processing them, haha. 3600 simulations running in parallel is pretty nuts." You have to remember that most people cache WUs, so apart from those crunchers who actually run 2 at a time on a single GPU, you don't have 3600 running. Then there are those who will hold on to WUs for 5 days before they are resent. That is not a problem while you still have enough to send, but it becomes a slowdown when you haven't and are waiting for the last ones to return. I have one such WU here, which is still at the first part of the 5-part chain: http://www.gpugrid.net/workunit.php?wuid=12275674 |
|
Joined: 21 Mar 16 Posts: 513 Credit: 4,673,458,277 RAC: 0
|
I think the admins should have the ability to remotely abort a WU if the user is not computing it or it's taking too long, if they don't already have this ability. |
|
Joined: 21 Mar 16 Posts: 513 Credit: 4,673,458,277 RAC: 0
|
0 unsent again. Does this mean there are too many being held, or does this mean it's over? |
Logan Carr Joined: 12 Aug 15 Posts: 240 Credit: 64,069,811 RAC: 0
|
It looks like it's over for now, since there is only a single project on the list. I wouldn't be surprised if there are new ones coming out soon. Is it usual for GPUGRID to do things like this, in phases: lots of work, no work, lots of work, no work, etc.? I am not sure because I haven't been around much. Maybe someone can tell me if this is normal or not? I'd appreciate it. Cruncher/Learner in progress. |
Logan Carr Joined: 12 Aug 15 Posts: 240 Credit: 64,069,811 RAC: 0
|
Quoted: "It looks like it's over for now due to there only being a single project on the list. I wouldn't be surprised if there are new ones coming out soon." OK, scratch that. I just got a task for my system. Try to leave yours on and you'll get one like I did. Cruncher/Learner in progress. |
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
Quoted: "It looks like it's over for now due to there only being a single project on the list. I wouldn't be surprised if there are new ones coming out soon." Yes.
Quoted: "In phases as in lots of work, no work, lots of work, no work, etc." Even though the server status shows 0~3 unsent workunits, there is work as long as the number of workunits in progress is higher than the number of active hosts (2000~2500), because new workunits are generated from finished workunits (but each is instantly grabbed by some hungry host, hence there are 0 unsent in these periods).
Quoted: "Maybe someone can tell me if this is normal or not? I'd appreciate it." To understand this a little deeper, you should understand the naming architecture of the workunits. For example: ARG59ALA_S14F21_C2-SDOERR_BNBS2-0-4-RND9058_2
The main parts of the name are separated by dashes ("-"). Let's split this name by them:
1. The name (ID) of the molecules involved
2. The scientist's name, and the batch name
3. The actual number of this workunit in the chain (starts at 0)
4. The final number in the chain
5. The random seed number for the given chain, and the actual number of resends.
Since the chain starts at 0, the number of workunits in the chain is 1 more than the number in position 4. The last workunit in the chain has the same numbers in positions 3 and 4.
Quoted: "Ok scratch that. I just got a task for my system. Try to leave yours on and you'll get one like I did." While the number of unsent is at 0, we'll receive new workunits with decreasing probability until the number in position 3 reaches the number in position 4. After that, we'll receive workunits only when they have timed out (after sitting in the queue on slow, overwhelmed, or inactive hosts). |
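The naming scheme described above can be sketched in Python. This is only an illustration, not GPUGRID's own code: the names `WorkunitName` and `parse_workunit_name` are hypothetical, and the sketch assumes that no part contains an extra dash, that the scientist and batch are separated by the first underscore in part 2, and that the seed and resend count are separated by the last underscore in part 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkunitName:
    """Hypothetical container for the five dash-separated parts of a name."""
    molecules: str    # part 1: name (ID) of the molecules involved
    scientist: str    # part 2, before the first underscore (assumed split)
    batch: str        # part 2, after the first underscore (assumed split)
    step: int         # part 3: position of this workunit in the chain (from 0)
    final_step: int   # part 4: final position in the chain
    seed: str         # part 5, before the last underscore (assumed split)
    resends: int      # part 5, after the last underscore (assumed split)

    @property
    def chain_length(self) -> int:
        # The chain starts at 0, so it holds one more workunit than final_step.
        return self.final_step + 1

    @property
    def is_last_in_chain(self) -> bool:
        # The last workunit has the same number in positions 3 and 4.
        return self.step == self.final_step


def parse_workunit_name(name: str) -> WorkunitName:
    parts = name.split("-")
    if len(parts) != 5:
        raise ValueError(f"expected 5 dash-separated parts, got {len(parts)}")
    molecules, owner, step, final_step, tail = parts
    scientist, _, batch = owner.partition("_")
    seed, _, resends = tail.rpartition("_")
    return WorkunitName(molecules, scientist, batch,
                        int(step), int(final_step), seed, int(resends))


wu = parse_workunit_name("ARG59ALA_S14F21_C2-SDOERR_BNBS2-0-4-RND9058_2")
print(wu.scientist, wu.batch, wu.step, wu.chain_length, wu.is_last_in_chain)
# prints: SDOERR BNBS2 0 5 False
```

For the example name, the workunit is step 0 of a 5-step chain, so four more steps would still be generated from it before the chain ends.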
|
Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 |
Pablo should be sending out some simulations soon (this/next week). |
|
Joined: 21 Mar 16 Posts: 513 Credit: 4,673,458,277 RAC: 0
|
So are all the important WUs, like 0-2 in the chain, assigned to the fast cards and the last one assigned to the slower cards? That would make sense since, for whatever reason, you can only run them in a serial chain rather than all in parallel. |