Aborted by server
| Author | Message |
|---|---|
|
Joined: 5 Mar 13, Posts: 348, Credit: 0, RAC: 0 |
I want to apologize to everyone who lost work on the BNBS simulations a moment ago. It was an unfortunate step I had to take. There was a small but important mistake in the BNBS simulations which caused them to not chain together correctly. We will still use the simulations which came back, but we have a deadline for the publication and I had to make use of all the resources for the fixed BNBS2 simulations. I am really sorry for this. I considered it carefully before doing it, and I believe it made sense given the circumstances. |
Beyond (Joined: 23 Nov 08, Posts: 1112, Credit: 6,162,416,256, RAC: 0)
|
Was wondering, just had 4 BNBS WUs aborted while running. Edit: after investigating further, it seems there were 6 running. Strange that 2 were actually uploading when they were aborted, and the website reports 0 time for them as if they weren't started? Only 4 of the 8 show time on the website, but 6 were running; 2 hadn't yet started. |
|
Joined: 21 Mar 16, Posts: 513, Credit: 4,673,458,277, RAC: 0
|
Thank you for informing us |
|
Joined: 1 Jan 15, Posts: 1166, Credit: 12,260,898,501, RAC: 1
|
"Thank you for informing us" +1 |
|
Joined: 1 Jan 15, Posts: 1166, Credit: 12,260,898,501, RAC: 1
|
the newly downloaded WU WT_S3F9_C2-SDOERR_BNBS2-0-4-RND5326_0 errored out after some 7.000 seconds. This has not happened before with any BNBS. Was this just a coincidence, or could anything be wrong with the new WUs? |
|
Joined: 16 May 09, Posts: 11, Credit: 131,226,034, RAC: 0
|
"the newly downloaded WU WT_S3F9_C2-SDOERR_BNBS2-0-4-RND5326_0 errored out after some 7.000 seconds. This has not happened before with any BNBS." I notice that the table at the bottom of the "Server status" page shows a worrying statement of zero successes (which could be attributed to a lack of returned WU results) and also a 100% error rate (which suggests all returns to date have failed). Whilst this may be caused by just a few failed WUs thus far, I've not seen that rate of errors since last October's early beta-testing of the PASCAL version of the application, so maybe there is, indeed, something amiss within this new batch of WUs. Or else I'm imagining a trend where none exists, based on too small a sample size ... which is always possible. Dave |
Retvari Zoltan (Joined: 20 Jan 09, Posts: 2380, Credit: 16,897,957,044, RAC: 0)
|
"I notice that the table at the bottom of the "Server status" page shows a worrying statement of zero successes (which could be attributed to a lack of returned WU results) and also a 100% error rate (which suggests all returns to date have failed)." This is normal, as not enough time has passed since the release of this batch to have any successful tasks. "Whilst this may be caused by just a few failed WUs thus far, I've not seen that rate of errors since last October's early beta-testing of the PASCAL version of the application so maybe there is, indeed, something amiss within this new batch of WUs." I have 4 running for 3 hours 20 minutes without any failures. They need another 6 hours to finish, so we'll have a normalized error rate after that. |
|
Joined: 5 Jan 09, Posts: 670, Credit: 2,498,095,550, RAC: 0
|
"the newly downloaded WU ..." More likely your card is overclocked too much for this particular WU, as you have "Simulation has become unstable" in your output file. |
|
Joined: 5 Mar 13, Posts: 348, Credit: 0, RAC: 0 |
Thanks for the understanding. Performance-wise they should be identical. I only changed the input configuration and did a single initial simulation step to generate the needed files. I would say that if it works for one (i.e. Zoltan) it should work for all, in the sense that the only thing I can imagine breaking would be a broken input file. But since they all share the same input files, they should work. I will check the results tomorrow anyway to see whether anything is going wrong, as we don't have the luxury of repeating that mistake. |
|
Joined: 1 Jan 15, Posts: 1166, Credit: 12,260,898,501, RAC: 1
|
"the newly downloaded WU ..." Hm, I forgot to look at the output file, thanks for the hint. However, I wonder if we can really talk about overclocking at a clock of 1240MHz. On my other GTX980ti (same model as this one here) the clock is 1340MHz, without any problem. But, as you said, it may have had to do with that particular WU. BTW, also yesterday, another BNBS failed on a GTX750ti at 1137MHz. So maybe these BNBS are somewhat more susceptible to overclocking than other WUs. |
Beyond (Joined: 23 Nov 08, Posts: 1112, Credit: 6,162,416,256, RAC: 0)
|
I've had a couple of the new BNBS2 WUs finish on my 1060 cards now. Seem to run fine. Other than the cancelled ones I've seen no failures at all with the BNBS2 WUs and I run 15 750Ti factory OCed cards. |
|
Joined: 16 May 09, Posts: 11, Credit: 131,226,034, RAC: 0
|
"I've had a couple of the new BNBS2 WUs finish on my 1060 cards now. Seem to run fine. Other than the cancelled ones I've seen no failures at all with the BNBS2 WUs and I run 15 750Ti factory OCed cards." Mixed success from me: (1) my 1050ti choked on a BNBS2 yesterday evening after 3.5 hours, then happily crunched a GERRARD_MO_TRV2 without issue and is now 2.5 hours into another BNBS2 (albeit threatening a >31 hour turnaround); (2) my 1060 has recently finished one in just under 21 hours and is embarking on the next one (with a similar predicted turnaround). Both cards are (now) in the same box with no system o/c and only the factory-set o/c on the cards, so my first failure remains a mystery unless these WUs really are particularly sensitive in some circumstances. The error rate for these WUs on the "Server status" page is coming down, so that's a good sign ... albeit the turnaround time for some of these WUs seems to be on the high side (so more chances of failure, perhaps). |
Retvari Zoltan (Joined: 20 Jan 09, Posts: 2380, Credit: 16,897,957,044, RAC: 0)
|
I had one failure on BNBS2: GLU73ALA_S5F20_C2-SDOERR_BNBS2-0-4-RND2615_0 This was on an ASUS GTX 980Ti Strix, which has 3600MHz memory clock, and its GPU is boosted to 1401MHz (which is a bit optimistic under Windows XP, so I shaved off 11MHz now). Due to very low outside temperatures (-11°C, 12°F) I've reanimated my GTX680 (@1189MHz) in a DQ45CB motherboard with a Core2Duo E8500 (3.16GHz). It has successfully crunched a WT_S3F4_C27-SDOERR_BNBS2-0-4-RND5274_0 in 22h 55m 10s :) |
Beyond (Joined: 23 Nov 08, Posts: 1112, Credit: 6,162,416,256, RAC: 0)
|
So far 15 of the new BNBS2 WUs completed: 9 on 750Ti 2GB cards; 4 on 1060 3GB; 1 on 1050Ti 4GB; 1 on 670 2GB. 4 more are at 92% - 99% done. No errors on any... |
Retvari Zoltan (Joined: 20 Jan 09, Posts: 2380, Credit: 16,897,957,044, RAC: 0)
|
My ancient host finished another of these BNBS2 workunits: GLU73ALA_S14F19_C2-SDOERR_BNBS2-0-4-RND8567_0 in 23h 56m 53s. It missed the 24h bonus by 13 minutes because it was downloaded earlier, but I've reduced my work cache to 0 days since then. I haven't had any failures in the past 24 hours. |
|
Joined: 16 May 09, Posts: 11, Credit: 131,226,034, RAC: 0
|
The situation for me is similar: (1) three WUs completed successfully on my 1060 in approx. 20-21 hrs; (2) one early failure and one recent, successful completion on the 1050ti in just under 30 hrs; (3) two new WUs are currently on the go, one on each card, with predicted run-times similar to the previous ones. I too am keeping my work cache at a low level to avoid downloading WUs too soon and missing the 24hr deadline, but not right down to zero days, as I need some headroom to ensure the CPU tasks on that machine are kept trickling in. |
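
Editor's note on the early "100% error rate" discussed earlier in the thread: Zoltan's explanation is that failed tasks come back long before successful ones, so a fresh batch can only show errors at first. The sketch below illustrates that arithmetic in Python. It is a toy model, not how the "Server status" page actually computes its figures. The function name and the batch of 100 tasks with 5 failing are hypothetical; the 7000 s failure time assumes the "7.000 seconds" reported above means roughly 7,000 s, and the 9h 20m success time is taken from Zoltan's "3 hours 20 minutes" plus "another 6 hours".

```python
def observed_error_rate(elapsed_s, n_tasks, n_failing, fail_after_s, success_after_s):
    """Error rate among *returned* tasks at a given time after batch release.

    Toy model (assumption): every task starts at t = 0, failing tasks
    return after fail_after_s seconds, good tasks after success_after_s.
    Returns None while nothing has been returned yet.
    """
    failed = n_failing if elapsed_s >= fail_after_s else 0
    succeeded = (n_tasks - n_failing) if elapsed_s >= success_after_s else 0
    returned = failed + succeeded
    if returned == 0:
        return None
    return failed / returned


# Hypothetical batch: 100 tasks, 5 of which fail.
FAIL_AFTER_S = 7000                     # assumed reading of "7.000 seconds"
SUCCESS_AFTER_S = (9 * 60 + 20) * 60    # 3h 20m elapsed + 6h remaining

early = observed_error_rate(12000, 100, 5, FAIL_AFTER_S, SUCCESS_AFTER_S)
late = observed_error_rate(SUCCESS_AFTER_S, 100, 5, FAIL_AFTER_S, SUCCESS_AFTER_S)
print(early, late)  # 1.0 0.05
```

In this model the page would show 100% errors for every moment between the first failure and the first success, whatever the true failure rate is, which matches the later observation in the thread that the reported error rate came down as completed WUs were returned.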
©2025 Universitat Pompeu Fabra