WU: OPM995 simulations
Author | Message |
---|---|
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
I wasn't thinking about task validation in the BOINC sense, but rather validation of the experimental procedure: does it hold any weight? If we consider an experiment as a batch of work, validating the experiment (and its procedures) in scientific terms usually requires that the whole experiment be replicated, perhaps many times, before the results and methods are accepted. Of course, Stefan might be doing this for different reasons. |
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 |
I see what you mean now. I hope he has another reason. |
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
GTX970 on W10: 24h 41min, including a bit of upload time (118MB). http://www.gpugrid.net/result.php?resultid=15125538 Run time: 88,881.18 s; CPU time: 88,253.09 s; Validate state: Valid; Credit: 788,690.00. I expect that a better set-up system could complete it within 24h, but I have a second GPU, the room has been at 24C to 28C, I'm using the CPU quite a bit, and my system is set to drop the clocks to keep the temperature down. This GPU was clocked at ~1300MHz; the second has dropped down to 1088MHz. GDDR5 is at 7GHz. I haven't managed to get an OPM on my Linux system yet. The point of installing Ubuntu 16.04 was to see if I could set up a GTX970 system to return these long OPMs inside 24h! |
Joined: 28 Mar 09 Posts: 490 Credit: 11,731,645,728 RAC: 52,725 |
I was fortunate enough to get and successfully complete 2 of these units. First: 5f1c-SDOERR_opm995-0-1-RND8074_2 (work unit 11614800); sent 30 May 2016, 13:52:39 UTC; reported 31 May 2016, 6:23:14 UTC; Completed and validated; run time 56,458.02 s; CPU time 56,161.20 s; credit 940,443.00; Long runs (8-12 hours on fastest card) v8.48 (cuda65). # Time per step (avg over 5000000 steps): 11.257 ms # Approximate elapsed time for entire WU: 56284.859 s # PERFORMANCE: 157144 Natoms 11.257 ns/day 0.000 ms/step 0.000 us/step/atom 02:17:56 (7792): called boinc_finish http://www.gpugrid.net/result.php?resultid=15124495 Second: 3jw8R0-SDOERR_opm995-0-1-RND9612_2 (work unit 11614181); sent 30 May 2016, 8:49:32 UTC; reported 31 May 2016, 0:50:29 UTC; Completed and validated; run time 55,859.07 s; CPU time 55,499.59 s; credit 956,403.00; Long runs (8-12 hours on fastest card) v8.48 (cuda65). # Time per step (avg over 10000000 steps): 5.578 ms # Approximate elapsed time for entire WU: 55780.416 s # PERFORMANCE: 79913 Natoms 5.578 ns/day 0.000 ms/step 0.000 us/step/atom 20:45:10 (7740): called boinc_finish http://www.gpugrid.net/result.php?resultid=15124201 With 5f1c-SDOERR_opm995-0-1-RND8074_2, my Windows 10 computer reached a maximum GPU usage of 87% while using 1950 MB of memory. 3jw8R0-SDOERR_opm995-0-1-RND9612_2, on the same computer, reached a maximum GPU usage of 80% while using 1100 MB of memory. I can't wait to get a few more of these! |
Joined: 7 Jun 09 Posts: 24 Credit: 1,149,643,416 RAC: 0 |
When the new students arrive, would you consider creating more short tasks? I think it is a pity that you mostly cater to the very high-end cards here. I'd like to continue supporting this project, but as it is I just can't afford to buy the faster cards. I do own a 970, and it is still a fast card. I would just hate to see it go over that 24h limit in the near future. I understand it is eventually inevitable, but it's barely a year old. Sadly, the high-end cards also crunch the short units when the long unit pool is dry, so they quickly eat up the short pool too. A WU tier would be nice, however. I think it's been suggested elsewhere in these forums that you could make short, medium and long unit pools. That would be cool: the small cards would have the short pool, the somewhat faster cards the medium pool, and the high-end cards the top-tier long pool. Still, in the past the short units also gave fewer points per day overall, even for the same time on the same card, but I don't know the reason for that. (Maybe the bonus isn't added to those?) Well, just my 2 cents worth of opinion :) |
Joined: 17 Feb 13 Posts: 181 Credit: 144,871,276 RAC: 0 |
Agreed: pity there are so few shorts... My 650 Tis are too slow and the 660 Tis are looking pretty slow compared to many others. I can't afford newer cards, and now, with electricity costing me 18 cents (Canadian) per kWh, my contribution to GPUGrid will be very low. :( |
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 |
Before I received 2m59_SDOERR_opm994 (a short WU), three prior hosts (GT640 / GTX950 / GTX970, r361 and r364 drivers) produced exit code -55 (0xffffffffffffffc9), Unknown error, with zero run time. On my GTX970 the 2m59 WU has a 6.45hr estimated runtime (15.480% per hour). 2m59 WU status: 11-14% CPU usage (3.2GHz) / 54% GPU usage (1511MHz) / 24% MCU (7200MHz) / 25% BUS (PCIe3.0 x4) / GPU temp 39C / 33% GPU power (108W) / 550MB memory usage (no display connected). Topology reports 27558 atoms, 4344 waters in system. Thank you, Zoltan, for sharing the helpful tip (in the previous OPM thread) on where to find the file that lists a WU's atom count. |
Joined: 28 Mar 09 Posts: 490 Credit: 11,731,645,728 RAC: 52,725 |
I had one of these WUs fail with this error message: upload failure: <file_xfer_error> <file_name>4mt6-SDOERR_opm994-0-1-RND0442_0_11</file_name> <error_code>-131 (file size too big)</error_code> </file_xfer_error> http://www.gpugrid.net/result.php?resultid=15127701 Has this happened to anyone else with these WUs? I remember this happening in the past, and a fix for it was posted somewhere in the threads, but I can't remember where. I think this WU would otherwise have been good. |
Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 1 |
"I had one of these WUs fail with this error message:" See the "WARNING/CHALLENGE: VERY LONG WU (VERYLONG_CXCL12_confAna)" thread. It's embarrassing that we've run into this again. |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 326,008 |
I've got 2d57-SDOERR_opm994-0-1-RND4399_1 running. The file description in client_state.xml is <file> <name>2d57-SDOERR_opm994-0-1-RND4399_1_11</name> <nbytes>0.000000</nbytes> <max_nbytes>5000000.000000</max_nbytes> <status>0</status> <upload_url>http://www.gpugrid.org/PS3GRID_cgi/file_upload_handler</upload_url> </file> - so the maximum size allowed is 5,000,000 bytes. So far, it's reached 852 KB at about 80% progress - which sounds like plenty of headroom, and perhaps not a widespread problem. But I'll keep an eye on it as it approaches completion. |
Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 |
I apologize for not answering in a while; I have been a bit busy writing my thesis. Job replication 2 was my desperate attempt to get my results back faster, while also competing with the mass of simulations sent out by Gerard, and to reduce my failure rates a bit. I hope you don't mind too much, since there were only around 300 WUs. If both copies arrive on the same host, of course, it's quite pointless. On the subject of short runs, I am unfortunately unable to help you because the equilibration runs cannot be split into smaller chunks. But as Gianni mentioned we are getting new students soon so it is possible that they have something for short. |
Joined: 17 Feb 13 Posts: 181 Credit: 144,871,276 RAC: 0 |
Hi, Stefan: Thank you for this: "On the subject of short runs, I am unfortunately unable to help you because the equilibration runs cannot be split into smaller chunks. But as Gianni mentioned we are getting new students soon so it is possible that they have something for short." |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 326,008 |
2d57-SDOERR_opm994-0-1-RND4399_1 uploaded cleanly, so it's not a universal problem. 4azpR0-SDOERR_opm995-0-1-RND6483_1 might get closer to the limit - I'll keep an eye on it. |
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 |
WUid=11616186 (1a0r OPM994) crashed my system multiple times. This WU showed 100% GPU usage / 1% MCU / 20% power (65W) before the first driver resets I've encountered in three years of computing ACEMD. The 1a0r WU ended with -97 (0xffffffffffffff9f), Unknown error number, after 102sec at reference stock clocks, which I went back to once I noticed the first couple of driver recoveries while overclocked. (FATAL : Cuda driver error 719 in file 'swanlibnv2.cpp' in line 1965) A few other stable, high-RAC wingman systems (980ti / two 970s; 6 in total) also have errors (<100sec) with the 1a0r WU. As of now, two OPM995 WUs are running without issue on my 970s at very high OCs: (WUid=11614432) 4a6fRO (50479 atoms with 9411 waters in system): 20.25hr estimated runtime at 12-15% CPU usage (3.2GHz) / 63% GPU usage (1511MHz) / 31% MCU (7200MHz) / 27% BUS (PCIe3.0 x4) / 34% power (110W) / 42C core / 820MB memory usage. (WUid=11614365) 4u15RO (51270 atoms with 8255 waters in system): 20.5hr estimated runtime at 12-15% CPU usage (3.2GHz) / 65% GPU usage (1511MHz) / 34% MCU (7010MHz) / 22% BUS (PCIe3.0 x8) / 60% power (120W) / 45C core / 843MB memory usage. |
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
1s4wR0-SDOERR_opm995-0-1-RND5214_0 (work unit 11614436): sent 3 Jun 2016, 6:47:02 UTC; reported 3 Jun 2016, 20:01:33 UTC; Completed and validated; run time 45,293.51 s; CPU time 20,015.48 s; credit 147,829.50. Finally got an OPM on my Ubuntu 16.04 rig. Alas, it didn't turn out to be an extra-long run and completed in 12h 35min at stock. Based on the run time of other long WUs, the credit is about half what it should be. I was hoping to get an extra-long task and to finish it inside 24h. C'est la vie... |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 326,008 |
4azpR0-SDOERR_opm995-0-1-RND6483_1 looks safe as well - 1,283 KB at 61%. # Topology reports 50432 atoms |
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 |
Too many errors (may have bug) 1a0r-SDOERR_opm994-0-1-RND9594 https://www.gpugrid.net/workunit.php?wuid=11616186 |
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 |
Two new OPM995 WUs that should get close to the maximum allowed upload file size (5,000,000 bytes): 3nce WU#11614771 (126091 atoms with 25796 waters) status: 20hr estimated runtime at 12-16% CPU usage (3.2GHz) / 76% GPU usage (1511MHz) / 40% MCU (7200MHz) / 33% BUS (PCIe3.0 x4) / 40% power (130W) / 44C temp / 1559MB memory usage. 2b6p WU#11614758 (129818 atoms with 23308 waters) status: 21hr estimated runtime at 12-16% CPU usage (3.2GHz) / 75% GPU usage (1511MHz) / 45% MCU (7010MHz) / 24% BUS (PCIe3.0 x8) / 70% power (140W) / 47C temp / 1662MB memory usage. |
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 |
Any TX/980ti/980/970 (Present batch) SDOERR_opm99 grant 1,000,000 credit? My runtimes and credit: 23,912.30 s GPU / 11,332.23 s CPU / 41,296.50 credits (27588 atoms, 5M steps); 74,154.80 s GPU / 16,389.80 s CPU / 377,254.50 credits (126091 atoms, 5M steps). An odd short-run 5M-step (~27k atoms) WU cropped up. Batch status: 0 unsent, 271 in progress, 1155 success, 47.62% error rate. |
Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 1 |
"Any TX/980ti/980/970 (Present batch) SDOERR_opm99 grant 1,000,000 credit?" 4by0-SDOERR_opm994-0-1-RND5591_1: 58,472 s (16h 14m 26s), 1,023,036 credits, 170941 atoms, 11.696 ns/day, 5M steps. This workunit is very interesting: the initial replication was 2, and the other host that received it was also granted the +50% bonus, even though it returned the result after 1d 14h. |
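
For anyone who wants to double-check some of the figures quoted in this thread, here is a minimal Python sketch. It is not part of BOINC or ACEMD; the function names are made up for illustration. It shows how the negative exit codes map to the long hex values on the result pages (64-bit two's complement), how a run time in seconds converts to hours and minutes, and a rough estimate of the final upload size against the 5,000,000-byte `<max_nbytes>` limit. The size estimate assumes the output file grows linearly with progress and that 1 KB = 1024 bytes; neither is confirmed anywhere in the thread.

```python
"""Sanity checks for figures quoted in the thread (illustrative sketch only)."""

# Upload limit taken from the <max_nbytes> value in the quoted client_state.xml.
MAX_NBYTES = 5_000_000


def exit_code_to_hex(code: int, bits: int = 64) -> str:
    """Return the unsigned two's-complement hex form of a signed exit code."""
    return f"0x{code & ((1 << bits) - 1):x}"


def format_runtime(seconds: float) -> str:
    """Convert a run time in seconds to 'Hh Mm Ss', dropping fractional seconds."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes}m {secs}s"


def projected_upload_bytes(current_bytes: float, progress: float) -> float:
    """Extrapolate the final file size, ASSUMING linear growth with progress."""
    if not 0.0 < progress <= 1.0:
        raise ValueError("progress must be in the range (0, 1]")
    return current_bytes / progress


if __name__ == "__main__":
    print(exit_code_to_hex(-55))      # 0xffffffffffffffc9
    print(exit_code_to_hex(-97))      # 0xffffffffffffff9f
    print(format_runtime(88_881.18))  # 24h 41m 21s
    # 852 KB at about 80% progress, taking 1 KB = 1024 bytes (an assumption).
    final = projected_upload_bytes(852 * 1024, 0.80)
    print(f"projected {final:,.0f} bytes, limit {MAX_NBYTES:,} bytes")
```

Under those assumptions, both upload sizes reported above (852 KB at about 80% and 1,283 KB at 61%) project to well under the limit, which fits the reports that those uploads looked safe. It says nothing about why the 4mt6 task exceeded it.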
©2025 Universitat Pompeu Fabra