Message boards : News : WU: OPM simulations
Joined: 28 Mar 09 | Posts: 490 | Credit: 11,731,645,728 | RAC: 47,738
> Wow ok, this thread derailed. We are supposed to keep discussions related just to the specific WUs here, even though I am sure it's a very productive discussion in general :)

That's what happens when you allow the lunatics to run the asylum.

> I am a bit out of time right now so I won't split threads and will just open a new one, because I will resend OPM simulations soon.

Ok, bring them on. I'm ready.
Joined: 5 Mar 13 | Posts: 348 | Credit: 0 | RAC: 0
How I imagine your GPUs after OPM: [image]
Joined: 20 Jan 09 | Posts: 2380 | Credit: 16,897,957,044 | RAC: 1
A simulation containing only 35,632 atoms is a piece of cake.
Joined: 1 Jan 15 | Posts: 1166 | Credit: 12,260,898,501 | RAC: 869
> ... The <rsc_disk_bound> is set to 8*10^9 (7.45 GB), which is at least one order of magnitude higher than necessary.

When I temporarily ran BOINC on a RAMDisk some weeks ago, I was harshly confronted with this problem. There was only limited disk space available for BOINC, and each time the free RAMDisk space went below 7,629 MB (7.45 GB), the BOINC manager did not download new GPUGRID tasks (the event log complained about too little free disk space). I contacted the GPUGRID people, and they told me that they will look into this at some point; it can't be done right now, though, as Matt is not available for some reason (and seems to be the only one who could change/fix this).
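For context: in BOINC, that disk bound is a per-workunit field in the job's input template on the server side, so only the researchers can change it. A minimal sketch of such a template, with illustrative values (the field names are standard BOINC ones):

```xml
<input_template>
    <workunit>
        <!-- estimated / maximum FLOPs for the job (illustrative values) -->
        <rsc_fpops_est>5e15</rsc_fpops_est>
        <rsc_fpops_bound>5e16</rsc_fpops_bound>
        <!-- maximum RAM in bytes -->
        <rsc_memory_bound>2e9</rsc_memory_bound>
        <!-- maximum disk usage in bytes: the 8*10^9 (7.45 GB) value
             discussed above; a client with less free disk than this
             will not fetch the task -->
        <rsc_disk_bound>8e9</rsc_disk_bound>
    </workunit>
</input_template>
```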
Joined: 23 Apr 09 | Posts: 3968 | Credit: 1,995,359,260 | RAC: 0
Are the GERARD_CXCL12VOLK_ work units step 2 of the OPM simulations, or extensions of the GERARD_FCCXCL work - or something else?

PS: Nice to see plenty of tasks over the long weekend: Tasks ready to send: 2,413; Tasks in progress: 2,089. Will these auto-generate new work?
Joined: 11 Jul 09 | Posts: 1639 | Credit: 10,159,968,649 | RAC: 295,172
> I haven't tried this, but theoretically it should work.

What theory is that? It isn't a defined field, according to the Application configuration documentation.
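For reference, the fields that BOINC's client configuration documentation does define for an app_config.xml are along these lines; the app name below is an assumption for GPUGRID's long-run application, and note there is no disk-related field:

```xml
<app_config>
    <app>
        <!-- short app name as it appears in client_state.xml;
             "acemdlong" is assumed here -->
        <name>acemdlong</name>
        <!-- run at most this many tasks of the app at once -->
        <max_concurrent>1</max_concurrent>
        <gpu_versions>
            <!-- fraction of a GPU and of a CPU core that the
                 scheduler budgets per task -->
            <gpu_usage>1.0</gpu_usage>
            <cpu_usage>0.5</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
```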
Joined: 20 Jan 09 | Posts: 2380 | Credit: 16,897,957,044 | RAC: 1
> I haven't tried this, but theoretically it should work.

Oh, my bad! That won't work... I read a couple of posts about this somewhere, but I've clearly messed it up. Sorry! Sk, could you hide that post please?
Joined: 28 Mar 09 | Posts: 490 | Credit: 11,731,645,728 | RAC: 47,738
> How I imagine your GPUs after OPM:

For the past few days, while there was little work here, I was crunching at a tough backup project (Einstein), where my computers were able to crunch 2 GPU WUs per card simultaneously, with GPU usage of 99% max for my XP computer and 91% max for my Windows 10 computer. So anything you have should be a walk in the park, even if you come with a 200,000+ atom simulation with 90%+ GPU usage. Good luck!!
Joined: 23 Nov 08 | Posts: 1112 | Credit: 6,162,416,256 | RAC: 0
> when I temporarily ran BOINC on a RAMDisk some weeks ago, I was harshly confronted with this problem.

I had this happen recently when the disk partitions on which BOINC was installed went below that level. Thought it was strange, and wasn't sure if it was a GPUGrid or BOINC thing. Anyway, I resized the partitions with a disk manager and started getting work on those machines again.
Joined: 26 Feb 12 | Posts: 184 | Credit: 222,376,233 | RAC: 0
The error rate for the latest GERARD_FX tasks is high, and for the OPM simulations it was even higher. Perhaps this should be looked into. FWIW, the ever-increasing error rate is why I no longer crunch here. Hours of wasted time and electricity could be better put to use elsewhere, like POEM. My 970s are pretty much useless here nowadays and the 750 Tis are completely useless. JMHO
Joined: 5 Mar 13 | Posts: 348 | Credit: 0 | RAC: 0
These error rates are a bit exaggerated since, AFAIK, they include instantaneous errors which don't really matter much.
Joined: 26 Feb 12 | Posts: 184 | Credit: 222,376,233 | RAC: 0
> These error rates are a bit exaggerated since, AFAIK, they include instantaneous errors which don't really matter much.

Unfortunately that is not true for me. I almost never have a task that errors out immediately; they're thousands of seconds in before they puke, especially in the last few months. FWIW I'm not a points ho, but if we got some kind of credit for tasks that error out before finishing, like other projects give, I'd be more inclined to run them. 6-10 hours of run time for nada just irks me when that run time could be productive somewhere else. And yes, I understand that errors still provide useful info - at least I'm assuming they do - and if they supply useful info we should get some credit. JMHO
Joined: 11 Jul 09 | Posts: 1639 | Credit: 10,159,968,649 | RAC: 295,172
> These error rates are a bit exaggerated since, AFAIK, they include instantaneous errors which don't really matter much.

On the other hand, I can barely remember a task which errored out here for an unexplained reason. I've certainly had some errors since the last ones showing, which were from October/November 2013; I think my most recent failures were because of improper computer shutdowns/restarts - power outages due to the winter storms. I don't see any reason why the project should reward me for those - my bad for not investing in a UPS. The machine I'm posting from - 45218 - has no "Unrecoverable error" events for GPUGrid as far back as the logs go (13 January 2016), and it runs GPUGrid constantly when tasks are available. If you are seeing a much higher error rate, I think you should look closer to home. I don't think the project's applications and tasks are inherently unstable.
Joined: 1 Jan 15 | Posts: 1166 | Credit: 12,260,898,501 | RAC: 869
> Oh, my bad!

Yes, indeed it won't work :-( One of the comments in the forum, a few weeks ago, was:

> The disk space requirement is set in the workunit meta-data. ... If disk usage was associated with the application, you could re-define it in an app_info.xml: but because it's data, it's correctly assigned to the researcher to configure.

Meanwhile it doesn't bother me any more, since I gave up running BOINC on a RAMDisk. Nevertheless, I think this should be looked into / questioned by the GPUGRID people.
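To illustrate why an app_info.xml cannot override it: the bound arrives with each individual task in the scheduler reply and is recorded per workunit in the client's client_state.xml, roughly as below (the task name is invented for the example). app_info.xml describes applications, not workunits, so it has no slot for this value.

```xml
<!-- fragment of client_state.xml (illustrative; task name invented) -->
<workunit>
    <name>example-GERARD_CXCL12VOLK-0-1-RND1234</name>
    <app_name>acemdlong</app_name>
    <!-- per-workunit disk bound in bytes, set by the researcher
         server-side; nothing on the client side can redefine it -->
    <rsc_disk_bound>8e9</rsc_disk_bound>
</workunit>
```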
Joined: 28 Jul 12 | Posts: 819 | Credit: 1,591,285,971 | RAC: 0
> My 970s are pretty much useless here nowadays and the 750 Tis are completely useless. JMHO

This latest batch might be better, though I have just started. At 3 hours into the run, it looks like a GERARD_CXCL12VOLK will take 12.5 hours to complete on a GTX 970 running at 1365 MHz (Win7 64-bit).
Joined: 1 Jan 15 | Posts: 1166 | Credit: 12,260,898,501 | RAC: 869
> This latest batch might be better, though I have just started. At 3 hours into the run, it looks like a GERARD_CXCL12VOLK will take 12.5 hours to complete on a GTX 970 running at 1365 MHz (Win7 64-bit).

Here it took 12.7 hrs on a GTX 970 (running at 1367 MHz) - Win10 64-bit.
Joined: 20 Jan 09 | Posts: 2380 | Credit: 16,897,957,044 | RAC: 1
> FWIW the ever increasing error rate is why I no longer crunch here. Hours of wasted time and electricity could be better put to use elsewhere like POEM. My 970s are pretty much useless here nowadays and the 750 Tis are completely useless. JMHO

According to the performance page, the GTX 970 is a pretty productive GPU:

[performance chart]

It suggests that the reason for the increasing error rate you are experiencing lies at your end. The most probable causes are too much overclocking, inadequate cooling, or an inadequate PSU. GPUGrid is more demanding than other GPU projects, so a system and settings which work for other projects could be inappropriate for GPUGrid tasks. In some cases factory-overclocked cards will not work here until they are clocked below even their factory default frequency.
Joined: 23 Apr 09 | Posts: 3968 | Credit: 1,995,359,260 | RAC: 0
The comment was thread-specific, and based on a GTX 970 not being able to return some OPM tasks within 24h. That was the case on WDDM systems. While a GTX 970 is still quite capable of finishing most tasks within 24h, it remains to be seen how it fares with the next round of OPM tasks (or whatever they are now being called). That said, and generally speaking, a GTX 970 will likely remain a very good GPU for many months to come. I can't see it being less than a good GPU before the autumn, and in 18 months I expect there will still be plenty of them chipping away here.
Joined: 29 Jun 14 | Posts: 5 | Credit: 29,718,557 | RAC: 0
I wanted to report back in, as the overclock on my 980 Ti hybrid is now stable.

I was out of town for a few days, and my OpenVPN tunnel went down (which I do DNS lookups through). As I was unable to resolve DNS, all of my completed tasks were failing to upload. Once I got back into town, I bounced my OpenVPN tunnel and local DNS server and uploaded 2 complete WUs. I was awarded 187,100 credit instead of 249,600. What is frustrating about this to me is that all of my other machines, including those running POEM, Einstein, and Milkyway, were all churning away with no issues and no credit hits; on those projects I am awarded the same credit for turning in the work a bit later. With that said, I feel that the deadlines are too short and the penalty is too harsh. I don't understand why the penalty is so quick and severe.

I have had to set my system to store only 0.5 days of work, and store up to an additional 0 days of work. When I do this, I only get 2 GPUGRID tasks. Past that, I am queueing 3 jobs and the credit awarded gradually grows lower. It makes me not want to queue many jobs, because if I download too many then I am stuck getting low credit indefinitely. This is something I only have to worry about on GPUGRID; all of my other machines are set to store at least 3 days of work, and up to an additional 2 days. I urge that the credit penalty be removed, or only kick in at a more reasonable point in the future.

I also delete my second GPUGRID task, because if I suspend BOINC to play a game, the constant processing of WUs is disrupted, causing the credit awarded to plummet. I do like to game from time to time, but it complicates my usual suspending of BOINC work; if I game for too long, I might not return the workunit in time to avoid the penalty. So far I have enjoyed my stay here, but I can see why some might easily get discouraged.
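For anyone wanting to reproduce that buffer setting: it corresponds to these two fields in BOINC's computing preferences, shown here as a global_prefs_override.xml sketch (the same values can be set from the Manager's computing preferences dialog):

```xml
<global_preferences>
    <!-- keep at most half a day of work queued... -->
    <work_buf_min_days>0.5</work_buf_min_days>
    <!-- ...and fetch no additional work beyond that -->
    <work_buf_additional_days>0</work_buf_additional_days>
</global_preferences>
```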
Joined: 11 Oct 08 | Posts: 1127 | Credit: 1,901,927,545 | RAC: 0
Hmm... Instead of thinking of the lower credits as a "penalty", think of the higher credits as a "bonus". GPUGrid offers bonuses for quick returns, which is pretty unique among BOINC projects, and in some scenarios you just can't get those bonuses. For me, I don't care about credit at all; I just ensure that my GPUs can return GPUGrid tasks within the deadline (within 5 days), so that the project doesn't waste anybody's resources by reassigning those tasks.

Also, regarding your gaming situation: are you aware that you can configure BOINC to automatically suspend when certain applications are running? It's called "Exclusive applications", within BOINC's settings, and I think you may find it quite useful.

Regards, Jacob
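For the record, that feature can also be set directly in the client's cc_config.xml; a minimal sketch, with the game executable name as a placeholder:

```xml
<cc_config>
    <options>
        <!-- suspend all BOINC computation while this program runs -->
        <exclusive_app>mygame.exe</exclusive_app>
        <!-- or suspend only GPU computation, leaving CPU tasks running -->
        <exclusive_gpu_app>mygame.exe</exclusive_gpu_app>
    </options>
</cc_config>
```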