Message boards : News : Large scale experiment: MDAD
Author | Message |
---|---|
We are starting a new large-scale experiment. There will be plenty of workunits; the very first batch is currently being sent. Run times should be around 6h, but with a lot of variability. The workunits are very heterogeneous, so please don't worry about failures. | |
ID: 53462 | Rating: 0 | rate: / Reply Quote | |
These are running just fine on both Windows and Linux so far. I haven't seen any run times near six hours yet. I also see that the Linux app is loading the GPU much higher than the Windows app is, about double. | |
ID: 53465 | Rating: 0 | rate: / Reply Quote | |
Almost 20 tasks validated so far, but I have also had two WUs end in an error after a few seconds, on two different hosts so far: <core_client_version>7.9.3</core_client_version> Also, while my Linux machines get a GPU core load of 90-100%, the Windows ones aren't doing so great (one thread is set aside for each task in the client and swan_sync is on). Sub-90, sometimes around 80 and the worst I've seen is an RTX 2080 at 70% max. | |
ID: 53466 | Rating: 0 | rate: / Reply Quote | |
Cool, got all 3 Linux NV cards happily crunching away at about 95% gpu usage. | |
ID: 53467 | Rating: 0 | rate: / Reply Quote | |
I'm glad to process Science again! (please, note capital letter for this) | |
ID: 53469 | Rating: 0 | rate: / Reply Quote | |
What is the object of the research? | |
ID: 53471 | Rating: 0 | rate: / Reply Quote | |
Toni, Glad to get the work. | |
ID: 53472 | Rating: 0 | rate: / Reply Quote | |
Toni, Glad to get the work. | |
ID: 53473 | Rating: 0 | rate: / Reply Quote | |
What is the object of the research? Yes, can we have a bit more info on what we crunch? Thanks! Sylvain | |
ID: 53478 | Rating: 0 | rate: / Reply Quote | |
Trying to get some tasks and so far no luck. Am I doing it wrong? | |
ID: 53480 | Rating: 0 | rate: / Reply Quote | |
Trying to get some tasks and so far no luck. Am I doing it wrong? Do you have acemd3 application selected in your project preferences? | |
ID: 53481 | Rating: 0 | rate: / Reply Quote | |
Out of work already! LOL | |
ID: 53482 | Rating: 0 | rate: / Reply Quote | |
Is the policy still to reduce credits on work not uploaded within 24hrs of issue? | |
ID: 53486 | Rating: 0 | rate: / Reply Quote | |
Out of work already! LOL + 1 | |
ID: 53488 | Rating: 0 | rate: / Reply Quote | |
Out of work already! LOL I think it was just the warm-up. Every batch Toni queued yesterday consisted of only a single step, so it's no wonder that they didn't last longer. | |
ID: 53489 | Rating: 0 | rate: / Reply Quote | |
Is the policy still to reduce credits on work not uploaded within 24hrs of issue? Yes. But it's actually a +50% bonus for less than 24h, or +25% for less than 48h. | |
ID: 53490 | Rating: 0 | rate: / Reply Quote | |
Thank you. Great job on the new WU incidentally. | |
ID: 53491 | Rating: 0 | rate: / Reply Quote | |
I believe the limit is 16 per host. That is what I got on my 3 hosts. After that I received the "you have reached the limit of tasks in progress message" | |
ID: 53492 | Rating: 0 | rate: / Reply Quote | |
Thank you. Perhaps I'll see that once WU become freely available :-) | |
ID: 53493 | Rating: 0 | rate: / Reply Quote | |
As they are much shorter, could I ask that you please allow download of more than 2 WU per GPU I believe the limit is 16 per host. That is what I got on my 3 hosts. After that I received the "you have reached the limit of tasks in progress message" I guess what was meant in the first above cited posting was to increase the limit of tasks per GPU that can be downloaded at a time. So far, this figure was (and still seems to be) 2. When talking about 16 tasks per host (in the second of the above postings), I guess this was the total number of tasks that were downloaded NOT at a time, but within a certain time frame yesterday, provided a given GPU was fast enough. My various hosts got only up to about 10 tasks each, and that was it. No more downloads since late night. | |
ID: 53495 | Rating: 0 | rate: / Reply Quote | |
An important issue I've noted after crunching these GPUGrid units on my Ubuntu 16.04 hosts (but not on the 18.04 ones) is that the other BOINC GPU projects (and Folding@home) fail with an error when trying to crunch. I tested with Amicable, Einstein and FAH. | |
ID: 53496 | Rating: 0 | rate: / Reply Quote | |
I added GPUGrid to the projects list on one of my machines two years ago, I've never received a work unit. Other GPU projects are not having any trouble. Removed now. | |
ID: 53498 | Rating: 0 | rate: / Reply Quote | |
I added GPUGrid to the projects list on one of my machines two years ago, I've never received a work unit. Other GPU projects are not having any trouble. Removed now. I checked your computer and it appears it has an AMD GPU which is not supported. Only Nvidia cards are supported. Here's the FAQ for the new app: http://www.gpugrid.net/forum_thread.php?id=5002#52865 | |
ID: 53499 | Rating: 0 | rate: / Reply Quote | |
Some years ago there was an AMD application, and it is still possible to check the box for AMD WUs in the GPUGRID preferences. | |
ID: 53500 | Rating: 0 | rate: / Reply Quote | |
I believe the limit is 16 per host. That is what I got on my 3 hosts. After that I received the "you have reached the limit of tasks in progress message" The limit is 2 per GPU. I see your computers are set up to run Seti, where it is common to "spoof" the server into "thinking" you have 32 coprocessors/GPUs per rig. | |
ID: 53501 | Rating: 0 | rate: / Reply Quote | |
It is a little ironic that a project specially for GPU's supports less GPU's than other projects. Einstein, Milky Way, Seti, etc. no problem. | |
ID: 53502 | Rating: 0 | rate: / Reply Quote | |
Any idea when new workunits will be released? | |
ID: 53503 | Rating: 0 | rate: / Reply Quote | |
I see your computers are set up to run Seti, where it is common to "spoof" the server into "thinking" you have 32 coprocessors/GPUs per rig. Tell me more! Seti has the problem of not always being available and not always having WUs available, but the allowed runtime is quite long. So it makes sense to have a larger buffer, but this should only affect the Seti WUs. | |
ID: 53504 | Rating: 0 | rate: / Reply Quote | |
I believe the limit is 16 per host. That is what I got on my 3 hosts. After that I received the "you have reached the limit of tasks in progress message" I didn't think that was the issue. I never received more than two tasks per gpu on the previous run of work units. It depends on the project whether they recognize the spoofed gpus. Seti does, which is why I use it to keep the gpus fed during the ever longer Seti outages. It may be that this run of work did recognize the spoofed gpus. But the math doesn't add up for the 4 hosts. Each host got 16 WUs. I have three 3-card hosts and one 4-card host. One 3-card host got nothing because it is primarily an Einstein machine, and all I got for a GPUGrid request was "gpu cache is full". Except for the Einstein host, all the other hosts are spoofed with either 21 or 32 gpus. By your math I should have received either 8 tasks on the 4-card host or 64 tasks. I got neither. It appears to have been fixed at 16 for each host. As I returned work, I kept getting my cache refilled to a 16 count for each host. I figured that was more likely from my global cache setting. | |
ID: 53505 | Rating: 0 | rate: / Reply Quote | |
I see your computers are set up to run Seti, where it is common to "spoof" the server into "thinking" you have 32 coprocessors/GPUs per rig. The coproc_info.xml file that is created by the client controls the number of gpus detected. Manipulate that file and you can tell BOINC that you have as many as 64 gpus. But you can't exceed 64 as that is a hard limit in the server side code. | |
ID: 53506 | Rating: 0 | rate: / Reply Quote | |
The coproc_info.xml file that is created by the client controls the number of gpus detected. Got it. THX ! | |
ID: 53509 | Rating: 0 | rate: / Reply Quote | |
This was the first piece of a larger batch of 14k WUs. It's (amazingly!) already complete. I'll need to process it to create new WUs. The purpose of the work is (broadly speaking) methods development, i.e. build a dataset to improve the foundation of future MD-based research (not just GPUGRID). More details may come if it works ;) | |
ID: 53513 | Rating: 0 | rate: / Reply Quote | |
For a serial process like this the optimum would be to only send one WU per GPU. | |
ID: 53515 | Rating: 0 | rate: / Reply Quote | |
For a serial process like this the optimum would be to only send one WU per GPU. Not really, because then there would always be some idle time between uploading/reporting the result of a task and downloading the next one. That means the GPU cools off for a (short) while and heats up once the new task starts being cruched. If this happens several times per day, over a lengthy period of time, this so-called "thermal cycle" definitely shortens the lifetime of the GPU. Hence, it's definitely better to have another task already waiting to start immediately after the previous one finishes. | |
ID: 53516 | Rating: 0 | rate: / Reply Quote | |
not really; because what would happen then is that there always is some idle time between uploading/reporting the result of a task and downloading the next one. +1 | |
ID: 53517 | Rating: 0 | rate: / Reply Quote | |
...the GPU cools off for a (short) while and heats up once the new task starts being cruched (sic). The degradation process for electronics is called electromigration. Flowing current while hot actually moves atoms. Where the conductors neck down, e.g. turning a sharp corner or going over bumps, the current density increases and hence the electromigration increases. This is an irreversible process that accelerates as the conductor chokes down and ultimately results in a broken line and failure. Since GPUGrid is supply-limited, one per GPU would assure that more hosts get a WU before hosts start getting additional WUs. Now that the WUs run in less than half the time two per GPU works well but folks still get left out. The GPUGrid server is notoriously slow. If it were fast and they had over 10,000 WUs continuously available then one per GPU would be optimum. | |
ID: 53519 | Rating: 0 | rate: / Reply Quote | |
Please don't "fake" gpus as it will create WU "hoarding": it will deprive other users of work, and slow down our analysis (we sometimes have to wait for batches to be complete). | |
ID: 53520 | Rating: 0 | rate: / Reply Quote | |
Fortunately simple manipulation doesn't work, as this file is overwritten by the BOINC manager at startup. Manipulate that file and you can tell BOINC that you have as many as 64 gpus. But you can't exceed 64 as that is a hard limit in the server side code. Please don't "fake" gpus as it will create WU "hoarding": it will deprive other users of work, and slow down our analysis (we sometimes have to wait for batches to be complete). | |
ID: 53521 | Rating: 0 | rate: / Reply Quote | |
Fortunately simple manipulation doesn't work, as this file is overwritten by the BOINC manager at startup. Manipulate that file and you can tell BOINC that you have as many as 64 gpus. But you can't exceed 64 as that is a hard limit in the server side code. Please don't "fake" gpus as it will create WU "hoarding": it will deprive other users of work, and slow down our analysis (we sometimes have to wait for batches to be complete). You can prevent the coproc file from being overwritten by BOINC. | |
ID: 53522 | Rating: 0 | rate: / Reply Quote | |
Fortunately simple manipulation doesn't work, as this file is overwritten by the BOINC manager at startup. Manipulate that file and you can tell BOINC that you have as many as 64 gpus. But you can't exceed 64 as that is a hard limit in the server side code. Please don't "fake" gpus as it will create WU "hoarding": it will deprive other users of work, and slow down our analysis (we sometimes have to wait for batches to be complete). Which may explain tasks failing with # Engine failed: Illegal value for DeviceIndex: 2 i.e. they attempt to run on non-existent gpus. | |
ID: 53525 | Rating: 0 | rate: / Reply Quote | |
Both of these are present in a GPU (or any modern electronics made of chips). ...the GPU cools off for a (short) while and heats up once the new task starts being cruched (sic). The degradation process for electronics is called electromigration. Flowing current while hot actually moves atoms. Where the conductors neck down, e.g. turning a sharp corner or going over bumps, the current density increases and hence the electromigration increases. This is an irreversible process that accelerates as the conductor chokes down and ultimately results in a broken line and failure. The thermal cycle (a single period of thermal expansion and contraction) hurts the contact points (the ball grid soldering) between the chip's PCB and the card's PCB. It's most prominent for the GPU chip and the RAM chips on a GPU card. Its effect can be lessened by better cooling, lower power dissipation (= lower clock speeds and lower voltages), but most importantly by stable working temperatures (of the chip itself). No idling -> the chip stays hot -> no thermal contraction -> no thermal cycle. Electromigration can be lessened by lower currents, which are the result of lower voltages and lower frequency. It can be prevented by not using the chip at all, but we're here to use our chips all the time as fast as possible, so we can't or won't do anything to lessen electromigration. Intel had a problem with that a couple of years ago (IIRC the SATA3 controller of the 6th south bridge chip could fail to go that fast before its planned lifetime). Electromigration is one of the practical reasons for the limit on the minimum size of a transistor inside the chip. The present size of these basic elements is very close to their practical minimum, so it's getting harder to shrink their size (= to make the fabrication process profitable). The other limit on the minimum size is theoretical, as (according to quantum mechanics) a bunch of silicon (+ doping) atoms simply won't work as a transistor. 
Since GPUGrid is supply-limited one per GPU would assure that more hosts get a WU before hosts start getting additional WUs. Now that the WUs run in less than half the time two per GPU works well but folks still get left out. The number of workunits per GPU depends on the ratio of the supply and the active hosts. One per GPU would be favorable for the present ratio, but when there's a lot of work queued then 2 per GPU seems too low. 2 per GPU is a compromise, as the download / upload time could be significant (for example to upload the 138MB result file). The GPUGrid server is notoriously slow. If it were fast and they had over 10,000 WUs continuously available then one per GPU would be optimum. It's not just the speed. There's some DDOS prevention algorithm in operation, because my hosts get blocked if they try to contact the server one by one in rapid succession (from the same public IP address). | |
ID: 53526 | Rating: 0 | rate: / Reply Quote | |
It's a sign of that. So luckily it's not enough to prevent the BOINC manager from overwriting this file. Which may explain tasks failing with You can prevent the coproc file from being overwritten by BOINC. Fortunately simple manipulation doesn't work, as this file is overwritten by the BOINC manager at startup. Manipulate that file and you can tell BOINC that you have as many as 64 gpus. But you can't exceed 64 as that is a hard limit in the server side code. Please don't "fake" gpus as it will create WU "hoarding": it will deprive other users of work, and slow down our analysis (we sometimes have to wait for batches to be complete). It is very counterproductive to use this method (for example to prevent running dry during a shortage / outage or a combined event). The users of this method don't care about their fellow (unaware) crunchers, as this method is directly aimed at them (not just the "precious" tasks on the server). | |
ID: 53527 | Rating: 0 | rate: / Reply Quote | |
1. Ban people who rig the system. | |
ID: 53529 | Rating: 0 | rate: / Reply Quote | |
1. Banning is a bit extreme. I think just asking people not to do it should be enough. | |
ID: 53530 | Rating: 0 | rate: / Reply Quote | |
I agree it would be preferable to keep the chips warm and busy by having 1 extra task available so that there is little lag between switching so that voltages and temps don't fluctuate significantly over an extended period of time. That's exactly what I said - so the 1 extra task should continue being provided, in any case. (BTW, while there were no GPU tasks available during the past few days, I switched to Einstein - and these tasks showed a strange behaviour [as opposed to about a year ago]: for the first 80-100 seconds and the last 50-60 seconds of a task, only the CPU was crunching, NOT the GPU. Figuring that the tasks' length was about 12-14 minutes, the GPU was suffering a thermal cycle about 5 x per hour). | |
ID: 53531 | Rating: 0 | rate: / Reply Quote | |
2. The spoofed client wasn't meant for GPUGrid. It was developed for Seti. It has no effect on Einstein@home and is surprising that it adversely affects the GPUGrid project. I beg to disagree. I run the spoofed client, and I have it set to 'declare' 16 GPUs. At Einstein, it always fetches 16 tasks, even with a cache setting of 0.01 days + 0.01 days: BOINC automatically requests work to fill all apparently 'idle' devices. The spoofing system works alongside the use of <max_concurrent> for the project, to ensure that tasks are never allocated to a GPU beyond the actual count of physical GPUs present - two in my case. Managed correctly, it should never permit BOINC to assign a task to an imaginary GPU - though I'm not sure how it would react if the configuration implied a limit of two Einstein tasks and two GPUGrid tasks. Best to think that one through very carefully. I can see that allowing my machine to request 16 tasks from GPUGrid would be detrimental to this project's desire to have the fastest possible turnround. | |
ID: 53532 | Rating: 0 | rate: / Reply Quote | |
(BTW, while there were no GPU tasks available during the past few days, I switched to Einstein - and these tasks showed a strange behaviour [as opposed to about a year ago]: for the first 80-100 seconds and the last 50-60 seconds of a task, only the CPU was crunching, NOT the GPU. Figuring that the tasks' length was about 12-14 minutes, the GPU was suffering a thermal cycle about 5 x per hour). The gravity wave work on Einstein involves a CPU preparation phase before the GPU gets involved. I have seen that on other projects as well. But if you are concerned about thermal cycles, what about the gamers? They would have destroyed their cards long before you. It is not a problem. But if too many tricks are used to fix it, they will generate other problems. | |
ID: 53534 | Rating: 0 | rate: / Reply Quote | |
... But if you are concerned about thermal cycles, what about the gamers? They would have destroyed their cards long before you. That's what I have been thinking already. On the other hand, games are not running 24/7. Back to the current tasks: they were all used up during last night, so again none are available for download :-( | |
ID: 53537 | Rating: 0 | rate: / Reply Quote | |
(BTW, while there were no GPU tasks available during the past few days, I switched to Einstein - and these tasks showed a strange behaviour [as opposed to about a year ago]: for the first 80-100 seconds and the last 50-60 seconds of a task, only the CPU was crunching, NOT the GPU. Figuring that the tasks' length was about 12-14 minutes, the GPU was suffering a thermal cycle about 5 x per hour). Einstein allows you to set up multiple WUs per GPU. That results in an average GPU usage of > 98%. | |
ID: 53538 | Rating: 0 | rate: / Reply Quote | |
Toni wrote Please don't "fake" gpus as it will create WU "hoarding": it will deprive other users of work, and slow down our analysis (we sometimes have to wait for batches to be complete). a short look at the users list easily reveals some of the "faked" GPUs - their hosts show 48 GPUs per host(!) So no wonder that they download dozens of tasks at a time and are still processing these tasks long time after other users are through with the only 2 tasks their hosts could download. This procedure is highly unfair, and GPUGRID should quickly develop steps against it. | |
ID: 53539 | Rating: 0 | rate: / Reply Quote | |
It's the motherboard, it's always the motherboard. The MB is the most unreliable part of a computer. I have a stack of dead ones. I wish there was a MB designed specifically for distributed computing with no baby blinky lights and other excessive features etc. | |
ID: 53540 | Rating: 0 | rate: / Reply Quote | |
It's not just the speed. There's some DDOS prevention algorithm in operation, because my hosts get blocked if they try to contact the server one by one in rapid succession (from the same public IP address). What can we do to mitigate this effect??? OAS: Many projects are adding a Max # WUs option in Preferences. Maybe add it with the choice of 1 or 2. OAS: Bunkering for serial projects should be banned one way or another. These "races" and "sprints" have some folks requesting as many WUs per host as they can get but they don't get submitted to the work server until after the race start time, i.e. bunkering. I triggered something a few days ago on GPUGrid that I've never seen before on a BOINC project. It was a fluke combination of things that had me upgrade my drivers but delayed a reboot. It wouldn't have bothered anything else but an unbeknownst slug of GPUGrid WUs had appeared. All those WUs had computation errors. Then both computers got banned with a Project Request. I thought it would be a 24-hour timeout I'd seen folks mention before but it persisted for days. After a few days I tried a manual Project Update and it started working again. Can this Project Requested Ban be applied to bunkerers??? | |
ID: 53541 | Rating: 0 | rate: / Reply Quote | |
Yes, Keith and by now I got 150 tasks, yesterday that is, but none this morning, so far. | |
ID: 53543 | Rating: 0 | rate: / Reply Quote | |
Yes, Keith and by now I got 150 tasks, yesterday that is, but none this morning, so far. Good for you Miklos. And I see you have made the project happy by returning all within 24 hours. Looks like Toni's comment about plenty of work forthcoming is true. | |
ID: 53544 | Rating: 0 | rate: / Reply Quote | |
Toni wrote: Please don't "fake" gpus as it will create WU "hoarding": it will deprive other users of work, and slow down our analysis (we sometimes have to wait for batches to be complete). That's easy: limit the number of simultaneous tasks per host to 16. | |
ID: 53545 | Rating: 0 | rate: / Reply Quote | |
There's no easy way to fix this on our end. It's not just the speed. There's some DDOS prevention algorithm in operation, because my hosts get blocked if they try to contact the server one by one in rapid succession (from the same public IP address). What can we do to mitigate this effect??? OAS: Bunkering for serial projects should be banned one way or another. These "races" and "sprints" have some folks requesting as many WUs per host as they can get but they don't get submitted to the work server until after the race start time, i.e. bunkering. Agreed. I triggered something a few days ago on GPUGrid that I've never seen before on a BOINC project. It was a fluke combination of things that had me upgrade my drivers but delayed a reboot. It wouldn't have bothered anything else but an unbeknownst slug of GPUGrid WUs had appeared. All those WUs had computation errors. That's most probably because of the delayed reboot. Then both computers got banned with a Project Request. This "banning" is done by simply reducing the max tasks per day to 1 while the number of tasks done on that day is above 1, so the project won't send more work to that host on that day when the host asks for it. The next day the count of tasks done starts from 0, so the project will send work to your host the next time it asks. I thought it would be a 24-hour timeout I'd seen folks mention before but it persisted for days. That's because your BOINC manager entered an extended back-off for the GPUGrid project (because the project didn't send work to your host for several task requests). Perhaps other projects kept your host busy. After a few days I tried a manual Project Update and it started working again. That made the BOINC manager ask GPUGrid for work, and because this request was successful, it ended the extended back-off. Can this Project Requested Ban be applied to bunkerers??? No. (Probably you can see this from the order of the events by now.) | |
ID: 53546 | Rating: 0 | rate: / Reply Quote | |
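The quota mechanism described in the post above can be sketched in a few lines. This is only an illustration of that description, not BOINC server code: the class `HostQuota`, its method names, and the default of 50 tasks per day are hypothetical, and keeping the reduced quota in place on the next day is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class HostQuota:
    """Per-host daily quota as described in the post (illustrative only)."""
    max_tasks_per_day: int = 50  # hypothetical default, not a GPUGrid value
    tasks_sent_today: int = 0

    def can_send(self) -> bool:
        # Work is only sent while the host is under its daily quota.
        return self.tasks_sent_today < self.max_tasks_per_day

    def record_sent(self, n: int = 1) -> None:
        self.tasks_sent_today += n

    def penalise(self) -> None:
        # The "ban": the daily quota is dropped to 1 after a run of errors.
        self.max_tasks_per_day = 1

    def new_day(self) -> None:
        # The per-day counter restarts from 0 (assumption: quota stays at 1).
        self.tasks_sent_today = 0


host = HostQuota()
host.record_sent(5)          # five tasks already sent today
host.penalise()              # errors reduce the quota to 1
blocked_today = not host.can_send()
host.new_day()               # counter resets on the next day
allowed_tomorrow = host.can_send()
```

With five tasks already counted against a quota of 1, the host gets nothing more that day; after the daily reset a single request succeeds again, which matches the sequence of events in the post.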
I was thinking about the range of "Store at least X days of work" and Resource Share values to avoid setting off the DDoS alarm. There's no easy way to fix this on our end. It's not just the speed. There's some DDOS prevention algorithm in operation, because my hosts get blocked if they try to contact the server one by one in rapid succession (from the same public IP address). What can we do to mitigate this effect??? I triggered something a few days ago on GPUGrid that I've never seen before on a BOINC project. It was a fluke combination of things that had me upgrade my drivers but delayed a reboot. It wouldn't have bothered anything else but an unbeknownst slug of GPUGrid WUs had appeared. All those WUs had computation errors. That's most probably because of the delayed reboot. The reboot delay was only 30 minutes or so. I was working on a non-BOINC project and was not aware GG WUs had arrived, so when they started to error out they went as fast as the GG server would send them. How was not the point; it was a fluke resulting from the feast-or-famine nature of GG. | |
ID: 53547 | Rating: 0 | rate: / Reply Quote | |
That's easy: limit the number of simultaneous tasks per host to 16. Which goes back to my original post in this thread. I think that is what they have done since the beginning of the new work generation. I keep bumping up against that 16 task per host number. I turn tasks in and I get more, up to the 16 count. And the next scheduler connection 31 seconds later after refilling gets me: Pipsqueek 70548 GPUGRID 1/29/2020 9:28:16 AM This computer has reached a limit on tasks in progress As long as a host turns in valid work in a timely manner, I don't think any kind of new restriction is needed. The faster hosts get more work done for the project, which should keep the scientists happy with the progress of their research. | |
ID: 53548 | Rating: 0 | rate: / Reply Quote | |
To come back on topic, there is a batch ("MDADpr1") of ~50k workunits being created. I hope it's correct. | |
ID: 53549 | Rating: 0 | rate: / Reply Quote | |
To come back on topic, there is a batch ("MDADpr1") of ~50k workunits being created. I hope it's correct. I got 1a0aA00_320_1-TONI_MDADpr1-0-5-RND6201 over an hour ago, but no sign of any of the others. | |
ID: 53550 | Rating: 0 | rate: / Reply Quote | |
Actually they were only 500. Better this way - they came out too large. Feel free to abort them. | |
ID: 53551 | Rating: 0 | rate: / Reply Quote | |
Actually they were only 500. Better this way - they came out too large. Feel free to abort them. Had 6 of them, about 4500s-4900s into them when the server cancelled them..... Now you have me curious as to how long they would have run.... | |
ID: 53552 | Rating: 0 | rate: / Reply Quote | |
About an hour ago, I had two tasks (on two different hosts) that were "aborted by project" after about 5,900 seconds: To come back on topic, there is a batch ("MDADpr1") of ~50k workunits being created. I hope it's correct. were aborted by server, right after start. What's wrong with them? | |
ID: 53553 | Rating: 0 | rate: / Reply Quote | |
Ok, I did not know the server would cancel running WUs. Good to know. They would have run around 6h-ish, but I was not sure they wouldn't fail at the end due to large uploads. | |
ID: 53554 | Rating: 0 | rate: / Reply Quote | |
Ok, I did not know the server would cancel running WUs. Good to know. They would have run around 6h-ish, but I was not sure they wouldn't fail at the end due to large uploads. Okay, glad to see that I have the good ones! | |
ID: 53555 | Rating: 0 | rate: / Reply Quote | |
As long a host turns in valid work and in a timely manner, I don't think any kind of new restriction is needed. The faster hosts get more work done for the project which should keep the scientists happy with the progress of their research. GPUGrid differs from SETI@home in how our computers advance the research: for GPUGrid our hosts actually produce the data to be analysed by the scientists, while SETI@home uses pre-recorded data split into many small chunks to be processed by the hosts. At SETI@home the individual pieces can be processed independently, but at GPUGrid fresh workunits are generated from the result of the previous run. If your host grabs 64 workunits but actually processes only 1 at a time, then your host hinders the progress of the other 63 "chains of workunits". The more you grab, the more delay you put into the progress of the ongoing MD simulation batches. | |
ID: 53558 | Rating: 0 | rate: / Reply Quote | |
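A rough way to see the effect described in the post above is to compute how long each held workunit sits idle before the host even starts it. This is a minimal sketch under simple assumptions: tasks run strictly one per GPU, every task takes the same time, and the function name `chain_delays` and the 1.5 hour run time are made up for illustration.

```python
from typing import List


def chain_delays(tasks_held: int, hours_per_task: float, gpus: int = 1) -> List[float]:
    """Hours each held task waits before the host starts crunching it.

    Every hour of waiting is an hour during which the next step of that
    workunit's chain cannot be generated by the project.
    """
    return [(i // gpus) * hours_per_task for i in range(tasks_held)]


# Hypothetical numbers: one GPU, 1.5 h per task.
hoarder = chain_delays(tasks_held=64, hours_per_task=1.5)
normal = chain_delays(tasks_held=2, hours_per_task=1.5)

worst_hoarder_delay = max(hoarder)   # last of 64 chains waits 63 * 1.5 h
worst_normal_delay = max(normal)     # with 2 per GPU, at most one task length
```

Under these assumptions the last of 64 hoarded chains is stalled for 94.5 hours, whereas with the regular 2 tasks per GPU no chain waits longer than one task length.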
Ok, I did not know the server would cancel running WUs. Good to know. They would have run around 6h-ish, but I was not sure they wouldn't fail at the end due to large uploads. The MDADpr2 batch ain't small in their own right. 188MB upload only at 60% so far after an hour. [Edit] Also see Toni made good on the credit re-adjustment. Now only getting a quarter of what was awarded prior for 4 times the length of processing time. https://www.gpugrid.net/workunit.php?wuid=16977060 More in line with the previous batch of work. | |
ID: 53559 | Rating: 0 | rate: / Reply Quote | |
[Edit] Also see Toni made good on the credit re-adjustment. Now only getting a quarter of what was awarded prior for 4 times the length of processing time. Hm, this is the first time that I've read someone complaining about too high credit :-) | |
ID: 53560 | Rating: 0 | rate: / Reply Quote | |
[Edit] Also see Toni made good on the credit re-adjustment. Now only getting a quarter of what was awarded prior for 4 times the length of processing time. My comment was simply an observation. The discussion about credit awarded among projects needs to be in another thread. That has been hashed to death before many times over. Search on CreditScrew or CreditNew. Oh where is Jeff Cobb? | |
ID: 53561 | Rating: 0 | rate: / Reply Quote | |
It's not just the speed. There's some DDOS prevention algorithm in operation, because my hosts get blocked if they try to contact the server one by one in rapid succession (from the same public IP address). What can we do to mitigate this effect??? PrimeGrid has found a way to reduce bunkering - in the races, count only tasks that were both downloaded and returned during the period scheduled for the race. | |
ID: 53563 | Rating: 0 | rate: / Reply Quote | |
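The counting rule mentioned in the post above is easy to express as a filter. The sketch below is only an illustration of that rule as stated in the thread; the `Task` record, the function `counts_for_race`, and the dates are hypothetical and are not taken from PrimeGrid's code.

```python
from datetime import datetime
from typing import NamedTuple


class Task(NamedTuple):
    name: str
    downloaded: datetime
    returned: datetime


def counts_for_race(task: Task, start: datetime, end: datetime) -> bool:
    # A task only scores if it was both downloaded and returned in the window.
    return start <= task.downloaded and task.returned <= end


race_start = datetime(2020, 2, 1, 0, 0)
race_end = datetime(2020, 2, 4, 0, 0)

tasks = [
    # Bunkered: fetched before the race, held back, returned during it.
    Task("bunkered", datetime(2020, 1, 30, 12, 0), datetime(2020, 2, 1, 0, 5)),
    # Regular: fetched and returned inside the race window.
    Task("regular", datetime(2020, 2, 1, 8, 0), datetime(2020, 2, 1, 15, 0)),
]

scored = [t.name for t in tasks if counts_for_race(t, race_start, race_end)]
```

Because the bunkered task was downloaded before the start time, it earns nothing in the race, which removes the incentive to stockpile work in advance.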
Keith Myers wrote: Also see Toni made good on the credit re-adjustment. Now only getting a quarter of what was awarded prior for 4 times the length of processing time. However, even now there are some unexplainable differences, e.g. between the following two tasks which ran on the same GPU (GTX980Ti) in the same PC: http://www.gpugrid.net/result.php?resultid=21645452 runtime: 39,444 secs - 202,525 credit points http://www.gpugrid.net/result.php?resultid=21645453 runtime: 39,899 secs - 168,771 credit points Any idea how come? Edit: only now I realized what happened: the second task cited above missed the 24-hour limit by 1 minute 17 seconds. Hence the 20% difference in credit :-( | |
ID: 53569 | Rating: 0 | rate: / Reply Quote | |
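The 20% figure in the post above can be checked with a few lines of arithmetic. This is only a sketch: the two credit values (202,525 and 168,771) come from the post, while the bonus tiers (+50% inside 24 hours, +25% inside 48 hours) are an assumption that is not stated in this thread.

```python
# Credits reported for the two tasks in the post above.
credit_on_time = 202_525   # task returned inside the 24-hour limit
credit_late = 168_771      # task returned 1 min 17 s after the 24-hour limit

# ASSUMPTION (not stated in this thread): +50% bonus within 24 h, +25% within 48 h.
ASSUMED_BONUS_24H = 1.50
ASSUMED_BONUS_48H = 1.25


def relative_difference(high, low):
    """Return (gain of high over low, loss of low versus high) as fractions."""
    return high / low - 1.0, 1.0 - low / high


gain, loss = relative_difference(credit_on_time, credit_late)
expected_gain = ASSUMED_BONUS_24H / ASSUMED_BONUS_48H - 1.0

print(f"on-time task earned {gain:.1%} more than the late one")
print(f"late task earned {loss:.1%} less than the on-time one")
print(f"gain predicted by the assumed tiers: {expected_gain:.1%}")
```

Under these assumptions the on-time task earns about 20% more, which is the same as saying the late task earns roughly one sixth less; which of the two percentages you quote depends on which task you take as the baseline.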
Also Toni explained over in the QC Chemistry forum that tasks run for different lengths of times depending how many atoms are in the model. | |
ID: 53570 | Rating: 0 | rate: / Reply Quote | |
I crunch competitively on up to 20 nVidia Turing cards and believe that every WU I do is returned within 24 hours. | |
ID: 53572 | Rating: 0 | rate: / Reply Quote | |
It is a little ironic that a project specially for GPUs supports fewer GPUs than other projects. Einstein, Milky Way, Seti, etc. no problem. If I'm not wrong, the problem is that they don't have a "hardcore" GPU developer. Today it is not impossible to convert CUDA code to OpenCL, but it seems that they are not able to do this. | |
ID: 53573 | Rating: 0 | rate: / Reply Quote | |
I crunch competitively on up to 20 nVidia Turing cards and believe that every WU I do is returned within 24 hours. I've got a better idea, avoid primegrid. | |
ID: 53574 | Rating: 0 | rate: / Reply Quote | |
Today it is not impossible to convert CUDA code to OpenCL, but it seems that they are not able to do this. There is no reason to. They have more than enough volunteers with Nvidia cards, and it is simpler to support one set rather than two. In fact, even if you went to OpenCL for both, I think it is harder to support both manufacturers, judging from the problems I have seen. Supporting both is more for political-correctness reasons rather than need. | |
ID: 53575 | Rating: 0 | rate: / Reply Quote | |
... They have more than enough volunteers with Nvidia cards and very often they don't have enough work for them. Hence, bringing a second group of crunchers on board in addition would only enlarge the problem of "no tasks available" ... | |
ID: 53576 | Rating: 0 | rate: / Reply Quote | |
It is a little ironic that a project specially for GPU's supports less GPU's than other projects. Einstein, Milky Way, Seti, etc. no problem. I've seen a program called swan that is supposed to be able to do this automatically. No idea if an up-to-date version is available. I'd expect whether GPUGRID actually does this to depend on how fast the resulting OpenCL code runs. If it is much slower than the CUDA code, why would they want to release it? Note - I found a version of swan, with a note saying that it is no longer maintained and is therefore deprecated. If you're good enough in both CUDA and OpenCL, why don't you take over maintenance of this program, and see if you can make it produce an OpenCL version of the GPUGRID code that runs fast enough to be worth releasing? https://github.com/Acellera/swan | |
ID: 53577 | Rating: 0 | rate: / Reply Quote | |
As was correctly said above, it's not a technical problem, but a matter of putting effort where it is more critical, i.e. the scientific part (experiment preparation and analysis). | |
ID: 53578 | Rating: 0 | rate: / Reply Quote | |
I crunch competitively on up to 20 nVidia Turing cards and believe that every WU I do is returned within 24 hours. Cheers! Congratulations on your personal success, which lets you buy so many GPUs and keep them crunching for all the years to come! You have already solved the 'bunkering' problem but if you want to improve the supply of WU to us volunteers it is very very simple. Just follow Primegrid's lead and remove GPUgrid from the projects white-listed by GridCoin. As I understand it, the project team is better off getting results sooner rather than later, so that they can analyze and investigate them and issue new WUs if needed, rather than waiting for a few happy crunchers to crunch them over a long time (as might be the case with PrimeGrid, since you mentioned them). So they have an interest in having the biggest possible pool of Nvidia GPUs at their disposal! I have never read that BOINC guarantees an uninterrupted work supply, so that volunteers always have work to crunch. Keep it to unpaid volunteers PAID?! Where is this paid “volunteer”? Just as an example, I spend about USD 300.00 per month on electric bills just for BOINC, besides all the hardware I buy for BOINC, which I would not buy if I were not an addict. I earn about USD 9.00 equivalent of Gridcoins per month, so I would rather see it as a very small subsidy at best, or just another dope to keep me crunching BOINC! | |
ID: 53579 | Rating: 0 | rate: / Reply Quote | |
Thank you for the explanation. $9 is indeed a paltry amount | |
ID: 53580 | Rating: 0 | rate: / Reply Quote | |
Got 3 today, for 4 computers, could use many more. It started great a few days back, but now getting too few. | |
ID: 53584 | Rating: 0 | rate: / Reply Quote | |
Thank you Toni, just got one more. | |
ID: 53585 | Rating: 0 | rate: / Reply Quote | |
The current situation at GPUGRID is definitely better than the situation at the Predictor@Home project for several months before it shut down. Their development team had split up. One part kept the server, the right to use the Predictor@Home name, and so on. The part that left took away the knowledge of how to create useful new workunits. The remainder of the team could only increase the number of failures each workunit was allowed every time a previous task for that workunit failed. For several months, this meant that very few tasks were available, and all of them failed. | |
ID: 53586 | Rating: 0 | rate: / Reply Quote | |
The message is: | |
ID: 53587 | Rating: 0 | rate: / Reply Quote | |
Anyone having issues with the GPU work units crashing their Geforce RTX 2080 Ti's? | |
ID: 53588 | Rating: 0 | rate: / Reply Quote | |
Anyone having issues with the GPU work units crashing their Geforce RTX 2080 Ti's? They are working fine on my hosts. Perhaps your RTX 2080Ti is overclocked (too much). What PSU do you use? Does it have two independent 8-pin PCI-E power connectors? Are those connected to your RTX 2080Ti? | |
ID: 53589 | Rating: 0 | rate: / Reply Quote | |
Jacosito, in the BOINC Manager look at Options/Computing Preferences/Disk & Memory tab. There are 3 check boxes. I uncheck the first two and only check the third. Mine says "Use no more than 80% of total." Make sure you give BOINC permission to use enough storage. | |
ID: 53590 | Rating: 0 | rate: / Reply Quote | |
Then suddenly system hangs, with all the fans (CPU, GPU, Chassis, etc) all off and the motherboard unresponsive to the reset buttons and the power button. The LED's on the Chipset and Motherboard remain lit. This describes behavior I see occasionally with my 1080 Ti's but I don't recall it happening on my 2080 Ti's. I don't know why it happens, I just reboot and it goes away. I never overclock and it's not specific to GG. | |
ID: 53591 | Rating: 0 | rate: / Reply Quote | |
Anyone having issues with the GPU work units crashing their Geforce RTX 2080 Ti's? If you have to switch off the power supply on the AC side, the power supply has been locked out by its overcurrent or unstable-DC-voltage protection. Switching it off resets the 'electronic' fuse. There can be different causes: overcurrent in the power supply itself, overcurrent detected by the mainboard, or unstable AC input voltage. Perhaps the RTX 2080 Ti produced power load peaks. The magazine c't has measured peaks of 380 W for an RTX 2080 without overclocking, depending on the card model. | |
ID: 53592 | Rating: 0 | rate: / Reply Quote | |
...Then suddenly system hangs, with all the fans (CPU, GPU, Chassis, etc) all off and the motherboard unresponsive to the reset buttons and the power button. The LED's on the Chipset and Motherboard remain lit... I had a similar problem last year. I started seeing invalid work across multiple projects, gradually increasing for awhile until one day almost everything was failing. I found the power cables to the GPU had some burnt pins. Replacing that fixed it for awhile, then I started having problems exactly like you describe. This time I found burnt pins in the PSU. I replaced the PSU and eventually had to RMA the GPU, I think the PSU problems broke something. Fortunately it was repaired under warranty and works great now. If your PSU power cables and connections to the GPU are ok then I would suspect and test for a failing GPU. Trying another PSU is also a good idea if you have the option. This assumes you haven't done anything to change the GPU behavior, like overclock it or install new monitoring software. I once had major problems with a certain manufacturer's GPU utility, now I stick to Afterburner or Nvidia Inspector. If you've changed something like this, revert back. ____________ Team USA forum | Team USA page Join us and #crunchforcures. We are now also folding:join team ID 236370! | |
ID: 53593 | Rating: 0 | rate: / Reply Quote | |
Yes (2) 8 pin supplies to my RTX 2080Ti. | |
ID: 53614 | Rating: 0 | rate: / Reply Quote | |
Thanks. | |
ID: 53615 | Rating: 0 | rate: / Reply Quote | |
At least you're receiving work units. For the last two weeks I have not received any work units. All equipment running well. Work units average five hours. Please send some work. | |
ID: 53630 | Rating: 0 | rate: / Reply Quote | |
At least you're receiving work units. For the last two weeks I have not received any work units. All equipment running well. Work units average five hours. Please send some work. Update your graphics drivers. You are using versions known to have problems with some OpenCL work. | |
ID: 53632 | Rating: 0 | rate: / Reply Quote | |
... some OpenCL work. GPUGrid writes its apps in CUDA. | |
ID: 53633 | Rating: 0 | rate: / Reply Quote | |
Supporting both is more for political-correctness reasons rather than need. You're right. There is no work for cuda, let alone for opencl | |
ID: 53635 | Rating: 0 | rate: / Reply Quote | |
The powersupply is a Corsair CX750M. I have the same power supply, a Corsair CX750M, and I have the same problem you describe: suddenly this month it seems like something is making my system overheat if I enable the GPU tasks. My system is an AMD 1700X with a GTX 1070. From the beginning I tried to resolve the problem by lowering the CPU clocks. What seems to help is lowering the GPU frequency by 120 MHz and increasing the fan speed to 97% on this particular GPU. But the computer still freezes frequently. Lately I have been wondering if it might be the PSU as well, since a few years ago I had a "bluescreen" problem on another computer that I solved with a certified, higher-wattage PSU. So it seems to me that this might be a bad PSU design for 24/7 crunching. | |
ID: 53641 | Rating: 0 | rate: / Reply Quote | |
Same here, no WUs. | |
ID: 53642 | Rating: 0 | rate: / Reply Quote | |
Can you send me your app_config.xml? | |
ID: 53643 | Rating: 0 | rate: / Reply Quote | |
I'm not getting any work units. Why? | |
ID: 53651 | Rating: 0 | rate: / Reply Quote | |
I get "no tasks available" over and over | |
ID: 53652 | Rating: 0 | rate: / Reply Quote | |
I get "no tasks available" over and over Why don't you click on Donate and send them enough money that they can hire another person to create workunits? | |
ID: 53653 | Rating: 0 | rate: / Reply Quote | |
Why don't you click on Donate and send them enough money that they can hire another person to create workunits? I decided to take up your suggestion right away. Since we are always asking for new WUs and features, I also think it is fair to contribute something in return, beyond our computing power. But unfortunately, it seems that the donation form is currently unavailable... | |
ID: 53654 | Rating: 0 | rate: / Reply Quote | |
I should be able to make WUs this week. | |
ID: 53656 | Rating: 0 | rate: / Reply Quote | |
I should be able to make WUs this week. Yippee!!! All 50,000? | |
ID: 53657 | Rating: 0 | rate: / Reply Quote | |
I should be able to make WUs this week. thanks, Toni, for the information :-) | |
ID: 53658 | Rating: 0 | rate: / Reply Quote | |
Wait and see !!! | |
ID: 53659 | Rating: 0 | rate: / Reply Quote | |
Wait and see !!! I enjoy competing as much as anyone but competition and stats are the last thing projects should be worried about. They should focus on doing good and useful science first, and issue work according to their needs in whatever manner best meets their goals. There is zero reason to limit work here as you describe, unless the project sees a need for it. Some getting more work than others and some sort of potential stat distortion isn't compelling enough to make such drastic changes. Let's remember that we serve the project, not the other way around. If they're getting work done in a timely manner and getting the results they want, that's really what counts. ____________ Team USA forum | Team USA page Join us and #crunchforcures. We are now also folding:join team ID 236370! | |
ID: 53660 | Rating: 0 | rate: / Reply Quote | |
It will say the first takes all, and so can falsify the world competition. This is not a competition. This is cooperation. | |
ID: 53661 | Rating: 0 | rate: / Reply Quote | |
It will say the first takes all, and so can falsify the world competition. This is not a competition. This is cooperation. 👍👍 May I also add that being here volunteering time on a GPU that takes more than 80 hours to complete and upload a WU is actually delaying the project, IMHO. Do you agree, Retvari? I was finished with all mine a week ago. | |
ID: 53662 | Rating: 0 | rate: / Reply Quote | |
A user with a hundred GPUs receives hundreds of WUs, a user with one or two GPUs receives ... nothing !!! Sorry, but this is silly. When there is work, a user with 100 GPUs gets at most 100 WUs, whereas a user with 1 GPU gets at most 2 WUs. When no work is available, none of them get any work. How is this not fair? You are not seriously suggesting that the scientists limit their progress rate so that users with fewer GPUs can have bigger numbers on BOINCstats etc.? MrS ____________ Scanning for our furry friends since Jan 2002 | |
ID: 53671 | Rating: 0 | rate: / Reply Quote | |
Competition?!? Pffftt! I'm in it for the money. Cheques are mailed out at the end of the month, right? | |
ID: 53673 | Rating: 0 | rate: / Reply Quote | |
I create WUs when they are actually useful. It may not be appropriate for a competition setting. | |
ID: 53679 | Rating: 0 | rate: / Reply Quote | |
@Toni - you must absolutely do what you consider best for the project at all times. | |
ID: 53680 | Rating: 0 | rate: / Reply Quote | |
The powersupply is a Corsair CX750M. I have Asus B85M-E mobo with i5-4690K 4 core cpu and RTX 2080 super gpu. This system has Corsair HX1000i psu. Steady running GPUGRID with Win10, gpu load rate 96-97 percent. Psu fan hardly ever rotates, usually only during power-on stage. | |
ID: 53711 | Rating: 0 | rate: / Reply Quote | |
Looking at the header of this thread: “Large scale experiment: MDAD” | |
ID: 53712 | Rating: 0 | rate: / Reply Quote | |
Where can I find information on the relative performances of GTX 10 series graphics boards to the newer GTX 16 and RTX 20 series boards for MDAD workunits? | |
ID: 53778 | Rating: 0 | rate: / Reply Quote | |
Where can I find information on the relative performances of GTX 10 series graphics boards to the newer GTX 16 and RTX 20 series boards for MDAD workunits? These data are based on Folding@home work: https://docs.google.com/spreadsheets/d/1vcVoSVtamcoGj5sFfvKF_XlvuviWWveJIg_iZ8U2bf0/pub?output=html https://docs.google.com/spreadsheets/d/1v5gXral3BcFOoXs5n1M6l_Uo3pZpQYogn6gVlxRPnz0/edit#gid=0 | |
ID: 53780 | Rating: 0 | rate: / Reply Quote | |
The purpose of the work is (broadly speaking) methods development, i.e. build a dataset to improve the foundation of future MD-based research (not just GPUGRID). More details may come if it works ;) Still waiting with bated breath for more details... | |
ID: 53874 | Rating: 0 | rate: / Reply Quote | |
I'm receiving many tasks which are the last one of their batch: | |
ID: 53884 | Rating: 0 | rate: / Reply Quote | |
Good. I've raised the priority for a selection of the WUs because they were coming in too slow. | |
ID: 53885 | Rating: 0 | rate: / Reply Quote | |
Why does the ACEMD need so much free disk space? Message from Server: New version of ACEMD needs 3814.70MB more disk space. You currently have 0.00 MB available and it needs 3814.70 MB. | |
ID: 53889 | Rating: 0 | rate: / Reply Quote | |
You need to increase your disk limits in the Manager or in your Preferences at the website for your host. | |
ID: 53890 | Rating: 0 | rate: / Reply Quote | |
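A rough way to see why the server reports "0.00 MB available" even on a disk with plenty of free space: the client is bound by whichever of its disk limits is the most restrictive. The function below is a simplified model written for illustration, not BOINC's actual code; the function name, the parameter names and the example host are all hypothetical, and only the 3814.70 MB figure comes from the server message quoted above.

```python
def available_for_boinc_mb(total_mb, free_mb, boinc_used_mb,
                           max_used_mb=None, min_free_mb=None, max_used_pct=None):
    """Simplified model: extra space BOINC may still use, in MB.

    A limit left as None stands for an unchecked box and is ignored.
    """
    limits = [free_mb]  # can never use more than what is physically free
    if max_used_mb is not None:
        limits.append(max_used_mb - boinc_used_mb)
    if min_free_mb is not None:
        limits.append(free_mb - min_free_mb)
    if max_used_pct is not None:
        limits.append(total_mb * max_used_pct / 100.0 - boinc_used_mb)
    # The tightest limit wins; a negative result means "nothing available".
    return max(0.0, min(limits))


NEEDED_MB = 3814.70  # figure from the server message quoted above

# Hypothetical host: 500 GB disk, 200 GB free, BOINC already using 2 GB.
tight = available_for_boinc_mb(500_000, 200_000, 2_000, max_used_mb=2_000)
relaxed = available_for_boinc_mb(500_000, 200_000, 2_000, max_used_pct=80)

print(f"tight limit:   {tight:.2f} MB available, enough: {tight >= NEEDED_MB}")
print(f"relaxed limit: {relaxed:.2f} MB available, enough: {relaxed >= NEEDED_MB}")
```

In this model a "use no more than" cap equal to what BOINC already occupies leaves nothing available, while a percentage-only limit like the one suggested earlier in the thread leaves ample room for the new application.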
So the question on what calculations are actually being executed on the machines is not really answered yet IMO. | |
ID: 53901 | Rating: 0 | rate: / Reply Quote | |
[snip] A way to stop bunkering: Persuade all projects that have such races or sprints to require that only WUs downloaded during the race count toward the race. | |
ID: 53904 | Rating: 0 | rate: / Reply Quote | |
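The anti-bunkering rule described above (a task counts only if it was both downloaded and returned during the race) comes down to a single comparison chain. This is a minimal sketch; the function name, the task names and all dates are hypothetical and not taken from PrimeGrid or GPUGRID.

```python
from datetime import datetime


def counts_for_race(downloaded, returned, race_start, race_end):
    """True only if the task was both downloaded and returned inside the race window."""
    return race_start <= downloaded <= returned <= race_end


# Hypothetical three-day race.
race_start = datetime(2020, 3, 1, 0, 0)
race_end = datetime(2020, 3, 4, 0, 0)

# Hypothetical tasks: name -> (download time, return time).
tasks = {
    "bunkered": (datetime(2020, 2, 27, 12, 0), datetime(2020, 3, 1, 0, 5)),
    "regular": (datetime(2020, 3, 1, 8, 0), datetime(2020, 3, 1, 20, 0)),
    "late": (datetime(2020, 3, 3, 22, 0), datetime(2020, 3, 4, 6, 0)),
}

counted = [name for name, (dl, ret) in tasks.items()
           if counts_for_race(dl, ret, race_start, race_end)]
print(counted)
```

The "bunkered" task is rejected because it was downloaded before the race began, even though it was returned minutes after the start; that is exactly the behaviour the rule is meant to discourage.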
Vastly easier to show per-user stats on average turnaround time. Much of the code must already exist as the project already awards bonuses based on turnaround time. | |
ID: 53906 | Rating: 0 | rate: / Reply Quote | |
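A per-user average turnaround statistic, as suggested above, needs only the sent and received timestamps of each result. The sketch below assumes results are available as (user, sent, received) tuples; that layout, the user names and the timestamps are hypothetical and do not reflect the project's actual database schema.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def average_turnaround(results):
    """Map each user to the mean time between a task being sent and received."""
    totals = defaultdict(lambda: [timedelta(0), 0])  # user -> [summed time, task count]
    for user, sent, received in results:
        totals[user][0] += received - sent
        totals[user][1] += 1
    return {user: total / count for user, (total, count) in totals.items()}


# Hypothetical results: (user, time sent, time received).
results = [
    ("alice", datetime(2020, 3, 1, 0, 0), datetime(2020, 3, 1, 10, 0)),
    ("alice", datetime(2020, 3, 2, 0, 0), datetime(2020, 3, 2, 14, 0)),
    ("bob", datetime(2020, 3, 1, 0, 0), datetime(2020, 3, 2, 6, 0)),
]

averages = average_turnaround(results)
for user, avg in sorted(averages.items()):
    print(f"{user}: {avg.total_seconds() / 3600:.1f} h average turnaround")
```

A host that bunkers tasks would show up here with a turnaround far longer than its run time, without any change to how work is issued.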
Zoltan wrote on March 10th: I expect the number of unsent tasks in the queue will drop significantly during the next days. Well, there are still 300,349 left at this point in time (i.e. 3 days after your post) :-) | |
ID: 53910 | Rating: 0 | rate: / Reply Quote | |
Zoltan wrote on March 10th: I expect the number of unsent tasks in the queue will drop significantly during the next days. The highest number of unsent workunits was over 310,000, so every drop of 5,000 is 1.61%. Now there are 297,472 unsent workunits, a number that went down by 5 while I was writing this post. Going from 310k to 297k workunits is a 4.2% decrease in about 13 days. | |
ID: 53920 | Rating: 0 | rate: / Reply Quote | |
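The percentages in the post above are easy to reproduce. The sketch below uses the figures from the post (a peak of 310,000 unsent workunits, 297,472 now, about 13 days elapsed); the final extrapolation assumes the queue keeps draining at the same average rate and that no new work is added, which is an assumption and not a forecast.

```python
peak_unsent = 310_000      # highest queue length mentioned in the post
current_unsent = 297_472   # queue length at the time of the post
days_elapsed = 13          # "about 13 days" according to the post


def percent_drop(start, end):
    """Decrease from start to end, as a percentage of start."""
    return 100.0 * (start - end) / start


per_5000 = percent_drop(peak_unsent, peak_unsent - 5_000)
total_drop = percent_drop(peak_unsent, current_unsent)

# ASSUMPTION: constant drain rate and no new workunits added to the queue.
daily_rate = (peak_unsent - current_unsent) / days_elapsed
days_to_empty = current_unsent / daily_rate

print(f"each drop of 5,000 workunits: {per_5000:.2f}%")
print(f"drop so far: {total_drop:.2f}%")
print(f"average drain: {daily_rate:.0f} workunits per day")
print(f"days to empty at that rate: {days_to_empty:.0f}")
```

With the exact count of 297,472 the drop comes out slightly lower than the 4.2% quoted, which was computed from the rounded 310k and 297k values.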
Could you mention whether the MDAD work happens to be related to COVID-19? | |
ID: 53952 | Rating: 0 | rate: / Reply Quote | |
Could you mention whether the MDAD work happens to be related to COVID-19? The last thing that Toni mentioned about the purpose of these workunits was this. The purpose of the work is (broadly speaking) methods development, i.e. build a dataset to improve the foundation of future MD-based research (not just GPUGRID). More details may come if it works ;) | |
ID: 53953 | Rating: 0 | rate: / Reply Quote | |
Could you mention whether the MDAD work happens to be related to COVID-19? MDAD workunits are an ambitious effort to map the protein conformational space. Although the scope of the work is general, we expect that virion proteins will be among the very first test cases. | |
ID: 53962 | Rating: 0 | rate: / Reply Quote | |
Could you mention whether the MDAD work happens to be related to COVID-19? Thank you. | |
ID: 53964 | Rating: 0 | rate: / Reply Quote | |
Could you mention whether the MDAD work happens to be related to COVID-19? Have you also planned any CPU work, or is it GPU only? I would be happy if my system could also compute other efforts / methods on COVID. | |
ID: 54010 | Rating: 0 | rate: / Reply Quote | |
Rosetta@Home has announced that they are doing some COVID-19 work, CPU only. | |
ID: 54018 | Rating: 0 | rate: / Reply Quote | |
MDAD: what does this project study? | |
ID: 54019 | Rating: 0 | rate: / Reply Quote | |
Rosetta@Home has announced that they are doing some COVID-19 work, CPU only. They're being issued, I have 6 machines running the Rosetta Covid-19 WUs. | |
ID: 54075 | Rating: 0 | rate: / Reply Quote | |
Rosetta@Home has announced that they are doing some COVID-19 work, CPU only. They also announced that not all of the workunits related to COVID-19 have COVID-19 in their names. Some of those with foldit in their names are also related. | |
ID: 54076 | Rating: 0 | rate: / Reply Quote | |
Rosetta@Home has announced that they are doing some COVID-19 work, CPU only. They're being issued, I have 6 machines running the Rosetta Covid-19 WUs. They also announced that not all of the workunits related to COVID-19 have COVID-19 in their names. Some of those with foldit in their names are also related. Exactly. | |
ID: 54081 | Rating: 0 | rate: / Reply Quote | |
Did we do this running ACEMD3??? | |
ID: 54112 | Rating: 0 | rate: / Reply Quote | |
Toni, I could deliver more work if you'd up our ration to 3 WUs per GPU to eliminate the pregnant pauses. | |
ID: 54121 | Rating: 0 | rate: / Reply Quote | |
I'm one of the newcomers, and I'm finding so far that my GTX 960/Ryzen 7-1700 combo is happily crunching away with 14 CPU threads on Rosetta, one plus the GPU for GPUGRID, and one in reserve. | |
ID: 54143 | Rating: 0 | rate: / Reply Quote | |
The Washington Post has a video of the work on Covid-19 at the University of Torno: | |
ID: 54145 | Rating: 0 | rate: / Reply Quote | |
Hi, is the project aimed at COVID-19? | |
ID: 54201 | Rating: 0 | rate: / Reply Quote | |
Hi, is the project aimed at COVID-19? https://www.gpugrid.net/forum_thread.php?id=5089#54179 | |
ID: 54202 | Rating: 0 | rate: / Reply Quote | |
Toni, a question if I may. | |
ID: 54405 | Rating: 0 | rate: / Reply Quote | |
Toni, a question if I may.No, they don't have the MDAD in their name. They are a follow-up of a previous batch from 2019. | |
ID: 54409 | Rating: 0 | rate: / Reply Quote | |
They are a follow-up of a previous batch from 2019. Thanks much, RZ. Is it maybe a rerun of the batch without wrappers that were labeled as long runs a couple weeks ago? | |
ID: 54410 | Rating: 0 | rate: / Reply Quote | |
They are a follow-up of a previous batch from 2019. Most probably this is it. | |
ID: 54477 | Rating: 0 | rate: / Reply Quote | |
@Toni Can we know what we are crunching? Tons of users have asked that, we're curious. | |
ID: 55041 | Rating: 0 | rate: / Reply Quote | |
@Toni Can we know what we are crunching? Tons of users have asked that, we're curious. This post by Toni briefly describes the purpose of the current work units: https://gpugrid.net/forum_thread.php?id=5121&nowrap=true#54701 | |
ID: 55043 | Rating: 0 | rate: / Reply Quote | |
Thanks! | |
ID: 55044 | Rating: 0 | rate: / Reply Quote | |
@Toni Can we know what we are crunching? Tons of users have asked that, we're curious. Another post here from Gianni describing the MDAD work units. http://www.gpugrid.net/forum_thread.php?id=5089&nowrap=true#54172 | |
ID: 55045 | Rating: 0 | rate: / Reply Quote | |