Message boards : News : New D3RBanditTest workunits
Joined: 8 May 18 Posts: 190 Credit: 104,426,808 RAC: 0
World Community Grid has started releasing some GPU tasks for the OpenPandemics beta, but I haven't been lucky enough to get any. Tullio
Joined: 14 Mar 20 Posts: 7 Credit: 11,283,596 RAC: 0
> World Community Grid has started releasing some GPU tasks for the OpenPandemics beta, but I haven't been lucky enough to get any.

Luckily, I got some of them. It looks like WCG is still working out a suitable WU size. Excitingly, they seem to be much faster than the CPU version.
Joined: 22 May 20 Posts: 110 Credit: 115,525,136 RAC: 345
Hey Toni! First of all, thanks for the new supply of work.

I was wondering if you'd consider adjusting the time limit for the quick-return bonus slightly upwards, so that volunteers with less powerful cards (like the 1660 series) aren't penalized for missing the defined limit by a mere 5%. I reckon the limit was put in place to ensure timely computation of the WU, but while 24 hrs might be fine for a 2-4 hr WU, the situation has shifted drastically now that average runtimes are ~10 h on the fastest cards. I understand that extending the deadline itself is something you don't want to do, to keep results coming back quickly, but simply stretching the original 24 hr bonus window to, say, 25 or 26 hrs surely wouldn't hurt that goal much.
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,722,595 RAC: 4,266,994
I don't think you should think of it as a "penalty." You're not being penalized; you're just not getting the quick-return bonus because you didn't make the cutoff. Personally, I don't think this needs to be changed. It's a bonus for exceptional work, not an entitlement, and if you return within 2 days you still get some bonus. I think if you still want the bonus, you should invest in your systems to make them faster. And yes, I'm subject to missing out on the bonus myself on one of my systems, since the GTX 1660 Super (device 1 in the RTX 3070 host) is unable to meet the 24 hr cutoff (routinely ~27 hrs).
Joined: 23 Dec 09 Posts: 189 Credit: 4,798,881,008 RAC: 311
I have quite a high error rate on this host: http://www.gpugrid.net/results.php?hostid=523675

Quite a few WUs crash with:

- "Error invoking kernel: CUDA_ERROR_ILLEGAL_INSTRUCTION (715)" (5 units)
- "Error invoking kernel: CUDA_ERROR_LAUNCH_FAILED (719)" (1 unit)
- "Particle coordinate is nan" (1 unit)
- "process exited with code 195 (0xc3, -61)" (1 unit)

Any idea what the cause might be? My other Linux computer works flawlessly, and the other two Windows 10 computers don't produce errors either.
Joined: 14 Aug 08 Posts: 18 Credit: 16,944 RAC: 0
I did not see a GTX960 mentioned here. BOINC thinks the new WU will take 3 hours to complete. Is that accurate?
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172
> I did not see a GTX960 mentioned here. BOINC thinks the new WU will take 3 hours to complete. Is that accurate?

Not yet. After the first one has completed, subsequent estimates will be more realistic. With the current tasks, I'd guess something in the range of 1.5 to 2 days.

I ditched my 970s last year because I could see the writing on the wall - after a good few years of faithful service, they were no longer fit to match the current beasts. I went for a 1660 (Super or Ti) instead.

This project tends to run different sub-projects, working with data and parameters from different researchers, and they don't reset the task estimates when they change the jobs. Your card would have been very comfortable with the previous run, but not so happy with this one. Only time will tell what the next one will be - we tend not to find out until after it's started.
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,722,595 RAC: 4,266,994
> I have quite a high error rate on this host: http://www.gpugrid.net/results.php?hostid=523675

"Particle coordinate is nan" is usually too much overclocking, or the card running too hot and becoming unstable. Remove any overclock and make sure the card has good airflow so temperatures stay reasonable.

The other errors might be driver related. Try removing the old drivers with DDU from safe mode, and be sure to select the option that prevents Windows from automatically installing drivers. Then go to Nvidia's website and download the latest drivers for your system, selecting a custom install and the clean-install option during installation. Those would be my next steps.
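If you want to rule out heat or clock instability before touching the drivers, a minimal monitoring sketch along these lines can help. This is only an illustration, assuming Python 3 and nvidia-smi on the PATH; the 60-second interval is arbitrary:

```python
import subprocess
import time

# Poll nvidia-smi once a minute and log GPU index, temperature and SM clock,
# so thermal throttling or sudden clock drops during a task are easy to spot.
while True:
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,temperature.gpu,clocks.sm",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    print(time.strftime("%Y-%m-%d %H:%M:%S"), result.stdout.strip())
    time.sleep(60)
```

Temperatures creeping toward the card's limit, or SM clocks dropping sharply mid-task, would point to cooling or overclock problems rather than drivers.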
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172
> ... download the latest drivers for your system ...

I'm always a bit cautious about going for 'the latest' of anything. Driver releases are usually driven by gaming first, and sometimes they break other things - like computing for science - along the way. I usually go for the final bugfix sub-version of the previous major release.
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,722,595 RAC: 4,266,994
> ... download the latest drivers for your system ...

Whatever suits your fancy. The important bit is to completely wipe the old drivers, not allow Microsoft to auto-install its own, and do a clean install of the package provided by Nvidia. (I prefer to avoid GeForce Experience as well, but that's up to you, I guess.)
Joined: 23 Dec 09 Posts: 189 Credit: 4,798,881,008 RAC: 311
> The other errors might be driver related. Try removing the old drivers with DDU from safe mode, and be sure to select the option that prevents Windows from automatically installing drivers. Then go to Nvidia's website and download the latest drivers for your system, selecting a custom install and the clean-install option during installation.

It's a Linux box... I already switched back to the Nouveau driver and restarted - no image… luckily I was able to boot into recovery mode via GRUB. After that I was able to install the latest Nvidia driver again. Hope this solves the problem!
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,722,595 RAC: 4,266,994
> The other errors might be driver related. Try removing the old drivers with DDU from safe mode, and be sure to select the option that prevents Windows from automatically installing drivers. Then go to Nvidia's website and download the latest drivers for your system, selecting a custom install and the clean-install option during installation.

Apologies, I must have read your previous post too quickly and thought you said it was a Windows system. I'd still reinstall the drivers, doing a full uninstall/purge/reinstall. Also, make sure the system isn't going to sleep or hibernating or anything like that. In my experience, tasks run best when the computation is never interrupted.
Joined: 18 Dec 09 Posts: 6 Credit: 1,046,736,560 RAC: 0
As of the date of the post, all work units had failed; I didn't complete one successfully until the 17th. To date, I have:

- 9 Error while computing
- 3 Completed and validated
- 1 Cancelled by server
Joined: 26 Apr 20 Posts: 3 Credit: 1,219,253 RAC: 0
Is there no way to make the work units smaller, so those of us who have older systems can still participate in the project?
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,722,595 RAC: 4,266,994
> As of the date of the post, all work units had failed; I didn't complete one successfully until the 17th.

From what I can infer from other posts, this project has never had, nor promised, a homogeneous supply of tasks. It seems like the relatively small MDADs that we had for several months were the exception.

As to your errors: on your single GTX 660 host you were given two tasks, but that GPU is too slow to complete even one task within the 5-day limit, let alone two. It looks like you started one, and the other sat waiting until it hit the deadline, at which point it was cancelled because it had not even been started yet. This is standard BOINC behavior. Your other task appears to still be in progress on your system even though it's past the deadline. You may as well just abort that unit, since it was already resent to another system and received a valid result 4 days ago; even if you continue crunching it and submit it, it's unlikely you will receive any credit for it. I would abort it and set No New Tasks (NNT) for GPUGRID on that system until suitable WUs are available here again. http://www.gpugrid.net/workunit.php?wuid=27025213

As to your other system with 2 GPUs: almost all of the errors are for the same reason:

ERROR: src\mdsim\context.cpp line 318: Cannot use a restart file on a different device!

This is a known problem with the app here. If a task is interrupted and tries to restart on a different device, you are likely to get this error. The only real solution is to not interrupt the task, which means not stopping computation and not turning the system off. I understand that not everyone wants to operate this way, but there are several other projects to choose from that will accommodate it. Folding@home seems to be a popular choice around here for older Nvidia cards, or for people who wish to contribute less often or fewer resources.
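For multi-GPU hosts where interruptions can't be avoided entirely, one client-side mitigation (standard BOINC configuration, not a project feature, and only worth it if cross-device restarts are the dominant failure mode) is to restrict GPUGRID to a single device with an exclude_gpu entry in cc_config.xml, so an interrupted task can only ever resume on the same card. A minimal sketch, assuming device 1 is the GPU you want to keep off GPUGRID:

```xml
<cc_config>
  <options>
    <!-- keep GPUGRID off device 1 so its tasks always restart on device 0 -->
    <exclude_gpu>
      <url>http://www.gpugrid.net/</url>
      <device_num>1</device_num>
    </exclude_gpu>
  </options>
</cc_config>
```

Put this in cc_config.xml in the BOINC data directory and re-read the config files from the Manager (or restart the client) for it to take effect. The obvious cost is that only one of your GPUs will work on GPUGRID tasks.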
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,722,595 RAC: 4,266,994
> Is there no way to make the work units smaller, so those of us who have older systems can still participate in the project?

There's no way to make the WUs smaller. We get what the project gives us; you cannot manipulate those tasks client-side at all.
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 869
> It seems like the relatively small MDADs that we had for several months were the exception.

Yes and no. Until some time ago there were so-called "short runs" and "long runs" (you can still see this in the lower-left section of the server status page), and the user could choose between them in his/her settings. The small MDADs we recently got would definitely have fallen under "short runs", but there have never before been runs as long as the current series, not even under "long runs". So, as I said before, it would help users with older cards if the 5-day deadline were extended by 1 or 2 days. No idea why this is not being done :-(
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,722,595 RAC: 4,266,994
Short runs seem to be defined as 2-3 hrs on the fastest card; the MDADs were way shorter than that, running only about 15-20 mins on a 2080 Ti, so I'd say they were out of the norm for the project's history. Even long runs are defined as 8-12 hrs on the fastest card, and I'd say these Bandit tasks certainly fall into that category, with a 2080 Ti usually taking about 10 hrs.
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 869
> Short runs seem to be defined as 2-3 hrs on the fastest card. ...

These definitions on the server status page:

Short runs (2-3 hours on fastest card)
Long runs (8-12 hours on fastest card)

have been like this over many years. It's never been changed, as far as I can remember.
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172
> It's never been changed, as far as I can remember.

But "the fastest card", being a relative term, has certainly changed its meaning over the years.