New D3RBanditTest workunits

Message boards : News : New D3RBanditTest workunits

tullio

Send message
Joined: 8 May 18
Posts: 190
Credit: 104,426,808
RAC: 0
Level
Cys
Message 56648 - Posted: 21 Feb 2021, 7:03:33 UTC

World Community Grid has started releasing some GPU tasks for the OpenPandemics beta, but I haven't been lucky enough to get any.
Tullio
ID: 56648 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
RockLr

Send message
Joined: 14 Mar 20
Posts: 7
Credit: 11,283,596
RAC: 0
Level
Pro
Message 56656 - Posted: 22 Feb 2021, 7:14:54 UTC - in response to Message 56648.  

World Community Grid has started releasing some GPU tasks for the OpenPandemics beta, but I haven't been lucky enough to get any.

Luckily I got some of them. It looks like WCG is still looking for a suitable WU size.
Excitingly, they seem to be much faster than the CPU version.
ID: 56656 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
bozz4science

Send message
Joined: 22 May 20
Posts: 110
Credit: 115,525,136
RAC: 345
Level
Cys
Message 56658 - Posted: 22 Feb 2021, 15:29:30 UTC

Hey Toni!

Thanks for the new supply of work, first of all. I was wondering if you'd consider raising the time limit for the quick-return bonus slightly, so that volunteers with less powerful cards (like the 1660 series) are not penalized for running a mere 5% over the defined limit. I reckon the limit was put in place to ensure timely computation of the WUs, but while 24 hrs might be fine for a 2-4 hr WU, the situation has shifted drastically now that average runtimes are ~10 h on the fastest cards.

I understand that increasing the WU deadline is something you don't want to do, in order to keep results timely, but extending the original 24-hr limit to, say, 25 or 26 hrs surely wouldn't hurt that goal much.

ID: 56658 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Level
Trp
Message 56659 - Posted: 22 Feb 2021, 16:03:09 UTC - in response to Message 56658.  

I don't think you should think of it as a "penalty". You're not getting penalized, you're just not getting the quick-return bonus since you didn't make the cutoff.

Personally, I don't think this needs to be changed. It's a bonus for exceptional work, not an entitlement. If you return within 2 days you still get some bonus. If you still want the bonus, I think you should invest in your systems to make them faster.

And yes, I myself miss out on the bonus on one of my systems, since the GTX 1660 Super (device 1 in the RTX 3070 host) is unable to meet the 24-hr cutoff (routinely ~27 hrs).
ID: 56659 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
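
The cutoffs discussed in the two posts above can be sketched as a small lookup. The 24-hour and 2-day thresholds come from the posts; the function name `bonus_tier` and the tier labels are made up for illustration, and since the thread does not state the actual bonus percentages, none are assumed here. Note that the input is the turnaround from download to report, not just the crunching time.

```python
def bonus_tier(turnaround_hours: float) -> str:
    """Return a hypothetical label for the quick-return bonus tier.

    Thresholds (24 h and 48 h) are taken from the thread; the labels
    are illustrative only and are not official project terminology.
    """
    if turnaround_hours < 0:
        raise ValueError("turnaround time cannot be negative")
    if turnaround_hours <= 24:
        return "quick-return bonus"
    if turnaround_hours <= 48:
        return "some bonus"
    return "no bonus"


# Approximate runtimes mentioned in the thread, treated here as turnaround times.
for card, hours in [("RTX 2080 Ti", 10), ("GTX 1660 Super", 27)]:
    print(f"{card}: ~{hours} h -> {bonus_tier(hours)}")
```

A card that needs ~27 hours lands just past the first cutoff, which is the situation both posters describe.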
klepel

Send message
Joined: 23 Dec 09
Posts: 189
Credit: 4,798,881,008
RAC: 311
Level
Arg
Message 56664 - Posted: 22 Feb 2021, 21:10:04 UTC

I have quite a high ERROR rate on this host: http://www.gpugrid.net/results.php?hostid=523675
Quite a few WUs crash with:
"Error invoking kernel: CUDA_ERROR_ILLEGAL_INSTRUCTION (715)" (5 units)
"Error invoking kernel: CUDA_ERROR_LAUNCH_FAILED (719)" (1 unit)
"Particle coordinate is nan" (1 unit)
"process exited with code 195 (0xc3, -61)</message>" (1 unit)
Any idea what the cause might be? The other Linux computer works flawlessly, and the two Windows 10 computers do not produce errors either.
ID: 56664 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
DJStarfox

Send message
Joined: 14 Aug 08
Posts: 18
Credit: 16,944
RAC: 0
Level

Message 56665 - Posted: 22 Feb 2021, 21:25:17 UTC

I did not see a GTX960 mentioned here. BOINC thinks the new WU will take 3 hours to complete. Is that accurate?
ID: 56665 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Richard Haselgrove

Send message
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Message 56666 - Posted: 22 Feb 2021, 21:51:33 UTC - in response to Message 56665.  

I did not see a GTX960 mentioned here. BOINC thinks the new WU will take 3 hours to complete. Is that accurate?

Not yet. After the first one has completed, subsequent estimates will be more realistic.

With the current tasks, I'd guess something in the range 1.5 days - 2 days. I ditched my 970s last year, because I could see the writing on the wall - after a good few years of faithful service, they were no longer fit to match the current beasts. I went for 1660 (super or Ti) instead.

This project tends to run different sub-projects, working with data and parameters from different researchers. And they don't reset the task estimates when they change the jobs. Your card would have been very comfortable with the previous run, but not so happy with this one. Only time will tell what the next one will be - we tend not to find out until after it's started.
ID: 56666 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
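
Why the first estimate can be so far off may be easier to see with a deliberately simplified correction factor. This is not BOINC's actual estimation code; the function name and the single-factor model are assumptions for illustration only. The 3-hour figure is the estimate quoted above, and 36 hours stands in for the guessed 1.5 to 2 days.

```python
from statistics import mean


def corrected_estimate(initial_estimate_h, completed_runtimes_h):
    """Scale the initial estimate by the ratio of observed to estimated runtime.

    Simplified, hypothetical model: with no completed tasks the initial
    estimate is all there is, so it is returned unchanged.
    """
    if initial_estimate_h <= 0:
        raise ValueError("initial estimate must be positive")
    if not completed_runtimes_h:
        return initial_estimate_h
    correction = mean(completed_runtimes_h) / initial_estimate_h
    return initial_estimate_h * correction


print(corrected_estimate(3.0, []))      # before any task has finished
print(corrected_estimate(3.0, [36.0]))  # after one task that took 36 h
```

Until at least one task of the new batch has finished, there is nothing to correct the stale estimate with, which matches the "Not yet" in the reply.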
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Level
Trp
Message 56668 - Posted: 22 Feb 2021, 22:08:11 UTC - in response to Message 56664.  

I have quite a high ERROR rate on this host: http://www.gpugrid.net/results.php?hostid=523675
Quite a few WUs crash with:
"Error invoking kernel: CUDA_ERROR_ILLEGAL_INSTRUCTION (715)" (5 units)
"Error invoking kernel: CUDA_ERROR_LAUNCH_FAILED (719)" (1 unit)
"Particle coordinate is nan" (1 unit)
"process exited with code 195 (0xc3, -61)</message>" (1 unit)
Any idea what the cause might be? The other Linux computer works flawlessly, and the two Windows 10 computers do not produce errors either.

“Particle coordinate is nan” usually means too much overclocking, or a card running hot enough to become unstable. Remove any overclock and make sure the card has good airflow and reasonable temperatures.

The other errors might be driver related. Try removing the old drivers with DDU from safe mode, and be sure to select the option in the settings that prevents Windows from automatically installing drivers. Then go to Nvidia’s website and download the latest drivers for your system, selecting custom install and clean install during the install process.

Those would be my next steps.
ID: 56668 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
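
The triage suggested in the reply above can be written down as a simple lookup. The error strings are the ones quoted in the thread; the mapping to causes only restates the poster's suggestions (which are hedged themselves), and the names `LIKELY_CAUSES` and `triage` are hypothetical.

```python
# Fragments of task stderr output mapped to the causes suggested in the thread.
# These are forum suggestions, not an official diagnostic table.
LIKELY_CAUSES = {
    "Particle coordinate is nan": "overclock too aggressive or card running too hot",
    "CUDA_ERROR_ILLEGAL_INSTRUCTION": "possibly driver related",
    "CUDA_ERROR_LAUNCH_FAILED": "possibly driver related",
}


def triage(stderr_text: str) -> list:
    """Return the suggested causes for every known fragment found in the text."""
    causes = []
    for fragment, cause in LIKELY_CAUSES.items():
        if fragment in stderr_text and cause not in causes:
            causes.append(cause)
    return causes


print(triage("Error invoking kernel: CUDA_ERROR_ILLEGAL_INSTRUCTION (715)"))
```

An empty result only means the text matched none of the three fragments; it says nothing about whether the task was healthy.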
Richard Haselgrove

Send message
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Message 56669 - Posted: 22 Feb 2021, 22:22:03 UTC - in response to Message 56668.  

... download the latest drivers for your system ...

I'm always a bit cautious about going for 'the latest' of anything. Driver release is usually driven by gaming first, and sometimes they break other things - like computing for science - along the way.

I usually go for the final, bugfix, sub-version of the previous major release.
ID: 56669 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Level
Trp
Message 56670 - Posted: 22 Feb 2021, 23:05:05 UTC - in response to Message 56669.  

... download the latest drivers for your system ...

I'm always a bit cautious about going for 'the latest' of anything. Driver release is usually driven by gaming first, and sometimes they break other things - like computing for science - along the way.

I usually go for the final, bugfix, sub-version of the previous major release.


whatever suits your fancy. The important bit is to totally wipe the old drivers, and do not allow Microsoft to auto-install their own, and do a clean install of the package provided by Nvidia.

(I prefer to avoid Geforce Experience as well, but up to you I guess).
ID: 56670 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
klepel

Send message
Joined: 23 Dec 09
Posts: 189
Credit: 4,798,881,008
RAC: 311
Level
Arg
Message 56671 - Posted: 23 Feb 2021, 0:41:12 UTC - in response to Message 56668.  
Last modified: 23 Feb 2021, 1:01:43 UTC

The other errors might be driver related. Try removing the old drivers with DDU from safe mode, and be sure to select the option in the settings that prevents Windows from automatically installing drivers. Then go to Nvidia’s website and download the latest drivers for your system, selecting custom install and clean install during the install process.

It is a Linux Box...
I already switched back to the Nouveau driver. Restarted - no image… luckily I was able to restart with GRUB (recovery mode). After that I was able to install the latest Nvidia driver again.
Hope this will solve the problem!
ID: 56671 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Level
Trp
Message 56674 - Posted: 23 Feb 2021, 1:44:50 UTC - in response to Message 56671.  

The other errors might be driver related. Try removing the old drivers with DDU from safe mode, and be sure to select the option in the settings that prevents Windows from automatically installing drivers. Then go to Nvidia’s website and download the latest drivers for your system, selecting custom install and clean install during the install process.

It is a Linux Box...
I already switched back to the Nouveau driver. Restarted - no image… luckily I was able to restart with GRUB (recovery mode). After that I was able to install the latest Nvidia driver again.
Hope this will solve the problem!


Apologies, I must have read your previous post too quickly and thought you said it was a Windows system. I'd still try reinstalling the drivers with a full uninstall/purge/reinstall.

Also, you should make sure the system isn't going to sleep or hibernating or anything like that. In my experience, tasks run best when the computation isn't interrupted.
ID: 56674 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
d_a_dempsey

Send message
Joined: 18 Dec 09
Posts: 6
Credit: 1,046,736,560
RAC: 0
Level
Met
Message 56676 - Posted: 23 Feb 2021, 15:13:19 UTC - in response to Message 56614.  

As of the date of the post, all work units had failed. I didn't complete one successfully until the 17th.

To date, I have:
    3 In progress
    9 Error while computing
    3 Completed and validated
    1 Cancelled by server



3 out of 14 is not a good ratio.

I do not crunch 24/7. I actually use my computer for real life stuff, and some of that includes me using my GPUs for something other than this project. Has not been a problem before these work units. Before Feb. 10, my last failed work units were in 2018.

ID: 56676 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
jimvt

Send message
Joined: 26 Apr 20
Posts: 3
Credit: 1,219,253
RAC: 0
Level
Ala
Message 56677 - Posted: 23 Feb 2021, 15:14:26 UTC - in response to Message 56504.  

Is there no way to make the work units smaller so those of us that have older systems can still participate in the project?
ID: 56677 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Level
Trp
Message 56678 - Posted: 23 Feb 2021, 15:37:35 UTC - in response to Message 56676.  
Last modified: 23 Feb 2021, 16:27:54 UTC

As of the date of the post, all work units had failed. I didn't complete one successfully until the 17th.

To date, I have:
    3 In progress
    9 Error while computing
    3 Completed and validated
    1 Cancelled by server



3 out of 14 is not a good ratio.

I do not crunch 24/7. I actually use my computer for real life stuff, and some of that includes me using my GPUs for something other than this project. Has not been a problem before these work units. Before Feb. 10, my last failed work units were in 2018.



From what I can infer from other posts, this project has never had, nor promised, a homogeneous supply of tasks. It seems like the relatively small MDADs that we had for several months were the exception.

As to the errors on your single GTX 660 host: you were given two tasks, but that GPU is too slow to complete a single task within the 5-day limit, let alone two. It looks like you started one, and the other sat waiting until it hit the deadline, at which point it was cancelled for not having been started. This is standard BOINC behavior. Your other task appears to still be in progress on your system even though it's past the deadline. You may as well cancel that unit, since it was already sent out to another system and received a valid result 4 days ago. Even if you continue crunching it and submit it, it's unlikely that you will receive any credit for it. I would just cancel it and set NNT (No New Tasks) for GPUGRID on that system until suitable WUs are available here again.

http://www.gpugrid.net/workunit.php?wuid=27025213

As to your other system with 2 GPUs, almost all of the errors are for the same reason:
ERROR: src\mdsim\context.cpp line 318: Cannot use a restart file on a different device!


This is a known problem with the app here. If a task is interrupted and tries to restart on a different device, you are likely to get this error. The only real solution is to not interrupt the task, which means not stopping computation and not turning the system off.

I understand that not everyone wants to operate this way, but there are several other projects to choose from that will let you run intermittently. Folding@home seems to be a popular choice around here for older Nvidia cards, or for people who wish to contribute less often or with fewer resources.
ID: 56678 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
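
The deadline arithmetic for a slow card can be sketched as follows. It assumes all tasks are downloaded at the same moment and run strictly one after another on a single GPU without interruption. The 5-day deadline is from the thread; the 6-day runtime is a made-up figure for a card that cannot finish one task in time, since the thread does not give the GTX 660's actual runtime.

```python
def tasks_meeting_deadline(runtime_days: float, queued_tasks: int,
                           deadline_days: float = 5.0) -> int:
    """Count queued tasks that would finish before the deadline.

    Simplified model: tasks arrive together and run back to back on one GPU.
    """
    if runtime_days <= 0:
        raise ValueError("runtime must be positive")
    if queued_tasks < 0:
        raise ValueError("queued_tasks cannot be negative")
    finished_in_time = 0
    elapsed = 0.0
    for _ in range(queued_tasks):
        elapsed += runtime_days  # each task starts when the previous one ends
        if elapsed <= deadline_days:
            finished_in_time += 1
    return finished_in_time


print(tasks_meeting_deadline(runtime_days=6.0, queued_tasks=2))  # hypothetical slow card
print(tasks_meeting_deadline(runtime_days=2.0, queued_tasks=2))  # hypothetical faster card
```

Under these assumptions, a second queued task on a slow card has no chance of meeting the deadline, which is why stopping new work on that host is the practical advice.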
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Level
Trp
Message 56679 - Posted: 23 Feb 2021, 15:38:21 UTC - in response to Message 56677.  

Is there no way to make the work units smaller so those of us that have older systems can still participate in the project?


There's no way to make the WUs smaller. We get what the project gives us; you cannot manipulate those tasks client-side at all.
ID: 56679 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Erich56

Send message
Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 869
Level
Trp
Message 56682 - Posted: 24 Feb 2021, 5:49:51 UTC - in response to Message 56678.  

it seems like the relatively small MDADs that we had for several months were the exception.

yes and no.

Until some time ago, there were so-called "short runs" and "long runs" (you can still see this in the lower left section of the server status page), and users could choose between them in their settings.
The small MDADs we recently got would definitely have fallen under "short runs".
But never before have there been runs as long as the current series, not even under "long runs".
So, as I said before, it would help users with older cards if the 5-day deadline were extended by 1 or 2 days. No idea why this is not being done :-(
ID: 56682 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Level
Trp
Message 56683 - Posted: 24 Feb 2021, 13:19:50 UTC - in response to Message 56682.  

Short runs seem to be defined as 2-3 hrs on the fastest card. The MDADs were way shorter than that, running only about 15-20 minutes on a 2080 Ti. I'd say that's out of the norm for the project's history.

Even long runs are defined as 8-12 hrs on the fastest card, and I'd say these bandit tasks certainly fall into that category, with a 2080 Ti usually taking about 10 hrs.
ID: 56683 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
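
The two categories can be turned into a small classifier to make the comparison concrete. The 2-3 hour and 8-12 hour ranges are the ones quoted from the server status page; the function name and the labels for runtimes outside those ranges are made up for illustration.

```python
def classify_run(hours_on_fastest_card: float) -> str:
    """Classify a runtime against the short-run and long-run definitions."""
    if hours_on_fastest_card < 0:
        raise ValueError("runtime cannot be negative")
    if hours_on_fastest_card < 2:
        return "shorter than a short run"
    if hours_on_fastest_card <= 3:
        return "short run"
    if hours_on_fastest_card < 8:
        return "between short and long"
    if hours_on_fastest_card <= 12:
        return "long run"
    return "longer than a long run"


# Runtimes on a 2080 Ti as reported in the thread.
print(classify_run(20 / 60))  # MDAD tasks, about 15-20 minutes
print(classify_run(10))       # bandit tasks, about 10 hours
```

Whether a 2080 Ti still counts as "the fastest card" is a separate question, so the result is only as good as that assumption.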
Erich56

Send message
Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 869
Level
Trp
Message 56684 - Posted: 24 Feb 2021, 14:23:40 UTC - in response to Message 56683.  

short runs seem to be defined as 2-3 hrs on the fastest card. ...
long runs are defined as 8-12hrs on fastest card.

these definitions on the server status page:

Short runs (2-3 hours on fastest card)
Long runs (8-12 hours on fastest card)


have been like this for many years. It's never been changed, as far as I can remember.


ID: 56684 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Richard Haselgrove

Send message
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Message 56685 - Posted: 24 Feb 2021, 15:10:01 UTC - in response to Message 56684.  

It's never been changed, as far as I can remember.

But "the fastest card", being a relative term, has certainly changed its meaning over the years.
ID: 56685 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote

©2025 Universitat Pompeu Fabra