More resilient WU processing
| Author | Message |
|---|---|
|
Joined: 7 Apr 15 Posts: 33 Credit: 1,201,157,375 RAC: 0
|
Hi Everyone, Recently, due to a faulty GPU Titan card, my system became unpredictable and started crashing regularly, which resulted in calculation errors on the GPUGrid WUs. I have multiple GPUs (GTX 1070 & 1070 Ti) in my system. If my system crashes due to this faulty card, the WU is automatically and totally lost due to these calculation errors. This is in stark contrast with CPU WUs (Rosetta, LHC, WorldCommunity Grid, ClimatePrediction), which seem to recover fully: they just restart their calculations from a previously saved intermediate point and calculate their way to successful completion. E.g. ClimatePrediction has WUs running for 36 hours on end. Long runs take between 6 and 12 hours; if you're at 5h30 or further in a WU, that's a massive loss of time + power. Can this be developed for GPUGrid too, please? I would be very grateful (and I guess many crunchers with me) if you could introduce this functionality soon. Many thanks in advance! BelgianEnthousiast. |
|
Joined: 30 Apr 13 Posts: 106 Credit: 3,805,237,860 RAC: 65
|
I second BelgianEnthousiast's suggestion. There is a periodic checkpoint of some sort, but it doesn't seem to do much good for post-crash recovery of completed work. Win |
|
Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0
|
If your card was faulty, it wouldn't matter how many times the WU restarted from the last good point; it would still end in failure. Radio Caroline, the world's most famous offshore pirate radio station. Great music since April 1964. Support Radio Caroline Team - Radio Caroline |
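
For anyone unfamiliar with the checkpoint/restart behaviour discussed in this thread, here is a minimal Python sketch of the general idea: save progress periodically, and on restart resume from the last saved point instead of from zero. This is not GPUGrid or BOINC code. The file name, the function names, the `crash_at` parameter and the toy "computation" (summing integers) are all hypothetical and only there for illustration.

```python
import json
import os
import tempfile

CHECKPOINT_FILE = "wu_checkpoint.json"  # hypothetical file name


def load_checkpoint(path):
    """Return (next_step, partial_result) from the last checkpoint, or a fresh start."""
    try:
        with open(path) as f:
            state = json.load(f)
        return state["next_step"], state["partial_result"]
    except (FileNotFoundError, ValueError, KeyError):
        # No usable checkpoint: start the work unit from scratch.
        return 0, 0


def save_checkpoint(path, next_step, partial_result):
    """Write to a temp file, then rename, so a crash mid-write cannot corrupt the checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"next_step": next_step, "partial_result": partial_result}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic replacement of the old checkpoint


def run_work_unit(path, total_steps, checkpoint_every, crash_at=None):
    """Sum 0..total_steps-1, resuming from the checkpoint at `path` if one exists.

    `crash_at` simulates a system crash by raising just before that step runs.
    """
    step, result = load_checkpoint(path)
    while step < total_steps:
        if crash_at is not None and step == crash_at:
            raise RuntimeError("simulated crash")
        result += step  # stand-in for one unit of real computation
        step += 1
        if step % checkpoint_every == 0:
            save_checkpoint(path, step, result)
    return result
```

With a checkpoint every 10 steps, a crash at step 57 loses only the work done since step 50; calling `run_work_unit` again with the same path picks up from there. As noted in the reply above, this only helps if the hardware itself is sound: resuming on a faulty card would still produce errors.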
©2025 Universitat Pompeu Fabra