Two Computer Errors
| Author | Message |
|---|---|
Paul D. Buck Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0
|
Well, this is depressing. Two tasks with compute errors. The good news is that the errors are different and they happened on different systems. The first error is "ERROR: tclutil.cu, line 23: get_Dvec() not a 3 vector", which I have not seen at all on the boards. The second is "Reason: Breakpoint Encountered (0x80000003) at address 0x7C90120E", which I believe I have seen before ... Though I do find it interesting that they seem to be the same "class" of task, judging from the task names: jh21064-SMD05-0-1-SH2_SMD_1_0 and ik16247-SMD01-1-4-SH2_SMD_1_0 |
GDF Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 |
job names are SMD05, SMD01. So they are also different. gdf |
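The distinction drawn here can be illustrated with a minimal sketch that pulls the job name out of a task name. It assumes the job name is the second dash-separated field, which is inferred only from the two task names quoted in this thread; `job_name` is a hypothetical helper, not part of any project tooling.

```python
def job_name(task_name: str) -> str:
    """Return the second dash-separated field of a task name.

    Assumption: the field layout is guessed from the names in this
    thread and is not a documented naming scheme.
    """
    fields = task_name.split("-")
    if len(fields) < 2:
        raise ValueError(f"unexpected task name: {task_name!r}")
    return fields[1]


tasks = [
    "jh21064-SMD05-0-1-SH2_SMD_1_0",
    "ik16247-SMD01-1-4-SH2_SMD_1_0",
]

# Collect the job name of each failed task for comparison.
jobs = [job_name(t) for t in tasks]
print(jobs)
```

Both tasks share the SH2_SMD suffix, which is why they look like the same class of task, but the job fields differ.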
Paul D. Buck Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0
|
"job names are SMD05, SMD01. So they are also different." Ok, well, they both died ... they have that in common ... In that I have been running like forever with no errors ... well ... this is worrisome ... |
|
Joined: 10 Apr 08 Posts: 254 Credit: 16,836,000 RAC: 0
|
No reason to get depressed, Paul. We've probably spotted the source of the error, and it has to do with the nature of the WU type and its input parameters. The SMD* series (SMD01, SMD02, SMD05, SMD10) are one-off tests needed to improve the performance of the main WUs (the SH2_US_* series). Improved performance here means quicker convergence to the goal of these simulations, which is to obtain results equivalent to the experimental values reported for the interaction affinity of our main system of study, the SH2-ligand complex. To analyze the SMD* WUs properly, we changed the frequency at which a certain output file is written. This may have caused the problem by increasing the size of that output file above the limits set in the templates. We are working on a workaround for this problem. Sorry for the inconvenience. ignasi |
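The mechanism described above, writing an output file more often until it outgrows a fixed size limit, can be sketched with some simple arithmetic. Every number below (step count, write intervals, bytes per frame, size limit) is hypothetical and chosen only for illustration; none of them come from the project's actual templates.

```python
def output_size_bytes(total_steps: int, write_every: int, bytes_per_frame: int) -> int:
    """Estimate the output file size when one frame is written every `write_every` steps."""
    frames = total_steps // write_every
    return frames * bytes_per_frame


# Hypothetical values, for illustration only.
TOTAL_STEPS = 1_000_000
BYTES_PER_FRAME = 10_000
SIZE_LIMIT = 50_000_000  # assumed upper bound on the output file

for write_every in (1000, 100):
    size = output_size_bytes(TOTAL_STEPS, write_every, BYTES_PER_FRAME)
    status = "exceeds limit" if size > SIZE_LIMIT else "within limit"
    print(f"write every {write_every} steps: {size} bytes ({status})")
```

Writing ten times as often produces a file ten times larger, so a limit that was comfortable for the original interval can be exceeded without any other change to the task.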
|
Joined: 10 Apr 08 Posts: 254 Credit: 16,836,000 RAC: 0
|
It should be fixed. Please report any malfunction. New WUs look like this one: lF22075-SMD10_1-0-1-SH2. Expect shorter computation times for the SMD10_1 set. Thanks, ignasi |
Paul D. Buck Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0
|
"No reason to get depressed Paul." Sadly, I don't need a reason to get depressed. The state of my life ... BUT, the important thing is that "we" (me and the mouse in my pocket?) discovered the problem, and that is the main point. Every project has those tasks that fail, and it is just part of the business ... I just want to see that the problems get fixed early and often ... :) Thanks for the feedback ... |
|
Joined: 10 Apr 08 Posts: 254 Credit: 16,836,000 RAC: 0
|
Unfortunately, there seems to be something else wrong with these uncommon WUs (the SMD0*_1 series). They will be discontinued entirely for now. Sorry for the inconvenience, ignasi |
Paul D. Buck Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0
|
"Unfortunately there seems to be something else with these non-common WUs." Thank you for cancelling them ... For all those lurking ... do a manual update to flush the bad tasks ... the server will cancel them for you ... thank you for watching ... :) |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
Thanks for the detailed feedback, it's appreciated :) MrS Scanning for our furry friends since Jan 2002 |
rebirther Joined: 7 Jul 07 Posts: 53 Credit: 3,048,781 RAC: 0
|
http://www.ps3grid.net/result.php?resultid=310418 "Maximum disk usage exceeded". I am confused?! 4 GB of HDD space is free ^^, and the WU finished completely. |
|
Joined: 10 Apr 08 Posts: 254 Credit: 16,836,000 RAC: 0
|
http://www.ps3grid.net/result.php?resultid=310418 What disk usage preferences do you have? |
rebirther Joined: 7 Jul 07 Posts: 53 Credit: 3,048,781 RAC: 0
|
http://www.ps3grid.net/result.php?resultid=310418 Sorry for the late answer. My preferences allow up to 100% of total disk space: <disk_interval>60.000000</disk_interval> <disk_max_used_gb>100.000000</disk_max_used_gb> <disk_max_used_pct>100.000000</disk_max_used_pct> <disk_min_free_gb>0.000000</disk_min_free_gb> <vm_max_used_pct>75.000000</vm_max_used_pct> <ram_max_used_busy_pct>90.000000</ram_max_used_busy_pct> <ram_max_used_idle_pct>100.000000</ram_max_used_idle_pct> The WU ran for around 20,000 sec. |
|
Joined: 2 Dec 08 Posts: 5 Credit: 3,027,593 RAC: 0
|
I got the same error here. My disk usage preferences are: <disk_interval>60</disk_interval> <disk_max_used_gb>100</disk_max_used_gb> <disk_max_used_pct>50</disk_max_used_pct> <disk_min_free_gb>0.1</disk_min_free_gb> <vm_max_used_pct>80</vm_max_used_pct> <ram_max_used_busy_pct>75</ram_max_used_busy_pct> <ram_max_used_idle_pct>90</ram_max_used_idle_pct> |
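To see what preference fragments like the ones posted here allow, the following sketch parses the three disk-related values and combines them into a single ceiling. It is a simplified reading of the preference names, not the BOINC client's exact logic, and the host's total disk size (40 GB) is a made-up figure; only the 4 GB of free space is taken from the thread. Note also that, according to the explanation earlier in the thread, the error itself is attributed to an output file exceeding a limit set in the templates rather than to these preferences.

```python
import xml.etree.ElementTree as ET

# Disk-related values from the first preference listing in this thread.
PREFS_FRAGMENT = """
<disk_max_used_gb>100.000000</disk_max_used_gb>
<disk_max_used_pct>100.000000</disk_max_used_pct>
<disk_min_free_gb>0.000000</disk_min_free_gb>
"""


def parse_disk_prefs(fragment: str) -> dict:
    """Parse a preference fragment into {tag: float}.

    The fragment has no single root element, so it is wrapped in one.
    """
    root = ET.fromstring(f"<prefs>{fragment}</prefs>")
    return {child.tag: float(child.text) for child in root}


def disk_ceiling_gb(prefs: dict, total_gb: float, free_gb: float) -> float:
    """Smallest of the three bounds implied by the preferences (simplified)."""
    return min(
        prefs["disk_max_used_gb"],                       # absolute cap
        total_gb * prefs["disk_max_used_pct"] / 100.0,   # share of the whole disk
        free_gb - prefs["disk_min_free_gb"],             # keep some space free
    )


prefs = parse_disk_prefs(PREFS_FRAGMENT)
# total_gb is hypothetical; free_gb is the 4 GB mentioned in the thread.
print(disk_ceiling_gb(prefs, total_gb=40.0, free_gb=4.0))
```

Under these assumptions the free space, not the preference values, is the tightest bound, and it is still measured in gigabytes.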
|
Joined: 10 Apr 08 Posts: 254 Credit: 16,836,000 RAC: 0
|
That is more than enough. Anyway, we have just cancelled all these WUs for the moment. The main concern here is not to waste your crunching time. Sorry for that, ignasi |
rebirther Joined: 7 Jul 07 Posts: 53 Credit: 3,048,781 RAC: 0
|
"That is more than enough." Please run test WUs only for users who set "run test applications" in their prefs. This avoids most computation errors and WUs being aborted when the server cancels them. Not everyone can keep checking their hosts, so they end up wasting much more time. |
Bender10 Joined: 3 Dec 07 Posts: 167 Credit: 8,368,897 RAC: 0
|
"Pls run test WUs by setting "run test applications" in prefs, this avoid most of computation errors and aborting WUs while they are cancelled by the server. Not all can checking there hosts and wasting much more time." I thought all (GPU) WUs run here were TEST WUs......?? The GPUgrid portion of this project is still Beta, right..? Or maybe I missed a memo... Consciousness: That annoying time between naps...... Experience is a wonderful thing: it enables you to recognize a mistake every time you repeat it. |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
"I thought all (GPU) Wu's run here were TEST Wu's......?? The GPUgrid portion of this project is still Beta right..?" I agree.. so maybe I also missed the memo ;) (I think Rebirther suggests using the *new* BOINC functionality for handling special test WUs, which are more of a test than the normal test WUs.) MrS Scanning for our furry friends since Jan 2002 |
©2025 Universitat Pompeu Fabra