A workunit at 99.960% progress for at least 18 hours
| Author | Message |
|---|---|
robertmiles Joined: 16 Apr 09 Posts: 503 Credit: 769,991,668 RAC: 0
|
For many hours, I've had a workunit showing unusual progress numbers:
Long runs (8-12 hours on fastest card) 6.16 (cuda31)
I4R57-NATHAN-RPS1120528-2-166-RND7359
Running
0.185 CPUs + 1 NVIDIA GPU
5000000 GFLOPS
00:41:51 (CPU at last checkpoint)
19:19:11 (CPU time) (slowly rising)
70:45:57 (Elapsed time) (and rising)
--- (Estimated time remaining; not changing)
99.960% (fraction done; not changing for several hours now)
Is there something wrong with this workunit? Or is this just the way that application handles a serious underestimate of the time the workunit should run? |
robertmiles Joined: 16 Apr 09 Posts: 503 Credit: 769,991,668 RAC: 0
|
Now about 24 hours.
6/5/2012 10:25:37 PM | | No config file found - using defaults
6/5/2012 10:25:37 PM | | Starting BOINC client version 7.0.25 for windows_x86_64
6/5/2012 10:25:37 PM | | log flags: file_xfer, sched_ops, task
6/5/2012 10:25:37 PM | | Libraries: libcurl/7.21.6 OpenSSL/1.0.0d zlib/1.2.5
6/5/2012 10:25:37 PM | | Data directory: C:\ProgramData\BOINC
6/5/2012 10:25:37 PM | | Running under account Bobby
6/5/2012 10:25:37 PM | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz [Family 6 Model 42 Stepping 7]
6/5/2012 10:25:37 PM | | Processor: 256.00 KB cache
6/5/2012 10:25:37 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 syscall nx lm vmx smx tm2 popcnt aes pbe
6/5/2012 10:25:37 PM | | OS: Microsoft Windows 7: Professional x64 Edition, Service Pack 1, (06.01.7601.00)
6/5/2012 10:25:37 PM | | Memory: 15.98 GB physical, 31.96 GB virtual
6/5/2012 10:25:37 PM | | Disk: 136.03 GB total, 65.07 GB free
6/5/2012 10:25:37 PM | | Local time is UTC -5 hours
6/5/2012 10:25:37 PM | | NVIDIA GPU 0: GeForce GT 440 (driver version 301.42, CUDA version 4.20, compute capability 2.1, 1536MB, 1442MB available, 342 GFLOPS peak)
6/5/2012 10:25:37 PM | | OpenCL: NVIDIA GPU 0: GeForce GT 440 (driver version 301.42, device version OpenCL 1.1 CUDA, 1536MB, 1442MB available) |
|
Joined: 5 Dec 11 Posts: 147 Credit: 69,970,684 RAC: 0
|
That sounds like something is wrong. Have you tried restarting your computer? |
robertmiles Joined: 16 Apr 09 Posts: 503 Credit: 769,991,668 RAC: 0
|
I have now. Lost everything done since about 52 hours. One of the many GPU workunits downloaded during that situation took over. |
robertmiles Joined: 16 Apr 09 Posts: 503 Credit: 769,991,668 RAC: 0
|
It has now restarted and completed successfully. It was returned not quite 4 days after it was sent. |
|
Joined: 5 Dec 11 Posts: 147 Credit: 69,970,684 RAC: 0
|
"I have now. Lost everything done since about 52 hours."
:-( Sorry you lost all that work. I've had one get stuck like that before, and a simple restart fixed it. Getting stuck at that percentage seems to have something to do with writing the completed file to disk before transmission. Perhaps a background task, such as a virus scan, is preventing disk writes? I'm not sure. |
|
Joined: 26 Dec 10 Posts: 115 Credit: 416,576,946 RAC: 0
|
There might be a larger issue here. Two of the I1R45-NATHAN_RPS work units failed for me:
http://www.gpugrid.net/result.php?resultid=5479036
http://www.gpugrid.net/result.php?resultid=5478382
Thx - Paul
Note: Please don't use driver version 295 or 296! Recommended versions are 266 - 285. |
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
"There might be a larger issue here. Two of the I1R45-NATHAN_RPS work units failed for me:"
This is a different problem. You should follow the outcome of those workunits on other hosts to see whether the fault lies with the workunits themselves. However, I caught a PAOLA_1H46-15-20 workunit running (at 99% GPU usage) on my PC for more than 20 hours while the progress indicator showed 9.23%. I paused and restarted this workunit, and its running time indicator dropped back to 45 minutes and the progress indicator to 9%. Since then it has been running fine (1h14m, 13.4%). |
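The symptom reported in this thread (elapsed time keeps rising while the fraction done stays fixed) can be checked mechanically. The following is only a rough sketch, not anything BOINC or GPUGRID provides: the `Sample` type, the function names, the 6-hour threshold, and the sample values are all hypothetical, and it assumes you have recorded elapsed time and fraction done yourself at intervals.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One hand-recorded observation of a task (hypothetical structure)."""
    elapsed_s: float      # wall-clock seconds since the task started
    fraction_done: float  # reported progress, from 0.0 to 1.0


def stalled_for(samples, min_delta=1e-6):
    """Return how many seconds the newest samples show unchanged progress.

    Walks backwards from the newest sample and stops at the first older
    sample whose fraction_done differs by more than min_delta.
    """
    if not samples:
        return 0.0
    last = samples[-1]
    stall_start = last.elapsed_s
    for s in reversed(samples[:-1]):
        if abs(last.fraction_done - s.fraction_done) > min_delta:
            break
        stall_start = s.elapsed_s
    return last.elapsed_s - stall_start


def is_stalled(samples, threshold_s=6 * 3600):
    """True if progress has been flat for at least threshold_s seconds.

    The 6-hour default is an arbitrary assumption, not a project setting.
    """
    return stalled_for(samples) >= threshold_s


HOUR = 3600
# Illustrative values only, loosely shaped like the first post.
stuck = [
    Sample(0 * HOUR, 0.0),
    Sample(10 * HOUR, 0.5),
    Sample(52 * HOUR, 0.9996),
    Sample(60 * HOUR, 0.9996),
    Sample(70 * HOUR, 0.9996),
]
healthy = [Sample(0, 0.0), Sample(1 * HOUR, 0.1), Sample(2 * HOUR, 0.2)]

print(is_stalled(stuck), is_stalled(healthy))
```

A drop in progress, such as the fall from 9.23% to 9% after a restart, counts as a change here, so the stall timer starts again from that sample.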
©2025 Universitat Pompeu Fabra