NOELIAs are back!
| Author | Message |
|---|---|
|
Joined: 30 Aug 08 Posts: 12 Credit: 15,800,629 RAC: 0
|
There is a problem on some Linux operating systems; tasks take forever to complete. I think it's the more recent versions of Linux that are impacted, but not all. It's possible there is something missing in the drivers or Linux that is preventing the correct use of the drivers; missing libraries. I'm running Debian Wheezy, so yes, very recent versions of the kernel, drivers and libraries. And if I try to install « glibc-2.13-1 » (containing the libpthread.so.0 mentioned in the error message), apt tells me that « libc6 » is already installed instead. So yes indeed, it might be that the library chosen to compile the Linux application is not compatible with the latest versions (but I'm no developer, so that is just an assumption). |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Finished my first Noelia without errors on XP running a 660TI. Took about 35 minutes longer than the same card on Linux Mint. While you have only run one task of each type on Linux and XP, it looks like Linux Mint (3.5.0-17-generic) is ~5% faster (4.5% for Nathan's and 5.8% for Noelia's): Linux: 306px37x2-NOELIA_klebe_run-1-3-RND7661_1, work unit 4440201, sent 10 May 2013 20:53:39 UTC, reported 11 May 2013 7:16:08 UTC, Completed and validated, run time 36,653.31 s, CPU time 16,521.36 s, credit 127,800.00; I40R14-NATHAN_dhfr36_6-10-32-RND5144_0, work unit 4440304, sent 10 May 2013 20:53:39 UTC, reported 11 May 2013 12:08:30 UTC, Completed and validated, run time 17,866.73 s, CPU time 17,627.17 s, credit 70,800.00. XP: 306px2x1-NOELIA_klebe_run-1-3-RND0127_0, work unit 4442470, sent 11 May 2013 19:28:14 UTC, reported 12 May 2013 6:21:56 UTC, Completed and validated, run time 38,796.59 s, CPU time 17,414.91 s, credit 127,800.00; I12R11-NATHAN_dhfr36_6-13-32-RND4528_0, work unit 4442251, sent 11 May 2013 19:29:37 UTC, reported 12 May 2013 11:33:18 UTC, Completed and validated, run time 18,676.30 s, CPU time 18,565.16 s, credit 70,800.00. All 'Long runs (8-12 hours on fastest card) v6.18 (cuda42)'. That's more than I thought it would be (1%, possibly 3%). There might be some task variation, but running on the same system is a very solid basis for comparison. FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
|
Joined: 7 Jun 12 Posts: 112 Credit: 1,140,895,172 RAC: 0
|
I2HDQ_17R4-SDOERR_2HDQc-1-4-RND1951_0 I99R11-NATHAN_dhfr36_6-8-32-RND2501_0 202 (0xca) EXIT_ABORTED_BY_PROJECT http://www.gpugrid.net/result.php?resultid=6878277 http://www.gpugrid.net/result.php?resultid=6851367 Errors, again and again, and more NOELIAs incoming :-((( This comes after about two weeks without problems, restarts or blue screens. Every time my RAC reaches about 620k, bad NOELIA jobs come in, and this is not the first time I have complained about the problem appearing just when my RAC is around 600-620k. Now two weeks of my participation in the project have gone to waste. Is there any conspiracy behind it or just incompetence? Can I do something more than complain here? ;-))) |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
Is there any conspiracy behind it or just incompetence? Neither. Noelia is not to blame; she's just the first to use new features which the project will need in the future. MrS Scanning for our furry friends since Jan 2002 |
|
Joined: 7 Jun 12 Posts: 112 Credit: 1,140,895,172 RAC: 0
|
Every time my RAC reaches about 620k, bad NOELIA jobs come in, and this is not the first time I have complained about the problem appearing just when my RAC is around 600-620k. This is exactly the third time that I have had a problem with NOELIAs just when I reached 620k RAC; it is amazing, Mr. Scientist. And that's your answer, Mr. Moderator? I have proof of it in the work logs. Do you think that people with imperfect English cannot understand protein simulations?! Shame on you. |
Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
It's obviously an international conspiracy to keep your RAC low. We're all involved and participate in LJRAC (Lowering Jozef's Recent Average Credit). BTW, the checks are in the mail... JK. Seriously, we're all suffering the same problem. This is not what one would call a smoothly running project. Just saying... |
|
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0
|
Well, I have two systems with nVidia cards, slow ones though. However, they have done Noelia's without problems so far, taking between 2 and 3 days. The systems are stable: nothing is overclocked, and they do not run the latest BOINC or drivers. If it works then I leave it as is. If not, I'll try to update the video drivers. One CPU core is always free; that seems to be important. It could be the setup of the system and drivers that results in errors; Microsoft Windows overall is a complex and heavily controlling OS. Greetings from TJ |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
If it works then I leave it as is. Always the best advice. Unfortunately I test stupid problems and get errors for my efforts. Today, while testing something and looking into another issue/fix, I had to suspend WU's. This caused the driver to restart two or three times, and then I got a blue screen. On reboot there were lots of C+ errors, and all my running WU's crashed and burned. Not an issue if I had been running FightMalaria@home, but I was running 5 climate models and lost several hundred hours - scunnered! Possible fix here - works for me. FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
I was running 5 climate models and lost several hundred hours - scunnered! Ouch. |
|
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0
|
I have read here in the forum many times that suspending a GPUGRID WU can cause an error and a blue screen. That is why I have never tried it. For Albert@Home and Einstein@Home it can be done without harm (in my case). Greetings from TJ |
[PUGLIA] kidkidkid3 Joined: 23 Feb 11 Posts: 101 Credit: 1,589,743,957 RAC: 360
|
Hi all, after 25 hours (72% completed) I suspended this NOELIA WU. On resume I got this error, and I then aborted the task because it restarted from 0%. Does anyone have an idea about this? Thanks in advance. K. http://www.gpugrid.net/result.php?resultid=6879994 Dreams do not always come true. But not because they are too big or impossible. Why did we stop believing. (Martin Luther King) |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
I think it's pretty safe to say that with the current Noelias suspending a WU almost certainly triggers a driver reset. For me this has taken down a few hours of Einstein work, twice. Now I do my testing whenever I have other WUs running. Not ideal, but better than the alternative. @Jozef: when was the last time that throwing insults at people actually helped you? Looking at your tasks I can see that in the last 2 weeks you had 3 Noelias and 2 Nathans fail with computation errors. That's unfortunate, but not unusual, and you can be sure the scientists are looking into it. But it's nowhere near the scale of the global conspiracy which you seem to suspect. Actually, everyone gets bonus credits for each long-run WU, as the risk of losing them is higher than for shorter WUs. So you're being compensated for a certain failure rate from the start. MrS Scanning for our furry friends since Jan 2002 |
|
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0
|
The suspend-restart blue screen has never happened to me, and I suspend quite often (Windows XP Pro x64). Maybe it's an OS-specific issue. I also have my checkpoints set to 900 seconds (15 minutes); I did this mainly for the climate models I run. I do have problems when finishing a SDOERR and starting a NOELIA on the same GPU: no crashes, just the card running wild on the GPU clock. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
I've only seen the 'suspend & crash' problem on W7. Seeing as different OSes handle the drivers differently, it's bound to be OS related. On XP I think you still can't set Prefer maximum performance in the NVidia control panel - that might explain the 'card running wild on the GPU clock' issue. Anyway, it's a driver issue; they took that feature away. FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
I've only seen the 'suspend & crash' problem on W7. Add W8 to that! MrS Scanning for our furry friends since Jan 2002 |
|
Joined: 23 Dec 09 Posts: 189 Credit: 4,798,881,008 RAC: 0
|
This caused the driver to restart two or three times, and then I got a blue screen. On reboot lots of C+ errors and all my running WU's crashed and burned. Not an issue if I had been running FightMalaria@home, but I was running 5 climate models and lost several hundred hours - scunnered! Sometimes I think it is not necessarily the GPUGRID WUs that cause the blue screen; it might also be the CLIMATEPREDICTION.NET WUs. I just had a blue screen around the same time as you, after which one of the CLIMATEPREDICTION.NET WUs no longer worked, while the GPUGRID WU continued. However, it mostly happens on a system with a GTX 570 card. |
|
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0
|
The climate models (CPDN) are very, very sensitive to any kind of an interruption. When I set my checkpoints to every 15 minutes, my computation error rate dropped by 70% and if I do 3 or more suspend/restarts within 10 minutes, I'll get at least 1 error. When I reboot my computers (every 200 hours), I suspend the tasks and close BOINC by clicking exit; that works every time for me. If I get any kind of a crash or system freeze (neither has happened in over a year), it's a guarantee that I will lose some climate models. I even have new APC units just in case. |
Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
The climate models (CPDN) are very, very sensitive to any kind of an interruption. When I set my checkpoints to every 15 minutes, my computation error rate dropped by 70% and if I do 3 or more suspend/restarts within 10 minutes, I'll get at least 1 error. I've been wondering about CPDN, because the people reporting crashes often mention that they lose CPDN work. I'm not running that project and have also never had any hard crashes, nothing but some ACEMD errors on certain WU types. Nothing else running on the machine is ever affected. |
|
Joined: 5 May 13 Posts: 187 Credit: 349,254,454 RAC: 0
|
Hi all, My GTX 650Ti is currently working on a NOELIA, but the GPU utilization looks pretty low: elapsed 10h, remaining 17h. That will be a total of 27 hours! A previous SDOERR took 18h. I use Linux and so can't observe GPU utilization directly, but judging by the temperature (50C), the GPU is clearly not being fully used. It reaches >60C when it is. Has anyone else observed this? Edit: The WU's process (acemd.2868) consumes 5-10% CPU, but this doesn't seem to be the cause of the under-utilization, but rather a symptom of it. I tried suspending CPU tasks and projects and it didn't change anything. My configuration: Ubuntu Server 12.04 x86_64, Kernel 3.2.0-41-generic, NVIDIA driver 319.17, BOINC 7.0.65 |
Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
My GTX 650Ti is currently working on a NOELIA, but the GPU utilization looks pretty low: elapsed 10h, remaining 17h. That will be a total of 27 hours! A previous SDOERR took 18h. On my 650 Ti GPUs (and others) the GPU utilization runs 5-6% lower on NOELIA and NATHAN_KID WUs than on SDOERR WUs. (Win7-64) |
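
For anyone who wants to redo the Linux Mint vs. XP comparison from earlier in the thread with their own tasks, here is a minimal sketch in Python. The run times are the "run time" values quoted in that post; the function and variable names (`slowdown_percent`, `extra_minutes`, `RUN_TIMES`) are made up for illustration and are not part of BOINC or GPUGRID. It assumes the two tasks being compared are of the same type and ran on the same card, as in that post.

```python
# Run times in seconds, copied from the task list quoted earlier in the thread.
# The dictionary layout and all names here are illustrative assumptions.
RUN_TIMES = {
    "NOELIA_klebe": {"linux": 36653.31, "xp": 38796.59},
    "NATHAN_dhfr36": {"linux": 17866.73, "xp": 18676.30},
}


def slowdown_percent(baseline_s, other_s):
    """Return how much longer other_s is than baseline_s, in percent."""
    return (other_s / baseline_s - 1.0) * 100.0


def extra_minutes(baseline_s, other_s):
    """Return the absolute run-time difference in minutes."""
    return (other_s - baseline_s) / 60.0


def summarize(run_times):
    """Map each task type to (percent slower on XP, extra minutes on XP)."""
    rows = {}
    for task, times in run_times.items():
        rows[task] = (
            slowdown_percent(times["linux"], times["xp"]),
            extra_minutes(times["linux"], times["xp"]),
        )
    return rows


if __name__ == "__main__":
    for task, (pct, mins) in summarize(RUN_TIMES).items():
        print(f"{task}: XP took {pct:.1f}% longer ({mins:.1f} min)")
```

With only one task of each type per OS this says nothing about task-to-task variation, so treat the percentages as a rough indication rather than a benchmark.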