Message boards : Number crunching : BOINC manager v7.8.2 has been released

Author Message
Retvari Zoltan
Joined: 20 Jan 09
Posts: 2353
Credit: 16,339,242,850
RAC: 5,288,795
Level
Trp
Message 47866 - Posted: 12 Sep 2017 | 23:12:47 UTC

You can download it from Berkeley's website.

kain
Joined: 3 Sep 14
Posts: 152
Credit: 866,451,384
RAC: 2,752,415
Level
Glu
Message 47869 - Posted: 13 Sep 2017 | 23:28:03 UTC

What are the changes? I can't find any info about it (?).

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 9,189,076,736
RAC: 19,067,520
Level
Tyr
Message 47870 - Posted: 13 Sep 2017 | 23:40:27 UTC - in response to Message 47866.

You can download it from Berkeley's website.

But I advise you not to bother, unless you're a masochist. It's riddled with bugs.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2353
Credit: 16,339,242,850
RAC: 5,288,795
Level
Trp
Message 47871 - Posted: 13 Sep 2017 | 23:48:39 UTC - in response to Message 47870.

You can download it from Berkeley's website.

But I advise you not to bother, unless you're a masochist. It's riddled with bugs.

Oh, I've just updated my hosts with this version.
Why was it released if it's full of bugs?
(that would look nice on the change list: "We've put more bugs in it than we and you combined could imagine")

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 9,189,076,736
RAC: 19,067,520
Level
Tyr
Message 47872 - Posted: 14 Sep 2017 | 9:09:55 UTC - in response to Message 47871.

Why was it released if it's full of bugs?

That's a very deep question, and needs some background.

Somewhat more than two years ago, the US Government's NSF decided not to renew a research grant which paid the salaries of the three key workers who managed and maintained the BOINC project. Those workers lost their jobs.

The BOINC project was - nominally - handed over to the community to manage and maintain, but no preparation had been done: the community wasn't ready to receive it. It dropped into their lap, and they did nothing with it.

Two months ago, elements of the community came together in a working group to prepare procedures through which the community could pick up the responsibility which had been thrust upon it. I wrote about it here: I'm still a member of the group, and our work is ongoing.

In the meantime, the NSF has funded a new project which expands and builds upon the original BOINC project (described here). That needs some additional features in BOINC, and development work has started again.

The v7.8 branch/release is really intended as a simple refresh to ensure that there is a stable base for the new NSF work (which won't appear until v7.10), and to apply some needed updates like a new version of VBox compatible with Windows 10. But life is never as simple as that...

In the past, BOINC version releases have gone through a slow and extended process of alpha test releases, debugging, and re-releasing. It's taken months. The working group is moving towards more modern software practices where testing is a continual (and largely automated) process as code is written: that should allow new versions to be deployed practically 'on demand' as circumstances - like Windows and OS X updates - require them.

The v7.8.2 release is the first attempt at combining the old and the new ways of working. It's revealed where the gaps lie, and we're working to fix them: I had my own first bugfix accepted into the master codebase this morning - yay! Only 19 left to go...

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2353
Credit: 16,339,242,850
RAC: 5,288,795
Level
Trp
Message 47873 - Posted: 14 Sep 2017 | 11:44:45 UTC - in response to Message 47872.

Thank you for your work, and for this explanation.

gianni
Joined: 11 Jul 08
Posts: 18
Credit: 105,098
RAC: 0
Level

Message 47874 - Posted: 14 Sep 2017 | 14:09:19 UTC - in response to Message 47873.

Nice to know. Keep us posted.

gdf

wiyosaya
Joined: 22 Nov 09
Posts: 114
Credit: 589,114,683
RAC: 0
Level
Lys
Message 47875 - Posted: 14 Sep 2017 | 14:14:44 UTC
Last modified: 14 Sep 2017 | 14:15:23 UTC

I've updated on Win 10 x64 creator's edition, and I have not noticed any bugs.

Thanks for your dedication, Richard!

gianni
Joined: 11 Jul 08
Posts: 18
Credit: 105,098
RAC: 0
Level

Message 47876 - Posted: 14 Sep 2017 | 15:07:46 UTC - in response to Message 47874.

In particular, we would like to know the status of virtualization, because we are trying to set up a CPU application in GPUGrid for which there is only a Linux distribution.


g

kain
Joined: 3 Sep 14
Posts: 152
Credit: 866,451,384
RAC: 2,752,415
Level
Glu
Message 47878 - Posted: 14 Sep 2017 | 15:28:35 UTC

Ok, so what are the changes?

There is no info about this version in the release notes.

[CSF] Aleksey Belkov
Joined: 26 Dec 13
Posts: 86
Credit: 1,277,553,450
RAC: 308,082
Level
Met
Message 47879 - Posted: 15 Sep 2017 | 10:10:03 UTC - in response to Message 47878.
Last modified: 15 Sep 2017 | 10:14:32 UTC

Ok, so what are the changes?

There is no info about this version in the release notes.


At the moment there is no official list of changes.

"Ageless" wrote:
I asked the release manager and he hopes someone else will do them. So it's anyone's guess.
I have an unofficial change log thread over on the BOINC forums: https://boinc.berkeley.edu/dev/forum_thread.php?id=11539 but we're not sure all those changes between 7.6.33 and 7.7.2 and BOINC 7.8.0 are in there. Only the release manager knows, and he won't tell.

Source: https://boinc.berkeley.edu/dev/forum_thread.php?id=11818

Going by the lists above, there are no new "features" or serious bug fixes (for Windows/Linux).

Surprisingly, even the OpenSSL library has not been updated: it is stuck on version 1.0.2g (released 1 Mar 2016), and the bundle of root certificates (ca-bundle.crt) dates from 30 April 2015.
Once again I had to update them manually :/

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 47880 - Posted: 16 Sep 2017 | 8:17:35 UTC
Last modified: 16 Sep 2017 | 8:39:23 UTC

I have (or had) BOINC 7.8.2 installed on three Ubuntu 17.04 machines and one Win7 64-bit machine with no problems. That is, until BOINC crashed (manager could not connect to client) on one of the Ubuntu machines. Even after a reboot it did not work, which I don't recall ever seeing before. So I uninstalled BOINC, and went back to 7.6.33, but it was still borked. The only other thing I can think of is that VirtualBox 5.1.28 was installed, but not attached to any projects, and removing it did not fix anything.

I would not normally mention it here (GPUGrid was not in use on that machine), but it is so unusual that maybe it has something to do with the BOINC 7.8.2 bugs. I have to reinstall the OS, and will stick with 7.6.33.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 9,189,076,736
RAC: 19,067,520
Level
Tyr
Message 47881 - Posted: 16 Sep 2017 | 9:01:58 UTC - in response to Message 47880.

I don't think that's a known problem with v7.8.2 (most of the bugs are more subtle than a downright crash), but it's hard to be sure from the information given. The most common cause of 'manager couldn't connect to client' is that the client isn't running: I'm not sure exactly what logs you get in Linux when an application tries, but fails, to start - that would have been the place to look, but it's probably water under the bridge now.

In the meantime, Jord has posted the most significant change logs known so far.

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 47882 - Posted: 16 Sep 2017 | 9:44:16 UTC - in response to Message 47881.
Last modified: 16 Sep 2017 | 10:19:20 UTC

OK, I thought it might be a bit much for a mere bug. I have not reinstalled yet, so I will look for any logs, and post as necessary.

Normally, I would expect a severe problem to take out the OS, but it seems that only BOINC was affected. I was running only CPDN and WCG, which should be relatively benign, though CPDN requires installation of the 32-bit libraries. That has worked for months on two machines, but I suppose it is possible that a bad work unit might trigger something.

klepel
Joined: 23 Dec 09
Posts: 189
Credit: 4,732,827,502
RAC: 880,227
Level
Arg
Message 47884 - Posted: 16 Sep 2017 | 14:10:52 UTC - in response to Message 47882.
Last modified: 16 Sep 2017 | 14:12:35 UTC

I experienced the exact same thing yesterday too (as Jim1348)!
Lubuntu 16.04
BOINC 7.6.33
AMD Ryzen 1700X (it is my only Linux box, and you can still see the abandoned gpugrid task in the log)
Running climateprediction and gpugrid, maybe a primegrid WU on the CPU as well.
This machine had been running rock-solid for several months without a restart or any other intervention on my part. Then the manager suddenly disconnected from the BOINC client, and I have not been able to reconnect since.
I reinstalled everything and upgraded to Lubuntu 17.04, as I assumed this might have been the problem. The computer is running again.
Very strange!

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 9,189,076,736
RAC: 19,067,520
Level
Tyr
Message 47886 - Posted: 16 Sep 2017 | 14:46:15 UTC

And behold, after we'd all written that, we get a report of an Apple client crashing at startup.

That user posted:

After some digging, I found that the client is crashing pretty much immediately, with the following stack trace on thread 0:

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libsystem_kernel.dylib 0x00007fff90e0fd42 __pthread_kill + 10
1 libsystem_pthread.dylib 0x00007fff90efd457 pthread_kill + 90
2 libsystem_c.dylib 0x00007fff90d754bb __abort + 140
3 libsystem_c.dylib 0x00007fff90d75d7e __stack_chk_fail + 205
4 boinc 0x0000000100053d9d 0x100000000 + 343453
5 boinc 0x00000001000531e4 0x100000000 + 340452
6 boinc 0x00000001000520e6 0x100000000 + 336102
7 boinc 0x0000000100051ce0 0x100000000 + 335072
8 boinc 0x000000010002d80d 0x100000000 + 186381
9 boinc 0x000000010002dad5 0x100000000 + 187093
10 boinc 0x0000000100001034 0x100000000 + 4148

If anyone can get a similar report out of your Linux crashes (Linux and OS X are pretty similar behind the fancy graphics, apparently), could you post it here, please?

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 47887 - Posted: 16 Sep 2017 | 15:49:30 UTC - in response to Message 47886.

I found and saved my BOINC logs, but don't know which one it is. But I have wiped out the OS, so can't do a "stack trace" now. Does that do you any good?

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 9,189,076,736
RAC: 19,067,520
Level
Tyr
Message 47888 - Posted: 16 Sep 2017 | 17:08:03 UTC - in response to Message 47887.

I found and saved my BOINC logs, but don't know which one it is. But I have wiped out the OS, so can't do a "stack trace" now. Does that do you any good?

Logs on their own probably won't help, although stderrdae.txt and stdoutdae.txt might have something. Could you look at the end of those files, please, and see if there's any mention of a failure around the time your problems started?
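For anyone wanting to do that check without scrolling through the whole file, here is a minimal Python sketch. It is only an illustration, not a BOINC tool: the keyword list, the function names and the data directory path (/var/lib/boinc-client, which I am assuming for a packaged Linux install) are my own choices and may need adjusting.

```python
from collections import deque
from pathlib import Path

# Hypothetical keyword list; extend it with whatever failure text you expect.
KEYWORDS = ("SIGABRT", "SIGSEGV", "buffer overflow", "Stack trace", "abort")


def tail_lines(path, count=200):
    """Return the last `count` lines of a text file, without line endings."""
    with open(path, "r", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


def find_failures(lines, keywords=KEYWORDS):
    """Return (index, text) pairs for lines that mention any keyword."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        (index, line)
        for index, line in enumerate(lines)
        if any(keyword in line.lower() for keyword in lowered)
    ]


if __name__ == "__main__":
    # Assumed location of BOINC's data directory; yours may differ.
    data_dir = Path("/var/lib/boinc-client")
    for name in ("stderrdae.txt", "stdoutdae.txt"):
        log = data_dir / name
        if log.exists():
            for index, text in find_failures(tail_lines(log)):
                print(f"{name}:{index}: {text}")
```

The index printed is relative to the tail that was read, not to the start of the file, so treat it as a rough pointer only.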

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 47889 - Posted: 16 Sep 2017 | 17:23:40 UTC - in response to Message 47888.

In stderrdae.txt, I see "buffer overflow" at the beginning and " [vsyscall] SIGABRT: abort called Stack trace (18 frames):" at the end, with a whole lot in between.

Here is the last part in full, but if you need more, let me know:

[vsyscall]
SIGABRT: abort called
Stack trace (18 frames):
/usr/lib/x86_64-linux-gnu/libboinc.so.7(boinc_catch_signal+0x1d8)[0x7fd5504932ac]
/lib/x86_64-linux-gnu/libc.so.6(+0x357f0)[0x7fd5507287f0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x9f)[0x7fd55072877f]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7fd55072a37a]
/lib/x86_64-linux-gnu/libc.so.6(+0x79090)[0x7fd55076c090]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x54)[0x7fd55080df84]
/lib/x86_64-linux-gnu/libc.so.6(+0x118f00)[0x7fd55080bf00]
/lib/x86_64-linux-gnu/libc.so.6(+0x1184b9)[0x7fd55080b4b9]
/lib/x86_64-linux-gnu/libc.so.6(_IO_default_xsputn+0xa9)[0x7fd5507709a9]
/lib/x86_64-linux-gnu/libc.so.6(_IO_vfprintf+0x1ccc)[0x7fd55074255c]
/lib/x86_64-linux-gnu/libc.so.6(__vsprintf_chk+0x84)[0x7fd55080b544]
/lib/x86_64-linux-gnu/libc.so.6(__sprintf_chk+0x7d)[0x7fd55080b49d]
/usr/bin/boinc(+0x9ddec)[0x560aa99fcdec]
/usr/bin/boinc(+0x7f9a9)[0x560aa99de9a9]
/usr/bin/boinc(+0x39e70)[0x560aa9998e70]
/usr/bin/boinc(+0xc4a9)[0x560aa996b4a9]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7fd5507133f1]
/usr/bin/boinc(+0xec9a)[0x560aa996dc9a]

Exiting...



As for stdoutdae.txt, I just see the usual parameters such as upload and download rate, with no obvious problems. I will be happy to zip and send you both of them, if there is a way.
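A hedged aside on reading a trace like the one above: the frames run from the signal handler down to the program's entry point, and the run of __sprintf_chk, __vsprintf_chk, __fortify_fail and abort suggests that glibc's checked sprintf detected an overflow in a call made from /usr/bin/boinc. The small Python sketch below just splits each frame into its parts and picks out the first frame that belongs to the boinc executable. The regular expression and function names are my own, written against the line format shown here, and are not part of BOINC.

```python
import re

# Matches frames of the form: module(symbol+0xoffset)[0xaddress]
# The symbol may be empty, as in "/usr/bin/boinc(+0x9ddec)[0x560aa99fcdec]".
FRAME = re.compile(
    r"^(?P<module>[^\s(]+)"
    r"\((?P<symbol>[^+)]*)(?:\+(?P<offset>0x[0-9a-fA-F]+))?\)"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    """Split one stack-trace line into its parts, or return None if it is not a frame."""
    match = FRAME.match(line.strip())
    if match is None:
        return None
    offset = match["offset"]
    return {
        "module": match["module"],
        "symbol": match["symbol"] or None,
        "offset": int(offset, 16) if offset else None,
        "address": int(match["address"], 16),
    }


def first_frame_in(lines, module_suffix="/boinc"):
    """Return the first parsed frame whose module path ends with `module_suffix`."""
    for line in lines:
        frame = parse_frame(line)
        if frame and frame["module"].endswith(module_suffix):
            return frame
    return None


if __name__ == "__main__":
    trace = [
        "/lib/x86_64-linux-gnu/libc.so.6(__sprintf_chk+0x7d)[0x7fd55080b49d]",
        "/usr/bin/boinc(+0x9ddec)[0x560aa99fcdec]",
    ]
    frame = first_frame_in(trace)
    print(frame["module"], hex(frame["offset"]))
```

The offset found this way is the kind of value that could be handed to a symbolizer such as addr2line, but only if a matching build with debug symbols is available, which I am not assuming here.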

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 9,189,076,736
RAC: 19,067,520
Level
Tyr
Message 47890 - Posted: 16 Sep 2017 | 18:23:09 UTC - in response to Message 47889.

I don't know yet whether they'll be any help, but please keep them in a safe place in case we need to call for them. Just to be certain, can you be sure (from file timestamps or however) that this stack trace comes from the time when you were running v7.8.2 under, I think you said, Ubuntu 17.04?

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 47891 - Posted: 16 Sep 2017 | 20:11:56 UTC - in response to Message 47890.
Last modified: 16 Sep 2017 | 20:17:26 UTC

I don't know yet whether they'll be any help, but please keep them in a safe place in case we need to call for them. Just to be certain, can you be sure (from file timestamps or however) that this stack trace comes from the time when you were running v7.8.2 under, I think you said, Ubuntu 17.04?

I have been running Ubuntu 17.04 for several weeks, and BOINC 7.8.2 since at least 12 September, which I know from the CPDN results page; probably longer, though I can't tell from the file dates on stderrdae.txt and stdoutdae.txt since they were lost on copying. As for the time stamps, I don't really know, except that the first thing that looks like one is
======= Memory map: ========
564646ee0000-564646fc6000 r-xp 00000000 08:05 3145802 /usr/bin/boinc


and the last one is
7fd550ce0000-7fd550ce1000 rw-p 00026000 08:05 2752569 /lib/x86_64-linux-gnu/ld-2.24.so


If that is referring to 08:05 UTC (04:05 EDT), then that is the right time, or for the reboot after I detected it if BOINC was still operational at that point. That would not be more than a couple of hours after it occurred. Beyond that, I will certainly save all the logs and you can PM me here or on BOINC and I will be glad to send them for your expert inspection.
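A note on reading those memory-map lines, offered as a sketch rather than a diagnosis: they follow the Linux /proc/<pid>/maps layout, whose fields are address range, permissions, file offset, device (major:minor, in hex), inode and pathname. On that reading, 08:05 is a device number rather than a time of day, so it probably cannot be used to date the crash. The Python below splits one such line into named fields; the function name and the dictionary keys are my own.

```python
def parse_maps_line(line):
    """Split one memory-map line into its fields (layout as in /proc/<pid>/maps)."""
    parts = line.split(None, 5)
    if len(parts) < 5:
        raise ValueError(f"not a memory-map line: {line!r}")
    address, permissions, offset, device, inode = parts[:5]
    # Anonymous mappings have no pathname, so the sixth field may be absent.
    path = parts[5].strip() if len(parts) > 5 else ""
    start, end = (int(value, 16) for value in address.split("-"))
    major, minor = (int(value, 16) for value in device.split(":"))
    return {
        "start": start,
        "end": end,
        "permissions": permissions,
        "offset": int(offset, 16),
        "device": (major, minor),  # a device number, not hours and minutes
        "inode": int(inode),
        "path": path,
    }


if __name__ == "__main__":
    entry = parse_maps_line(
        "564646ee0000-564646fc6000 r-xp 00000000 08:05 3145802 /usr/bin/boinc"
    )
    print(entry["device"], entry["inode"], entry["path"])
```

On a typical Linux system, major number 8 belongs to SCSI/SATA disks, so 08:05 would usually mean the fifth partition of the first such disk; that is an inference from the usual numbering, not something the log itself states.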

MarkJ
Volunteer moderator
Volunteer tester
Joined: 24 Dec 08
Posts: 738
Credit: 200,909,904
RAC: 0
Level
Leu
Message 47892 - Posted: 16 Sep 2017 | 22:15:36 UTC
Last modified: 16 Sep 2017 | 22:25:12 UTC

In later Linux kernels vsyscall is disabled. I'm running Debian and can't go past the 4.9 kernel (without fiddling) due to it. Ubuntu 17.04 ships with the 4.10 kernel as default. My machines are Ryzen 1700 and running BOINC 7.8.2 from the Stretch-backports repo. See this thread at Einstein.
____________
BOINC blog

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 9,189,076,736
RAC: 19,067,520
Level
Tyr
Message 47895 - Posted: 19 Sep 2017 | 10:33:53 UTC - in response to Message 47880.

I have (or had) BOINC 7.8.2 installed on three Ubuntu 17.04 machines and one Win7 64-bit machine with no problems. That is, until BOINC crashed (manager could not connect to client) on one of the Ubuntu machines. Even after a reboot it did not work, which I don't recall ever seeing before. So I uninstalled BOINC, and went back to 7.6.33, but it was still borked. The only other thing I can think of is that VirtualBox 5.1.28 was installed, but not attached to any projects, and removing it did not fix anything.

Just reporting back on this one for completeness.

All reported cases of "won't connect, won't run, won't even run with old version" have now been traced to a newly released batch (batch 658) of CPDN climate models - specifically, WAH2 for the PNW region. These tasks all fail after one simulation month under Linux and OS X (CPDN are trying to track down the reason for that - their problem). When the tasks crash, they leave behind a huge crash dump in stderr_txt, and 51 failed upload messages.

BOINC - all current versions - can't cope with that much error information, and fails with the symptoms described here. There are two known recovery routes:

a) Delete the file 'account_climateprediction.net.xml' from BOINC's data directory. This detaches you temporarily from the CPDN project, until the problems are resolved and you can re-attach.

b) Very carefully, edit client_state.xml to remove the <workunit> and <result> sections for any WAH2 PNW tasks you may have. Set 'no new tasks' for CPDN as soon as you get back control of BOINC.
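For route (b), a rough Python sketch of the edit is below, for anyone who would rather not do it by hand. Treat it as an illustration under assumptions, not a tested recovery tool: I am assuming that each task appears as a plain <workunit>...</workunit> or <result>...</result> block containing a <name> (and, for results, a <wu_name>) element, and the name fragment used to pick out the affected tasks is a placeholder you would have to replace after looking at your own file. Stop the BOINC client before touching client_state.xml, and keep the backup copy the sketch makes.

```python
import re
import shutil
from pathlib import Path

# Assumed layout: un-nested <workunit> and <result> blocks without attributes.
BLOCK = re.compile(r"[ \t]*<(workunit|result)>.*?</\1>\n?", re.DOTALL)
NAME = re.compile(r"<(?:name|wu_name)>([^<]*)</(?:name|wu_name)>")


def strip_tasks(state_text, name_fragment):
    """Remove workunit/result blocks whose name contains `name_fragment`.

    Returns the edited text and the list of removed block names.
    """
    removed = []

    def replace(match):
        names = NAME.findall(match.group(0))
        if any(name_fragment in name for name in names):
            removed.append(names[0])
            return ""
        return match.group(0)

    return BLOCK.sub(replace, state_text), removed


if __name__ == "__main__":
    # Run from BOINC's data directory with the client stopped.
    state = Path("client_state.xml")
    fragment = "REPLACE_ME"  # hypothetical placeholder for the affected task names
    if state.exists():
        new_text, removed = strip_tasks(state.read_text(errors="replace"), fragment)
        if removed:
            shutil.copy2(state, state.with_name(state.name + ".bak"))
            state.write_text(new_text)
        print(f"removed {len(removed)} blocks")
```

The sketch only writes the file when something was actually removed, so running it with the placeholder left in place should change nothing.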

BOINC v7.8.2 is NOT, it turns out, implicated in this problem. A fix has been written, and will be included in the next BOINC release - whenever that is.

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 47896 - Posted: 19 Sep 2017 | 13:35:08 UTC - in response to Message 47895.

That is a very nice summary, and I (like a lot of other people) am fortunate that Richard visited this forum at the right time. I would add only that the problem does not appear to affect the Windows version of BOINC on CPDN, though it is not clear why not.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 9,189,076,736
RAC: 19,067,520
Level
Tyr
Message 47897 - Posted: 19 Sep 2017 | 14:11:49 UTC - in response to Message 47896.

I would add only that the problem does not appear to affect the Windows version of BOINC on CPDN, though it is not clear why not.

Because the Windows version of the CPDN application doesn't crash after the first month, and doesn't produce the huge crash dump.

klepel
Joined: 23 Dec 09
Posts: 189
Credit: 4,732,827,502
RAC: 880,227
Level
Arg
Message 47898 - Posted: 19 Sep 2017 | 17:38:23 UTC - in response to Message 47895.

All reported cases of "won't connect, won't run, won't even run with old version" have now been traced to a newly released batch (batch 658) of CPDN climate models - specifically, WAH2 for the PNW region. These tasks all fail after one simulation month under Linux and OS X (CPDN are trying to track down the reason for that - their problem). When the tasks crash, they leave behind a huge crash dump in stderr_txt, and 51 failed upload messages.

BOINC - all current versions - can't cope with that much error information, and fails with the symptoms described here. There are two known recovery routes:[...]

I am pretty sure that this was the problem in my case, as I am running 14 WUs of climateprediction.net alongside gpugrid.net.

It is not the first time that climateprediction.net has shut down one of my computers because a model crashed.

But as I wanted to install Lubuntu 17.04 and overclock my RAM anyway, I was quick to install everything anew.

It has now been working without any problems for three days. For the time being, I will handpick the climateprediction.net WUs.
