WU: OPM simulations

Message boards : News : WU: OPM simulations


Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 1
Level
Trp
Message 43458 - Posted: 20 May 2016, 15:35:00 UTC - in response to Message 43456.  
Last modified: 20 May 2016, 15:35:22 UTC

Interesting that most of the failures were from fast GPUs, even 3x 980Ti and a Titan among others. Are people OCing too much? In the "research" I mentioned above I've noticed MANY 980Ti, Titan and Titan X cards throwing constant failures. Surprised me to say the least.
There are different reasons for those failures, such as missing libraries, overclocking, or incorrect driver installation.
Timeouts are caused by a card that is too slow and/or too many GPU tasks queued from different projects.
ID: 43458
Profile Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Message 43461 - Posted: 20 May 2016, 20:29:12 UTC - in response to Message 43457.  

I have had some similar experiences:
e5s22_e1s14p0f264-GERARD_FXCXCL12R_2189739_2-0-1-RND1099 (5 days):
1. Jonny's desktop with an i7-3930K and two GTX 780s has had 4 successive timeouts

Here's an interesting one:

https://www.gpugrid.net/workunit.php?wuid=11593078

I'm the 8th user to receive this "SHORT" OPM WU originally issued on May 9. The closest to success was by a GTX970 (until the user aborted it). Now it's running on one of my factory OCed 750 Ti cards. That card finishes the GERARD LONG WUs in 25-25.5 hours (yeah, cry me a river). This "SHORT" WU is 60% done and should complete with a total time of about 27 hours.

Show me a GPU that can finish this WU in anywhere near 2-3 hours and I'll show you a fantasy world where unicorns romp through the streets.
ID: 43461
Matt
Joined: 11 Jan 13
Posts: 216
Credit: 846,538,252
RAC: 0
Level
Glu
Message 43462 - Posted: 20 May 2016, 21:23:33 UTC - in response to Message 43450.  

Ubuntu 16.04 has been released recently. I'm looking to try it soon and see if there is a simple way to get it up and running for here; repository drivers + Boinc from the repository. If I can I will write it up. Alas, with every version so many commands change and new problems pop up that it's always a learning process.


Straying a bit off topic again, I'll risk posting this. I consider myself fairly computer literate, having built several PCs and having a little coding experience. However, I have nearly always used Windows. I've been very interested in Linux, but every time I've tried to set up a Linux host for BOINC I've been defeated. Either I couldn't get GPU drivers installed correctly or BOINC was somehow not set up correctly within Linux. If anyone would be willing to put together a step-by-step "Idiot's Guide" it would be HUGELY appreciated.
ID: 43462
Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 1
Level
Trp
Message 43464 - Posted: 20 May 2016, 22:44:28 UTC
Last modified: 20 May 2016, 22:46:54 UTC

ID: 43464
Jim1348

Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 43465 - Posted: 21 May 2016, 1:47:55 UTC - in response to Message 43462.  

I've been very interested in Linux, but every time I've tried to set up a Linux host for BOINC I've been defeated. Either I couldn't get GPU drivers installed correctly or BOINC was somehow not set up correctly within Linux.

Something always goes wrong for me too, and I question my judgement for trying it once again. But I think when Mint 18 comes out it will be worth another go. It should be simple enough (right?).
ID: 43465
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 43466 - Posted: 21 May 2016, 8:54:07 UTC - in response to Message 43465.  
Last modified: 21 May 2016, 9:39:50 UTC

The error rate for the latest GERARD_FX tasks is high, and for the OPM simulations it was even higher. Perhaps this should be looked into.
Application	Unsent	In Progress	Success	Error Rate
Short runs (2-3 hours on fastest card)
SDOERR_opm99	0	60	2412	48.26%

Long runs (8-12 hours on fastest card)
GERARD_FXCXCL12R_1406742_	0	33	573	38.12%
GERARD_FXCXCL12R_1480490_	0	31	624	35.34%
GERARD_FXCXCL12R_1507586_	0	25	581	33.14%
GERARD_FXCXCL12R_2189739_	0	42	560	31.79%
GERARD_FXCXCL12R_50141_	        0	35	565	35.06%
GERARD_FXCXCL12R_611559_	0	31	565	32.09%
GERARD_FXCXCL12R_630477_	0	34	561	34.31%
GERARD_FXCXCL12R_630478_	0	44	599	34.75%
GERARD_FXCXCL12R_678501_	0	30	564	40.57%
GERARD_FXCXCL12R_747791_	0	32	568	36.89%
GERARD_FXCXCL12R_780273_	0	42	538	39.28%
GERARD_FXCXCL12R_791302_	0	37	497	34.78%

Two or three weeks ago the error rate was ~25% to 35%; it's now ~35% to 40%. Maybe this varies with release stage: early in a run, tasks go to everyone, so error rates are higher; later, more tasks go to the most successful cards, so the error rate drops.
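For reference, the error-rate column above is presumably computed over finished results only. This is a sketch under that assumption; `batch_error_rate` and the implied-error arithmetic are mine, not the project's:

```python
# Assumed formula for the table's Error Rate column: errors as a share of
# finished (errored + successful) results, ignoring unsent and in-progress
# tasks. The project may compute it differently.
def batch_error_rate(successes: int, errors: int) -> float:
    finished = successes + errors
    return 100.0 * errors / finished if finished else 0.0

# Example: the SDOERR_opm99 row reports 2412 successes and a 48.26% error
# rate, which under this formula implies roughly 2250 errored results.
implied_errors = round(2412 * 48.26 / (100 - 48.26))
print(implied_errors)                           # → 2250
print(round(batch_error_rate(2412, implied_errors), 2))  # → 48.26
```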

Selecting only choicer systems might have helped with the OPM batch, but it would also have masked the problems. GPUGrid has always faced users' bad-setup problems. If you have 2 or 3 GPUs in a box and don't use temperature/fan-controlling software, or if you overclock the GPU or GDDR too much, there is little the project can do about that (at least for now). It's incredibly simple to install a program such as NVIDIA Inspector and set it to prioritise temperature, yet so few do this. IMO the GPUGrid app should set temperature control by default. However, that's an app-dev issue and probably isn't something Stefan has the time to work on, even if he could do it.

I've noticed some new/rarely-seen-before errors with these WUs, so perhaps that could be looked at too?

On this thread's side-show, 'Linux' (as it might get those 25/26 h runs below 24 h): the problem is that lots of things change with each version, which makes instructions for previous versions obsolete. Try to follow instructions tested under Ubuntu 11/12/13 while working with 15.04/15.10 and you will probably not succeed - the shortcuts have changed, the commands have changed, the security rights have changed, the repo drivers are too old...
I recently tried to get BOINC on an Ubuntu 15.10 system to see an NV GPU that I popped in, without success. I spent ~2 days at this, on and off. The system sees the card and the X server works fine; BOINC just seems oblivious - probably some folder-security issue. I tried to upgrade to 16.04, only to be told (after downloading) that the (default-sized) boot partition is too small... I would probably need to boot into GRUB to repartition - too close to brain surgery to go down that route. I thought it would be faster and easier to format and install 16.04. I downloaded an image onto a W10 system, but it took half a day to find an external DVD writer and I still can't find a DVD I can write the image to (~1.4 GB)...
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help
ID: 43466
Betting Slip

Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Level
Phe
Message 43467 - Posted: 21 May 2016, 9:21:10 UTC - in response to Message 43466.  
Last modified: 21 May 2016, 9:42:02 UTC

If the project denied WUs to machines that continually error out and time out, we could get that error rate below 5%.

And here's another one https://www.gpugrid.net/workunit.php?wuid=11600492
ID: 43467
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 43469 - Posted: 21 May 2016, 9:49:09 UTC - in response to Message 43467.  
Last modified: 21 May 2016, 9:56:03 UTC

I agree with that, no doubt, but where do you draw the line? Systems with 50% failures? 30%? 10%? I still think a better system needs to be introduced to exclude bad systems until the user responds to a PM/notice/email. It could accommodate resolution of such problems and facilitate crunching again once resolved (helping both the cruncher and the project). Sometimes you just get an unstable system on which every task fails until it is restarted, and then it works fine again; even that could and should be accommodated. More often it's a bad setup - wrong drivers, heavy OC/bad cooling, wrong config/ill-advised use - but occasionally the card's a dud or it's something odd that's difficult to work out.

Some time ago I suggested having a test app which could be sent to such systems, say after a reboot or user reconfiguration. The purpose would be to test that the card/system is actually capable of running a task; a 10-minute test task would be sufficient to assess the system's basic capabilities. After that, one task could be sent to the system, and if it succeeded in completing that task its status could go back to normal, or say 50% successful.
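That flow can be sketched as a small state machine. This is a hypothetical illustration of the proposal above, not BOINC's or GPUGrid's actual scheduler logic; all state and task names are invented:

```python
# Hypothetical sketch of the test-app scheme: a flagged host must pass a short
# test task, then one probationary real task, before being fully restored.
from dataclasses import dataclass

NORMAL, FLAGGED, TESTING, PROBATION = "normal", "flagged", "testing", "probation"

@dataclass
class Host:
    state: str = NORMAL

    def flag(self):
        """Too many failures: stop sending real work to this host."""
        self.state = FLAGGED

    def next_work(self):
        """Decide what the scheduler would send this host next."""
        if self.state == FLAGGED:
            self.state = TESTING
            return "10min-test-task"
        if self.state == TESTING:
            return "10min-test-task"   # still waiting on the test result
        if self.state == PROBATION:
            return "single-real-task"  # one task to confirm recovery
        return "real-task"

    def report(self, success: bool):
        """Advance the state machine on a returned result."""
        if not success:
            self.state = FLAGGED
        elif self.state == TESTING:
            self.state = PROBATION     # test passed: allow one real task
        elif self.state == PROBATION:
            self.state = NORMAL        # probation task passed: fully restored

h = Host()
h.flag()
print(h.next_work())  # the flagged host is sent the short test task first
```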

IMO the BOINC system for this was devised for CPUs and isn't good enough for GPUs, so this should be done either in some sort of GPU module or by the project.
ID: 43469
sis651

Joined: 25 Nov 13
Posts: 66
Credit: 282,724,028
RAC: 62
Level
Asn
Message 43470 - Posted: 21 May 2016, 9:53:32 UTC - in response to Message 43462.  

I've been using Kubuntu for 2 - 3 years.
I downloaded Boinc from here, development version 7.4.22:
https://boinc.berkeley.edu/download_all.php

I just install the Nvidia drivers from the driver-manager page of System Settings; those are the drivers in the Ubuntu package repository. Sometimes they're not up to date, but they work fine. Or sometimes I use the Muon package manager to install some more Nvidia-related packages.

I use an Nvidia Optimus supported notebook, which means the Nvidia GPU is secondary: it just renders an image and sends it to the Intel GPU to be displayed on the screen. Thus I use the Prime and Bumblebee packages. Configuring them can sometimes be problematic, but usually there are no issues, and once done it works for months until the next Kubuntu version. In fact the issue is that BOINC runs on the Nvidia GPU but CUDA detection doesn't happen; by installing some other packages and running a few more commands, everything works well...

I can try to help in case you try with Ubuntu/Kubuntu.
ID: 43470
Betting Slip

Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Level
Phe
Message 43471 - Posted: 21 May 2016, 9:58:08 UTC - in response to Message 43469.  
Last modified: 21 May 2016, 10:05:53 UTC

I agree on that, no doubt, but where do you draw the line? 50% failure systems, 30% 10%...? I still think a better system needs to be introduced to exclude bad systems until the user responds to a PM/Notice/Email... It could accommodate resolution to such problems and facilitate crunching again once resolved (helps the cruncher and the project). Sometimes you just get an unstable system on which every task fails until it is restarted and then it works again fine, but even that could and should be accommodated. More often it's a bad setup; wrong drivers, heavy OC/bad cooling, wrong config/ill-advise use, but occasionally the card's a dud or it's something odd/difficult to work out.


I think you would have to do an impact assessment of what level of denial produces benefits for the project, and at what point that plateaus and turns into a negative impact. With the data this project already has, that shouldn't be difficult.

Totally agree with the idea of a test unit. Very good idea.

If it is only to last 10 minutes, then it must be rigorous enough to make a bad card/system fail very quickly, and it must have a short completion deadline.
ID: 43471
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 43474 - Posted: 21 May 2016, 10:44:35 UTC - in response to Message 43471.  
Last modified: 21 May 2016, 11:08:32 UTC

The majority of failures tend to be almost immediate (<1 min). If the system could deal with those it would be of great benefit, even if it can't do much more.

Maybe with a two-test system you could set the 1st task to high priority (run ASAP) to test actual functionality. With the second test task, set a deadline of 2 days but send a server abort after 1 day to exclude people who keep a long queue. They would never run the 2nd task, and that would prevent many bad systems from hogging tasks, which is the second-biggest problem IMO. A notice/email/PM and an FAQ recommendation would give them the opportunity to reduce their queue/task cache.
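The timing of that second test task could look something like this sketch; the 2-day deadline and 1-day server abort are the values suggested in this post, and the function name is hypothetical:

```python
from datetime import datetime, timedelta

DEADLINE = timedelta(days=2)      # deadline suggested above
ABORT_AFTER = timedelta(days=1)   # server abort if the task hasn't started

def server_action(sent_at: datetime, started: bool, now: datetime) -> str:
    """What the server would do with the 2nd test task at time `now`."""
    if not started and now - sent_at >= ABORT_AFTER:
        return "server-abort"     # host keeps a long queue: filter it out
    if now - sent_at >= DEADLINE:
        return "timed-out"
    return "keep-waiting"

sent = datetime(2016, 5, 21)
print(server_action(sent, False, sent + timedelta(days=1)))  # → server-abort
```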

Heat/cooling/OC-related failures would take a bit longer to identify, but cards heat up quickly if they are not properly cooled. How long they run before failing is a bit random, but the failure rate would increase with time. Unfortunately you also get half-configured systems: 3 cards, two set up properly, one a cooker. Whatever else is running would also affect temps, but dealing with situations like that isn't a priority.
ID: 43474
Betting Slip

Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Level
Phe
Message 43475 - Posted: 21 May 2016, 11:16:40 UTC - in response to Message 43474.  
Last modified: 21 May 2016, 11:17:39 UTC


Maybe with a 2 Test system you could set the 1st task with high priority to test actual functionality. With the second Test task set a deadline of 2days but send a server abort after 1day to exclude people who keep a long queue? They would never run the 2nd task but that would prevent people hogging tasks, which is the second biggest problem IMO. A Notice/email/PM would give them the opportunity to reduce their cache.


Once again I totally agree. And to address your other question:

I agree on that, no doubt, but where do you draw the line? 50% failure systems, 30% 10%...?


I think you have to be brutal in your approach and give this project "high standards" instead of "come one, come all". This project is already an elite one, based on the core contributors and the money, time and effort they put into it.

Bad hosts hog WUs, slow results, and deprive good hosts of work, which frustrates good hosts you may then lose, and may keep new ones from joining. So "raise the bar" and turn this into a truly elite project; we all know people want to go to top clubs, restaurants, universities, etc.

Heat/cooling/OC related failures would take a bit longer to identify but cards heat up quickly if they are not properly cooled. How long they run before failing is a bit random but would increase with time. Unfortunately you also get half-configured systems; 3 cards, two setup properly, one cooker. What else is running would also impact on temp, but dealing with situations like that isn't a priority.


They can get help via the forums as usual, but as far as the project is concerned you can't make their problem your problem.
ID: 43475
Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 1
Level
Trp
Message 43476 - Posted: 21 May 2016, 12:21:47 UTC - in response to Message 43475.  

Some kind of test is a very good idea, but it would have to be done on a regular basis on every host, even the reliable ones, as I think this test should watch GPU temperatures as well; if GPU temps are too high (above 85°C), the given host should be excluded.

I agree on that, no doubt, but where do you draw the line? 50% failure systems, 30% 10%...?

I think you have to be brutal in your approach and give this project "high standards" instead of "come one come all". This project is already an elite one based on the core contributors and the money, time and effort they put into it.
The actual percentage could be set by scientific means based on the data available to the project, but there should be a time limit for the ban, a manual override for the user, and a regular re-evaluation of banned hosts. I would set it to 10%.
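Put together, that ban policy (threshold, time-limited ban, manual override, periodic re-evaluation) could be sketched like this. The 10% threshold is the poster's suggestion; the 14-day ban length is an assumed value, since the post names none:

```python
from datetime import datetime, timedelta

ERROR_THRESHOLD = 0.10           # the 10% suggested above
BAN_LENGTH = timedelta(days=14)  # assumed ban duration before re-evaluation

def error_rate(errors: int, successes: int) -> float:
    finished = errors + successes
    return errors / finished if finished else 0.0

def evaluate(errors, successes, banned_at, now, manual_override=False):
    """Periodic re-evaluation of one host: return (banned, ban_start)."""
    if manual_override:
        return False, None                # the user's manual override
    rate = error_rate(errors, successes)
    if banned_at is not None:
        if now - banned_at >= BAN_LENGTH and rate <= ERROR_THRESHOLD:
            return False, None            # ban expired and host recovered
        return True, banned_at            # still banned
    if rate > ERROR_THRESHOLD:
        return True, now                  # start a new time-limited ban
    return False, None

now = datetime(2016, 5, 21)
banned, since = evaluate(3, 17, None, now)            # 15% error rate
print(banned)                                         # → True
banned, _ = evaluate(1, 19, since, now + timedelta(days=15))
print(banned)                                         # → False (recovered)
```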

Bad hosts hog WU's, slow results and deprive good hosts of work which, frustrates good hosts which, you may lose and may keep new ones from joining so "raise the bar" and turn this into a truly elite project, we all know people want to go to TOP clubs, restaurants, universities etc.
I partly agree. I think there should be a queue available only to "elite" (reliable & fast) users or hosts, but it should basically contain the same type of work as the "normal" queue, with the batches kept separate. That way part of the batches would finish earlier, or they could be single-step workunits with very long (24 h+ on a GTX 980 Ti) processing times.

Heat/cooling/OC related failures would take a bit longer to identify but cards heat up quickly if they are not properly cooled. How long they run before failing is a bit random but would increase with time. Unfortunately you also get half-configured systems; 3 cards, two setup properly, one cooker. What else is running would also impact on temp, but dealing with situations like that isn't a priority.

They can get help via the forums as usual but as far as the project is concerned you can't make their problem your problem.

Until our reliability-assessment dreams come true (~never), we should find other means to reach the problematic contributors.
The project's minimum requirements should be made very clear right at the start (on the project's homepage, in the BOINC manager when a user tries to join the project, in the FAQ, etc.):
1. A decent NVidia GPU (GTX 760+ or GTX 960+)
2. No overclocking (later you can try, but read the forums)
3. Other GPU projects allowed only as backup (0 resource share) projects.
Tips about the above 3 points should be broadcast by the project as notices on a regular basis. There should also be someone/something that could send an email to users who have unreliable host(s), or perhaps their username/hostname should be broadcast as a notice.
ID: 43476
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 43477 - Posted: 21 May 2016, 12:55:51 UTC - in response to Message 43476.  
Last modified: 21 May 2016, 13:18:47 UTC

Communications would need to be automated, IMO; it's too big a task for admins and mods to perform manually - there would be hundreds of messages daily. There is also a limit on how many PMs you and I can send: about 30/day for me, and it might be less for others. I trialled contacting people directly who were failing all workunits. From ~100 PMs I think I got about 3 replies, and 1 was 6 months later, IIRC. That suggests ~97% of the people attached don't read their PMs/check their email, or can't be bothered/don't understand how to fix their issues.

If the app could be configured to create a default temperature preference of, say, 69°C, that would save a lot of pain. If lots of the errors were down to cards simply not having enough memory to run the OPMs - which might be the case - that's another app-only-fix issue.

I like the idea of tips being sent to the Notices on a regular basis. Ideally this could be personalised, but that would be down to BOINC central to introduce. IMO log messages would be of almost zero use - most users rarely read the BOINC log files, if ever.

Perhaps a project requirement to log into the forums every month would help? It's not a set-and-forget project.
ID: 43477
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 43478 - Posted: 21 May 2016, 14:14:48 UTC

It would be very helpful, to some (especially to me!), to see a notice returned from the project's server.

It could say: "In the past week, this host has had x failed tasks, and y tasks with instability warnings. Reliability is a factor that this project uses to determine which hosts get tasks. Click here for more info." I believe it's relatively easy for a project to do that.

Also, it would be nice if the project had a way to communicate this via email. A web preference, let's say, defaulting to being on. And it could evaluate all the hosts for a user, and send an email weekly or monthly, with a way to turn it off in the web preferences. I know I'd use it!
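A sketch of what that per-host notice could contain; the wording is adapted from the post above, and the function name and fields are hypothetical:

```python
def reliability_notice(host_name: str, failed: int, unstable: int) -> str:
    """Format the weekly per-host reliability notice suggested above."""
    return (
        f"In the past week, host '{host_name}' has had {failed} failed tasks "
        f"and {unstable} tasks with instability warnings. Reliability is a "
        "factor this project uses to decide which hosts get tasks."
    )

print(reliability_notice("Speed Racer", 4, 2))
```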

Regarding my particular scenario, I have 2 PCs with NVIDIA GPUs - Speed Racer has 2 GTX 980 Tis, and Racer X has a GTX 970 and 2 GTX 660 Tis. I overclock all 5 of the GPUs to the maximum stable clock, regardless of temps, such that I never see "Simulation has become unstable" (I check regularly). I run GPUGrid tasks 2-per-GPU as my primary project, but have several other backup projects. GPU temps are usually in the range of 65°C to 85°C, with a custom fan curve that hits max fan at 70°C for GPU Boost v1 and 90°C for GPU Boost v2, with no problems completing tasks. So, I certainly don't want the notices to be based on temperature at all. :)

Until this notification scheme happens, I'll routinely monitor my own results to make sure my overclocks are good. If I ever see "Unstable" in a result, I downclock the GPU another 10 MHz. Note: Some of my recent GPUGrid failures are due to me testing the CPU overclocking limits of Speed Racer, he's only a few weeks old :)

That's my take!
ID: 43478
Betting Slip

Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Level
Phe
Message 43479 - Posted: 21 May 2016, 14:31:00 UTC - in response to Message 43477.  
Last modified: 21 May 2016, 14:47:23 UTC

Everything is getting complicated again and unfortunately that's where people tune out and NOTHING gets done.

Use the KISS principle: "Keep It Simple, Stupid".

Exclude the hosts that need excluding and send them a PM and/or email; if they don't respond, they stay excluded. Bear in mind that if a PM or email does not garner a response, they are probably not interested and couldn't care less, so they stay excluded, FULL STOP.

When you start getting "creative" with methodologies on how to re-interest, educate/inform these people you introduce problems and complications that need not be there.

Please remember there are HOT cards that produce perfectly good results, there are SLOW cards that are reliable and fast enough.

Unreliable hosts stick out like a sore thumb and can be dealt with easily, without recourse to changing BOINC or the GPUGrid app; if we keep it simple we MAY be able to convince the GPUGrid administrators to make the changes.
ID: 43479
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 43483 - Posted: 21 May 2016, 15:30:20 UTC - in response to Message 43479.  

From personal experience, it's usually the smaller and older cards that are less capable of running at higher temps. Their safe temperature limit is also affected by task type; some batches run hotter. I've seen cards that are not stable even at reasonable temps (65 to 70°C) but run fine if the temps are reduced to, say, 59°C (which, while requiring downclocking, is still achievable). There were several threads here about factory-overclocked cards not working out of the box but working well when set to reference clocks, or with their voltage nudged up a bit.

IF a default setting for temperature prioritisation were linked to a test app, it could correct settings for people who don't really know what they are doing; the people who do can change what they like. In fact, if your settings are saved in something like MSI Afterburner they are likely to be applied automatically, certainly on a restart if you have saved your settings. If you just use NVIDIA Inspector you can save a file and have it start automatically when a user logs in (if you know what you are doing).
ID: 43483
Jim1348

Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 43485 - Posted: 21 May 2016, 16:34:57 UTC - in response to Message 43466.  
Last modified: 21 May 2016, 17:13:28 UTC

On the side-show to this thread 'Linux' (as it might get those 25/26h runs below 24h), the problem is that lots of things change with each version and that makes instructions for previous versions obsolete. Try to follow the instructions tested under Ubuntu 11/12/13 while working with 15.4/10 and you will probably not succeed - the short-cuts have changed the commands have changed the security rights have changed, the repo drivers are too old...
I've recently tried to get Boinc on a Ubuntu 15.10 system to see an NV GPU that I popped in without success. Spent ~2 days at this on and off. Systems sees the card, X-server works fine. Boinc just seems oblivious. Probably some folder security issue. Tried to upgrade to 16.04 only to be told (after downloading) that the (default sized) boot partition is too small... Would probably need to boot into Grub to repartition - too close to brain surgery to go down that route. Thought it would be faster and easier to format and install 16.04. Downloaded an image onto a W10 system, but took half a day to find an external DVD-Writer and still can't find a DVD I can write the image to (~1.4GB)...

Lots of luck. By fortuitous (?) coincidence, my SSD failed yesterday, and I tried Ubuntu 16.04 this morning. The good news is that after figuring out the partitioning, I was able to get it installed without incident, except that you have to use a wired connection at first; the WiFi would not connect.

Even installing the Nvidia drivers for my GTX 960 was easy enough via the "System Settings" icon and then "Software Updates/Additional Drivers". That, I thought, would be the hardest part. Then I went to "Ubuntu Software" and searched for BOINC. Wonder of wonders, it found it (I don't know which version), and it installed without incident. I could even attach to POEM, GPUGrid and Universe. We are home free, right?

Not quite. None of them show any work available, which is not possible. So we are back to square zero, and I will re-install Win7 when a new (larger) SSD arrives.

EDIT: Maybe I spoke too soon. The POEM website does show one work unit completed under Linux at 2,757 seconds, which is faster than the 3,400 seconds that I get for that series (1vii) under Windows. So maybe it will work, but it appears that you have to manage BOINC through the website; I don't see much in the way of local settings or information available yet. We will see.
ID: 43485
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 43490 - Posted: 21 May 2016, 19:05:51 UTC - in response to Message 43485.  

Thanks Jim,
Great to know you can get Ubuntu 16.04 up and running for here (and other GPU based Boinc projects) easily.

There is a dwindling number of tasks available here - only 373 in progress - which will keep falling toward zero until a new batch of tasks is released (possibly next week, but unlikely beforehand).

Einstein should have work if you can't pick up any at the other projects. Note, however, that it can take some time to get work, as your system will not have a history of completing work, and the tasks being sent out might be prioritised towards known good systems with fast turnaround times.
ID: 43490
Jim1348

Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 43491 - Posted: 21 May 2016, 19:09:20 UTC - in response to Message 43485.  
Last modified: 21 May 2016, 19:44:54 UTC

More good news: BOINC downloaded a lot of Universe work units too.
More bad news: the one POEM work unit was the only one it ran. It would not process any more of them, or any of the Universe ones either. But Ubuntu did pop up a useful notice to the effect that Nvidia cards using CUDA 6.5-or-later drivers won't work on CUDA or OpenCL projects. Thanks a lot. I wonder how it managed to complete the one POEM unit?

Finally, I was able to remote in using the X11VNC server - once, that is. After that, it refused all further connections.

I will leave Linux to the experts and retire to Windows. Maybe Mint 18 will be more useful for me. One can always hope.
ID: 43491


©2025 Universitat Pompeu Fabra