More CPU jobs

Stefan
Project administrator
Project developer
Project tester
Project scientist

Joined: 5 Mar 13
Posts: 348
Credit: 0
RAC: 0
Message 50311 - Posted: 28 Aug 2018, 12:59:34 UTC
Last modified: 28 Aug 2018, 13:00:12 UTC

Actually, 14 out of the total 17 failures are on your machines, Thomas, so it might be specific to your case. Generally they seem ok.
Each WU should use only 4 GB of memory.

tullio

Joined: 8 May 18
Posts: 190
Credit: 104,426,808
RAC: 0
Message 50312 - Posted: 28 Aug 2018, 13:00:45 UTC - in response to Message 50308.  
Last modified: 28 Aug 2018, 13:20:56 UTC

It's running at 1.8 GHz and I have a 1220 Opteron in my drawers at 2.8 GHz. It's been running since January 2008. My electricity costs me 0.21 euro/kWh and I have 3 computers running 24/7: this Opteron, an AMD E-450 and an A10-6700, which should have 4 cores but Windows Task Manager says 2 cores and 4 logical processors. My total electricity expenditure is about 60 euro/month.
Tullio
I forgot to mention my ulefone smart phone with its arm64-v8a CPU running Android 7.1.1 on SETI@home and Einstein@home.

Conan

Joined: 25 Mar 09
Posts: 25
Credit: 582,385
RAC: 0
Message 50314 - Posted: 28 Aug 2018, 13:02:27 UTC

I have an Intel 8 core (16 thread) Xeon server that has a 146 GB disk drive (has 2 of them but one died). It also has 24GB RAM.
WUs are allowed to run with 8 cores.

I am getting the message that I need 28610.23 MB Disk Space, I currently have 9486.42 MB spare, so it needs another 19123.81 MB of Disk Space.
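
The shortfall quoted here is simply the required space minus the free space. A minimal sketch of that arithmetic, using only the figures from this post (the variable names are illustrative, and the conversion assumes 1 GB = 1024 MB):

```python
# Figures as reported by the BOINC client in this post, in MB.
required_mb = 28610.23
free_mb = 9486.42

# Additional space BOINC would need before it could download the work unit.
shortfall_mb = required_mb - free_mb

# Conversion assumes 1 GB = 1024 MB; this is an assumption, not something
# stated in the thread.
print(f"Shortfall: {shortfall_mb:.2f} MB (about {shortfall_mb / 1024:.1f} GB)")
```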

I leave 10GB that BOINC can't use, other programmes use 12.69 GB, BOINC is using 17.02 GB.

Of that 17.02 GB that BOINC is using, GPU Grid is using 8.29 GB, even when it is not running anything.

If I allow all my spare space to be used I would just have enough disk space for GPUGrid to run (maybe), however I don't intend to give all that space to BOINC so I can't download and run some of these work units.

If they are 6GB then there is no problem.

Why does GPUGrid need over 8 GB of disk space just to hold the project files?
(I have another computer that shows the same amount of used disk space, so this is the normal amount used by the project, but why?)

(My other computer has a much larger Disk so is not having the same issues).

Conan

Stefan
Project administrator
Project developer
Project tester
Project scientist

Joined: 5 Mar 13
Posts: 348
Credit: 0
RAC: 0
Message 50316 - Posted: 28 Aug 2018, 13:33:33 UTC - in response to Message 50314.  

@Conan: the QM calculations need to store lots of data in memory for best performance. Since we cannot ask for 20GB of RAM the software instead writes any amount of calculation data that exceeds the RAM limit (4GB) to the hard drive.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 50317 - Posted: 28 Aug 2018, 13:48:55 UTC - in response to Message 50316.  

The current "disk limit" for CPU jobs is set at 20 GB. This is a ballpark estimate to accommodate both the software and libraries (largish by themselves) and the temporary (scratch) data.

The software is reused between WUs, but you can reclaim the space by resetting the project. The scratch space is only occupied when a WU is running (or paused).

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 50318 - Posted: 28 Aug 2018, 13:49:59 UTC - in response to Message 50309.  

Quoting Message 50309: "new WUs don't seem to work: they consume a lot of memory, throw computation errors or just rest at 10% progress forever."


On your failures I see "connection errors". Could be firewall filtering, or the like.

tullio

Joined: 8 May 18
Posts: 190
Credit: 104,426,808
RAC: 0
Message 50323 - Posted: 28 Aug 2018, 17:10:17 UTC

First SELE task done by my Old Faithful Opteron 1210 running SuSE Linux Leap 42.3.
Tullio

tullio

Joined: 8 May 18
Posts: 190
Credit: 104,426,808
RAC: 0
Message 50335 - Posted: 30 Aug 2018, 8:23:47 UTC

I have a funny SELE task on my Linux laptop. It is stuck at 10% after 14 hours 38 min, and the estimated remaining time has risen to more than 5 days. All seems normal according to the "top" command, and it has lots of disk space.
Tullio

[VENETO] boboviz

Joined: 10 Sep 10
Posts: 163
Credit: 388,132
RAC: 0
Message 50336 - Posted: 30 Aug 2018, 9:30:10 UTC - in response to Message 50318.  

No firewall here.
And same problem.

DRSMT

Joined: 23 Feb 17
Posts: 21
Credit: 5,528,199,475
RAC: 0
Message 50337 - Posted: 30 Aug 2018, 10:29:37 UTC - in response to Message 50336.  

As said, those WUs do not work properly. I am off to another project and will come back if they are fixed.

Conan

Joined: 25 Mar 09
Posts: 25
Credit: 582,385
RAC: 0
Message 50343 - Posted: 30 Aug 2018, 12:32:31 UTC
Last modified: 30 Aug 2018, 12:32:58 UTC

OK, thanks Toni and Stefan for the information, that explains a lot.

I will run what I can.

Thanks again
Conan

tullio

Joined: 8 May 18
Posts: 190
Credit: 104,426,808
RAC: 0
Message 50347 - Posted: 30 Aug 2018, 13:41:55 UTC
Last modified: 30 Aug 2018, 13:42:26 UTC

In the slot of a running task there is an output directory which leads to a report of what the program is doing in physical terms. Maybe some explanation by the admins would be welcome.
Tullio

Stefan
Project administrator
Project developer
Project tester
Project scientist

Joined: 5 Mar 13
Posts: 348
Credit: 0
RAC: 0
Message 50353 - Posted: 31 Aug 2018, 8:19:05 UTC

We investigated another algorithm which doesn't use scratch disk space. Unfortunately, in my test it was 13x slower than the one that uses disk (25 minutes became 5 hours 30 minutes).
So it is not a realistic choice for us. After this batch of simulations I will probably have to submit more, which will use more scratch disk (up to 30 GB), so I assume we are going to fill up some disks.

tullio

Joined: 8 May 18
Posts: 190
Credit: 104,426,808
RAC: 0
Message 50354 - Posted: 31 Aug 2018, 8:31:45 UTC - in response to Message 50353.  
Last modified: 31 Aug 2018, 8:47:07 UTC

I have plenty of disk space on my two Linux boxes because the slots directory is in my /home/user partition, which has more than 700 GB on my SuSE Linux Leap 42.3 and Leap 15.0 OS. What amazes me is that QC tasks are always stuck at 10% progress while GPU tasks show progress as increasing.
Tullio

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 50355 - Posted: 31 Aug 2018, 10:33:02 UTC - in response to Message 50354.  
Last modified: 31 Aug 2018, 10:34:25 UTC

The 10% progress is explained as follows: updating the app (if necessary) accounts for 10%, and usually happens immediately. The remaining 90% advances as molecules are calculated (e.g. with 5 molecules, each one adds 90%/5 = 18%). However, very big WUs have only one molecule, so there is no apparent progress until the end. (We have no finer-grained progress reporting.)
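
The scheme described above can be sketched roughly as follows. This is only an illustration of the arithmetic, not the project's actual code: the function name and its parameters are hypothetical, and it assumes the app-update step has already completed.

```python
def reported_progress(molecules_done: int, molecules_total: int) -> float:
    """Percent progress under the 10% + 90% scheme (illustrative sketch)."""
    if molecules_total <= 0:
        raise ValueError("molecules_total must be positive")
    if not 0 <= molecules_done <= molecules_total:
        raise ValueError("molecules_done must be between 0 and molecules_total")
    app_update_share = 10.0  # credited once the app update step is done
    # The remaining 90% is split evenly across the molecules in the WU.
    return app_update_share + 90.0 * molecules_done / molecules_total


# A WU with 5 molecules moves in equal steps; a WU with a single molecule
# sits at 10% until that one molecule is finished.
for done in range(6):
    print(f"5-molecule WU, {done} done: {reported_progress(done, 5):.0f}%")
print(f"1-molecule WU, 0 done: {reported_progress(0, 1):.0f}%")
```

With a single molecule there are only two possible values, which matches the "stuck at 10%" behaviour reported earlier in the thread.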

Chilean

Joined: 8 Oct 12
Posts: 98
Credit: 385,652,461
RAC: 0
Message 50389 - Posted: 4 Sep 2018, 12:27:11 UTC - in response to Message 50355.  

So how much space do these WUs need? I'm running 12 at a time with 64 GB of RAM, but no swap space. I see that not all 48 threads are at 100%; I'm thinking it's the lack of swap.

tullio

Joined: 8 May 18
Posts: 190
Credit: 104,426,808
RAC: 0
Message 50390 - Posted: 4 Sep 2018, 12:45:44 UTC - in response to Message 50389.  

In the old UNIX days a rule of thumb was that you needed swap space twice the size of the RAM, which was usually small. Now RAM is plentiful. I have 22 GB RAM on the Windows 10 PC, and 8 GB RAM on each Linux box. GPUGRID CPU tasks use some swap but most of it is unused.
tullio

Chilean

Joined: 8 Oct 12
Posts: 98
Credit: 385,652,461
RAC: 0
Message 50391 - Posted: 4 Sep 2018, 13:01:41 UTC

I amped the swap to 300GB, but it only seems to be using RAM. Is this "scratch space" used in swap space or does the WU use the file directory for storage? I'm thinking it is the latter since the BOINC space usage goes up and down.

Thing is my install directory is only 120GB...

I also have this feeling that now I have 300GB of swap space for nothing lol. I am not a smart man.

tullio

Joined: 8 May 18
Posts: 190
Credit: 104,426,808
RAC: 0
Message 50392 - Posted: 4 Sep 2018, 13:19:06 UTC - in response to Message 50391.  

I see temporary files in the slots/0 directory. They are named psi.25019.number
Tullio

Stefan
Project administrator
Project developer
Project tester
Project scientist

Joined: 5 Mar 13
Posts: 348
Credit: 0
RAC: 0
Message 50393 - Posted: 4 Sep 2018, 15:59:03 UTC
Last modified: 4 Sep 2018, 15:59:17 UTC

Yes, afaik it doesn't use swap space, so increasing that will not help. The scratch data is probably in the files Tullio mentioned. They are called `psi.XXXXX.XX`. Usually there are two, and the second can grow significantly.
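
To see how much room these scratch files take, something like the sketch below could be used. It only assumes what is said in this thread, namely that files named `psi.*` sit directly inside the numbered slot directories. The function name is made up for illustration, and the slots path in the example is an assumption: it varies by installation (Tullio's, for instance, is under /home/user), so adjust it to your own BOINC data directory.

```python
from pathlib import Path


def scratch_files(slots_dir: Path) -> list[tuple[Path, int]]:
    """Return (path, size in bytes) for psi.* files in each slot, largest first."""
    if not slots_dir.is_dir():
        return []
    found = []
    # Look one level down: slots/0/psi.*, slots/1/psi.*, and so on.
    for path in slots_dir.glob("*/psi.*"):
        if path.is_file():
            found.append((path, path.stat().st_size))
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Assumed location, not taken from the thread; change to match your setup.
    slots = Path("/var/lib/boinc-client/slots")
    for path, size in scratch_files(slots):
        print(f"{size / 1024**3:8.2f} GiB  {path}")
```

Running it a few times while a WU is active should show whether one of the files keeps growing, as described above.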