Message boards : Number crunching : Strange host

Profile tito
Joined: 21 May 09
Posts: 22
Credit: 1,532,138,678
RAC: 5,988,067
Level
His
Message 61661 - Posted: 10 Aug 2024 | 5:56:42 UTC
Last modified: 10 Aug 2024 | 5:57:05 UTC

Can somebody explain how host no. 2 on the top list can compute an ATMML WU in under 500 sec?
Additionally, its GPU is listed as a 1660S.
https://www.gpugrid.net/show_host_detail.php?hostid=624047

Keith Myers
Joined: 13 Dec 17
Posts: 1333
Credit: 7,237,642,459
RAC: 12,736,137
Level
Tyr
Message 61662 - Posted: 10 Aug 2024 | 7:40:16 UTC - in response to Message 61661.

Can somebody explain how host no. 2 on the top list can compute an ATMML WU in under 500 sec?
Additionally, its GPU is listed as a 1660S.
https://www.gpugrid.net/show_host_detail.php?hostid=624047

Yes, something strange going on here, I agree.
Somebody has found a way to 'game' the system.

roundup
Joined: 11 May 10
Posts: 63
Credit: 7,587,505,193
RAC: 45,438,853
Level
Tyr
Message 61663 - Posted: 10 Aug 2024 | 11:04:10 UTC - in response to Message 61662.

Somebody has found a way to 'game' the system.


Who is that? ononoki?

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,127,912,024
RAC: 16,985,962
Level
Tyr
Message 61664 - Posted: 10 Aug 2024 | 22:27:12 UTC - in response to Message 61661.

Can somebody explain how host no. 2 on the top list can compute an ATMML WU in under 500 sec?

Maybe some kind of misconfiguration on this host is causing its ATMML tasks to jump directly from the environment-extraction phase to the end, skipping the long machine-learning phase.
Perhaps the main question is: are these tasks helping the science involved, or not?
If not, their only effect would be an amazing RAC boost for that "strange" host...

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,127,912,024
RAC: 16,985,962
Level
Tyr
Message 61665 - Posted: 11 Aug 2024 | 10:50:43 UTC

Talking about strange hosts, I've also noticed some of them at current Hosts ranking positions 11, 12, 13, 15, 17, 20 and 27.
They all indicate "[40] NVIDIA NVIDIA TITAN V (4095MB)"
This would add up to a total of 7x40=280 NVIDIA TITAN V graphics cards for some anonymous owner(s)

Keith Myers
Joined: 13 Dec 17
Posts: 1333
Credit: 7,237,642,459
RAC: 12,736,137
Level
Tyr
Message 61666 - Posted: 11 Aug 2024 | 14:39:28 UTC - in response to Message 61665.

More likely they are one owner using cloud instance rentals. They could also be spoofing the coproc_info.xml file to report 40 cards on each host. I don't see how an 8-core CPU could support that many real cards.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1065
Credit: 40,231,533,983
RAC: 11,345
Level
Trp
Message 61667 - Posted: 11 Aug 2024 | 15:30:00 UTC - in response to Message 61665.

Talking about strange hosts, I've also noticed some of them at current Hosts ranking positions 11, 12, 13, 15, 17, 20 and 27.
They all indicate "[40] NVIDIA NVIDIA TITAN V (4095MB)"
This would add up to a total of 7x40=280 NVIDIA TITAN V graphics cards for some anonymous owner(s)


these are hosts from members of TSBT. either PecosRiver or Megacruncher or both.

they are spoofing the GPU count, which is very easy to do. each host probably only has 1 or 2 Titan Vs. all of the platforms are fairly old/low-end and wouldn't support more than a couple of Titan Vs anyway. their production doesn't seem weird or overly impressive for what 1 or 2 Titan Vs could do, since QChem is very good for strong FP64 cards. they're getting a lot of errors from low VRAM, though.

spoofing the GPUs to such a high number is a holdover from SETI, where you could spoof up to 64 GPUs and get proportionally more tasks. I don't think any other project these days will react the same way to that extent. both Einstein and GPUGRID will cap the effective GPU count (what's used for scheduling decisions) to just 8 GPUs and any number above that does not count for getting more work. GPUGRID used to only give you 2 tasks per GPU, but I think they changed that a month or two ago to 4 tasks per GPU.
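
To make the capping described above concrete, here is a minimal sketch of how a scheduler-side limit of that kind could work. It is not GPUGRID's or BOINC's actual code: the function name and parameters are hypothetical, and the defaults (a cap of 8 GPUs, 4 tasks per GPU) are simply the figures mentioned in this post.

```python
def max_tasks_in_progress(reported_gpus: int,
                          gpu_cap: int = 8,
                          tasks_per_gpu: int = 4) -> int:
    """Hypothetical task limit for a host.

    reported_gpus is whatever the client claims (possibly spoofed);
    gpu_cap and tasks_per_gpu are assumed project-side settings.
    """
    # Anything reported above the cap is ignored for scheduling purposes.
    effective_gpus = min(max(reported_gpus, 0), gpu_cap)
    return effective_gpus * tasks_per_gpu


if __name__ == "__main__":
    for claimed in (1, 2, 8, 40, 64):
        print(claimed, "GPUs reported ->", max_tasks_in_progress(claimed), "tasks")
```

Under these assumptions a host reporting 40 GPUs gets the same limit as one reporting 8, so spoofing beyond the cap gains nothing.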



pututu
Joined: 8 Oct 16
Posts: 25
Credit: 3,426,001,869
RAC: 35,302,968
Level
Arg
Message 61668 - Posted: 11 Aug 2024 | 18:38:29 UTC

Talking about strange hosts, this is more of a compliment to user wscr http://www.gpugrid.net/show_user.php?userid=5728 with two GTX 1660 Super hosts with 6GB vram running qchem with low OOM failures.

http://www.gpugrid.net/results.php?hostid=598633&offset=0&show_names=0&state=0&appid=47 ~2% error rate in qchem


http://www.gpugrid.net/results.php?hostid=227353&offset=0&show_names=0&state=0&appid=47 ~ 1% error rate in qchem.

wscr's other RTX 2070 host below also has a ~1% error rate in qchem http://www.gpugrid.net/results.php?hostid=618605&offset=40&show_names=0&state=0&appid=47 but his/her other 2070 has a high error rate.

Kudos!

Dmit
Joined: 12 Sep 10
Posts: 8
Credit: 155,670,524
RAC: 2,402
Level
Ile
Message 61669 - Posted: 11 Aug 2024 | 22:49:41 UTC
Last modified: 11 Aug 2024 | 22:56:23 UTC

I crunched one very short ATMML task when they first appeared, and it ended in a similarly short time. Before that, I had crunched only ACEMD3 tasks 24/7 for weeks, because my old GPU is not supported by the Quantum Chemistry tasks.

Freewill
Joined: 18 Mar 10
Posts: 20
Credit: 25,518,257,894
RAC: 155,376,400
Level
Trp
Message 61673 - Posted: 15 Aug 2024 | 18:40:55 UTC

The host that started this thread is active again after taking a few days off. Same output in the stderr file. The line "tar: run.log: file changed as we read it" may be what triggers skipping over the science? Seems like someone (I'm guessing ononoki owns this PC) has an issue or a great hack for points on their system, but that's beyond my skillset.

https://www.gpugrid.net/results.php?hostid=624047

https://www.gpugrid.net/result.php?resultid=35656131

+ tar cjvf output.tar.bz2 run.log r0/QB_A12_A01.out r1/QB_A12_A01.out r10/QB_A12_A01.out r11/QB_A12_A01.out r12/QB_A12_A01.out r13/QB_A12_A01.out r14/QB_A12_A01.out r15/QB_A12_A01.out r16/QB_A12_A01.out r17/QB_A12_A01.out r18/QB_A12_A01.out r19/QB_A12_A01.out r2/QB_A12_A01.out r20/QB_A12_A01.out r21/QB_A12_A01.out r3/QB_A12_A01.out r4/QB_A12_A01.out r5/QB_A12_A01.out r6/QB_A12_A01.out r7/QB_A12_A01.out r8/QB_A12_A01.out r9/QB_A12_A01.out r0/QB_A12_A01.dcd r1/QB_A12_A01.dcd r10/QB_A12_A01.dcd r11/QB_A12_A01.dcd r12/QB_A12_A01.dcd r13/QB_A12_A01.dcd r14/QB_A12_A01.dcd r15/QB_A12_A01.dcd r16/QB_A12_A01.dcd r17/QB_A12_A01.dcd r18/QB_A12_A01.dcd r19/QB_A12_A01.dcd r2/QB_A12_A01.dcd r20/QB_A12_A01.dcd r21/QB_A12_A01.dcd r3/QB_A12_A01.dcd r4/QB_A12_A01.dcd r5/QB_A12_A01.dcd r6/QB_A12_A01.dcd r7/QB_A12_A01.dcd r8/QB_A12_A01.dcd r9/QB_A12_A01.dcd
tar: run.log: file changed as we read it
+ true
+ echo 'Save restart'
+ tar cjvf restart.tar.bz2 r0/QB_A12_A01_ckpt.xml r1/QB_A12_A01_ckpt.xml r10/QB_A12_A01_ckpt.xml r11/QB_A12_A01_ckpt.xml r12/QB_A12_A01_ckpt.xml r13/QB_A12_A01_ckpt.xml r14/QB_A12_A01_ckpt.xml r15/QB_A12_A01_ckpt.xml r16/QB_A12_A01_ckpt.xml r17/QB_A12_A01_ckpt.xml r18/QB_A12_A01_ckpt.xml r19/QB_A12_A01_ckpt.xml r2/QB_A12_A01_ckpt.xml r20/QB_A12_A01_ckpt.xml r21/QB_A12_A01_ckpt.xml r3/QB_A12_A01_ckpt.xml r4/QB_A12_A01_ckpt.xml r5/QB_A12_A01_ckpt.xml r6/QB_A12_A01_ckpt.xml r7/QB_A12_A01_ckpt.xml r8/QB_A12_A01_ckpt.xml r9/QB_A12_A01_ckpt.xml
2024-08-16 00:29:43 (36298): bin/bash exited; CPU time 198.063084
2024-08-16 00:29:43 (36298): called boinc_finish(0)
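
For readers trying to interpret the tar line in the log above: GNU tar documents exit status 1 in create mode as meaning that some files changed while being archived, and exit status 2 as a fatal error. The `+ true` trace line right after the warning suggests the script tolerates that status on purpose, so the warning alone is probably not what skips the science. The helper below is only an illustrative sketch of that classification; the function name is hypothetical.

```python
def describe_tar_exit(code: int) -> str:
    """Classify a GNU tar exit status (create mode), per the GNU tar manual."""
    if code == 0:
        return "ok: archive created"
    if code == 1:
        # Non-fatal: e.g. "file changed as we read it" for a log still being written.
        return "warning: some files changed while being archived"
    if code == 2:
        return "fatal: unrecoverable error"
    # Other values usually come from a subprocess tar invoked, not tar itself.
    return f"unknown: exit status {code}"


if __name__ == "__main__":
    for status in (0, 1, 2):
        print(status, "->", describe_tar_exit(status))
```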

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 9,127,912,024
RAC: 16,985,962
Level
Tyr
Message 61674 - Posted: 16 Aug 2024 | 6:46:57 UTC - in response to Message 61673.

Maybe the project scientists have something to say about this matter.
If the tasks processed by this host are not useful for their intended purpose, it would be disturbing: the host would be "burning" tasks instead of "crunching" them...

Keith Myers
Joined: 13 Dec 17
Posts: 1333
Credit: 7,237,642,459
RAC: 12,736,137
Level
Tyr
Message 61675 - Posted: 16 Aug 2024 | 7:09:33 UTC - in response to Message 61674.

So far the project scientist hasn't done anything about this host other than confirming that the host is only producing 'garbage' results.

Not sure why inaction is the only current decision. Maybe they are not concerned because those bad results don't ever corrupt the science.

Profile tito
Joined: 21 May 09
Posts: 22
Credit: 1,532,138,678
RAC: 5,988,067
Level
His
Message 61676 - Posted: 18 Aug 2024 | 6:45:11 UTC - in response to Message 61675.

...

Not sure why inaction is the only current decision. Maybe they are not concerned because those bad results don't ever corrupt the science.


But it corrupts our society.
BTW, is there any thread about the credits given here on GPUGrid? They look insanely high compared to other projects (like Collatz years ago).

Keith Myers
Joined: 13 Dec 17
Posts: 1333
Credit: 7,237,642,459
RAC: 12,736,137
Level
Tyr
Message 61677 - Posted: 18 Aug 2024 | 14:28:26 UTC - in response to Message 61676.

Unless a project sticks to the box-stock, broken CreditNew BOINC credit algorithm, credit awarding is entirely arbitrary and depends on what the project admins decide.

An admin can award high task credit to increase the 'attractiveness' of their project in hope it will gain more volunteer participation and increase their science production.

Pascal
Joined: 15 Jul 20
Posts: 74
Credit: 1,235,722,434
RAC: 10,517,148
Level
Met
Message 61678 - Posted: 18 Aug 2024 | 16:27:50 UTC - in response to Message 61677.

Just look at the BOINC world ranking with Bitcoin Utopia: people like me who run a PC all day can't compete.


Steve
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 21 Dec 23
Posts: 46
Credit: 0
RAC: 0
Level

Message 61679 - Posted: 19 Aug 2024 | 8:46:01 UTC

Hello, we are looking into this! Thanks.

pututu
Joined: 8 Oct 16
Posts: 25
Credit: 3,426,001,869
RAC: 35,302,968
Level
Arg
Message 61680 - Posted: 19 Aug 2024 | 19:02:17 UTC
Last modified: 19 Aug 2024 | 19:04:43 UTC

Here is another host, with a GTX 960, which has compute capability (cc) 5.x. Can ATMML tasks run on cc 5.x (unless coproc_info.xml was modified)? Over the past 24-48 hours, the run time varies from 122khrs and seems to stabilize with shorter run time.
https://www.gpugrid.net/results.php?hostid=550055&offset=0&show_names=0&state=3&appid=

Freewill
Joined: 18 Mar 10
Posts: 20
Credit: 25,518,257,894
RAC: 155,376,400
Level
Trp
Message 61681 - Posted: 19 Aug 2024 | 19:23:21 UTC - in response to Message 61680.

Here is another host, with a GTX 960, which has compute capability (cc) 5.x. Can ATMML tasks run on cc 5.x (unless coproc_info.xml was modified)? Over the past 24-48 hours, the run time varies from 122khrs and seems to stabilize with shorter run time.
https://www.gpugrid.net/results.php?hostid=550055&offset=0&show_names=0&state=3&appid=

Wow, I was focused on buying newer gpus, but it seems very old ones are the way to go...perhaps with some kind of hack. Just kidding. I hope Steve can find and plug this problem.

Steve
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 21 Dec 23
Posts: 46
Credit: 0
RAC: 0
Level

Message 61682 - Posted: 20 Aug 2024 | 8:44:50 UTC - in response to Message 61681.

Hello. We have identified the problem and it has been fixed in our code. The next round of WUs should not have this problem.

This is not any sort of hack. It is just a case of some specific error types (that occur with old GPUs we do not have available locally to test on) not raising a proper error code and slipping through the validation.
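
As an illustration of the failure mode described here (an error that never becomes a non-zero exit code), the sketch below shows one generic way a wrapper can refuse to report success. It is not the project's actual fix: `run_step` and `missing_or_empty` are hypothetical helper names, and the idea of checking output files is an assumption about what a stricter wrapper might do.

```python
import subprocess
import sys
from pathlib import Path


def run_step(cmd: list[str]) -> None:
    """Run one step of a task and raise if it exits non-zero."""
    result = subprocess.run(cmd)
    if result.returncode != 0:
        raise RuntimeError(f"step {cmd!r} failed with exit code {result.returncode}")


def missing_or_empty(paths: list[Path]) -> list[Path]:
    """Return the expected output files that are absent or zero-length."""
    return [p for p in paths if not p.is_file() or p.stat().st_size == 0]


if __name__ == "__main__":
    try:
        # Stand-in for the real compute step; here it just exits cleanly.
        run_step([sys.executable, "-c", "print('simulating...')"])
        # Hypothetical output name, loosely modelled on the log earlier in the thread.
        bad = missing_or_empty([Path("r0/QB_A12_A01.out")])
        if bad:
            raise RuntimeError(f"missing or empty outputs: {bad}")
    except RuntimeError as err:
        print(err, file=sys.stderr)
        sys.exit(1)  # a non-zero exit lets the server mark the result as an error
```

The design point is simply that success should be something the wrapper proves (clean exit plus plausible outputs) rather than the default when nothing was caught.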

pututu
Joined: 8 Oct 16
Posts: 25
Credit: 3,426,001,869
RAC: 35,302,968
Level
Arg
Message 61683 - Posted: 20 Aug 2024 | 13:59:08 UTC - in response to Message 61682.

Hello. We have identified the problem and it has been fixed in our code. The next round of WUs should not have this problem.

This is not any sort of hack. It is just a case of some specific error types (that occur with old GPUs we do not have available locally to test on) not raising a proper error code and slipping through the validation.



Thanks. Isn't the GTX 1660 Super technically a newer card (Turing) than the GTX 1080 that you were using for testing, unless the host has modified coproc_info.xml?

Profile tito
Joined: 21 May 09
Posts: 22
Credit: 1,532,138,678
RAC: 5,988,067
Level
His
Message 61684 - Posted: 20 Aug 2024 | 14:30:33 UTC - in response to Message 61682.

Good to hear.

pututu
Joined: 8 Oct 16
Posts: 25
Credit: 3,426,001,869
RAC: 35,302,968
Level
Arg
Message 61786 - Posted: 9 Sep 2024 | 19:23:47 UTC
Last modified: 9 Sep 2024 | 19:26:11 UTC

@Steve, can a GTX 950 run ATMML tasks? I think the "bug" that was highlighted a few weeks ago may not be fully resolved. See this host: https://www.gpugrid.net/results.php?hostid=421254&offset=0&show_names=0&state=3&appid=

Though there are more failures, the tasks that were validated completed much faster, on average, than on, say, an RTX 4090.
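
The kind of check volunteers have been doing by eye in this thread (spotting validated tasks that finish implausibly fast) can be sketched as a simple outlier filter. Everything here is an assumption for illustration: the function name, the 25% threshold, and the sample run times are made up, and a real check would compare against run times for the same application on comparable hardware.

```python
from statistics import median


def flag_suspiciously_fast(run_times_sec: list[float],
                           fraction_of_median: float = 0.25) -> list[float]:
    """Return run times that fall below a fraction of the median run time."""
    if not run_times_sec:
        return []
    threshold = median(run_times_sec) * fraction_of_median
    return [t for t in run_times_sec if t < threshold]


if __name__ == "__main__":
    # Hypothetical run times in seconds for one application across several hosts.
    sample = [500.0, 11000.0, 12000.0, 13000.0]
    print(flag_suspiciously_fast(sample))
```

With the made-up sample above, only the 500-second result is flagged; whether such a result is actually invalid would still need the project's own validation.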
