1000's of strange event messages

Message boards : Number crunching : 1000's of strange event messages

Profile JStateson
Joined: 31 Oct 08
Posts: 186
Credit: 3,578,903,157
RAC: 0
Message 52093 - Posted: 18 Jun 2019, 2:03:35 UTC
Last modified: 18 Jun 2019, 2:06:08 UTC

Not sure what is causing this, but I have over 3000 messages that are in pairs as shown:

GPUGRID	6/17/2019 8:55:53 PM	[coproc] NVIDIA instance 0; 1.000000 pending for e93s49_e72s18p0f35-PABLO_v3Q86UU0_MOR_6_IDP-1-2-RND0688_0	
GPUGRID	6/17/2019 8:55:53 PM	[coproc] NVIDIA instance 0: confirming 1.000000 instance for e93s49_e72s18p0f35-PABLO_v3Q86UU0_MOR_6_IDP-1-2-RND0688_0	


There are two GTX 1070s in this system, but only one has a job, and the log shows the following:

3760	GPUGRID	6/17/2019 8:59:07 PM	Requesting new tasks for NVIDIA GPU	
3761	GPUGRID	6/17/2019 8:59:09 PM	Scheduler request completed: got 0 new tasks	
3762	GPUGRID	6/17/2019 8:59:09 PM	No tasks sent	
3763	GPUGRID	6/17/2019 8:59:09 PM	Project has no tasks available	
3764	GPUGRID	6/17/2019 9:00:01 PM	[coproc] NVIDIA instance 0; 1.000000 pending for e93s49_e72s18p0f35-PABLO_v3Q86UU0_MOR_6_IDP-1-2-RND0688_0	
3765	GPUGRID	6/17/2019 9:00:01 PM	[coproc] NVIDIA instance 0: confirming 1.000000 instance for e93s49_e72s18p0f35-PABLO_v3Q86UU0_MOR_6_IDP-1-2-RND0688_0	
3766	GPUGRID	6/17/2019 9:01:01 PM	[coproc] NVIDIA instance 0; 1.000000 pending for e93s49_e72s18p0f35-PABLO_v3Q86UU0_MOR_6_IDP-1-2-RND0688_0	
3767	GPUGRID	6/17/2019 9:01:01 PM	[coproc] NVIDIA instance 0: confirming 1.000000 instance for e93s49_e72s18p0f35-PABLO_v3Q86UU0_MOR_6_IDP-1-2-RND0688_0	
..etc...
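
To check how many of these paired messages have piled up, and for which task, the event log can be saved to a text file and tallied. The following is a minimal sketch, assuming the log lines look like the ones quoted above; the function name `count_coproc_messages` and the regular expression are illustrative, not part of BOINC.

```python
import re
from collections import Counter

# Matches the two [coproc] message shapes quoted above. The "pending" line
# uses ';' after the instance number and the "confirming" line uses ':',
# so the pattern accepts either separator.
COPROC_RE = re.compile(
    r"\[coproc\] NVIDIA instance (\d+)[;:] "
    r"(?:(?P<pending>[\d.]+ pending for)"
    r"|(?P<confirming>confirming [\d.]+ instance for)) "
    r"(?P<task>\S+)"
)


def count_coproc_messages(lines):
    """Return a Counter keyed by (task_name, kind).

    kind is either 'pending' or 'confirming'.
    """
    counts = Counter()
    for line in lines:
        match = COPROC_RE.search(line)
        if match:
            kind = "pending" if match.group("pending") else "confirming"
            counts[(match.group("task"), kind)] += 1
    return counts


# Two lines copied from the log excerpt above, plus one unrelated line.
TASK = "e93s49_e72s18p0f35-PABLO_v3Q86UU0_MOR_6_IDP-1-2-RND0688_0"
sample = [
    "3763\tGPUGRID\t6/17/2019 8:59:09 PM\tProject has no tasks available\t",
    f"3764\tGPUGRID\t6/17/2019 9:00:01 PM\t[coproc] NVIDIA instance 0; 1.000000 pending for {TASK}\t",
    f"3765\tGPUGRID\t6/17/2019 9:00:01 PM\t[coproc] NVIDIA instance 0: confirming 1.000000 instance for {TASK}\t",
]

for (task, kind), n in sorted(count_coproc_messages(sample).items()):
    print(f"{n:5d}  {kind:10s}  {task}")
```

If every message names the same task, that points at a single work unit rather than the GPU as a whole.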
Profile JStateson
Message 52095 - Posted: 18 Jun 2019, 14:15:50 UTC - in response to Message 52093.  

There are now two tasks running, but only the second one is making progress. I noticed that another 3,500 of the same strange messages have been added since my last post, so I restarted BOINC to see if that fixes the problem. If it does not help, I will abort the stuck task and it will become someone else's problem.
Zalster
Joined: 26 Feb 14
Posts: 211
Credit: 4,496,324,562
RAC: 0
Message 52096 - Posted: 18 Jun 2019, 16:28:59 UTC - in response to Message 52095.  

reboot the system?
Profile JStateson
Message 52101 - Posted: 19 Jun 2019, 3:08:52 UTC - in response to Message 52096.  

reboot the system?


I shut it down when you first posted and finally got around to booting it back up. There seem to be other problems:

52	GPUGRID	6/18/2019 9:41:42 PM	[error] no project URL in task state file	
65			6/18/2019 9:41:47 PM	[error] Inconsistent signing key from account manager	


I occasionally see the missing-URL error on other projects and it seems to be ignored, but the one about the signing key is new.

The system is crunching and another 2 GPUGRID work units showed up, but I also see the first pair of those strange warnings.

I got another 20 of them in the time it took to write this. I hate losing tasks, especially GPUGRID ones, but I am going to detach.

I could not detach through BAM!; the log never showed the sync with the project manager, but a reset worked. I made a note of the names of the work units that were lost and will check whether the problem shows up elsewhere, but I assume something just got corrupted here.
Profile JStateson
Message 52108 - Posted: 19 Jun 2019, 12:40:58 UTC

The strange messages were my "the sky is falling" moment, as I didn't realize the debug flag was set in cc_config.
Anyway, things seem to be working after the project was aborted.
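
For anyone else surprised by a flood of debug lines, it can help to list which log flags are switched on in cc_config.xml. The post does not say which flag was set; the `[coproc]` prefix suggests `coproc_debug`, but that is an inference. Below is a minimal sketch, assuming the usual `<cc_config><log_flags>...</log_flags></cc_config>` layout; the helper name `enabled_log_flags` and the example file contents are hypothetical, and the sketch treats only nonzero text values as enabled.

```python
import xml.etree.ElementTree as ET


def enabled_log_flags(xml_text):
    """Return the sorted names of <log_flags> children with a nonzero value."""
    root = ET.fromstring(xml_text)
    flags = root.find("log_flags")
    if flags is None:
        return []
    enabled = []
    for flag in flags:
        value = (flag.text or "").strip()
        # Assumption: "1" (or any nonzero text) means on, "0" or empty means off.
        if value not in ("", "0"):
            enabled.append(flag.tag)
    return sorted(enabled)


# Hypothetical file contents for illustration, not the poster's actual config.
EXAMPLE = """<cc_config>
  <log_flags>
    <task>1</task>
    <coproc_debug>1</coproc_debug>
    <sched_op_debug>0</sched_op_debug>
  </log_flags>
</cc_config>"""

print(enabled_log_flags(EXAMPLE))
```

To run it against a real file, read the file's text and pass it in; the location of cc_config.xml depends on the BOINC data directory of the installation.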

I did have one observation: the pair of aborted tasks is missing from the error list under my account. I do have the names of the two tasks that were running:

e93s49_e72s18p0f35-PABLO_v3Q86UU0_MOR_6_IDP-1-2-RND0688_0
e97s47_e59s104p1f3l9-PABLO_v3075376_MOR_58_IDP-0-2-RND6678_0


but without the workunit name it is difficult to see if anyone else had a problem.
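
A possible shortcut: BOINC task (result) names are usually the workunit name with a `_N` replica suffix appended, so the workunit name can often be recovered by dropping that suffix. This is a minimal sketch under that assumption; the helper name `workunit_name_from_task` is made up for illustration, and whether the project's site can then be searched by that name is not something this thread confirms.

```python
import re


def workunit_name_from_task(task_name):
    """Strip a trailing _<digits> replica suffix, if one is present."""
    return re.sub(r"_\d+$", "", task_name)


# Task name as it appears in the event log earlier in this thread.
task = "e93s49_e72s18p0f35-PABLO_v3Q86UU0_MOR_6_IDP-1-2-RND0688_0"
print(workunit_name_from_task(task))
```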

One of the two above was hung while the other chugged along fine, but I failed to note which one had the problem.
