have a lot of stuck tasks, abort some?

JStateson
Joined: 31 Oct 08
Posts: 186
Credit: 3,578,903,157
RAC: 0
Message 53140 - Posted: 27 Nov 2019, 4:34:55 UTC
Last modified: 27 Nov 2019, 5:05:38 UTC

I have never seen this many stuck tasks before, other than when my router was turned off. My other systems are running fine, even those with GPUGRID tasks. It looks like all tasks completed without errors but cannot upload. Restarting BOINC did not help.

GPUGRID	initial_1344-ELISA_GSN4V1-8-100-RND6294_0_1	15.093	3817.50 K	00:23:01 - 16:48:28	0.00 Kbps	Upload pending (Retry in: 02:31:01), retried: 8	JYSArea51	
GPUGRID	initial_1344-ELISA_GSN4V1-8-100-RND6294_0_2	20.116	3817.50 K	00:26:10	0.71 Kbps	Uploading	JYSArea51	
GPUGRID	initial_1344-ELISA_GSN4V1-8-100-RND6294_0_9	0.850	67761.89 K	00:22:58 - 17:47:06	0.00 Kbps	Upload pending (Retry in: 03:29:39), retried: 8	JYSArea51	
GPUGRID	initial_1381-ELISA_GSN0V1-9-100-RND4251_0_1	1.683	3816.54 K	00:02:34	0.00 Kbps	Upload pending, retried: 1	JYSArea51	
GPUGRID	initial_1381-ELISA_GSN0V1-9-100-RND4251_0_2	1.683	3816.54 K	00:02:34	0.00 Kbps	Upload pending, retried: 1	JYSArea51	
GPUGRID	initial_1381-ELISA_GSN0V1-9-100-RND4251_0_9	0.094	68042.38 K	00:02:36	0.00 Kbps	Upload pending, retried: 1	JYSArea51	
GPUGRID	initial_1512-ELISA_GSN0V1-6-100-RND5965_0_1	3.359	3816.54 K	00:05:13	0.00 Kbps	Upload pending, retried: 2	JYSArea51	
GPUGRID	initial_1512-ELISA_GSN0V1-6-100-RND5965_0_2	3.359	3816.54 K	00:05:12	0.00 Kbps	Upload pending, retried: 2	JYSArea51	
GPUGRID	initial_1512-ELISA_GSN0V1-6-100-RND5965_0_9	0.094	67987.78 K	00:02:36	0.00 Kbps	Upload pending, retried: 1	JYSArea51	
GPUGRID	initial_1719-ELISA_GSN0V1-5-100-RND4368_0_1	1.683	3816.54 K	00:02:37	0.00 Kbps	Upload pending, retried: 1	JYSArea51	
GPUGRID	initial_1719-ELISA_GSN0V1-5-100-RND4368_0_2	1.683	3816.54 K	00:02:35	0.00 Kbps	Upload pending, retried: 1	JYSArea51	
GPUGRID	initial_1719-ELISA_GSN0V1-5-100-RND4368_0_9	0.095	67536.57 K	00:02:36	0.00 Kbps	Upload pending, retried: 1	JYSArea51	
GPUGRID	test265-TONI_GSNTEST3-11-100-RND0660_0_1	1.682	3817.50 K	00:02:36	0.00 Kbps	Upload pending, retried: 1	JYSArea51	
GPUGRID	test265-TONI_GSNTEST3-11-100-RND0660_0_2	1.682	3817.50 K	00:02:34	0.00 Kbps	Upload pending, retried: 1	JYSArea51	
GPUGRID	test265-TONI_GSNTEST3-11-100-RND0660_0_9	0.094	68081.72 K	00:02:36	0.00 Kbps	Upload pending, retried: 1	JYSArea51	
GPUGRID	test360-TONI_GSNTEST3-6-100-RND5366_0_1	18.440	3817.50 K	00:23:32	0.71 Kbps	Uploading	JYSArea51	
GPUGRID	test360-TONI_GSNTEST3-6-100-RND5366_0_2	10.064	3817.50 K	00:15:35	0.00 Kbps	Upload pending, retried: 6	JYSArea51	
GPUGRID	test360-TONI_GSNTEST3-6-100-RND5366_0_9	0.470	68081.16 K	00:12:57	0.00 Kbps	Upload pending, retried: 5	JYSArea51	


[EDIT] A reboot of Windows got things going. I suspect the first 67 MB files caused a problem, which was compounded by subsequent ones of the same size all trying to upload concurrently. I need to figure out a way to stop this. I have three 1070 Ti boards, but the network can't seem to handle the large files when they all finish at nearly the same time.

In other news, I got my first Linux cuda100. It is running on a GTX 1660 Ti.

captainjack
Joined: 9 May 13
Posts: 171
Credit: 4,594,296,466
RAC: 140
Message 53155 - Posted: 27 Nov 2019, 13:49:14 UTC - in response to Message 53140.  

JStateson wrote:

I suspect the first 67 MB files caused a problem, which was compounded by subsequent ones of the same size all trying to upload concurrently. I need to figure out a way to stop this.


Just a thought: the cc_config.xml file has an option for this,
<max_file_xfers_per_project>N</max_file_xfers_per_project>.
Maybe that would help.
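
A minimal sketch of how that option could be used to throttle uploads, not a definitive recipe: the Python helper below produces cc_config.xml text that caps simultaneous transfers. The function name build_cc_config and the values 4 and 1 are hypothetical, chosen only for illustration. The idea is that a low per-project limit would make the large result files upload one after another instead of all at once.

```python
def build_cc_config(max_file_xfers: int, max_file_xfers_per_project: int) -> str:
    """Return cc_config.xml text that sets the two transfer limits.

    Hypothetical helper for illustration; it only builds the text and
    does not touch any existing cc_config.xml.
    """
    if max_file_xfers < 1 or max_file_xfers_per_project < 1:
        raise ValueError("transfer limits must be at least 1")
    return (
        "<cc_config>\n"
        "  <options>\n"
        f"    <max_file_xfers>{max_file_xfers}</max_file_xfers>\n"
        f"    <max_file_xfers_per_project>{max_file_xfers_per_project}"
        "</max_file_xfers_per_project>\n"
        "  </options>\n"
        "</cc_config>\n"
    )


if __name__ == "__main__":
    # Illustrative values only: one transfer per project, four overall.
    print(build_cc_config(max_file_xfers=4, max_file_xfers_per_project=1), end="")
```

The output would be saved as cc_config.xml in the BOINC data directory, merged by hand with any options already there. The client should pick it up after a restart or after being told to re-read its config files (for example with boinccmd --read_cc_config). Whether serializing uploads actually cures the stalls on a given network is untested here.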

klepel
Joined: 23 Dec 09
Posts: 189
Credit: 4,798,881,008
RAC: 0
Message 53157 - Posted: 27 Nov 2019, 15:32:13 UTC

I am not alone! You are not alone!

I also observe slow uploads of finished WUs on all my computers. BOINC reports about 6 KBps.

I have particular problems with this computer: http://www.gpugrid.net/show_host_detail.php?hostid=512293 Uploads stall for hours! Yes, the computer also has some climateprediction.net files to upload, but other projects have no problems uploading and downloading at faster speeds.

It reminds me of the bandwidth problems GRIDCOIN had with their IT department (not giving sufficient bandwidth to GPUGRID) years ago. Might somebody from the project look into it? Perhaps make the WUs longer so the server does not get hammered by so many computers at the same time?

Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 731
Message 53160 - Posted: 27 Nov 2019, 16:16:43 UTC

Since I run multiple projects on the same hosts, I need to provide sufficient network communication threads for all the uploads/downloads.

It does not help that I have asymmetrical upload/download speeds because of ADSL2. My 1 Mbps upload link is not big enough to handle the large result files from GPUGrid without some strain. I am not having any issues uploading, though, as long as the project servers are accepting connections.

I know that they will take at least a half hour to upload. I use these parameters in cc_config.xml:

<max_file_xfers>16</max_file_xfers>
<max_file_xfers_per_project>8</max_file_xfers_per_project>
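
The half-hour figure above can be sanity-checked with a rough calculation. This is only a sketch under stated assumptions: the three file sizes are copied from one task in the transfer list in the first post, "K" is taken to mean 1024 bytes, and the link is assumed to run at its full 1 Mbps with no protocol overhead or competing traffic. The function name upload_seconds is hypothetical.

```python
def upload_seconds(file_sizes_k, link_mbps):
    """Ideal time in seconds to upload the given files over one link.

    file_sizes_k: sizes in K as shown in the transfer list
                  (assumed here to be units of 1024 bytes).
    link_mbps:    upload speed in megabits per second (10**6 bits/s).
    """
    if link_mbps <= 0:
        raise ValueError("link speed must be positive")
    total_bits = sum(file_sizes_k) * 1024 * 8
    return total_bits / (link_mbps * 1_000_000)


if __name__ == "__main__":
    # Sizes of the _1, _2 and _9 files of one task from the first post.
    one_task = [3817.50, 3817.50, 68081.72]
    seconds = upload_seconds(one_task, link_mbps=1.0)
    print(f"{seconds / 60:.1f} minutes per task at full line rate")
```

Under those assumptions one task needs roughly ten minutes, so three results queued together would take about half an hour even before any overhead or server-side slowdown.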

klepel
Joined: 23 Dec 09
Posts: 189
Credit: 4,798,881,008
RAC: 0
Message 53162 - Posted: 27 Nov 2019, 18:19:26 UTC - in response to Message 53160.  

I use these parameters in cc_config.xml
<max_file_xfers>16</max_file_xfers>
<max_file_xfers_per_project>8</max_file_xfers_per_project>


So the cc_config.xml would look like:
<cc_config>
  <options>
    <max_file_xfers>16</max_file_xfers>
    <max_file_xfers_per_project>8</max_file_xfers_per_project>
  </options>
</cc_config>

JStateson
Joined: 31 Oct 08
Posts: 186
Credit: 3,578,903,157
RAC: 0
Message 53164 - Posted: 27 Nov 2019, 20:01:58 UTC

All tasks finally uploaded after a reboot.

I need to configure that maximum allowable number of transfers.

Thanks!

Dingo
Joined: 1 Nov 07
Posts: 20
Credit: 128,376,317
RAC: 0
Message 53245 - Posted: 1 Dec 2019, 10:39:07 UTC

I have a few tasks that have not uploaded for a while, and all show "Upload pending (Project backoff)". Do I just let them sit there and wait until they upload? I have tried stopping and starting BOINC, but that did not fix it.

GPUGRID initial_1687-ELISA_GSN0V1-8-100-RND7000_0_0 1.054 10.65 K 00:00:20 - 15:10:42 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1687-ELISA_GSN0V1-8-100-RND7000_0_1 0.003 3816.54 K 00:00:18 - 14:50:55 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1687-ELISA_GSN0V1-8-100-RND7000_0_2 0.003 3816.54 K 00:00:11 - 12:41:05 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1687-ELISA_GSN0V1-8-100-RND7000_0_9 0.000 68065.03 K 00:00:07 - 12:29:29 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1687-ELISA_GSN0V1-8-100-RND7000_0_10 100.000 0.27 K 00:00:39 - 12:37:38 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1440-ELISA_GSN0V1-9-100-RND9376_0_0 1.057 10.62 K 00:00:42 - 12:39:17 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1440-ELISA_GSN0V1-9-100-RND9376_0_1 0.003 3816.54 K 00:00:22 - 12:24:37 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1440-ELISA_GSN0V1-9-100-RND9376_0_2 0.003 3816.54 K 00:00:21 - 12:20:38 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1440-ELISA_GSN0V1-9-100-RND9376_0_9 0.000 68067.50 K 00:00:04 - 12:08:51 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1440-ELISA_GSN0V1-9-100-RND9376_0_10 100.000 0.27 K 00:00:03 - 06:35:08 0.00 Kbps Upload pending (Project backoff: 00:27:19) Rack-01
GPUGRID initial_1509-ELISA_GSN0V1-9-100-RND3769_0_0 1.022 10.99 K 00:00:14 - 08:49:10 0.00 Kbps Upload pending (Project backoff: 00:06:46) bundy-2
GPUGRID initial_1509-ELISA_GSN0V1-9-100-RND3769_0_1 0.003 3816.54 K 00:00:16 - 07:01:25 0.00 Kbps Upload pending (Project backoff: 00:06:46) bundy-2
GPUGRID initial_1509-ELISA_GSN0V1-9-100-RND3769_0_2 0.003 3816.54 K 00:00:11 - 07:53:31 0.00 Kbps Upload pending (Project backoff: 00:06:46) bundy-2
GPUGRID initial_1509-ELISA_GSN0V1-9-100-RND3769_0_9 0.000 68066.14 K 00:00:08 - 06:20:56 0.00 Kbps Upload pending (Project backoff: 00:06:46) bundy-2
GPUGRID initial_1509-ELISA_GSN0V1-9-100-RND3769_0_10 100.000 0.27 K 00:00:07 - 06:05:22 0.00 Kbps Upload pending (Project backoff: 00:06:46) bundy-2
GPUGRID initial_1622-ELISA_GSN4V1-14-100-RND6105_2_0 1.031 10.90 K 00:00:20 - 16:13:14 0.00 Kbps Upload pending (Project backoff: 00:35:15) bundy-3
GPUGRID initial_1622-ELISA_GSN4V1-14-100-RND6105_2_1 0.003 3817.50 K 00:00:20 - 14:14:37 0.00 Kbps Upload pending (Project backoff: 00:35:15) bundy-3
GPUGRID initial_1622-ELISA_GSN4V1-14-100-RND6105_2_2 0.003 3817.50 K 00:00:11 - 13:01:50 0.00 Kbps Upload pending (Project backoff: 00:35:15) bundy-3
GPUGRID initial_1622-ELISA_GSN4V1-14-100-RND6105_2_9 0.000 68080.19 K 00:00:10 - 12:35:48 0.00 Kbps Upload pending (Project backoff: 00:35:15) bundy-3
GPUGRID initial_1622-ELISA_GSN4V1-14-100-RND6105_2_10 100.000 0.27 K 00:00:07 - 11:41:37 0.00 Kbps Upload pending (Project backoff: 00:35:15) bundy-3

stiwi
Joined: 18 Jun 12
Posts: 2
Credit: 100,396,087
RAC: 0
Message 53246 - Posted: 1 Dec 2019, 10:45:48 UTC - in response to Message 53245.  

01.12.2019 10:51:50 | GPUGRID | [error] Error reported by file upload server: Server is out of disk space


We have to wait until they fix it :)

ServicEnginIC
Joined: 24 Sep 10
Posts: 592
Credit: 11,972,186,510
RAC: 1,187
Message 53247 - Posted: 1 Dec 2019, 10:46:33 UTC - in response to Message 53245.  

This is being discussed in this other thread:
http://www.gpugrid.net/forum_thread.php?id=5027

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 53250 - Posted: 1 Dec 2019, 13:44:56 UTC - in response to Message 53247.  

Please be patient; there is no need to abort.

©2025 Universitat Pompeu Fabra