Old Noelia WUs

Message boards : News : Old Noelia WUs
GPUGRID
Joined: 12 Dec 11
Posts: 91
Credit: 2,730,095,033
RAC: 0
Level
Phe
Message 29108 - Posted: 10 Mar 2013, 23:37:34 UTC

True. Not 100%, but doable.
GPUGRID
Joined: 12 Dec 11
Posts: 91
Credit: 2,730,095,033
RAC: 0
Level
Phe
Message 29109 - Posted: 10 Mar 2013, 23:38:23 UTC
Last modified: 10 Mar 2013, 23:38:41 UTC

If I babysit the machines, I mean... I will be traveling in two days; then the worst is expected.
Dylan
Joined: 16 Jul 12
Posts: 98
Credit: 386,043,752
RAC: 0
Level
Asp
Message 29110 - Posted: 11 Mar 2013, 0:29:01 UTC - in response to Message 29109.  
Last modified: 11 Mar 2013, 0:29:08 UTC

If you travel, I would recommend getting an app on a mobile device to bring with you that will allow you to remote into the computers. An example would be TeamViewer, which is free.


https://play.google.com/store/apps/details?id=com.teamviewer.teamviewer.market.mobile&hl=en
GPUGRID
Joined: 12 Dec 11
Posts: 91
Credit: 2,730,095,033
RAC: 0
Level
Phe
Message 29111 - Posted: 11 Mar 2013, 1:06:02 UTC - in response to Message 29110.  

If you travel, I would recommend getting an app on a mobile device to bring with you that will allow you to remote into the computers. An example would be TeamViewer, which is free.


https://play.google.com/store/apps/details?id=com.teamviewer.teamviewer.market.mobile&hl=en

Exactly what I do on my tablet. The problem is, when the big rig starts to reboot, I can't access it. Hope it won't happen.
Mumak
Joined: 7 Dec 12
Posts: 92
Credit: 225,897,225
RAC: 0
Level
Leu
Message 29113 - Posted: 11 Mar 2013, 8:43:15 UTC

Now I'm getting more problems, even with short Noelia tasks. They got stuck, caused errors, or crashed the app. A reboot was needed to start a new GPU task.
I have ordered a new GPU for GPUGrid, but I think I'll suspend this whole project (and switch to another one) until these problems are solved.
STE\/E
Joined: 18 Sep 08
Posts: 368
Credit: 4,174,624,885
RAC: 0
Level
Arg
Message 29114 - Posted: 11 Mar 2013, 9:08:08 UTC

Same here, got 5 boxes running the shorter ones & I think all 5 have hung WUs right now, one at 37 hrs ...
STE\/E
John C MacAlister
Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Level
Cys
Message 29115 - Posted: 11 Mar 2013, 10:43:09 UTC
Last modified: 11 Mar 2013, 10:47:18 UTC

No problems with short NOELIA tasks. I have not attempted any long NOELIAs for about a week.

PC #1 AMD 1090T with Acer GTX 650 Ti
PC #2 AMD A10 5800K with Acer GTX 650 Ti
John
Ken_g6
Joined: 6 Aug 11
Posts: 8
Credit: 76,046,994
RAC: 0
Level
Thr
Message 29116 - Posted: 11 Mar 2013, 16:55:18 UTC

Short Noelias were going fine, until I had to abort this one, which was restarting repeatedly with error:
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
Bedrich Hajek
Joined: 28 Mar 09
Posts: 490
Credit: 11,731,645,728
RAC: 47,738
Level
Trp
Message 29122 - Posted: 12 Mar 2013, 11:24:18 UTC
Last modified: 12 Mar 2013, 11:34:37 UTC

226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS error on the latest beta units. This is a new one!


After running flawlessly, I got a few units with this error, on the latest set of betas.

http://www.gpugrid.net/result.php?resultid=6611952

http://www.gpugrid.net/result.php?resultid=6610530

http://www.gpugrid.net/result.php?resultid=6610707
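
Editor's note: the two numbers in the error line, 226 and 0xffffffffffffff1e, appear to be the same exit code written two ways. A minimal sketch, assuming the hex value is a 64-bit two's-complement rendering of a negative status (the helper name `to_signed64` is made up for illustration):

```python
def to_signed64(value: int) -> int:
    """Interpret an unsigned 64-bit value as a two's-complement signed integer."""
    value &= (1 << 64) - 1          # keep only the low 64 bits
    if value >= (1 << 63):          # top bit set means the number is negative
        return value - (1 << 64)
    return value


raw = 0xffffffffffffff1e            # hex value copied from the error message
signed = to_signed64(raw)
print(signed)                       # the signed reading of the hex value
print(abs(signed))                  # magnitude, to compare with the "226" in the message
```

Under that assumption the hex value reads as -226, whose magnitude matches the 226 reported next to ERR_TOO_MANY_EXITS.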
nate
Joined: 6 Jun 11
Posts: 124
Credit: 2,928,865
RAC: 0
Level
Ala
Message 29123 - Posted: 12 Mar 2013, 11:47:08 UTC

SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.


It looks like most of the major errors are gone (severe error % is good), but this one does seem to be occurring more frequently than we would like. We'll see if we can find a cause.
cciechad
Joined: 28 Dec 10
Posts: 13
Credit: 37,543,525
RAC: 0
Level
Val
Message 29124 - Posted: 12 Mar 2013, 12:28:12 UTC - in response to Message 29123.  

dmesg output from the beta WUs:

[400033.132826] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400049.637834] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400054.854423] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[400066.358868] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400082.863901] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400099.368878] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400115.873938] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400119.305177] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[400133.382624] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400136.664677] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[400149.890962] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400166.399277] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400182.904290] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400198.412211] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400215.917612] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400220.224939] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[400244.929342] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400260.437256] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400276.942267] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400293.450605] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400308.955195] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400325.463524] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400341.968561] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400358.476864] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400369.667884] NVRM: Xid (0000:01:00): 13, 0001 00000000 000090c0 00001b0c 00000000 00000000
[400382.174156] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400397.678751] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400414.183758] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400430.692078] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400446.196682] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400461.704604] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400464.387651] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[400484.212040] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400500.218499] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400516.723568] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400533.231872] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400535.747891] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[400555.739274] NVRM: Xid (0000:01:00): 8, Channel 00000001
[401174.487665] NVRM: Xid (0000:01:00): 8, Channel 00000001
[401189.992293] NVRM: Xid (0000:01:00): 8, Channel 00000001

I suspect I will have to reboot to recover from these.
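
Editor's note: a log like the one above is easier to compare across hosts once the Xid events are tallied by code. A minimal sketch, assuming the dmesg output has been saved as text; the regular expression is fitted only to the line shape shown in this post, the function name `count_xids` is hypothetical, and the meaning of each Xid code is not interpreted here:

```python
import re
from collections import Counter

# Matches lines such as:
# [400033.132826] NVRM: Xid (0000:01:00): 8, Channel 00000001
XID_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+NVRM: Xid \(([0-9a-fA-F:.]+)\): (\d+),")


def count_xids(dmesg_text: str) -> Counter:
    """Count how many times each Xid code appears in the given dmesg text."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = XID_RE.search(line)
        if match:
            counts[int(match.group(3))] += 1
    return counts


# A few lines copied from the log above, used as sample input.
sample = """\
[400033.132826] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400054.854423] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[400369.667884] NVRM: Xid (0000:01:00): 13, 0001 00000000 000090c0 00001b0c 00000000 00000000
[400382.174156] NVRM: Xid (0000:01:00): 8, Channel 00000001
"""

for xid, n in sorted(count_xids(sample).items()):
    print(f"Xid {xid}: {n} event(s)")
```

Posting a tally like this alongside the driver version may make it easier for the developers to spot which event dominates.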

Bedrich Hajek
Joined: 28 Mar 09
Posts: 490
Credit: 11,731,645,728
RAC: 47,738
Level
Trp
Message 29125 - Posted: 12 Mar 2013, 12:30:39 UTC - in response to Message 29122.  

226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS error on the latest beta units. This is a new one!


After running flawlessly, I got a few units with this error, on the latest set of betas.

http://www.gpugrid.net/result.php?resultid=6611952

http://www.gpugrid.net/result.php?resultid=6610530

http://www.gpugrid.net/result.php?resultid=6610707


Is it my imagination, or did you change the error message on these units?


cciechad
Joined: 28 Dec 10
Posts: 13
Credit: 37,543,525
RAC: 0
Level
Val
Message 29126 - Posted: 12 Mar 2013, 12:38:32 UTC - in response to Message 29124.  

Verified: the beta WUs hang the GPU in some manner. Unloading and reloading the driver module (rmmod nvidia, then modprobe nvidia) does not resolve it; the system must be rebooted to recover from whatever the WU is causing. This is on Nvidia driver 313.26.
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 29127 - Posted: 12 Mar 2013, 13:28:05 UTC
Last modified: 12 Mar 2013, 13:38:17 UTC

I wanted to chime in to say I just had 12 NOELIA tasks fail hard on the "ACEMD beta version v6.49 (cuda42)" app, using Windows 8 Pro x64, BOINC v7.0.55 x64 beta, nVidia 314.14 beta drivers, GTX 660 Ti (which usually works on GPUGRID) and GTX 460 (which usually works on World Community Grid)

The tasks resulted in "Driver stopped responding" errors, and Windows restarted the drivers to recover. But the failures also appear to have caused other GPUs (which were working on entirely different projects, like World Community Grid) to fail as well.

I know this is the beta app, but...
Devs, do you run some of these tasks before issuing them to us? If not, you should, because when the bugged tasks get to us, the failures waste many more resources than they would if you tested them locally first.

ie: Many unnecessary communications, errors with unrelated projects, time spent reporting avoidable bugs, etc.

Looking for more stability, even in the beta app,
Jacob

================================================
PS: The 12 that failed were:

063ppx43-NOELIA_063pp_equ-1-2-RND4865_1
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

148px44-NOELIA_148p_equ-1-2-RND1140_2
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

216px20-NOELIA_216p_equ-1-2-RND7557_1
SWAN : FATAL : Cuda driver error 1 in file 'swanlibnv2.cpp' in line 1330.
Assertion failed: a, file swanlibnv2.cpp, line 59

041px45-NOELIA_041p_equ-1-2-RND6478_1
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

041px33-NOELIA_041p_equ-1-2-RND8614_2
SWAN : FATAL : Cuda driver error 1 in file 'swanlibnv2.cpp' in line 1330.
Assertion failed: a, file swanlibnv2.cpp, line 59

255px9-NOELIA_255p_equ-1-2-RND6395_1
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

063ppx29-NOELIA_063pp_equ-1-2-RND2517_1
SWAN : FATAL : Cuda driver error 1 in file 'swanlibnv2.cpp' in line 1330.
Assertion failed: a, file swanlibnv2.cpp, line 59

148nx39-NOELIA_148n_equ-1-2-RND5760_1
SWAN : FATAL : Cuda driver error 1 in file 'swanlibnv2.cpp' in line 1330.
Assertion failed: a, file swanlibnv2.cpp, line 59

063ppx16-NOELIA_063pp_equ-1-2-RND8732_1
The system cannot find the path specified.
(0x3) - exit code 3 (0x3)

063ppx18-NOELIA_063pp_equ-1-2-RND6787_0
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

109nx31-NOELIA_109n_equ-1-2-RND1501_0
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

148nx37-NOELIA_148n_equ-1-2-RND2228_0
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59
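
Editor's note: in the list above, seven tasks report Cuda driver error 999 at line 1574, four report error 1 at line 1330, and one failed with "The system cannot find the path specified." A minimal sketch of how such a tally could be produced from pasted stderr text; the pattern is fitted only to the SWAN lines quoted in this thread, and the function name `summarize_swan_errors` is hypothetical:

```python
import re
from collections import Counter

# Matches lines such as:
# SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
SWAN_RE = re.compile(
    r"SWAN : FATAL : Cuda driver error (\d+) in file '([^']+)' in line (\d+)\."
)


def summarize_swan_errors(stderr_text: str) -> Counter:
    """Tally SWAN fatal errors by (error code, source file, source line)."""
    tally = Counter()
    for match in SWAN_RE.finditer(stderr_text):
        key = (int(match.group(1)), match.group(2), int(match.group(3)))
        tally[key] += 1
    return tally


# Three error lines copied from the list above, used as sample input.
sample = """\
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
SWAN : FATAL : Cuda driver error 1 in file 'swanlibnv2.cpp' in line 1330.
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
"""

for (code, filename, line), n in sorted(summarize_swan_errors(sample).items()):
    print(f"error {code} at {filename}:{line} -> {n} task(s)")
```

Grouping by code and source line shows at a glance that the failures fall into two clusters rather than twelve unrelated errors.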
ETQuestor
Joined: 11 Jul 09
Posts: 27
Credit: 1,000,618,568
RAC: 0
Level
Met
Message 29128 - Posted: 12 Mar 2013, 14:53:25 UTC

These NOELIA acemdbeta WUs are all hanging for me. They get stuck at a "Current CPU Time" of between 1 and 5 seconds. I had to abort them.


http://www.gpugrid.net/result.php?resultid=6610160
http://www.gpugrid.net/result.php?resultid=6610894

http://www.gpugrid.net/show_host_detail.php?hostid=43352
TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 29135 - Posted: 12 Mar 2013, 23:28:41 UTC
Last modified: 12 Mar 2013, 23:34:32 UTC

On my system (Vista 32-bit, BOINC 6.10.58, nVidia 314.7), the latest Noelia beta errored out after more than 11 hours. It is this one:
http://www.gpugrid.net/workunit.php?wuid=4248935
Greetings from TJ
flashawk
Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Level
Arg
Message 29137 - Posted: 13 Mar 2013, 0:45:34 UTC - in response to Message 29127.  


I know this is the beta app, but...
Devs, do you run some of these tasks before issuing them to us? If not, you should, because when the bugged tasks get to us, the failures waste many more resources than they would if you tested them locally first.

ie: Many unnecessary communications, errors with unrelated projects, time spent reporting avoidable bugs, etc.

Looking for more stability, even in the beta app


They would need 10 to 15 computers (dual booting or virtual PC) with every operating system on them, plus all the different versions of BOINC everyone's running, not to mention the different video cards. They'll never be able to please everyone. I always suspend other jobs or clear them out if I know I'm going to beta test, but that's just me, not 20/20 hindsight. What I'm trying to say is that if they did do some limited testing, who's to say what OS they would choose? It certainly wouldn't be Windows 8; it's turning out to be a flop and a real disappointment for Microsoft and their vendors. I don't want to sound too harsh (if I do, I apologize), but that's what beta testing is all about, right?
TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Message 29138 - Posted: 13 Mar 2013, 10:34:33 UTC - in response to Message 29137.  


I know this is the beta app, but...
Devs, do you run some of these tasks before issuing them to us? If not, you should, because when the bugged tasks get to us, the failures waste many more resources than they would if you tested them locally first.

ie: Many unnecessary communications, errors with unrelated projects, time spent reporting avoidable bugs, etc.

Looking for more stability, even in the beta app


They would need 10 to 15 computers (dual booting or virtual PC) with every operating system on them, plus all the different versions of BOINC everyone's running, not to mention the different video cards. They'll never be able to please everyone. I always suspend other jobs or clear them out if I know I'm going to beta test, but that's just me, not 20/20 hindsight. What I'm trying to say is that if they did do some limited testing, who's to say what OS they would choose? It certainly wouldn't be Windows 8; it's turning out to be a flop and a real disappointment for Microsoft and their vendors. I don't want to sound too harsh (if I do, I apologize), but that's what beta testing is all about, right?


I agree with you flashawk. We crunchers need to do the testing with all the different set-ups and platforms. Win8 is a pain indeed.

Greetings from TJ
nate
Joined: 6 Jun 11
Posts: 124
Credit: 2,928,865
RAC: 0
Level
Ala
Message 29139 - Posted: 13 Mar 2013, 10:49:48 UTC

Devs, do you run some of these tasks before issuing them to us? If not, you should, because when the bugged tasks get to us, the failures waste many more resources than they would if you tested them locally first.


We do test them locally, to the extent we can. Part of the issue is that running locally for us vs. running on BOINC are not comparable. We do have an in-house fake BOINC project, but even that isn't exactly comparable to sending to you users. Additionally, we have very limited ability to test on Windows. In the future we will improve there, but we have limited resources right now.

What we are thinking is that this might be related to the Windows application. Has anyone who experiences these problems seen them on a Linux box? Is it only Windows? The more we know, the more quickly we can improve. The last thing we want is to crash your machines. A failed WU is one thing. Locking up cruncher machines is much, much worse. Please let us know so we can fix it.
Fred Bayliss
Joined: 27 May 11
Posts: 9
Credit: 255,985,614
RAC: 0
Level
Asn
Message 29140 - Posted: 13 Mar 2013, 10:57:02 UTC - in response to Message 29139.  

I'm running these on Win7 with a GTX 670 and often get a Windows message: "Nvidia driver stopped working".
Hope this helps.