Message boards :
News :
Old Noelia WUs
Author | Message |
---|---|
Joined: 12 Dec 11 Posts: 91 Credit: 2,730,095,033 RAC: 0 |
True. Not 100%, but doable. |
Joined: 12 Dec 11 Posts: 91 Credit: 2,730,095,033 RAC: 0 |
If I babysit the machines, I mean... I will be traveling in two days, and then I expect the worst. |
Joined: 16 Jul 12 Posts: 98 Credit: 386,043,752 RAC: 0 |
If you travel, I would recommend getting an app on a mobile device to bring with you that will allow you to remote into the computers. An example would be teamviewer, which is free. https://play.google.com/store/apps/details?id=com.teamviewer.teamviewer.market.mobile&hl=en |
Joined: 12 Dec 11 Posts: 91 Credit: 2,730,095,033 RAC: 0 |
"If you travel, I would recommend getting an app on a mobile device to bring with you that will allow you to remote into the computers. An example would be teamviewer, which is free." That is exactly what I do on my tablet. The problem is that when the big rig starts to reboot, I can't access it. I hope it won't happen. |
Joined: 7 Dec 12 Posts: 92 Credit: 225,897,225 RAC: 0 |
Now I have more problems, even with short Noelia tasks. They got stuck, caused errors, or crashed the app. A reboot was needed to start a new GPU task. I have ordered a new GPU for GPUGrid, but I think I'll suspend this whole project (and switch to another one) until these problems are solved. |
Joined: 18 Sep 08 Posts: 368 Credit: 4,174,624,885 RAC: 0 |
Same here. I've got 5 boxes running the shorter ones, and I think all 5 have hung WUs right now, one at 37 hours ... STE\/E |
Joined: 17 Feb 13 Posts: 181 Credit: 144,871,276 RAC: 0 |
No problems with short NOELIA tasks. I have not attempted any long NOELIAs for about a week. PC #1: AMD 1090T with Acer GTX 650 Ti. PC #2: AMD A10 5800K with Acer GTX 650 Ti. John |
Joined: 6 Aug 11 Posts: 8 Credit: 76,046,994 RAC: 0 |
Short Noelias were going fine, until I had to abort this one, which was restarting repeatedly with error: SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841. |
Joined: 28 Mar 09 Posts: 490 Credit: 11,731,645,728 RAC: 47,738 |
226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS error on the latest beta units. This is a new one! After running flawlessly, I got a few units with this error, on the latest set of betas. http://www.gpugrid.net/result.php?resultid=6611952 http://www.gpugrid.net/result.php?resultid=6610530 http://www.gpugrid.net/result.php?resultid=6610707 |
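The hexadecimal value in that error line reads like the 64-bit two's-complement form of a negative exit status. A minimal sketch, assuming the status was stored as a signed 64-bit integer (the helper name `to_signed64` is made up for illustration), shows how 0xffffffffffffff1e lines up with the 226 printed next to it:

```python
def to_signed64(value: int) -> int:
    """Interpret an unsigned 64-bit integer as a two's-complement signed value."""
    if not 0 <= value < 1 << 64:
        raise ValueError("value does not fit in 64 bits")
    # Values with the top bit set represent negative numbers.
    return value - (1 << 64) if value >= 1 << 63 else value


exit_status = to_signed64(0xFFFFFFFFFFFFFF1E)
print(exit_status)  # -226
```

So the "226" and the long hex string are the same number shown two ways, not two separate error codes.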
Joined: 6 Jun 11 Posts: 124 Credit: 2,928,865 RAC: 0 |
"SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841." It looks like most of the major errors are gone (severe error % is good), but this one does seem to be occurring more frequently than we would like. We'll see if we can find a cause. |
Joined: 28 Dec 10 Posts: 13 Credit: 37,543,525 RAC: 0 |
dmesg from the beta WUs:
[400033.132826] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400049.637834] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400054.854423] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[400066.358868] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400082.863901] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400099.368878] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400115.873938] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400119.305177] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[400133.382624] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400136.664677] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[400149.890962] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400166.399277] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400182.904290] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400198.412211] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400215.917612] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400220.224939] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[400244.929342] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400260.437256] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400276.942267] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400293.450605] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400308.955195] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400325.463524] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400341.968561] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400358.476864] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400369.667884] NVRM: Xid (0000:01:00): 13, 0001 00000000 000090c0 00001b0c 00000000 00000000
[400382.174156] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400397.678751] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400414.183758] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400430.692078] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400446.196682] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400461.704604] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400464.387651] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[400484.212040] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400500.218499] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400516.723568] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400533.231872] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400535.747891] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[400555.739274] NVRM: Xid (0000:01:00): 8, Channel 00000001
[401174.487665] NVRM: Xid (0000:01:00): 8, Channel 00000001
[401189.992293] NVRM: Xid (0000:01:00): 8, Channel 00000001
I suspect I will have to reboot to recover from these. |
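For anyone who wants to summarize a dmesg dump like the one above before posting it, here is a rough sketch that counts NVRM Xid events by number. The function name `count_xids` and the regular expression are illustrative assumptions, not part of any NVIDIA or BOINC tool; the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches e.g. "[400033.132826] NVRM: Xid (0000:01:00): 8, Channel 00000001"
# Whitespace is matched loosely so that wrapped or flattened logs still parse.
XID_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+NVRM:\s+Xid\s+\(([0-9a-fA-F:.]+)\):\s+(\d+),"
)


def count_xids(log_text: str) -> Counter:
    """Return a Counter mapping Xid number -> number of occurrences."""
    return Counter(int(m.group(3)) for m in XID_RE.finditer(log_text))


sample = """\
[400033.132826] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400054.854423] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[400066.358868] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400369.667884] NVRM: Xid (0000:01:00): 13, 0001 00000000 000090c0 00001b0c 00000000 00000000
"""

for xid, n in sorted(count_xids(sample).items()):
    print(f"Xid {xid}: {n}")
```

As far as I recall, NVIDIA's Xid documentation lists 8 as "GPU stopped processing", 13 as "Graphics Engine Exception", and 31 as "GPU memory page fault", which would fit a hung GPU, but treat that reading as an assumption and check the current documentation.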
Joined: 28 Mar 09 Posts: 490 Credit: 11,731,645,728 RAC: 47,738 |
"226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS error on the latest beta units. This is a new one!" Is it my imagination, or did you change the error message for these units? |
Joined: 28 Dec 10 Posts: 13 Credit: 37,543,525 RAC: 0 |
Verified that the beta WUs hang the GPU in some manner. Running rmmod nvidia followed by modprobe nvidia does not resolve it. The system must be rebooted to recover from whatever the WU is causing. This is on Nvidia driver 313.26. |
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 |
I wanted to chime in to say I just had 12 NOELIA tasks fail hard on the "ACEMD beta version v6.49 (cuda42)" app, using Windows 8 Pro x64, BOINC v7.0.55 x64 beta, nVidia 314.14 beta drivers, a GTX 660 Ti (which usually works on GPUGRID), and a GTX 460 (which usually works on World Community Grid).
The tasks resulted in "Driver stopped responding" errors, and Windows restarted the drivers to recover. But the failures also appear to have caused other GPUs (which were working on entirely different projects, like World Community Grid) to fail as well.
I know this is the beta app, but... Devs, do you run some of these tasks before issuing them to us? If not, you should, because when the bugged tasks get to us, the failures waste many more resources than they would if you tested them locally first. I.e., many unnecessary communications, errors with unrelated projects, time spent reporting avoidable bugs, etc.
Looking for more stability, even in the beta app,
Jacob
================================================
PS: The 12 that failed were:
063ppx43-NOELIA_063pp_equ-1-2-RND4865_1: SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574. Assertion failed: a, file swanlibnv2.cpp, line 59
148px44-NOELIA_148p_equ-1-2-RND1140_2: SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574. Assertion failed: a, file swanlibnv2.cpp, line 59
216px20-NOELIA_216p_equ-1-2-RND7557_1: SWAN : FATAL : Cuda driver error 1 in file 'swanlibnv2.cpp' in line 1330. Assertion failed: a, file swanlibnv2.cpp, line 59
041px45-NOELIA_041p_equ-1-2-RND6478_1: SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574. Assertion failed: a, file swanlibnv2.cpp, line 59
041px33-NOELIA_041p_equ-1-2-RND8614_2: SWAN : FATAL : Cuda driver error 1 in file 'swanlibnv2.cpp' in line 1330. Assertion failed: a, file swanlibnv2.cpp, line 59
255px9-NOELIA_255p_equ-1-2-RND6395_1: SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574. Assertion failed: a, file swanlibnv2.cpp, line 59
063ppx29-NOELIA_063pp_equ-1-2-RND2517_1: SWAN : FATAL : Cuda driver error 1 in file 'swanlibnv2.cpp' in line 1330. Assertion failed: a, file swanlibnv2.cpp, line 59
148nx39-NOELIA_148n_equ-1-2-RND5760_1: SWAN : FATAL : Cuda driver error 1 in file 'swanlibnv2.cpp' in line 1330. Assertion failed: a, file swanlibnv2.cpp, line 59
063ppx16-NOELIA_063pp_equ-1-2-RND8732_1: The system cannot find the path specified. (0x3) - exit code 3 (0x3)
063ppx18-NOELIA_063pp_equ-1-2-RND6787_0: SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574. Assertion failed: a, file swanlibnv2.cpp, line 59
109nx31-NOELIA_109n_equ-1-2-RND1501_0: SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574. Assertion failed: a, file swanlibnv2.cpp, line 59
148nx37-NOELIA_148n_equ-1-2-RND2228_0: SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574. Assertion failed: a, file swanlibnv2.cpp, line 59 |
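A quick way to see the pattern in that list of 12 failures is to tally them by CUDA driver error code. This is only a sketch: the helper name `tally` is hypothetical, the messages are the stderr lines quoted in the post in the same order, and the code-to-name table covers just the codes mentioned in this thread, using the names from the CUDA driver API's `CUresult` enum.

```python
import re
from collections import Counter

# CUresult names for the codes that appear in this thread.
CUDA_ERROR_NAMES = {
    1: "CUDA_ERROR_INVALID_VALUE",
    702: "CUDA_ERROR_LAUNCH_TIMEOUT",
    999: "CUDA_ERROR_UNKNOWN",
}

ERROR_RE = re.compile(r"Cuda driver error (\d+) in file '([^']+)' in line (\d+)")

E999 = "SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574."
E1 = "SWAN : FATAL : Cuda driver error 1 in file 'swanlibnv2.cpp' in line 1330."
E_PATH = "The system cannot find the path specified. (0x3) - exit code 3 (0x3)"

# The 12 failures, in the order they were listed.
messages = [E999, E999, E1, E999, E1, E999, E1, E1, E_PATH, E999, E999, E999]


def tally(msgs):
    """Count messages per CUDA error code; None means no CUDA code was found."""
    counts = Counter()
    for msg in msgs:
        match = ERROR_RE.search(msg)
        counts[int(match.group(1)) if match else None] += 1
    return counts


for code, n in tally(messages).most_common():
    name = CUDA_ERROR_NAMES.get(code, "no CUDA error code in message")
    print(f"{code}: {n} task(s) ({name})")
```

On this list, the unknown-error code 999 accounts for 7 of the 12 failures, code 1 for 4, and the remaining task failed with a missing-path error rather than a CUDA error.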
Joined: 11 Jul 09 Posts: 27 Credit: 1,000,618,568 RAC: 0 |
These NOELIA acemdbeta WUs are all hanging for me. They get stuck at a "Current CPU Time" of between 1 and 5 seconds. I had to abort them. http://www.gpugrid.net/result.php?resultid=6610160 http://www.gpugrid.net/result.php?resultid=6610894 http://www.gpugrid.net/show_host_detail.php?hostid=43352 |
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0 |
On my system (Vista 32-bit, BOINC 6.10.58, nVidia 314.7), the latest Noelia beta errored out after more than 11 hours. It is this one: http://www.gpugrid.net/workunit.php?wuid=4248935 Greetings from TJ |
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0 |
"I know this is the beta app, but..." They would need 10 to 15 computers (dual-booting or running virtual PCs) with every operating system on them, plus all the different versions of BOINC everyone's running, not to mention the different video cards. They'll never be able to please everyone. I always suspend other jobs or clear them out if I know I'm going to beta test, but that's just me, not 20/20 hindsight. What I'm trying to say is that even if they did do some limited testing, who's to say which OS they would choose? It certainly wouldn't be Windows 8, which is turning out to be a flop and a real disappointment for Microsoft and their vendors. I don't want to sound too harsh (if I do, I apologize), but that's what beta testing is all about, right? |
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0 |
I agree with you, flashawk. We crunchers need to do the testing with all the different set-ups and platforms. Win8 is a pain indeed. Greetings from TJ |
Joined: 6 Jun 11 Posts: 124 Credit: 2,928,865 RAC: 0 |
"Devs, do you run some of these tasks before issuing them to us? If not, you should, because when the bugged tasks get to us, the failures waste many more resources than they would if you tested them locally first." We do test them locally, to the extent we can. Part of the issue is that running locally for us and running on BOINC are not comparable. We do have an in-house fake BOINC project, but even that isn't exactly comparable to sending tasks to you users. Additionally, we have very limited ability to test on Windows. In the future we will improve there, but we have limited resources right now. What we are thinking is that this might be related to the Windows application. Has anyone who experiences these problems seen them on a Linux box? Is it only Windows? The more we know, the more quickly we can improve. The last thing we want is to crash your machines. A failed WU is one thing. Locking up cruncher machines is much, much worse. Please let us know so we can fix it. |
Joined: 27 May 11 Posts: 9 Credit: 255,985,614 RAC: 0 |
I'm running these on Win7 with a GTX 670 and often get a Windows message saying "Nvidia driver stopped working". Hope this helps. |
©2025 Universitat Pompeu Fabra