New project in long queue

Message boards : News : New project in long queue
GPUGRID

Joined: 12 Dec 11
Posts: 91
Credit: 2,730,095,033
RAC: 0
Level: Phe
Message 29040 - Posted: 7 Mar 2013, 12:28:04 UTC - in response to Message 29038.  

It looks to me that this problem is related to the architecture of the host operating system, as all (1, 2, 3) of my Windows XP x64 systems have a lot of errors, while all (1, 2, 3) of my Windows XP x86 systems are running these NOELIAs fine.


Thanks, we're looking at it. Obviously this is pretty serious. I will submit to the long queue some additional simulations that I know for sure are good, so that we can get a handle on this.

Edit: I have submitted to the long queue some simulations we know are good. If it is an issue with the app, we will find out. They have the name NATHAN_dhfr36_3.



I noticed the NATHAN units; they are coming in really well. All machines are back... will report results ASAP :D
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level: Trp
Message 29041 - Posted: 7 Mar 2013, 12:47:12 UTC

A NATHAN has started running OK here too, even with no reboot after the NOELIA failure (technique as described in NC).
Bedrich Hajek

Joined: 28 Mar 09
Posts: 490
Credit: 11,731,645,728
RAC: 47,738
Level: Trp
Message 29042 - Posted: 7 Mar 2013, 13:18:55 UTC - in response to Message 29038.  

It looks to me that this problem is related to the architecture of the host operating system, as all (1, 2, 3) of my Windows XP x64 systems have a lot of errors, while all (1, 2, 3) of my Windows XP x86 systems are running these NOELIAs fine.


Thanks, we're looking at it. Obviously this is pretty serious. I will submit to the long queue some additional simulations that I know for sure are good, so that we can get a handle on this.

Edit: I have submitted to the long queue some simulations we know are good. If it is an issue with the app, we will find out. They have the name NATHAN_dhfr36_3.


In my case, both the 32-bit Windows XP and the 64-bit Windows 7 machines are having errors this morning. The units I downloaded yesterday seem to be okay.

I did, however, get a crash on my Windows 7 computer when I rebooted: a unit that had been running fine crashed, though another unit running on the other card did not. The settings (GPU speed, memory speed and fan) on the card where the unit crashed were reset, and I had to reboot again, with the units suspended, to get the video card settings right.

I also noticed that on the Windows 7 machine the units take 18-plus hours to finish, while on the Windows XP machine they take about 13 hours. This difference seems excessive.



[AF>Belgique] bill1170

Joined: 4 Jan 09
Posts: 13
Credit: 1,382,704,222
RAC: 12
Level: Met
Message 29043 - Posted: 7 Mar 2013, 13:54:13 UTC - in response to Message 29042.  

It's not limited to XP64; my XP32 got the error in acemd.2865P.exe as well.
cciechad

Joined: 28 Dec 10
Posts: 13
Credit: 37,543,525
RAC: 0
Level: Val
Message 29044 - Posted: 7 Mar 2013, 13:54:55 UTC - in response to Message 29042.  

Five or six WUs failed for me this morning. I'm on driver 313.26. As far as I can tell, these WUs have also failed for everyone else they were distributed to. I'm seeing these in dmesg (I thought my card might be failing, but I think there is a problem with the new WUs):

[649702.679741] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[649705.790669] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[649715.295948] NVRM: Xid (0000:01:00): 8, Channel 00000001
[649730.302031] NVRM: Xid (0000:01:00): 8, Channel 00000001
[649745.308105] NVRM: Xid (0000:01:00): 8, Channel 00000001
[649760.317500] NVRM: Xid (0000:01:00): 8, Channel 00000001
[649776.323990] NVRM: Xid (0000:01:00): 8, Channel 00000001
[649791.831930] NVRM: Xid (0000:01:00): 8, Channel 00000001
[649806.838000] NVRM: Xid (0000:01:00): 8, Channel 00000001
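When a log like the one above runs to many lines, it can help to tally the Xid codes per GPU before reporting them. The following is a minimal Python sketch, assuming the `NVRM: Xid (<pci-address>): <code>, ...` line format shown in the post; the function name `tally_xid` is hypothetical. It only counts codes and does not interpret them; for the meaning of each code, NVIDIA's Xid error documentation is the place to look.

```python
import re
from collections import Counter

# Matches kernel log lines of the form shown above, e.g.
# [649702.679741] NVRM: Xid (0000:01:00): 31, Ch 00000001, ...
# Group 1 is the PCI address of the GPU, group 2 is the Xid code.
XID_RE = re.compile(r"NVRM: Xid \(([0-9A-Fa-f:.]+)\): (\d+)")


def tally_xid(log_text):
    """Count Xid codes per GPU in dmesg-style text.

    Returns a Counter keyed by (pci_address, xid_code).
    Lines that are not NVRM Xid messages are ignored.
    """
    counts = Counter()
    for line in log_text.splitlines():
        match = XID_RE.search(line)
        if match:
            pci_address, code = match.group(1), int(match.group(2))
            counts[(pci_address, code)] += 1
    return counts
```

Run against the nine lines quoted above, it reports two Xid 31 events and seven Xid 8 events, all on GPU 0000:01:00. The text can come from a saved copy of `dmesg` output; reading the kernel log directly may require elevated privileges on some systems.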
nate

Joined: 6 Jun 11
Posts: 124
Credit: 2,928,865
RAC: 0
Level: Ala
Message 29047 - Posted: 7 Mar 2013, 14:24:13 UTC
Last modified: 7 Mar 2013, 14:24:40 UTC

New news here: http://www.gpugrid.net/forum_thread.php?id=3318

It looks like it might be an extension of the issue I discussed before, but we're not sure. We're going to run tests on the beta queue to try and figure it out.
Retvari Zoltan

Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 1
Level: Trp
Message 29060 - Posted: 7 Mar 2013, 19:52:12 UTC - in response to Message 29032.  
Last modified: 7 Mar 2013, 19:55:03 UTC

It looks to me that this problem is related to the architecture of the host operating system, as all (1, 2, 3) of my Windows XP x64 systems have a lot of errors, while all (1, 2, 3) of my Windows XP x86 systems are running these NOELIAs fine.

After my post (above), my 32-bit hosts had some failures and stuck workunits, so their earlier, relatively successful behavior may have been just by chance.

©2025 Universitat Pompeu Fabra