hERG: information and issues

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Message 13850 - Posted: 9 Dec 2009, 12:01:16 UTC
Last modified: 9 Dec 2009, 13:27:53 UTC

Dear crunchers,

I'm starting this topic to collect information and feedback on the HERG workunits, all in a single place. The idea (under test) is to provide both a quick-to-find reference for those of you curious about the purpose of the WUs you are crunching, and a place to report issues.

This post, and the one below, may be updated from time to time.


Scientific rationale.

First of all, some background information on the experiment: we are doing various studies on the so-called "hERG channel". You can find a (longish) description on Wikipedia's hERG page.
This complex of four proteins (tetramer) is found in many of the body's cells, most notably in heart tissue, where it plays a very important role: it conducts charged particles (potassium ions), which flow through it cyclically, ultimately governing the heartbeat.

The molecule is of special interest because interference with its function, e.g. through unintended side effects of drugs or congenital mutations, can cause potentially fatal alterations in the cardiac rhythm, including the long QT syndrome.

The curious ones may find an image of the tetramer on our Flickr photostream.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Message 13851 - Posted: 9 Dec 2009, 12:02:40 UTC
Last modified: 9 Dec 2009, 18:18:42 UTC

Crunching issues.

The TONI_HERG workunits use the same parameters as many others. As far as we know, they have the same failure rate as other workunits, but I am trying to get some sounder statistics. If you see more HERG failures, it could be that there are many of those WUs out right now.
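
One way to make such a comparison independent of how many WUs of each type are out is to look at failure rates with a confidence interval rather than at raw failure counts. The following is a minimal sketch, assuming you have tallied failed and total results per WU type yourself. The function name `failure_rate_interval` and the counts are invented for illustration and are not project data.

```python
import math


def failure_rate_interval(failures, total, z=1.96):
    """Return (rate, low, high): failure rate with a Wilson score interval.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if total == 0:
        raise ValueError("no results to summarise")
    p = failures / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


# Made-up counts (failed, total), for illustration only.
counts = {"TONI_HERG": (30, 300), "other": (10, 100)}
for wu_type, (failed, total) in counts.items():
    rate, low, high = failure_rate_interval(failed, total)
    print(f"{wu_type}: {failed}/{total} failed, "
          f"rate {rate:.2f} (95% CI {low:.2f} to {high:.2f})")
```

With these invented numbers both types have the same rate even though one shows three times as many failures; the interval is simply narrower for the type with more results.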


[This post reserved for future updates]

skgiven
Volunteer moderator
Volunteer tester

Message 13869 - Posted: 10 Dec 2009, 18:49:47 UTC - in response to Message 13851.  


Beyond

Message 13870 - Posted: 10 Dec 2009, 20:11:52 UTC - in response to Message 13851.  

Crunching issues.

The TONI_HERG workunits use the same parameters as many others. As far as we know, they have the same failure rate as other workunits, but I am trying to get some sounder statistics. If you see more HERG failures, it could be that there are many of those WUs out right now.

The TONI_HERG WUs run fine on GTX 260 and above. On my four G92-based cards they almost always fail, so I now abort them on those cards when they arrive. Other WUs are much, much better; most types never fail on any of the cards.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Message 13875 - Posted: 11 Dec 2009, 11:07:57 UTC - in response to Message 13870.  
Last modified: 11 Dec 2009, 11:16:34 UTC

So, from what I understand, these WUs sometimes fail on older cards? I'm trying to collect statistics on non-overclocked cards.

From what I see in SKGiven's task list for host 51279, he also had at least three TONI_HERG tasks complete successfully: 1572466, 1606985 and 1558388. BTW, isn't the card overclocked at 1.85 GHz?

Richard Haselgrove

Message 13876 - Posted: 11 Dec 2009, 11:48:07 UTC - in response to Message 13875.  

So, from what I understand, these WUs sometimes fail on older cards? I'm trying to collect statistics on non-overclocked cards.

I would put it more strongly than that - they have a high probability of failing, even if some succeed. And by 'age' of the card, you mean the technology generation they incorporate.

I have three 9800GT series cards, all purchased in January this year. The straight 9800GTs are not overclocked; the 9800GTX+ runs on factory overclock settings. I haven't noticed any significant difference in failure rate between the cards, so I don't think the problem is related to (moderate) overclocking.

Also, I've been running the same drivers (190.38, 32-bit WinXP) since July: the increased error rate has become apparent much more recently than that - late October, IIRC. So I'm not inclined to blame it on drivers, either.

No, it seems to be related to specific model types. TONI_HERG is a fairly recent addition to the list of problematic models - searching the message boards suggests that my report on 24 November was the first sighting. Previously, we had been commenting on IBUCH_TRYP and OTTO_HERG in thread 1468

canardo

Message 13881 - Posted: 11 Dec 2009, 17:36:57 UTC - in response to Message 13875.  

Hello,
Just have a look here, comp id: 26091.
It worked fine until I upgraded to BOINC 6.10.18,
although that might just have coincided with the HERG units coming in.
SETI & Einstein have no problems though.
Ciao,
Jaak

skgiven
Volunteer moderator
Volunteer tester

Message 13906 - Posted: 13 Dec 2009, 12:41:58 UTC - in response to Message 13875.  
Last modified: 13 Dec 2009, 13:33:06 UTC

So, from what I understand, these WUs sometimes fail on older cards? I'm trying to collect statistics on non-overclocked cards.

From what I see in SKGiven's task list for host 51279, he also had at least three TONI_HERG tasks complete successfully: 1572466, 1606985 and 1558388. BTW, isn't the card overclocked at 1.85 GHz?


Yes, 3 tasks did complete on the GTS 250, but there were too many failures.
The clock settings are in fact factory settings. Yes, they are higher than on other cards, but the card is fairly new and the core sits at 66 degrees (5 case fans, plus GPU, CPU and PSU fans), and the system is on a UPS! The GTS 250's success rates are much higher for other tasks.

On the other hand, my 8800GTS 512MB G92 could not complete any TONI_HERG tasks. As there were so many being sent, I was down to an almost zero return for that card on the project. That card was also not able to handle other recent tasks too well. I guess it is down to the G92 core's limitations.

My GTS250 spec:
Palit card. 65nm, G92 rev A2. Bios 62.92.7D.00.10
11.9562, CUDA 3 (better than 2.3)!
GPU @745, Memory @1000MHz, Shaders @1848MHz
754M Transistors.
GPUGrid temp=66 Degrees C
For Ref. Einstein temp=48 Degrees C (but that barely uses the GPU)!

System: Q9400 CPU @3.46GHz crunching other BOINC tasks (24/7, no outages as it is on a UPS) and Win7 Pro 64bit. 4GB RAM, plenty of HDD space.

I will allow it to try another HERG task. I'll report back tomorrow, hopefully!

The GTX260 is still working well for all tasks, but that uses a GT200 A2.

Beyond

Message 13925 - Posted: 14 Dec 2009, 18:53:27 UTC - in response to Message 13875.  
Last modified: 14 Dec 2009, 18:53:55 UTC

So, from what I understand, these WUs sometimes fail on older cards? I'm trying to collect statistics on non-overclocked cards.

As Richard states, "high probability of failing" is a better description. They will occasionally complete but usually fail. On the GTX 260 and above they run fine. BTW, they often fail on the new GTS 240 and GT 240 cards too, even with their 1.2 compute capability:

http://www.gpugrid.net/result.php?resultid=1592578
http://www.gpugrid.net/result.php?resultid=1590198
http://www.gpugrid.net/result.php?resultid=1610106

skgiven
Volunteer moderator
Volunteer tester

Message 13927 - Posted: 14 Dec 2009, 19:16:26 UTC - in response to Message 13925.  

My GTS250 managed to complete one! http://www.gpugrid.net/result.php?resultid=1625604

The success percentage of these HERG tasks for anything less than a GTX260 seems to be poor, with the older cards being less reliable.

Just because an NVidia card is new does not mean there is any new technology inside!

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Message 13947 - Posted: 15 Dec 2009, 14:17:10 UTC - in response to Message 13927.  

We are keeping an eye on the failure rate with respect to card type (in the absence of overclocking). As said, the matter is puzzling because there should be no major difference from other WU types. For now, I have reduced the number of HERG WUs out, and I may reduce their length a bit in order to increase the chances of correct termination.

Almost all of the failures seem to be related to the infamous CUDA FFT bug, over which we have little to no control (i.e., errors in "pme" or "fft" kernels).

Definitely, thanks for bearing with us.

Richard Haselgrove

Message 13951 - Posted: 15 Dec 2009, 17:19:35 UTC - in response to Message 13947.  

Almost all of the failures seem to be related to the infamous CUDA FFT bug, over which we have little to no control (i.e., errors in "pme" or "fft" kernels).

Could you give us a little bit more detail about this bug, as this is the first time I've heard about it? It may only be "infamous" in developer circles.

I'm aware of an infamous bug in the BOINC CUDA application which NVidia developed for SETI@home, but that just causes certain tasks ('VLAR') to run extremely slowly, and inhibits screen re-drawing while they're running. Apart from that, SETI is an extremely heavy user of FFTs at a wide range of problem sizes, and benefits enormously from the additional capabilities of cufft v2.3: I've not come across a single SETI task which has failed because of a CUDA FFT bug.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Message 13953 - Posted: 15 Dec 2009, 17:55:52 UTC - in response to Message 13951.  
Last modified: 15 Dec 2009, 18:02:19 UTC

It's a long-standing issue that hits older cards especially hard. Please see here or here. As for FFT being OK with SETI: there are in fact many types of FFT, and it's not surprising that the bug only manifests for some of them.

Richard Haselgrove

Message 13954 - Posted: 15 Dec 2009, 18:49:36 UTC - in response to Message 13953.  

It's a long-standing issue that hits older cards especially hard. Please see here or here. As for FFT being OK with SETI: there are in fact many types of FFT, and it's not surprising that the bug only manifests for some of them.

I had hoped that you would direct me to a relevant discussion here. The only thing of relevance in those threads seems to be message 12734:

We have contacted AGAIN Nvidia yesterday.

gdf

That was almost three months ago, and is the very last post in the thread. Did he ever get a reply?

skgiven
Volunteer moderator
Volunteer tester

Message 13955 - Posted: 15 Dec 2009, 20:08:41 UTC - in response to Message 13954.  

Perhaps the FFT bug is being compounded by a mixture of G92/65nm cores and old firmware?

Reducing the work length would help, as the tasks that failed on my systems seemed to do so at random points in time. If they fail after 10 seconds it's not really a problem that affects turnover, but failing after 6 hours is not good.

Ultimately if you could match cards to work units it would resolve this issue. It might even be better than card pairing, though both could be done.

No hERG tasks for G92 cards would soon sort a lot of problems out.
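
A rough sketch of what such card-to-WU matching could look like, assuming the server knows each card's chip family (which it may not at present). The names `BLOCKED`, `may_send` and `pick_wu` are made up for illustration, and `EXAMPLE_WU` is a placeholder; this is not the project's scheduler.

```python
# Hypothetical deny-list of (chip family, WU name fragment) pairs.
# The single entry only mirrors the reports in this thread.
BLOCKED = {("G92", "HERG")}


def may_send(chip, wu_name, blocked=BLOCKED):
    """Return False if this WU type is deny-listed for this chip family."""
    chip = chip.upper()
    name = wu_name.upper()
    return not any(chip == c and frag in name for c, frag in blocked)


def pick_wu(chip, queue):
    """Return the first queued WU name the card may run, or None."""
    for wu_name in queue:
        if may_send(chip, wu_name):
            return wu_name
    return None


queue = ["TONI_HERG", "EXAMPLE_WU"]  # EXAMPLE_WU is a placeholder name
print(pick_wu("G92", queue))    # skips the HERG task for a G92 card
print(pick_wu("GT200", queue))  # a GT200 card takes the first task in the queue
```

A deny-list keeps the default behaviour unchanged for every card and WU type that is not explicitly listed, so it would only need an entry per known bad combination.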

Tom Philippart

Message 13957 - Posted: 15 Dec 2009, 20:50:07 UTC

great to see this thread!! thanks a lot!

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Message 13982 - Posted: 18 Dec 2009, 9:49:06 UTC - in response to Message 13957.  
Last modified: 18 Dec 2009, 9:49:35 UTC

I can just repeat what I have already said somewhere in the forum.
We have furnished Nvidia with a reproducer of the bug, and we have contacted them again several times. Once they said that they were looking at it. Another time, they said that technical staff were trying to find the problem and that there were discussions on what to do. But then nothing. This is common with Nvidia: we have sent several bug reproducers, but only once did they fix a bug, another one in their FFT that we had reported. In my experience, they use bug reports to fix bugs on new chips, not older ones. That also makes some sense, given the rate at which new GPUs are produced. So we have stopped reporting bugs for older cards.


GDF

MarkJ
Volunteer moderator
Volunteer tester

Message 14077 - Posted: 29 Dec 2009, 23:57:17 UTC

Had two TONI_HERG's fail. They were run on a GTX295 (single PCB variety, so the newer model).

WU 1
WU 2

Both say "Cuda error: Kernel [pme_fill_charges_overflow] failed in file 'fillcharges.cu' in line 97 : unknown error".
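
For anyone tallying failures from their own task output, a message of this shape can be picked apart with a regular expression. This is only a sketch: the pattern is modelled on the single message quoted above, other error lines may look different, and the `fft_related` flag simply follows Toni's description of the bug as errors in "pme" or "fft" kernels. The names `CUDA_ERROR` and `parse_cuda_error` are made up for illustration.

```python
import re

# Pattern modelled on the one message quoted in this post (an assumption).
CUDA_ERROR = re.compile(
    r"Cuda error: Kernel \[(?P<kernel>[^\]]+)\] failed in file "
    r"'(?P<file>[^']+)' in line (?P<line>\d+) : (?P<reason>.+)"
)


def parse_cuda_error(text):
    """Return a dict describing the first matching error line, or None."""
    match = CUDA_ERROR.search(text)
    if match is None:
        return None
    info = match.groupdict()
    info["line"] = int(info["line"])
    kernel = info["kernel"].lower()
    # Heuristic only: kernel names containing "pme" or "fft".
    info["fft_related"] = "pme" in kernel or "fft" in kernel
    return info


message = ("Cuda error: Kernel [pme_fill_charges_overflow] failed in file "
           "'fillcharges.cu' in line 97 : unknown error")
print(parse_cuda_error(message))
```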

I know there isn't much you can do if nvidia don't want to fix their software.
BOINC blog

robertmiles

Message 14159 - Posted: 8 Jan 2010, 15:59:04 UTC - in response to Message 13982.  

I can just repeat what I have already said somewhere in the forum.
We have furnished Nvidia with a reproducer of the bug, and we have contacted them again several times. Once they said that they were looking at it. Another time, they said that technical staff were trying to find the problem and that there were discussions on what to do. But then nothing. This is common with Nvidia: we have sent several bug reproducers, but only once did they fix a bug, another one in their FFT that we had reported. In my experience, they use bug reports to fix bugs on new chips, not older ones. That also makes some sense, given the rate at which new GPUs are produced. So we have stopped reporting bugs for older cards.


GDF


I've downloaded the Nvidia SDKs for the older CUDA versions. Are you interested in sending me the source code for the current Windows application and letting me check if whatever method you use to compile it also works with the older SDKs? Or would you prefer to download those SDKs yourself? I'd expect either method to produce versions with better support for some of the older Nvidia boards, IF they don't need major source code modifications to work at all.

I intended to start learning enough CUDA that I could start helping a few BOINC projects start a GPU version, but so far it looks like I won't be ready to actually start modifying the code very soon.

Another idea: Ask the BOINC developers to add more code for reporting the GPU chip type, in order to get more information about which of the older Nvidia boards are still usable.

robertmiles

Message 14160 - Posted: 8 Jan 2010, 17:46:38 UTC - in response to Message 13850.  

First of all, some background information on the experiment: we are doing various studies on the so-called "hERG channel". You can find a (longish) description on Wikipedia's hERG page.
This complex of four proteins (tetramer) is found in many of the body's cells, most notably in heart tissue, where it plays a very important role: it conducts charged particles (potassium ions), which flow through it cyclically, ultimately governing the heartbeat.


Since that means your software is now ready to handle a tetramer, here's some information on a trimer you're likely to be interested in as well:

A trimer of the gp120 protein that the HIV-1 virus uses to enter human cells. If your software can handle docking of assorted compounds to that trimer, and choose those that dock to the trimer without too much being wasted by also docking to the single units of the gp120 protein elsewhere on the virus coat, you're likely to get the groups working on HIV/AIDS research very interested in using your software.

At this moment, I'm having trouble getting the links from one of my other computers to this one, but I will post several related links if they look useful for you.

Are you interested in getting enough grants that you will have to hire yet another researcher or two to handle them all?



©2026 Universitat Pompeu Fabra