Unit crash after 0 second
| Author | Message |
|---|---|
Zarck Joined: 16 Aug 08 Posts: 145 Credit: 328,473,995 RAC: 0
|
Units crash after 0 seconds. No problem with Asteroids GPU, Einstein GPU, Folding@home GPU, Milkyway GPU, or SETI GPU. Nvidia Quadro K5000 + GeForce Titan. |
|
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0
|
Both your cards are pretty old; they may not be capable of working with these WUs. Have you completed any work units?
|
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
"Unit crash after 0 second"

The error message at the end of stderr.txt is:

    SWAN : FATAL Unable to load module .nonbonded.cu. (300)

It means that the Quadro K5000 is too old for this project, as it is only Compute Capability 3.0. Your Titan should work fine, but it gets very hot (83°C). From task 20673822:

    <core_client_version>7.14.2</core_client_version>
    <![CDATA[
    <message>
    (unknown error) - exit code -52 (0xffffffcc)</message>
    <stderr_txt>
    # GPU [GeForce GTX TITAN] Platform [Windows] Rev [3212] VERSION [80]
    # SWAN Device 0 :
    # Name : GeForce GTX TITAN
    # ECC : Disabled
    # Global mem : 6144MB
    # Capability : 3.5
    # PCI ID : 0000:28:00.0
    # Device clock : 928MHz
    # Memory clock : 3004MHz
    # Memory width : 384bit
    # Driver version : r419_29 : 41935
    # GPU 0 : 81C
    # GPU 1 : 70C
    # GPU 0 : 82C
    # GPU 0 : 83C
    # GPU [Quadro K5000] Platform [Windows] Rev [3212] VERSION [80]
    # SWAN Device 1 :
    # Name : Quadro K5000
    # ECC : Disabled
    # Global mem : 4096MB
    # Capability : 3.0
    # PCI ID : 0000:0F:00.0
    # Device clock : 705MHz
    # Memory clock : 2700MHz
    # Memory width : 256bit
    # Driver version : r419_29 : 41935
    SWAN : FATAL Unable to load module .nonbonded.cu. (300)
    </stderr_txt>
    ]]>

I don't see any other task assigned to your Titan, so you've probably excluded it from getting GPUGrid work by mistake; the card you should exclude is the Quadro K5000. |
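To check a stderr.txt like the one above without reading it by eye, here is a minimal Python sketch that pulls out each SWAN device and its compute capability. The function names and the 3.5 cutoff are assumptions for illustration: the thread only says that the 3.5 card should work and the 3.0 card does not, not what the project's exact minimum is. It also assumes one field per line, as in the real stderr.txt, and the SWAN device index may not match the device number BOINC uses.

```python
import re

# Assumption: cutoff inferred from this thread only (3.5 reported OK, 3.0 not).
MIN_CAPABILITY = (3, 5)


def parse_swan_devices(stderr_text):
    """Return a list of (device_index, name, capability) tuples.

    capability is a (major, minor) tuple, or None if it was not found.
    """
    devices = []
    for raw in stderr_text.splitlines():
        line = raw.strip()
        header = re.match(r"#\s*SWAN Device\s+(\d+)\s*:", line)
        if header:
            # A new device section starts here.
            devices.append({"index": int(header.group(1)),
                            "name": None, "capability": None})
            continue
        if not devices:
            continue  # ignore anything before the first device header
        name = re.match(r"#\s*Name\s*:\s*(.+)", line)
        if name:
            devices[-1]["name"] = name.group(1).strip()
            continue
        cap = re.match(r"#\s*Capability\s*:\s*(\d+)\.(\d+)", line)
        if cap:
            devices[-1]["capability"] = (int(cap.group(1)), int(cap.group(2)))
    return [(d["index"], d["name"], d["capability"]) for d in devices]


def too_old(devices, minimum=MIN_CAPABILITY):
    """Return the devices whose capability is known and below the minimum."""
    return [d for d in devices if d[2] is not None and d[2] < minimum]


if __name__ == "__main__":
    sample = """\
# SWAN Device 0 :
# Name : GeForce GTX TITAN
# Capability : 3.5
# SWAN Device 1 :
# Name : Quadro K5000
# Capability : 3.0
"""
    for index, name, capability in too_old(parse_swan_devices(sample)):
        print(f"Device {index} ({name}) is below the assumed minimum: {capability}")
```

Comparing capabilities as (major, minor) tuples avoids float comparison problems with versions such as 3.10.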
Zarck Joined: 16 Aug 08 Posts: 145 Credit: 328,473,995 RAC: 0
|
How do I exclude a card in BOINC? |
Zarck Joined: 16 Aug 08 Posts: 145 Credit: 328,473,995 RAC: 0
|
My configuration is now too old for your project, but I can still run all the others. |
|
Joined: 2 Jul 16 Posts: 338 Credit: 7,987,341,558 RAC: 259
|
"How to exclude a card in BOINC?"

The format in cc_config.xml is:

    <exclude_gpu>
       <url>project_URL</url>
       <device_num>N</device_num>
       <type>NVIDIA|ATI|intel_gpu</type>
       <app>appname</app>
    </exclude_gpu>

The type is needed if you have GPUs from more than one manufacturer, for example an Intel iGPU plus an NVIDIA card.

https://boinc.berkeley.edu/wiki/Client_configuration |
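As a concrete illustration of the format above, here is a hedged Python sketch that builds a complete cc_config.xml excluding one GPU from one project. Several things are assumptions rather than facts from this thread: the project URL (it must match exactly what your BOINC client lists for GPUGrid), device number 1 for the Quadro K5000 (taken from the SWAN device index in the log, which may differ from BOINC's own numbering shown in its startup messages), and the helper name `build_cc_config`. The `<exclude_gpu>` block is placed inside `<options>`, as described on the BOINC client configuration page linked above.

```python
import xml.etree.ElementTree as ET


def build_cc_config(project_url, device_num, gpu_type=None, app=None):
    """Return cc_config.xml text with a single <exclude_gpu> entry.

    gpu_type and app are optional, mirroring the format quoted above.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    exclude = ET.SubElement(options, "exclude_gpu")
    ET.SubElement(exclude, "url").text = project_url
    ET.SubElement(exclude, "device_num").text = str(device_num)
    if gpu_type is not None:
        ET.SubElement(exclude, "type").text = gpu_type
    if app is not None:
        ET.SubElement(exclude, "app").text = app
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9 or later
        ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Assumed values: adjust the URL and device number to your own client.
    print(build_cc_config("http://www.gpugrid.net/", 1, gpu_type="NVIDIA"))
```

The script only prints the XML; it does not touch any files. If you already have a cc_config.xml in the BOINC data directory, merge the `<exclude_gpu>` block into its existing `<options>` section rather than overwriting the file, then have the client re-read its config files or restart it.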
Michael H.W. Weber Joined: 9 Feb 16 Posts: 78 Credit: 656,229,684 RAC: 0
|
Since May 13th, all newly loaded tasks end with an error after 0 seconds of compute time. The log is the same for all of them:

    <core_client_version>7.9.3</core_client_version>
    <![CDATA[
    <message>
    process exited with code 212 (0xd4, -44)</message>
    <stderr_txt>
    </stderr_txt>
    ]]>

The same machine (a GTX 750 Ti under Ubuntu 18.04 LTS Linux) has completed hundreds of tasks over a few years. Any idea what exactly the error means? I just reset GPUGRID on that machine, hoping it will resume computation properly. Meanwhile I am testing Einstein to see whether the card is still OK.

Michael.
President of Rechenkraft.net - Germany's first and largest distributed computing organization. |
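The numbers in these messages are different spellings of the same exit code, which a short Python sketch can make explicit. It does not explain what the application means by code 212; it only shows, as plain two's-complement arithmetic, why `212`, `0xd4` and `-44` belong together, and likewise `-52` and `0xffffffcc` from the earlier Windows log. The helper name `as_signed` is made up for this example.

```python
def as_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1          # keep only the low `bits` bits
    if value >= 1 << (bits - 1):      # top bit set means a negative number
        return value - (1 << bits)
    return value


if __name__ == "__main__":
    # Linux log: "process exited with code 212 (0xd4, -44)"
    print(hex(212), as_signed(212, 8))
    # Windows log: "exit code -52 (0xffffffcc)"
    print(as_signed(0xFFFFFFCC, 32))
```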
|
Joined: 26 Feb 14 Posts: 211 Credit: 4,496,324,562 RAC: 0
|
There seem to be a lot of work units with high failure rates. I noticed that my multi-card machine had no work, so I checked: all of its work units had errored out. I thought it was a problem with my machine, but when I looked at the work units I saw that not one of them had been completed by any of the numerous other machines that tried. It looks like all the work units are on their way to 9 failed attempts. The Server Status page shows that the number of work unit types that are failing is climbing.
|
|
Joined: 25 Nov 13 Posts: 66 Credit: 282,724,028 RAC: 0
|
My notebook was unusually quiet, so I realized something was wrong. I checked BOINC, and the GPUGrid tasks were failing with error code 212. One task failed because BOINC couldn't resume it; after that, every task ended with error code 212 without ever starting. |
Michael H.W. Weber Joined: 9 Feb 16 Posts: 78 Credit: 656,229,684 RAC: 0
|
The problem is discussed here: http://www.gpugrid.net/forum_thread.php?id=4924&nowrap=true#51786 Michael. President of Rechenkraft.net - Germany's first and largest distributed computing organization. |
©2025 Universitat Pompeu Fabra