WU failures discussion
| Author | Message |
|---|---|
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
I've written a batch program that watches for stuck workunits. When it finds one, it restarts the operating system. It could be programmed to take other actions (like deleting files from the failed workunit to make it run to an error instead of hanging at the next start), but a simple OS restart seems to resolve the majority of the WU hangs. It should work on all current Windows versions, 32- and 64-bit (XP, 7, 8).

The batch program consists of two batch files, which generate further batch files depending on how many workunits are running at the same time. Save both files into the same directory, one in which you have all access rights (write, read, execute, modify, delete), for example a folder on your desktop.

I call the first file check.bat. To create it, start Notepad, copy and paste the following text, and save it to your designated folder as "check.bat". Don't forget to set the file type to "All files" before you press "Save" (or else Notepad will save it as check.bat.txt).

    @ECHO OFF
    IF "%ALLUSERSPROFILE%"=="%SYSTEMDRIVE%\ProgramData" GOTO Win7
    SET SLOTDIR=%ALLUSERSPROFILE%\Application Data\BOINC\slots
    GOTO WinXP
    :Win7
    SET SLOTDIR=%ALLUSERSPROFILE%\BOINC\slots
    :WinXP
    IF NOT EXIST slotnum.bat GOTO src4slots
    CALL slotnum.bat
    IF %SLOTNUM%==SLOTCHANGE GOTO src4slots
    SET SLOTCOUNT=0
    SET APPNAME=acemd.800-42.exe
    FOR /L %%i IN (1,1,20) DO CALL slotcheck %%i c
    SET APPNAME=acemd.800-55.exe
    FOR /L %%i IN (1,1,20) DO CALL slotcheck %%i c
    IF NOT %SLOTNUM%==%SLOTCOUNT% GOTO src4slots
    IF %SLOTNUM%==0 GOTO end
    FOR /L %%i IN (1,1,%SLOTNUM%) DO CALL slot%%i
    IF EXIST slotnum.bat GOTO end
    echo === RESTART: ACEMD stuck ==== >>check.log
    DATE /t >>check.log
    TIME /t >>check.log
    echo . >>check.log
    SHUTDOWN /r /f /d 4:5 /c "ACEMD stuck"
    GOTO end
    :src4slots
    IF EXIST slotnum.bat DEL slotnum.bat /q /f
    SET SLOTNUM=0
    SET APPNAME=acemd.800-42.exe
    FOR /L %%i IN (1,1,20) DO CALL slotcheck %%i
    SET APPNAME=acemd.800-55.exe
    FOR /L %%i IN (1,1,20) DO CALL slotcheck %%i
    ECHO SET SLOTNUM=%SLOTNUM% >slotnum.bat
    :end

If your host uses the CUDA5.5 client, the brown section (the acemd.800-42.exe lines) is not needed. If your host uses the CUDA4.2 client, the green section (the acemd.800-55.exe lines) is not needed. You can also use this batch program to check the progress of any other client (not only GPUGrid's): just replace the name of the acemd client with the name of that client's executable file at the end of the first line of the brown or the green section. Repeat these sections once for every client whose progress you want to check.

The second batch file must be named slotcheck.bat, because the first batch file refers to it by that name:

    IF NOT EXIST "%SLOTDIR%\%1\%APPNAME%" GOTO end
    IF NOT .%2.==.. GOTO count
    IF %SLOTNUM%==8 SET SLOTNUM=9
    IF %SLOTNUM%==7 SET SLOTNUM=8
    IF %SLOTNUM%==6 SET SLOTNUM=7
    IF %SLOTNUM%==5 SET SLOTNUM=6
    IF %SLOTNUM%==4 SET SLOTNUM=5
    IF %SLOTNUM%==3 SET SLOTNUM=4
    IF %SLOTNUM%==2 SET SLOTNUM=3
    IF %SLOTNUM%==1 SET SLOTNUM=2
    IF %SLOTNUM%==0 SET SLOTNUM=1
    DEL slot%SLOTNUM%.bat /q /f
    ECHO IF EXIST "%SLOTDIR%\%1\%APPNAME%" GOTO checkprogress >slot%SLOTNUM%.bat
    ECHO IF EXIST slotnum.bat ECHO SET SLOTNUM=SLOTCHANGE ^>slotnum.bat >>slot%SLOTNUM%.bat
    ECHO GOTO end >>slot%SLOTNUM%.bat
    ECHO :checkprogress >>slot%SLOTNUM%.bat
    ECHO FIND "<fraction_done>" ^<"%SLOTDIR%\%1\boinc_task_state.xml" ^>%SLOTNUM%.txt >>slot%SLOTNUM%.bat
    ECHO FC %SLOTNUM%.txt %SLOTNUM%.xml >>slot%SLOTNUM%.bat
    ECHO IF ERRORLEVEL 1 GOTO ok >>slot%SLOTNUM%.bat
    ECHO FIND "<result_name>" ^<"%SLOTDIR%\%1\boinc_task_state.xml" ^>^>check.log >>slot%SLOTNUM%.bat
    ECHO TYPE %SLOTNUM%.txt ^>^>check.log >>slot%SLOTNUM%.bat
    ECHO DEL slotnum.bat /q /f >>slot%SLOTNUM%.bat
    ECHO :ok >>slot%SLOTNUM%.bat
    ECHO COPY %SLOTNUM%.txt %SLOTNUM%.xml /y >>slot%SLOTNUM%.bat
    ECHO :end >>slot%SLOTNUM%.bat
    FIND "<fraction_done>" <"%SLOTDIR%\%1\boinc_task_state.xml" >%slotnum%.xml
    GOTO end
    :count
    IF %SLOTCOUNT%==8 SET SLOTCOUNT=9
    IF %SLOTCOUNT%==7 SET SLOTCOUNT=8
    IF %SLOTCOUNT%==6 SET SLOTCOUNT=7
    IF %SLOTCOUNT%==5 SET SLOTCOUNT=6
    IF %SLOTCOUNT%==4 SET SLOTCOUNT=5
    IF %SLOTCOUNT%==3 SET SLOTCOUNT=4
    IF %SLOTCOUNT%==2 SET SLOTCOUNT=3
    IF %SLOTCOUNT%==1 SET SLOTCOUNT=2
    IF %SLOTCOUNT%==0 SET SLOTCOUNT=1
    :end

Create a scheduled task that runs "check.bat" every 10 minutes (a shorter period is not recommended), with the highest privileges on Win7 and 8, or administrator privileges on WinXP (or else it won't be able to restart the OS).

Known limitations:
- It can monitor 9 slots at most.
- It checks only the first 20 slots for the targeted clients (this can easily be changed in the FOR /L lines of check.bat).
- You should not pause any task that is monitored by this batch program and has already made some progress (or else the batch program will restart your host's OS every 20 minutes).
- If the monitored application writes its progress to disk less often than every 10 minutes, increase the repetition interval accordingly.
- I'm using it on WinXP and haven't tested it on other Windows versions (7, 8), but it should work.

It creates the following files:
- check.log: a record of every restart with the date, time, workunit name and its progress
- slotnum.bat: tells the batch program how many slots it has to monitor
- slotn.bat: one file for every slot the batch program has to monitor
- n.xml and n.txt: record every slot's progress |
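For readers who find the generated slot scripts hard to follow, the core test they perform (extract the `<fraction_done>` line from boinc_task_state.xml and compare it with the value saved on the previous run) can be sketched in a few lines of Python. This is only an illustration, not part of the batch program: the function names and the sample XML fragment are assumptions, and the sketch compares parsed numbers where the batch file compares text with FC.

```python
import re
from typing import Optional

# Matches the progress tag that the batch file looks for with FIND.
FRACTION_RE = re.compile(r"<fraction_done>\s*([0-9.eE+-]+)\s*</fraction_done>")


def read_fraction_done(xml_text: str) -> Optional[float]:
    """Return the fraction_done value found in the text, or None if absent."""
    match = FRACTION_RE.search(xml_text)
    return float(match.group(1)) if match else None


def is_stuck(previous: Optional[float], current: Optional[float]) -> bool:
    """A task counts as stuck when two consecutive checks show the same value.

    If either value is missing, this sketch does not report the task as stuck
    (an assumption made for simplicity).
    """
    if previous is None or current is None:
        return False
    return previous == current


# Hypothetical fragment of a boinc_task_state.xml file, for illustration only.
SAMPLE = "<active_task>\n<fraction_done>0.251234</fraction_done>\n</active_task>"

if __name__ == "__main__":
    now = read_fraction_done(SAMPLE)
    print("stuck" if is_stuck(0.251234, now) else "making progress")
```

The value from the current run would be stored (as the batch program does in n.xml) and passed in as `previous` on the next run.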
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
There is a "typo" in the brown and green sections (the SET APPNAME / FOR /L pairs): the slot numbering starts at 0, not 1. The app names should also be changed to reflect the new app versions.

For the CUDA5.5 client you should use:

    SET APPNAME=acemd.802-55.exe
    FOR /L %%i IN (0,1,20) DO CALL slotcheck %%i c
    SET APPNAME=acemd.803-55.exe
    FOR /L %%i IN (0,1,20) DO CALL slotcheck %%i c
    SET APPNAME=acemd.804-55.exe
    FOR /L %%i IN (0,1,20) DO CALL slotcheck %%i c

    SET APPNAME=acemd.802-55.exe
    FOR /L %%i IN (0,1,20) DO CALL slotcheck %%i
    SET APPNAME=acemd.803-55.exe
    FOR /L %%i IN (0,1,20) DO CALL slotcheck %%i
    SET APPNAME=acemd.804-55.exe
    FOR /L %%i IN (0,1,20) DO CALL slotcheck %%i

For the CUDA4.2 client you should use:

    SET APPNAME=acemd.802-42.exe
    FOR /L %%i IN (0,1,20) DO CALL slotcheck %%i c
    SET APPNAME=acemd.803-42.exe
    FOR /L %%i IN (0,1,20) DO CALL slotcheck %%i c
    SET APPNAME=acemd.804-42.exe
    FOR /L %%i IN (0,1,20) DO CALL slotcheck %%i c

    SET APPNAME=acemd.802-42.exe
    FOR /L %%i IN (0,1,20) DO CALL slotcheck %%i
    SET APPNAME=acemd.803-42.exe
    FOR /L %%i IN (0,1,20) DO CALL slotcheck %%i
    SET APPNAME=acemd.804-42.exe
    FOR /L %%i IN (0,1,20) DO CALL slotcheck %%i

I've had a couple of stuck tasks in the last few days, and it seems that sometimes there is no boinc_task_state.xml file in the slot directory when it happens. The batch program still works in that case, but the result name is missing from the log file it creates. I've added extra debug info to slotcheck.bat (like the name of the stuck application). I'll publish the new version when I have more info about the missing boinc_task_state.xml. |
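The slot scan that these FOR /L lines perform (try every slot directory from 0 to 20 for each known executable name) can be illustrated with a short Python sketch. It is not part of the batch program; the function name and its parameters are assumptions chosen for the example, and the directory layout (slots\N\executable) is taken from the IF NOT EXIST test in slotcheck.bat.

```python
from pathlib import Path
from typing import List, Sequence, Tuple


def find_active_slots(
    slot_dir: str,
    app_names: Sequence[str],
    first_slot: int = 0,   # slot numbering starts at 0
    last_slot: int = 20,   # same upper bound as the FOR /L lines
) -> List[Tuple[int, str]]:
    """Return (slot number, executable name) for every slot holding a known app."""
    found = []
    for app in app_names:                      # one pass per SET APPNAME line
        for slot in range(first_slot, last_slot + 1):
            if (Path(slot_dir) / str(slot) / app).is_file():
                found.append((slot, app))
    return found


if __name__ == "__main__":
    apps = ["acemd.802-55.exe", "acemd.803-55.exe", "acemd.804-55.exe"]
    # The path below is only an example; use your own BOINC data directory.
    print(find_active_slots(r"C:\ProgramData\BOINC\slots", apps))
```

Starting the range at 1 instead of 0 reproduces the problem described above: a task running in slot 0 is never seen.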
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
Here comes the second version of my monitoring batch programs. This version counts the running tasks instead of looking for a stuck task, so there's no problem if the BOINC manager (or the user) pauses a task because of an overestimated remaining time of a new workunit. Set the number of concurrently running workunits in the line marked with indigo color (IF %INPROGRESS% GEQ 2 GOTO end). It's now set to 2, as I have 2 GPUs in my system. When fewer workunits than that number have made progress, this batch program restarts the operating system. It could be programmed to take other actions (like deleting files from the failed workunit to make it run to an error instead of hanging at the next start), but a simple OS restart seems to resolve the majority of the WU hangs. It should work on all current Windows versions, 32- and 64-bit (XP, 7, 8).

The batch program consists of two batch files, which generate further batch files depending on how many workunits are running at the same time. Save both files into the same directory, one in which you have all access rights (write, read, execute, modify, delete), for example a folder on your desktop.

I call the first file check.bat. To create it, start Notepad, copy and paste the following (colored) text, and save it to your designated folder as "check.bat". Don't forget to set the file type to "All files" before you press "Save" (or else Notepad will save it as check.bat.txt).

    @ECHO OFF
    IF "%ALLUSERSPROFILE%"=="%SYSTEMDRIVE%\ProgramData" GOTO Win7
    SET SLOTDIR=%ALLUSERSPROFILE%\Application Data\BOINC\slots
    GOTO WinXP
    :Win7
    SET SLOTDIR=%ALLUSERSPROFILE%\BOINC\slots
    :WinXP
    IF NOT EXIST slotnum.bat GOTO src4slots
    CALL slotnum.bat
    SET SLOTCOUNT=0
    SET APPNAME=acemd.800-55.exe
    FOR /L %%i IN (0,1,20) DO CALL slotcheck %%i c
    SET APPNAME=acemd.814-55.exe
    FOR /L %%i IN (0,1,20) DO CALL slotcheck %%i c
    IF NOT %SLOTNUM%==%SLOTCOUNT% GOTO src4slots
    IF %SLOTNUM%==0 GOTO end
    SET INPROGRESS=0
    FOR /L %%i IN (1,1,%SLOTNUM%) DO CALL slot%%i
    IF NOT EXIST slotnum.bat GOTO src4slots
    IF %INPROGRESS% GEQ 2 GOTO end
    IF %INPROGRESS%==%SLOTNUM% GOTO end
    echo ======= RESTART: ACEMD stuck ======= >>check.log
    SHUTDOWN /r /f /d 4:5 /c "ACEMD stuck"
    GOTO end
    :src4slots
    SET SLOTNUM=0
    SET APPNAME=acemd.800-55.exe
    FOR /L %%i IN (0,1,20) DO CALL slotcheck %%i
    SET APPNAME=acemd.814-55.exe
    FOR /L %%i IN (0,1,20) DO CALL slotcheck %%i
    ECHO SET SLOTNUM=%SLOTNUM% >slotnum.bat
    :end

If your host is using the CUDA4.2 client, change the app names in the brown and green sections to these:

    SET APPNAME=acemd.800-42.exe
    SET APPNAME=acemd.814-42.exe

You can also use this batch program to check the progress of any other client (not only GPUGrid's): just replace the name of the acemd client with the name of that client's executable file at the end of the first line of the brown or the green section. Repeat these sections once for every client whose progress you want to check. (However, it's not recommended to mix different applications, since this version counts the running tasks instead of looking for a stuck task.)

The second batch file must be named slotcheck.bat, because the first batch file refers to it by that name:

    IF NOT EXIST "%SLOTDIR%\%1\%APPNAME%" GOTO end
    IF NOT .%2.==.. GOTO count
    IF %SLOTNUM%==8 SET SLOTNUM=9
    IF %SLOTNUM%==7 SET SLOTNUM=8
    IF %SLOTNUM%==6 SET SLOTNUM=7
    IF %SLOTNUM%==5 SET SLOTNUM=6
    IF %SLOTNUM%==4 SET SLOTNUM=5
    IF %SLOTNUM%==3 SET SLOTNUM=4
    IF %SLOTNUM%==2 SET SLOTNUM=3
    IF %SLOTNUM%==1 SET SLOTNUM=2
    IF %SLOTNUM%==0 SET SLOTNUM=1
    DEL slot%SLOTNUM%.bat /q /f
    ECHO IF NOT EXIST slotnum.bat GOTO end >slot%SLOTNUM%.bat
    ECHO IF EXIST "%SLOTDIR%\%1\%APPNAME%" GOTO checkprogress >>slot%SLOTNUM%.bat
    ECHO IF EXIST slotnum.bat DEL slotnum.bat /q /f >>slot%SLOTNUM%.bat
    ECHO GOTO end >>slot%SLOTNUM%.bat
    ECHO :checkprogress >>slot%SLOTNUM%.bat
    ECHO FIND "<fraction_done>" ^<"%SLOTDIR%\%1\boinc_task_state.xml" ^>%SLOTNUM%.txt >>slot%SLOTNUM%.bat
    ECHO FC %SLOTNUM%.txt %SLOTNUM%.xml >>slot%SLOTNUM%.bat
    ECHO IF ERRORLEVEL 1 GOTO ok >>slot%SLOTNUM%.bat
    ECHO ECHO . ^>^>check.log >>slot%SLOTNUM%.bat
    ECHO DATE /t ^>^>check.log >>slot%SLOTNUM%.bat
    ECHO TIME /t ^>^>check.log >>slot%SLOTNUM%.bat
    ECHO ECHO application %APPNAME% is stuck in slot %1 ^>^>check.log >>slot%SLOTNUM%.bat
    ECHO IF NOT EXIST "%SLOTDIR%\%1\boinc_task_state.xml" ECHO %SLOTDIR%\%1\boinc_task_state.xml does not exist! ^>^>check.log >>slot%SLOTNUM%.bat
    ECHO FIND "<result_name>" ^<"%SLOTDIR%\%1\boinc_task_state.xml" ^>^>check.log >>slot%SLOTNUM%.bat
    rem ECHO TYPE "%SLOTDIR%\%1\boinc_task_state.xml" ^>^>check.log >>slot%SLOTNUM%.bat
    ECHO TYPE %SLOTNUM%.xml ^>^>check.log >>slot%SLOTNUM%.bat
    ECHO GOTO end >>slot%SLOTNUM%.bat
    ECHO :ok >>slot%SLOTNUM%.bat
    ECHO COPY %SLOTNUM%.txt %SLOTNUM%.xml /y >>slot%SLOTNUM%.bat
    ECHO IF %%INPROGRESS%%==8 SET INPROGRESS=9 >>slot%SLOTNUM%.bat
    ECHO IF %%INPROGRESS%%==7 SET INPROGRESS=8 >>slot%SLOTNUM%.bat
    ECHO IF %%INPROGRESS%%==6 SET INPROGRESS=7 >>slot%SLOTNUM%.bat
    ECHO IF %%INPROGRESS%%==5 SET INPROGRESS=6 >>slot%SLOTNUM%.bat
    ECHO IF %%INPROGRESS%%==4 SET INPROGRESS=5 >>slot%SLOTNUM%.bat
    ECHO IF %%INPROGRESS%%==3 SET INPROGRESS=4 >>slot%SLOTNUM%.bat
    ECHO IF %%INPROGRESS%%==2 SET INPROGRESS=3 >>slot%SLOTNUM%.bat
    ECHO IF %%INPROGRESS%%==1 SET INPROGRESS=2 >>slot%SLOTNUM%.bat
    ECHO IF %%INPROGRESS%%==0 SET INPROGRESS=1 >>slot%SLOTNUM%.bat
    ECHO :end >>slot%SLOTNUM%.bat
    FIND "<fraction_done>" <"%SLOTDIR%\%1\boinc_task_state.xml" >%slotnum%.xml
    GOTO end
    :count
    IF %SLOTCOUNT%==8 SET SLOTCOUNT=9
    IF %SLOTCOUNT%==7 SET SLOTCOUNT=8
    IF %SLOTCOUNT%==6 SET SLOTCOUNT=7
    IF %SLOTCOUNT%==5 SET SLOTCOUNT=6
    IF %SLOTCOUNT%==4 SET SLOTCOUNT=5
    IF %SLOTCOUNT%==3 SET SLOTCOUNT=4
    IF %SLOTCOUNT%==2 SET SLOTCOUNT=3
    IF %SLOTCOUNT%==1 SET SLOTCOUNT=2
    IF %SLOTCOUNT%==0 SET SLOTCOUNT=1
    :end

Create a scheduled task that runs "check.bat" every 10 minutes (a shorter period is not recommended), with the highest privileges on Win7 and 8, or administrator privileges on WinXP (or else it won't be able to restart the OS).

Known limitations:
- It can monitor 9 slots at most.
- It checks only the first 20 slots for the targeted clients (this can easily be changed in the green and brown sections).
- You have to set the number of GPUs manually in check.bat (in the indigo-colored line).
- You should not pause more tasks (that are monitored by this batch program and have already made some progress) than you've set in that line (or else the batch program will restart your host's OS every 20 minutes).
- If the monitored application writes its progress to disk less often than every 10 minutes, increase the repetition interval accordingly.
- I'm using it on WinXP and haven't tested it on other Windows versions (7, 8), but it should work.

It creates the following files:
- check.log: a record of every restart with the date, time, workunit name and its progress
- slotnum.bat: tells the batch program how many slots it has to monitor
- slotn.bat: one file for every slot the batch program has to monitor
- n.xml and n.txt: record every slot's progress |
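The restart decision of this second version boils down to three comparisons in check.bat (`IF %SLOTNUM%==0`, `IF %INPROGRESS% GEQ 2` and `IF %INPROGRESS%==%SLOTNUM%`). A minimal Python sketch of that decision is shown below; the function and parameter names are assumptions made for illustration, and the default of 2 mirrors the value used in the post.

```python
def should_restart(in_progress: int, monitored: int, expected_running: int = 2) -> bool:
    """Decide whether the host should be restarted.

    in_progress      -- number of monitored tasks whose progress changed
    monitored        -- number of slots being monitored (SLOTNUM)
    expected_running -- number of tasks expected to run at once (the "2")
    """
    if monitored == 0:
        return False            # nothing to monitor
    if in_progress >= expected_running:
        return False            # enough tasks are advancing
    if in_progress == monitored:
        return False            # every monitored task is advancing
    return True                 # fewer tasks advanced than expected


if __name__ == "__main__":
    # Three monitored slots, only one advanced, two GPUs expected to be busy.
    print(should_restart(in_progress=1, monitored=3))
```

The third comparison covers hosts that temporarily have fewer tasks than GPUs: a single monitored task that is advancing does not trigger a restart.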
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
Surprisingly often, a task completes just as the scheduled run of my batch program starts, and in that case the previous versions trigger a false positive. So I've modified slotcheck.bat so that it no longer considers a task stuck when boinc_task_state.xml is missing from the slot directory, unless the file is missing for the second consecutive check:

    IF NOT EXIST "%SLOTDIR%\%1\%APPNAME%" GOTO end
    IF NOT .%2.==.. GOTO count
    IF %SLOTNUM%==8 SET SLOTNUM=9
    IF %SLOTNUM%==7 SET SLOTNUM=8
    IF %SLOTNUM%==6 SET SLOTNUM=7
    IF %SLOTNUM%==5 SET SLOTNUM=6
    IF %SLOTNUM%==4 SET SLOTNUM=5
    IF %SLOTNUM%==3 SET SLOTNUM=4
    IF %SLOTNUM%==2 SET SLOTNUM=3
    IF %SLOTNUM%==1 SET SLOTNUM=2
    IF %SLOTNUM%==0 SET SLOTNUM=1
    DEL slot%SLOTNUM%.bat /q /f
    ECHO IF NOT EXIST slotnum.bat GOTO end >slot%SLOTNUM%.bat
    ECHO IF EXIST "%SLOTDIR%\%1\%APPNAME%" GOTO checkprogress >>slot%SLOTNUM%.bat
    ECHO IF EXIST slotnum.bat DEL slotnum.bat /q /f >>slot%SLOTNUM%.bat
    ECHO GOTO end >>slot%SLOTNUM%.bat
    ECHO :checkprogress >>slot%SLOTNUM%.bat
    ECHO IF EXIST "%SLOTDIR%\%1\boinc_task_state.xml" GOTO chk2 >>slot%SLOTNUM%.bat
    ECHO IF NOT EXIST %SLOTNUM%.txt GOTO stuck >>slot%SLOTNUM%.bat
    ECHO DEL %SLOTNUM%.txt /q /f >>slot%SLOTNUM%.bat
    ECHO GOTO ok2 >>slot%SLOTNUM%.bat
    ECHO :chk2 >>slot%SLOTNUM%.bat
    ECHO FIND "<fraction_done>" ^<"%SLOTDIR%\%1\boinc_task_state.xml" ^>%SLOTNUM%.txt >>slot%SLOTNUM%.bat
    ECHO FC %SLOTNUM%.txt %SLOTNUM%.xml >>slot%SLOTNUM%.bat
    ECHO IF ERRORLEVEL 1 GOTO ok >>slot%SLOTNUM%.bat
    ECHO :stuck >>slot%SLOTNUM%.bat
    ECHO ECHO . ^>^>check.log >>slot%SLOTNUM%.bat
    ECHO DATE /t ^>^>check.log >>slot%SLOTNUM%.bat
    ECHO TIME /t ^>^>check.log >>slot%SLOTNUM%.bat
    ECHO ECHO application %APPNAME% is stuck in slot %1 ^>^>check.log >>slot%SLOTNUM%.bat
    ECHO IF NOT EXIST "%SLOTDIR%\%1\boinc_task_state.xml" ECHO %SLOTDIR%\%1\boinc_task_state.xml does not exist! ^>^>check.log >>slot%SLOTNUM%.bat
    ECHO FIND "<result_name>" ^<"%SLOTDIR%\%1\boinc_task_state.xml" ^>^>check.log >>slot%SLOTNUM%.bat
    rem ECHO TYPE "%SLOTDIR%\%1\boinc_task_state.xml" ^>^>check.log >>slot%SLOTNUM%.bat
    ECHO TYPE %SLOTNUM%.xml ^>^>check.log >>slot%SLOTNUM%.bat
    ECHO GOTO end >>slot%SLOTNUM%.bat
    ECHO :ok >>slot%SLOTNUM%.bat
    ECHO COPY %SLOTNUM%.txt %SLOTNUM%.xml /y >>slot%SLOTNUM%.bat
    ECHO :ok2 >>slot%SLOTNUM%.bat
    ECHO IF %%INPROGRESS%%==8 SET INPROGRESS=9 >>slot%SLOTNUM%.bat
    ECHO IF %%INPROGRESS%%==7 SET INPROGRESS=8 >>slot%SLOTNUM%.bat
    ECHO IF %%INPROGRESS%%==6 SET INPROGRESS=7 >>slot%SLOTNUM%.bat
    ECHO IF %%INPROGRESS%%==5 SET INPROGRESS=6 >>slot%SLOTNUM%.bat
    ECHO IF %%INPROGRESS%%==4 SET INPROGRESS=5 >>slot%SLOTNUM%.bat
    ECHO IF %%INPROGRESS%%==3 SET INPROGRESS=4 >>slot%SLOTNUM%.bat
    ECHO IF %%INPROGRESS%%==2 SET INPROGRESS=3 >>slot%SLOTNUM%.bat
    ECHO IF %%INPROGRESS%%==1 SET INPROGRESS=2 >>slot%SLOTNUM%.bat
    ECHO IF %%INPROGRESS%%==0 SET INPROGRESS=1 >>slot%SLOTNUM%.bat
    ECHO :end >>slot%SLOTNUM%.bat
    FIND "<fraction_done>" <"%SLOTDIR%\%1\boinc_task_state.xml" >%slotnum%.xml
    COPY %SLOTNUM%.xml %SLOTNUM%.txt /y
    GOTO end
    :count
    IF %SLOTCOUNT%==8 SET SLOTCOUNT=9
    IF %SLOTCOUNT%==7 SET SLOTCOUNT=8
    IF %SLOTCOUNT%==6 SET SLOTCOUNT=7
    IF %SLOTCOUNT%==5 SET SLOTCOUNT=6
    IF %SLOTCOUNT%==4 SET SLOTCOUNT=5
    IF %SLOTCOUNT%==3 SET SLOTCOUNT=4
    IF %SLOTCOUNT%==2 SET SLOTCOUNT=3
    IF %SLOTCOUNT%==1 SET SLOTCOUNT=2
    IF %SLOTCOUNT%==0 SET SLOTCOUNT=1
    :end

|
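The "tolerate one missing boinc_task_state.xml, but not two in a row" rule can be written down as a small state machine. The Python sketch below is an illustration under stated assumptions, not a port of slotcheck.bat: the class and function names are made up, `grace_available` stands in for the existence of the n.txt file, and `last_fraction` stands in for the contents of n.xml.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlotState:
    last_fraction: Optional[str] = None   # role of n.xml: last recorded progress
    grace_available: bool = True          # role of n.txt existing


def check_slot(state: SlotState, current_fraction: Optional[str]) -> str:
    """Return "ok" or "stuck" for one check; None means the state file is missing."""
    if current_fraction is None:
        if not state.grace_available:
            return "stuck"                # missing for the 2nd consecutive check
        state.grace_available = False     # use up the one-time grace
        return "ok"
    state.grace_available = True          # file is back, grace is restored
    if current_fraction == state.last_fraction:
        return "stuck"                    # progress did not change
    state.last_fraction = current_fraction
    return "ok"


if __name__ == "__main__":
    slot = SlotState(last_fraction="0.50")
    print(check_slot(slot, None))   # first miss is tolerated
    print(check_slot(slot, None))   # second miss in a row is not
```

A task that finishes exactly when the check runs therefore costs one tolerated miss instead of a restart.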
|
Joined: 21 Jan 10 Posts: 46 Credit: 1,388,234,528 RAC: 0
|
After looking at my stats I came here and found out about the stuck WUs and it looks like I wasted a MONTH of GPU time. I reset the project and am now getting different WUs. Oh joy. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 261
|
We seem to have a persistent problem with WU 4792977 (I60R5-NATHAN_KIDKIXc22_6-9-50-RND2135). Three computers have failed to run it so far, all with 'exit status 98' after two or three seconds. The error messages are variously:
- ERROR: file mdsim.cpp line 985: Invalid celldimension (Linux)
- ERROR: file pme.cpp line 85: PME NX too small (Windows) |
©2025 Universitat Pompeu Fabra