segmentation fault while running cclm job – in #9: CCLM

  @redc_migration in #ad9adce

Hi,

While running a job chain, I get the following error in the cclm job:

OPEN: ncdf-file: 
 /scratch/b/b324052/rcp85_41_60_125/output/cclm/2041_03/out06/lffd2041030500p.nc
 [m10091:30787:0] Caught signal 11 (Segmentation fault)
 ....
  backtrace 
2 0x00000000000572cc mxm_handle_error()  /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u7-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.6.392-gcc-OFED-3.18-redhat6.7-x86_64/mxm-v3.4/src/mxm/util/debug/debug.c:641
3 0x000000000005743c mxm_error_signal_handler()  /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u7-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.6.392-gcc-OFED-3.18-redhat6.7-x86_64/mxm-v3.4/src/mxm/util/debug/debug.c:616
....
srun: error: m10090: tasks 0-23: Segmentation fault
srun: Terminating job step 6664508.0
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: m10091: tasks 24-47: Segmentation fault
srun: error: m10095: tasks 48-71: Segmentation fault

I use the BULL MPI environment as described here: http://redc.clm-community.eu/projects/cclmdkrz/wiki/Fopts
and the batch settings recommended for mistral (http://redc.clm-community.eu/projects/cclmdkrz/wiki/Run-scripts).
All necessary modules are loaded.

Thanks.

  @burkhardtrockel in #5420d0f

This error often appears when the input data used for the vertical interpolation contain unrealistic or even NaN values. Have you checked the data in the files listed below?


00: /scratch/b/b324052/rcp85_41_60_125/input/cclm/2041_03/lbfd2041030503.nc
00: CLOSING ncdf FILE
00: OPEN: ncdf-file:
00: /scratch/b/b324052/rcp85_41_60_125/output/cclm/2041_03/out01/lffd2041030500.nc
00: CLOSING ncdf FILE
00: OPEN: ncdf-file:
00: /scratch/b/b324052/rcp85_41_60_125/output/cclm/2041_03/out02/lffd2041030500.nc
00: CLOSING ncdf FILE
00: OPEN: ncdf-file:
00: /scratch/b/b324052/rcp85_41_60_125/output/cclm/2041_03/out03/lffd2041030500.nc
00: CLOSING ncdf FILE
00: OPEN: ncdf-file:
00: /scratch/b/b324052/rcp85_41_60_125/output/cclm/2041_03/out04/lffd2041030500.nc
00: CLOSING ncdf FILE
00: OPEN: ncdf-file:
00: /scratch/b/b324052/rcp85_41_60_125/output/cclm/2041_03/out05/lffd2041030500.nc
00: CLOSING ncdf FILE
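
One way to check the files above for NaN or unrealistic values is sketched below. This is only a minimal sketch, not part of the CCLM tool chain: the function names summarize and scan_netcdf are made up for illustration, the limits passed in are hypothetical and must be chosen per variable, and scan_netcdf assumes the third-party packages netCDF4 and numpy are installed.

```python
import math


def summarize(values, lower, upper):
    """Count NaN/infinite entries and finite entries outside [lower, upper]."""
    total = 0
    non_finite = 0
    out_of_range = 0
    for v in values:
        total += 1
        if not math.isfinite(v):
            non_finite += 1
        elif v < lower or v > upper:
            out_of_range += 1
    return {"total": total, "non_finite": non_finite, "out_of_range": out_of_range}


def scan_netcdf(path, limits):
    """Apply summarize() to selected variables of one NetCDF file.

    limits maps a variable name to a (lower, upper) tuple; these bounds are
    assumptions supplied by the caller, not values known to the model.
    """
    # Imported here so that summarize() stays usable without these packages.
    import netCDF4
    import numpy as np

    report = {}
    with netCDF4.Dataset(path, "r") as ds:
        for name, (lower, upper) in limits.items():
            if name not in ds.variables:
                continue  # variable not present in this file
            data = ds.variables[name][:]
            # Masked (missing-value) entries are turned into NaN, so they are
            # counted as non-finite as well.
            data = np.ma.filled(data.astype(float), np.nan)
            report[name] = summarize(data.ravel().tolist(), lower, upper)
    return report
```

For example, scan_netcdf("lbfd2041030503.nc", {"T": (150.0, 350.0)}) would report how many temperature values in that boundary file are non-finite or fall outside the assumed range of 150 to 350 K. Non-zero counts point to the file that needs a closer look.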