segmentation fault while running cclm job – in #9: CCLM

Hi,

While running a job chain, I get this error in the cclm job:

OPEN: ncdf-file: 
 /scratch/b/b324052/rcp85_41_60_125/output/cclm/2041_03/out06/lffd2041030500p.nc
 [m10091:30787:0] Caught signal 11 (Segmentation fault)
 ....
  backtrace 
2 0x00000000000572cc mxm_handle_error()  /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u7-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.6.392-gcc-OFED-3.18-redhat6.7-x86_64/mxm-v3.4/src/mxm/util/debug/debug.c:641
3 0x000000000005743c mxm_error_signal_handler()  /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u7-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.6.392-gcc-OFED-3.18-redhat6.7-x86_64/mxm-v3.4/src/mxm/util/debug/debug.c:616
....
srun: error: m10090: tasks 0-23: Segmentation fault
srun: Terminating job step 6664508.0
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: m10091: tasks 24-47: Segmentation fault
srun: error: m10095: tasks 48-71: Segmentation fault

I use the BULL MPI environment described here: http://redc.clm-community.eu/projects/cclmdkrz/wiki/Fopts
and the batch settings recommended for Mistral (http://redc.clm-community.eu/projects/cclmdkrz/wiki/Run-scripts).
All necessary modules are loaded.

Thanks.

  @redc_migration in #ad9adce

This error often appears when the underlying data for the vertical interpolation contain unrealistic or even NaN values. Have you checked the data in the files listed below?


00: /scratch/b/b324052/rcp85_41_60_125/input/cclm/2041_03/lbfd2041030503.nc
00: CLOSING ncdf FILE
00: OPEN: ncdf-file:
00: /scratch/b/b324052/rcp85_41_60_125/output/cclm/2041_03/out01/lffd2041030500.nc
00: CLOSING ncdf FILE
00: OPEN: ncdf-file:
00: /scratch/b/b324052/rcp85_41_60_125/output/cclm/2041_03/out02/lffd2041030500.nc
00: CLOSING ncdf FILE
00: OPEN: ncdf-file:
00: /scratch/b/b324052/rcp85_41_60_125/output/cclm/2041_03/out03/lffd2041030500.nc
00: CLOSING ncdf FILE
00: OPEN: ncdf-file:
00: /scratch/b/b324052/rcp85_41_60_125/output/cclm/2041_03/out04/lffd2041030500.nc
00: CLOSING ncdf FILE
00: OPEN: ncdf-file:
00: /scratch/b/b324052/rcp85_41_60_125/output/cclm/2041_03/out05/lffd2041030500.nc
00: CLOSING ncdf FILE

  @burkhardtrockel in #5420d0f
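
The check suggested above could be sketched roughly as follows. This is only a minimal illustration, not part of the CCLM tool chain: the function names `opened_files` and `check_values` and the bounds in the example are hypothetical. Reading the variables out of the NetCDF files is not shown; it would need a NetCDF library (for example netCDF4 or xarray, which is an assumption about what is installed). The sketch only collects the file names from the log and counts NaN, infinite and out-of-range entries in a flat list of values.

```python
import math
import re

# Optional task prefix such as "00: " at the start of a log line.
TASK_PREFIX = re.compile(r"^\s*\d+:\s*")


def opened_files(log_text):
    """Return the NetCDF paths that appear in the log, in order."""
    paths = []
    for raw in log_text.splitlines():
        line = TASK_PREFIX.sub("", raw).strip()
        # A path line is an absolute path ending in ".nc".
        if line.startswith("/") and line.endswith(".nc"):
            paths.append(line)
    return paths


def check_values(values, lower, upper):
    """Count NaN, infinite and out-of-range entries in a flat sequence."""
    report = {"nan": 0, "inf": 0, "out_of_range": 0, "total": 0}
    for v in values:
        report["total"] += 1
        if math.isnan(v):
            report["nan"] += 1
        elif math.isinf(v):
            report["inf"] += 1
        elif not (lower <= v <= upper):
            report["out_of_range"] += 1
    return report


sample_log = """\
00: OPEN: ncdf-file:
00: /scratch/b/b324052/rcp85_41_60_125/output/cclm/2041_03/out05/lffd2041030500.nc
00: CLOSING ncdf FILE
"""
print(opened_files(sample_log))

# Temperature-like values in K; the bounds 150..350 are illustrative only.
print(check_values([271.3, float("nan"), 1.0e20, 288.0], 150.0, 350.0))
```

Any non-zero `nan`, `inf` or `out_of_range` count for a field that enters the vertical interpolation would make that file the first candidate to inspect more closely.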
