CCLM simulations fail on Mistral - floating point exception – in #9: CCLM


  @redc_migration in #9d805d7


Dear colleagues,

I have been trying to run a 1-day test simulation with CCLM cosmo4.8_clm19 on Mistral for the first time since Blizzard was retired.
I am running the CCLM with an almost standard configuration for the 0.0625° horizontal resolution that I have successfully employed in several experiments on Blizzard.

I have modified the batch script as suggested here: http://redc.clm-community.eu/projects/cclmdkrz/wiki/Run-scripts.

After a series of minor problems that were solved thanks to the error messages in the .out and .err files, I have come to a dead end.
Now, when I submit my job with sbatch, the simulation runs for a few seconds, produces the lffd1996100400c.nc file, and exits without leaving any error message in the .out file. However, in the .err file I get multiple errors of this form:

56: [m10393:41443:0] Caught signal 8 (Floating point exception)

56: backtrace
56: 2 0x00000000000548cc mxm_handle_error() /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u4-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-268-gcc-OFED-3.12-redhat6.4/mxm-master/src/mxm/util/debug/debug.c:641
56: 3 0x0000000000054a3c mxm_error_signal_handler() /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u4-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-268-gcc-OFED-3.12-redhat6.4/mxm-master/src/mxm/util/debug/debug.c:616
56: 4 0x00000000000326a0 killpg() ??:0
56: 5 0x00000000002dabd5 pow.L() ??:0
56: 6 0x000000000001ed5d __libc_start_main() ??:0

srun: error: m10393: tasks 40,45-50,52-59: Floating point exception
srun: Terminating job step 2108157.0
00: slurmstepd: *** STEP 2108157.0 ON m10314 CANCELLED AT 2016-03-15T20:09:55 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.

The model sources have been compiled correctly and have been successfully used by another member of our DKRZ account. There seems to be a floating point exception that I cannot figure out in any way.
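
For reference, such a trap can often be localized more precisely by rebuilding the model with floating-point trapping and tracebacks enabled. The flags below are only a sketch and assume the Intel Fortran compiler commonly used for CCLM on Mistral; with gfortran the equivalents would be -ffpe-trap=invalid,zero,overflow -g -fbacktrace.

  # Sketch: add these to the Fortran compile flags used to build the binary,
  # so the first invalid operation aborts with a source-level traceback
  FFLAGS="$FFLAGS -fpe0 -g -traceback"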

Has any of you ever encountered such problems, or do you have a clue as to what might be causing this?

I have attached my batch script and my .err and .out files, along with the YUSPECIF, YUDEBUG and YUCHKDAT files.

Your help would be greatly appreciated.

Best,

Edoardo Mazza


  @hans-jürgenpanitz in #f050312


Dear Edoardo,

Having a very first quick look at your YUCHKDAT, I would say that something is wrong with your forcing data.
Look at your T_SO values in the deeper layers.
They become very small and even negative! The unit of T_SO is Kelvin!
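
For example, a quick way to cross-check the T_SO range directly in the initial/boundary files produced by INT2LM is a CDO one-liner. This is only a sketch: the file name below is a placeholder for one of your files, and CDO is assumed to be available through the Mistral module system.

  module load cdo
  # print per-field minimum, mean and maximum of the soil temperature
  cdo infon -selname,T_SO laf1996100400.nc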

Furthermore, I saw in your YUSPECIF that you run the model in NWP mode, not in climate mode (lbdclim=.FALSE.).
Is this what you want to do?

Hans-Juergen


  @redc_migration in #b140824


Dear Hans-Juergen,

Thank you very much for your support, and sorry for the late reply; it took me a few days to get back to the root of the problem.
I agree that there is something wrong with those temperatures, so I went back to the previous downscaling step to see where these strange values came from.

I wanted to repeat the simulation driven with ERA-Interim data obtained from the DKRZ directory /pool/data/CCLM/reanalyses/ERAInterim. I adapted the run_int2lm script for the gcm2cclm case. Again, I wanted to test that it was working fine for 24 hours.

Unfortunately, the situation does not seem to have changed at all: the “floating point exception” error is still causing the program to quit. So it seems that the problem goes beyond the T_SO values. I am really losing focus on what the problem is right now. I have checked and double-checked, but clearly there is something wrong that I cannot find.

Please find attached the run_int2lm script and the .out, YUCHKDAT, INPUT, OUTPUT and YUDEBUG files.

Best wishes,

Edoardo


  @burkhardtrockel in #213af95


Please try

  lprog_qi = .TRUE.,


  @hans-jürgenpanitz in #3996ef2


Dear Edoardo,

Did you realize that the ERA-Interim data (caf files) in /pool/data/CCLM/reanalyses are netCDF-4 compressed?
There is a README in the ERAINT directory mentioning this.
Perhaps that is your problem.
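
A minimal sketch of how to check and, if needed, uncompress such a file with the standard netCDF command-line tools (the file name is just a placeholder):

  # show the on-disk format of a caf file (e.g. "netCDF-4" vs "classic")
  ncdump -k caf1996100400.nc
  # rewrite it as an uncompressed netCDF classic file
  nccopy -k classic caf1996100400.nc caf1996100400_classic.nc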

Alternatively, you can try to use the uncompressed caf files that are available from my workspace:
/work/bb0849/b364034/ERAINT/CCLM_Forcing_Data/

Furthermore, the ERAINT data include T_SKIN, so set luse_t_skin=.TRUE.
Of course, also consider Burkhardt's suggestion lprog_qi=.TRUE., since QI is also available.
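
A minimal sketch of how the two switches could appear together in the INT2LM control namelist inside run_int2lm; the group name &CONTRL and its placement are assumptions, so please check them against your own script and the INT2LM namelist documentation:

  &CONTRL
    luse_t_skin = .TRUE.,
    lprog_qi    = .TRUE.,
  /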

Hans-Juergen


  @evanowatzki in #56d8c5f


Dear all,
I have recently encountered a very similar problem. I get the following messages:

239: [m10063:37251:0] Caught signal 8 (Floating point exception)
248:  backtrace 
248:  2 0x000000000005767c mxm_handle_error()  /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u7-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.9.7-gcc-OFED-3.18-redhat6.7-x86_64/mxm-v3.6/src/mxm/util/debug/debug.c:641
248:  3 0x00000000000577ec mxm_error_signal_handler()  /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u7-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.9.7-gcc-OFED-3.18-redhat6.7-x86_64/mxm-v3.6/src/mxm/util/debug/debug.c:616
248:  4 0x0000000000032510 killpg()  ??:0
248:  5 0x00000000008606ab src_soil_multlay_mp_terra_multlay_()  ??:0
248:  6 0x000000000055e4bf organize_physics_()  ??:0
248:  7 0x000000000058d900 MAIN__()  ??:0
248:  8 0x00000000004052fe main()  ??:0
248:  9 0x000000000001ed1d __libc_start_main()  ??:0
248: 10 0x00000000004051f9 _start()  ??:0
248: ===============

I have already tried to decompress the ERA-Interim data and I have also considered the hints given above, but it still doesn't work.
I would be very grateful for help.
Thank you very much and best regards,
Eva


  @evanowatzki in #c06843e


Dear all,
By changing the INT2LM I could solve the problem I posted before, but now a new error appears that is also somewhat similar.

  0:  OPEN: ncdf-file:
  0:  /scratch/b/b380794/Ref_run/output/cclm/1999_01/out01/lffd1999010100.nc
  0:  CLOSING ncdf FILE
  0:  OPEN: ncdf-file:
  0:  /scratch/b/b380794/Ref_run/output/cclm/1999_01/out01/lffd1999010100.nc
  0:  CLOSING ncdf FILE
  0:  OPEN: ncdf-file:
  0:  /scratch/b/b380794/Ref_run/output/cclm/1999_01/out01/lffd1999010100z.nc
  0:  CLOSING ncdf FILE
  0:  OPEN: ncdf-file:
  0:  /scratch/b/b380794/Ref_run/output/cclm/1999_01/out01/lffd1999010100p.nc
  0:  CLOSING ncdf FILE
  0:  OPEN: ncdf-file:
  0:  /scratch/b/b380794/Ref_run/output/cclm/1999_01/out01/lffd1999010100.nc
  0:   smoothing pmsl over mountainous terrain
  0:  CLOSING ncdf FILE
  0:  OPEN: ncdf-file:
  0:  /scratch/b/b380794/Ref_run/output/cclm/1999_01/out01/lffd1999010100.nc
549: [m11510:32171:0] Caught signal 11 (Segmentation fault)
 77: [m11397:46159:0] Caught signal 11 (Segmentation fault)
...
  2: [m11394:43474:0] Caught signal 11 (Segmentation fault)
  0:  CLOSING ncdf FILE
  0: [m11394:43472:0] Caught signal 11 (Segmentation fault)
...
380: [m11503:12975:0] Caught signal 11 (Segmentation fault)
 28:  backtrace 
 28:  2 0x000000000005767c mxm_handle_error()  /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u7-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.9.7-gcc-OFED-3.18-redhat6.7-x86_64/mxm-v3.6/src/mxm/util/debug/debug.c:641
 28:  3 0x00000000000577ec mxm_error_signal_handler()  /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u7-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.9.7-gcc-OFED-3.18-redhat6.7-x86_64/mxm-v3.6/src/mxm/util/debug/debug.c:616
 28:  4 0x0000000000032510 killpg()  ??:0
 28:  5 0x000000000053343b organize_data_()  ??:0
 28:  6 0x000000000058de82 MAIN__()  ??:0
 28:  7 0x00000000004052fe main()  ??:0
 28:  8 0x000000000001ed1d __libc_start_main()  ??:0
 28:  9 0x00000000004051f9 _start()  ??:0
 28: ===============

Could anyone please help me with this problem? I would be very grateful for any help.
Thank you very much and best regards,
Eva Nowatzki