problem with gcm_to_cclm test – in #12: CCLM Starter Package Support

  @iyabelova in #d5345fc

Dear colleagues,

We are trying to run the test in the directory ../step_by_step/gcm_to_cclm/, but for some reason the run_int2lm script fails at a certain point. All files from the directory ../cclm-sp_2.4/step_by_step/gcm_to_cclm/log_int2lm are attached. No file appears in the directory ../step_by_step/gcm_to_cclm/data/int2lm_output. Could you please give us some advice on tracking down and solving this problem?

Kind regards,
Iya Belova

P.S. Maybe this problem is related to the fact that in the file int2lm.exe.out-14621 we see the line “Binary name ….: tstint2lm”, although the real binary name is int2lm.exe, and we could not find where this name (tstint2lm) comes from.

  @burkhardtrockel in #d8cb29b

I assume that you ran the original run_int2lm script without any modifications? In that case it might be a problem with your computer system, e.g. the compiler and its options, the MPI version, etc. If you provide the following information, maybe someone from the CLM-Community using a similar configuration can help:

  • the name of the computing system you use
  • the name and version of the compiler
  • the name and version of the MPI library
  • the Fopts file from /home/dokukin/work/cosmo/cclm-sp_2.4/src/int2lm (as an attachment)

The different binary name is not the cause: this information has to be set by the user (see the following snippet from info_int2lm.f90) and has no effect on the model run.


```fortran
! Currently it is not possible with FORTRAN95 to get the information
! of the full path of binary name like the $0 in C. Additionally
! we cannot determine on which host(s) the binary is running and the
! domain of the data spread through the nodes.
! Therefore this information has to be defined manually. On using info_readnl()
! this information may be defined within the segment /info_defaults/ which
! has to reside within the named namelist of your choice. Missing information
! will be ignored silently.
! Currently following information may be defined within /info_defaults/:
! INFO_Options ..: List of print options
! INFO_BinaryName: Name (best: full path) of the binary
! INFO_RunMachine: The machine (OS) where the program is running
! INFO_Nodes ….: Description of the nodes the binary is running
! INFO_Domain …: The domain the binary is calculating
```

  @iyabelova in #57000e8

Thank you for the answer.

We changed only the lines that call int2lm.exe and the number of CPUs in the run_int2lm script. Anyway, I'll attach the script together with the Fopts file used to compile int2lm.

Here is the information about our system:

  • system: CentOS 5.2
  • compiler: ifort 10.1
  • mpi: mvapich2 1.0.3

  @burkhardtrockel in #e185555

I just ran the script with nprocx = 1, nprocy = 1, as you did, and had no problems. Therefore I assume this is a problem with your computing system. I have no experience with CentOS and mvapich2. Hopefully another member of the CLM-Community does and can help you.

  @iyabelova in #db0f42e

Thank you for your help.
Could you please attach the resulting log files? They could help us track down our problem.

  @burkhardtrockel in #0b600d1

The computing system at DKRZ is under maintenance today and tomorrow. I will send you the log files after that.

  @burkhardtrockel in #33eac9b

Here are the log and OUTPUT files of the successful job.

  @iyabelova in #9e9f7c6

Thank you again.
When I compared your files with ours, I found that the following part is quite different (this is from our file):
“Info about KIND-parameters: iintegers / MPI_INT = 4 1275069467 int_ga / MPI_INT = 4 1275069467”
In your file, 7 appears in place of 1275069467. I have received other int2lm outputs, and they also show 7 in that place. It seems that this could cause some problems.
I'm not sure, but it seems that the variable (?) MPI_INT is broken for some reason.
I tried adding something like “export MPI_INT=7” to the run_int2lm script, but unfortunately it didn't work.
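Comparing the corresponding line across several logs, as done here by hand, can be scripted. The following is a minimal POSIX shell sketch; the function name show_kind_info is hypothetical, and the search pattern is taken from the excerpt quoted above, so it may need adjusting if your int2lm version words the line differently.

```sh
#!/bin/sh
# Hypothetical helper: print the first "Info about KIND" line of each log
# given as an argument, prefixed with the file name, so runs can be compared.
show_kind_info() {
    for log in "$@"; do
        line=$(grep 'Info about KIND' "$log" | head -n 1)
        if [ -n "$line" ]; then
            printf '%s: %s\n' "$log" "$line"
        else
            printf '%s: %s\n' "$log" "(no KIND-parameter line found)"
        fi
    done
}
```

For example, `show_kind_info our_run.log reference_run.log` (both file names hypothetical) prints one line per log, which makes a differing MPI handle value easy to spot.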

  @ulrichschättler in #a956d42

Hi,
you cannot modify the MPI_INT value. It is given by the MPI library you use and can differ between computers. You should find more information on this value in the documentation of your MPI library (it is the MPI_INTEGER value); there you can check whether the value 1275069467 is correct.
Furthermore, you could check what “Exit code -5” means on your system. Are you running interactively or as a batch job? Maybe you need to increase the stack size for your run (with “ulimit -s unlimited”).
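As a sketch of what that documentation check can reveal: assuming MVAPICH2 keeps MPICH's handle encoding (MVAPICH2 is derived from MPICH, whose headers define MPI_INTEGER as 0x4c00041b), the value 1275069467 is exactly that constant written in decimal, i.e. a valid handle rather than a corrupted one. The bit-field layout in the comments is an assumption based on MPICH's headers and should be confirmed against the mpif.h of your installation; other MPI libraries can use small integers, such as the 7 in the reference logs. The function name decode_mpich_handle is hypothetical.

```sh
#!/bin/sh
# Decode an MPI datatype handle, ASSUMING the MPICH encoding:
#   bits 30-31: handle kind (1 = builtin)
#   bits 26-29: object kind (3 = datatype)
#   bits  8-15: size of the basic type in bytes
#   bits  0-7 : index of the builtin type
decode_mpich_handle() {
    h=$1
    printf 'hex=0x%08x kind=%d object=%d size=%d index=%d\n' \
        "$h" \
        $(( (h >> 30) & 0x3 )) \
        $(( (h & 0x3c000000) >> 26 )) \
        $(( (h & 0x0000ff00) >> 8 )) \
        $(( h & 0xff ))
}

# Value reported in the failing log
decode_mpich_handle 1275069467
# prints: hex=0x4c00041b kind=1 object=3 size=4 index=27
```

Under this assumption, the decoded size of 4 bytes matches the "4" that int2lm prints for iintegers next to the handle.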

  @iyabelova in #f8a5589

Thank you for your help.
We found that the problem was caused by trying to run the program on a single core.
It seems that either MPI programs cannot be run on one core, or this is a peculiarity of our computing system.
Anyway, when we write
“npx=4 npy=2 mpirun -np 8”
in the script, everything works correctly.
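When changing the process count as done here, the number of processes passed to mpirun should match the decomposition set in the script. A minimal sketch of a guard a run script could use, assuming (as in the working 4 x 2 setup on 8 processes) that int2lm runs on exactly npx * npy processes with no extra I/O processes; the function name check_decomposition is hypothetical, and the actual launch line is left commented out because it depends on the local MPI installation.

```sh
#!/bin/sh
# Hypothetical guard: abort if the MPI process count does not equal npx * npy
# (assumes no additional I/O processes are requested).
check_decomposition() {
    npx=$1
    npy=$2
    np=$3
    if [ $((npx * npy)) -ne "$np" ]; then
        echo "error: npx*npy=$((npx * npy)) but np=$np" >&2
        return 1
    fi
    echo "ok: ${npx}x${npy} decomposition on $np processes"
}

NPX=4
NPY=2
NP=$((NPX * NPY))

# Try to raise the soft stack limit; this affects this shell and the processes
# it starts locally, while ranks on remote nodes may need it set elsewhere.
ulimit -s unlimited 2>/dev/null || echo "warning: could not raise the stack limit" >&2

check_decomposition "$NPX" "$NPY" "$NP" || exit 1
# The real launch would follow here, e.g.:
# mpirun -np "$NP" ./int2lm.exe
```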

  @cemreyürük in #d4450e1

Dear all,

We have a problem similar to Iya Belova's. We are trying to run COSMO-CLM (cclm-sp_2.4) at 0.11° resolution using the ERA-Interim dataset for the period from August 1st, 2007 to December 31st, 2009. Although the simulation period is 29 months, we get the ‘int2lm finished’ message after only 2-3 months. We previously used cclm-sp_1.5 on the same workstation and did not encounter this kind of problem. Do you have any suggestions?

The Fopts, int2lm_test.log and run_int2lm_eraint_test files are attached.

Best regards,

Cemre Yürük