Note: This discussion is about an older version of the COMSOL Multiphysics® software. The information provided may be out of date.

Problem running on a cluster

About our university cluster:
1) 194 compute nodes, 8 cores/node
2) PBS/Torque (Portable Batch System) for resource management

We have COMSOL v3.5a and v4.1 installed on the cluster. I have been trying to run COMSOL in batch mode across several nodes, but it does not seem to work correctly.


I wrote a PBS job script based on the COMSOL documentation, roughly like this:

#!/bin/sh
#PBS -l nodes=2:ppn=8
..............
.............
comsol -nn $NODES -mpirsh ssh mpd boot -f comsolenodes
comsol -nn $NODES -np $CORES batch -inputfile test2.mph
comsol -mpirsh ssh mpd allexit
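The elided lines of the script are where `$NODES`, `$CORES` and the host file `comsolenodes` would be prepared. Below is a minimal sketch of one way to do that, assuming Torque's usual behaviour of listing each allocated host once per core in `$PBS_NODEFILE`, and assuming `mpd boot -f` expects one host name per line. The helper functions `make_hostfile`, `count_nodes` and `cores_per_node` are hypothetical names for illustration, not part of COMSOL or PBS.

```shell
#!/bin/sh
# Sketch only. Assumption: $PBS_NODEFILE lists each host once per allocated
# core (Torque's usual layout for "#PBS -l nodes=N:ppn=M").

make_hostfile() {
    # $1: PBS node file, $2: output host file with one line per unique node
    sort -u "$1" > "$2"
}

count_nodes() {
    # Number of distinct host names in the node file
    sort -u "$1" | wc -l | tr -d ' '
}

cores_per_node() {
    # Total slots divided by distinct hosts (assumes the same ppn on every node)
    _total=$(wc -l < "$1" | tr -d ' ')
    _nodes=$(count_nodes "$1")
    if [ "$_nodes" -eq 0 ]; then
        echo 0
        return 1
    fi
    echo $(( _total / _nodes ))
}

# Only do the real work inside a PBS job, where $PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
    make_hostfile "$PBS_NODEFILE" comsolenodes
    NODES=$(count_nodes "$PBS_NODEFILE")
    CORES=$(cores_per_node "$PBS_NODEFILE")
    echo "nodes=$NODES cores/node=$CORES"
fi
```

In a job submitted with `#PBS -l nodes=2:ppn=8`, this would be expected to give NODES=2 and CORES=8, which the `comsol -nn $NODES ...` and `comsol -nn $NODES -np $CORES ...` lines can then use. Deduplicating the host list is a design choice here, so that the host file has one entry per physical node rather than one per core.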

I tried to run it on 2 (or 3) nodes with 8 cores per node, but the output showed that the model was submitted to 2 (or 3) nodes and each node ran the same computation.

Is there a special COMSOL solver setting needed to make it run on several nodes?

Does anyone have experience writing a PBS job script to submit a COMSOL model to a cluster?
How can I make COMSOL batch mode run on several nodes?




5 Replies Last Post Mar 18, 2012, 11:05 p.m. EDT

Posted: Feb 3, 2011, 9:41 a.m. EST
Unfortunately, instead of offering you a solution, I have to ask you to keep us informed if you manage to solve this problem!

I have the same configuration, except that I am using COMSOL 4.0a. But I have not been as lucky as you: barely two cores are used on each physical node, the sweep parameters are not distributed, and even then the job terminates with a failure-to-converge error. I checked my model on a single node and it worked perfectly, so I am fairly sure there is nothing wrong with the physics itself.

Posted: Feb 3, 2011, 11:23 a.m. EST

Quote: "Unfortunately, instead of offering you a solution I have to request you to keep us informed whenever you happen to solve this problem!"


Yes, I will. And if you find the solution, please keep me informed too. Thank you.


Quote: "hardly two cores are utilized on each physical nodes, no distribution of sweep parameters,"


I see the same result: the sweep parameters are not distributed.

Posted: Feb 4, 2011, 4:41 a.m. EST
Thank you, Honghui, I really appreciate that.

In the meantime, have you tried running it through an alternative MPI implementation that might be available on your cluster? I tried Open MPI, our cluster's standard, but to no avail. Maybe there is some way of running this other than Intel MPI, which relies on ssh (rsh seems to be unavailable) for communication between nodes. Unless we (that is, my administrator) are missing some possibility, this does not seem to be the safest method for inter-node communication, since it lets users bypass the job scheduler and gain access to a specific node.

Posted: Feb 11, 2011, 6:02 a.m. EST
OK, my problem is partly solved now. COMSOL support advised me to use the MUMPS or PARDISO solvers instead of SPOOLES, which I was using. We also updated to version 4.1, and it works like a charm now :)

It remains to be seen how sweeps are distributed, if at all.

Posted: Mar 18, 2012, 11:05 p.m. EDT
Dear Honghui Shen,

When you use the PBS scheduler, do you need to change anything in the cluster computing settings (inside COMSOL)? If so, could you please explain what needs to be changed, or better yet, upload a model file with these settings?

I greatly appreciate your help.
I am having a hard time getting cluster computing to work. No matter how many nodes I use, COMSOL does not seem to use all of them, and I am trying to figure out why.

Thanks,
Sirisha

Note that while COMSOL employees may participate in the discussion forum, COMSOL® software users who are on-subscription should submit their questions via the Support Center for a more comprehensive response from the Technical Support team.