How to Use the Cluster Sweep Node in COMSOL Multiphysics®
Pär Persson Mattsson June 12, 2018
In a previous blog post, we explained how to run a job from the COMSOL Multiphysics® software on clusters directly from the COMSOL Desktop® environment, without any interaction with a Linux® operating system terminal. Since this terminal is sometimes treated with excessive respect, the ability to start a cluster job directly from the graphical user interface is one of the most useful features in the COMSOL® software. Plus, there’s more to it… Enter the Cluster Sweep node.
What Is the Cluster Sweep Node?
One way to parallelize the computation of a parameter set is to combine the parametric sweep with the Cluster Computing node. When doing so, you create one large cluster job that spans a number of nodes. The more nodes you add, the more parameter values are computed in parallel (as long as there are more parameter values than nodes, of course).
A cluster example.
You can also use the Cluster Sweep node to parallelize computations. It is designed for when you want to split up a parametric sweep into several cluster computing jobs. You define a list of parametric values in the Cluster Sweep node. For each of these values, a separate batch job is sent to your cluster queue. When the computations are done, COMSOL Multiphysics incorporates the results back into the main process.
You can even nest parametric sweeps this way, combining the cluster sweep with a “normal” parametric sweep. You decide which parameters you start separate jobs for and which parameters you want to keep “inside” the jobs.
In short, the Cluster Sweep node is a powerful tool that COMSOL Multiphysics supplies you with to help you be in full control of your modeling process.
Note that to use a cluster sweep, a Floating Network License (FNL) is required. It is also recommended that you are familiar with the settings discussed in this blog post: How to Run on Clusters from the COMSOL Desktop® Environment. If you follow the steps in that blog post and save your settings, they will automatically be used in the Cluster Sweep node.
When to Use the Cluster Sweep Node
By now, you know what the Cluster Sweep node is, and you might find yourself wondering two things:
- When should I use it?
- When is it preferable over the Cluster Computing node?
The first case that comes to mind is when you have a parameter set and you don’t know if your model will converge or even be valid for all parameter combinations. Your parameter set could control your geometry and, for some values, the geometry causes your solving or meshing to fail. If you compute the model with a parametric sweep, COMSOL Multiphysics cancels the computation at the first failing geometry, even if later ones would finish. If you split this computation into individual jobs instead, a separate computation is started for each parameter value, so one failing value does not stop the others.
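The difference between the two behaviors can be sketched in a few lines of Python. This is only an illustration of the idea, not COMSOL code: the solve function, the failing value of 170, and the returned numbers are hypothetical placeholders, and the actual behavior of a sweep depends on your study and solver settings.

```python
def solve(length_um):
    """Hypothetical stand-in for meshing and solving one geometry."""
    if length_um == 170:  # assumption: pretend this geometry cannot be meshed
        raise RuntimeError(f"meshing failed for L = {length_um} um")
    return length_um * 0.01  # placeholder result, not physics


def run_as_single_sweep(values):
    """One sweep inside one job: the first failure ends the whole run."""
    results = {}
    for value in values:
        results[value] = solve(value)  # an exception here stops the loop
    return results


def run_as_separate_jobs(values):
    """One job per value: a failure only affects its own job."""
    results, failures = {}, {}
    for value in values:
        try:
            results[value] = solve(value)
        except RuntimeError as err:
            failures[value] = str(err)
    return results, failures


lengths = [100, 170, 240, 310]  # illustrative values in micrometers

try:
    run_as_single_sweep(lengths)
except RuntimeError as err:
    print("single sweep stopped:", err)

results, failures = run_as_separate_jobs(lengths)
print("finished:", sorted(results))
print("failed:", sorted(failures))
```

In the single sweep, the values 240 and 310 are never attempted. With separate jobs, they finish even though 170 fails.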
Another situation is when the number of interesting parameter values is simply too large to be feasible for a single cluster job. If you have parameters that control the frequency, geometry, materials, boundary conditions, and so on, you end up with a lot of computations if you want results for all available combinations. If you put all of these computations into one large job and send it to your cluster, you will almost surely end up with an unhappy cluster administrator and a lot of angry colleagues (more on that later).
Good news: Using the Cluster Sweep node, you can split your potentially enormous job into several smaller ones. To do so, you add a Parametric Sweep node to your model in addition to the Cluster Sweep node. Setting up your model in this way creates what is known as a nested parametric sweep (similar to a nested for-loop in programming). To learn how to do this, keep reading. We’ve included a short tutorial in this blog post.
There is one more thing to note about the Cluster Sweep node: You can use it to potentially increase the throughput of jobs on your cluster.
Using a Cluster Sweep to Optimize Scheduling
Earlier in this blog post, I mentioned an unhappy cluster admin, and you might be wondering why. Computation time is a valuable resource on high-performance clusters. Because of this, most clusters have some kind of queue or scheduling system implemented. How large jobs are handled is up to the cluster administrator, and a rule of thumb is that large jobs mean long waiting times. Why? Large jobs occupy a lot of computational resources, and they can take a long time to complete. Hence, in order to not hold up other users’ jobs, large jobs are assigned a low priority. Of course, this all depends on how your cluster admin has configured the scheduler; that is, your mileage may vary.
What does this have to do with the Cluster Sweep node? Suppose that you have access to a cluster where it is hard to get a large job scheduled, but smaller ones are easier, since they fill the gaps in the scheduler (an unused cluster node is an expensive cluster node). You can use a cluster sweep to split the large job into small ones.
Let’s look at it with the help of an example: Instead of starting 1 large job on 8 nodes to parallelize 800 parameter values, you can start 8 jobs that each use 1 node to compute their own sets of 100 parameter values. The jobs will then be scheduled independently and, depending on how your cluster is set up, the small jobs might finish faster than the large job would!
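The bookkeeping behind this example can be sketched in Python. The helper split_into_jobs is a hypothetical name, and the parameter values are placeholders. In the software itself, the split follows from which parameter you place in the cluster sweep and which you keep in the inner parametric sweep, so treat this purely as arithmetic.

```python
def split_into_jobs(values, n_jobs):
    """Split values into n_jobs contiguous chunks of near-equal size."""
    base, extra = divmod(len(values), n_jobs)
    chunks, start = [], 0
    for job in range(n_jobs):
        # The first `extra` jobs take one additional value each.
        size = base + (1 if job < extra else 0)
        chunks.append(values[start:start + size])
        start += size
    return chunks


parameter_values = list(range(800))  # placeholder for 800 parameter values
jobs = split_into_jobs(parameter_values, 8)

for index, chunk in enumerate(jobs, start=1):
    print(f"job {index}: {len(chunk)} values, first = {chunk[0]}, last = {chunk[-1]}")
```

Each of the 8 jobs receives 100 values and can be queued on a single node, independently of the others.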
Setting Up Cluster Sweeps and Nested Parametric Sweeps
If you have previous experience with the batch sweep and the Cluster Computing node, using the Cluster Sweep node is easy. (To find descriptions of how to set up a batch sweep, check out the blog posts “The Power of the Batch Sweep” and “Added Value of Task Parallelism in Batch Sweeps”.)
To demonstrate how to set up both a pure cluster sweep and a nested parametric sweep, let’s turn to my favorite example model: the parameterized thermal microactuator. (It’s my favorite because the model shows the multiphysics capabilities of the COMSOL® software.) Since it is parameterized, it’s very easy to add parametric and cluster sweeps to the model.
Modeling the Joule heating of a microactuator. Current flows through two of the arms, causing them to heat up. Then, the thermal expansion causes the actuator to bend.
Adding a Cluster Sweep
We start by adding a cluster sweep over the actuator length parameter, called L. To do so, first right-click Study 1 and click Cluster Sweep. This adds a node where you can set up your cluster settings, analogous to the instructions in this blog post on running on clusters from the COMSOL Desktop®. (If you haven’t read the blog post yet, now is a good time.)
Next, in the Study Settings window, you can add the parameters that you want to sweep over. Click the plus symbol and, in the drop-down list, choose the parameter L. Then, in the Parameter value list field, write (for example) “100 170 240 310”. In the Parameter unit field, write “um” (for micrometers).
If you want to bring the results back into your main model, make sure to check the Synchronize solutions check box. This way, you’ll have all the results available for further analysis and postprocessing.
Adding a cluster sweep to the model.
We have now created a cluster sweep, which loops over the length of the microactuator. Assuming that the cluster settings are correct, all we need to do now is click Compute and the separate jobs are sent to the cluster.
Adding a Nested Parametric Sweep
Now, let’s create a nested parametric sweep so that each of our cluster jobs itself contains a parametric sweep. We do this by adding a parametric sweep over the voltage parameter, called DV. To do so, follow these steps:
- Right-click Study 1 and click Parametric Sweep, which adds a node where you can set up a parametric sweep
- In the Study Settings window, click the plus symbol and, in the drop-down list, choose the parameter DV
- In the Parameter value list field, write “1 2 3 4 5”
- In the Parameter unit field, write “V”
- Click Compute so that COMSOL Multiphysics will schedule the jobs for you
Adding a parametric sweep to the model, which creates a nested parametric sweep.
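To make the nested for-loop analogy concrete, here is a small Python sketch of how the two sweeps in this tutorial combine. It only enumerates the parameter combinations; the variable names are illustrative, and nothing here calls the COMSOL software.

```python
from itertools import product

cluster_sweep_L_um = [100, 170, 240, 310]  # outer loop: one cluster job per value
parametric_sweep_DV_V = [1, 2, 3, 4, 5]    # inner loop: solved inside each job

# Group the (L, DV) combinations by the job that would compute them.
jobs = {}
for length in cluster_sweep_L_um:
    jobs[length] = [(length, voltage) for voltage in parametric_sweep_DV_V]

for length, cases in jobs.items():
    voltages = ", ".join(f"{voltage} V" for _, voltage in cases)
    print(f"job L = {length} um -> DV = {voltages}")

# The grouped jobs cover exactly the full Cartesian product of both lists.
total = sum(len(cases) for cases in jobs.values())
assert total == len(list(product(cluster_sweep_L_um, parametric_sweep_DV_V)))
print(total, "solutions in", len(jobs), "jobs")
```

With the values used above, this gives 4 separate jobs that each solve 5 voltages, for 20 solutions in total.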
You can either wait for the jobs to complete (their status is shown in the External Processes window) or you can detach from the processes, save the model, close COMSOL Multiphysics, and let the jobs run on their own. When you come back to your workstation, just open your saved model and reattach, and the software will process the results just as when using the regular Cluster Computing node. This workflow is perfect for overnight simulations!
In this blog post, you have learned how you can optimize the parallelization of parametric computations on clusters using the Cluster Sweep node. You have also learned when it is beneficial to use different approaches and, as a bonus, how to avoid making your system admin unhappy.
As with cluster computing in general, you must decide what approach to use depending on the model you want to compute. To know when to use a cluster sweep and when to use a distributed parametric sweep, you have to try the two approaches on your models and your cluster. As always, to master something, you have to test it!
As mentioned, you’ll need an FNL to use a cluster sweep, since this functionality is a network-based technology.
Linux is a registered trademark of Linus Torvalds in the U.S. and other countries.