Displaying Cluster Information

Slurm provides a range of commands for interacting with the cluster. This section walks through examples of the sinfo, squeue, scontrol, and sacct commands, which report the cluster's configuration and current status. For a complete description of all Slurm commands, refer to the Slurm project website.

Command: sinfo

The sinfo command displays information about the cluster's state, its partitions (subdivisions of the cluster), nodes, and available computing resources. Many options control which information is shown and how it is formatted; they are described in the sinfo documentation.

Display general information about the cluster configuration:

$ sinfo
PARTITION    AVAIL  TIMELIMIT  NODES  STATE NODELIST
all*            up 2-00:00:00      2   drng wn[117,120]
all*            up 2-00:00:00      2   resv wn[101-102]
all*            up 2-00:00:00     70  alloc wn[012-016,021-061,103-116,118-119,121,163-169]
long            up 14-00:00:0      2   drng wn[117,120]
long            up 14-00:00:0      9  alloc wn[012-016,116,118-119,121]
gpu             up 4-00:00:00     14   mix- gwn[01,08,10],wn[204-205,207-208,210-212,217,219-221]
gpu             up 4-00:00:00      2   drng gwn[02,09]
gpu             up 4-00:00:00      1   resv wn201
gpu             up 4-00:00:00      9    mix gwn[03-06],wn[202-203,206,209,218]
mig-preempt     up 1-00:00:00      1   idle gwn07
mig-priority    up 1-00:00:00      1   idle gwn07

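In the above output, you can see the available logical partitions, their availability, the time limit for jobs in each partition, and the number, state, and names of the compute nodes in each partition. The asterisk (all*) marks the default partition. A minus sign after a state (mix-) means that the backfill scheduler has planned the node for a higher-priority job. The output can be customized with options to show only the information you need.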
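
When only node counts matter, the default sinfo table can be condensed with standard text tools. The sketch below assumes the six-column default layout shown above (PARTITION, AVAIL, TIMELIMIT, NODES, STATE, NODELIST). The function name nodes_per_state is made up for this illustration and is not part of Slurm.

```shell
# Hypothetical helper, not a Slurm command: sums the NODES column
# per partition and state from default `sinfo` output read on stdin.
nodes_per_state() {
    awk 'NR > 1 { count[$1 " " $5] += $4 }
         END { for (key in count) print key, count[key] }' |
        LC_ALL=C sort
}

# On a cluster: sinfo | nodes_per_state
```

Each output line has the form "partition state count". A node that belongs to several partitions is counted once in each of them.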

Display detailed information about compute nodes:

$ sinfo --Node --long
Thu May 21 11:48:44 2026
NODELIST   NODES    PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON              
gwn01          1          gpu      mixed- 64     2:16:2 256000        0      1 amd,geno none                
gwn02          1          gpu    draining 64     2:16:2 256000        0      1 amd,geno Maintenance         
gwn03          1          gpu       mixed 64     2:16:2 256000        0      1 amd,geno none                
gwn04          1          gpu       mixed 64     2:16:2 256000        0      1 amd,geno none                
gwn05          1          gpu       mixed 64     2:16:2 256000        0      1 amd,geno none                
gwn06          1          gpu       mixed 64     2:16:2 256000        0      1 amd,geno none                
gwn07          1 mig-priority        idle 64     2:16:2 256000        0      1 amd,geno none                
gwn07          1  mig-preempt        idle 64     2:16:2 256000        0      1 amd,geno none                
gwn08          1          gpu      mixed- 96     2:24:2 512000        0      1 amd,geno none                
gwn09          1          gpu    draining 96     2:24:2 512000        0      1 amd,geno Maintenance         
gwn10          1          gpu      mixed- 96     2:24:2 512000        0      1 amd,geno none                
wn012          1         all*   allocated 16      1:8:2 336000        0      1 amd,rome none                
wn012          1         long   allocated 16      1:8:2 336000        0      1 amd,rome none                
wn013          1         all*   allocated 16      1:8:2 336000        0      1 amd,rome none                
wn013          1         long   allocated 16      1:8:2 336000        0      1 amd,rome none                
wn014          1         all*   allocated 16      1:8:2 336000        0      1 amd,rome none                
wn014          1         long   allocated 16      1:8:2 336000        0      1 amd,rome none                
wn015          1         all*   allocated 16      1:8:2 320000        0      1 amd,rome none                
wn015          1         long   allocated 16      1:8:2 320000        0      1 amd,rome none                
wn016          1         all*   allocated 16      1:8:2 336000        0      1 amd,rome none                
wn016          1         long   allocated 16      1:8:2 336000        0      1 amd,rome none                
...

The above output lists each compute node in the cluster together with its partition (PARTITION), current state (STATE), number of CPUs (CPUS), number of processor sockets (S), cores per socket (C), hardware threads per core (T), amount of system memory in megabytes (MEMORY), and any assigned features (AVAIL_FE, truncated here from AVAIL_FEATURES) such as processor type or the presence of GPUs. The CPUS value is the product S × C × T. A node that belongs to several partitions appears once per partition.
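
The S:C:T column and the CPUS column are related: multiplying sockets, cores per socket, and threads per core gives the CPU count. A minimal sketch of that arithmetic follows. The function name cpus_from_sct is an assumption made for this example.

```shell
# Hypothetical helper: computes sockets * cores * threads
# from an S:C:T value such as "2:16:2".
cpus_from_sct() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * $2 * $3 }'
}

cpus_from_sct 2:16:2   # gwn01 in the listing above, prints 64
cpus_from_sct 1:8:2    # wn012 in the listing above, prints 16
```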

Cluster nodes may be reserved in advance for various reasons such as maintenance, workshops, or specific projects. The following example displays the active reservations in the Arnes cluster:

$ sinfo --reservation
RESV_NAME     STATE           START_TIME             END_TIME     DURATION  NODELIST
fri          ACTIVE  2026-02-25T14:56:32  2026-07-01T20:00:00  126-04:03:28  wn[101-102,201]

The above output shows the active reservations in the cluster, along with each reservation's start time, end time, duration, and list of nodes. Each reservation is tied to a set of users who have exclusive access to its nodes, so their jobs do not have to wait for jobs from users outside the reservation to finish.

Command: squeue

In addition to the cluster configuration, we are naturally interested in the state of the job queue. The squeue command reports on jobs that are waiting in the queue, running, or have just finished, whether successfully or not (documentation).

Display the current state of the job queue:

$ squeue
JOBID    PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
52930230       all micmodel  zkolenc PD       0:00      1 (Resources)
52934494       all 13502482   dn9134 PD       0:00      1 (Priority)
52934493       all 38884188   dn9134 PD       0:00      1 (Priority)
52934492       all 29829442   dn9134 PD       0:00      1 (Priority)
52934491       all 18316019   dn9134 PD       0:00      1 (Priority)
52934490       all 68789192   dn9134 PD       0:00      1 (Priority)
52934489       all 50851613   dn9134 PD       0:00      1 (Priority)
52934488       all 19730810   dn9134 PD       0:00      1 (Priority)
52934487       all 34141707   dn9134 PD       0:00      1 (Priority)
52934486       all 31707664   dn9134 PD       0:00      1 (Priority)
52934485       all 62596266   dn9134 PD       0:00      1 (Priority)
52934484       all 92148791   dn9134 PD       0:00      1 (Priority)
52934483       all 50552286   dn9134 PD       0:00      1 (Priority)
...

From the output, we can read the identifier of each job, the partition it was submitted to, the job name, the user who submitted it, and the current job state.

Some of the important job states are:

  • PD (PenDing) - the job is waiting in the queue,
  • R (Running) - the job is running,
  • CG (CompletinG) - the job is completing,
  • CD (CompleteD) - the job has completed,
  • F (Failed) - there was an error during execution,
  • S (Suspended) - the job execution is temporarily suspended,
  • CA (CAnceled) - the job has been canceled,
  • TO (TimeOut) - the job has been terminated due to a time limit.
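
The state codes listed above can be turned into readable labels in a script. The following sketch covers only the codes from this list. The function name describe_job_state is hypothetical, and the %i (job ID) and %t (compact state) format specifiers in the usage comment are taken from the squeue documentation.

```shell
# Hypothetical helper: maps a compact squeue state code to a label.
# Only the codes listed in this section are handled.
describe_job_state() {
    case "$1" in
        PD) echo "pending" ;;
        R)  echo "running" ;;
        CG) echo "completing" ;;
        CD) echo "completed" ;;
        F)  echo "failed" ;;
        S)  echo "suspended" ;;
        CA) echo "cancelled" ;;
        TO) echo "timed out" ;;
        *)  echo "unknown ($1)" ;;
    esac
}

# On a cluster:
# squeue --noheader --format="%i %t" |
#     while read -r id st; do echo "$id $(describe_job_state "$st")"; done
```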

The output also shows how long each job has been running (TIME), the number of nodes it uses (NODES), and either the list of nodes on which the job is running or, in parentheses, the reason why it has not started yet.

We are usually most interested in the status of jobs that we have launched ourselves. We can limit the output to jobs of a specific user using the --user option.

Example output of jobs owned by user prdatlas006:

$ squeue --user=prdatlas006
JOBID    PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
52930355       all data22_1 prdatlas PD       0:00      1 (Priority)
52930498       all data23_1 prdatlas PD       0:00      1 (Priority)
52930487       all data23_1 prdatlas PD       0:00      1 (Priority)
52930459       all data22_1 prdatlas PD       0:00      1 (Priority)
52930457       all data22_1 prdatlas PD       0:00      1 (Priority)
52930456       all data22_1 prdatlas PD       0:00      1 (Priority)
52930455       all data22_1 prdatlas PD       0:00      1 (Priority)
52930454       all data22_1 prdatlas PD       0:00      1 (Priority)
52930453       all data22_1 prdatlas PD       0:00      1 (Priority)
...

In addition, we can also limit the output to jobs in a specific state. This can be done using the --states option.

Example output of all currently pending (PD) jobs:

$ squeue --states=PD
JOBID    PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
52930230       all micmodel  zkolenc PD       0:00      1 (Resources)
52933624       all 34318835   dn9134 PD       0:00      1 (Priority)
52933625       all 37524431   dn9134 PD       0:00      1 (Priority)
52933626       all 19299379   dn9134 PD       0:00      1 (Priority)
52933627       all 63522836   dn9134 PD       0:00      1 (Priority)
52933628       all 20181126   dn9134 PD       0:00      1 (Priority)
52933629       all 23600582   dn9134 PD       0:00      1 (Priority)
52933630       all 40800209   dn9134 PD       0:00      1 (Priority)
52933631       all 29293675   dn9134 PD       0:00      1 (Priority)
52933632       all 11208460   dn9134 PD       0:00      1 (Priority)
52933633       all 25627117   dn9134 PD       0:00      1 (Priority)
52933634       all 77397217   dn9134 PD       0:00      1 (Priority)
52933635       all 34206014   dn9134 PD       0:00      1 (Priority)
...
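
With a long queue it is often more useful to know how many jobs are waiting for each reason than to read every line. The sketch below assumes the input has one reason per line, as produced by the %r format specifier of squeue. The function name count_pending_reasons is made up for this example.

```shell
# Hypothetical helper: counts identical lines on stdin (one pending
# reason per line) and prints "reason count", sorted by reason.
count_pending_reasons() {
    awk '{ n[$0]++ } END { for (r in n) print r, n[r] }' |
        LC_ALL=C sort
}

# On a cluster:
# squeue --states=PD --noheader --format="%r" | count_pending_reasons
```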

Command: scontrol

Sometimes we require more detailed information about a specific partition, node, or job. This information can be obtained using the scontrol command (documentation). Below are some examples of how to use this command.

Example output of more detailed information about a specific partition:

$ scontrol show partition all
PartitionName=all
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=YES QoS=N/A
DefaultTime=00:30:00 DisableRootJobs=NO ExclusiveUser=NO ExclusiveTopo=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=2-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED MaxCPUsPerSocket=UNLIMITED
Nodes=wn[012-016,021-061,101-121,163-169]
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=14160 TotalNodes=74 SelectTypeParameters=NONE
JobDefaults=(null)
DefMemPerCPU=2000 MaxMemPerNode=UNLIMITED
TRES=cpu=14160,mem=60744000M,node=74,billing=29660
TRESBillingWeights=CPU=1.0,Mem=0.5G
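
The scontrol output consists of Key=Value pairs separated by spaces, so a single value can be pulled out with a small filter. This is only a sketch: the function name scontrol_field is an assumption, and the filter does not handle values that themselves contain spaces (such as the OS field of a node).

```shell
# Hypothetical helper: prints the value of the first Key=Value pair
# on stdin whose key equals $1. Values containing spaces are cut off.
scontrol_field() {
    tr ' ' '\n' |
        awk -F= -v key="$1" '$1 == key { sub(/^[^=]*=/, ""); print; exit }'
}

# On a cluster: scontrol show partition all | scontrol_field MaxTime
```

For the partition shown above, the usage line would print the MaxTime value, 2-00:00:00.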

Example output of more detailed information about the compute node gwn01:

$ scontrol show node gwn01
NodeName=gwn01 Arch=x86_64 CoresPerSocket=16 
CPUAlloc=12 CPUEfctv=64 CPUTot=64 CPULoad=3.81
AvailableFeatures=amd,genoa,gpu,h100
ActiveFeatures=amd,genoa,gpu,h100
Gres=gpu:2
NodeAddr=gwn01 NodeHostName=gwn01 Version=25.05.7
OS=Linux 5.14.0-611.54.6.el9_7.x86_64 #1 SMP PREEMPT_DYNAMIC Fri May 15 04:23:18 EDT 2026 
RealMemory=256000 AllocMem=131072 FreeMem=206002 Sockets=2 Boards=1
State=MIXED+PLANNED ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
Partitions=gpu 
BootTime=2026-05-20T07:25:53 SlurmdStartTime=2026-05-20T07:26:27
LastBusyTime=2026-05-20T23:33:31 ResumeAfterTime=None
CfgTRES=cpu=64,mem=250G,billing=125,gres/gpu=2
AllocTRES=cpu=12,mem=128G,gres/gpu=2
CurrentWatts=0 AveWatts=0

Example output of more detailed information about the job with ID 15646868:

$ scontrol show job 15646868
JobId=15646868 JobName=mc23_13p6TeV_60
    UserId=prdatlas006(21006) GroupId=prdatlas(21000) MCS_label=N/A
    Priority=1 Nice=67 Account=prdatlas QOS=normal
    JobState=RUNNING Reason=None Dependency=(null)
    Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
    RunTime=05:04:05 TimeLimit=1-17:19:00 TimeMin=N/A
    SubmitTime=2026-05-21T06:50:36 EligibleTime=2026-05-21T06:50:36
    AccrueTime=2026-05-21T06:50:36
    StartTime=2026-05-21T06:50:37 EndTime=2026-05-23T00:09:37 Deadline=N/A
    PreemptEligibleTime=2026-05-21T06:50:37 PreemptTime=None
    SuspendTime=None SecsPreSuspend=0 LastSchedEval=2026-05-21T06:50:37 Scheduler=Main
    Partition=all AllocNode:Sid=skrlatica:4595
    ReqNodeList=(null) ExcNodeList=(null)
    NodeList=wn121
    BatchHost=wn121
    NumNodes=1 NumCPUs=8 NumTasks=8 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
    ReqTRES=cpu=8,mem=9496M,node=1,billing=8
    AllocTRES=cpu=8,mem=9496M,node=1,billing=8
    Socks/Node=* NtasksPerN:B:S:C=8:0:*:* CoreSpec=*
    MinCPUsNode=8 MinMemoryCPU=1187M MinTmpDiskNode=0
    Features=(null) DelayBoot=00:00:00
    OverSubscribe=OK Contiguous=0 Licenses=(null) LicensesAlloc=(null) Network=(null)
    Command=/tmp/SLURM_job_script.YZsbKk
    WorkDir=/d/arc/session_ssd/83d5eb5459b0
    StdErr=/d/arc/session_ssd/83d5eb5459b0.comment
    StdIn=/dev/null
    StdOut=/d/arc/session_ssd/83d5eb5459b0.comment
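
Fields such as RunTime and TimeLimit are printed as [days-]hours:minutes:seconds. To compare them in a script, it helps to convert them to seconds. The sketch below handles only that output form, not values such as UNLIMITED. The function name slurm_time_to_seconds is hypothetical.

```shell
# Hypothetical helper: converts [D-]HH:MM:SS to seconds.
# Returns a non-zero status for any other form (e.g. UNLIMITED).
slurm_time_to_seconds() {
    printf '%s\n' "$1" | awk '{
        days = 0; t = $0
        if (index(t, "-") > 0) { split(t, dp, "-"); days = dp[1]; t = dp[2] }
        if (split(t, p, ":") != 3) exit 1
        print days * 86400 + p[1] * 3600 + p[2] * 60 + p[3]
    }'
}

slurm_time_to_seconds 1-17:19:00   # TimeLimit above, prints 148740
slurm_time_to_seconds 05:04:05     # RunTime above, prints 18245
```

Subtracting the two results gives the time the job has left before it reaches its limit, here 130495 seconds.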

We can also check which users have permission to use reserved nodes:

$ scontrol show reservation
ReservationName=fri StartTime=2026-02-25T14:56:32 EndTime=2026-07-01T20:00:00 Duration=126-04:03:28
    Nodes=wn[101-102,201] NodeCnt=3 CoreCnt=140 Features=(null) PartitionName=(null) Flags=IGNORE_JOBS,SPEC_NODES
    TRES=cpu=280
    Users=dsluga,ratkop,urosl,la6468,ta3667,la4710,na2933,nb9613,tb1565,zb26346,gb02287,dc05267,mc6460,fd4786,md10246,jm1540,nd9588,jd54541,sg57072,dg8660,jg5045,bg8634,fg1343,mh3269,ah74079,dj3778,aj0465,mj9739,jj3957,aj3477,kk42117,ak8557,mk8834,ak1790,mk8054,kk6212,nk7629,ak7461,tk0396,sk74019,bk8638,bl32126,lm9231,mm11484,gm64359,mm9307,fm2883,nm41263,rm26805,lp7152,yp09764,mp4116,mr2095,ss85171,zs23657,ps78466,cs67685,js01387,ps02292,vs8734,ns3656,tt4657,mv3237,bv7063,kv5130,pz8920,jz4620,ra7285,kb61817,mb95331,bg9530,ap3956,gr1779,lv8541,sling001,sling002,sling003,sling004,sling005,sling006,sling007,sling008,sling009,sling010,sling011,sling012,sling013,sling014,sling015,sling016,sling017,sling018,sling019,sling020,sling021,sling022,sling023,sling024,sling025,sling026,sling027,sling028,sling029,sling030,sling031,sling032,sling033,sling034,sling035,sling036,sling037,sling038,sling039,sling040 Groups=(null) Accounts=(null) Licenses=(null) State=ACTIVE BurstBuffer=(null)
    MaxStartDelay=(null)
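
Because the Users list can be long, a script can check whether a given account appears in it. The following is a rough sketch. The function name reservation_has_user is an assumption, and if several reservations are listed, it reports a match in any of them.

```shell
# Hypothetical helper: succeeds (exit status 0) if user $1 appears in
# a Users= list of `scontrol show reservation` output read on stdin.
reservation_has_user() {
    tr ' ' '\n' |
        awk -F= -v user="$1" '
            $1 == "Users" {
                n = split($2, u, ",")
                for (i = 1; i <= n; i++) if (u[i] == user) found = 1
            }
            END { exit !found }'
}

# On a cluster:
# scontrol show reservation | reservation_has_user "$USER" && echo "allowed"
```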

Command: sacct

The sacct command queries Slurm's accounting records, which cover both running jobs and jobs that have already finished, including their individual job steps.

For example, we can check the status of all of our jobs for the last three days:

$ sacct --starttime $(date -d '3 day ago' +%D-%R) --format JobID,JobName,Elapsed,State,ExitCode
JobID           JobName    Elapsed      State ExitCode 
------------ ---------- ---------- ---------- -------- 
52825111     SNNResNet+ 2-02:00:44    RUNNING      0:0 
52825111.ba+      batch 2-02:00:44    RUNNING      0:0 
52825111.ex+     extern 2-02:00:44    RUNNING      0:0 
52825111.0    apptainer 2-02:00:43    RUNNING      0:0 
52825135     test_steps   00:05:05  COMPLETED      0:0 
52825135.ba+      batch   00:05:05  COMPLETED      0:0 
52825135.ex+     extern   00:05:05  COMPLETED      0:0 
52825135.0     hostname   00:00:00  COMPLETED      0:0 
52825135.1     hostname   00:00:00  COMPLETED      0:0 
52825135.2     hostname   00:00:00  COMPLETED      0:0 

We can also inquire about the details of a specific job:

$ sacct --job=52825111 --format JobID,JobName,Elapsed,State,ExitCode
JobID           JobName    Elapsed      State ExitCode 
------------ ---------- ---------- ---------- -------- 
52825111     SNNResNet+ 2-01:59:22    RUNNING      0:0 
52825111.ba+      batch 2-01:59:22    RUNNING      0:0 
52825111.ex+     extern 2-01:59:22    RUNNING      0:0 
52825111.0    apptainer 2-01:59:21    RUNNING      0:0 
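
For scripting, the fixed-width table is awkward to parse. The sacct options --parsable2 and --noheader produce rows delimited by "|" with no header line. The sketch below assumes three columns in the order JobID, State, ExitCode and keeps only rows whose exit code is not 0:0. The function name steps_with_nonzero_exit is made up for this example.

```shell
# Hypothetical helper: reads "JobID|State|ExitCode" rows on stdin and
# prints those whose ExitCode differs from 0:0.
steps_with_nonzero_exit() {
    awk -F'|' '$3 != "0:0" { print $1, $2, $3 }'
}

# On a cluster:
# sacct --parsable2 --noheader --format=JobID,State,ExitCode |
#     steps_with_nonzero_exit
```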

Exercise

You can find exercises to improve your knowledge of commands for querying cluster information at the following link.