Slurm account coordinator


The Paramshakti HPC system treats each faculty adviser as the head of a research group. Each research group has a Slurm "account" to which all members of the group, including the faculty adviser, belong. Each faculty adviser is set as the "account coordinator" for that account.

A Slurm account coordinator is a user who can modify settings for users in that Slurm account. The coordinator can also hold and cancel jobs submitted by other users in the account.

Controlling jobs and shares as an account coordinator

Job control command
    Explanation

squeue -A <account>
    List all jobs charged to the account, i.e. jobs submitted by any member of the research group.

squeue -u <student>
    List the jobs submitted by a student.

scancel -u <student>
    Cancel all jobs submitted by a student.

scancel -A <account>
    Cancel all jobs charged to the account.

scontrol hold <jobid>
scontrol release <jobid>
    Hold a pending job, or release a held job so that it can be scheduled again.

sacctmgr modify user <student> where Account=prj_account set GrpTRESMins=cpu=6000000
    Restrict the student's jobs to a total of 6,000,000 CPU core-minutes (100,000 SU, i.e. 100,000 core-hours) charged against the project account.

sacctmgr modify user <student> where Account=prj_account set GrpCPUs=64
    Limit the number of CPU cores the student's jobs can use at the same time, across all partitions. Set the value to -1 to clear the limit.

sacctmgr modify user <student> where Account=prj_account set Fairshare=10
    Set the student's RawShares to 10. All new users are created with RawShares=1, and account coordinators can adjust the RawShares of users in their account. The value is relative to the other users in that account: the normalized share available to the account is divided among its users in proportion to their RawShares.

sbalance
    Check the available account balance.
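The GrpTRESMins example above implies that 1 SU corresponds to one CPU core-hour (6,000,000 core-minutes / 60 = 100,000 SU). Assuming that ratio, the sketch below converts an SU budget into core-minutes and builds the matching sacctmgr command as a string without running it. The helper names and the 60-minutes-per-SU constant are assumptions for illustration, not part of Slurm or of the Paramshakti tooling.

```python
# Assumption: 1 SU = 1 CPU core-hour = 60 CPU core-minutes, the ratio implied
# by the GrpTRESMins example (6,000,000 core-minutes = 100,000 SU).
CORE_MINUTES_PER_SU = 60


def su_to_core_minutes(su: int) -> int:
    """Convert a service-unit (SU) budget into CPU core-minutes."""
    if su < 0:
        raise ValueError("SU budget must be non-negative")
    return su * CORE_MINUTES_PER_SU


def grp_tres_mins_command(student: str, account: str, su: int) -> str:
    """Build (but do not run) the sacctmgr command that caps a student's CPU minutes."""
    minutes = su_to_core_minutes(su)
    return (
        f"sacctmgr modify user {student} where Account={account} "
        f"set GrpTRESMins=cpu={minutes}"
    )


if __name__ == "__main__":
    # Hypothetical student and project account names.
    print(grp_tres_mins_command("testuser1", "prj_account", 100000))
```

Generating the command string first makes it easy to review the exact limit before pasting it into a shell.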

Verify the users associated with an account

To list the users associated with a test faculty adviser account called "testfac1", use the command below. In this example, four test users and the faculty adviser himself (five members in total) are associated with the account "testfac1". The first row, with an empty User field, is the association for the account itself.

[testfac1@vm01 ~]$ sacctmgr show assoc where Account=testfac1 format=account%15,user%15,qos%25
        Account            User                       QOS
--------------- --------------- -------------------------
       testfac1                                  testfac1
       testfac1        testfac1                  testfac1
       testfac1       testuser1                  testfac1
       testfac1       testuser2                  testfac1
       testfac1       testuser3                  testfac1
       testfac1       testuser4                  testfac1
[testfac1@vm01 ~]$
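For scripting, machine-readable output is easier to handle than the fixed-width table. The sketch below assumes output from `sacctmgr -n -P show assoc where Account=testfac1 format=account,user,qos` (`-n` suppresses the header and `-P` prints '|'-delimited fields) and keeps only rows with a non-empty User field, which skips the account-level association. The sample text is reconstructed from the table above, and `member_users` and `fetch_assoc` are hypothetical helper names.

```python
import subprocess
from typing import List

# Illustrative '|'-delimited output, reconstructed from the table above.
SAMPLE_OUTPUT = """\
testfac1||testfac1
testfac1|testfac1|testfac1
testfac1|testuser1|testfac1
testfac1|testuser2|testfac1
testfac1|testuser3|testfac1
testfac1|testuser4|testfac1
"""


def member_users(parsable_output: str) -> List[str]:
    """Return the user names from 'account|user|qos' rows.

    A row with an empty user field is the account-level association
    itself, so it is skipped.
    """
    users = []
    for line in parsable_output.splitlines():
        if not line.strip():
            continue
        _account, user, _qos = line.split("|")
        if user:
            users.append(user)
    return users


def fetch_assoc(account: str) -> str:
    """Run sacctmgr (only works on a node where Slurm is installed)."""
    result = subprocess.run(
        ["sacctmgr", "-n", "-P", "show", "assoc",
         "where", f"Account={account}", "format=account,user,qos"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(member_users(SAMPLE_OUTPUT))
```

On the cluster, `member_users(fetch_assoc("testfac1"))` should return the same five names, although the exact output may vary between Slurm versions.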

View and change the raw shares and normalized shares of users associated with an account

To view the raw shares and normalized shares of the users associated with a faculty adviser account called "testfac1", use the command below.

[testfac1@vm01 ~]$ sshare -A testfac1 -a -o Account,User,RawShares,NormShares
             Account       User  RawShares  NormShares
-------------------- ---------- ---------- -----------
testfac1                                 1    0.009836
 testfac1              testfac1          1    0.001967
 testfac1             testuser1          1    0.001967
 testfac1             testuser2          1    0.001967
 testfac1             testuser3          1    0.001967
 testfac1             testuser4          1    0.001967
[testfac1@vm01 ~]$
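In this output each user's NormShares is the account's NormShares divided in proportion to RawShares: with five users at RawShares=1, each receives one fifth of 0.009836, and the users' values add back up to the account's value. The sketch below reproduces that arithmetic; `user_norm_shares` is a hypothetical helper, and it assumes the proportional split that the numbers above show.

```python
from typing import Dict


def user_norm_shares(parent_norm: float, raw_shares: Dict[str, int]) -> Dict[str, float]:
    """Split the parent account's NormShares among users in proportion to RawShares."""
    total = sum(raw_shares.values())
    if total == 0:
        return {user: 0.0 for user in raw_shares}
    return {user: parent_norm * raw / total for user, raw in raw_shares.items()}


if __name__ == "__main__":
    # Every user starts with RawShares=1, as in the sshare output above.
    members = ["testfac1", "testuser1", "testuser2", "testuser3", "testuser4"]
    shares = user_norm_shares(0.009836, {user: 1 for user in members})
    for user, share in shares.items():
        print(f"{user:>10} {share:.6f}")
```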

Use the commands below to redistribute the normalized share of the parent account (i.e. the faculty adviser's account) among the members of the research group. Each command sets one user's RawShares through Fairshare=<value>.

[testfac1@vm01 ~]$ sacctmgr modify user where user=testfac1 set Fairshare=2
[testfac1@vm01 ~]$ sacctmgr modify user where user=testuser1 set Fairshare=3
[testfac1@vm01 ~]$ sacctmgr modify user where user=testuser2 set Fairshare=5
[testfac1@vm01 ~]$ sacctmgr modify user where user=testuser3 set Fairshare=0
[testfac1@vm01 ~]$ sacctmgr modify user where user=testuser4 set Fairshare=1

After running these commands, the normalized share of the parent account is distributed among the members of the group in the ratio 2:3:5:0:1 (output below). A user with RawShares=0, such as testuser3, receives no share of the account.

[testfac1@vm01 ~]$ sshare -a -l -A testfac1 -o Account,User,RawShares,NormShares,FairShare
             Account       User  RawShares  NormShares  FairShare
-------------------- ---------- ---------- ----------- ----------
testfac1                                 1    0.009836   1.000000
 testfac1              testfac1          2    0.001788   1.000000
 testfac1             testuser1          3    0.002683   1.000000
 testfac1             testuser2          5    0.004471   1.000000
 testfac1             testuser3          0    0.000000   0.000000
 testfac1             testuser4          1    0.000894   1.000000
[testfac1@vm01 ~]$
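The same proportional arithmetic can predict the result of a redistribution before it is applied. The sketch below, using the hypothetical helper `plan_redistribution`, turns a desired RawShares ratio into the sacctmgr commands used above and the NormShares each user should end up with; for the 2:3:5:0:1 ratio it matches the output above (for example 0.009836 × 2/11 ≈ 0.001788). It assumes the account's own NormShares (0.009836 here) stays unchanged.

```python
from typing import Dict, List, Tuple


def plan_redistribution(
    parent_norm: float, ratio: Dict[str, int]
) -> Tuple[List[str], Dict[str, float]]:
    """Return the sacctmgr commands for a RawShares ratio and the predicted NormShares."""
    commands = [
        f"sacctmgr modify user where user={user} set Fairshare={raw}"
        for user, raw in ratio.items()
    ]
    total = sum(ratio.values())
    predicted = {
        user: (parent_norm * raw / total if total else 0.0)
        for user, raw in ratio.items()
    }
    return commands, predicted


if __name__ == "__main__":
    ratio = {"testfac1": 2, "testuser1": 3, "testuser2": 5, "testuser3": 0, "testuser4": 1}
    commands, predicted = plan_redistribution(0.009836, ratio)
    for command in commands:
        print(command)
    for user, share in predicted.items():
        print(f"{user:>10} {share:.6f}")
```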

Job priority

Job priority is made up of several components, also known as factors. Job size, queue time (age), and fairshare are some of the factors available in Slurm. Each factor has an integer weight that is multiplied by a decimal value between 0.0 and 1.0 representing what proportion of that weight the job receives. A job's priority at any given time is the sum of these weighted values over all factors enabled in the slurm.conf file, and can be expressed as:


Job_priority =
     site_factor +
     (PriorityWeightAge) * (age_factor) +
     (PriorityWeightAssoc) * (assoc_factor) +
     (PriorityWeightFairshare) * (fair-share_factor) +
     (PriorityWeightJobSize) * (job_size_factor) +
     (PriorityWeightPartition) * (partition_factor) +
     (PriorityWeightQOS) * (QOS_factor) +
     SUM(TRES_weight_cpu * TRES_factor_cpu,
         TRES_weight_<type> * TRES_factor_<type>,
         ...)
     - nice_factor

See the Slurm documentation for the multifactor priority plugin for more details.
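The formula is a plain weighted sum. The sketch below evaluates it for a single job using the weights reported by `sprio -w` on this system (shown later in this section) and made-up factor values between 0.0 and 1.0. It omits the site, TRES and nice terms for brevity, and `job_priority` is a hypothetical helper that only reproduces the arithmetic, not Slurm's internal implementation.

```python
from typing import Dict, Mapping

# Weights as reported by `sprio -w` in the example in this section.
# The site, TRES and nice terms are omitted from this sketch.
WEIGHTS = {
    "age": 10000,
    "fairshare": 100000,
    "jobsize": 1000,
    "partition": 15000,
    "qos": 1000,
}


def job_priority(factors: Dict[str, float], weights: Mapping[str, int] = WEIGHTS) -> float:
    """Weighted sum of priority factors; each factor must lie in [0.0, 1.0]."""
    total = 0.0
    for name, weight in weights.items():
        factor = factors.get(name, 0.0)
        if not 0.0 <= factor <= 1.0:
            raise ValueError(f"{name} factor must be in [0, 1], got {factor}")
        total += weight * factor
    return total


if __name__ == "__main__":
    # Made-up factor values for illustration.
    example = {"age": 0.5, "fairshare": 0.5, "jobsize": 1.0, "partition": 1.0, "qos": 0.0}
    print(job_priority(example))  # 71000.0
```

Because the fairshare weight is the largest on this system, the fairshare factor dominates the result.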

To print the priorities of jobs belonging to specific users, use the command "sprio --users=testuser1,testuser2":

[testfac1@vm01 ~]$ sprio --users=testuser1,testuser2

  JOBID USER         PRIORITY    AGE  FAIRSHARE  JOBSIZE  PARTITION    QOS
  65548 testuser1       62079      1      51077     1000      10000      0
  65549 testuser2       62080      1      51078     1000      10000      0

To print the configured weight of each priority component, use the command "sprio -w":

[testfac1@vm01 ~]$ sprio -w

  JOBID PARTITION    PRIORITY       SITE        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS TRES
  Weights                              1      10000     100000       1000      15000       1000 CPU=1000,Mem=5000,GR

[testfac1@vm01 ~]$
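By default sprio prints each component already multiplied by its weight, so dividing a component by the matching weight from `sprio -w` recovers the normalized factor (sprio's `-n` option displays normalized factors directly). The sketch below does this for job 65548 from the output above; for example, the FAIRSHARE component 51077 divided by the weight 100000 gives a fair-share factor of about 0.511. `normalized_factors` is a hypothetical helper.

```python
from typing import Dict

# Weights from the `sprio -w` output above.
WEIGHTS = {"age": 10000, "fairshare": 100000, "jobsize": 1000, "partition": 15000, "qos": 1000}

# Weighted components for job 65548 from the `sprio --users=...` output above.
JOB_65548 = {"age": 1, "fairshare": 51077, "jobsize": 1000, "partition": 10000, "qos": 0}


def normalized_factors(weighted: Dict[str, int], weights: Dict[str, int]) -> Dict[str, float]:
    """Divide each weighted component by its weight to recover a factor in [0, 1].

    Components whose weight is zero or missing are skipped.
    """
    return {
        name: value / weights[name]
        for name, value in weighted.items()
        if weights.get(name)
    }


if __name__ == "__main__":
    for name, factor in normalized_factors(JOB_65548, WEIGHTS).items():
        print(f"{name:>10} {factor:.5f}")
```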

Fairshare Factor

The fairshare factor is calculated from two key components: usage and shares. Conceptually, fairshare is Shares / Usage.

Usage is based on the CPU time allocated to prior jobs. Usage decays over time, so recent usage is weighted more heavily than older usage.
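A simplified model of the decay: the PriorityDecayHalfLife setting in slurm.conf controls how quickly past usage loses weight. The sketch below assumes a pure exponential half-life decay, so usage recorded one half-life ago counts half as much as usage recorded now. Slurm applies the decay periodically rather than with this closed form, the 7-day half-life is only an example value, and `decayed_usage` is a hypothetical helper.

```python
from typing import Iterable, Tuple


def decayed_usage(records: Iterable[Tuple[float, float]], half_life_days: float = 7.0) -> float:
    """Sum CPU-seconds, weighting each record by 0.5 ** (age / half-life).

    records: (cpu_seconds, age_in_days) pairs.
    Assumption: simplified continuous half-life decay, for illustration only.
    """
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return sum(cpu * 0.5 ** (age / half_life_days) for cpu, age in records)


if __name__ == "__main__":
    # 3600 CPU-seconds used today plus 3600 used one week ago, 7-day half-life.
    print(decayed_usage([(3600.0, 0.0), (3600.0, 7.0)]))  # 5400.0
```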

Shares are like slices of a pie and represent the portion of the system that the scheduler should try to dedicate to a particular user. Within an account, the size of the "pie" is the sum of the shares allocated to its users, and each user's slice is shares_self / sum(shares of self and siblings). Every account has the same Shares value as the other accounts; within each account, users can have different numbers of shares, set by the account coordinator.

As usage increases, the fairshare factor decreases. As a user's shares increase relative to sibling users, that user's fairshare factor increases relative to theirs. See the Slurm Fair Tree fairshare algorithm documentation for an explanation of how the algorithm works.

Fairshare Calculations

Within an account, the fairshare calculations are referred to as Level Fairshare (Level FS) and can be viewed with sshare -laA <youraccount>.

The equation is:

Level FS = Norm Shares / Effective Usage

where:
Norm Shares = Raw Shares / sum of (self + siblings) Raw Shares
Effective Usage = Raw Usage / account's Raw Usage
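The Level FS equation can be checked by hand. The sketch below computes it for one user from the raw shares and raw usage of that user and its siblings; the usage numbers are illustrative only, not taken from this system. It treats a user with zero usage as having an infinite Level FS, an assumption for the division-by-zero case, and `level_fs` is a hypothetical helper.

```python
import math
from typing import Dict


def level_fs(user: str, raw_shares: Dict[str, int], raw_usage: Dict[str, float]) -> float:
    """Level FS = Norm Shares / Effective Usage for `user` among its siblings.

    raw_shares and raw_usage must cover the user and all siblings in the account.
    """
    norm_shares = raw_shares[user] / sum(raw_shares.values())
    account_usage = sum(raw_usage.values())
    effective_usage = raw_usage[user] / account_usage if account_usage else 0.0
    if effective_usage == 0.0:
        return math.inf  # assumption: no usage means infinitely underserved
    return norm_shares / effective_usage


if __name__ == "__main__":
    # Illustrative shares and usage (CPU-seconds) for three sibling users.
    shares = {"testuser1": 2, "testuser2": 3, "testuser3": 5}
    usage = {"testuser1": 100.0, "testuser2": 300.0, "testuser3": 600.0}
    for name in shares:
        print(f"{name:>10} {level_fs(name, shares, usage):.3f}")
```

A Level FS above 1 means the user has consumed less than its share relative to its siblings; below 1 means it has consumed more.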