Slurm account coordinator
The Paramshakti HPC system treats each faculty adviser as the head of a research group. Each research group has a
Slurm "account" that all members of the research group belong to, including the faculty adviser. Each faculty adviser is set as an "account coordinator" for that account.
A Slurm account coordinator is a user who can modify settings for users in that Slurm account. The coordinator can also hold and cancel jobs submitted by other users in the account.
Controlling Jobs and shares by account coordinator
Job control command | Explanation
------------------- | -----------
squeue -A <account> | List all jobs submitted under the coordinator's Slurm account
squeue -u <student> | List jobs submitted by the student's user account
scancel -u <student> | Cancel all jobs of the student's user account
scancel -A <account> | Cancel all jobs submitted under the coordinator's Slurm account
scontrol <hold/uhold/release> <jobid> | Hold the job, place a hold that the job owner can release (uhold), or release a held job
sacctmgr modify user <student> where Account=prj_account set GrpTRESMins=cpu=6000000 | Limit the user's jobs to a total of 6000000 CPU core-minutes (equivalent to 100000 SU) charged against the project account.
sacctmgr modify user <student> where Account=prj_account set GrpCPUs=64 | Limit the number of processor cores the user can use at once, applicable to all partitions. Set the value to -1 to clear the limit.
sacctmgr modify user where user=<student1> Account=prj_account set Fairshare=10 | All new users are created with Rawshare=1. Account coordinators can adjust the Rawshare of users in their account. This number is relative to the other users in that account: whatever normalized share is available to the account is allocated to its users in proportion to their Rawshare values.
sbalance | Check the available account balance
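The GrpTRESMins value in the table follows from the service-unit figure by simple arithmetic. A minimal Python sketch, assuming (as the figures 6000000 and 100000 above imply) that 1 SU corresponds to one CPU core-hour; the function name is hypothetical and not part of Slurm:

```python
def su_to_cpu_minutes(service_units: int, minutes_per_su: int = 60) -> int:
    """Convert service units to the CPU-minute value used in GrpTRESMins=cpu=<N>.

    Assumption: 1 SU = 1 CPU core-hour = 60 CPU core-minutes.
    """
    return service_units * minutes_per_su


limit = su_to_cpu_minutes(100000)
print(f"GrpTRESMins=cpu={limit}")  # prints GrpTRESMins=cpu=6000000
```

The printed string can be pasted into the "set" part of the sacctmgr command shown in the table.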
Verify associated users with an account
To verify the users associated with a test faculty adviser account called “testfac1”, use the command below. Here four test users, along with the faculty adviser as a user (five members in total), are associated with the account "testfac1". The first row, which has an empty User column, is the association of the account itself.
[testfac1@vm01 ~]$ sacctmgr show assoc where Account=testfac1 format=account%15,user%15,qos%25
        Account            User                       QOS
--------------- --------------- -------------------------
       testfac1                                  testfac1
       testfac1        testfac1                  testfac1
       testfac1       testuser1                  testfac1
       testfac1       testuser2                  testfac1
       testfac1       testuser3                  testfac1
       testfac1       testuser4                  testfac1
[testfac1@vm01 ~]$
View and change raw shares and normalized shares of users associated with an account
To view the raw shares and normalized shares of the users associated with a faculty adviser account called “testfac1”, use the command below.
[testfac1@vm01 ~]$ sshare -A testfac1 -a -o Account,User,RawShares,NormShares
             Account       User  RawShares  NormShares
-------------------- ---------- ---------- -----------
testfac1                                 1    0.009836
testfac1               testfac1          1    0.001967
testfac1              testuser1          1    0.001967
testfac1              testuser2          1    0.001967
testfac1              testuser3          1    0.001967
testfac1              testuser4          1    0.001967
[testfac1@vm01 ~]$
Use the commands below to redistribute the normalized shares of the parent account (i.e. the faculty adviser account) among the members of the research group.
[testfac1@vm01 ~]$ sacctmgr modify user where user=testfac1 set Fairshare=2
[testfac1@vm01 ~]$ sacctmgr modify user where user=testuser1 set Fairshare=3
[testfac1@vm01 ~]$ sacctmgr modify user where user=testuser2 set Fairshare=5
[testfac1@vm01 ~]$ sacctmgr modify user where user=testuser3 set Fairshare=0
[testfac1@vm01 ~]$ sacctmgr modify user where user=testuser4 set Fairshare=1
After executing the commands above, the normalized shares of the parent account are distributed among the members of the group in the ratio 2:3:5:0:1 (output given below).
[testfac1@vm01 ~]$ sshare -a -l -A testfac1 -o Account,User,RawShares,NormShares,FairShare
             Account       User  RawShares  NormShares  FairShare
-------------------- ---------- ---------- ----------- ----------
testfac1                                 1    0.009836   1.000000
testfac1               testfac1          2    0.001788   1.000000
testfac1              testuser1          3    0.002683   1.000000
testfac1              testuser2          5    0.004471   1.000000
testfac1              testuser3          0    0.000000   0.000000
testfac1              testuser4          1    0.000894   1.000000
[testfac1@vm01 ~]$
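The NormShares column above can be reproduced by hand: each user receives the parent account's normalized share multiplied by that user's raw share divided by the sum of all raw shares in the account. A minimal Python sketch of that arithmetic, using the values from the sshare output (the function name is hypothetical, and the parent value 0.009836 is itself a rounded figure, so the last digit may differ from what Slurm computes internally):

```python
def normalized_shares(parent_norm_share, raw_shares):
    """Split a parent account's normalized share among users by raw share."""
    total = sum(raw_shares.values())
    if total == 0:
        # No shares assigned at all: nobody receives anything.
        return {user: 0.0 for user in raw_shares}
    return {user: parent_norm_share * raw / total
            for user, raw in raw_shares.items()}


# Raw shares set with "sacctmgr modify user ... set Fairshare=N" above.
raw = {"testfac1": 2, "testuser1": 3, "testuser2": 5,
       "testuser3": 0, "testuser4": 1}
norm = normalized_shares(0.009836, raw)

for user, share in norm.items():
    print(f"{user:<10} {raw[user]:>2} {share:.6f}")
```

For example, testuser2 gets 0.009836 * 5 / 11, which rounds to 0.004471, matching the output above.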
Job priority
Job priority is made up of several components, also known as factors. Job size, queue time, and fairshare are some of the factors available in Slurm. Each factor has an integer weight that is multiplied by a decimal value (0.0 to 1.0) representing what proportion of the weight it should receive. The results for each factor are summed up to calculate the job priority. The job's priority at any given time will be a weighted sum of all the factors that have been enabled in the slurm.conf file. Job priority can be expressed as:
Job_priority =
site_factor +
(PriorityWeightAge) * (age_factor) +
(PriorityWeightAssoc) * (assoc_factor) +
(PriorityWeightFairshare) * (fair-share_factor) +
(PriorityWeightJobSize) * (job_size_factor) +
(PriorityWeightPartition) * (partition_factor) +
(PriorityWeightQOS) * (QOS_factor) +
SUM(TRES_weight_cpu * TRES_factor_cpu,
TRES_weight_<type> * TRES_factor_<type>,
...)
- nice_factor
Please see the Slurm documentation for more details.
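The weighted sum above can be sketched in a few lines of Python. This is an illustration of the formula only, not Slurm's implementation: the function name and the factor values are hypothetical, the weights are the ones reported by "sprio -w" later in this section, and the association term is left out because its weight does not appear in that output.

```python
def job_priority(weights, factors, site_factor=0.0, tres_terms=(), nice_factor=0.0):
    """Weighted sum of priority factors.

    weights    -- integer weight per factor name
    factors    -- value between 0.0 and 1.0 per factor name
    tres_terms -- iterable of (TRES_weight, TRES_factor) pairs
    """
    weighted = sum(weights[name] * factors.get(name, 0.0) for name in weights)
    tres = sum(weight * factor for weight, factor in tres_terms)
    return site_factor + weighted + tres - nice_factor


# Weights as shown by "sprio -w" on this system.
weights = {"age": 10000, "fairshare": 100000, "jobsize": 1000,
           "partition": 15000, "qos": 1000}
# Hypothetical factor values, for illustration only.
factors = {"age": 0.0001, "fairshare": 0.5, "jobsize": 1.0,
           "partition": 0.5, "qos": 0.0}

print(round(job_priority(weights, factors)))  # prints 58501
```

With these weights the fairshare term dominates: a change of 0.1 in the fairshare factor moves the priority by 10000, as much as the whole age weight.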
To print the job priorities for jobs of specific users use the command “sprio --users=testuser1,testuser2”
[testfac1@vm01 ~]$ sprio --users=testuser1,testuser2
JOBID USER PRIORITY AGE FAIRSHARE JOBSIZE PARTITION QOS
65548 testuser1 62079 1 51077 1000 10000 0
65549 testuser2 62080 1 51078 1000 10000 0
To print the configured weights for each priority component use command “sprio -w”
[testfac1@vm01 ~]$ sprio -w
  JOBID PARTITION   PRIORITY       SITE        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS                 TRES
Weights                               1      10000     100000       1000      15000       1000 CPU=1000,Mem=5000,GR
[testfac1@vm01 ~]$
Fairshare Factor
Fairshare factor is calculated based on two key components: usage and shares. Fairshare is essentially just Shares / Usage.
Usage is based on the cputime allocated to prior jobs. The usage decays over time such that recent usage is weighted more heavily.
Shares are similar to a slice of a pie and represent the amount of the system that the scheduler should try to dedicate to a particular user. Within an account, the size of the "pie" is the sum of the shares allocated to each of its users. Each user's slice of the pie is shares_self / sum(shares_self_plus_siblings). Each account has the same Shares value as other accounts. Within each account users can have different amounts of shares, determined by the account coordinator.
As usage increases, the fairshare factor decreases. As Shares increases relative to sibling users, the fairshare factor increases relative to sibling users. Please see the Slurm documentation for an explanation of how the algorithm works.
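To illustrate what "usage decays over time" means, here is a minimal Python sketch of a half-life decay. It is an assumption-laden illustration rather than Slurm's code: the function name, the sample values, and the 7-day half-life are hypothetical, and the half-life actually in effect is a site setting (PriorityDecayHalfLife in slurm.conf) that this document does not state.

```python
def decayed_usage(samples, half_life_days):
    """Sum past usage, halving the weight of a sample every half_life_days.

    samples -- iterable of (cpu_seconds, age_in_days) pairs
    """
    return sum(cpu * 0.5 ** (age / half_life_days) for cpu, age in samples)


# Three hypothetical jobs of one CPU-hour each: today, 7 days ago, 14 days ago.
samples = [(3600.0, 0.0), (3600.0, 7.0), (3600.0, 14.0)]
print(decayed_usage(samples, half_life_days=7.0))  # prints 6300.0
```

The job from today counts in full, the one from a week ago counts half, and the one from two weeks ago counts a quarter, so recent usage weighs most heavily.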
Fairshare Calculations
Within the account, the fairshare calculations are referred to as Level Fairshare and are visible using sshare -laA youraccount.
The equation is:
Level FS = Norm Shares / Effective Usage
where:
Norm Shares = Raw Shares / sum of (self + siblings) Raw Shares
Effective Usage = Raw Usage / account's Raw Usage
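The three equations above can be worked through with a short Python sketch. The user names, raw shares and raw usage figures are hypothetical, the function name is not part of Slurm, and the account's raw usage is assumed here to be the sum of its users' raw usage. Users with no usage get an infinite Level FS in this sketch, since the division is undefined.

```python
def level_fairshare(raw_shares, raw_usage):
    """Level FS = Norm Shares / Effective Usage for each user in one account."""
    total_shares = sum(raw_shares.values())
    total_usage = sum(raw_usage.values())  # assumed to be the account's raw usage
    result = {}
    for user, shares in raw_shares.items():
        norm_shares = shares / total_shares if total_shares else 0.0
        usage = raw_usage.get(user, 0.0)
        effective_usage = usage / total_usage if total_usage else 0.0
        # No recorded usage: treat Level FS as unbounded.
        result[user] = norm_shares / effective_usage if effective_usage > 0 else float("inf")
    return result


raw_shares = {"userA": 2, "userB": 3, "userC": 5}              # hypothetical
raw_usage = {"userA": 400.0, "userB": 300.0, "userC": 300.0}   # hypothetical

for user, fs in level_fairshare(raw_shares, raw_usage).items():
    print(f"{user}: {fs:.3f}")
```

Here userA holds 20% of the shares but accounts for 40% of the usage, giving a Level FS of 0.5, while userC holds 50% of the shares with 30% of the usage, giving about 1.667. A value above 1 means the user has consumed less than their share of the account's usage.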