This file describes changes in recent versions of Slurm. It primarily
documents those changes that are of interest to users and administrators.
* Changes in Slurm 21.08.0pre1
==============================
-- slurmrestd - add v0.0.37 OpenAPI plugin.
-- slurmrestd/v0.0.37 - rename standard_in -> standard_input.
-- slurmrestd/v0.0.37 - rename standard_out -> standard_output.
-- slurmdbd - Improve log messages about more time than possible during rollups.
-- Changed the --format handling for negative field widths (left justified)
to apply to the column headers as well as the printed fields.
-- Add LimitFactor to the QOS. A float that is factored into an association's
[Grp|Max]TRES limits. For example, if the LimitFactor is 2, then an
association with a GrpTRES of 30 CPUs would be allowed to allocate 60
CPUs when running under this QOS.
-- slurmrestd - Pass SLURM_NO_CHANGE_IN_DATA to client as 304 (Not Modified).
-- slurmrestd/v0.0.37 - Add update_time field to Jobs query to allow clients
to fetch only the jobs that changed since a given timestamp.
-- Reset job eligible time when job is manually held.
-- Add DEBUG_FLAG_JAG to improve logging related to job account gathering.
-- Convert logging in account_gather/common to DEBUG_FLAG_JAG.
-- Add more logging for jag_common_poll_data() when prec_extra() called.
-- slurmrestd/v0.0.37 - add API to fetch reservation(s) info.
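The LimitFactor entry above is plain arithmetic: each TRES value in an association's [Grp|Max]TRES limit is multiplied by the QOS LimitFactor. The following is a minimal sketch of that arithmetic, not Slurm's implementation. The function names are hypothetical, truncation of fractional results is an assumption, and unset or unlimited limits are not modeled.

```python
# Hypothetical illustration of the QOS LimitFactor arithmetic; not Slurm code.
# Assumption: fractional results are truncated toward zero.

def scaled_tres_limit(limit: int, limit_factor: float) -> int:
    """Return one effective TRES limit after applying the QOS LimitFactor."""
    return int(limit * limit_factor)


def scaled_tres_limits(limits: dict, limit_factor: float) -> dict:
    """Apply the factor to every TRES in a limit such as GrpTRES=cpu=30,node=4."""
    return {tres: scaled_tres_limit(value, limit_factor)
            for tres, value in limits.items()}


if __name__ == "__main__":
    # Example from the entry: GrpTRES of 30 CPUs with LimitFactor=2.
    print(scaled_tres_limits({"cpu": 30}, 2.0))  # {'cpu': 60}
```

A factor below 1 tightens the limit in the same way, for example 30 CPUs with a factor of 0.5 yields 15.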
* Changes in Slurm 20.11.4
==========================
* Changes in Slurm 20.11.3
==========================
-- Fix segfault when parsing bad "#SBATCH hetjob" directive.
-- Allow gpu:<type> node GRES specifications without a count in slurm.conf.
-- PMIx - Don't set UCX_MEM_MMAP_RELOC for older versions of UCX (pre 1.5).
-- Don't green-light any GPU validation when core conversion fails.
-- Allow updates in the database to a reservation that starts in the future.
-- Better check/handling of primary key collision in reservation table.
-- Improve reported error and logging in _build_node_list().
-- Fix uninitialized variable in _rpc_file_bcast() which could lead to an
incorrect error return from sbcast / srun --bcast.
-- mpi/cray_shasta - fix use-after-free on error in _multi_prog_parse().
-- Cray - Handle setting correct prefix for cpuset cgroup with respect to
expected_usage_in_bytes. This fixes Cray's OOM killer.
-- mpi/pmix: Fix PMIx_Abort support.
-- Don't reject jobs allocating more cores than tasks with MaxMemPerCPU.
-- Fix false error message complaining about oversubscribe in cons_tres.
-- scrontab - fix parsing of empty lines.
-- Fix regression causing spank_process_option errors to be ignored.
-- Avoid making multiple interactive steps.
-- Fix corner case issues where step creation should fail.
-- Fix job rejection when --gres is less than --gpus.
-- Fix regression causing spank prolog/epilog not to be called unless the
spank plugin was loaded in slurmd context.
-- Fix regression preventing SLURM_HINT=nomultithread from being used
to set defaults for salloc->srun, sbatch->srun sequence.
-- Reject job credential if non-superuser sets the LAUNCH_NO_ALLOC flag.
-- Make it so srun --no-allocate works again.
-- jobacct_gather/linux - Don't count memory on tasks that have already
finished.
-- Fix 19.05/20.02 batch steps talking with a 20.11 slurmctld.
-- jobacct_gather/common - Do not process jobacct records with the same taskid
when calling prec_extra.
-- Cleanup all tracked jobacct tasks when extern step child process finishes.
-- slurmrestd/dbv0.0.36 - Correct structure of dbv0.0.36_tres_list.
-- Fix regression causing task/affinity and task/cgroup to be out of sync when
configured ThreadsPerCore is different from the physical threads per core.
-- Fix situation when --gpus is given but not max nodes (-N1-1) in a job
allocation.
-- Interactive step - ignore cpu bind and mem bind options, and do not set
the associated environment variables which lead to unexpected behavior
from srun commands launched within the interactive step.
-- Handle exit code from pipe when using UCX with PMIx.
* Changes in Slurm 20.11.2
==========================
-- Fix older versions of sacct not working with 20.11.
-- Fix slurmctld crash when using a pre-20.11 srun in a job allocation.
-- Correct logic problem in _validate_user_access.
-- Fix libpmi to initialize Slurm configuration correctly.
* Changes in Slurm 20.11.1
==========================
-- Fix spelling of "overcomited" to "overcomitted" in sreport's cluster
utilization report.
-- Silence debug message about shutting down backup controllers if none are
configured.
-- Don't create interactive srun until PrologSlurmctld is done.
-- Fix fd symlink path resolution.
-- Fix slurmctld segfault on subnode reservation restore after node
configuration change.
-- Fix resource allocation response message environment allocation size.
-- Ensure that details->env_sup is NULL terminated.
-- select/cray_aries - Correctly remove jobs/steps from blades using NPC.
-- cons_tres - Avoid max_node_gres when entire node is allocated with
--ntasks-per-gpu.
-- Allow NULL arg to data_get_type().
-- In sreport, have usage for a reservation contain all jobs that ran in the
reservation instead of just the ones that ran in the time specified. This
is consistent with the reservation report not being truncated to a time period.
-- Fix issue with sending wrong batch step id to a < 20.11 slurmd.
-- Add a job's alloc_node to lua for job modification and completion.
-- Fix regression getting a slurmdbd connection through the perl API.
-- Stop the extern step terminate monitor right after proctrack_g_wait().
-- Fix removing the normalized priority of assocs.
-- slurmrestd/v0.0.36 - Use correct name for partition field:
"min nodes per job" -> "min_nodes_per_job".
-- slurmrestd/v0.0.36 - Add node comment field.
-- Fix regression marking cloud nodes as "unexpectedly rebooted" after
multiple boots.
-- Fix slurmctld segfault in _slurm_rpc_job_step_create().
-- slurmrestd/v0.0.36 - Filter node states against NODE_STATE_BASE to avoid
the extended states all being reported as "invalid".
-- Fix race that can prevent the prolog for a requeued job from running.
-- cli_filter - add "type" to readily identify the CLI command in
use.
-- smail - reduce sleep before seff to 5 seconds.
-- Ensure SPANK prolog and epilog run without an explicit PlugStackConfig.
-- Disable MySQL automatic reconnection.
-- Fix allowing "b" after memory unit suffixes.
-- Fix slurmctld segfault with reservations without licenses.
-- Due to internal restructuring ahead of the 20.11 release, applications
calling libslurm MUST call slurm_init(NULL) before any API calls.
Otherwise the API call is likely to fail due to libslurm's internal
configuration not being available.
-- slurm.spec - allow custom paths for PMIx and UCX install locations.
-- Use rpath if enabled when testing for Mellanox's UCX libraries.
-- slurmrestd/dbv0.0.36 - Change user query for associations to optional.
-- slurmrestd/dbv0.0.36 - Change account query for associations to optional.
-- mpi/pmix - change the error handler error message to be more useful.
-- Add missing connection in acct_storage_p_{clear_stats, reconfig, shutdown}.
-- Perl API - fix issue when running in configless mode.
-- nss_slurm - avoid deadlock when stray sockets are found.
-- Display correct value for ScronParameters in 'scontrol show config'.
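One 20.11.1 fix allows a trailing "b" after a memory unit suffix, so that "10G" and "10Gb" are treated the same. The sketch below is a hypothetical parser written only to show that behavior. It is not Slurm's parser. It assumes a bare number means megabytes, that K/M/G/T are 1024-based multipliers, and that kilobyte values round up to whole megabytes.

```python
import re

# Hypothetical sketch, not Slurm's parser. Assumptions: a bare number is in
# megabytes, K/M/G/T are 1024-based, and an optional "b" may follow a unit
# (but a bare "b" with no unit is rejected).
_MEM_RE = re.compile(r"^(\d+)(?:([kmgt])b?)?$", re.IGNORECASE)


def parse_mem_mb(text: str) -> int:
    """Convert a size such as '512', '4096K', '10G' or '10Gb' to megabytes."""
    match = _MEM_RE.match(text.strip())
    if match is None:
        raise ValueError(f"invalid memory size: {text!r}")
    value = int(match.group(1))
    unit = (match.group(2) or "m").lower()
    if unit == "k":
        return (value + 1023) // 1024  # round partial megabytes up
    return value * {"m": 1, "g": 1024, "t": 1024 * 1024}[unit]
```

With this sketch, `parse_mem_mb("10G")` and `parse_mem_mb("10Gb")` both return 10240, and `parse_mem_mb("10b")` raises `ValueError`.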
* Changes in Slurm 20.11.0
==========================
-- x11 forwarding: fix race on setup that prevented X11 forwarding from
working within the new Interactive Step.
-- Fix various Coverity issues.
-- cons_tres - Fix DefCpuPerGPU
-- Make it so you can have a job with multiple partitions and multiple
reservations.
-- Fix primary controller assert when shutting down backup controllers.
-- Enforce invalid argument combinations with --ntasks-per-gpu
-- slurmrestd/auth_local - Verify username on slurm_rest_auth_p_apply()
-- Fix requeue of job on node failure.
-- Prevent a job from requesting too much memory if it
requests MEM_PER_CPUS and --threads-per-core < the number of threads
on a core.
-- slurmrestd - Avoid sending close header after body in
_operations_router_reject().
-- slurmrestd - Set new job environment for SLURM_JOB_NAME, SLURM_OPEN_MODE,
SLURM_JOB_DEPENDENCY, SLURM_PROFILE, SLURM_ACCTG_FREQ, SLURM_NETWORK and
SLURM_CPU_FREQ_REQ to match sbatch.
-- slurmrestd - Avoid defaulting open_mode to append for job submission.
-- Fix "scontrol takeover [backup]" hanging when specifying a backup > 1.
-- salloc now waits for PrologSlurmctld to finish before entering the shell.
* Changes in Slurm 20.11.0rc2
==============================
-- MySQL - Remove potential race condition when sending updates to a cluster
and commit_delay used.
-- Fixed regression in rc1 where sinfo et al would not show a node in a resv
state.
-- select/linear will now allocate up to the node's RealMemory when configured
with SelectTypeParameters=CR_Memory and --mem=0 specified. Previously, no
memory was accounted and no memory limits were applied to the job.
-- Remove unneeded lock check from running the slurmctld prolog for a job.
-- Fix duplicate key error on clean starts after slurmctld is killed.
-- Avoid double free of step_record_t in the slurmctld when node is removed
from config.
-- Zero out step_record_t's magic when freed.
-- Fix sacctmgr clearing QosLevel when trailing comma is used.
-- slurmrestd - fix a fatal() error when connecting over IPv6.
-- slurmrestd - add API to interface with slurmdbd.
-- mpi/cray_shasta - fix PMI port parsing for non-contiguous port ranges.
-- squeue and sinfo -O no longer repeat the last suffix specified.
-- cons_tres - fix regression regarding gpus with --cpus-per-task.
-- Avoid non-async-signal-safe function calls in X11 forwarding which can
lead to the extern step terminating unexpectedly.
-- Don't send job completion email for revoked federation jobs.
-- Fix device or resource busy errors on cgroup cleanup on older kernels.
-- Avoid binding to IPv6 wildcard address in slurmd if IPv6 is not explicitly
enabled.
-- Make ntasks_per_gres work with cpus_per_task.
-- Various alterations in reference to ntasks_per_tres.
-- slurmrestd - multiple changes to make Slurm's OpenAPI spec compatible with
https://openapi-generator.tech/.
-- nss_slurm - avoid loading slurm.conf to avoid issues on configless systems,
or systems with config files loaded on shared storage.
-- scrontab - add cli_filter hooks.
-- job_submit/lua - expose a "cron_job" flag to identify jobs submitted
through scrontab.
-- PMIx - fix potential buffer overflows from use of unpackmem().
CVE-2020-27745.
-- X11 forwarding - fix potential leak of the magic cookie when sent as an
argument to the xauth command. CVE-2020-27746.
* Changes in Slurm 20.11.0rc1
==============================
-- Fix corner case issue with interrupted resource allocation requests.
-- Pack all gres information in the slurmd to send to the stepd to help
reduce calls in the stepd to read gres.conf.
-- The example systemd unit files have been changed to the "simple" type of
operation, and the daemon will now run in the foreground within systemd
instead of daemonizing itself.
-- Add --gpu-bind=mask_gpu reusability functionality if tasks > elements.
-- Add separate unversioned libslurm_pmi.so library to use with libpmi.so.
-- Configurations including CR_Socket and AllowSpecResourcesUsage=NO will now
fatal if there are no allocatable sockets due to core specialization.
-- Make sacct get UID from database instead of from the username and a
system call. Add --use-local-uid option to sacct to use old behavior.
-- Limit number of jobs updated by as_mysql_flush_jobs_on_cluster() to avoid
boot loop failures in slurmdbd.
-- Add Autodetect option to NodeName line in gres.conf to override the global
Autodetect option.
-- Add NetworkRaw debugflag.
-- Make REQUEST_LAUNCH_PROLOG handler fail if PrologFlags includes Contain and
the credential has already expired when setting the memory limits.
-- Reject jobs that request more nodes than provided in job credential if
PrologFlags includes Contain.
-- Slurmdbd is now set to fatal if slurmdbd.conf file isn't owned by SlurmUser
or its mode is not set to 0600.
-- libsrun/opt - use slurm_option_reset() when ignoring ntasks_per_node.
-- Removed "regression" script from testsuite. Please use regression.py.
-- Avoid communication issues if TreeWidth greatly exceeds the node count
for a job.
-- accounting_storage/filetxt has been removed as an option.
-- Update and validate reservations after loading from state save.
-- Update and validate reservations after setting node to down, drain or
updating node state.
-- Change reservation selection order to attempt to reserve unreserved nodes
first, followed by reserved nodes under OVERLAP|MAINT reservations, and
finally all nodes in the partition for MAINT reservations.
-- Add [Accounting]StorageParameters slurm[dbd].conf parameter.
-- Improve detection and logging of incompatible options involving the
REPLACE[_DOWN] flags when creating/updating reservations.
-- Export SLURMD_NODENAME envvar to HealthCheckProgram.
-- Add SlurmctldParameters=user_resv_delete, which allows any user able to run
in a reservation to delete it.
-- Set default unit when parsing #BSUB -M to KB to match LSF documentation.
-- slurmrestd - fatal() when accept() returns an unexpected result.
-- slurmrestd - Parse multiple OpenAPI specifications for path resolution.
-- slurmrestd - Add v0.0.36 OpenAPI plugin.
-- slurmrestd/v0.0.36 - Add error schema.
-- slurmrestd/v0.0.36 - return array of nodes instead of dictionary.
-- slurmrestd/v0.0.36 - return array of partitions instead of dictionary
-- slurmrestd/v0.0.36 - return -1 (integer) instead of INFINITE (as a string)
-- slurmrestd/v0.0.36 - return array of pings instead of dictionary
-- slurmrestd/v0.0.36 - Simplify possible signals for canceling jobs.
-- slurmrestd/v0.0.36 - Simplify exclusive for jobs submissions.
-- slurmrestd/v0.0.36 - Simplify nodes for jobs submissions.
-- slurmrestd/v0.0.36 - Use "/slurm/v0.0.36/" as server instead of "/" to
simplify naming for clients.
-- Add 'scontrol update res=name skip' to skip the current/next recurring
reservation.
-- Add ability for reservations to be accessed by Linux Groups.
-- Let users submit to multiple reservations as they can partitions.
-- Report a wider range of error codes for sbcast when opening a file.
-- Rename acct_gather_energy/cray_aries to acct_gather_energy/pm_counters.
-- Removed gres_alloc and gres_req from job_record_t. TRES should be used
instead.
-- The JobCompLoc URL endpoint when the JobCompType=jobcomp/elasticsearch
plugin is enabled is now fully configurable and the plugin no longer appends
a hardcoded "/slurm/jobcomp" index and type suffix to it.
-- Add check to the reservation create/update logic to prevent an inconsistent
state without nodes and with no ANY_NODES flag with either Licenses,
BurstBuffer and/or Watts.
-- slurmrestd - allow the host to be optional when specifying the address to
listen on.
-- slurmrestd - Log numerical service name when referencing host port pairs.
-- slurmrestd - Log host port information in RFC3986 format.
-- sview - Remove (long-broken) batch job submission option.
-- Dynamic Future Nodes - slurmds started with -F[<feature>] will be
associated with a nodename in Slurm that matches the same hardware
configuration.
-- SlurmctldParameters=cloud_reg_addrs - Cloud nodes automatically get
NodeAddr and NodeHostname set from slurmd registration.
-- SlurmctldParameters=power_save[_min]_interval - Configure how often the
power save module looks to do work.
-- Add CLOUD state to sinfo --state filter list.
-- Add ability for sinfo state filtering to require all listed states.
-- Add the "Reserved" license count to 'scontrol show licenses'.
-- Don't display MailUser/MailType in scontrol show jobs if mail won't be sent.
-- Throw an error and ignore CpuSpecList if it cannot be translated into a
bitmap sized to the number of CPUs.
-- Validate at submission that --hint is mutually exclusive with --cpu-bind,
--ntasks-per-core, --threads-per-core or -B.
-- Make --exclusive the default for srun when launching a step, and add
--overlap to reverse this behavior.
-- Add --whole option to srun to allocate all resources on a node
in an allocation.
-- Allow --threads-per-core to influence task layout/binding.
-- Remove support for "default_gbytes" option from SchedulerParameters.
-- gres.conf - Add new MultipleFiles configuration entry to allow a single
GRES to manage multiple device files simultaneously.
-- Fix scontrol write config to output OverSubscribe instead of Shared.
-- job_submit/lua - print/access oversubscribe variable with "oversubscribe".
-- Remove SallocDefaultCommand option.
-- Add support for an "Interactive Step", designed to be used with salloc to
launch a terminal on an allocated compute node automatically.
-- Add time specification: "now-<x>" (i.e. subtract from the present)
-- Add IPv6 support. Must be explicitly enabled with EnableIPv6 in
CommunicationParameters.
-- Add LaunchParameters=mpir_use_nodeaddr configuration option.
-- Allow use of a target directory with "srun --bcast", and change the default
filename to include the node name as well.
-- Set -fno-omit-frame-pointer compiler flag.
-- Add --mail-type=INVALID_DEPEND option to salloc, sbatch, and srun.
-- Fix passing names with commas to the slurmdbd.
-- squeue - put sorted start times of "N/A" or 0 at the end of the list.
-- Add correspond_after_task_cnt to SchedulerParameters
-- Fix nodes not being considered unresponsive/down for ResumeTimeout after
reboot or power_up.
-- Change "scontrol reboot ASAP" to use next_state=resume logic.
-- Exclude HetJobs from GANG scheduling operations.
-- Add scrontab as a new command.
-- Enable -lnodes=#:gpus=# in #PBS/qsub -l nodes syntax.
-- Add admin-settable "Comment" field to each Node.
-- Fix show runaway and/on hidden partitions for >= Operator.
-- Add --ntasks-per-gpu option.
-- Add --gpu-bind=single option.
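The 20.11.0rc1 entry validating --hint at submission describes a simple mutual-exclusion rule. The sketch below encodes that rule as stated in the entry, using the CLI option spellings (-B is the short form of --extra-node-info). The dict-of-options input and the function names are hypothetical and are not Slurm's internal API.

```python
# Hypothetical sketch of the --hint exclusivity check described in the entry;
# the option dict and function names are illustrative assumptions, not Slurm code.
HINT_CONFLICTS = ("--cpu-bind", "--ntasks-per-core", "--threads-per-core", "-B")


def hint_conflicts(opts: dict) -> list:
    """Return the options present alongside --hint that conflict with it."""
    if "--hint" not in opts:
        return []
    return [name for name in HINT_CONFLICTS if name in opts]


def validate_hint(opts: dict) -> None:
    """Reject a submission that combines --hint with a conflicting option."""
    conflicts = hint_conflicts(opts)
    if conflicts:
        raise ValueError("--hint is mutually exclusive with " + ", ".join(conflicts))
```

For example, `{"--hint": "nomultithread", "--cpu-bind": "cores"}` would be rejected, while `{"--hint": "nomultithread"}` alone passes.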
* Changes in Slurm 20.02.7
==========================
-- cons_tres - Fix DefCpuPerGPU
-- select/cray_aries - Correctly remove jobs/steps from blades using NPC.
-- Fix false positive oom-kill events on extern step termination when
jobacct_gather/cgroup configured.
-- Ensure SPANK prolog and epilog run without an explicit PlugStackConfig.
* Changes in Slurm 20.02.6
==========================
-- Fix sbcast --fanout option.
-- Tighten up keyword matching for --dependency.
-- Fix "squeue -S P" not sorting by partition name.
-- Fix segfault in slurmctld if group resolution fails during job credential
creation.
-- sacctmgr - Honor PreserveCaseUser when creating users with load command.
-- Avoid attempting to schedule jobs on magnetic reservations when they aren't
allowed.
-- Always make sure we clear the magnetic flag from a job.
-- In backfill avoid NULL pointer dereference.
-- Fix segfault at slurmctld shutdown when a magnetic reservation exists.
-- Silence security warning when Slurm is trying a job for a
magnetic reservation.
-- Have sacct exit correctly when a user/group id isn't valid.
-- Remove extra \n from invalid user/group id error message.
-- Detect when extern steps trigger OOM events and mark extern step correctly.
-- pam_slurm_adopt - permit root access to the node before reading the config
file, which will give root a chance to fix the config if missing or broken.
-- Reset DefMemPerCPU, MaxMemPerCPU, and TaskPluginParam (among other minor
flags) on reconfigure.
-- Fix incorrect memory handling of mail_user when updating mail_type=none.
-- Handle mail_user and mail_type independently.
-- Fix thread-safety issue with assoc_mgr_get_admin_level().
-- Ignore step features if equal to job features
-- Fix slurmstepd segfault caused by incorrect strtok() usage.
-- CRAY - Remove unneeded ATP spank plugin from ansible playbook.
-- Fix core selection for exclusive step on nodes where CPUs == Cores.
-- Fix topology aware scheduling reservations.
-- Fix loading cpus_per_task on a job from state file.
-- When a partition has no nodes fix estimate of max cpus possible on a job
trying to run there.
-- In cons_tres fix sorting functions to handle node/topo weight
correctly.
-- Fix regression in 20.02.5 where you couldn't request constraints with a
simple & and a count.
-- Limit the number of threads for servicing emails.
-- Avoid possible double init race condition in assoc_mgr_lock().
-- Add missing locks in slurm_cred_handle_reissue().
-- Add missing locks in slurm_cred_revoked().
-- Fix slurmctld segfault due to tight reconfigure RPC requests by serializing
the RPC handler processing logic.
-- Use _exit() instead of exit() after fork().
-- Perl API - fix hang reading config in configless environments.
-- slurmrestd - request detailed node information to populate GRES fields.
-- slurmrestd - request detailed job information to populate GRES fields.
-- Fix job license update bug on array tasks or hetjob components.
-- Fix job partition update bug on array tasks or hetjob components.
-- Fix slurmctld segfault on _pick_best_nodes() when processing a job request
with XOR'd constraints and no nodeset has the feature.
-- Fix job requests rejected with incorrect NODE_CONFIG_UNAVAIL when nodes are
actually only busy due to an overlapping MAINT reservation.
-- Fix sacctmgr allowing the deletion of a user's default account.
-- Fix srun and other Slurm commands running within a "configless" salloc
terminal.
-- MySQL - Correctly handle QOS deletion from association tables.
-- Fix update of First_Cores flag in a reservation.
-- Fix parsing of update reservation flags.
-- Fix --switches for cons_tres.
-- Retry connection on ETIMEDOUT in slurm_send_addr_recv_msgs.
-- Fix wait for RPC_PROLOG_LAUNCH notification 2*MessageTimeout.
-- Make slurm_send_addr_recv_msgs conn_timeout match rpc_wait in slurmd.
-- pam_slurm_adopt - operate correctly even if ConstrainRAMSpace is not
enabled on the node by falling back to the cpuset, devices, or freezer
subsystem instead.
-- slurmrestd - use memmove() instead of memcpy() in string manipulation
to avoid bugs related to overlapping memory regions.
-- slurmrestd - avoid xassert() failure on duplicated headers in request.
-- Remove stale 'ReqNodeNotAvail, Reserved for maintenance' message from
pending jobs after a maintenance reservation ended.
-- MySQL - Stop steps from printing when outside time range.
-- Fixed kmem limit calculation to use MaxKmemPercent correctly.
-- Fix initialization of cpuset.mems/cpus on uid cgroup subdir.
-- MySQL - Remove potential race condition when sending updates to a cluster
and commit_delay used.
-- Avoid double free of step_record_t in the slurmctld when node is removed
from config.
-- cons_tres - fix regression regarding gpus with --cpus-per-task.
-- Don't send job completion email for revoked federation jobs.
-- PMIx - fix potential buffer overflows from use of unpackmem().
CVE-2020-27745.
-- X11 forwarding - fix potential leak of the magic cookie when sent as an
argument to the xauth command. CVE-2020-27746.
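The 20.02.6 entry about using _exit() instead of exit() after fork() reflects a general POSIX rule. A forked child that is about to terminate should skip the normal exit path, because exit() runs atexit handlers and flushes stdio buffers inherited from the parent. That can duplicate pending output or run cleanup twice. Slurm's change is in C. The sketch below shows the same pattern in Python with `os._exit()`, which documents the same semantics. It is POSIX-only, and the function name is hypothetical.

```python
import os


def run_in_child(status: int) -> int:
    """Fork, end the child with os._exit(status), and return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: any work would go here. os._exit() terminates immediately,
        # without running atexit handlers or flushing inherited stdio buffers.
        os._exit(status)
    # Parent: reap the child and decode its wait status.
    _, wait_status = os.waitpid(pid, 0)
    if os.WIFEXITED(wait_status):
        return os.WEXITSTATUS(wait_status)
    return -1  # child was killed by a signal rather than exiting normally
```

Calling `run_in_child(3)` returns 3 in the parent. A `sys.exit(3)` in the child would also report 3, but it would first run the interpreter's normal shutdown in the child process.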
* Changes in Slurm 20.02.5
==========================
-- Fix leak of TRESRunMins when job time is changed with --time-min
-- pam_slurm - explicitly initialize slurm config to support configless mode.
-- scontrol - Fix exit code when creating/updating reservations with wrong
Flags.
-- When a GRES has a no_consume flag, report 0 for allocated.
-- Fix cgroup cleanup by jobacct_gather/cgroup.
-- When creating reservations/jobs don't allow counts on a feature unless
using an XOR.
-- Improve number of boards discovery
-- Fix updating a reservation NodeCnt on a zero-count reservation.
-- slurmrestd - provide an explicit error message when PSK auth fails.
-- cons_tres - fix job requesting single gres per-node getting two or more
nodes with fewer CPUs than requested per-task.
-- cons_tres - fix calculation of cores when using gres and cpus-per-task.
-- cons_tres - fix job not getting access to socket without GPU or with less
than --gpus-per-socket when not enough cpus available on required socket
and not using --gres-flags=enforce-binding.
-- Fix HDF5 type version build error.
-- Fix creation of CoreCnt only reservations when the first node isn't
available.
-- Fix wrong DBD Agent queue size in sdiag when using accounting_storage/none.
-- Improve job constraints XOR option logic.
-- Fix preemption of hetjobs when needed nodes not in leader component.
-- Fix wrong bit_or() corrupting potential preemptor jobs' node bitmap, causing
bad node deallocations and even allocation of nodes from other partitions.
-- Fix double-deallocation of preempted non-leader hetjob components.
-- slurmdbd - prevent truncation of the step nodelists over 4095.
-- Fix nodes remaining in drain state after rebooting with ASAP option.
* Changes in Slurm 20.02.4
==========================
-- srun - suppress job step creation warning message when waiting on
PrologSlurmctld.
-- slurmrestd - fix incorrect return values in data_list_for_each() functions.
-- mpi/pmix - fix issue where HetJobs could fail to launch.
-- slurmrestd - set content-type header in responses.
-- Fix cons_res GRES overallocation for --gres-flags=disable-binding.
-- Fix cons_res incorrectly filtering cores with respect to GRES locality for
--gres-flags=disable-binding requests.
-- Fix regression where a dependency on multiple jobs in a single array using
underscores would only add the first job.
-- slurmrestd - fix corrupted output due to incorrect use of memcpy().
-- slurmrestd - address a number of minor Coverity warnings.
-- Handle retry failure when slurmstepd is communicating with srun correctly.
-- Fix jobacct_gather possibly duplicating stats when _is_a_lwp error shows up.
-- Fix tasks binding to GRES which are closest to the allocated CPUs.
-- Fix AMD GPU ROCM 3.5 support.
-- Fix handling of job arrays in sacct when querying specific steps.
-- slurmrestd - avoid fallback to local socket authentication if JWT
authentication is ill-formed.
-- slurmrestd - restrict ability of requests to use different authentication
plugins.
-- slurmrestd - unlink named unix sockets before closing.
-- slurmrestd - fix invalid formatting in openapi.json.
-- Fix batch jobs stuck in CF state on FrontEnd mode.
-- Add a separate explicit error message when rejecting changes to active node
features.
-- cons_common/job_test - fix slurmctld SIGABRT due to double-free.
-- Fix updating reservations to set the duration correctly if updating the
start time.
-- Fix update reservation to promiscuous mode.
-- Fix override of job tasks count to max when ntasks-per-node present.
-- Fix min CPUs per node not being at least CPUs per task requested.
-- Fix CPUs allocated to match CPUs requested when requesting GRES and
threads per core equal to one.
-- Fix NodeName config parsing with Boards and without CPUs.
-- Ensure SLURM_JOB_USER and SLURM_JOB_UID are set in SrunProlog/Epilog.
-- Fix error messages for certain invalid salloc/sbatch/srun options.
-- pmi2 - clean up sockets at step termination.
-- Fix 'scontrol hold' to work with 'JobName'.
-- sbatch - handle --uid/--gid in #SBATCH directives properly.
-- Fix race condition in job termination on slurmd.
-- Print specific error messages if trying to use certain
priority/multifactor factors that cannot work without SlurmDBD.
-- Avoid partial GRES allocation when --gpus-per-job is not satisfied.
-- Cray - Avoid referencing a variable outside of its correct scope when
dealing with creating steps within a het job.
-- slurmrestd - correctly handle larger addresses from accept().
-- Avoid freeing wrong pointer with SlurmctldParameters=max_dbd_msg_action
with another option after that.
-- Restore MCS label when suspended job is resumed.
-- Fix insufficient lock levels.
-- slurmrestd - use errno from job submission.
-- Fix "user" filter for sacctmgr show transactions.
-- Fix preemption logic.
-- Fix no_consume GRES for exclusive (whole node) requests.
-- Fix regression in 20.02 that caused an infinite loop in slurmctld when
requesting --distribution=plane for the job.
-- Fix parsing of the --distribution option.
-- Add CONF READ_LOCK to _handle_fed_send_job_sync.
-- prep/script - always call slurmctld PrEp callback in _run_script().
-- Fix node estimation for jobs that use GPUs or --cpus-per-task.
-- Fix jobcomp, job_submit and cli_filter Lua implementation plugins causing
slurmctld and/or job submission CLI tools segfaults due to bad return
handling when the respective Lua script failed to load.
-- Fix propagation of gpu options through hetjob components.
-- Add SLURM_CLUSTERS environment variable to scancel.
-- Fix packing/unpacking of "unlinked" jobs.
-- Connect slurmstepd's stderr to srun for steps launched with --pty.
-- Handle MPS correctly when doing exclusive allocations.
-- slurmrestd - fix compiling against libhttpparser in a non-default path.
-- slurmrestd - avoid compilation issues with libhttpparser < 2.6.
-- Fix compile issues when compiling slurmrestd without --enable-debug.
-- Reset idle time on a reservation that is getting purged.
-- Fix recurring reservations that have Purge_comp= to keep the correct
duration if they are purged.
-- scontrol - changed the "PROMISCUOUS" flag to "MAGNETIC".
-- Early return from epilog_set_env in case of no_consume.
-- Fix cons_common/job_test start time discovery logic to prevent skewed
results between "will run test" executions.
-- Ensure TRESRunMins limits are maintained during "scontrol reconfigure".
-- Improve error message when host lookup fails.
* Changes in Slurm 20.02.3
==========================
-- Factor in ntasks-per-core=1 with cons_tres.
-- Fix formatting in error message in cons_tres.
-- Fix calling stat on a NULL variable.
-- Fix minor memory leak when using reservations with flags=first_cores.
-- Fix gpu bind issue when CPUs=Cores and ThreadsPerCore > 1 on a node.
-- Fix --mem-per-gpu for heterogeneous --gres requests.
-- Fix slurmctld load order in load_all_part_state().
-- Fix race condition not finding jobacct gather task cgroup entry.
-- Suppress error message when selecting nodes on disjoint topologies.
-- Improve performance of _pack_default_job_details() with large number of job
arguments.
-- Fix loading of per-node req_mem for archived jobs from before 17.11.
-- Fix regression validating that --gpus-per-socket requires --sockets-per-node
for steps. Should only validate allocation requests.
-- Use error() instead of fatal() when parsing an invalid hostlist.
-- nss_slurm - fix potential deadlock in slurmstepd on overloaded systems.
-- cons_tres - fix --gres-flags=enforce-binding and related --cpus-per-gres.
-- cons_tres - Allocate lowest numbered cores when filtering cores with gres.
-- Fix getting system counts for named GRES/TRES.
-- MySQL - Fix for handling typed GRES for association rollups.
-- Fix step allocations when tasks_per_core > 1.
-- Fix allocating more GRES than requested when asking for multiple GRES types.
* Changes in Slurm 20.02.2
==========================
-- Fix slurmctld segfault when checking no_consume GRES node allocation counts.
-- Fix resetting of cloud_dns on a reconfigure.
-- squeue - change output for dependency column to use "(null)" instead of ""
for no dependencies as documented in the man page, and used by other columns.
-- Clear node_cnt_wag after job update.
-- Fix regression where AccountingStoreJobComment was not defaulting to 'yes'.
-- Send registration message immediately after a node is resumed.
-- Cray - Fix hetjobs when using only a single component in the step launch.
-- Cray - Fix hetjobs launched without component 0.
-- Cray - Quiet the missing cookies message, which is expected for hetjobs.
-- Fix handling of -m/--distribution options for across socket/2nd level by
task/affinity plugin.
-- Fix grp_node_bitmap error when slurmctld started before slurmdbd.
-- Fix scheduling issue when there are not enough nodes available to run a job
resulting in possible job starvation.
-- Make mpi/cray_shasta appear in srun --mpi=list output.
-- Don't requeue jobs that have been explicitly canceled.
-- Fix error message for a regular user trying to update licenses on a running
job.
-- Fix backup slurmctld handling for logrotation via SIGUSR2.
-- Fix reservation feature specification when looking for inactive features
after active features fails.
-- Prevent misleading error messages for reservation creation.
-- Print message in scontrol when a request fails for not having enough nodes.
-- Fix duplicate output in sacct with multiple resv events.
-- auth/jwt - return correct gid for a given user. This was incorrectly
assuming the user's primary group name matched their username.
-- slurmrestd - permit non-SlurmUser/root job submission.
-- Use host IP if hostname unknown for job submission for allocating node.
-- Fix issue with primary_slurmdbd_resumed_operation trigger not happening
on slurmctld restart.
-- Fix race in acct_gather_interconnect/ofed on step termination.
-- Fix typo of SlurmctldProlog -> PrologSlurmctld in error message.
-- slurm.spec - add SuSE-specific dependencies for optional slurmrestd package.
-- Fix FreeBSD build issues.
-- Fix sbatch not processing --ignore-pbs in batch script.
-- Don't clear the qos_id of an invalid QOS.
-- Allow a job that was once FAIL_[QOS|ACCOUNT] to be eligible again if
the qos|account limitation is remedied.
-- Fix core reservations using the FLEX flag to allow use of resources
outside of the reservation allocation.
-- Fix MPS without File with 1 GPU, and without GPUs.
-- Add FreeBSD support to proctrack/pgid plugin.
-- Fix remote dependency testing for meta job in job array.
-- Fix preemption when dealing with a job array.
-- Don't send remote non-pending singleton dependencies on federation update.
-- slurmrestd - fix crash on empty query.
-- Fix race condition which could lead to invalid references in backfill.
-- Fix edge case in _remove_job_hash().
-- Fix exit code when using --cluster/-M client options.
-- Fix compilation issues in GCC10.
-- Fix invalid references when federated job is revoked while in backfill loop.
-- Fix distributing job steps across idle nodes within a job.
-- Fix detection of overlapping floating reservations.
-- Break infinite loop in cons_tres dealing with incorrect tasks per tres
request resulting in slurmctld hang.
-- Send the current (not the previous) reason for a pending job to client
commands like squeue/scontrol.
-- Fix incorrect lock levels for select_g_reconfigure().
-- Handle hidden nodes correctly in slurmrestd.
-- Allow sacctmgr to use MaxSubmitP[U|A] as format options.
-- Fix segfault when trying to delete a corrupted association.
-- Fix setting ntasks-per-core when using --multithread.
-- Only override job wait reason to priority if Reason=None or
Reason=Resources.
-- Perl API / seff - fix missing symbol issue with accounting_storage/slurmdbd.
-- slurm.spec - add --with cray_shasta option.
-- Downgrade "Node config differ.." error message if config_overrides enabled.
-- Add client error when using --gpus-per-socket without --sockets-per-node.
-- Fix nvml/rsmi debug statements making it to stderr.
-- NodeSets - fix slurmctld segfault in newer glibc if any nodes have no
defined features.
-- ConfigLess - write out plugstack config to correct config file name in
the config cache.
-- priority/multifactor - gracefully handle NULL list of associations or array
of siblings when calculating FairTree fairshare.
-- Fix cons_tres --exclusive=user to allocate only requested number of CPUs.
-- Add MySQL deadlock detection and automatic retry mechanism.
-- Reject repeating floating reservations as they aren't supported.
-- Fix testing of reservation flags that may be NO_VAL64.
-- Fix _verify_node_state memory requested as --mem-per-gpu DefMemPerGPU.
-- Fix DependencyNeverSatisfied not set as the job's state reason if
kill_invalid_depend or --kill-on-invalid-dep are used.
-- pam_slurm_adopt - explicitly call slurm_conf_init().
-- configless - fix plugstack.conf handling for client commands.
-- Set SLURM_JOB_USER and SLURM_JOB_UID in task_epilog correctly.
-- slurmrestd - authenticate job submissions by SlurmUser properly.
* Changes in Slurm 20.02.1
==========================
-- Improve job state reason for jobs hitting partition_job_depth.
-- Speed up testing of singleton dependencies.
-- Fix negative loop bound in cons_tres.
-- srun - capture the MPI plugin return code from mpi_hook_client_fini() and
use as final return code for step failure.
-- Fix segfault in cli_filter/lua.
-- Fix --gpu-bind=map_gpu reusability if tasks > elements.
-- Make sure config_flags on a gres are sent to the slurmctld on node
registration.
-- Prolog/Epilog - Fix missing GPU information.
-- Fix segfault when using config parser for expanded lines.
-- Fix bit overlap test function.
-- Don't accrue time if job begin time is in the future.
-- Remove accrue time when updating a job start/eligible time to the future.
-- Fix regression in 20.02.0 that broke --depend=expand.
-- Reset begin time on job release if it's not in the future.
-- Fix for recovering burst buffers when using high-availability.
-- Fix invalid read due to freeing an incorrectly allocated env array.
-- Update slurmctld -i message to warn about losing data.
-- Fix scontrol cancel_reboot so it clears the DRAIN flag and node reason for a
pending ASAP reboot.
* Changes in Slurm 20.02.0
==========================
-- Fix minor memory leak in slurmd on reconfig.
-- Fix invalid ptr reference when rolling up data in the database.
-- Change shtml2html.py to require python3 for RHEL8 support, and match
man2html.py.
-- slurm.spec - override "hardening" linker flags to ensure RHEL8 builds
in a usable manner.
-- Fix type mismatches in the perl API.
-- Prevent use of uninitialized slurmctld_diag_stats.
-- Fixed various Coverity issues.
-- Only show warning about root-less topology in daemons.
-- Fix accounting of jobs in IGNORE_JOBS reservations.
-- Fix issue with batch steps state not loading correctly when upgrading from
19.05.
-- Deprecate max_depend_depth in SchedulerParameters and move it to
DependencyParameters.
-- Silence erroneous error on slurmctld upgrade when loading federation state.
-- Break infinite loop in cons_tres dealing with incorrect tasks per tres
request resulting in slurmctld hang.
-- Improve handling of --gpus-per-task to make sure appropriate number of GPUs
is assigned to job.
-- Fix seg fault on cons_res when requesting --spread-job.
* Changes in Slurm 20.02.0rc1
=============================
-- sbatch - fix segfault when no newline at the end of a burst buffer file.
-- Change scancel to only check job's base state when matching -t options.
-- Save job dependency list in state files.
-- cons_tres - allow jobs to be run on systems with root-less topologies.
-- Restore pre-20.02pre1 PrologSlurmctld synchronization behavior to avoid
various race conditions, and ensure proper batch job launch.
-- Add new slurmrestd command/daemon which implements the Slurm REST API.
* Changes in Slurm 20.02.0pre1
==============================
-- Avoid possible race when 2 conf files are read at exactly the same time.
-- Add last and mean backfill table size to sdiag output.
-- Add support for additional job submit environment variables:
SALLOC_MEM_PER_CPU, SALLOC_MEM_PER_NODE, SBATCH_MEM_PER_CPU and
SBATCH_MEM_PER_NODE.
-- Add 'Agent thread count' stat to sdiag.
-- Add sdiag -M, --clusters option.
-- NodeName configurations with CPUs != Sockets*Cores or
Sockets*Cores*Threads will be rejected with fatal.
-- Add scontrol write config <filename> option.
-- Increase maximum number of hostlist ranges from 64k to 256k.
-- Don't acquire unneeded locks in slurmctld _run_prolog thread.
-- Fix sinfo/squeue sort by nodename/nodeaddr/hostname.
-- Optimize getting wckey and associations usage.
-- Keep SLURM_MPI_TYPE variable in srun when not set to 'none'.
-- Remove slurm.spec-legacy packaging file.
-- pam_slurm_adopt - with action_unknown=newest configured, pick a user job
even when failing to get cgroup mtime.
-- Fix "srun --export=" parsing to handle nested commas.
-- Add default "reboot requested" reason to nodes when rebooting with scontrol.
-- Duplicate PartitionName entries in slurm.conf will now fatal() instead of
printing an error message and ignoring the successive records.
-- Remove the smap command.
-- Change exclusive behavior of a node to include all GRES on a node as well
as the CPUs.
-- Append ": reboot issued" to node reason when reboot is issued from
controller. Previously only happened when nextstate was specified.
-- Add default jobname of "no-shell" for salloc --no-shell.
-- Save reservation state when automatically shrinking nodes.
-- Add slurm.conf option MaxDBDMsgs to control how many messages will be
stored in the slurmctld before throwing them away when the slurmdbd is down.
-- Change default SLURM_PMIX_TMPDIR to include user id to avoid potential
conflicts on development systems running multiple Slurm instances.
-- Return a newly added ESLURM_DEFER error and set a job state reason to
FAIL_DEFER for immediate alloc requests if defer in SchedulerParameters.
-- Make slurmctld fatal if unable to load a script or a job environment when
building the launch job message.
-- Removed the checkpoint plugin interface and all associated API calls.
-- Add job_get_grace_time() functions to preempt plugins and refactor
slurm_job_check_grace() to use them.
-- Remove --disable-iso8601 configure option.
-- Display StepId=<jobid>.batch instead of StepId=<jobid>.4294967294 in output
of "scontrol show step". (slurm_sprint_job_step_info())
-- Allow a grace time when preempting by requeue.
-- Translate MpiDefault=openmpi to functionally-equivalent MpiDefault=none,
and remove the mpi/openmpi plugin.
-- burst_buffer/datawarp - add a set of % symbols that will be replaced by
job details. E.g., %d will be filled in with the WorkDir for the job.
-- Fix sacctmgr show events to support node list ranges.
-- Add SchedulerParameters option bf_one_resv_per_job to disallow adding more
than one backfill reservation per job.
-- Allow sacctmgr to filter node events by states that are flags.
-- Allow sacctmgr to filter node events by REBOOT state/flag.
-- Add ability to set MailType and MailUser of job with scontrol.
-- slurm_init_job_desc_msg() initializes mail_type as uint16_t. This allows
mail_type to be set to NONE with scontrol.
-- Add new slurm_spank_log() function to print messages back to the user from
within a SPANK plugin. (This can be done with slurm_error() instead, but
that will always prepend "error: " to every message which may lead to
confusion.)
-- Enforce specification of partition and ALL nodes with PART_NODES flag.
-- Add 'promiscuous' flag to a reservation.
-- Implement the idea of PURGE_COMP=timespec.
-- SPANK - removed never-implemented slurm_spank_slurmd_init() interface. This
hook has always been accessible through slurm_spank_init() in the
S_CTX_SLURMD context instead.
-- sbcast - add new BcastAddr option to NodeName lines to allow sbcast traffic
to flow over an alternate network path.
-- Add auth/jwt plugin.
-- Add new 'scontrol token' subcommand.
-- PMIx - improve performance of proc map generation.
-- For a heterogeneous job to be considered for preemption all components must
be eligible for preemption.
-- Added JobCompParams to slurm.conf.
-- Add configuration parameter DependencyParameters to slurm.conf.
-- Deprecate kill_invalid_depend in SchedulerParameters and move it to new
DependencyParameters.
-- Enable job dependencies for any job on any cluster in the same federation.
-- Stricter escaping of strings sent to Elasticsearch.
-- Allow clusters to be added automatically to db at startup of ctld.
-- Add AccountingStorageExternalHost slurm.conf parameter.
-- Add support for srun -M<cluster> --jobid=# for existing remote allocations.
-- Remove LicensesUsed from 'scontrol show config'.
-- sbatch - adjusted backoff times for "--wait" option to reduce load on
slurmctld. This results in a steady-state delay of 32s between queries,
instead of the prior 10s delay.
-- Add SchedulerParameters option bf_running_job_reserve to add backfill
reservations for jobs running on whole nodes.
-- salloc/sbatch/srun - error on invalid --profile option strings.
-- Remove max_job_bf option and replace with bf_max_job_test.
-- Disable sbatch, salloc, srun --reboot for non-admins.
-- jobcomp/elasticsearch - added connect_timeout and timeout options to
JobCompParams.
-- SPANK - added support for S_JOB_GID in the job script context with
spank_get_item().
-- Prolog/Epilog - add SLURM_JOB_GID environment variable.
-- Add gpu/rsmi plugin to support AMD GPUs.
-- Allow the energy plugins to be "stacked".
-- Add energy accounting plugin for AMD GPUs.
* Changes in Slurm 19.05.9
==========================
* Changes in Slurm 19.05.8
==========================
-- sbatch - handle --uid/--gid in #SBATCH directives properly.
-- Fix HDF5 type version build error.
-- PMIx - fix potential buffer overflows from use of unpackmem().
CVE-2020-27745.
-- X11 forwarding - fix potential leak of the magic cookie when sent as an
argument to the xauth command. CVE-2020-27746.
* Changes in Slurm 19.05.7
==========================
-- Fix handling of -m/--distribution options for across socket/2nd level by
task/affinity plugin.
-- Fix grp_node_bitmap error when slurmctld started before slurmdbd.
-- Fix compilation issues in GCC10.
-- Fix distributing job steps across idle nodes within a job.
-- Break infinite loop in cons_tres dealing with incorrect tasks per tres
request resulting in slurmctld hang.
-- priority/multifactor - gracefully handle NULL list of associations or array
of siblings when calculating FairTree fairshare.
-- Fix cons_tres --exclusive=user to allocate only requested number of CPUs.
-- Add MySQL deadlock detection and automatic retry mechanism.
-- Fix _verify_node_state memory requested as --mem-per-gpu DefMemPerGPU.
-- Factor in ntasks-per-core=1 with cons_tres.
-- Fix formatting in error message in cons_tres.
-- Fix gpu bind issue when CPUs=Cores and ThreadsPerCore > 1 on a node.
-- Fix --mem-per-gpu for heterogeneous --gres requests.
-- Fix slurmctld load order in load_all_part_state().
-- Fix getting system counts for named GRES/TRES.
-- MySQL - Fix for handling typed GRES for association rollups.
-- Fix step allocations when tasks_per_core > 1.
* Changes in Slurm 19.05.6
==========================
-- Fix OverMemoryKill.
-- Fix memory leak in scontrol show config.
-- Remove PART_NODES reservation flag after ignoring it at creation.
-- Fix deprecation of MemLimitEnforce parameter.
-- X11 forwarding - alter Xauthority regex to work when "FamilyWild" cookies
are present in the "xauth list" output.
-- Fix memory leak when utilizing core reservations.
-- Fix issue where adding WCKeys and then using them right away didn't always
work.
-- Add cosmetic batch step to correct component in a hetjob.
-- Fix to make scontrol write config create a usable config without editing.
-- Fix memory leak when pinging backup controller.
-- Fix issue with 'scontrol update' not enforcing all QoS / Association limits.
-- Fix to properly schedule certain jobs with cons_tres plugin.
-- Fix FIRST_CORES for reservations when using cons_tres.
-- Fix sbcast -C argument parsing.
-- Replace/deprecate max_job_bf with bf_max_job_test and print error message.
-- sched/backfill - fix options parsing when bf_hetjob_prio enabled.
-- Fix for --gpu-bind when no gpus requested.
-- Fix sshare -l crash with large values.
-- Fix printing NULL job and step pointers.
-- Break infinite loop in cons_tres dealing with incorrect tasks per tres
request resulting in slurmctld hang.
-- Improve handling of --gpus-per-task to make sure appropriate number of GPUs
is assigned to job.
* Changes in Slurm 19.05.5
==========================
-- Fix both socket-[un]constrained GRES issues that would lead to incorrect
GRES allocations and GRES underflow errors at deallocation time.
-- Reject unrunnable jobs submitted to reservations.
-- Fix misleading error returned for immediate allocation requests when defer
in SchedulerParameters by decoupling defer from too fragmented logic.
-- Fix printf format string error on FreeBSD.
-- Fix parsing of delay_boot in controller when additional arguments follow it.
-- Fix --ntasks-per-node in cons_tres.
-- Fix array tasks getting same reject reason.
-- Ignore DOWN/DRAIN partitions in reduce_completing_frag logic.
-- Fix alloc_node validation when updating a job.
-- Fix for requesting specific nodes when using cons_tres topology.
-- Ensure x11 is setup before launching a job step.
-- Fix incorrect SLURM_CLUSTER_NAME env var in batch step.
-- Perl API - Fix undefined symbol for slurmdbd_pack_fini_msg.
-- Install slurmdbd.conf.example with 0600 permissions to encourage secure
use. CVE-2019-19727.
-- srun - do not continue with job launch if --uid fails. CVE-2019-19728.
* Changes in Slurm 19.05.4
==========================
-- Don't allow empty string as a reservation name; generate a name if empty
string is provided.
-- Fix salloc segfault when using --no-shell option.
-- Fix divide by zero when normalizing partition priorities.
-- Restore ability to set JobPriorityFactor to 0 on a partition.
-- Fix multi-partition non-normalized job priorities.
-- Adjust precedence between --mem-per-cpu and --mem-per-node to enforce
them as mutually exclusive. Specifying either on the command line will
now explicitly override any value inherited through the environment.
-- Always print node's version, if it exists, in scontrol show nodes.
-- sbatch - ensure SLURM_NTASKS_PER_NODE is exported when --ntasks-per-node
is set.
-- slurmctld - fix memory leak when using DebugFlags=Reservation.
-- Reset --mem and --mem-per-cpu options correctly when using --mem-per-gpu.
-- Use correct function signature for step_set_env() in gres plugin interface.
-- Restore pre-19.05 hostname handling behavior for AllocNodes by always
truncating to just the host portion and dropping any domain name portion
returned by gethostbyaddr().
-- Fix abort initializing a configuration without acct_gather.conf.
-- Fix GRES binding and CLOUD nodes GRES setup regressions.
-- Make sview work with glib2 v2.62.
-- Fix slurmctld abort when in developer mode and submitting to multiple
partitions with a bad QOS and not enforcing QOS.
-- Enforce PART_NODES if only PartitionName is specified.
-- Fix slurmd -G functionality.
-- Fix build on 32-bit systems.
-- Remove duplicate log entry on update job.
-- sched/backfill - fix the estimated sched_nodes for multi-part jobs.
-- slurm.spec - fix pmix_version global context macro.
-- Fix cons_tres topology logic incorrectly evaluating insufficient resources.
-- Fix job "--switches=count@time" option handling in cons_tres topology.
-- scontrol - allow changes to the WorkDir for pending jobs.
-- Enable coordinators to delete users if they only belong to accounts that
the coordinator is over.
-- Fix regression on update from older versions with DefMemPerCPU.
-- Fix issues with --gpu-bind while using cgroups.
-- Suspend nodes after being down for SuspendTime.
-- Prevent rebooting nodes from skipping nextstate states on boot.
-- Fix regression in reservation creation logic from 19.05.3 which would
incorrectly deny certain valid reservations from being created.
-- slurmdbd - process sacct/sacctmgr job queries from older clients correctly.
* Changes in Slurm 19.05.3-2
============================
-- Fix missing include for Cray Aries systems.
* Changes in Slurm 19.05.3
==========================
-- Fix missing check from conversion of cray -> cray_aries.
-- Improve job state reason string when required nodes are not available by
not including those that don't belong to the job partition.
-- Set a more appropriate ESLURM_RESERVATION_MAINT job state reason for jobs
requesting feature(s) and required nodes are in a maintenance reservation.
-- Fix logic to better handle maintenance reservations.
-- Add spank options to cache in remote callback.
-- Enforce the use of spank_option_getopt().
-- Fix select plugins' will run test under-allocating nodes usage for
completing jobs.
-- Nodes in COMPLETING state treated as being currently available for job
will-run test.
-- Cray - fix contribs slurm.conf.j2 with updated cray_aries plugin names.
-- job_submit/lua - fix problem where nil was expected for min_mem_per_cpu.
-- Fix extra, unaccounted TRESRunMins usage created by heterogeneous jobs when
running with the priority/multifactor plugin.
-- Detach threads once they are done to avoid having to join them
in track scripts code.
-- Handle situation where a slurmctld tries to communicate with slurmdbd more
than once at the same time.
-- Fix XOR/XAND features like cpu&fastio&[knl|westmere] to be resolved
correctly.
-- Don't update [min|max]_exit_code on job array task requeue.
-- Don't assume the first node of a job is the batch host when testing if the
job's allocated nodes are booted/ready.
-- Make --batch=<feature> requests wait for all nodes to be booted so that it
can choose the batch host after the nodes have been booted -- possibly with
different features.
-- Fix talking to batch host on its protocol version when using --batch.
-- gres/mic plugin - add missing fini() function to clean up plugin state.
-- Move _validate_node_choice() before prolog/epilog check.
-- Look forward one week when creating a new reservation.
-- Set missing resv_desc.flags before calling _select_nodes().
-- Use correct start_time for TIME_FLOAT reservation in _job_overlap().
-- Properly enforce a job's mem-per-cpu option when allocating the node
exclusively to that job.
-- sched/backfill - clear estimated sched_nodes as done for start_time.
-- Have safe_[read|write] handle EAGAIN and EINTR.
-- Fix checking for flag with logical AND.
-- Correct "extern" definition of variable if compiling with __APPLE__.
-- Deprecate FastSchedule. FastSchedule will be removed in 20.02.
The FastSchedule=2 functionality (used for testing and development) has
been retained as the new SlurmdParameters=config_overrides option.
-- Fix preemption issue when picking nodes for a feature job request.
-- Fix race condition preventing held array job from getting a db_index.
-- Fix select/cons_tres gres code infinite loop leaving slurmctld unresponsive.
-- Remove redefinition of global variable in gres.c.
-- Fix issue where GPU devices are denied access when MPS is enabled.
-- Fix uninitialized errors when compiling with CFLAGS="--coverage".
-- Fix scancel --full for proctrack/cgroups.
-- Fix sdiag backfill last and mean queue length stats.
-- Do not remove batch host when resizing/shrinking a batch job.
-- nss_slurm - fix file descriptor leaks.
-- Fix preemption for jobs using complex feature requests
(e.g. -C "[rack1*2&rack2*4]").
-- Fix memory leaks in preemption when jobs request multiple features.
-- Allow Operator users to show/fix runaways.
-- Disallow coordinators from showing/fixing runaways.
-- mpi/pmi2 - increase array len to avoid buffer size exceeded error.
-- Preserve rebooting node's nextstate when updating state with scontrol.
-- Fully merge slurm.conf and gres.conf before node_config_load().