NAME
  pam_slurm_adopt.so

SYNOPSIS
  Adopt incoming connections into jobs

AUTHOR
  Ryan Cox <ryan_cox@byu.edu>

MODULE TYPES PROVIDED
  account

DESCRIPTION
  This module attempts to determine the job which originated this connection.
  The module is configurable; these are the default steps:

  1) Check the local stepd for a count of jobs owned by the non-root user
    a) If none, deny (option action_no_jobs)
    b) If only one, adopt the process into that job
    c) If multiple, continue
  2) Determine src/dst IP/port of socket
  3) Issue callerid RPC to slurmd at IP address of source
    a) If the remote slurmd can identify the source job, adopt into that job
    b) If not, continue
  4) Pick a random local job from the user to adopt into (option action_unknown)

  Jobs are adopted into a job's allocation step.

MODULE OPTIONS
This module has the following options (* = default):

    ignore_root - By default, all root connections are ignored. If the RPC
                  is sent to a node which drops packets to the slurmd port, the
                  RPC will block for some time before failing. This is
                  unlikely to be desirable. Likewise, root may be trying to
                  administer the system and not do work that should be in a job.
                  The job may trigger oom-killer or just exit. If root restarts
                  a service or similar, it will be tracked and killed by Slurm
                  when the job exits. This sounds bad because it is bad.

        1* = Let the connection through without adoption
        0  = I am crazy. I want random services to die when root jobs exit. I
             also like it when RPCs block for a while then time out.


    action_no_jobs - The action to perform if the user has no jobs on the node

        ignore = Do nothing. Fall through to the next pam module
        deny* = Deny the connection


    action_unknown - The action to perform when the user has multiple jobs on
                     the node *and* the RPC does not locate the source job.
                     If the RPC mechanism works properly in your environment,
                     this option will likely be relevant *only* when connecting
                     from a login node.

        newest* = Pick the newest job on the node. The "newest" job is chosen
                  based on the mtime of the job's step_extern cgroup; asking
                  Slurm would require an RPC to the controller. The user can ssh
                  in but may be adopted into a job that exits earlier than the
                  job they intended to check on. The ssh connection will at
                  least be subject to appropriate limits and the user can be
                  informed of better ways to accomplish their objectives if this
                  becomes a problem
        allow   = Let the connection through without adoption
        deny    = Deny the connection


    action_adopt_failure - The action to perform if the process is unable to be
                           adopted into any job for whatever reason. If the
                           process cannot be adopted into the job identified by
                           the callerid RPC, it will fall through to the
                           action_unknown code and try to adopt there. A failure
                           at that point or if there is only one job will result
                           in this action being taken.

        allow* = Let the connection through without adoption
        deny = Deny the connection

    action_generic_failure - The action to perform it there certain failures
                             such as inability to talk to the local slurmd or
                             if the kernel doesn't offer the correct facilities

        ignore* = Do nothing. Fall through to the next pam module
        allow = Let the connection through without adoption
        deny = Deny the connection

    log_level - See SlurmdDebug in slurm.conf(5) for available options. The
                default log_level is info.

SLURM.CONF CONFIGURATION
  PrologFlags=contain must be set in slurm.conf. This sets up the "extern" step
  into which ssh-launched processes will be adopted.

                       **** IMPORTANT ****
  PrologFlags=contain must be in place *before* using this module.
  The module bases its checks on local steps that have already been launched. If
  the user has no steps on the node, such as the extern step, the module will
  assume that the user has no jobs allocated to the node. Depending on your
  configuration of the pam module, you might deny *all* user ssh attempts.

NOTES
  This module and the related RPC currently support Linux systems which
  have network connection information available through /proc/net/tcp{,6}.  A
  proccess's sockets must exist as symlinks in its /proc/self/fd directory.

  The RPC data structure itself is OS-agnostic.  If support is desired for a
  different OS, relevant code must be added to find one's socket information
  then match that information on the remote end to a particular process which
  Slurm is tracking.

  IPv6 is supported by the RPC data structure itself and the code which sends it
  and receives it.  Sending the RPC to an IPv6 address is not currently
  supported by Slurm.  Once support is added, remove the relevant check in
  slurm_network_callerid().

  For the action_unknown=newest setting to work, the memory cgroup must be in
  use so that the code can check mtimes of cgroup directories. If you would
  prefer to use a different subsystem, modify the _indeterminate_multiple
  function.

FIREWALLS, IP ADDRESSES, ETC.
  slurmd should be accessible on any IP address from which a user might launch
  ssh. The RPC to determine the source job must be able to reach the slurmd
  port on that particular IP address.

  If there is no slurmd on the source node, such as on a login node, it is
  better to have the RPC be rejected rather than silently dropped.  This
  will allow better responsiveness to the RPC initiator.

EXAMPLES / SUGGESTED USAGE
  Use of this module is recommended on any compute node.

  Add the following line to the appropriate file in /etc/pam.d, such as
  system-auth or sshd:

    account    sufficient     pam_slurm_adopt.so

  If you always want to allow access for an administrative group (e.g. wheel),
  stack the pam_access module after pam_slurm_adopt. A success with
  pam_slurm_adopt is sufficient to allow access but the pam_access module can
  allow others, such as staff, access even without jobs.

    account    sufficient   pam_slurm_adopt.so
    account    required     pam_access.so


  Then edit the pam_access configuration file (/etc/security/access.conf):

    +:wheel:ALL
    -:ALL:ALL

  When access is denied, the user will receive a relevant error message.

  pam_systemd.so is known to not play nice with Slurm's usage of cgroups. It is
  recommended that you disable it or possibly add pam_slurm_adopt.so after
  pam_systemd.so.
