Introduction:
Ever wondered what happens inside PostgreSQL from startup to shutdown, and how its processes work together internally?
The process architecture:
When we start PostgreSQL using pg_ctl, we are actually starting the "postmaster", the parent process of most of the processes that follow. The postmaster runs the main() function, which calls PostmasterMain, where important work happens, such as:
- initializing subsystems, such as the GUC options,
- parsing the postgres binary's command-line arguments and setting the related GUCs accordingly,
- running checks on the data directory, lock file, and control file, and creating the data directory lock file, which records the postmaster's status and PID and ensures that only one postmaster runs per cluster,
- loading any shared libraries, and calculating how much shared memory the core server and the libraries/extensions need,
- allocating a shared memory segment of that size.
- Then comes the main loop, ServerLoop, where all the child processes are started. This loop never ends; if it does, the server goes down.
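The startup sequence above can be sketched roughly as follows. This is an illustrative Python sketch, not PostgreSQL's actual C code: the function names (init_gucs, check_data_dir, and so on) are hypothetical stand-ins for the work PostmasterMain does.

```python
# Hypothetical sketch of the PostmasterMain startup sequence described above.
startup_log = []

def init_gucs():
    startup_log.append("init GUCs")

def parse_args(argv):
    startup_log.append("parse command-line args")

def check_data_dir():
    # datadir / lock-file / control-file checks; the lock file records the
    # postmaster PID so only one postmaster can run per cluster
    startup_log.append("check datadir, create lock file")

def load_libraries_and_size_shmem():
    startup_log.append("load shared libraries, size shared memory")
    return 128 * 1024 * 1024   # pretend we computed 128 MB

def postmaster_main(argv):
    init_gucs()
    parse_args(argv)
    check_data_dir()
    shmem_bytes = load_libraries_and_size_shmem()
    startup_log.append(f"allocate {shmem_bytes} bytes of shared memory")
    startup_log.append("enter ServerLoop")   # never returns in the real server

postmaster_main(["postgres", "-D", "/tmp/data"])
```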
ServerLoop (The Loop of Life):
This is an infinite event loop that waits on a latch set by a signal handler, or on a new connection pending on any of the listening sockets.
In this loop, child processes are started, and cleanup happens when those same processes die. Requests/signals from backends are also handled here.
Within ServerLoop, the postmaster keeps checking whether any required process has stopped for some reason, or whether it needs to start IO workers or other background processes; LaunchMissingBackgroundProcesses starts any required child processes that are missing.
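The idea behind LaunchMissingBackgroundProcesses can be sketched like this. It is a simplified Python illustration, assuming a notional list of required processes; the real code tracks each child's PID and forks the missing ones.

```python
# Sketch of the LaunchMissingBackgroundProcesses idea inside ServerLoop:
# each pass, restart any required child that is not currently running.
# The process names and the running/required sets are illustrative.
required = ["checkpointer", "background writer", "walwriter"]
running = {"checkpointer"}          # pretend the others died or never started

def launch_missing_background_processes():
    launched = []
    for name in required:
        if name not in running:     # the real code forks a child here
            running.add(name)
            launched.append(name)
    return launched

launched = launch_missing_background_processes()
```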
The rise of child processes:
The child processes of the postmaster are:
- IO workers: read pages asynchronously, so that the load on normal backends is reduced,
- checkpointer: removes/recycles old WAL and flushes dirty pages from the shared buffers from time to time,
- background writer: only flushes dirty pages from the shared buffers, keeping as many shared buffers free as possible to reduce the load on normal backends,
- startup process: on the primary it replays WAL until recovery is complete, but on a replica it continuously replays the WAL being streamed to that node,
- WAL writer: flushes WAL from the WAL buffers to disk,
- autovacuum launcher: periodically starts autovacuum worker processes,
- and more, forked on demand. Each of these deserves a separate blog post, so I won't go into further detail here.
In PostgreSQL, new child processes are forked in different ways depending on when they are needed.
During server startup, some child processes are created right away. Later, on demand, additional child processes may be forked, such as:
- Backend processes (to handle client connections)
- Background workers (for running user-supplied code such as from extension)
Most of these child processes are launched using the postmaster_child_launch function. This function takes a value from the BackendType enum, which determines the correct main function to call for that specific type of child process.
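That enum-to-main-function dispatch can be sketched as below. The enum members and the main functions here are illustrative stand-ins, not the real BackendType members or the real entry points.

```python
# Sketch of how a postmaster_child_launch-style function picks a main
# function from a BackendType-style enum. All names are stand-ins.
from enum import Enum, auto

class BackendType(Enum):
    B_BACKEND = auto()
    B_CHECKPOINTER = auto()
    B_BG_WRITER = auto()
    B_WAL_WRITER = auto()

def backend_main():       return "running client backend"
def checkpointer_main():  return "running checkpointer"
def bg_writer_main():     return "running background writer"
def wal_writer_main():    return "running wal writer"

CHILD_MAIN = {
    BackendType.B_BACKEND:      backend_main,
    BackendType.B_CHECKPOINTER: checkpointer_main,
    BackendType.B_BG_WRITER:    bg_writer_main,
    BackendType.B_WAL_WRITER:   wal_writer_main,
}

def postmaster_child_launch(btype):
    # in the real server this forks first; the child then runs its main
    return CHILD_MAIN[btype]()
```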
The Backend Processes:
The postmaster listens on listen_addresses (by default "localhost") for requests from clients such as psql. psql connects to the postmaster's address using the libpq library, a client framework built on top of sockets for Postgres.
The postmaster then accepts the connection request and starts a new backend process, which will handle all queries coming from that psql client from now on.
As the new backend is a child process of the postmaster, it inherits the postmaster's knowledge of shared memory, meaning this process can also use the shared buffers, because pointers such as BufferBlocks were initialized by the postmaster.
We get one new child process on the server per psql client. When a query needs a page, the backend reads it from disk if it is not already in the shared buffers. If no buffer is free because all are currently in use, the backend finds a victim buffer, forcefully flushes its page, and then reads the new page from disk into it.
If many reads are happening, the buffers fill up quickly, so to bring new pages into the shared buffers the backend processing the query has to flush pages itself, and query execution becomes slow.
To offload this extra work of flushing dirty buffers to disk, two other processes help: the checkpointer and the background writer. At different intervals, these processes flush dirty pages from the shared buffers to disk, reducing the load on the backends.
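The victim-buffer behavior described above can be sketched as follows. This is a toy illustration of the idea only; PostgreSQL's real buffer manager uses a clock-sweep strategy (StrategyGetBuffer) rather than the trivial victim choice shown here.

```python
# Toy sketch of a backend looking for a free shared buffer and, failing
# that, evicting (flushing) a victim before reading the new page.
class Buffer:
    def __init__(self):
        self.page = None
        self.dirty = False

flushed = []   # pages the backend had to write out itself

def read_page(buffers, page):
    # reuse an empty buffer if one exists
    for buf in buffers:
        if buf.page is None:
            buf.page = page
            return buf
    # otherwise pick a victim; flush it first if it is dirty
    victim = buffers[0]              # trivial victim choice for the sketch
    if victim.dirty:
        flushed.append(victim.page)  # the backend pays this I/O cost itself
        victim.dirty = False
    victim.page = page
    return victim

pool = [Buffer() for _ in range(2)]
read_page(pool, "A")
read_page(pool, "B")
pool[0].dirty = True     # pretend page A was modified and never written back
read_page(pool, "C")     # no free buffer: page A must be flushed first
```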
The disguised Backend process:
During replication, when the replica node starts, the startup process launches the walreceiver process while trying to read WAL. The walreceiver then sends a connection request to the primary's postmaster with a "replication" message using libpq.
The postmaster starts a normal backend process for this request, but when the newly forked backend parses the startup packet in ProcessStartupPacket, it looks for "replication"; if it is there, the backend is turned into a walsender.
In physical replication, the walsender reads WAL from disk and sends it to the walreceiver. In logical replication, the walsender decodes the changes in the WAL and sends them to the walreceiver. So a walsender starts life as a normal backend process.
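The decision point can be sketched like this. The packet is modeled as a plain dict for illustration, not the real wire format that ProcessStartupPacket parses; the "true"/"database" values mirror how libpq's replication option selects physical vs. logical replication.

```python
# Sketch of the "disguised backend" decision: the freshly forked backend
# inspects the startup packet and, if it asks for replication, acts as a
# walsender instead of a normal backend.
def process_startup_packet(packet):
    if packet.get("replication") in ("true", "database"):
        # "true" requests physical replication, "database" logical
        return "walsender"
    return "backend"

assert process_startup_packet({"user": "app"}) == "backend"
```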
The Background workers:
These processes run user-supplied code, such as code from extensions. If an extension wants to run such a process, it should register a background worker in its _PG_init() using RegisterBackgroundWorker.
When the postmaster loads the extensions listed in shared_preload_libraries, each extension's _PG_init() is called, and the registered worker is added to the postmaster's BackgroundWorkerList. The workers are started when the postmaster calls maybe_start_bgworkers while looping in ServerLoop.
There is also a way to start these processes later, after the server is already up and running, using RegisterDynamicBackgroundWorker.
A few child processes, such as the "logical replication launcher", are background workers that are started along with the postmaster's other children during server start.
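The registration flow can be sketched as below. The data layout and the stand-in PID are illustrative; in the real server, RegisterBackgroundWorker fills in a BackgroundWorker struct and maybe_start_bgworkers forks the registered workers.

```python
# Sketch of background-worker registration: an extension's _PG_init()
# appends a worker to the postmaster's list; ServerLoop later starts
# anything registered but not yet running.
background_worker_list = []

def register_background_worker(name, main_fn):
    # only legal from _PG_init() of a shared_preload_libraries entry
    background_worker_list.append({"name": name, "main": main_fn, "pid": None})

def maybe_start_bgworkers():
    started = []
    for worker in background_worker_list:
        if worker["pid"] is None:   # not running yet: "fork" it
            worker["pid"] = 4242    # stand-in for the real child PID
            started.append(worker["name"])
    return started

def my_extension_pg_init():         # hypothetical extension _PG_init()
    register_background_worker("my_worker", lambda: None)

my_extension_pg_init()
started = maybe_start_bgworkers()
```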
Requests through signals:
The postmaster has signal handlers to handle various kinds of requests, as follows.
All the definitions of the handlers and signal processors below can be found in postmaster.c.
| Signal | Handler | Signal processor |
|---|---|---|
| SIGHUP | handle_pm_reload_request_signal | process_pm_reload_request |
| SIGINT | handle_pm_shutdown_request_signal | process_pm_shutdown_request |
| SIGQUIT | handle_pm_shutdown_request_signal | process_pm_shutdown_request |
| SIGTERM | handle_pm_shutdown_request_signal | process_pm_shutdown_request |
| SIGUSR1 | handle_pm_pmsignal_signal | process_pm_pmsignal |
| SIGCHLD | handle_pm_child_exit_signal | process_pm_child_exit |
When the postmaster gets a particular signal, the respective handler sets the related flag variables, which makes ServerLoop call the signal processor function that does the requested work.
If someone wants to make the server re-read postgresql.conf, they can send SIGHUP to the postmaster process. SIGUSR1 handles pmsignal conditions representing requests from backends, and checks for promote or logrotate requests from pg_ctl.
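The handler/processor split from the table can be sketched as follows: the handler, which must be async-signal-safe, only records that a request is pending, and the loop later runs the matching processor. These are Python stand-ins for the C functions in postmaster.c, with made-up request names.

```python
# Sketch of the handler-sets-a-flag / loop-runs-the-processor pattern.
import signal

pending = set()

def make_handler(name):
    def handler(signum, frame):
        pending.add(name)   # do the minimum, like the handle_pm_* handlers
    return handler

processed = []
PROCESSORS = {   # stand-ins for the process_pm_* functions
    "reload_request":   lambda: processed.append("re-read postgresql.conf"),
    "shutdown_request": lambda: processed.append("begin shutdown"),
}

signal.signal(signal.SIGHUP,  make_handler("reload_request"))
signal.signal(signal.SIGTERM, make_handler("shutdown_request"))

signal.raise_signal(signal.SIGHUP)   # simulate `kill -HUP <postmaster pid>`
for request in sorted(pending):      # ServerLoop picks up the pending work
    PROCESSORS[request]()
```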
We discuss the shutdown signals and SIGCHLD in more detail below.
The Shutdown Modes:
In Postgres there are three shutdown modes, depending on how the shutdown should proceed. We can either signal the postmaster directly with the signal for the mode we want, or let pg_ctl do the same for us via the "-m" option.
| Signal | Mode |
|---|---|
| SIGTERM | smart |
| SIGINT | fast |
| SIGQUIT | immediate |
- In smart mode the server quits after all clients have disconnected.
- In fast mode the server disconnects all clients and quits with a proper shutdown (this is the default).
- In immediate mode the server quits without a clean shutdown, which leads to crash recovery on restart.
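The mode-to-signal mapping above boils down to sending the right signal to the postmaster PID, which is essentially what `pg_ctl stop -m <mode>` does. A minimal sketch, with a hypothetical stop_server helper:

```python
# Sketch of the shutdown-mode-to-signal mapping used by `pg_ctl stop -m`.
import os
import signal

SHUTDOWN_SIGNALS = {
    "smart":     signal.SIGTERM,
    "fast":      signal.SIGINT,    # the default mode
    "immediate": signal.SIGQUIT,
}

def stop_server(postmaster_pid, mode="fast"):
    # equivalent to: kill -<signal> <postmaster_pid>
    os.kill(postmaster_pid, SHUTDOWN_SIGNALS[mode])
```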
Now the internal stuff! When the postmaster gets any of the above signals, the respective handler is called, which sets pending_pm_shutdown_request to true along with a signal-specific variable. This makes ServerLoop call process_pm_shutdown_request(), which sets the specific mode. Depending on the mode, the shutdown path differs.
All the pmStates mentioned below can be seen in the postmaster code.
In smart mode, if the postmaster is running it will be in either the PM_RUN or PM_HOT_STANDBY pmState, so it waits until the clients complete their work, and new connections are no longer allowed. "Waiting" here means the loop goes back to its normal operation: when a client completes its work and disconnects, its backend exits, which the postmaster sees as a child exit, triggering process_pm_child_exit to clean up that backend. Because the smart shutdown signal set connsAllowed to false, once the last backend is gone, PostmasterStateMachine moves pmState to PM_STOP_BACKENDS, which makes all remaining child processes exit and lets the postmaster shut down.
But if we are in PM_STARTUP or PM_RECOVERY, where no connections are allowed anyway, there is no one to wait for: the other children are stopped directly, and pmState is updated to PM_STOP_BACKENDS.
Fast mode is simpler: in any of the above four pmStates the children are stopped directly, meaning pmState changes to PM_STOP_BACKENDS; the postmaster does not wait for clients to complete their work.
In immediate mode, the postmaster sends SIGQUIT to all children except the syslogger (if enabled). For this signal, InitPostmasterChild is called at the start of every child process and installs SignalHandlerForCrashExit as the SIGQUIT handler, which makes the process exit(2).
If the syslogger is enabled, it exits last, once every other process, including the postmaster, is gone and it sees EOF on the syslog pipe.
In smart and fast modes, a checkpoint is performed during shutdown. If for some reason the checkpointer is not running, the postmaster starts it. The postmaster signals the checkpointer with SIGINT, which makes it call ReqShutdownXLOG; this sets ShutdownXLOGPending.
The checkpointer then creates a checkpoint in ShutdownXLOG with the CHECKPOINT_IS_SHUTDOWN and CHECKPOINT_IMMEDIATE flags, and signals the postmaster that checkpointing is done. The postmaster then signals the walsenders to shut down with SIGUSR2. After waiting for the IO workers to exit, the postmaster finally sends the checkpointer SIGUSR2, which sets ShutdownRequestPending; this breaks the CheckpointerMain loop, and the checkpointer exits normally.
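The two-step handshake between postmaster and checkpointer can be sketched as below. The flags mirror ShutdownXLOGPending and ShutdownRequestPending in spirit only; the class and log are illustrative, not the real CheckpointerMain structure.

```python
# Sketch of the shutdown-checkpoint handshake:
#   SIGINT -> write shutdown checkpoint -> SIGUSR2 -> exit.
log = []

class Checkpointer:
    def __init__(self):
        self.shutdown_xlog_pending = False     # ~ ShutdownXLOGPending
        self.shutdown_request_pending = False  # ~ ShutdownRequestPending

    def on_sigint(self):                       # ~ ReqShutdownXLOG
        self.shutdown_xlog_pending = True

    def on_sigusr2(self):
        self.shutdown_request_pending = True

    def run_once(self):                        # one pass of the main loop
        if self.shutdown_xlog_pending:
            log.append("write shutdown checkpoint (IS_SHUTDOWN | IMMEDIATE)")
            self.shutdown_xlog_pending = False
            log.append("tell postmaster the checkpoint is done")
        if self.shutdown_request_pending:
            log.append("checkpointer exits")

cp = Checkpointer()
cp.on_sigint()    # postmaster: "write the shutdown checkpoint"
cp.run_once()
cp.on_sigusr2()   # postmaster: "now exit"
cp.run_once()
```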
The Reaper:
Now let's see how the postmaster handles the death/exit of its child processes. Until PostgreSQL v15, the function that did the cleanup when a child process died was literally called "reaper". That was kind of cool.
Whenever a child process exits, the postmaster (the parent process) receives a SIGCHLD signal. Its handler, handle_pm_child_exit_signal, sets pending_pm_child_exit to true.
While looping in ServerLoop, the postmaster sees that pending_pm_child_exit is true and calls process_pm_child_exit(), where the cleanup of the child process happens.
If a child process exits successfully (status 0) through a normal shutdown, any resources that child used are released. If a child process exits abnormally, HandleChildCrash logs the event and handles it by sending SIGABRT/SIGQUIT to all other child processes except the syslogger (as explained above).
When the postmaster sees that all the child processes are gone after an abnormal exit, tracked via the FatalError flag set in HandleChildCrash, it reinitializes the server: the startup process is launched again and crash recovery kicks in.
If the startup process exits normally, the server is consistent and ready to accept new connections; on a replica node, however, the startup process keeps running to replay the WAL received from the primary.
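The basic reaping pattern, receiving a child's exit via waitpid and branching on normal vs. abnormal exit, can be sketched with real POSIX calls (this runs on Linux/macOS; the action strings are illustrative stand-ins for what process_pm_child_exit does):

```python
# Sketch of reaping a child and classifying its exit status.
import os

def reap_one_child():
    pid, status = os.waitpid(-1, 0)   # blocks until some child exits
    if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
        return (pid, "clean exit: release the child's resources")
    # non-zero exit or death by signal: treated as a crash; the postmaster
    # SIGQUITs the other children, then re-initializes the server
    return (pid, "crash: terminate siblings and run crash recovery")

child = os.fork()
if child == 0:
    os._exit(0)          # the child exits normally
pid, action = reap_one_child()
```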