Debugging memory leaks in Postgres, jemalloc edition

June 21, 2025

I've been talking about debugging memory leaks for more than a year now, covering Valgrind, AddressSanitizer, memleak, and heaptrack. But there are still a few more tools to explore1 and today we're going to look at jemalloc, the alternative malloc implementation from2 Meta.

Alternative malloc implementations are popular and practical. Google has tcmalloc, Microsoft has mimalloc, and Meta has jemalloc3. But jemalloc is the only malloc implementation I've seen so far with decent memory leak detection. This matters because AddressSanitizer alone is not sufficient to detect leaks that, for example, only sometimes trigger the OOM killer but otherwise get cleaned up on exit.

1 gperftools and bytehound are on my list to check out eventually.
2 I can't confidently summarize the history, so read this post if you're curious.
3 Other major jemalloc users include FreeBSD and Apache Arrow.

Scenario

In my last post, we introduced two memory leaks into Postgres and debugged them with heaptrack. In this post we'll introduce those same two memory leaks again1 but we will debug them with jemalloc.

While you can easily use jemalloc on macOS, its heap profiling and leak detection aren't supported there. So you'll have to pull out a Linux (virtual) machine.

Although we have been using Postgres as the codebase from which to explore tools for debugging memory leaks, these techniques are relevant for memory leaks in C, C++, and Rust projects in general.

Grab and build Postgres2.

$ git clone https://github.com/postgres/postgres
$ cd postgres
$ git checkout REL_17_STABLE
$ ./configure --without-zlib --without-icu \
    --without-readline --enable-debug --prefix=/usr/local/
$ make -j8 && sudo make install

And grab and build jemalloc.

$ git clone https://github.com/facebook/jemalloc
$ cd jemalloc
$ ./autogen.sh
$ ./configure --enable-prof --enable-prof-frameptr
$ make -j8 && sudo make install

1 Much of the code and text of this post is taken from the previous post, my apologies.
2 I don't normally demonstrate installing globally but I'm running this in a dedicated virtual machine so installing globally doesn't bother me.

A leak in postmaster

Every time a Postgres backend process starts up, it is forked by the postmaster process. Let's introduce a memory leak into postmaster.

$ git diff src/backend/postmaster
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index d032091495b..e0bf8943763 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -3547,6 +3547,13 @@ BackendStartup(ClientSocket *client_sock)
       Backend    *bn;                         /* for backend cleanup */
       pid_t           pid;
       BackendStartupData startup_data;
+       MemoryContext old;
+       int *s;
+
+       old = MemoryContextSwitchTo(TopMemoryContext);
+       s = palloc(8321);
+       *s = 12;
+       MemoryContextSwitchTo(old);
       /*
        * Create backend data structure.  Better before the fork() so we can

Remember that Postgres allocates memory in nested arenas called MemoryContexts. The top-level arena is called TopMemoryContext and it is freed only as the process exits. Excessive allocations (leaks) in TopMemoryContext would not be caught by Valgrind memcheck or LeakSanitizer, because the memory is in fact freed at exit, along with the rest of TopMemoryContext. But while the process is alive, the above leak is real.

(If we switch from palloc to malloc above, LeakSanitizer does catch this leak. I didn't try Valgrind memcheck but it probably catches this too.)

An easy way to trigger this leak is by executing a ton of separate psql clients that create tons of Postgres client backend processes.

$ for run in {1..100000}; do psql postgres -c 'select 1'; done

With the diff above in place, rebuild and reinstall Postgres.

$ make -j8 && make install

Create a database and run postgres with the jemalloc library in LD_PRELOAD. In MALLOC_CONF, prof_leak:true enables leak reporting, lg_prof_sample:0 samples every allocation (rather than jemalloc's coarser default sampling interval), and prof_final:true dumps a final profile as the process exits.

$ initdb testdb
$ MALLOC_CONF=prof_leak:true,lg_prof_sample:0,prof_final:true \
  LD_PRELOAD=/usr/local/lib/libjemalloc.so \
  postgres -D $(pwd)/testdb
2025-06-21 12:25:07.576 EDT [640443] LOG:  starting PostgreSQL 17.5 on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14+deb12u1) 12.2.0, 64-bit
2025-06-21 12:25:07.577 EDT [640443] LOG:  listening on IPv6 address "::1", port 5432
2025-06-21 12:25:07.577 EDT [640443] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2025-06-21 12:25:07.578 EDT [640443] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2025-06-21 12:25:07.582 EDT [640446] LOG:  database system was shut down at 2025-06-21 12:24:52 EDT
<jemalloc>: Leak approximation summary: ~423600 bytes, ~109 objects, >= 65 contexts
<jemalloc>: Run jeprof on dump output for leak detail
2025-06-21 12:25:07.586 EDT [640443] LOG:  database system is ready to accept connections

In another terminal we'll exercise the leaking workload.

$ for run in {1..100000}; do psql postgres -c 'select 1'; done

If you want to watch the memory usage climb while this workload is running, open top in another terminal.

When that is done we should have leaked a good deal of memory. Hit Control-C on the postgres process and now we can see what jemalloc tells us. We'll look specifically at the heap file for the postmaster process, whose PID, 640443, was shown above in brackets.

$ jeprof --lines --inuse_space `which postgres` testdb/jeprof.640443.0.f.heap
Using local file /usr/local/bin/postgres.
Using local file testdb/jeprof.640443.0.f.heap.
Welcome to jeprof!  For help, type 'help'.
(jeprof)

Now run top --cum to see the stack traces with the most cumulative memory in-use.

(jeprof) top --cum
Total: 976.9 MB
    0.0   0.0%   0.0%    976.8 100.0% __libc_init_first@@GLIBC_2.17 ??:?
    0.0   0.0%   0.0%    976.8 100.0% __libc_start_main@GLIBC_2.17 ??:?
    0.0   0.0%   0.0%    976.8 100.0% _start ??:?
    0.0   0.0%   0.0%    976.8 100.0% main /home/phil/postgres/src/backend/main/main.c:199
  976.7 100.0% 100.0%    976.7 100.0% AllocSetAllocLarge /home/phil/postgres/src/backend/utils/mmgr/aset.c:715
    0.0   0.0% 100.0%    976.6 100.0% PostmasterMain /home/phil/postgres/src/backend/postmaster/postmaster.c:1374
    0.0   0.0% 100.0%    976.6 100.0% ServerLoop.isra.0 /home/phil/postgres/src/backend/postmaster/postmaster.c:1676
    0.0   0.0% 100.0%    976.6 100.0% BackendStartup (inline) /home/phil/postgres/src/backend/postmaster/postmaster.c:3555
    0.0   0.0% 100.0%      0.1   0.0% PostmasterMain /home/phil/postgres/src/backend/postmaster/postmaster.c:585
    0.0   0.0% 100.0%      0.1   0.0% MemoryContextAllocExtended /home/phil/postgres/src/backend/utils/mmgr/mcxt.c:1250 (discriminator 5)

And immediately we see a huge amount of in-use memory attributed to exactly the line where we started leakily palloc-ing in src/backend/postmaster/postmaster.c. That's perfect!

Let's introduce a leak in another Postgres process and see if we can catch that too.

A leak in a client backend

Let's leak memory in TopMemoryContext in the implementation of random().

$ git diff src/backend/utils/
diff --git a/src/backend/utils/adt/pseudorandomfuncs.c b/src/backend/utils/adt/pseudorandomfuncs.c
index 8e82c7078c5..886efbfaf78 100644
--- a/src/backend/utils/adt/pseudorandomfuncs.c
+++ b/src/backend/utils/adt/pseudorandomfuncs.c
@@ -20,6 +20,7 @@
#include "utils/fmgrprotos.h"
#include "utils/numeric.h"
#include "utils/timestamp.h"
+#include "utils/memutils.h"

/* Shared PRNG state used by all the random functions */
static pg_prng_state prng_state;
@@ -84,6 +85,13 @@ Datum
drandom(PG_FUNCTION_ARGS)
{
       float8          result;
+       int* s;
+       MemoryContext old;
+
+       old = MemoryContextSwitchTo(TopMemoryContext);
+       s = palloc(100);
+       MemoryContextSwitchTo(old);
+       *s = 90;
       initialize_prng();

We can trigger this leak by executing random() a bunch of times. For example with SELECT sum(random()) FROM generate_series(1, 10_000_000);.

Build and install Postgres with this diff.

$ make -j16 && make install

And start up Postgres again against the testdb we created before.

$ MALLOC_CONF=prof_leak:true,lg_prof_sample:0,prof_final:true \
  LD_PRELOAD=/usr/local/lib/libjemalloc.so \
  postgres -D $(pwd)/testdb
2025-06-21 13:10:39.766 EDT [845169] LOG:  starting PostgreSQL 17.5 on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14+deb12u1) 12.2.0, 64-bit
2025-06-21 13:10:39.767 EDT [845169] LOG:  listening on IPv6 address "::1", port 5432
2025-06-21 13:10:39.767 EDT [845169] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2025-06-21 13:10:39.767 EDT [845169] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2025-06-21 13:10:39.769 EDT [845172] LOG:  database system was shut down at 2025-06-21 13:10:27 EDT
<jemalloc>: Leak approximation summary: ~423600 bytes, ~109 objects, >= 65 contexts
<jemalloc>: Run jeprof on dump output for leak detail
2025-06-21 13:10:39.771 EDT [845169] LOG:  database system is ready to accept connections

In a new terminal, start a psql session and find the corresponding client backend PID with pg_backend_pid().

$ psql postgres
psql (17.5)
Type "help" for help.
postgres=# select pg_backend_pid();
pg_backend_pid
----------------
        845177
(1 row)
postgres=#

Now run the leaking workload.

postgres=# SELECT sum(random()) FROM generate_series(1, 10_000_000);
       sum
-------------------
499960.8137393289
(1 row)

Now hit Control-D to exit psql gracefully. And hit Control-C on the postgres process to exit it gracefully too.

Now load jeprof with the profile file corresponding to the backend in which we leaked.

$ jeprof --lines --inuse_space `which postgres` testdb/jeprof.845177.0.f.heap
Using local file /usr/local/bin/postgres.
Using local file testdb/jeprof.845177.0.f.heap.
Welcome to jeprof!  For help, type 'help'.
(jeprof)

Run top --cum like before.

(jeprof) top --cum
Total: 1305.8 MB
    0.0   0.0%   0.0%   1305.7 100.0% __libc_init_first@@GLIBC_2.17 ??:?
    0.0   0.0%   0.0%   1305.7 100.0% __libc_start_main@GLIBC_2.17 ??:?
    0.0   0.0%   0.0%   1305.7 100.0% _start ??:?
    0.0   0.0%   0.0%   1305.7 100.0% main /home/phil/postgres/src/backend/main/main.c:199
    0.0   0.0%   0.0%   1305.5 100.0% PostmasterMain /home/phil/postgres/src/backend/postmaster/postmaster.c:1374
    0.0   0.0%   0.0%   1305.5 100.0% ServerLoop.isra.0 /home/phil/postgres/src/backend/postmaster/postmaster.c:1676
    0.0   0.0%   0.0%   1305.5 100.0% BackendStartup (inline) /home/phil/postgres/src/backend/postmaster/postmaster.c:3603
    0.0   0.0%   0.0%   1305.5 100.0% postmaster_child_launch /home/phil/postgres/src/backend/postmaster/launch_backend.c:277
    0.0   0.0%   0.0%   1305.4 100.0% BackendMain /home/phil/postgres/src/backend/tcop/backend_startup.c:105
 1305.1 100.0% 100.0%   1305.1 100.0% AllocSetAllocFromNewBlock /home/phil/postgres/src/backend/utils/mmgr/aset.c:919

Well, we see some large allocations, but not yet enough information to place the leak. The default top command limits output to 10 entries. We can use top30 --cum to see more.

(jeprof) top30 --cum
Total: 1305.8 MB
    0.0   0.0%   0.0%   1305.7 100.0% __libc_init_first@@GLIBC_2.17 ??:?
    0.0   0.0%   0.0%   1305.7 100.0% __libc_start_main@GLIBC_2.17 ??:?
    0.0   0.0%   0.0%   1305.7 100.0% _start ??:?
    0.0   0.0%   0.0%   1305.7 100.0% main /home/phil/postgres/src/backend/main/main.c:199
    0.0   0.0%   0.0%   1305.5 100.0% PostmasterMain /home/phil/postgres/src/backend/postmaster/postmaster.c:1374
    0.0   0.0%   0.0%   1305.5 100.0% ServerLoop.isra.0 /home/phil/postgres/src/backend/postmaster/postmaster.c:1676
    0.0   0.0%   0.0%   1305.5 100.0% BackendStartup (inline) /home/phil/postgres/src/backend/postmaster/postmaster.c:3603
    0.0   0.0%   0.0%   1305.5 100.0% postmaster_child_launch /home/phil/postgres/src/backend/postmaster/launch_backend.c:277
    0.0   0.0%   0.0%   1305.4 100.0% BackendMain /home/phil/postgres/src/backend/tcop/backend_startup.c:105
 1305.1 100.0% 100.0%   1305.1 100.0% AllocSetAllocFromNewBlock /home/phil/postgres/src/backend/utils/mmgr/aset.c:919
    0.0   0.0% 100.0%   1304.0  99.9% PostgresMain /home/phil/postgres/src/backend/tcop/postgres.c:4767
    0.0   0.0% 100.0%   1304.0  99.9% PortalRun /home/phil/postgres/src/backend/tcop/pquery.c:766
    0.0   0.0% 100.0%   1304.0  99.9% PortalRunSelect /home/phil/postgres/src/backend/tcop/pquery.c:922
    0.0   0.0% 100.0%   1304.0  99.9% exec_simple_query /home/phil/postgres/src/backend/tcop/postgres.c:1278
    0.0   0.0% 100.0%   1304.0  99.9% ExecAgg /home/phil/postgres/src/backend/executor/nodeAgg.c:2179
    0.0   0.0% 100.0%   1304.0  99.9% ExecEvalExprSwitchContext (inline) /home/phil/postgres/src/backend/executor/../../../src/include/executor/executor.h:356
    0.0   0.0% 100.0%   1304.0  99.9% ExecInterpExpr /home/phil/postgres/src/backend/executor/execExprInterp.c:740
    0.0   0.0% 100.0%   1304.0  99.9% ExecProcNode (inline) /home/phil/postgres/src/backend/executor/../../../src/include/executor/executor.h:274
    0.0   0.0% 100.0%   1304.0  99.9% ExecutePlan (inline) /home/phil/postgres/src/backend/executor/execMain.c:1649
    0.0   0.0% 100.0%   1304.0  99.9% advance_aggregates (inline) /home/phil/postgres/src/backend/executor/nodeAgg.c:820
    0.0   0.0% 100.0%   1304.0  99.9% agg_retrieve_direct (inline) /home/phil/postgres/src/backend/executor/nodeAgg.c:2454
    0.0   0.0% 100.0%   1304.0  99.9% drandom /home/phil/postgres/src/backend/utils/adt/pseudorandomfuncs.c:93
    0.0   0.0% 100.0%   1304.0  99.9% standard_ExecutorRun /home/phil/postgres/src/backend/executor/execMain.c:361
    0.0   0.0% 100.0%      1.3   0.1% PostgresMain /home/phil/postgres/src/backend/tcop/postgres.c:4324
    0.0   0.0% 100.0%      0.9   0.1% InitPostgres /home/phil/postgres/src/backend/utils/init/postinit.c:1194 (discriminator 5)
    0.0   0.0% 100.0%      0.9   0.1% InitCatalogCachePhase2 /home/phil/postgres/src/backend/utils/cache/syscache.c:187 (discriminator 3)
    0.0   0.0% 100.0%      0.9   0.1% RelationCacheInitializePhase3 /home/phil/postgres/src/backend/utils/cache/relcache.c:4372
    0.0   0.0% 100.0%      0.6   0.0% RelationBuildDesc /home/phil/postgres/src/backend/utils/cache/relcache.c:1208
    0.0   0.0% 100.0%      0.6   0.0% RelationIdGetRelation /home/phil/postgres/src/backend/utils/cache/relcache.c:2116
    0.0   0.0% 100.0%      0.6   0.0% index_open /home/phil/postgres/src/backend/access/index/indexam.c:137

And we found our leak: drandom at src/backend/utils/adt/pseudorandomfuncs.c:93 accounts for nearly all of the in-use memory.
