I've been talking about debugging memory leaks for more than a year now, covering Valgrind, AddressSanitizer, memleak, and heaptrack. But there are still a few more tools to explore[1], and today we're going to look at jemalloc, the alternative malloc implementation from Meta[2].
Alternative malloc implementations are popular and practical. Google has tcmalloc, Microsoft has mimalloc, and Meta has jemalloc[3]. But jemalloc is the only malloc implementation I've seen so far with decent memory leak detection. This matters because AddressSanitizer is not sufficient to detect leaks that, for example, only sometimes trigger the OOM killer but otherwise get cleaned up on exit.
[1] gperftools and bytehound are on my list to check out eventually.
[2] I can't confidently summarize the history, so read this post if you're curious.
[3] Other major jemalloc users include FreeBSD and Apache Arrow.
Scenario
In my last post, we introduced two memory leaks into Postgres and debugged them with heaptrack. In this post we'll introduce those same two memory leaks again[1], but we will debug them with jemalloc.
While you can easily use jemalloc on macOS, its heap profiling and leak detection aren't supported there, so you'll have to pull out a Linux (virtual) machine.
Although we have been using Postgres as the codebase from which to explore tools for debugging memory leaks, these techniques are relevant for memory leaks in C, C++, and Rust projects in general.
Grab and build Postgres[2].
$ git clone https://github.com/postgres/postgres
$ cd postgres
$ git checkout REL_17_STABLE
$ ./configure --without-zlib --without-icu \
--without-readline --enable-debug --prefix=/usr/local/
$ make -j8 && sudo make install
And grab and build jemalloc.
$ git clone https://github.com/facebook/jemalloc
$ cd jemalloc
$ ./autogen.sh
$ ./configure --enable-prof --enable-prof-frameptr
$ make -j8 && sudo make install
[1] Much of the code and text of this post is taken from the previous post, my apologies.
[2] I don't normally demonstrate installing globally, but I'm running this in a dedicated virtual machine, so installing globally doesn't bother me.
A leak in postmaster
Every time a Postgres process starts up it is scheduled by the postmaster process. Let's introduce a memory leak into postmaster.
$ git diff src/backend/postmaster
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index d032091495b..e0bf8943763 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -3547,6 +3547,13 @@ BackendStartup(ClientSocket *client_sock)
Backend *bn; /* for backend cleanup */
pid_t pid;
BackendStartupData startup_data;
+ MemoryContext old;
+ int *s;
+
+ old = MemoryContextSwitchTo(TopMemoryContext);
+ s = palloc(8321);
+ *s = 12;
+ MemoryContextSwitchTo(old);
/*
* Create backend data structure. Better before the fork() so we can
Remember that Postgres allocates memory in nested arenas called MemoryContexts. The top-level arena is called TopMemoryContext, and it is freed as the process exits. Excessive allocations (leaks) in TopMemoryContext would not be caught by Valgrind memcheck or LeakSanitizer, because that memory remains reachable and is actually freed, along with the rest of TopMemoryContext, when the process exits. But while the process is alive, the above leak is real.
(If we switch from palloc to malloc above, LeakSanitizer does catch this leak. I didn't try Valgrind memcheck, but it probably catches it too.)
An easy way to trigger this leak is by executing a ton of separate psql clients that create tons of Postgres client backend processes.
$ for run in {1..100000}; do psql postgres -c 'select 1'; done
With the diff above in place, rebuild and reinstall Postgres.
$ make -j8 && make install
Create a database and run postgres with the jemalloc library in LD_PRELOAD. In MALLOC_CONF, prof_leak:true turns on leak reporting at exit, lg_prof_sample:0 samples every allocation rather than a subset, and prof_final:true dumps a final heap profile when the process exits.
$ initdb testdb
$ MALLOC_CONF=prof_leak:true,lg_prof_sample:0,prof_final:true \
LD_PRELOAD=/usr/local/lib/libjemalloc.so \
postgres -D $(pwd)/testdb
2025-06-21 12:25:07.576 EDT [640443] LOG: starting PostgreSQL 17.5 on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14+deb12u1) 12.2.0, 64-bit
2025-06-21 12:25:07.577 EDT [640443] LOG: listening on IPv6 address "::1", port 5432
2025-06-21 12:25:07.577 EDT [640443] LOG: listening on IPv4 address "127.0.0.1", port 5432
2025-06-21 12:25:07.578 EDT [640443] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2025-06-21 12:25:07.582 EDT [640446] LOG: database system was shut down at 2025-06-21 12:24:52 EDT
<jemalloc>: Leak approximation summary: ~423600 bytes, ~109 objects, >= 65 contexts
<jemalloc>: Run jeprof on dump output for leak detail
2025-06-21 12:25:07.586 EDT [640443] LOG: database system is ready to accept connections
In another terminal we'll exercise the leaking workload.
$ for run in {1..100000}; do psql postgres -c 'select 1'; done
If you want to watch the memory usage climb while this workload is running, open top in another terminal.
When that is done, we should have leaked a good deal of memory. Hit Control-C on the postgres process, and now we can see what jemalloc tells us. We'll look specifically at the heap file for the postmaster process, whose PID, 640443, was shown above in brackets.
$ jeprof --lines --inuse_space `which postgres` testdb/jeprof.640443.0.f.heap
Using local file /usr/local/bin/postgres.
Using local file testdb/jeprof.640443.0.f.heap.
Welcome to jeprof! For help, type 'help'.
(jeprof)
Now run top --cum to see the stack traces with the most cumulative memory in-use. (The first two columns show memory allocated directly in that frame; the fourth and fifth include memory allocated in its callees.)
(jeprof) top --cum
Total: 976.9 MB
0.0 0.0% 0.0% 976.8 100.0% __libc_init_first@@GLIBC_2.17 ??:?
0.0 0.0% 0.0% 976.8 100.0% __libc_start_main@GLIBC_2.17 ??:?
0.0 0.0% 0.0% 976.8 100.0% _start ??:?
0.0 0.0% 0.0% 976.8 100.0% main /home/phil/postgres/src/backend/main/main.c:199
976.7 100.0% 100.0% 976.7 100.0% AllocSetAllocLarge /home/phil/postgres/src/backend/utils/mmgr/aset.c:715
0.0 0.0% 100.0% 976.6 100.0% PostmasterMain /home/phil/postgres/src/backend/postmaster/postmaster.c:1374
0.0 0.0% 100.0% 976.6 100.0% ServerLoop.isra.0 /home/phil/postgres/src/backend/postmaster/postmaster.c:1676
0.0 0.0% 100.0% 976.6 100.0% BackendStartup (inline) /home/phil/postgres/src/backend/postmaster/postmaster.c:3555
0.0 0.0% 100.0% 0.1 0.0% PostmasterMain /home/phil/postgres/src/backend/postmaster/postmaster.c:585
0.0 0.0% 100.0% 0.1 0.0% MemoryContextAllocExtended /home/phil/postgres/src/backend/utils/mmgr/mcxt.c:1250 (discriminator 5)
And immediately we see this huge jump in in-use memory at exactly the line where we started leakily palloc-ing in src/backend/postmaster/postmaster.c. That's perfect!
Let's introduce a leak in another Postgres process and see if we can catch that too.
A leak in a client backend
Let's leak memory in TopMemoryContext in the implementation of random().
$ git diff src/backend/utils/
diff --git a/src/backend/utils/adt/pseudorandomfuncs.c b/src/backend/utils/adt/pseudorandomfuncs.c
index 8e82c7078c5..886efbfaf78 100644
--- a/src/backend/utils/adt/pseudorandomfuncs.c
+++ b/src/backend/utils/adt/pseudorandomfuncs.c
@@ -20,6 +20,7 @@
#include "utils/fmgrprotos.h"
#include "utils/numeric.h"
#include "utils/timestamp.h"
+#include "utils/memutils.h"
/* Shared PRNG state used by all the random functions */
static pg_prng_state prng_state;
@@ -84,6 +85,13 @@ Datum
drandom(PG_FUNCTION_ARGS)
{
float8 result;
+ int* s;
+ MemoryContext old;
+
+ old = MemoryContextSwitchTo(TopMemoryContext);
+ s = palloc(100);
+ MemoryContextSwitchTo(old);
+ *s = 90;
initialize_prng();
We can trigger this leak by executing random() a bunch of times, for example with SELECT sum(random()) FROM generate_series(1, 10_000_000);.
Build and install Postgres with this diff.
$ make -j16 && make install
And start up Postgres again against the testdb we created before.
$ MALLOC_CONF=prof_leak:true,lg_prof_sample:0,prof_final:true \
LD_PRELOAD=/usr/local/lib/libjemalloc.so \
postgres -D $(pwd)/testdb
2025-06-21 13:10:39.766 EDT [845169] LOG: starting PostgreSQL 17.5 on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14+deb12u1) 12.2.0, 64-bit
2025-06-21 13:10:39.767 EDT [845169] LOG: listening on IPv6 address "::1", port 5432
2025-06-21 13:10:39.767 EDT [845169] LOG: listening on IPv4 address "127.0.0.1", port 5432
2025-06-21 13:10:39.767 EDT [845169] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2025-06-21 13:10:39.769 EDT [845172] LOG: database system was shut down at 2025-06-21 13:10:27 EDT
<jemalloc>: Leak approximation summary: ~423600 bytes, ~109 objects, >= 65 contexts
<jemalloc>: Run jeprof on dump output for leak detail
2025-06-21 13:10:39.771 EDT [845169] LOG: database system is ready to accept connections
In a new terminal, start a psql session and find the corresponding client backend PID with pg_backend_pid().
$ psql postgres
psql (17.5)
Type "help" for help.
postgres=# select pg_backend_pid();
pg_backend_pid
----------------
845177
(1 row)
postgres=#
Now run the leaking workload.
postgres=# SELECT sum(random()) FROM generate_series(1, 10_000_000);
sum
-------------------
499960.8137393289
(1 row)
Now hit Control-D to exit psql gracefully, and hit Control-C on the postgres process to exit it gracefully too.
Now load jeprof with the profile file corresponding to the backend in which we leaked.
$ jeprof --lines --inuse_space `which postgres` testdb/jeprof.845177.0.f.heap
Using local file /usr/local/bin/postgres.
Using local file testdb/jeprof.845177.0.f.heap.
Welcome to jeprof! For help, type 'help'.
(jeprof)
Run top --cum like before.
(jeprof) top --cum
Total: 1305.8 MB
0.0 0.0% 0.0% 1305.7 100.0% __libc_init_first@@GLIBC_2.17 ??:?
0.0 0.0% 0.0% 1305.7 100.0% __libc_start_main@GLIBC_2.17 ??:?
0.0 0.0% 0.0% 1305.7 100.0% _start ??:?
0.0 0.0% 0.0% 1305.7 100.0% main /home/phil/postgres/src/backend/main/main.c:199
0.0 0.0% 0.0% 1305.5 100.0% PostmasterMain /home/phil/postgres/src/backend/postmaster/postmaster.c:1374
0.0 0.0% 0.0% 1305.5 100.0% ServerLoop.isra.0 /home/phil/postgres/src/backend/postmaster/postmaster.c:1676
0.0 0.0% 0.0% 1305.5 100.0% BackendStartup (inline) /home/phil/postgres/src/backend/postmaster/postmaster.c:3603
0.0 0.0% 0.0% 1305.5 100.0% postmaster_child_launch /home/phil/postgres/src/backend/postmaster/launch_backend.c:277
0.0 0.0% 0.0% 1305.4 100.0% BackendMain /home/phil/postgres/src/backend/tcop/backend_startup.c:105
1305.1 100.0% 100.0% 1305.1 100.0% AllocSetAllocFromNewBlock /home/phil/postgres/src/backend/utils/mmgr/aset.c:919
Well, we see some large allocations, but not yet enough info. The default top command limits output to 10 lines. We can use top30 --cum to see more.
(jeprof) top30 --cum
Total: 1305.8 MB
0.0 0.0% 0.0% 1305.7 100.0% __libc_init_first@@GLIBC_2.17 ??:?
0.0 0.0% 0.0% 1305.7 100.0% __libc_start_main@GLIBC_2.17 ??:?
0.0 0.0% 0.0% 1305.7 100.0% _start ??:?
0.0 0.0% 0.0% 1305.7 100.0% main /home/phil/postgres/src/backend/main/main.c:199
0.0 0.0% 0.0% 1305.5 100.0% PostmasterMain /home/phil/postgres/src/backend/postmaster/postmaster.c:1374
0.0 0.0% 0.0% 1305.5 100.0% ServerLoop.isra.0 /home/phil/postgres/src/backend/postmaster/postmaster.c:1676
0.0 0.0% 0.0% 1305.5 100.0% BackendStartup (inline) /home/phil/postgres/src/backend/postmaster/postmaster.c:3603
0.0 0.0% 0.0% 1305.5 100.0% postmaster_child_launch /home/phil/postgres/src/backend/postmaster/launch_backend.c:277
0.0 0.0% 0.0% 1305.4 100.0% BackendMain /home/phil/postgres/src/backend/tcop/backend_startup.c:105
1305.1 100.0% 100.0% 1305.1 100.0% AllocSetAllocFromNewBlock /home/phil/postgres/src/backend/utils/mmgr/aset.c:919
0.0 0.0% 100.0% 1304.0 99.9% PostgresMain /home/phil/postgres/src/backend/tcop/postgres.c:4767
0.0 0.0% 100.0% 1304.0 99.9% PortalRun /home/phil/postgres/src/backend/tcop/pquery.c:766
0.0 0.0% 100.0% 1304.0 99.9% PortalRunSelect /home/phil/postgres/src/backend/tcop/pquery.c:922
0.0 0.0% 100.0% 1304.0 99.9% exec_simple_query /home/phil/postgres/src/backend/tcop/postgres.c:1278
0.0 0.0% 100.0% 1304.0 99.9% ExecAgg /home/phil/postgres/src/backend/executor/nodeAgg.c:2179
0.0 0.0% 100.0% 1304.0 99.9% ExecEvalExprSwitchContext (inline) /home/phil/postgres/src/backend/executor/../../../src/include/executor/executor.h:356
0.0 0.0% 100.0% 1304.0 99.9% ExecInterpExpr /home/phil/postgres/src/backend/executor/execExprInterp.c:740
0.0 0.0% 100.0% 1304.0 99.9% ExecProcNode (inline) /home/phil/postgres/src/backend/executor/../../../src/include/executor/executor.h:274
0.0 0.0% 100.0% 1304.0 99.9% ExecutePlan (inline) /home/phil/postgres/src/backend/executor/execMain.c:1649
0.0 0.0% 100.0% 1304.0 99.9% advance_aggregates (inline) /home/phil/postgres/src/backend/executor/nodeAgg.c:820
0.0 0.0% 100.0% 1304.0 99.9% agg_retrieve_direct (inline) /home/phil/postgres/src/backend/executor/nodeAgg.c:2454
0.0 0.0% 100.0% 1304.0 99.9% drandom /home/phil/postgres/src/backend/utils/adt/pseudorandomfuncs.c:93
0.0 0.0% 100.0% 1304.0 99.9% standard_ExecutorRun /home/phil/postgres/src/backend/executor/execMain.c:361
0.0 0.0% 100.0% 1.3 0.1% PostgresMain /home/phil/postgres/src/backend/tcop/postgres.c:4324
0.0 0.0% 100.0% 0.9 0.1% InitPostgres /home/phil/postgres/src/backend/utils/init/postinit.c:1194 (discriminator 5)
0.0 0.0% 100.0% 0.9 0.1% InitCatalogCachePhase2 /home/phil/postgres/src/backend/utils/cache/syscache.c:187 (discriminator 3)
0.0 0.0% 100.0% 0.9 0.1% RelationCacheInitializePhase3 /home/phil/postgres/src/backend/utils/cache/relcache.c:4372
0.0 0.0% 100.0% 0.6 0.0% RelationBuildDesc /home/phil/postgres/src/backend/utils/cache/relcache.c:1208
0.0 0.0% 100.0% 0.6 0.0% RelationIdGetRelation /home/phil/postgres/src/backend/utils/cache/relcache.c:2116
0.0 0.0% 100.0% 0.6 0.0% index_open /home/phil/postgres/src/backend/access/index/indexam.c:137
And we found our leak.