SnapViewer Devlog #3: Optimizations
2025-10-02
Intro: Troubleshooting Memory and Speed Performance
Disclaimer: I develop and test primarily on Windows using the latest stable Rust toolchain and CPython 3.13.
1. Background and Motivation
SnapViewer handles large memory snapshots effectively. However, when processing extremely large dumps (e.g., a 1.3 GB snapshot), I ran into serious bottlenecks:
- Format conversion triggered memory peaks around 30 GB.
- Data loading caused another 30 GB spike.
2. Profile-Guided Optimization
I decomposed the data-loading pipeline into its stages:
- Reading the compressed file
- Extracting JSON from the compressed stream
- Deserializing JSON into Rust structures
- Populating an in-memory SQLite database
- Building the triangle mesh on CPU
- Initializing the rendering window
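One simple way to see which of these stages dominates is to wrap each one in a timer. The sketch below is only an illustration: the `timed` helper and the stand-in stages are hypothetical and are not taken from SnapViewer's code.

```rust
use std::time::{Duration, Instant};

/// Runs one pipeline stage, records how long it took, and returns its result.
/// `timed` is a hypothetical helper, not part of SnapViewer.
fn timed<T>(
    name: &'static str,
    log: &mut Vec<(&'static str, Duration)>,
    stage: impl FnOnce() -> T,
) -> T {
    let start = Instant::now();
    let out = stage();
    log.push((name, start.elapsed()));
    out
}

fn main() {
    let mut log = Vec::new();
    // Stand-in stages; real ones would read, decompress, and parse the snapshot.
    let bytes = timed("read", &mut log, || vec![1u8; 1024]);
    let total = timed("sum", &mut log, || {
        bytes.iter().map(|&b| b as u64).sum::<u64>()
    });
    for (name, elapsed) in &log {
        println!("{name}: {elapsed:?}");
    }
    println!("checksum: {total}");
}
```

Timing alone does not show memory peaks, so it would normally be paired with a memory profiler or the OS task manager.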
Eliminating Redundant Clones
- First attempt: switch from Vec<T> to &[T]. Failed due to lifetimes.
- Final solution: use Arc<[T]>. No significant overhead observed.
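To illustrate why Arc<[T]> avoids both the clones and the lifetime problems, here is a minimal sketch. The `Allocation` and `MeshBuilder` types and the `share` function are assumptions made up for the example; the real structures are not shown in this post.

```rust
use std::sync::Arc;

/// Hypothetical record type; the post does not show the real structures.
#[derive(Debug, Clone, PartialEq)]
struct Allocation {
    size: u64,
    timestamp: u64,
}

/// Converts the parsed Vec into a shared, immutable slice.
/// The elements are moved once into a reference-counted allocation.
fn share(parsed: Vec<Allocation>) -> Arc<[Allocation]> {
    Arc::from(parsed)
}

/// Example consumer: it owns a handle, so it needs no lifetime parameter.
struct MeshBuilder {
    allocs: Arc<[Allocation]>,
}

impl MeshBuilder {
    fn total_size(&self) -> u64 {
        self.allocs.iter().map(|a| a.size).sum()
    }
}

fn main() {
    let shared = share(vec![
        Allocation { size: 64, timestamp: 1 },
        Allocation { size: 128, timestamp: 2 },
    ]);
    // Arc::clone copies a pointer and bumps a counter; elements are not copied.
    let mesh = MeshBuilder { allocs: Arc::clone(&shared) };
    println!("handles: {}", Arc::strong_count(&shared));
    println!("total size: {}", mesh.total_size());
    println!("last timestamp: {:?}", shared.last().map(|a| a.timestamp));
}
```

Unlike &[T], each consumer holds its own owning handle, so the borrow checker has no lifetime to track across pipeline stages.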
Early Deallocation of Intermediate Structures
- Use scoped blocks to limit lifetimes
- Explicitly invoke drop() on unneeded buffers
Peak memory dropped by roughly one-third.
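A minimal sketch of both techniques, assuming a made-up input format (one size per line). The `parse_sizes` and `load` functions are hypothetical stand-ins for the real loading code.

```rust
/// Stand-in for a parsing step: reads one unsigned size per line.
fn parse_sizes(raw: &[u8]) -> Vec<u64> {
    raw.split(|&b| b == b'\n')
        .filter_map(|line| std::str::from_utf8(line).ok()?.trim().parse::<u64>().ok())
        .collect()
}

/// Loads sizes from a raw buffer while keeping intermediates short-lived.
fn load(raw: Vec<u8>) -> u64 {
    // Scoped block: `cleaned` only lives until the closing brace.
    let sizes = {
        let cleaned: Vec<u8> = raw.iter().copied().filter(|b| *b != b'\r').collect();
        parse_sizes(&cleaned)
    }; // `cleaned` is freed here, before any later work starts.

    // The raw input is no longer needed, so release it now
    // instead of waiting for the end of the function.
    drop(raw);

    sizes.iter().sum()
}

fn main() {
    let raw = b"10\r\n20\n30".to_vec();
    println!("total: {}", load(raw));
}
```

In a function this small the effect is negligible; the pattern matters when several multi-gigabyte intermediates would otherwise stay alive at the same time.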
3. Sharding JSON Deserialization
- Shard JSON data into chunks of at most 50,000 entries.
- Deserialize each chunk independently.
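The idea can be sketched without any JSON library. In the example below, `parse_entry` is a hypothetical stand-in for a real JSON deserializer, and the "size,timestamp" record format is an assumption made only to keep the example dependency-free.

```rust
/// Upper bound on entries per shard, taken from the text above.
const MAX_ENTRIES_PER_SHARD: usize = 50_000;

/// Stand-in for a real JSON deserializer: parses a "size,timestamp" record.
/// The record format is hypothetical.
fn parse_entry(raw: &str) -> Option<(u64, u64)> {
    let (size, ts) = raw.split_once(',')?;
    Some((size.trim().parse::<u64>().ok()?, ts.trim().parse::<u64>().ok()?))
}

/// Deserializes one shard on its own. The returned Vec is the only
/// intermediate buffer that exists for this shard.
fn parse_shard(shard: &[&str]) -> Vec<(u64, u64)> {
    shard.iter().filter_map(|raw| parse_entry(raw)).collect()
}

/// Processes records shard by shard and returns (shard count, total size).
/// `shard_len` must be non-zero, because `chunks` panics on zero.
fn load_sharded(records: &[&str], shard_len: usize) -> (usize, u64) {
    let mut shards = 0;
    let mut total_size = 0u64;
    for shard in records.chunks(shard_len) {
        let parsed = parse_shard(shard);
        total_size += parsed.iter().map(|(size, _)| size).sum::<u64>();
        shards += 1;
        // `parsed` is dropped at the end of each iteration, so at most
        // one shard's worth of intermediate data is alive at a time.
    }
    (shards, total_size)
}

fn main() {
    let records = ["1,10", "2,20", "3,30"];
    let (shards, total) = load_sharded(&records, MAX_ENTRIES_PER_SHARD);
    println!("{shards} shard(s), {total} bytes");
}
```

The point of the bound is that the deserializer's working memory scales with the shard size rather than with the whole snapshot.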
4. Redesigning the Snapshot Format
I split the snapshot into two files:
- allocations.json: lightweight JSON with timestamps and sizes.
- elements.db: SQLite database holding call-stack text.
At runtime:
- Load allocations.json into memory.
- Open elements.db on disk.
- On click, query elements.db.
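The runtime flow above can be sketched as follows. To keep the example free of external crates, an in-memory map stands in for elements.db; in practice a SQLite binding would sit behind the same interface. The `CallStackStore` trait, `MapStore`, and `Viewer` are hypothetical names, not SnapViewer's actual types.

```rust
use std::collections::HashMap;

/// Lightweight record kept in memory (mirrors allocations.json).
struct Allocation {
    size: u64,
    timestamp: u64,
}

/// Anything that can fetch call-stack text for an allocation on demand.
/// Hypothetical trait; elements.db would play this role in the real viewer.
trait CallStackStore {
    fn call_stack(&self, index: usize) -> Option<String>;
}

/// In-memory stand-in so the sketch runs without a SQLite dependency.
struct MapStore(HashMap<usize, String>);

impl CallStackStore for MapStore {
    fn call_stack(&self, index: usize) -> Option<String> {
        self.0.get(&index).cloned()
    }
}

struct Viewer<S: CallStackStore> {
    allocations: Vec<Allocation>, // small, always resident
    store: S,                     // large, queried lazily
}

impl<S: CallStackStore> Viewer<S> {
    /// Called when the user clicks allocation `index`.
    fn on_click(&self, index: usize) -> Option<String> {
        let alloc = self.allocations.get(index)?;
        let stack = self.store.call_stack(index)?;
        Some(format!("{} bytes at t={}\n{}", alloc.size, alloc.timestamp, stack))
    }
}

fn main() {
    let mut stacks = HashMap::new();
    stacks.insert(0usize, String::from("main -> alloc"));
    let viewer = Viewer {
        allocations: vec![Allocation { size: 64, timestamp: 7 }],
        store: MapStore(stacks),
    };
    println!("{:?}", viewer.on_click(0));
}
```

Only the numeric fields needed for rendering stay resident; the bulky call-stack text is fetched one row at a time, when the user asks for it.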
5. Results and Lessons
After these optimizations, SnapViewer:
- No longer spikes to 60+ GB of RAM.
- Starts up much faster.
- Maintains smooth rendering.
What I learned:
- Do not always load everything into memory.
- SQLite is a good choice when you need disk storage with intelligent caching.