Triton Common Pitfalls

From the perspective of a newbie user

The Documentation is a Disaster

Recently, I had to optimize a custom operator and decided to use OpenAI’s Triton. After digging into the documentation, I was shocked at how poorly written it is — like an academic paper full of equations but lacking practical code examples.

If the library operates on tensors, the docs should clearly specify input/output shapes and provide concrete examples (like PyTorch does). Instead, everything is vaguely described in plain text, leaving users to guess the details.

How Triton Fails at Clarity

Take the tl.load documentation as an example. It mentions that block pointers support “boundary checks” and “padding options,” but:

What does “boundary check” actually do?

What’s the “padding option”?

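After some trial and error, I worked out that both options deal with out-of-bounds elements: boundary_check lists the block dimensions to check against the tensor's shape, and padding_option ("zero" or "nan") sets the value returned for elements that fall outside it. This should be stated explicitly, not left for users to reverse-engineer.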
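To make the behavior concrete, here is a minimal sketch in plain Python (not Triton code) of what I understand a 2D block-pointer load to return. The function load_block and its nested-list tensor are hypothetical stand-ins. It models a call like tl.load(block_ptr, boundary_check=(0, 1), padding_option="zero"), and the IndexError stands in for what would be an unchecked out-of-bounds read on the GPU.

```python
import math


def load_block(tensor, offsets, block_shape, boundary_check=(), padding_option=""):
    """Plain-Python model of a 2D block-pointer load (illustration only).

    tensor:         nested list standing in for a 2D tensor
    offsets:        (row, col) of the block's top-left corner
    block_shape:    (rows, cols) of the block to read
    boundary_check: dimensions that are checked against the tensor's shape
    padding_option: "", "zero" or "nan"; "" is modeled as None (undefined)
    """
    shape = (len(tensor), len(tensor[0]))
    fill = {"": None, "zero": 0.0, "nan": math.nan}[padding_option]
    block = []
    for i in range(block_shape[0]):
        row = []
        for j in range(block_shape[1]):
            idx = (offsets[0] + i, offsets[1] + j)
            # Dimensions in which this element falls outside the tensor.
            outside = [d for d in (0, 1) if not 0 <= idx[d] < shape[d]]
            if not outside:
                row.append(tensor[idx[0]][idx[1]])
            elif all(d in boundary_check for d in outside):
                row.append(fill)  # checked dimension: padded, not read
            else:
                # Assumption: on a real GPU this is an unchecked read.
                raise IndexError(f"unchecked out-of-bounds access at {idx}")
        block.append(row)
    return block


tensor = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0],
          [7.0, 8.0, 9.0]]

# A 2x2 block whose corner sits on the last element: three of its four
# elements fall outside the 3x3 tensor and are padded with zeros.
padded = load_block(tensor, offsets=(2, 2), block_shape=(2, 2),
                    boundary_check=(0, 1), padding_option="zero")
print(padded)  # [[9.0, 0.0], [0.0, 0.0]]
```

Without boundary_check, the same call in this model raises an error instead of padding, which is the case the option exists to prevent.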

Another issue: tl.arange requires the length of its range (end - start) to be a power of two, and tl.make_block_ptr requires every dimension of block_shape to be a power of two. I could not find this restriction spelled out where I expected it in the official docs.

Key API Clarifications

tl.load

Shape Constraints

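The number of elements produced by tl.arange and every dimension of a tl.make_block_ptr block shape must be a power of two. For example, tl.arange(0, 128) compiles, while tl.arange(0, 100) is rejected.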
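The usual workaround, as far as I can tell, is to round the block size up to the next power of two and mask off the extra lanes. The sketch below shows the arithmetic in plain Python. The helpers next_power_of_2 and lane_mask are written here for illustration (Triton ships its own triton.next_power_of_2), and the comments show the Triton calls I assume they correspond to.

```python
def next_power_of_2(n: int) -> int:
    """Smallest power of two that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def lane_mask(n_elements: int, block_size: int) -> list:
    """True for lanes that map to real data, False for padding lanes."""
    return [i < n_elements for i in range(block_size)]


n_elements = 100                          # not a power of two
BLOCK_SIZE = next_power_of_2(n_elements)  # rounded up to 128
mask = lane_mask(n_elements, BLOCK_SIZE)

# Inside a kernel, the same idea would look roughly like:
#   offs = tl.arange(0, BLOCK_SIZE)       # BLOCK_SIZE is a tl.constexpr
#   mask = offs < n_elements
#   x = tl.load(x_ptr + offs, mask=mask, other=0.0)

print(BLOCK_SIZE)              # 128
print(sum(mask))               # 100 active lanes
print(BLOCK_SIZE - sum(mask))  # 28 masked-off lanes
```

The masked lanes still occupy registers, so a size just above a power of two (say 129) wastes almost half of the block.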

Memory Access Pitfalls