Mamba stacks mixer layers, which are the equivalent of Attention layers. The core logic of Mamba is held in the MambaMixer class.
If a previous state is passed along, the model uses it in all the blocks.
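To illustrate how a stack of mixer layers can carry state from one call to the next, here is a minimal sketch in plain Python. It is not the library's implementation: ToyMixer and ToyStack are hypothetical names, and the scalar recurrence h_t = a * h_(t-1) + b * x_t is an assumed stand-in for the real selective state-space update inside MambaMixer. The residual connection is also an assumption made for the sketch.

```python
from typing import List, Optional, Tuple


class ToyMixer:
    """Hypothetical stand-in for one mixer layer (not the real MambaMixer).

    Assumed recurrence: h_t = a * h_{t-1} + b * x_t, with output y_t = h_t.
    """

    def __init__(self, a: float, b: float) -> None:
        self.a = a
        self.b = b

    def forward(self, xs: List[float], state: float = 0.0) -> Tuple[List[float], float]:
        h = state
        ys = []
        for x in xs:
            h = self.a * h + self.b * x  # update the recurrent state
            ys.append(h)
        return ys, h  # outputs plus the final state for later reuse


class ToyStack:
    """Hypothetical stack of mixer layers with one cached state per layer."""

    def __init__(self, mixers: List[ToyMixer]) -> None:
        self.mixers = mixers

    def forward(
        self, xs: List[float], cache: Optional[List[float]] = None
    ) -> Tuple[List[float], List[float]]:
        if cache is None:
            cache = [0.0] * len(self.mixers)  # no previous state: start from zero
        hidden = list(xs)
        new_cache = []
        for mixer, state in zip(self.mixers, cache):
            out, last_state = mixer.forward(hidden, state)
            # Assumed residual connection around each mixer.
            hidden = [x + y for x, y in zip(hidden, out)]
            new_cache.append(last_state)
        return hidden, new_cache


stack = ToyStack([ToyMixer(0.5, 1.0), ToyMixer(0.25, 2.0)])

# Run the whole sequence in one call.
full, _ = stack.forward([1.0, 2.0, 3.0, 4.0])

# Run a prefix, keep the per-layer states, then continue from them.
_, cache = stack.forward([1.0, 2.0])
tail, _ = stack.forward([3.0, 4.0], cache)

print(tail == full[2:])
```

In this sketch, continuing from the cached per-layer states gives the same outputs for the new inputs as processing the full sequence in one call, which is the behavior the state argument is meant to provide.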