Recently, Mamba has been making waves due to its linear time complexity when processing tokens sequentially. Under the hood it is essentially a linear RNN, but with selective forgetting and selective memorization, the very ability that sets the Transformer apart from other sequence models. The Mamba SSM's ability to handle long sequences led me to fine-tune it on hyperspectral data and see how the performance compares. This curiosity of mine, plus Kaggle's generous GPU compute ("T4s free for 30 hours, OMG!"), led me to try the experiment. Of course, not everything turns out to be as smooth as it seems.
A Bit of Backstory
The Mamba SSM can be installed as a PyPI package. The README states that to install it we can just run:
!pip install mamba-ssm
It installs and builds successfully:
Successfully installed mamba-ssm-2.2.5 nvidia-cublas-cu12-12.4.5.8 nvidia-cuda-cupti-cu12-12.4.127 nvidia-cuda-nvrtc-cu12-12.4.127 nvidia-cuda-runtime-cu12-12.4.127 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.2.1.3 nvidia-curand-cu12-10.3.5.147 nvidia-cusolver-cu12-11.6.1.9 nvidia-cusparse-cu12-12.3.1.170 nvidia-nvjitlink-cu12-12.4.127
Let's import it and test it out:
from mamba_ssm import Mamba
print("✅ Mamba loaded successfully")
And here we have the error:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
/tmp/ipykernel_36/3998073930.py in <cell line: 0>()
----> 1 from mamba_ssm import Mamba
2 print("✅ Mamba loaded successfully")
/usr/local/lib/python3.11/dist-packages/mamba_ssm/__init__.py in <module>
1 __version__ = "2.2.5"
2
----> 3 from mamba_ssm.ops.selective_scan_interface import selective_scan_fn, mamba_inner_fn
4 from mamba_ssm.modules.mamba_simple import Mamba
5 from mamba_ssm.modules.mamba2 import Mamba2
/usr/local/lib/python3.11/dist-packages/mamba_ssm/ops/selective_scan_interface.py in <module>
18 from mamba_ssm.ops.triton.layer_norm import _layer_norm_fwd
19
---> 20 import selective_scan_cuda
21
22
ImportError: /usr/local/lib/python3.11/dist-packages/selective_scan_cuda.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZN3c107WarningC1ESt7variantIJNS0_11UserWarningENS0_18DeprecationWarningEEERKNS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEb
Frustrating, right? You've waited for it to install, and now you've run into this error. Man, what even is that?
Well, now you don't have to worry, because I've wasted enough of my own time so that you don't have to.
Solution
Searching for a fix, I went through the closed issues of the SSM's repository on GitHub and came across an issue highlighting the same problem. It turns out the CUDA version installed on the hardware of our beloved Kaggle GPUs is different from the version our PyTorch build supports.
It's weird, because I was using the original environment (2025-05-13), which should have had all of these dependency inconsistencies figured out.
Anyway, I followed what the GitHub issue told me.
We'll first run nvidia-smi to see what our hardware is using:
!nvidia-smi
The output:
Sun Aug 3 07:25:15 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |
| N/A 36C P8 9W / 70W | 1MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Tesla T4 Off | 00000000:00:05.0 Off | 0 |
| N/A 37C P8 9W / 70W | 1MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
We are using CUDA 12.6, which in PyTorch terms is written cu126.
Let's check which PyTorch version we are using:
!pip list | grep torch
The output:
pytorch-ignite 0.5.2
pytorch-lightning 2.5.2
torch 2.6.0+cu124
torchao 0.10.0
torchaudio 2.6.0+cu124
torchdata 0.11.0
torchinfo 1.8.0
torchmetrics 1.7.3
torchsummary 1.5.1
torchtune 0.6.1
torchvision 0.21.0+cu124
As you can see, our PyTorch build only targets CUDA 12.4, and that's a problem: during Mamba's installation, this version mismatch leaves some of its compiled CUDA dependencies in a broken state, hence the import error.
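You can confirm the mismatch from Python as well, since torch exposes the CUDA version it was built against. A quick sanity check (run before applying the fix below; the commented values are what we'd see here):
import torch

# The CUDA version this PyTorch build was compiled against,
# which needs to line up with the toolkit mamba-ssm's extension expects
print(torch.__version__)          # 2.6.0+cu124
print(torch.version.cuda)         # 12.4
print(torch.cuda.is_available())  # True -- the GPU itself is fine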
We fix this by installing a PyTorch build that matches our CUDA version, 12.6:
!pip3 install -q -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
The output:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 822.1/822.1 MB 1.9 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 393.1/393.1 MB 2.2 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.9/8.9 MB 100.1 MB/s eta 0:00:00
...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.2/6.2 MB 84.5 MB/s eta 0:00:00
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
fastai 2.7.19 requires torch<2.7,>=1.10, but you have torch 2.7.1+cu126 which is incompatible.
You can ignore the error at the end, because our goal is to install Mamba and its dependencies, nothing else.
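The freshly installed wheels only take effect after a kernel restart. You can restart from the notebook UI, or use a common notebook trick and kill the kernel process from code, letting Jupyter bring it back up automatically:
import os

# Sends SIGKILL to the kernel process; Jupyter's kernel manager restarts it,
# and the new session picks up the freshly installed packages
os.kill(os.getpid(), 9)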
Once the kernel has restarted, run !pip list | grep torch again,
and now we get:
pytorch-ignite 0.5.2
pytorch-lightning 2.5.2
torch 2.7.1+cu126
torchao 0.10.0
torchaudio 2.7.1+cu126
torchdata 0.11.0
torchinfo 1.8.0
torchmetrics 1.7.3
torchsummary 1.5.1
torchtune 0.6.1
torchvision 0.22.1+cu126
Yay! Our CUDA versions are now consistent. Let's load Mamba:
from mamba_ssm import Mamba
print("✅ Mamba loaded successfully")
And there you have it:
✅ Mamba loaded successfully
a successfully loaded Mamba.
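To make sure the CUDA kernels actually run, and not just import, it's worth pushing a dummy batch through a Mamba block. This sketch follows the usage example from the mamba-ssm README (the dimensions here are arbitrary):
import torch
from mamba_ssm import Mamba

batch, length, dim = 2, 64, 16
x = torch.randn(batch, length, dim).to("cuda")

model = Mamba(
    d_model=dim,  # model dimension
    d_state=16,   # SSM state expansion factor
    d_conv=4,     # local convolution width
    expand=2,     # block expansion factor
).to("cuda")

y = model(x)
assert y.shape == x.shape  # Mamba is sequence-to-sequence, shape is preserved
print("Forward pass OK:", y.shape)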
Conclusion
These types of errors can end up wasting hours when you try to fix them; it's better to keep a pre-configured environment so that you don't run into the same problem again.
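In that spirit, here's the setup cell I'd now paste at the top of a new Kaggle notebook; it's just the steps from this post in one place (the cu126 index assumes the same 2025-05-13 Kaggle image, so adjust it if the base image or driver changes):
# Align PyTorch with the driver's CUDA version first, then install mamba-ssm
!pip install -q -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
!pip install -q mamba-ssm
# Restart the kernel after this cell before importing mamba_ssm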