Recently, Mamba has been making waves due to its linear time complexity when processing tokens sequentially. Under the hood it is essentially a linear RNN, but with selective forgetting and selective memorization, the very ability that sets the Transformer apart from other sequence models. The Mamba SSM's ability to handle long sequences led me to fine-tune it on hyperspectral data and see how the performance compares. This curiosity of mine, plus Kaggle's generous GPU compute ("T4s free for 30 hours, OMG!"), led me to try the experiment. Of course, not everything turns out to be as smooth as it seems.
A Bit of Backstory
The Mamba SSM can be installed as a PyPI package. The README states that to install it we can just run:
!pip install mamba-ssm
It installs and builds successfully:
Successfully installed mamba-ssm-2.2.5 nvidia-cublas-cu12-12.4.5.8 nvidia-cuda-cupti-cu12-12.4.127 nvidia-cuda-nvrtc-cu12-12.4.127 nvidia-cuda-runtime-cu12-12.4.127 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.2.1.3 nvidia-curand-cu12-10.3.5.147 nvidia-cusolver-cu12-11.6.1.9 nvidia-cusparse-cu12-12.3.1.170 nvidia-nvjitlink-cu12-12.4.127
Let's import it and test it out:
from mamba_ssm import Mamba
print("✅ Mamba loaded successfully")
And here we have the error:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
/tmp/ipykernel_36/3998073930.py in <cell line: 0>()
----> 1 from mamba_ssm import Mamba
2 print("✅ Mamba loaded successfully")
/usr/local/lib/python3.11/dist-packages/mamba_ssm/__init__.py in <module>
1 __version__ = "2.2.5"
2
----> 3 from mamba_ssm.ops.selective_scan_interface import selective_scan_fn, mamba_inner_fn
4 from mamba_ssm.modules.mamba_simple import Mamba
5 from mamba_ssm.modules.mamba2 import Mamba2
/usr/local/lib/python3.11/dist-packages/mamba_ssm/ops/selective_scan_interface.py in <module>
18 from mamba_ssm.ops.triton.layer_norm import _layer_norm_fwd
19
---> 20 import selective_scan_cuda
21
22
ImportError: /usr/local/lib/python3.11/dist-packages/selective_scan_cuda.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZN3c107WarningC1ESt7variantIJNS0_11UserWarningENS0_18DeprecationWarningEEERKNS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEb
Frustrating, right? You've waited for it to install, and now you've run into this error. Man, what even is that?
Well, now you don't have to worry, because I've wasted enough of my own time so that you don't have to.
Solution
Searching for a fix, I went through the closed issues of the SSM's repository on GitHub and came across an issue highlighting the same problem. It turns out the CUDA version installed on the hardware of our beloved Kaggle GPUs is different from the version our PyTorch build supports.
It's weird, because I was using the original environment (2025-05-13), which should have had all of these dependency inconsistencies figured out.
Anyway, I followed what the GitHub issue told me.
We'll first run nvidia-smi to see what our hardware is using:
!nvidia-smi
The output:
Sun Aug 3 07:25:15 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |
| N/A 36C P8 9W / 70W | 1MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Tesla T4 Off | 00000000:00:05.0 Off | 0 |
| N/A 37C P8 9W / 70W | 1MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
We are using CUDA 12.6, which in PyTorch terms is written cu126.
Let's check which PyTorch version we are using:
!pip list | grep torch
The output:
pytorch-ignite 0.5.2
pytorch-lightning 2.5.2
torch 2.6.0+cu124
torchao 0.10.0
torchaudio 2.6.0+cu124
torchdata 0.11.0
torchinfo 1.8.0
torchmetrics 1.7.3
torchsummary 1.5.1
torchtune 0.6.1
torchvision 0.21.0+cu124
As you can see, our PyTorch build only targets CUDA 12.4, and that's a problem: during Mamba's installation, this version mismatch leaves some of its compiled CUDA dependencies in a broken state, hence the import error.
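You can confirm the mismatch from Python as well, since torch exposes the CUDA version it was built against. A quick sanity check (run before applying the fix below; the commented values are what we'd see here):
import torch

# The CUDA version this PyTorch build was compiled against,
# which needs to line up with the toolkit mamba-ssm's extension expects
print(torch.__version__)          # 2.6.0+cu124
print(torch.version.cuda)         # 12.4
print(torch.cuda.is_available())  # True -- the GPU itself is fine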
We fix this by installing a PyTorch build that matches our CUDA version, 12.6:
!pip3 install -q -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
The output:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 822.1/822.1 MB 1.9 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 393.1/393.1 MB 2.2 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.9/8.9 MB 100.1 MB/s eta 0:00:00
...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.2/6.2 MB 84.5 MB/s eta 0:00:00
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
fastai 2.7.19 requires torch<2.7,>=1.10, but you have torch 2.7.1+cu126 which is incompatible.
You can ignore the error at the end, because our goal is to install Mamba and its dependencies, nothing else.
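The freshly installed wheels only take effect after a kernel restart. You can restart from the notebook UI, or use a common notebook trick and kill the kernel process from code, letting Jupyter bring it back up automatically:
import os

# Sends SIGKILL to the kernel process; Jupyter's kernel manager restarts it,
# and the new session picks up the freshly installed packages
os.kill(os.getpid(), 9)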
Once the kernel has restarted, run !pip list | grep torch again,
and now we get:
pytorch-ignite 0.5.2
pytorch-lightning 2.5.2
torch 2.7.1+cu126
torchao 0.10.0
torchaudio 2.7.1+cu126
torchdata 0.11.0
torchinfo 1.8.0
torchmetrics 1.7.3
torchsummary 1.5.1
torchtune 0.6.1
torchvision 0.22.1+cu126
Yay! Our CUDA versions are now consistent. Let's load Mamba:
from mamba_ssm import Mamba
print("✅ Mamba loaded successfully")
And there you have it:
✅ Mamba loaded successfully
a successfully loaded Mamba.
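To make sure the CUDA kernels actually run, and not just import, it's worth pushing a dummy batch through a Mamba block. This sketch follows the usage example from the mamba-ssm README (the dimensions here are arbitrary):
import torch
from mamba_ssm import Mamba

batch, length, dim = 2, 64, 16
x = torch.randn(batch, length, dim).to("cuda")

model = Mamba(
    d_model=dim,  # model dimension
    d_state=16,   # SSM state expansion factor
    d_conv=4,     # local convolution width
    expand=2,     # block expansion factor
).to("cuda")

y = model(x)
assert y.shape == x.shape  # Mamba is sequence-to-sequence, shape is preserved
print("Forward pass OK:", y.shape)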
Conclusion
These types of errors can end up wasting hours when you try to fix them; it's better to keep a pre-configured environment so that you don't run into the same problem again.
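In that spirit, here's the setup cell I'd now paste at the top of a new Kaggle notebook; it's just the steps from this post in one place (the cu126 index assumes the same 2025-05-13 Kaggle image, so adjust it if the base image or driver changes):
# Align PyTorch with the driver's CUDA version first, then install mamba-ssm
!pip install -q -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
!pip install -q mamba-ssm
# Restart the kernel after this cell before importing mamba_ssm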