IngeniousRocks (They/She)
Don’t DM me without permission please
- 0 Posts
- 1 Comment
Joined 1 year ago
Cake day: December 7th, 2024
8b parameter models are relatively fast on 3rd-gen RTX hardware with at least 8 GB of VRAM. CPU inferencing is slower and requires boatloads of RAM, but it’s doable on older hardware. These models really aren’t designed to run on consumer hardware, but the 8b model should do fine on a relatively powerful consumer machine.
If you have something that would’ve been a high-end gaming rig 4 years ago, you’re good.
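For a rough sense of why 8 GB of VRAM is the line for an 8b model, here’s a back-of-the-envelope sketch in Python. It only counts the weights (parameters times bytes per parameter); context cache and runtime overhead come on top. The bit widths are illustrative assumptions, not numbers from any particular model file, and `weight_memory_gb` is just a name made up for this example.

```python
def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    # Illustrative precisions: 16-bit floats, then 8-bit and 4-bit quantization.
    for bits in (16, 8, 4):
        size = weight_memory_gb(8, bits)
        print(f"8b model at {bits}-bit: ~{size:.0f} GB of weights")
```

By this estimate the 16-bit weights alone (~16 GB) won’t fit on an 8 GB card, while a 4-bit quantized version (~4 GB) leaves room for context.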
If you wanna be more specific, check Hugging Face; they have charts. If you’re using Linux with Nvidia hardware, you’ll be better off doing CPU inferencing.
Edit: Omg y’all, I didn’t think I needed to include my sources, but this is quite literally a huge issue on Nvidia. Nvidia works fine on Linux, but you’re limited to whatever VRAM is on your video card, with no RAM sharing. Y’all can disagree all you want, but those are the facts. That’s why AMD and CPU inferencing are more reliable and allow for higher context limits. They are not faster, though.
Sources for the Nvidia stuff:
https://github.com/NVIDIA/open-gpu-kernel-modules/discussions/618
https://forums.developer.nvidia.com/t/shared-vram-on-linux-super-huge-problem/336867/
https://github.com/NVIDIA/open-gpu-kernel-modules/issues/758
https://forums.opensuse.org/t/is-anyone-getting-vram-backed-by-system-memory-with-nvidia-drivers/185902
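If you want to check the VRAM limit against your own card, here’s a minimal Python sketch. It assumes `nvidia-smi` is on your PATH and reads its CSV query output for `memory.total` and `memory.used` (reported in MiB). The helper names, the 1 GiB headroom, and the ~4.5 GiB model size are hypothetical values picked for illustration.

```python
import subprocess


def parse_vram_mib(smi_output: str) -> list[tuple[int, int]]:
    """Parse 'total, used' MiB pairs, one line per GPU."""
    gpus = []
    for line in smi_output.strip().splitlines():
        total, used = (int(field.strip()) for field in line.split(","))
        gpus.append((total, used))
    return gpus


def fits_in_vram(model_mib: int, total_mib: int, used_mib: int,
                 headroom_mib: int = 1024) -> bool:
    """True if the model plus some headroom fits in the VRAM that is still free."""
    return model_mib + headroom_mib <= total_mib - used_mib


def query_nvidia_smi() -> str:
    """Ask nvidia-smi for total and used memory as bare CSV numbers."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    model_mib = 4608  # hypothetical ~4.5 GiB quantized 8b model
    for index, (total, used) in enumerate(parse_vram_mib(query_nvidia_smi())):
        verdict = "fits" if fits_in_vram(model_mib, total, used) else "does not fit"
        print(f"GPU {index}: {total - used} MiB free, model {verdict}")
```

If the model doesn’t fit, that’s the case where falling back to CPU inferencing (slower, but backed by system RAM) comes in.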