IngeniousRocks (They/She) @lemmy.dbzer0.com · edit-2 11 days ago

8b-parameter models are relatively fast on 3rd-gen RTX hardware with at least 8 GB of VRAM. CPU inferencing is slower and requires boatloads of RAM, but it’s doable on older hardware. These really aren’t designed to run on consumer hardware, but the 8b model should do fine on relatively powerful consumer hardware.
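For a rough sense of why 8 GB of VRAM is the line for an 8b model, here’s a back-of-the-envelope sketch. It only counts the weights (parameters times bits per weight) and ignores context/KV cache and runtime overhead, so treat the numbers as a floor. The function name is made up for illustration, and real quantized files usually land a bit above the nominal bit width.

```python
def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Compare a few common precisions for an 8b-parameter model.
for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"8b @ {label}: ~{weight_memory_gib(8, bits):.1f} GiB")
```

By this estimate, fp16 weights (about 14.9 GiB) won’t fit on an 8 GB card, while a 4-bit quant (about 3.7 GiB) leaves room for context.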

If you have something that would’ve been a high end gaming rig 4 years ago, you’re good.

If you wanna be more specific, check Hugging Face; they have charts. If you’re using Linux with Nvidia hardware, you’ll be better off doing CPU inferencing.

Edit: Omg y’all, I didn’t think I needed to include my sources, but this is quite literally a huge issue on Nvidia. Nvidia works fine on Linux, but you’re limited to whatever VRAM is on your video card, with no RAM sharing. Y’all can disagree all you want, but those are the facts. That’s why AMD and CPU inferencing are more reliable and allow for higher context limits. They are not faster, though.
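To show why a hard VRAM cap turns into a context cap, here’s a rough sketch of how the KV cache grows with context length. The layer count, KV head count, and head size below are assumed, illustrative values for an 8b-class model with grouped-query attention, not numbers from any specific model card, and the function name is made up.

```python
def kv_cache_gib(
    context_tokens: int,
    n_layers: int = 32,        # assumed, illustrative
    n_kv_heads: int = 8,       # assumed, illustrative
    head_dim: int = 128,       # assumed, illustrative
    bytes_per_value: int = 2,  # fp16 cache entries
) -> float:
    """Approximate KV cache size in GiB for a given context length."""
    # Each token stores one key and one value vector per layer per KV head.
    per_token_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return context_tokens * per_token_bytes / 2**30


for ctx in (8192, 32768, 131072):
    print(f"{ctx} tokens: ~{kv_cache_gib(ctx):.1f} GiB of KV cache")
```

Under those assumptions the cache costs about 1 GiB per 8192 tokens on top of the weights, so without spilling into system RAM, long contexts run out of room on an 8 GB card quickly.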

Sources for Nvidia stuff:

https://github.com/NVIDIA/open-gpu-kernel-modules/discussions/618

https://forums.developer.nvidia.com/t/shared-vram-on-linux-super-huge-problem/336867/

https://github.com/NVIDIA/open-gpu-kernel-modules/issues/758

https://forums.opensuse.org/t/is-anyone-getting-vram-backed-by-system-memory-with-nvidia-drivers/185902