

It would need to load every part of the model from disk into RAM for every token it generates, which would take ages.
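To put rough, illustrative numbers on it (not from any specific setup): a 70B-parameter model in 16-bit is about 140 GB of weights, and at roughly 2 GB/s of sequential SSD read you'd spend on the order of 70 seconds just streaming the weights for a single token.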
What you can do, however, is quantize the model. If you quantize a 16-bit model to 4-bit, for example, its storage and RAM requirements drop to roughly a quarter. The calculations are typically still done in 16-bit (the weights get dequantized on the fly), but the weights themselves lose some precision.
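To make that concrete, here's a minimal sketch of blockwise 4-bit quantization in Python. The block size of 32 and the symmetric rounding scheme are illustrative assumptions, not any particular on-disk format: weights are stored as 4-bit integers plus one small scale per block, and expanded back to 16-bit for the actual math.

```python
import numpy as np

def quantize_4bit(weights, block_size=32):
    """Quantize fp16 weights to 4-bit ints with one fp16 scale per block."""
    w = weights.reshape(-1, block_size).astype(np.float32)
    # One scale per block: map the block's max magnitude onto [-8, 7]
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero blocks
    # Stored one value per int8 here for clarity; real formats pack
    # two 4-bit values into each byte.
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale.astype(np.float16)

def dequantize_4bit(q, scale):
    """Expand 4-bit ints back to fp16 for the actual matmul."""
    return (q.astype(np.float32) * scale).astype(np.float16)

w = np.random.randn(4096, 4096).astype(np.float16)
q, scale = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale).reshape(w.shape)

# 4 bits per weight plus a 16-bit scale shared by 32 weights
bits = 4 + 16 / 32
print(f"~{bits:.1f} bits/weight vs 16 -> {bits / 16:.0%} of original size")
print("mean abs error:",
      np.abs(w.astype(np.float32) - w_hat.astype(np.float32)).mean())
```

Note that the per-block scales add a little overhead (about 4.5 bits per weight here rather than exactly 4), which is why "1/4 the size" is an approximation.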
You don’t need a license to look at stuff.