• 0 Posts
  • 8 Comments
Joined 2 years ago
Cake day: December 18th, 2023

  • “They” is the copyright industry. The same people who are suing AI companies for money want the Internet Archive gone for more money.

    I share the fear that the copyrightists will reach a happy compromise with the bigger AI companies and monopolize knowledge. But for now, AI companies are fighting for Fair Use, and the Internet Archive is already benefiting from those precedents.


  • For the fastest inference, you want the entire model to fit in VRAM, plus a few GB extra for context (there is a rough sizing sketch at the end of this comment).

    Context means the text (plus images, etc.) the model works on. For a chatbot, that’s the chat log, plus any texts you want it to summarize, translate, or answer questions about.

    Models can be quantized, which is a kind of lossy compression. They get smaller but also dumber. As with JPGs, the quality loss is insignificant at first and absolutely worth it.

    Inference can be split between the GPU and the CPU, substituting normal RAM for VRAM. That makes it slower, but it will probably still feel smooth (see the offloading sketch below).

    Basically, it’s all trade-offs between quality, context size, and speed.
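    To make those trade-offs concrete, here is a back-of-envelope sizing sketch. The 7B parameter count and the quantization levels are illustrative assumptions, not measurements of any particular model.

    ```python
    # Rough weight-size arithmetic: parameters * bits-per-weight / 8.
    # The 7B figure and the bit widths below are illustrative assumptions.

    def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
        """Approximate size of the weights alone, in GB."""
        return params_billion * 1e9 * bits_per_weight / 8 / 1e9

    for label, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4)]:
        print(f"7B model at {label}: ~{model_size_gb(7, bits):.1f} GB")

    # 7B model at FP16: ~14.0 GB
    # 7B model at Q8: ~7.0 GB
    # 7B model at Q4: ~3.5 GB
    # On top of the weights, budget a few GB for the context (KV cache) and overhead.
    ```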
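    And a minimal sketch of splitting inference between GPU and CPU, assuming the llama-cpp-python bindings and a hypothetical quantized GGUF file; n_gpu_layers controls how many layers go into VRAM, and the rest run from normal RAM on the CPU.

    ```python
    # Minimal offloading sketch with llama-cpp-python (assumed installed).
    # The model path is hypothetical; point it at any quantized GGUF file.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/example-7b.Q4_K_M.gguf",  # hypothetical file name
        n_ctx=4096,        # context window: chat log plus any documents you feed it
        n_gpu_layers=20,   # put 20 layers in VRAM; -1 offloads everything, 0 is CPU-only
    )

    out = llm("Summarize: local inference trades off quality, context size, and speed.",
              max_tokens=64)
    print(out["choices"][0]["text"])
    ```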