Ask HN: How can ChatGPT serve 700M users when I can't run one GPT-4 locally?
An H100 is a ~$20k card with 80GB of VRAM. Imagine a 2U rack server with $100k of these cards in it. Now imagine an entire rack of these servers, plus all the other components (CPUs, RAM, passive or water cooling), and you're talking $1 million per rack, not including the power to run them or the engineers needed to maintain them. Even the "cheaper" …

I don't think people realize the size of these compute units.

When the AI bubble pops is when you're likely to be able to realistically run one of these models locally.
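A rough back-of-envelope sketch of the math above, in Python. The GPUs-per-server and servers-per-rack counts are assumptions (an HGX-style node typically holds 8 GPUs; rack density is limited by power and cooling), and GPT-4's parameter count is unpublished, so the ~1.8T figure below is a widely repeated rumor, not a confirmed number. Treat the outputs as order-of-magnitude only.

```python
# Back-of-envelope cost and memory math from the figures above.

GPU_PRICE_USD = 20_000    # H100, per the post
GPU_VRAM_GB = 80          # H100 VRAM
GPUS_PER_SERVER = 8       # assumption: typical HGX-style node
SERVERS_PER_RACK = 4      # assumption: limited by power/cooling

gpus_per_rack = GPUS_PER_SERVER * SERVERS_PER_RACK
gpu_cost_per_rack = gpus_per_rack * GPU_PRICE_USD
print(f"GPUs per rack: {gpus_per_rack}, GPU cost alone: ${gpu_cost_per_rack:,}")
# -> 32 GPUs, $640,000 before CPUs, RAM, networking, and cooling

# Why a GPT-4-class model doesn't fit on one card:
PARAMS = 1.8e12           # assumption: rumored size, not confirmed
BYTES_PER_PARAM = 2       # fp16/bf16 weights
weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
print(f"Weights alone: ~{weights_gb:,.0f} GB "
      f"(~{weights_gb / GPU_VRAM_GB:.0f} H100s, before KV cache)")
# -> ~3,600 GB, i.e. dozens of GPUs just to hold the weights
```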