More on Sputnik

8 min read

Jan 30, 2025

More information has come to light regarding DeepSeek, the Chinese AI company that has created what Marc Andreessen called "a Sputnik moment." There are basically two questions about DeepSeek (there are many more, but two main ones): how did it produce a state-of-the-art model? And how is it able to run that model so cheaply? As we noted yesterday, DeepSeek's model can run on $6,000 worth of computer equipment anybody can buy and run at home, while ChatGPT runs on giant data centers that cost billions to build and operate. We now have a good answer to the second question: the model was heavily optimized to run on Nvidia's computer chips, with parts of its code written in an assembly-like language.

For the non-technical, here's what this means. A computer program is fundamentally just a series of 1s and 0s, which correspond to instructions for a computer chip to either open or close an electric switch. A computer chip is really billions of microscopic electric switches, called transistors, packed onto a very small area. The computer program, that series of 1s and 0s, is a set of instructions for opening and closing those switches. Open and close the right sequence of switches, and you can get a computer screen to show you, say, a policy-focused newsletter.

Okay. If you've ever seen computer code, you've noticed that it is not a series of 0s and 1s but a series of words, numbers, and symbols. It's impossible, or at least extremely arduous, for a human being to write code as a series of 0s and 1s, and nobody would do it except as a science project (or possibly a form of torture). Computer code is written in a language that allows programmers to abstract away the 0s and 1s and instead write a series of instructions. That code is then fed into a program called a compiler, which turns the abstract language into the correct series of 0s and 1s. So a programmer will write, in whatever language he uses, something that means "if the user presses the L key on his keyboard, the letter L will appear on the screen." The compiler will then turn that instruction into the right series of 0s and 1s.

Okay. But if you're sharp, you've already noticed a problem: where does the letter L appear? Is it a capital L or a lowercase l? In what font? And so on. Which is why some computer languages work at a higher level of abstraction than others. In the very simplified analogy we are using, some computer languages will just let you write "type L on the screen" and decide the particulars for you. Other computer languages will force you to decide the specifics. The tradeoff between less-abstract and more-abstract languages immediately becomes obvious: more-abstract languages are much easier (therefore faster) to program in, but leave you fewer choices. Less-abstract languages give you more control, but are more complicated (therefore slower) to program in. Much as in the world of conceptual thought, more abstraction makes things easier, but you lose detail and control; and as in the world of conceptual thought, this is neither good nor bad, it really depends on what you're trying to do.
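To make the abstraction tradeoff concrete, here is a minimal sketch (our own toy example, not anyone's production code): the same integer addition written twice inside a CUDA kernel, once in ordinary high-level code where the compiler decides the details, and once in inline PTX, Nvidia's assembly-like GPU language, where the programmer names the exact machine instruction.

```cuda
// Toy example: the same addition at two levels of abstraction.
// (Illustrative only; not anyone's actual production code.)
__global__ void add_two_ways(const int* a, const int* b,
                             int* out_hi, int* out_lo) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // High-level: the compiler picks the registers and instructions.
    out_hi[i] = a[i] + b[i];

    // Low-level: inline PTX. The programmer names the exact
    // instruction (add.s32); the compiler only assigns registers.
    int r;
    asm("add.s32 %0, %1, %2;" : "=r"(r) : "r"(a[i]), "r"(b[i]));
    out_lo[i] = r;
}
```

At this trivial scale both versions compile to the same machine instruction, which is the point: abstraction costs nothing here. The payoff from dropping down a level only appears in complex programs, where the compiler's automatic choices stop being good enough.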

Now that we are done with this somewhat-lengthy (but, we hope, interesting and enlightening) explanation, we can explain what the DeepSeek people did. If you want to program a GPU (the chips that are used to run AI models), the industry standard is a language called CUDA, a variant of C/C++, the most popular programming language family in the world, in large part because it sits at a "goldilocks" level between abstraction and complexity. The DeepSeek programmers instead wrote parts of their chip code in a derivative of assembly (reportedly Nvidia's PTX). Assembly is the oldest and least-abstract kind of computer language there is: it is as close as you can get to writing sequences of 0s and 1s without actually writing sequences of 0s and 1s. And for that reason, it is extremely tedious to program with and is not used in 99.999% of real-world programming scenarios. Using assembly allowed the DeepSeek programmers to tell their chips exactly what to do and when (exactly which bits of memory to use and when, for example), which let them optimize their code to run much more efficiently on the same (or very similar) chips that American AI giants like OpenAI and Anthropic use.

This is all very interesting, you might say, but what does it mean, practically? Well, first, it means we know how they did it. Second, it means they did something technically very impressive, but it's not magic. They did not invent new physics. We can do this too if we want to. So that's how they were able to run the model so cheaply.
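For readers who want to see what that kind of control looks like, here is a hedged sketch (our own illustration, not DeepSeek's code; DeepSeek's actual kernels are far more elaborate and reportedly use custom PTX for things like scheduling communication between chips). With inline PTX you can, for example, attach a cache hint to an individual memory load, telling the chip which of its caches a piece of data should occupy:

```cuda
// Sketch: inline PTX to control caching on a per-load basis.
// "ld.global.cg" caches a value only at the L2 level, bypassing L1,
// which is useful for data you read once: it won't evict data you reuse.
// (Illustrative only; this is not DeepSeek's actual code.)
__device__ float load_streaming(const float* ptr) {
    float v;
    asm volatile("ld.global.cg.f32 %0, [%1];" : "=f"(v) : "l"(ptr));
    return v;
}

__global__ void scale(const float* in, float* out, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // A plain "in[i]" would let the compiler pick the caching policy;
        // here we pick it ourselves.
        out[i] = s * load_streaming(in + i);
    }
}
```

(CUDA does expose some of this through intrinsics like __ldcg(), so this particular trick doesn't strictly require dropping to PTX; the broader point is that the lower the level you write at, the more of these decisions are yours instead of the compiler's.)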

What about the model itself? How did it get so good? Top Silicon Valley VC Vinod Khosla recently xeeted: "One of our startups found Deepseek makes the same mistakes O1 [an OpenAI model] makes, a strong indication the technology was ripped off. It feels like they then hacked some code and did some impressive optimizations on top. Most likely, not an effort from scratch." If Khosla is right, DeepSeek built on OpenAI's work rather than starting from scratch, and there are some national security implications there.

Policy News You Need To Know

#AI — More AI news, from CNBC: "OpenAI partners with U.S. National Laboratories on scientific research, nuclear weapons security"

#Kids #K12 #BigTech — Last year, Arkansas launched a pilot program to support schools that chose to go phone-free. The results are so good that Gov. Sarah Huckabee Sanders is now introducing a bill to make all schools phone-free. The phone-free schools movement is gaining steam. More here.

#EOs #K12 — The Daily Caller reports that the President is set to sign an EO prohibiting federal funding to K-12 schools that teach critical race theory or radical gender ideology, as well as reinstating his 1776 Commission.

#WeWereRight — New NAEP education test scores show aftermath of pandemic school closures: ‘heartbreaking.’

#TechRight — Interesting: a bunch of luminaries from the new right have come together to pursue a "family and tech agenda", in the form of a manifesto with "ten guiding principles for empowering families through technology".

#Marriage #Family — “Even today, though people sometimes report in surveys that marriage isn’t all that important for childbearing, the fact is people overwhelmingly prefer to be married before having children, and thus most delay pregnancy until they have a spouse.” From a new sociological survey highlighted by the good people at IFS.

#Space — Remember the stranded astronauts that definitely weren't stranded but also couldn't get home because their Boeing capsule was broken? President Trump commissioned Elon Musk to go get them.

#Immigration — NYT: Trump Officials Revoke Biden’s Extension of Protections for Venezuelans

Chart of the Day

This is a finding very well known to specialists, but not well known enough among the public: poverty has no effect on violent crime. (Via Jonatan Pallesen)

Meme of the Day
