
Large Language Models on Small Computers

As technology progresses, we generally expect processing capabilities to scale up. Every year, we get more processor power, faster speeds, greater memory, and lower cost. However, we can also use improvements in software to get things running on what might otherwise be considered inadequate hardware. Taking this to the extreme, while large language models (LLMs) like GPT are running out of data to train on and having difficulty scaling up, [DaveBben] is experimenting with scaling down instead, running an LLM on the smallest computer that could reasonably run one.

Of course, some concessions have to be made to get an LLM running on underpowered hardware. In this case, the computer of choice is an ESP32, so the model was shrunk from the trillions of parameters of something like GPT-4, or even the hundreds of billions of GPT-3, down to only 260,000. The model comes from the tinyllamas checkpoint, and llama2.c is the implementation that [DaveBben] chose for this setup, as it can be streamlined to run a bit better on something like the ESP32. The specific chip is the ESP32-S3FH4R2, which was chosen for its comparatively large amount of RAM, since even this tiny model needs a minimum of 1 MB to run. It also has two cores, both of which work as hard as possible under (relatively) heavy loads like these, and its CPU clock can be maxed out at around 240 MHz.

Admittedly, [DaveBben] is mostly doing this just to see if it can be done, since even the most powerful of ESP32 processors won't be able to do much useful work with a large language model. It does turn out to be possible, though, and somewhat impressive: to put things in perspective, the ESP32 has about as much processing capability as a 486 or maybe an early Pentium chip. If you're willing to devote a few more resources to an LLM, you can self-host it and use it in much the same way as an online model such as ChatGPT.

Hackaday Links: September 1, 2024


Why is it always a helium leak? It seems whenever there’s a scrubbed launch or a narrowly averted disaster, space exploration just can’t get past the problems of helium plumbing. We’ve had a bunch of helium problems lately, most famously with the leaks in Starliner’s thruster system that have prevented astronauts Butch Wilmore and Suni Williams from returning to Earth in the spacecraft, leaving them on an extended mission to the ISS. Ironically, the launch itself was troubled by a helium leak before the rocket ever left the ground. More recently, the Polaris Dawn mission, which is supposed to feature the first spacewalk by a private crew, was scrubbed by SpaceX due to a helium leak on the launch tower. And to round out the helium woes, we now have news that the Peregrine mission, which was supposed to carry the first commercial lander to the lunar surface but instead ended up burning up in the atmosphere and crashing into the Pacific, failed due to — you guessed it — a helium leak.

Thankfully, there’s a bit more technical detail on that last one; it seems that a helium pressure control valve, designated PCV2 and controlling helium to pressurize an oxidizer tank, got stuck open thanks to “vibration-induced relaxation” in threaded components within the valve. So, launch vibrations shook a screw loose inside the valve, which kept it from sealing and over-pressurized an oxidizer tank with helium to the point of tank failure — kablooie, end of mission. All of these failures are just another way of saying that space travel is really, really hard, of course. But still, with helium woes figuring so prominently in so many failures, we’re left wondering if there might not be an upside to finding something else to pressurize tanks.

Back on terra firma, we got a tip from a reader going by the name of [Walrus] who is alarmed by an apparent trend in the electronics testing market toward a subscription model for the software needed to run modern test gear. Specifically, the tip included a link to a reseller offering a deal on an “Ultimate Software Bundle” for Tektronix 4 Series Mixed-Signal Oscilloscopes. The offer expired at the end of 2023 and prices aren’t mentioned, but given that a discount of up to $5,670 with purchase of a scope was advertised, we’d imagine the Ultimate Software Bundle comes at a pretty steep price. The chief concern [Walrus] expressed was that used instruments whose software is tied to a subscription may have little to no value on the secondary market, where many up-and-coming engineers shop for affordable gear. We haven’t had any personal experience with subscription models for test equipment software, and a quick read of the Tektronix site seems to suggest that subscriptions are only one of the licensing models available for instrument software. Still, the world seems to be moving toward one where everything costs something forever, and the days of a “one and done” purchase appear to be going away. We’d love to hear your thoughts on subscription software for test gear, especially if we’ve misread the situation with Tek. Sound off in the comments below.

In this week’s edition of “Dystopia Watch,” we’re alarmed by a story about how police departments are experimenting with generative AI to assist officers in report writing. The product, called Draft One, is from Axon, a public safety technology concern best known for its body-worn cameras and Tasers. Using Azure OpenAI, Draft One transcribes the audio from body cam footage and generates a “draft narrative” of an officer’s interaction with the public. The draft is then reviewed by the officer, presumably corrected if needed, and sent on to a second reviewer before becoming the official report. Axon reports that it had to adjust the LLM’s settings to keep AI hallucinations from becoming part of the narrative. While we can see how this would be a huge benefit to officers, who generally loathe everything about report writing, and would get them back out on patrol rather than sitting in a parking lot tapping at a keyboard, we can also see how this could go completely sideways in a hurry. All it will take is one moderately competent defense attorney getting an officer to admit under oath that the words of the report were not written by him or her, and this whole thing goes away.

And finally, getting three (or more) monitors to all agree on what white is can be quite a chore, and not just a little enraging for the slightly obsessive-compulsive — it’s one of the reasons we favor dark mode so much, to be honest. Luckily, if you need a screen full of nothing but #FFFFFF pixels so you can adjust color balance in your multi-monitor setup, it’s as easy as calling up a web page. The White Screen Tool does one thing — paints all the pixels on the screen whatever color you want. If you need all white, it’s just a click away — no need to start up MS Paint or GIMP and futz around with making it bezel-to-bezel. There are plenty of other presets, if white isn’t your thing, plus a couple of fun animated screens that imitate Windows update screens — let the office hijinks begin! You can also set custom colors, which is nice; might we suggest #1A1A1A and #F3BF10?
