
USB Stick Hides Large Language Model

Large language models (LLMs) are all the rage in the generative AI world these days, with the truly large ones like GPT, LLaMA, and others using tens or even hundreds of billions of parameters to churn out their text-based responses. These typically require glacier-melting amounts of computing hardware, but the “large” in “large language models” doesn’t really need to be that big for there to be a functional, useful model. LLMs designed for limited hardware or consumer-grade PCs are available now as well, but [Binh] wanted something even smaller and more portable, so he put an LLM on a USB stick.

This USB stick isn’t just a jump drive with a bit of memory on it, though. Inside the custom 3D printed case is a Raspberry Pi Zero W running llama.cpp, a lightweight, high-performance inference engine for LLaMA-family models. Getting it running on this Pi wasn’t straightforward at all, though, as the latest version of llama.cpp targets ARMv8 and this particular Pi runs the older ARMv6 instruction set. That meant [Binh] needed to change the source code to remove the optimizations for more modern ARM machines, but after a week’s worth of effort he finally had the model running on the older Raspberry Pi.

Getting the model to run was just one part of this project. The rest of the build was ensuring that the LLM could run on any computer without drivers and be relatively simple to use. By setting up the USB device as a composite device that presents a filesystem to the host computer, all a user has to do to interact with the LLM is create an empty text file, and the LLM will automatically fill the file with generated text. While it’s not blindingly fast, [Binh] believes this is the first plug-and-play USB-based LLM, and we’d have to agree. It’s not the least powerful computer to ever run an LLM, though. That honor goes to this project which is able to cram one on an ESP32.
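
Gadget-mode plumbing aside, the host-facing behavior boils down to a watch-and-fill loop on the Pi. Here’s a minimal Python sketch of that idea, not [Binh]’s actual code: the mount point, the llama.cpp binary name, the model file, and the filename-as-prompt convention are all placeholder assumptions.

```python
# Hypothetical sketch: watch the shared folder for empty .txt files and
# fill each one with generated text. All names below are assumptions.
import subprocess
import time
from pathlib import Path

SHARE = Path("/mnt/usb_share")   # filesystem exposed to the host over USB
LLAMA = "./llama-cli"            # llama.cpp CLI binary (name varies by version)
MODEL = "tiny-model.gguf"        # any small model a Pi Zero can handle

def generate(prompt: str) -> str:
    """Run one-shot inference with llama.cpp and return the model's text."""
    result = subprocess.run(
        [LLAMA, "-m", MODEL, "-p", prompt, "-n", "128"],
        capture_output=True, text=True, timeout=600,
    )
    return result.stdout

while True:
    for f in SHARE.glob("*.txt"):
        if f.stat().st_size == 0:            # an empty file is a request
            f.write_text(generate(f.stem))   # filename doubles as the prompt
    time.sleep(2)
```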

Hackaday Links: February 9, 2025


January 9 ended up being a very expensive day for a Culver City, California man after he pleaded guilty to recklessly operating a drone during the height of the Pacific Palisades wildfire. We covered this story a bit when it happened (second item): the drone struck and damaged the leading edge of a Canadian “Super Scooper” plane that was fighting the fire. Peter Tripp Akemann, 56, admitted to taking the opportunity to go to the top of a parking garage in Santa Monica and launching his drone to get a better view of the action to the northwest. Unfortunately, the drone got about 2,500 meters away, far beyond visual range and, as it turns out, directly in the path of the planes refilling their tanks by skimming along the waters off Malibu. The agreement between Akemann and federal prosecutors calls for a guilty plea along with full restitution to the government of Quebec, which owns the damaged plane, plus the costs of repair. Akemann will need to write a check for $65,169 and perform 150 hours of community service related to the relief effort for the fire’s victims. Expensive, yes, but probably better than the year in federal prison such an offense could have earned him.

Another story we’ve been following for a while is the United States government’s effort to mandate that every car sold here comes equipped with an AM radio. The argument is that broadcasters, at the government’s behest, have devoted a massive amount of time and money to bulletproofing AM radio, up to and including providing apocalypse-proof bunkers for selected stations, making AM radio a vital part of the emergency communications infrastructure. Car manufacturers, however, have been routinely deleting AM receivers from their infotainment products, arguing that nobody but boomers listens to AM radio in the car anymore. This resulted in the “AM Radio for Every Vehicle Act,” which enjoyed some support the first time it was introduced but still failed to pass. The bill has been reintroduced and appears to be on a fast track to approval, both in the Senate and the House, where a companion bill was introduced this week. As for the “AM is dead” argument, the Geerling boys put the lie to that by noting that the Arbitron ratings for AM stations around Los Angeles spiked dramatically during the recent wildfires. AM might not be the first choice for entertainment anymore, but when things start getting real, people know where to go.

Most of us are probably familiar with the concept of a honeypot, a system set up to entice black hat hackers with the promise of juicy information that instead traps them. It’s a time-honored security tactic, but one that relies on human traits like greed and laziness to work. Protecting yourself against non-human attacks, like those coming from bots trying to train large language models on your content, is a different story. That’s where you might want to look at something like Nepenthes, a tarpit service intended to slow down and confuse the hell out of LLM bots. Named after a genus of carnivorous pitcher plants, Nepenthes traps bots with a two-pronged attack. First, the service generates a randomized but deterministic wall of text that almost but not quite reads like sensible English. Second, it populates the page with a bunch of links for the bots to follow, all of which point right back to the same service, generating another page of nonsense text and self-referential links. Ingeniously devious; use with caution, of course.
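
To make the trap concrete, here’s a minimal Python sketch of that two-pronged idea (not Nepenthes’ actual code): the URL path seeds a deterministic generator, so the same page of babble comes back every time, and every link on it leads deeper into the maze.

```python
# Toy tarpit page generator: randomized but deterministic babble, plus
# self-referential links that all lead back into the same service.
import hashlib
import random

WORDS = ["data", "model", "quantum", "ledger", "synergy", "protocol",
         "vector", "cache", "torus", "manifold", "entropy", "kernel"]

def page_for(path: str) -> str:
    # Seed from the path: the same URL always yields the same page.
    rng = random.Random(hashlib.sha256(path.encode()).hexdigest())
    babble = " ".join(rng.choice(WORDS) for _ in range(200))
    # Every link points at another page of the tarpit, never at real content.
    links = "".join(
        f'<a href="/{rng.getrandbits(64):x}">more</a> ' for _ in range(10)
    )
    return f"<html><body><p>{babble}</p>{links}</body></html>"

print(page_for("/some/crawled/path"))
```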

When was the last time you actually read a Terms of Service document? If you’re like most of us, the closest you’ve come is the few occasions where you’ve had to scroll to the bottom of a text window before the “Accept Terms” button is enabled. We all know it’s not good to agree to something legally binding without reading it, but who has time to trawl through all that legalese? Nobody we know, which is where ToS;DR comes in. “Terms of Service; Didn’t Read” does the heavy lifting of ToS and EULAs for you, providing a summary of what you’re agreeing to as well as an overall grade from A to E, with E being the lowest. Refreshingly, the summaries and ratings are not performed by some LLM but rather by volunteer reviewers, who pore over the details so you don’t have to. Talk about taking one for the team.

And finally, how many continents do you think there are? Most of us were taught that there are seven, which would probably come as a surprise to an impartial extraterrestrial, who might instead count a huge continent in one hemisphere, a smaller one with a really skinny section in the other hemisphere, the snowy one at the bottom, and a bunch of big islands. That’s not how geologists see things, though, and new research into plate tectonics suggests that the real number might be six continents. So which continent is getting the Pluto treatment? Geologists previously believed that the European plate fully separated from the North American plate 52 million years ago, but recent undersea observations in the arc connecting Greenland, Iceland, and the Faroe Islands suggest that the plate is still pulling apart. That would make Europe and North America one massive continent, at least tectonically. This is far from a done deal, of course; more measurements will reveal whether the crust under the ocean is still stretching out, which would support the hypothesis. In the meantime, Europe, enjoy your continental status while you still can.

More Details On Why DeepSeek is a Big Deal

The DeepSeek large language models (LLMs) have been making headlines lately, and for more than one reason. IEEE Spectrum has an article that sums everything up very nicely.

We shared the way DeepSeek made a splash when it came onto the AI scene not long ago, and this is a good opportunity to go into a few more details of why this has been such a big deal.

For one thing, DeepSeek (there are actually two flavors, -V3 and -R1, more on them in a moment) punches well above its weight. DeepSeek is the product of an innovative development process, and is freely available to use or modify. It also indirectly highlights the way companies in this space like to label their LLM offerings as “open” or “free” while stopping well short of actually making them open source.

The DeepSeek-V3 LLM was developed in China and reportedly cost less than 6 million USD to train. This was possible thanks to DualPipe, a highly optimized and scalable pipeline-parallel training method the team developed to work within the limitations imposed by export restrictions on Nvidia hardware. Details are in the technical paper for DeepSeek-V3.

There’s also DeepSeek-R1, a chain-of-thought “reasoning” model which handily exposes its thought process within easily-parsed <think> and </think> pseudo-tags included in its responses. A model like this takes an iterative, step-by-step approach to formulating responses, and benefits from prompts that provide a clear goal for the LLM to aim for. The way DeepSeek-R1 was created was itself novel: training started with a “cold start” of supervised fine-tuning (SFT), a human-led and labor-intensive process, which eventually handed off to a more automated reinforcement learning (RL) process with a rules-based reward system. The result avoided the problems that come from relying too heavily on RL while minimizing the human effort of SFT. Technical details on the process of training DeepSeek-R1 are here.
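
Since those tags are plain text in the response, pulling the reasoning apart from the final answer is a few lines of work. A quick Python sketch:

```python
# Split a DeepSeek-R1 style response into its reasoning and its answer
# using the <think>...</think> pseudo-tags the model emits.
import re

def split_response(text: str) -> tuple[str, str]:
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    thoughts = match.group(1).strip() if match else ""
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return thoughts, answer

raw = "<think>The user wants a sum. 2 + 2 = 4.</think>The answer is 4."
thoughts, answer = split_response(raw)
print(thoughts)  # -> The user wants a sum. 2 + 2 = 4.
print(answer)    # -> The answer is 4.
```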

DeepSeek-V3 and -R1 are freely available in the sense that one can access the full-powered models online or via an app, or download distilled models for local use on more limited hardware. They are free and open as in accessible, but not open source, because not everything needed to replicate the work is actually released. As with most LLMs, the training data and the actual training code are not available.
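
For the local-use route, here’s a sketch of what loading a distilled build might look like, assuming the llama-cpp-python bindings; the GGUF filename is a placeholder for whichever distilled model you actually download.

```python
# Minimal local inference sketch using llama-cpp-python; the model
# filename below is a placeholder, not an official release name.
from llama_cpp import Llama

llm = Llama(model_path="deepseek-r1-distill-7b.gguf", n_ctx=4096)
out = llm("Explain pipeline parallelism in one paragraph.", max_tokens=256)
print(out["choices"][0]["text"])
```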

What has been released, and is making waves of its own, is the technical detail of how the researchers produced what they did, and that means there are efforts underway to make an actually open source version. Keep an eye out for Open-R1!

Modern AI on Vintage Hardware: Llama 2 Runs on Windows 98

[EXO Labs] demonstrated something pretty striking: a modified version of Llama 2 (a large language model) that runs on Windows 98. Why? Because when it comes to personal computing, if something can run on Windows 98, it can run on anything. More to the point: if something can run on Windows 98, then it’s something no tech company can control how you use, no matter how large or influential that company may be. More on that in a minute.

Ever wanted to run a local LLM on 25-year-old hardware? No? Well now you can, and at a respectable speed, too!

What’s it like to run an LLM on Windows 98? Aside from the struggles of finding compatible peripherals (back to PS/2 hardware!), transferring the required files (FTP over Ethernet to the rescue), and compiling the code (some porting required), it works better than one might expect.

A Windows 98 machine with a Pentium II processor and 128 MB of RAM generates a speedy 39.31 tokens per second with a 260K-parameter Llama 2 model. A much larger 15M-parameter model generates 1.03 tokens per second. Slow, but it works. Going even larger will also work, just ever more slowly. There’s a video on X that shows it all in action.
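
Some back-of-the-envelope math shows why models this size are no problem for 128 MB of RAM:

```python
# Rough weight-storage estimate: at 4 bytes per float32 parameter,
# even the "large" 15M model needs only tens of megabytes.
BYTES_PER_PARAM = 4  # float32 weights, no quantization

for name, params in [("260K", 260_000), ("15M", 15_000_000)]:
    mb = params * BYTES_PER_PARAM / 1024**2
    print(f"{name} model: ~{mb:.1f} MB of weights")

# 260K model: ~1.0 MB of weights
# 15M model: ~57.2 MB of weights -- snug but workable in 128 MB of RAM
```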

It’s true that modern LLMs have billions of parameters, so these models are tiny in comparison. But that doesn’t mean they can’t be useful. Models can be shockingly small and still be perfectly coherent, delivering surprisingly strong performance if their training and “job” are narrow enough, and the tools to do that for oneself are all on GitHub.

This is a good time to mention that this particular project (and its ongoing efforts) is part of a set of twelve projects by EXO Labs focused on ensuring that things like AI models can be run anywhere, by anyone, independent of tech giants aiming to hold all the strings.

And hey, if local AI and the command line are up your alley, did you know LLMs already exist as single-file, multi-platform, command-line executables?

Giskard

Giskard is an open-source AI model quality testing tool, built by AI engineers for AI engineers, that helps data scientists and developers build safer, more robust, and more trustworthy AI systems. To use the platform, you can get […]

Source

Faraday.dev

Faraday.dev lets you easily run open-source LLMs (chatbots) on your computer. Once you’ve got the program and AI models installed, no internet connection is required to use and interact with the AI LLMs. Faraday.dev supports a wide range of LLaMA-based models, including WizardLM, GPT4-x-Alpaca, Vicuna, Koala, Open Assistant, PygmalionAI, and more. You have the option […]

Source

Oobabooga

Oobabooga is an open-source Gradio web UI for large language models that provides three user-friendly modes for chatting with LLMs: a default two-column view, a notebook-style interface, and a chat interface. This flexibility allows you to interact with the AI models in a way that best suits your needs, whether it’s for writing, analysis, question-answering, […]

Source

Code Llama

Code Llama is a suite of large language models released by Meta AI for generating and enhancing code. It includes foundation models for general coding, Python specializations, and models tailored for following instructions. Key features include state-of-the-art performance, code infilling, large context support up to 100K tokens, and zero-shot ability to follow instructions for programming […]

Source

ChatGLM-6B

ChatGLM-6B is an open-source, bilingual conversational AI LLM based on the General Language Model (GLM) framework. It has 6.2 billion parameters and can be deployed locally with only 6GB of GPU memory. This model allows for natural language processing in both Chinese and English, question answering, task-oriented dialogue, and easy integration via API and demo […]

Source

Perplexity AI

Perplexity AI is an AI chat and search engine that uses advanced technology to provide direct answers to your queries. It delivers accurate answers using large language models and even includes links to citations and related topics. It is available for free via web browser and also on mobile via the Apple App Store. Using […]

Source

Codestral

Codestral is a powerful 22B parameter AI model from Mistral AI. This open-weight model is designed specifically for code generation across over 80 programming languages including Python, Java, C++, JavaScript and more. Codestral offers impressive performance, outperforming other models on benchmarks like HumanEval and RepoBench with its large 32k token context window. The model is […]

Source

Langtail

Langtail is a platform that helps you develop and deploy LLM-powered applications faster. It provides tools for prompt engineering, testing, observability, and deployment – all in one place. You can collaborate with your team, iterate quickly, and get your LLM apps to production with confidence.

Source

Mistral AI

Mistral AI is a large language model and chat assistant tool. You can access the chatbot via the Mistral website by clicking on “Talk to le Chat”, or if you prefer a local setup, you can download and run the model files on your own hardware. The creators of Mistral describe it as an […]

Source
