
USB Stick Hides Large Language Model

Large language models (LLMs) are all the rage in the generative AI world these days, with the truly large ones like GPT, LLaMA, and others using tens or even hundreds of billions of parameters to churn out their text-based responses. These typically require glacier-melting amounts of computing hardware, but the “large” in “large language models” doesn’t really need to be that big for there to be a functional, useful model. LLMs designed for limited hardware or consumer-grade PCs are available now as well, but [Binh] wanted something even smaller and more portable, so he put an LLM on a USB stick.

This USB stick isn’t just a jump drive with a bit of memory on it, though. Inside the custom 3D printed case is a Raspberry Pi Zero W running llama.cpp, a lightweight, high-performance inference engine for LLaMA-family models. Getting it running on this Pi wasn’t straightforward at all, though, as the latest version of llama.cpp targets ARMv8 and this particular Pi runs the older ARMv6 instruction set. That meant [Binh] needed to change the source code to remove the optimizations for more modern ARM machines, but after a week’s worth of effort he finally had the model running on the older Raspberry Pi.

Getting the model to run was just one part of this project. The rest of the build was ensuring that the LLM could run on any computer without drivers and be relatively simple to use. By setting up the USB device as a composite device that presents a filesystem to the host computer, all a user has to do to interact with the LLM is create an empty text file, and the LLM will automatically fill the file with generated text. While it’s not blindingly fast, [Binh] believes this is the first plug-and-play USB-based LLM, and we’d have to agree. It’s not the least powerful computer to ever run an LLM, though. That honor goes to this project which is able to cram one on an ESP32.
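
Gadget-mode plumbing aside, the host-facing behavior boils down to a watch-and-fill loop on the Pi. Here’s a minimal Python sketch of that idea, not [Binh]’s actual code: the mount point, the llama.cpp binary name, the model file, and the filename-as-prompt convention are all placeholder assumptions.

```python
# Hypothetical sketch: watch the shared folder for empty .txt files and
# fill each one with generated text. All names below are assumptions.
import subprocess
import time
from pathlib import Path

SHARE = Path("/mnt/usb_share")   # filesystem exposed to the host over USB
LLAMA = "./llama-cli"            # llama.cpp CLI binary (name varies by version)
MODEL = "tiny-model.gguf"        # any small model a Pi Zero can handle

def generate(prompt: str) -> str:
    """Run one-shot inference with llama.cpp and return the model's text."""
    result = subprocess.run(
        [LLAMA, "-m", MODEL, "-p", prompt, "-n", "128"],
        capture_output=True, text=True, timeout=600,
    )
    return result.stdout

while True:
    for f in SHARE.glob("*.txt"):
        if f.stat().st_size == 0:            # an empty file is a request
            f.write_text(generate(f.stem))   # filename doubles as the prompt
    time.sleep(2)
```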

Hackaday Links: February 9, 2025


January 9 ended up being a very expensive day for a Culver City, California man after he pleaded guilty to recklessly operating a drone during the height of the Pacific Palisades wildfire. We covered this story a bit when it happened (second item): the drone struck and damaged the leading edge of a Canadian “Super Scooper” plane that was fighting the fire. Peter Tripp Akemann, 56, admitted to taking the opportunity to go to the top of a parking garage in Santa Monica and launching his drone to get a better view of the action to the northwest. Unfortunately, the drone got about 2,500 meters away, far beyond visual range and, as it turns out, directly in the path of the planes refilling their tanks by skimming along the waters off Malibu. The agreement between Akemann and federal prosecutors calls for a guilty plea along with full restitution to the government of Quebec, which owns the damaged plane, plus the costs of repair. Akemann will need to write a check for $65,169 and perform 150 hours of community service related to the relief effort for the fire’s victims. Expensive, yes, but probably better than the year in federal prison such an offense could have earned him.

Another story we’ve been following for a while is the United States government’s effort to mandate that every car sold here comes equipped with an AM radio. The argument is that broadcasters, at the government’s behest, have devoted a massive amount of time and money to bulletproofing AM radio, up to and including providing apocalypse-proof bunkers for selected stations, making AM radio a vital part of the emergency communications infrastructure. Car manufacturers, however, have been routinely deleting AM receivers from their infotainment products, arguing that nobody but boomers listens to AM radio in the car anymore. This resulted in the “AM Radio for Every Vehicle Act,” which enjoyed some support the first time it was introduced but still failed to pass. The bill has been reintroduced and appears to be on a fast track to approval, both in the Senate and the House, where a companion bill was introduced this week. As for the “AM is dead” argument, the Geerling boys put the lie to that by noting that the Arbitron ratings for AM stations around Los Angeles spiked dramatically during the recent wildfires. AM might not be the first choice for entertainment anymore, but when things start getting real, people know where to go.

Most of us are probably familiar with the concept of a honeypot, a system set up to entice black hat hackers with the promise of juicy information that instead traps them. It’s a time-honored security tactic, but one that relies on human traits like greed and laziness to work. Protecting yourself against non-human attacks, like those coming from bots trying to train large language models on your content, is a different story. That’s where you might want to look at something like Nepenthes, a tarpit service intended to slow down and confuse the hell out of LLM bots. Named after a genus of carnivorous pitcher plants, Nepenthes traps bots with a two-pronged attack. First, the service generates a randomized but deterministic wall of text that almost but not quite reads like sensible English. Second, it populates the page with a bunch of links for the bots to follow, all of which point right back to the same service, generating another page of nonsense text and self-referential links. Ingeniously devious; use with caution, of course.
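
To make the trap concrete, here’s a minimal Python sketch of that two-pronged idea (not Nepenthes’ actual code): the URL path seeds a deterministic generator, so the same page of babble comes back every time, and every link on it leads deeper into the maze.

```python
# Toy tarpit page generator: randomized but deterministic babble, plus
# self-referential links that all lead back into the same service.
import hashlib
import random

WORDS = ["data", "model", "quantum", "ledger", "synergy", "protocol",
         "vector", "cache", "torus", "manifold", "entropy", "kernel"]

def page_for(path: str) -> str:
    # Seed from the path: the same URL always yields the same page.
    rng = random.Random(hashlib.sha256(path.encode()).hexdigest())
    babble = " ".join(rng.choice(WORDS) for _ in range(200))
    # Every link points at another page of the tarpit, never at real content.
    links = "".join(
        f'<a href="/{rng.getrandbits(64):x}">more</a> ' for _ in range(10)
    )
    return f"<html><body><p>{babble}</p>{links}</body></html>"

print(page_for("/some/crawled/path"))
```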

When was the last time you actually read a Terms of Service document? If you’re like most of us, the closest you’ve come is the few occasions where you’ve had to scroll to the bottom of a text window before the “Accept Terms” button is enabled. We all know it’s not good to agree to something legally binding without reading it, but who has time to trawl through all that legalese? Nobody we know, which is where ToS;DR comes in. “Terms of Service; Didn’t Read” does the heavy lifting of ToS and EULAs for you, providing a summary of what you’re agreeing to as well as an overall grade from A to E, with E being the lowest. Refreshingly, the summaries and ratings are not performed by some LLM but rather by volunteer reviewers, who pore over the details so you don’t have to. Talk about taking one for the team.

And finally, how many continents do you think there are? Most of us were taught that there are seven, which would probably come as a surprise to an impartial extraterrestrial, who might instead count a huge continent in one hemisphere, a smaller one with a really skinny section in the other hemisphere, the snowy one at the bottom, and a bunch of big islands. That’s not how geologists see things, though, and new research into plate tectonics suggests that the real number might be six continents. So which continent is getting the Pluto treatment? Geologists previously believed that the European plate fully separated from the North American plate 52 million years ago, but recent undersea observations in the arc connecting Greenland, Iceland, and the Faroe Islands suggest that the plate is still pulling apart. That would make Europe and North America one massive continent, at least tectonically. This is far from a done deal, of course; more measurements will reveal whether the crust under the ocean is still stretching out, which would support the hypothesis. In the meantime, Europe, enjoy your continental status while you still can.

More Details On Why DeepSeek is a Big Deal

The DeepSeek large language models (LLMs) have been making headlines lately, and for more than one reason. IEEE Spectrum has an article that sums everything up very nicely.

We shared the way DeepSeek made a splash when it came onto the AI scene not long ago, and this is a good opportunity to go into a few more details of why this has been such a big deal.

For one thing, DeepSeek (there are actually two flavors, -V3 and -R1, more on them in a moment) punches well above its weight. DeepSeek is the product of an innovative development process, and is freely available to use or modify. It also indirectly highlights the way companies in this space like to label their LLM offerings as “open” or “free” while stopping well short of actually making them open source.

The DeepSeek-V3 LLM was developed in China and reportedly cost less than 6 million USD to train. This was possible thanks to DualPipe, a highly optimized and scalable pipeline-parallel training method the team developed to work within the limitations imposed by export restrictions on Nvidia hardware. Details are in the technical paper for DeepSeek-V3.

There’s also DeepSeek-R1, a chain-of-thought “reasoning” model which handily exposes its thought process within easily-parsed <think> and </think> pseudo-tags included in its responses. A model like this takes an iterative, step-by-step approach to formulating responses, and benefits from prompts that provide a clear goal for the LLM to aim for. The way DeepSeek-R1 was created was itself novel: training started with a “cold start” of supervised fine-tuning (SFT), a human-led and labor-intensive process, which eventually handed off to a more automated reinforcement learning (RL) process with a rules-based reward system. The result avoided the problems that come from relying too heavily on RL while minimizing the human effort of SFT. Technical details on the process of training DeepSeek-R1 are here.
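
Since those tags are plain text in the response, pulling the reasoning apart from the final answer is a few lines of work. A quick Python sketch:

```python
# Split a DeepSeek-R1 style response into its reasoning and its answer
# using the <think>...</think> pseudo-tags the model emits.
import re

def split_response(text: str) -> tuple[str, str]:
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    thoughts = match.group(1).strip() if match else ""
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return thoughts, answer

raw = "<think>The user wants a sum. 2 + 2 = 4.</think>The answer is 4."
thoughts, answer = split_response(raw)
print(thoughts)  # -> The user wants a sum. 2 + 2 = 4.
print(answer)    # -> The answer is 4.
```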

DeepSeek-V3 and -R1 are freely available in the sense that one can access the full-powered models online or via an app, or download distilled models for local use on more limited hardware. They are free and open as in accessible, but not open source, because not everything needed to replicate the work is actually released. As with most LLMs, the training data and the actual training code are not available.
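
For the local-use route, here’s a sketch of what loading a distilled build might look like, assuming the llama-cpp-python bindings; the GGUF filename is a placeholder for whichever distilled model you actually download.

```python
# Minimal local inference sketch using llama-cpp-python; the model
# filename below is a placeholder, not an official release name.
from llama_cpp import Llama

llm = Llama(model_path="deepseek-r1-distill-7b.gguf", n_ctx=4096)
out = llm("Explain pipeline parallelism in one paragraph.", max_tokens=256)
print(out["choices"][0]["text"])
```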

What has been released, and is making waves of its own, is the technical detail of how the researchers produced what they did, and that means there are efforts underway to make an actually open source version. Keep an eye out for Open-R1!

Modern AI on Vintage Hardware: Llama 2 Runs on Windows 98

[EXO Labs] demonstrated something pretty striking: a modified version of Llama 2 (a large language model) that runs on Windows 98. Why? Because when it comes to personal computing, if something can run on Windows 98, it can run on anything. More to the point: if something can run on Windows 98, then it’s something no tech company can control how you use, no matter how large or influential that company may be. More on that in a minute.

Ever wanted to run a local LLM on 25-year-old hardware? No? Well now you can, and at a respectable speed, too!

What’s it like to run an LLM on Windows 98? Aside from the struggles of finding compatible peripherals (back to PS/2 hardware!), transferring the required files (FTP over Ethernet to the rescue), and compiling the code (some porting required), it works better than one might expect.

A Windows 98 machine with a Pentium II processor and 128 MB of RAM generates a speedy 39.31 tokens per second with a 260K-parameter Llama 2 model. A much larger 15M-parameter model generates 1.03 tokens per second. Slow, but it works. Going even larger will also work, just ever more slowly. There’s a video on X that shows it all in action.
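
Some back-of-the-envelope math shows why models this size are no problem for 128 MB of RAM:

```python
# Rough weight-storage estimate: at 4 bytes per float32 parameter,
# even the "large" 15M model needs only tens of megabytes.
BYTES_PER_PARAM = 4  # float32 weights, no quantization

for name, params in [("260K", 260_000), ("15M", 15_000_000)]:
    mb = params * BYTES_PER_PARAM / 1024**2
    print(f"{name} model: ~{mb:.1f} MB of weights")

# 260K model: ~1.0 MB of weights
# 15M model: ~57.2 MB of weights -- snug but workable in 128 MB of RAM
```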

It’s true that modern LLMs have billions of parameters, so these models are tiny in comparison. But that doesn’t mean they can’t be useful. Models can be shockingly small and still be perfectly coherent, delivering surprisingly strong performance if their training and “job” are narrow enough, and the tools to do that for oneself are all on GitHub.

This is a good time to mention that this particular project (and its ongoing efforts) is part of a set of twelve projects by EXO Labs focused on ensuring that things like AI models can be run anywhere, by anyone, independent of tech giants aiming to hold all the strings.

And hey, if local AI and the command line are up your alley, did you know LLMs already exist as single-file, multi-platform, command-line executables?

Giskard

Giskard is an open-source AI model quality testing tool, built by AI engineers for AI engineers, that helps data scientists and developers build safer, more robust, and more trustworthy AI systems. To use the platform, you can get […]

Source

Faraday.dev

Faraday.dev lets you easily run open-source LLMs (chatbots) on your computer. Once you’ve got the program and AI models installed, no internet connection is required to use and interact with the AI LLMs. Faraday.dev supports a wide range of LLaMA-based models, including WizardLM, GPT4-x-Alpaca, Vicuna, Koala, Open Assistant, PygmalionAI, and more. You have the option […]

Source

Oobabooga

Oobabooga is an open-source Gradio web UI for large language models that provides three user-friendly modes for chatting with LLMs: a default two-column view, a notebook-style interface, and a chat interface. This flexibility allows you to interact with the AI models in a way that best suits your needs, whether it’s for writing, analysis, question-answering, […]

Source

Code Llama

Code Llama is a suite of large language models released by Meta AI for generating and enhancing code. It includes foundation models for general coding, Python specializations, and models tailored for following instructions. Key features include state-of-the-art performance, code infilling, large context support up to 100K tokens, and zero-shot ability to follow instructions for programming […]

Source

ChatGLM-6B

ChatGLM-6B is an open-source, bilingual conversational AI LLM based on the General Language Model (GLM) framework. It has 6.2 billion parameters and can be deployed locally with only 6GB of GPU memory. This model allows for natural language processing in both Chinese and English, question answering, task-oriented dialogue, and easy integration via API and demo […]

Source

Perplexity AI

Perplexity AI is an AI chat and search engine that uses advanced technology to provide direct answers to your queries. It delivers accurate answers using large language models and even includes links to citations and related topics. It is available for free via web browser and also on mobile via the Apple App Store. Using […]

Source

Codestral

Codestral is a powerful 22B parameter AI model from Mistral AI. This open-weight model is designed specifically for code generation across over 80 programming languages including Python, Java, C++, JavaScript and more. Codestral offers impressive performance, outperforming other models on benchmarks like HumanEval and RepoBench with its large 32k token context window. The model is […]

Source

Langtail

Langtail is a platform that helps you develop and deploy LLM-powered applications faster. It provides tools for prompt engineering, testing, observability, and deployment – all in one place. You can collaborate with your team, iterate quickly, and get your LLM apps to production with confidence.

Source

Mistral AI

Mistral AI is a large language model and chat assistant tool. You can access the chatbot via the Mistral website by clicking on “Talk to le Chat”, or if you prefer a local setup, you can download and run the model files on your own hardware. The creators of Mistral describe it as an […]

Source
