
Crunching The News For Fun And Little Profit

By: Jenny List
9 July 2025 at 14:00

Do you ever look at the news, and wonder about the process behind the news cycle? I did, and for the last couple of decades it’s been the subject of one of my projects. The Raspberry Pi on my shelf runs my word trend analysis tool for news content, and since my journey from curious geek to having my own large corpus analysis system has taken twenty years it’s worth a second look.

How Career Turmoil Led To A Two Decade Project

A hanging sign surrounded by ornate metalwork, with the legend "Cyder house".
This is very much a minority spelling. Colin Smith, CC BY-SA 2.0.

In the middle of the 2000s I had come out of the dotcom crash mostly intact, and was working for a small web shop. When they went bust I was casting around as one does, and spent a while as a Google quality rater while I looked for a new permie job. These teams are employed by the search giant through temporary employment agencies, and in loose terms their job is to be the trained monkeys against whom the algorithm is tested. The algorithm chose X, and if the humans also chose X, the algorithm is probably getting it right. Being a quality rater is not in any way a high-profile job, but with the big shiny G on my CV I soon found myself in demand from web companies seeking some white-hat search engine marketing expertise. What I learned mirrored my lesson from a decade earlier in the CD-ROM business, that on the web as in any other electronic publishing medium, good content well presented has priority over any black-hat tricks.

But what makes good content? Forget an obsession with stuffing bogus keywords in the text, and instead talk about the right things, and do it authoritatively. What are the right things in this context? If you are covering a subject, you need to do so using the right language; that which the majority uses rather than language only you use. I can think of a bunch of examples which I probably shouldn’t talk about, but an example close to home for me comes in cider. In the UK, cider is a fermented alcoholic drink made from apples, and as a craft cidermaker of many years standing I have a good grasp of its vocabulary. The accepted spelling is “Cider”, but there’s an alternate spelling of “Cyder” used by some commercial producers of the drink. It doesn’t take long to realise that online, hardly anyone uses cyder with a Y, and thus pages concentrating on that word will do less well than those talking about cider.

A graph of the word football versus the word soccer in British news.
We Brits rarely use the word “soccer” unless there’s a story about the Club World Cup in America.

I started to build software to analyse language around a given topic, with the aim of discerning the metaphorical cider from the cyder. It was a great surprise a few years later to discover that I had invented for myself the already-existing field of computational linguistics, something that would have saved me a lot of time had I known about it when I began. I was taking a corpus of text and computing the frequencies and collocates (words that appear alongside each other) of the words within it, and from that I could quickly see which wording mattered around a subject, and which didn’t. This led seamlessly to an interest in what the same process would look like for news data with a time axis added, so I created a version which harvested its corpus from RSS feeds. Thus began my decades-long project.
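The frequency and collocate counting described above can be sketched in a few lines of Python. This is a minimal illustration only, not the author's PHP code: the function names, the regular-expression tokenizer, the two-word window, and the tiny sample corpus are all assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def frequencies(docs):
    """Count how often each word occurs across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts

def collocates(docs, target, window=2):
    """Count the words that appear within `window` tokens of `target`."""
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo = max(0, i - window)
            # Neighbours on both sides, excluding the target itself.
            counts.update(tokens[lo:i] + tokens[i + 1:i + 1 + window])
    return counts

# Hypothetical three-document corpus.
corpus = [
    "Craft cider is made from apples",
    "The cider house sells dry cider",
    "A cyder house sign",
]
freq = frequencies(corpus)
near_cider = collocates(corpus, "cider")
```

Comparing `freq["cider"]` with `freq["cyder"]` gives the kind of majority-usage signal the cider example relies on, while `near_cider.most_common()` lists the words that tend to travel with the term.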

From Project Idea, To Corpus Appliance

In 2005 I knew how to create websites in the manner of the day, so I used the tools I had. PHP5, and MySQL. I know PHP is unfashionable these days, but at the time this wasn’t too controversial, and aside from all the questionable quality PHP code out there it remains a useful scripting language. Using MySQL however would cause me immense problems. I had done what seemed the right thing and created a structured database with linked tables, but I hadn’t fully appreciated just how huge was the task I had taken on. Harvesting the RSS firehose across multiple media outlets brings in thousands of stories every week, so queries which were near-instantaneous during my first development stages grew to take many minutes as my corpus expanded. It was time to come up with an alternative, and I found it in the most basic of OS features, the filesystem.

A graph of the words cat and dog in British news.
I have no idea why British news has more dog stories than cat stories.

Casting back to the 1990s, when you paid for web hosting it was given in terms of the storage space it came with. The processing power required to run your CGI scripts or later server-side interpreters such as ASP or PHP, wasn’t considered. It thus became normal practice to try to reduce storage use and not think about processing, and I had without thinking followed this path.

But by the 2000s the price of storage had dropped hugely while that of processing hadn’t. This was the decade in which cloud services such as AWS made an appearance, and as well as buying many-gigabyte hard disks for not a lot, you could also for the first time rent a cloud bucket for pennies. My corpus analysis system didn’t need to spend all its time computing if I could use a terabyte hard drive to make up for less processor usage, so I turned my system on its head. When collecting the RSS stories my retrieval script would pre-compute the final data and store it in a vast tree of tiny JSON files accessible at high speed through the filesystem, and then my analysis software could simply retrieve them and make its report. The system moved from a hard-working x86 laptop to a whisper-quiet and low powered Raspberry Pi with a USB hard disk, and there it has stayed in some form ever since.
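As a rough sketch of that storage-for-processing trade, the harvesting step can fold each story's word counts into small per-word, per-day JSON files, so that a report only has to read files back. The directory layout, the file contents, and the function names here are assumptions for illustration; the article does not describe the real tree.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path

def count_path(root, word, day):
    """Map a word and an ISO date to a small JSON file, sharded by first letter."""
    year, month, dom = day.split("-")
    return Path(root) / word[0] / word / year / month / f"{dom}.json"

def store_story(root, day, text):
    """At harvest time, fold one story's word counts into the per-day files."""
    for word, n in Counter(text.lower().split()).items():
        path = count_path(root, word, day)
        path.parent.mkdir(parents=True, exist_ok=True)
        total = json.loads(path.read_text())["count"] if path.exists() else 0
        path.write_text(json.dumps({"count": total + n}))

def series(root, word, days):
    """At report time, read the pre-computed counts back; missing days count as 0."""
    out = []
    for day in days:
        path = count_path(root, word, day)
        out.append(json.loads(path.read_text())["count"] if path.exists() else 0)
    return out

# Hypothetical stories written into a throwaway directory.
root = tempfile.mkdtemp()
store_story(root, "2016-06-23", "brexit vote brexit")
store_story(root, "2016-06-24", "brexit result")
brexit = series(root, "brexit", ["2016-06-22", "2016-06-23", "2016-06-24"])
```

Every word on every day becomes its own file, so a layout like this spends inodes freely in exchange for lookups that need no query engine at all.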

Just What Can This Thing Do?

A bubble cloud for the week of 2016-06-23, when the UK Brexit referendum happened. Big words are EU, Brexit, referendum, leave, and vote.
No prizes for guessing what happened this week.

So I have a news corpus that has taken me a long time to build. I can take one or more words, and I can compare their occurrence over time. I can watch the news cycle, I can see stories build up over time. I can even see trends which sometimes go against received opinion, such as spotting that the eventual winner of the 2016 UK Labour leadership race was likely to be Jeremy Corbyn early on while the herd were looking elsewhere. Sometimes as with the performance of the word “Brexit” over the middle of the last decade I can see the great events of our times in stark relief, but perhaps it’s in the non-obvious that there’s most value. If you follow a topic and it suddenly dries up for a couple of days, expect a really big story on day three, for example. I can also see which outlets cover one story more than another, something helpful when trying to ascertain if a topic is being pushed on behalf of a particular lobby.

My experiment in text analysis then turned into something much more, even dare I say it, something I find of help in figuring out what’s really going on in turbulent times. But from a tech point of view it’s taught me a huge amount, about statistics, about language, about text parsing, and even about watching the number of available inodes on a hard drive. Believe me, many millions of tiny files in a tree can become unwieldy. But perhaps most of all, after a lifetime of mucking about with all manner of projects but generating little of lasting significance, I can look at this one and say I created something useful. And that is something to be happy about.


The Hackaday Summer Reading List: No AI Involvement, Guaranteed

By: Jenny List
7 July 2025 at 17:00

If you have any empathy at all for those of us in the journalistic profession, have some pity for the poor editor at the Chicago Sun-Times, who let through an AI-generated summer reading list made up of novels which didn’t exist. The fake works all had real authors and thus looked plausible, so we expect that librarians and booksellers throughout the paper’s distribution area were left scratching their heads as to why they’re not in the catalogue.

Here at Hackaday we’re refreshingly meat-based, so with a guarantee of no machine involvement, we’d like to present our own summer reading list. They’re none of them new works but we think you’ll find them as entertaining, informative, or downright useful as we did when we read them. What are you reading this summer?

Surely You’re Joking, Mr. Feynman!

Richard P. Feynman was a Nobel-prize-winning American physicist whose career stretched from the nuclear weapons lab at Los Alamos in the 1940s to the report on the Challenger shuttle disaster in the 1980s, along the way working at the boundaries of quantum physics. He was also something of a character, and that side of him comes through in this book based on a series of taped interviews he gave.

We follow him from his childhood when he convinced his friends he could see into the future by picking up their favourite show from a distant station that broadcast it at an earlier time, to Los Alamos where he confuses security guards by escaping through a hole in the fence, and breaks into his colleagues’ safes. I first read this book thirty years ago, and every time I read it again I still find it witty and interesting. A definite on the Hackaday reading list!

Back Into The Storm

A lot of us are fascinated by the world of 1980s retrocomputers, and here at Hackaday we’re fortunate to have among our colleagues a number of people who were there as it happened, and who made significant contributions to the era.

Among them is Bil Herd, whose account of his time working at Jack Tramiel’s Commodore from the early to mid 1980s captures much more than just the technology involved. It’s at the same time an insider’s view of a famous manufacturer and a tale redolent with the frenetic excesses of that moment in computing history. The trade shows and red-eye flights, the shonky prototypes demonstrated to the world, and the many might-have-been machines which were killed by the company’s dismal marketing are all recounted with a survivor’s eye, and really give a feeling for the time. We reviewed it in 2021, and it’s still very readable today.

The Cuckoo’s Egg

In the mid 1980s, Cliff Stoll was a junior academic working as a university sysadmin, whose job was maintaining the system that charged for access to their timesharing system. Chasing a minor discrepancy in this financial system led him to discover an unauthorised user, which in turn led him down a rabbit-hole of computer detective work chasing an international blackhat that’s worthy of James Bond.

This book is one of the more famous break-out books about the world of hacking, and is readable because of its combination of storytelling and the wildly diverse worlds in which it takes place. From the hippyish halls of learning to three letter agencies, where he gets into trouble for using a TOP SECRET stamp, it will command your attention from cover to cover. We reviewed it back in 2017 and it was already a couple of decades old then, but it’s a book which doesn’t age.

The Code Book

Here’s another older book, this time Simon Singh’s popular mathematics hit, The Code Book. It’s a history of cryptography from Roman and medieval cyphers to the quantum computer, and where its value lies is in providing comprehensible explanations of how each one works.

Few of us need to know the inner workings of RSA or the Vigenère square in our everyday lives, but we live in a world underpinned by encryption. This book provides a very readable introduction, and much more than a mere bluffer’s guide, to help you navigate it.

The above are just a small selection of light summer reading that we’ve been entertained by over the years, and we hope that you will enjoy them. But you will have your own selections too, would you care to share them with us?

Header image: Sheila Sund, CC BY 2.0.

A Feast Of 1970s Gaming History, And An 8080 Arcade Board

By: Jenny List
5 July 2025 at 11:00

Sometimes a write-up of a piece of retrocomputing hardware goes way beyond the hardware itself and into the industry that spawned it, and thus it is with [OldVCR]’s resurrection of a Blasto arcade board from 1978. It charts the history of Gremlin Industries, a largely forgotten American pioneer in the world of arcade games, and though it’s a long read it’s well worth it.

The board itself uses an Intel 8080, and is fairly typical of microcomputer systems from the late 1970s. Wiring it up requires a bit of detective work, particularly around triggering the 8080’s reset, but eventually it’s up and playing with a pair of Atari joysticks. The 8080 is a CPU we rarely see here.

The history of the company is fascinating, well researched, and entertaining. What started as an electronics business moved into wall games, early coin-op electronic games, and thence into the arcade segment with an 8080 based system that’s the precursor of the one here. They even released a rather impressive computer system based on the same hardware, but since it was built into a full-sized desk it didn’t sell well. For those of us new to Gremlin Industries the surprise comes at the end, they were bought by Sega and became that company’s American operation. In that sense they never went away, as their successor is very much still with us. Meanwhile if you have an interest in the 8080, we have been there for you.

Track Your GitHub Activity With This E-Ink Display

By: Jenny List
5 July 2025 at 08:00

If you’re a regular GitHub user you’ll be familiar with the website’s graphical calendar display of activity as a grid. For some of you it will show a hive of activity, while for others it will be a bit spotty. If you’re proud of your graph though, you’ll want to show it off to the world, and that’s where [HarryHighPants]’ Git Contributions E-Ink Display comes in. It’s a small desktop appliance with a persistent display, that shows the current version of your GitHub graph.

At its heart is an all-in-one board with the display and an ESP32 on the back, with a small Li-Po cell. It’s all put in a smart 3D printed case. The software is the real trick, with a handy web interface from which you can configure your GitHub details.

It’s a simple enough project, but it joins a growing collection which use an ESP32 as a static information display. The chip is capable of more though, as shown by this much more configurable device.

It’s 2025, And We Still Need IPv4! What Happens When We Lose It?

By: Jenny List
3 July 2025 at 11:00

Some time last year, a weird thing happened in the hackerspace where this is being written. The Internet was up, and was blisteringly fast as always, but only a few websites worked. What was up? Fortunately with more than one high-end networking specialist on hand it was quickly established that we had a problem with our gateway’s handling of IPv4 addresses, and normal service was restored. But what happens if you’re not a hackerspace with access to the dodgy piece of infrastructure and you’re left with only IPv6? [James McMurray] had this happen, and has written up how he fixed it.

His answer came in using a Wireguard tunnel to his VPS, and NAT mapping the IPv4 space into a section of IPv6 space. The write-up goes into extensive detail on the process should you need to follow his example, but for us there’s perhaps more interest in why here in 2025, the loss of IPv4 is still something that comes with the loss of half the Internet. As of this writing, that even includes Hackaday itself. If we had the magic means to talk to ourselves from a couple of decades ago our younger selves would probably be shocked by this.
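The idea of mapping the whole IPv4 space into a slice of IPv6 space can be illustrated with Python's standard `ipaddress` module. This sketch assumes the RFC 6052 well-known prefix `64:ff9b::/96`, which may not be the prefix used in the write-up, and the function names are made up for the example; it shows only the address arithmetic, not the Wireguard or NAT configuration.

```python
import ipaddress

# Assumption: the RFC 6052 well-known NAT64 prefix. The write-up may use another.
NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")

def v4_to_v6(v4):
    """Embed an IPv4 address in the low 32 bits of the /96 prefix."""
    addr = ipaddress.IPv4Address(v4)
    return ipaddress.IPv6Address(int(NAT64_PREFIX.network_address) | int(addr))

def v6_to_v4(v6):
    """Recover the IPv4 address from a mapped IPv6 address."""
    addr = ipaddress.IPv6Address(v6)
    if addr not in NAT64_PREFIX:
        raise ValueError("address is outside the mapping prefix")
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)

mapped = v4_to_v6("192.0.2.33")
print(mapped)                  # 64:ff9b::c000:221
print(v6_to_v4(str(mapped)))   # 192.0.2.33
```

Because a /96 leaves exactly 32 free bits, every IPv4 address has one IPv6 counterpart, which is what lets an IPv6-only host reach IPv4 destinations through a translating gateway.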

Perhaps the answer lies in the inescapable conclusion that while IPv6 answers an address space problem of concern to many in technical spaces, it neither solves anything of concern to most internet users, nor is worth the switch for so much infrastructure when mitigations such as NAT make the IPv4 address space problem less of a problem. Will we ever entirely lose IPv4? We’d appreciate your views in the comments. For readers anxious for more it’s something we looked at last year.

Finally, An Extension To Copyright Law We Can Get Behind

By: Jenny List
2 July 2025 at 11:00

Normally when a government extends a piece of copyright law we expect it to be in the favour of commercial interests with deep pockets and little care for their consumers. But in Denmark they do things differently it seems, which is why they are giving Danes the copyright over their own features such as their faces or voices. Why? To combat deepfakes, meaning that if you deepfake a Dane, they can come after you for big bucks, or indeed kroner. It’s a major win, in privacy terms.

You might of course ask, whether it’s now risky to photograph a Dane. We are not of course lawyers here but like any journalists we have to possess a knowledge of how copyright works, and we are guessing that the idea in play here is that of passing off. If you take a photograph of a Volkswagen you will have captured the VW logo on its front, but the car company will not sue you because you are not passing off something that’s not a Volkswagen as the real thing. So it will be with Danes; if you take a picture of their now-copyrighted face in a crowd you are not passing it off as anything but a real picture of them, so we think you should be safe.

We welcome this move, and wish other countries would follow suit.


Pope Francis, Midjourney, Public domain, (Which is a copyright story all of its own!)

Break The Air Gap With Ultrasound

By: Jenny List
30 June 2025 at 02:00

In the world of information security, much thought goes into ensuring that no information can leave computer networks without expressly being permitted to do so. Conversely, a lot of effort is expended on the part of would-be attackers to break through whatever layers are present. [Halcy] has a way to share data between computers, whether they are networked or not, and it uses ultrasound.

To be fair, this is more of a fun toy than an elite exploit, because it involves a web interface that encodes text as ultrasonic frequency shift keying. Your computer speakers and microphone can handle it, but it’s way above the human hearing range. Testing it here, we were able to send text mostly without errors over a short distance, but at least on this laptop, we wouldn’t call it reliable.
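For readers unfamiliar with frequency shift keying, here is a minimal sketch of the principle in pure Python: each bit becomes a short burst of one of two tones, and the receiver decides which tone is stronger in each slot. The sample rate, the two tone frequencies, and the bit length are assumptions chosen for illustration, not the values [Halcy]'s page uses, and the code works on lists of samples rather than a real sound card.

```python
import math

RATE = 48_000             # samples per second (assumed)
F0, F1 = 18_000, 19_000   # assumed tone frequencies in Hz for bit 0 and bit 1
BIT_SAMPLES = 480         # 10 ms per bit (assumed)

def text_to_bits(text):
    """Turn text into a list of bits, most significant bit first."""
    return [(byte >> i) & 1 for byte in text.encode("utf-8") for i in range(7, -1, -1)]

def bits_to_text(bits):
    """Pack groups of eight bits back into bytes and decode them."""
    data = bytes(
        int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8)
    )
    return data.decode("utf-8")

def modulate(bits):
    """Emit one burst of sine samples per bit, at F0 or F1."""
    samples = []
    for bit in bits:
        step = 2 * math.pi * (F1 if bit else F0) / RATE
        samples.extend(math.sin(step * n) for n in range(BIT_SAMPLES))
    return samples

def tone_power(chunk, freq):
    """Correlate a chunk against one frequency and return its power."""
    step = 2 * math.pi * freq / RATE
    i_sum = sum(s * math.cos(step * n) for n, s in enumerate(chunk))
    q_sum = sum(s * math.sin(step * n) for n, s in enumerate(chunk))
    return i_sum * i_sum + q_sum * q_sum

def demodulate(samples):
    """Pick the stronger of the two tones in each bit-length slot."""
    bits = []
    for start in range(0, len(samples), BIT_SAMPLES):
        chunk = samples[start:start + BIT_SAMPLES]
        bits.append(1 if tone_power(chunk, F1) > tone_power(chunk, F0) else 0)
    return bits

signal = modulate(text_to_bits("hi"))
decoded = bits_to_text(demodulate(signal))
```

A real acoustic link also has to find where each bit starts and cope with echoes and speaker roll-off near 20 kHz, which may go some way to explaining the occasional errors seen in testing.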

We doubt that many sensitive servers have a sound card and speakers installed where you can overhear them, but by contrast, there are doubtless many laptops containing valuable information, so we could imagine it as a possible attack vector. The code is on the linked page, should you be interested, and if you want more ultrasonic goodness, this definitely isn’t the first time we have touched upon it. While a sound card might be exotic on a server, a hard drive LED isn’t.

Reading The Chip In Your Passport

By: Jenny List
29 June 2025 at 02:00

For over a decade, most passports have contained an NFC chip that holds a set of electronically readable data about the document and its holder. This has resulted in a much quicker passage through some borders as automatic barriers can replace human officials, but at the same time, it adds an opaque layer to the process. Just what data is on your passport, and can you read it for yourself? [Terence Eden] wanted to find out.

The write-up explains what’s on the passport and how to access it. Surprisingly, it’s a straightforward process, unlike, for example, the NFC on a bank card. Security against drive-by scanning is provided by the key being printed on the passport, requiring the passport to be physically opened.
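The article does not name the protocol, but assuming the key in question is the one used by Basic Access Control in ICAO Doc 9303, it is derived from three fields printed in the machine-readable zone: the document number, the date of birth, and the expiry date, each followed by its check digit. The sketch below shows only that derivation step; the function names are made up for the example, and the field values are specimen-style test data, not a real passport.

```python
import hashlib

def check_digit(field):
    """ICAO 9303 check digit: weights 7, 3, 1 repeating; '<' counts as 0."""
    total = 0
    for i, ch in enumerate(field):
        if ch.isdigit():
            value = int(ch)
        elif ch == "<":
            value = 0
        else:
            value = ord(ch.upper()) - ord("A") + 10   # A=10 ... Z=35
        total += value * (7, 3, 1)[i % 3]
    return str(total % 10)

def bac_key_seed(doc_number, birth, expiry):
    """Build the MRZ information string and hash it into a 16-byte key seed."""
    doc_number = doc_number.ljust(9, "<")             # pad short numbers with '<'
    info = (
        doc_number + check_digit(doc_number)
        + birth + check_digit(birth)                  # dates are YYMMDD
        + expiry + check_digit(expiry)
    )
    return hashlib.sha1(info.encode("ascii")).digest()[:16]

# Specimen-style values for illustration only.
seed = bac_key_seed("L898902C", "690806", "940623")
```

Since birth and expiry dates span a limited range and document numbers are often issued in sequence, the number of possible keys is far smaller than the seed's 16 bytes suggest.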

He notes that it’s not impossible to brute force this key, though doing so reveals little that’s not printed on the document. The write-up reveals a piece of general-purpose technical knowledge we should all know. However, there’s a question we’re left with that it doesn’t answer. If we can read the data on a passport chip, could a passport forger thus create a counterfeit one? If any readers are in the know, we’d be interested to hear more in the comments. If you are into NFC hacking, maybe you need a handy multitool.

Header: [Tony Webster], CC BY-SA 4.0.

A Cheap Smart Plug To Block Distractions

By: Jenny List
27 June 2025 at 08:00

We have all suffered from this; the boss wants you to compile a report on the number of paper clips and you’re crawling up the wall with boredom, so naturally your mind strays to other things. You check social media, or maybe the news, and before you know it a while has been wasted. [Neil Chen] came up with a solution, to configure a cheap smart plug with a script to block his diversions of choice.

The idea is simple enough: the plug is in an outlet that requires getting up and walking a distance to access, so to flip that switch you’ve really got to want to do it. Behind it lives a Python script that can be found in a GitHub repository, and that’s it! We like it for its simplicity and ingenuity, though we’d implore any of you to avoid using it to block Hackaday. Some sites are simply too important to avoid!

Of course, if distraction at work is your problem, perhaps you should simply run something without it.

Revealing The Last Mac Easter Egg

By: Jenny List
26 June 2025 at 08:00

A favourite thing for the developers behind a complex software project is to embed an Easter egg: something unexpected that can be revealed only by those in the know. Apple certainly had their share of them in their early days, a practice brought to a close by Steve Jobs on his return to the company. One of the last Macs to contain one was the late 1990s beige G3, and while its existence has been known for years, until now nobody has decoded the means to display it on the Mac. Now [Doug Brown] has taken on the challenge.

The Easter egg is a JPEG file embedded in the ROM with portraits of the team, and it can’t be summoned with the keypress combinations used on earlier Macs. We’re taken on a whirlwind tour of ROM disassembly as he finds an unexpected string in the SCSI driver code. Eventually it’s found that formatting the RAM disk with the string as a volume name causes the JPEG to be saved into the disk, and any Mac user can come face to face with the dev team. It’s a joy reserved now for only a few collectors of vintage hardware, but still over a quarter century later, it’s fascinating to learn about. Meanwhile, this isn’t the first Mac easter egg to find its way here.
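Locating an image embedded in a ROM dump is the easier half of a puzzle like this, and can be sketched as a scan for the JPEG start and end markers. This is a naive illustration rather than [Doug Brown]'s method: it uses a made-up byte string in place of a real ROM, and a real image can contain earlier end markers (an embedded thumbnail, for example) that would cut a candidate short.

```python
SOI = b"\xff\xd8\xff"   # JPEG start-of-image marker plus the lead byte of the next marker
EOI = b"\xff\xd9"       # JPEG end-of-image marker

def find_jpegs(blob):
    """Return (start, end) byte offsets of candidate JPEG streams in blob."""
    spans = []
    pos = 0
    while True:
        start = blob.find(SOI, pos)
        if start == -1:
            break
        end = blob.find(EOI, start + len(SOI))
        if end == -1:
            break
        spans.append((start, end + len(EOI)))
        pos = end + len(EOI)
    return spans

# Hypothetical stand-in for a ROM dump: padding, a fake JPEG, more padding.
rom = b"\x00" * 16 + SOI + b"\xe0fake-image-data" + EOI + b"\x00" * 8
spans = find_jpegs(rom)
```

Each span can then be sliced out with `rom[start:end]` and written to a file for inspection.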

There’s A New Reusable Rocket, And It’s A Honda

By: Jenny List
24 June 2025 at 08:00

As we watched the latest SpaceX Starship rocket test end in a spectacular explosion, we might have missed the news from Japan of a different rocket passing a successful test. We all know Honda as a car company but it seems they are in the rocket business too, and they successfully tested a reusable rocket. It’s an experimental 900 kg model that flew to a height of 300 m before returning itself to the pad, but it serves as a valuable test platform for Honda’s take on the technology.

It’s a research project as it stands, but it’s being developed with an eye towards future low-cost satellite launches rather than as a crew launch platform. As a news story though it’s of interest beyond its technology, because it’s too easy to miss news from the other side of the world when all eyes are looking at Texas. It’s the latest in a long line of interesting research projects from the company, and we hope that this time they resist the temptation to kill their creation rather than bring it to market.

3D Print Glass, Using Accessible Techniques

By: Jenny List
23 June 2025 at 02:00

When seeing a story from MIT’s Lincoln Labs that promises 3D printing glass, our first reaction was that it might use some rare or novel chemicals, and certainly a super-high-tech printer. Perhaps it was some form of high-temperature laser sintering, unlikely to be within the reach of mere mortals. How wrong we were, because these boffins have developed a way to 3D print a glass-like material using easy-to-source materials and commonly available equipment.

The print medium is sodium silicate solution, commonly known as waterglass, mixed with silica and other inorganic nanoparticles. It’s referred to as an ink, and it appears to be printed using a technique very similar to the FDM printers we all know. The real magic comes in the curing process, though, because instead of being fired in a special furnace, these models are heated to 200 Celsius in an oil bath. They can then be solvent cleaned and are ready for use. The result may not be the fine crystal glass you may be expecting, but we can certainly see plenty of uses for it should it be turned into a commercial product. Certainly more convenient than sintering with a laser cutter.

Bento is an All-In-One Computer Designed to be Useful

By: Jenny List
20 June 2025 at 08:00

All-in-one computers in which the mainboard lurked beneath a keyboard were once the default in home computing, but more recently they have been relegated to interesting niche devices such as the Raspberry Pi 400 and 500.

The Bento is another take on the idea, coming at it not with the aim of replacing a desktop machine, but instead as a computer for use with wearable display glasses. The thinking goes that when your display is head mounted, why carry around a screen with your laptop?

On top it’s a keyboard, but underneath it’s a compartmentalized space similar to the Japanese lunchboxes which lend the project its name. The computing power comes courtesy of a Steam Deck so it has a USB-C-for-everything approach to plugging in a desktop, though there’s a stated goal to produce versions for other boards such as the Raspberry Pi. There’s even an empty compartment for storage of peripherals.

We like this computer, both for being a cyberdeck and for being without a screen so not quite like the other cyberdecks. It’s polished enough that we could almost imagine it as a commercial product. It’s certainly not the first Steam Deck based cyberdeck we’ve seen.

Capturing Screenshots Using a Fake Printer

By: Jenny List
18 June 2025 at 05:00

If you have very old pieces of analogue test equipment with CRTs on your bench, the chances are they will all have surprisingly similar surrounds to their screens. Back when they were made it was common to record oscilloscope screens with a Polaroid camera, that would have a front fitting for just this purpose.

More recent instruments are computerized so taking a screen shot should be easier, but that’s still not easy if the machine can’t save to a handy disk. Along comes [Tom] with a solution, to hook up a fake printer, and grab the screen from a print.

Old instruments come with a variety of ports, serial, IEEE-488, or parallel, but they should usually have the ability to print a screen. Then capturing that is a case of capturing and interpreting the print data, be it ESC/P, PCL5, PostScript, or whatever. The linked page takes us through a variety of techniques, and should be of help to anyone who’s picked up a bargain in the flea market.
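Once the raw bytes from the instrument's printer port have been captured, a first step is working out which printer language they are in. The sketch below is a simple heuristic based on how such jobs commonly begin (PostScript with `%!`, PCL with the ESC E reset, ESC/P with the ESC @ initialise command); it is not taken from [Tom]'s page, the function name is made up, and a real capture may need a closer look than its first few bytes.

```python
ESC = b"\x1b"

def guess_print_language(data):
    """Guess the printer language of a captured job from its leading bytes."""
    head = data.lstrip(b"\x00\r\n ")[:64]   # skip padding some devices send first
    if head.startswith(b"%!"):
        return "PostScript"                 # e.g. "%!PS-Adobe-3.0"
    if head.startswith(ESC + b"E"):
        return "PCL"                        # ESC E is the PCL printer reset
    if head.startswith(ESC + b"@"):
        return "ESC/P"                      # ESC @ initialises an ESC/P printer
    return "unknown"

print(guess_print_language(b"%!PS-Adobe-3.0\n"))   # PostScript
print(guess_print_language(b"\x1bE\x1b&l0O"))      # PCL
print(guess_print_language(b"\x1b@\x1bK"))         # ESC/P
```

With the language identified, the capture can be handed to a suitable converter to turn it into an image.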

This isn’t the only time we’ve touched on the subject of bringing older computerized equipment into the present, we’ve also shown you a disk drive emulator.

Thanks [JohnU] for the tip.

The PCB Router You Wish You Had Made

By: Jenny List
15 June 2025 at 05:00

The advent of cheap and accessible one-off PCB production has been one of the pivotal moments for electronic experimenters during the last couple of decades. Perhaps a few still etch their own boards, but many hobbyists were happy to put away their ferric chloride. There’s another way to make PCBs, though, which is to mill them. [Tom Nixon] has made a small CNC mill for that purpose, and it’s rather beautiful.

In operation it’s a conventional XYZ mechanism, with a belt drive for the X and Y and a lead screw for the Z axis. The frame is made from aluminium extrusion, and the incidental parts such as the belt tensioners are 3D printed. The write-up is very comprehensive, and takes the reader through all the stages of construction. The brains of the outfit is a Creality 3D printer controller, but he acknowledges that it’s not the best for the job.

It’s certainly not the first PCB router we’ve seen, but it may be one of the nicer ones. If you make a PCB this way, you might like to give it professional-looking solder mask with a laser.

The Billionth Repository On GitHub is Really Shitty

By: Jenny List
12 June 2025 at 20:00

What’s the GitHub repository you have created that you think is of most note? Which one do you think of as your magnum opus, the one that you will be remembered by? Was it the CAD files and schematics of a device for ending world hunger, or perhaps it was software designed to end poverty? Spare a thought for [AasishPokhrel] then, for his latest repository is one that he’ll be remembered by for all the wrong reasons. The poor guy created a repository with a scatological name, no doubt to store random things, but had the misfortune to inadvertently create the billionth repository on GitHub.

At the time of writing, the 💩 repository sadly contains no commits. But he seems to have won an unexpectedly valuable piece of Internet real estate judging by the attention it’s received, and if we were him we’d be scrambling to fill it with whatever wisdom we wanted the world to see. A peek at his other repos suggests he’s busy learning JavaScript, and we wish him luck in that endeavor.

We think everyone will at some time or another have let loose some code into the wild perhaps with a comment they later regret, or a silly name that later comes back to haunt them. We know we have. So enjoy a giggle at his expense, but don’t give him a hard time. After all, this much entertainment should be rewarded.
