Porting COBOL Code and the Trouble With Ditching Domain Specific Languages

Whenever the topic is raised in popular media about porting a codebase written in an ‘antiquated’ programming language like Fortran or COBOL, very few people tend to object to this notion. After all, what could be better than ditching decades of crusty old code in a language that only your grandparents can remember as being relevant? Surely a clean and fresh rewrite in a modern language like Java, Rust, Python, Zig, or NodeJS will fix all ailments and make future maintenance a snap?
For anyone who has ever had to actually port large codebases or dealt with ‘legacy’ systems, their reflexive response to such announcements most likely ranges from a shaking of one’s head to mad cackling as traumatic memories come flooding back. The old idiom of “if it ain’t broke, don’t fix it”, purportedly coined in 1977 by Bert Lance, is a feeling that has been shared by countless individuals over millennia. Even worse, how can you ‘fix’ something if you do not even fully understand the problem?
In the case of languages like COBOL this is doubly true, as it is a domain specific language (DSL). This is a very different category from general purpose system programming languages like the aforementioned ‘replacements’. The suggestion of porting the DSL codebase is thus to effectively reimplement all of COBOL’s functionality, which should seem like a very poorly thought out idea to any rational mind.
Sticking To A Domain
The term ‘domain specific language’ is pretty much what it says it is, and there are many of such DSLs around, ranging from PostScript and SQL to the shader language GLSL. Although it is definitely possible to push DSLs into doing things which they were never designed for, the primary point of a DSL is to explicitly limit its functionality to that one specific domain. GLSL, for example, is based on C and could be considered to be a very restricted version of that language, which raises the question of why one should not just write shaders in C?
Similarly, Fortran (Formula translating system) was designed as a DSL targeting scientific and high-performance computation. First used in 1957, it still ranks in the top 10 of the TIOBE index, and just about any code that has to do with high-performance computation (HPC) in science and engineering will be written in Fortran or strongly relies on libraries written in Fortran. The reason for this is simple: from the beginning Fortran was designed to make such computations as easy as possible, with subsequent updates to the language standard adding updates where needed.
Fortran’s latest standard update was published in November 2023, joining the COBOL 2023 standard as two DSLs which are both still very much alive and very current today.
The strength of a DSL is often underestimated, as the whole point of a DSL is that you can teach this simpler, focused language to someone who can then become fluent in it, without requiring them to become fluent in a generic programming language and all the libraries and other luggage that entails. For those of us who already speak C, C++, or Java, it may seem appealing to write everything in that language, but not to those who have no interest in learning a whole generic language.
There are effectively two major reasons why a DSL is the better choice for said domain:
- Easy to learn and teach, because it’s a much smaller language
- Far fewer edge cases and simpler tooling
In the case of COBOL and Fortran this means only a fraction of the keywords (‘verbs’ for COBOL) to learn, and a language that’s streamlined for a specific task, whether it’s to allow a physicist to do some fluid-dynamic modelling, or a staff member at a bank or the social security offices to write a data processing application that churns through database data in order to create a nicely formatted report. Surely one could force both of these people to learn C++, Java, Rust or NodeJS, but this may backfire in many ways, the resulting code quality being one of them.
Tangentially, this is also one of the amazing things in the hardware design language (HDL) domain, where rather than using (System)Verilog or VHDL, there’s an amazing growth of alternative HDLs, many of them implemented in generic scripting and programming languages. That this prohibits any kind of skill and code sharing, and repeatedly, and often poorly, reinvents the wheel seems to be of little concern to many.
Non-Broken Code
A very nice aspect of these existing COBOL codebases is that they generally have been around for decades, during which time they have been carefully pruned, trimmed and debugged, requiring only minimal maintenance and updates while they happily keep purring along on mainframes as they process banking and government data.
One argument that has been made in favor of porting from COBOL to a generic programming language is ‘ease of maintenance’, pointing out that COBOL is supposedly very hard to read and write and thus maintaining it would be far too cumbersome.
Since it’s easy to philosophize about such matters from a position of ignorance and/or conviction, I recently decided to take up some COBOL programming from the position of both a COBOL newbie as well as an experienced C++ (and other language) developer. Cue the ‘Hello Business’ playground project.
For the tooling I used the GnuCOBOL transpiler, which converts the COBOL code to C before compiling it to a binary, but in a few weeks the GCC 15.1 release will bring a brand new COBOL frontend (gcobol) that I’m dying to try out. As language reference I used a combination of the Wikipedia entry for COBOL, the IBM ILE COBOL language reference (PDF) and the IBM COBOL Report Writer Programmer’s Manual (PDF).
My goal for this ‘Hello Business’ project was to create something that did actual practical work. I took the FileHandling.cob
example from the COBOL tutorial by Armin Afazeli as starting point, which I modified and extended to read in records from a file, employees.dat
, before using the standard Report Writer feature to create a report file in which the employees with their salaries are listed, with page numbering and totaling the total salary value in a report footing entry.
My impression was that although it takes a moment to learn the various divisions that the variables, files, I/O, and procedures are put into, it’s all extremely orderly and predictable. The compiler also will helpfully tell you if you did anything out of order or forgot something. While data level numbering to indicate data associations is somewhat quaint, after a while I didn’t mind at all, especially since this provides a whole range of meta information that other languages do not have.
The lack of semi-colons everywhere is nice, with only a single period indicating the end of a scope, even if it concerns an entire loop (perform
). I used the modern free style form of COBOL, which removes the need to use specific columns for parts of the code, which no doubt made things a lot easier. In total it only took me a few hours to create a semi-useful COBOL application.
Would I opt to write a more extensive business application in C++ if I got put on a tight deadline? I don’t think so. If I had to do COBOL-like things in C++, I would be hunting for various libraries, get stuck up to my gills in complex configurations and be scrambling to find replacements for things like Report Writer, or be forced to write my own. Meanwhile in COBOL everything is there already, because it’s what that DSL is designed for. Replacing C++ with Java or the like wouldn’t help either, as you end up doing so much boilerplate work and dependencies wrangling.
A Modern DSL
Perhaps the funniest thing about COBOL is that since version 2002 it got a whole range of features that push it closer to generic languages like Java. Features that include object-oriented programming, bit and boolean types, heap-based memory allocation, method overloading and asynchronous messaging. Meanwhile the simple English, case-insensitive, syntax – with allowance for various spellings and acronyms – means that you can rapidly type code without adding symbol soup, and reading it is obvious even as a beginner, as the code literally does what it says it does.
True, the syntax and naming feels a bit quaint at first, but that is easily explained by the fact that when COBOL appeared on the scene, ALGOL was still highly relevant and the C programming language wasn’t even a glimmer in Dennis Ritchie’s eyes yet. If anything, COBOL has proven itself – much like Fortran and others – to be a time-tested DSL that is truly a testament to Grace Hopper and everyone else involved in its creation.