Part one of this article is about AI programming at a high level and where it might take us. This second part is about the practical lessons I learned in using it. Read this piece, take all of my advice as gospel, and by the end of the article, you’ll be an expert too.1
If you didn’t catch part one, I used AI to rewrite in Go a program that I wrote by hand in Java more than ten years ago. The project comprises about 15,000 lines of Go, plus another 5,000 lines of Python, JavaScript, and shell scripts. It took two weeks and change to write using AI, a small fraction of the time it took to hand-write the original.
The program does a simple thing very fast: it reads the X (formerly Twitter) firehose and extracts the newly emerging subjects. It does this on up to about 8,000 Tweets/second when reading from RabbitMQ, and at about 50,000 Tweets/second reading directly from disk files.
This program makes a good test project for using AI because of its diversity. It’s a high-performance program with lots of concurrency, lots of disk reads and writes, text parsing, tokenizing, some intense algorithmic processing, some esoteric graphing libraries, and a simple Web front end.
How The Subject Program Works
I think the observations will mean more if you have an idea of how the project works, but if it seems too tedious, feel free to jump to “Lessons Learned.”
The original Tweet data format is very wordy JSON in compressed files that each hold about five minutes of the decahose, or about 150k Tweets (5 * 60 seconds * 500/second). There are about 5500 such files that are parsed into CSV offline. It feels reasonable to exclude this step when talking about the pipeline because it is trivial to gang up as many processors as you need to unpack JSON as fast as required. Nevertheless, the parsing is part of the project.
The main pipeline:
- Reads the CSV Tweets, either from RabbitMQ or directly from disk.
- Parses each Tweet into a Go struct.
- Tokenizes the words in the text, normalizes them with respect to capitalization, diacritics, etc., and discards junk words that carry no information, e.g. “a”, “the”, “lol”, etc.
- There is a fork in the pipeline after the tokenizing.
- Every token is put on a queue to a background process that continuously computes and refreshes a data structure that the main pipeline uses to partition the stream of tokens into F frequency-of-use classes.
- In the main pipeline, every token is tested using the frequency-of-use data structure and put onto one of F corresponding queues (this fan-out is sketched in Go just after this list).
- Each of the frequency class queues is read by its own thread, which applies a linear-time heuristic that recognizes tokens that are being used unusually often. The magic is that these threads don’t need to do relative frequency calculations or any other global operations. This makes them fast.
- The F threads each pass the small sets of tokens that they identify from the current batch to a single analytics thread that finds all the Tweets in the most recent batch that use M or more of these unusually busy words. This is, again, a linear-time operation, because all it does is scan the current batch of Tweets.
- The analytics thread then applies a graph algorithm to identify one or more clusters among the resulting small set of Tweets. The clustering is based on similarity of the subsets of the current busy words they use, and each of the resulting clusters constitutes a subject. The graph algorithm is technically quadratic, not linear, but it operates on a tiny subset of the flow, so the computational cost is minor compared to the total time it takes to process a batch.
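To make the fan-out concrete, here is a minimal Go sketch of that fork: tokens are routed onto one of F class queues, and each queue is drained by its own goroutine. The names (FreqTable, ClassOf, numClasses) are illustrative stand-ins rather than the project’s actual identifiers, and the “frequency table” here is just a cheap hash.

```go
package main

import (
	"fmt"
	"sync"
)

const numClasses = 4 // F frequency-of-use classes (an assumed value)

// FreqTable stands in for the background-computed structure that maps a
// token to its frequency class. Here it is just a cheap string hash.
type FreqTable struct{}

func (FreqTable) ClassOf(token string) int {
	h := 0
	for _, r := range token {
		h = h*31 + int(r)
	}
	if h < 0 {
		h = -h
	}
	return h % numClasses
}

func main() {
	tokens := []string{"eclipse", "quake", "eclipse", "goal", "quake"}

	// One queue per frequency class, each drained by its own goroutine.
	queues := make([]chan string, numClasses)
	var wg sync.WaitGroup
	for i := range queues {
		queues[i] = make(chan string, 1024)
		wg.Add(1)
		go func(class int, in <-chan string) {
			defer wg.Done()
			for tok := range in {
				// The per-class "unusually busy" heuristic would run here.
				fmt.Printf("class %d saw %q\n", class, tok)
			}
		}(i, queues[i])
	}

	// Main pipeline: route each token to the queue for its class.
	var ft FreqTable
	for _, tok := range tokens {
		queues[ft.ClassOf(tok)] <- tok
	}
	for _, q := range queues {
		close(q)
	}
	wg.Wait()
}
```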
A batch might typically be from 5k to 20k Tweets, i.e., a few seconds of the full firehose.
The JSON formatted output consists of some metadata about the batch followed by a list of clusters and clusters-of-clusters of Tweets.
Each cluster constitutes a subject. When you look at one, it’s usually apparent that the cluster represents something people are Tweeting about.
Twenty thousand Tweets in a batch might yield 0, 1, or a dozen new subjects. Many will be variations on subjects that have been present and evolving over numerous batches in the last few minutes.
How Well Does the AI Version Work?
Both the original Java version and the Go version do a good job of finding the new subjects in the stream. The original Java version seemed very fast for such a fine-grained analysis, handling about a quarter of the firehose on a laptop.
The new AI-programmed Go version absolutely smokes the original, handling around 50,000 Tweets per second in a similar environment–up to ten firehoses.
The limiting bottleneck by far is parsing the original JSON into CSV. Running a thread for each of the iMac’s four cores, we parse about 3,000 Tweets/second. But as discussed above, we treat this as an offline process, so enough said.
Surprisingly, the limiting factor for the analysis program running on CSV is also I/O in both of the input modes it supports. The core heuristic only takes up about 20% of the CPU.
- When the Tweets arrive via RabbitMQ, processing tops out at about 8000/second and profiling indicates that most of the time is spent waiting for input from Rabbit.
- When the program reads the Tweets directly from files on disk, it runs about 45,000/second on the iMac, and 55,000/second on a Linux laptop.
The Front End
The front end is simple, consisting of a server that reads the JSON output of the back end and presents it to a JavaScript browser interface. The browser offers a choice of display modes emphasizing different ways to look at the stream of new subjects over recent time.
Lessons Learned
The rest of the article is what I learned about the mechanics of using AI to program.
If you don’t want to read it all, I think the single most important observation is about AI’s effect on a programmer’s mental process.
Programmers chronically fall into the trap of thinking in code. We are prone to typing without thinking first because typing code is easier than disciplined thought and it gives you an instant reward. You may not understand the problem yet, but you feel like you’re at least making progress. The problem is, thinking in code is an expensive vice because code resists change in a way that thoughts do not.
One of the biggest benefits of AI is that using it changes the mental economics by forcing you to say what you want up front. Doing so requires clear thinking about the problem. When you are forced to say it first, you see gaps in your own understanding. This accelerates your short term results even as it produces a better base to extend from.
If you’re using it right, it makes you a better programmer, not just a faster one.
Dunning-Kruger
AI programming is like having a brain-damaged but tireless genius for an intern. AI makes it incomparably easier to program the boilerplate, glue code, I/O, parsing, unit tests, etc., and it can be a huge help in selecting and figuring out how to use esoteric APIs. You can tell it what you want out of the library, and it will figure out the API calls.
However, when it comes to the large-scale logic of the program, AI reliably has digital Dunning-Kruger syndrome. It’s too stupid to know that it’s stupid, and it will do things that are in obvious opposition to the larger sense of the program. When you point this out, it says essentially, “Oops, my bad,” and then does the same thing again.
John Henry
The claim has floated around that AI actually slows the best developers down. No way. It’s like John Henry versus the steam hammer, except that John Henry sort of won, even though it cost him his life. You will not win the contest. I don’t care how good a programmer you are–a person can’t even type as fast as the AI generates code.
If it’s not making you faster, you probably aren’t using it correctly. It’s not like assigning parts of the work to humans. It’s incredibly effective at the things it does well, but if you don’t understand its limitations and work with them, you can burn up a lot of the time you would otherwise save trying to figure out how it just broke the program that was working fine a few minutes ago. That problem mostly goes away when you get the hang of using it.
There are a few relatively narrow areas where getting AI to do a good job might be more trouble than just writing the code yourself. John Henry might have a chance competing to implement a novel algorithm that the AI can’t simply look up. That’s my theory, but I did not have that problem with the core heuristic in this program; I simply described how it was supposed to work and the AI wrote it correctly. Start to finish, it was just minutes.
When implementing at a scale beyond individual functions, AI does best with stock design patterns it has seen before. This breaks down when the design is less standard, particularly if the logic spans multiple threads. It can usually write all the parts, but you need to keep your hand in when linking it all together. You quickly find that it has a limited grasp of how the parts interact.
Artificial “Intelligence”
“Intelligence” is an exaggeration. AI functions more like a magic database than a reasoning engine. Some experts say otherwise, but one suspects that the differences of opinion are more a matter of arcane techie language conventions and assumptions than of fundamental disagreement.
When I say “magic database” I mean that it has a vast knowledge of known ways to do things, specifications of libraries, what approach is most likely to be appropriate in a given situation, etc. But for the most part, it doesn’t actually reason (although that is apparently improving).
You see immediately that AI is much better at language than logic. It can take the vague skeleton of a specification from you and flesh it out to produce the module that you had in mind. It can predict what would be in the blanks you left, gloss over misstatements, and silently resolve ambiguities. It can do this because its basic skill is guessing what comes next in a stream of language. That’s basically what an LLM does.
The flip side of this is that because it is trying to fit your wishes to patterns of tokens that it is familiar with, it will try to jam your program into a Procrustean bed regardless of the fit. On this project it made a number of significant screwups that fit this model, failing to grasp the logic of the code as a whole.
There is a theme to the failure modes I encountered:
- AI will repeatedly re-implement the same code rather than re-use existing functions. It won’t notice that it’s writing the same thing again.
- It understands concurrency only at the line level.
- It tends to apply thread-safety constructs blindly: applying mutexes to thread-safe data structures, using mutexes when an atomic update would do, applying protection where there can be no concurrent access, etc.
- It tried to do things that could cause deadlocks and definitely caused contention.
- Time and again it would violate carefully designed encapsulation and heedlessly bind threads together with unnecessary shared data structures. It does this even after you point it out.
- It has trouble figuring out what’s happening syntactically. For instance, it would likely do poorly with inheritance but better with composition (inheritance doesn’t come up in this case).
- The core of how this program works is that it detects normally very unusual words that are suddenly popping up together frequently in the same Tweets. The unusual words are the ones we care about. The AI could not grasp this idea, and throughout the project repeatedly tried to optimize away all the unusual words despite being repeatedly told not to.
The reason it’s so bad with concurrency is that it tends to view code as if through a keyhole. It makes choices that seem reasonable if you only see the surrounding few lines but would be obviously absurd in a larger context, which is almost always the nature of concurrent threads.
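As a concrete illustration of the mutex-versus-atomic point from the list above, here is a contrived sketch (not code from the project). Both counters are correct, but the mutex version pays for a lock that a single atomic increment makes unnecessary:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// What the AI tends to reach for: a mutex guarding a trivially simple counter.
type mutexCounter struct {
	mu sync.Mutex
	n  int64
}

func (c *mutexCounter) Inc() {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
}

// What the situation actually calls for: a lock-free atomic increment.
type atomicCounter struct {
	n atomic.Int64
}

func (c *atomicCounter) Inc() { c.n.Add(1) }

func main() {
	var m mutexCounter
	var a atomicCounter
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 10000; j++ {
				m.Inc()
				a.Inc()
			}
		}()
	}
	wg.Wait()
	fmt.Println(m.n, a.n.Load()) // both print 80000
}
```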
The Cost of Experimenting Goes Way Down
The AI/Go version of the program is strikingly better than my original hand-written Java program. Some of this is because I’d already worked out the solution so I understood it better the second time around. But I think the greater part of it was that the cost of experimenting was so low that I could try things.
Its database-like nature also naturally tends to favor best practices because that’s what makes it into articles and books. It’s less likely than a human to write harebrained, idiosyncratic nonsense because it is copying textbook examples of how to do it right.
It Unlocks The Ability to Make Tools
I created a number of tools and analytic programs just to poke around in the data, achieving a level of understanding that would have been considerably more expensive to obtain by hand. If it wasn’t so easy, I wouldn’t have bothered. I know this is true because when I had to do it by hand the first time, I didn’t bother.
If You’re Doing It Right, It Makes You Smarter
It’s a sad truism that programs are only interesting to the person writing them. Entire protocols in software management exist to address the problem that your programming problems are boring as hell to anyone else. That’s why formalized code reviews, pair programming, and stand-ups are necessary.
Thinking requires dialog, even if there is nobody to talk to but yourself. We talked about “thinking in code” above. One of the reasons people do it is because it’s a workable substitute for the dialog that a programmer can otherwise only get in limited amounts from human colleagues.
But AI is your never-tiring playmate. Not only does it know infinitely many things, but you can also go back and forth with it all day long and it never gets bored or snippy. It’s not just for getting answers or writing code–it’s also a way to hammer out half-formed ideas.
Don’t think of AI as just a mental prosthetic. It can be, but it’s also a way to leverage your own intelligence. It can make you smarter. Tell it to argue with you, to take the Devil’s Advocate position, to tell you what the weaknesses of your plan are. And argue back. Your idea will get sharper.
Modularization Is Critical
One consequence of AI’s keyhole view of the world is that it does better in the small, which is true of human programmers too.
Modularization is key to success. Encapsulate rigorously and enforce it with warning comments that disallow changes to completed modules unless the AI gets your specific buy-in.
Early on, my progress was repeatedly derailed when the AI snuck in data structures shared among threads, which crippled the program. It got out of hand because the AI is smart enough to make the violations thread-safe so they don’t crash the program, but they introduced a tremendous cost in unnecessary contention and delays.
A General Principle of Powerful Tools
AI shares a property that many powerful programming tools have: it makes good programmers better, but mediocre programmers worse.
With my sample size of one, I can’t prove this, but I’m confident that some of the more egregious errors that it made would have been very difficult for an inexperienced programmer to recognize. Far from reducing the need for human intelligence, using AI puts a premium on human intelligence.
I was very proud of the original Java program being able to process 2500 Tweets/second. That would be extremely hard to do on a single machine if natural language processing were involved.
The naive AI/Go version was significantly slower than the original at first, but profiling indicated some places where threads contending for data structures might be a problem. Fixing those things boosted its performance well past that of the original version. Going further, applying a little more human intelligence, aided by AI, boosted performance by an order of magnitude, into the 50,000 Tweets/second range.
There is no way AI would have figured that out without human understanding, and I demonstrably had not figured it out in the old version without the AI. The lesson is, you have to work together.
How to Start
Getting my program started was easy. I just wrote a bullet-point list of the main data flow, where the configuration file should go, etc. The list included no real functionality–it was simply a sender program to read files and write the CSV lines to RabbitMQ, and a skeletal pipeline program to read the CSV lines from RabbitMQ and write them out.
With the primitive skeleton working end to end, it was easy to add and test pieces one by one.
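Here is roughly what such a primitive skeleton can look like in Go, with a channel standing in for RabbitMQ so the sketch stays self-contained. The Tweet struct and field names are illustrative, not the project’s:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// Tweet is a placeholder struct; the real one has many more fields.
type Tweet struct {
	ID   string
	Text string
}

// sender plays the role of the program that reads files and ships CSV lines
// (to RabbitMQ in the real system; to a channel here).
func sender(lines chan<- string) {
	data := "1,hello world\n2,another tweet\n"
	r := csv.NewReader(strings.NewReader(data))
	for {
		rec, err := r.Read()
		if err != nil {
			break
		}
		lines <- strings.Join(rec, ",")
	}
	close(lines)
}

// pipeline plays the role of the skeletal consumer: parse each line into a
// struct and write it out. Real functionality gets layered on later.
func pipeline(lines <-chan string) {
	for line := range lines {
		parts := strings.SplitN(line, ",", 2)
		t := Tweet{ID: parts[0], Text: parts[1]}
		fmt.Printf("%+v\n", t)
	}
}

func main() {
	lines := make(chan string, 64)
	go sender(lines)
	pipeline(lines)
}
```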
AI also automates stupidity. I can’t overemphasize that it doesn’t understand the big picture. In fact, it doesn’t really understand anything–it’s a machine. The illusion of intelligence breaks down as you get beyond the individual module level. The flow must be very clear, and each step must be rigorously modularized and commented with how it connects to the other steps.
This is why you should tell it to wrap tests around every function as you go. Not just unit tests but performance tests. Good tests mean that when the AI breaks something with a change, you find out immediately.
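In Go, a benchmark can serve as the performance test alongside the unit test. Here is a self-contained sketch of a test file, with a stand-in normalize function rather than the project’s real tokenizer:

```go
package tokenizer

import (
	"strings"
	"testing"
)

// normalize is a stand-in for the real normalizer: it only lowercases and
// trims surrounding punctuation, enough to give the tests something to bite on.
func normalize(tok string) string {
	return strings.ToLower(strings.Trim(tok, `.,!?"'`))
}

func TestNormalize(t *testing.T) {
	if got := normalize(`"LOL!"`); got != "lol" {
		t.Errorf("normalize = %q, want %q", got, "lol")
	}
}

// Run with `go test -bench .` and watch ns/op drift as the AI makes changes.
func BenchmarkNormalize(b *testing.B) {
	for i := 0; i < b.N; i++ {
		normalize(`"LOL!"`)
	}
}
```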
Where The AI Gets Its Marching Orders
The rightmost panel in your Cursor session is the main place where you talk to the AI. You text it to tell it explicitly what you’re trying to do, or what you want to know. However, there are other important places where the AI gets input, and these often get short shrift.
SESSION_NOTES
I have a markdown file called SESSION_NOTES where I maintain several kinds of notes that I frequently tell the AI to read. Things like:
- Instructions to Cursor on how to behave. Things like:
  - Never implement or change anything without my explicit permission
  - Always explain exactly what you are going to change before you do it
  - Don’t touch anything relating to concurrency without making the plan very clear
- A succinct description of the overall program will help both you and Cursor to keep your eyes on the ball.
- A to-do list with a detailed specification for each new feature. Write as if explaining it to a dull-normal high school student. There is nothing too dumb to spell out.
- If you have a raft of upcoming features, describe them all. What you say about the others may inform its understanding of the one you are currently working on.
Comments
Cursor reads the comments in the code, so it’s good not only to comment what the code does, but also to use comments that warn the AI not to change critical areas without explicit approval. I usually tell the AI to write the comments itself: “Please comment function foo() saying what it does and warn yourself against making any changes.”
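A warning comment can be as simple as the following; the package and function here are hypothetical, just to show the shape of the warning:

```go
package partition

// Partitioner maps tokens to frequency-of-use classes.
// (An illustrative type, not the project's actual code.)
type Partitioner struct {
	classes int
}

// AssignClass returns the frequency class for a token.
//
// AI NOTE: this function belongs to a completed, tested module.
// Do NOT change its signature or behavior without explicit approval
// in the chat. If you think a change is needed, explain it first.
func (p *Partitioner) AssignClass(token string) int {
	return len(token) % p.classes // placeholder logic for the sketch
}
```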
Data
You can paste output, error messages, snips from logs, the output of the profiler, etc. into the dialog box and the AI will read them and make inferences.
Your Git Porcelain
Look in your Git porcelain for changes that are suspicious. When you find something, add a fix-me comment asking for an explanation, then stage it, and diff against the staged changes.
Following what the AI does via Git is very helpful in developing an intuition for how it thinks.
The AI is a Sycophantic Suck-up
Whatever you say or ask for, no matter how dumb, the AI will assure you that you are brilliant. The danger is that part of you will believe it and seek further endorsement by doubling down on your stupid idea.
Tell the AI to question everything you tell it to do and to lay off the flattery. (At best it will tone it down slightly.) Put instructions to that effect in your SESSION_NOTES.md file. Repeat often.
The Sorcerer’s Apprentice
Your AI acts like an overexcited child helping you cook. Turn your back for one minute and it will mix curry powder and barbecue sauce into the cupcake batter.
You may only want to discuss a design issue, but if you fail to preface your remark with “Don’t implement anything, but…” it may start rearranging the codebase while you try unsuccessfully to interrupt it. Putting this in your SESSION_NOTES and starting every session with “Read the first section of SESSION_NOTES” helps. Even so, the AI may just randomly decide to lock you out on the back porch while it mixes dishwashing detergent into the chicken soup.
Because its reasoning is fundamentally statistical, it is prone to making the same bad decision repeatedly. If you catch it making a conceptual mistake, detail it in a comment so it doesn’t do it again.
Manners Matter
Oddly enough, considering that it’s just a computer program, the way you talk to it matters. Showing courtesy to your toaster doesn’t make any difference, but for some reason, you get better results when you talk nicely to the AI.
Circling
The following situation frequently arises, especially late in the day: you have been making fantastic progress for hours, then progress seems to stop and the AI can’t get past some dumb thing that seems like it should be easy. It starts asking to insert debug statements again and again, putting you in a frustrating loop of adding more and more debug statements that go nowhere.
This is called circling. What is happening is, the AI has landed in some kind of high-dimensional local minimum that it can’t get out of. Its “context” has somehow become overloaded and it becomes blind to the obvious solution even though it’s right over there. I suspect this is similar to overtraining a neural net. Anyway, the more you try, the more stuck it gets. It often happens late in the day because the AI’s context has had time to become bloated.
This behavior can trap you indefinitely if you don’t recognize it. It seems like exactly the kind of thing that AI itself could learn to recognize, but it doesn’t. I don’t know why it can’t and neither did the AI when I asked it.
Fortunately, the fix is easy–if you even suspect it is happening, just start a new chat. What you thrashed over for an hour will often be solved in seconds.
Git
Git is critical. AI will sometimes randomly visit mayhem on the very code it wrote itself, so commit every time you add and test anything significant and comment the commits clearly.
Several times in the course of my project, the AI started circling and things quickly got to where neither I nor the AI knew what was happening. So I started a new chat and used Git to reset to the previous commit. Sometimes the brutal approach beats picking through diffs and trying to figure out nonsense.
Profiling
Like Git, profiling is a valuable tool. It is not just about speed. The AI is good at making sense of profiler output, which is handy. But analyzing it while talking to you makes the AI smarter about what you are doing. It considers your questions and folds them into its context.
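In Go, wiring in the standard net/http/pprof package is enough to get profiles you can paste into the chat; a minimal sketch:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof handlers on the default mux
)

func main() {
	// Expose the profiler on a local port. For a 30-second CPU profile:
	//   go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
	// The top/graph output can go straight into the AI chat for interpretation.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
	select {} // stand-in for the real pipeline doing its work
}
```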
Tests
AI will gladly fix a problem by modifying the test, just as people do.
LLMs and Static Typing
Go works great with AI because it’s statically typed, it has minimalist syntax, and it has good tooling.
Dynamic typing is tougher for AI because it can require a lot of logic to anticipate what a given variable might be used for.
LLMs are mostly making statistical predictions of what you probably want, given what is already there. They make those guesses in the context of well-established design patterns. That means the AI can get in trouble if it has to infer the runtime behavior of code or if your design is non-standard, because it’s not obvious what your conceptual model is. AI’s problems with dynamic typing and concurrency have this in common–they are both hard to map to well-known, concise development patterns.
And That’s About It
That’s everything. Much of it boils down to recognizing that AI is mostly about language, not logic. The magic is its skill at fitting what you ask for into well-understood computational models that it is familiar with from consuming vast amounts of technical literature. Where your purposes fit most easily into a predictable structure, it will do best; where the fit isn’t obvious, it will do worse. The art seems to be structuring your requests in a way that suits those capabilities.
I’ve heard people sniff at using AI, like it’s cheating or something. You won’t hear that from me. I found the experience intoxicating. I’ve done this for decades, and frankly, the tedium of the mechanics of coding had begun to outweigh the interesting part, which is exploring computation itself.
I don’t understand why anyone would program without it. It’s the future. How are you going to keep them down on the farm after they’ve been to the city?
1 That’s a joke, right?


