I have moved to Gemini, mostly because its limit on the number of OCRs in a given time is higher, so I haven’t been kicked out for overuse as I was on ChatGPT. Gemini generally seems to produce better results, but I can’t say whether that’s due to the system or to my improving scanning technique.
What’s interesting is that both engines get the same things wrong and produce the same incorrect output. It really looks like they are using the same underlying logic and language models. In any event, here are the issues that both get wrong and that you should look for:
-
By far the biggest problem is that both systems will mistake the $ for an S, but only in variable names. So if you have a program that uses string variables - which is most of them - you need to look carefully through the program for every instance of a string variable and change, for example, “AS” back to “A$”. They do not seem to get confused when the $ appears in a keyword, so “SEG$” and “LEFT$” come out fine; it only seems to happen when the name doesn’t match what appears to be an internal keyword list.
-
The second biggest problem is that they renumber your lines. It is not clear to me what triggers this. I thought Gemini didn’t do this, but my very next scan was full of renumbered lines. Note that it only changes the numbers at the start of the line, not those that appear within the statements, so it will change line “1010” to “990”, but it won’t change “GOTO 1010” to “GOTO 990”. For this reason, I suggest having it OCR only small blocks of code, 30 or 40 lines at a time, so that when this does happen you don’t have too much work to do, and the next block will start at the right number anyway.
-
0s are mistaken for 8s, but not in the line numbers at the start of the line, so you only see this further to the right in the code. Check GOTOs and similar statements for things like “2408” where it should be “2400”.
-
Both often get confused by logical comparisons. “A<1” will randomly change into “A>=1” and so forth. There appears to be no reason for this; the quality of the text is not the issue.
-
Both remove semicolons from PRINT statements. Not all of them, but most of them. This is OK in many dialects, but I’m trying to preserve the original code as closely as possible, so this is one I look for.
-
Whitespace will be removed at random in various places, including string constants, REM statements, and so forth. If you are trying to preserve whitespace, you’ll have a bunch of editing to do here.
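
Some of the issues above could be flagged mechanically before proofreading by hand. What follows is only a rough sketch in Python, not a tool described in this post: the function `check_listing`, the small keyword set, and the sample listing are all hypothetical and would need adapting to your dialect. It looks for three things: line numbers that go backwards (a hint of renumbering), GOTO/GOSUB/THEN targets that don’t exist as line numbers (which could be renumbering or a 0 read as an 8), and names ending in S that might be string variables that lost their $. It makes no attempt to catch flipped comparisons, dropped semicolons, or missing whitespace; those still need a visual check against the scan.

```python
import re

# Assumption: keywords ending in "S" that are NOT damaged string variables.
# This set is illustrative only; extend it for the dialect being scanned.
KEYWORDS_ENDING_IN_S = {"ABS", "COS", "CLS", "POS"}

LINE_RE = re.compile(r"^\s*(\d+)\s*(.*)$")
# One or more comma-separated targets, to cover "ON X GOTO 100,200".
JUMP_RE = re.compile(r"\b(?:GOTO|GOSUB|THEN)\s*(\d+(?:\s*,\s*\d+)*)")
# A name ending in S that is not already followed by "$".
SUSPECT_RE = re.compile(r"\b[A-Z][A-Z0-9]*S\b(?!\$)")


def code_part(statement):
    """Blank out string constants and drop anything after REM."""
    no_strings = re.sub(r'"[^"]*"?', '""', statement)
    return re.split(r"\bREM\b", no_strings, maxsplit=1)[0]


def check_listing(text):
    """Return a list of (line_number, message) warnings for an OCR'd listing."""
    parsed = []
    for raw in text.splitlines():
        m = LINE_RE.match(raw)
        if m:
            parsed.append((int(m.group(1)), code_part(m.group(2).upper())))

    known = {number for number, _ in parsed}
    warnings = []
    previous = None
    for number, code in parsed:
        # Line numbers should strictly increase down the listing.
        if previous is not None and number <= previous:
            warnings.append((number, f"line number not above {previous}"))
        previous = number
        # Every jump target should exist as a line number.
        for targets in JUMP_RE.findall(code):
            for target in re.findall(r"\d+", targets):
                if int(target) not in known:
                    warnings.append((number, f"jump to missing line {target}"))
        # Names ending in S may be string variables that lost their "$".
        for name in SUSPECT_RE.findall(code):
            if name not in KEYWORDS_ENDING_IN_S:
                warnings.append((number, f"{name} may be {name[:-1]}$"))
    return warnings


# Hypothetical OCR output containing the kinds of errors described above.
sample = """\
10 AS = "HELLO"
20 PRINT LEFT$(AS, 2)
30 IF X < 1 THEN 2408
15 GOTO 10
2400 END
"""

for number, message in check_listing(sample):
    print(number, message)
```

Because jump targets are only checked against the lines in the text you pass in, this works best on the reassembled program rather than on a single 30 or 40 line block, where jumps out of the block would all be reported as missing. Expect false positives from the S check: a numeric variable really can be named “S” or “AS”, so treat each warning as a place to look rather than something to fix automatically.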