Gemini and Djinnis

2025-02-16 Sun
ai nlp

I hit a weird problem with Google's Gemini 2.0 this morning: I asked it to transcribe some images of handwritten notes into a CSV file, and it swapped values between some of the lines. As someone who does a lot of data engineering, I find this especially dangerous because the output was convincing enough that I almost didn't catch the errors in the actual numbers. This makes me wonder how much bad data is going to flow into important records in the near future.

Transcribing Power Bills

This morning my wife and I were talking about how our power bills have gone up a bit in the last few years. When I asked if we had this in a spreadsheet somewhere, she said no, but pointed me at five pages of handwritten notes she had made over the years that concisely summarized all the data I'd need to make some plots. It occurred to me that this was exactly the kind of thing that AI should be able to do, so I pulled up Gemini, typed the prompt below, and proceeded to take pictures of the notes.

I would like to convert 5 pictures of notes about my pg&e
gas bills into a csv file with the following columns: date,
dollar amount, electric amount, gas amount, PG&E code, Human
notes, and conversion notes. The human notes should be any
comments that were written in the entry. The conversion
notes should be any problems you came across while translating
a particular entry.

I then sent a picture of each page, along with a note saying which page it was. The response to the first picture was very reassuring: it printed a CSV file in which everything looked like it was in the right place. The next few pictures appended more data to the CSV. Gemini said something went wrong on the last two pictures, so I switched to my Chromebook, started a new conversation, and repeated the process until I got to the end. It even gave me a link for downloading the CSV file so I didn't have to copy and paste it. I moved over to Google Sheets, imported the CSV, and started asking Gemini to remind me how to do a pivot table in Sheets so I could look at the years in a monthly timeline.

Transcription Problem

The plots in Sheets had some issues because of missing cell entries, so I went back and started manually inserting values. This is fine, as I'm happy to use any tool that automates 90% of the brute-force work. However, as I went through the table, I noticed two problems. First, the gas numbers had some bad values because of my wife's handwriting (e.g., "g .55" became 8.55). This Brazil-like mistake is still fine, as the errors would be caught with some range checking. The bigger problem was that Gemini swapped a few of the data values between lines:

[Image: A few mistakes]

This problem is significant because it's hard to detect. It's also very puzzling because the image has a lot of guides to help keep things aligned: the original text is structured, each line sits on a blue ruled line to guide the reader, and there isn't anything obvious to me that would trip it up. In fact, if I send just that one page, it transcribes the data just fine. That makes me wonder if it has something to do with carrying context over from one prompt to the next.
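To make the difference between the two kinds of error concrete, here is a minimal Python sketch of the range checking mentioned above. The column names follow the prompt, but the bounds in VALID_RANGES and the sample rows are made up for illustration and are not the real bill data. It flags a value like 8.55 and an empty cell, but it has no way to notice two in-range values that were swapped between rows.

```python
import csv
import io

# Hypothetical plausible ranges per column (assumptions; tune to the real bills).
VALID_RANGES = {
    "dollar amount": (0.0, 1000.0),
    "electric amount": (0.0, 800.0),
    "gas amount": (0.0, 5.0),
}


def find_out_of_range(csv_text, ranges=VALID_RANGES):
    """Return (row_number, column, raw_value) for each missing,
    non-numeric, or out-of-range cell in the checked columns."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        for column, (low, high) in ranges.items():
            raw = (row.get(column) or "").strip()
            try:
                value = float(raw)
            except ValueError:
                # Empty or non-numeric cell: flag it for manual review.
                problems.append((row_number, column, raw))
                continue
            if not low <= value <= high:
                problems.append((row_number, column, raw))
    return problems


# Made-up sample rows for illustration only.
SAMPLE = """date,dollar amount,electric amount,gas amount
2024-01-15,210.50,150.25,0.55
2024-02-15,198.10,140.00,8.55
2024-03-15,185.00,,0.60
"""

# The same rows with two in-range dollar amounts swapped between lines.
SWAPPED = """date,dollar amount,electric amount,gas amount
2024-01-15,198.10,150.25,0.55
2024-02-15,210.50,140.00,0.60
"""

for problem in find_out_of_range(SAMPLE):
    print(problem)
```

Running the same check on SWAPPED returns an empty list, which is exactly why the swapped values are the more worrying failure.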

Dealing with Djinnis

This kind of problem leads to the endless prompt-engineering cycle that makes AI so difficult to use these days for tasks that require accuracy. Should I just send five separate tasks to transcribe the images and merge the results myself? Should I tell it valid ranges for the fields? Should I take a video and scan all the pages in one shot?
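The first option could be sketched roughly as follows in Python: transcribe each page in its own conversation, then merge the per-page CSV text locally. The function name merge_page_csvs and the extra "source page" and "source line" columns are hypothetical additions for illustration. They make it easier to check a suspicious row against the original photo, but nothing here guarantees that the per-page transcriptions themselves are correct.

```python
import csv
import io


def merge_page_csvs(page_csvs):
    """Merge per-page CSV strings into one list of row dicts,
    tagging each row with the page and line it came from."""
    merged = []
    expected_header = None
    for page_number, text in enumerate(page_csvs, start=1):
        reader = csv.DictReader(io.StringIO(text))
        if expected_header is None:
            expected_header = reader.fieldnames
        elif reader.fieldnames != expected_header:
            # A changed header usually means the model reshuffled columns.
            raise ValueError(
                f"page {page_number} has a different header: {reader.fieldnames}"
            )
        for line_number, row in enumerate(reader, start=1):
            row["source page"] = page_number
            row["source line"] = line_number
            merged.append(row)
    return merged


# Made-up per-page outputs for illustration only.
PAGE_1 = "date,dollar amount\n2024-01-15,210.50\n2024-02-15,198.10\n"
PAGE_2 = "date,dollar amount\n2024-03-15,185.00\n"

rows = merge_page_csvs([PAGE_1, PAGE_2])
for row in rows:
    print(row)
```

Failing loudly on a header mismatch is a deliberate choice: a silently reordered column is the same kind of quiet error as a swapped value.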

Prompt engineering feels like you're dealing with a Djinni that's granted you three wishes. There's always a loophole that wrecks your commands, so you spend more and more time adding clauses to your prompt to try to convince it to do the thing you want. In the end, I'm not sure you can ever be certain that it's not stabbing you in the back in some unexpected way.