Recognizing the growing use of LLMs in image recognition, we decided to investigate their applicability to our task. This led us to compare an LLM approach with traditional machine learning methods.
Our recent project involved creating an image recognition model capable of determining the state of electrical fuses (installed or not installed) in electrical boxes. The image below shows two connected phases, each with an electrical fuse in the “installed” state; the model we trained identifies these as two phases, both “installed.”
We employed the Google Vision service for training, leveraging a large dataset provided by the customer and conducting the entire training process in the cloud.
When we tried using LLMs to recognize the fuses, we found that only a handful of the big-name models can handle images at all. We tested every one we could get our hands on at the time. For each model, we described what it was looking at in the picture, what we wanted it to do, and how we wanted the results laid out, then checked how accurately it answered. We also tried two common prompting techniques: zero-shot and few-shot. With zero-shot, we simply showed the LLM the picture along with the task description and output format. With few-shot, we first showed it a few example pictures together with the correct answers, and only then the new picture with the same task description and output format (see the sketch below).
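To make the two prompting styles concrete, here is a minimal sketch using the OpenAI Python SDK (openai>=1.0). The prompt wording, the JSON output format, and the example image paths are illustrative assumptions, not our exact production prompts:

```python
# Sketch of zero-shot vs. few-shot prompting for the fuse task.
# Assumes OPENAI_API_KEY is set in the environment.
import base64
from openai import OpenAI

client = OpenAI()

TASK = (
    "You are looking at a photo of an electrical box with one or more phases. "
    "For each phase, decide whether its fuse is 'installed' or 'not installed'."
)
OUTPUT_FORMAT = 'Reply as JSON: {"phases": [{"index": 1, "state": "installed"}]}'


def image_part(path: str) -> dict:
    """Encode a local image as a data URL that vision-capable models accept."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}


def zero_shot(image_path: str, model: str = "gpt-4o") -> str:
    """Single message: task description, output format, and the image."""
    messages = [{
        "role": "user",
        "content": [{"type": "text", "text": f"{TASK}\n{OUTPUT_FORMAT}"},
                    image_part(image_path)],
    }]
    resp = client.chat.completions.create(model=model, messages=messages)
    return resp.choices[0].message.content


def few_shot(examples: list[tuple[str, str]], image_path: str, model: str = "gpt-4o") -> str:
    """`examples` is a list of (example_image_path, correct_json_answer) pairs
    that are sent as prior user/assistant turns before the real question."""
    messages = []
    for example_path, answer in examples:
        messages.append({"role": "user",
                         "content": [{"type": "text", "text": TASK}, image_part(example_path)]})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user",
                     "content": [{"type": "text", "text": f"{TASK}\n{OUTPUT_FORMAT}"},
                                 image_part(image_path)]})
    resp = client.chat.completions.create(model=model, messages=messages)
    return resp.choices[0].message.content
```

The few-shot variant sends the example images and answers on every request, which is exactly why it consumes more tokens and costs more per call.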
It turns out the few-shot method worked noticeably better with every LLM we tried. The downside is that it sends more data (measured in tokens) to the LLM, so it costs a bit more per request than the zero-shot approach. Here is how the two methods compared, along with the models we used and their accuracy:
| Model | Zero-shot cost | Zero-shot precision [%] | Few-shot cost | Few-shot precision [%] |
|---|---|---|---|---|
| gpt-4o | $0.0135 | 58 | $0.04 | 85 |
| gpt-4o-mini | $0.0008 | 21 | $0.0024 | 58 |
| gemini-2.0-flash-lite-preview-02-05 | $0.0002 | 21 | $0.0006 | 75 |
Comparing these numbers with what we got from the original Google Vision setup, the dedicated machine learning model still comes out slightly ahead of the LLMs, but the gap isn’t huge.
Using LLMs shows real potential, but it depends on the use case. In our fuse example, building a dedicated machine learning model cost us around 2000€ in infrastructure alone, not counting what we paid the developers. If a customer only needs to run the model maybe 1000 times a year, it is far cheaper to just call an LLM and pay per use: 1000 * $0.04 = $40, roughly 40€ a year with few-shot gpt-4o. But there is clearly a break-even point: if you run the model heavily enough, training your own makes more sense (see the rough calculation below).
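A back-of-the-envelope sketch of that break-even point, using the numbers above (the ~2000€ one-off cost and the $0.04 few-shot gpt-4o request from our table, treating $0.04 ≈ 0.04€ and ignoring developer time, hosting, and retraining):

```python
# Rough break-even estimate: one-off custom model vs. pay-per-use LLM.
UPFRONT_CUSTOM_MODEL = 2000.0   # approximate one-off training/infrastructure cost (EUR)
COST_PER_LLM_REQUEST = 0.04     # few-shot gpt-4o cost per image (USD, treated as ~EUR)


def yearly_llm_cost(requests_per_year: int) -> float:
    """Pay-per-use cost for a given yearly request volume."""
    return requests_per_year * COST_PER_LLM_REQUEST


def break_even_requests() -> int:
    """Number of requests at which pay-per-use catches up with the upfront cost."""
    return int(UPFRONT_CUSTOM_MODEL / COST_PER_LLM_REQUEST)


print(yearly_llm_cost(1000))   # 40.0  -> pay-per-use wins at low volume
print(break_even_requests())   # 50000 -> beyond this, the custom model starts paying off
```

At roughly 50,000 requests the LLM bill matches the upfront cost of the custom model, so anything well past that volume favours training your own.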
It’s also worth pointing out that accuracy varies from one LLM to another. For our task, gemini-2.0-flash-lite gave us the best value for money in terms of price and precision.
At this point, it might seem like just using an LLM is the way to go for something like this. But we’ve got to ask ourselves a few questions. Do we want to host the LLM ourselves so we have control over our data, or are we okay with using a service like OpenAI? And how fast do we need to get our results? It’s no secret that LLMs take way longer to give you an answer compared to the old-school methods.
All these questions, and probably a few more, are going to decide whether an LLM or regular machine learning is the best fit for what a customer needs.