We are your Digital Ally™
Image recognition with LLM
AI

Image recognition with LLM

We compared Large Language Models against traditional machine learning for detecting electrical fuse states in images. Here is what we found about accuracy, cost, and when each approach makes sense.

Adam HarnúšekApril 6, 2025

Recognising the growing use of Large Language Models in image recognition, we decided to investigate their applicability to a real project task. This post covers the business perspective — for the technical implementation details, see Image recognition with LLM: A dev's how-to.

The Problem

Our task was to create a model that could determine whether electrical fuses were installed or not installed within electrical boxes.

Electrical fuse example

For the initial machine learning approach, we used Google Vision service trained on a substantial customer-provided dataset hosted in cloud infrastructure.

Testing LLMs for Image Recognition

When we turned to LLMs, we found that relatively few major models support image analysis. We evaluated the available options using two distinct approaches:

Zero-Shot Learning

We presented the model with images and a description of the task along with the desired output format — no examples provided. This is the simplest and cheapest approach.

Few-Shot Learning

We provided several example images with correct answers before presenting the new task. This consistently produced better results across all models, though at higher token costs due to the additional images in the prompt.

Results

ML result statistics

ModelZero-shot CostZero-shot PrecisionFew-shot CostFew-shot Precision
gpt-4o$0.013558%$0.0485%
gpt-4o-mini$0.000821%$0.002458%
gemini-2.0-flash-lite-preview-02-05$0.000221%$0.000675%

Traditional machine learning achieved superior accuracy, but it required a substantial upfront investment of roughly €2,000 in development effort alone.

When Does Each Approach Make Sense?

The right choice depends on your specific situation:

Use traditional ML when:

  • You need maximum accuracy
  • You have a large labeled dataset available
  • The model will be used frequently at scale
  • Latency is critical (LLMs are significantly slower at inference)

Use LLMs when:

  • Usage is infrequent — at around 1,000 runs per year, gpt-4o few-shot costs approximately €400 vs. ~€2,000 for custom ML development
  • You need a quick prototype without dataset collection
  • You want to avoid upfront development investment

Key considerations:

  • Data privacy: Self-hosting vs. sending data to external APIs like OpenAI or Google
  • Response speed: LLMs have significantly higher latency than a locally deployed traditional model
  • Model selection: For this task, gemini-2.0-flash-lite offered the best pricing-to-performance ratio among the models we tested

Conclusion

LLMs are a genuinely viable option for image recognition tasks, particularly when usage volume is low and avoiding upfront development cost matters. The few-shot approach with gpt-4o reached 85% precision — close enough to traditional ML for many real-world applications. The optimal choice depends on your project's usage frequency, latency tolerance, data privacy requirements, and budget constraints.

© 2026