Image recognition


I am a recent graduate at the beginning of my software development career. I enjoy documenting what I learn through my blog posts.

This post illustrates AI’s ability to understand and pick out details from an image, using Microsoft.Extensions.AI.

This post will walk you through getting OpenAI to summarize a single image. You should end up with an output like this:

[Image: AI image summarization result for a single image]

Then we will go a step further and extract more specific data from a set of images. You should see an output similar to this:

[Image: AI image recognition result for Traffic images]

[Image: AI image recognition result for Animal images]

Step 1

Create a simple console app in whichever IDE you prefer.

Step 2

Install the required NuGet packages, then add the matching using directives to the top of your Program.cs file:

// Package Microsoft.Extensions.AI (9.1.0-preview.1.25064.3)
// Package Microsoft.Extensions.AI.OpenAI (9.1.0-preview.1.25064.3)
// Package Microsoft.Extensions.AI.Abstractions (9.1.0-preview.1.25064.3)
// Package Microsoft.Extensions.DependencyInjection (9.0.1)
// Package Microsoft.Extensions.Hosting (9.0.1)
// Package Microsoft.Extensions.Logging (9.0.1)

using Microsoft.Extensions.AI;
using OpenAI;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Hosting;

Step 3

Set up the required app configuration and your AI chat completion client (IChatClient). I am using OpenAI for this example, with my API key saved in an environment variable. You can change the model to whichever one you want to use.

💡
The main requirement here is the IChatClient and the model you want to use. The rest of the configuration is optional, but you will need it if you later add extra functionality such as logging or function calling.
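var hostBuilder = Host.CreateApplicationBuilder(args);
// Load all environment variables into configuration (a string argument here would be treated as a key prefix).
hostBuilder.Configuration.AddEnvironmentVariables();

// Read the API key from the OPENAI_API_KEY environment variable and fail early if it is missing.
var apiKey = hostBuilder.Configuration["OPENAI_API_KEY"]
    ?? throw new InvalidOperationException("Set the OPENAI_API_KEY environment variable.");

IChatClient innerChatClient = new OpenAIClient(apiKey)
    .AsChatClient("gpt-4o-mini");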

hostBuilder.Services.AddSingleton<IChatClient>(innerChatClient);

hostBuilder.Services.AddLogging(builder => builder
    .AddConsole()
    .SetMinimumLevel(LogLevel.Trace));

var app = hostBuilder.Build();
var chatClient = app.Services.GetRequiredService<IChatClient>();

Step 4

Set up your user message with a simple query like “What’s in this image?”.

var message = new ChatMessage(ChatRole.User, "What's in this image?");

Then attach your image by adding a new ImageContent to the message's Contents, passing the image bytes and their media type:

message.Contents.Add(new ImageContent(File.ReadAllBytes("<path to your image>"), "image/jpeg"));

Step 5

Once you’ve set up your user message you will need to call your chat client (OpenAI) and give it the message with the image for context.

var response = await chatClient.CompleteAsync([message]);

Then write the AI’s response to the console:

Console.WriteLine(response.Message.Text);
💡
If you want to stream the message like ChatGPT does, Step 5 will look a bit different. Instead of using CompleteAsync you will use CompleteStreamingAsync and iterate over each update (token) of the response as it arrives. You will also need to use Console.Write instead of Console.WriteLine, as WriteLine would add a new line after each token.
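var responseStream = chatClient.CompleteStreamingAsync([message]);
// Each update carries the next piece of text; the name "update" avoids clashing with "response" from Step 5.
await foreach (var update in responseStream)
{
    Console.Write(update.Text);
}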
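
To see why Console.Write is the right call when streaming, here is a minimal, self-contained sketch that needs no API key. FakeTokenStream is a hypothetical stand-in for the chat client's stream and its tokens are made up for illustration; only the await foreach pattern matches the real code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Consume the stream piece by piece, exactly like the chat client loop.
await foreach (var token in FakeTokenStream())
{
    Console.Write(token); // Console.WriteLine here would put every token on its own line
}
Console.WriteLine(); // finish the line once the stream has ended

// Hypothetical stand-in for a streaming response (an assumption, not a library API).
static async IAsyncEnumerable<string> FakeTokenStream()
{
    foreach (var token in new[] { "A ", "cat ", "on ", "a ", "sofa." })
    {
        await Task.Delay(50); // simulate the delay between tokens
        yield return token;
    }
}
```

Swapping Console.Write for Console.WriteLine in the loop is a quick way to see the difference for yourself.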

Further Steps

The steps above still apply, but Step 5 changes: the response is returned as a predefined type rather than free text, so it will not be streamed.

Step 6

Get the path to the folder that contains your images:

var dir = Path.Combine(AppContext.BaseDirectory, "<path to your image folder>");

Step 7

Iterate over the files in the folder and select files with a specific extension, e.g. “.jpg” (this can be changed to whatever extension you want). The code inside the foreach loop follows in Steps 8 - 11.

foreach (var imagePath in Directory.GetFiles(dir, "*.jpg"))
{
    // ... the code from Steps 8 - 11 goes here
}
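
If your folder mixes several image types, one option is to filter on a set of extensions instead of a single search pattern. This is a minimal sketch using only System.IO and LINQ. The folder name "images" and the list of extensions are assumptions for illustration, so replace them with your own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Extensions to accept, compared case-insensitively so "CAM1.JPG" also matches.
var allowed = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".jpg", ".jpeg", ".png" };

// Hypothetical folder name; replace it with your own image folder.
var dir = Path.Combine(AppContext.BaseDirectory, "images");

if (!Directory.Exists(dir))
{
    Console.WriteLine($"Image folder not found: {dir}");
    return;
}

// Keep only files whose extension is in the allowed set, in a stable order.
var imagePaths = Directory.EnumerateFiles(dir)
    .Where(path => allowed.Contains(Path.GetExtension(path)))
    .OrderBy(path => path, StringComparer.OrdinalIgnoreCase);

foreach (var imagePath in imagePaths)
{
    Console.WriteLine(Path.GetFileName(imagePath));
}
```

The loop body here only prints the file name. In the real app it would hold the code from Steps 8 - 11.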

Step 8

Get the image name without its extension. This will then be added to your user message:

    var name = Path.GetFileNameWithoutExtension(imagePath);

    var message = new ChatMessage(ChatRole.User, $$"""
                                                   Extract information from this image from camera {{name}}.
                                                   """);

Adding the image file works the same as before, but instead of passing in a hard-coded file path you pass in the foreach loop’s variable imagePath:

    message.Contents.Add(new ImageContent(File.ReadAllBytes(imagePath), "image/jpeg"));
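
If you change the extension filter in Step 7, the media type passed to ImageContent should change with it (the registered media type for JPEG files is image/jpeg). Here is a minimal sketch of one way to derive it from the file name. GetImageMediaType is a hypothetical helper written for this post, not part of Microsoft.Extensions.AI, and the list of formats is an assumption you can extend.

```csharp
using System;
using System.IO;

Console.WriteLine(GetImageMediaType("cam1.JPG")); // prints "image/jpeg"
Console.WriteLine(GetImageMediaType("cam2.png")); // prints "image/png"

// Hypothetical helper: maps a file extension to its media type.
static string GetImageMediaType(string path) =>
    Path.GetExtension(path).ToLowerInvariant() switch
    {
        ".jpg" or ".jpeg" => "image/jpeg",
        ".png" => "image/png",
        ".gif" => "image/gif",
        ".webp" => "image/webp",
        var other => throw new NotSupportedException($"Unsupported image extension: '{other}'")
    };
```

You could then pass GetImageMediaType(imagePath) as the second argument when creating the ImageContent.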

Step 9

Create a class to hold the data you want to extract:

class TrafficCamResult
{
    public TrafficStatus Status { get; set; }
    public int NumCars { get; set; }
    public int NumTrucks { get; set; }

    public enum TrafficStatus
    {
        Clear,
        Flowing,
        Congested,
        Blocked
    };
}

I also tried a second example with animals:

class AnimalsResult
{
    public int NumAnimals { get; set; }
    public int NumDogs { get; set; }
    public int NumCats { get; set; }
    public int NumRacoons { get; set; }
    public int NumMonkeys { get; set; }
    public int NumRedPandas { get; set; }
    public int NumMeerkats { get; set; }
}

Step 10

Use the class above as the type parameter of CompleteAsync so that the AI’s response comes back as that type, and again pass in the user message:

    var response = await chatClient.CompleteAsync<TrafficCamResult>([message]);
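
Conceptually, a typed call like this asks the model for JSON that fits the class and then turns that JSON into an object. How Microsoft.Extensions.AI does this internally is not covered in this post, so treat the following as an illustration only. It uses System.Text.Json directly, and the JSON string is hand-written sample data, not real model output.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hand-written sample JSON (an assumption for illustration, not real model output).
const string sampleJson = """{"Status":"Congested","NumCars":12,"NumTrucks":3}""";

var options = new JsonSerializerOptions
{
    PropertyNameCaseInsensitive = true,
    Converters = { new JsonStringEnumConverter() } // lets "Congested" map to the enum value
};

var result = JsonSerializer.Deserialize<TrafficCamResult>(sampleJson, options);
Console.WriteLine($"{result?.Status}: cars {result?.NumCars}, trucks {result?.NumTrucks}");
// prints "Congested: cars 12, trucks 3"

class TrafficCamResult
{
    public TrafficStatus Status { get; set; }
    public int NumCars { get; set; }
    public int NumTrucks { get; set; }

    public enum TrafficStatus { Clear, Flowing, Congested, Blocked }
}
```

This is why the property names and enum values in your class matter: they describe the shape of the data you want back.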

Step 11

Lastly, get the result from the response and write it to the console:

    if (response.TryGetResult(out var result))
    {
        Console.WriteLine($"{name} status: {result.Status} (cars: {result.NumCars}, trucks: {result.NumTrucks})");
    }

This is a good starting point for using AI to extract data from images, and the same approach can be applied to many other kinds of images by changing the result class.
