Image to Text - ZnapAI Documentation

Multimodal language models can accept both text prompts and images as input.

Vision (Image-to-Text)

To analyze an image, send a multimodal model a user message whose content array holds both a text part and an image part:

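import { generateText } from 'ai';
// `openai` and `MODEL` are exported from your local client setup file (./client).
import { openai, MODEL } from './client';

async function analyzeImage() {
  // A 1x1 pixel PNG, Base64-encoded. Replace it with your own image data.
  const base64Image =
    'iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mP8z8BQDwAEhQGAhKmMIQAAAABJRU5ErkJggg==';

  const response = await generateText({
    model: openai(MODEL),
    messages: [
      {
        role: 'user',
        // One message can mix text parts and image parts.
        content: [
          {
            type: 'text',
            text: 'Describe this image.',
          },
          {
            type: 'image',
            image: base64Image,
            mimeType: 'image/png',
          },
        ],
      },
    ],
  });

  // The model's description of the image.
  console.log(response.text);
}

analyzeImage().catch(console.error);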
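Because the image part needs a mimeType alongside the Base64 data, it can help to detect the format from the data itself. The sketch below is a hypothetical helper, not part of the ZnapAI or AI SDK API: guessImageMimeType and buildImageMessage are illustrative names, and detection only checks the leading magic bytes of PNG, JPEG, GIF and WebP files as they appear once Base64-encoded.

```typescript
// Hypothetical helpers for illustration; not part of the ZnapAI or AI SDK API.
type TextPart = { type: 'text'; text: string };
type ImagePart = { type: 'image'; image: string; mimeType: string };
type UserMessage = { role: 'user'; content: Array<TextPart | ImagePart> };

// Guess an image's MIME type from the first characters of its Base64 data.
// Each check matches the Base64 encoding of the format's leading "magic bytes".
function guessImageMimeType(base64: string): string | undefined {
  if (base64.startsWith('iVBORw0KGgo')) return 'image/png'; // 89 50 4E 47 0D 0A 1A 0A
  if (base64.startsWith('/9j/')) return 'image/jpeg'; // FF D8 FF
  if (base64.startsWith('R0lGOD')) return 'image/gif'; // "GIF87a" or "GIF89a"
  // WebP is "RIFF" at bytes 0-3 plus "WEBP" at bytes 8-11;
  // Base64 characters 12-15 encode bytes 9-11 ("EBP").
  if (base64.startsWith('UklGR') && base64.slice(12, 16) === 'RUJQ') return 'image/webp';
  return undefined;
}

// Build a user message holding a text prompt and a Base64 image part,
// in the same shape as the generateText example.
function buildImageMessage(prompt: string, base64Image: string): UserMessage {
  const mimeType = guessImageMimeType(base64Image);
  if (mimeType === undefined) {
    throw new Error('Unrecognized image format; set mimeType explicitly.');
  }
  return {
    role: 'user',
    content: [
      { type: 'text', text: prompt },
      { type: 'image', image: base64Image, mimeType },
    ],
  };
}
```

Assuming these helpers, the request in the example could pass messages: [buildImageMessage('Describe this image.', base64Image)] instead of spelling out the parts by hand. Formats the helper does not recognize raise an error rather than being sent with a guessed type.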

Parameters

When using vision models with generateText, the following parameters are supported:

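model (LanguageModel, required): The model instance to use for generation. For image input, this must be a vision-capable (multimodal) model.

messages (Message[], required): The conversation history as an array of message objects. To include an image, give the message an array of content parts and add a part with type: 'image', the image data (a URL or a Base64-encoded string), and its mimeType.

temperature (number): Controls randomness (0.0 to 2.0). Lower values make the output more focused and repeatable; higher values make it more varied.

maxTokens (number): The maximum number of tokens to generate.

topP (number): Nucleus sampling probability. The model samples only from the smallest set of most likely tokens whose cumulative probability reaches topP (a value between 0 and 1).

topK (number): Limits sampling to the K most probable tokens at each step.

presencePenalty (number): Penalizes tokens that have already appeared in the output, encouraging the model to move on to new topics.

frequencyPenalty (number): Penalizes tokens in proportion to how often they have already appeared, reducing repetition of the same words and phrases.

seed (number): Seed for random sampling. When the provider supports it, repeated requests with the same seed and parameters attempt to return the same output; determinism is not guaranteed.

stopSequences (string[]): Custom sequences that end generation: when the model produces one of them, it stops generating further text.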
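To make temperature, topK and topP more concrete, the sketch below applies them to a toy next-token distribution. It illustrates the general techniques (temperature-scaled softmax, top-k filtering and nucleus sampling) and is not ZnapAI's implementation; providers apply these steps internally and may order or combine them differently. The function names softmax and candidateTokens are hypothetical.

```typescript
// Illustrative sketch of temperature, topK and topP; not ZnapAI's implementation.

// Convert raw scores (logits) to probabilities. Dividing by the temperature
// sharpens the distribution when temperature < 1 and flattens it when > 1.
function softmax(logits: number[], temperature: number): number[] {
  if (temperature <= 0) {
    // Many providers treat temperature 0 as greedy decoding; this sketch requires > 0.
    throw new RangeError('temperature must be > 0 in this sketch');
  }
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled); // subtracting the max avoids overflow in Math.exp
  const exps = scaled.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Return the token indices that remain eligible for sampling, most probable first.
function candidateTokens(probs: number[], topK?: number, topP?: number): number[] {
  let order = probs.map((_, i) => i).sort((a, b) => probs[b] - probs[a]);

  // topK: keep only the K most probable tokens.
  if (topK !== undefined) order = order.slice(0, topK);

  // topP (nucleus sampling): keep the smallest prefix whose cumulative
  // probability, renormalized over the tokens still eligible, reaches topP.
  if (topP !== undefined) {
    const mass = order.reduce((s, i) => s + probs[i], 0);
    const nucleus: number[] = [];
    let cumulative = 0;
    for (const i of order) {
      nucleus.push(i);
      cumulative += probs[i] / mass;
      if (cumulative >= topP) break;
    }
    order = nucleus;
  }
  return order;
}
```

For logits [2, 1, 0, -1] at temperature 1, the probabilities are roughly 0.64, 0.24, 0.09 and 0.03, so topP 0.5 leaves only the first token eligible while topP 0.9 keeps the first three. Lowering the temperature to 0.5 raises the top token's probability to about 0.87.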
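presencePenalty and frequencyPenalty both lower the scores of tokens that already appear in the output, but in different ways. The sketch below follows the formulation OpenAI has documented for its API: subtract count * frequencyPenalty from a token's logit, plus a flat presencePenalty if the count is above zero. Whether ZnapAI's models use exactly this formula is an assumption, and applyPenalties is a hypothetical name.

```typescript
// Sketch of presence and frequency penalties on next-token logits, following
// the formulation OpenAI documents; other providers may apply them differently.
function applyPenalties(
  logits: number[],          // one score per token id
  generatedTokens: number[], // token ids generated so far
  presencePenalty: number,
  frequencyPenalty: number,
): number[] {
  // Count how many times each token id has appeared so far.
  const counts = new Map<number, number>();
  for (const t of generatedTokens) counts.set(t, (counts.get(t) ?? 0) + 1);

  return logits.map((logit, tokenId) => {
    const c = counts.get(tokenId) ?? 0;
    // frequencyPenalty grows with every repetition; presencePenalty is a flat
    // one-time cost for any token that has appeared at least once.
    return logit - c * frequencyPenalty - (c > 0 ? presencePenalty : 0);
  });
}
```

For example, with logits [1, 1, 1], previously generated tokens 0, 0 and 1, presencePenalty 0.5 and frequencyPenalty 0.25, token 0 drops to 0, token 1 to 0.25, and token 2, which has not appeared, keeps its score of 1.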
