Multimodal
Multimodality refers to a model's ability to understand and generate content across multiple modalities, such as text, images, audio, and video.
Vision, Images, Speech
All recipes (25)
Image Evals for Image Generation and Editing Use Cases
Evals, Images, Vision
Jan 29, 2026

Realtime Eval Guide
Audio, Evals, Responses, Speech
Jan 25, 2026

Gpt-image-1.5 Prompting Guide
Images, Vision
Dec 16, 2025

Transcribing User Audio with a Separate Realtime Request
Audio, Speech
Nov 20, 2025

Realtime Prompting Guide
Audio, Responses, Speech
Aug 28, 2025

Generate images with high input fidelity
Images
Jul 17, 2025

Using Evals API on Image Inputs
Evals, Images
Jul 15, 2025

Practical guide to data-intensive apps with the Realtime API
Audio, Speech
May 29, 2025

Image Understanding with RAG
Images, Responses, Vision
May 16, 2025

Context Summarization with Realtime API
Audio, Speech, Tiktoken
May 10, 2025

ElatoAI - Realtime Speech AI Agents for ESP32 on Arduino
Audio, Speech
May 1, 2025

Comparing Speech-to-Text Methods with the OpenAI API
Agents SDK, Audio, Speech
Apr 29, 2025

Generate images with GPT Image
Images
Apr 23, 2025

Processing and narrating a video with GPT-4.1-mini's visual capabilities and GPT-4o TTS API
Responses, Speech, Vision
Apr 22, 2025

Building a Voice Assistant with the Agents SDK
Audio, Responses, Speech
Mar 27, 2025

Multi-Language One-Way Translation with the Realtime API
Audio, Speech
Mar 24, 2025

Using GPT4 Vision with Function Calling
Chat, Vision
Dec 13, 2024

Optimizing Retrieval-Augmented Generation using GPT-4o Vision Modality
Completions, Vision
Nov 12, 2024

Vision Fine-tuning on GPT-4o for Visual Question Answering
Completions, Fine-tuning, Vision
Nov 1, 2024

How to parse PDF docs for RAG
Embeddings, Vision
Sep 29, 2024

How to combine GPT4o mini with RAG to create a clothing matchmaker app
Embeddings, Vision
Jul 18, 2024

Using GPT4o mini to tag and caption images
Embeddings, Vision
Jul 18, 2024

Introduction to GPT-4o and GPT-4o mini
Completions, Vision, Whisper
Jul 18, 2024

Data Extraction and Transformation in ELT Workflows using GPT-4o as an OCR Alternative
Completions, Vision
Jul 9, 2024

CLIP embeddings to improve multimodal RAG with GPT-4 Vision
Embeddings, Vision
Apr 10, 2024