
Paper Summary #12 - Image Recaptioning in DALL-E 3

Technical Paper: Improving Image Generation with Better Captions

OpenAI’s Sora reuses the image re-captioning technique, built around a highly descriptive captioner model, that is described in some detail in the DALL-E 3 technical report. Captions in typical text-image datasets tend to omit background details and common-sense relationships, e.g. a sink in a kitchen or stop signs along a road. They also leave out the position and count of objects in the picture, the color and size of those objects, and any text that appears in the image.