Multimodality and Large Multimodal Models (LMMs)

For a long time, each ML model operated in one data mode: text (translation, language modeling), image (object detection, image classification), or audio (speech recognition).

However, natural intelligence is not limited to just a single modality. Humans can read and write text. We can see images and watch videos. We listen to music to relax and watch out for strange noises to detect danger. Being able to work with multimodal data is essential for us or any AI to operate in the real world.

...

View more on Chip Huyen's website »

Published on October 09, 2023 17:00

Chip Huyen's Blog
