shuyuan-vl

A Vision-Language model specialised in comprehending the real world and operating photo-capturing devices. The following is a gallery of photographs taken across different scenarios.

  • Portsmouth Sunset
    Sunset at Portsmouth Harbour, 2025
  • Wuhan Bridge And Lake
    Bridge and Lake, Wuhan, 2024

🐦 Birds

Bird identification in the wild remains a challenge for the latest VLMs. Resolution, lighting, and noisy backgrounds are all limiting factors. Even the latest natively multimodal VLMs would still need web search tools and, most helpfully, geographical information to make accurate predictions.

  • Bird 8095
  • Bird 8318
  • Bird 8421
  • Bird 8130
  • Bird 8147
  • Bird 7928
  • Bird 8180