Community Computer Vision Course documentation

Welcome to the Community Computer Vision Course


Dear learner,

Welcome to the community-driven course on computer vision. Computer vision is revolutionizing our world in many ways, from unlocking phones with facial recognition to analyzing medical images for disease detection, monitoring wildlife, and creating new images. Together, we’ll dive into the fascinating world of computer vision!

Throughout this course, we’ll cover everything from the basics to the latest advancements in computer vision. It’s structured to include various foundational topics, making it friendly and accessible for everyone. We’re delighted to have you join us for this exciting journey!

On this page, you'll find out how to join the learner community, how to make a submission and get a certificate, and more details about the course!

Assignment 📄

To obtain your certification for completing the course, complete the following assignments:

  1. Training/fine-tuning a Model
  2. Building an application and hosting it on Hugging Face Spaces

Training/fine-tuning a Model

There are notebooks under the Notebooks/Vision Transformers section. As of now, we have notebooks for object detection, image segmentation, and image classification. You can either train a model on a dataset that already exists on the 🤗 Hub, or upload a dataset to a dataset repository and train a model on it.

The model repository needs to have the following:

  1. A properly filled model card (you can check out here for more information).
  2. The dataset's ID in the model card, linking the model repository to the dataset repository.

If you trained a model with transformers and pushed it to the Hub, a model card will be generated automatically. In that case, edit the card and fill in more details.
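For reference, the dataset link lives in the YAML metadata at the top of the model card's `README.md`. A hypothetical front matter (all IDs below are placeholders) might look like this:

```yaml
---
license: apache-2.0
pipeline_tag: image-classification
datasets:
  - your-username/your-dataset  # links this model repo to the dataset repo
metrics:
  - accuracy
---
```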

Creating a Space

In this assignment, you'll build a Gradio-based application for your computer vision model and share it on 🤗 Spaces.

Certification 🥇

Once you've finished the assignments, Training/fine-tuning a Model and Creating a Space, please complete the form with your name, email, and links to your model and Space repositories to receive your certificate.

Join the community!

We invite you to join our active and supportive Discord community, where this course started and where engaging conversations and shared interests flourish every day. You will find peers with whom you can exchange ideas and resources. It is the place to collaborate, get feedback, and ask questions!

Joining our community is also an excellent way to stay engaged and motivated to follow the course. Who knows what we will build together next?

As AI continues to advance, so does the quality of our discussions and the diversity of perspectives within our community. Upon becoming a member, you’ll have an opportunity to connect with fellow course participants, exchange ideas, and collaborate with others. Moreover, the contributors to this course are active on Discord and might help you when needed. Join us now!

Computer Vision Channels

There are many channels focused on various topics on our Discord server. You will find people discussing papers, organizing events, sharing their projects and ideas, brainstorming, and so much more.

As a computer vision course learner, you may find the following set of channels particularly relevant:

  • #computer-vision: a catch-all channel for everything related to computer vision.
  • #cv-study-group: a place to exchange ideas, ask questions about specific posts and start discussions.
  • #3d: a channel to discuss aspects of computer vision specific to 3D.

If you are interested in generative AI, we also invite you to join all channels related to the Diffusion Models: #core-announcements, #discussions, #dev-discussions, and #diff-i-made-this.

What you will learn

The course is composed of theory, practical tutorials, and engaging challenges.

  • Theory Part: This section covers the theoretical principles of computer vision, explained in detail with practical examples.
  • Hands-on Tutorials: You will learn how to train and apply key computer vision models using Google Colab notebooks.

The curriculum spans the basics through the latest advancements in computer vision and includes various foundational topics, giving you a comprehensive understanding of what makes computer vision so impactful today.


Before beginning this course, make sure that you have some experience with Python programming and are familiar with transformers, machine learning, and neural networks. If these are new to you, consider reviewing the first unit of the Hugging Face NLP course. While a strong knowledge of pre-processing techniques and mathematical operations like convolutions is beneficial, they are not prerequisites.

Course Structure

The course is organized into multiple units, covering the fundamentals and delving into an in-depth exploration of state-of-the-art models.

  • Unit 1 - Fundamentals of Computer Vision: this unit covers the essential concepts to get started with computer vision: the need for computer vision, the field’s basics, and its applications. Explore image fundamentals, formation, and preprocessing, along with key aspects of feature extraction.
  • Unit 2 - Convolutional Neural Networks (CNNs): delve into the world of CNNs, understanding their general architecture, key concepts, and common pre-trained models. Learn how to apply transfer learning and fine-tuning to adapt CNNs for various tasks.
  • Unit 3 - Vision Transformers: explore transformer architectures in the context of computer vision and learn how they compare to CNNs. Understand common vision transformers such as Swin, DETR, and CvT, along with techniques for transfer learning and fine-tuning.
  • Unit 4 - Multimodal Models: understand the fusion of text and vision by exploring multimodal tasks like image-to-text and text-to-image. Study models such as CLIP and its relatives (GroupViT, BLIP, OWL-ViT), and master transfer learning techniques for multimodal tasks.
  • Unit 5 - Generative Models: explore generative models, including GANs, VAEs, and diffusion models. Learn about their differences and applications in tasks such as text-to-image, image-to-image, and inpainting.
  • Unit 6 - Basic Computer Vision Tasks: cover fundamental tasks like image classification, object detection, and segmentation, and the models used for them (YOLO, SAM). Gain insights into metrics and practical applications for these tasks.
  • Unit 7 - Video and Video Processing: examine the characteristics of videos, the role of video processing, and the challenges compared to image processing. Explore temporal continuity, motion estimation, and practical applications in video processing.
  • Unit 8 - 3D Vision, Scene Rendering, and Reconstruction: delve into the complexities of three-dimensional vision, exploring concepts like NeRF and GQN for scene rendering and reconstruction. Understand the challenges and applications of 3D vision in computer vision, and how it provides an even more comprehensive view of spatial information.
  • Unit 9 - Model Optimization: explore the critical aspects of model optimization. Cover techniques such as model compression, deployment considerations, and the usage of tools and frameworks, including topics like distillation, pruning, and TinyML for efficient model deployment.
  • Unit 10 - Synthetic Data Creation: discover the importance of synthetic data creation using deep generative models. Explore methods like point clouds and diffusion models and investigate major synthetic datasets and their applications in computer vision.
  • Unit 11 - Zero Shot Computer Vision: delve into the realm of zero-shot learning in computer vision, covering aspects of generalization, transfer learning, and its applications in tasks such as zero-shot recognition and image segmentation. Explore the relationship between zero-shot learning and transfer learning across various computer vision domains.
  • Unit 12 - Ethics and Biases in Computer Vision: understand the ethical considerations specific to computer vision. Explore why ethics matter, how biases can infiltrate AI models, and the types of biases prevalent in these domains. Learn how to evaluate and mitigate biases, emphasizing responsible development and deployment of AI technologies.
  • Unit 13 - Outlook and Emerging Trends: explore current trends and emerging architectures. Delve into innovative approaches like Retentive Network, Hiera, Hyena, I-JEPA, and Retention Vision Models.

Meet our team

This course is made by the Hugging Face Community with love 💜! Join us by adding your contribution on GitHub. Our goal was to create a computer vision course that is beginner-friendly and that could act as a resource for others. More than 60 people from all over the world joined forces to make this project happen. Here we give them credit:

Unit 1 - Fundamentals of Computer Vision

Unit 2 - Convolutional Neural Networks (CNNs)

Unit 3 - Vision Transformers

Unit 4 - Multimodal Models

Unit 5 - Generative Models

Unit 6 - Basic Computer Vision Tasks

Unit 7 - Video and Video Processing

Unit 8 - 3D Vision, Scene Rendering, and Reconstruction

Unit 9 - Model Optimization

Unit 10 - Synthetic Data Creation

Unit 11 - Zero Shot Computer Vision

Unit 12 - Ethics and Biases in Computer Vision

Unit 13 - Outlook and Emerging Trends

Organisation Team: Merve Noyan, Adam Molnar, Johannes Kolbe

We are happy to have you here, let’s get started!
