
Taxonomy Of Computer Vision

Reading Resources

Kaivalya Vanguri
Drawn by the author, inspired by Szeliski

Difference Between Vision and Graphics

Computer vision focuses on interpreting and understanding images or video to extract meaningful information, essentially making sense of what a camera “sees.”
In contrast, computer graphics deals with generating visual content, creating images or 3D models from abstract data.

  • Vision → from the real world to data; builds on Images, via Processing, to understand Geometry and Photometry
  • Graphics → from data to real-world-like representations; uses Geometry and Photometry to create images

Graphics is about creation, whereas vision is about understanding.

Relationship between Images, Geometry and Photometry

  • Images (2D) provide 2D data for processing.
  • Geometry (3D) aids in understanding 3D or spatial structures from 2D inputs.
  • Photometry (light and color properties) studies how light interacts with surfaces, which is critical for tasks like texture mapping and scene rendering.

Images are 2D, which limits their ability to represent depth.

Sampling and Aliasing:

  • Sampling: Converts continuous visual data into discrete form (e.g., pixel grids).
  • Aliasing: Undesired distortions caused by insufficient sampling resolution, leading to artifacts like jagged edges in images.
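
As a minimal sketch of aliasing (pure NumPy, using a 1-D sine wave as a stand-in for image intensity along a scanline): a 50 Hz signal sampled below its Nyquist rate becomes indistinguishable from a lower-frequency alias.

```python
import numpy as np

f_signal = 50.0  # a 50 Hz sine; its Nyquist rate is 2 * 50 = 100 samples/s

# Sampling well above the Nyquist rate preserves the signal.
t_fine = np.arange(0, 1, 1 / 1000)
fine = np.sin(2 * np.pi * f_signal * t_fine)

# Sampling at only 60 samples/s is insufficient: the 50 Hz sine
# lands on exactly the same samples as a -10 Hz (alias) sine.
t_coarse = np.arange(0, 1, 1 / 60)
coarse = np.sin(2 * np.pi * f_signal * t_coarse)
alias = np.sin(2 * np.pi * (50.0 - 60.0) * t_coarse)

print(np.allclose(coarse, alias))  # True: the two are indistinguishable
```

In images the same effect shows up spatially: downsampling without first applying a low-pass (blur) filter turns fine texture into jagged edges or moiré patterns.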

Image Processing:

Image processing transforms images for tasks like noise reduction, contrast enhancement, and edge detection. It serves as the first step in most vision pipelines.
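
A minimal sketch of such a pipeline with OpenCV; the filename and parameter values are illustrative, not prescriptive.

```python
import cv2

# Hypothetical input file; any grayscale-convertible photo works.
img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

# Noise reduction: Gaussian blur suppresses high-frequency sensor noise.
denoised = cv2.GaussianBlur(img, (5, 5), sigmaX=1.4)

# Contrast enhancement: histogram equalization spreads intensity values.
enhanced = cv2.equalizeHist(denoised)

# Edge detection: Canny finds strong intensity gradients.
edges = cv2.Canny(enhanced, threshold1=50, threshold2=150)

cv2.imwrite("edges.png", edges)
```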

Segmentation:

Segmentation divides an image into meaningful regions, such as separating a foreground object from its background. It is crucial for object detection and recognition.

Objects on my plate after Segmentation
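
A minimal foreground/background segmentation sketch using Otsu thresholding in OpenCV; the filename is hypothetical, and real photos like the one above usually need more than a single global threshold.

```python
import cv2

img = cv2.imread("plate.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical image

# Otsu's method picks the global threshold that best separates the
# foreground and background intensity distributions.
_, mask = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Connected components assign one label per contiguous segmented region.
num_labels, labels = cv2.connectedComponents(mask)
print(f"Found {num_labels - 1} foreground regions")
```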

Computational Photography:

This area combines computer vision and graphics to enhance or create images, such as High Dynamic Range (HDR) imaging or panoramas.

Computational photography as used by my phone camera to capture images
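
One way to sketch HDR-style capture with OpenCV is Mertens exposure fusion, which blends a bracketed exposure stack into a single well-exposed image without needing exposure times; the filenames here are hypothetical.

```python
import cv2

# Hypothetical bracketed exposures of the same scene (under/normal/over).
exposures = [cv2.imread(p) for p in ("under.jpg", "normal.jpg", "over.jpg")]

# Mertens exposure fusion blends the stack directly, with no tone mapping.
merge = cv2.createMergeMertens()
fused = merge.process(exposures)  # float32 image, roughly in [0, 1]

cv2.imwrite("hdr_fused.png", (fused * 255).clip(0, 255).astype("uint8"))
```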

Recognition:

Recognition tasks identify objects, faces, or patterns in images. It involves training machine learning models to classify and label visual content.

Face Recognition with Facebook DeepFace in Keras — Sefik Ilkin Serengil
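
As a sketch of how face recognition can be invoked in practice, the `deepface` Python package (the subject of the linked article) wraps pretrained models behind a one-line verification call; the image paths here are hypothetical.

```python
from deepface import DeepFace  # pip install deepface

# Check whether two photos show the same person; pretrained model
# weights are downloaded automatically on first use.
result = DeepFace.verify(img1_path="person_a.jpg", img2_path="person_b.jpg")
print(result["verified"], result["distance"])
```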

Geometry is 3D

Geometric Image Formation:

Focuses on how 3D objects are projected onto 2D images using perspective projection and camera models.
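
A minimal NumPy sketch of the pinhole camera model, with assumed intrinsics (800 px focal length, principal point at (320, 240)):

```python
import numpy as np

# Assumed intrinsic matrix: focal length 800 px, principal point (320, 240).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# A 3D point in camera coordinates (X, Y, Z), 2 m in front of the camera.
X = np.array([0.5, -0.2, 2.0])

# Perspective projection: divide by depth, then apply the intrinsics.
uv_hom = K @ (X / X[2])
u, v = uv_hom[:2]
print(f"Pixel coordinates: ({u:.1f}, {v:.1f})")  # (520.0, 160.0)
```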

Feature-Based Alignment:

Aligns images by detecting and matching features (e.g., corners or edges) between frames. Common in stitching and motion tracking.
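
A minimal OpenCV sketch: detect ORB features in two hypothetical frames, match them, and estimate a homography that aligns one frame onto the other.

```python
import cv2
import numpy as np

img1 = cv2.imread("frame1.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical frames
img2 = cv2.imread("frame2.jpg", cv2.IMREAD_GRAYSCALE)

# Detect ORB keypoints and binary descriptors in both frames.
orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Match descriptors with Hamming distance and keep the best 100 matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:100]

# Estimate a homography mapping frame 1 onto frame 2; RANSAC rejects outliers.
src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
aligned = cv2.warpPerspective(img1, H, (img2.shape[1], img2.shape[0]))
```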

Structure From Motion:

Determines the 3D structure of a scene from multiple 2D images captured from different viewpoints.
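
A minimal two-view sketch with OpenCV: given matched points between two images (for example, from the ORB matching above) and the camera intrinsics K, recover the relative camera pose.

```python
import cv2
import numpy as np

def two_view_pose(pts1, pts2, K):
    """pts1, pts2: matched pixel coordinates; K: 3x3 intrinsic matrix."""
    # The essential matrix encodes the relative geometry of the two views.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    # Decompose it into a rotation and a translation direction.
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t  # note: translation is recovered only up to scale
```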

Stereo Correspondence:

Uses images from two slightly different perspectives to determine depth by matching corresponding points.
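
A minimal OpenCV sketch assuming a rectified stereo pair (hypothetical filenames): block matching recovers a disparity map, and disparity is inversely proportional to depth.

```python
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # rectified stereo pair
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Block matching searches along each row for the best-matching patch;
# the horizontal shift (disparity) is larger for closer objects.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right)  # fixed-point, scaled by 16

cv2.imwrite("disparity.png",
            cv2.normalize(disparity, None, 0, 255,
                          cv2.NORM_MINMAX).astype("uint8"))
```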

Shape Recovery:

Infers the 3D shape of an object using geometric and photometric clues, often for object modeling or augmented reality.
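
One such photometric clue is shading: photometric stereo recovers per-pixel surface normals from several images taken under known, distant lights. A minimal NumPy sketch, assuming three images and the Lambertian model I = albedo * (n · l):

```python
import numpy as np

s = 1 / np.sqrt(2)
# Assumed unit light directions for the three input images.
L = np.array([[0.0, 0.0, 1.0],
              [s,   0.0, s],
              [0.0, s,   s]])

def recover_normal(intensities):
    """intensities: the 3 observed brightnesses of one pixel."""
    # Under the Lambertian model, I = L @ (albedo * n) is linear in n.
    g = np.linalg.solve(L, np.asarray(intensities, dtype=float))
    albedo = np.linalg.norm(g)
    return g / albedo, albedo  # unit surface normal and albedo
```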

Photometry

Photometric Image Formation:

Explains how light interacts with objects, affecting color, brightness, and shading in images.
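
The simplest model of this interaction is Lambertian (diffuse) shading, where brightness depends only on the angle between the surface normal and the light direction; a minimal NumPy sketch:

```python
import numpy as np

def lambertian(albedo, normal, light_dir):
    """Diffuse intensity under the Lambertian model: I = albedo * max(0, n . l)."""
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    return albedo * max(0.0, float(n @ l))

# A surface facing straight up, lit from 45 degrees, appears ~71% as bright.
print(lambertian(1.0, np.array([0.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])))
```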

Texture Recovery:

Reconstructs surface textures from images, used in applications like 3D modeling and game design.

Things that float between 2D and 3D

Feature Detection:

Identifies key points or regions in images, like corners, blobs, or edges, which are crucial for tasks like alignment and tracking.
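
A minimal corner-detection sketch with OpenCV's Shi-Tomasi detector (hypothetical filename, illustrative parameters):

```python
import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical image

# Shi-Tomasi keeps points whose local gradient structure is strong in
# two directions (corners), which are stable under viewpoint changes.
corners = cv2.goodFeaturesToTrack(img, maxCorners=200,
                                  qualityLevel=0.01, minDistance=10)
print(f"Detected {len(corners)} corners")
```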

Motion Estimation:

Estimates the movement of objects or the camera itself by analyzing sequential frames. Applications include video stabilization and object tracking.
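
A minimal dense optical-flow sketch with OpenCV's Farneback method, assuming two consecutive grayscale frames (hypothetical filenames):

```python
import cv2
import numpy as np

prev = cv2.imread("frame_t.png", cv2.IMREAD_GRAYSCALE)   # frame at time t
curr = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)  # frame at time t+1

# Dense optical flow: one (dx, dy) motion vector per pixel.
flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2,
                                    flags=0)
magnitude = np.linalg.norm(flow, axis=2)
print(f"Mean pixel motion: {magnitude.mean():.2f} px")
```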

Stitching:

Combines overlapping images to create a seamless larger image or panorama, commonly used in computational photography.
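
OpenCV ships a high-level stitcher that performs the feature matching, alignment, warping, and blending internally; a minimal sketch with hypothetical overlapping photos:

```python
import cv2

# Hypothetical overlapping photos taken while panning the camera.
images = [cv2.imread(p) for p in ("pan1.jpg", "pan2.jpg", "pan3.jpg")]

# The stitcher returns a status code along with the composed panorama.
stitcher = cv2.Stitcher_create()
status, panorama = stitcher.stitch(images)
if status == cv2.Stitcher_OK:
    cv2.imwrite("panorama.jpg", panorama)
```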

Image-Based Rendering:

Creates realistic images or videos by manipulating existing images rather than modeling scenes from scratch, bridging the gap between graphics and vision.

I hope this helps in understanding the various terms commonly used in Computer Vision. Thanks for reading.
