[Paper Review] You Only Look Once: Unified, Real-Time Object Detection

A new approach to object detection.

Prior work on object detection repurposes classifiers to perform detection.

Instead, the paper frames object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation.

Humans glance at an image and instantly know what objects are in the image, where they are, and how they interact.

You Only Look Once (YOLO)

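YOLO unifies the separate components of object detection into a single neural network. The network uses features from the entire image to predict each bounding box, and it predicts the bounding boxes for all classes in the image simultaneously. This lets the network make global predictions about the whole image.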

Model Architecture

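The input image is divided into an S x S grid. If the center of an object falls inside a grid cell, that grid cell is responsible for detecting the object.
Each grid cell predicts B bounding boxes, a confidence score for each box, and class probabilities.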
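
As a small illustration of how the confidence score and the class probabilities combine: in the paper, box confidence is defined as Pr(Object) * IOU between the predicted box and the ground truth, and at test time it is multiplied by the conditional class probabilities to get a class-specific score for each box. The function name below is hypothetical and the sketch is only meant to show the arithmetic.

```python
def class_specific_scores(box_confidence, class_probs):
    """Multiply one box's confidence by the cell's conditional class
    probabilities to get a per-class score for that box.

    box_confidence: Pr(Object) * IOU for the box (a float in [0, 1]).
    class_probs:    Pr(Class_i | Object) for the cell, one per class.
    """
    return [p * box_confidence for p in class_probs]
```

For example, a box with confidence 0.5 in a cell whose class probabilities are 0.5 and 0.25 gets class-specific scores of 0.25 and 0.125.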

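A bounding box is represented as (x, y, w, h). Here (x, y) is the center of the box relative to the bounds of the grid cell, and w and h are the width and height relative to the size of the whole image.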
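
The grid assignment and the (x, y, w, h) encoding described above can be sketched as follows. This is a minimal sketch, not the authors' code: the function names are hypothetical, the input box is assumed to be given in pixels by its center and size, and the defaults S=7, B=2, C=20 are the values the paper uses for PASCAL VOC rather than anything stated in this review.

```python
def encode_box(cx, cy, bw, bh, img_w, img_h, S=7):
    """Map a pixel-space box (center cx, cy; size bw, bh) to the grid
    cell responsible for it and its normalized (x, y, w, h) target."""
    # Center position measured in units of grid cells.
    gx = cx / img_w * S
    gy = cy / img_h * S
    # The cell that contains the center is responsible for the object.
    # min() keeps a center lying exactly on the right/bottom edge in range.
    col = min(int(gx), S - 1)
    row = min(int(gy), S - 1)
    # x, y: offset of the center inside that cell.
    x = gx - col
    y = gy - row
    # w, h: box size relative to the whole image.
    w = bw / img_w
    h = bh / img_h
    return row, col, (x, y, w, h)


def output_shape(S=7, B=2, C=20):
    """Shape of the prediction tensor: every cell holds B boxes of
    (x, y, w, h, confidence) plus C class probabilities."""
    return (S, S, B * 5 + C)
```

With a 448 x 448 image, a box centered at (224, 112) lands in row 1, column 3 of the 7 x 7 grid, and the prediction tensor has shape 7 x 7 x 30.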

Network Design

The network is based on the GoogLeNet architecture and has 24 convolutional layers followed by 2 fully connected layers.
Fast YOLO uses a neural network with fewer convolutional layers (9 instead of 24) and fewer filters in those layers.

Comparison to Other Detection Systems

Deformable Parts Models (DPM)

DPM uses a sliding window approach to object detection.

DPM uses a disjoint pipeline to extract static features, classify regions, and predict bounding boxes for high-scoring regions.

* Sliding window: an algorithm in which a fixed-size window moves across the data and the problem is solved using only the data inside the window at each position
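
To make the sliding window idea concrete, here is a minimal sketch that only enumerates the window positions. The function name and the stride parameter are assumptions for illustration; a real detector such as DPM would also run a classifier on each window and repeat the scan at multiple scales.

```python
def sliding_windows(img_w, img_h, win_w, win_h, stride):
    """Yield (left, top, right, bottom) for every position of a
    fixed-size window that fits entirely inside the image."""
    for top in range(0, img_h - win_h + 1, stride):
        for left in range(0, img_w - win_w + 1, stride):
            yield (left, top, left + win_w, top + win_h)
```

Even this toy version shows why the approach is expensive: the number of windows grows with the image area divided by the stride squared, and each one needs its own classifier evaluation.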

R-CNN

R-CNN detects objects using region proposals instead of sliding windows. Selective Search generates potential bounding boxes, a ConvNet extracts features, and an SVM scores the boxes.

The bounding boxes are then adjusted, and non-max suppression removes duplicate detections.

Each stage runs independently, and the overall system is very slow.
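
The non-max suppression step in this pipeline can be sketched as a greedy filter over boxes sorted by score. This is a generic illustration, not R-CNN's or YOLO's actual code: the function names, the (x1, y1, x2, y2) corner format, and the 0.5 IoU threshold are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height of the overlap, clamped at zero when disjoint.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes to keep, highest score first.
    A box is dropped if it overlaps an already kept box by more than
    iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

Given two heavily overlapping boxes and one far away, the lower-scoring box of the overlapping pair is discarded and the other two are kept.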

* Region proposal: picking out regions of an image that are likely to contain an object, for example because they share similar color or texture

YOLO, by contrast, replaces these disparate parts with a single convolutional neural network. The paper reports that this unified architecture gives a faster, more accurate model than DPM, and a much faster one than R-CNN.

힘빠진 컴공