AI03, Deep learning trend

Researcher
Framework
Cloud Platform
Hardware
CUDA(Compute Unified Device Architecture) Programming

Researcher

LeCun ｜ Hinton ｜ Fei-Fei Li ｜ Krizhevsky

Framework

Past

Caffe2 ｜ MatConvNet

Present

TensorFlow ｜ PyTorch

Cloud Platform

AWS ｜ Google Cloud Platform ｜ Microsoft Azure

Enterprise IT (legacy IT) > Infrastructure as a Service (IaaS) > Platform as a Service (PaaS) > Software as a Service (SaaS)

Hardware

NVIDIA (GTX, Titan, Tesla)

CUDA(Compute Unified Device Architecture) Programming

hayun’s lecture

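$ sudo apt update
$ sudo apt upgrade
$ sudo apt install nvidia-driver-[version]   # replace [version] with the driver version you need
$ nvidia-smi                                 # GPU hardware and driver information
$ lsmod | grep nvidia                        # check that the NVIDIA kernel module is loaded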

cuda
URL

cudnn
URL

Current state of CUDA and deep learning frameworks

Covers the multiprocessor architecture and the programming model concepts needed to use CUDA/cuDNN, by understanding the graphics card and its hardware model in relation to Caffe, Caffe2, and TensorFlow.

CUDA DeviceInfo
1D/2D matrix sum and product with CUDA
Optimizing parallel reduction
Code review for Caffe, Caffe2, TensorFlow
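
As an illustration of the "optimizing parallel reduction" topic, here is a minimal sketch of a shared-memory sum reduction. The names `reduceSum` and `gpuSum` are hypothetical and not taken from the lecture; the sketch assumes a power-of-two block size and omits error checking on the CUDA calls for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each block reduces up to 2 * blockDim.x inputs into one partial sum.
__global__ void reduceSum(const float* in, float* partial, unsigned int n) {
    extern __shared__ float sdata[];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x * 2 + threadIdx.x;

    // The first addition happens during the load, so no thread starts idle.
    float v = 0.0f;
    if (i < n) v += in[i];
    if (i + blockDim.x < n) v += in[i + blockDim.x];
    sdata[tid] = v;
    __syncthreads();

    // Tree reduction with sequential addressing: active threads stay
    // contiguous, which avoids shared-memory bank conflicts.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = sdata[0];
}

// Hypothetical host wrapper: returns the sum of all elements of `host`.
float gpuSum(const std::vector<float>& host) {
    const int n = static_cast<int>(host.size());
    if (n == 0) return 0.0f;
    const int threads = 256;  // assumption: must be a power of two
    const int blocks = (n + threads * 2 - 1) / (threads * 2);

    float* dIn = nullptr;
    float* dPartial = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dPartial, blocks * sizeof(float));
    cudaMemcpy(dIn, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    reduceSum<<<blocks, threads, threads * sizeof(float)>>>(dIn, dPartial, n);

    // This synchronous copy also waits for the kernel to finish.
    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), dPartial, blocks * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dPartial);

    // Finish the small remainder on the CPU.
    float total = 0.0f;
    for (float p : partial) total += p;
    return total;
}
```

Calling `gpuSum` on a `std::vector<float>` from a host program returns the total. Summing the per-block partial results on the CPU keeps the sketch short; a second kernel launch over the partial sums is a common alternative for very large inputs.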

GPU Memory usage

Builds a deep understanding of how data is transferred between the CPU and GPUs and of the characteristics of the different kinds of memory that GPUs use, then covers efficient memory utilization techniques and implements them in CUDA.

Synchronous/asynchronous memory copy
Shared memory/pinned memory copy
Global memory/zero-copy memory copy
Unified memory copy
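
To make two of these memory types concrete, the following sketch runs the same kernel once on unified (managed) memory and once with pinned host memory plus explicit copies. The helper names `scaleWithUnifiedMemory` and `scaleWithPinnedMemory` are hypothetical, and error checking is omitted; treat it as one possible illustration rather than the lecture's own code.

```cuda
#include <cuda_runtime.h>

__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Unified memory: one pointer is valid on both the CPU and the GPU,
// and the runtime migrates the data as needed.
float scaleWithUnifiedMemory(int n, float factor) {
    float* data = nullptr;
    cudaMallocManaged(&data, n * sizeof(float));
    for (int i = 0; i < n; ++i) data[i] = 1.0f;  // written by the CPU

    const int threads = 256;
    scale<<<(n + threads - 1) / threads, threads>>>(data, n, factor);
    cudaDeviceSynchronize();  // wait before the CPU touches the data again

    float sum = 0.0f;
    for (int i = 0; i < n; ++i) sum += data[i];
    cudaFree(data);
    return sum;
}

// Pinned (page-locked) host memory with explicit synchronous copies.
float scaleWithPinnedMemory(int n, float factor) {
    const size_t bytes = n * sizeof(float);
    float* h = nullptr;
    float* d = nullptr;
    cudaMallocHost(&h, bytes);  // page-locked host allocation
    cudaMalloc(&d, bytes);
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
    const int threads = 256;
    scale<<<(n + threads - 1) / threads, threads>>>(d, n, factor);
    // The synchronous copy back waits for the kernel to finish first.
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);

    float sum = 0.0f;
    for (int i = 0; i < n; ++i) sum += h[i];
    cudaFreeHost(h);
    cudaFree(d);
    return sum;
}
```

Both helpers fill the buffer with ones and return the sum after scaling, so their results can be compared directly. Unified memory keeps the host code shorter, while pinned memory with explicit copies gives control over when each transfer happens.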

GPU Memory and stream usage

Covers the stream concept and memory access techniques for maximizing GPU resource utilization, with implementations.

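Coalesced/aligned memory access
Memory bank conflicts and avoiding them with padding
Implementing data transfer streams and events
Implementing stream synchronization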
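
The stream and event topics can be sketched as a small pipeline: the input is split into chunks, and each chunk's host-to-device copy, kernel, and device-to-host copy are queued on their own stream so that transfers and computation from different chunks may overlap. The function name `pipelinedAddOne` and the choice of four streams are assumptions made for this illustration; error checking is omitted.

```cuda
#include <cuda_runtime.h>

__global__ void addOne(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Hypothetical helper: adds 1 to n floats using several streams.
// Returns true if every result is correct; *elapsedMs receives the
// time measured between two CUDA events.
bool pipelinedAddOne(int n, float* elapsedMs) {
    const int kStreams = 4;  // assumption: four chunks, one stream each
    const int chunk = (n + kStreams - 1) / kStreams;

    float* h = nullptr;
    float* d = nullptr;
    cudaMallocHost(&h, n * sizeof(float));  // pinned memory, needed for truly async copies
    cudaMalloc(&d, n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = static_cast<float>(i);

    cudaStream_t streams[kStreams];
    for (int s = 0; s < kStreams; ++s) cudaStreamCreate(&streams[s]);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    for (int s = 0; s < kStreams; ++s) {
        const int offset = s * chunk;
        const int count = (n - offset < chunk) ? (n - offset) : chunk;
        if (count <= 0) break;
        const size_t bytes = count * sizeof(float);
        // Operations within one stream run in order; different streams may overlap.
        cudaMemcpyAsync(d + offset, h + offset, bytes,
                        cudaMemcpyHostToDevice, streams[s]);
        addOne<<<(count + 255) / 256, 256, 0, streams[s]>>>(d + offset, count);
        cudaMemcpyAsync(h + offset, d + offset, bytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    // Stream synchronization: wait until every stream has drained.
    for (int s = 0; s < kStreams; ++s) cudaStreamSynchronize(streams[s]);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(elapsedMs, start, stop);

    bool ok = true;
    for (int i = 0; i < n; ++i) {
        if (h[i] != static_cast<float>(i) + 1.0f) { ok = false; break; }
    }

    for (int s = 0; s < kStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFreeHost(h);
    cudaFree(d);
    return ok;
}
```

Whether the copies and kernels actually overlap depends on the GPU's copy engines and on the chunk size, so the measured time is best compared against a single-stream run on the same hardware.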

CUDA debugging and profiling, cuDNN usage

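Learn how to debug CUDA programs in practice and optimize their performance, and work with the tools used for both. After studying cuDNN to maximize parallel processing performance, study the GEMM (General Matrix Multiplication) algorithm for efficient convolution and implement it yourself.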

[Lab] Using CUDA debugging tools
[Lab] Using CUDA visualization/profiling tools
[Lab] Implementing GEMM
[Lab] CUDA/cuDNN-based convolution layer
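
GEMM matters for convolution because a convolution is often lowered to a matrix product (for example via an im2col rearrangement of the input). The following is a minimal sketch of a shared-memory tiled GEMM for row-major matrices, computing C = A × B with A of size M×K and B of size K×N. The names `gemmTiled` and `gemm` and the tile size of 16 are assumptions for this illustration, not the lecture's implementation, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int TILE = 16;  // assumption: 16x16 threads per block

__global__ void gemmTiled(const float* A, const float* B, float* C,
                          int M, int N, int K) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    const int row = blockIdx.y * TILE + threadIdx.y;
    const int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    // Walk along the K dimension one tile at a time.
    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        const int aCol = t * TILE + threadIdx.x;
        const int bRow = t * TILE + threadIdx.y;
        // Out-of-range elements are loaded as zero so edge tiles stay correct.
        As[threadIdx.y][threadIdx.x] =
            (row < M && aCol < K) ? A[row * K + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] =
            (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // finish reading before the next tile overwrites
    }
    if (row < M && col < N) C[row * N + col] = acc;
}

// Hypothetical host wrapper: returns C (M x N) for row-major A and B.
std::vector<float> gemm(const std::vector<float>& A,
                        const std::vector<float>& B, int M, int N, int K) {
    float* dA = nullptr;
    float* dB = nullptr;
    float* dC = nullptr;
    cudaMalloc(&dA, M * K * sizeof(float));
    cudaMalloc(&dB, K * N * sizeof(float));
    cudaMalloc(&dC, M * N * sizeof(float));
    cudaMemcpy(dA, A.data(), M * K * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), K * N * sizeof(float), cudaMemcpyHostToDevice);

    const dim3 block(TILE, TILE);
    const dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE);
    gemmTiled<<<grid, block>>>(dA, dB, dC, M, N, K);

    std::vector<float> C(M * N);
    cudaMemcpy(C.data(), dC, M * N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

Each tile of A and B is loaded from global memory once per block and then reused TILE times from shared memory, which is the main saving over a naive kernel where every thread reads its whole row and column from global memory.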

Implementing deep learning with CUDA

Implement MaxPooling, Activation, and FC layers, among others, using CUDA and cuDNN, then integrate them to implement YOLO v2 for object detection.

[Lab] CUDA/cuDNN-based MaxPooling layer
[Lab] CUDA/cuDNN-based Activation layer
[Lab] CUDA/cuDNN-based FullyConnected layer
[Lab] Implementing YOLO v2
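
As a rough idea of what hand-written layers can look like in plain CUDA (without cuDNN), here is a minimal sketch of a 2×2, stride-2 max pooling kernel followed by a ReLU activation on a single-channel image. The names `maxPool2x2`, `relu`, and `poolThenRelu` are hypothetical; the sketch assumes even height and width (a trailing odd row or column would be dropped) and omits error checking.

```cuda
#include <cuda_runtime.h>
#include <vector>

// 2x2 max pooling with stride 2 on a single-channel H x W image.
__global__ void maxPool2x2(const float* in, float* out, int H, int W) {
    const int outH = H / 2;
    const int outW = W / 2;
    const int ox = blockIdx.x * blockDim.x + threadIdx.x;
    const int oy = blockIdx.y * blockDim.y + threadIdx.y;
    if (ox >= outW || oy >= outH) return;

    const int iy = oy * 2;
    const int ix = ox * 2;
    float m = in[iy * W + ix];
    m = fmaxf(m, in[iy * W + ix + 1]);
    m = fmaxf(m, in[(iy + 1) * W + ix]);
    m = fmaxf(m, in[(iy + 1) * W + ix + 1]);
    out[oy * outW + ox] = m;
}

// Element-wise ReLU activation, applied in place.
__global__ void relu(float* data, int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = fmaxf(data[i], 0.0f);
}

// Hypothetical host wrapper: pools the image, applies ReLU, and
// returns the (H/2) x (W/2) result in row-major order.
std::vector<float> poolThenRelu(const std::vector<float>& image, int H, int W) {
    const int outH = H / 2;
    const int outW = W / 2;
    const int outN = outH * outW;

    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, H * W * sizeof(float));
    cudaMalloc(&dOut, outN * sizeof(float));
    cudaMemcpy(dIn, image.data(), H * W * sizeof(float), cudaMemcpyHostToDevice);

    const dim3 block(16, 16);
    const dim3 grid((outW + block.x - 1) / block.x,
                    (outH + block.y - 1) / block.y);
    maxPool2x2<<<grid, block>>>(dIn, dOut, H, W);
    // Same (default) stream, so ReLU runs after pooling has finished.
    relu<<<(outN + 255) / 256, 256>>>(dOut, outN);

    std::vector<float> out(outN);
    cudaMemcpy(out.data(), dOut, outN * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return out;
}
```

A real detection network works on batched, multi-channel tensors, so a production layer would add batch and channel indices (or call the corresponding cuDNN routines); the indexing pattern per output element stays the same.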

Implementing custom layers based on CUDA/cuDNN

Building on the CUDA/cuDNN implementation techniques, implement your own custom layers that can be applied in Caffe, Caffe2, and TensorFlow respectively.

[Lab] Implementing a Caffe custom layer (CPP, CUDA, cuDNN)
[Lab] Implementing a Caffe2 custom operator (CPP, CUDA, cuDNN)
[Lab] Implementing a TensorFlow custom operator (CPP, CUDA, cuDNN)

