Summary
This chapter explained how a kernel function is launched with multiple blocks, each containing multiple threads, and showed how to choose these two launch parameters when the total number of threads is large. It also described the hierarchical memory architecture available to CUDA programs: memory closer to the executing threads is fast, and memory becomes slower the farther it is from them. When threads need to communicate with one another, CUDA provides shared memory, through which threads of the same block can exchange data. When multiple threads access the same memory location, those accesses must be synchronized, otherwise the final result will not be as expected; we saw how atomic operations accomplish this synchronization. Parameters that remain constant throughout kernel execution can be stored in constant memory for a speed-up. When CUDA programs exhibit a certain communication pattern...
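The block/thread launch configuration for a large number of elements can be sketched as follows. This is a minimal illustration, not code from the chapter; the kernel name `addOne` and the problem size are assumptions:

```cuda
#include <cstdio>

// Hypothetical element-wise kernel: each thread handles one index.
__global__ void addOne(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                 // guard: the last block may have spare threads
        data[i] += 1;
}

int main() {
    const int N = 1 << 20;            // one million elements (assumed size)
    const int threadsPerBlock = 256;  // a common, device-friendly choice
    // Round up so every element gets a thread, even when N is not a
    // multiple of threadsPerBlock.
    const int blocks = (N + threadsPerBlock - 1) / threadsPerBlock;

    int *d_data;
    cudaMalloc(&d_data, N * sizeof(int));
    cudaMemset(d_data, 0, N * sizeof(int));
    addOne<<<blocks, threadsPerBlock>>>(d_data, N);
    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}
```

The rounded-up grid plus the `i < n` bounds check is the standard pattern for covering an arbitrary number of elements with a fixed block size.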
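Shared-memory communication within a block and atomic synchronization on global memory can both be seen in a sum reduction. This sketch is an assumed example (kernel name `blockSum`, block size 256), not the chapter's own listing:

```cuda
#include <cstdio>

// Threads of one block cooperate through shared memory; then thread 0
// atomically adds the block's partial sum to the global result.
__global__ void blockSum(const int *in, int *out, int n) {
    __shared__ int cache[256];           // visible only to this block
    int i   = blockIdx.x * blockDim.x + threadIdx.x;
    int tid = threadIdx.x;
    cache[tid] = (i < n) ? in[i] : 0;
    __syncthreads();                     // all writes done before any reads

    // Tree reduction within the block.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            cache[tid] += cache[tid + stride];
        __syncthreads();
    }
    if (tid == 0)
        atomicAdd(out, cache[0]);        // serializes the conflicting update
}

int main() {
    const int N = 1000;
    int h_in[N], h_out = 0;
    for (int i = 0; i < N; ++i) h_in[i] = 1;

    int *d_in, *d_out;
    cudaMalloc(&d_in, N * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in, N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, sizeof(int));

    blockSum<<<(N + 255) / 256, 256>>>(d_in, d_out, N);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    printf("sum = %d\n", h_out);         // prints "sum = 1000"

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Without `atomicAdd`, the blocks' concurrent updates to `*out` would race and the total would be unpredictable, which is exactly the synchronization problem described above.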
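Storing unchanging parameters in constant memory might look like the following sketch, assuming a polynomial whose coefficients are fixed for the whole kernel launch (the names `coeff` and `polyEval` are illustrative):

```cuda
#include <cstdio>

// Coefficients never change during the kernel, so they live in constant
// memory, which is cached and broadcast when all threads read the same address.
__constant__ float coeff[4];

__global__ void polyEval(const float *x, float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = x[i];
        // Horner's rule: coeff[0] + v*coeff[1] + v^2*coeff[2] + v^3*coeff[3]
        y[i] = coeff[0] + v * (coeff[1] + v * (coeff[2] + v * coeff[3]));
    }
}

int main() {
    float h_coeff[4] = {1.0f, 0.0f, 1.0f, 0.0f};      // polynomial 1 + x^2
    cudaMemcpyToSymbol(coeff, h_coeff, sizeof(h_coeff));  // host -> constant

    const int N = 8;
    float h_x[N], h_y[N];
    for (int i = 0; i < N; ++i) h_x[i] = (float)i;

    float *d_x, *d_y;
    cudaMalloc(&d_x, N * sizeof(float));
    cudaMalloc(&d_y, N * sizeof(float));
    cudaMemcpy(d_x, h_x, N * sizeof(float), cudaMemcpyHostToDevice);

    polyEval<<<1, N>>>(d_x, d_y, N);
    cudaMemcpy(h_y, d_y, N * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < N; ++i)
        printf("%.0f ", h_y[i]);    // prints "1 2 5 10 17 26 37 50"

    cudaFree(d_x);
    cudaFree(d_y);
    return 0;
}
```

Note that `__constant__` variables are written from the host with `cudaMemcpyToSymbol` and are read-only inside the kernel.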