Introduction to Parallel Programming and CUDA with Sample Code

To give an example, let’s say we have an array that contains thousands of floating-point integers and each value needs to be run through a lengthy algorithm. Instead of running each value through the algorithm consecutively (i.e. one at a time), parallelism allows multiple values to be processed simultaneously (i.e. running many values through the algorithm at the same time), reducing overall processing time and producing fast and accurate results.

There are some restrictions with using parallelism and not every program can be done in parallel. For instance, let’s say we have that same program from before but this time as we process a value we want to then check the currently processed value against all the previously calculated values in the array, before going to the next. We can confidently say all previous values in the array have been processed and are available to be accessed for the check. If we tried to do this in parallel, we could have incorrect data because multiple values are calculated at the same time and some may be ready for checking while others are not. Extra checks and steps are needed to prevent these types of concurrency issues. However, the results could still prove to be worth the extra steps.

One of the major breakthroughs in parallel programming technology today goes beyond the scope of just multi-core CPU’s. Although they do offer a lot more power and potential than single-core units, another common computer component, the GPU, offers even more power, and NVIDIA’s flagship product, called CUDA, offers this technology to all developers easily and for free.

CUDA was developed by NVIDIA to provide simple access to GPGPU (General-Purpose Computation on Graphics Processing Units) and parallel computing on their own GPU’s. The logic behind the idea is that GPU’s have much more processing power than CPU’s and have numerous cores that operate in parallel to run intensive graphics operations. By allowing developers to utilize this power for their own projects, it can create fast solutions to some heavy and time-consuming programs, specifically those that run the same process recursively and independently of other processes.

The learning curve is not very steep for most developers. CUDA accomplishes making GPGPU easily usable by adding functionality to the standard C and C++ programming languages. This allows for fast adoption by almost any programmer and helps with cross-platform integration.

To get started with CUDA, you will need a recent NVIDIA GPU (Geforce 8 series and beyond, or you can check on the NVIDIA website to see which GPU’s are CUDA enabled). CUDA works on Windows, Mac OSX, and certain Linux distributions. You will need to download and install the developer drivers, the CUDA toolkit, and the CUDA SDK off the Nvidia website, respectively.

NVIDIA provides an installation guide on their website which provides more details about the installation process, as well as a method of checking the installation to see if it is working.