Research:
Side channel analysis on GPUs
A side-channel attack (SCA) exploits the physical implementation of a cryptographic system rather than inherent theoretical weaknesses of the encryption algorithm. One of the most widely known SCA techniques, Differential Power Analysis, correlates intermediate results of a particular encryption algorithm with the recorded power consumption to recover otherwise secret information such as the encryption key. In our work, we use the GPU as a co-processor for encryption and perform SCA on it, collecting tens of thousands of power traces from the GPU to extract the secret key. We also use the GPU to analyze the resulting large dataset of power traces.
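As a rough illustration of the idea, here is a minimal sketch of a correlation-based power analysis on simulated data. It is not the GPU attack described above: the 4-bit S-box (borrowed from the PRESENT cipher), the Hamming-weight leakage model, the Gaussian noise level, and all function names are assumptions chosen to keep the example small.

```python
import random

# PRESENT cipher 4-bit S-box, used only as a small stand-in for a real cipher round.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hamming_weight(x):
    """Number of set bits; assumed here to be proportional to power draw."""
    return bin(x).count("1")


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def simulate_traces(secret_key, n_traces, noise_sigma, rng):
    """Simulated one-sample 'traces': leakage of SBOX[p ^ key] plus noise."""
    plaintexts = [rng.randrange(16) for _ in range(n_traces)]
    power = [hamming_weight(SBOX[p ^ secret_key]) + rng.gauss(0.0, noise_sigma)
             for p in plaintexts]
    return plaintexts, power


def recover_key(plaintexts, power):
    """Score every key guess by correlating predicted and measured leakage."""
    scores = []
    for guess in range(16):
        predicted = [hamming_weight(SBOX[p ^ guess]) for p in plaintexts]
        scores.append(pearson(predicted, power))
    best = max(range(16), key=lambda g: scores[g])
    return best, scores


rng = random.Random(0)          # fixed seed so the run is repeatable
SECRET_KEY = 0xA                # hypothetical 4-bit key
plaintexts, power = simulate_traces(SECRET_KEY, 2000, 1.0, rng)
recovered_key, scores = recover_key(plaintexts, power)
print("recovered key matches:", recovered_key == SECRET_KEY)
```

Each key guess is scored independently of the others, which is the kind of structure that lends itself to parallel analysis of large trace sets.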
'Hetero-Mark': Benchmarking HSA with OpenCL 2.0
Today's most commonly used benchmark suites, such as Rodinia and Parboil, do not fully utilize the advanced heterogeneous features that exist in emerging hardware and runtime models. We aim to develop a benchmark suite, 'Hetero-Mark', with real-world applications that leverage the features of the HSA platform as well as OpenCL 2.0, drawn from various domains including signal processing, cybersecurity, machine learning, and big data. Every application in the suite includes an OpenCL 1.2 implementation as a baseline, together with OpenCL 2.0 and HSA 1.0 implementations.
Floating point arithmetic on heterogeneous architectures
I have worked on finding potential sources of reliability and portability deficiencies in parallel scientific code, specifically code written in OpenCL, that arise because floating-point behavior depends on the underlying heterogeneous architecture. These issues are important for ensuring reliability in medical applications.
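One common source of such differences is that floating-point addition is not associative, so the order in which a parallel reduction combines values can change the result. The sketch below is only an illustration in plain Python, not OpenCL code from this work; the input values and function names are chosen for the example. It compares a left-to-right sum with a tree-shaped (pairwise) sum of the kind a parallel device might perform.

```python
import math


def sequential_sum(values):
    """Left-to-right accumulation, as a simple serial loop would do."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values):
    """Tree-shaped reduction: sum each half, then add the two partial sums."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


# Illustrative input: large values that cancel, mixed with small ones.
data = [1e16, 1.0, -1e16, 1.0]

serial_result = sequential_sum(data)   # ((1e16 + 1) - 1e16) + 1
tree_result = pairwise_sum(data)       # (1e16 + 1) + (-1e16 + 1)
exact_result = math.fsum(data)         # correctly rounded reference sum

print(serial_result, tree_result, exact_result)  # prints: 1.0 0.0 2.0
```

Both orderings are valid IEEE 754 double-precision computations, yet they disagree with each other and with the correctly rounded sum, which is why the reduction order chosen by a given device or runtime matters for portability.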
3D CT reconstruction using GPUs
I have worked on accelerating 3D cone-beam reconstruction using CUDA and OpenCL. An algorithm that reconstructs cone-beam computed tomography volumes from two-dimensional projections is implemented on GPUs. The implementation takes slices of the target, weights the projection data, filters the weighted data, and then backprojects it to create the final three-dimensional reconstruction. This is implemented on two types of hardware: a CPU and a heterogeneous system combining a CPU and a GPU. The CPU codes in C and MATLAB are compared with the heterogeneous versions written in CUDA-C and OpenCL. The relative performance is tested and evaluated on mathematical phantoms as well as mouse scan data.
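The weighting and filtering stages can be sketched as follows. This is a minimal pure-Python illustration, not the CUDA or OpenCL implementation: it assumes a flat detector, a cosine pre-weight of the form D / sqrt(D^2 + u^2 + v^2) with D a hypothetical source-to-detector distance, and a standard discrete ramp (Ram-Lak) kernel applied along detector rows. Backprojection is omitted, and all names and parameters are illustrative.

```python
import math


def cosine_weight(u, v, source_to_detector):
    """Pre-weight for a detector pixel at offset (u, v) from the detector center."""
    d = source_to_detector
    return d / math.sqrt(d * d + u * u + v * v)


def ramp_kernel(half_width, tau):
    """Discrete ramp (Ram-Lak) kernel sampled at spacing tau."""
    kernel = []
    for n in range(-half_width, half_width + 1):
        if n == 0:
            kernel.append(1.0 / (4.0 * tau * tau))
        elif n % 2 == 0:
            kernel.append(0.0)
        else:
            kernel.append(-1.0 / (math.pi * n * tau) ** 2)
    return kernel


def filter_row(row, kernel, tau):
    """Zero-padded convolution of one detector row with the ramp kernel."""
    half_width = (len(kernel) - 1) // 2
    out = []
    for i in range(len(row)):
        acc = 0.0
        for k in range(len(row)):
            offset = i - k
            if abs(offset) <= half_width:
                acc += row[k] * kernel[offset + half_width]
        out.append(tau * acc)
    return out


def weight_and_filter(projection, du, dv, source_to_detector):
    """Apply the cosine pre-weight, then ramp-filter every detector row."""
    n_rows, n_cols = len(projection), len(projection[0])
    kernel = ramp_kernel(n_cols - 1, du)
    result = []
    for r in range(n_rows):
        v = (r - (n_rows - 1) / 2.0) * dv
        weighted = []
        for c in range(n_cols):
            u = (c - (n_cols - 1) / 2.0) * du
            weighted.append(projection[r][c] * cosine_weight(u, v, source_to_detector))
        result.append(filter_row(weighted, kernel, du))
    return result


# Tiny synthetic projection (3 rows x 5 columns of ones), for illustration only.
demo_projection = [[1.0] * 5 for _ in range(3)]
demo_filtered = weight_and_filter(demo_projection, du=1.0, dv=1.0,
                                  source_to_detector=100.0)
```

Every output pixel in both stages depends only on its own detector row, so the work can be distributed across many threads, which is what makes these stages a natural fit for a GPU.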