2/21/2023

OpenCL benchmark tool (Windows)

Post-petascale programming tools and applications, CREST, JST

The recent trend in hardware towards multi-core and GPU-accelerated heterogeneous environments creates a big challenge for algorithm designers. In particular, graph algorithms are not well suited for SIMD architectures due to their complexity and branching. Moreover, parallelization of such algorithms in general always poses a challenge. My research explores new ways of speeding up these algorithms by utilizing modern architectures and novel parallel algorithms. I am interested in all types of new parallel, low-power architectures and the methods of programming them. In my opinion, this research can have a great impact now and in the near future because of the rapidly increasing parallelism of computer hardware. It combines my background in parallel computing, GPU programming and supercomputing with important yet hard fundamental problems in computer science, such as combinatorial optimization and AI. The main goal of my work was to investigate the impact of new parallel architectures such as Xeon Phi and GPUs, and of platforms like OpenCL and CUDA, on problems which are typically very hard to parallelize.

It should be expected that computer hardware will become even more complex, certainly not simpler. Therefore, novel parallel algorithms are required. In the modern world, limited by power and space, parallelization is the only way to run algorithms faster, and graph algorithms are no exception. It is expected that by 2020 the fastest supercomputer in the world will have billions of cores, and personal computers will be as powerful as the currently fastest machines with thousands of processors.

Parallel GPU-accelerated discrete optimization (Traveling Salesman Problem)

Research was focused on the classic TSP problem. GPU usage greatly decreases the time needed to optimize a route, but requires a complicated and well-tuned implementation. Using even a simple 2-opt optimization technique, the time needed to perform a single local search operation can be decreased approximately 5 to 45 times compared to a similar parallel CPU version. My parallel TSP solver is available here.

Logo - High Performance Parallel GPU-based TSP Solver (Screenshot)
Download latest source code (OSX, Linux - CUDA/OpenCL/SSE/AVX/Xeon Phi support, v. 0.62)
Download Windows binary (CUDA/OpenCL, multi-GPU support) (Video), based on the Linux code - CUDA/OpenCL/SSE/AVX/Xeon Phi, v. 0.3
CUDA + OpenGL Windows demo application (as presented at SC12)
FlopsCUDA - CUDA Benchmarking Tool (with Kepler GPU support); Download: Windows executable (Screenshot), Linux source (Screenshot)

ULP-HPC: Ultra Low-Power, High-Performance Computing via Modeling and Optimization of Next Generation HPC Technologies, CREST, JST

As a PhD student I was a member of the ULP-HPC project. My PhD dissertation described a very effective parallelization scheme for a novel, approximate Monte Carlo Tree Search (MCTS) algorithm. MCTS is a method for making optimal decisions in artificial intelligence (AI) problems, typically move planning in combinatorial games. It combines the generality of random simulation with the precision of tree search. I thoroughly investigated the problems related to parallelizing the algorithm and successfully implemented a parallel multi-GPU version of the MCTS algorithm using CUDA on the TSUBAME 2.0 supercomputer. I was able to scale it up to 256 GPUs and 2048 CPUs, showing a tremendous improvement over the sequential program and any other existing work at that time.
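The selection step is where MCTS balances random simulation against tree search. As a minimal illustration, here is the standard UCB1 rule commonly used for that step; this is a generic sketch, not necessarily the exact variant used in the dissertation, and the function name and signature are my own:

```cpp
#include <cmath>
#include <limits>

// UCB1 score for the MCTS selection phase: pick the child maximizing
// (mean reward) + (exploration bonus). Unvisited children score +infinity,
// so every move gets simulated at least once.
double ucb1(double total_reward, int child_visits, int parent_visits,
            double c = std::sqrt(2.0)) {
    if (child_visits == 0)
        return std::numeric_limits<double>::infinity();
    double mean = total_reward / child_visits;                   // exploitation
    double bonus = c * std::sqrt(std::log(static_cast<double>(parent_visits))
                                 / child_visits);                // exploration
    return mean + bonus;
}
```

In a parallel MCTS the contention point is exactly these visit and reward counters, which is one reason multi-GPU parallelization of the algorithm is non-trivial.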
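The 2-opt local search mentioned in the TSP section can be sketched as follows. This is a plain sequential CPU version for clarity (the GPU implementation referenced above parallelizes the evaluation of the candidate moves); the type and function names are illustrative, not taken from the solver's source:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Point { double x, y; };

double dist(const Point& a, const Point& b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

// Total length of a closed tour visiting pts in the order given by tour.
double tour_length(const std::vector<Point>& pts, const std::vector<int>& tour) {
    double len = 0.0;
    for (std::size_t i = 0; i < tour.size(); ++i)
        len += dist(pts[tour[i]], pts[tour[(i + 1) % tour.size()]]);
    return len;
}

// One full 2-opt pass: for every pair of tour edges (i,i+1) and (j,j+1),
// reverse the segment between them if that shortens the tour.
// Returns true if any improving move was applied.
bool two_opt_pass(const std::vector<Point>& pts, std::vector<int>& tour) {
    bool improved = false;
    int n = static_cast<int>(tour.size());
    for (int i = 0; i < n - 1; ++i) {
        for (int j = i + 2; j < n; ++j) {
            if (i == 0 && j == n - 1) continue;  // same edge via wrap-around
            double before = dist(pts[tour[i]], pts[tour[i + 1]]) +
                            dist(pts[tour[j]], pts[tour[(j + 1) % n]]);
            double after  = dist(pts[tour[i]], pts[tour[j]]) +
                            dist(pts[tour[i + 1]], pts[tour[(j + 1) % n]]);
            if (after + 1e-12 < before) {
                std::reverse(tour.begin() + i + 1, tour.begin() + j + 1);
                improved = true;
            }
        }
    }
    return improved;
}
```

Each candidate move needs only the four edge lengths above, which is what makes the operation a good fit for massively parallel evaluation on a GPU.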