oneAPI – The Cross-Architecture, Multi-Vendor Path to Accelerated Computing

Accelerator Adoption Will Thrive With Software Standardization

Accelerator technologies are receiving more attention throughout the computing infrastructure, from the endpoint to the data center. User needs, ranging from managing the explosion of data to time-critical business processes, have driven this demand for exponentially greater computation within a slowly growing energy budget.

While the conversation for the past decade focused on the use of programmable graphics accelerators (GPUs), accelerator architectures today are far more diverse. Specialized accelerators for artificial intelligence (AI), high-performance computing (HPC), cryptography, data movement and IO, and telecommunications, whether currently available or in development, use a diverse set of architectures: superscalar, vector, dataflow/spatial, matrix, as well as emerging neuromorphic and quantum. Beyond architectural diversity, accelerator implementations also vary from traditional IO-connected devices to coherent memory devices to devices tightly integrated into the CPU complex.

A key challenge to mass adoption of accelerators is the process, time, cost, and maintenance of developing accelerator software. A key advantage of accelerators is that they are designed to perform a specific task more efficiently. While degrees of general programmability vary – from GPUs programmable for data-parallel compute to purpose-built ASICs for a specific task like data hashing or deep learning – the programming tools and models for these accelerators are mostly specific to each individual device. While device specialization allows optimized performance for a specific task, the investment made by developers for one accelerator may not port to others.

New technology often achieves initial success with vendor-specific, closed architectures, but for the technology to become pervasive, multi-vendor standards emerge and eventually supersede the proprietary pioneer. Many factors contribute to this natural cycle, including the economic consequences of vendor lock-in; the technical and economic risks of relying on a single vendor; and the cost of building, deploying, and maintaining solutions. This is further complicated by the need for employees with rare skills. Customers, end-users, software developers, and communities have approached Intel seeking a standardized solution for accelerated computing.

What Is oneAPI?

oneAPI is a cross-industry, open, standards-based unified programming model that delivers a common developer experience across processor and accelerator architectures.

oneAPI consists of three distinct elements: an open specification, open-source implementations, and Intel-enhanced implementations. The oneAPI specification, based on existing standards, was developed in open collaboration with community developers. Open-source implementations of oneAPI component specifications were created to enable fast adoption of oneAPI for new hardware architectures or software languages. oneAPI responds to the needs of developers to:

  • Avoid the economic and technical disadvantages of single-vendor, single-architecture software;
  • Be productive while delivering performance on diverse hardware; and,
  • Rely on a trustworthy development environment now and in the future.

The complete specification and links to unencumbered, open-source code repositories are available on the oneAPI community website. The oneAPI industry initiative furthers collaboration on the specification and on compatible oneAPI implementations across the ecosystem.

Intel’s oneAPI products implement the oneAPI specification and add other standards-based tools and components for software developers. Intel’s oneAPI products are available either as individual components or as easy-to-access toolkits, including the Intel® oneAPI Base Toolkit with all major oneAPI components, and domain-optimized toolkits for HPC, AI and analytics, Internet-of-Things (IoT), and advanced rendering.

Trustworthy Foundation Built on Open Standards

oneAPI leverages Intel’s rich heritage of high-performance compilers and libraries to build an open standard which reduces the risks associated with deploying accelerated computing. For example, the Intel® oneAPI Math Kernel Library (oneMKL) specification incorporates decades of broad use and refinement of the Intel® Math Kernel Library, the most widely used high-performance math library in the industry according to the Evans Data 2021 Developer Survey [1]. Moreover, oneAPI is interoperable with existing HPC programming standards like Fortran, C/C++, OpenMP, and MPI, as well as Python and a rich set of optimized Python libraries to allow easy integration with legacy code. Developers also have a set of advanced debuggers and profilers to analyze performance and correctness and assist with offloading existing code to accelerators. There are even tools [2] to aid migration of proprietary CUDA code to the open SYCL language. These advantages helped oneAPI receive the HPCwire Readers’ Choice Award for Best HPC Programming Tool or Technology [3] in November 2021. By adhering to open standards, oneAPI not only eases acceleration of existing code but also provides developers confidence that their code will run on future architectures.

Freedom From Proprietary Lock-In

oneAPI offers an open alternative to proprietary “walled gardens” that is free from lock-in to a single vendor or hardware architecture. It gives the hardware and software developer community the ability to deliver new accelerators without the burden of establishing yet another developer ecosystem. For example, Fujitsu used the Intel® oneAPI Deep Neural Network Library (oneDNN) [4] to achieve leading MLPerf benchmark performance [5] on the Fugaku supercomputer. Plus, oneAPI allows system developers to choose the best architecture for their specific problem, and to benefit from accelerator innovations across a range of hardware solutions. Developers can spend their time creating the next scientific or market breakthrough instead of rewriting software for the next hardware platform.

Productivity and Performance to Realize the Full Value of Cutting-Edge Accelerated Hardware

oneAPI simplifies software development and lowers development and maintenance costs by allowing developers to maintain a single codebase that delivers performance on multiple accelerator architectures. Optimizing compilers and library implementations of the APIs allow developers to take advantage of the innovative features in the latest hardware.

A key to success for oneAPI is the choice to adopt the SYCL language developed by the Khronos Group. SYCL extends modern C++ with support for heterogeneous parallelism, unlike languages designed for a single accelerator architecture. oneAPI also provides standard interfaces to established libraries like oneMKL. Third-party developers have successfully interfaced vendor-optimized accelerator libraries for non-Intel CPUs and GPUs to oneAPI using common open-source development practices.

Finally, the open nature of oneAPI provides a path to productive innovation. From the open interfaces to the open-source implementations and infrastructure, oneAPI empowers developers to independently innovate and develop new language frontends and new hardware backends through productive, modern, and open methods.

oneAPI offers a faster path to deploying performant applications across a variety of accelerators, unlike hardware-specific models limited to one architecture or interpreted languages which trade performance for productivity.

In Summary

Technologies need standards to scale from niche to volume. The developer community asked for an alternative to proprietary programming models for accelerated computing, and oneAPI delivered. The oneAPI initiative provides the open, trustworthy path to developer choice.

In the upcoming weeks, we will publish papers, blogs, and articles that provide deeper looks into oneAPI, its technical benefits, and the opportunity to engage with oneAPI as a developer or industry adopter.


[1] Global Development Survey 2021, Volume 1, Evans Data Corporation
[2] Examples: Intel® DPC++ Compatibility Tool and https://www.iwocl.org/wp-content/uploads/iwocl-2019-dhpcc-tobias-stauber-resyclator-transforming-cuda-C-source-code-into-sycl.pdf
[3] HPCwire Reveals Winners of the 2021 Readers’ and Editors’ Choice Awards During SC21
[4] HPC and AI Initiatives for Supercomputer Fugaku and Future Prospects; A Deep Dive into a Deep Learning Library for the A64FX Fugaku CPU
[5] MLPerf benchmark performance
