Do you want to explore AI (Artificial Intelligence) on the edge, and are you looking for suitable embedded hardware for it?
Recently, I got interested in AI on the edge and tried to find suitable hardware to start experimenting. I quickly learned that thousands of people search Google for AI hardware, but the information is scattered, and I could not find an article or blog that lists the useful AI hardware one can choose from. So, I decided to write one.
Background on Computing for AI
AI algorithms take a lot of computing power to execute, and that is what drives the need for hardware accelerators. The compute power required, however, depends largely on the use case at hand.
More and more companies are building AI chips because AI can solve complex problems once considered impossible. Today, many options are already available on the market for exploring AI on the edge without spending too much money.
The options listed below range from complete edge computing boards to add-on AI accelerators that connect to another embedded computing board over USB.
List of popular AI Hardware:
1. Intel® Neural Compute Stick 2 ($79)
It’s built on the Intel® Movidius™ Myriad™ X VPU, which features the neural compute engine, a dedicated hardware accelerator for deep neural network inference. With more compute cores than the original version and access to the Intel® Distribution of OpenVINO™ toolkit, the Intel® NCS2 delivers approximately an 8x performance boost over the previous generation.
The Intel Distribution of OpenVINO™ toolkit is the default software development kit to optimize performance, integrate deep learning inference, and run deep neural networks (DNN) on Intel® Movidius™ Vision Processing Units (VPU).
The Intel® Distribution of OpenVINO™ toolkit includes two sets of optimized models that can expedite development and improve image processing pipelines for Intel® processors. Use these models for development and production deployment without the need to search for or to train your own models. Full list of models at: Pretrained Models
2. Nvidia AI Platforms ($129+)
NVIDIA Jetson: The AI platform for autonomous everything.
NVIDIA® Jetson™ systems provide the performance and power efficiency to run autonomous machine software faster and with less power. Each is a complete System-on-Module (SOM) with CPU, GPU, PMIC, DRAM, and flash storage, saving development time and money.
Jetson is also extensible. Just select the SOM that’s right for the application, and build the custom system around it to meet its specific needs.
Jetson Xavier NX
In November 2019, NVIDIA launched a compact (credit-card sized, 45 mm x 69.6 mm) system-on-module for Artificial Intelligence applications on the edge.
It uses a 6-core NVIDIA Carmel ARM®v8.2 64-bit CPU and a 384-core NVIDIA Volta™ GPU with 48 Tensor Cores. NVIDIA quotes performance of 21 TOPS at 15 W and 14 TOPS at 10 W.
No separate carrier board has been launched for the Jetson Xavier NX, so one needs to use the NVIDIA Jetson Nano carrier board.
Jetson Nano is based on NVIDIA Maxwell™ architecture with 128 NVIDIA CUDA® cores.
Jetson TX2 is based on NVIDIA Pascal™ architecture with 256 NVIDIA CUDA cores.
Jetson AGX Xavier is based on NVIDIA Volta™ architecture with up to 512 NVIDIA CUDA cores and up to 64 Tensor cores.
All three generations of Jetson solutions are supported by the same software stack, enabling companies to develop once and deploy everywhere. The Jetson platform is supported by the JetPack SDK, which includes the board support package (BSP), Linux operating system, NVIDIA CUDA®, and compatibility with third-party platforms. The DeepStream SDK enables developers to quickly build and deploy efficient video analytics pipelines on Jetson.
3. Google Coral Dev Board
Google has launched a development board to quickly prototype on-device ML products. Scale from prototype to production with a removable system-on-module (SoM).
It has an on-board Edge TPU coprocessor capable of performing 4 trillion operations per second (4 TOPS), using 0.5 watts for each TOPS (2 TOPS per watt). For example, it can execute state-of-the-art mobile vision models such as MobileNet v2 at 400 FPS in a power-efficient manner.
You can see Edge TPU performance benchmarks here.
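To make those numbers a little more concrete, here is a minimal Python sketch of the arithmetic behind them. It uses only the figures quoted above (4 TOPS, 0.5 W per TOPS, 400 FPS for MobileNet v2). The function names are my own, and the energy-per-frame value assumes the chip draws its full peak power while running the model, so treat it as a rough upper bound rather than a measurement.

```python
# Figures quoted above for the Edge TPU (vendor peak numbers, not measurements).
PEAK_TOPS = 4.0          # tera-operations per second
WATTS_PER_TOPS = 0.5     # quoted power cost of each TOPS
MOBILENET_V2_FPS = 400   # quoted MobileNet v2 throughput


def peak_power_watts(tops: float, watts_per_tops: float) -> float:
    """Power drawn when the accelerator runs at its quoted peak throughput."""
    return tops * watts_per_tops


def frame_time_ms(fps: float) -> float:
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def energy_per_frame_mj(power_watts: float, fps: float) -> float:
    """Energy per frame in millijoules, ASSUMING constant peak power draw."""
    return power_watts / fps * 1000.0


if __name__ == "__main__":
    power = peak_power_watts(PEAK_TOPS, WATTS_PER_TOPS)
    print(f"Peak power:       {power:.1f} W")
    print(f"Efficiency:       {PEAK_TOPS / power:.1f} TOPS/W")
    print(f"Time per frame:   {frame_time_ms(MOBILENET_V2_FPS):.2f} ms")
    print(f"Energy per frame: {energy_per_frame_mj(power, MOBILENET_V2_FPS):.1f} mJ (upper bound)")
```

In other words, the quoted figures imply about 2 W at full load and roughly 2.5 ms per MobileNet v2 frame.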
4. BeagleBone® AI ($125)
BeagleBone® AI fills the gap between small SBCs and more powerful industrial computers. It is based on the Texas Instruments AM5729, giving developers access to a powerful SoC with the ease of BeagleBone® Black header and mechanical compatibility.
BeagleBone® AI makes it easy to explore how artificial intelligence (AI) can be used in everyday life via the TI C66x digital-signal-processor (DSP) cores and embedded-vision-engine (EVE) cores, supported through an optimized TIDL machine learning OpenCL API with pre-installed tools. It is focused on everyday automation in industrial, commercial and home applications.
- Dual Arm® Cortex®-A15 microprocessor subsystem
- 2 C66x floating-point VLIW DSPs
- 2.5MB of on-chip L3 RAM
- 2x dual Arm® Cortex®-M4 co-processors
- 4x Embedded Vision Engines (EVEs)
- 2x dual-core Programmable Real-Time Unit and Industrial Communication SubSystem (PRU-ICSS)
- 2D-graphics accelerator (BB2D) subsystem
- Dual-core PowerVR® SGX544™ 3D GPU
- IVA-HD subsystem (4K @ 15fps encode and decode support for H.264, 1080p60 for others)
5. GAPUINO GAP8 Developer Kit ($229)
GAP8 is a System-on-a-Chip that enables massive deployment of low-cost intelligent devices that capture, analyse, classify and act on a fusion of rich data sources such as images, sounds or vibrations. GAP8 integrates in a single device everything necessary to acquire data from sensors and to pre-process, analyse and act on it. This gives GAP8 an energy efficiency that is compatible with years of operation on batteries and a system cost that enables massive deployment of embedded, intelligent devices.
- Optimized for the execution of signal processing and machine learning algorithms on intelligent edge devices
- Autonomous operation using a battery or energy harvesting
- Fully programmable in C/C++
- 200 MOPS at 1mW
- Minimum 2µA standby current
- Integrated design results in low system cost
- >8 GOPS at a few tens of mW
Kendryte K210 Based Module & Development Boards
6. Sipeed modules & Boards ($8 Module, $25+ for Dev. Boards)
Sipeed Module: Built around Kendryte’s K210 AI chip, the module breaks out all of the K210’s pins and offers strong performance, a small size (25.4 x 25.4 mm) and a low price (<$8). It improves hardware design efficiency, reduces hardware design difficulty, and its shielded case increases resistance to interference.
- M1: 8MB SRAM on chip, 16MB Flash built in module
- M1W: M1 based with WiFi (ESP8285) module
- Dual-core 64-bit processor with hardware floating-point operation, up to 800MHz frequency (the highest supported frequency is based on the development board design)
- Built-in 8MB (6MB + 2MB) RAM, 16MB Flash
- Commonly used peripherals such as I2C, SPI, I2S, WDT, TIMER, RTC, UART, GPIO, DVP, DMAC, etc.
- Unique programmable IO array (i.e. FPIOA; peripherals can be mapped to any pin) for more flexible product design
- With machine vision capabilities
- With machine hearing and speech recognition, built-in voice processing unit (APU)
- With the KPU, a hardware accelerator for high-performance convolutional neural network operations
- Fast Fourier Transform Accelerator (FFT Accelerator)
- Hardware AES encryption and decryption, Secure Hash Algorithm Accelerator SHA256
7. Grove AI HAT for Edge Computing ($28.9)
The Grove AI HAT for Edge Computing is built around the Sipeed MAix M1 AI module with a Kendryte K210 processor inside. It’s a low-cost but powerful Raspberry Pi AI HAT that helps the Raspberry Pi run AI at the edge; it can also work independently for edge computing applications.
The MAix M1 is a powerful RISC-V 600MHz AI module that features a dual-core 64-bit CPU, a 230 GMULps 16-bit KPU (neural network processor), an FPU (floating-point unit) that supports double and single precision (DP & SP), and an APU (audio processor) that supports 8 microphones.
8. Kendryte K210 AI Development Board ($49.99)
The Kendryte K210 is a system-on-chip (SoC) that integrates machine vision and machine hearing. It uses TSMC’s ultra-low-power 28-nm advanced process with dual-core 64-bit processors for better power efficiency, stability and reliability. The SoC strives for “zero threshold” development and aims to be deployable in the user’s products in the shortest possible time, giving the product artificial intelligence. The Kendryte K210 is intended for the AI and IoT markets, but is also a high-performance MCU.
In Chinese, Kendryte means “researching intelligence”. The chip’s main application field is the Internet of Things, where it provides AI solutions that add intelligence to IoT devices.
- Machine Vision
- Machine Hearing
- Better low power vision processing speed and accuracy
- KPU high performance Convolutional Neural Network (CNN) hardware accelerator
9. Orange Pi AI Stick Lite ($19.99)
The Orange Pi AI Stick Lite uses the Lightspeeur® SPR2801S AI accelerator chip, which offers high energy efficiency (9.3 TOPS/W) and ultra-low power (2.8 TOPS at 300 mW). Its best peak performance is 5.6 TOPS at 100 MHz.
The chip has SDIO 3.0 and eMMC 4.5 interfaces and comes in a 7 mm x 7 mm BGA package.
The USB stick supports USB 3.0 and works with several operating systems: x86 Linux (Ubuntu 16.04), ARM Linux and ARM Android (ARM v7, ARM v8).
According to their website, it supports frameworks such as Caffe and PyTorch (TensorFlow will be supported soon).
10. Plai™ Plug 2803 ($69.99)
The Plai™ Plug features a low-power and high performance Lightspeeur® 2803S accelerator chip that provides on-device and real-time inferencing capability. Developer website: https://dev.gyrfalcontech.ai/introduction/
- Connector: USB 3.0 Type A plug
- Dimensions: 66.5 x 20.5 x 10.8mm
- Operating Temperature: 0° – 40°C
- Supported Frameworks: Caffe, TensorFlow
- TOPS: 16.8, Power: 700mW, Efficiency: 24 TOPS/Watt
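Several of the accelerators in this article are quoted with both a throughput and a power figure, so a rough efficiency ranking is easy to sketch in Python. The table below uses only the numbers quoted in this article. The Edge TPU's 2 W is derived from its quoted 4 TOPS at 0.5 W per TOPS, and the GAP8 entry is its "200 MOPS at 1mW" figure converted to TOPS and watts. The class and function names are my own. These are vendor peak numbers, and vendors may count operations at different precisions, so read the ranking as a first-pass guide rather than a benchmark.

```python
from typing import List, NamedTuple


class Accelerator(NamedTuple):
    name: str
    tops: float   # quoted throughput, tera-operations per second
    watts: float  # quoted power at that throughput


# Figures as quoted in this article; vendor peak numbers, not measurements.
QUOTED = [
    Accelerator("Jetson Xavier NX (15 W mode)", 21.0, 15.0),
    Accelerator("Jetson Xavier NX (10 W mode)", 14.0, 10.0),
    Accelerator("Google Edge TPU", 4.0, 2.0),        # 4 TOPS x 0.5 W per TOPS
    Accelerator("GAP8", 0.0002, 0.001),              # 200 MOPS at 1 mW
    Accelerator("Lightspeeur SPR2801S", 2.8, 0.3),   # 2.8 TOPS at 300 mW
    Accelerator("Lightspeeur 2803S", 16.8, 0.7),     # 16.8 TOPS at 700 mW
]


def tops_per_watt(acc: Accelerator) -> float:
    """Efficiency as quoted throughput divided by quoted power."""
    return acc.tops / acc.watts


def ranked(accs: List[Accelerator]) -> List[Accelerator]:
    """Return the accelerators sorted from most to least efficient."""
    return sorted(accs, key=tops_per_watt, reverse=True)


if __name__ == "__main__":
    for acc in ranked(QUOTED):
        print(f"{acc.name:30s} {tops_per_watt(acc):6.2f} TOPS/W")
```

The computed values line up with the efficiency figures the vendors quote themselves: about 9.3 TOPS/W for the SPR2801S and 24 TOPS/W for the 2803S.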
11. RK1808 AI Compute Stick ($86)
It is equipped with Rockchip’s RK1808 neural network processor. It has low power consumption and high performance, and can be applied to various application fields of artificial intelligence.
The RK1808 NPU delivers computing power of up to 3.0 TOPS.
Supported OS: Windows, Linux, macOS, Arm Linux.
Edge TPU BM1880
BM1880 is an SoC ASIC chip for deep learning inference acceleration, focused on edge applications. The BM1880 TPU can provide 1 TOPS peak performance for 8-bit integer operations.
You can read an interesting comparison of Intel Neural Compute Stick, Google Coral Board and Nvidia Jetson here in the article by Soon-Yau.
If you know of some other interesting AI hardware platforms, let me know and I will add them here.
Happy learning to you!