
Nvidia’s DGX-A100 Datacenter Server
Nvidia’s annual GTC Conference has come and gone, and with it some of the most exciting announcements ever packed into a single year. Nvidia and their technology partners have long since moved beyond being just a gaming card company; they are now a key component in any future-ready Enterprise datacenter. In fact, datacenter products have overtaken gaming systems as the majority of Nvidia’s revenue.
This year, Nvidia held their annual GPU Tech Conference online and made over 700 sessions available to download for free: https://www.nvidia.com/en-us/gtc/on-demand/?
At the heart of all the messaging was Nvidia’s latest technology release: the Ampere-based A100 GPU and the DGX-A100 GPU server.
The DGX-A100 is Nvidia’s one system for ALL AI infrastructure. The 3rd-generation NVLink & NVSwitch interconnect between GPUs is 2x faster than the previous generation, at rates of 600GB/sec, and the new TF32 tensor cores are up to 20x faster than the previous generation’s Volta-based GPUs.

Nvidia DGX-A100 Benefits ( Illustration © Nvidia )
We at High Availability have seen the benefits of modernizing datacenters with GPUs for the last decade, and are honored to be one of the few partners in the country certified and authorized as an Nvidia DGX Reseller Solution Provider.
It’s always a great day when we see customers look beyond the legacy way of doing business, and step into products that will drive them towards the future. We hear lots of questions and concerns around GPUs / AI / ML and new technologies. We want to walk with you down the right path of proven software and technologies to make your business more productive, more proactive, and more profitable.
Below are some of the topics we discuss most often with our customers.
The rise of GPU-Accelerated Databases
We are in an age of information, IoT, and mobile applications. Data Lakes and row counts are growing. Displaced from offices, we have adapted to a new way of working.
Rarely does a CIO/CTO imagine that a GPU, or a GPU server like the DGX-A100, would help their business analysts work more efficiently on ad hoc queries in PowerBI and Tableau, nor that it is possible to increase BI service performance by over 100x, with sub-second response times on data sources of multiple billions of records.
The ad-hoc nature of modern applications, along with web-scale databases surpassing billions of rows, means we need to rethink how we process queries and parallelism. Traditional OLAP RDBMS vendors license per CPU core or pair of cores, and many customers end up running their most important business applications on just a few licensed cores, or else paying millions of dollars in additional licensing (or fines, with certain audit-prone RDBMS companies).
With GPU-accelerated databases like Kinetica, you can run your queries with the power of tens of thousands of GPU cores and ingest data at TB/s rates against NVMe arrays, all at a fraction of the cost of additional licensing for your legacy RDBMS.

Kinetica Architecture ( Illustration © Kinetica )
Kinetica provides Single-Click Tableau integration and SQL-92 support over ODBC/JDBC, as well as Streaming Analytics and Location/Spatial Analytics.
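Because Kinetica speaks standard SQL-92 over ODBC/JDBC, analysts can query it from existing tooling with very little new code. Below is a minimal sketch using Python’s pyodbc; the DSN name, table, and columns are hypothetical placeholders for illustration only.

```python
# Minimal sketch: querying a GPU-accelerated Kinetica table over ODBC.
# Assumes a Kinetica ODBC driver is installed and a DSN named "kinetica"
# is configured; the table and column names below are hypothetical.
import pyodbc

conn = pyodbc.connect("DSN=kinetica", autocommit=True)
cursor = conn.cursor()

# A typical ad hoc BI aggregation over a multi-billion-row fact table.
cursor.execute("""
    SELECT region, COUNT(*) AS orders, SUM(sale_amount) AS revenue
    FROM retail_sales
    WHERE sale_date >= '2020-01-01'
    GROUP BY region
    ORDER BY revenue DESC
""")

for region, orders, revenue in cursor.fetchall():
    print(f"{region}: {orders} orders, {revenue:,.2f} revenue")

conn.close()
```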
Seeing is believing. Kinetica’s YouTube page has many videos showing real working examples and what their customers have done:
- How does Kinetica Actually Work: https://www.youtube.com/watch?v=rljkA3n_KX8
- High performance BI with Kinetica and Tableau: https://www.youtube.com/watch?v=7XliLU7ZC_s
Because GPUs and interconnects keep getting faster, Nvidia needed to modernize how data and storage are accessed. Their solution was the creation of “GPUDirect Storage”.

GPUDirect Storage ( Illustration © Nvidia )
If you have a large Apache Spark cluster and want to feed data into your Business Intelligence platform, Kinetica running on A100 GPUs with GPUDirect Storage can ingest many billions of rows per minute while simultaneously servicing requests. This gives you a truly active analytics platform.
Without GPUDirect Storage, customers waste CPU time driving their HBAs and bouncing data through system memory on its way to the GPU, stealing resources from the already overworked cores trying to process the data. With GPUDirect Storage, the storage system can DMA data directly into GPU memory.
Early benchmarks from Nvidia and the storage vendors H.A. sells have shown GPUDirect Storage-accelerated NFS throughput of over 92GB/second at just 16% CPU utilization! Compare this to the same system delivering around 33GB/second with NFS over RDMA (without GDS) at 99% CPU utilization, or 2GB/s with NFS over TCP without RDMA or GDS!
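For teams that want to experiment with the GPUDirect Storage path from Python, one option is RAPIDS’ KvikIO library, which wraps the cuFile API. A minimal sketch, assuming KvikIO and CuPy are installed and the file sits on a GDS-capable filesystem (the path is a made-up placeholder):

```python
# Minimal sketch: reading a file directly into GPU memory via GPUDirect Storage.
# Assumes the RAPIDS KvikIO and CuPy packages are installed and that
# /mnt/nvme/features.bin lives on a GDS-capable filesystem (hypothetical path).
import cupy as cp
import kvikio

gpu_buf = cp.empty(1_000_000, dtype=cp.float32)  # destination buffer in GPU memory

with kvikio.CuFile("/mnt/nvme/features.bin", "r") as f:
    # read() moves the bytes from storage into the GPU buffer, bypassing
    # the CPU bounce buffer when GDS is available on the system.
    nbytes = f.read(gpu_buf)

print(f"Read {nbytes} bytes into GPU memory")
```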
H.A. can help you design a turn-key, end-to-end solution with a DGX-A100 and an NVMe-based storage array for Kinetica and Tableau!
Making GPU clusters more flexible with Nvidia Multi-Instance GPUs (MiG) and better management tools
In the age of Virtualization and Containers, would you dedicate one or more bare-metal servers to each application in your business? No! Efficiency and increasing utilization matter!
Nvidia has updated their latest generation of GPUs with a native partitioning technology called Multi-Instance GPU, or MiG for short. Think of it as the GPU equivalent of VT-d/VT-x in the CPU world, or of the IOMMU for PCIe devices.

Nvidia A100 Workload Segmentation with Multi-Instance GPUs ( Illustration © Nvidia )
This week alone I have heard from two completely different types of organizations that they need GPUs, and that they “just want to stick a card in a server or workstation for each staff member”. This is the wrong way of thinking about GPUs in the Enterprise!
One, a prestigious university on the cutting edge of math, astrophysics, and biology. The other, a company focused on business transformation for their customers.
“How can we give resources to everyone if we don’t put a GPU into each of their workstations?” Easy: each Nvidia A100 can be segmented into up to 7 “GPU Compute Instances”. A DGX-A100, with its 8 physical A100 GPUs, can present up to 56 virtual “GPU Compute Instances”.
You have the flexibility to slice up the GPUs or not, depending on what resources your users and applications need, and you can do it on the fly!
- Many end users with PyTorch and TensorFlow? Let them share GPU Compute Instances with MiG
- Long-running Machine Learning training tasks? Give the workload its own dedicated GPUs
- ML Inference for your mobile or web application? Each inference engine instance can get a share of a GPU with MiG

Nvidia MiG Use Cases and Instance types based on “SM Shares” and Memory needed ( Illustration © Nvidia )

Flexible Utilization on DGX-A100 ( Illustration © Nvidia )
With Nvidia MiG you have the flexibility to:
- Expand GPU Access to more users
- Optimize GPU Utilization
- Run Simultaneous Mixed Workloads
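To make those “GPU Compute Instances” concrete: once an A100 has been partitioned, each MiG instance appears to CUDA applications as its own device. Below is a minimal sketch of pointing a PyTorch job at a single instance; the MiG identifier is a made-up placeholder (list the real ones on your system with `nvidia-smi -L`, and note the exact format depends on driver version).

```python
# Minimal sketch: running a PyTorch workload on a single MiG instance.
# The MIG identifier below is a made-up placeholder; list the real
# identifiers on your system with `nvidia-smi -L`.
import os

# Select one MiG Compute Instance before CUDA is initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-GPU-6f2a5d8c-0000-0000-0000-000000000000/1/0"

import torch

# The MiG instance appears as an ordinary CUDA device to the framework.
device = torch.device("cuda:0")
x = torch.randn(4096, 4096, device=device)
y = x @ x  # runs only on the slice of the A100 assigned to this instance
print(torch.cuda.get_device_name(0), y.shape)
```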
Convincing CPU cluster researchers and data scientists to try GPUs:

OpenACC Source Example
One of the easiest ways to ease your researchers into using GPUs, without completely refactoring their code, is to see if they will try compiling their code with OpenACC hints. Recent GCC versions have support for OpenACC directives built in.
OpenACC (free) allows you to add hints and directives to parallelize your code and offload certain sections to GPUs. The CPU functionality of the code remains unchanged, and if run on a CPU-only system it will function as normal. Most commonly, applications with basic linear algebra can see a 500% or better speedup just by applying basic hinting and letting the compiler decide what runs on the CPU versus the GPU.
OpenACC is now part of the HPC SDK ( https://developer.nvidia.com/hpc-sdk )
Nvidia also released a free application profiler and performance analysis tool called Nsight Systems, which will help your staff optimize and tune their code and algorithms on both CPU and GPU.
If any of your researchers use Gradient Boosting algorithms and libraries today, XGBoost can be used instead, and it is GPU-aware.
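As a rough illustration, moving an existing XGBoost workload onto the GPU can be as small a change as switching the tree method; the data below is synthetic and purely for demonstration, and it assumes an XGBoost build with CUDA support.

```python
# Minimal sketch: GPU-accelerated gradient boosting with XGBoost.
# Requires an XGBoost build with CUDA support; the data here is synthetic.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(42)
X = rng.standard_normal((100_000, 50)).astype(np.float32)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(np.int32)

dtrain = xgb.DMatrix(X, label=y)
params = {
    "objective": "binary:logistic",
    "tree_method": "gpu_hist",   # run the histogram algorithm on the GPU
    "max_depth": 8,
}
booster = xgb.train(params, dtrain, num_boost_round=100)
print(booster.eval(dtrain))
```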
Nvidia also released RAPIDS ( https://rapids.ai/ ) as a single common platform for data science, built around the most common tools used by data scientists. Packages include CUDA/GPU-aware libraries for data frames, linear algebra, graph analytics, signal processing, spatial/spatiotemporal analytics, memory management, parallelism, and plotting. This makes package management easy, since they are all tested against each other.
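A minimal sketch of the kind of pandas-style work RAPIDS moves onto the GPU with cuDF; the column names and values are invented for illustration and assume the cudf package is installed.

```python
# Minimal sketch: a pandas-style groupby running on the GPU with RAPIDS cuDF.
# Assumes the RAPIDS cudf package is installed; the data is synthetic.
import cudf

df = cudf.DataFrame({
    "sensor_id": [1, 2, 1, 3, 2, 1],
    "reading":   [0.5, 1.2, 0.7, 3.4, 1.1, 0.6],
})

# Same API shape as pandas, but the aggregation executes on the GPU.
summary = df.groupby("sensor_id").agg({"reading": ["mean", "max"]})
print(summary)
```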
Nvidia also provides tools, such as DeepOps ( https://github.com/NVIDIA/deepops ), to rapidly add GPUs to your existing clusters, schedulers, and orchestrators (SLURM / Kubernetes).
Remember, Nvidia has thousands of developers. Let their hard work benefit your team!
Letting your Data Scientists and Researchers get right to work:
All too often, we see “DIY Data Science” environments where customers want to throw a GPU into a few servers, and then have their data scientists or infrastructure staff try to build a home-grown solution to spin up VMs or containers, or install packages.
Let me say this now: Your Data Scientists and Analysts are not SysAdmins. They should not be SysAdmins. If you are treating them as SysAdmins, you are throwing money away.
What do you want your Data Scientists to be working on?
- Creating and tuning cutting-edge algorithms, or
- Troubleshooting how to get Open Source Library A to work with Framework B without slowing down when upgrading to Linux Package C
I hope you picked #1.
Your data scientists should not have to know or care about what’s under the covers of their framework or Jupyter Notebooks! They should not have to worry about how things are deployed! They should be able to easily deploy the software they need with a few clicks and get right to work!
Nvidia has gone to market with a few software companies to offer a turnkey Data Science Platform. High Availability’s recommendation for a turnkey Data Science platform is Domino Data Lab’s Data Science Platform. We can happily say that one of our local customers was the first to run it on a previous-generation Nvidia DGX, and since then many others across the world have as well.
Domino Data Lab’s platform is the “Open Data Science Platform for the Enterprise”, and Domino is a Visionary in the Gartner Magic Quadrant for Data Science platforms.
To illustrate its functions:

Domino Data Lab’s Data Science Platform ( Illustration © Domino Data Labs )
Workbench:
- Use the tools you want with self-serve access to scalable compute
- Run and compare computationally intensive experiments simultaneously
- Automatically track all work and progress over time
Model Ops:
- Deploy data science models on any infrastructure
- Publish powerful data science applications for business users
- Enable proactive model monitoring to maximize business impact
Knowledge Center:
- Find, reuse, reproduce, and discuss work
- See the health and status of all data science projects and production assets
- Instill consistent standards and best practices
Enterprise Infrastructure Foundation (DevOps and Automation):
- Provide self-service access to scalable compute resources
- Enable standards and governance across tools, languages, and teams
- Keep up with the latest advances in data science
Domino supports containers from Nvidia’s container registry, Nvidia GPU Cloud (NGC), which are optimized and tested by Nvidia’s team of over 7,000 developers, as well as many other products such as SAS, MATLAB, RStudio, Jupyter, and more.

Nvidia’s Docker/K8s Adoption – Containerize all the things! ( Illustration © Nvidia )
Prebuilt ML Models and Tools to advance Medicine, Recommendation Engines, and Conversational AI:
I won’t go deep into this topic, but I would be remiss if I did not mention that Nvidia is trying to make Machine Learning and AI as easy as possible to integrate into existing applications. Three projects leading this effort are Nvidia Clara, Nvidia Merlin, and Nvidia Jarvis.
Nvidia Clara is their medical application framework and series of products, ranging from embedded chips to pre-built models that can be easily incorporated into your radiology and genomics applications. You can see many examples of AI-assisted radiological annotation and 3D visualization using the pre-trained models here: https://news.developer.nvidia.com/nvidia-clara-platform-augmenting-radiology-with-ai/

Nvidia Clara (Illustration © Nvidia)
Nvidia Merlin is a ready-to-use framework for building recommendation systems.

Nvidia Merlin (Illustration © Nvidia)
Nvidia Jarvis is a Conversational AI Application Framework.

Nvidia Jarvis Pipeline ( Illustration © Nvidia )
Jarvis integrates several components:
- NeMo: open-source toolkit to build and fine-tune conversational AI models.
- Megatron-BERT: the world’s largest BERT model.
- TensorRT 7.1: for tuning the models with optimizations for NVIDIA A100 GPUs, and BERT inference using INT8 precision.
- Flowtron: a speech synthesis model that generates more realistic and controllable voice expression. See it in the “I AM AI” opening keynote video at GTC Digital 2020.