Image generation by neural networks sped up almost 30 times, and training of large language models made 6 times cheaper: the latest work in AI

March 27, 2024  19:13

Scientists and engineers have found a way to speed up image-generating neural networks by almost 30 times, and to significantly cut the cost of training large language models by using SSDs. These developments are expected to make AI even more accessible, both to the general public and to the specialists who use these technologies in their work.

Image generation: from 2,590 to 90 milliseconds

Researchers at the Massachusetts Institute of Technology in the US have developed a method called Distribution Matching Distillation (DMD): it trains new AI models to imitate established image generators known as diffusion models, such as DALL-E 3, Midjourney and Stable Diffusion. This framework makes it possible to create more compact AI models that generate images from text prompts much faster, without loss of quality.

Generating an image with a diffusion model typically involves up to 100 steps. The researchers, however, were able to reduce the process to a single step, so the AI spent only 90 milliseconds instead of 2.59 seconds to generate an image, completing the work 28.8 times faster.
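The reported speedup follows directly from the two timings; a quick arithmetic check:

```python
# Check of the reported speedup: cutting ~100 diffusion steps down
# to a single step reduces generation time from 2.59 s to 90 ms.
baseline_ms = 2590   # multi-step diffusion sampling
distilled_ms = 90    # one-step DMD generator
speedup = baseline_ms / distilled_ms
print(f"{speedup:.1f}x faster")  # matches the reported 28.8x figure
```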

DMD consists of two components that together reduce the number of iterations the model needs before it produces an acceptable image. The approach also significantly reduces the computational power the image generator requires.
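The article gives no code; as a rough illustration only (the model functions below are toy stand-ins, not the authors' implementation), the structural difference between conventional iterative sampling and a distilled one-step generator looks like this:

```python
import random

def teacher_denoise(x, t):
    # stand-in for one denoising pass of a diffusion model
    return x * 0.95 + 0.05 * t

def diffusion_sample(steps=100):
    # conventional sampling: many sequential denoising passes
    x = random.random()                 # start from noise
    for step in range(steps):
        x = teacher_denoise(x, step / steps)
    return x

def one_step_generator(noise):
    # a DMD-style student maps noise to an image in a single pass
    return noise * 0.1 + 0.9            # stand-in for the distilled network

image_slow = diffusion_sample()                    # ~100 model evaluations
image_fast = one_step_generator(random.random())   # 1 model evaluation
```

The point of distillation is that the student network absorbs the behavior of the whole iterative loop, so inference cost drops from ~100 forward passes to one.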

“Reducing the number of iterations has been the Holy Grail of diffusion models since their inception,” said Frédo Durand, professor of electrical engineering and computer science and a co-author of the paper, which was posted on the arXiv preprint server.

Cheaper training for large language models

Phison, in turn, has shown a workstation with four graphics processors whose performance is sufficient to train an artificial intelligence model with 70 billion parameters. Normally, such a task requires six servers with 24 Nvidia H100 accelerators and 1.4 TB of video memory; here, the required capacity was achieved by drawing on SSD resources and system DRAM.

As Tom's Hardware explains, Phison's aiDaptiv+ platform reduces the resources needed to train large AI language models by using system memory and SSDs to expand the memory available to the GPUs. The solution could help companies significantly cut the cost of AI training, and it could also help them avoid the GPU shortages (and price increases) already threatening the industry.

The performance of the proposed system is still inferior to that of expensive server solutions, but it lets small and medium-sized businesses run cutting-edge models locally, preserving data privacy and saving money, provided they can afford the longer training time.

The system was demonstrated on a Maingear Pro AI workstation with an Intel Xeon W7-3445X processor, 512 GB of DDR5-5600 memory and two 2 TB Phison aiDaptivCache ai100E M.2 solid-state drives rated for 100 drive writes per day over 5 years. Phison's aiDaptiv+ software takes the layers of the AI model that are not currently being processed out of video memory and moves them to system memory; data that will be needed soon stays there, while low-priority data is offloaded to the SSDs. As layers are needed, they are moved back into GPU video memory for processing, and already-processed data is returned to DRAM and the SSDs.
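Tom's Hardware describes this flow only at a high level; a minimal sketch of such a three-tier layer cache (the class, names and FIFO eviction policy below are my assumptions for illustration, not Phison's actual software) could look like this:

```python
class LayerCache:
    """Toy three-tier store for model layers: GPU VRAM -> DRAM -> SSD."""

    def __init__(self, vram_slots):
        self.vram_slots = vram_slots        # how many layers fit on the GPU
        self.vram, self.dram, self.ssd = {}, {}, {}

    def load(self, name, weights):
        self.ssd[name] = weights            # cold layers start on the SSD

    def fetch(self, name):
        # promote a layer into VRAM so it can be processed
        if name in self.vram:
            return self.vram[name]
        tier = self.dram if name in self.dram else self.ssd
        weights = tier.pop(name)
        if len(self.vram) >= self.vram_slots:
            oldest = next(iter(self.vram))  # evict oldest layer back to DRAM
            self.dram[oldest] = self.vram.pop(oldest)
        self.vram[name] = weights
        return weights

cache = LayerCache(vram_slots=2)
for i in range(4):
    cache.load(f"layer{i}", [i])
for i in range(4):                          # a sequential forward pass
    cache.fetch(f"layer{i}")
print(sorted(cache.vram))                   # only the most recent layers remain
```

The real platform presumably prefetches and overlaps transfers with computation; the sketch only shows the tiering idea.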

The Maingear Pro AI workstation comes in several configurations, from $28,000 for a version with a single Nvidia RTX 6000 Ada graphics accelerator up to $60,000 for a version with four GPUs.

Phison aiDaptiv+ works with PyTorch/TensorFlow and requires no modification of AI applications. Training AI on such a setup costs about 6 times less than on eight servers with 30 AI accelerators, but the training takes about 4 times longer.

With horizontal scaling, however, four such workstations can train a model with 70 billion parameters in approximately 1.2 hours, while a system with 30 AI accelerators completes the same training in 0.8 hours.
