From: aidotengineer

Luminal Cloud is a cloud service for Machine Learning (ML) inference workloads, designed to provide a straightforward and highly optimized experience [00:23:36]. It aims to be the “simplest, fastest ML cloud in the world” [00:24:19].

How Luminal Cloud Works

Luminal Cloud builds on Luminal’s ability to represent ML models as graphs [00:11:10] [00:23:11]. The workflow has four steps:

  1. Work on a model within the Luminal framework [00:23:12].
  2. Export the model graph using graph.export to obtain a file [00:23:14].
  3. Upload that file to Luminal Cloud [00:23:18].
  4. Receive a serverless inference endpoint [00:23:20].

Luminal Cloud handles everything after the upload automatically [00:23:22].
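
The four steps above can be pictured with a toy simulation. This is a minimal sketch only: the JSON graph format, `export_graph`, and the in-memory `ToyCloud` class are hypothetical stand-ins invented for illustration, and none of them reflect Luminal's real file format or Luminal Cloud's actual API.

```python
import json
import os
import tempfile
import uuid

# Hypothetical stand-in for a model graph (NOT Luminal's file format):
# nodes are listed in execution order and refer to inputs by name.
graph = {
    "nodes": [
        {"id": "scaled", "op": "mul", "args": ["x", 2.0]},
        {"id": "out", "op": "add", "args": ["scaled", 1.0]},
    ],
    "output": "out",
}


def export_graph(graph, path):
    """Step 2: serialize the graph to a single file."""
    with open(path, "w") as f:
        json.dump(graph, f)


class ToyCloud:
    """In-memory stand-in for a hosted inference service (hypothetical)."""

    def __init__(self):
        self._endpoints = {}

    def upload(self, path):
        """Steps 3 and 4: accept a graph file and hand back an endpoint id."""
        with open(path) as f:
            uploaded = json.load(f)
        endpoint_id = uuid.uuid4().hex
        self._endpoints[endpoint_id] = uploaded
        return endpoint_id

    def infer(self, endpoint_id, **inputs):
        """Run the stored graph on the given named inputs."""
        g = self._endpoints[endpoint_id]
        ops = {"mul": lambda a, b: a * b, "add": lambda a, b: a + b}
        values = dict(inputs)
        for node in g["nodes"]:
            # String arguments name earlier values; numbers are constants.
            args = [values[a] if isinstance(a, str) else a for a in node["args"]]
            values[node["id"]] = ops[node["op"]](*args)
        return values[g["output"]]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.json")
    export_graph(graph, path)      # step 2: export
    cloud = ToyCloud()
    endpoint = cloud.upload(path)  # steps 3 and 4: upload, get endpoint

result = cloud.infer(endpoint, x=3.0)
print(result)
```

The point of the sketch is that the user-facing surface is just a file and an endpoint; everything between the two is the service's responsibility.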

Simplification through Graphs

Luminal represents ML models as directed acyclic graphs of operations. This is a core aspect of its design and keeps the library itself extremely simple (under 5,000 lines of code) [00:10:04] [00:06:25]. Cloud deployment relies on the same simplification: the exported graph file is all a user needs to upload.
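
To make the idea of a model as a directed acyclic graph of operations concrete, here is a minimal sketch in Python. It is not Luminal's implementation or API; the `nodes` layout and the `run` helper are assumptions made for illustration. It only shows that once a computation is a DAG, executing it amounts to visiting nodes in topological order.

```python
from graphlib import TopologicalSorter
import operator

# Each entry maps a node name to (operation, input node names).
# Leaf nodes have no operation; their values come from `feeds`.
nodes = {
    "a": (None, []),
    "b": (None, []),
    "sum": (operator.add, ["a", "b"]),
    "prod": (operator.mul, ["sum", "b"]),
}


def run(nodes, feeds, output):
    """Evaluate `output` by visiting nodes so that inputs come first."""
    deps = {name: inputs for name, (_, inputs) in nodes.items()}
    values = dict(feeds)
    for name in TopologicalSorter(deps).static_order():
        op, inputs = nodes[name]
        if op is not None:
            values[name] = op(*(values[i] for i in inputs))
    return values[output]


result = run(nodes, {"a": 2, "b": 3}, "prod")
print(result)  # (2 + 3) * 3 = 15
```

Because the whole computation is explicit data, the same structure can be serialized, inspected, or rewritten before it is run.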

Serverless Implementation

Luminal Cloud is built with a serverless architecture [00:23:30]. Users pay only for the time their graph is actively executing [00:23:32].

"It's totally serverless. You only pay for when your graph is actually executing." [00:23:30]
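
The billing difference can be illustrated with a small calculation. The price below is a made-up placeholder, since the source does not state Luminal Cloud's rates; the sketch only contrasts paying for execution time with paying for an always-on machine over the same period.

```python
# Hypothetical price in dollars per second; not an actual Luminal Cloud rate.
PRICE_PER_SECOND = 0.001


def serverless_cost(execution_seconds, price_per_second=PRICE_PER_SECOND):
    """Bill only the seconds during which a graph was executing."""
    return sum(execution_seconds) * price_per_second


def always_on_cost(wall_clock_seconds, price_per_second=PRICE_PER_SECOND):
    """Bill every second a machine is reserved, busy or idle."""
    return wall_clock_seconds * price_per_second


# Four requests in one hour, with per-request execution times in seconds.
request_durations = [0.5, 1.5, 2.0, 1.0]
one_hour = 3600

cost_serverless = serverless_cost(request_durations)
cost_always_on = always_on_cost(one_hour)
print(cost_serverless, cost_always_on)
```

With sparse traffic like this, the execution-time bill covers 5 seconds of work while the always-on bill covers the full 3,600 seconds.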

Automated Management

Luminal Cloud automates several critical aspects of ML deployment:

  • Optimization: The platform handles optimization of the deployed models [00:23:24].
  • Batching and Queuing: It manages batching and queuing of inference requests [00:23:25].
  • Machine Provisioning: The provisioning of machines for inference is fully automated [00:23:28].

This automated approach aims to deliver the “simplest, fastest, most straightforward cloud experience out there” [00:23:36].
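
As an illustration of what batching and queuing involve, the sketch below groups pending requests into fixed-size batches before running a model on each batch. The `BatchingQueue` class, the batch size, and the stand-in `run_model` function are assumptions for illustration; the source does not describe how Luminal Cloud implements this internally.

```python
from collections import deque


class BatchingQueue:
    """FIFO queue that hands out requests in batches of bounded size."""

    def __init__(self, max_batch_size):
        self.max_batch_size = max_batch_size
        self._pending = deque()

    def submit(self, request):
        self._pending.append(request)

    def next_batch(self):
        """Pop up to max_batch_size requests, oldest first."""
        batch = []
        while self._pending and len(batch) < self.max_batch_size:
            batch.append(self._pending.popleft())
        return batch


def run_model(batch):
    """Stand-in model: doubles every input in the batch."""
    return [2 * x for x in batch]


queue = BatchingQueue(max_batch_size=4)
for x in range(10):
    queue.submit(x)

batches = []
results = []
while True:
    batch = queue.next_batch()
    if not batch:
        break
    batches.append(batch)
    results.extend(run_model(batch))

print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

A production system would also need a time limit so that a partially filled batch is not held back indefinitely, which this sketch leaves out.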