Boost Trino Performance: Master Dataproc Autoscaling Now!

An open-source, widely used distributed SQL query engine for warehouses and data lakes is called Trino. Numerous companies use it to examine big datasets kept in cloud storage, other data sources, including the Hadoop Distributed File System

However, workloads like Trino that aren’t built on Yet Another Resource Negotiator, or YARN, aren’t yet supported by Dataproc for autoscaling

Self-scaling By addressing the absence of autoscaling support for workloads that are not YARN-based, Dataproc for Trino helps to avoid overprovisioning, underprovisioning, and manual scaling

It offers a distributed computing platform for large data processing that is dependable, scalable, and adaptable. A YARN centralized resource manager is used by Hadoop for resource allocation, cluster management, and monitoring

The Trino Coordinator, who oversees planning, resource allocation, and query coordination, is in charge of Trino’s resource allocation and administration

As of right now, Dataproc can only handle autoscaling for YARN-based apps. since of this, it is difficult to optimize the expenses of operating Trino on Dataproc since the cluster size has to be changed to accommodate for the processing demands of the moment

Trino offers a graceful shutdown API that should only be used on workers to guarantee that they end without interfering with ongoing requests