An open-source, widely used distributed SQL query engine for warehouses and data lakes is called Trino. Numerous companies use it to examine big datasets kept in cloud storage, other data sources, including the Hadoop Distributed File System
However, workloads like Trino that aren’t built on Yet Another Resource Negotiator, or YARN, aren’t yet supported by Dataproc for autoscaling
Self-scaling By addressing the absence of autoscaling support for workloads that are not YARN-based, Dataproc for Trino helps to avoid overprovisioning, underprovisioning, and manual scaling
It offers a distributed computing platform for large data processing that is dependable, scalable, and adaptable. A YARN centralized resource manager is used by Hadoop for resource allocation, cluster management, and monitoring
The Trino Coordinator, who oversees planning, resource allocation, and query coordination, is in charge of Trino’s resource allocation and administration
As of right now, Dataproc can only handle autoscaling for YARN-based apps. since of this, it is difficult to optimize the expenses of operating Trino on Dataproc since the cluster size has to be changed to accommodate for the processing demands of the moment
Trino offers a graceful shutdown API that should only be used on workers to guarantee that they end without interfering with ongoing requests