Amazon EMR Studio

Amazon EMR Studio is a web-based integrated development environment (IDE) for working with managed Jupyter notebooks on Amazon EMR clusters

It supports development, visualization, and debugging in R, Python, Scala, and PySpark, with real-time collaboration features

EMR Studio integrates with AWS IAM and IAM Identity Center for secure authentication and user management, supporting both direct and federated access

Users can launch, connect to, and manage Amazon EMR clusters (on EC2 or EKS) directly from the Studio interface

Notebook work is automatically backed up to Amazon S3, ensuring data persistence and recovery between sessions

The SQL Explorer allows users to browse data catalogs, run SQL queries, and download results before using them in notebooks

EMR Studio supports workflow integration with orchestration tools like Apache Airflow and Amazon MWAA for scheduled, parameterized notebook execution
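
To make the idea of a parameterized notebook run concrete, here is a minimal sketch that assembles the arguments an Airflow or MWAA task might pass to the boto3 EMR `start_notebook_execution` call. The helper name `build_notebook_execution`, the Workspace and cluster IDs, the notebook path, the role name, and the parameter names are all placeholders chosen for illustration. The request is only built here, not sent.

```python
import json


def build_notebook_execution(editor_id, relative_path, cluster_id,
                             service_role, params, name="scheduled-run"):
    """Assemble keyword arguments for boto3's EMR start_notebook_execution.

    Hypothetical helper: it only builds the request dictionary.
    """
    return {
        "EditorId": editor_id,                # ID of the Workspace that holds the notebook
        "RelativePath": relative_path,        # notebook path relative to the Workspace
        "NotebookExecutionName": name,
        # Notebook parameters are passed as a JSON-encoded string.
        "NotebookParams": json.dumps(params, sort_keys=True),
        "ExecutionEngine": {"Id": cluster_id, "Type": "EMR"},
        "ServiceRole": service_role,
    }


# All values below are placeholders, not real resources.
execution = build_notebook_execution(
    editor_id="e-EXAMPLEWORKSPACEID",
    relative_path="reports/daily.ipynb",
    cluster_id="j-EXAMPLECLUSTERID",
    service_role="EMR_Notebooks_DefaultRole",
    params={"run_date": "2024-01-01", "region": "eu"},
)

# With credentials configured, a scheduled task could then call:
#   boto3.client("emr").start_notebook_execution(**execution)
```

Keeping the request construction separate from the API call makes it possible to check the parameters in a unit test before a scheduler ever triggers the run.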

Security groups, IAM roles, and VPC subnets are used to manage network access and permissions for Studio resources and users
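
As a rough illustration of how these pieces fit together, the sketch below assembles the arguments for the boto3 EMR `create_studio` call, which takes the VPC, subnets, security groups, service role, and default S3 location in one request. The helper name `build_studio_request` is hypothetical, and every ID, ARN, and bucket name is a placeholder. The request is only built, not sent.

```python
def build_studio_request(name, vpc_id, subnet_ids, service_role_arn,
                         workspace_sg_id, engine_sg_id, s3_location,
                         auth_mode="IAM", user_role_arn=None):
    """Assemble keyword arguments for boto3's EMR create_studio call.

    Hypothetical helper: it validates a few inputs and returns a dict.
    """
    if auth_mode not in ("IAM", "SSO"):
        raise ValueError("auth_mode must be 'IAM' or 'SSO'")
    if not subnet_ids:
        raise ValueError("at least one subnet is required")
    if not s3_location.startswith("s3://"):
        raise ValueError("s3_location must be an s3:// URI")

    request = {
        "Name": name,
        "AuthMode": auth_mode,                        # IAM or IAM Identity Center (SSO)
        "VpcId": vpc_id,
        "SubnetIds": list(subnet_ids),
        "ServiceRole": service_role_arn,              # role assumed by the Studio itself
        "WorkspaceSecurityGroupId": workspace_sg_id,  # attached to Workspaces
        "EngineSecurityGroupId": engine_sg_id,        # attached to the cluster side
        "DefaultS3Location": s3_location,             # where notebook files are backed up
    }
    if user_role_arn is not None:
        # A user role applies when IAM Identity Center authentication is used.
        request["UserRole"] = user_role_arn
    return request


# All values below are placeholders, not real resources.
studio_request = build_studio_request(
    name="analytics-studio",
    vpc_id="vpc-0example",
    subnet_ids=["subnet-0exampleA", "subnet-0exampleB"],
    service_role_arn="arn:aws:iam::111122223333:role/ExampleStudioServiceRole",
    workspace_sg_id="sg-0exampleworkspace",
    engine_sg_id="sg-0exampleengine",
    s3_location="s3://example-bucket/studio/",
)

# With credentials configured, the Studio could then be created with:
#   boto3.client("emr").create_studio(**studio_request)
```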

Workspaces in EMR Studio organize notebooks, cluster connections, and Git integrations, and can be shared or used individually