Amazon EMR
Amazon EMR simplifies running big data frameworks like Apache Hadoop and Spark on AWS for business intelligence and analytics
Work can be submitted as steps when the cluster is created, added later through the EMR console, API, or AWS CLI, or run interactively over SSH on the primary node
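As a rough illustration of the API route, the sketch below builds a step definition in the shape that boto3's `add_job_flow_steps` call accepts. The step name, script location, and cluster ID are placeholders invented for this example, and running a Spark script through `command-runner.jar` is one common pattern rather than the only way to define a step.

```python
def spark_step(name, script_uri, action_on_failure="CONTINUE"):
    """Build one EMR step that runs a PySpark script via spark-submit."""
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {
            # command-runner.jar lets a step run a command on the primary node.
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
        },
    }


def submit_steps(emr_client, cluster_id, steps):
    """Add steps to a running cluster and return the new step IDs.

    emr_client is expected to be boto3.client("emr"). It is passed in so the
    step-building code above can be used without AWS credentials.
    """
    response = emr_client.add_job_flow_steps(JobFlowId=cluster_id, Steps=steps)
    return response["StepIds"]


# Hypothetical script location, for illustration only.
step = spark_step("nightly-aggregation", "s3://example-bucket/jobs/aggregate.py")
print(step["HadoopJarStep"]["Args"])
```

With real credentials, the call would look like `submit_steps(boto3.client("emr"), "<cluster-id>", [step])`, where the cluster ID is whatever EMR assigned at creation.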
Data is processed in sequential steps, with each step performing specific tasks like data manipulation or querying
Input data is typically read from HDFS or Amazon S3, processed by the steps, and written to a designated output location such as an S3 bucket
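Steps transition through states (PENDING, RUNNING, then COMPLETED, FAILED, or CANCELLED); when a step fails, EMR can continue with the next step, cancel the remaining steps and wait, or terminate the cluster, depending on the step's failure action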
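The failure handling described above can be sketched as a small simulation. This is a simplified model for illustration, not EMR's implementation: the action names match the `ActionOnFailure` values in the EMR API (`CONTINUE`, `CANCEL_AND_WAIT`, `TERMINATE_CLUSTER`), while the function, the step names, and the assumption that a cluster with no remaining steps ends up in WAITING are choices made for this example.

```python
VALID_ACTIONS = {"CONTINUE", "CANCEL_AND_WAIT", "TERMINATE_CLUSTER"}


def run_steps(steps):
    """Simulate sequential step execution.

    steps: list of (name, succeeds, action_on_failure) tuples.
    Returns (step_states, cluster_state_afterwards).
    """
    for _, _, action in steps:
        if action not in VALID_ACTIONS:
            raise ValueError(f"unknown action on failure: {action!r}")

    states = {name: "PENDING" for name, _, _ in steps}
    for index, (name, succeeds, action) in enumerate(steps):
        states[name] = "RUNNING"
        if succeeds:
            states[name] = "COMPLETED"
            continue
        states[name] = "FAILED"
        if action == "CONTINUE":
            continue  # ignore the failure and move on to the next step
        # Both remaining actions cancel every step that has not started yet.
        for later_name, _, _ in steps[index + 1:]:
            states[later_name] = "CANCELLED"
        cluster = "TERMINATING" if action == "TERMINATE_CLUSTER" else "WAITING"
        return states, cluster
    # Assumption: a long-running cluster idles in WAITING once steps finish.
    return states, "WAITING"


pipeline = [
    ("load", True, "CONTINUE"),
    ("transform", False, "CANCEL_AND_WAIT"),
    ("export", True, "CONTINUE"),
]
print(run_steps(pipeline))
```

Here the failed `transform` step cancels `export` and leaves the cluster waiting, so the problem can be investigated before anything is resubmitted.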
Clusters progress through states: STARTING, BOOTSTRAPPING, RUNNING, WAITING, TERMINATING, and TERMINATED (or TERMINATED_WITH_ERRORS if the cluster fails)
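A script that launches a cluster usually has to wait for it to move through these states. The sketch below polls a state-returning callable until the cluster is usable or has terminated. The function name, polling interval, and poll limit are assumptions for illustration. With boto3, `get_state` could be `lambda: emr.describe_cluster(ClusterId=cluster_id)["Cluster"]["Status"]["State"]`.

```python
import time

READY_STATES = {"RUNNING", "WAITING"}
TERMINAL_STATES = {"TERMINATED", "TERMINATED_WITH_ERRORS"}


def wait_until_ready(get_state, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll get_state() until the cluster is ready or has terminated.

    Returns the last observed state. Raises TimeoutError if the cluster is
    still starting after max_polls checks.
    """
    state = None
    for _ in range(max_polls):
        state = get_state()
        if state in READY_STATES or state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)
    raise TimeoutError(f"cluster still in state {state!r} after {max_polls} polls")


# Demonstration with a canned sequence of states instead of a real cluster.
observed = iter(["STARTING", "BOOTSTRAPPING", "RUNNING"])
final_state = wait_until_ready(lambda: next(observed), sleep=lambda _: None)
print(final_state)
```

Callers should check whether the returned state is in `TERMINAL_STATES` before submitting work. boto3 also ships built-in waiters for EMR, which may be a better fit than a hand-written loop.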
Custom scripts, known as bootstrap actions, run on each instance during cluster setup to install additional software or configure instances
EMR can install applications such as Hive, Hadoop, and Spark when the cluster launches, so they are ready for data processing
Clusters can auto-terminate after completing steps or remain in a WAITING state for manual shutdown
EMR supports custom AMIs, configurable instance types and counts, and termination protection, which guards a cluster against accidental shutdown
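The configuration options above (bootstrap actions, applications, auto-termination, custom AMIs, hardware, and termination protection) all map to parameters of the `RunJobFlow` API. The sketch below assembles such a request as a plain dictionary. The helper function, bucket paths, release label, instance types, and instance counts are assumptions chosen for illustration, and the role names are the EMR defaults, which may not exist in every account.

```python
def cluster_request(name, log_uri, bootstrap_script=None, custom_ami_id=None,
                    keep_alive=False, termination_protected=False):
    """Assemble keyword arguments for boto3's emr.run_job_flow()."""
    request = {
        "Name": name,
        "LogUri": log_uri,
        "ReleaseLabel": "emr-6.15.0",  # assumed release, pick your own
        "Applications": [{"Name": "Hadoop"}, {"Name": "Hive"}, {"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # False: terminate once all steps finish. True: stay in WAITING.
            "KeepJobFlowAliveWhenNoSteps": keep_alive,
            "TerminationProtected": termination_protected,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
    if bootstrap_script:
        request["BootstrapActions"] = [{
            "Name": "custom-setup",
            "ScriptBootstrapAction": {"Path": bootstrap_script, "Args": []},
        }]
    if custom_ami_id:
        request["CustomAmiId"] = custom_ami_id
    return request


# Hypothetical bucket and script paths, for illustration only.
request = cluster_request(
    "analytics-cluster",
    "s3://example-bucket/logs/",
    bootstrap_script="s3://example-bucket/bootstrap/install-deps.sh",
    keep_alive=True,
    termination_protected=True,
)
print(sorted(request))
```

With credentials in place, the request could be sent with `boto3.client("emr").run_job_flow(**request)`. Keeping the request as data makes it easy to review or unit test before any cluster is launched.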
For more details visit Govindhtech.com