Safely Launch Spark job using AWS Lambda and SNS

Introduction

In this post, we will learn how to safely launch an Apache Spark job using an AWS Lambda function and an SNS topic. Our Lambda function will use an assumed IAM role. This is safer because it relies on temporary credentials, so there is no need to store long-term user access credentials.

The same pattern can be applied to control other services or applications that run on EC2 machines.

Prerequisites

  • Spark cluster under AWS
    • version – spark-2.2.0-bin-hadoop2.7
  • AWS Lambda
  • SNS topic
  • Systems manager
    • Run command
    • Document – version 2.2

Spark cluster under AWS

We assume that you have deployed your Apache Spark cluster on AWS and have access to the master instance. Many tutorials online describe Spark cluster setup on EC2 machines. In our sample run, we will use the ‘spark-submit’ command, located under spark-2.2.0-bin-hadoop2.7/bin, to launch jobs.

Systems manager shared resources – Document

We need to create a Run Command document that contains the Spark job launch command. Create a document with the following details:

Name*: ApacheSparkJobController
Parameters:
1. Name – input, Default value – s3a://spark.data.com/shakespeare-1-100.txt
2. Name – output, Default value – s3a://spark.data.com/output
Content*:
{
  "schemaVersion": "2.2",
  "description": "Apache Spark Job Controller",
  "parameters": {
    "input": {
      "type": "String",
      "default": "s3a://spark.data.com/shakespeare-1-100.txt"
    },
    "output": {
      "type": "String",
      "default": "s3a://spark.data.com/output"
    }
  },
  "mainSteps": [
    {
      "action": "aws:runShellScript",
      "name": "SparkJobLaunch",
      "precondition": {
        "StringEquals": ["platformType", "Linux"]
      },
      "inputs": {
        "runCommand": [
          "su ubuntu -c \"~/spark-2.2.0-bin-hadoop2.7/bin/spark-submit --master spark://ip-000-00-00-000.us-east-2.compute.internal:7077 --class com.spark.aws.samples.s3.S3WordCountSTSTemporaryCredentials ~/spark-aws-samples-0.0.1-SNAPSHOT.jar {{input}} {{output}}\""
        ]
      }
    }
  ]
}

To learn how to create a document, follow this post: Launch / Terminate Linux services using AWS Run Command.
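To make the role of the {{input}} and {{output}} placeholders concrete, here is a minimal sketch that imitates the substitution Run Command performs before executing the script. It is only an illustration, not the Systems Manager implementation; the class name RunCommandTemplateSketch and the render method are hypothetical, and the command string is shortened from the document above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative stand-in for Run Command's {{parameter}} substitution; not the SSM implementation. */
final class RunCommandTemplateSketch {

    /** Replaces each {{name}} placeholder with the matching value from params. */
    static String render(String template, Map<String, String> params) {
        String result = template;
        for (Map.Entry<String, String> e : params.entrySet()) {
            // String.replace performs a literal (non-regex) replacement.
            result = result.replace("{{" + e.getKey() + "}}", e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened version of the runCommand string from the document above.
        String template = "spark-submit --class com.spark.aws.samples.s3.S3WordCountSTSTemporaryCredentials "
                + "~/spark-aws-samples-0.0.1-SNAPSHOT.jar {{input}} {{output}}";

        // Values supplied at launch time override the defaults declared in the document.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("input", "s3a://spark.data.com/shakespeare-1-100.txt");
        params.put("output", "s3a://spark.data.com/output1");

        System.out.println(render(template, params));
    }
}
```

If a parameter is not supplied at launch time, the default declared in the document is used instead.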

Create Lambda function

Create a Lambda function by clicking on ‘Author from scratch’. Enter the following details:

Name*: LambdaSparkJobHandler
Runtime: Java 8
Enter Handler*: com.aws.example.lambda.LambdaSparkJobHandler::runCommandHandler
Upload function package*: aws-examples-0.0.1-SNAPSHOT.jar.
Environment variables: Key: document, Value: ApacheSparkJobController.
Role*: LambdaSparkJobLaunchRole. Apart from creating or selecting a role, you also need to authorize it to perform SSM operations. To attach the SSM assume role, follow this post: Running Systems Manager Run Command from AWS Lambda function.
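The handler code itself is not listed in this post. As a rough sketch of what runCommandHandler might do with an incoming message, the example below splits the payload into the target instance ID and the document parameters. It assumes the message is the flat JSON object published later in this post (input, output, masterInstanceId). The class and method names are hypothetical, and the small regex parser only stands in for a proper JSON library. The map-of-lists shape mirrors how Run Command accepts document parameters: each parameter name maps to a list of string values.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; a sketch of the message handling, not the post's actual handler. */
final class SparkJobMessageSketch {

    // Matches "key":"value" pairs in a flat JSON object; escapes and nesting are not handled.
    private static final Pattern STRING_FIELD =
            Pattern.compile("\"([^\"]+)\"\\s*:\\s*\"([^\"]*)\"");

    /** Extracts the string fields of a flat JSON object, keeping their order. */
    static Map<String, String> parseFlatJson(String json) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = STRING_FIELD.matcher(json);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    /** Every field except masterInstanceId becomes a document parameter with a single value. */
    static Map<String, List<String>> toDocumentParameters(Map<String, String> fields) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!"masterInstanceId".equals(e.getKey())) {
                params.put(e.getKey(), Collections.singletonList(e.getValue()));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        String message = "{\"input\":\"s3a://spark.data.com/shakespeare-1-100.txt\", "
                + "\"output\":\"s3a://spark.data.com/output1\", "
                + "\"masterInstanceId\":\"i-11111b28f98720000\"}";

        Map<String, String> fields = parseFlatJson(message);
        System.out.println("target instance: " + fields.get("masterInstanceId"));
        System.out.println("document parameters: " + toDocumentParameters(fields));
    }
}
```

In a real handler, the document name would be read from the document environment variable configured above, and these values would then be passed to the Systems Manager SendCommand operation through the AWS SDK. That call is left out here to keep the sketch free of dependencies.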

Create SNS topic and subscribe lambda function

Now we need an SNS topic, to which we subscribe the LambdaSparkJobHandler Lambda function. Topic details:

Topic Name: SparkJobControllerTopic
Subscription: Protocol: AWS lambda -> LambdaSparkJobHandler.

To understand the steps behind it, follow: Invoking lambda function using SNS Topic notification.

Note: You can also subscribe the Lambda function to the SNS topic by adding a trigger, available in the Lambda function configuration.

Spark job launch – Publish SNS topic

Now it's time to test launching a Spark job by publishing a message to the SNS topic. Open the topic SparkJobControllerTopic and enter the following details:

Subject: Spark job launch
Message format: Raw
Message: {"input":"s3a://spark.data.com/shakespeare-1-100.txt", "output":"s3a://spark.data.com/output1", "masterInstanceId":"i-11111b28f98720000"}

Note: You need to provide the EC2 instance ID of your own Spark master. Replace i-11111b28f98720000 with the real instance ID.
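Since a mistyped instance ID is an easy mistake to make here, a handler could check the value before sending the command. The sketch below is one possible guard, not part of the original function; the class and method names are hypothetical. It assumes the usual EC2 instance ID shape of i- followed by 8 or 17 lowercase hexadecimal characters, and it only checks the format, not whether the instance exists.

```java
import java.util.regex.Pattern;

/** Hypothetical format check for the masterInstanceId field of the message. */
final class InstanceIdCheckSketch {

    // "i-" followed by 8 (older, short form) or 17 (longer form) lowercase hex characters.
    private static final Pattern INSTANCE_ID =
            Pattern.compile("i-(?:[0-9a-f]{8}|[0-9a-f]{17})");

    /** Returns true when the whole value has the expected instance ID shape. */
    static boolean looksLikeInstanceId(String value) {
        return value != null && INSTANCE_ID.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeInstanceId("i-11111b28f98720000")); // prints true
        System.out.println(looksLikeInstanceId("spark-master"));        // prints false
    }
}
```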

Publish this message and you will see success logs in Lambda and Run Command.

Conclusion

We saw how to safely launch a Spark job by publishing a message to an SNS topic and processing it in a Lambda function that uses an assumed role. You can do more by integrating the SNS, Lambda and Run Command services. I hope this post helps you understand how to use AWS services effectively and how to integrate them.
