Set an AWS lambda function to run on a schedule
schedule-lambda.Rmd
In this exercise, we’ll write a simple lambda runtime function, build a docker image locally, test the lambda invocation, deploy it to AWS Lambda, update it to run on a schedule, and check the AWS logs to confirm it executes at the correct times.
A lambda runtime function
We start with a simple function that does not require any input and does not return anything. If this example lambda is to run on a schedule, we don’t want to worry about any input arguments. Also, we want this lambda function to simply have a side-effect, like printing something to the logs, without returning any data or writing to a database. This will help us greatly with the setup, in that we’ll be able deploy and schedule the lambda with mininal involvement from other AWS services.
With this in mind, we have the following function that simply prints the system time. Printing the current time makes sense because we can easily check that the lambda runs on the correct schedule from the logs.
Build, test, and deploy
Here, we follow the procedure described in
Tidy Tuesday dataset Lambda
vignette. We write this to a
file that we’ll use to build the lambda docker
image:
r_code <- "
current_time <- function() {
print(paste('CURRENT TIME:', Sys.time()))
}
lambdr::start_lambda()
"
tmpfile <- tempfile(pattern = "current_time_lambda_", fileext = ".R")
write(x = r_code, file = tmpfile)
And then build the docker
image. Note that we don’t have
any dependencies other than base R
.
r2lambda::build_lambda(
tag = "current_time",
runtime_function = "current_time",
runtime_path = tmpfile,
dependencies = NULL
)
We test the lambda docker container locally, because it makes sense. The console output should include the log messages and the standard output string showing the current time.
r2lambda::test_lambda(tag = "current_time", payload = list())
Then, we deploy the lambda to AWS, leaving the lambda environment to its defaults, as 3 seconds should be enough to get and print the current time.
r2lambda::deploy_lambda(tag = "current_time")
Finally, we invoke the cloud instance of our function, to make sure everything went well. Be sure to include the logs, as this particular function does not return anything.
r2lambda::invoke_lambda(
function_name = "current_time",
invocation_type = "RequestResponse",
payload = list(),
include_logs = TRUE
)
Schedule to run every minute
To make a lambda function run on a recurring schedule, we need to update an already deployed function. This involves three steps and two AWS services, Lambda and EventBridge:
- creating a schedule event role (EventBridge,
paws::eventbridge
) - adding permissions to this role to invoke lambda functions (Lambda,
paws::lambda
) - adding our target lambda function to eventBridge
Detailed instructions are available in the AWS
documentation. The function schedule_lambda
abstracts
these three steps in one go. To set a lambda on a schedule, we need the
name of the function we wish to update, and the rate at which we want
EventBridge to invoke it. Two rate-setting expression formats are
supported, cron
and rate
. For example, to
schedule a lambda to run every Sunday at midnight, we could use
execution_rate = "cron(0 0 * * Sun)"
. Alternatively, to
schedule a lambda to run every 15 minutes, we might use
execution_rate = "rate(15 minutes)"
. The details are in
this article
r2lambda::schedule_lambda(lambda_function = "current_time", execution_rate = "rate(1 minute)")
Checking the AWS logs
To see if our function runs every minute, we can take a look at the AWS logs. If the function was writing to a database, or dropping files in an S3 bucket, we could also check the contents of those resources for the effects of the scheduled lambda function. But as our example function only prints the current time, the only way to know that it indeed runs every minute is to check the logs.
To do this, we’ll use paws
and
r2lambda::aws_connect
to establish an AWS CloudWatchLogs
service locally, and fetch the recent logs to look for traces of our
lambda function.
In the first step, we connect to cloudwatchlogs
and
fetch the names of the log groups. Inspect the logs
object
below to find the name corresponding to the lambda function whose logs
we want to fetch.
logs_service <- r2lambda::aws_connect(service = "cloudwatchlogs")
logs <- logs_service$describe_log_groups()
(logGroups <- sapply(logs$logGroups, "[[", 1))
The, we can grab only the data for our scheduled lambda function:
current_time_lambda_logs <- logs_service$filter_log_events(
logGroupName = "/aws/lambda/current_time")
And pull only the message printed by our R
function
wrapped in the lambda:
messages <- sapply(current_time_lambda_logs$events, "[[", "message")
current_time_messages <- messages[grepl("CURRENT TIME", messages)]
data.frame(Current_time_lambda = current_time_messages)
#> Current_time_lambda
#> 1 [1] "CURRENT TIME: 2023-02-26 22:53:55"\n
#> 2 [1] "CURRENT TIME: 2023-02-26 22:54:41"\n
#> 3 [1] "CURRENT TIME: 2023-02-26 22:55:41"\n
#> 4 [1] "CURRENT TIME: 2023-02-26 22:56:41"\n
#> 5 [1] "CURRENT TIME: 2023-02-26 22:57:41"\n
Clean up
We don’t want to let a this trivial lambda fire every minute, it will still incur some cost. So its wise to delete the event schedule rule and maybe even the lambda function it self.
To remove the event rule, we first need to remove associated targets, and then remove the rule.
# connect to the EventBridge service
events_service <- r2lambda::aws_connect("eventbridge")
# find the names of all rules we need
schedule_rules <- events_service$list_rules()[[1]] |> sapply("[[", 1)
# find the targets associated with the rule we want to remove
rule_to_remove <- schedule_rules[[1]]
target_arn_to_remove <- events_service$list_targets_by_rule(Rule = rule_to_remove)$Targets[[1]]$Id
events_service$remove_targets(Rule = rule_to_remove, Ids = target_arn_to_remove)
events_service$delete_rule(Name = rule_to_remove)
events_service$list_rules()[[1]] |> sapply("[[", 1)
Finally, to remove the Lambda:
lambda_service <- r2lambda::aws_connect("lambda")
lambda_service$list_functions()$Functions |> sapply("[[","FunctionName")
lambda_service$delete_function(FunctionName = "parity1")