Tidy Tuesday dataset Lambda
tt-lambda.Rmd
Use {r2lambda} to download the Tidy Tuesday dataset
In this exercise, we’ll create an AWS Lambda function that downloads the Tidy Tuesday dataset for the most recent Tuesday (or the most recent Tuesday relative to a date of interest).
Runtime function
The first step is to write the runtime function. This is the function
that will be executed when we invoke the Lambda function after it has
been deployed. To download the Tidy Tuesday dataset, we will use the
tidytuesdayR package. In the runtime script, we define a function called
tidytuesday_lambda that takes one optional argument, date. If date is
omitted, the function returns the dataset(s) for the most recent
Tuesday; otherwise, it looks up the most recent Tuesday relative to the
date of interest and returns the corresponding dataset(s).
library(tidytuesdayR)

tidytuesday_lambda <- function(date = NULL) {
  # Fall back to today's date when no date is supplied
  if (is.null(date))
    date <- Sys.Date()

  # Find the Tuesday to load and download its dataset(s)
  most_recent_tuesday <- tidytuesdayR::last_tuesday(date = date)
  tt_data <- tidytuesdayR::tt_load(x = most_recent_tuesday)

  # Return the individual datasets as a plain list
  data_names <- names(tt_data)
  data_list <- lapply(data_names, function(x) tt_data[[x]])
  return(data_list)
}

tidytuesday_lambda("2022-02-02")
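The "most recent Tuesday" lookup can be illustrated in base R. The helper previous_tuesday below is hypothetical and only sketches the idea; tidytuesdayR::last_tuesday may handle edge cases (time zones, the day a new dataset is published) differently.

```r
# Hypothetical helper, not part of tidytuesdayR: return the latest Tuesday
# on or before the given date.
previous_tuesday <- function(date = Sys.Date()) {
  date <- as.Date(date)
  # "%u" is the ISO weekday number: Monday = 1, ..., Sunday = 7
  weekday <- as.integer(format(date, "%u"))
  # Days elapsed since the last Tuesday (0 if the date is a Tuesday)
  days_since_tuesday <- (weekday - 2) %% 7
  date - days_since_tuesday
}

# 2022-02-02 was a Wednesday, so this gives 2022-02-01
result <- previous_tuesday("2022-02-02")
print(result)
```

Passing a date that already falls on a Tuesday returns that same date.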
R script to build the lambda
To build the lambda image, we need an R script that sources any required
code, loads any needed libraries, defines a runtime function, and ends
with a call to lambdr::start_lambda(). The runtime function does not
have to be defined in this file. We could, for example, source another
script, or load a package and set a loaded function as the runtime
function in the subsequent call to r2lambda::build_lambda (see below).
We save this script to a file and record the path:
r_code <- "
  library(tidytuesdayR)

  tidytuesday_lambda <- function(date = NULL) {
    if (is.null(date))
      date <- Sys.Date()
    most_recent_tuesday <- tidytuesdayR::last_tuesday(date = date)
    tt_data <- tidytuesdayR::tt_load(x = most_recent_tuesday)
    data_names <- names(tt_data)
    data_list <- lapply(data_names, function(x) tt_data[[x]])
    return(data_list)
  }

  lambdr::start_lambda()
"

tmpfile <- tempfile(pattern = "ttlambda_", fileext = ".R")
write(x = r_code, file = tmpfile)
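Before building an image from such a script, it can be useful to confirm that the file parses and that its last expression is the call to lambdr::start_lambda(). The following is a minimal sketch using only base R. The toy script and the variable names are made up for illustration, and because parse() does not evaluate anything, lambdr does not need to be installed.

```r
# Write a small stand-in runtime script to a temporary file
script <- tempfile(pattern = "demo_lambda_", fileext = ".R")
writeLines(c(
  "hello <- function(name = 'world') paste('hello', name)",
  "lambdr::start_lambda()"
), script)

# parse() stops with an error if the script has a syntax problem
exprs <- parse(file = script)

# Check that the final top-level expression starts the Lambda runtime
last_call <- deparse(exprs[[length(exprs)]])
ends_with_start <- identical(last_call, "lambdr::start_lambda()")
print(ends_with_start)
```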
Build, test, and deploy the lambda function
1. Build
- We set the runtime_function argument to the name of the function we wish the docker container to run when invoked. In this case, this is tidytuesday_lambda. This adds a CMD instruction to the Dockerfile.
- We set the runtime_path argument to the path where we stored the script defining our runtime function.
- We set the dependencies argument to c("tidytuesdayR") because we need to have the tidytuesdayR package installed within the docker container if we are to download the dataset. This step adds a RUN instruction to the Dockerfile that calls install.packages to install tidytuesdayR from CRAN.
- Finally, the tag argument sets the name of our Lambda function, which we’ll use later to test and invoke the function. The tag argument also becomes the name of the folder that r2lambda will create to build the image. This folder will have two files, Dockerfile and runtime.R. runtime.R is our script from runtime_path, renamed before it is copied into the docker image with a COPY instruction.
runtime_function <- "tidytuesday_lambda"
runtime_path <- tmpfile
dependencies <- "tidytuesdayR"

r2lambda::build_lambda(
  tag = "tidytuesday3",
  runtime_function = runtime_function,
  runtime_path = runtime_path,
  dependencies = dependencies
)
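To make the mapping from arguments to Dockerfile instructions concrete, here is a rough sketch that assembles the three kinds of instructions mentioned above. The helper sketch_instructions is hypothetical, and the exact lines r2lambda writes (including the base image, which is omitted here) are an assumption and may well differ.

```r
# Hypothetical illustration only: this is NOT r2lambda's code, and the
# real Dockerfile it generates may look different.
sketch_instructions <- function(runtime_function, dependencies) {
  # Quote each package name for use inside install.packages(c(...))
  pkgs <- paste0('"', dependencies, '"', collapse = ", ")
  c(
    # dependencies -> a RUN instruction that installs packages from CRAN
    sprintf("RUN Rscript -e 'install.packages(c(%s))'", pkgs),
    # runtime_path -> the script is copied into the image as runtime.R
    "COPY runtime.R runtime.R",
    # runtime_function -> a CMD instruction naming the handler
    sprintf('CMD ["%s"]', runtime_function)
  )
}

instructions <- sketch_instructions(
  runtime_function = "tidytuesday_lambda",
  dependencies = "tidytuesdayR"
)
writeLines(instructions)
```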
2. Test
To make sure our Lambda docker container works as intended, we start it
locally and invoke it to test the response. The response is a list of
three elements:
response <- r2lambda::test_lambda(tag = "tidytuesday3", payload = list(date = Sys.Date()))
- status, should be 0 if the test worked,
- stdout, the standard output stream of the invocation, and
- stderr, the standard error stream of the invocation
stdout and stderr are raw vectors that we need to parse, for example:
rawToChar(response$stdout)
If the stdout slot of the response returns the correct output of our
function, we are good to deploy to AWS.
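The checks described in this step can be wrapped in a small helper. This is a minimal sketch: check_response is a hypothetical function, and fake_response is a simulated list with the three elements described above, so nothing here runs docker or r2lambda.

```r
# Simulated response with the same three elements as the local test
fake_response <- list(
  status = 0L,
  stdout = charToRaw('[{"a":1}]'),
  stderr = raw(0)
)

# Hypothetical helper: fail loudly on a non-zero status, otherwise
# return the decoded standard output as a character string
check_response <- function(response) {
  if (response$status != 0) {
    stop("Lambda test failed: ", rawToChar(response$stderr))
  }
  rawToChar(response$stdout)
}

decoded <- check_response(fake_response)
print(decoded)
```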
3. Deploy
The deploy step is simple, in that all we need to do is specify the
name (tag) of the Lambda function we wish to push to AWS ECR. The
deploy_lambda function also accepts ..., which are named arguments
ultimately passed on to paws.compute:::lambda_create_function. This is
the function that calls the Lambda API. To see all available arguments,
run ?paws.compute:::lambda_create_function.
The most important arguments are probably Timeout and MemorySize, which
set the time our function will be allowed to run (in seconds) and the
amount of memory it will have available (in MB). In many cases it will
make sense to increase the defaults of 3 seconds and 128 MB.
r2lambda::deploy_lambda(tag = "tidytuesday3", Timeout = 30)
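When raising these values, it may help to check them against AWS Lambda's limits first (at most 900 seconds of run time, and between 128 and 10240 MB of memory). The helper check_lambda_settings below is hypothetical and not part of r2lambda or paws; it only sketches such a check.

```r
# Hypothetical helper: validate Timeout (seconds) and MemorySize (MB)
# against AWS Lambda's limits before deploying
check_lambda_settings <- function(Timeout = 3, MemorySize = 128) {
  stopifnot(
    is.numeric(Timeout), Timeout >= 1, Timeout <= 900,
    is.numeric(MemorySize), MemorySize >= 128, MemorySize <= 10240
  )
  list(Timeout = as.integer(Timeout), MemorySize = as.integer(MemorySize))
}

settings <- check_lambda_settings(Timeout = 30, MemorySize = 512)
str(settings)

# The validated values could then be forwarded through `...`, e.g.
# r2lambda::deploy_lambda(tag = "tidytuesday3", Timeout = 30, MemorySize = 512)
# (commented out because it requires AWS credentials and a built image)
```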
4. Invoke
If all goes well, our function should now be available on the cloud
awaiting requests. We can invoke it from R using invoke_lambda. The
arguments are:
- function_name – the name of the function
- invocation_type – typically RequestResponse
- include_logs – whether to print the logs of the run on the console
- payload – a named list with arguments sent to the runtime_function. In this case, the runtime function, tidytuesday_lambda, has a single argument, date, so the corresponding list is list(date = Sys.Date()). As our function can be called without any argument, we can also send an empty list as the payload.
response <- r2lambda::invoke_lambda(
  function_name = "tidytuesday3",
  invocation_type = "RequestResponse",
  payload = list(),
  include_logs = TRUE
)
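Conceptually, the named list in payload plays the role of the runtime function's arguments. The sketch below imitates that mapping locally with do.call and a stand-in function. demo_runtime is hypothetical, nothing here contacts AWS, and how lambdr dispatches the event internally is an assumption.

```r
# Stand-in for the runtime function: it only reports which date it
# would use, instead of downloading any data
demo_runtime <- function(date = NULL) {
  if (is.null(date)) "default: today" else paste("requested:", date)
}

# An empty payload corresponds to calling the function with no arguments
no_date <- do.call(demo_runtime, list())

# A named element in the payload is matched to the argument of that name
with_date <- do.call(demo_runtime, list(date = "2022-02-02"))

print(no_date)
print(with_date)
```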
Just like in the local test, the response payload comes as a raw vector that needs to be parsed into a data.frame: