databricks_aws_utils.rds

RdsUtils Objects

class RdsUtils(DatabrickAWSUtils)

AWS RDS Integration Utils.

pg8000 does not work properly in Databricks notebooks, so AWS Data Wrangler does not work either. This module abstracts executing queries against RDS through the Spark session, using AWS Secrets Manager to retrieve the RDS instance credentials.

Arguments:

  • spark SparkSession - spark session
  • secret_id str - AWS Secrets Manager Secret Id
  • aws_region str, optional - AWS region, defaults to us-east-1
  • iam_role str, optional - IAM Role ARN; if specified, the role is assumed to perform the AWS API calls
  • aws_access_key_id str, optional - Temporary AWS Access Key Id
  • aws_secret_access_key str, optional - Temporary AWS Secret Access Key
  • aws_session_token str, optional - Temporary AWS Session Token
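Internally, the class needs to resolve the secret id into connection properties. A minimal sketch of that step, assuming a plain boto3 Secrets Manager lookup (the function name and the injectable client parameter are illustrative, not part of this module's API):

```python
import json
from typing import Optional


def fetch_rds_secret(secret_id: str, region: str = "us-east-1", client=None) -> dict:
    """Fetch and parse an RDS credentials secret from AWS Secrets Manager.

    `client` is injectable for testing; by default a boto3 client is created.
    """
    if client is None:
        import boto3  # deferred so the helper is importable without boto3 installed
        client = boto3.client("secretsmanager", region_name=region)
    response = client.get_secret_value(SecretId=secret_id)
    # The secret is stored as a JSON string with host/port/username/password/dbname/engine
    return json.loads(response["SecretString"])
```

When an IAM role or temporary credentials are passed to the constructor, the boto3 client would be built from those credentials instead of the default chain.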

Features:

  • Execute queries against a database over Spark JDBC, with credentials retrieved from AWS Secrets Manager

read_query

def read_query(query: str, dbname: Optional[str] = None) -> DataFrame

Runs the query against the database configured in the AWS Secrets Manager entry and returns the result as a Spark DataFrame.

The AWS Secrets Manager entry needs to have these properties:

  • host
  • port
  • username
  • password
  • dbname
  • engine

Example:

{
   "host": "myrds-dns",
   "port": 5432,
   "username": "myuser",
   "password": "mysecretpassword",
   "dbname": "mydatabase",
   "engine": "postgresql"
}
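A secret with these properties maps naturally onto a Spark JDBC connection URL. A sketch of that mapping (the helper name is illustrative; the module's actual URL construction may differ):

```python
from typing import Optional


def jdbc_url_from_secret(secret: dict, dbname: Optional[str] = None) -> str:
    """Build a Spark JDBC URL, e.g. jdbc:postgresql://myrds-dns:5432/mydatabase,
    from a Secrets Manager entry like the example above.

    `dbname` overrides the secret's own dbname, mirroring read_query's
    optional argument.
    """
    engine = secret["engine"]              # e.g. "postgresql"
    database = dbname or secret["dbname"]  # optional override
    return f"jdbc:{engine}://{secret['host']}:{secret['port']}/{database}"
```

The username and password from the secret would then be passed as Spark JDBC reader options rather than embedded in the URL.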

Arguments:

  • query str - SQL Query
  • dbname str, optional - Overrides the dbname from the AWS Secrets Manager entry

Returns:

  • DataFrame - Spark DataFrame