THE ROLE

Design modern, HIPAA-compliant enterprise data platforms powering analytics, AI/ML workloads, and real-time intelligence for health plans, PACE organizations, and pharma clients. You'll own the full data stack — from lakehouse architecture to streaming pipelines handling millions of daily pharmacy claims.

KEY RESPONSIBILITIES

  • Architect data lakes, warehouses, and lakehouse platforms on AWS
  • Design scalable ETL/ELT pipelines for 800K+ daily claim records
  • Implement dimensional modeling (star/snowflake) for healthcare analytics
  • Build real-time streaming pipelines: Kafka (MSK), Flink, and Kinesis
  • Define data governance: lineage, PII masking, catalog, and access policies
  • Collaborate with AI/ML teams on feature stores and training data pipelines

WHAT WE'RE LOOKING FOR

  • 8+ years in data engineering, 3+ years as a data architect
  • Deep experience with AWS data services: S3, Glue, Redshift, Athena, Lake Formation
  • Streaming architectures built on Apache Spark (PySpark), Kafka, and Flink
  • Modern data stack: dbt, Airflow, Snowflake or Databricks
  • Healthcare data (X12 EDI, FHIR, claims, pharmacy) a strong plus

CORE TECHNOLOGIES

  • AWS S3 · Glue · Redshift · Athena
  • Apache Spark · PySpark · Flink
  • Kafka · AWS MSK · Kinesis
  • dbt · Airflow · Snowflake · Databricks
  • Terraform · Apache Iceberg · Delta Lake

REQUIRED SKILLS

  • Glue · Redshift · Healthcare data · Athena · Kafka · AWS S3 · Snowflake