THE ROLE
Design modern, HIPAA-compliant enterprise data platforms powering analytics, AI/ML workloads, and real-time intelligence for health plans, PACE organizations, and pharma clients. You'll own the full data stack — from lakehouse architecture to streaming pipelines handling millions of daily pharmacy claims.
KEY RESPONSIBILITIES
- Architect data lakes, warehouses, and lakehouse platforms on AWS
- Design scalable ETL/ELT pipelines for 800K+ daily claim records
- Implement dimensional modeling (star/snowflake) for healthcare analytics
- Build real-time streaming pipelines with Kafka (MSK), Flink, and Kinesis
- Define data governance: lineage, PII masking, catalog, and access policies
- Collaborate with AI/ML teams on feature stores and training data pipelines
WHAT WE'RE LOOKING FOR
- 8+ years in data engineering, 3+ years as a data architect
- Deep expertise in AWS data services: S3, Glue, Redshift, Athena, Lake Formation
- Experience with Apache Spark (PySpark), Kafka, and Flink streaming architectures
- Modern data stack: dbt, Airflow, Snowflake or Databricks
- Healthcare data experience (X12 EDI, FHIR, claims, pharmacy) is a strong plus
CORE TECHNOLOGIES
- AWS S3 · Glue · Redshift · Athena
- Apache Spark · PySpark · Flink
- Kafka · Amazon MSK · Kinesis
- dbt · Airflow · Snowflake · Databricks
- Terraform · Apache Iceberg · Delta Lake