Brad Clark ([email protected])

[email protected]	GitHub:bdashrad
+1-978-846-1533	LinkedIn

I lead teams that focus on maximizing the time engineers and stakeholders spend solving business problems and minimizing the time they spend on the technical details of implementing and operating those solutions. I accomplish this through careful balance: when to accrue technical debt and when to pay it back, when to build and when to buy, when to question and when to commit, and when to put business needs over technical correctness. Through this nuanced and thoughtful approach, I empower teams to focus on core problem-solving, enhancing overall productivity and achieving organizational goals.

Summary

13+ years of combined experience in leadership and platform engineering; built engineering teams from scratch to as many as 6 engineers
Maintained 99.98% service uptime by implementing best practices, observability tooling, and CI/CD improvements
Able to quickly switch from a high-level business-centric view of a situation to a deep technical dive as required
Open-source maintainer and contributor

Skills

Management: Agile methodologies (Scrum, Kanban); interviewing and hiring individual contributors and peer managers; coaching and mentoring engineers; building cross-team relationships with both technical teams, such as developers, and business stakeholders, such as marketing, revenue cycle management, and partner relations
Tools: Kubernetes, Terraform, GCP, AWS, Helm, Docker, Ansible, Git, NGINX, Apache HTTP Server, MySQL, PostgreSQL, MongoDB, Linux, BigQuery, dbt, Argo Workflows, Skyvia, Fivetran, Vault, GitHub Actions, Travis CI, Elasticsearch, Logstash, Datadog, Cloudflare, Jira, Doppler, GitHub
Languages: Bash, Python, Ruby, Go, YAML, HCL, regex, Node.js, JavaScript, Markdown
Other: HIPAA, SOC2, PCI-DSS, CI/CD, Capacity Planning, System Design

Experience

Eleanor Health – Remote

Head of Platform and Data Engineering; 8/2023 — Present

Built the data engineering organization from scratch, through hiring and internal development of existing engineers
Managed members of four engineering teams across multiple disciplines: Software Development, Platform Engineering, and Data Engineering. Provided mentorship and coaching while balancing business needs and priorities
Worked closely with business stakeholders, such as the Revenue Cycle, Clinical Outreach, and Partner Relations teams, to identify key projects that have the largest business impact
Served as acting Product Manager for Data Engineering and Platform Engineering, balancing feature work with tech debt
Reduced data processing times from 24-72 hours to approximately 15 minutes by improving pipelines, shortening the time between events and member outreach and thereby improving outcomes
Reduced manual task ingestion work by 98% through automation

Head of Infrastructure; 4/2021 — 8/2023

Hired and built the Infrastructure Engineering team from scratch
Co-authored, designed, and developed our engineering hiring guidelines and process, helping to provide a consistent candidate experience and shorten our hiring cycle, while maintaining quality
Created career ladders and competency matrices to align salary and performance expectations, and provide clear growth paths for team members
Created onboarding and 30/60/90-day goals for engineers, defining a clear path through onboarding to improve the new-hire experience. If a new user has a bad time, it’s a bug.
Achieved a near-zero rate of bad production deployments (99.98% app uptime) via improved observability. Accomplished this by mentoring an engineer to spearhead the project, which led to both increased service reliability and the engineer's professional growth in key areas, such as cross-team collaboration and communication
Authored and implemented a production-readiness process, providing clear guidelines for engineers to launch stable, observable, and maintainable services.
Ensured products and systems met HIPAA compliance by leading architecture and technical discussions and implementations. Helped achieve SOC2 Type 1 compliance by partnering with the CISO and Head of Product Operations to complete the work across various teams, delegating work as required to complete our audit.
Built highly available (over 99.99%) infrastructure to host patient management and member apps. Defined SLOs (Service Level Objectives) and SLIs (Service Level Indicators) to ensure we met our SLAs (Service Level Agreements)
Designed systems to meet disaster recovery goals and compliance (HIPAA, SOC2) requirements.
Decreased page load times by over 50% by identifying areas for improvement and facilitating their implementation

Sun Life (Maxwell Health) – Remote

Associate Director, Senior Infrastructure Engineer; 1/2019 — 4/2021

Rebuilt Infrastructure Engineering team from near-zero to 5 engineers
Advised engineering teams on platform and architectural decisions
Provided technical and professional leadership and mentorship to the Infrastructure Engineering team
Wrote production readiness guidelines, establishing a base set of recommendations for engineers when creating products, and advised development teams on best practices for monitoring, operating, and maintaining stable, reliable services
Maintained a standardized deployment pipeline built with GitHub Actions, Docker, Helm, and multiple Kubernetes environments
Redesigned MongoDB deployment to allow for the automation of most of the manual work required to replace lost nodes or perform maintenance, reducing risk and greatly decreasing recovery time

Maxwell Health – Hybrid, Boston, MA

Infrastructure Engineer; 8/2016 — 1/2019

Led a project to manage all AWS resources as Infrastructure as Code with Terraform, reducing differences between environments, which cut development time and reduced deployment risk.
Helped maintain compliance (HIPAA, NYDFS), improved security posture, and decreased the effort and time required for disaster recovery through auditable infrastructure changes and RBAC (Role-Based Access Control) managed through Terraform.
Designed event message store backed by Elasticsearch, enabling efficient and reliable communication between dozens of microservices, reducing technical debt while providing durability and the ability to replay messages as needed
Reduced deployment time ~90% by migrating microservices from Elastic Beanstalk to Kubernetes
Enhanced developer productivity by building templated Helm charts for Kubernetes
Improved engineer efficiency, cross-team collaboration, and software security by standardizing our CI/CD build pipeline

Curriculum Associates – Boston, MA

Senior Operations Engineer; 5/2016 — 8/2016

Saved ~$10k/month by designing a centralized log management system backed by Logstash and Elasticsearch

Acquia – Hybrid, Boston, MA

Site Reliability Engineer; 8/2014 — 5/2016

Ensured site availability during some of the world’s largest events: The Olympics, The Super Bowl, and more.
Served as the dedicated operations resource for a large enterprise customer
Worked in a high-touch relationship with the customer, including regular calls and on-site visits
Reduced downtime to near-zero for multiple customers by developing auditing and automation tools that enabled capacity planning as needs changed
Maintained high availability infrastructure for customer sites and ensured uptime during high traffic events
Improved platform monitoring and created auditing tools to improve stability
Built tooling that uses metrics to provide proactive notifications of impending issues

Senior Cloud Systems Engineer; 7/2013 — 8/2014

Managed and maintained 13,000+ Linux systems on Amazon EC2.
Redesigned the monitoring system to accommodate 10,000+ systems and 300,000+ services
Lowered MTTR (mean time to resolve) by developing command-line tooling for automating ticket creation
Improved engineer productivity by creating a standardized development environment using Vagrant

PlumChoice – Boston, MA

Linux Systems Administrator; 10/2011 — 7/2013

Centralized configuration management and security updates for Red Hat, CentOS, and Ubuntu servers
Architected system and network monitoring across multiple sites worldwide, using Nagios, Icinga, and Splunk
Acted as company Security Administrator for PCI-DSS certification. Assisted in writing policies and procedures and in reviewing all security incidents

IT Infrastructure Consultant; 2/2011 — 10/2011

Created PC cloning and imaging solution using Clonezilla and DRBL
Designed and implemented escalation procedures for Operations to Development

Education

2001-2002: Computer Science; Worcester Polytechnic Institute (Worcester, MA)
2002-2003: Management Information Systems; UMass Lowell (Lowell, MA)
2003-2006: Management Information Systems; Middlesex Community College (Lowell, MA)
