Brad Clark
I lead teams that focus on maximizing the time engineers and
stakeholders spend on solving business problems and minimizing the time
they need to spend on technical details around implementation and
operation of these solutions. I accomplish this through careful balance:
when to accrue technical debt and when to pay it back, when to build or
buy, when to question and when to commit, and when to put business needs over
technical correctness. Through a nuanced and thoughtful approach, I
empower teams to focus on core problem-solving, enhancing overall
productivity and achieving organizational goals.
Summary
- 13+ years combined experience in leadership and platform
engineering; built engineering teams from scratch up to 6 engineers
- Maintained 99.98% service uptime by implementing best practices,
observability tooling, and CI/CD improvements
- Able to quickly switch from a high-level business-centric view of a
situation to a deep technical dive as required
- Open-source maintainer and contributor
Skills
- Management
-
Agile methodologies (Scrum, Kanban), Interviewing and hiring individual
contributors and peer managers, Coaching and Mentoring Engineers,
Building cross-team relationships with both technical teams, like
developers, and business stakeholders, such as marketing, revenue cycle
management, and partner relations.
- Tools
-
Kubernetes, Terraform, GCP, AWS, Helm, Docker, Ansible, Git, NGINX,
Apache HTTP, MySQL, PostgreSQL, MongoDB, Linux, BigQuery, dbt, Argo
Workflows, Skyvia, Fivetran, Vault, GitHub Actions, Travis CI,
Elasticsearch, Logstash, Datadog, Cloudflare, Jira, Doppler, GitHub
- Languages
-
Bash, Python, Ruby, Go, YAML, HCL, regex, Node.js, JavaScript, Markdown
- Other
-
HIPAA, SOC2, PCI-DSS, Jira, CI/CD, Capacity Planning, System Design
Experience
Eleanor Health – Remote
Head of Platform and Data Engineering; 8/2023 —
Present
- Built the data engineering organization from scratch, through hiring
and internal development of existing engineers
- Managed members of four engineering teams across multiple
disciplines: Software Development, Platform Engineering, and Data
Engineering. Provided mentorship and coaching while balancing business
needs and priorities
- Worked closely with business stakeholders, such as the Revenue
Cycle, Clinical Outreach, and Partner Relations teams, to identify key
projects that have the largest business impact
- Acting Product Manager for Data Engineering and Platform
Engineering, balancing feature work with tech debt
- Reduced data processing times for tasks from 24-72 hours to
approximately 15 minutes by improving pipelines, allowing for improved
outcomes by shortening the time between events and member outreach
- Reduced manual task ingestion work by 98% through automation
Head of Infrastructure; 4/2021 — 8/2023
- Hired and built the Infrastructure Engineering team from
scratch
- Co-authored, designed, and developed our engineering hiring
guidelines and process, helping to provide a consistent candidate
experience and shorten our hiring cycle, while maintaining quality
- Created career ladders and competency matrices to align salary and
performance expectations, and provide clear growth paths for team
members
- Created onboarding and 30/60/90 day goals for engineers, creating a
clearly defined path through onboarding to improve the new hire
experience. If a new user has a bad time, it’s a bug.
- Achieved a near-zero rate of bad production deployments (99.98% app
uptime) via improved observability. This was accomplished through
strategic mentorship of an engineer, guiding them to spearhead this
project, which led to both increased service reliability and
professional growth in key areas, such as cross-team collaboration and
communication
- Authored and implemented a production-readiness process, providing
clear guidelines for engineers to launch stable, observable, and
maintainable services.
- Ensured products and systems met HIPAA compliance by leading
architecture and technical discussions and implementations. Helped
achieve SOC2 Type 1 compliance by partnering with the CISO and Head of
Product Operations to complete the work across various teams, delegating
work as required to complete our audit.
- Built highly available (over 99.99%) infrastructure to host patient
management and member apps. Defined SLOs (Service Level Objectives) and
SLIs (Service Level Indicators) to ensure we met our SLAs (Service Level
Agreements)
- Designed systems to meet disaster recovery goals and compliance
(HIPAA, SOC2) requirements.
- Decreased page load times by over 50% by identifying areas for
improvement and facilitating their implementation
Sun Life (Maxwell Health) –
Remote
Associate Director, Senior Infrastructure Engineer;
1/2019 — 4/2021
- Rebuilt Infrastructure Engineering team from near-zero to 5
engineers
- Advised engineering teams on platform and architectural
decisions
- Provided technical and professional leadership and mentorship to the
Infrastructure Engineering team
- Wrote production readiness guidelines, establishing a base set of
recommendations for engineers when creating products, and advised
development teams on best practices for monitoring, operating, and
maintaining stable, reliable services
- Maintained a standardized deployment pipeline built with GitHub
Actions, Docker, Helm, and multiple Kubernetes environments
- Redesigned MongoDB deployment to allow for the automation of most of
the manual work required to replace lost nodes or perform maintenance,
reducing risk and greatly decreasing recovery time
Maxwell Health – Hybrid,
Boston, MA
Infrastructure Engineer; 8/2016 — 1/2019
- Led project to manage all AWS resources as Infrastructure as Code,
using Terraform, reducing differences between multiple environments,
which ultimately cut down development time and reduced deployment
risk.
- Helped maintain compliance (HIPAA, NYDFS), improved security
posture, and decreased effort and time required for disaster recovery
with auditable infrastructure changes and RBAC (Role-Based Access Control)
managed through Terraform.
- Designed event message store backed by Elasticsearch, enabling
efficient and reliable communication between dozens of microservices,
reducing technical debt while providing durability and the ability to
replay messages as needed
- Reduced deployment time ~90% by migrating microservices from Elastic
Beanstalk to Kubernetes
- Enhanced developer productivity by building templated Helm charts
for Kubernetes
- Improved engineer efficiency, cross-team collaboration, and software
security by standardizing our CI/CD build pipeline
Curriculum Associates – Boston,
MA
Senior Operations Engineer; 5/2016 — 8/2016
- Saved ~$10k/month by designing a centralized log management system
backed by Logstash and Elasticsearch
Acquia – Hybrid, Boston, MA
Site Reliability Engineer; 8/2014 — 5/2016
- Ensured site availability during some of the world’s largest events:
The Olympics, The Super Bowl, and more.
- Dedicated operations resource for a large enterprise customer
- Worked in a high-touch relationship with the customer, including
regular calls and on-site visits
- Reduced downtime to near-zero for multiple customers by developing
tools for auditing and automation to allow for capacity planning as
needs changed
- Maintained high availability infrastructure for customer sites and
ensured uptime during high traffic events
- Improved platform monitoring and created auditing tools to improve
stability
- Built tooling that uses metrics to send proactive notifications of
issues before they occur
Senior Cloud Systems Engineer; 7/2013 — 8/2014
- Managed and maintained 13,000+ Linux Systems on Amazon EC2.
- Redesigned the monitoring system to accommodate 10,000+ systems and
300,000+ services
- Lowered MTTR (mean time to resolve) by developing command-line
tooling for automating ticket creation
- Improved engineer productivity by creating a standardized
development environment using Vagrant
PlumChoice - Boston, MA
Linux Systems Administrator; 10/2011 — 7/2013
- Centralized configuration management and security updates for Red
Hat, CentOS, and Ubuntu Servers
- Architected system and network monitoring across multiple sites
worldwide, using Nagios, Icinga, and Splunk
- Acted as company Security Administrator for PCI-DSS certification.
Assisted in writing policies, procedures, and reviewing all security
incidents
IT Infrastructure Consultant; 2/2011 — 10/2011
- Created PC cloning and imaging solution using Clonezilla and
DRBL
- Designed and implemented escalation procedures for Operations to
Development
Education
- 2001-2002
-
Computer Science; Worcester Polytechnic Institute
(Worcester, MA)
- 2002-2003
-
Management Information Systems; UMass Lowell (Lowell,
MA)
- 2003-2006
-
Management Information Systems; Middlesex Community
College (Lowell, MA)
[email protected] • +1-978-846-1533