Managed automation, monitoring, and emergency response
for major parts of the production infrastructure: distributed
storage, job scheduling, a distributed locking service, and
automated machine management
Troubleshot system-level issues across 300K+ servers
Assisted in testing, qualification, and rollout automation of new
Linux kernels across Google's server fleet
Produced training materials and exercises, and mentored
new team members
Worked as part of a small worldwide team responsible for
some of the most critical servers in the Fixed Income,
Currency, and Commodities (FICC) division
Provided long-term system engineering and 24/7 operations
support for Solaris, Linux, and NetApp servers
Maintained the most widely deployed Linux distribution in
the firm