First Touch Penalty On AWS Cloud

(First published in March 2018; some information may no longer be correct.)
A couple of weeks ago I had a discussion about AWS RDS with one of my colleagues, and he mentioned an unexpected IO problem during a migration. It happened during the production cutover, when they switched from the old on-prem environment to a freshly restored database on RDS. The migration itself is out of scope for today’s topic; we are going to focus on the unexpected IO problem. They should have had plenty of IO bandwidth, and everything had been fine when they tested it beforehand, but many of the queries against the database ran extremely slowly for around 30 to 40 minutes, and even after that they observed sporadic spikes in the number of sessions waiting for IO.

After a couple of additional questions, it was more or less clear that they had most likely hit a known problem described in the AWS documentation: the “First touch penalty” on AWS. For this topic, I will use an Oracle RDS database to demonstrate the issue and show how you can prepare for it.