Recently, a few engineers asked me whether we really need back-of-the-envelope estimation in a system design interview. I think it would be helpful to clarify. 1.
Estimations are important because we need them to understand the scale of the system and justify the design. They help answer questions like: 2.
- Do we really need a distributed solution?
- Is a cache layer necessary?
- Should we choose data replication or sharding?
Here is an example of how the estimations shape the design decision. 3.
One interview question is to design a proximity service, and scaling the geospatial index is a key part of it. Here are a few paragraphs we wrote to show why jumping to a sharding design without estimations is a bad idea: 4.
"One common mistake about scaling the geospatial index is to quickly jump to a sharding scheme without considering the actual data size of the table. In our case, the full dataset for the geospatial index table is not large (quadtree index only takes 1.71G memory)." 5.
"The whole geospatial index can easily fit in the working set of a modern database server. However, depending on the read volume, a single database server might not have enough CPU or network bandwidth to service all read requests." 6.
"If that is the case, it will be necessary to spread the read load among multiple database servers." 7.