I had an interesting conversation with my development team and my DBA team about how to identify when to shard. Currently we are using Zabbix to monitor multiple MongoDB servers using the mikoomi Zabbix template and PHP script. If you aren’t leveraging Zabbix for monitoring your infrastructure then you should check it out; it’s got a lot of great features. Anyway, the conversation started with someone suggesting that the memory footprint might be a very important statistic to leverage to determine if it’s time to shard. While I don’t disagree that monitoring memory can be a key indicator, I certainly think there are better ones.
Using mikoomi and Zabbix you get a graph which shows how your data is flushing to disk. Four important metrics are displayed:
- Background Flush Average Time
- Total Background Flush Time In Last Minute
- Number Of Background Flushes In The Last Minute
- Last Background Flush Time
The reason background flush information is so important is that it allows you to determine if the amount of data being updated or added is overwhelming the disk subsystem. As you know, by default MongoDB flushes data to disk every minute, so we need to make sure that the data being flushed can finish flushing within 60 seconds. I’d say there should be concern when background flush average time has reached 30 seconds; at that point either faster disks need to be added to the MongoDB cluster or a new shard should be introduced.
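As a rough illustration of that rule of thumb, here is a minimal Python sketch that classifies flush timings against the 60 second interval. The function name, the status labels, and the extra "critical" level are my own assumptions for the example; the inputs are assumed to be millisecond values such as the `average_ms` and `last_ms` fields that `serverStatus` reports under `backgroundFlushing` on servers that expose that section.

```python
FLUSH_INTERVAL_MS = 60_000  # default flush interval of one minute
WARN_FRACTION = 0.5         # 30 s out of 60 s, the concern level from the text


def flush_status(average_ms, last_ms,
                 interval_ms=FLUSH_INTERVAL_MS,
                 warn_fraction=WARN_FRACTION):
    """Classify background flush timings (hypothetical helper).

    "critical": a flush took as long as the whole interval (assumed level).
    "warning":  the average reached the concern threshold (30 s by default).
    "ok":       anything below that.
    """
    if max(average_ms, last_ms) >= interval_ms:
        return "critical"
    if average_ms >= interval_ms * warn_fraction:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Sample values only, not measurements from a real server.
    print(flush_status(average_ms=1_200, last_ms=900))
    print(flush_status(average_ms=30_000, last_ms=10_000))
```

The same comparison could just as well live in a Zabbix trigger; the sketch only makes the arithmetic explicit.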
The lock and queue metrics are worth watching as well:
- Current Reader Queue Length
- Total Lock Time (microseconds) Last Minute
- Current Writer Queue Length
- Current Total Queue Length
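To make these numbers easier to reason about, here is a small Python sketch that turns the lock time into a share of the minute and checks whether queueing is sustained rather than a one-off spike. Both helper names and the `min_length` parameter are hypothetical, and the post does not prescribe any particular cutoff, so treat the values as placeholders to tune against your own baseline.

```python
MICROSECONDS_PER_MINUTE = 60 * 1_000_000


def lock_fraction(lock_time_us_last_minute):
    """Share of the last minute spent holding the lock (0.0 to 1.0)."""
    return lock_time_us_last_minute / MICROSECONDS_PER_MINUTE


def queue_is_sustained(total_queue_samples, min_length=1):
    """True if every sample shows at least min_length queued operations.

    total_queue_samples is a list of "Current Total Queue Length" readings
    taken over consecutive polling intervals; an empty list is not sustained.
    """
    return bool(total_queue_samples) and all(
        q >= min_length for q in total_queue_samples
    )


if __name__ == "__main__":
    # Sample values only, not measurements from a real server.
    print(lock_fraction(15_000_000))
    print(queue_is_sustained([3, 5, 2]))
```

Looking at several consecutive samples is a deliberate choice here: a queue that appears in a single reading is often just a burst, while one that never drains suggests the server cannot keep up.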
While this certainly isn’t an exhaustive list of things to monitor when deciding whether to shard, these are definitely some of the metrics that, in my experience, show that a new shard should be added. In some cases it’s sufficient to add memory or SSD drives to your solution to avoid having to implement sharding.