As I continue to watch the community and see the same questions asked over and over again, I feel inclined to blog about common misconceptions and misunderstandings among new MongoDB users. Lately I see a trend of questions about how MongoDB utilizes memory and memory-mapped files.
What Is A Memory Mapped File
Memory-mapped files are not unique to MongoDB; they are supported by most modern operating systems and used by many run-time environments. The function behind memory-mapped files is mmap(), which is part of the POSIX specification and creates a mapping of a file given a file descriptor, a starting offset, and a length.
In layman’s terms, a memory-mapped file is a segment of virtual memory which has been assigned a direct byte-for-byte correlation with some portion of a file or file-like resource. The important part of this definition is the “some portion” of a file. As you know from reading my blog and the MongoDB documentation, MongoDB creates data files starting at 64MB and growing to as large as 2048MB for storing database data, so it’s important to remember that mmap is capable of mapping only a portion of these files, and that only the pages which are actually touched (the “hot” data set) need to occupy physical memory.
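To make the “some portion” idea concrete, here is a minimal sketch using Python's standard mmap module. It is not how MongoDB itself does it; the helper names (read_window, demo) and the 1 MiB temporary file are stand-ins I made up for illustration. It maps only a small window of a file, rounding the offset down because mmap requires an aligned starting offset.

```python
import mmap
import os
import tempfile


def read_window(path, offset, length):
    """Map only the part of the file that covers [offset, offset + length)."""
    # The mapping offset must be a multiple of the allocation granularity,
    # so round down and remember how far into the mapping our window starts.
    granularity = mmap.ALLOCATIONGRANULARITY
    aligned = (offset // granularity) * granularity
    delta = offset - aligned
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), delta + length,
                       access=mmap.ACCESS_READ, offset=aligned) as m:
            return m[delta:delta + length]


def demo():
    # Hypothetical stand-in for a data file: 1 MiB of zeros with a marker inside.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * (1024 * 1024))
        tmp.seek(700_000)
        tmp.write(b"hot data")
        path = tmp.name
    try:
        return read_window(path, 700_000, 8)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Only the pages backing that small window are ever mapped here; the rest of the file is never touched by the process.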
How MongoDB Uses Memory Mapped Files
First and foremost (if it wasn’t clear from the previous paragraph), MongoDB leaves memory management up to the operating system. This means the operating system’s virtual memory manager is in charge of caching. As the MongoDB documentation points out, there are several implications of leveraging the operating system’s virtual memory manager:
- There is no redundancy between the file system cache and the database cache; they are one and the same. In Linux, as files are accessed they are pulled into the file system cache (also known as the page cache or buffer cache). This is the case not only with MongoDB but with other applications as well, and it is especially true of file servers running Linux as a base OS.
- MongoDB can utilize all free memory on the server for cache space automatically without any configuration of cache size or changes to the hugepages setting in sysctl.conf (we all know what a pain that can be).
- Virtual memory size and resident size will appear to be very large for the mongod process. This is a common source of confusion for many people in the community. It’s important to remember that virtual memory size will be just larger than the total size of the data files that are open and mapped, while resident size will vary depending on the amount of memory not used by other processes on the machine.
Monitoring Memory Used By MongoDB
One of the most effective operating-system-level commands you can use to check how much memory is being used for memory-mapped files is the free command:
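If you want to see the virtual versus resident split for yourself on Linux, the kernel exposes both numbers in /proc/<pid>/status as VmSize and VmRSS. The following is a rough sketch; the function names are my own and the sample values in the usage note are made up.

```python
def parse_vm_status(status_text):
    """Return (virtual_mb, resident_mb) from the text of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # Values look like "   8388608 kB"; convert kB to MB.
            fields[key] = int(value.split()[0]) / 1024
    return fields.get("VmSize"), fields.get("VmRSS")


def process_memory(pid):
    """Read the live numbers for a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_status(f.read())
```

For example, feeding it a made-up status snippet with VmSize of 8388608 kB and VmRSS of 2097152 kB yields 8192 MB virtual and 2048 MB resident, which is exactly the kind of gap described above.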
skot@stump:~$ free -tm
             total       used       free     shared    buffers     cached
Mem:          3962       3602        359          0        411       2652
-/+ buffers/cache:        538       3423
Swap:         1491         52       1439
Total:        5454       3655       1799
As you can see from the example, 2652MB of memory is being used as cache, which is where the pages of memory-mapped files live. If this were a network-based file server, the cached size could represent recently accessed content being stored on the Linux server; however, this is an example from a MongoDB server, so there is a good chance that nearly 100% of the 2652MB of cached memory is MongoDB data.
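If you would rather have a script do the arithmetic, here is a small sketch that pulls the cached figure out of the free output shown earlier and expresses it as a share of total memory. It assumes the older column layout from my example (with separate buffers and cached columns); newer versions of free print different columns, so treat the parsing as illustrative. The function name is my own.

```python
def cached_share(free_output):
    """Return (cached_mb, percent_of_total) from old-style `free -m` output."""
    lines = free_output.strip().splitlines()
    header = lines[0].split()
    # Take the numeric columns of the "Mem:" row and pair them with the header.
    mem = next(line for line in lines if line.startswith("Mem:")).split()[1:]
    row = dict(zip(header, map(int, mem)))
    return row["cached"], round(100 * row["cached"] / row["total"], 1)


SAMPLE = """\
             total       used       free     shared    buffers     cached
Mem:          3962       3602        359          0        411       2652
-/+ buffers/cache:        538       3423
Swap:         1491         52       1439
Total:        5454       3655       1799
"""

if __name__ == "__main__":
    print(cached_share(SAMPLE))
```

On the numbers from my server this works out to roughly two thirds of physical memory being used as cache.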
Another thing to be aware of is that having a low amount of “free” memory doesn’t necessarily mean there is a problem. As data is accessed from the MongoDB database it will be “cached” in memory as part of a memory-mapped file. Linux wants to leverage as much memory as possible to make sure disk I/O doesn’t slow down access to recently queried data. The important statistic to keep an eye on is “Memory: Page Faults/Minute”, as I point out in my blog post “Mongodb When To Shard”, the caveat being when the MongoDB instance is first started. The reason “Memory: Page Faults/Minute” might be high when a MongoDB instance is first booted up (or when a new system is nominated as master) is that no data has been paged into memory yet, and there is a period of time when MongoDB must pull the “hot” data into memory.
By and large, virtual memory size should be viewed as a misleading statistic, especially where journaling is concerned. Virtual memory includes the memory-mapped files, and when journaling is enabled the database files are mapped twice, leading to a much higher virtual memory size. Also be aware that in general your virtual memory size can be significantly larger than the actual memory installed on the MongoDB server.
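Page faults are normally reported as an ever-growing counter, so the per-minute figure has to be derived from two samples. Here is a minimal sketch of that arithmetic, including the warm-up caveat. Where the counter comes from is up to you (I believe serverStatus reports one under extra_info.page_faults on Linux, but treat that as an assumption), and the threshold and warm-up window are hypothetical numbers you would tune yourself.

```python
def faults_per_minute(count_before, count_after, seconds_elapsed):
    """Convert two samples of a cumulative page-fault counter into a rate."""
    if seconds_elapsed <= 0:
        raise ValueError("seconds_elapsed must be positive")
    return (count_after - count_before) * 60 / seconds_elapsed


def should_alert(rate, uptime_seconds, threshold=100, warmup_seconds=600):
    """Ignore high rates while the instance is still pulling hot data in.

    threshold and warmup_seconds are made-up defaults, not recommendations.
    """
    if uptime_seconds < warmup_seconds:
        return False
    return rate > threshold
```

For instance, 600 new faults over a two-minute window is a rate of 300 per minute, which this sketch would ignore a minute after startup but flag an hour later.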
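The journaling effect is simple arithmetic, sketched below. This is only a rough lower bound under my own simplifying assumption that the mapped data files dominate; it ignores the heap, thread stacks, and other overhead. The function name and the numbers in the usage note are hypothetical.

```python
def expected_virtual_mb(mapped_mb, journaling):
    """Rough lower bound on virtual size: data files are mapped twice with journaling."""
    return mapped_mb * 2 if journaling else mapped_mb


def exceeds_installed_ram(mapped_mb, journaling, installed_ram_mb):
    """True when the estimate is larger than physical memory, which is normal."""
    return expected_virtual_mb(mapped_mb, journaling) > installed_ram_mb
```

With 10240MB of mapped data files and journaling on, the estimate is 20480MB of virtual memory, which is already far beyond a server with 4096MB of RAM and still nothing to worry about.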
How Much Memory Do I Need?
This is a great question and one that requires much thought about the overall implementation of MongoDB with your application. The general rule is that all of your indexes plus your hot data set should reside in memory. For instance, let’s say you run a site like digg.com (I have no idea if digg.com uses MongoDB or how they use it). By and large your front page is probably going to receive 70% of your visitors, while “new stories” are probably going to receive 25% of your visitors, and the other 5% will come from Google searches which turn up content users are interested in. This means that only a small percentage of your overall stories are actually being read on a day-to-day basis, which probably means that after indexes you only need to keep less than 5% of your total database content in memory. That could mean your active memory footprint is only 48GB while your total data set runs into the terabytes.
Once I was at a MongoDB conference and it was said that the sweet spot for MongoDB memory was 96GB per MongoDB server in a replica set. I am not one to argue with that statement; however, that has never been my personal experience. In my personal experience the sweet spot is between 24GB and 48GB of memory per MongoDB server in a replica set. If more memory is needed to keep the active data set hot, then shards should be added for the extra capacity. Most of my MongoDB experience, however, comes from an application where the hot data set was nearly 100% of the entire database, the read/write ratio was nearly 50/50, and disk I/O issues (even with SSD) started once the MongoDB instance needed to use more than 36GB of RAM. I could see how certain applications, especially read-heavy ones, would leverage memory more efficiently with 96GB, especially if the changed data was less than 5% of that 96GB.
Hopefully this gives you a good idea of how MongoDB utilizes memory, as well as which statistics to watch as your MongoDB usage begins to increase. Remember, don’t get alarmed if your “free” memory is low and your virtual memory is high; primarily you want to be concerned with how often MongoDB goes to disk to access frequently requested data.
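The rule of thumb above (indexes plus hot data) can be written down as a one-line estimate. This is a back-of-the-envelope sketch, not a sizing tool; the function name and the example numbers (8GB of indexes, 2000GB of data, 2% of it hot) are hypothetical and chosen only to land near the 48GB figure.

```python
def working_set_gb(index_gb, data_gb, hot_fraction):
    """Estimate the memory needed: all indexes plus the hot share of the data."""
    if not 0 <= hot_fraction <= 1:
        raise ValueError("hot_fraction must be between 0 and 1")
    return index_gb + data_gb * hot_fraction


if __name__ == "__main__":
    # Hypothetical site: 8GB of indexes, 2000GB of stories, 2% read day to day.
    print(working_set_gb(8, 2000, 0.02))
```

The hard part in practice is the hot fraction, which you can only really learn by watching how your application reads its data.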
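The "add shards when the hot set outgrows the sweet spot" reasoning is a ceiling division. The sketch below assumes, simplistically, that the hot data spreads evenly across shards and that all of a server's RAM budget is available for it; the function name and example numbers are my own.

```python
import math


def shards_needed(working_set_gb, ram_per_server_gb):
    """Smallest number of shards whose combined RAM budget covers the working set."""
    if ram_per_server_gb <= 0:
        raise ValueError("ram_per_server_gb must be positive")
    return max(1, math.ceil(working_set_gb / ram_per_server_gb))
```

For example, a 100GB working set with a 36GB per-server budget comes out at 3 shards, while a 24GB working set on 48GB servers needs only one.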