Understanding MongoDB Indexes

Onyancha Brian Henry

Improving performance is one of the core tasks of database management, and indexing is one of the most effective strategies for it. An index improves throughput by giving queries faster access to data: it minimizes the number of disk accesses required to process a query. Without a suitable index, MongoDB must perform a full collection scan, that is, examine every document in the collection in order to select those that match the query. This takes a long time when the collection holds many documents. In a nutshell, indexes support efficient execution of queries.

MongoDB Indexes

Since we expect to store many documents in a MongoDB collection, we need a way to find matching documents without traversing all of them. An index is a separate data structure that stores the value of a specific field, or set of fields, for each document, ordered by the value of that field. This ordering supports efficient equality matches and range-based query operations. Indexes are defined at the collection level and can be created on any field or embedded field of the documents in the collection.
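
The effect of that ordering can be sketched in plain JavaScript. This is a toy model, not MongoDB's actual storage engine: the sample documents, the Age field and the helper names (buildIndex, lowerBound, rangeQuery, fullScan) are all assumptions made for illustration.

```javascript
// Hypothetical sample documents (not from the article).
const people = [
  { _id: 1, Age: 31 },
  { _id: 2, Age: 24 },
  { _id: 3, Age: 45 },
  { _id: 4, Age: 19 },
  { _id: 5, Age: 38 },
];

// Toy index: one [value, _id] entry per document, sorted by value.
function buildIndex(docs, field) {
  return docs.map((doc) => [doc[field], doc._id]).sort((a, b) => a[0] - b[0]);
}

// Binary search for the position of the first entry with value >= target.
function lowerBound(index, target) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Range query: jump to the first match, then walk forward until past max.
function rangeQuery(index, min, max) {
  const ids = [];
  for (let i = lowerBound(index, min); i < index.length && index[i][0] <= max; i++) {
    ids.push(index[i][1]);
  }
  return ids;
}

// Without an index, every document has to be examined.
function fullScan(docs, field, min, max) {
  return docs.filter((d) => d[field] >= min && d[field] <= max).map((d) => d._id);
}

const ageIndex = buildIndex(people, "Age");
console.log(rangeQuery(ageIndex, 20, 40)); // [ 2, 1, 5 ] (in index order)
console.log(fullScan(people, "Age", 20, 40)); // [ 1, 2, 5 ] (in insertion order)
```

Both calls find the same documents, but the indexed version only touches the entries inside the requested range, while the scan touches every document.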

When you create a collection, MongoDB creates a unique index on the _id field, and it assigns an _id value to any inserted document that does not specify one. This prevents the same document from being inserted more than once in that collection. In addition, for a sharded cluster, it is advisable to use the _id field as part of the shard key; otherwise, the application must ensure that _id values are unique across shards in order to avoid errors.

Creating an Index for a Collection

Assuming you have inserted some data in your collection and you want to index one of its fields, you can use the createIndex method to achieve this.

Let’s say you have this document:

{
    _id: 1,
    Name: "Sepp Maier",
    Country: "Germany"
}

We can create a descending index on the Name field with:

db.collection.createIndex({Name: -1})

This method creates the index only if an index with the same specification does not already exist.

Types of Indexes in MongoDB

MongoDB stores different kinds of data and serves different kinds of queries, so it provides several types of indexes to support them.

  1. Single Field

    You can index a single field of a document in ascending or descending order, just like in the example above. Besides, you can create an index on an embedded document as a whole, for example:

    {
        _id: "xyz",
        Contact: {
            email: "[email protected]",
            phone: "+420 78342823"
        },
        Name: "Sergio"
    }

    The Contact field is an embedded document, so we can create an ascending index on it with the command:

    db.collection.createIndex({ Contact: 1})

    A query that matches the whole embedded document exactly, including the order of its fields, can then fetch the document like this:

    db.collection.find({
        Contact: {
            email: "[email protected]",
            phone: "+420 78342823"
        }
    })

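    In MongoDB versions before 4.2, a best practice is to build the index in the background when a large amount of data is involved, since the application still needs to access the data while the index is being built. From version 4.2 onwards, the background option is ignored because index builds no longer block the collection for the whole duration of the build.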

  2. Compound Index

    Compound indexes are often used to facilitate the sort operation within a query and support queries that match on multiple fields. The syntax for creating a compound index is:

    db.collection.createIndex( { <field1>: <type1>, <field2>: <type2>, ... } )

    For example, to create a compound index for the sample data below:

    {
        _id: "1",
        Name: "Tom",
        Age: 24,
        Score: "80"
    }

    db.collection.createIndex({ Age: 1, Score: -1 })

    Considerations:

    • A compound index can contain at most 32 fields.
    • The value given for each field defines the sort order of that field in the index, i.e. 1 is ascending and -1 is descending.
    • Before MongoDB 4.4, a compound index cannot include a hashed field; from 4.4 onwards it can include at most one.
    • The order of fields listed in a compound index is important. The sorting will be done in accordance with the order of the fields.
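
To make the last point concrete, here is a minimal sketch of how a specification such as { Age: 1, Score: -1 } orders index entries. It is a plain JavaScript model rather than MongoDB code; the helper compareBySpec and the sample documents are hypothetical, and Score is assumed to be numeric here.

```javascript
// Build a comparator from an index specification such as { Age: 1, Score: -1 }.
// Fields are compared in the order they are listed; 1 = ascending, -1 = descending.
function compareBySpec(spec) {
  const fields = Object.entries(spec);
  return (a, b) => {
    for (const [field, direction] of fields) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0;
  };
}

// Hypothetical sample documents.
const players = [
  { _id: 1, Age: 24, Score: 80 },
  { _id: 2, Age: 30, Score: 95 },
  { _id: 3, Age: 24, Score: 92 },
  { _id: 4, Age: 30, Score: 71 },
];

// Return the _id values in the order the given specification would store them.
function indexOrder(docs, spec) {
  return [...docs].sort(compareBySpec(spec)).map((d) => d._id);
}

console.log(indexOrder(players, { Age: 1, Score: -1 })); // [ 3, 1, 2, 4 ]
console.log(indexOrder(players, { Score: -1, Age: 1 })); // [ 2, 3, 1, 4 ]
```

The same two fields listed in a different order produce a different ordering, which is why a compound index on Age then Score is not interchangeable with one on Score then Age.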

  3. Multikey Index

    At some point, you may have fields that store array content. When such a field is indexed, a separate index entry is created for every element of the array. This helps a query select documents that contain arrays by matching on one or more elements of those arrays. MongoDB does this automatically, so there is no need to explicitly specify the multikey type. From version 3.4, MongoDB tracks which indexed fields cause an index to be a multikey index. With this tracking, the query engine can use tighter index bounds.
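
The "one entry per array element" idea can be sketched as follows. This is a simplified JavaScript model under stated assumptions, not MongoDB internals: the functions multikeyEntries and findByElement and the sample documents are hypothetical.

```javascript
// Toy multikey index: emit one [element, _id] entry per array element.
// A non-array value yields a single entry, as with an ordinary index.
function multikeyEntries(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const value = doc[field];
    const elements = Array.isArray(value) ? value : [value];
    for (const element of elements) entries.push([element, doc._id]);
  }
  // Sort by element value, then by _id, to keep the entries ordered.
  return entries.sort((a, b) => a[0] - b[0] || a[1] - b[1]);
}

// Find the documents whose indexed field contains the given element.
function findByElement(entries, element) {
  return entries.filter(([value]) => value === element).map(([, id]) => id);
}

// Hypothetical sample documents.
const results = [
  { _id: 1, scores: [30, 60] },
  { _id: 2, scores: [60, 90] },
  { _id: 3, scores: 45 },
];

const scoreEntries = multikeyEntries(results, "scores");
console.log(scoreEntries.length); // 5 entries for 3 documents
console.log(findByElement(scoreEntries, 60)); // [ 1, 2 ]
```

Three documents produce five entries here, which also hints at why multikey indexes on large arrays can grow quickly.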

    Limitations of Multikey Index

    • In a compound multikey index, each document can have at most one indexed field whose value is an array. For example, given the document
      { _id: 1, nums: [ 1, 2 ], scores: [ 30, 60 ] }
      you cannot create the compound multikey index
      { nums: 1, scores: 1 }
      because both nums and scores are arrays.
    • If the compound multikey index already exists, you cannot insert a document that violates this restriction. Suppose the collection contains
      { _id: 1, nums: 1, scores: [ 30, 60 ] }
      { _id: 2, nums: [ 1, 2 ], scores: 30 }
      The index { nums: 1, scores: 1 } can be created here, since each document has at most one array among the indexed fields. Once the index exists, an attempt to insert a document in which both nums and scores are arrays will fail.
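
The restriction amounts to a simple per-document check, sketched below in plain JavaScript. The function name violatesMultikeyRule is hypothetical and the check only models the rule described above; it is not how MongoDB implements it.

```javascript
// Returns true if more than one of the indexed fields holds an array in this
// document, which is the case a compound multikey index does not allow.
function violatesMultikeyRule(doc, indexedFields) {
  const arrayFields = indexedFields.filter((field) => Array.isArray(doc[field]));
  return arrayFields.length > 1;
}

const indexedFields = ["nums", "scores"]; // models the index { nums: 1, scores: 1 }

console.log(violatesMultikeyRule({ _id: 1, nums: 1, scores: [30, 60] }, indexedFields)); // false
console.log(violatesMultikeyRule({ _id: 2, nums: [1, 2], scores: 30 }, indexedFields)); // false
console.log(violatesMultikeyRule({ _id: 3, nums: [1, 2], scores: [30, 60] }, indexedFields)); // true
```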

  4. Text Indexes

    Text indexes are used to support search queries on string content in a collection. They do not store language-specific stop words (e.g. "the", "a", "or"). A collection can have at most one text index. To create a text index:

    db.collection.createIndex({ Name: "text" })

    You can also index multiple fields, for example:

    db.collection.createIndex({
        Name: "text",
        place: "text"
    })
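
As a rough illustration of what a text index over several fields holds, the sketch below builds a tiny inverted index in plain JavaScript and drops stop words. It is a deliberately simplified model: the stop-word list, the tokenizer and the sample documents are assumptions, and real text indexes also apply language-specific stemming, which is omitted here.

```javascript
// Assumed, very short stop-word list for the sake of the example.
const STOP_WORDS = new Set(["the", "a", "or", "and", "of"]);

// Lowercase the text, split on non-alphanumeric characters, drop stop words.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word && !STOP_WORDS.has(word));
}

// Toy inverted index: term -> set of _id values of documents containing it.
function buildTextIndex(docs, fields) {
  const index = new Map();
  for (const doc of docs) {
    for (const field of fields) {
      for (const term of tokenize(String(doc[field] ?? ""))) {
        if (!index.has(term)) index.set(term, new Set());
        index.get(term).add(doc._id);
      }
    }
  }
  return index;
}

// Hypothetical sample documents using the Name and place fields.
const venues = [
  { _id: 1, Name: "The Old Mill", place: "Prague" },
  { _id: 2, Name: "Mill or Bakery", place: "The Hague" },
];

const textIndex = buildTextIndex(venues, ["Name", "place"]);
console.log([...textIndex.get("mill")]); // [ 1, 2 ]
console.log(textIndex.has("the")); // false, stop words are not stored
```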

    A compound index can combine a text index key with ascending/descending index keys, but:

    • All text index keys must be listed adjacently in the index specification document when creating a compound text index.
    • A compound text index cannot include any other special index types, such as multikey index fields.
    • If the compound text index includes keys that precede the text index key, then to perform a $text search the query predicate must include equality match conditions on those preceding keys.

  5. Hashed Indexes

    Sharding is one of the techniques MongoDB uses for horizontal scaling. Hashed sharding relies on hashed indexes, which index the hash of a field's value instead of the value itself. Hashing distributes values more randomly across the range of the index, but hashed indexes only support equality matches and cannot support range-based queries.
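
The equality-only behaviour follows from how a hashed lookup works, as the sketch below suggests. It is a plain JavaScript model: toyHash is a stand-in (a 32-bit FNV-1a over the value's string form), not the hash function MongoDB uses, and the userId field and helper names are hypothetical.

```javascript
// Stand-in hash function (32-bit FNV-1a). Assumption: MongoDB uses its own
// hashing; this one only serves to illustrate the idea.
function toyHash(value) {
  let h = 0x811c9dc5;
  for (const ch of String(value)) {
    h ^= ch.charCodeAt(0);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Toy hashed index: hash of the field value -> documents with that hash.
function buildHashedIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    const key = toyHash(doc[field]);
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc);
  }
  return index;
}

// Equality match: hash the wanted value and look up that single bucket.
// The filter guards against two different values sharing a hash.
function findEqual(index, field, value) {
  const bucket = index.get(toyHash(value)) || [];
  return bucket.filter((doc) => doc[field] === value).map((doc) => doc._id);
}

// Hypothetical sample documents.
const users = [
  { _id: 1, userId: 100 },
  { _id: 2, userId: 101 },
  { _id: 3, userId: 102 },
  { _id: 4, userId: 101 },
];

const hashedIndex = buildHashedIndex(users, "userId");
console.log(findEqual(hashedIndex, "userId", 101)); // [ 2, 4 ]
```

A range such as userId between 100 and 102 cannot be answered the same way, because neighbouring values are not stored next to each other once they are hashed; the only option in this model would be to hash and look up each candidate value separately.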

Overall Operational Considerations for Indexes

  • Each index requires at least 8 kB of data space.
  • Each active index consumes disk space and memory. This usage can be significant and should be tracked in capacity planning.
  • For collections with a high read-to-write ratio, additional indexes improve performance and do not affect un-indexed read operations.

Limitations of Using Indexes

  • Adding an index has a negative performance impact on write operations, especially for collections with a high write-to-read ratio. Indexes are expensive in that each insert must also update every index on the collection.
  • In versions that enforce an index key limit (before MongoDB 4.2), MongoDB will not create an index if the index entry for an existing document would exceed the limit, and it will reject inserts and updates on an indexed collection that would produce such an entry.
  • For existing sharded collections, chunk migration will fail if the chunk contains a document whose indexed field produces an index entry that exceeds the index key limit.
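
The write cost mentioned in the first point can be sketched with a small counting model. This is an illustrative JavaScript class, not MongoDB behaviour measured in practice: ToyCollection and its fields are hypothetical, the default _id index is left out, and index entries are simply appended where a real index would place them in sorted position.

```javascript
// Toy collection that counts how many index entries each insert has to write.
class ToyCollection {
  constructor(indexedFields) {
    this.documents = [];
    this.indexes = new Map(indexedFields.map((field) => [field, []]));
    this.indexWrites = 0;
  }

  insert(doc) {
    this.documents.push(doc);
    // Every index on the collection receives a new entry for this document.
    for (const [field, entries] of this.indexes) {
      entries.push([doc[field], doc._id]);
      this.indexWrites += 1;
    }
  }
}

const unindexed = new ToyCollection([]);
const indexed = new ToyCollection(["Age", "Score"]);

for (const doc of [
  { _id: 1, Age: 24, Score: 80 },
  { _id: 2, Age: 30, Score: 95 },
  { _id: 3, Age: 27, Score: 71 },
]) {
  unindexed.insert(doc);
  indexed.insert(doc);
}

console.log(unindexed.indexWrites); // 0
console.log(indexed.indexWrites); // 6 (3 inserts x 2 indexes)
```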

Conclusion

There are many ways of improving MongoDB performance, indexing being one of them. Indexing speeds up query operations by reducing the number of documents that need to be scanned, which lowers the latency of data retrieval. However, there are some considerations to weigh before deciding to use a specific type of index. Collections with a high read-to-write ratio tend to benefit from indexes more than collections with a high write-to-read ratio.
