Elasticsearch is a powerful search and analytics engine, but optimizing it for production requires understanding indexing strategies, query patterns, and cluster configuration. This guide covers essential optimization techniques.

Cluster architecture

Node roles

Configure nodes with specific roles:

# Master node
node.roles: [master]

# Data node
node.roles: [data]

# Ingest node
node.roles: [ingest]

# Coordinating-only node
node.roles: []  # Empty list: the node only routes requests and merges results

Shard strategy

Primary shards: Set at index creation; changing the count later requires reindexing (or the shrink/split APIs)

PUT /my_index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

Best practices:

  • Aim for 10-50GB per shard
  • Calculate: shards = (total_data_size / 50GB)
  • Consider future growth (2-3x current size)
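The sizing rule above can be sketched as a small helper (a minimal sketch; the 50GB target and growth factor are the guideline numbers from this guide, and estimateShardCount is a hypothetical name, not an API):

```javascript
// Estimate how many primary shards to create for an index,
// following the 10-50GB-per-shard guideline with headroom for growth.
function estimateShardCount(currentSizeGB, { growthFactor = 2, targetShardGB = 50 } = {}) {
  const projectedGB = currentSizeGB * growthFactor;
  return Math.max(1, Math.ceil(projectedGB / targetShardGB));
}

console.log(estimateShardCount(120));                      // 120GB today, 2x growth -> 5 shards
console.log(estimateShardCount(120, { growthFactor: 3 })); // 3x growth -> 8 shards
```

Remember that the result is fixed at index creation, so err on the side of the higher growth estimate.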

Indexing optimization

Bulk indexing

Use bulk API for efficient indexing:

// Node.js example
const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });

async function bulkIndex(documents) {
  // Alternate action and source entries, as the _bulk API expects
  const body = documents.flatMap(doc => [
    { index: { _index: 'my_index' } },
    doc
  ]);

  const response = await client.bulk({ body });
  // The bulk API reports per-item failures even when the HTTP call
  // succeeds, so check the errors flag instead of assuming success
  // (with the 7.x client, the result is under response.body)
  if (response.errors) {
    const failed = response.items.filter(item => item.index && item.index.error);
    throw new Error(`${failed.length} documents failed to index`);
  }
  return response;
}

Best practices:

  • Batch size: 1,000-5,000 documents (or roughly 5-15MB per request)
  • Monitor the bulk response for per-item errors (the errors flag)
  • Set refresh_interval to -1 during large imports, then restore it
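The batching advice can be sketched as a helper that splits documents into bulk-sized request bodies (a sketch; toBulkBatches is a hypothetical name, and each batch would then be sent via client.bulk as in the example above):

```javascript
// Split documents into bulk request bodies of at most batchSize documents,
// alternating action and source entries as the _bulk API expects.
function toBulkBatches(documents, indexName, batchSize = 1000) {
  const batches = [];
  for (let i = 0; i < documents.length; i += batchSize) {
    const slice = documents.slice(i, i + batchSize);
    batches.push(slice.flatMap(doc => [{ index: { _index: indexName } }, doc]));
  }
  return batches;
}

const docs = Array.from({ length: 2500 }, (_, i) => ({ id: i }));
const batches = toBulkBatches(docs, 'my_index');
console.log(batches.length);    // 3 batches (1000 + 1000 + 500 documents)
console.log(batches[0].length); // 2000 entries: one action + one source per document
```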

Index settings

Optimize index settings for your use case:

PUT /my_index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "refresh_interval": "30s",  // Reduce refresh frequency
    "index.translog.durability": "async",  // For better write performance
    "index.translog.sync_interval": "5s"
  }
}

Mapping optimization

Define explicit mappings:

PUT /my_index/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "standard",
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    },
    "created_at": {
      "type": "date"
    },
    "status": {
      "type": "keyword"  // Use keyword for exact matches
    }
  }
}

Query optimization

Use filters instead of queries

Filter clauses skip relevance scoring and can be cached, so they are typically faster:

GET /my_index/_search
{
  "query": {
    "bool": {
      "filter": [  // Use filter for exact matches
        { "term": { "status": "active" } },
        { "range": { "created_at": { "gte": "2024-01-01" } } }
      ],
      "must": [  // Use must for relevance scoring
        { "match": { "title": "search term" } }
      ]
    }
  }
}

Avoid expensive queries

Bad: Wildcard queries

{
  "query": {
    "wildcard": {
      "title": "*search*"  // Very slow
    }
  }
}

Good: Use text search

{
  "query": {
    "match": {
      "title": "search"
    }
  }
}

Limit result size

{
  "size": 20,  // Limit results
  "from": 0,
  "query": { ... }
}

Use source filtering

Only return needed fields:

{
  "_source": ["title", "created_at"],  // Only return these fields
  "query": { ... }
}
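Putting the query-side tips together, the request body can be built in one place and reused (a sketch; buildSearch is a hypothetical helper, not a client API):

```javascript
// Build a search body combining filters (cached, unscored), a scored
// match clause, a bounded result size, and source filtering.
function buildSearch({ queryText, status, size = 20, from = 0, fields = [] }) {
  const searchBody = {
    size,
    from,
    query: {
      bool: {
        filter: status ? [{ term: { status } }] : [],
        must: queryText ? [{ match: { title: queryText } }] : []
      }
    }
  };
  if (fields.length > 0) searchBody._source = fields; // only return the listed fields
  return searchBody;
}

const searchRequest = buildSearch({
  queryText: 'search term',
  status: 'active',
  fields: ['title', 'created_at']
});
console.log(JSON.stringify(searchRequest, null, 2));
```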

Cluster configuration

JVM heap size

Set the heap to about 50% of available RAM, and keep it below ~32GB so compressed object pointers stay enabled:

# jvm.options
-Xms16g
-Xmx16g

Why 50%?

  • Remaining memory for OS cache
  • Lucene uses off-heap memory
  • Prevents swapping
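The heap rule can be expressed as a tiny helper (a sketch; suggestedHeapGB is a hypothetical name, and 31GB is used as a conservative cap to stay under the compressed-oops threshold):

```javascript
// Suggest a JVM heap size: half of physical RAM, capped below the
// compressed-oops threshold (~32GB; 31GB is a safe ceiling).
function suggestedHeapGB(totalRamGB) {
  return Math.min(31, Math.floor(totalRamGB / 2));
}

console.log(suggestedHeapGB(32));  // 16 -> -Xms16g -Xmx16g
console.log(suggestedHeapGB(128)); // capped at 31
```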

Thread pools

Monitor thread pools and tune them only if you see rejections (the defaults are usually sensible):

GET /_nodes/thread_pool

# Adjust if needed in elasticsearch.yml
thread_pool:
  write:
    size: 4
    queue_size: 200
  search:
    size: 8
    queue_size: 1000

Disk I/O optimization

# Use SSD for data nodes
# Disable swap
bootstrap.memory_lock: true

# Multiple data paths (deprecated since 7.13; prefer one path per node)
path.data: ["/data1", "/data2", "/data3"]

Monitoring and maintenance

Key metrics

Monitor these metrics:

  1. Cluster health: GET /_cluster/health
  2. Node stats: GET /_nodes/stats
  3. Index stats: GET /_stats
  4. Pending tasks: GET /_cluster/pending_tasks
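As a sketch of how the health endpoint can feed an alert, here is a hypothetical classifier over a GET /_cluster/health response (the field names match the real API; the pending-tasks threshold is illustrative):

```javascript
// Map a cluster health response to an alert level.
// 'red' means some primary shards are unassigned; 'yellow' means
// replicas are unassigned (data available but not fully redundant).
function alertLevel(health) {
  if (health.status === 'red') return 'critical';
  if (health.status === 'yellow') return 'warning';
  if (health.number_of_pending_tasks > 100) return 'warning'; // illustrative threshold
  return 'ok';
}

console.log(alertLevel({ status: 'green', number_of_pending_tasks: 0 })); // 'ok'
console.log(alertLevel({ status: 'red', number_of_pending_tasks: 0 }));   // 'critical'
```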

Index lifecycle management

Use ILM for automatic index management:

PUT /_ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50GB",
            "max_age": "30d"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          },
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

Force merge

Merge segments to reduce per-search overhead, but only on indices that are no longer being written to (merged segments from an actively written index cannot be cleaned up normally):

POST /my_index/_forcemerge?max_num_segments=1

When to use:

  • After large data imports, once the index is read-only
  • Before archiving
  • When an index that no longer receives writes has many small segments

Performance tuning checklist

Indexing performance

  • Use bulk API with optimal batch size
  • Set refresh_interval appropriately
  • Disable refresh during large imports
  • Use async translog for better write performance
  • Monitor indexing rate and adjust

Query performance

  • Use filters for exact matches
  • Avoid wildcard queries
  • Limit result size
  • Use source filtering
  • Cache frequently used filters

Cluster configuration

  • Set JVM heap to 50% of RAM (max 32GB)
  • Configure node roles appropriately
  • Set appropriate number of shards
  • Use SSD for data nodes
  • Disable swap (bootstrap.memory_lock)

Monitoring

  • Set up cluster health monitoring
  • Monitor node stats regularly
  • Track query performance
  • Set up alerts for cluster issues
  • Review slow query logs

Common issues and solutions

Too many shards

Problem: Cluster has thousands of shards

Solution:

  • Reduce shards per index
  • Use index templates with ILM
  • Consolidate small indices
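A common sizing check here is the long-standing rule of thumb of at most ~20 shards per GB of heap on a data node (a guideline, not a hard limit; newer Elasticsearch versions enforce their own per-node shard limits). A sketch, with shardBudget as a hypothetical name:

```javascript
// Flag nodes holding more shards than the ~20-shards-per-GB-heap guideline.
function shardBudget(heapGB, shardsOnNode, shardsPerGB = 20) {
  const budget = heapGB * shardsPerGB;
  return { budget, overBy: Math.max(0, shardsOnNode - budget) };
}

console.log(shardBudget(16, 500)); // { budget: 320, overBy: 180 }
```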

Hot spots

Problem: Some nodes handle more load

Solution:

  • Rebalance shards: POST /_cluster/reroute
  • Check shard allocation
  • Ensure even data distribution

Slow queries

Problem: Queries take too long

Solution:

  • Review query patterns
  • Check field mappings (e.g. keyword fields for term filters)
  • Use filters instead of scoring queries
  • Limit result size
  • Profile queries with the Profile API ("profile": true in the search body)

Conclusion

Elasticsearch optimization requires understanding your data patterns, query requirements, and cluster architecture. Start with proper index design, optimize queries, and monitor cluster health regularly. Remember: Measure first, optimize based on data.

Continuous monitoring and adjustment are key to maintaining optimal Elasticsearch performance in production.