Replication in OpenSearch
Objectives
By the end of this lab, students should be able to:
-
Set up a 2-node cluster using OpenSearch without root access
-
Understand how distributed systems form clusters
-
Observe and reason about:
-
Primary shards
-
Replica shards
-
Data distribution and fault tolerance
-
Prerequisites
-
Two Linux environments (can be:
-
Two separate machines, or
-
One machine with two separate processes simulating nodes)
-
Access only to home directory (no sudo/root)
-
Java runtime available (or ability to use a bundled runtime)
-
Basic familiarity with:
-
Running background processes
-
Editing configuration files
-
Sending HTTP requests (curl or equivalent)
Part 1.1: Installing OpenSearch Locally
Run OpenSearch without system-wide installation.
Steps:
-
Download the OpenSearch distribution archive (tarball, not package manager) from here.
-
Extract it into your home directory.
-
Inspect the directory structure:
-
config/ -
data/ -
logs/ -
bin/
-
-
Identify:
-
The main configuration file
-
The startup script
-
Part 1.2: Installing OpenSearch Dashboard Locally
- Download the OpenSearch distribution archive (tarball, not package manager) from here.
- Extract it into your home directory.
Part 2: Configuring Node 1
Run the first node of your cluster.
Steps:
- Duplicate the extracted folder into a separate directory for Node 1.
- Follow installation guide
-
Edit configuration:
- Assign a cluster name
- Assign a node name
cluster.name: ds614-lab node.name: node-1 node.roles: [cluster_manager, data] network.host: [IP_ADDRESS] # "localhost" if both nodes on same machine, private ip if on different machines http.port: 9200 transport.port: 9300- Configure:
- Network host (local machine)
- Port (default or custom)
- Configure discovery settings:
- Define this node as part of a cluster (even if single node for now)
- Set a data directory inside your home folder
- Start the node
- Verify:
- Node is running at this url:
- It responds to HTTP requests
- Cluster health is available
- Check nodes url:
Part 2.1: Configuring OpenSearch Dashboard
-
Edit the configuration file
config/opensearch_dashboards.yml:- Set
opensearch.hoststo point to your OpenSearch node(s) - Set
server.portto a unique port (different from OpenSearch nodes) - Disable security plugin
opensearch_security.enabled: false - Set
-
Start OpenSearch Dashboard
- Verify:
- Dashboard is running
- It responds to HTTP requests
- Cluster health is available
Part 3: Configuring Node 2
Join a second node to form a cluster.
Steps:
-
Create a second copy of OpenSearch directory for Node 2
-
Modify configuration:
- Different node name
- Different port
- Same cluster name as Node 1
-
Configure discovery/seed hosts:
- Node 2 should know about Node 1
- Set in each node config file(opensearch.yml)
discovery.seed_hosts: ["localhost:9300"] # other nodes transport port- Set in each node config file(opensearch.yml)
cluster.initial_cluster_manager_nodes: ["node-1", "node-2"] # on both nodes -
Ensure:
- Unique data directory
- Clear data directories
- Restart both nodes
- Verify:
- Both nodes appear in the same cluster
- Cluster size = 2 nodes
Checkpoint 1 (Conceptual)
Answer:
- What information allows nodes to discover each other?
- Why must node names and ports differ?
- What happens if both nodes use the same data directory?
Part 4: Creating an Index with Controlled Sharding
Understand how data is partitioned.
Steps
-
Create an index with:
- A fixed number of primary shards (e.g., 2 or more)
- 0 replicas
-
Insert sample documents (at least 20–50 records)
- Query:
- Index statistics
- Shard allocation
Observation Tasks
- How many shards are created?
- Where are they located?
- Does each node store data?
Part 5: Adding Replication
Goal
Understand redundancy and fault tolerance.
Steps
-
Update the index:
- Increase number of replica shards
-
Observe:
- Redistribution of shards across nodes
-
Check:
- Cluster health status
Observation Tasks
-
What changes after adding replicas?
-
Are replicas placed on the same node as primaries?
-
Why or why not?
Checkpoint 2 (Conceptual)
Answer:
-
Difference between primary shard and replica shard
-
Why replication improves availability but increases storage cost
-
What happens if replication factor > number of nodes?
Part 6: Failure Simulation
Goal
Observe fault tolerance behavior.
Steps
-
Stop one node manually
-
Observe:
-
Cluster health
-
Shard status
-
-
Query data:
- Check if reads still work
-
Restart the stopped node
-
Observe:
-
Rebalancing
-
Recovery process
-
Observation Tasks
-
What happens to primary shards when a node goes down?
-
How does OpenSearch maintain availability?
-
What is the cluster health during failure?
Checkpoint 3 (Deep Thinking)
-
Can a cluster function with:
-
1 node + replicas configured?
-
Why does OpenSearch avoid placing replicas on the same node?
-
What is the minimum number of nodes needed for true fault tolerance?
Part 7: Optional Exploration (Advanced)
-
Change number of shards and observe redistribution
-
Explore:
-
Shard rebalancing when nodes join/leave
-
Cluster health states: green, yellow, red
-
Try uneven shard configurations and analyze imbalance
Submission Requirements
Students must submit:
-
Screenshots / logs showing:
-
2-node cluster running
-
Index creation
-
Shard distribution
-
-
Short answers to all checkpoints
-
One-page reflection:
-
What surprised you?
-
What design trade-offs did you observe?
-