Replication in OpenSearch

Objectives

By the end of this lab, students should be able to:

Set up a 2-node cluster using OpenSearch without root access
Understand how distributed systems form clusters
Observe and reason about:
- Primary shards
- Replica shards
- Data distribution and fault tolerance

Prerequisites

Two Linux environments, either:
- Two separate machines, or
- One machine with two separate processes simulating nodes
Access only to home directory (no sudo/root)
Java runtime available (or ability to use a bundled runtime)
Basic familiarity with:
- Running background processes
- Editing configuration files
- Sending HTTP requests (curl or equivalent)

Part 1.1: Installing OpenSearch Locally

Run OpenSearch without system-wide installation.

Steps:

Download the OpenSearch distribution archive (tarball, not a package-manager install) from the OpenSearch downloads page.
Extract it into your home directory.
Inspect the directory structure:
- config/
- data/
- logs/
- bin/
Identify:
- The main configuration file
- The startup script

Part 1.2: Installing OpenSearch Dashboards Locally

Download the OpenSearch Dashboards distribution archive (tarball, not a package-manager install) from the OpenSearch downloads page.
Extract it into your home directory.

Part 2: Configuring Node 1

Run the first node of your cluster.

Steps:

Duplicate the extracted folder into a separate directory for Node 1.
Follow the installation guide.
Edit the configuration file (opensearch.yml):
- Assign a cluster name
- Assign a node name
```
cluster.name: ds614-lab
node.name: node-1
node.roles: [cluster_manager, data]
network.host: [IP_ADDRESS] # placeholder: "localhost" if both nodes are on the same machine, the private IP if they are on different machines
http.port: 9200
transport.port: 9300
```
Configure:
- Network host (local machine)
- Port (default or custom)
Configure discovery settings:
- Define this node as part of a cluster (even if it is a single node for now)
Set a data directory inside your home folder.
Start the node.
Verify:
- The node is running at its HTTP address (the network.host and http.port set above)
- It responds to HTTP requests
- Cluster health is available
- The node appears in the nodes listing
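
The verification steps above come down to three HTTP requests. The following is a minimal Python sketch of them, assuming the node is reachable over plain HTTP at http://localhost:9200 (the http.port from the sample configuration) with no authentication in front of it. The helper names (`fetch_json`, `summarize_health`, `run_checks`) are illustrative and not part of OpenSearch; the three paths are standard OpenSearch REST endpoints.

```python
import json
from urllib.request import urlopen

# Assumption: Node 1 from the sample configuration, reachable over plain HTTP.
BASE_URL = "http://localhost:9200"

# Standard OpenSearch REST paths used for the three checks.
CHECKS = {
    "node info": "/",
    "cluster health": "/_cluster/health",
    "node list": "/_cat/nodes?format=json",
}


def fetch_json(base_url, path, timeout=5):
    """GET base_url + path and decode the JSON response body."""
    with urlopen(base_url + path, timeout=timeout) as response:
        return json.load(response)


def summarize_health(health):
    """Reduce a _cluster/health response to the fields this lab cares about."""
    return "{cluster_name}: status={status}, nodes={number_of_nodes}".format(**health)


def run_checks(base_url=BASE_URL):
    """Print the result of each check; requires a running node."""
    print(fetch_json(base_url, CHECKS["node info"])["version"])
    print(summarize_health(fetch_json(base_url, CHECKS["cluster health"])))
    for node in fetch_json(base_url, CHECKS["node list"]):
        print(node["name"], node["ip"])
```

Calling `run_checks()` while the node is running prints the version object, one health summary line, and one line per node. If the call fails with a connection or TLS error, check whether your node expects HTTPS or credentials.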

Part 2.1: Configuring OpenSearch Dashboards

Edit the configuration file config/opensearch_dashboards.yml:
- Set opensearch.hosts to point to your OpenSearch node(s)
- Set server.port to a unique port (different from OpenSearch nodes)
- Disable security plugin
```
opensearch_security.enabled: false
```
Start OpenSearch Dashboards
Verify:
- Dashboards is running
- It responds to HTTP requests
- Cluster health is available

Part 3: Configuring Node 2

Join a second node to form a cluster.

Steps:

Create a second copy of the OpenSearch directory for Node 2.
Modify the configuration:
- Different node name
- Different HTTP and transport ports (required when both nodes run on the same machine)
- Same cluster name as Node 1

Configure discovery/seed hosts so that Node 2 knows about Node 1.
Set in each node's config file (opensearch.yml), pointing at the other node's transport port:

discovery.seed_hosts: ["localhost:9300"] # other node's transport port

Also set in each node's config file (opensearch.yml), with the same value on both nodes:

cluster.initial_cluster_manager_nodes: ["node-1", "node-2"] # on both nodes

Ensure:
- Each node has a unique data directory
- The data directories are cleared before the restart
Restart both nodes
Verify:
- Both nodes appear in the same cluster
- Cluster size = 2 nodes
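
One way to see exactly which settings differ between the two nodes is to generate both files from a single template. The sketch below assumes both nodes run on one machine and that Node 2 uses HTTP port 9201 and transport port 9301. Those ports, the data paths under /home/student, and the `render_node_config` helper are hypothetical choices for illustration, not values prescribed by this lab.

```python
def render_node_config(node_name, http_port, transport_port, seed_hosts, data_path):
    """Return opensearch.yml text for one node of the ds614-lab cluster."""
    seeds = ", ".join('"{}"'.format(host) for host in seed_hosts)
    lines = [
        "cluster.name: ds614-lab",                     # identical on every node
        "node.name: {}".format(node_name),             # unique per node
        "node.roles: [cluster_manager, data]",
        "network.host: localhost",                     # assumption: same machine
        "http.port: {}".format(http_port),             # unique per node on one machine
        "transport.port: {}".format(transport_port),   # unique per node on one machine
        "path.data: {}".format(data_path),             # unique per node
        "discovery.seed_hosts: [{}]".format(seeds),
        'cluster.initial_cluster_manager_nodes: ["node-1", "node-2"]',
    ]
    return "\n".join(lines) + "\n"


# Each node lists the other node's transport port as its seed host.
# The ports for node-2 and both data paths are hypothetical.
node1 = render_node_config("node-1", 9200, 9300, ["localhost:9301"], "/home/student/node-1/data")
node2 = render_node_config("node-2", 9201, 9301, ["localhost:9300"], "/home/student/node-2/data")
print(node2)
```

Comparing `node1` and `node2` line by line shows that only the node name, the two ports, the data path, and the seed host change; the cluster name and the initial cluster manager list stay the same.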

Checkpoint 1 (Conceptual)

Answer:

What information allows nodes to discover each other?
Why must node names and ports differ?
What happens if both nodes use the same data directory?

Part 4: Creating an Index with Controlled Sharding

Understand how data is partitioned.

Steps

Create an index with:
- A fixed number of primary shards (e.g., 2 or more)
- 0 replicas
Insert sample documents (roughly 20 to 50 records)
Query:
- Index statistics
- Shard allocation
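
The index creation and document insertion steps can be sketched as two request bodies: a JSON body for `PUT /<index>` and a newline-delimited body for `POST /_bulk`. This is a minimal Python illustration that only builds the bodies. The index name `lab-index`, the sample records, and the helper names are hypothetical.

```python
import json


def index_body(primary_shards, replicas):
    """Request body for PUT /<index> with explicit shard settings."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,
                "number_of_replicas": replicas,
            }
        }
    }


def bulk_body(index_name, documents):
    """NDJSON body for POST /_bulk: an action line, then a source line, per document."""
    lines = []
    for doc_id, document in enumerate(documents, start=1):
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(doc_id)}}))
        lines.append(json.dumps(document))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


# Hypothetical sample data: 30 small records.
documents = [{"student": "s{:02d}".format(i), "score": 50 + i} for i in range(30)]
payload = bulk_body("lab-index", documents)
print(json.dumps(index_body(2, 0)))
print(payload.count("\n"), "lines in bulk payload")
```

Send the first body with `PUT /lab-index` and the second with `POST /_bulk` using the header `Content-Type: application/x-ndjson`. Index statistics are then available from `GET /lab-index/_stats`, and shard allocation from `GET /_cat/shards/lab-index?v`.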

Observation Tasks

How many shards are created?
Where are they located?
Does each node store data?
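
To answer these questions from data rather than by eye, the rows returned by `GET /_cat/shards/<index>?format=json` can be grouped by node. The sketch below assumes rows with the `shard`, `prirep`, and `node` fields that the cat shards API reports. The sample rows are hypothetical, so compare the result with what your own cluster returns.

```python
from collections import defaultdict


def shards_by_node(rows):
    """Group _cat/shards rows by node name; unassigned shards go under 'UNASSIGNED'."""
    placement = defaultdict(list)
    for row in rows:
        node = row.get("node") or "UNASSIGNED"
        kind = "primary" if row["prirep"] == "p" else "replica"
        placement[node].append("{} {}".format(kind, row["shard"]))
    return {node: sorted(shards) for node, shards in placement.items()}


# Hypothetical rows shaped like GET /_cat/shards/lab-index?format=json output.
sample_rows = [
    {"index": "lab-index", "shard": "0", "prirep": "p", "state": "STARTED", "node": "node-1"},
    {"index": "lab-index", "shard": "1", "prirep": "p", "state": "STARTED", "node": "node-2"},
]
print(shards_by_node(sample_rows))
```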

Part 5: Adding Replication

Goal

Understand redundancy and fault tolerance.

Steps

Update the index:
- Increase number of replica shards
Observe:
- Redistribution of shards across nodes
Check:
- Cluster health status
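
Unlike the primary shard count, the replica count can be changed on a live index with `PUT /<index>/_settings`. The sketch below builds that request body and computes how many shard copies the index should hold afterwards. The helper names are illustrative.

```python
def replica_update_body(replicas):
    """Request body for PUT /<index>/_settings that changes the replica count."""
    return {"index": {"number_of_replicas": replicas}}


def total_shard_copies(primary_shards, replicas):
    """Each primary has `replicas` copies, so the index holds primaries * (1 + replicas) shards."""
    return primary_shards * (1 + replicas)


print(replica_update_body(1))
print(total_shard_copies(2, 0), "->", total_shard_copies(2, 1))
```

For an index with 2 primary shards, going from 0 to 1 replica doubles the number of shard copies from 2 to 4, which is the count to look for in the shard listing.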

Observation Tasks

What changes after adding replicas?
Are replicas placed on the same node as primaries?
Why or why not?

Checkpoint 2 (Conceptual)

Answer:

What is the difference between a primary shard and a replica shard?
Why does replication improve availability but increase storage cost?
What happens if the replication factor exceeds the number of nodes?

Part 6: Failure Simulation

Goal

Observe fault tolerance behavior.

Steps

Stop one node manually
Observe:
- Cluster health
- Shard status
Query data:
- Check if reads still work
Restart the stopped node
Observe:
- Rebalancing
- Recovery process
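
While a node is down, it helps to relate the reported health color to the shard listing. The sketch below derives the color from `_cat/shards` rows using the usual definition: red when any primary is not active, yellow when all primaries are active but some replica is not, green otherwise. The helper name and both sample listings are hypothetical, so check them against what your cluster actually reports during the experiment.

```python
ACTIVE_STATES = {"STARTED", "RELOCATING"}  # shard states that count as active


def health_from_shards(rows):
    """Derive green/yellow/red from _cat/shards rows for one index."""
    inactive = [row for row in rows if row["state"] not in ACTIVE_STATES]
    if any(row["prirep"] == "p" for row in inactive):
        return "red"     # at least one primary shard is not active
    if inactive:
        return "yellow"  # all primaries active, at least one replica is not
    return "green"       # every primary and replica is active


# Hypothetical rows: 2 primaries with 1 replica each on a 2-node cluster.
both_nodes_up = [
    {"shard": "0", "prirep": "p", "state": "STARTED", "node": "node-1"},
    {"shard": "0", "prirep": "r", "state": "STARTED", "node": "node-2"},
    {"shard": "1", "prirep": "p", "state": "STARTED", "node": "node-2"},
    {"shard": "1", "prirep": "r", "state": "STARTED", "node": "node-1"},
]
# Hypothetical rows after one node stops and its replicas have nowhere to go.
one_node_down = [
    {"shard": "0", "prirep": "p", "state": "STARTED", "node": "node-1"},
    {"shard": "0", "prirep": "r", "state": "UNASSIGNED", "node": None},
    {"shard": "1", "prirep": "p", "state": "STARTED", "node": "node-1"},
    {"shard": "1", "prirep": "r", "state": "UNASSIGNED", "node": None},
]
print(health_from_shards(both_nodes_up), health_from_shards(one_node_down))
```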

Observation Tasks

What happens to primary shards when a node goes down?
How does OpenSearch maintain availability?
What is the cluster health during failure?

Checkpoint 3 (Deep Thinking)

Can a cluster function with 1 node and replicas configured?
Why does OpenSearch avoid placing replicas on the same node?
What is the minimum number of nodes needed for true fault tolerance?

Part 7: Optional Exploration (Advanced)

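Change the number of shards and observe redistribution (the primary shard count cannot be changed through a settings update on an existing index, so create a new index with a different count)
Explore:
- Shard rebalancing when nodes join/leave
- Cluster health states: green, yellow, red
Try uneven shard configurations and analyze imbalance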
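
One simple way to quantify imbalance is to count assigned shards per node and take the difference between the busiest and the emptiest node. This is an ad hoc measure chosen for this sketch, not a metric that OpenSearch reports. The `shard_spread` helper and the sample rows are hypothetical.

```python
from collections import Counter


def shard_spread(rows, node_names):
    """Return (per-node shard counts, max - min) for assigned shards."""
    counts = Counter({name: 0 for name in node_names})  # keep nodes that hold no shards
    counts.update(row["node"] for row in rows if row.get("node"))
    per_node = dict(counts)
    return per_node, max(per_node.values()) - min(per_node.values())


# Hypothetical placement: 3 primaries, 0 replicas, 2 nodes.
rows = [
    {"shard": "0", "prirep": "p", "node": "node-1"},
    {"shard": "1", "prirep": "p", "node": "node-2"},
    {"shard": "2", "prirep": "p", "node": "node-1"},
]
print(shard_spread(rows, ["node-1", "node-2"]))
```

An odd number of shards on two nodes can never reach a spread of 0, which makes it a convenient starting point for the imbalance analysis.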

Submission Requirements

Students must submit:

Screenshots / logs showing:
- 2-node cluster running
- Index creation
- Shard distribution
Short answers to all checkpoints
One-page reflection:
- What surprised you?
- What design trade-offs did you observe?