Getting Started with Elasticsearch

In this post, I will show you how easy it is to get started with Elasticsearch: we will create indices, ingest documents, search for data and visualize our data with Kibana.

We will deploy our Elasticsearch Cluster using Docker for simplicity.

About Elasticsearch

Elasticsearch is a distributed, RESTful search and analytics engine based on Apache Lucene. It is commonly used for full-text search, log analytics, business analytics and more.

Anatomy of an Elasticsearch Cluster:

  • An Elasticsearch cluster consists of nodes (master and data nodes)
  • Master nodes handle cluster-wide tasks
  • Data nodes host the actual shards and handle data-related operations such as CRUD, search and aggregations
  • Each node holds indices, where an index is a collection of documents
  • Documents contain key/value pairs
  • Indices are split into multiple shards
  • Shards come in two types: primary and replica shards
  • A replica shard is a copy of a primary shard and is used for high availability and redundancy
  • Shards are distributed across the nodes in the cluster
  • A replica shard is never placed on the same node as its primary

When we ingest a document into Elasticsearch, it is addressed by an index, a type and a document id, e.g.:
/index-name/_doc/1

You can roughly think of these in relational database terms as a database, a table and a record.
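
For example, once a document with id 1 exists in an index named index-name (an illustrative placeholder), you can retrieve it with a GET request against that same path:

$ curl http://localhost:9200/index-name/_doc/1?pretty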

Prerequisites

To follow along with this tutorial, you will need curl, docker, and docker-compose.

I will be installing Docker on Ubuntu; if you are using a different operating system, have a look at the Docker documentation for instructions.

Update the package index, add the Docker repository and install Docker:

$ sudo apt update 
$ sudo apt install apt-transport-https ca-certificates curl software-properties-common -y 
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add - 
$ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" 
$ sudo apt update 
$ sudo apt-get install docker-ce -y

Download docker-compose from its GitHub releases page (sudo is needed to write to /usr/local/bin):

$ sudo curl -L "https://github.com/docker/compose/releases/download/1.23.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose

Apply executable permissions to the binary that we downloaded:

$ sudo chmod +x /usr/local/bin/docker-compose
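
Verify that docker-compose is installed and on your PATH:

$ docker-compose --version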

Elasticsearch Cluster Deployment

As mentioned, we will be deploying Kibana and a 3-node Elasticsearch cluster on Docker for demonstration purposes.

Below is the content of the docker-compose.yml file, which needs to be saved in the current working directory.

The content of the file is also available from here

version: '2'
services:
  es0:
    container_name: es0
    image: docker.elastic.co/elasticsearch/elasticsearch:6.8.6
    environment:
      - ES_SKIP_SET_KERNEL_PARAMETERS=true
      - cluster.name=docker-cluster
      - node.name=es0
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms256m -Xmx256m"
      - "discovery.zen.ping.unicast.hosts=es1,es2"
      - "discovery.zen.minimum_master_nodes=2"
      - "xpack.security.enabled=false"
      - "xpack.watcher.enabled=false"
      - "xpack.monitoring.enabled=false"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    mem_limit: 1g
    volumes:
      - esvol0:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - esnet

  es1:
    container_name: es1
    image: docker.elastic.co/elasticsearch/elasticsearch:6.8.6
    environment:
      - ES_SKIP_SET_KERNEL_PARAMETERS=true
      - cluster.name=docker-cluster
      - node.name=es1
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms256m -Xmx256m"
      - "discovery.zen.ping.unicast.hosts=es0,es2"
      - "discovery.zen.minimum_master_nodes=2"
      - "xpack.security.enabled=false"
      - "xpack.watcher.enabled=false"
      - "xpack.monitoring.enabled=false"
    depends_on:
      - es0
    ulimits:
      memlock:
        soft: -1
        hard: -1
    mem_limit: 1g
    volumes:
      - esvol1:/usr/share/elasticsearch/data
    networks:
      - esnet

  es2:
    container_name: es2
    image: docker.elastic.co/elasticsearch/elasticsearch:6.8.6
    environment:
      - ES_SKIP_SET_KERNEL_PARAMETERS=true
      - cluster.name=docker-cluster
      - node.name=es2
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms256m -Xmx256m"
      - "discovery.zen.ping.unicast.hosts=es0,es1"
      - "discovery.zen.minimum_master_nodes=2"
      - "xpack.security.enabled=false"
      - "xpack.watcher.enabled=false"
      - "xpack.monitoring.enabled=false"
    depends_on:
      - es0
    ulimits:
      memlock:
        soft: -1
        hard: -1
    mem_limit: 1g
    volumes:
      - esvol2:/usr/share/elasticsearch/data
    networks:
      - esnet

  kibana:
    container_name: kibana
    image: docker.elastic.co/kibana/kibana:6.8.6
    ports:
      - 5601:5601
    environment:
      - "ELASTICSEARCH_URL=http://es0:9200"
    links:
      - es0
    networks:
      - esnet

networks:
  esnet: {}

volumes:
  esvol0:
    driver: local
  esvol1:
    driver: local
  esvol2:
    driver: local

Deploy the stack with docker-compose:

$ docker-compose --file ./docker-compose.yml up --detach

 Creating es0    ... done 
 Creating es1    ... done
 Creating es2    ... done
 Creating kibana ... done

Once all the containers have been launched, you can get the state of the containers by running:

$ docker-compose --file ./docker-compose.yml ps
 Name               Command               State                Ports
----------------------------------------------------------------------------------
es0      /usr/local/bin/docker-entr ...   Up      0.0.0.0:9200->9200/tcp, 9300/tcp
es1      /usr/local/bin/docker-entr ...   Up      9200/tcp, 9300/tcp
es2      /usr/local/bin/docker-entr ...   Up      9200/tcp, 9300/tcp
kibana   /usr/local/bin/kibana-docker     Up      0.0.0.0:5601->5601/tcp

From the output above, you can see that we are only exposing host port 9200 for Elasticsearch and host port 5601 for Kibana. The 3 Elasticsearch nodes communicate with each other over the internal Docker network, and we only need to reach one of the containers to get a response from the cluster.

We can see that all 4 containers are up, so it's time to start with the fun stuff.
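
As a quick sanity check, you can hit the exposed port; you should get a JSON response with the node name, cluster name and version details (the exact values will differ per deployment):

$ curl http://localhost:9200/?pretty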

Interacting with Elasticsearch

In this section we will interact with the Elasticsearch API by covering the following examples:

  • Cluster Overview
  • View Node Information
  • Ingesting Data into Elasticsearch
  • Searching Data in Elasticsearch
  • Deleting your Index

Cluster Overview

To get the overview of the cluster’s health, we can make a request against the /_cluster/health API:

$ curl http://localhost:9200/_cluster/health?pretty
{
  "cluster_name" : "docker-cluster", 
  "status" : "green", 
  "timed_out" : false, 
  "number_of_nodes" : 3, 
  "number_of_data_nodes" : 3, 
  "active_primary_shards" : 3, 
  "active_shards" : 6, 
  "relocating_shards" : 0, 
  "initializing_shards" : 0, 
  "unassigned_shards" : 0, 
  "delayed_unassigned_shards" : 0, 
  "number_of_pending_tasks" : 0, 
  "number_of_in_flight_fetch" : 0, 
  "task_max_waiting_in_queue_millis" : 0, 
  "active_shards_percent_as_number" : 100.0
}

From the output above, we can see the status indicates “green”, which means that everything works as expected. Elasticsearch reports green, yellow and red cluster statuses.

A yellow status indicates that one or more replica shards are in an unassigned state, while a red status indicates that some or all primary shards are unassigned, which is usually bad as it can mean data loss.

Furthermore, we can see the number of nodes, shards and so on. This is usually a good place to start to get an overall view of your cluster's health.
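
For a more compact, one-line view of the same information, you can also use the /_cat/health API:

$ curl http://localhost:9200/_cat/health?v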

View Node Information

We can get node information of our cluster by looking at the /_cat/nodes API:

$ curl http://localhost:9200/_cat/nodes?v
ip            heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.160.4           58          97   5    0.24    0.23     0.29 mdi       -      es1
192.168.160.2           45          97   5    0.24    0.23     0.29 mdi       -      es0
192.168.160.5           44          97   5    0.24    0.23     0.29 mdi       *      es2

From the output above, we can see information about our nodes, such as JVM heap usage, CPU utilisation, load averages, node roles, the elected master, node names and node IPs.

In the current setup, all 3 nodes act as master-eligible, data and ingest nodes (the mdi value under node.role), and as you can see, node “es2” was elected as master. In a production setup, you want dedicated master nodes. You can read up more on the different node roles here
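
You can also ask the cluster directly which node is the elected master via the /_cat/master API:

$ curl http://localhost:9200/_cat/master?v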

Ingest Data into Elasticsearch

Let’s create a traditional hello-world index:

$ curl -XPUT http://localhost:9200/hello-world
{"acknowledged":true,"shards_acknowledged":true,"index":"hello-world"}

By default, when you create an index with no additional options, your index will consist of 5 primary shards with a replication factor of 1, meaning one replica per primary shard. To verify, we can look at the /_cat/indices API:

$ curl http://localhost:9200/_cat/indices?v
health status index       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   hello-world Rm_8GlyOQy6XnKdDu75oeA   5   1          0            0      2.2kb          1.1kb

You can see the index has been created with 5 primary shards and a replication factor of one, so every primary shard has one replica shard. And as mentioned before, a replica will never reside on the same node as its primary.

We can verify that by looking at the /_cat/shards API:

$ curl http://localhost:9200/_cat/shards/hello-world?v
index       shard prirep state   docs store ip            node
hello-world 1     r      STARTED    0  261b 192.168.160.5 es2
hello-world 1     p      STARTED    0  261b 192.168.160.4 es1
hello-world 3     p      STARTED    0  261b 192.168.160.5 es2
hello-world 3     r      STARTED    0  261b 192.168.160.2 es0
hello-world 4     p      STARTED    0  261b 192.168.160.4 es1
hello-world 4     r      STARTED    0  261b 192.168.160.2 es0
hello-world 2     r      STARTED    0  261b 192.168.160.5 es2
hello-world 2     p      STARTED    0  261b 192.168.160.2 es0
hello-world 0     p      STARTED    0  261b 192.168.160.4 es1
hello-world 0     r      STARTED    0  261b 192.168.160.2 es0
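
If you want different defaults, you can pass settings in the request body when creating the index. Below is a minimal sketch, using a hypothetical index named my-index with 3 primary shards and 1 replica each:

$ curl -H 'Content-Type: application/json' -XPUT http://localhost:9200/my-index -d '
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}'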

Now we will ingest our first document into our index. Let's take this document as an example:

{
  "name": "ruan",
  "age": 30,
  "country": "south africa",
  "occupation": "systems development",
  "hobbies": ['programming', 'games', 'guitar']
}

When we use a POST request, a document id is automatically generated for us; when we use a PUT request, we need to specify the document id ourselves.

Our first document, using a POST request:

$ curl -H 'Content-Type: application/json' \
  -XPOST http://localhost:9200/hello-world/_doc/ -d '
{
  "name": "ruan", 
  "age": 30, 
  "country": "south africa", 
  "occupation": "systems development", 
  "hobbies": ["programming", "games", "guitar"]
}'

Our second document, using a PUT request:

$ curl -H 'Content-Type: application/json' \
  -XPUT http://localhost:9200/hello-world/_doc/steph -d '
{
  "name": "stephanie", 
  "age": 28, 
  "country": "ireland", 
  "occupation": "web designer", 
  "hobbies": ["music", "design", "reading"]
}'

When we view our /_cat/indices API, we can see that we have 2 documents:

$ curl http://localhost:9200/_cat/indices/hello-world?v
health status index       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   hello-world Rm_8GlyOQy6XnKdDu75oeA   5   1          2            0     30.4kb         15.2kb
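
Since we specified the document id for our second document ourselves, we can also retrieve it directly by id:

$ curl http://localhost:9200/hello-world/_doc/steph?pretty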

Let’s make a search request to view all our documents:

$ curl http://localhost:9200/hello-world/_search?pretty
{
  "took" : 90, 
  "timed_out" : false, 
  "_shards" : { 
    "total" : 5, 
    "successful" : 5, 
    "skipped" : 0, 
    "failed" : 0 
  }, 
  "hits" : { 
    "total" : 2, 
    "max_score" : 1.0, 
    "hits" : [ 
      { 
        "_index" : "hello-world", 
        "_type" : "_doc", 
        "_id" : "steph", 
        "_score" : 1.0, 
        "_source" : { 
          "name" : "stephanie", 
          "age" : 28, 
          "country" : "ireland", 
          "occupation" : "web designer", 
          "hobbies" : [ "music", "design", "reading" ] 
        } 
      }, 
      { 
        "_index" : "hello-world", 
        "_type" : "_doc", 
        "_id" : "ywfwpW8BDk0vAQ5FVYL7", 
        "_score" : 1.0, 
        "_source" : { 
          "name" : "ruan", 
          "age" : 30, 
          "country" : "south africa", 
          "occupation" : "systems development", 
          "hobbies" : [ "programming", "games", "guitar" ] 
         } 
       } 
     ] 
   } 
}

From the output above, you can see that the POST request generated a document id automatically. I will ingest a couple more documents into our index so that we can perform some queries:

$ curl -H 'Content-Type: application/json' -XPOST http://localhost:9200/hello-world/_doc/ -d '{"name": "randolph", "age": 32, "country": "brazil", "occupation": "sysadmin", "hobbies": ["coffee", "music", "programming"]}'

$ curl -H 'Content-Type: application/json' -XPOST http://localhost:9200/hello-world/_doc/ -d '{"name": "michael", "age": 24, "country": "italy", "occupation": "artist", "hobbies": ["coffee", "travel", "cars"]}'

$ curl -H 'Content-Type: application/json' -XPOST http://localhost:9200/hello-world/_doc/ -d '{"name": "susan", "age": 25, "country": "america", "occupation": "athlete", "hobbies": ["running", "tv", "sport"]}'

Searching in Elasticsearch

Let’s search for people from America:

$ curl 'http://localhost:9200/hello-world/_search?q=country:america&pretty'
{ 
  ...  
  "hits" : { 
    "total" : 1, 
    "max_score" : 1.0925692, 
    "hits" : [ 
      { 
        "_index" : "hello-world", 
        "_type" : "_doc", 
        "_id" : "zgf7pW8BDk0vAQ5FPoK-", 
        "_score" : 1.0925692, 
        "_source" : { 
          "name" : "susan", 
          "age" : 25, 
          "country" : "america", 
          "occupation" : "athlete", 
          "hobbies" : [ "running", "tv", "sport" ] 
        }
      }
    ]
  }
}

We can also query using a POST request, where the search query is included in the request body. A query similar to the one above looks like this:

$ curl -H 'Content-Type: application/json' \
  -XPOST 'http://localhost:9200/hello-world/_search?pretty' -d '
{
  "query": {
    "match": {
      "country": "america"
    }
  }
}'

We can use wildcard queries:

$ curl -H 'Content-Type: application/json' \
  -XPOST 'http://localhost:9200/hello-world/_search?pretty' -d '
{
  "query": {
    "wildcard": {
      "occupation": "sys*"
    }
  }
}'
{ 
...  
  "hits" : { 
    "total" : 2, 
    "max_score" : 1.0, 
    "hits" : [ 
      { 
        "_index" : "hello-world", 
        "_type" : "_doc", 
        "_id" : "ywfwpW8BDk0vAQ5FVYL7", 
        "_score" : 1.0, 
        "_source" : { 
          "name" : "ruan", 
          "age" : 30, 
          "country" : "south africa", 
          "occupation" : "systems development", 
          "hobbies" : [ "programming", "games", "guitar" ] 
        } 
    }, 
      {
        "_index" : "hello-world",
        "_type" : "_doc",
        "_id" : "zAf5pW8BDk0vAQ5FWYLw",
        "_score" : 1.0,
        "_source" : {
          "name" : "randolph",
          "age" : 32,
          "country" : "brazil",
          "occupation" : "sysadmin",
          "hobbies" : [ "coffee", "music", "programming" ]
        }
      }
    ]
  }
}

Using range queries:

$ curl -H 'Content-Type: application/json' \
  -XPOST 'http://localhost:9200/hello-world/_search?pretty' -d '
{
  "query": {
    "range": {
      "age": {
        "gte": 28,
        "lte": 32
      }
    }
  }
}'
{
  "took" : 41,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "hello-world",
        "_type" : "_doc",
        "_id" : "steph",
        "_score" : 1.0,
        "_source" : {
          "name" : "stephanie",
          "age" : 28,
          "country" : "ireland",
          "occupation" : "web designer",
          "hobbies" : [
            "music",
            "design",
            "reading"
          ]
        }
      },
      {
        "_index" : "hello-world",
        "_type" : "_doc",
        "_id" : "ywfwpW8BDk0vAQ5FVYL7",
        "_score" : 1.0,
        "_source" : {
          "name" : "ruan",
          "age" : 30,
          "country" : "south africa",
          "occupation" : "systems development",
          "hobbies" : [
            "programming",
            "games",
            "guitar"
          ]
        }
      },
      {
        "_index" : "hello-world",
        "_type" : "_doc",
        "_id" : "zAf5pW8BDk0vAQ5FWYLw",
        "_score" : 1.0,
        "_source" : {
          "name" : "randolph",
          "age" : 32,
          "country" : "brazil",
          "occupation" : "sysadmin",
          "hobbies" : [
            "coffee",
            "music",
            "programming"
          ]
        }
      }
    ]
  }
}
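
You can also combine multiple clauses with a bool query, for example matching a country and an age range at the same time. A minimal sketch:

$ curl -H 'Content-Type: application/json' \
  -XPOST 'http://localhost:9200/hello-world/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must": [
        { "match": { "country": "south africa" } },
        { "range": { "age": { "gte": 28 } } }
      ]
    }
  }
}'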

Have a look at their documentation for more information about the Search API.

Visualize with Kibana

Head over to http://localhost:5601 in your browser and you should see the following UI:

kibana

Before we can visualize the data from our index, we first need to create an index pattern in Kibana. To do that, head over to “Management”, select “Index Patterns” and select “Create Index Pattern”. Provide the index name that we created, “hello-world”, and the index name should show up as below:

Select “Next step” and then “Create index pattern”. Now that our index pattern has been created, head over to “Discover” on the left pane and you should see the data from your index:

kibana-discover-index-data

If you expand a document, you will see its key/value pairs in a table format, which is useful when you have larger documents:

kibana-discover-keyvalue-pairs

On the left pane you have a list of available fields that you can add to a filtered view. Let's say, for example, you only want to see the name, age and country: add those fields to your filter and it should then look like the following:

kibana-filter-view

Let’s create a pie chart to visualize the distribution of hobbies across the documents in our index. Head over to “Visualize”, hit “Create new visualization” and select the “Pie” visualization:

kibana-pie-visualization

Select your “hello-world” index. Under metrics, set the slice size to “Count”; under buckets, select “Split Slices”, choose “Terms” from the aggregation drop-down, select the “hobbies.keyword” field and order by “Count”.

Select “Apply” and you should see a visualization like this:

kibana-hobbies-visualisation

This is a basic example, but the more data you have, the better it gets, and that is where the true power of Elasticsearch shines. Elasticsearch is super powerful and amazingly fast.

Deleting your Index

To remove the index, we use the DELETE method on the index name:

$ curl -XDELETE http://localhost:9200/hello-world
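
To confirm that the index is gone, list the indices again; hello-world should no longer appear in the output:

$ curl http://localhost:9200/_cat/indices?v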

Going Further

I have an Elasticsearch Cheatsheet that I put together, and if this got you curious and you want to ingest lots of data into your Elasticsearch cluster, have a look at this python script.
