ELK Elasticsearch Bucket Aggregation

2021-02-19

Aggregation, ELK

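1. ELK Elasticsearch Bucket Aggregation

What is Bucket Aggregation?

Aggregations group data and compute statistics over it. The easiest way to think of them is as roughly equivalent to SQL GROUP BY combined with the SQL aggregate functions. In Elasticsearch, a single request can run a search that returns hits and, separately, return aggregation results in the same response. This is powerful and efficient: with one concise API call you can run a query and several aggregations and get the results of both (or of either one) at once, avoiding extra network round trips.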

In short, values such as averages are computed with metrics aggregations, while bucket aggregations are the kind that group documents so that statistics can be computed per group.

AGGREGATIONS STRUCTURE

GET <index name>/_search
{
  "query": {
    … <query clause> …
  },
  "aggs": {
    "<arbitrary aggregation name 1>": {
      "<aggregation type>": {
        … <aggregation clause> …
      }
    },
    "<arbitrary aggregation name 2>": {
      "<aggregation type>": {
        … <aggregation clause> …
      }
    }
  }
}
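
As a minimal sketch of the structure above, the request body can be assembled as a Python dictionary and serialized to JSON. The helper `build_search_body`, the aggregation names `avg_points` and `by_team`, and the `match_all` query are illustrative assumptions, not part of the original template.

```python
import json


def build_search_body(query, aggregations):
    """Combine a query clause and named aggregations into one _search body."""
    return {"query": query, "aggs": aggregations}


body = build_search_body(
    query={"match_all": {}},  # assumption: match every document
    aggregations={
        # "<arbitrary aggregation name>": {"<aggregation type>": {...}}
        "avg_points": {"avg": {"field": "points"}},  # hypothetical name
        "by_team": {"terms": {"field": "team"}},     # hypothetical name
    },
)

print(json.dumps(body, indent=2))
```

Each key directly under `aggs` is a name you choose; the key one level below it selects the aggregation type.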

Aggregations fall into two broad kinds: metrics and bucket. The aggregation syntax and options never state explicitly whether an aggregation is a metrics or a bucket one. Aggregations that calculate over the values of numeric or date fields are classified as metrics aggregations, and aggregations that group documents by ranges, keyword values, and so on are classified as bucket aggregations.
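
To make the distinction concrete, here is a small local sketch in plain Python (it does not call Elasticsearch). The rows are taken from the sample documents indexed later in this post; the function names `metric_avg` and `bucket_by` are hypothetical.

```python
from collections import defaultdict

# Sample rows copied from the documents indexed later in this post.
docs = [
    {"team": "Chicago", "points": 30},
    {"team": "Chicago", "points": 20},
    {"team": "LA", "points": 30},
    {"team": "LA", "points": 40},
]


def metric_avg(rows, field):
    """Metrics-style: reduce a numeric field to a single value."""
    values = [row[field] for row in rows]
    return sum(values) / len(values)


def bucket_by(rows, field):
    """Bucket-style: split documents into groups keyed by a field value."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[field]].append(row)
    return dict(buckets)


overall_avg = metric_avg(docs, "points")
groups = bucket_by(docs, "team")

print(overall_avg)
print({key: len(rows) for key, rows in groups.items()})
```

The metric collapses all documents into one number, whereas the bucket step only decides which documents belong together.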

Creating the basketball index

curl -XPUT -H 'Content-Type:application/json' localhost:9200/basketball
{"acknowledged":true,"shards_acknowledged":true,"index":"basketball"}

basketball index mapping
Let's look at the contents of basketball_mapping.json.

$ vi basketball_mapping.json

{
    "record" : {
        "properties" : {
            "team" : {
                "type" : "text",
                "fielddata" : true
            },
            "name" : {
                "type" : "text",
                "fielddata" : true
            },
            "points" : {
                "type" : "long"
            },
            "rebounds" : {
                "type" : "long"
            },
            "assists" : {
                "type" : "long"
            },
            "blocks" : {
                "type" : "long"
            },
            "submit_date" : {
                "type" : "date",
                "format" : "yyyy-MM-dd"
            }
        }
    }
}
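
The mapping above makes the text fields aggregatable by enabling fielddata. The Elasticsearch error message quoted later in this post suggests a keyword field as the alternative. The following is only a sketch of that alternative, not the mapping used in this post: it keeps the same field names but assumes `team` and `name` are mapped as `keyword`.

```python
import json

# Assumption: same fields as basketball_mapping.json, but "team" and "name"
# use the keyword type instead of text with fielddata enabled.
properties = {
    "team": {"type": "keyword"},
    "name": {"type": "keyword"},
    "points": {"type": "long"},
    "rebounds": {"type": "long"},
    "assists": {"type": "long"},
    "blocks": {"type": "long"},
    "submit_date": {"type": "date", "format": "yyyy-MM-dd"},
}

# Wrapped under the "record" type name, mirroring the file above.
mapping = {"record": {"properties": properties}}

print(json.dumps(mapping, indent=4))
```

Because the type of an existing field generally cannot be changed in place, a mapping like this would be applied to a freshly created index before any documents are indexed.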

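Adding the basketball mapping type

curl -XPUT -H 'Content-Type:application/json' 'localhost:9200/basketball/record/_mapping?include_type_name=true' -d @basketball_mapping.json

Note: the query string must start with ?. The run recorded below used & instead (.../_mapping&include_type_name=true), so Elasticsearch treated "_mapping&include_type_name=true" as a document ID and indexed the JSON file as a document rather than applying the mapping:

{"_index":"basketball","_type":"record","_id":"_mapping&include_type_name=true","_version":5,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":12,"_primary_term":1}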

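Meaning

basketball: index
record: type
_mapping: the mapping endpoint
-d: curl's data option; it sends the request body, and @basketball_mapping.json tells curl to read that body from the file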
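
The response above reports `_id` as `_mapping&include_type_name=true`, which hints at how the URL was split. As a small sketch using only Python's standard `urllib.parse`, the two spellings of the URL parse differently; this illustrates generic URL parsing and is an assumption about how the server routed the request, not a trace of Elasticsearch internals.

```python
from urllib.parse import urlsplit, parse_qs

base = "http://localhost:9200/basketball/record/_mapping"

with_question_mark = urlsplit(base + "?include_type_name=true")
with_ampersand = urlsplit(base + "&include_type_name=true")

# "?" starts the query string, so the path ends at _mapping.
print(with_question_mark.path, parse_qs(with_question_mark.query))

# "&" without a preceding "?" stays inside the path, so the last path
# segment becomes "_mapping&include_type_name=true".
print(with_ampersand.path, parse_qs(with_ampersand.query))
```

With `&`, the path has the shape index/type/id, which matches a document indexing request rather than a mapping update.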

Inserting basketball documents

twoteam_basketball.json

{ "index" : { "_index" : "basketball", "_type" : "record", "_id" : "1" } }
{"team" : "Chicago","name" : "Michael Jordan", "points" : 30,"rebounds" : 3,"assists" : 4, "blocks" : 3, "submit_date" : "1996-10-11"}
{ "index" : { "_index" : "basketball", "_type" : "record", "_id" : "2" } }
{"team" : "Chicago","name" : "Michael Jordan","points" : 20,"rebounds" : 5,"assists" : 8, "blocks" : 4, "submit_date" : "1996-10-13"}
{ "index" : { "_index" : "basketball", "_type" : "record", "_id" : "3" } }
{"team" : "LA","name" : "Kobe Bryant","points" : 30,"rebounds" : 2,"assists" : 8, "blocks" : 5, "submit_date" : "2014-10-13"}
{ "index" : { "_index" : "basketball", "_type" : "record", "_id" : "4" } }
{"team" : "LA","name" : "Kobe Bryant","points" : 40,"rebounds" : 4,"assists" : 8, "blocks" : 6, "submit_date" : "2014-11-13"}
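
The bulk file alternates an action line with a document line and is newline-delimited. As a sketch, a file of this shape could be generated from a list of records instead of being typed by hand; the helper `to_bulk_body` is hypothetical, and the sequential IDs simply mirror the file above.

```python
import json

records = [
    {"team": "Chicago", "name": "Michael Jordan", "points": 30, "rebounds": 3,
     "assists": 4, "blocks": 3, "submit_date": "1996-10-11"},
    {"team": "Chicago", "name": "Michael Jordan", "points": 20, "rebounds": 5,
     "assists": 8, "blocks": 4, "submit_date": "1996-10-13"},
    {"team": "LA", "name": "Kobe Bryant", "points": 30, "rebounds": 2,
     "assists": 8, "blocks": 5, "submit_date": "2014-10-13"},
    {"team": "LA", "name": "Kobe Bryant", "points": 40, "rebounds": 4,
     "assists": 8, "blocks": 6, "submit_date": "2014-11-13"},
]


def to_bulk_body(index, doc_type, rows):
    """Build a newline-delimited bulk body: one action line per document line."""
    lines = []
    for doc_id, row in enumerate(rows, start=1):
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(doc_id)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(row))
    # The bulk body must end with a newline after the last line.
    return "\n".join(lines) + "\n"


bulk_body = to_bulk_body("basketball", "record", records)
print(bulk_body, end="")
```

Each JSON object must stay on a single line, which is why line wrapping inside a document breaks the file.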

Inserting the documents

curl -XPOST -H 'Content-Type:application/json' 'localhost:9200/_bulk' --data-binary @twoteam_basketball.json
{"took":411,"errors":false,"items":[{"index":{"_index":"basketball","_type":"record","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":1,"_primary_term":1,"status":201}},{"index":{"_index":"basketball","_type":"record","_id":"2","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":2,"_primary_term":1,"status":201}},{"index":{"_index":"basketball","_type":"record","_id":"3","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":3,"_primary_term":1,"status":201}},{"index":{"_index":"basketball","_type":"record","_id":"4","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":4,"_primary_term":1,"status":201}}]}

Terms Aggregation (GROUP BY TEAM)

Checking terms_aggs.json

vi terms_aggs.json

{
    "size" : 0,
    "aggs" : {
        "players" : {
            "terms" : {
                "field" : "team"
            }
        }
    }
}
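
To show what a terms aggregation on `team` is meant to compute, here is a local simulation in plain Python over the four sample documents. It is a sketch, not Elasticsearch itself: the function `terms_buckets` is hypothetical, and it does not model text analysis, so on a text field with fielddata enabled the real keys would likely be the analyzed (for example lowercased) terms.

```python
from collections import Counter

# team values of the four sample documents indexed above
teams = ["Chicago", "Chicago", "LA", "LA"]


def terms_buckets(values, size=10):
    """Count documents per distinct value and return bucket-shaped dicts."""
    counts = Counter(values)
    # Largest buckets first; ties are ordered by key.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


buckets = terms_buckets(teams)
print(buckets)
```

Each bucket pairs a distinct value with the number of documents that carry it, which is the GROUP BY team / COUNT(*) idea from SQL.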

Running the terms_aggs.json query

$ curl -XGET -H 'Content-Type:application/json' 'localhost:9200/_search?pretty' --data-binary @terms_aggs.json
{
  "took" : 880,
  "timed_out" : false,
  "_shards" : {
    "total" : 7,
    "successful" : 6,
    "skipped" : 0,
    "failed" : 1,
    "failures" : [
      {
        "shard" : 0,
        "index" : "basketball",
        "node" : "gsruIbFGTQmlTUIk7jr1Aw",
        "reason" : {
          "type" : "illegal_argument_exception",
          "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [team] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
        }
      }
    ]
  },
  "hits" : {
    "total" : {
      "value" : 48,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "players" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ ]
    }
  }
}

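Although the request returns a response, the aggregation did not actually succeed: the basketball shard failed with illegal_argument_exception and buckets is empty. The error says that team is a text field without fielddata enabled, which is consistent with the mapping in basketball_mapping.json not having been applied to the index. Once team is aggregatable (fielddata=true or a keyword field), the buckets should contain one entry per team.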
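
A search can return a normal-looking response while some shards have failed, so it helps to inspect `_shards` before trusting the buckets. The following sketch checks a trimmed copy of the response shown above; the helper `shard_failure_reasons` is hypothetical.

```python
def shard_failure_reasons(response):
    """Return (index, error type) pairs for every failed shard, if any."""
    shards = response.get("_shards", {})
    if shards.get("failed", 0) == 0:
        return []
    return [
        (failure.get("index"), failure.get("reason", {}).get("type"))
        for failure in shards.get("failures", [])
    ]


# Trimmed copy of the response shown above.
response = {
    "_shards": {
        "total": 7, "successful": 6, "skipped": 0, "failed": 1,
        "failures": [
            {"shard": 0, "index": "basketball",
             "reason": {"type": "illegal_argument_exception"}},
        ],
    },
    "aggregations": {"players": {"buckets": []}},
}

failures = shard_failure_reasons(response)
print(failures)
```

An empty `buckets` list together with a non-empty failure list means the aggregation result is incomplete, not that there is nothing to group.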

AGGS (STATS GROUP BY TEAM)

Checking stats_by_team.json

$ vi stats_by_team.json
{
    "size" : 0,
    "aggs" : {
        "team_stats" : {
            "terms" : {
                "field" : "team"
            },
            "aggs" : {
                "stats_score" : {
                    "stats" : {
                        "field" : "points"
                    }
                }
            }
        }
    }
}

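Understanding the aggregation

size is set to 0 only to keep the output readable, since it suppresses the individual hits. The request groups the documents by team and then uses the nested aggregation to compute stats on points within each group. In other words, it derives summary statistics for each team.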
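
The nesting described above (bucket first, then a metric inside each bucket) can be sketched locally in plain Python over the four sample documents. This is an illustration of the intended result, not output from Elasticsearch; the helpers `stats` and `stats_by_bucket` are hypothetical.

```python
from collections import defaultdict

# team and points of the four sample documents indexed earlier
docs = [
    {"team": "Chicago", "points": 30},
    {"team": "Chicago", "points": 20},
    {"team": "LA", "points": 30},
    {"team": "LA", "points": 40},
]


def stats(values):
    """Summary statistics in the spirit of the stats aggregation."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }


def stats_by_bucket(rows, bucket_field, metric_field):
    """Group rows by bucket_field, then compute stats of metric_field per group."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[bucket_field]].append(row[metric_field])
    return {key: stats(values) for key, values in grouped.items()}


team_stats = stats_by_bucket(docs, "team", "points")
for team, summary in team_stats.items():
    print(team, summary)
```

The outer grouping plays the role of the terms aggregation and the inner `stats` call plays the role of the nested stats_score aggregation.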

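Running AGGS (STATS GROUP BY TEAM)

Note: part of the result below is not populated correctly, so this section still needs to be revised. The basketball shard fails with the same fielddata error on team, so buckets is empty. In addition, q=points in the URL restricts the search to documents matching the term points, which is likely why hits.total is 0.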

curl -XGET -H 'Content-Type:application/json' 'localhost:9200/_search?q=points&pretty' --data-binary @stats_by_team.json
{
  "took" : 44,
  "timed_out" : false,
  "_shards" : {
    "total" : 7,
    "successful" : 6,
    "skipped" : 0,
    "failed" : 1,
    "failures" : [
      {
        "shard" : 0,
        "index" : "basketball",
        "node" : "gsruIbFGTQmlTUIk7jr1Aw",
        "reason" : {
          "type" : "illegal_argument_exception",
          "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [team] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
        }
      }
    ]
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "team_stats" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ ]
    }
  }
}

Reference

https://esbook.kimjmin.net/08-aggregations/8.2-bucket-aggregations