ElasticSearch中 distinct, count和group by的实现
目录
ElasticSearch中 distinct, count和group by的实现。ElasticSearch最新版高级插件已经完美支持sql的查询。这里备份一下针对老版本或者需要json翻译SQL的方法。主要是针对一些类SQL操作,原文地址:https://www.jianshu.com/p/62bed9cc8349
distinct
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
# SQL SELECT DISTINCT(user_id) FROM table WHERE user_id_type = 3; # ElasticSearch: { "query": { "term": { "user_id_type": 3 } }, "collapse": { "field": "user_id" } } # Result { "hits": { "hits": [ { "_index": "es_qd_mkt_visitor_packet_dev_v1_20180621", "_type": "ad_crowd", "_source": { "user_id": "wx2af8414b502d4ca2_oHtrD0Vxv-_8c678figJNHmtaVQQ", "user_id_type": 3 }, "fields": { "user_id": [ "wx2af8414b502d4ca2_oHtrD0Vxv-_8c678figJNHmtaVQQ" ] } } ] } } |
总结:使用collapse字段后,查询结果中[hits]中会出现[fields]字段,其中包含了去重后的user_id.
注意:对于collapse字段,折叠功能ES5.3版本之后才发布的,聚合&折叠只能针对keyword类型有效。
count + distinct
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
# SQL: SELECT COUNT(DISTINCT(user_id)) FROM table WHERE user_id_type = 3; # ElasticSearch { "query": { "term": { "user_id_type": 3 } }, "aggs": { "count": { "cardinality": { "field": "user_id" } } } } # Result { "hits": {}, "aggregations": { "count": { "value": 121 } } } |
总结:aggs中cardinality的字段代表需要distinct的字段
count + group by
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
# SQL: SELECT COUNT(user_id) FROM table GROUP BY user_id_type; # ElasticSearch { "aggs": { "user_type": { "terms": { "field": "user_id_type" } } } } # Result { "hits": {}, "aggregations": { "user_type": { ... "buckets": [ { "key": 4, "doc_count": 1220 }, { "key": 3, "doc_count": 488 } ] } } } |
总结:aggs中terms的字段代表需要gruop by的字段
count + distinct + group by
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 |
# SQL: SELECT COUNT(DISTINCT(user_id)) FROM table GROUP BY user_id_type; # ElasticSearch { "aggs": { "user_type": { "terms": { "field": "user_id_type" }, "aggs": { "count": { "cardinality": { "field": "user_id" } } } } } } # Result { ... "hits": { ... }, "aggregations": { "user_type": { ... "buckets": [ { "key": 4, "doc_count": 1220, //去重前数据1220条 "count": { "value": 276 //去重后数据276条 } }, { "key": 3, "doc_count": 488, //去重前数据488条 "count": { "value": 121 //去重后数据121条 } } ] } } } |
count + distinct + group by + where
1 2 |
# SQL: SELECT COUNT(DISTINCT(user_id)) FROM table WHERE user_id_type = 2 GROUP BY user_id; |
总结:对于既有group by又有distinct的查询要求,需要在aggs中嵌套子aggs