This post was last edited by zhuqitian on 2017-5-18 10:44
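My company recently decided to build reports on top of ES, so my director asked me to learn it. Over the past few days I got the environment working. Honestly, if all you want is a cluster of a few to a dozen or so machines, it takes half a day at most, and as little as two hours if things go well. Version: 5.3.0. The plugins I have kept so far are kibana and head (also the two most useful ones).
kibana: querying through the RESTful API is especially convenient and fast:
//PUT /indexName/typeName/id  If the index does not exist it is created automatically. To turn automatic creation off, set this in elasticsearch.yml: action.auto_create_index: false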
PUT twitter/tweet/1
{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}
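GET /_cat/indices?v //list all indices; ?v means verbose output, which in practice just adds the header row
GET _cat/health?v //check cluster health: green = healthy (all primary and replica shards are active), yellow = usable (all primary shards are active, but some replica shards are not), red = unhealthy (at least one primary shard is not active)
GET _cat/nodes?v //show cluster node info; a node.role of mi means master-eligible and ingest without the data role, i.e. a master node that holds no data; the currently elected master is marked with * in the master column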
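My own understanding: a shard is much like a block in HDFS. Blocks can have replicas, and in ES a primary shard likewise has replica shards. The placement strategy for primaries and replicas is similar to the one in HDFS.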
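To make the green/yellow/red rule above concrete, here is a minimal sketch in plain Java with no Elasticsearch dependency. The names `HealthSketch`, `ShardCopy`, and `status` are hypothetical and exist only for illustration; a real cluster computes this internally and reports it through `_cat/health`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: this is not Elasticsearch code.
final class HealthSketch {

    /** One shard copy: whether it is a primary and whether it is currently active. */
    static final class ShardCopy {
        final boolean primary;
        final boolean active;

        ShardCopy(boolean primary, boolean active) {
            this.primary = primary;
            this.active = active;
        }
    }

    /**
     * red: at least one primary is inactive;
     * yellow: all primaries are active but some replica is not;
     * green: every copy is active.
     */
    static String status(List<ShardCopy> copies) {
        boolean replicaInactive = false;
        for (ShardCopy c : copies) {
            if (!c.active) {
                if (c.primary) {
                    return "red";
                }
                replicaInactive = true;
            }
        }
        return replicaInactive ? "yellow" : "green";
    }

    public static void main(String[] args) {
        List<ShardCopy> copies = Arrays.asList(
                new ShardCopy(true, true),    // primary, active
                new ShardCopy(false, false)); // replica, not active
        System.out.println(status(copies));   // prints "yellow"
    }
}
```

A single-node test cluster with the default one replica per shard typically lands in the yellow case, because a replica is never placed on the same node as its primary.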
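Elastic and GitHub also offer plenty of decent plugins, but each has its shortcomings. X-Pack, for example, is reportedly a paid product. Perhaps some companies found network security risks in ES, and X-Pack was developed to cover them with SSL authentication and the like; because it costs money, some companies may choose not to use it. There is also the elasticsearch-sql plugin, whose installation instructions are on GitHub, but 5.x versions of ES do not seem to support it well: every time I passed a SQL statement by hand as a URL parameter, the query succeeded and the data came back quickly, but the plugin's web UI would not display it. That may be fixed later. Still, every engine or framework has its own DSL, and the RESTful API is native to ES, so I recommend using it as much as possible.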
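When a SQL statement is passed by hand as a URL parameter, it needs to be URL-encoded first. The sketch below only builds the URL and sends nothing. The `/_sql?sql=` path is an assumption based on the elasticsearch-sql README, so check it against the plugin version you install; the host and the `SqlUrlSketch` class are likewise hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a URL only, performs no network call.
final class SqlUrlSketch {

    /** Appends the SQL text as a URL-encoded "sql" query parameter (endpoint path is an assumption). */
    static String sqlUrl(String baseUrl, String sql) {
        try {
            return baseUrl + "/_sql?sql=" + URLEncoder.encode(sql, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sqlUrl("http://localhost:9200", "SELECT * FROM orders LIMIT 10"));
    }
}
```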
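A slightly more complex query: it combines query and filter conditions, an aggregation, and sorting, so it covers a fairly complete set of the commonly used RESTful API features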
GET /orders/info/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        { "match": { "ordercodeofsys": "BADTCW0000000" } }
      ],
      "should": [
        { "match": { "paymentfee": 0 } },
        {
          "bool": {
            "must": [
              { "match": { "orderidofsys": "1941610" } }
            ],
            "must_not": [
              { "match": { "updatetime": "2016-12-19T00:26:57+08:00" } }
            ]
          }
        }
      ],
      "filter": {
        "prefix": { "discountfee": 0 }
      }
    }
  },
  "aggs": {
    "group by ordercodeofsys": {
      "terms": {
        "field": "ordercodeofsys",
        "order": { "avg paymentfee": "desc" }
      },
      "aggs": {
        "avg paymentfee": {
          "avg": { "field": "paymentfee" }
        }
      }
    }
  }
}
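For readers more used to SQL or Java collections, the `aggs` part of the query above is roughly a GROUP BY on `ordercodeofsys` with an average of `paymentfee` per group, ordered by that average in descending order, while `"size": 0` suppresses the matching documents so only the aggregation comes back. The plain-Java sketch below mimics that on an in-memory list. The `Order` class and the sample values are made up for illustration, and it ignores ES specifics such as the default limit on the number of terms buckets returned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory analogy of terms + avg sub-aggregation; not Elasticsearch code.
final class TermsAvgSketch {

    /** Minimal order record holding only the two fields the aggregation uses. */
    static final class Order {
        final String ordercodeofsys;
        final double paymentfee;

        Order(String ordercodeofsys, double paymentfee) {
            this.ordercodeofsys = ordercodeofsys;
            this.paymentfee = paymentfee;
        }
    }

    /** Groups by ordercodeofsys, averages paymentfee per group, and orders groups by that average, descending. */
    static Map<String, Double> avgPaymentByCode(List<Order> orders) {
        Map<String, Double> avgByCode = orders.stream()
                .collect(Collectors.groupingBy(
                        (Order o) -> o.ordercodeofsys,
                        Collectors.averagingDouble((Order o) -> o.paymentfee)));

        List<Map.Entry<String, Double>> entries = new ArrayList<>(avgByCode.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue().reversed());

        // LinkedHashMap keeps the sorted order when iterated or printed
        Map<String, Double> sorted = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : entries) {
            sorted.put(e.getKey(), e.getValue());
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<Order> orders = Arrays.asList(
                new Order("A", 10), new Order("A", 30),
                new Order("B", 50),
                new Order("C", 5), new Order("C", 15));
        System.out.println(avgPaymentByCode(orders)); // prints {B=50.0, A=20.0, C=10.0}
    }
}
```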
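I won't go into installation notes or how to configure the parameters here; I have only been studying ES for a few days and couldn't cover them fully anyway (though there are far fewer parameters than in Hadoop or Hive). Next, connecting to ES from a Java client: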
[mw_shl_code=java,true]package org.elasticsearch.client.test;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.histogram.DateHistogramInterval;
import org.elasticsearch.search.aggregations.bucket.histogram.Histogram;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.metrics.avg.Avg;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

import java.net.InetAddress;
import java.net.UnknownHostException;

/**
 * Created by 圣斗士宙斯 on 2017/5/16.
 */
public class ES_AGGS {
    public static void main(String[] args) throws UnknownHostException {
        // Build the client
        Settings settings = Settings.builder()
                .put("cluster.name", "bcw-dmp-es")
                // .put("xpack.security.transport.ssl.enabled", false)
                // .put("xpack.security.user", "elastic:changeme")
                .put("client.transport.sniff", true)
                .put("client.transport.ping_timeout", "5s")
                .put("client.transport.nodes_sampler_interval", "5s")
                .build();
        TransportClient client = new PreBuiltTransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("host"), 9300));

        // terms on postfee -> yearly date_histogram on updatetime -> avg of totalfee
        SearchResponse response = client.prepareSearch("orders")
                .addAggregation(AggregationBuilders.terms("group by postfee").field("postfee")
                        .subAggregation(AggregationBuilders.dateHistogram("group by updatetime").field("updatetime")
                                .dateHistogramInterval(DateHistogramInterval.YEAR)
                                .subAggregation(AggregationBuilders.avg("avg totalfee").field("totalfee"))))
                .execute().actionGet();

        // Read through the Terms interface instead of casting to StringTerms,
        // so this also works when postfee is mapped as a numeric field
        Terms byPostfee = response.getAggregations().get("group by postfee");
        for (Terms.Bucket bucket : byPostfee.getBuckets()) {
            System.out.println(bucket.getKey() + ":" + bucket.getDocCount());
            // The sub-aggregation was registered as "group by updatetime";
            // looking it up as "group by postfee" returns null
            Histogram byYear = bucket.getAggregations().get("group by updatetime");
            for (Histogram.Bucket yearBucket : byYear.getBuckets()) {
                Avg avg = yearBucket.getAggregations().get("avg totalfee");
                System.out.println("  " + yearBucket.getKey() + ":" + yearBucket.getDocCount()
                        + " avg=" + avg.getValue());
            }
        }
        client.close();
    }
}
[/mw_shl_code]
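Building the client works much as it does in Spark and similar frameworks: you can also hard-code some parameters in the program, and those take the highest priority.
I'm short on time and have to get back to work, and this is about all I have for now. Once I have dug into the underlying internals, I'll share more of my understanding.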
thanks!
Share your knowledge with the world!