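Compound queries wrap other compound queries or leaf queries. They can combine the results and scores of those queries, change their behavior, or switch from query context to filter context.
Bool Query
- A bool query is a combination of one or more query clauses.
- It supports four clause types: two of them contribute to the score and two do not.
- Relevance is not exclusive to full-text search. It also applies to yes/no clauses: the more clauses a document matches, the higher its relevance score. When several sub-queries are merged into one compound query, such as a bool query, the score computed by each clause is combined into the overall relevance score.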
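Type | Match | Scoring |
---|---|---|
must | Must match | Contributes to the score |
should | Optional match | Contributes to the score |
must_not | Must not match (filter context) | Does not contribute to the score |
filter | Must match (filter context) | Does not contribute to the score |
Clauses that run in filter context do not affect scoring.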
Request example
Prepare the data
POST /news/_bulk
{"index":{"_id":1}}
{"content":"Apple Mac"}
{"index":{"_id":2}}
{"content":"Apple iPad"}
{"index":{"_id":3}}
{"content":"Apple employee like Apple Pie and Apple Juice"}
Must Query
Find documents whose content contains apple.
Request:
POST news/_search
{
"query":{
"bool": {
"must": [
{"match": {"content": "apple"}}
]
}
}
}
Response:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 0.17280532,
"hits" : [
{
"_index" : "news",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.17280532,
"_source" : {
"content" : "Apple employee like Apple Pie and Apple Juice"
}
},
{
"_index" : "news",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.16786805,
"_source" : {
"content" : "Apple Mac"
}
},
{
"_index" : "news",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.16786805,
"_source" : {
"content" : "Apple iPad"
}
}
]
}
}
Must Not Query
Find documents whose content contains apple but does not contain pie.
Request:
POST news/_search
{
"query":{
"bool": {
"must": [
{"match": {"content": "apple"}}
],
"must_not": [
{"match": {"content": "pie"}}
]
}
}
}
Response:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.16786805,
"hits" : [
{
"_index" : "news",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.16786805,
"_source" : {
"content" : "Apple Mac"
}
},
{
"_index" : "news",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.16786805,
"_source" : {
"content" : "Apple iPad"
}
}
]
}
}
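Nested Bool
The structure of a query affects how the relevance score is computed.
- Competing clauses at the same level carry the same weight.
- Nesting bool queries changes how much each clause contributes to the score.
Query syntax
- Sub-queries may appear in any order.
- Multiple queries can be nested.
- If a bool query has no must clause (and no filter clause), at least one should clause must match.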
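The last rule can be written down as a small helper. This is a minimal Python sketch of the documented default for minimum_should_match, not Elasticsearch code. The function name `default_minimum_should_match` is hypothetical, and the bool query is represented as a plain dict that mirrors the request body.

```python
def default_minimum_should_match(bool_query: dict) -> int:
    """Return how many should clauses must match when nothing is configured.

    Mirrors the documented default: 1 when the bool query has should
    clauses but no must or filter clause, otherwise 0.
    """
    has_should = bool(bool_query.get("should"))
    has_must_or_filter = bool(bool_query.get("must")) or bool(bool_query.get("filter"))
    return 1 if has_should and not has_must_or_filter else 0


# should only: at least one should clause has to match
only_should = {"should": [{"match": {"content": "apple"}}]}

# must + should: the should clause becomes optional and only adds to the score
must_and_should = {
    "must": [{"match": {"content": "apple"}}],
    "should": [{"match": {"content": "pie"}}],
}

print(default_minimum_should_match(only_should))      # 1
print(default_minimum_should_match(must_and_should))  # 0
```

An explicit minimum_should_match value in the request overrides this default; the sketch only covers the case where the parameter is omitted.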
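Boosting Query
- Boosting is one way to control relevance.
- Meaning of the boost parameter:
  - boost > 1: the clause's relative weight in the score is raised.
  - 0 < boost < 1: the clause's relative weight in the score is lowered.
  - boost < 0: the clause contributes a negative score (Elasticsearch 7.0 and later reject negative boost values).
Request example
Prepare the test data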
POST /blogs/_bulk
{"index":{"_id":1}}
{"title":"Apple iPad","content":"Apple iPad,Apple iPad"}
{"index":{"_id":2}}
{"title":"Apple iPad,Apple iPad","content":"Apple iPad"}
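Test (the query runs against the news index populated earlier)
POST news/_search
{
"query":{
"boosting": {
"positive": { // query that must match; it supplies the base score
"match": {
"content": "apple"
}
},
"negative": { // documents that also match this query are demoted
"match": {
"content": "pie"
}
},
"negative_boost": 0.5 // multiplier applied to the score of demoted documents
}
}
}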
Response:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 0.16786805,
"hits" : [
{
"_index" : "news",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.16786805,
"_source" : {
"content" : "Apple Mac"
}
},
{
"_index" : "news",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.16786805,
"_source" : {
"content" : "Apple iPad"
}
},
{
"_index" : "news",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.08640266,
"_source" : {
"content" : "Apple employee like Apple Pie and Apple Juice"
}
}
]
}
}
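Change negative_boost to 1 and compare the results. Document 3 contains apple more often than the other documents, so it would normally score highest (0.17280532 in the must query above). Because it also matches the negative query, its score is multiplied by 0.5, which gives 0.08640266 and moves it to the bottom of the results.
Top-level parameters
positive: (Required, query object) Query you wish to run. Any returned document must match this query.
negative: (Required, query object) Query used to decrease the relevance score of matching documents. If a returned document matches both the positive query and this query, the boosting query calculates the document's final relevance score as follows:
- Take the original relevance score from the positive query.
- Multiply that score by the negative_boost value.
negative_boost: (Required, float) Floating point number between 0 and 1.0 used to decrease the relevance scores of documents that match the negative query.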
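The two calculation steps can be checked against the response shown earlier. The following is a minimal Python sketch of that arithmetic, not Elasticsearch code. The function name `boosting_score` is hypothetical, and the range check is this sketch's own guard based on the parameter description.

```python
def boosting_score(positive_score: float, matches_negative: bool,
                   negative_boost: float) -> float:
    """Final score of a document returned by a boosting query."""
    # Guard added by this sketch, following the documented 0 to 1.0 range.
    if not 0.0 <= negative_boost <= 1.0:
        raise ValueError("negative_boost must be between 0 and 1.0")
    # Documents that also match the negative query are demoted, not excluded.
    return positive_score * negative_boost if matches_negative else positive_score


# Scores taken from the responses above: document 3 matches "pie", document 1 does not.
doc3 = boosting_score(0.17280532, matches_negative=True, negative_boost=0.5)
doc1 = boosting_score(0.16786805, matches_negative=False, negative_boost=0.5)

print(round(doc3, 8))  # 0.08640266
print(doc1)            # 0.16786805
```

With negative_boost set to 1 the multiplication leaves the score unchanged, so document 3 keeps its original score of 0.17280532 and returns to the top.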
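Constant Score Query
- Wraps a query as a filter, which skips relevance scoring (for example TF-IDF or BM25) and avoids its cost.
- Filters can make effective use of caching.
Request example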
POST news/_search
{
"query": {
"constant_score": {
"filter": {
"term": {
"content": "apple"
}
},
"boost": 1
}
}
}
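Top-level parameters
filter: (Required, query object) Filter query to run. Any returned document must match this query. Filter queries do not calculate relevance scores. To speed up performance, Elasticsearch automatically caches frequently used filter queries.
boost: (Optional, float) Floating point number used as the constant relevance score for every document that matches the filter query. Defaults to 1.0.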
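To make the effect concrete, here is a minimal Python sketch that mimics what constant_score does to the scores of the three news documents. It is an illustration only: the function name `constant_score` is hypothetical, and lower-casing plus whitespace splitting stands in for the analyzer.

```python
def constant_score(docs: dict, term: str, boost: float = 1.0) -> dict:
    """Return {doc_id: score} for documents whose content contains `term`.

    Every match gets the same score (`boost`), no matter how often the
    term occurs in the document.
    """
    return {
        doc_id: boost
        for doc_id, content in docs.items()
        if term in content.lower().split()
    }


news = {
    1: "Apple Mac",
    2: "Apple iPad",
    3: "Apple employee like Apple Pie and Apple Juice",
}

print(constant_score(news, "apple"))             # {1: 1.0, 2: 1.0, 3: 1.0}
print(constant_score(news, "apple", boost=2.0))  # {1: 2.0, 2: 2.0, 3: 2.0}
```

Document 3 mentions apple three times, yet it receives the same score as the other two documents, because term frequency plays no role in filter context.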
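Disjunction Max Query
- Returns every document that matches at least one of the wrapped queries.
- Uses the score of the best-matching clause as the document's final score.
Request example
Prepare the test data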
PUT blogs/_bulk
{"index":{"_id":1}}
{"title":"Quick brown rabbits","body":"Brown rabbits are commonly seen."}
{"index":{"_id":2}}
{"title":"Keeping pets healthy","body":"My quick brown fox eats rabbits on a regular basis."}
Using a bool query
Request:
POST blogs/_search
{
"query": {
"bool": {
"should": [
{"match": {"title": "Brown fox"}},
{"match": {"body": "Brown fox"}}
]
}
}
}
Response:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.90425634,
"hits" : [
{
"_index" : "blogs",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.90425634,
"_source" : {
"title" : "Quick brown rabbits",
"body" : "Brown rabbits are commonly seen."
}
},
{
"_index" : "blogs",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.77041256,
"_source" : {
"title" : "Keeping pets healthy",
"body" : "My quick brown fox eats rabbits on a regular basis."
}
}
]
}
}
Using a dis_max query (Disjunction Max Query)
Request:
POST blogs/_search
{
"query": {
"dis_max": {
"queries": [
{"match": {"title": "Brown fox"}},
{"match": {"body": "Brown fox"}}
]
}
}
}
Response:
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.77041256,
"hits" : [
{
"_index" : "blogs",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.77041256,
"_source" : {
"title" : "Keeping pets healthy",
"body" : "My quick brown fox eats rabbits on a regular basis."
}
},
{
"_index" : "blogs",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.6931472,
"_source" : {
"title" : "Quick brown rabbits",
"body" : "Brown rabbits are commonly seen."
}
}
]
}
}
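Top-level parameters
queries: (Required, array of query objects) Contains one or more query clauses. Returned documents must match one or more of these queries. If a document matches multiple queries, Elasticsearch uses the highest relevance score.
tie_breaker: (Optional, float) Floating point number between 0 and 1.0 used to increase the relevance scores of documents that match multiple query clauses. Defaults to 0.0.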
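Adjusting the score with the tie_breaker parameter
tie_breaker is a floating point number between 0 and 1. A value of 0 means only the best-matching clause counts; a value of 1 means all clauses are equally important. The score is computed as follows:
- Take the _score of the best-matching clause.
- Multiply the score of every other matching clause by tie_breaker.
- Add the highest score to the multiplied scores.
If tie_breaker is greater than 0.0, all matching clauses count, but the highest-scoring clause counts the most.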
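The three steps above can be sketched in Python and applied to the scores from the two responses. This is an illustration of the arithmetic, not Elasticsearch code. The function name `dis_max_score` is hypothetical, and the body score of document 1 is an assumption: it is inferred by subtracting the title score (the dis_max result) from the bool should result.

```python
def dis_max_score(clause_scores: list, tie_breaker: float = 0.0) -> float:
    """Combine the scores of the matching clauses as described for dis_max."""
    if not clause_scores:
        return 0.0  # no clause matched, so the document would not be a hit
    best = max(clause_scores)
    others = sum(clause_scores) - best
    return best + tie_breaker * others


# Document 1 matches in both title and body. The title score is the dis_max
# result above; the body score is INFERRED as (bool score - title score).
doc1 = [0.6931472, 0.90425634 - 0.6931472]
# Document 2 matches only in body.
doc2 = [0.77041256]

for tb in (0.0, 0.7, 1.0):
    print(tb, round(dis_max_score(doc1, tb), 4), round(dis_max_score(doc2, tb), 4))
# 0.0 0.6931 0.7704
# 0.7 0.8409 0.7704
# 1.0 0.9043 0.7704
```

With tie_breaker at 0.0, document 2 ranks first, which matches the dis_max response. With tie_breaker at 1.0, the result equals the sum produced by the bool should query and document 1 ranks first. Under the inferred body score, an intermediate value such as 0.7 would also put document 1 ahead.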
Function Score Query
...