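Compound queries wrap other compound or leaf queries. They can combine their results and scores, change their behavior, or switch from query context to filter context.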

Bool Query

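  • A bool query is a combination of one or more query clauses
    • There are four clause types in total: two of them affect scoring, two do not
  • Relevance is not exclusive to full-text search; it also applies to yes/no clauses, and the more clauses a document matches, the higher its relevance score. When several sub-queries are combined into one compound query such as bool, the score computed by each query clause is merged into the total relevance score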
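Type      Matching                          Scoring
must      Must match                        Contributes to the score
should    Optional match                    Contributes to the score
must_not  Must not match (Filter Context)   Does not contribute to the score
filter    Must match (Filter Context)       Does not contribute to the score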

Filter Context: clauses only decide whether a document matches and do not affect scoring.
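The clause semantics in the table can be modelled with a small Python sketch. It is a simplification, not Elasticsearch's implementation: each clause is assumed to be pre-evaluated into a hypothetical `Clause(matched, score)`, the bool score is taken as the plain sum of the matching must and should scores, and filter and must_not only decide whether the document is returned. The sketch also assumes there is at least one must clause, so should clauses are purely optional.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Clause:
    """Hypothetical pre-evaluated clause: whether it matched and the score it produced."""
    matched: bool
    score: float = 0.0


@dataclass
class BoolQuery:
    must: List[Clause] = field(default_factory=list)
    should: List[Clause] = field(default_factory=list)
    must_not: List[Clause] = field(default_factory=list)
    filter: List[Clause] = field(default_factory=list)


def bool_score(q: BoolQuery) -> Optional[float]:
    """Return the document's score, or None if the bool query rejects it."""
    # must and filter: every clause has to match
    if not all(c.matched for c in q.must + q.filter):
        return None
    # must_not: no clause may match
    if any(c.matched for c in q.must_not):
        return None
    # Only must and should contribute; filter/must_not scores are ignored (filter context)
    return sum(c.score for c in q.must + q.should if c.matched)


q = BoolQuery(
    must=[Clause(True, 1.0)],
    should=[Clause(True, 0.5), Clause(False, 0.0)],
    filter=[Clause(True, 9.0)],  # matches, but its score is never added
)
print(bool_score(q))  # 1.5
```

The filter clause above matches with a score of 9.0, yet the total stays 1.5, which is the "does not contribute to the score" column of the table.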

Request examples

Prepare the data

POST /news/_bulk
{"index":{"_id":1}}
{"content":"Apple Mac"}
{"index":{"_id":2}}
{"content":"Apple iPad"}
{"index":{"_id":3}}
{"content":"Apple employee like Apple Pie and Apple Juice"}
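The _bulk body above is newline-delimited JSON: an action line such as {"index":{"_id":1}} followed by the document source, one JSON object per line, with a trailing newline at the end of the body. A minimal Python sketch that builds the same body; the helper name `bulk_body` is illustrative, and sending it (for example as POST /news/_bulk with Content-Type: application/x-ndjson) is left to whichever HTTP client you use.

```python
import json


def bulk_body(docs):
    """Build an NDJSON _bulk body from (id, source) pairs: action line, then source line."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"


body = bulk_body([
    (1, {"content": "Apple Mac"}),
    (2, {"content": "Apple iPad"}),
    (3, {"content": "Apple employee like Apple Pie and Apple Juice"}),
])
print(body, end="")
```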

Must query

Find documents whose content contains apple
Request:

POST news/_search
{
  "query":{
    "bool": {
      "must": [
        {"match": {"content": "apple"}}
      ]
    }
  }
}

Response:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.17280532,
    "hits" : [
      {
        "_index" : "news",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.17280532,
        "_source" : {
          "content" : "Apple employee like Apple Pie and Apple Juice"
        }
      },
      {
        "_index" : "news",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.16786805,
        "_source" : {
          "content" : "Apple Mac"
        }
      },
      {
        "_index" : "news",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.16786805,
        "_source" : {
          "content" : "Apple iPad"
        }
      }
    ]
  }
}

Must Not query

Find documents whose content contains apple but not pie
Request:

POST news/_search
{
  "query":{
    "bool": {
      "must": [
        {"match": {"content": "apple"}}
      ],
      "must_not": [
        {"match": {"content": "pie"}}
      ]
    }
  }
}

Response:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.16786805,
    "hits" : [
      {
        "_index" : "news",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.16786805,
        "_source" : {
          "content" : "Apple Mac"
        }
      },
      {
        "_index" : "news",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.16786805,
        "_source" : {
          "content" : "Apple iPad"
        }
      }
    ]
  }
}
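Why document 3 drops out can be reproduced with a rough Python sketch. It assumes that, for these three sentences, the default standard analyzer behaves like lowercasing plus splitting on non-word characters; it models only which documents match, not their scores.

```python
import re

NEWS = {
    1: "Apple Mac",
    2: "Apple iPad",
    3: "Apple employee like Apple Pie and Apple Juice",
}


def tokens(text):
    # Rough stand-in for the standard analyzer: lowercase, split on non-word characters
    return set(re.findall(r"\w+", text.lower()))


def must_and_must_not(docs, must_term, must_not_term):
    """Ids of documents that contain must_term and do not contain must_not_term."""
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if must_term in tokens(text) and must_not_term not in tokens(text)
    )


print(must_and_must_not(NEWS, "apple", "pie"))  # [1, 2]
```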

Nested bool

The structure of the query affects relevance scoring

  • Competing fields at the same level carry the same weight
  • Nesting bool queries changes how much each part influences the score

Query syntax

  • Sub-clauses can appear in any order
  • Multiple queries can be nested
  • If the bool query has no must (or filter) clause, at least one should clause must match
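The last rule corresponds to the default of the bool query's `minimum_should_match` parameter: according to the Elasticsearch 7.x documentation it defaults to 1 when the query has at least one should clause and no must or filter clause, and to 0 otherwise. A small Python sketch of that rule; the function names are hypothetical.

```python
def default_minimum_should_match(n_must, n_filter, n_should):
    """Default minimum_should_match: 1 if there are should clauses but no must/filter, else 0."""
    return 1 if n_should > 0 and n_must == 0 and n_filter == 0 else 0


def should_satisfied(n_must, n_filter, should_matches):
    """should_matches holds one boolean per should clause (did it match this document?)."""
    required = default_minimum_should_match(n_must, n_filter, len(should_matches))
    return sum(should_matches) >= required


# Only should clauses, none matching: the document is rejected
print(should_satisfied(0, 0, [False, False]))  # False
# With a must clause present, should clauses only add score
print(should_satisfied(1, 0, [False, False]))  # True
```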

Boosting: tuning relevance

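  • Boosting is one way to control relevance
  • Meaning of the boost parameter
    • When boost > 1, the clause's relevance weight rises relative to the others
    • When 0 < boost < 1, the clause's weight falls relative to the others
    • When boost < 0, the clause contributes a negative score (negative boosts are deprecated in Elasticsearch 7.x; use a value between 0 and 1 to demote instead)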
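A simplified Python sketch of how boost shifts relative weight, assuming that a clause's score is just multiplied by its boost and that a document's score is the sum of its clause scores. The two documents and their per-field scores are made up for illustration, and only non-negative boosts are modelled.

```python
def boosted_total(clause_scores, boosts):
    """Sum the clause scores after multiplying each one by its boost (default 1.0)."""
    return sum(score * boosts.get(name, 1.0) for name, score in clause_scores.items())


# Hypothetical per-field scores for two documents
doc_a = {"title": 0.75, "content": 0.25}
doc_b = {"title": 0.25, "content": 0.75}

for boosts in ({}, {"title": 2.0}, {"title": 0.5}):
    print(boosts, boosted_total(doc_a, boosts), boosted_total(doc_b, boosts))
```

Without boosts both documents tie; boosting title above 1 puts doc_a first, and a title boost between 0 and 1 puts doc_b first.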

Request examples

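Prepare test data (note: the test query below is sent to the news index prepared earlier, as its response shows, so these blogs documents are not used by it)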

POST /blogs/_bulk
{"index":{"_id":1}}
{"title":"Apple iPad","content":"Apple iPad,Apple iPad"}
{"index":{"_id":2}}
{"title":"Apple iPad,Apple iPad","content":"Apple iPad"}

Test

POST news/_search
{
  "query":{
    "boosting": {
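      "positive": {    // documents must match this query
        "match": {
          "content": "apple"
        }
      },
      "negative": {    // documents that also match this query get a lower score
        "match": {
          "content": "pie"
        }
      },
      "negative_boost": 0.5 // multiplier applied to the score of documents matching negative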
    }
  }
}

Response:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.16786805,
    "hits" : [
      {
        "_index" : "news",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.16786805,
        "_source" : {
          "content" : "Apple Mac"
        }
      },
      {
        "_index" : "news",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.16786805,
        "_source" : {
          "content" : "Apple iPad"
        }
      },
      {
        "_index" : "news",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.08640266,
        "_source" : {
          "content" : "Apple employee like Apple Pie and Apple Juice"
        }
      }
    ]
  }
}

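   You can set negative_boost to 1 and compare the results. Because apple occurs most often in document 3, it originally has the highest score (0.17280532 in the must query above); since it also matches pie, the boosting query multiplies its score by 0.5 (giving 0.08640266), which moves it to last place.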

Top-level parameters

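positive: (Required, query object) The query you wish to run. Any returned documents must match this query.
negative: (Required, query object) Query used to decrease the relevance score of matching documents.
    If a returned document matches both the positive query and this query, the boosting query calculates the document's final relevance score as follows:

  • Take the original relevance score from the positive query.
  • Multiply the score by the negative_boost value.

negative_boost: (Required, float) Floating point number between 0 and 1.0 used to decrease the relevance scores of documents matching the negative query.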
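The formula above can be checked against the earlier responses with a short Python sketch. The positive scores are the ones returned by the must query on the news index, and document 3 is the only one containing pie; `boosting_score` and `ranked` are hypothetical helpers, not Elasticsearch APIs.

```python
# Scores of the positive query (match content: apple), taken from the must query response
POSITIVE = {1: 0.16786805, 2: 0.16786805, 3: 0.17280532}
MATCHES_NEGATIVE = {1: False, 2: False, 3: True}  # only document 3 contains "pie"


def boosting_score(positive_score, matches_negative, negative_boost):
    """Positive score, multiplied by negative_boost if the doc also matches negative."""
    return positive_score * negative_boost if matches_negative else positive_score


def ranked(negative_boost):
    """Return (ids ordered by score, highest first; ties keep id order) and the scores."""
    scores = {
        d: boosting_score(s, MATCHES_NEGATIVE[d], negative_boost)
        for d, s in POSITIVE.items()
    }
    return sorted(scores, key=lambda d: scores[d], reverse=True), scores


order, scores = ranked(0.5)
print(order, scores[3])
```

With negative_boost 0.5 the order is 1, 2, 3 and document 3 scores half of 0.17280532, as in the response; with negative_boost 1.0 document 3 is back on top.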

Constant Score query

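  • Turns the query into a filter: relevance scoring (TF-IDF, or BM25, the default since Elasticsearch 5.0) is skipped, which avoids its cost
  • Filters can make effective use of caching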

Request example

POST news/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "content": "apple"
        }
      },
      "boost": 1
    }
  }
}

Top-level parameters

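filter :
    (Required, query object) Filter query you wish to run. Any returned documents must match this query.
     Filter queries do not calculate relevance scores. To speed up performance, Elasticsearch automatically caches frequently used filter queries.

boost :
      (Optional, float) Floating point number used as the constant relevance score for every document matching the filter query. Defaults to 1.0.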
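A rough Python sketch of constant_score wrapping the term query on the news data. It assumes the standard analyzer indexed "Apple" as the lowercase token apple (approximated here by lowercasing and splitting on non-word characters). Because the input of a term query is not analyzed, "apple" matches and "Apple" does not, and every matching document gets the same score, equal to boost.

```python
import re

NEWS = {
    1: "Apple Mac",
    2: "Apple iPad",
    3: "Apple employee like Apple Pie and Apple Juice",
}


def tokens(text):
    # Approximation of the standard analyzer for these sentences (assumption)
    return set(re.findall(r"\w+", text.lower()))


def constant_score_term(docs, term, boost=1.0):
    """Docs whose indexed tokens contain `term` exactly all receive the constant score `boost`."""
    return {doc_id: boost for doc_id, text in docs.items() if term in tokens(text)}


print(constant_score_term(NEWS, "apple"))  # {1: 1.0, 2: 1.0, 3: 1.0}
print(constant_score_term(NEWS, "Apple"))  # {}
```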

Disjunction Max query

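  • Returns any document that matches at least one of the wrapped queries
  • The document's final score is the score of its best-matching clause (field)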

Request examples

Prepare test data

PUT blogs/_bulk
{"index":{"_id":1}}
{"title":"Quick  brown rabbits","body":"Brown rabbits are commonly seen."}
{"index":{"_id":2}}
{"title":"Keeping pets healthy","body":"My quick brown fox eats rabbits on a regular basis."}

Using a bool query

Request:

POST blogs/_search
{
  "query": {
    "bool": {
      "should": [
        {"match": {"title": "Brown fox"}},
        {"match": {"body": "Brown fox"}}
      ]
    }
  }
}

Response:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.90425634,
    "hits" : [
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.90425634,
        "_source" : {
          "title" : "Quick  brown rabbits",
          "body" : "Brown rabbits are commonly seen."
        }
      },
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.77041256,
        "_source" : {
          "title" : "Keeping pets healthy",
          "body" : "My quick brown fox eats rabbits on a regular basis."
        }
      }
    ]
  }
}

Using a dis_max query (Disjunction Max Query)

Request:

POST blogs/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {"match": {"title": "Brown fox"}},
        {"match": {"body": "Brown fox"}}
      ]
    }
  }
}

Response:

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.77041256,
    "hits" : [
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.77041256,
        "_source" : {
          "title" : "Keeping pets healthy",
          "body" : "My quick brown fox eats rabbits on a regular basis."
        }
      },
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.6931472,
        "_source" : {
          "title" : "Quick  brown rabbits",
          "body" : "Brown rabbits are commonly seen."
        }
      }
    ]
  }
}
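The different orderings of the two responses can be reproduced from the scores themselves. A Python sketch, assuming the bool should query sums the matching clause scores while dis_max keeps only the highest one. The per-clause scores of document 1 are not shown in the responses; they are back-derived here (0.90425634 - 0.6931472 = 0.21110914), and which field each belongs to is left open. Document 2's title contains neither brown nor fox, so only one clause matches it.

```python
# Per-clause scores per document (document 1's second value is derived, see above)
CLAUSE_SCORES = {
    1: [0.6931472, 0.21110914],
    2: [0.77041256],
}


def ranking(combine):
    """Order document ids by combine(clause scores), highest first."""
    return sorted(CLAUSE_SCORES, key=lambda d: combine(CLAUSE_SCORES[d]), reverse=True)


print(ranking(sum))  # bool should: [1, 2]
print(ranking(max))  # dis_max:     [2, 1]
```

Summing rewards document 1 for matching "brown" in both fields, while taking the maximum rewards document 2 for matching "brown fox" well in a single field.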

Top-level parameters

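queries :
    (Required, array of query objects) Contains one or more query clauses. Returned documents must match one or more of these queries. If a document matches multiple queries, Elasticsearch uses the highest relevance score.
tie_breaker :
    (Optional, float) Floating point number between 0 and 1.0 used to increase the relevance scores of documents matching multiple query clauses. Defaults to 0.0.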

Adjusting scores with the tie_breaker parameter

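tie_breaker is a floating-point number between 0 and 1. 0 means only the best-matching clause counts; 1 means all matching clauses are equally important.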

  • Take the _score of the best-matching clause
  • Multiply the scores of the other matching clauses by tie_breaker
  • Add the highest score to the multiplied scores

 If tie_breaker is greater than 0.0, all matching clauses count, but the highest-scoring clause counts the most.
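The three steps amount to score = best + tie_breaker × (sum of the other matching clause scores). A minimal Python sketch; the function name is hypothetical and the range check simply mirrors the documented 0 to 1.0 range.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Best matching clause score + tie_breaker * (sum of the other matching clause scores)."""
    if not 0.0 <= tie_breaker <= 1.0:
        raise ValueError("tie_breaker must be between 0 and 1.0")
    best = max(clause_scores)
    return best + tie_breaker * (sum(clause_scores) - best)


print(dis_max_score([0.5, 0.25]))                   # 0.5   (only the best clause)
print(dis_max_score([0.5, 0.25], tie_breaker=0.5))  # 0.625
print(dis_max_score([0.5, 0.25], tie_breaker=1.0))  # 0.75  (plain sum, like bool should)
```

Plugging in the per-clause scores that can be back-derived from the blogs responses (0.6931472 and 0.21110914 for document 1, 0.77041256 for document 2) with tie_breaker 0.7 gives about 0.8409 for document 1, so under this model it would rank ahead of document 2 again.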

Function Score query

...


References