"If a worker wants to do his job well, he must first sharpen his tools." - Confucius, "The Analects of Confucius. Lu Linggong"
Front page > Programming > How to Perform Fuzzy Matching of Email Addresses and Telephone Numbers Using Elasticsearch?

How to Perform Fuzzy Matching of Email Addresses and Telephone Numbers Using Elasticsearch?

Published on 2024-11-07
Browse:244

How to Perform Fuzzy Matching of Email Addresses and Telephone Numbers Using Elasticsearch?

Fuzzy Matching Email or Telephone Using Elasticsearch

Elasticsearch offers built-in capabilities for fuzzy matching of email addresses and telephone numbers.

Email Matching

To match email addresses ending with a specific domain (e.g., @gmail.com):

{
    "query": {
        "term": {
            "email": ".*@gmail.com"
        }
    }
}

Or, to match emails containing a specific string:

{
    "query": {
        "match": {
            "email": {
                "query": "sales@*",
                "operator": "and"
            }
        }
    }
}

Telephone Matching

For fuzzy matching of telephone numbers, you can use the following pattern:

{
    "query": {
        "prefix": {
            "tel": "136*"
        }
    }
}

This will match all phone numbers starting with "136".

Performance Optimization

To improve performance for fuzzy matching, consider using custom analyzers that leverage n-gram or edge n-gram token filters. These filters break down the text into smaller tokens, making it easier for Elasticsearch to perform fuzzy matching.

Email Analyzer Configuration:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "email_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "name_ngram_filter",
            "trim"
          ]
        }
      },
      "filter": {
        "name_ngram_filter": {
          "type": "ngram",
          "min_gram": "3",
          "max_gram": "20"
        }
      }
    }
  }
}

Telephone Analyzer Configuration:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "phone_analyzer": {
          "type": "custom",
          "char_filter": [
            "digit_only"
          ],
          "tokenizer": "digit_edge_ngram_tokenizer",
          "filter": [
            "trim"
          ]
        }
      },
      "char_filter": {
        "digit_only": {
          "type": "pattern_replace",
          "pattern": "\\D ",
          "replacement": ""
        }
      },
      "tokenizer": {
        "digit_edge_ngram_tokenizer": {
          "type": "edgeNGram",
          "min_gram": "3",
          "max_gram": "15",
          "token_chars": [
            "digit"
          ]
        }
      }
    }
  }
}
Latest tutorial More>

Disclaimer: All resources provided are partly from the Internet. If there is any infringement of your copyright or other rights and interests, please explain the detailed reasons and provide proof of copyright or rights and interests and then send it to the email: [email protected] We will handle it for you as soon as possible.

Copyright© 2022 湘ICP备2022001581号-3