Elasticsearch supports partial (loosely, "fuzzy") matching of email addresses and telephone numbers through its wildcard and prefix queries and through n-gram based analyzers.
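To match email addresses ending with a specific domain (e.g., @gmail.com), use a wildcard query on a field that stores the whole address as a single term (for example, a keyword field). A term query compares its value literally and does not interpret a pattern such as ".*@gmail.com". Note that a pattern with a leading * can be slow on large indices:
{
  "query": {
    "wildcard": {
      "email": {
        "value": "*@gmail.com"
      }
    }
  }
}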
Or, to match emails that start with a specific string such as "sales@" (a match query analyzes its input and does not expand *, so a wildcard query is used):
{
  "query": {
    "wildcard": {
      "email": {
        "value": "sales@*"
      }
    }
  }
}
For prefix matching of telephone numbers, use a prefix query. The value is taken literally, so it must not end with *:
{
  "query": {
    "prefix": {
      "tel": "136"
    }
  }
}
This will match all phone numbers starting with "136".
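To reason about which stored values a domain-suffix or prefix pattern would select, the matching can be emulated locally. The sketch below is only an approximation written for this article: it uses Python's fnmatch module in place of Elasticsearch's wildcard matching (equivalent for patterns that use only * and ?), assumes each value is stored as a single un-analyzed term, and uses made-up sample data.

```python
from fnmatch import fnmatchcase


def wildcard_match(value: str, pattern: str) -> bool:
    """Emulate a case-sensitive wildcard match against one un-analyzed value."""
    return fnmatchcase(value, pattern)


def prefix_match(value: str, prefix: str) -> bool:
    """Emulate a prefix match: the prefix is literal, with no wildcard characters."""
    return value.startswith(prefix)


# Hypothetical sample values, not taken from any real index.
emails = ["alice@gmail.com", "sales@example.com", "bob@gmail.com.cn"]
phones = ["13612345678", "13712345678", "01361234567"]

gmail_only = [e for e in emails if wildcard_match(e, "*@gmail.com")]
sales_only = [e for e in emails if wildcard_match(e, "sales@*")]
starts_136 = [p for p in phones if prefix_match(p, "136")]

print(gmail_only)
print(sales_only)
print(starts_136)
```

The pattern must cover the whole value, which is why "bob@gmail.com.cn" is not selected by "*@gmail.com", and "01361234567" is not selected by the prefix "136".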
To speed up partial matching, consider custom analyzers built on n-gram or edge n-gram tokenizers or token filters. They split each value into short substrings at index time, so a partial match becomes an ordinary term lookup rather than a scan over the term dictionary, at the cost of a larger index.
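The difference between the two gram types is easiest to see on a small input. The following is a minimal pure-Python sketch of the idea, not Elasticsearch's implementation; the function names ngrams and edge_ngrams are chosen here for illustration.

```python
def ngrams(text: str, min_gram: int, max_gram: int) -> list[str]:
    """All substrings of length min_gram..max_gram, starting at every position."""
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams


def edge_ngrams(text: str, min_gram: int, max_gram: int) -> list[str]:
    """Only the substrings anchored at the start of the text (its prefixes)."""
    longest = min(max_gram, len(text))
    return [text[:size] for size in range(min_gram, longest + 1)]


print(ngrams("gmail", 3, 4))       # substrings from anywhere in the token
print(edge_ngrams("gmail", 3, 4))  # prefixes only
```

Plain n-grams allow matching anywhere inside a token, which suits email fragments. Edge n-grams only allow matching from the start, which suits phone-number prefixes and produces far fewer tokens.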
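Email Analyzer Configuration (on Elasticsearch 7 and later, index.max_ngram_diff must be raised, because the gap between min_gram and max_gram exceeds the default limit of 1):
{
  "settings": {
    "index": {
      "max_ngram_diff": 17
    },
    "analysis": {
      "analyzer": {
        "email_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "name_ngram_filter",
            "trim"
          ]
        }
      },
      "filter": {
        "name_ngram_filter": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 20
        }
      }
    }
  }
}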
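The analyzer only takes effect once a field mapping refers to it. The sketch below builds a complete index-creation body as a plain Python dict. Several parts are assumptions rather than anything stated above: the field name email is borrowed from the earlier queries, the typeless mappings layout and the index.max_ngram_diff setting assume Elasticsearch 7 or later, and the built-in standard analyzer is used as the search analyzer so that query text is not itself split into n-grams. The helper name build_email_index_body is hypothetical.

```python
import json


def build_email_index_body(min_gram: int = 3, max_gram: int = 20) -> dict:
    """Return a create-index request body that applies email_analyzer to `email`."""
    return {
        "settings": {
            # Assumed necessary on Elasticsearch 7+: the default allowed gap is 1.
            "index": {"max_ngram_diff": max_gram - min_gram},
            "analysis": {
                "analyzer": {
                    "email_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "name_ngram_filter"],
                    }
                },
                "filter": {
                    "name_ngram_filter": {
                        "type": "ngram",
                        "min_gram": min_gram,
                        "max_gram": max_gram,
                    }
                },
            },
        },
        "mappings": {
            "properties": {
                "email": {
                    "type": "text",
                    "analyzer": "email_analyzer",   # n-grams at index time
                    "search_analyzer": "standard",  # assumption: no n-grams at query time
                }
            }
        },
    }


body = build_email_index_body()
print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the body of an index-creation request. With this layout, a plain match query for a fragment such as "gmail" is looked up directly against the indexed n-grams.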
Telephone Analyzer Configuration:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "phone_analyzer": {
          "type": "custom",
          "char_filter": [
            "digit_only"
          ],
          "tokenizer": "digit_edge_ngram_tokenizer",
          "filter": [
            "trim"
          ]
        }
      },
      "char_filter": {
        "digit_only": {
          "type": "pattern_replace",
          "pattern": "\\D+",
          "replacement": ""
        }
      },
      "tokenizer": {
        "digit_edge_ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 3,
          "max_gram": 15,
          "token_chars": [
            "digit"
          ]
        }
      }
    }
  }
}
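To see which tokens such a phone analyzer is meant to produce, its two stages can be imitated in a few lines of Python. This is a rough local sketch, assuming the char filter strips every non-digit character and the tokenizer then emits prefixes of 3 to 15 digits; phone_tokens is a hypothetical helper, and the sample number is made up.

```python
import re


def phone_tokens(raw: str, min_gram: int = 3, max_gram: int = 15) -> list[str]:
    """Strip non-digits, then emit the edge n-grams (prefixes) of what remains."""
    digits = re.sub(r"\D+", "", raw)        # stage 1: digit-only char filter
    longest = min(max_gram, len(digits))
    # stage 2: prefixes of length min_gram..longest
    return [digits[:size] for size in range(min_gram, longest + 1)]


tokens = phone_tokens("136-1234-5678")
print(tokens)
```

Because every prefix of at least three digits is indexed as its own term, a search for "136" or "1361234" becomes an exact term lookup. Searching this way presumably requires that the query text not be split into edge n-grams as well, for example by using a separate search analyzer.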