How Fuzzy Queries in OpenSearch Enhance Typo Tolerance: A Comprehensive Guide

Salonisuman
Published in AWS Tip · Apr 27, 2024 · 3 min read


In the domain of search algorithms, precision remains a top priority. Real-world data, however, brings challenges such as typos, spelling variations, and misspelled keywords. The solution is a fuzzy query in OpenSearch.

Image by Firmbee.com (Unsplash)

Understanding Fuzziness

Fuzziness is a query's tolerance for small spelling differences between the search term and the terms stored in your documents. It lets a search return relevant results even when the user's input contains a typo or otherwise deviates from an exact match.

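A fuzzy query uses the Levenshtein edit distance (the number of single-character insertions, deletions, and substitutions needed to turn one term into another) to find documents containing terms similar to the search term. By default, swapping two adjacent characters (“ab” to “ba”) also counts as a single edit. Only terms within the allowed number of edits, set by the fuzziness parameter, are matched.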
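For readers who want the exact definition, the distance between a search term a of length m and an indexed term b of length n can be written as the recurrence below. This is the optimal string alignment form of Damerau-Levenshtein distance, used here as an assumption about how transpositions are counted; Lucene's matching automaton may behave differently in rare multi-edit edge cases.

```latex
d(i, 0) = i, \qquad d(0, j) = j, \qquad
d(i, j) = \min \begin{cases}
  d(i-1, j) + 1 & \text{deletion} \\
  d(i, j-1) + 1 & \text{insertion} \\
  d(i-1, j-1) + [a_i \neq b_j] & \text{substitution or match} \\
  d(i-2, j-2) + 1 & \text{transposition, if } i, j > 1,\ a_i = b_{j-1},\ a_{i-1} = b_j
\end{cases}
```

Here [a_i ≠ b_j] is 1 when the two characters differ and 0 otherwise, and the distance between the terms is d(m, n). For example, turning “skincre” into “skincare” takes one insertion, so the distance is 1.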

Let’s look at the query and its implementation:

{
  "query": {
    "fuzzy": {
      "field_name": {              // the field you want to search
        "value": "search_term",    // example: fuzy
        "fuzziness": "AUTO"        // default: AUTO
      }
    }
  }
}
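If you build queries from application code, a small Python helper can produce the same body as a dict. This is a sketch: `build_fuzzy_query` is a hypothetical helper, and the optional parameters it exposes (`prefix_length` 0, `max_expansions` 50, `transpositions` true) are assumed to mirror the fuzzy query options and defaults described in the OpenSearch documentation.

```python
import json


def build_fuzzy_query(field, value, fuzziness="AUTO", prefix_length=0,
                      max_expansions=50, transpositions=True):
    """Return a fuzzy query request body as a dict (hypothetical helper)."""
    return {
        "query": {
            "fuzzy": {
                field: {
                    "value": value,
                    "fuzziness": fuzziness,            # "AUTO", 0, 1 or 2
                    "prefix_length": prefix_length,    # leading characters that must match exactly
                    "max_expansions": max_expansions,  # max number of term variations to try
                    "transpositions": transpositions,  # count "ab" -> "ba" as one edit
                }
            }
        }
    }


body = build_fuzzy_query("title", "fuzy")
print(json.dumps(body, indent=2))
```

Raising `prefix_length` is a common way to cut the number of candidate terms the query has to expand, at the cost of missing typos in the first characters.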

If you want to search for a keyword across multiple fields, use a multi_match query with the fuzziness parameter instead:

{
  "query": {
    "multi_match": {
      "fields": [ "summary", "title", "tag" ],
      "query": "search_term",
      "fuzziness": "AUTO"
    }
  }
}
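The same body can be produced in Python. One difference from the term-level fuzzy query worth keeping in mind: multi_match analyzes the query text first and applies fuzziness to each resulting term, so a two-word input such as “skincre wintr” tolerates a typo in each word. `build_multi_match_query` is a hypothetical helper name used only for illustration.

```python
import json


def build_multi_match_query(fields, text, fuzziness="AUTO"):
    """Return a multi_match request body with fuzziness (hypothetical helper)."""
    return {
        "query": {
            "multi_match": {
                "fields": list(fields),  # every listed field is searched
                "query": text,           # analyzed; fuzziness applies per resulting term
                "fuzziness": fuzziness,
            }
        }
    }


body = build_multi_match_query(["summary", "title", "tag"], "skincre wintr")
print(json.dumps(body, indent=2))
```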

How the fuzziness setting works in an OpenSearch query:

1. AUTO: When you set “fuzziness” to “AUTO”, OpenSearch chooses the maximum edit distance from the length of the search term: terms of 0 to 2 characters must match exactly, terms of 3 to 5 characters allow 1 edit, and longer terms allow 2 edits.

2. Fuzziness 1: With fuzziness set to 1, a term matches if it is at most 1 edit away from the search term. For the search term “cat”, terms like “bat” (1 substitution), “cats” (1 insertion), “at” (1 deletion), or “act” (1 transposition) would be considered matches.

3. Fuzziness 2: With fuzziness set to 2, up to 2 edits are allowed, which widens the range of accepted variations. In addition to everything matched at fuzziness 1, terms like “bats” (1 substitution, 1 insertion), “catty” (2 insertions), “a” (2 deletions), or “tac” (2 substitutions) would also be considered matches. 2 is the largest edit distance OpenSearch supports.
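The rules above can be checked locally with a short Python sketch. It assumes the optimal string alignment variant of Damerau-Levenshtein distance and hard-codes AUTO's default length boundaries of 3 and 6; `osa_distance`, `auto_fuzziness` and `is_fuzzy_match` are hypothetical helpers, not OpenSearch APIs. It also ignores text analysis (lowercasing, tokenizing), which OpenSearch applies to indexed terms before comparing them.

```python
def osa_distance(a: str, b: str) -> int:
    """Levenshtein distance where an adjacent transposition also costs 1 edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    """Max edits under AUTO: shorter than low -> 0, shorter than high -> 1, else 2."""
    if len(term) < low:
        return 0
    if len(term) < high:
        return 1
    return 2


def is_fuzzy_match(term: str, candidate: str, fuzziness="AUTO") -> bool:
    """True if candidate is within the allowed number of edits of term."""
    max_edits = auto_fuzziness(term) if fuzziness == "AUTO" else int(fuzziness)
    return osa_distance(term, candidate) <= max_edits


for candidate in ["bat", "cats", "at", "act", "bats", "catty", "a", "tac"]:
    print(candidate, osa_distance("cat", candidate),
          is_fuzzy_match("cat", candidate, 1), is_fuzzy_match("cat", candidate, 2))
```

Note that “cat” has 3 characters, so AUTO behaves like fuzziness 1 for it, while a 7-character term such as “skincre” would get 2 edits.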

Request to OpenSearch:

curl --request POST \
  --url http://localhost:3000/fuzzy-query/_search \
  --header 'content-type: application/json' \
  --data '{
  "from": 0,
  "size": 100,
  "query": {
    "fuzzy": {
      "title": {                // field: title
        "value": "skincre",     // misspelling of "skincare"
        "fuzziness": "1"        // allow at most 1 edit
      }
    }
  },
  "highlight": {                // highlight the matched terms
    "fields": {
      "title": {}
    }
  }
}'
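The same search can be issued from Python using only the standard library. This sketch reuses the query body from the curl command, assumes the typeless `/fuzzy-query/_search` endpoint at the author's `localhost:3000` address, and only builds the request object; calling the hypothetical `send` helper requires a reachable cluster.

```python
import json
import urllib.request

# Adjust host and port to point at your own OpenSearch cluster.
URL = "http://localhost:3000/fuzzy-query/_search"

body = {
    "from": 0,
    "size": 100,
    "query": {"fuzzy": {"title": {"value": "skincre", "fuzziness": "1"}}},
    "highlight": {"fields": {"title": {}}},  # wrap matched terms in <em> tags
}

request = urllib.request.Request(
    URL,
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)


def send(req: urllib.request.Request) -> dict:
    """POST the search and decode the JSON response (needs a running cluster)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `response = send(request)` returns the parsed JSON, shaped like the response shown next.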

Response from OpenSearch:

{
  "took": 20,
  "timed_out": false,
  "_shards": {
    "total": 153,
    "successful": 153,
    "skipped": 148,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 4.5032263,
    "hits": [
      {
        "_index": "fuzzy-query",
        "_id": "entity:node/1468:en",
        "_score": 4.476768,
        "_source": {
          "field_title_as_url": [
            "Skincare-for-winter"
          ],
          "title": [
            "Skincare for winter"
          ]
        },
        "highlight": {
          "title": [
            "<em>Skincare</em> for winter"
          ]
        }
      },
      {
        "_index": "fuzzy-query",
        "_id": "entity:node/284:en",
        "_score": 3.7779834,
        "_source": {
          "field_title_as_url": [
            "Ayurveda-Skincare-Courses"
          ],
          "title": [
            "Ayurveda Skincare Courses"
          ]
        },
        "highlight": {
          "title": [
            "Ayurveda <em>Skincare</em> Courses"
          ]
        }
      },
      {
        "_index": "fuzzy-query",
        "_id": "entity:node/288:en",
        "_score": 3.6836915,
        "_source": {
          "field_title_as_url": [
            "Creating-All-Natural-Skincare-CPD-Accredited"
          ],
          "title": [
            "Creating All Natural Skincare - CPD Accredited"
          ]
        },
        "highlight": {
          "title": [
            "Creating All Natural <em>Skincare</em> - CPD Accredited"
          ]
        }
      }
    ]
  }
}
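To use this response in application code, a small Python sketch can pull out each hit's `_id`, `_score` and title in ranking order. `summarize_hits` is a hypothetical helper, and `sample` is a trimmed copy of the response above containing two of its three hits.

```python
from typing import List, Tuple


def summarize_hits(response: dict) -> List[Tuple[str, float, str]]:
    """Return (_id, _score, first title) for every hit, in ranking order."""
    rows = []
    for hit in response["hits"]["hits"]:
        title = hit["_source"]["title"]
        # This index stores title as a list of strings; also accept a plain string.
        first_title = title[0] if isinstance(title, list) else title
        rows.append((hit["_id"], hit["_score"], first_title))
    return rows


# Trimmed copy of the response shown above (two of the three hits).
sample = {
    "hits": {
        "total": {"value": 3, "relation": "eq"},
        "hits": [
            {"_id": "entity:node/1468:en", "_score": 4.476768,
             "_source": {"title": ["Skincare for winter"]}},
            {"_id": "entity:node/284:en", "_score": 3.7779834,
             "_source": {"title": ["Ayurveda Skincare Courses"]}},
        ],
    }
}

print("total:", sample["hits"]["total"]["value"])
for doc_id, score, title in summarize_hits(sample):
    print(f"{score:.3f}  {doc_id}  {title}")
```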

Note:

Highlighting in OpenSearch shows which terms in each result your query actually matched. This is especially helpful with fuzzy queries: in real-world datasets, added fuzziness can match several different words, and the highlighted fragments make it clear why each result was returned.
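As a sketch of that idea, the Python below counts which words the fuzzy query actually matched by reading the highlight fragments. It assumes the default `<em>` / `</em>` highlight tags; `matched_terms` is a hypothetical helper, and `sample` holds the highlight fragments from the response above.

```python
import re
from collections import Counter

EM_TAG = re.compile(r"<em>(.*?)</em>")  # default highlight tags assumed


def matched_terms(response: dict, field: str) -> Counter:
    """Count the words wrapped in <em> tags across all hits' highlight fragments."""
    counts = Counter()
    for hit in response["hits"]["hits"]:
        for fragment in hit.get("highlight", {}).get(field, []):
            counts.update(term.lower() for term in EM_TAG.findall(fragment))
    return counts


sample = {"hits": {"hits": [
    {"highlight": {"title": ["<em>Skincare</em> for winter"]}},
    {"highlight": {"title": ["Ayurveda <em>Skincare</em> Courses"]}},
    {"highlight": {"title": ["Creating All Natural <em>Skincare</em> - CPD Accredited"]}},
]}}

print(matched_terms(sample, "title"))  # Counter({'skincare': 3})
```

On a larger dataset, the counter would reveal every spelling variant that the fuzziness setting let through, which is a quick way to judge whether it is too loose.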

More on highlighting in OpenSearch

I hope you found this guide insightful and informative. If you have any questions or comments, feel free to leave them below. Your feedback is valuable, and I’m here to help address any queries you may have. Thank you for reading!

I have also written an article on the OpenSearch Multisearch feature. If you want to learn more, feel free to explore it here.
