Mastering CRUD Operations with OpenSearch in Python: A Practical Guide



Published on 2024-11-06



OpenSearch, an open-source alternative to Elasticsearch, is a powerful search and analytics engine built to handle large datasets with ease. In this blog, we’ll demonstrate how to perform basic CRUD (Create, Read, Update, Delete) operations in OpenSearch using Python.

Prerequisites:

Python 3.8 or later (the code uses typing.Literal, which was added in Python 3.8)
Docker and Docker Compose, to run OpenSearch locally
Familiarity with RESTful APIs

Step 1: Setting Up OpenSearch Locally with Docker

To get started, we need a local OpenSearch instance. Below is a simple docker-compose.yml file that spins up a two-node OpenSearch cluster and OpenSearch Dashboards, with the security plugin disabled for local testing.

version: '3'
services:
  opensearch-test-node-1:
    image: opensearchproject/opensearch:2.13.0
    container_name: opensearch-test-node-1
    environment:
      - cluster.name=opensearch-test-cluster
      - node.name=opensearch-test-node-1
      - discovery.seed_hosts=opensearch-test-node-1,opensearch-test-node-2
      - cluster.initial_cluster_manager_nodes=opensearch-test-node-1,opensearch-test-node-2
      - bootstrap.memory_lock=true
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
      - "DISABLE_INSTALL_DEMO_CONFIG=true"
      - "DISABLE_SECURITY_PLUGIN=true"
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - opensearch-test-data1:/usr/share/opensearch/data
    ports:
      - 9200:9200
      - 9600:9600
    networks:
      - opensearch-test-net

  opensearch-test-node-2:
    image: opensearchproject/opensearch:2.13.0
    container_name: opensearch-test-node-2
    environment:
      - cluster.name=opensearch-test-cluster
      - node.name=opensearch-test-node-2
      - discovery.seed_hosts=opensearch-test-node-1,opensearch-test-node-2
      - cluster.initial_cluster_manager_nodes=opensearch-test-node-1,opensearch-test-node-2
      - bootstrap.memory_lock=true
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
      - "DISABLE_INSTALL_DEMO_CONFIG=true"
      - "DISABLE_SECURITY_PLUGIN=true"
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - opensearch-test-data2:/usr/share/opensearch/data
    networks:
      - opensearch-test-net

  opensearch-test-dashboards:
    image: opensearchproject/opensearch-dashboards:2.13.0
    container_name: opensearch-test-dashboards
    ports:
      - 5601:5601
    expose:
      - "5601"
    environment:
      - 'OPENSEARCH_HOSTS=["http://opensearch-test-node-1:9200","http://opensearch-test-node-2:9200"]'
      - "DISABLE_SECURITY_DASHBOARDS_PLUGIN=true"
    networks:
      - opensearch-test-net

volumes:
  opensearch-test-data1:
  opensearch-test-data2:

networks:
  opensearch-test-net:

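Run the following command to bring up the cluster in the background (with the older standalone Compose v1 binary, use docker-compose instead of docker compose):
docker compose up -d
OpenSearch will be accessible at http://localhost:9200, and OpenSearch Dashboards at http://localhost:5601.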
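After the containers start, the cluster needs some time before it accepts requests, so running main.py immediately can fail with a connection error. The following is a minimal sketch, not part of the original project, that polls the standard _cluster/health endpoint using only the standard library. The function names cluster_health and wait_for_cluster are hypothetical, and the sketch assumes plain HTTP without credentials, as configured in the compose file above.

```python
import http.client
import json
import time
import typing as t
import urllib.request


def cluster_health(base_url: str = "http://localhost:9200") -> t.Optional[str]:
    """Return the cluster status ("green", "yellow" or "red"), or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=2) as resp:
            return json.load(resp).get("status")
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused/reset, HTTP errors (URLError is an OSError),
        # timeouts, or a half-started node returning a non-JSON body.
        return None


def wait_for_cluster(
    check: t.Callable[[], t.Optional[str]] = cluster_health,
    timeout: float = 60.0,
    interval: float = 2.0,
    sleep: t.Callable[[float], None] = time.sleep,
) -> str:
    """Poll `check` until it reports "green" or "yellow", else raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        status = check()
        if status in ("green", "yellow"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"cluster not ready after {timeout}s (last status: {status!r})"
            )
        sleep(interval)
```

Calling wait_for_cluster() before the rest of main.py blocks until the cluster answers. Accepting yellow as well as green treats a cluster whose primary shards are all assigned as ready, even if some replicas are not yet allocated.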

Step 2: Setting Up the Python Environment

python -m venv .venv
source .venv/bin/activate
pip install opensearch-py

We'll also structure our project as follows:

├── interfaces.py
├── main.py
├── searchservice.py
├── docker-compose.yml

Step 3: Defining Interfaces and Resources (interfaces.py)

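In the interfaces.py file, we define our Resource and Resources classes. These let us handle different resource types in OpenSearch (in this case, users) generically. Resource lowercases its name in __post_init__ because the name becomes part of an index name, and OpenSearch index names must be lowercase.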

from dataclasses import dataclass, field

@dataclass
class Resource:
    name: str

    def __post_init__(self) -> None:
        self.name = self.name.lower()

@dataclass
class Resources:
    users: Resource = field(default_factory=lambda: Resource("Users"))
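Because every Resource ends up in an index name, adding a new resource type amounts to adding a field to Resources. The sketch below assumes a hypothetical orders kind and a hypothetical all_kinds helper (neither is in the original project) that collects every declared kind so they can be passed together to search(kinds=...).

```python
import dataclasses
import typing as t
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str

    def __post_init__(self) -> None:
        # Index names must be lowercase, so normalise once here.
        self.name = self.name.lower()


@dataclass
class Resources:
    users: Resource = field(default_factory=lambda: Resource("Users"))
    # Hypothetical extra kind, added only to show how the registry grows.
    orders: Resource = field(default_factory=lambda: Resource("Orders"))


def all_kinds(resources: Resources) -> t.List[Resource]:
    """Return every Resource declared on a Resources instance, in field order."""
    return [getattr(resources, f.name) for f in dataclasses.fields(resources)]
```

Passing all_kinds(resources) to search queries every kind's index in one request. If one of those indices has not been created yet, OpenSearch rejects the search with an index-not-found error unless the request sets ignore_unavailable.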

Step 4: CRUD Operations with OpenSearch (searchservice.py)

In searchservice.py, we define an abstract class SearchService to outline the required operations. The HTTPOpenSearchService class then implements these CRUD methods, interacting with the OpenSearch client.

# coding: utf-8

import abc
import logging
import typing as t
from dataclasses import dataclass
from uuid import UUID

from interfaces import Resource
from opensearchpy import NotFoundError, OpenSearch


class SearchService(abc.ABC):
    """Operations every search backend must implement."""

    @abc.abstractmethod
    def search(
        self,
        kinds: t.List[Resource],
        tenants_id: UUID,
        companies_id: UUID,
        query: t.Dict[str, t.Any],
    ) -> t.Dict[t.Literal["hits"], t.Dict[str, t.Any]]:
        raise NotImplementedError

    @abc.abstractmethod
    def delete_index(
        self,
        kind: Resource,
        tenants_id: UUID,
        companies_id: UUID,
    ) -> None:
        raise NotImplementedError

    @abc.abstractmethod
    def index(
        self,
        kind: Resource,
        tenants_id: UUID,
        companies_id: UUID,
        data: t.Dict[str, t.Any],
    ) -> t.Dict[str, t.Any]:
        raise NotImplementedError

    @abc.abstractmethod
    def delete_document(
        self,
        kind: Resource,
        tenants_id: UUID,
        companies_id: UUID,
        document_id: str,
    ) -> t.Optional[t.Dict[str, t.Any]]:
        raise NotImplementedError

    @abc.abstractmethod
    def create_index(
        self,
        kind: Resource,
        tenants_id: UUID,
        companies_id: UUID,
    ) -> None:
        raise NotImplementedError


@dataclass(frozen=True)
class HTTPOpenSearchService(SearchService):
    client: OpenSearch

    def _gen_index(
        self,
        kind: Resource,
        tenants_id: UUID,
        companies_id: UUID,
    ) -> str:
        return (
            f"tenant_{str(UUID(str(tenants_id)))}"
            f"_company_{str(UUID(str(companies_id)))}"
            f"_kind_{kind.name}"
        )

    def index(
        self,
        kind: Resource,
        tenants_id: UUID,
        companies_id: UUID,
        data: t.Dict[str, t.Any],
    ) -> t.Dict[str, t.Any]:
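        self.client.index(
            index=self._gen_index(kind, tenants_id, companies_id),
            body=data,
            id=data.get("id"),
            # Refresh so the document is searchable right away; otherwise a
            # search issued immediately afterwards can miss it until the next
            # periodic refresh (every 1s by default). Fine for a demo, costly
            # for bulk loads.
            refresh=True,
        )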
        return data

    def delete_index(
        self,
        kind: Resource,
        tenants_id: UUID,
        companies_id: UUID,
    ) -> None:
        index = self._gen_index(kind, tenants_id, companies_id)
        try:
            if self.client.indices.exists(index=index):
                self.client.indices.delete(index=index)
        except NotFoundError:
            # The index disappeared between the exists() check and delete().
            pass

    def create_index(
        self,
        kind: Resource,
        tenants_id: UUID,
        companies_id: UUID,
    ) -> None:
        body: t.Dict[str, t.Any] = {}
        self.client.indices.create(
            index=self._gen_index(kind, tenants_id, companies_id),
            body=body,
        )

    def search(
        self,
        kinds: t.List[Resource],
        tenants_id: UUID,
        companies_id: UUID,
        query: t.Dict[str, t.Any],
    ) -> t.Dict[t.Literal["hits"], t.Dict[str, t.Any]]:
        return self.client.search(
            index=",".join(
                [self._gen_index(kind, tenants_id, companies_id) for kind in kinds]
            ),
            body={"query": query},
        )

    def delete_document(
        self,
        kind: Resource,
        tenants_id: UUID,
        companies_id: UUID,
        document_id: str,
    ) -> t.Optional[t.Dict[str, t.Any]]:
        try:
            response = self.client.delete(
                index=self._gen_index(kind, tenants_id, companies_id),
                id=document_id,
            )
            return response
        except Exception as e:
            logging.error(f"Error deleting document: {e}")
            return None
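_gen_index gives each tenant, company and kind its own index. Round-tripping the IDs through UUID(str(...)) rejects malformed IDs and normalises them to lowercase, which matters because index names must be lowercase. The sketch below is a standalone, hypothetical variant (gen_index and validate_index_name are not in the original project) that adds a conservative client-side check based on the documented naming restrictions: lowercase only; no spaces, commas or any of : " * + / \ | ? # < >; no leading - or _; not . or ..; and at most 255 bytes. A bad name then fails before any request is sent.

```python
import typing as t
from uuid import UUID

# Conservative set of characters rejected anywhere in an index name.
FORBIDDEN_CHARS = frozenset(' ,:"*+/\\|?#<>')


def validate_index_name(name: str) -> str:
    """Return name unchanged if it passes a conservative check, else raise ValueError."""
    if name in ("", ".", ".."):
        raise ValueError(f"invalid index name: {name!r}")
    if name != name.lower():
        raise ValueError(f"index name must be lowercase: {name!r}")
    if name[0] in "-_":
        raise ValueError(f"index name must not start with '-' or '_': {name!r}")
    bad = sorted(FORBIDDEN_CHARS.intersection(name))
    if bad:
        raise ValueError(f"index name contains forbidden characters {bad}: {name!r}")
    if len(name.encode("utf-8")) > 255:
        raise ValueError(f"index name is longer than 255 bytes: {name!r}")
    return name


def gen_index(
    kind_name: str,
    tenants_id: t.Union[str, UUID],
    companies_id: t.Union[str, UUID],
) -> str:
    """Standalone variant of HTTPOpenSearchService._gen_index plus validation."""
    # UUID(str(...)) raises ValueError for malformed IDs, and str(UUID) is the
    # canonical lowercase, hyphenated form, so differently cased IDs map to one index.
    name = (
        f"tenant_{UUID(str(tenants_id))}"
        f"_company_{UUID(str(companies_id))}"
        f"_kind_{kind_name}"
    )
    return validate_index_name(name)
```

Keeping the kind name out of the lowercasing step is deliberate here: Resource already lowercases it, so an uppercase kind reaching gen_index signals a bug and is reported rather than silently fixed.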

Step 5: Implementing CRUD in Main (main.py)

In main.py, we demonstrate how to:

Create an index in OpenSearch.

Index documents with sample user data.

Search for documents based on a query.

Delete a document using its ID.
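The search step above relies on a bool query that separates scoring from filtering: clauses under must contribute to the relevance score, while filter clauses only include or exclude documents. The helper below is a hypothetical refactoring (build_user_query is not in the original project) that returns the same query main.py assembles inline; it leaves out the empty must_not and should lists, which has no effect on the results.

```python
import typing as t


def build_user_query(
    search_str: str,
    tenants_id: str,
    companies_id: str,
    fields: t.Sequence[str] = ("name", "tech"),
) -> t.Dict[str, t.Any]:
    """Build the bool query used in main.py as a pure, testable function."""
    return {
        "bool": {
            # Scoring clause: phrase_prefix treats the last term as a prefix,
            # so "frank" can match both "Frank" and "Franklin".
            "must": [
                {
                    "multi_match": {
                        "query": search_str,
                        "type": "phrase_prefix",
                        "fields": list(fields),
                    }
                }
            ],
            # Non-scoring exact matches on the .keyword subfields that dynamic
            # mapping creates for string values.
            "filter": [
                {"term": {"tenants_id.keyword": tenants_id}},
                {"term": {"companies_id.keyword": companies_id}},
            ],
        }
    }
```

A pure function like this can be unit-tested without a running cluster and then passed straight to search_service.search(query=...).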

main.py

# coding=utf-8

import logging
import os
import typing as t
from uuid import uuid4

import searchservice
from interfaces import Resources
from opensearchpy import OpenSearch

resources = Resources()

logging.basicConfig(level=logging.INFO)

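search_service = searchservice.HTTPOpenSearchService(
    client=OpenSearch(
        hosts=[
            {
                "host": os.getenv("OPENSEARCH_HOST", "localhost"),
                "port": int(os.getenv("OPENSEARCH_PORT", "9200")),
            }
        ],
        # The compose file disables the security plugin, so credentials are
        # only sent when OPENSEARCH_USERNAME is set.
        http_auth=(
            (os.getenv("OPENSEARCH_USERNAME"), os.getenv("OPENSEARCH_PASSWORD", ""))
            if os.getenv("OPENSEARCH_USERNAME")
            else None
        ),
        use_ssl=False,
        verify_certs=False,
    ),
)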

tenants_id: str = "f0835e2d-bd68-406c-99a7-ad63a51e9ef9"
companies_id: str = "bf58c749-c90a-41e2-b66f-6d98aae17a6c"
search_str: str = "frank"
document_id_to_delete: str = str(uuid4())

fake_data: t.List[t.Dict[str, t.Any]] = [
    {"id": document_id_to_delete, "name": "Franklin", "tech": "python,node,golang"},
    {"id": str(uuid4()), "name": "Jarvis", "tech": "AI"},
    {"id": str(uuid4()), "name": "Parry", "tech": "Golang"},
    {"id": str(uuid4()), "name": "Steve", "tech": "iOS"},
    {"id": str(uuid4()), "name": "Frank", "tech": "node"},
]

search_service.delete_index(
    kind=resources.users, tenants_id=tenants_id, companies_id=companies_id
)

search_service.create_index(
    kind=resources.users,
    tenants_id=tenants_id,
    companies_id=companies_id,
)

for item in fake_data:
    search_service.index(
        kind=resources.users,
        tenants_id=tenants_id,
        companies_id=companies_id,
        data=dict(tenants_id=tenants_id, companies_id=companies_id, **item),
    )

search_query: t.Dict[str, t.Any] = {
    "bool": {
        "must": [],
        "must_not": [],
        "should": [],
        "filter": [
            {"term": {"tenants_id.keyword": tenants_id}},
            {"term": {"companies_id.keyword": companies_id}},
        ],
    }
}
search_query["bool"]["must"].append(
    {
        "multi_match": {
            "query": search_str,
            "type": "phrase_prefix",
            "fields": ["name", "tech"],
        }
    }
)

search_results = search_service.search(
    kinds=[resources.users],
    tenants_id=tenants_id,
    companies_id=companies_id,
    query=search_query,
)

final_result = search_results.get("hits", {}).get("hits", [])
for item in final_result:
    logging.info("Item -> %s", item.get("_source", {}))

deleted_result = search_service.delete_document(
    kind=resources.users,
    tenants_id=tenants_id,
    companies_id=companies_id,
    document_id=document_id_to_delete,
)
logging.info("Deleted result -> %s", deleted_result)
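main.py exercises create, search and delete, but not an update or a direct read by ID. Below is a minimal sketch of both as free functions with hypothetical names (get_document, update_document) that take an opensearch-py client. It relies on client.get, on client.update with a {"doc": ...} partial-update body, and on the client's ignore=404 option so that a missing document yields None instead of raising NotFoundError.

```python
import typing as t


def get_document(
    client: t.Any, index: str, document_id: str
) -> t.Optional[t.Dict[str, t.Any]]:
    """Read one document by ID; return its _source, or None if it is missing."""
    # ignore=404 makes opensearch-py return the 404 body instead of raising;
    # that body has "found": false, or no "found" key if the index is missing.
    response = client.get(index=index, id=document_id, ignore=404)
    if response.get("found"):
        return response.get("_source")
    return None


def update_document(
    client: t.Any, index: str, document_id: str, fields: t.Dict[str, t.Any]
) -> t.Dict[str, t.Any]:
    """Partially update a document: keys in fields are merged into its _source."""
    # A {"doc": ...} body merges fields into the stored document, whereas
    # client.index() with the same ID replaces the whole document.
    return client.update(index=index, id=document_id, body={"doc": fields})
```

In main.py these would be called with search_service.client and the index name _gen_index produces for resources.users. The GET API is real-time, so get_document sees an update immediately, while searches only see it after the next refresh.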

Step 6: Running the project

docker compose up -d
python main.py

Results:

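With the sample data, the search for "frank" should log two users, Franklin and Frank (the phrase_prefix query matches names that start with "frank"), followed by the response for deleting Franklin's document.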
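To make the logged output easier to check, the helpers below sketch how the two responses can be reduced to the interesting parts. The function names are hypothetical; the response fields they read (hits.hits[]._source, hits.total, and result on a delete response) are part of the standard search and delete responses, and the sample dictionaries in the tests only illustrate that shape.

```python
import typing as t


def search_sources(response: t.Dict[str, t.Any]) -> t.List[t.Dict[str, t.Any]]:
    """Return the _source of every hit in a search response."""
    return [hit.get("_source", {}) for hit in response.get("hits", {}).get("hits", [])]


def total_hits(response: t.Dict[str, t.Any]) -> int:
    """Return the hit count from a search response."""
    total = response.get("hits", {}).get("total", 0)
    # hits.total is normally an object such as {"value": 2, "relation": "eq"},
    # but a plain integer when the request sets rest_total_hits_as_int=true.
    return total["value"] if isinstance(total, dict) else int(total)


def was_deleted(response: t.Optional[t.Dict[str, t.Any]]) -> bool:
    """True if a delete response reports that the document was removed."""
    return response is not None and response.get("result") == "deleted"
```

Note that delete_document returns None when the delete fails, which was_deleted also treats as "not deleted".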

Step 7: Conclusion

In this blog, we’ve demonstrated how to set up OpenSearch locally using Docker and perform basic CRUD operations with Python. OpenSearch is a scalable option for indexing and querying large datasets. While this guide works with dummy data, real-world applications often use OpenSearch as a read-optimized store next to a primary database. Every change to the primary database then has to reach OpenSearch as well, either by writing to both stores in the same code path or by propagating changes asynchronously (for example, through a message queue or change data capture). Because writes to two separate stores are not atomic, either approach needs a way to retry or re-index failed updates.

Keeping OpenSearch in sync with your primary data source this way lets it serve fast queries without drifting from the source of truth.
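Keeping OpenSearch in sync usually means pushing batches of changed rows from the primary database. A minimal sketch, assuming the rows arrive as dictionaries with an "id" key like the sample data in main.py: the hypothetical to_bulk_actions function turns them into action dictionaries in the format accepted by opensearch-py's bulk helper. Reusing each row's id as the document _id makes a re-run overwrite documents instead of duplicating them. Where the rows and deleted IDs come from (a query, a change-data-capture stream) is left as an assumption.

```python
import typing as t


def to_bulk_actions(
    index: str,
    upserts: t.Iterable[t.Dict[str, t.Any]],
    deleted_ids: t.Iterable[str] = (),
) -> t.Iterator[t.Dict[str, t.Any]]:
    """Yield bulk actions: index (create or replace) each row, delete each removed ID."""
    for row in upserts:
        # Using the primary-key "id" as _id makes re-running the sync idempotent.
        yield {"_op_type": "index", "_index": index, "_id": row["id"], "_source": row}
    for doc_id in deleted_ids:
        yield {"_op_type": "delete", "_index": index, "_id": doc_id}
```

With a real client, the generator would be passed as helpers.bulk(client, to_bulk_actions(index_name, rows, removed_ids)) after from opensearchpy import helpers; the call returns a tuple of the success count and the errors.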

References:

https://github.com/FranklinThaker/opensearch-integration-example

This article is reproduced from https://dev.to/franklinthaker/mastering-crud-Operations-with-opensearch-in-python-a-practical-guide-53k7?1
