"If a worker wants to do his job well, he must first sharpen his tools." - Confucius, "The Analects of Confucius. Lu Linggong"
Pydantic • Dealing with validating and sanitizing data

Published on 2024-08-19

Since I started programming, I've mostly used structured and procedural paradigms, as my tasks required more practical and direct solutions. When working with data extraction, I had to shift to new paradigms to achieve more organized code.

An example of this need came up during scraping tasks: I expected to capture data of a type I knew how to handle, but during the capture it either didn't exist or arrived as a different type.

Consequently, I had to add several if checks and try/except blocks to test whether the data was an int or a string, only to discover later that nothing had been captured at all (None, etc.). With dictionaries, I ended up saving some uninteresting "default data" in situations like:

data.get(values, 0)

Well, the confusing error messages certainly had to stop appearing.
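
To make the problem concrete, here is a minimal sketch of how such a default can hide what really happened. The record and its keys ("name", "price", "stock") are hypothetical, invented only for illustration:

```python
# Hypothetical scraped record: "price" was found but empty, "stock" was never found.
data = {"name": "Widget", "price": None}

price = data.get("price", 0)  # the key exists, so the default is ignored
stock = data.get("stock", 0)  # the key is missing, so the default is used

print(price)  # None
print(stock)  # 0
```

The stored 0 cannot be told apart from a real zero, and the None slips through anyway, so the default neither validates nor documents anything.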

That's how dynamic Python is: a variable can change its type at any time, which is convenient until you need clarity about the types you are working with. Then suddenly a lot of information appears, and now I'm reading about data validation, with the IDE helping me through type hints, and about the interesting pydantic library.

Now, for tasks like data manipulation, I can work with a new paradigm: objects whose types are explicitly declared, together with a library that validates those types. If something goes wrong, the more descriptive error messages make debugging easier.


Pydantic

The Pydantic documentation is always worth consulting when questions come up.

Basically, as we already know, we start with:

pip install pydantic

Then, hypothetically, we want to capture emails from a source, and most of them look like this: "xxxx@xxxx.com". But sometimes one may come like this: "xxxx@" or "xxxx". Since there is no doubt about the format an email should have, we will validate this email string with Pydantic:

from pydantic import BaseModel, EmailStr

class Consumer(BaseModel):
    email: EmailStr
    account_id: int

consumer = Consumer(email="teste@teste", account_id=12345)

print(consumer)

Notice that I used an optional dependency, "email-validator", installed with: pip install pydantic[email]. When you run the code, as expected, validation fails on the invalid email "teste@teste":

Traceback (most recent call last):
  ...
    consumer = Consumer(email="teste@teste", account_id=12345)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ...: 1 validation error for Consumer
email
  value is not a valid email address: The part after the @-sign is not valid. It should have a period. [type=value_error, input_value='teste@teste', input_type=str]

Using optional dependencies to validate data is interesting, and so is creating our own validations, which Pydantic allows via field_validator. We know that account_id must be greater than zero. If it isn't, Pydantic should raise an exception, a value error. The code would then be:

from pydantic import BaseModel, EmailStr, field_validator

class Consumer(BaseModel):
    email: EmailStr
    account_id: int

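    @field_validator("account_id")
    @classmethod
    def validate_account_id(cls, value):
        """Custom Field Validation"""
        # Reject zero and negative ids with a descriptive message
        if value <= 0:
            raise ValueError(f"account_id must be positive: {value}")
        return value

consumer = Consumer(email="teste@teste.com", account_id=0)

print(consumer)
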
$ python capture_emails.py
Traceback (most recent call last):
...
    consumer = Consumer(email="teste@teste.com", account_id=0)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

...: 1 validation error for Consumer
account_id
  Value error, account_id must be positive: 0 [type=value_error, input_value=0, input_type=int]
    For further information visit https://errors.pydantic.dev/2.8/v/value_error
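
In a scraping job, one bad record usually should not stop the whole run. A possible sketch, assuming Pydantic v2, is to catch ValidationError and set the rejected rows aside together with the error types reported by exc.errors(). The raw_rows list is hypothetical sample data, and the email field is left out here so the snippet runs without the optional email-validator dependency:

```python
from pydantic import BaseModel, ValidationError, field_validator


class Consumer(BaseModel):
    account_id: int

    @field_validator("account_id")
    @classmethod
    def validate_account_id(cls, value: int) -> int:
        """Reject zero and negative ids."""
        if value <= 0:
            raise ValueError(f"account_id must be positive: {value}")
        return value


# Hypothetical scraped rows: one valid, one non-positive, one wrong type, one empty.
raw_rows = [{"account_id": 12345}, {"account_id": 0}, {"account_id": "abc"}, {}]

valid, rejected = [], []
for row in raw_rows:
    try:
        valid.append(Consumer(**row))
    except ValidationError as exc:
        # Keep the raw row and the machine-readable error types for later review.
        rejected.append((row, [err["type"] for err in exc.errors()]))

print(valid)
for row, error_types in rejected:
    print(row, error_types)
```

Each rejected row keeps its reason (a value error, an unparsable integer, or a missing field), which is easier to act on than a silently stored default.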

Now, running the code with the correct values:

from pydantic import BaseModel, EmailStr, field_validator

class Consumer(BaseModel):
    email: EmailStr
    account_id: int

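    @field_validator("account_id")
    @classmethod
    def validate_account_id(cls, value):
        """Custom Field Validation"""
        # Reject zero and negative ids with a descriptive message
        if value <= 0:
            raise ValueError(f"account_id must be positive: {value}")
        return value

consumer = Consumer(email="teste@teste.com", account_id=12345)

print(consumer)
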
$ python capture_emails.py
email='teste@teste.com' account_id=12345

Right?!

I also read something about the native "dataclasses" module, which is a bit simpler and has some similarities with Pydantic. However, Pydantic is better for handling more complex data models that require validation. The dataclasses module ships with Python's standard library, while Pydantic is a third-party package, at least for now.
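
As a small illustration of that difference, the sketch below uses a plain dataclass with the same two fields. The class name ConsumerRecord is hypothetical. The type annotations are not checked at runtime, so the bad values are stored as given:

```python
from dataclasses import dataclass


@dataclass
class ConsumerRecord:
    email: str
    account_id: int


# No exception is raised: dataclasses do not validate or convert field values.
record = ConsumerRecord(email="teste@teste", account_id="not a number")

print(record.email)       # teste@teste
print(record.account_id)  # not a number
```

With a dataclass, any checks would have to be written by hand, for example in a __post_init__ method, which is the work Pydantic takes over.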

This article is reproduced from: https://dev.to/evertontenorio/pydantic-dealing-with-validating-and-sanitizing-data-594p?1