How many Python packages are versioned correctly?

Published on 2024-11-08

The other day, as I was looking into a database of vulnerabilities in Python packages, I realized that some of the package versions in it could not easily be parsed and compared with other version strings, because they did not comply with Python's versioning rules, whether those of the original PEP 440 or those of the Version Specifiers specification that superseded it. So I started wondering how common this was. How many packages on the Python Package Index (PyPI) actually have valid versions?

The obvious answer was: go check. So I created a new virtual environment, installed requests, and wrote a multiprocessing script to query the PyPI API for literally every version string used by every package. Even running on all cores it took a few hours, but by the end I had retrieved 6,057,703 version strings from 545,018 packages, stored in a neat SQLite database. You can find it on Kaggle.
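A crawler of this kind can be sketched in a few dozen lines. This is a minimal illustration, not the original script: it swaps in the standard library's `urllib.request` for requests, assumes PyPI's per-project JSON endpoint (`https://pypi.org/pypi/<name>/json`, whose `releases` mapping is keyed by version string), and writes to a hypothetical `versions(package, version)` table. Obtaining the list of project names is left out.

```python
from __future__ import annotations

import json
import sqlite3
import urllib.request
from multiprocessing import Pool

# Assumption: PyPI's per-project JSON endpoint, whose "releases" dict is keyed by version string.
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def versions_from_payload(payload: dict) -> list[str]:
    """Extract every version string from a project's JSON payload."""
    return list(payload.get("releases", {}))


def fetch_versions(name: str) -> tuple[str, list[str]]:
    """Download one project's metadata; any failure yields an empty version list."""
    try:
        with urllib.request.urlopen(PYPI_JSON_URL.format(name=name), timeout=30) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # URLError/HTTPError (e.g. 404 for removed projects) subclass OSError;
        # malformed JSON raises a ValueError subclass.
        return name, []
    return name, versions_from_payload(payload)


def store(conn: sqlite3.Connection, results) -> None:
    """Insert (package, version) rows into a hypothetical `versions` table."""
    conn.execute("CREATE TABLE IF NOT EXISTS versions (package TEXT, version TEXT)")
    conn.executemany(
        "INSERT INTO versions (package, version) VALUES (?, ?)",
        ((name, v) for name, versions in results for v in versions),
    )
    conn.commit()


def crawl(names: list[str], db_path: str, processes: int | None = None) -> None:
    """Fetch all projects in parallel (one worker per CPU by default) and store them."""
    conn = sqlite3.connect(db_path)
    try:
        with Pool(processes) as pool:
            # imap_unordered streams results back as workers finish each project.
            store(conn, pool.imap_unordered(fetch_versions, names, chunksize=64))
    finally:
        conn.close()
```

`crawl(names, "pypi_versions.db")` should be called under an `if __name__ == "__main__":` guard, since multiprocessing may re-import the module in each worker. The workers only do network I/O, and the parent process does all the SQLite writes, which avoids having several writers on one database file.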

Next came parsing. I found two libraries that promised to validate a version string for compliance:

  • pepver: "PEP-440 version parsing, interpretation and manipulation"
  • parver: "parver allows parsing and manipulation of PEP 440 version numbers"

Note that, to be fair, both libraries still follow PEP 440, which has since been superseded, so I kept that in mind, especially when looking at the strings marked as non-compliant.

After another couple of hours of intense multiprocessing, I had updated my database with two boolean columns indicating whether each string parsed successfully with each of the two libraries (also on Kaggle).
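The validation pass can be sketched the same way. To avoid guessing at either library's exact API, the validator here is just a callable returning a bool; in practice it would be a small module-level wrapper that tries, for example, parver's `Version.parse` and returns False if it raises. The `versions` table, the column names (such as `pepver_ok` or `parver_ok`) and the `processes=1` shortcut are assumptions for illustration.

```python
from __future__ import annotations

import sqlite3
from collections.abc import Callable
from multiprocessing import Pool


def add_validity_column(
    conn: sqlite3.Connection,
    column: str,
    is_valid: Callable[[str], bool],
    processes: int | None = None,
) -> None:
    """Add a 0/1 column recording whether each version string passes `is_valid`."""
    # `column` is interpolated into the SQL, so it must be a trusted identifier.
    conn.execute(f"ALTER TABLE versions ADD COLUMN {column} INTEGER")
    rows = conn.execute("SELECT rowid, version FROM versions").fetchall()
    versions = [version for _, version in rows]
    if processes == 1:
        # Serial path: handy for testing, and allows non-picklable callables.
        flags = list(map(is_valid, versions))
    else:
        with Pool(processes) as pool:
            # `is_valid` must be a picklable, module-level function to reach the workers.
            flags = pool.map(is_valid, versions, chunksize=1024)
    conn.executemany(
        f"UPDATE versions SET {column} = ? WHERE rowid = ?",
        ((int(ok), rowid) for (rowid, _), ok in zip(rows, flags)),
    )
    conn.commit()
```

Calling it once per library, e.g. `add_validity_column(conn, "parver_ok", parver_is_valid)` with a hypothetical `parver_is_valid` wrapper, yields the two boolean columns.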

The results

For a quick summary of my findings:

  • out of 6,057,703 version strings, 5,542 (0.09%) were found defective;

  • out of 545,018 packages, 1,285 (0.24%) had at least one defective version string.
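Both figures can come out of one aggregate query. This sketch assumes a hypothetical `versions` table with `package` and `version` columns plus one 0/1 column per library (`pepver_ok`, `parver_ok`), and it assumes "defective" means rejected by both libraries; it illustrates the computation and is not the query behind the numbers above.

```python
from __future__ import annotations

import sqlite3

SUMMARY_SQL = """
WITH flagged AS (
    SELECT package, (pepver_ok = 0 AND parver_ok = 0) AS defective
    FROM versions
)
SELECT
    COUNT(*),
    SUM(defective),
    COUNT(DISTINCT package),
    COUNT(DISTINCT CASE WHEN defective THEN package END)
FROM flagged
"""


def summarize(conn: sqlite3.Connection) -> dict[str, float]:
    """Count defective version strings and packages with at least one, plus percentages."""
    n_versions, n_bad, n_packages, n_bad_packages = conn.execute(SUMMARY_SQL).fetchone()
    return {
        "bad_versions": n_bad,
        "bad_versions_pct": 100 * n_bad / n_versions,
        "bad_packages": n_bad_packages,
        "bad_packages_pct": 100 * n_bad_packages / n_packages,
    }
```

As a sanity check on the reported totals, 100 × 5,542 / 6,057,703 ≈ 0.09 and 100 × 1,285 / 545,018 ≈ 0.24, matching the percentages in the list.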

So overall the state of the repository seems pretty healthy! The version strings rejected by both libraries come in all kinds. Some simply use suffixes in a non-standard way but otherwise follow the semantic versioning paradigm, while others are just commit hashes or strings of words and numbers.
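Sorting rejected strings into rough buckets like these takes only a crude heuristic. The categories and regular expressions below are assumptions for illustration, not part of either library or the specification, and they are meant to be applied only to strings that have already failed validation.

```python
import re

# Assumption: abbreviated or full git SHA-1 hashes, 7 to 40 lowercase hex digits.
HASH_RE = re.compile(r"[0-9a-f]{7,40}")
# Assumption: a dotted numeric release, an optional separator, then a word-like suffix.
RELEASE_PLUS_SUFFIX_RE = re.compile(r"\d+(?:\.\d+)*[-_.+]?[A-Za-z].*")


def triage(version: str) -> str:
    """Roughly classify a version string that failed PEP 440 validation."""
    # Require at least one hex letter so plain numbers are not mistaken for hashes.
    if HASH_RE.fullmatch(version) and re.search(r"[a-f]", version):
        return "commit-hash"
    if RELEASE_PLUS_SUFFIX_RE.fullmatch(version):
        return "release-with-odd-suffix"
    return "other"
```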

The cases where the two libraries disagree are more interesting. These are the ones that pepver does not validate but parver does:


0.0.2.R
0.0.2.R3
0.0.2.R4
0.0.2.R5
0.0.2.R6
0.0.2.R7


In this case, I would say pepver is in the wrong. Per PEP 440 and the current versioning rules, r is an acceptable alternative spelling of the post-release signifier (normalized to post), and letters are case-insensitive. So 0.0.2.R3 effectively normalizes to 0.0.2.post3 and is perfectly legal, and 0.0.2.R, whose post-release number is implicitly 0, normalizes to 0.0.2.post0.
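The normalization rules involved are easy to sketch for this narrow case. The function below is a hypothetical, deliberately partial normalizer: it only handles a dotted release followed by an optional post-release segment, lower-cases the input, accepts the spellings post, rev and r with an optional `.`, `-` or `_` on either side, strips leading zeros from integers, and fills in the implicit post-release number 0. Epochs, pre-releases, dev releases and local versions are out of scope.

```python
from __future__ import annotations

import re

# Hypothetical partial normalizer: release segment plus optional post-release segment only.
POST_RE = re.compile(
    r"(?P<release>\d+(?:\.\d+)*)"
    r"(?:[-_.]?(?P<sig>post|rev|r)[-_.]?(?P<num>\d+)?)?"
)


def normalize_post(version: str) -> str | None:
    """Return the normalized form, or None if the string is outside this tiny subset."""
    match = POST_RE.fullmatch(version.strip().lower())
    if match is None:
        return None
    # Integers are normalized via int(), so "01" becomes "1".
    release = ".".join(str(int(part)) for part in match["release"].split("."))
    if match["sig"] is None:
        return release
    # A missing post-release number is implicitly 0.
    return f"{release}.post{int(match['num'] or 0)}"
```

For instance, `normalize_post("0.0.2.R3")` gives `"0.0.2.post3"` and `normalize_post("0.0.2.R")` gives `"0.0.2.post0"`.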

Meanwhile, here is a random sample of versions that pepver admits but parver does not:


0.0.1dev-20141025
1.5.0-dev-618
0.3.4.dev.20180830
1.15.0-dev-1552
1.4.0-dev-510
0.0.9.dev-20121012
0.2dev-20101203
0.3.4.dev.20180905
1.15.0-dev-1606
0.2.1dev-20110627
1.12.0-dev-1379
1.1.1-dev-275
1.3.1-dev-427


They all tend to put another number (sometimes a date) after the dev suffix, with a separator in between. This is indeed also wrong, as the specification doesn't allow a separator between dev and its number. So again parver seems to be right.
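Strings with this pattern are easy to flag, and just as easy to coerce into a standard form by dropping the separator after dev. This is a hypothetical repair heuristic, not something either library does; it only touches the separator between dev and its number and leaves everything else to a proper parser.

```python
import re

# A ".", "-" or "_" between the dev signifier and its number, e.g. "dev-618" or "dev.20180830".
DEV_SEPARATOR_RE = re.compile(r"(dev)[-_.](\d+)", re.IGNORECASE)


def has_dev_separator(version: str) -> bool:
    """True if the string puts a separator between 'dev' and its number."""
    return DEV_SEPARATOR_RE.search(version) is not None


def drop_dev_separator(version: str) -> str:
    """Hypothetical repair: '1.5.0-dev-618' -> '1.5.0-dev618' (keeps the original case of 'dev')."""
    return DEV_SEPARATOR_RE.sub(r"\1\2", version)
```

After the repair, a string like `1.5.0-dev618` only has a separator before dev, which the specification explicitly allows; it normalizes to `1.5.0.dev618`.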

Anyway, that pretty much satisfied my original curiosity, and reassured me that in the vast majority of cases the standard methods of parsing and comparing versions will be sufficient. Even among the non-standard versions it's often fairly easy to identify an order, since the deviations are minimal. Still, it's useful to be aware of all the quirks of the official versioning rules, and to know when we can and cannot rely on them.

This article is reproduced from https://dev.to/stur86/how-many-python-packages-are-versioned-correctly-5l8?1