
One-click dependencies fix

Published 2024-11-08

Photo by Maxim Hopman on Unsplash

If you maintain a JVM[1] or Android project, chances are you've heard of the Dependency Analysis Gradle Plugin (DAGP). With over 1800 stars, it's used by some of the largest Gradle projects in the world, as well as by Gradle itself. It fills what would otherwise be a substantial hole in the Gradle ecosystem: without it, I know of no other way to eliminate unused dependencies and to correctly declare all your actually-used dependencies. In other words, when you use this plugin, your dependency declarations are exactly what you need to build your project: nothing more, nothing less.

That might sound like a small thing, but for industrial-scale projects, a healthy dependency graph is a superpower that prevents bugs, eases debugging (at build and runtime), keeps builds faster, and keeps artifacts smaller. If developer productivity work is the public health of the software engineering world, then a healthy dependency graph is a working sewer system. You don't know how much you rely on it till it stops working and you've got shit everywhere.

The problem is that, if your tool only tells you all the problems you have but doesn't also fix them, you might have a massive(ly annoying) problem on your hands. I mentioned this as an important consideration in my recent rant against code style formatters. This is why, since v1.11.0, DAGP has had a fixDependencies task, which takes the problem report and rewrites build scripts in-place. Even before that, in v0.46.0, the plugin had first-class support for registering a "post-processing task" to enable advanced users to consume the "build health" report in any manner of their choosing. Foundry (née The Slack Gradle Plugin), for example, has a feature called the "dependency rake", which predates and inspired fixDependencies.

fixDependencies hasn't always worked well, though. For one thing, there might be a bug in the analysis such that, if you "fix" all the issues, your build might break. (DAGP is under very active development, so if this ever happens to you, please file an issue!) In this case, it can take an expert to understand what broke and how to fix it, or you can fall back to manual changes and iteration.

For another thing, the build script rewriter has relied on a simplified grammar for parsing and rewriting Gradle Groovy and Kotlin DSL build scripts. That grammar can fail if your scripts are complex.[2] This problem will soon be solved with the introduction of a Gradle Kotlin DSL parser built on the KotlinEditor grammar, which has full support for the Kotlin language. (Gradle Groovy DSL scripts will continue to use the old simplified grammar, for now.)

There have also been many recent bugfixes to (1) improve the correctness of the analysis and (2) make the rewriting process more robust in the face of various common idioms. DAGP now has much better support for version catalog accessors, for example (no support yet for experimental project accessors).

With these improvements (real and planned), it's become feasible to imagine automating large-scale dependency fixes across hundreds of repos containing millions of lines of code and have it all just work. Here's the situation:

  1. Over 500 repositories.
  2. Each with its own version catalog.
  3. Most of the entries in the version catalogs use the same names, but there's some incidental skew in the namespace (multiple keys pointing to the same dependency coordinates).
  4. Over 2000 Gradle modules.
  5. Close to 15 million lines of Kotlin and Java code spread out over more than 100 thousand files, along with over 150 thousand lines of "Gradle" code in more than 3 thousand build scripts. This last point isn't as relevant as the first four, but helps to demonstrate what I mean when I say "industrial scale."[3]

Additionally, the build code we want to write to manage all this should follow Gradle best practices: it should be cacheable to the extent possible, should work with the configuration cache, and for bonus points should not violate the isolated projects contract either (which is also good for maximal performance). The ultimate goal is for developers and build maintainers to be able to run a single task and have it (1) fix all dependency declarations, which might mean adding new declarations to build scripts; (2) ensure that every build script declaration has a version catalog entry wherever possible; and (3) ensure that all version catalog entries come from the same global namespace, so that the entire set of 500 repositories is fully consistent. This last part is an important requirement because we're migrating these repos into a single mono/mega repo for other reasons.

Here's the task they can now run, for the record:

gradle :fixAllDependencies

(nb: we use gradle and not ./gradlew because we manage gradle per-repo with hermit.)

So, how do we do it?

Pre-processing

The first step was creating the global version catalog namespace. We did not attempt to actually create a single published global version catalog because, until we finish our megarepo migration, an important contract is that each repo maintains its own dependencies (and their versions). So instead, we collected the full map of version catalog names to dependency identifiers (the dependency coordinates less the version string). We eliminated all the duplication using pre-existing large-scale change tools we have, and then populated the final global set (now with 1:1 mappings) into our convention plugin that is already applied everywhere.
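To illustrate the collection step, here is a minimal Kotlin sketch. It is not our actual tooling: it assumes each repo's catalog has already been parsed into a map of alias to coordinates, and the names `identifierOf` and `findNamespaceSkew`, along with the sample data, are hypothetical.

```kotlin
/** Drops the version from "group:name:version", leaving the dependency identifier. */
fun identifierOf(coordinates: String): String =
  coordinates.split(":").take(2).joinToString(":")

/**
 * Given one alias -> coordinates map per repo, returns every identifier that is
 * known by more than one alias, i.e. the skew that has to be eliminated.
 */
fun findNamespaceSkew(catalogs: List<Map<String, String>>): Map<String, Set<String>> {
  // Sorted collections keep the report stable from run to run.
  val aliasesByIdentifier = sortedMapOf<String, MutableSet<String>>()
  for (catalog in catalogs) {
    for ((alias, coordinates) in catalog) {
      aliasesByIdentifier
        .getOrPut(identifierOf(coordinates)) { sortedSetOf<String>() }
        .add(alias)
    }
  }
  return aliasesByIdentifier.filterValues { it.size > 1 }
}

fun main() {
  // Hypothetical catalogs from two repos.
  val repoA = mapOf("amazingMagic" to "com.amazing:magic:1.0")
  val repoB = mapOf(
    "magic" to "com.amazing:magic:1.1",
    "okio" to "com.squareup.okio:okio:3.9.0",
  )
  println(findNamespaceSkew(listOf(repoA, repoB)))
}
```

Here `com.amazing:magic` is reported with both of its aliases, while `okio` is left out because only one name points at it. Once every identifier has exactly one alias, the remaining map is the global namespace.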

Conceptual framework

The Gradle framework, in general, takes the Project as the most important point of reference.[4] A Project instance is what backs all your build.gradle[.kts] scripts, for example, and most plugins implement the Plugin<Project> interface. Safe, performant, high-quality build code respects this conceptual boundary and treats each project (AKA "module") as an atomic unit.

If Tasks have well-defined inputs and outputs (literally annotated with, for example, @Input and @OutputFile), then it might help to also think of projects as having inputs and outputs. In general, a project's inputs are its source code (which by convention is in the src/ directory at the project root), and its dependencies. A project's outputs are the artifacts it produces. For Java projects, the primary artifacts are jars (for external consumption), or class files (for consumption by other projects in a multi-project build).[5]

With that in mind, we can decide that if two projects need to talk to each other, they should do so via their well-defined inputs and outputs. We define relationships between projects via dependencies (A -> B means A depends on B, so B is an input to A), and we can flavor that connection such that we tell Gradle which of B's outputs A cares about. The default is the primary artifact (usually class files for classpath purposes), but it can also be anything (that can be written to disk). It can, for example, be some metadata about B. It can also be both! (You can declare multiple dependencies between the same two projects, with each edge having a different "flavor," that is, representing a different variant.) This may make more sense in a bit when we get to a concrete example.

Implementation: :fixAllDependencies

The rest of this post will focus on implementation, but at a relatively high level of detail. Some of the code will essentially be pseudocode. My goal is to demonstrate the full flow at a conceptual level, such that a (highly) motivated reader could implement something similar in their own workflow or, more likely, simply learn about how to do something Cool™️ with Gradle.

Here's a sketch of the simplified task graph with Excalidraw:

[Figure: simplified task graph for the one-click dependencies fix]

Note how each project is independent of the other. Well-defined Gradle builds maximize concurrency by respecting project boundaries.

Step 1: The global namespace

As mentioned in the pre-processing section, we need a global namespace. We want all dependency declarations to refer to version catalog entries, i.e., libs.amazingMagic, rather than "com.amazing:magic:1.0". Since DAGP already supports version catalog references in its analysis, this will Just Work if your version catalog already has an entry for amazingMagic = "com.amazing:magic:1.0". However, if you don't, DAGP defaults to the "raw string" declaration. If we want, we can tell DAGP about other mappings that it can't detect by default:

// root build script
dependencyAnalysis {
  structure {
    // identifier (no version) -> version catalog accessor
    map.putAll(
      mapOf(
        "com.amazing:magic" to "libs.amazingMagic",
        // more entries
      )
    )
  }
}

where dependencyAnalysis.structure.map is a MapProperty<String, String>, which you can modify directly in your build scripts or via a plugin. Note the "raw string" version of the declaration doesn't include version information; this is important because the version you declare may not match the version that Gradle resolves.
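To make the shape of those map entries concrete, here is a small sketch of how one might derive them from a global namespace of alias to coordinates. The helper names (`accessorFor`, `toDagpMapping`) and the sample aliases are hypothetical. The only Gradle behavior it relies on is that dashes and underscores in a catalog alias become dots in the generated `libs` accessor.

```kotlin
/** Turns a catalog alias such as "amazing-tricks" into its accessor, "libs.amazing.tricks". */
fun accessorFor(alias: String): String =
  "libs." + alias.replace('-', '.').replace('_', '.')

/**
 * Builds identifier -> accessor entries from alias -> coordinates entries.
 * Any version on the coordinates is dropped on purpose.
 */
fun toDagpMapping(globalNamespace: Map<String, String>): Map<String, String> =
  globalNamespace.entries.associate { (alias, coordinates) ->
    val identifier = coordinates.split(":").take(2).joinToString(":")
    identifier to accessorFor(alias)
  }

fun main() {
  // Hypothetical namespace; the second entry has no version at all.
  val namespace = mapOf(
    "amazingMagic" to "com.amazing:magic:1.0",
    "amazing-tricks" to "com.amazing:tricks",
  )
  toDagpMapping(namespace).forEach { (identifier, accessor) ->
    println("$identifier -> $accessor")
  }
}
```

The resulting map is what would be handed to `map.putAll(...)`. Because the keys carry no version, the same mapping can be shared by every repo regardless of which version each one resolves.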

Step 2: Update the version catalog, part 1

With Step 1, DAGP will rewrite build scripts via the built-in fixDependencies task to match your desired schema, but your next build will fail because you'll have dependencies referencing things like libs.amazingMagic which aren't actually present in your version catalog. So now we have to update the version catalog to ensure it has all of these new entries. This will be a multi-step process.

First, we have to calculate the possibly-missing entries. We write a new task, ComputeNewVersionCatalogEntriesTask, and have it extend AbstractPostProcessingTask, which comes from DAGP itself. This exposes a function, projectAdvice(), which gives subclasses access to the "project advice" that DAGP emits to the console, but in a form amenable to computer processing. We'll take that output, filter it for "add advice", and then write those values out to disk via our task's output. We only care about the add advice because that's the only type that might represent a dependency not in a version catalog.

// in a custom task action
val newEntries = projectAdvice()
  .dependencyAdvice
  .filter { it.isAnyAdd() }
  .filter { it.coordinates is ModuleCoordinates }
  .map { it.coordinates.gav() }
  .toSortedSet()

outputFile.writeText(newEntries.joinToString(separator = "\n"))

Note that with Gradle task outputs, it's best practice to always sort outputs for stability and to enable use of the remote build cache.
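The filtering and sorting can be tried outside of Gradle. The sketch below swaps DAGP's real model for tiny stand-in classes (`Coordinates`, `ModuleCoordinates`, `ProjectCoordinates`, `Advice`) that mimic only the members used above; the real classes are richer, and `isAnyAdd()` is deliberately simplified here. `renderNewEntries` is a hypothetical name.

```kotlin
// Stand-ins for the small slice of DAGP's model used above. The real classes
// carry more information; these exist only so the sketch runs on its own.
sealed interface Coordinates {
  fun gav(): String
}

data class ModuleCoordinates(val identifier: String, val version: String) : Coordinates {
  override fun gav(): String = "$identifier:$version"
}

data class ProjectCoordinates(val path: String) : Coordinates {
  override fun gav(): String = path
}

data class Advice(
  val coordinates: Coordinates,
  val fromConfiguration: String?,
  val toConfiguration: String?,
) {
  // Simplified assumption: "add" means the dependency is not declared anywhere yet.
  fun isAnyAdd(): Boolean = fromConfiguration == null && toConfiguration != null
}

/** Mirrors the task action: keep external "add" advice, then sort for a stable output. */
fun renderNewEntries(advice: Collection<Advice>): String =
  advice
    .filter { it.isAnyAdd() }
    .filter { it.coordinates is ModuleCoordinates }
    .map { it.coordinates.gav() }
    .toSortedSet()
    .joinToString(separator = "\n")

fun main() {
  val advice = listOf(
    Advice(ModuleCoordinates("com.amazing:magic", "1.0"), null, "implementation"),
    Advice(ProjectCoordinates(":lib"), null, "api"), // project dependency: skipped
    Advice(ModuleCoordinates("com.acme:anvil", "2.1"), "api", null), // removal: skipped
    Advice(ModuleCoordinates("com.acme:rocket", "0.3"), null, "api"),
  )
  // The text is the same regardless of the order in which the advice arrives.
  check(renderNewEntries(advice) == renderNewEntries(advice.reversed()))
  println(renderNewEntries(advice))
}
```

Sorting is what makes the file byte-for-byte identical across runs, which is the property the build cache depends on.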

Next we tell DAGP about this post-processing task (which is how it can access projectAdvice()):

// subproject's build script
val computeNewVersionCatalogEntries = tasks.register(...)

dependencyAnalysis {
  registerPostProcessingTask(computeNewVersionCatalogEntries)
}

And finally we also have to register our new task's output as an artifact of this project!

val publisher = interProjectPublisher(
  project,
  MyArtifacts.Kind.VERSION_CATALOG_ENTRIES
)
publisher.publish(
  computeNewVersionCatalogEntries.flatMap { 
    it.newVersionCatalogEntries 
  }
)

where the interProjectPublisher and related code is heavily inspired by DAGP's artifacts package, because I wrote both. The tl;dr is that this is what teaches Gradle about a project's secondary artifacts. I wish Gradle had a first-class API for this, alas.
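For readers who haven't seen this pattern, here is a rough sketch of what a helper like interProjectPublisher might do under the hood, using only stable Gradle APIs. It is an assumption about the general technique, not the actual implementation: the configuration name, the attribute value, and the file path are all hypothetical.

```kotlin
// build.gradle.kts of a subproject (sketch; all names below are hypothetical)
import org.gradle.api.attributes.Category

// Stand-in for the real task output. In practice this would be something like
// `computeNewVersionCatalogEntries.flatMap { it.newVersionCatalogEntries }`,
// so that the artifact carries its producing task along with it.
val entriesFile = layout.buildDirectory.file("reports/new-version-catalog-entries.txt")

// A consumable configuration is how a project offers an artifact to other projects.
val versionCatalogEntriesElements = configurations.create("versionCatalogEntriesElements") {
  isCanBeConsumed = true
  isCanBeResolved = false
  attributes {
    // The attribute is the "flavor": consumers that request the same value
    // get this artifact rather than the project's jar or class files.
    attribute(
      Category.CATEGORY_ATTRIBUTE,
      objects.named(Category::class.java, "version-catalog-entries")
    )
  }
}

// Attach the file to that configuration as the project's secondary artifact.
artifacts.add(versionCatalogEntriesElements.name, entriesFile)
```

Wrapping this in a helper keyed by an enum like MyArtifacts.Kind keeps the configuration names and attribute values in one place, so publisher and resolver cannot drift apart.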

Step 3: Update the version catalog, part 2

Back in the root project, we need to declare our dependencies on each subproject, flavoring that declaration to say we want the VERSION_CATALOG_ENTRIES artifact:

// root project
val resolver = interProjectResolver(
  project,
  MyArtifacts.Kind.VERSION_CATALOG_ENTRIES
)

// Yes, this CAN BE OK, but you must only access 
// IMMUTABLE PROPERTIES of each project p.
// This sets up the dependencies from the root to 
// each "real" subproject, where "real" filters
// out intermediate directories that don't have
// any code
allprojects.forEach { p ->
  // implementation left to reader
  if (isRealProject(p)) {
    dependencies.add(
      resolver.declarable.name, 
      // p.path is an immutable property, so we're
      // good
      dependencies.project(mapOf("path" to p.path))
    )
  }
}

val fixVersionCatalog = tasks.register(
  "fixVersionCatalog", 
  UpdateVersionCatalogTask::class.java
) { t ->
    t.newEntries.setFrom(resolver.internal)
    t.globalNamespace.putAll(...)
    t.versionCatalog.set(layout.projectDirectory.file("gradle/libs.versions.toml"))
  }

The root project is the correct place to register this task, because the version catalog will typically live in the root at gradle/libs.versions.toml.
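The resolver side can be sketched in the same spirit. The following is an assumption about what a helper like interProjectResolver might set up, not its real code: one configuration to declare dependencies on (playing the role of resolver.declarable) and one to resolve them (playing the role of resolver.internal). The configuration names and attribute value are hypothetical and must match whatever the publishing side uses.

```kotlin
// build.gradle.kts of the root project (sketch; all names below are hypothetical)
import org.gradle.api.attributes.Category

// Dependencies are declared on this bucket; it is neither resolved nor consumed.
val versionCatalogEntries = configurations.create("versionCatalogEntries") {
  isCanBeConsumed = false
  isCanBeResolved = false
}

// This one is resolved. Its attribute selects the matching variant of each
// subproject instead of the default jar or class files.
val versionCatalogEntriesClasspath = configurations.create("versionCatalogEntriesClasspath") {
  isCanBeConsumed = false
  isCanBeResolved = true
  extendsFrom(versionCatalogEntries)
  attributes {
    attribute(
      Category.CATEGORY_ATTRIBUTE,
      objects.named(Category::class.java, "version-catalog-entries")
    )
  }
}

// A task input wired to `versionCatalogEntriesClasspath` receives one file per
// subproject dependency, and Gradle schedules the producing tasks first.
```

Because the connection is an ordinary dependency edge, the root project never reaches into a subproject's tasks, which is what keeps the setup compatible with project isolation.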

With this setup, a user could now run gradle :fixVersionCatalog, and it would essentially run *:projectHealth, followed by *:computeNewVersionCatalogEntries, followed finally by :fixVersionCatalog, because those are the necessary steps as we've declared and wired them.

This updates the version catalog so that it contains every entry needed to resolve any libs.* dependency declaration throughout the build.
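The core of such an update task can be sketched as a pure function. This is a simplified illustration, not the real UpdateVersionCatalogTask: `missingCatalogLines` is a hypothetical name, the input is assumed to be well-formed "group:name:version" strings, and when several versions of one dependency are reported it simply keeps the first after sorting, where a real implementation would need a deliberate rule.

```kotlin
/**
 * Returns the TOML lines to add under [libraries], one per dependency that has a
 * global alias but is not yet in this repo's catalog.
 *
 * @param newGavs "group:name:version" strings collected from the subprojects.
 * @param globalNamespace identifier ("group:name") -> catalog alias.
 * @param existingAliases aliases already present in the catalog.
 */
fun missingCatalogLines(
  newGavs: Collection<String>,
  globalNamespace: Map<String, String>,
  existingAliases: Set<String>,
): List<String> =
  newGavs
    .sorted()
    .mapNotNull { gav ->
      val identifier = gav.substringBeforeLast(':')
      val version = gav.substringAfterLast(':')
      val alias = globalNamespace[identifier]
      if (alias == null || alias in existingAliases) {
        null // unknown to the global namespace, or already in the catalog
      } else {
        alias to "$alias = { module = \"$identifier\", version = \"$version\" }"
      }
    }
    // One line per alias, otherwise the TOML would contain duplicate keys.
    .distinctBy { (alias, _) -> alias }
    .map { (_, line) -> line }

fun main() {
  val lines = missingCatalogLines(
    newGavs = listOf("com.amazing:magic:1.0", "com.acme:anvil:2.1"),
    globalNamespace = mapOf(
      "com.amazing:magic" to "amazingMagic",
      "com.acme:anvil" to "anvil",
    ),
    existingAliases = setOf("anvil"),
  )
  lines.forEach(::println)
}
```

Dependencies that have no alias in the global namespace are skipped here, which matches DAGP falling back to the raw string declaration for them.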

Step 4: Fix all the dependency declarations

This step leverages DAGP's fixDependencies task, and is really just about wrapping everything up in a neat package.

We want a single task registered on the root. Let's call it :fixAllDependencies. This will be a lifecycle task, and invoking it will trigger :fixVersionCatalog as well as all the *:fixDependencies tasks.

// root project
val fixDependencies = mutableListOf<String>()

allprojects.forEach { p ->
  if (isRealProject(p)) {
    // ...as before...

    // do not use something like `p.tasks.findByName()`,
    // that violates Isolated Projects as well as
    // lazy task configuration.
    fixDependencies.add("${p.path}:fixDependencies")    
  }
}

tasks.register("fixAllDependencies") { t ->
  t.dependsOn(fixVersionCatalog)
  t.dependsOn(fixDependencies)
}

And we're done.[6]

(Optional) Step 5: Sort dependency blocks

If you do all the preceding, you should have a successful build with a minimal dependency graph. But your dependency blocks will be horribly out-of-order, which can make them hard to visually scan. DAGP makes no effort to keep the declarations sorted because that is an orthogonal concern and different teams might have different ordering preferences. This is why I've also authored and published the Gradle Dependencies Sorter CLI and plugin, which applies what I consider to be a reasonable default. If you apply this to your builds (which we do to all of our builds via our convention plugins), you can follow up :fixAllDependencies with

gradle sortDependencies

and this will usually Just Work. This plugin is in fact already using the enhanced Kotlin grammar from KotlinEditor, so Gradle Kotlin DSL build scripts shouldn't pose a problem for it.

And now we're really done.

Endnotes

[1] Currently supported languages: Groovy, Java, Kotlin, and Scala.

[2] This is one reason why I think it's important to keep scripts simple and declarative.

[3] Measured with the cloc tool.

[4] Gradle's biggest footgun, in my opinion, is that the API doesn't enforce this conceptual boundary.

[5] This paragraph is an oversimplification for discussion purposes.

[6] Well, except for automated testing and blog-post writing.

Reposted from: https://dev.to/autonomousapps/one-click-dependencies-fix-191p?1