」工欲善其事,必先利其器。「—孔子《論語.錄靈公》
首頁 > 程式設計 > 一鍵修復依賴關係

一鍵修復依賴關係

發佈於2024-11-08
瀏覽:529

Photo by Maxim Hopman on Unsplash

If you maintain a JVM1 or Android project, chances are you've heard of the Dependency Analysis Gradle Plugin (DAGP). With over 1800 stars, it's used by some of largest Gradle projects in the world, as well as by Gradle itself. It fills what would otherwise be a substantial hole in the Gradle ecosystem: without it, I know of no other way to eliminate unused dependencies and to correctly declare all your actually-used dependencies. In other words, when you use this plugin, your dependency declarations are exactly what you need to build your project: nothing more, nothing less.

That might sound like a small thing, but for industrial-scale projects, a healthy dependency graph is a superpower that prevents bugs, eases debugging (at build and runtime), keeps builds faster, and keeps artifacts smaller. If developer productivity work is the public health of the software engineering world, then a healthy dependency graph is a working sewer system. You don't know how much you rely on it till it stops working and you've got shit everywhere.

The problem is that, if your tool only tells you all the problems you have but doesn't also fix them, you might have a massive(ly annoying) problem on your hands. I mentioned this as an important consideration in my recent rant against code style formatters. This is why, since v1.11.0, DAGP has had a fixDependencies task, which takes the problem report and rewrites build scripts in-place. Even before that, in v0.46.0, the plugin had first-class support for registering a "post-processing task" to enable advanced users to consume the "build health" report in any manner of their choosing. Foundry (née The Slack Gradle Plugin), for example, has a feature called the "dependency rake", which predates and inspired fixDependencies.

fixDependencies hasn't always worked well, though. For one thing, there might be a bug in the analysis such that, if you "fix" all the issues, your build might break. (DAGP is under very active development, so if this ever happens to you, please file an issue!) In this case, it can take an expert to understand what broke and how to fix it, or you can fall back to manual changes and iteration.

For another thing, the build script rewriter has relied on a simplified grammar for parsing and rewriting Gradle Groovy and Kotlin DSL build scripts. That grammar can fail if your scripts are complex.2 This problem will soon be solved with the introduction of a Gradle Kotlin DSL parser built on the KotlinEditor grammar, which has full support for the Kotlin language. (Gradle Groovy DSL scripts will continue to use the old simplified grammar, for now.)

There have also been many recent bugfixes to (1) improve the correctness of the analysis and (2) make the rewriting process more robust in the face of various common idioms. DAGP now has much better support for version catalog accessors, for example (no support yet for experimental project accessors).

With these improvements (real and planned), it's become feasible to imagine automating large-scale dependency fixes across hundreds of repos containing millions of lines of code and have it all just work. Here's the situation:

  1. Over 500 repositories.
  2. Each with its own version catalog.
  3. Most of the entries in the version catalogs use the same names, but there's some incidental skew in the namespace (multiple keys pointing to the same dependency coordinates).
  4. Over 2000 Gradle modules.
  5. Close to 15 million lines of Kotlin and Java code spread out over more than 100 thousand files, along with over 150 thousand lines of "Gradle" code in more than 3 thousand build scripts. This last point isn't as relevant as the first four, but helps to demonstrate what I mean when I say "industrial scale."3

Additionally, the build code we want to write to manage all this should follow Gradle best practices: it should be cacheable to the extent possible, should work with the configuration cache, and for bonus points should not violate the isolated projects contract either (which is also good for maximal performance). The ultimate goal is for developers and build maintainers to be able to run a single task and have it (1) fix all dependency declarations, which might mean adding new declarations to build scripts; (2) all build script declarations should have a version catalog entry wherever possible; (3) and all version catalog entries should come from the same global namespace so that the entire set of 500 repositories are fully consistent with each other. This last part is an important requirement because we're migrating these repos into a single mono/mega repo for other reasons.

Here's the task they can now run, for the record:

gradle :fixAllDependencies

(nb: we use gradle and not ./gradlew because we manage gradle per-repo with hermit.)

So, how do we do it?

Pre-processing

The first step was creating the global version catalog namespace. We did not attempt to actually create a single published global version catalog because, until we finish our megarepo migration, an important contract is that each repo maintains its own dependencies (and their versions). So instead, we collected the full map of version catalog names to dependency identifiers (the dependency coordinates less the version string). We eliminated all the duplication using pre-existing large-scale change tools we have, and then populated the final global set (now with 1:1 mappings) into our convention plugin that is already applied everywhere.

Conceptual framework

The Gradle framework, in general, takes the Project as the most important point of reference.4 A Project instance is what backs all your build.gradle[.kts] scripts, for example, and most plugins implement the Plugin interface. Safe, performant, high-quality build code respects this conceptual boundary and treats each project (AKA "module") as an atomic unit.

If Tasks have well-defined inputs and outputs (literally annotated @Input and @Output), then it might help to also think of projects as having inputs and outputs. In general, a project's inputs are its source code (which by convention is in the src/ directory at the project root), and its dependencies. A project's outputs are the artifacts it produces. For Java projects, the primary artifacts are jars (for external consumption), or class files (for consumption by other projects in a multi-project build).5

With that in mind, we can decide that if two projects need to talk to each other, they should do so via their well-defined inputs and outputs. We define relationships between projects via dependencies (A -> B means A depends on B, so B is an input to A), and we can flavor that connection such that we tell Gradle which of B's outputs A cares about. The default is the primary artifact (usually class files for classpath purposes), but it can also be anything (that can be written to disk). It can, for example, be some metadata about B. It can also be both! (You can declare multiple dependencies between the same two projects, with each edge having a different "flavor," that is, representing a different variant.) This may make more sense in a bit when we get to a concrete example.

Implementation: :fixAllDependencies

The rest of this post will focus on implementation, but at a relatively high level of detail. Some of the code will essentially be pseudocode. My goal is to demonstrate the full flow at a conceptual level, such that a (highly) motivated reader could implement something similar in their own workflow or, more likely, simply learn about how to do something Cool™️ with Gradle.

Here's a sketch of the simplified task graph with Excalidraw:

One click dependencies fix

Note how each project is independent of the other. Well-defined Gradle builds maximize concurrency by respecting project boundaries.

Step 1: The global namespace

As mentioned in the pre-processing section, we need a global namespace. We want all dependency declarations to refer to version catalog entries, i.e., libs.amazingMagic, rather than "com.amazing:magic:1.0". Since DAGP already supports version catalog references in its analysis, this will Just Work if your version catalog already has an entry for amazingMagic = "com.amazing:magic:1.0". However, if you don't, DAGP defaults to the "raw string" declaration. If we want, we can tell DAGP about other mappings that it can't detect by default:

// root build script
dependencyAnalysis {
  structure {
    map.putAll(
      "com.amazing:magic" to "libs.amazingMagic",
      // more entries
    )
  }
}

where dependencyAnalysis.structure.map is a MapProperty, which you can modify directly in your build scripts or via a plugin. Note the "raw string" version of the declaration doesn't include version information; this is important because the version you declare may not match the version that Gradle resolves.

Step 2: Update the version catalog, part 1

With Step 1, DAGP will rewrite build scripts via the built-in fixDependencies task to match your desired schema, but your next build will fail because you'll have dependencies referencing things like libs.amazingMagic which aren't actually present in your version catalog. So now we have to update the version catalog to ensure it has all of these new entries. This will be a multi-step process.

First, we have to calculate the possibly-missing entries. We write a new task, ComputeNewVersionCatalogEntriesTask, and have it extend AbstractPostProcessingTask, which comes from DAGP itself. This exposes a function, projectAdvice(), which gives subclasses access to the "project advice" that DAGP emits to the console, but in a form amenable to computer processing. We'll take that output, filter it for "add advice", and then write those values out to disk via our task's output. We only care about the add advice because that's the only type that might represent a dependency not in a version catalog.

// in a custom task action
val newEntries = projectAdvice()
  .dependencyAdvice
  .filter { it.isAnyAdd() }
  .filter { it.coordinates is ModuleCoordinates }
  .map { it.coordinates.gav() }
  .toSortedSet()

outputFile.writeText(newEntries.joinToString(separator = "\n")

Note that with Gradle task outputs, it's best practice to always sort outputs for stability and to enable use of the remote build cache.

Next we tell DAGP about this post-processing task (which is how it can access projectAdvice():

// subproject's build script
computeNewVersionCatalogEntries = tasks.register(...)

dependencyAnalysis {
  registerPostProcessingTask(computeNewVersionCatalogEntries)
}

And finally we also have to register our new task's output as an artifact of this project!

val publisher = interProjectPublisher(
  project,
  MyArtifacts.Kind.VERSION_CATALOG_ENTRIES
)
publisher.publish(
  computeNewVersionCatalogEntries.flatMap { 
    it.newVersionCatalogEntries 
  }
)

where the interProjectPublisher and related code is heavily inspired by DAGP's artifacts package, because I wrote both. The tl;dr is that this is what teaches Gradle about a project's secondary artifacts. I wish Gradle had a first-class API for this, alas.

Step 3: Update the version catalog, part 2

Back in the root project, we need to declare our dependencies to each subproject, flavoring that declaration to say we want the VERSION_CATALOG_ENTRIES artifact:

// root project
val resolver = interProjectResolver(
  project,
  MyArtifacts.Kind.VERSION_CATALOG_ENTRIES
)

// Yes, this CAN BE OK, but you must only access 
// IMMUTABLE PROPERTIES of each project p.
// This sets up the dependencies from the root to 
// each "real" subproject, where "real" filters
// out intermediate directories that don't have
// any code
allprojects.forEach { p ->
  // implementation left to reader
  if (isRealProject(p)) {
    dependencies.add(
      resolver.declarable.name, 
      // p.path is an immutable property, so we're
      // good
      dependencies.project(mapOf("path" to p.path))
    )
  }
}

val fixVersionCatalog = tasks.register(
  "fixVersionCatalog", 
  UpdateVersionCatalogTask::class.java
) { t ->
    t.newEntries.setFrom(resolver.internal)
    t.globalNamespace.putAll(...)
    t.versionCatalog.set(layout.projectDirectory.file("gradle/libs.versions.toml"))
  }

The root project is the correct place to register this task, because the version catalog will typically live in the root at gradle/libs.versions.toml.

With this setup, a user could now run gradle :fixVersionCatalog, and it would essentially run *:projectHealth, followed by *:computeNewVersionCatalogEntries, followed finally by :fixVersionCatalog, because those are the necessary steps as we've declared and wired them.

This updates the version catalog to contain every necessary reference to resolve all the potential libs. dependency declaration throughout the build.

Step 4: Fix all the dependency declarations

This step leverages DAGP's fixDependencies task, and is really just about wrapping everything up in a neat package.

We want a single task registered on the root. Let's call it :fixAllDependencies. This will be a lifecycle task, and invoking it will trigger :fixVersionCatalog as well as all the *:fixDependencies tasks.

// root project
val fixDependencies = mutableListOf()

allprojects.forEach { p ->
  if (isRealProject(p)) {
    // ...as before...

    // do not use something like `p.tasks.findByName()`,
    // that violates Isolated Projects as well as
    // lazy task configuration.
    fixDependencies.add("${p.path}:fixDependencies")    
  }
}

tasks.register("fixAllDependencies") { t ->
  t.dependsOn(fixVersionCatalog)
  t.dependsOn(fixDependencies)
}

And we're done.6

(Optional) Step 5: Sort dependency blocks

If you do all the preceding, you should have a successful build with a minimal dependency graph. ? But your dependency blocks will be horribly out-of-order, which can make them hard to visually scan. DAGP makes no effort to keep the declarations sorted because that is an orthogonal concern and different teams might have different ordering preferences. This is why I've also authored and published the Gradle Dependencies Sorter CLI and plugin, which applies what I consider to be a reasonable default. If you apply this to your builds (which we do to all of our builds via our convention plugins), you can follow-up :fixAllDependencies with

gradle sortDependencies

and this will usually Just Work. This plugin is in fact already using the enhanced Kotlin grammar from KotlinEditor, so Gradle Kotlin DSL build scripts shouldn't pose a problem for it.

And now we're really done.

Endnotes

1 Currently supported languages: Groovy, Java, Kotlin, and Scala. up

2 This is one reason why I think it's important to keep scripts simple and declarative. up

3 Measured with the cloc tool. up

4 Gradle's biggest footgun, in my opinion, is that the API doesn't enforce this conceptual boundary. up

5 This paragraph is an oversimplification for discussion purposes. up

6 Well, except for automated testing and blog-post writing. up

版本聲明 本文轉載於:https://dev.to/autonomousapps/one-click-dependencies-fix-191p?1如有侵犯,請聯絡[email protected]刪除
最新教學 更多>
  • 使用 Gin/Golang 時如何處理空請求主體:綁定與除錯技術指南
    使用 Gin/Golang 時如何處理空請求主體:綁定與除錯技術指南
    Gin/Golang 中請求正文為空使用Gin 處理POST 請求時,偶爾可能會遇到請求正文顯示為空的問題是空的。這可能會令人沮喪,尤其是當您希望從客戶端接收資料時。此問題的一個常見原因是嘗試直接列印正文。 Gin 將請求內文表示為介面類型 ReadCloser。但列印該介面的字串值並不會洩漏實際的...
    程式設計 發佈於2024-11-08
  • Python 列表理解
    Python 列表理解
    Python 最酷的事情之一是清單推導式如何讓在一行程式碼中建立和操作清單變得非常容易。列表理解是一種透過轉換和過濾現有清單中的元素來建立新清單的簡潔方法。這個特性是 Python 使程式碼更具可讀性和高效性的眾多方法之一,對於初學者來說是一個很好的學習工具。在這裡閱讀更多範例...... 奧利佛 ...
    程式設計 發佈於2024-11-08
  • 如何在 Gin 中組織路由:分組路由定義指南?
    如何在 Gin 中組織路由:分組路由定義指南?
    如何在 Gin 中組織路由為了避免路由定義使主文件混亂,您可以將路由分組到單獨的文件中。這種方法可以實現更好的程式碼組織和可維護性。 要建立嵌套路由分組,您可以將路由器變數儲存在結構體或全域變數中。然後,各個檔案可以將處理程序新增至此共用路由器實例。 範例實作routes.gopackage app...
    程式設計 發佈於2024-11-08
  • Leetcode鍊錶問題
    Leetcode鍊錶問題
    反向鍊錶(LeetCode #206) 難度:簡單 概念:迭代與遞歸方法。 合併兩個排序清單 (LeetCode #21) 難度:簡單 概念:鍊錶遍歷與合併技術。 從清單結束時刪除第 N 個節點 **(LeetCode #19) **難度:中等 概念:兩指針技術(慢指針和快指針)。 鍊錶循環...
    程式設計 發佈於2024-11-08
  • 如何在 C++ 容器中儲存異質物件:boost::any 或自訂實作?
    如何在 C++ 容器中儲存異質物件:boost::any 或自訂實作?
    在C 容器中儲存異質物件C 容器通常需要同質元素,這意味著它們只能保存單一類型的對象。但是,在某些情況下,您可能需要一個可以容納混合資料類型的容器。本文探討如何使用 boost::any 函式庫和自訂方法來實現此目的。 使用 boost::anyboost::any 是一個模板類別可以容納任何 C ...
    程式設計 發佈於2024-11-08
  • 使用 Pandas 掌握數據分析:從數據中釋放洞察力
    使用 Pandas 掌握數據分析:從數據中釋放洞察力
    資料分析是資料科學的核心,Python 的 Pandas 函式庫是一個強大的工具,可以讓這項任務變得更輕鬆、更有效率。無論您使用簡單的電子表格還是大型資料集,Pandas 都可以讓您像專業人士一樣靈活地操作、分析和視覺化資料。在本文中,我們將深入探討 Pandas 的基礎知識,涵蓋從資料操作到進階分...
    程式設計 發佈於2024-11-08
  • 使用 GitLab CI/CD 和 Terraform 實作 Lambda 以進行 SFTP 整合、Go 中的 S Databricks
    使用 GitLab CI/CD 和 Terraform 實作 Lambda 以進行 SFTP 整合、Go 中的 S Databricks
    通过 Databricks 中的流程自动化降低成本 我的客户需要降低在 Databricks 上运行的流程的成本。 Databricks 负责的功能之一是从各种 SFTP 收集文件,解压缩它们并将它们放入数据湖中。 自动化数据工作流程是现代数据工程的重要组成部分。在本文中,我们将探...
    程式設計 發佈於2024-11-08
  • 最佳免費開源圖示庫 4
    最佳免費開源圖示庫 4
    In 2024, finding the best free icon library can significantly enhance the visual appeal of your websites, apps, or digital projects. Whether you're a ...
    程式設計 發佈於2024-11-08
  • React Part 元件、State 和 Props 入門
    React Part 元件、State 和 Props 入門
    歡迎回到我們的 React.js 之旅!在上一篇文章中,我們介紹了 React 的基礎知識,強調了它作為建立動態使用者介面的函式庫的優勢。今天,我們將深入探討創建 React 應用程式所需的三個基本概念:元件、狀態和屬性。讓我們詳細探討這些概念! 什麼是 React 元件? Rea...
    程式設計 發佈於2024-11-08
  • 如何利用原生 ES6 Promises 有效地連結異步 jQuery 函數?
    如何利用原生 ES6 Promises 有效地連結異步 jQuery 函數?
    JavaScript 的互通性承諾實現非同步jQuery 函數的高效連結連結異步jQuery 函數時,通常需要避免jQuery 的內建函數Promises 功能並使用原生ES6 Promises 取代。這種互通性允許 jQuery 操作和您想要的 Promise 實現之間的無縫整合。 使用Nativ...
    程式設計 發佈於2024-11-08
  • 在 Python 中使用 ElementTree 的「find」和「findall」方法時如何忽略 XML 命名空間?
    在 Python 中使用 ElementTree 的「find」和「findall」方法時如何忽略 XML 命名空間?
    在ElementTree 的“find”和“findall”方法中忽略XML 命名空間使用ElementTree 模組解析和定位XML 文件中的元素時,命名空間會帶來複雜性。以下介紹如何在 Python 中使用「find」和「findall」方法時忽略命名空間。 當 XML 文件包含命名空間時,會導...
    程式設計 發佈於2024-11-08
  • Bitbucket 綜合指南:功能、整合和最佳實踐
    Bitbucket 綜合指南:功能、整合和最佳實踐
    Bitbucket简介 Bitbucket 是 Atlassian 旗下基于 Git 的源代码存储库托管服务,以其强大的集成能力和强大的协作功能而闻名。它适合各种规模的团队,提供可简化开发工作流程、提高生产力并确保安全代码管理的解决方案。无论您是小型团队还是大型企业的一部分,Bitbucket 都...
    程式設計 發佈於2024-11-08
  • 用於 PDF 處理的 PHP 庫:評估和用例指南
    用於 PDF 處理的 PHP 庫:評估和用例指南
    PDF generation and processing is a common requirement in many web applications, especially for generating invoices, reports, or documents dynamically....
    程式設計 發佈於2024-11-08
  • 如何在 Python 中用逗號連接清單中的字串?
    如何在 Python 中用逗號連接清單中的字串?
    從列表中用逗號連接字符串將字符串列表映射到逗號分隔的字符串是編程中的常見任務。可以採用各種方法來實現此目標,每種方法都有自己的優點和缺點。 一種流行的方法是將 join 方法與映射函數結合使用。此方法需要建立一個中間字串,用作各個字串之間的分隔符號。例如:my_list = ['a', 'b', '...
    程式設計 發佈於2024-11-08
  • 如何處理 AngularJS 應用程式中的錨點哈希連結?
    如何處理 AngularJS 應用程式中的錨點哈希連結?
    AngularJS 中的錨點哈希處理使用錨點瀏覽網頁是一種常見的做法,特別是對於具有多個部分的長頁面。然而,在 AngularJS 應用程式中,錨連結處理可能會出現問題。 當點擊 AngularJS 中的錨定連結時,預設行為是攔截點擊並將使用者重新導向到不同的頁面。為了解決這個問題,AngularJ...
    程式設計 發佈於2024-11-08

免責聲明: 提供的所有資源部分來自互聯網,如果有侵犯您的版權或其他權益,請說明詳細緣由並提供版權或權益證明然後發到郵箱:[email protected] 我們會在第一時間內為您處理。

Copyright© 2022 湘ICP备2022001581号-3