"If a worker wants to do his job well, he must first sharpen his tools." - Confucius, "The Analects of Confucius. Lu Linggong"
Front page > Programming > How do we group related identifiers in an undirected graph represented by two columns, \'Identifier1\' and \'Identifier2\', and assign them unique group IDs?

How do we group related identifiers in an undirected graph represented by two columns, \'Identifier1\' and \'Identifier2\', and assign them unique group IDs?

Published on 2024-10-31
Browse:737

How do we group related identifiers in an undirected graph represented by two columns, \'Identifier1\' and \'Identifier2\', and assign them unique group IDs?

Finding Connected Subgraphs in an Undirected Graph

Problem:

Given an undirected graph represented by two columns, 'Identifier1' and 'Identifier2', how do we group identifiers that are related to each other and assign them unique group IDs?

Solution:

This problem can be solved by treating the data as edges in a graph and traversing all edges recursively.

Recursive Algorithm:

  1. Create a table containing all unique identifiers from both columns.
  2. Create a table containing all edges (pairs of identifiers) in both directions.
  3. Define a recursive query that traverses the graph starting from each identifier and builds a path of traversed identifiers.
  4. Group the results by the starting identifier (anchor identifier) to identify connected components.
  5. Assign a unique group ID to each connected component based on the anchor identifier.

Example Query (SQL):

WITH
CTE_Idents AS (
    SELECT Ident1 AS Ident
    FROM @T

    UNION

    SELECT Ident2 AS Ident
    FROM @T
),
CTE_Pairs AS (
    SELECT Ident1, Ident2
    FROM @T
    WHERE Ident1 <> Ident2

    UNION

    SELECT Ident2 AS Ident1, Ident1 AS Ident2
    FROM @T
    WHERE Ident1 <> Ident2
),
CTE_Recursive AS (
    SELECT
        CAST(CTE_Idents.Ident AS varchar(8000)) AS AnchorIdent 
        , Ident1
        , Ident2
        , CAST(','   Ident1   ','   Ident2   ',' AS varchar(8000)) AS IdentPath
        , 1 AS Lvl
    FROM 
        CTE_Pairs
        INNER JOIN CTE_Idents ON CTE_Idents.Ident = CTE_Pairs.Ident1

    UNION ALL

    SELECT 
        CTE_Recursive.AnchorIdent 
        , CTE_Pairs.Ident1
        , CTE_Pairs.Ident2
        , CAST(CTE_Recursive.IdentPath   CTE_Pairs.Ident2   ',' AS varchar(8000)) AS IdentPath
        , CTE_Recursive.Lvl   1 AS Lvl
    FROM
        CTE_Pairs
        INNER JOIN CTE_Recursive ON CTE_Recursive.Ident2 = CTE_Pairs.Ident1
    WHERE
        CTE_Recursive.IdentPath NOT LIKE CAST('%,'   CTE_Pairs.Ident2   ',%' AS varchar(8000))
),
CTE_RecursionResult AS (
    SELECT AnchorIdent, Ident1, Ident2
    FROM CTE_Recursive
),
CTE_CleanResult AS (
    SELECT AnchorIdent, Ident1 AS Ident
    FROM CTE_RecursionResult

    UNION

    SELECT AnchorIdent, Ident2 AS Ident
    FROM CTE_RecursionResult
)
SELECT
    CTE_Idents.Ident
    ,CASE WHEN CA_Data.XML_Value IS NULL 
    THEN CTE_Idents.Ident ELSE CA_Data.XML_Value END AS GroupMembers
    ,DENSE_RANK() OVER(ORDER BY 
        CASE WHEN CA_Data.XML_Value IS NULL 
        THEN CTE_Idents.Ident ELSE CA_Data.XML_Value END
    ) AS GroupID
FROM
    CTE_Idents
    CROSS APPLY
    (
        SELECT CTE_CleanResult.Ident ','
        FROM CTE_CleanResult
        WHERE CTE_CleanResult.AnchorIdent = CTE_Idents.Ident
        ORDER BY CTE_CleanResult.Ident FOR XML PATH(''), TYPE
    ) AS CA_XML(XML_Value)
    CROSS APPLY
    (
        SELECT CA_XML.XML_Value.value('.', 'NVARCHAR(MAX)')
    ) AS CA_Data(XML_Value)
WHERE
    CTE_Idents.Ident IS NOT NULL
ORDER BY Ident;

Key Points:

  • The recursive CTE (Common Table Expression) traverses the graph and builds connected components.
  • The final SELECT statement assigns group IDs and generates output in the desired format.
  • This solution is optimized to avoid redundant calculations and provide efficient results.
Latest tutorial More>

Disclaimer: All resources provided are partly from the Internet. If there is any infringement of your copyright or other rights and interests, please explain the detailed reasons and provide proof of copyright or rights and interests and then send it to the email: [email protected] We will handle it for you as soon as possible.

Copyright© 2022 湘ICP备2022001581号-3