Building a Graph Navigator in Python
Introduction
Browser-based Python applications for visualizing and exploring network graphs and their associated data are potentially useful in many domains, but building them has long been a significant challenge because there has been no obvious set of supporting packages to use.
The combination of web application frameworks like Shiny for Python [1], component frameworks like Jupyter Widgets, and graph visualization libraries like D3 integrated with these offers an attractive possible solution. This report explores using different combinations of packages to achieve our goals.
The most successful approach by some distance was to use the D3 JavaScript library for the front-end graph layout and visualization, with other UI components provided through Shiny. This was a significant improvement on prior experiences using Bokeh for graph visualization and exploration, and it is a good foundation for building such applications.
Targets
We want to create an application that, in rough priority order:
- Displays a network graph
- Supports user interactions driving server-side actions
- Supports user interactions driving UI elements mediated by server-side code
- Supports the above with incremental redrawing of updated UI elements
Also, our preferred order for architectures is:
- Primarily Python and server-side driven, with generated front-end JavaScript, e.g. Shiny for Python, Bokeh or Plotly Dash.
- As above but with R rather than Python (i.e. R Shiny).
- Primarily front-end driven, with lightweight JavaScript or TypeScript calling into a mostly separate back-end API written in Python. A simple D3 or React front-end might be a candidate here.
- Anything else (e.g. other languages, heavy front-end frameworks like React).
Ways to extend Shiny
Shiny provides several approaches for extending its built-in component library with arbitrary JavaScript functionality, allowing one to create richer and more interactive user interfaces.
Including JavaScript directly
Shiny allows including JavaScript files directly within an application's static assets directory (typically named www) or via functions like ui.include_js [2]. This enables the execution of JavaScript code on the client side, allowing for interaction with the DOM and integration with JavaScript libraries.
While this method is straightforward for adding simple JavaScript functionality, wrapping a complex library like Cytoscape.js often benefits from the more structured approaches offered by creating either one-off custom components or custom component packages.
One-off custom components
One-off custom JavaScript components [3] are developed for specific applications.
This allows the integration of specific JavaScript libraries without the need to create a full component package.
To create a one-off custom component, you need to write the following:
- A JavaScript output binding that tells Shiny how to render the component in the browser
- A Python output function to describe the HTML output element in the user interface. This refers to the JavaScript output binding.
- A Python render decorator to transfer data from the server to the client. This may refer to the Python output function to provide automated output.
Although not covered here, communication can also occur in the reverse direction, from JavaScript to Python, by using Shiny's input bindings or custom messages:
- Input bindings allow JavaScript components to send data back to the Shiny server when user interactions occur.
- Custom messages provide a more flexible mechanism for asynchronous communication, enabling more complex interactions between JavaScript and Python.
For guidance on creating input bindings, see the "Custom components package" page in the Shiny documentation [4], which describes how to create a custom component for Shiny inputs.
JavaScript output binding
The output binding typically includes two key methods: find() and renderValue().
Some sample code:
class TabulatorOutputBinding extends Shiny.OutputBinding {
  // Find element to render in
  find(scope) { ... }

  // Render output element in the found element
  renderValue(el, payload) { ... }
}

// Register the binding
Shiny.outputBindings.register(
  new TabulatorOutputBinding(),
  "shiny-tabulator-output"
);
The find() method is responsible for locating the HTML element in the Shiny application where the component will be rendered. This is often done by targeting a specific CSS class or ID assigned to a div element in the UI.
The renderValue() method contains the core logic for initializing and updating the JavaScript display. This method receives data from the Python back-end, typically in a JSON format, and uses it to configure the component's elements, style, and layout.
Python server-side code
Output function
On the Python side, an output function needs to be defined to place the designated HTML element in the Shiny application's user interface.
from shiny import App, Inputs, ui
from shiny.module import resolve_id
from htmltools import HTMLDependency

tabulator_dep = HTMLDependency(
    "tabulator",
    "5.5.2",
    source={"subdir": "tabulator"},
    script={"src": "tableComponent.js", "type": "module"},
    stylesheet={"href": "tabulator.min.css"},
    all_files=True,
)


def output_tabulator(id, height="200px"):
    return ui.div(
        tabulator_dep,
        # Use resolve_id so that our component will work in a module
        id=resolve_id(id),
        class_="shiny-tabulator-output",
        style=f"height: {height}",
    )
The htmltools.HTMLDependency class is used to ensure that the necessary JavaScript and CSS assets and the custom output binding are included in the Shiny application.
Render decorator
Additionally, a render decorator is needed to transform values returned from the functions it decorates into data to be passed to the JavaScript renderValue() method, generally using a JSON-serializable structure.
import pandas as pd
from shiny.render.renderer import Jsonifiable, Renderer


class render_tabulator(Renderer[pd.DataFrame]):
    """
    Render a pandas dataframe as a tabulator table.
    """

    def auto_output_ui(self):
        """
        Express UI for the tabulator renderer
        """
        # output_tabulator is the output function defined above;
        # it is not part of shiny.ui
        return output_tabulator(self.output_name)

    async def transform(self, value: pd.DataFrame) -> Jsonifiable:
        """
        Transform a pandas dataframe into a JSONifiable object that can be
        passed to the tabulator HTML dependency.
        """
        if not isinstance(value, pd.DataFrame):
            # Throw an error if the value is not a dataframe
            raise TypeError(f"Expected a pandas.DataFrame, got {type(value)}. ")

        # Get data from dataframe as a list of lists where each inner list is a
        # row, column names as array of strings and types of each column as an
        # array of strings
        return {
            "data": value.values.tolist(),
            "columns": value.columns.tolist(),
            "type_hints": value.dtypes.astype(str).tolist(),
        }
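The shape of the payload that transform hands to renderValue() can be illustrated without Shiny or pandas. The following is a minimal sketch, assuming a hypothetical helper table_payload and made-up sample rows; it only mimics the keys of the dictionary returned above and derives type hints from Python types rather than pandas dtypes.

```python
import json


def table_payload(columns, rows):
    """Build a dict with the same keys as the transform() result above.

    Hypothetical helper for illustration only: type hints come from the
    Python type of the values in the first row, not from pandas dtypes.
    """
    type_hints = [type(v).__name__ for v in rows[0]] if rows else []
    return {
        "data": [list(row) for row in rows],
        "columns": list(columns),
        "type_hints": type_hints,
    }


# Made-up sample data
payload = table_payload(["label", "count"], [("a", 1), ("b", 2)])

# Round-trip through JSON to confirm the structure is serializable,
# which is what the JavaScript side ultimately receives
encoded = json.dumps(payload)
decoded = json.loads(encoded)
```

Keeping the payload to plain lists, strings and numbers means the JavaScript binding can consume it directly without any custom decoding.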
An implementation of Renderer produces a class to be used as a decorator and requires at least 3 things:
- auto_output_ui (possibly only needed for Shiny Express),
- either a transform or render function, and
- a value type parameter for the Renderer class.
Custom component packages
A custom component package allows reusing integrations across multiple applications.
The development process involves building the front-end component using JavaScript or TypeScript, packaging it with necessary metadata and build scripts, and then using shiny-bindings-react to connect to Shiny's reactivity system.
For details see the "Custom components package" page in the Shiny documentation [4].
Debugging Shiny
Websockets logging
- https://websockets.readthedocs.io/en/stable/topics/logging.html
- https://websockets.readthedocs.io/en/stable/reference/variables.html
Shiny uses websockets for communication. To avoid truncating messages in the debug logs, we can raise the limit on logged message size before starting the app, alongside running with a debug log level:
export WEBSOCKETS_MAX_LOG_SIZE=10000
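The same settings can be applied from Python. This is a minimal sketch using only the standard library; it assumes the websockets package reads WEBSOCKETS_MAX_LOG_SIZE when it is first imported, so the variable is set before anything imports it, and it relies on the logger names documented on the websockets logging page linked above.

```python
import logging
import os

# Assumption: the limit is read when websockets is first imported,
# so set it before any import of websockets (or of shiny).
os.environ.setdefault("WEBSOCKETS_MAX_LOG_SIZE", "10000")

logging.basicConfig(
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
    level=logging.INFO,
)

# "websockets" is the parent of the "websockets.server" and
# "websockets.client" loggers, so both inherit this level.
ws_logger = logging.getLogger("websockets")
ws_logger.setLevel(logging.DEBUG)
```

Setting the level on the parent logger only raises verbosity for websockets traffic and leaves other loggers at their configured level.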
Supporting packages
D3
- Basis: Pure JavaScript library
- Features: Provides a highly customizable framework for interactive visualizations. Low-level in terms of the graphical primitives it offers, but high-level in that it provides supporting tooling and offers a way to compose sophisticated visualizations from fairly simple elements.
Note:
- Writing "ground-up" visualizations with D3 is actually quite easy, even for someone not familiar with JavaScript, when using an LLM coding assistant. The resulting code can be directly embedded in HTML and deployed anywhere that will accept fragments of generic HTML documents, e.g. Confluence.
- Using D3 directly is essentially a two-language approach, with JavaScript on the front end and Python on the back end. The front-end JavaScript can also be extended to call external services, e.g. to get data or to trigger server-side calculations. The main session state and focus of the design moves partly into the JavaScript world in this approach.
- This was an approach I hadn't taken seriously until playing around with it in Confluence, and calling out to external data sources. Language model support for wiring up the boilerplate to get started with was especially helpful.
- Other UI elements still have to come from somewhere, e.g. 1) Shiny, Dash or similar, 2) a JavaScript framework like React, or 3) something like htmx.
See Using D3 for a test application using it.
Jupyter widgets packages
Jupyter widgets, also known as ipywidgets, are interactive HTML widgets for Jupyter notebooks. They allow developers to create GUI-like elements within notebooks. Widgets let users interact with plots, images, and data tables, modify inputs, and see results in real time. They include components like sliders, buttons, and text boxes, which can link to Python functions and data events, allowing dynamic computation and visualization.
Application frameworks like Bokeh, Dash, and Shiny for Python support using Jupyter widgets to add interaction capabilities. This lets developers build browser-based applications with the same widget components that are used in Jupyter notebook environments.
shinywidgets
The shinywidgets Python package (https://shiny.posit.co/py/docs/jupyter-widgets.html) enables the integration of ipywidgets (Jupyter Widgets) into Shiny for Python applications. It is part of the ipywidgets ecosystem, in which popular Python packages provide interactive widgets including:
- Visualization, with tools like Altair, Bokeh, and Plotly
- Mapping, featuring Pydeck and ipyleaflet
- Data display, utilizing ipydatagrid and ipysheet
- 3D rendering, through ipyvolume and pythreejs
- Media, with ipywebrtc
This integration allows for seamless rendering, efficient updates, and responsive user interactions within Shiny applications using ipywidgets.
ipycytoscape
ipycytoscape (https://ipycytoscape.readthedocs.io/en/master/) is an interactive widget for Jupyter notebooks that leverages the Cytoscape.js library to provide dynamic graph visualizations. Its target users are data scientists and researchers who work with network data, as it allows users to create, interact with, and customize graphical representations of network structures using Jupyter widgets. It supports interactivity by allowing the user to bind actions to graph events, like clicks or node selections.
See Using ipycytoscape for test applications using it.
ipysigma
ipysigma is a Jupyter widget built on Sigma.js, designed to render interactive network visualizations effortlessly. It integrates with popular Python libraries such as networkx and igraph, allowing users to map graph structures and metrics into visual formats. ipysigma emphasizes customizable visualization features, focusing on node and edge properties like color, size, and shape, while supporting efficient rendering of large graphs. It also offers interactive capabilities, such as synchronized views for comparing features within the same graph.
For further information, you can visit:
- the ipysigma GitHub repository,
- the PyPI page for ipysigma, and
- this talk at FOSDEM 2023.
ipysigma's main use cases are exploring graph structure and data rather than graph-driven interactivity, so it may not be a good fit here.
See Using ipysigma for a test application using it.
Pyvis
- Basis: Wraps the vis.js JavaScript library.
- Features: Designed to make creating interactive network graphs easy with just a few lines of Python code. It allows for interactive exploration and physics adjustments directly within a Jupyter notebook or exported HTML file.
- Note:
- Impressive interactivity capabilities, including a built-in control panel to control layout parameters.
- Doesn't have a native option to display using ipywidgets. There is a way to display in a Jupyter notebook by changing the arguments to the constructor.
- It is possible to include pyvis components in Shiny directly, without using py-shinywidgets, as noted in this response to a request for support in py-shinywidgets: https://github.com/posit-dev/py-shinywidgets/issues/63.
- Conclusion:
- Displays graphs using an external HTML file, so rejected.
See Using pyvis for a test application using it.
Not investigated in-depth
Cytoscape.js
Cytoscape.js (https://js.cytoscape.org/) is an open-source JavaScript library for visualizing and analyzing network data. It serves as a building block for creating interactive network visualizations in web browsers [5]. It's optimized, has no external dependencies, and works with most modern browsers. Features include pan and zoom, node and edge styling, layout algorithms, and support for graph theory algorithms.
Core to Cytoscape.js is the graph instance, initialized with a DOM container, an array of elements (nodes and edges), styling rules, and a layout algorithm. It uses a JSON format to represent graph data, defining nodes and edges as JavaScript objects with unique identifiers and source-target relationships. CSS-based styling allows customization of nodes and edges based on data attributes. The library provides an API for programmatic interaction, including modifying elements, applying layouts, and managing user interactions. Numerous examples and demos available online help showcase its capabilities.
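As an illustration of the JSON element format described above, the following is a minimal sketch of producing it from Python. The helper to_cytoscape_elements, the sample ids and the "source--target" edge id convention are assumptions made for this example; only the "data", "id", "source" and "target" keys come from the Cytoscape.js format.

```python
import json


def to_cytoscape_elements(node_ids, edges):
    """Convert node ids and (source, target) pairs into a flat list of
    Cytoscape.js element objects (hypothetical helper for illustration)."""
    nodes = [{"data": {"id": str(n)}} for n in node_ids]
    links = [
        # Each edge needs its own unique id plus source and target node ids
        {"data": {"id": f"{s}--{t}", "source": str(s), "target": str(t)}}
        for s, t in edges
    ]
    return nodes + links


elements = to_cytoscape_elements(["a", "b", "c"], [("a", "b"), ("b", "c")])

# JSON text that could be passed to the front end as the "elements" option
elements_json = json.dumps(elements, indent=2)
```

A structure like this could be sent through a custom output binding in the same way as the node and link lists used for D3.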
It wasn't investigated outside the context of ipycytoscape. It may be useful with integration approaches similar to those used for D3.
d3blocks
- https://github.com/d3blocks/d3blocks
- Basis: Python library for generating the Javascript and HTML needed for D3 visualizations.
- Features: Create stand-alone and interactive d3 charts. Supports a wide range of common chart types.
- Notes: Generates HTML and JavaScript using jinja. The HTML can then be e.g. shown in a Jupyter notebook. A well-developed package.
ipydagred3
- Basis: Uses dagre-d3 library.
- Features: Specifically designed for drawing Directed Acyclic Graphs (DAGs) within JupyterLab.
ipyelk
- Basis: Uses the Eclipse Layout Kernel (ELK) via elkjs.
- Features: Provides sophisticated layout algorithms, potentially useful for complex diagrams. Mentioned as having potential for features like collapsing/expanding subgraphs.
Dash
- Basis: Built on Plotly.js.
- Features: A very versatile library for creating a wide range of interactive plots, including network graphs, often using NetworkX data structures as input. It offers extensive customization and works seamlessly in Jupyter notebooks and Dash web applications. You can create interactive scatter plots for nodes and line plots for edges.
- Note:
- Search results highlight Plotly frequently for general interactive visualizations in Jupyter.
- Plotly has dedicated support for Cytoscape through Dash Cytoscape. The documentation describes how to use callbacks triggered by clicks to add and remove nodes. This seems to be done using Dash's own facilities for managing inputs, outputs and state rather than using the Cytoscape graph. Presumably the graph is redrawn each time, unless they have a workaround for Cytoscape's bugs.
- I didn't investigate Dash because, while a serious "Shiny for Python" effort, it has never been as well-designed and pleasant to use as Shiny for R, and now there is the true Shiny for Python package from Posit.
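The "scatter plots for nodes and line plots for edges" approach mentioned in the features above can be sketched without Plotly itself. This is a minimal sketch, assuming made-up node positions and a hypothetical helper edge_trace_coordinates; it only prepares the coordinate lists, using the common convention of separating edge segments with None so that a single line trace is broken between edges.

```python
def edge_trace_coordinates(positions, edges):
    """Flatten edges into x and y lists, with None between segments.

    Hypothetical helper: positions maps node id to an (x, y) tuple.
    """
    xs, ys = [], []
    for source, target in edges:
        x0, y0 = positions[source]
        x1, y1 = positions[target]
        xs += [x0, x1, None]
        ys += [y0, y1, None]
    return xs, ys


# Made-up layout; in practice this would come from a layout algorithm
positions = {"a": (0.0, 0.0), "b": (1.0, 0.5), "c": (2.0, 0.0)}
edge_x, edge_y = edge_trace_coordinates(positions, [("a", "b"), ("b", "c")])

# Node coordinates for the scatter trace, in insertion order
node_x = [x for x, _ in positions.values()]
node_y = [y for _, y in positions.values()]
```

Because the graph is expressed as two generic traces, any per-node or per-edge interaction has to be rebuilt on top of point indices, which is part of why a dedicated graph library is more convenient.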
Bokeh
- Basis: Bokeh's own JS library.
- Features: Good for creating interactive, web-based visualizations, including network graphs. Known for high-performance interactivity, handling large datasets, and streaming data capabilities. Integrates well with Jupyter.
- Notes: While I have used Bokeh before to build working applications, the programming interface is distinctly lower-level and clunky compared to the reactive model offered by Shiny. Overall Bokeh has quite a high barrier-to-entry compared to Shiny. Unlike many visualization libraries, it's really necessary to grok Bokeh before building effectively with it, and
- Summary: Not investigated here as I know it from previous use and have seen experienced people go down blind alleys with it, ending up doing things manually that can be done in the framework.
Altair
- Basis: Built on Vega-Lite, a declarative grammar.
- Features: Focuses on a declarative approach, simplifying the creation of complex statistical visualizations. Does not directly support network graphs, although the nx_altair package provides some basic support.
- Summary: Not investigated as it seems like it would have quite a low ceiling for customization and interaction through server-side logic compared to options using D3 or similar.
Test applications
Using D3
This is a Shiny Core/D3 App that displays a graph using D3 showing information about the characters in Shakespeare's plays and the relations between them based on the scenes they speak in.
It was originally based on the Tabulator sample for Shiny one-off custom components.
Its features include:
- Interactive Graph Visualization: It uses D3.js to present a graph showing the characters from Shakespeare's plays and the relationships between them based on scene co-occurrences.
- Play Selection: Users can select different plays from a drop-down menu to update the graph and visualize the character interactions specific to the chosen play.
- Display of Character Data: Hovering over a node highlights it and displays information about the corresponding character. The data display is performed using Shiny.
- Focus on Selected Characters: Clicking on a node/character filters the displayed characters to the clicked one and its direct neighbors. The click is sent to Shiny, which performs the filtering server-side and sends the reduced graph back to D3. Control-click adds or removes a node from the set of focus nodes.
- Link Representation: Links represent interactions between characters inferred from shared scenes. Their width and color scale with the strength of the interaction.
- Zoom and Pan: The graph visualization supports zoom and pan functionality.
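The link styling described in the list above can be sketched in Python. This is a minimal sketch with made-up sentence counts: interaction_strength follows the sum of products used in the application's renderer, while sqrt_scale is a hypothetical stand-in that approximates the behavior of a square-root scale for non-negative values and is not the D3 implementation.

```python
import math


def interaction_strength(interactions):
    """Sum, over shared scenes, of the product of both characters' sentence counts."""
    return sum(i["sentences_char1"] * i["sentences_char2"] for i in interactions)


def sqrt_scale(domain, output_range):
    """Return a function mapping non-negative values to output_range on a
    square-root scale (hypothetical helper, an approximation only)."""
    d0, d1 = (math.sqrt(v) for v in domain)
    r0, r1 = output_range

    def scale(value):
        if d1 == d0:
            # Choice made for this sketch: midpoint for a degenerate domain
            return (r0 + r1) / 2
        t = (math.sqrt(value) - d0) / (d1 - d0)
        return r0 + t * (r1 - r0)

    return scale


# Made-up data for one link: two shared scenes
strength = interaction_strength([
    {"sentences_char1": 3, "sentences_char2": 4},
    {"sentences_char1": 2, "sentences_char2": 2},
])

# Link widths between 0 and 10 for strengths between 0 and 16
width = sqrt_scale((0, 16), (0, 10))
```

The square-root scale compresses large strengths, so a few very talkative pairs of characters do not visually swamp the rest of the graph.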
It can be run with
shiny run --reload --log-level debug src/navigator/d3/app.py
The mechanism for focusing on characters uses Shiny to handle the state involved in the selection of nodes and links to display.
This keeps the selection logic within Shiny's reactive mechanism rather than spreading it across both Shiny and D3. There is a trade-off: routing a user-driven update through the back-end needs extra wiring when it could have been handled in the front-end alone, but splitting closely related state across front and back ends is also costly, because the reactive updates then need much more awkward coordination than if they happened solely in Shiny or solely in D3.
Another plus of handling complex logic server side is that we can use the full range of Python packages, testing, etc., whereas our JavaScript development setup is much more basic.
Note that the front-end still manages some orthogonal state, e.g. for pan and zoom.
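The server-side selection logic described above can be sketched independently of Shiny and networkx. This is a minimal sketch with a made-up adjacency dictionary; toggle_focus and neighborhood are hypothetical helpers that mirror the application's behavior (plain click replaces the focus set, control-click toggles membership, and the display is limited to focus nodes and their direct neighbors).

```python
def toggle_focus(focus, clicked, ctrl):
    """Return the new set of focus nodes after a click."""
    if ctrl:
        # Symmetric difference adds the node if absent and removes it if present
        return focus ^ {clicked}
    return {clicked}


def neighborhood(adjacency, focus):
    """Focus nodes plus their direct neighbors (a cutoff of one hop)."""
    visible = set(focus)
    for node in focus:
        visible |= adjacency.get(node, set())
    return visible


# Made-up graph: a - b - c, with d isolated
adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}

focus = toggle_focus(set(), "a", ctrl=False)    # plain click focuses on a
focus = toggle_focus(focus, "c", ctrl=True)     # ctrl-click adds c
visible = neighborhood(adjacency, focus)
after_removal = toggle_focus(focus, "a", ctrl=True)  # ctrl-click removes a
```

Keeping this logic in plain functions makes it straightforward to unit test on the Python side, which is one of the advantages of handling it server-side.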
Overall the experience working with D3 was good, especially after working through the early rounds of updates on top of the sample code. The interaction between Shiny and D3 needs to be managed, but it's straightforward to keep the overall interaction structure simple, even for applications with fairly sophisticated interactions.
from dataclasses import dataclass, asdict
from pathlib import Path

import networkx as nx
import pandas as pd
import htmltools
import shiny
import shinyswatch
from shiny import App, ui, render
from shiny.module import resolve_id
from shiny.render.renderer import Jsonifiable, Renderer
from shiny.types import SilentException

from navigator.shakespeare import CharacterNetwork, NodeId

www_dir = Path(__file__).parent.parent.parent / "www"

PLAY_DATA = CharacterNetwork(
    pd.read_csv(www_dir / "data" / "shakespeare_plays.csv", index_col=0)
)

d3_dependency = htmltools.HTMLDependency(
    "d3graph",
    "0.0.0",
    source={"subdir": str(www_dir / "gen")},
    script={"src": "d3_navigator.js", "type": "module"},
    stylesheet={"href": "d3_navigator.css"},
    all_files=True,
)


@dataclass
class SelectionState:
    zoom_nodes: list[NodeId]


@dataclass
class GraphAndState:
    graph: nx.Graph
    state: SelectionState


class render_character_graph(Renderer[GraphAndState]):
    """
    Render a dict as a D3 graph.
    """

    async def transform(self, value: GraphAndState) -> Jsonifiable:
        """Transform a networkx graph representing character interactions in
        Shakespeare's plays into a Jsonifiable object.
        """
        assert isinstance(value, GraphAndState)
        G = value.graph
        state = value.state

        def node_id_as_str(node_id) -> str:
            return ", ".join([node_id.play, node_id.name])

        nodes = [
            {
                "id": node_id_as_str(node_id),
                "play": node_id.play,
                "name": node_id.name,
                "sentences": int(node_attributes["sentences"]),
            }
            for node_id, node_attributes in G.nodes(data=True)
        ]
        links = [
            {
                "source": node_id_as_str(source),
                "target": node_id_as_str(target),
                **edge_attributes,
            }
            for source, target, edge_attributes in G.edges(data=True)
        ]
        for node in nodes:
            node['is_focal'] = NodeId(play=str(node['play']), name=str(node['name'])) in set(state.zoom_nodes)
        for link in links:
            interactions = link["interactions"]
            link["interaction_strength"] = sum(
                interaction["sentences_char1"] * interaction["sentences_char2"]
                for interaction in interactions
            )
        return {
            "nodes": nodes,
            "links": links,
            "state": asdict(state),
        }  # type: ignore [reportReturnType]


app_ui = ui.page_fluid(
    ui.tags.head(
        ui.tags.title("D3.js Directed Graph in Shiny"),
        d3_dependency,
    ),
    ui.div(
        id=resolve_id("d3Graph"),
        class_="shiny-d3-graph-output",
    ),
    ui.h2("Shakespeare's plays and characters"),
    ui.layout_columns(
        ui.card(
            ui.card_header("Inputs"),
            ui.input_select(
                id="play_name",
                label="Play",
                choices=PLAY_DATA.play_names,
                selected="Henry V",
            )),
        ui.card(
            ui.card_header("Overview"),
            ui.tags.svg(
                id="d3-graph",
                style="height: 600px; border: 1px solid black;",
            )),
        ui.card(
            ui.card_header("Details"),
            ui.div(
                ui.output_ui("detail_character_output")),
            ui.output_ui("detail_interaction_output")),
        col_widths={"xl": (2, 7, 3)},
    ),
    ui.div(id="tooltip"),
    theme=shinyswatch.theme.cerulean,
)


def server(input: shiny.Inputs, output: shiny.Outputs, _session: shiny.Session):
    zoom_nodes = shiny.reactive.Value(list[NodeId]())

    @shiny.reactive.effect
    def handle_node_on_click():
        play = input.play_name()
        click = input.node_on_click()
        shiny.req(click)
        clicked_node = NodeId(play=click['play'], name=click['name'])
        if click['ctrl']:
            # use isolate to avoid an infinite loop
            with shiny.reactive.isolate():
                old_zoom_nodes = {n for n in zoom_nodes() if n.play == play}
            new_zoom_nodes = list({clicked_node} ^ old_zoom_nodes)
        else:
            new_zoom_nodes = [clicked_node]
        zoom_nodes.set(new_zoom_nodes)

    @shiny.reactive.calc
    def detail_character() -> NodeId | None:
        play = input.play_name()
        try:
            character = input.detail_character()
        except SilentException:
            return None
        if character["play"] != play:
            return None
        return NodeId(**character)

    @render_character_graph
    def d3Graph():
        play_name = input.play_name()
        G = PLAY_DATA.graph
        if not zoom_nodes():
            print(f"jm - displaying all characters in {play_name}")
            filter_node = lambda n: n.play == play_name  # noqa: E731
        else:
            neighborhood = set()
            cutoff=1
            for source in zoom_nodes():
                neighborhood |= set(nx.single_source_shortest_path_length(G, source, cutoff=cutoff))
            filter_node = lambda n: n in neighborhood  # noqa: E731
        return GraphAndState(
            graph=nx.subgraph_view(G, filter_node=filter_node),
            state=SelectionState(zoom_nodes=zoom_nodes())
        )

    @staticmethod
    def character_html(character_graph: nx.Graph, *, character_id: NodeId) -> ui.Tag:
        """Return html representing a character"""
        assert isinstance(character_graph, nx.Graph)
        character_data = character_graph.nodes[character_id].copy()
        character_data.pop("play")
        return ui.div(
            *(
                ui.p(ui.strong(f"{key}:"), str(value))
                for key, value in character_data.items()
            ),
        )

    @output
    @render.ui
    def detail_character_output():
        character_id = detail_character()
        if character_id is None:
            return ui.HTML("")
        html = character_html(
            PLAY_DATA.graph, character_id=character_id
        )
        return ui.HTML(html)

    @render.table
    def detail_interaction_output():
        interaction = input.detail_interaction()
        play_name = input.play_name()
        source_name = interaction["source"]["name"]
        source_play = interaction["source"]["play"]
        target_name = interaction["target"]["name"]
        target_play = interaction["target"]["play"]
        assert source_play == target_play
        if play_name != source_play:
            print("jm - Returning empty interactions DataFrame")
            return pd.DataFrame()
        interactions = PLAY_DATA.graph.edges[
            NodeId(play=source_play, name=source_name),
            NodeId(play=target_play, name=target_name),
        ]["interactions"]
        return pd.DataFrame(interactions).rename(
            columns={
                "sentences_char1": f"Sentences by {source_name}",
                "sentences_char2": f"Sentences by {target_name}",
            }
        )


app = App(app_ui, server, static_assets=www_dir)
import * as d3 from "https://cdn.jsdelivr.net/npm/d3@7/+esm"; // Using d3 v7 ESM

function ensureG(parentSelection, className) {
  return parentSelection.data([42]).join("g").classed(className, true);
}

function getRadiusScale(nodeData, {radiusAttr, minRadius, maxRadius}) {
  const minRadiusAttr = d3.min(nodeData, d => d[radiusAttr]);
  const maxRadiusAttr = d3.max(nodeData, d => d[radiusAttr]);
  return d3.scaleSqrt()
    .domain([minRadiusAttr, maxRadiusAttr])
    .range([minRadius, maxRadius]);
}

function createZoomBehavior(zoomGroup){
  return d3.zoom()
    .scaleExtent([0.1, 10])
    .on("zoom", (event) => {
      zoomGroup.attr("transform", event.transform);
    });
}

function enterNode({width, height}, {radiusScale, radiusAttr}) {
  return function(enter) {
    const nodeEnter = enter.append("g")
      .attr("class", "node")
      .attr("transform", d => {
        // Position new nodes randomly within the SVG
        const x = Math.random() * width;
        const y = Math.random() * height;
        return `translate(${x}, ${y})`;
      });
    nodeEnter.append("circle")
      .attr("r", d => radiusScale(d[radiusAttr]));
    nodeEnter.append("text")
      .attr("dy", "0.75em")
      .attr("y", d => radiusScale(d[radiusAttr]) + 5) // Position text below the circle
      .text(d => d.name)
      .style("pointer-events", "none") // Make text not interfere with node click
      ;
    nodeEnter.classed("focal-node", d => d.is_focal);
    return nodeEnter;
  };
}

function linkStringId(d, i) {
  // an identifying hash for a link, as a string
  return `${d.source.id}--${d.target.id}`;
}

function getLinkWidthScale(linkData) {
  const linkStrengthExtent = d3.extent(linkData, d => d.interaction_strength);
  return d3.scaleSqrt()
    .domain(linkStrengthExtent)
    .range([0, 10]);
}

function getLinkColorScale(linkData) {
  const linkStrengthExtent = d3.extent(linkData, d => d.interaction_strength);
  return d3.scaleSqrt()
    .domain(linkStrengthExtent)
    .range(["#fff", "#f00"]);
}

function createLink(linkGroup, linkData) {
  const linkWidthScale = getLinkWidthScale(linkData);
  const linkColorScale = getLinkColorScale(linkData);
  return linkGroup.selectAll("line")
    .data(linkData, linkStringId)
    .join("line")
    .attr("class", "link")
    .attr("stroke-width", d => linkWidthScale(d.interaction_strength))
    .attr("stroke", d => linkColorScale(d.interaction_strength))
    ;
}

function createSimulation({width, height}, nodeData, {radiusAttr, radiusScale}, linkData) {
  // warning: forceLink modifies linkData. See https://d3js.org/d3-force/link
  return d3.forceSimulation(nodeData)
    .force("link", d3.forceLink(linkData).id(node => node.id).distance(100))
    .force("charge", d3.forceManyBody().strength(-50))
    .force("center", d3.forceCenter(width / 2, height / 2))
    .force("collide", d3.forceCollide().radius(d => radiusScale(d[radiusAttr]) + 2));
}

function updatePositions(link, node, {width, height}) {
  node.attr("transform", d => {
    const x = d.x = Math.max(10, Math.min(width - 10, d.x));
    const y = d.y = Math.max(10, Math.min(height - 10, d.y));
    return `translate(${x},${y})`;
  });
  link.attr("x1", d => d.source.x)
    .attr("y1", d => d.source.y)
    .attr("x2", d => d.target.x)
    .attr("y2", d => d.target.y);
}

function isEventTargetSelected(event) {
  return d3.select(event.currentTarget).classed('selected');
}

function nodeMouseOver(link) {
  return function(event, d) {
    Shiny.setInputValue("detail_character", {"play": d.play, "name": d.name});
    d3.selectAll(".node").classed("selected", false);
    // 'this' refers to the node element that the event listener is attached to
    if (isEventTargetSelected(event)) {
      d3.select(this).classed("selected", false);
    } else {
      d3.select(this).classed("selected", true);
    }
    // Highlight links connected to the node
    link.classed("hidden", true);
    link.filter(l => (l.source === d) || (l.target === d))
      .classed("hidden", false)
      .classed("highlighted-link", true);
  };
}

function nodeMouseOut(link) {
  return function(event, d) {
    d3.select(this).classed("selected", false);
    link.classed("highlighted-link", false)
      .classed("hidden", false);
  };
}

function linkMouseOver() {
  return function(event, d) {
    Shiny.setInputValue("detail_interaction", {
      "source": {"play": d.source.play, "name": d.source.name},
      "target": {"play": d.target.play, "name": d.target.name}
    });
  };
}

function displayGraph({svgGeom, nodeData, nodeGeom, nodeGroup, linkData, linkGroup}) {
  let forceSimulation;

  function displayGraphInner(innerNodeData, innerLinkData) {
    const node = nodeGroup.selectAll("g")
      .data(innerNodeData, d => d.id)
      .join(enterNode(svgGeom, nodeGeom), update => update, exit => exit.remove());
    node.classed("in_focus", d => d.is_focal);
    const link = createLink(linkGroup, innerLinkData);
    node.on("mouseover", nodeMouseOver(link))
      .on("mouseout", nodeMouseOut(link))
      .on("click", (event, d) => {
        Shiny.setInputValue("node_on_click", {"play": d.play, "name": d.name, "ctrl": event.ctrlKey});
      });
    link.on("mouseover", linkMouseOver());
    if (forceSimulation) {
      forceSimulation.stop();
    }
    forceSimulation = createSimulation(svgGeom, innerNodeData, nodeGeom, innerLinkData);
    forceSimulation.on("tick", () => updatePositions(link, node, svgGeom));
  }

  displayGraphInner(nodeData, linkData);
}

if (Shiny) {
  class NavigatorGraphOutputBinding extends Shiny.OutputBinding {
    find(scope) {
      return scope.find(".shiny-d3-graph-output");
    }

    renderValue(el, payload) {
      const { nodes: nodeData, links: linkData, state: state } = payload;
      const svg = d3.select("#d3-graph");
      const zoomGroup = ensureG(svg.selectAll(".zoom-container"), "zoom-container");
      const linkGroup = ensureG(zoomGroup.selectAll(".links"), "links");
      const nodeGroup = ensureG(zoomGroup.selectAll(".nodes"), "nodes");
      const width = svg.node().getBoundingClientRect().width;
      const height = svg.node().getBoundingClientRect().height;
      const radiusAttr = "sentences";
      const radiusScale = getRadiusScale(nodeData, {radiusAttr: radiusAttr, minRadius: 5, maxRadius: 25});
      svg.call(createZoomBehavior(zoomGroup));
      const svgGeom = {
        width: width,
        height: height,
      };
      const nodeGeom = {
        radiusScale: radiusScale,
        radiusAttr: radiusAttr
      };
      displayGraph({svgGeom: svgGeom,
        nodeGroup: nodeGroup,
        nodeData: nodeData,
        nodeGeom: nodeGeom,
        linkGroup: linkGroup,
        linkData: linkData,
      });
    }
  }

  Shiny.outputBindings.register(new NavigatorGraphOutputBinding(), "shiny-d3-graph-output");
}
body {
  font-family: 'Inter', sans-serif;
}

.hidden {
  display: none;
}

.node circle {
  fill: #69b3a2;
  stroke: #000000;
  stroke-width: 1.5px;
  transition: all 0.2s ease-in-out; /* Smooth transition for highlight */
}

.node.selected circle {
  fill: #ff0000;
}

.node.in_focus circle {
  stroke-width: 3px;
}

.node text {
  font-size: 10px;
  pointer-events: none; /* Prevent text from interfering with drag */
  text-anchor: middle;
  fill: #333;
  transition: all 0.2s ease-in-out; /* Smooth transition for highlight */
}

.node.selected text {
  font-weight: bold;
  fill: #ff0000;
}

.link {
  /* stroke: #909090; */
  stroke-opacity: 0.6;
  stroke-width: 2px;
  fill: none; /* Ensure links are lines, not filled shapes */
}

.highlighted-link {
  stroke: #f00; /* Red color for highlighted links */
}

.tooltip {
  position: absolute;
  text-align: center;
  width: auto;
  height: auto;
  padding: 8px;
  background: lightsteelblue;
  border: 0px;
  border-radius: 8px;
  pointer-events: none;
  opacity: 0;
  transition: opacity 0.2s;
}

table {
  width: 100%;
  border-collapse: collapse;
  margin-top: 10px;
}

th, td {
  border: 1px solid #ddd;
  padding: 8px;
  text-align: left;
}

th {
  background-color: #f2f2f2;
}
Using ipycytoscape
Since Shiny provides the shinywidgets package for integrating Jupyter widgets, and ipycytoscape provides a Jupyter widget package for Cytoscape.js, the obvious option is to use these together. This should allow using Cytoscape.js in Shiny without creating a custom JavaScript component, or integrating Cytoscape.js ad hoc.
Ipyleaflet Shiny Express app
This demonstrates that we can use reactive.effect to update ipywidgets in Shiny. It's taken from https://shiny.posit.co/py/docs/jupyter-widgets.html#efficient-updates
It can be run with e.g.
shiny run --reload --log-level debug src/navigator/leaflet_express.py
This seems to work well, even though the JavaScript component involved, a full mapping user interface, is quite complex.
import ipyleaflet as ipyl
import shiny
import shinywidgets as sw
import shiny.express

from .utils import trace

# (latitude, longitude); longitudes west of Greenwich are negative
city_centers = {
    "London": (51.5074, -0.1278),
    "Paris": (48.8566, 2.3522),
    "New York": (40.7128, -74.0060),
}

shiny.express.ui.input_select("center", "Center", choices=list(city_centers.keys()))


@sw.render_widget
def map():
    return ipyl.Map(zoom=4)


@shiny.reactive.effect
@trace
def _():
    # Mutate the existing widget rather than re-rendering it
    map.widget.center = city_centers[shiny.express.input.center()]
Ipyleaflet Shiny Core app with incremental redraw
The approach here is based on the Brownian motion example in the Shiny for Python repo at https://github.com/posit-dev/py-shiny/tree/main/examples/brownian
Each call of the server function leads to creating exactly one widget, which is then registered explicitly. Subsequent events trigger reactive updates, which mutate the widget.
shiny run --reload --log-level debug src/navigator/leaflet_core.py
import ipyleaflet
import shiny
import shinywidgets

# (latitude, longitude); longitudes west of Greenwich are negative
city_centers = {
    "London": (51.5074, -0.1278),
    "Paris": (48.8566, 2.3522),
    "New York": (40.7128, -74.0060),
}

map_id = "three_cities"

app_ui = shiny.ui.page_fixed(
    shiny.ui.h2("Ipyleaflet widget, Shiny Core, incremental redraw"),
    shiny.ui.input_select("center", "Center", choices=list(city_centers)),
    shinywidgets.output_widget(map_id)
)


def server(input, _output, _session):
    # One widget per session, registered explicitly
    widget = ipyleaflet.Map(zoom=4)
    shinywidgets.register_widget(map_id, widget)

    @shiny.reactive.effect
    def _():
        widget.center = city_centers[input.center()]  # pyright: ignore[reportOptionalMemberAccess]


app = shiny.App(app_ui, server)
Cytoscape Shiny Express app with full redraw
This is a Shiny Express App that displays an ipycytoscape Graph, with number of nodes driven by a reactive input.
This partially works:
- It displays the initial cytoscape graph for up to c. 20 nodes
- It responds to events like mouseover on nodes
However it has major bugs:
- With more nodes, e.g. 30, it fails with an error
Cannot read properties of undefined (reading 'on_some_change')
- Even with only a few nodes, updating the input frequently fails with an error
Cannot read properties of undefined (reading 'on_some_change')
- These errors appear to be more common with --log-level=debug.
It can be run with
shiny run --reload --log-level debug src/navigator/cyto_express_full.py
import os

import shiny.express
import shinywidgets as sw
import networkx as nx
from ipycytoscape import CytoscapeWidget

from navigator.utils import log_clicks, log_mouseovers, trace

INITIAL_NUM_NODES = int(os.environ.get("NAVIGATOR_INITIAL_NUM_NODES", 5))

# For Shiny Express the UI elements are inferred from inline ui and render expressions.
# These cannot be made into assignment statements
shiny.express.ui.h2("Cytoscape Graph (Express API)")
shiny.express.ui.input_numeric("num_nodes", "Number of nodes:", INITIAL_NUM_NODES, min=1)


@sw.render_widget
@trace
def graph():
    num_nodes = shiny.express.input.num_nodes()
    widget = CytoscapeWidget()
    widget.on("node", "mouseover", log_mouseovers)
    widget.on("node", "click", log_clicks)
    widget.graph.add_graph_from_networkx(nx.complete_graph(num_nodes))
    return widget
Cytoscape Shiny Core app with full redraw
This is a Shiny Core App that displays an ipycytoscape Graph, with number of nodes driven by a reactive input.
It has similar issues to the corresponding Shiny Express app.
It can be run with
shiny run --reload --log-level debug src/navigator/cyto_core_full.py
import os

from shiny import App, ui, Session
import shinywidgets as sw
import networkx as nx
from ipycytoscape import CytoscapeWidget

from navigator.utils import log_clicks, log_mouseovers, trace

INITIAL_NUM_NODES = int(os.environ.get("NAVIGATOR_INITIAL_NUM_NODES", 5))

# For Shiny Core the UI elements are defined explicitly in a page container
app_ui = ui.page_fluid(
    ui.h2("Cytoscape Graph (Core API)"),
    ui.input_numeric("num_nodes", "Number of nodes:", value=INITIAL_NUM_NODES, min=1),
    sw.output_widget("graph_output"),
)


def server(input, output, session: Session):
    # Note: @sw.render_widget handles the @render.display/@output automatically
    # when the function name matches the output_widget ID.
    @sw.render_widget
    @trace
    def graph_output():
        num_nodes = input.num_nodes()
        widget = CytoscapeWidget()
        widget.on("node", "mouseover", log_mouseovers)
        widget.on("node", "click", log_clicks)
        widget.graph.add_graph_from_networkx(nx.complete_graph(num_nodes))
        return widget


app = App(app_ui, server)
Cytoscape Shiny Core app with full redraw, no networkx
This is a Shiny Core App that displays an ipycytoscape Graph, with number of nodes driven by a reactive input. The Cytoscape graph is constructed explicitly rather than using ipycytoscape's add_graph_from methods.
It has similar issues to the corresponding Shiny Core app using networkx, but they are less severe: the initial display works properly with more nodes in the graph, and changing the number of nodes generally works when there are 7 or fewer.
It can be run with
shiny run --reload --log-level debug src/navigator/cyto_core_full2.py
import os

from shiny import App, ui, Session
import shinywidgets as sw
from ipycytoscape import CytoscapeWidget

from navigator.cyto_utils import make_complete_cyto
from navigator.utils import log_clicks, log_mouseovers, trace

INITIAL_NUM_NODES = int(os.environ.get("NAVIGATOR_INITIAL_NUM_NODES", 5))

# For Shiny Core the UI elements are defined explicitly in a page container
app_ui = ui.page_fluid(
    ui.h2("Cytoscape Graph (Core API, Cytoscape graph)"),
    ui.input_numeric("num_nodes", "Number of nodes:", value=INITIAL_NUM_NODES, min=1),
    sw.output_widget("graph_output"),
)


def server(input, output, session: Session):
    # Note: @sw.render_widget handles the @render.display/@output automatically
    # when the function name matches the output_widget ID.
    @sw.render_widget
    @trace
    def graph_output():
        num_nodes = input.num_nodes()
        widget = CytoscapeWidget(graph=make_complete_cyto(num_nodes=num_nodes))
        widget.on("node", "mouseover", log_mouseovers)
        widget.on("node", "click", log_clicks)
        return widget


app = App(app_ui, server)
Cytoscape Shiny Core app with incremental redraw
import os

import networkx as nx
import shiny
import shinywidgets
from ipycytoscape import CytoscapeWidget

from navigator.cyto_utils import make_complete_cyto

INITIAL_NUM_NODES = int(os.environ.get("NAVIGATOR_INITIAL_NUM_NODES", 1))

graph_id = "graph_output"

app_ui = shiny.ui.page_fluid(
    shiny.ui.h2("Cytoscape Graph (Core API, Cytoscape graph)"),
    shiny.ui.input_numeric("num_nodes", "Number of nodes:", value=INITIAL_NUM_NODES, min=0),
    shinywidgets.output_widget(graph_id),
)


def server(input, _output, _session: shiny.Session):
    # Create the widget once per session and register it explicitly;
    # the reactive effect below then mutates its graph in place.
    widget = CytoscapeWidget(graph=make_complete_cyto(num_nodes=INITIAL_NUM_NODES))
    shinywidgets.register_widget(graph_id, widget)

    @shiny.reactive.effect
    def _():
        num_nodes = input.num_nodes()
        print(f"updating for reactive effect {num_nodes=}")
        g = widget.graph
        g.clear()
        g.add_graph_from_networkx(nx.complete_graph(num_nodes))
        print("finished updating")


app = shiny.App(app_ui, server)
Cytoscape utils
Some utilities for making graphs using ipycytoscape.
from ipycytoscape import Graph, Node, Edge
import networkx as nx


def _node_id(obj, prefix: str) -> str:
    """Return a string node id for the given object"""
    return prefix + str(obj)


def cyto_node(obj, prefix: str = "") -> Node:
    """Return a Node for the given object and prefix.

    prefix is used in setting the node ids.
    """
    return Node(data={"id": _node_id(obj, prefix)})


def cyto_edge(source, target, prefix: str = "") -> Edge:
    """Return an Edge for the given source and target.

    prefix is used in setting node ids.
    """
    return Edge(
        data={"source": _node_id(source, prefix), "target": _node_id(target, prefix)}
    )


def make_complete_cyto(num_nodes: int) -> Graph:
    """Return a complete graph with the given number of nodes."""
    nx_complete_graph = nx.complete_graph(num_nodes)
    cyto_complete_graph = Graph()
    prefix = "jmcc"
    cyto_complete_graph.add_nodes(
        [cyto_node(nx_node, prefix) for nx_node in nx_complete_graph.nodes()]
    )
    cyto_complete_graph.add_edges(
        [cyto_edge(u, v, prefix) for u, v in nx_complete_graph.edges()]
    )
    return cyto_complete_graph


def cyto_node_ids(g: Graph) -> list[str]:
    """Return a list of node ids for the graph."""
    return [n.data["id"] for n in g.nodes]
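For comparison, the graph built by make_complete_cyto above can also be written as the plain elements structure that Cytoscape.js itself consumes: a dict with "nodes" and "edges" lists whose entries each carry a "data" dict. The sketch below builds that structure with the standard library only. The helper name complete_graph_elements is hypothetical, and whether feeding this structure to ipycytoscape would avoid the problems described above was not tested.

```python
from itertools import combinations
from typing import Any


def complete_graph_elements(
    num_nodes: int, prefix: str = ""
) -> dict[str, list[dict[str, Any]]]:
    """Return Cytoscape.js-style elements for a complete graph on num_nodes nodes."""
    ids = [f"{prefix}{i}" for i in range(num_nodes)]
    nodes = [{"data": {"id": node_id}} for node_id in ids]
    # One undirected edge per unordered pair of distinct nodes
    edges = [{"data": {"source": u, "target": v}} for u, v in combinations(ids, 2)]
    return {"nodes": nodes, "edges": edges}


elements = complete_graph_elements(4, prefix="n")
print(len(elements["nodes"]), len(elements["edges"]))
```

A complete graph on n nodes has n(n-1)/2 edges, so the four-node example yields six edges.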
Conclusion
ipycytoscape is not a viable candidate to work from.
Although it has an attractive feature set, it is not nearly reliable enough to be productive to work with.
At first I thought this might be due to using the package in Shiny rather than in Jupyter, but there are basic problems in Jupyter too, e.g. see https://github.com/cytoscape/ipycytoscape/issues/322.
The functionality for manipulating graphs after they have been displayed is completely broken and has clearly never been tested on even basic use cases like the one in issue 322.
Using ipysigma
ipysigma Shiny Core app with full redraw
This is a Shiny Core App that displays an ipysigma Graph, with number of nodes driven by a reactive input.
It can be run with
shiny run --reload --log-level debug src/navigator/ipysigma_core_full.py
import os

import shinywidgets as sw
import networkx as nx
from ipysigma import Sigma
from shiny import App, ui, Session

from navigator.utils import trace

INITIAL_NUM_NODES = int(os.environ.get("NAVIGATOR_INITIAL_NUM_NODES", 5))

app_ui = ui.page_fluid(
    ui.h2("Ipysigma Graph (Core API), full redraw"),
    ui.input_numeric("num_nodes", "Number of nodes:", value=INITIAL_NUM_NODES, min=1),
    sw.output_widget("graph_output"),
)


def server(input, _output, _session: Session):  # pyright: ignore[reportUnusedFunction=false]
    @sw.render_widget
    @trace
    def graph_output():
        num_nodes = input.num_nodes()
        nx_graph = nx.complete_graph(num_nodes)
        widget = Sigma(nx_graph)
        return widget


app = App(app_ui, server)
ipysigma Shiny Core node navigator
This is a Shiny Core App that displays an ipysigma Graph, with the choice of nodes displayed driven by the user's selection of nodes on the displayed graph. Initially no nodes are selected, and in this state the entire underlying graph is shown. When the user selects a node, then the displayed graph is restricted to the selected node and its neighbors. When this happens the graph is re-rendered completely, including recomputing its layout. The layout is determined by ipysigma's default layout engine.
It can be run with
shiny run --reload --log-level debug src/navigator/ipysigma_core_navigator.py
import os
from typing import cast

import ipywidgets
import pandas as pd
import shiny
import shinywidgets
import networkx as nx
from ipysigma import Sigma
from shiny import App, ui, Session

import navigator.graph_utils as graph_utils

app_ui = ui.page_fluid(
    ui.h2("Ipysigma Graph (Core API), node navigator"),
    shinywidgets.output_widget('graph_output'),
    ui.output_text('selected_node_description'),
    ui.output_table('node_data')
)


def server(_input, _output, _session: Session):
    initial_num_nodes: int = int(os.environ.get("NAVIGATOR_INITIAL_NUM_NODES", 5))
    base_graph: nx.Graph = graph_utils.create_cycle_graph(n=initial_num_nodes)
    selected_node_reactive = shiny.reactive.Value[int | None](None)

    @shinywidgets.render_widget
    def graph_output() -> Sigma:
        """Return the ipysigma graph to render.

        Reactive reads: selected_node_reactive
        """
        selected_node = selected_node_reactive.get()
        if selected_node is None:
            # Nothing selected: show the whole underlying graph
            graph = base_graph
            extra_kwargs = {}
        else:
            # Restrict the display to the selected node and its neighbors
            graph = graph_utils.node_and_neighbors(base_graph, selected_node)
            extra_kwargs = dict(selected_node=selected_node)
        return Sigma(
            graph,
            start_layout=5,
            raw_node_size=lambda _: 24,
            raw_node_border_size=lambda _: 4,
            raw_node_border_color=lambda _: 'blue',
            show_all_labels=True,
            raw_node_label_size=lambda _: 18,
            raw_edge_size=lambda _: 4,
            raw_edge_color=lambda: 'red',
            **extra_kwargs
        )

    @shiny.render.text
    def selected_node_description() -> str:
        """Return a text description of the selected node.

        Reactive reads: graph_output.widget
        Reactive sets: selected_node_reactive
        """
        widget = cast(ipywidgets.Widget, graph_output.widget)
        selected_node_read = shinywidgets.reactive_read(widget, names="selected_node")
        selected_node: int | None = (
            int(selected_node_read) if selected_node_read is not None else None
        )
        selected_node_reactive.set(value=selected_node)
        return f"The selected node is: {selected_node}"

    @shiny.render.table
    def node_data() -> pd.DataFrame:  # pyright: ignore[reportUnusedFunction]
        """Return a DataFrame with data about the selected node.

        Reactive reads: selected_node_reactive
        """
        selected_node = selected_node_reactive.get()
        if selected_node is not None:
            # Generate some example data for the selected node
            data = {
                "Attribute": ["Node", "Degree", "Clustering Coefficient"],
                "Value": [
                    selected_node,
                    nx.degree(base_graph, selected_node),
                    nx.clustering(base_graph, selected_node)
                ]
            }
            df = pd.DataFrame(data)
        else:
            # Return an empty DataFrame as a placeholder
            df = pd.DataFrame(columns=["Attribute", "Value"])
        return df


app = App(app_ui, server)
Conclusion
The combination of ipysigma with Shiny for Python is a viable candidate to build on.
Importantly, ipysigma appears to be robust enough to work with. Unlike ipycytoscape, its basic features seem to work, and implementing the functionality needed for graph navigation was straightforward within Shiny's paradigms. Notably, the package seems to have been developed by people who use it for network analysis, rather than as an exercise in wrapping tools for others.
The main downside of the package is that it is relatively unpopular and its maintenance is sporadic: it has a small group of maintainers, a low (but non-zero) rate of commits and a significant number of outstanding bugs. There were 15 at the time of writing, some in quite basic areas like SVG export.
A secondary downside is that the general appearance is quite "geeky" and less polished than the containers available through Bokeh, Plotly, etc.
Using pyvis
pyvis Shiny Core node navigator
This is a Shiny Core App that displays a graph using pyvis.
It's based on https://github.com/posit-dev/py-shinywidgets/issues/63
It can be run with
shiny run --reload --log-level debug src/navigator/pyvis_core_navigator.py
Note that pyvis operates, as here, by saving an HTML file. Nothing in the documentation at https://pyvis.readthedocs.io/en/latest/documentation.html indicates that any other mode is supported.
import os
from pathlib import Path

import shiny
import networkx as nx
from pyvis.network import Network
from shiny import App, ui

import navigator.graph_utils as graph_utils

# Use a static_assets folder for holding the Network()'s html file
WWW = Path(__file__).resolve().parent.parent / "www"
PYVIS_OUTPUT_ID = "pyvis"

app_ui = ui.page_fluid(
    ui.output_ui(PYVIS_OUTPUT_ID),
)


def server(input: shiny.Inputs, output: shiny.Outputs, _session: shiny.Session):
    initial_num_nodes: int = int(os.environ.get("NAVIGATOR_INITIAL_NUM_NODES", 5))

    @output(id=PYVIS_OUTPUT_ID)
    @shiny.render.ui
    def _():
        G: nx.Graph = graph_utils.create_cycle_graph(n=initial_num_nodes)
        net = Network()
        net.from_nx(G)
        net.toggle_drag_nodes(True)
        f = WWW / f"{PYVIS_OUTPUT_ID}.html"
        new_content = net.generate_html(local=False)
        try:
            current_content = f.read_text()
        except FileNotFoundError:
            current_content = None
        # avoid triggering an endless loop of reloads because a watched file changes
        if new_content != current_content:
            f.write_text(new_content)
        return ui.tags.iframe(
            src=PYVIS_OUTPUT_ID + ".html",
            style="height:600px;width:100%;",
            scrolling="no",
            seamless="seamless",
            frameBorder="0",
        )


app = App(app_ui, server, static_assets=WWW)
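The guard against reload loops in the app above (only rewrite the HTML file when its content has actually changed) can be isolated into a small helper. This is a sketch using only the standard library; the name write_if_changed is hypothetical and does not appear in the app.

```python
import tempfile
from pathlib import Path


def write_if_changed(path: Path, new_content: str) -> bool:
    """Write new_content to path only if it differs from what is already there.

    Returns True if the file was written, False if it was left untouched.
    """
    try:
        current_content = path.read_text()
    except FileNotFoundError:
        current_content = None
    if new_content == current_content:
        return False
    path.write_text(new_content)
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "pyvis.html"
    first = write_if_changed(target, "<html></html>")   # file does not exist yet
    second = write_if_changed(target, "<html></html>")  # identical content
    print(first, second)
```

Leaving the file untouched when the content is identical keeps its modification time stable, which is what stops a file watcher such as the one behind --reload from restarting the app on every render.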
Conclusion
pyvis is not a useful option given that it operates by writing external HTML files, although the underlying vis.js library might be useful.
Utilities
Generic utilities
We will need some utilities for debugging.
import functools
import sys
from pathlib import Path


def trace(func):
    @functools.wraps(func)
    def traced(*args, **kwargs):
        func_name = func.__name__
        arg_str = ", ".join(repr(arg) for arg in args)
        kwarg_str = ", ".join(f"{key}={repr(value)}" for key, value in kwargs.items())
        all_args = ", ".join(filter(None, [arg_str, kwarg_str]))
        print(f"TRACE: Entering {func_name}({all_args})")
        try:
            result = func(*args, **kwargs)
            print(f"TRACE: Exiting {func_name} with result: {repr(result)}")
            return result
        except Exception as e:
            print(f"TRACE: Exiting {func_name} with exception: {e=}")
            raise

    return traced


def log_mouseovers(node):
    print(f"mouseover: {node}")


def log_clicks(node):
    print(f"click: {node}")


def print_traits(widget):
    """Print the traits for an ipython widget"""
    print([t for t in widget.traits() if not str(t).startswith('_')])


def bind_to_name(name: str):
    """
    A decorator that binds the decorated function to a specified string name
    at the module/global level. The function will be accessible via its
    original name and the new bound name.

    Args:
        name: The name to bind the function to at global scope.
    """
    if not isinstance(name, str):
        raise TypeError("The name argument must be a string.")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)

        globals()[name] = func
        return wrapper

    return decorator
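As a usage illustration, the trace decorator can be applied to any function to log its calls. The block below repeats a condensed copy of trace so that it runs on its own; the function scale is a made-up example.

```python
import functools


def trace(func):
    """Condensed copy of the trace decorator above, repeated for self-containment."""

    @functools.wraps(func)
    def traced(*args, **kwargs):
        arg_str = ", ".join(repr(arg) for arg in args)
        kwarg_str = ", ".join(f"{key}={value!r}" for key, value in kwargs.items())
        all_args = ", ".join(filter(None, [arg_str, kwarg_str]))
        print(f"TRACE: Entering {func.__name__}({all_args})")
        try:
            result = func(*args, **kwargs)
        except Exception as e:
            print(f"TRACE: Exiting {func.__name__} with exception: {e=}")
            raise
        print(f"TRACE: Exiting {func.__name__} with result: {result!r}")
        return result

    return traced


@trace
def scale(value, factor=2):
    """Hypothetical example function."""
    return value * factor


scaled = scale(3, factor=5)
# prints:
# TRACE: Entering scale(3, factor=5)
# TRACE: Exiting scale with result: 15
```

Because the wrapper uses functools.wraps, the decorated function keeps its original name, which matters when shinywidgets matches a render function's name against an output ID.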
Graph utilities
We'll need some generic graph utilities; we'll build these on networkx for now.
from typing import Any

import networkx as nx


def create_cycle_graph(n) -> nx.Graph:
    """
    Creates a circular graph with nodes valued from 0 to n-1,
    where each node is connected to its successor and predecessor mod n.

    Args:
        n: The number of nodes in the graph.

    Returns:
        A NetworkX graph object representing the circular graph.
    """
    G = nx.cycle_graph(n)
    return G


def node_and_neighbors(graph: nx.Graph, node: Any) -> nx.Graph:
    """
    Returns a subgraph consisting of the given node and all its immediate neighbors.

    Args:
        graph: The input NetworkX undirected graph.
        node: The node within the graph for which to get the subgraph.

    Returns:
        A NetworkX graph object representing the subgraph containing
        the specified node and its neighbors.

    Raises:
        nx.NetworkXError: If the specified node is not in the graph.
    """
    if node not in graph:
        raise nx.NetworkXError(f"Node {node} is not in the graph.")
    subgraph_nodes = list(graph.neighbors(node)) + [node]
    return graph.subgraph(subgraph_nodes)
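To make concrete what node_and_neighbors returns on the cycle graphs used by the navigator apps, the sketch below reproduces the same logic on plain adjacency sets, so that it runs without networkx. The names cycle_adjacency and neighborhood are hypothetical stand-ins for the utilities above, not replacements for them.

```python
def cycle_adjacency(n: int) -> dict[int, set[int]]:
    """Adjacency sets for a cycle on nodes 0..n-1 (assumes n >= 3)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def neighborhood(adjacency: dict[int, set[int]], node: int) -> dict[int, set[int]]:
    """Return the subgraph induced by node and its immediate neighbours."""
    if node not in adjacency:
        raise KeyError(f"Node {node} is not in the graph.")
    keep = adjacency[node] | {node}
    # Keep only edges whose endpoints are both in the retained node set
    return {u: adjacency[u] & keep for u in keep}


sub = neighborhood(cycle_adjacency(5), 0)
print(sub)
```

For a five-node cycle and node 0, the result contains nodes 0, 1 and 4, with edges from 0 to each neighbour but no edge between 1 and 4, since they are not adjacent in the original cycle.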
Shakespeare data
This is a Python wrapper for a Shakespeare dataset downloaded from Kaggle. I did some minor cleanup on the main CSV file.
import re
import warnings
from dataclasses import dataclass
from itertools import combinations
from typing import Any, Iterable

import pandas as pd
import networkx as nx
from thefuzz import process as thefuzz_process


def calculate_candidate_character_replacements(
    character_names: pd.Series, min_similarity_score: int
) -> dict[str, str]:
    """Calculate candidate replacements to clean a Series of character names by
    standardizing strings using fuzzy matching.

    Args:
        character_names: The input Series of character names.
        min_similarity_score: The minimum fuzzy matching score (0-100) to consider
            a string a potential match.

    Returns:
        A mapping from current character names to cleaned ones, where these are
        different
    """

    def strip_name(name: str) -> str:
        name = name.strip()
        name = re.sub(r" +", " ", name)
        name = re.sub(r"[^a-zA-Z]+$", "", name)
        return name

    def best_match(
        name: str, stripped_names: Iterable[str], min_similarity_score: int
    ) -> str:
        stripped_name = strip_name(name)
        extracted = thefuzz_process.extractOne(stripped_name, stripped_names)
        if extracted is None:
            return stripped_name
        best_match, score = extracted
        if score < min_similarity_score:
            return stripped_name
        else:
            return best_match

    replacements = {}
    replacement_values = set()
    for name in character_names.value_counts().index:  # most frequent values first
        replacements[name] = best_match(name, replacement_values, min_similarity_score)
        replacement_values.add(replacements[name])
    return {
        name: mapped_name
        for name, mapped_name in replacements.items()
        if name != mapped_name
    }


@dataclass(frozen=True)
class NodeId:
    play: str
    name: str


class CharacterNetwork:
    REQUIRED_COLUMNS = "play_name,genre,character,act,scene,sentence,text,sex".split(
        ","
    )
    KNOWN_CHARACTER_FIXES: dict[str, dict[str, str]] = {
        "Richard III": {
            "Of Buckingham": "Ghost of Buckingham",
            "Of Prince Edward": "Ghost of Prince Edward",
        },
        "Henry VI, part 1": {"Su Ffolk": "Suffolk"},
        "Othello": {"Second Gentlemen": "Second Gentleman"},
        "Romeo and Juliet": {"Lady Capulet": "Lady Capulet"},
        "Hamlet": {
            "Guildenstern:": "Guildenstern",
            "Rosencrantz:": "Rosencrantz",
        },
        "Taming of the Shrew": {"Katarina": "Katharina"},
        "Henry VI, part 2": {"First Murder": "First Murderer"},
        "Measure for Measure": {"Pomphey": "Pompey"},
    }
    KNOWN_CHARACTER_NON_FIXES: set[tuple[str, str]] = {
        ("Antony and Cleopatra", "Attendants"),
        ("Coriolanus", "Citizen"),
        ("Henry IV, part 2", "King Henry V"),
        ("Henry VI, part 2", "Servant"),
        ("Timon of Athens", "Servants"),
    }
    # Names not referring to definite characters. These are not so interesting to look at
    # and may represent different people.
    INDEFINITE_NAMES = [
        "& C", "A Lord", "A Patrician", "A Player", "All", "All Citizens",
        "All Conspirators", "All Ladies", "All Lords", "All Servants",
        "All The Goths", "All The Lords", "All The People", "Another",
        "As Long As You Or I", "Attendant", "Attendants", "Both Citizens",
        "Both Murderers", "Both Tribunes", "Boy", "Captain", "Children", "Chorus",
        "Citizen", "Citizens", "Clown", "Commons", "Court", "Courtezan", "Crier",
        "Gentleman", "Gentlemen", "Gentlewoman", "Girl", "Guard", "Knight",
        "Knights", "Lady", "Lieutenant", "Lord", "Lords", "Man", "Merchant",
        "Messenger", "Mother", "Musician", "Nobleman", "Nurse", "Officer",
        "Old Athenian", "Old Lady", "Old Man", "Outlaws", "Page", "Players",
        "Prince", "Princes", "Princess", "Prologue", "Sailor", "Scout", "Senator",
        "Senators", "Sergeant", "Servant", "Servants", "Shepard", "Shepherd",
        "Sheriff", "Soldier", "Soldiers", "Some Others", "Some Speak",
        "Soothsayer", "Steward", "Townsman", "Travellers", "Watch", "Watchman",
        "Wife",
    ]

    def __init__(self, df: pd.DataFrame, min_similarity_score: int = 92):
        """Initializes the CharacterNetwork with a dataframe containing character
        interactions in Shakespeare's plays.

        Nodes represent a unique character within a specific play
        (e.g., 'play-name-character'). Edges connect characters who speak in the
        same scene.

        Args:
            df: The dataframe, with columns:
                - play_name
                - genre
                - act
                - scene
                - character
                - sex
                - sentence
                - text
            min_similarity_score: minimum Levenshtein-based similarity score at
                which a warning is issued about matching character names
        """
        self.df = CharacterNetwork.clean_playdata(df, min_similarity_score)
        self.graph = CharacterNetwork._build_shakespeare_network(self.df)
        self.play_names = sorted(set(df["play_name"]))

    @staticmethod
    def clean_playdata(df: pd.DataFrame, min_similarity_score: int) -> pd.DataFrame:
        """Return a cleaned version of df.

        Character names are replaced based on previously checked replacements.
        Warnings are issued if there are any candidate replacements not among those
        previously known or ruled out, based on a fuzzy-matching analysis of the
        character names.

        Args:
            df: The dataframe of play data to clean.
            min_similarity_score: the threshold above which character names are
                candidates for consolidation. Corresponds to the score returned
                from `thefuzz.process.extractOne`.
        """
        assert set(df.columns) == set(CharacterNetwork.REQUIRED_COLUMNS)

        # 1. Apply previously known fixes
        dfr = df.copy()
        dfr["character"] = dfr.groupby("play_name")["character"].transform(
            lambda s: s.replace(
                CharacterNetwork.KNOWN_CHARACTER_FIXES.get(s.name) or {}
            )
        )

        # 2. Warn if there are any candidate fixes not previously known
        play_mappings = dfr.groupby("play_name").apply(
            lambda group_df: calculate_candidate_character_replacements(
                group_df["character"], min_similarity_score=min_similarity_score
            )
        )
        assert isinstance(play_mappings, pd.Series)
        play_mappings = {
            play: {
                original: replacement
                for original, replacement in replacements.items()
                if (play, original) not in CharacterNetwork.KNOWN_CHARACTER_NON_FIXES
            }
            for play, replacements in play_mappings.items()
        }
        for play, replacements in play_mappings.items():
            if replacements:
                warnings.warn(
                    f"Candidate character replacements found in play {play}: {replacements}"
                )
        assert set(dfr.columns) == set(CharacterNetwork.REQUIRED_COLUMNS)

        # Remove entries with names not referring to definite characters
        dfr = dfr[~dfr["character"].isin(CharacterNetwork.INDEFINITE_NAMES)]
        return dfr

    @staticmethod
    def _build_shakespeare_network(df: pd.DataFrame) -> nx.Graph:
        assert set(df.columns) == set(CharacterNetwork.REQUIRED_COLUMNS)

        # Add the number of unique scenes.
        scenes_per_character = (
            df.groupby(["play_name", "character", "act", "scene"])
            .size()
            .reset_index(name="_count")
            .groupby(["play_name", "character"])
            .size()
            .reset_index(name="num_scenes")
        )

        # Group by play, character, and sex to get a summary for each character node.
        character_data = (
            df.groupby(["play_name", "character", "sex"])
            .agg(num_sentences=("sentence", "count"), num_acts=("act", "nunique"))
            .reset_index()
        ).merge(scenes_per_character, on=["play_name", "character"])

        # Prepare a dictionary for node attributes, keyed by the unique node ID.
        node_attributes: dict[NodeId, dict[str, Any]] = {}
        for _, row in character_data.iterrows():
            node_id = NodeId(play=row["play_name"], name=row["character"])
            node_attributes[node_id] = {
                "play": row["play_name"],
                "character": row["character"],
                "sex": row["sex"],
                "acts": row["num_acts"],
                "scenes": row["num_scenes"],
                "sentences": row["num_sentences"],
            }

        # Process Edge Data
        interactions = (
            df.groupby(["play_name", "act", "scene"])["character"]
            .apply(set)
            .reset_index(name="characters_in_scene")
        )
        edge_attributes: dict[tuple[str, str, str], dict[str, Any]] = {}
        sentences_by_scene_and_character = df.groupby(
            ["play_name", "act", "scene", "character"]
        ).size()
        for _, row in interactions.iterrows():
            play_name = row["play_name"]
            act = row["act"]
            scene = row["scene"]
            characters_in_scene = row["characters_in_scene"]
            # Get all unique pairs of characters in this scene. Sorting gives each
            # pair the same (char1, char2) order in every scene, so a pair always
            # maps to a single edge key regardless of set iteration order.
            for char1, char2 in combinations(sorted(characters_in_scene), 2):
                edge_key = (play_name, char1, char2)
                sentences_char1 = sentences_by_scene_and_character.loc[
                    play_name, act, scene, char1
                ]
                sentences_char2 = sentences_by_scene_and_character.loc[
                    play_name, act, scene, char2
                ]
                if edge_key not in edge_attributes:
                    edge_attributes[edge_key] = {
                        "play_name": play_name,
                        "interactions": [],
                    }
                edge_attributes[edge_key]["interactions"].append(
                    {
                        "act": act,
                        "scene": scene,
                        "sentences_char1": int(sentences_char1),
                        "sentences_char2": int(sentences_char2),
                    }
                )

        # Build the NetworkX Graph
        G = nx.Graph()
        for node_id, attributes in node_attributes.items():
            G.add_node(node_id, **attributes)
        for edge_key, attributes in edge_attributes.items():
            play, char1, char2 = edge_key
            G.add_edge(
                NodeId(play=play, name=char1),
                NodeId(play=play, name=char2),
                **attributes,
            )
        return G


# Example usage:
# character_network = CharacterNetwork(pd.read_csv(data_dir / 'shakespeare_plays.csv'))
# play_names = character_network.play_names
# sdg = character_network.graph
# print(f"jm - {sdg=}")
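The rule that edges connect characters who speak in the same scene can be illustrated on toy data with the standard library alone. The sketch below uses made-up play and character names and is not the author's implementation. It sorts each scene's characters before pairing them, which is one way to give every pair a single canonical key however the underlying sets happen to iterate.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (play, act, scene) -> characters who speak in that scene
scenes = {
    ("Toy Play", 1, 1): {"A", "B", "C"},
    ("Toy Play", 1, 2): {"B", "A"},
}

# (play, char1, char2) -> list of (act, scene) in which both characters speak
edge_scenes: dict[tuple[str, str, str], list[tuple[int, int]]] = defaultdict(list)
for (play, act, scene), characters in scenes.items():
    # sorted() fixes the order within each pair, so ("A", "B") never also
    # appears as ("B", "A")
    for char1, char2 in combinations(sorted(characters), 2):
        edge_scenes[(play, char1, char2)].append((act, scene))

for edge, where in sorted(edge_scenes.items()):
    print(edge, where)
```

Here A and B share two scenes and so accumulate two entries on one edge, while the pairs involving C get one entry each.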
Footnotes:
1. Overview, Shiny for Python, https://shiny.posit.co/py/docs/overview.html
2. Shiny for Python and JavaScript: How to Add JS Scripts to Your Dashboards - Appsilon, https://www.appsilon.com/post/shiny-for-python-javascript
3. Custom JavaScript component, Shiny for Python, https://shiny.posit.co/py/docs/custom-component-one-off.html
4. Custom components package, Shiny for Python, https://shiny.posit.co/py/docs/custom-components-pkg.html
5. Cytoscape.js and Cytoscape, Cytoscape User Manual, https://manual.cytoscape.org/en/stable/Cytoscape.js_and_Cytoscape.html