Exports
=======
vConTACT3 generates several outputs to visualize and analyze genome clustering results. These are grouped into four
main categories:
- Graph-based outputs for network visualization
- Cluster profiles for shared protein content
- Other auxiliary outputs for completeness and intersections
- ANI-based tables
Graph-based Outputs
===================
Supported formats:
- **Cytoscape**: JSON-formatted network with annotations, suitable for direct import into Cytoscape
- **GraphML**: XML-based network format for use in numerous other network/graph-based tools
- **D3js**: Interactive HTML network explorer (EXPERIMENTAL)
- **Cosmograph**: Network visualization with annotated metadata using Cosmograph
Reminder: the `--breaks` option splits every one of these exports into groups of components of roughly equal size.
The reference databases alone contain 5-10K genomes, so D3js and Cytoscape may struggle with the full network unless
this option is enabled.
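
Before opening a GraphML export in a visualization tool, it can help to check how large the network is and how many
components it contains. The following is a minimal sketch using only the Python standard library. The function name
`summarize_graphml` and the small inline graph are illustrative assumptions, not vConTACT3 output.

.. code-block:: python

   import xml.etree.ElementTree as ET

   # Namespace defined by the GraphML specification.
   GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"

   # Hand-written stand-in for a real export (hypothetical node ids).
   EXAMPLE_GRAPHML = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
     <graph id="G" edgedefault="undirected">
       <node id="genome_A"/>
       <node id="genome_B"/>
       <node id="genome_C"/>
       <node id="genome_D"/>
       <edge source="genome_A" target="genome_B"/>
       <edge source="genome_B" target="genome_C"/>
     </graph>
   </graphml>"""


   def summarize_graphml(text):
       """Return (node count, edge count, component sizes, largest first)."""
       root = ET.fromstring(text)
       nodes = [n.get("id") for n in root.iter(GRAPHML_NS + "node")]
       edges = [(e.get("source"), e.get("target"))
                for e in root.iter(GRAPHML_NS + "edge")]

       # Union-find over node ids groups nodes into connected components.
       parent = {n: n for n in nodes}

       def find(x):
           while parent[x] != x:
               parent[x] = parent[parent[x]]  # path halving
               x = parent[x]
           return x

       for a, b in edges:
           parent[find(a)] = find(b)

       sizes = {}
       for n in nodes:
           root_id = find(n)
           sizes[root_id] = sizes.get(root_id, 0) + 1
       return len(nodes), len(edges), sorted(sizes.values(), reverse=True)


   n_nodes, n_edges, component_sizes = summarize_graphml(EXAMPLE_GRAPHML)
   print(n_nodes, n_edges, component_sizes)  # prints: 4 2 [3, 1]

For a real export, the file contents would be read from disk and passed to the same function. A long list of
component sizes is a hint that splitting the export with `--breaks` may make it easier to view.
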
Example: Cytoscape Network
--------------------------
Generated with `--exports cytoscape`

.. image:: images/cytoscape.png
   :alt: Example Cytoscape network output from vConTACT3
   :width: 70%
   :align: center

To load these files, open Cytoscape and go to "File --> Import --> Network from file...".
Afterward, go to "Layout --> Apply Preferred Layout", or choose any other layout that seems suitable. Computing the
layout may take a while. When it finishes, the network view window should show one large network with many smaller
networks next to it.

Note: Precomputed layouts are not included in the export because calculating node positions can take a very long time.
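
The JSON file can also be inspected outside Cytoscape, for example to see how many nodes carry a given annotation.
The sketch below assumes the export follows the Cytoscape.js layout (an `elements` object holding `nodes` and
`edges`, each with a `data` dictionary). That layout, the attribute name `family`, and the example values are
assumptions for illustration and should be checked against an actual export.

.. code-block:: python

   import json
   from collections import Counter

   # Hypothetical miniature network in the Cytoscape.js JSON layout.
   EXAMPLE_CYJS = json.dumps({
       "elements": {
           "nodes": [
               {"data": {"id": "genome_A", "family": "FamilyX"}},
               {"data": {"id": "genome_B", "family": "FamilyX"}},
               {"data": {"id": "genome_C"}},
           ],
           "edges": [
               {"data": {"source": "genome_A", "target": "genome_B"}},
           ],
       }
   })


   def tally_node_attribute(cyjs_text, attribute):
       """Count nodes per value of one annotation; nodes lacking it are 'unassigned'."""
       network = json.loads(cyjs_text)
       nodes = network["elements"]["nodes"]
       return Counter(node["data"].get(attribute, "unassigned") for node in nodes)


   counts = tally_node_attribute(EXAMPLE_CYJS, "family")
   print(counts.most_common())  # prints: [('FamilyX', 2), ('unassigned', 1)]
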
Example: Cosmograph Export
--------------------------
Generated with `--exports cosmograph`

.. image:: images/cosmograph.png
   :alt: Example Cosmograph network output from vConTACT3
   :width: 70%
   :align: center

A Cosmograph export showing clustered viral genomes based on shared gene content.
Example: D3js Interactive Network
---------------------------------
(EXPERIMENTAL)
Generated with `--exports d3js`

.. image:: images/d3js.png
   :alt: Example D3js network output from vConTACT3
   :width: 70%
   :align: center

These are network/graph files suitable for rendering and interaction in a browser. There are at least 2 files for every
exported group:
- part#.d3.json
- part#.d3.json.html
At the moment, nodes can be colored by attribute. More options are planned.
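
When an export is split into many groups, a short script can confirm that every JSON file has its HTML companion.
The sketch below follows the naming pattern listed above. The helper name `pair_d3_exports` is hypothetical, and
it is an assumption that `#` in `part#` stands for a number. The demonstration writes empty placeholder files to a
temporary directory rather than using real exports.

.. code-block:: python

   import tempfile
   from pathlib import Path


   def pair_d3_exports(export_dir):
       """Map each *.d3.json file name to its *.d3.json.html companion, or None."""
       export_dir = Path(export_dir)
       pairs = {}
       for json_path in sorted(export_dir.glob("*.d3.json")):
           html_path = json_path.with_name(json_path.name + ".html")
           pairs[json_path.name] = html_path.name if html_path.exists() else None
       return pairs


   # Demonstration with placeholder files; part2 deliberately lacks its HTML file.
   with tempfile.TemporaryDirectory() as tmp:
       for name in ("part1.d3.json", "part1.d3.json.html", "part2.d3.json"):
           (Path(tmp) / name).write_text("{}")
       pairs = pair_d3_exports(tmp)

   print(pairs)
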
Cluster Profiles
================
Generated with `--exports profiles`
Cluster profiles (also known as PC profiles, or simply profiles) are genome-by-PC matrices in which "1" marks the
presence of a protein cluster (PC) in a genome and "0" marks its absence. The rank(s) to profile are selected with
`--target-rank`. If the user selects 'order' and 'family', a profile is exported for every predicted order and every
predicted family. To reduce the number of potentially "insignificant" groups, groups can be filtered by a minimum
number of member genomes with `--target-members`.

For every predicted group of the selected rank, the following files are generated:
- rank_rank-name.svg (SVG heatmap)
- rank_rank-name.csv (presence/absence of PCs)
Exported formats:
- **SVG**: Visual profiles of shared protein clusters per taxonomic rank.
- **CSV**: Tabular summaries of protein cluster distributions.
As mentioned in :doc:`Inputs and Parameters `, selecting lower ranks (e.g. subfamily and genus) and/or using a large
input dataset can result in hundreds or thousands of file pairs.
Example: Family-Level Protein Cluster Profile
---------------------------------------------

.. image:: images/family_Mesyanzhinovviridae.svg
   :alt: Example family-level protein cluster profile
   :width: 70%
   :align: center

A protein cluster profile visualizing shared gene content across genomes in the family *Mesyanzhinovviridae*.
Example: Family-Level Protein Cluster Profile CSV Export
--------------------------------------------------------

.. csv-table:: pc_profiles/rank_rank-name.csv
   :header: ,CLU0003308,CLU0003651,CLU0003652,CLU0003653,CLU0003654,CLU0004868,CLU0004870,CLU0004871,CLU0004872

   NC_026594.1,0,0,0,0,0,0,1,0,0
   NC_028770.1,0,0,0,0,0,0,1,0,0
   NC_042115.1,0,0,0,0,0,0,0,0,0
   NC_052965.1,0,0,0,0,0,0,1,0,0
   MT664984.1,0,0,0,0,0,0,0,0,0
   NC_028931.1,0,0,0,0,0,0,0,0,0
   ON932079.1,0,0,0,0,0,0,0,0,0
   OP361299.1,0,0,0,0,0,0,0,0,0
   OP380269.1,0,0,0,0,0,0,0,0,0
   NC_047876.1,1,0,0,1,0,1,1,0,1
   NC_047877.1,1,0,0,1,0,1,1,0,1
   NC_047878.1,1,0,0,1,0,1,1,0,1
   NC_047879.1,1,0,0,1,0,1,1,0,1
   NC_007809.1,1,1,1,1,1,1,1,0,1
   NC_010116.1,1,0,0,1,0,1,1,1,1
   NC_018282.1,1,0,0,1,0,1,1,1,1
   NC_028980.1,1,1,0,1,0,1,1,1,1
   NC_041934.1,1,0,1,1,1,1,1,1,1
   ON529854.1,0,0,0,0,0,0,0,0,0

A CSV summary of protein cluster presence/absence per genome in the family *Mesyanzhinovviridae*.
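
Because the profile CSV is a plain 0/1 matrix, it is straightforward to post-process. As a minimal sketch, the code
below loads four rows copied from the table above, counts the PCs present in each genome, and lists the PCs found in
every genome. The function names are illustrative and not part of vConTACT3. A real profile would be read from the
exported file instead of an inline string.

.. code-block:: python

   import csv
   import io

   # Four rows copied from the example table above.
   PROFILE_CSV = """\
   ,CLU0003308,CLU0003651,CLU0003652,CLU0003653,CLU0003654,CLU0004868,CLU0004870,CLU0004871,CLU0004872
   NC_026594.1,0,0,0,0,0,0,1,0,0
   NC_047876.1,1,0,0,1,0,1,1,0,1
   NC_007809.1,1,1,1,1,1,1,1,0,1
   NC_010116.1,1,0,0,1,0,1,1,1,1
   """


   def load_profile(text):
       """Parse a profile CSV into (list of PC ids, {genome: {pc: 0 or 1}})."""
       reader = csv.reader(io.StringIO(text))
       pcs = next(reader)[1:]  # first header cell is empty
       profile = {}
       for row in reader:
           if row:
               profile[row[0]] = dict(zip(pcs, (int(v) for v in row[1:])))
       return pcs, profile


   def pcs_per_genome(profile):
       """Number of PCs marked present in each genome."""
       return {genome: sum(flags.values()) for genome, flags in profile.items()}


   def pcs_in_all_genomes(pcs, profile):
       """PCs marked present in every genome of the profile."""
       return [pc for pc in pcs if all(flags[pc] == 1 for flags in profile.values())]


   pcs, profile = load_profile(PROFILE_CSV)
   print(pcs_per_genome(profile))
   print(pcs_in_all_genomes(pcs, profile))  # prints: ['CLU0004870']
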
ANI-based genome similarities
=============================
Users of vConTACT3 versions 3.1.0+ (database version 230+) can generate average nucleotide identities (ANI) between
genomes within predicted families.
Other Outputs
=============
Supported formats:
- **Completeness (CSV)**: Estimated genome completeness based on genus-level groups
- **PyUpSet (SVG/PNG)**: Intersection plot of realms
- **Newick (NWK)**: Genome dendrograms
- **Centroids**: Rank-group centroids
Example: Completeness Report (CSV)
----------------------------------

.. csv-table:: completeness.csv
   :header: ,Status,Group,Core Genes,Core Gene Coverage,Genes (calc),Genes (ref),PC Coverage (95% CI low),PC Coverage (95% CI high)

   NC_003214.2,Ref,Betalipothrixvirus,31,100.0,40,73,92.4,62.1
   NC_010152.1,Ref,Betalipothrixvirus,31,100.0,63,66,145.5,97.8999
   NC_010153.1,Ref,Betalipothrixvirus,31,100.0,43,57,99.3,66.8

Estimated completeness of each genome relative to its genus-level group, useful for downstream prioritization.
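
One possible way to use this report for prioritization is to keep only genomes whose core gene coverage reaches a
chosen threshold. The sketch below reads the three rows shown above. The function name and the threshold values are
assumptions for illustration, not a recommended cutoff.

.. code-block:: python

   import csv
   import io

   # Rows copied from the example table above.
   COMPLETENESS_CSV = """\
   ,Status,Group,Core Genes,Core Gene Coverage,Genes (calc),Genes (ref),PC Coverage (95% CI low),PC Coverage (95% CI high)
   NC_003214.2,Ref,Betalipothrixvirus,31,100.0,40,73,92.4,62.1
   NC_010152.1,Ref,Betalipothrixvirus,31,100.0,63,66,145.5,97.8999
   NC_010153.1,Ref,Betalipothrixvirus,31,100.0,43,57,99.3,66.8
   """


   def genomes_meeting_core_coverage(text, min_core_coverage):
       """Return genome ids whose 'Core Gene Coverage' is at least the threshold."""
       reader = csv.DictReader(io.StringIO(text))
       selected = []
       for row in reader:
           if float(row["Core Gene Coverage"]) >= min_core_coverage:
               selected.append(row[""])  # genome id sits in the unnamed first column
       return selected


   # Hypothetical threshold of 90; all three example genomes pass.
   kept = genomes_meeting_core_coverage(COMPLETENESS_CSV, 90.0)
   print(kept)
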
Example: PyUpSet Plot of Realms
-------------------------------
Forthcoming
Example: Newick
---------------
Forthcoming
Example: Rank-Group Centroids
-----------------------------
Forthcoming