Inputs & Parameters

vConTACT3 has numerous options but requires only one of two inputs: a single nucleotide file in FASTA format, or, if genes have already been called, a protein FASTA file plus a gene-to-genome mapping. All other options have defaults, which are used when not specified.

Inputs

--nucleotide Path to a FASTA-formatted nucleotide file. Selecting this option will enable the gene-calling tool and disable --proteins.

Note

Avoid using mmseq in the input filename (e.g. my-mmseqs-results.fna), as the temporary file cleanup may inadvertently match the input file as if it were a temporary file. See FAQ & Troubleshooting for details.

--proteins FASTA file of predicted proteins. Requires --gene2genome, while --len-nucleotide is optional (see below).

--gene2genome TSV or parquet file linking protein IDs to genome IDs. Required when using --proteins. Expected columns: protein_id, genome_id, and optionally keywords (filled with None if absent).
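If the proteins were called with a Prodigal-style tool, a gene-to-genome TSV can usually be derived from the protein FASTA headers. The following is a minimal sketch, not part of vConTACT3: it assumes that each protein ID is the genome ID followed by an underscore and a gene index (e.g. phageA_12), and that the file should carry a header row with the protein_id and genome_id columns described above. The function names are hypothetical.

```python
import csv


def genome_id_from_protein(protein_id: str) -> str:
    """Strip the trailing _<gene index> (assumed Prodigal-style naming)."""
    genome_id, _, gene_index = protein_id.rpartition("_")
    if not genome_id or not gene_index.isdigit():
        raise ValueError(f"Cannot infer genome ID from {protein_id!r}")
    return genome_id


def write_gene2genome(protein_fasta: str, out_tsv: str) -> int:
    """Write one protein_id/genome_id row per FASTA record; return the row count."""
    rows = 0
    with open(protein_fasta) as fasta, open(out_tsv, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow(["protein_id", "genome_id"])
        for line in fasta:
            if not line.startswith(">"):
                continue
            tokens = line[1:].split()
            if not tokens:
                continue  # skip empty headers
            # The protein ID is the first whitespace-delimited token after '>'.
            protein_id = tokens[0]
            writer.writerow([protein_id, genome_id_from_protein(protein_id)])
            rows += 1
    return rows
```

If the protein IDs follow a different naming scheme, only genome_id_from_protein needs to change. The optional keywords column is left out here, since it is filled with None when absent.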

--len-nucleotide TSV or parquet file mapping genome IDs to nucleotide lengths. Only applicable when using --proteins, and optional even then; when --nucleotide is provided, genome lengths are computed automatically from the sequences. If omitted, Size (Kb) will be NaN in the output and the ANI export will be disabled. Accepts either a length column in base pairs (converted to Kb automatically) or a Size (Kb) column.
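When the original nucleotide sequences are still available, a genome-length table for protein mode can be produced with a few lines of Python. This is a sketch under stated assumptions rather than a vConTACT3 utility: the length column (base pairs) comes from the description above, while the genome_id column name and the presence of a header row are assumptions, and the function names are hypothetical.

```python
import csv


def fasta_lengths(nucleotide_fasta: str) -> dict[str, int]:
    """Return {sequence ID: length in bp} for every record in a FASTA file."""
    lengths: dict[str, int] = {}
    current = None
    with open(nucleotide_fasta) as fasta:
        for line in fasta:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                tokens = line[1:].split()
                # The ID is the first whitespace-delimited token after '>'.
                current = tokens[0] if tokens else None
                if current is not None:
                    lengths[current] = 0
            elif current is not None:
                lengths[current] += len(line)
    return lengths


def write_len_nucleotide(nucleotide_fasta: str, out_tsv: str) -> int:
    """Write genome_id/length rows (length in bp); return the number of genomes."""
    lengths = fasta_lengths(nucleotide_fasta)
    with open(out_tsv, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow(["genome_id", "length"])  # 'genome_id' is an assumed name
        for genome_id, n_bp in lengths.items():
            writer.writerow([genome_id, n_bp])
    return len(lengths)
```

The genome IDs written here must match those used in the gene-to-genome file for the lengths to be linked to the right genomes.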

--output Path to the output directory. Defaults to vConTACT3_results/.
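The relationships between the input options can be summarised as a small pre-flight check. The sketch below only restates the rules described in this section; it is not how vConTACT3 validates its arguments, and the function name and messages are hypothetical.

```python
from typing import List, Optional


def check_inputs(
    nucleotide: Optional[str] = None,
    proteins: Optional[str] = None,
    gene2genome: Optional[str] = None,
    len_nucleotide: Optional[str] = None,
) -> List[str]:
    """Return human-readable problems with a combination of input options."""
    problems = []
    if not nucleotide and not proteins:
        problems.append("provide either --nucleotide or --proteins")
    if nucleotide and proteins:
        # --nucleotide enables gene calling and disables --proteins.
        problems.append("--nucleotide disables --proteins; the protein input would not be used")
    if proteins and not gene2genome:
        problems.append("--proteins requires --gene2genome")
    if len_nucleotide and not proteins:
        problems.append("--len-nucleotide only applies when using --proteins")
    return problems


# Protein mode without a gene-to-genome mapping is flagged.
print(check_inputs(proteins="proteins.faa"))
```

An empty list means the combination is consistent with the rules above. Note that protein mode without --len-nucleotide is valid, but Size (Kb) will then be NaN and the ANI export disabled.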

Key Parameters

Though not required, these are the most frequently used parameters.

--threads Number of CPU cores to use. Defaults to all available cores.

--max-iterations Number of iterations to use when resolving mixed-realm components/clusters. Increase to reduce the chance of mixed-realm components remaining in the results. Default: 3.

--reduce-memory Reduce memory usage by downcasting arrays to float16 (~50% savings).
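The trade-off behind downcasting can be illustrated with the standard library alone. This sketch assumes the arrays would otherwise be stored as 32-bit floats, which is an assumption and not stated above; it shows the per-value size difference and the rounding that a float16 round trip introduces. The helper name is hypothetical.

```python
import struct


def roundtrip_float16(value: float) -> float:
    """Pack a value as IEEE 754 half precision and read it back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


bytes_f32 = struct.calcsize("<f")  # bytes per 32-bit float
bytes_f16 = struct.calcsize("<e")  # bytes per 16-bit float
savings = 1 - bytes_f16 / bytes_f32

print(f"per-value savings: {savings:.0%}")
print(f"0.1 stored as float16 reads back as {roundtrip_float16(0.1)!r}")
```

The memory saving comes at the cost of precision, so values that differ only in later decimal places may become indistinguishable after downcasting.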

--distance-metric The distance metric used between genomes in the gene sharing network. Options: SqRoot (default), VirClust, Shorter, Jaccard.

--breaks Splits large networks/graphs into smaller chunks during export.

--db-path Path to a specific database version file or directory. Defaults to using the latest version.

--db-domain Specify domain: archaea, bacteria, prokaryotes, or eukaryotes.

--exports Specify which export types to generate (e.g., graphml, cytoscape, profiles). See Exports for details.

Advanced Parameters

--verbose Increase logging verbosity (INFO, WARN, ERROR, DEBUG).

--keep-temp Preserve intermediate files (mostly those produced by MMseqs2).

For a full list of command-line options, see the CLI Reference.