Memory Usage & Performance Considerations
=========================================

vConTACT3 processes large-scale viral genome datasets, which can be resource-intensive. Memory and runtime are difficult to predict precisely for a given dataset, but this page outlines what to expect.

Typical Resource Usage
----------------------

The following are **rough** estimates based on dataset size and complexity (based on 40 cores and 1.5% dataset sparsity):

- **20,000 genomes**: ~5-7 GB RAM, 15 min
- **50,000 genomes**: ~9-12 GB RAM, 25 min
- **150,000 genomes**: 50-60 GB RAM, 1 hr 30 min
- **400,000 genomes**: 175-225 GB RAM, 4-5 hr
- **750,000 genomes**: 700-800 GB RAM, 16 hr

Runtime scales with numerous factors: dataset size, dataset sparsity, the number of user genomes co-localized with references, and, most importantly, the size of the largest connected component in the network.

Current and future work by the vConTACT3 team is primarily focused on *reducing* the memory required, so check back for updates.

Performance Optimization Strategies
-----------------------------------

- **Reduce Memory Consumption**

  Use the `--reduce-memory` flag to downcast clustering arrays to `float16`, reducing RAM usage by ~50%.

  .. code-block:: bash

     vcontact3 run --reduce-memory --nucleotide genomes.fna --output results_dir

  This is only a stop-gap measure. Depending on dataset complexity, a 50% memory saving may not be enough to offset the increase in memory requirements.

- **Split Large Exports**

  Use `--breaks` to chunk large network exports into smaller parts, improving file handling and visualization speed.

  .. code-block:: bash

     vcontact3 run --breaks 5 --nucleotide genomes.fna --output results_dir

  Keep in mind that this does not split network components: a component with 100K nodes will always have 100K nodes.

- **Adjust CPU Count**

  Several portions of vConTACT3 are multithreaded. Set the maximum number of CPUs to use with `--threads`.

  .. code-block:: bash

     vcontact3 run --threads 64 --nucleotide genomes.fna --output results_dir

Related Documentation
---------------------

- See :doc:`Results ` for details on standard output files.
- Refer to the :doc:`CLI Reference ` for parameter descriptions.