Field FavorDistributedInference
Determines whether the runtime should prefer splitting computations across multiple GPUs.
public static bool FavorDistributedInference
Returns
- bool

true to favor multi-GPU distributed inference; false to use a single GPU. The default is false.
Remarks
When true, the default tensor distribution will attempt to spread work
across all available GPUs via LM.TensorDistribution, which can improve
throughput on large models by parallelizing tensor operations.
When false (the default), inference runs entirely on a single GPU,
avoiding the overhead of inter-GPU communication. This can be more efficient for
smaller models or when GPU-to-GPU bandwidth is limited.
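The trade-off described above can be sketched in a short usage example. Note that this page documents only the static bool member itself; the enclosing class name (RuntimeOptions below) and the surrounding model-loading code are assumptions for illustration, not the library's confirmed API:

```csharp
// Hypothetical enclosing class name -- the page shows only
// "public static bool FavorDistributedInference".

// Large model, multiple GPUs available: opt into distributed inference
// so tensor work is split across devices (per the Remarks, via
// LM.TensorDistribution). Set this before loading the model.
RuntimeOptions.FavorDistributedInference = true;

// Small model, or limited GPU-to-GPU bandwidth: keep the default so
// inference stays on a single GPU and avoids inter-GPU communication
// overhead.
RuntimeOptions.FavorDistributedInference = false; // default
```

Because the field is static, the setting applies process-wide; toggle it before model loading rather than between inference calls.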