Yes it is. world_size refers to the total number of GPUs and, in general (with exceptions, most commonly EP), is simply the product of all parallelism dimensions. data_parallel_size = dp_shard_size * dp_replicate_size, and this quantity (the dp_world_size) denotes how many distinct batches you dispatch per step. The number of GPUs each batch runs on (i.e. the degree of model parallelism) is the product of the non-data-parallel sizes: non_data_parallel_size = tp_size * cp_size * sp_size * pp_size. Combining the two gives world_size = data_parallel_size * non_data_parallel_size.
All the assumptions above hold only under certain constraints, which can be broken by e.g. EP, which in the most common implementations borrows ranks from dp_shard_size.
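The arithmetic above can be sketched as a quick sanity check. This is a minimal illustration, not tied to any particular framework; the variable names and example degrees are assumptions mirroring the ones in the answer:

```python
# Illustrative parallelism degrees (hypothetical values).
dp_shard_size = 4      # FSDP-style sharded data parallel degree
dp_replicate_size = 2  # DDP-style replicated data parallel degree
tp_size = 2            # tensor parallel
cp_size = 1            # context parallel
sp_size = 1            # sequence parallel
pp_size = 2            # pipeline parallel

# How many distinct batches are dispatched per step:
data_parallel_size = dp_shard_size * dp_replicate_size

# How many GPUs each batch runs on (degree of model parallelism):
non_data_parallel_size = tp_size * cp_size * sp_size * pp_size

# Total GPU count is the product of the two.
world_size = data_parallel_size * non_data_parallel_size
print(data_parallel_size, non_data_parallel_size, world_size)  # 8 4 32
```

Note this check no longer holds verbatim once EP enters the picture, since EP typically reuses ranks from the dp_shard dimension rather than adding a new factor.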