Multi-engine n-body gravity simulation

2025-03-13

The D3D12nBodyGravity sample demonstrates how to do compute work asynchronously. The sample spins up a number of threads, each with its own compute command queue, and schedules compute work on the GPU that performs an n-body gravity simulation. Each thread operates on two buffers full of position and velocity data. With each iteration, the compute shader reads the current position and velocity data from one buffer and writes the next iteration into the other buffer. When the iteration completes, the compute thread swaps which buffer is the SRV for reading position/velocity data and which is the UAV for writing position/velocity updates by changing the resource state on each buffer.
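The read-one-buffer, write-the-other pattern described above can be sketched on the CPU without any Direct3D dependency. This is only an illustration of the idea, not the sample's compute shader: `Body`, `Step`, and `PingPong` are hypothetical names, and the physics assumes unit masses and a gravitational constant of 1.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for one particle.
struct Body {
    float px, py, pz;  // position
    float vx, vy, vz;  // velocity
};

// One simulation step. Reads every body from `in` and writes the updated
// bodies to `out`; `in` is never modified, which mirrors the SRV (read-only)
// and UAV (write) roles of the two GPU buffers.
// Assumption: unit masses and G = 1. Use softening > 0 if bodies may coincide.
inline void Step(const std::vector<Body>& in, std::vector<Body>& out,
                 float dt, float softening)
{
    out.resize(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        float ax = 0.0f, ay = 0.0f, az = 0.0f;
        for (std::size_t j = 0; j < in.size(); ++j) {
            if (i == j) continue;
            const float dx = in[j].px - in[i].px;
            const float dy = in[j].py - in[i].py;
            const float dz = in[j].pz - in[i].pz;
            const float distSq = dx * dx + dy * dy + dz * dz + softening;
            const float invDist = 1.0f / std::sqrt(distSq);
            const float invDist3 = invDist * invDist * invDist;
            ax += dx * invDist3;
            ay += dy * invDist3;
            az += dz * invDist3;
        }
        Body b = in[i];
        b.vx += ax * dt;  b.vy += ay * dt;  b.vz += az * dt;
        b.px += b.vx * dt; b.py += b.vy * dt; b.pz += b.vz * dt;
        out[i] = b;
    }
}

// Two buffers plus an index that says which one is currently readable.
struct PingPong {
    std::array<std::vector<Body>, 2> buffers;
    int readIndex = 0;  // plays the role of the SRV index

    void Iterate(float dt, float softening) {
        Step(buffers[readIndex], buffers[1 - readIndex], dt, softening);
        readIndex = 1 - readIndex;  // the freshly written buffer becomes readable
    }

    const std::vector<Body>& Current() const { return buffers[readIndex]; }
};
```

Because a step never writes to its input, a reader (the render thread in the sample) can keep using the readable buffer while the next iteration is produced in the other one.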

Create the root signatures
Create the SRV and UAV buffers
Create the CBV and vertex buffers
Synchronize the rendering and compute threads
Run the sample
Related topics

Create the root signatures

We start by creating both a graphics and a compute root signature in the LoadAssets method. Both root signatures have a root constant buffer view (CBV) and a shader resource view (SRV) descriptor table. The compute root signature also has an unordered access view (UAV) descriptor table.

       // Create the root signatures.
       {
              CD3DX12_DESCRIPTOR_RANGE ranges[2];
              ranges[0].Init(D3D12_DESCRIPTOR_RANGE_TYPE_SRV, 1, 0);
              ranges[1].Init(D3D12_DESCRIPTOR_RANGE_TYPE_UAV, 1, 0);

              CD3DX12_ROOT_PARAMETER rootParameters[RootParametersCount];
              rootParameters[RootParameterCB].InitAsConstantBufferView(0, 0, D3D12_SHADER_VISIBILITY_ALL);
              rootParameters[RootParameterSRV].InitAsDescriptorTable(1, &ranges[0], D3D12_SHADER_VISIBILITY_VERTEX);
              rootParameters[RootParameterUAV].InitAsDescriptorTable(1, &ranges[1], D3D12_SHADER_VISIBILITY_ALL);

              // The rendering pipeline does not need the UAV parameter.
              CD3DX12_ROOT_SIGNATURE_DESC rootSignatureDesc;
              rootSignatureDesc.Init(_countof(rootParameters) - 1, rootParameters, 0, nullptr, D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT);

              ComPtr<ID3DBlob> signature;
              ComPtr<ID3DBlob> error;
              ThrowIfFailed(D3D12SerializeRootSignature(&rootSignatureDesc, D3D_ROOT_SIGNATURE_VERSION_1, &signature, &error));
              ThrowIfFailed(m_device->CreateRootSignature(0, signature->GetBufferPointer(), signature->GetBufferSize(), IID_PPV_ARGS(&m_rootSignature)));

              // Create compute signature. Must change visibility for the SRV.
              rootParameters[RootParameterSRV].ShaderVisibility = D3D12_SHADER_VISIBILITY_ALL;

              CD3DX12_ROOT_SIGNATURE_DESC computeRootSignatureDesc(_countof(rootParameters), rootParameters, 0, nullptr);
              ThrowIfFailed(D3D12SerializeRootSignature(&computeRootSignatureDesc, D3D_ROOT_SIGNATURE_VERSION_1, &signature, &error));

              ThrowIfFailed(m_device->CreateRootSignature(0, signature->GetBufferPointer(), signature->GetBufferSize(), IID_PPV_ARGS(&m_computeRootSignature)));
       }

Call flow	Parameters
CD3DX12_DESCRIPTOR_RANGE	D3D12_DESCRIPTOR_RANGE_TYPE
CD3DX12_ROOT_PARAMETER	D3D12_SHADER_VISIBILITY
CD3DX12_ROOT_SIGNATURE_DESC	D3D12_ROOT_SIGNATURE_FLAGS
ID3DBlob
D3D12SerializeRootSignature	D3D_ROOT_SIGNATURE_VERSION
CreateRootSignature
CD3DX12_ROOT_SIGNATURE_DESC
D3D12SerializeRootSignature	D3D_ROOT_SIGNATURE_VERSION
CreateRootSignature
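The graphics root signature above is built from `_countof(rootParameters) - 1` entries, which only omits the UAV table if that table occupies the last slot of the array. The article does not show the enumeration behind `RootParameterCB`, `RootParameterSRV`, and `RootParameterUAV`; the sketch below assumes the ordering shown and uses hypothetical helpers (`GraphicsParameterCount`, `ComputeParameterCount`, `IsBound`) to make the trimming explicit.

```cpp
#include <cassert>
#include <cstdint>

// Assumed ordering (not shown in the article): the UAV slot comes last.
enum RootParameters : std::uint32_t {
    RootParameterCB = 0,
    RootParameterSRV,
    RootParameterUAV,
    RootParametersCount
};

// The graphics signature uses every parameter except the trailing UAV table.
constexpr std::uint32_t GraphicsParameterCount() { return RootParametersCount - 1; }

// The compute signature uses the full array.
constexpr std::uint32_t ComputeParameterCount() { return RootParametersCount; }

// True when `slot` is part of a signature built from the first `count` entries.
constexpr bool IsBound(RootParameters slot, std::uint32_t count) {
    return static_cast<std::uint32_t>(slot) < count;
}

// Trimming the array by one entry only drops the UAV if it is the last slot.
static_assert(RootParameterUAV == RootParametersCount - 1,
              "UAV must be the last root parameter for the -1 trim to work");
```

Sharing one parameter array between the two signatures keeps the CBV and SRV slot indices identical in both pipelines; only the SRV's shader visibility changes between them.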

Create the SRV and UAV buffers

The SRV and UAV buffers consist of an array of position and velocity data.

       // Position and velocity data for the particles in the system.
       // Two buffers full of Particle data are utilized in this sample.
       // The compute thread alternates writing to each of them.
       // The render thread renders using the buffer that is not currently
       // in use by the compute shader.
       struct Particle
       {
              XMFLOAT4 position;
              XMFLOAT4 velocity;
       };

Call flow	Parameters
XMFLOAT4
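As a rough illustration of what "an array of position and velocity data" means in bytes, the following sketch reproduces the `Particle` layout with a plain `Float4` stand-in for `XMFLOAT4` (four packed floats) so that it compiles without the DirectXMath headers. `ParticleBufferBytes` is a hypothetical helper, and the particle count used in the usage note is an arbitrary example, not a value taken from the sample.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for DirectX::XMFLOAT4: four packed 32-bit floats.
struct Float4 { float x, y, z, w; };

// Same member layout as the sample's Particle struct.
struct Particle {
    Float4 position;
    Float4 velocity;
};

// Each element is 32 bytes: this is the stride between consecutive particles.
static_assert(sizeof(Particle) == 32, "two float4 values with no padding");

// Bytes needed for ONE position/velocity buffer holding `particleCount`
// particles. The sample keeps two such buffers per compute thread.
constexpr std::uint64_t ParticleBufferBytes(std::uint32_t particleCount) {
    return static_cast<std::uint64_t>(particleCount) * sizeof(Particle);
}
```

For example, `ParticleBufferBytes(10000)` evaluates to 320000, so a pair of buffers for 10,000 particles would occupy 640,000 bytes.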

Create the CBV and vertex buffers

For the graphics pipeline, the CBV is a struct containing two matrices used by the geometry shader. The geometry shader takes the position of each particle in the system and generates a quad to represent it using these matrices.

       struct ConstantBufferGS
       {
              XMMATRIX worldViewProjection;
              XMMATRIX inverseView;

              // Constant buffers are 256-byte aligned in GPU memory. Padding is added
              // for convenience when computing the struct's size.
              float padding[32];
       };

Call flow	Parameters
XMMATRIX
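The padding comment in `ConstantBufferGS` can be checked with a little arithmetic: two 4x4 float matrices take 128 bytes, and 32 padding floats add another 128, giving exactly 256. The sketch below uses a plain `Matrix4x4` stand-in for `XMMATRIX` (64 bytes of data) so that it builds without DirectXMath; `AlignUp` and `kConstantBufferAlignment` are hypothetical names for the more general approach of rounding a size up instead of padding by hand.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for XMMATRIX: sixteen floats, 64 bytes.
struct Matrix4x4 { float m[16]; };

struct ConstantBufferGS {
    Matrix4x4 worldViewProjection;  // 64 bytes
    Matrix4x4 inverseView;          // 64 bytes
    float padding[32];              // 128 bytes, brings the total to 256
};

// Direct3D 12 requires constant buffer data to be placed on 256-byte boundaries.
constexpr std::size_t kConstantBufferAlignment = 256;

// Rounds `size` up to the next multiple of `alignment` (alignment must be > 0).
constexpr std::size_t AlignUp(std::size_t size, std::size_t alignment) {
    return (size + alignment - 1) / alignment * alignment;
}

static_assert(sizeof(ConstantBufferGS) == 256, "padding should complete one 256-byte block");
static_assert(AlignUp(2 * sizeof(Matrix4x4), kConstantBufferAlignment) == sizeof(ConstantBufferGS),
              "rounding the unpadded size up gives the same result");
```

With explicit padding, `sizeof(ConstantBufferGS)` can be used directly as the per-frame stride; with `AlignUp`, the struct can stay unpadded and the stride is computed where the buffer is allocated.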

As a result, the vertex buffer used by the vertex shader doesn't contain any positional data.

       // "Vertex" definition for particles. Triangle vertices are generated
       // by the geometry shader. Color data will be assigned to those 
       // vertices via this struct.
       struct ParticleVertex
       {
              XMFLOAT4 color;
       };

Call flow	Parameters
XMFLOAT4

For the compute pipeline, the CBV is a struct containing some constants used by the n-body gravity simulation in the compute shader.

       struct ConstantBufferCS
       {
              UINT param[4];
              float paramf[4];
       };
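The article does not say what each slot of `param` and `paramf` holds. Purely as an illustration of how such a generic constant block might be filled, the sketch below assumes a hypothetical assignment: particle count and thread-group count in the integer slots, time step and damping in the float slots. `GroupCount` and `MakeSimulationConstants` are invented helpers, and the group size of 128 in the usage note is an example value.

```cpp
#include <cassert>
#include <cstdint>

// Same shape as the sample's ConstantBufferCS (UINT is a 32-bit unsigned integer).
struct ConstantBufferCS {
    std::uint32_t param[4];
    float paramf[4];
};

static_assert(sizeof(ConstantBufferCS) == 32, "four uints plus four floats");

// Number of thread groups needed to cover `itemCount` items when each group
// processes `groupSize` items (ceiling division; groupSize must be > 0).
constexpr std::uint32_t GroupCount(std::uint32_t itemCount, std::uint32_t groupSize) {
    return (itemCount + groupSize - 1) / groupSize;
}

// HYPOTHETICAL slot assignment, chosen for this sketch only:
//   param[0]  = particle count      param[1]  = thread-group count
//   paramf[0] = time step           paramf[1] = damping factor
inline ConstantBufferCS MakeSimulationConstants(std::uint32_t particleCount,
                                                std::uint32_t groupSize,
                                                float timeStep,
                                                float damping)
{
    ConstantBufferCS cb = {};  // unused slots stay zero
    cb.param[0] = particleCount;
    cb.param[1] = GroupCount(particleCount, groupSize);
    cb.paramf[0] = timeStep;
    cb.paramf[1] = damping;
    return cb;
}
```

With 10,000 particles and a group size of 128, `GroupCount` returns 79: the last group is only partly filled, which is why a shader using this scheme would also need the particle count to skip out-of-range indices.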

Synchronize the rendering and compute threads

After the buffers are all initialized, the rendering and compute work begins. The compute thread changes the state of the two position/velocity buffers back and forth between SRV and UAV as it iterates on the simulation, and the rendering thread needs to ensure that the work it schedules on the graphics pipeline operates on the buffer that is currently the SRV. Fences are used to synchronize access to the two buffers.

On the render thread:

// Render the scene.
void D3D12nBodyGravity::OnRender()
{
       // Let the compute thread know that a new frame is being rendered.
       for (int n = 0; n < ThreadCount; n++)
       {
              InterlockedExchange(&m_renderContextFenceValues[n], m_renderContextFenceValue);
       }

       // Compute work must be completed before the frame can render or else the SRV 
       // will be in the wrong state.
       for (UINT n = 0; n < ThreadCount; n++)
       {
              UINT64 threadFenceValue = InterlockedGetValue(&m_threadFenceValues[n]);
              if (m_threadFences[n]->GetCompletedValue() < threadFenceValue)
              {
                     // Instruct the rendering command queue to wait for the current 
                     // compute work to complete.
                     ThrowIfFailed(m_commandQueue->Wait(m_threadFences[n].Get(), threadFenceValue));
              }
       }

       // Record all the commands we need to render the scene into the command list.
       PopulateCommandList();

       // Execute the command list.
       ID3D12CommandList* ppCommandLists[] = { m_commandList.Get() };
       m_commandQueue->ExecuteCommandLists(_countof(ppCommandLists), ppCommandLists);

       // Present the frame.
       ThrowIfFailed(m_swapChain->Present(0, 0));

       MoveToNextFrame();
}

Call flow	Parameters
InterlockedExchange
InterlockedGetValue
GetCompletedValue
Wait
ID3D12CommandList
ExecuteCommandLists
IDXGISwapChain1::Present1
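The check in `OnRender` relies on a fence being a counter that only moves forward: work is "done" once the completed value reaches the value that was signaled after it. The following CPU-only model is a sketch of that idea, not the Direct3D API. `MockFence` and `RenderMustWait` are hypothetical names; the real `ID3D12Fence` is signaled by a command queue, and the real wait is issued on the GPU timeline with `ID3D12CommandQueue::Wait`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ID3D12Fence: a monotonically increasing counter.
class MockFence {
public:
    std::uint64_t GetCompletedValue() const {
        return completed_.load(std::memory_order_acquire);
    }

    // Models the GPU reaching a Signal command on a queue.
    void Signal(std::uint64_t value) {
        assert(value >= GetCompletedValue());  // this sketch only moves forward
        completed_.store(value, std::memory_order_release);
    }

private:
    std::atomic<std::uint64_t> completed_{0};
};

// Mirrors the comparison in OnRender: the render queue only needs to wait
// when the compute fence has not yet reached the last value that the compute
// thread said it would signal.
inline bool RenderMustWait(const MockFence& computeFence,
                           std::uint64_t lastSubmittedValue)
{
    return computeFence.GetCompletedValue() < lastSubmittedValue;
}
```

In this model, a render thread would read the last submitted value for each compute thread, call `RenderMustWait`, and queue a wait only when it returns true; skipping the wait when the fence has already passed the target avoids inserting unnecessary stalls.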

To simplify the sample a bit, the compute thread waits for the GPU to complete each iteration before scheduling any more compute work. In practice, applications will likely want to keep the compute queue full to achieve maximum performance from the GPU.

On the compute thread:

DWORD D3D12nBodyGravity::AsyncComputeThreadProc(int threadIndex)
{
       ID3D12CommandQueue* pCommandQueue = m_computeCommandQueue[threadIndex].Get();
       ID3D12CommandAllocator* pCommandAllocator = m_computeAllocator[threadIndex].Get();
       ID3D12GraphicsCommandList* pCommandList = m_computeCommandList[threadIndex].Get();
       ID3D12Fence* pFence = m_threadFences[threadIndex].Get();

       while (0 == InterlockedGetValue(&m_terminating))
       {
              // Run the particle simulation.
              Simulate(threadIndex);

              // Close and execute the command list.
              ThrowIfFailed(pCommandList->Close());
              ID3D12CommandList* ppCommandLists[] = { pCommandList };

              pCommandQueue->ExecuteCommandLists(1, ppCommandLists);

              // Wait for the compute shader to complete the simulation.
              UINT64 threadFenceValue = InterlockedIncrement(&m_threadFenceValues[threadIndex]);
              ThrowIfFailed(pCommandQueue->Signal(pFence, threadFenceValue));
              ThrowIfFailed(pFence->SetEventOnCompletion(threadFenceValue, m_threadFenceEvents[threadIndex]));
              WaitForSingleObject(m_threadFenceEvents[threadIndex], INFINITE);

              // Wait for the render thread to be done with the SRV so that
              // the next frame in the simulation can run.
              UINT64 renderContextFenceValue = InterlockedGetValue(&m_renderContextFenceValues[threadIndex]);
              if (m_renderContextFence->GetCompletedValue() < renderContextFenceValue)
              {
                     ThrowIfFailed(pCommandQueue->Wait(m_renderContextFence.Get(), renderContextFenceValue));
                     InterlockedExchange(&m_renderContextFenceValues[threadIndex], 0);
              }

              // Swap the indices to the SRV and UAV.
              m_srvIndex[threadIndex] = 1 - m_srvIndex[threadIndex];

              // Prepare for the next frame.
              ThrowIfFailed(pCommandAllocator->Reset());
              ThrowIfFailed(pCommandList->Reset(pCommandAllocator, m_computeState.Get()));
       }

       return 0;
}

Call flow	Parameters
ID3D12CommandQueue
ID3D12CommandAllocator
ID3D12GraphicsCommandList
ID3D12Fence
InterlockedGetValue
Close
ID3D12CommandList
ExecuteCommandLists
InterlockedIncrement
Signal
SetEventOnCompletion
WaitForSingleObject
InterlockedGetValue
GetCompletedValue
Wait
InterlockedExchange
ID3D12CommandAllocator::Reset
ID3D12GraphicsCommandList::Reset
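The compute thread above blocks on the CPU after every iteration. One possible way to keep more work queued, sketched here under stated assumptions, is to allow a bounded number of submissions in flight and only block when that bound is reached. `SubmissionWindow` is a hypothetical helper that does pure bookkeeping on fence values; it does not call Direct3D. In a real application each in-flight submission would also need its own command allocator, because an allocator cannot be reset while the GPU is still executing command lists recorded from it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Tracks fence values of submissions that may still be running on the GPU.
class SubmissionWindow {
public:
    explicit SubmissionWindow(std::size_t maxInFlight) : max_(maxInFlight) {
        assert(maxInFlight > 0);
    }

    // Call before submitting new work. `completedValue` is the fence's current
    // completed value. Returns the fence value the CPU must wait for before
    // submitting, or 0 if no wait is needed. The new submission is then
    // recorded under the next fence value.
    std::uint64_t Submit(std::uint64_t completedValue) {
        // Retire submissions that the GPU has already finished.
        while (!pending_.empty() && pending_.front() <= completedValue) {
            pending_.pop_front();
        }

        std::uint64_t mustWaitFor = 0;
        if (pending_.size() >= max_) {
            // Window is full: the oldest submission has to finish first.
            mustWaitFor = pending_.front();
            pending_.pop_front();
        }

        pending_.push_back(++next_);
        return mustWaitFor;
    }

    // Fence value to signal on the queue after the latest submission.
    std::uint64_t LastSubmitted() const { return next_; }

private:
    std::size_t max_;
    std::uint64_t next_ = 0;
    std::deque<std::uint64_t> pending_;
};
```

A window of 1 reproduces the behavior of the sample (every submission waits for the previous one), while a larger window lets the CPU record ahead of the GPU. Commands on a single queue still execute in submission order, so the read/write alternation between the two buffers is preserved.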

Run the sample

Figure: a screenshot of the final n-body gravity simulation.

Related topics

D3D12 Code Walk-Throughs

Multi-engine synchronization
