DX12 and threading - Graphics and GPU Programming - GameDev.net [PDF]


DX12 and threading


By lubbe75, December 7, 2017 in Graphics and GPU Programming


Posted December 7, 2017

lubbe75 (Member)

Being new to DirectX 12 I am looking for examples on how to use threading. I have done lots of OpenGL in the past and some DirectX, but with DX12 the threading magic is gone and I understand that threading is crucial to get good performance. In my project I currently have one thread doing it all. I have one command list, one command allocator, one bundle and one bundle allocator. I also have a million triangles, so it's about time that I start doing this.

How do I split things up? How many threads should I use? How many command lists and allocators?

I realize this is a beginner's question, but I have to begin somewhere. I would be grateful if someone could point me in a direction where I could find a simple code sample, tutorial or something similar.

Thanks!

Posted December 7, 2017

Hodgman (Moderator)

On December 7, 2017 at 10:08 AM, lubbe75 said:
    I also have a million triangles, so it's about time that I start doing this.

Number of triangles is irrelevant to the CPU - how many draw calls do you have? If it's thousands, you may get some benefit from using multiple threads to record the draw commands. In my experience, with fewer than around a thousand draws, there's not much benefit in threaded draw submission.

On December 7, 2017 at 10:08 AM, lubbe75 said:
    How many threads should I use?

Most engines these days make a pool of one thread per CPU core, and then split all of their workloads up amongst that pool. So on a quad core, I'd use a max of 4 threads, and as above, also no more than around (draws/1000)+1 threads.
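As a rough illustration of that rule of thumb, the cap on recording threads could be computed like this. It is only a sketch: the function name `RecordingThreadCount` is made up here, and the 1000-draws-per-thread figure is the poster's estimate rather than a measured constant.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the thread): caps the number of recording
// threads at the core count and at roughly one thread per thousand draw
// calls, plus one, following the rule of thumb quoted above.
std::size_t RecordingThreadCount(std::size_t drawCalls, std::size_t cpuCores)
{
    assert(cpuCores >= 1);
    const std::size_t byDraws = drawCalls / 1000 + 1;
    return std::min(cpuCores, byDraws);
}
```

On a quad core this gives 1 thread for 500 draws, 3 threads for 2500 draws and 4 threads for 10000 draws. If the core count comes from `std::thread::hardware_concurrency()`, keep in mind that it reports logical cores and may return 0 when the value is unknown, so it needs a fallback before being passed in.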

Posted December 8, 2017

MJP (Moderator)

We have a work-stealing task scheduler that spawns 1 thread for every core on the CPU (minus 1 for the main thread). Then we create a bunch of tasks for groups of draw calls, and throw them at the task scheduler. We've tried both 1 thread per logical core (Intel CPUs with hyperthreading have 2 logical cores for every physical core) as well as 1 thread per physical core, and we've generally found trying to run our task scheduler threads on both logical cores to be somewhat counterproductive. But your mileage may vary. AMD has some code here that can show you how to query the relevant CPU information.

Writing your own task scheduler can be quite a bit of work (especially fixing all of the bugs!), but it can also be very educational. There's a pretty good series of articles here that can get you started. There are also third-party libraries like Intel's Threading Building Blocks (which is very comprehensive, but also a bit complex and very heavyweight), or Doug Binks' enkiTS (which is simple and lightweight, but doesn't have fancier high-level features). Windows also has a built-in thread pool API, but I've never used it myself so I can't really vouch for its effectiveness in a game engine scenario.

My general advice for starting on multithreaded programming is to carefully plan out which data will be touched by each separate task. IMO the easiest (and fastest!) way to have multiple threads work effectively is to make sure that they never touch the same data, or at least do so as infrequently as possible. If you have lots of shared things it can get messy, slow, and error-prone very quickly if you have to manually wrap things in critical sections. Also keep in mind that *reading* data from multiple threads is generally fine, and it's *writing* to the same data that usually gets you in trouble. So it can help to figure out exactly which data is immutable during a particular phase of execution, and perhaps also enforce that through judicious use of the "const" keyword.
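The "never touch the same data" advice can be sketched in a few lines of standard C++. This is a minimal illustration and not the scheduler described above: it uses plain `std::thread` instead of a work-stealing pool, and `DrawCall`, `RecordedBatch` and `RecordInParallel` are hypothetical stand-ins for real draw data and per-thread command lists.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct DrawCall { int meshId; };                          // hypothetical draw data
struct RecordedBatch { std::vector<int> recordedMeshIds; }; // stand-in for one command list

// The input is shared but const (read-only); each worker writes only to its
// own RecordedBatch, so no locks or critical sections are needed.
std::vector<RecordedBatch> RecordInParallel(const std::vector<DrawCall>& draws,
                                            std::size_t workerCount)
{
    assert(workerCount >= 1);
    std::vector<RecordedBatch> batches(workerCount);
    std::vector<std::thread> workers;
    const std::size_t perWorker = (draws.size() + workerCount - 1) / workerCount;

    for (std::size_t w = 0; w < workerCount; ++w) {
        // Contiguous, non-overlapping slice of the draw list for worker w.
        const std::size_t begin = std::min(w * perWorker, draws.size());
        const std::size_t end = std::min(begin + perWorker, draws.size());
        workers.emplace_back([&draws, &batches, w, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                batches[w].recordedMeshIds.push_back(draws[i].meshId);
            }
        });
    }
    for (std::thread& t : workers) {
        t.join();
    }
    return batches;
}
```

In a D3D12 renderer each batch would correspond to a command list with its own allocator, and the main thread would submit the recorded lists in order once every worker has finished.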

Posted December 8, 2017

lubbe75 (Member)

Thanks for the tips and the links!

After reading a bit more I get the idea that threading is mainly for recording command lists. Is this correct? Would this also include executing command lists?

Before adding threads, will I benefit anything from using multiple command lists, command allocators or command queues?

I have read somewhere that using multiple command allocators can increase performance since I may not have to wait as often before recording the next frame. I guess it's a matter of experimenting with the number of allocators that would be needed in my case.

Would using multiple command lists or multiple command queues have the same effect as using multiple allocators, or will this only make sense with multi-threading?

I'm currently in a stage where my Dx9 renderer is about 20 times faster than my Dx12 renderer, so I'm guessing it's mainly multi-threading that is missing. Do you know any other obvious and common beginner mistakes when starting with Dx12?

Posted December 9, 2017

MJP (Moderator)

Before messing around with threading, 1 thing you'll want to do is make sure that the CPU and GPU are working in parallel. When starting out with DX12, you'll probably have things set up like this:

Record command list for frame 0 -> submit command list for frame 0 -> wait for GPU to process frame 0 (by waiting on a fence) -> record command list for frame 1

If you do it this way the GPU will be idle while the CPU is doing work, and the CPU will be idle while the GPU is doing work. To make sure that the CPU and GPU are pipelined (both working at the same time), you need to do it like this:

Record command list for frame 0 -> submit command list for frame 0 -> record command list for frame 1 -> submit command list for frame 1 -> wait for the GPU to finish frame 0 -> record command list for frame 2

With this setup the GPU will effectively be a frame behind the CPU, but your overall throughput (framerate) will be higher since the CPU and GPU will be working concurrently instead of in lockstep. The big catch is that since the CPU is preparing the next frame while the GPU is actively processing commands, you need to be careful not to modify things that the GPU is reading from. This is where the "multiple command allocators" thing comes in: if you switch back and forth between two allocators, you'll always be modifying one command allocator while the GPU is reading from the other one. The same concept applies to things like constant buffers that are written to by the CPU.

Once you've got that working, you can look into splitting things up into multiple command lists that are recorded by multiple threads. Without multiple threads there's no reason to have more than 1 command list unless you're also submitting to multiple queues.

Multi-queue is quite complicated, and is definitely an advanced topic. COPY queues are generally useful for initializing resources like textures. COMPUTE queues can be useful for GPUs that support concurrently processing compute commands alongside graphics commands, which can result in higher overall throughput in certain scenarios. They can also be useful for cases where the compute work is completely independent of your graphics work, and therefore doesn't need to be synchronized with your graphics commands.
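To see why the pipelined ordering raises throughput, here is a small timing model of the two schedules. It is a simplification under stated assumptions: every frame costs a fixed `cpuMs` to record and a fixed `gpuMs` to execute, submission and presentation are free, and the function names are invented for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Lockstep: the CPU waits for the GPU after every frame, so costs add up.
std::uint64_t LockstepTotalMs(std::uint64_t frames, std::uint64_t cpuMs, std::uint64_t gpuMs)
{
    return frames * (cpuMs + gpuMs);
}

// Pipelined: the CPU records frame i while the GPU executes frame i-1.
// Before recording frame i, the CPU waits until the GPU has finished frame i-2.
std::uint64_t PipelinedTotalMs(std::uint64_t frames, std::uint64_t cpuMs, std::uint64_t gpuMs)
{
    assert(frames >= 1);
    std::vector<std::uint64_t> cpuDone(frames), gpuDone(frames);
    for (std::uint64_t i = 0; i < frames; ++i) {
        const std::uint64_t cpuFree = (i >= 1) ? cpuDone[i - 1] : 0;
        const std::uint64_t fence = (i >= 2) ? gpuDone[i - 2] : 0;
        cpuDone[i] = std::max(cpuFree, fence) + cpuMs;

        // The GPU starts frame i once it is idle and the commands are submitted.
        const std::uint64_t gpuFree = (i >= 1) ? gpuDone[i - 1] : 0;
        gpuDone[i] = std::max(gpuFree, cpuDone[i]) + gpuMs;
    }
    return gpuDone[frames - 1];
}
```

With 4 ms of CPU work and 6 ms of GPU work per frame, 100 frames take 1000 ms in lockstep and 604 ms pipelined: after the first frame, the cost per frame approaches the larger of the two times instead of their sum.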

Posted December 9, 2017

Infinisearch (Member)

On December 8, 2017 at 4:13 AM, lubbe75 said:
    After reading a bit more I get the idea that threading is mainly for recording command lists. Is this correct? Would this also include executing command lists? Before adding threads, will I benefit anything from using multiple command lists, command allocators or command queues?

Read through this document; it should answer your questions.

https://developer.nvidia.com/sites/default/files/akamai/gameworks/blog/GDC16/GDC16_gthomas_adunn_Practical_DX12.pdf

Posted December 11, 2017

lubbe75 (Member)

Thanks for that link, Infinisearch!

MJP, I have tried what you suggested, but I got poorer results compared to the straightforward 1-allocator method. Here is what I tried:

After initiating, setting frameIndex to 0 and resetting commandList with allocator 0, I run the following loop (pseudo-code):

    populate commandList;
    execute commandList;
    reset commandList (using allocator[frameIndex]);
    present the frame;
    frameIndex = swapChain.CurrentBackBufferIndex; // 0 -> 1, 1 -> 0
    if (frameIndex == 1) {
        // set the fence after frame 0, 2, 4, 6, 8, ...
        commandQueue.Signal(fence, fenceValue);
    } else {
        // wait for the fence after frame 1, 3, 5, 7, 9, ...
        int currentFence = fenceValue;
        fenceValue++;
        if (fence.CompletedValue < currentFence) {
            fence.SetEventOnCompletion(currentFence, fenceEvent.SafeWaitHandle.DangerousGetHandle());
            fenceEvent.WaitOne();
        }
    }

Have I understood the idea correctly (I think I have)? Perhaps something here gets done in the wrong order?

Posted December 12, 2017

MJP (Moderator)

That's not quite what I meant. You'll still want to signal your fence and wait on it every frame, you just need to wait on the value one frame later. The first frame you don't need to wait because there was no "previous" frame, but you do need to wait for every frame after that. Here's what my code looks like, minus a few things that aren't relevant:

    void EndFrame(IDXGISwapChain4* swapChain, uint32 syncIntervals)
    {
        DXCall(CmdList->Close());

        ID3D12CommandList* commandLists[] = { CmdList };
        GfxQueue->ExecuteCommandLists(ArraySize_(commandLists), commandLists);

        // Present the frame.
        DXCall(swapChain->Present(syncIntervals, syncIntervals == 0 ? DXGI_PRESENT_ALL

        ++CurrentCPUFrame;

        // Signal the fence with the current frame number, so that we can check back o
        FrameFence.Signal(GfxQueue, CurrentCPUFrame);

        // Wait for the GPU to catch up before we stomp an executing command buffer
        const uint64 gpuLag = DX12::CurrentCPUFrame - DX12::CurrentGPUFrame;
        Assert_(gpuLag <= DX12::RenderLatency);
        if(gpuLag >= DX12::RenderLatency)
        {
            // Make sure that the previous frame is finished
            FrameFence.Wait(DX12::CurrentGPUFrame + 1);
            ++DX12::CurrentGPUFrame;
        }

        CurrFrameIdx = DX12::CurrentCPUFrame % NumCmdAllocators;

        // Prepare the command buffers to be used for the next frame
        DXCall(CmdAllocators[CurrFrameIdx]->Reset());
        DXCall(CmdList->Reset(CmdAllocators[CurrFrameIdx], nullptr));
    }
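The fence bookkeeping in that function can be followed without a GPU by modelling only the counters. The sketch below is not MJP's code: `FrameSyncModel` is a hypothetical name, and the values 2 for the render latency and 2 for the allocator count are assumptions, since the post does not state them.

```cpp
#include <cassert>
#include <cstdint>

// Models only the frame counters: which fence value is signaled, which value
// (if any) the CPU waits on, and which allocator is reset for the next frame.
struct FrameSyncModel {
    std::uint64_t renderLatency;     // assumed: frames the CPU may have in flight
    std::uint64_t numCmdAllocators;  // assumed: one allocator per in-flight frame
    std::uint64_t cpuFrame = 0;
    std::uint64_t gpuFrame = 0;

    FrameSyncModel(std::uint64_t latency, std::uint64_t allocators)
        : renderLatency(latency), numCmdAllocators(allocators) {}

    struct EndFrameResult {
        std::uint64_t signaledValue;  // fence value signaled for this frame
        std::uint64_t waitedValue;    // fence value waited on, 0 means no wait
        std::uint64_t nextAllocator;  // allocator index reset for the next frame
    };

    EndFrameResult EndFrame()
    {
        ++cpuFrame;
        EndFrameResult r{cpuFrame, 0, 0};

        const std::uint64_t gpuLag = cpuFrame - gpuFrame;
        assert(gpuLag <= renderLatency);
        if (gpuLag >= renderLatency) {
            // Wait for the oldest outstanding frame, not the one just submitted.
            r.waitedValue = gpuFrame + 1;
            ++gpuFrame;
        }
        r.nextAllocator = cpuFrame % numCmdAllocators;
        return r;
    }
};
```

With both values set to 2, the first call signals 1 and does not wait, the second signals 2 and waits on 1, and the third signals 3 and waits on 2. The allocator about to be reset always belongs to a frame whose fence value has already been waited on, which is what makes the reset safe.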

Posted December 12, 2017

Infinisearch (Member)

On December 11, 2017 at 6:03 PM, MJP said:
    That's not quite what I meant. You'll still want to signal your fence and wait on it every frame, you just need to wait on the value one frame later. The first frame you don't need to wait because there was no "previous" frame, but you do need to wait for every frame after that. Here's what my code looks like, minus a few things that aren't relevant:

MJP, I didn't look at the linked code, but do you do anything for frame pacing in the full code? I see that gamers on the internet complain about frame pacing quite a lot when they seem to perceive issues with it. Your code snippet above would render a certain number of frames on the CPU as fast as possible and then wait for the GPU to catch up. Wouldn't this lead to jerkiness in the input sampling and simulation? Would you just add some timer code to the above to delay the next iteration of the game loop if necessary? Or is it more complex?

Posted December 12, 2017

MJP (Moderator)

The code that I posted will let the CPU get no more than 1 frame ahead of the GPU. After the CPU submits command lists to the direct queue, it waits for the previous GPU frame to finish. So if the GPU is taking more time to complete a frame than the CPU is (or if VSYNC is enabled), the CPU will be effectively throttled by the fence and will stay tied to the GPU's effective framerate.

In my experience, frame pacing issues usually come from situations where the time delta being used for updating the game's simulation doesn't match the rate at which frames are actually presented on the screen. This can happen very easily if you use the length of the previous frame as your delta for the next frame. When you do this, you're basically saying "I expect the next frame to take just as long to update and render as the previous frame". This assumption will hold when you're locked at a steady framerate (usually due to VSYNC), but if your framerate is erratic then you will likely have mismatches between your simulation time delta and the actual frame time. It can be especially bad when missing VSYNC, since your frame times may go from 16.6ms up to 33.3ms, and perhaps oscillate back and forth.

I would probably suggest the following for mitigating this issue:

1. Enable VSYNC, and never miss a frame! This will give you 100% smooth results, but obviously it's much easier said than done.
2. Detect when you're not making VSYNC, and increase the sync interval to 2. This will effectively halve your framerate (for instance, you'll go from 60Hz to 30Hz on a 60Hz display), but that may be preferable to "mostly" making full framerate with frequent dips.
3. Alternatively, disable VSYNC when you're not quite making it. This is common on consoles, where you have the ability to do this much better than you do on PC. It's good for when you're just barely missing your VSYNC rate, since in that case most of the screen will still get updated at full rate (however there will be a horizontal tear line). It will also keep you from dropping to half the VSYNC rate, which will reduce the error in your time delta assumption.
4. Triple buffering can also give you similar results to disabling VSYNC, but also prevent tearing (note that non-fullscreen D3D apps on Windows are effectively triple-buffered by default since they go through the desktop compositor).
5. You could also try filtering your time deltas a bit to keep them from getting too erratic when you don't make VSYNC. I've never tried this myself, but it's possible that having more consistent but smaller errors in your time delta is better than less frequent but larger errors.

Hopefully someone else can chime in with more thoughts if they have experience with this. I haven't really done any specific research or experimentation with this issue outside of making games feel good when they ship, so don't consider me an authority on this issue.
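For the last suggestion, one possible filter is a moving average over the most recent frame times. This is an untested idea as far as the thread goes, and only one of many ways to smooth a delta; the class name `DeltaFilter` and the window size are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical moving-average filter for simulation time deltas.
class DeltaFilter {
public:
    explicit DeltaFilter(std::size_t window) : samples_(window, 0.0)
    {
        assert(window >= 1);
    }

    // Stores a measured frame time and returns the mean of the samples seen
    // so far, using at most the last `window` measurements.
    double Push(double measuredMs)
    {
        samples_[next_] = measuredMs;
        next_ = (next_ + 1) % samples_.size();
        if (count_ < samples_.size()) {
            ++count_;
        }
        // Unused slots still hold 0.0, so summing the whole buffer is safe.
        const double sum = std::accumulate(samples_.begin(), samples_.end(), 0.0);
        return sum / static_cast<double>(count_);
    }

private:
    std::vector<double> samples_;
    std::size_t next_ = 0;
    std::size_t count_ = 0;
};
```

With a window of 4, feeding frame times of 16, 16, 32 and 16 ms yields a filtered delta of 20 ms on the fourth frame, so a single missed VSYNC moves the simulation step by a few milliseconds instead of doubling it. The trade-off is that the filtered delta no longer matches any individual frame's real duration.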
