DMA & HDMA – Super Nintendo Entertainment System Features Pt. 07

This video is part 7 in a series about Super Nintendo Entertainment System features. This time we'll learn how the processor communicates with external memories such as OAM, VRAM, and CGRAM.

This will lead into Direct Memory Access (DMA), and eventually H-Blank DMA. Some of the devices the processor communicates with in addition to main memory are MMIO devices, which stands for Memory Mapped Input/Output. The amount of physical RAM in the Super NES is much less than the entire address space of 16^6 bytes (2^24, or 16 MiB). As you will find out in a future video, some of the addresses in the address space are reserved for other things like controllers, cartridge ROM, and SRAM. However, there are memories that are not memory mapped, and must be accessed indirectly.

These include OAM, CGRAM, and VRAM. Instead of being mapped directly, each of them has a memory mapped hardware register that is used as a pointer into the area, and another couple of registers that are used to read from and write to the memory location specified by that pointer. For example, to write something in the first location of CGRAM, first the CGRAM address register at $2121 must be set to zero, then the data to write should be written to $2122. Whenever a value is read from or written to these data registers, the corresponding address register is incremented automatically, so data can be easily accessed sequentially. Here is a table that shows which registers are used for each memory.

It is important to note that in these three locations, the data held at each address is 16 bits instead of 8 bits like in the main memory. In the case of video RAM, these 16 bits are spread out between two registers, but in OAM and CGRAM's case, the 16 bits are written 8 at a time in a single register. In addition to these 3 memories, work RAM can also be read or written to indirectly, using its hardware registers. This might seem useless, but it allows direct memory access to WRAM, which results in a very speedy operation. If you haven't figured it out by now, these special hardware registers are used as part of the direct memory access process.
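To make the pointer-plus-data-register idea concrete, here is a minimal Python sketch of the CGRAM case. It is a toy model, not an emulator: the class name `CgramPort` and its fields are made up for illustration, and it assumes the low byte is written first and that the address advances once both bytes of a 16-bit entry have arrived.

```python
class CgramPort:
    """Toy model of CGRAM behind $2121 (address) and $2122 (data)."""

    def __init__(self):
        self.words = [0] * 256    # 256 palette entries, 16 bits each
        self.address = 0          # value last set through $2121
        self._low_byte = None     # first byte of a pending 16-bit write

    def write(self, register, value):
        value &= 0xFF
        if register == 0x2121:
            # Setting the pointer also restarts the low/high byte sequence.
            self.address = value
            self._low_byte = None
        elif register == 0x2122:
            if self._low_byte is None:
                # Assumption: the low byte arrives first and is held.
                self._low_byte = value
            else:
                # High byte completes the entry; the pointer then advances.
                self.words[self.address] = (value << 8) | self._low_byte
                self.address = (self.address + 1) & 0xFF
                self._low_byte = None
        else:
            raise ValueError(f"register ${register:04X} is not modeled")


port = CgramPort()
port.write(0x2121, 0x00)                 # point at the first CGRAM entry
for byte in (0x1F, 0x00, 0xE0, 0x03):    # two 16-bit values, low byte first
    port.write(0x2122, byte)
```

After these writes the first two entries hold $001F and $03E0, and the pointer has moved on to entry 2 without ever being set again.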

Direct Memory Access is just a way to quickly move data from one place to another. More specifically, it allows transferring data between the A bus, aka CPU-space, aka anywhere in the 24-bit address space, and the B bus, aka PPU-space, aka any of the 256 registers of $2100 through $21FF. The reason why it is so fast is because it directly connects the data's source to its destination with no middle man, which would normally be the processor.

Using the processor to load bytes from one place and store them to another is relatively slow, resulting in about 336 kilobytes/s. Even using quick pop slides, or even the move block instructions specifically meant for moving large blocks of memory, only gives 358 and 413 kilobytes/s respectively. DMA however can reach speeds of up to 2.68 megabytes/s. This is incredibly useful when moving around huge amounts of data, like graphics.
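The peak DMA figure can be checked with a quick back-of-the-envelope calculation. The two constants below are not stated in this video and are assumptions of this sketch: an NTSC master clock of about 21.477 MHz, and 8 master clock cycles per byte transferred, ignoring any per-transfer overhead.

```python
# Assumed values (not from the video): NTSC master clock and DMA cost per byte.
MASTER_CLOCK_HZ = 21_477_272
MASTER_CYCLES_PER_DMA_BYTE = 8

dma_bytes_per_second = MASTER_CLOCK_HZ / MASTER_CYCLES_PER_DMA_BYTE
dma_megabytes_per_second = dma_bytes_per_second / 1_000_000

# Compare against the fastest CPU-driven figure quoted above (413 kilobytes/s).
speedup_over_block_move = dma_bytes_per_second / 413_000

print(f"{dma_megabytes_per_second:.2f} MB/s, "
      f"about {speedup_over_block_move:.1f}x the block move rate")
```

Under those assumptions the result is roughly 2.68 megabytes/s, or about six and a half times faster than the move block instructions.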

The SNES has 8 DMA channels, which means 8 transfers can be prepared at once, and then initiated sequentially one after the other. Each channel has 12 dedicated registers, 7 of which need to be populated with the DMA properties that explain how the transfer will work. Three registers are used to determine the 24-bit address on the A bus to transfer, while another register is used for the 8-bit address on the B bus. Two registers are used to hold the number of bytes to transfer in total, and the final register holds all the other properties for the DMA setup. This includes which direction the data are transferred, whether or not to automatically increment or decrement the A bus address, and the format of one unit of data and how it should be sent over the B bus.

As an example, here is how one would go about transferring a large block of graphics data from ROM to video RAM using DMA channel 0. First, the target VRAM address would be written to $2116 and $2117, and the B bus address should be set to the lower 8 bits of the VRAM data I/O register, $2118. Second, the A bus address should be set to the location in ROM where the graphics data is stored, and the total number of bytes should be specified as well. Third, the transfer properties should be set. In this case, the transfer occurs from A bus to B bus, the A bus address should be incremented after each byte transferred, and the transferred bytes should be written two at a time in the order of low and high to match the VRAM data I/O registers. Finally, the DMA can be initiated by setting the channel's corresponding bit in the DMA enable register at $420B.
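The steps above can be written down as a list of register writes. The Python sketch below only builds that list; it does not talk to hardware. The function name, the example ROM address $018000, and the transfer size are hypothetical. The per-channel register layout ($43x0 properties, $43x1 B bus address, $43x2 to $43x4 A bus address, $43x5 and $43x6 byte count) and the properties value $01 are assumptions based on the commonly documented register map rather than anything shown in this video.

```python
def vram_dma_writes(vram_address, source_address, byte_count, channel=0):
    """Return (register, value) pairs for a ROM-to-VRAM DMA transfer."""
    base = 0x4300 + 0x10 * channel
    # Properties: bit 7 clear = A bus to B bus, A bus address incrementing,
    # low bits = 1 for "two registers, low byte then high byte".
    properties = 0x01
    return [
        (0x2116, vram_address & 0xFF),            # VRAM address, low byte
        (0x2117, (vram_address >> 8) & 0xFF),     # VRAM address, high byte
        (base + 0, properties),
        (base + 1, 0x18),                         # B bus target: $2118
        (base + 2, source_address & 0xFF),        # A bus address, low
        (base + 3, (source_address >> 8) & 0xFF), # A bus address, high
        (base + 4, (source_address >> 16) & 0xFF),# A bus address, bank
        (base + 5, byte_count & 0xFF),            # byte count, low
        (base + 6, (byte_count >> 8) & 0xFF),     # byte count, high
        (0x420B, 1 << channel),                   # start the transfer
    ]


# Hypothetical example: 8 KiB of graphics at $018000 copied to VRAM $1000.
writes = vram_dma_writes(0x1000, 0x018000, 0x2000)
for register, value in writes:
    print(f"${register:04X} <- ${value:02X}")
```

A real program would usually also configure how the VRAM address steps after each write before starting the transfer; that detail is left out here because the video does not cover it.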

The least significant bit corresponds to channel 0, while the most significant bit is for channel 7. At this point, the processor would be suspended from executing instructions while the DMA transfer is occurring; once the transfer is complete it will be reactivated and execution continues normally. While DMA can generally be performed at any time, most of the registers accessible via the B bus are used by the Picture Processing Unit to display the image to the screen. Therefore, in order to read and write values to these registers, care needs to be taken so the PPU and CPU don't clash with each other while reading or writing data. Specifically, data should only be read or written by the CPU during certain blanking periods: H-blank, V-blank, or F-blank.

As shown in the previous video, enabling forced blanking in the middle of rendering can lead to a black streak across the screen, and trying to time this maneuver to occur exactly during H-blank can be difficult. Luckily, there is an easy way to do this, and it is called H-Blank direct memory access, or HDMA for short. HDMA and DMA both use the same 8 transfer channels, so they can't use the same channel at the same time. They also use the same set of 12 registers, although for HDMA only 5 need to be set manually beforehand, and 6 more are updated automatically during the transfer. One register holds the properties for the HDMA setup, and one holds the 8-bit address on the B bus just like the general purpose DMA setup.

The last three registers hold an address on the A bus that points to what is called an HDMA table. All of this data only needs to be set once before initiating the HDMA by setting the channel's corresponding bit in the HDMA enable register at $420C. After that, the data found in the HDMA table will be transferred to the specified hardware register automatically during H-blank at the scanlines specified in the table itself. The HDMA table is fairly straightforward and just includes instructions on what data should be transferred and when it should be transferred. It includes a list of data entries followed by a single zero byte which signals the end of the list.

Each entry in the list includes a 7-bit line count, a 1-bit continue flag, and the data to transfer on that scanline. If the continue flag is clear, the data will include only a single unit, and it will be sent over the B bus on this scanline. Then this channel will pause for the number of scanlines specified by the count before moving to the next list entry. If the continue flag is set, the data will include as many units as the count specifies. One unit will be sent on each scanline for this many scanlines, after which it will move on to the next list entry.
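The table format described above is simple enough to walk through in a few lines of Python. This is a sketch of the rules as stated in this video, not of the hardware's exact behavior: it assumes the table starts at scanline 0, that the continue flag is the top bit of the first byte of each entry, and that each unit is `unit_size` bytes long. The function name and the sample table are made up for illustration, and the special header value $80 is not modeled.

```python
def expand_hdma_table(table, unit_size=1):
    """Map each scanline that receives a transfer to the unit sent on it."""
    writes = {}
    scanline = 0
    pos = 0
    while True:
        header = table[pos]
        pos += 1
        if header == 0x00:
            break                          # end-of-table marker
        count = header & 0x7F              # 7-bit line count
        continue_flag = bool(header & 0x80)
        if continue_flag:
            # One fresh unit on each of the next `count` scanlines.
            for _ in range(count):
                writes[scanline] = bytes(table[pos:pos + unit_size])
                pos += unit_size
                scanline += 1
        else:
            # One unit now, then the channel idles for `count` scanlines.
            writes[scanline] = bytes(table[pos:pos + unit_size])
            pos += unit_size
            scanline += count
    return writes


# Hypothetical table: send $AA and hold for 2 lines, then 3 lines of new data.
sample_table = [0x02, 0xAA, 0x83, 0x01, 0x02, 0x03, 0x00]
sample_writes = expand_hdma_table(sample_table)
```

For the sample table, transfers land on scanlines 0, 2, 3, and 4, which shows how a hold entry and a per-scanline entry differ.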

Finally, if the count is zero then this HDMA channel will be suspended for the rest of the frame. An additional setting in the HDMA properties register allows for an indirectly addressed HDMA table. This means that instead of the units of data being included in the table, pointers to data can be used instead. This is useful for dynamic HDMA tables that are stored in work RAM. The pointers can be swapped out for others that point to different data tables in the ROM. As an example, let's look at how this windowing effect could be recreated using HDMA.

To set up the HDMA transfer we need to set the A bus and B bus addresses, as well as the transfer properties. Suppose the table is stored in ROM at $0BE00F. Using DMA channel 3, that address will go into registers $4332 through $4334. The PPU register to modify in order to set the left side of window 1 is $2126, so #$26 will be stored into $4331. Data is moving from the A bus to the B bus, so we clear bit 7 of $4330.

The HDMA table will use direct addressing format, not indirect format, so bit 6 stays clear as well. And finally, the window register is write-once and only one byte wide, so the transfer format is mode 0. This means each unit of data will be only one byte in size. Then, to initiate the HDMA transfer on channel 3, we set bit 3 of $420C. This could be done by loading the constant #$08 into the register, but this would also disable any other HDMA transfers that were previously set up.
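Here is the channel 3 setup as a list of register writes, sketched in Python. To avoid switching off other channels, the sketch takes the set of channels that are already enabled as a parameter and ORs the new bit into it; that software-side copy, the function name, and the assumption that channel 0 is already running are all part of this example, not something the hardware provides.

```python
def hdma_window_setup(table_address, channel, enabled_channels):
    """Return register writes for a window-edge HDMA and the new enable mask."""
    base = 0x4300 + 0x10 * channel
    # Bit 7 clear: A bus to B bus. Bit 6 clear: direct table. Mode 0: 1 byte.
    properties = 0x00
    new_enabled = enabled_channels | (1 << channel)
    writes = [
        (base + 0, properties),
        (base + 1, 0x26),                          # B bus target: $2126
        (base + 2, table_address & 0xFF),          # table address, low
        (base + 3, (table_address >> 8) & 0xFF),   # table address, high
        (base + 4, (table_address >> 16) & 0xFF),  # table address, bank
        (0x420C, new_enabled),                     # HDMA enable register
    ]
    return writes, new_enabled


# Hypothetical situation: channel 0 is already running an unrelated HDMA.
setup_writes, enabled = hdma_window_setup(0x0BE00F, channel=3,
                                          enabled_channels=0x01)
```

With channel 0 already enabled, the value written to $420C becomes $09 rather than $08, so the earlier transfer keeps running.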

Since the HDMA enable register is not readable, one way of properly initiating the transfer would be to set up all the channels first, then initiate them all at the same time. Another way would be to keep track of which channels are currently enabled in a separate register, and logically OR with that value before writing it to $420C. Now what would the HDMA table look like? The left side of the window at the top of the screen sits $60 pixels from the left edge, and it stays at that position for $60 scanlines. So the first entry in the table would just be $60 $60: wait for $60 scanlines, with the register set to $60. Then, for $10 scanlines, the position of the window changes every scanline.

The line count for the next entry in the table would be $10, and the continue flag should be set. So the first byte would be $90 in this case. Then, $10 bytes should follow, which would be the position of the window for these next $10 scanlines. And finally, the position of the window doesn't change past this point of the screen. So the HDMA transfer is complete, and $00 should be written at the end of the table.
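Putting the table together byte by byte looks like this in Python. The helper name is made up, and the sixteen per-scanline positions are hypothetical values chosen only to fill the entry; the video does not give the actual numbers.

```python
def build_window_table(hold_position, hold_lines, moving_positions):
    """Build a direct-format HDMA table: one hold entry, one per-line entry."""
    if not 1 <= hold_lines <= 0x7F:
        raise ValueError("line count must fit in 7 bits and be non-zero")
    if not 1 <= len(moving_positions) <= 0x7F:
        raise ValueError("per-scanline entry must hold 1 to 127 units")
    table = bytearray()
    table += bytes([hold_lines, hold_position])    # continue flag clear
    table.append(0x80 | len(moving_positions))     # continue flag set
    table += bytes(moving_positions)               # one byte per scanline
    table.append(0x00)                             # end of table
    return bytes(table)


# Hypothetical positions: the edge moves 2 pixels right on each scanline.
moving = [0x60 + 2 * step for step in range(1, 0x11)]
window_table = build_window_table(0x60, 0x60, moving)
print(window_table.hex(" "))
```

The result is 20 bytes long: $60 $60, then $90, then the $10 position bytes, then the terminating $00.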

HDMA is very powerful since it allows easy modification of many different PPU registers in the middle of rendering a frame. The end result of one frame of rendering is just that: a single image. However, the values of the registers used to create that image were not constant and changed over time. A great way of visualizing this change in the register is by recreating the image one scanline at a time by referencing a virtual copy of the screen that has the entire image rendered at once using only the values of the registers at that instant. The end result is something that looks very similar to the rolling shutter effect that occurs in real life with certain video cameras. Looking back at the windowing example, we see that even though there is only a single register that controls the left position of the window, it changes over time exactly when the image is rendered to produce the illusion that each scanline is controlled separately.

The rest of this video will just be looking at examples of various PPU registers and what common effects can be produced by using DMA and HDMA transfers. By using DMA to write to OAM in the middle of the frame, the number of objects on screen can be artificially increased. This example was hinted at in the previous video; Super Mario Kart keeps track of two OAM mirrors, one for each half of the screen. F-blank is enabled in order to perform this transfer, which explains the thick black bar in the middle of the screen. If the animation is slowed down, each entry into OAM can be seen updating one after the other.

Probably the most useful and versatile register to modify with HDMA is the background mode register. Many games will opt to render a HUD or text box in one mode, and the rest of the game in a different mode. This allows the main portion of the game to take advantage of the higher bit-depth of one background mode, while allowing the other portions of the screen to benefit from the multiple layers, like in Super Mario World here. Scrolling the background layers mid-frame allows for wave-like effects. In the first screen of Castlevania: Dracula X, both vertical waves for the fire and horizontal waves for the background were used.

By scrolling the background horizontally in large jumps, parallax scrolling can be achieved without putting each background element on a separate layer. In Donkey Kong Country 2, this effect is used for the clouds and the ocean combined. The most well-known usage of HDMA is to achieve perspective effects via mode 7 scaling and rotation. Mode 7 transformations are strictly linear, so in order to have non-linear transformations, HDMA must be used to modify the matrix parameters every scanline. Many games made use of this effect, including F-Zero and Pilotwings shown here.

This was commonly used together with background mode switching in order to have a horizon and background, as well as a HUD on a different background layer. The fixed color constant can be modified to create gradient effects with color math. It can be used to change just the back area color, such as this one in Yoshi's Island. And it can also be used along with color addition and subtraction; this is the method used to create the gradient in the text boxes in Final Fantasy III. And finally, any windowing effect that isn't just vertical bars uses HDMA to modify the left and right boundaries of the window while the image is rendered.

Super Metroid has a couple of windowing effects, one for the Eye shining its light beam, and another for the power bombs. And with that, the PPU-centric chapters of this series come to a close. The next video will be about controllers and gamepads, and how input is taken from the player's hands and fed into the software. As always, thank you for watching.
