FPGA Core Development Series: Part 5

Last time, we talked about the graphics subsystem and how graphics work in general in the FPGA core. You learned some of the techniques used to draw things on the screen, and how graphics are produced scanline by scanline rather than by drawing whole tiles to a framebuffer and flipping it. Full framebuffers were quite rare back then: holding one takes a lot of RAM, and on top of that, the RAM needs to be pretty fast.

After the last post you are probably left wondering: how do we actually draw things on the screen? What is the algorithm or setup that allows this? In this post, I will go over the algorithms needed to draw sprites on the screen in the Raizing system. Although this post is specific to objects, generally the same algorithms and approach can be applied to drawing the scroll layer, or any extra tile layer.

Basic Approach

The following is the sequence of events, from a system perspective, that happens before sprites are drawn:

  1. VBLANK occurs.
  2. The spriteram is wiped clean and reset to 0s.
  3. The game program writes the sprite data for the next frame to spriteram.
  4. The renderer picks up and draws lines as they come.

In the last post, I talked about the specific Sprite RAM format that the game uses. Each sprite entry takes up 64 bits, and each bit or group of bits describes a property of the sprite. The entry tells you where the sprite's graphics are located in ROM, the coordinates it is to be drawn at, and whether it is flipped on the x or y axis (or both). Basically, you retrieve the data for the sprite from ROM, and draw the appropriate part of it at the right x-axis location when the v-line reaches the designated point.
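
As a rough illustration, the 64-bit entry can be unpacked with plain bit slices. The bit positions below are the ones the state machine later in this post reads; the module name sprite_entry_decode and all port names are made up for this sketch and are not part of the core.

```verilog
// Sketch only: unpacks one 64-bit sprite entry into named fields.
// Bit positions follow the slices used by the renderer code in this post;
// the module and port names are hypothetical.
module sprite_entry_decode (
    input  wire [63:0] entry,
    output wire        active,       // sprite is enabled
    output wire        multiconnect, // position is relative to the stored multiconnector position
    output wire        yflip,
    output wire        xflip,
    output wire [3:0]  pri,          // priority level, 0..15
    output wire [5:0]  palette,
    output wire [14:0] tile_num,     // where the graphics live in ROM
    output wire [8:0]  x_pos,
    output wire [3:0]  x_size,       // width in 8-pixel tiles, minus 1
    output wire [8:0]  y_pos,
    output wire [3:0]  y_size        // height in 8-pixel tiles, minus 1
);
    assign active       = entry[63];
    assign multiconnect = entry[62];
    assign yflip        = entry[61];
    assign xflip        = entry[60];
    assign pri          = entry[59:56];
    assign palette      = entry[55:50];
    assign tile_num     = entry[46:32]; // same as entry[47:32] & 'h7FFF
    assign x_pos        = entry[31:23];
    assign x_size       = entry[19:16];
    assign y_pos        = entry[15:7];
    assign y_size       = entry[3:0];
endmodule
```

The real core does not use a separate decoder like this; it reads the slices directly inside the state machine. The sketch is only meant as a map of the entry.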

Queueing

The first phase of drawing sprites is called queueing. A maximum of 256 sprites can exist on screen, with no limit on the number of sprites per scanline. 256×64 is 16,384 bits, or 2,048 bytes, which is exactly the amount of sprite RAM the board has, and which you will have to scan through. Below is the code I use for queueing. It is part of a larger sprite state machine, and consists of 3 states (0, 1 and 2):

//queueing phase
            case(st)
                0: begin //begin scanning the sprite position
                    if(spr<max_sprite) begin
                        if(!spr_idx_queue_reset && !spr_idx_queue[spr] && FAST_OBJ_QUEUE) begin //if the idx queue is initialized, and there's no sprite in this slot, skip over.
                            spr<=spr+1;
                            st<=st;
                        end else begin
                            GP9001RAM_GCU_ADDR<= sprite_addr_base;
                            GP9001RAM2_GCU_ADDR<= sprite_addr_base+3;
                        end
                    end else begin
                        if(spr_idx_queue_reset) spr_idx_queue_reset<=0; //queue is initialized completely on first line.

                        if(sprite_queue_n == 0) begin
                            busy<=0;
                            start<=1'b0;
                            spr_q_we<=1'b0;
                            // $display("%d", c);
                        end else begin
                            st<=4; //it found something, go to scanning
                            //setup the conditions for scanning
                            spr_q_we<=1'b0;
                            sprite_queue_priority_n_scan_buf_i<=0;
                            spr_scan_i<=0;
                            
                            priority_i<= pri_has_sprite[0] ? 0 :
                                         pri_has_sprite[1] ? 1 :
                                         pri_has_sprite[2] ? 2 :
                                         pri_has_sprite[3] ? 3 :
                                         pri_has_sprite[4] ? 4 :
                                         pri_has_sprite[5] ? 5 :
                                         pri_has_sprite[6] ? 6 :
                                         pri_has_sprite[7] ? 7 :
                                         pri_has_sprite[8] ? 8 :
                                         pri_has_sprite[9] ? 9 :
                                         pri_has_sprite[10] ? 10 :
                                         pri_has_sprite[11] ? 11 :
                                         pri_has_sprite[12] ? 12 :
                                         pri_has_sprite[13] ? 13 :
                                         pri_has_sprite[14] ? 14 :
                                         pri_has_sprite[15] ? 15 :
                                         16;
                        end
                    end
                end
                1: begin
                    spr_q_we<=1'b1;
                end
                2: begin //check if the sprite is active
                    if(GP9001RAM_GCU_DOUT[15]) begin //sprite is active
                        if(spr_idx_queue_reset) begin //fill spr idx queue on first line.
                            spr_idx_queue[spr]<=1;
                        end
                        mc=GP9001RAM_GCU_DOUT[14]; //is a multiconnected sprite
                        yfl= GP9001RAM_GCU_DOUT[13]; //is y-flipped
                        xfl= GP9001RAM_GCU_DOUT[12]; //is x-flipped
                        priority_l= GP9001RAM_GCU_DOUT[11:8]; //get the sprite priority

                        sprite_y_size_t = GP9001RAM2_GCU_DOUT[3:0];
                        sprite_y_pos_t = !mc ? 
                                            (GP9001RAM2_GCU_DOUT[15:7] + SPRITE_SCROLL_Y + SPRITE_SCROLL_YOFFS) & 'h1FF :
                                            (multiconnector_y + GP9001RAM2_GCU_DOUT[15:7]) & 'h1FF;
                        
                        if(yfl) sprite_y_pos_t=sprite_y_pos_t-((sprite_y_size_t+1) << 3) + 1;
                        
                        if(sprite_y_pos_t > 384 || (sprite_y_pos_t > 448 && yfl)) sprite_y_pos_t = sprite_y_pos_t - 'h200;
                        
                        if(sprite_y_pos_t < 0 && $signed(VRENDER) < $signed(sprite_y_pos_t + ((sprite_y_size_t+1) << 3))) begin 
                            sprite_y_size_t = (sprite_y_size_t+1) - (-sprite_y_pos_t>>3);
                            sprite_y_pos_t = 0;
                        end

                        if(VRENDER >= sprite_y_pos_t && VRENDER < (sprite_y_pos_t + ((sprite_y_size_t+1) << 3))) begin
                            $display("queue: %d, %d, %d, %h, %h, %h", VRENDER, sprite_y_size_t, sprite_y_pos_t, sprite_queue_priority_n[((priority_l+1)<<3)-1 -:8]+1, priority_l, spr[7:0]);
                            wr_spr_q <= spr[7:0];
                            wr_spr_q_addr<=((priority_l<<11) | sprite_queue_priority_n[((priority_l+1)<<3)-1 -: 8]);
                            pri_has_sprite[priority_l]<=1'b1;

                            sprite_queue_priority_n[((priority_l+1)<<3)-1 -:8] <= sprite_queue_priority_n[((priority_l+1)<<3)-1 -: 8]+1;
                            sprite_queue_n<=sprite_queue_n+1;                            
                        end

                        st<=0;//go to next sprite
                        spr<=spr+1;
                    end else begin //the sprite is not active, go to next sprite
                        st<=0;
                        spr<=spr+1;
                    end
                end
            endcase

  • State 0:
    This state keeps track of the sprite in RAM that is currently being considered for queueing, and starts that process by making a request to the dual-port RAM (dpram) that holds the sprite RAM. It also serves as a transition state to the scanning phase. I keep multiple duplicates of sprite RAM as an optimization; otherwise it would take more cycles to get the data I need. In this case, I need 2 pieces of information to decide whether to queue that sprite for drawing or not:
    • I need to know whether the sprite is active or not, as well as the basic properties of the sprite (ie. is it flipped on the x/y axis, priority level).
    • I need to know the y-position of the sprite so that I can determine, in conjunction with the properties of the sprite, whether it should be drawn on the current line or not.
  • State 1:
    This is the wait state. Because a request to dpram takes 1 cycle to get there, and then one more cycle to get the data out, the data coming out is still invalid at this stage. So, I use this opportunity to set the sprite queue write enable to 1 so I can prepare to write to it in the next step.
  • State 2:
    This is where the bulk of the work happens.
    • First, assuming the sprite is active, I extract the properties of the sprite that came back from dpram and put them in temporary regs.
    • Next, I calculate the actual y position of the sprite. This is not just the y position found in sprite RAM: you also have to add the sprite scroll and the sprite offset to get the final position. (A multiconnected sprite adds its y value to the stored multiconnector position instead.)
    • If the sprite is y-flipped, it is mirrored along its y-axis and its start position has to be moved. To calculate this, take the height of the sprite in pixels, (size + 1) * 8, and subtract it from the y position. Then add 1, because y-flipped sprites are off by 1 given the way the game program makes the entries (ie. this is the “seam” point in a symmetrical sprite that is composed of a mirrored and a non-mirrored half).
    • If this results in the sprite starting above the top of the screen, that is, its starting coordinate is less than 0, you need to compensate by moving the start position to 0 and subtracting the part that is “off-screen” from the height.
    • At this point, I have a final starting y-coordinate for the sprite, and I know its final compensated height. The starting y-position plus the height is the vertical space the sprite takes up. I also know the current v-line being rendered (as we only draw scanline by scanline). Using this information, I can detect whether the sprite I am looking at intersects this line or not. If it does, I queue it in the sprite queue. If not, I skip to the next sprite in spriteram and start the process all over again.
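
The intersection test in State 2 can be condensed into a small combinational block. This is a simplified sketch, not the code from the core: the module name sprite_line_hit and its ports are hypothetical, it assumes scroll and offset have already been added to y_pos, and it keeps a negative start position with signed compares instead of clamping to 0 and shrinking the height. The 384 threshold is taken from the code above.

```verilog
// Sketch only: does a sprite cover the scanline currently being rendered?
// Hypothetical module; y_pos is assumed to already include scroll and offset.
module sprite_line_hit (
    input  wire [8:0] y_pos,   // 9-bit y position from sprite RAM (after scroll)
    input  wire [3:0] y_size,  // height in 8-pixel tiles, minus 1
    input  wire       yflip,
    input  wire [8:0] vrender, // scanline being rendered
    output wire       hit
);
    // Height in pixels: (y_size + 1) * 8, at most 128.
    wire [7:0] height = ({4'd0, y_size} + 8'd1) << 3;

    // Work in signed 11-bit values so positions above the screen can go negative.
    wire signed [10:0] y_raw = {2'b00, y_pos};
    wire signed [10:0] h     = {3'b000, height};

    // A y-flipped sprite starts one height higher, plus the 1-pixel seam correction.
    wire signed [10:0] y_start = yflip ? (y_raw - h + 11'sd1) : y_raw;

    // Coordinates are 9 bits wide, so values past 384 are treated as wrapped negatives.
    wire signed [10:0] y_top = (y_start > 11'sd384) ? (y_start - 11'sd512) : y_start;

    wire signed [10:0] line = {2'b00, vrender};

    assign hit = (line >= y_top) && (line < y_top + h);
endmodule
```

For example, with y_pos = 508, y_size = 0 and no flip, the start wraps to -4 and hit is asserted for lines 0 through 3, which is the visible half of a sprite hanging off the top of the screen.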

This process is done for a maximum of 256 entries. At 3 cycles per entry, the worst case is 768 cycles to scan through all of them. Because the timing budget for drawing sprites is tight, I added an optimization that avoids scanning everything once we know where all the active sprites are for the frame. Every vblank, on the first scan, I build a vector with 256 bit positions, one per sprite slot. If I find a sprite that’s active, I set its bit. On subsequent lines, until the next vblank, I can then skip the inactive slots instead of reading them from RAM again, saving lots of precious cycles that I can use for fetching data from SDRAM, which is the most time-consuming process.
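
A minimal sketch of that bit vector, under simplifying assumptions: in the code above this role is played by spr_idx_queue and spr_idx_queue_reset inside the state machine, whereas here it is pulled out into a hypothetical module (active_sprite_map, with made-up port names) that is cleared by a vblank pulse.

```verilog
// Sketch only: remembers which of the 256 sprite slots were active this frame.
// Hypothetical module and port names; the core keeps this inside its state machine.
module active_sprite_map (
    input  wire       clk,
    input  wire       vblank,    // pulses once per frame
    input  wire       mark,      // pulse: slot `index` was found active on the first scan
    input  wire [7:0] index,     // sprite slot being looked at
    output wire       is_active  // 1 if slot `index` was marked this frame
);
    reg [255:0] active_map = 256'd0;

    always @(posedge clk) begin
        if (vblank)
            active_map <= 256'd0;        // forget last frame's sprites
        else if (mark)
            active_map[index] <= 1'b1;   // record an active slot
    end

    assign is_active = active_map[index];
endmodule
```

On every line after the first, a slot whose is_active bit is 0 can be skipped in a single cycle instead of the 3 cycles a full RAM read and check would take.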

Scanning

//scanning phase
            case(st)
                4: begin
                    pri_has_sprite[priority_i]<=0;
                    if(sprite_queue_priority_n[((priority_i+1)<<3)-1 -: 8] > 0 && priority_i<max_priority) begin //if there are sprites in this priority level
                        // $display("scan: %d", priority_i);
                        sprite_queue_priority_n_scan_buf_i <= sprite_queue_priority_n[((priority_i+1)<<3)-1 -: 8];
                        st<=5;
                        //setup the conditions for rendering
                        sprite_queue_i<=0;
                        tx<=0;
                        spr_x_render<=0;
                    end else begin //there are no sprites in this priority level
                        //there are no more priority levels to go, exit
                        busy<=0;
                        start<=1'b0;
                        st<=0;
                        // $display("%d", c);
                    end
                end
            endcase

Once all the sprites we want to draw for the scanline are in the queue, the next step is scanning. Scanning means walking the queue so that sprites are drawn in priority order and, within a priority level, in RAM order. For each priority level, you must draw the sprites in the same order that they appear in sprite RAM, or else some objects will be layered incorrectly.

I also added another optimization here: a vector that records which priority levels the current queue’s sprites appear in. This saves a few cycles, as I know which priority levels I need to draw without checking each one. This step also sets up for rendering, the final phase.
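
The long chain of ternaries that picks the next priority level is a priority encoder: it returns the lowest set bit of pri_has_sprite, or 16 if nothing is set. As a sketch, the same selection can be written with a loop. The module name first_priority is hypothetical; the signal names mirror the ones in the code above.

```verilog
// Sketch only: lowest set bit of a 16-bit vector, or 16 when the vector is empty.
// Intended to match the nested ?: chain used for priority_i in this post.
module first_priority (
    input  wire [15:0] pri_has_sprite, // bit n set: priority level n has queued sprites
    output reg  [4:0]  priority_i      // lowest level with sprites, or 16 if none
);
    integer n;

    always @* begin
        priority_i = 5'd16;
        // Walk from the highest level down so the lowest set bit is assigned last and wins.
        for (n = 15; n >= 0; n = n - 1)
            if (pri_has_sprite[n])
                priority_i = n[4:0];
    end
endmodule
```

With pri_has_sprite = 16'h0028 (levels 3 and 5 occupied), priority_i comes out as 3. A synthesis tool should unroll the loop into the same kind of logic as the explicit chain.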

Rendering

The final phase, rendering, is undoubtedly the most complex part of the object drawing process. We will take it step by step:

                6: begin
                    if(sprite_queue_i < sprite_queue_priority_n_scan_buf_i) begin //if not all the sprites have been rendered from the queue
                        // $display("render: %d, %d, %d", VRENDER, priority_i, sprite_queue_priority_n_scan_buf_i);
                        spr<=spr_q_out;
                        spr_x_render<=0;
                        //it takes 2 clock cycles to get the first data
                    end else begin //if all the sprites have been rendered from this priority level
                        if(pri_has_sprite> 0) begin // and there are still more priority levels to go
                            st<=4; //go back to scanning
                            //setup conditions for scanning
                            sprite_queue_priority_n_scan_buf_i<=0;
                            spr_scan_i<=0;

                            priority_i<= pri_has_sprite[0] ? 0 :
                                         pri_has_sprite[1] ? 1 :
                                         pri_has_sprite[2] ? 2 :
                                         pri_has_sprite[3] ? 3 :
                                         pri_has_sprite[4] ? 4 :
                                         pri_has_sprite[5] ? 5 :
                                         pri_has_sprite[6] ? 6 :
                                         pri_has_sprite[7] ? 7 :
                                         pri_has_sprite[8] ? 8 :
                                         pri_has_sprite[9] ? 9 :
                                         pri_has_sprite[10] ? 10 :
                                         pri_has_sprite[11] ? 11 :
                                         pri_has_sprite[12] ? 12 :
                                         pri_has_sprite[13] ? 13 :
                                         pri_has_sprite[14] ? 14 :
                                         pri_has_sprite[15] ? 15 :
                                         16;
                        end else begin //if all is done, end
                            busy<=0;
                            start<=1'b0;
                            st<=0;
                            // $display("%d", c);
                        end
                    end
                end

State 6 is an iterator that keeps track of which sprite in this priority level is being drawn. It loads the next sprite index from the queue, and it also serves as the transition back to the scanning state once all sprites in this priority level have been drawn and the priority level needs to advance.

                7: begin
                    GP9001RAM_GCU_ADDR<=sprite_addr_base;
                    GP9001RAM2_GCU_ADDR <= sprite_addr_base + 1;
                end
                8:  begin
                    GP9001RAM_GCU_ADDR<=sprite_addr_base + 2;
                    GP9001RAM2_GCU_ADDR <= sprite_addr_base + 3;
                end
                9: begin
                    sprite_attributes[63:48] <= GP9001RAM_GCU_DOUT; 
                    sprite_attributes[47:32] <= GP9001RAM2_GCU_DOUT;
                end
                10: begin 
                    sprite_attributes[31:16] <= GP9001RAM_GCU_DOUT;
                    sprite_attributes[15:0] <= GP9001RAM2_GCU_DOUT;
                end
                11: begin
                    // $display("attrib: %d:%h, cur_pri: %d", sprite_attributes[59:56],sprite_attributes, priority_i);
                    xflip<=sprite_attributes[60];
                    yflip<=sprite_attributes[61];
                    palette<=(sprite_attributes[55:50]<<4);
                    sprite_num<=(sprite_attributes[47:32] & 'h7FFF);
                    sprite_bank<=sprite_attributes[49:47];
                    sprite_x_size<=sprite_attributes[19:16];
                    sprite_y_size<=sprite_attributes[3:0];
                    
                    if(sprite_attributes[62]) begin //is a multiconnected sprite
                        sprite_x_pos <= (multiconnector_x + sprite_attributes[31:23]) & 'h1ff;
                        sprite_y_pos <= (multiconnector_y + GP9001RAM2_GCU_DOUT[15:7]) & 'h1ff;
                    end else begin
                        sprite_x_pos<=(sprite_attributes[31:23]+SPRITE_SCROLL_X+SPRITE_SCROLL_XOFFS) & 'h01FF;
                        sprite_y_pos<=(GP9001RAM2_GCU_DOUT[15:7]+SPRITE_SCROLL_Y+SPRITE_SCROLL_YOFFS) & 'h01FF;
                    end   
                end
                12: begin
                    $display("xpos: %d %d %d %d %d", sprite_x_pos, SPRITE_SCROLL_X, SPRITE_SCROLL_XOFFS, (sprite_attributes[31:23]+SPRITE_SCROLL_X+SPRITE_SCROLL_XOFFS) & 'h01FF, sprite_x_size);
                    // $display("ypos: %d %d %d %d %d", sprite_attributes[15:7], SPRITE_SCROLL_Y, SPRITE_SCROLL_YOFFS, (sprite_attributes[15:7]+SPRITE_SCROLL_Y+SPRITE_SCROLL_YOFFS) & 'h01FF, sprite_y_size);
                    
                    //process flips on x and y axis for sprite
                    multiconnector_x<=sprite_x_pos;
                    multiconnector_y<=sprite_y_pos;

                    sprite_y_pos_t=sprite_y_pos;
                    sprite_y_size_t=sprite_y_size;

                    if(xflip) begin
                        if($signed(sprite_x_pos-7) > 448) begin
                            sprite_x_pos <= $signed(sprite_x_pos - 'h200 - 'd7);
                        end
                        else begin
                            sprite_x_pos<=$signed(sprite_x_pos-7);
                        end
                    end else begin

                        if($signed(sprite_x_pos) > 384) begin
                            sprite_x_pos <= $signed(sprite_x_pos - 'h200);
                        end
                    end

                    if(yflip) begin
                        if($signed(sprite_y_pos-((sprite_y_size + 1) << 3)) > 384) begin
                             sprite_y_pos_t = $signed(sprite_y_pos - 'h200);
                        end
                        else begin
                            sprite_y_pos_t=$signed(sprite_y_pos - ((sprite_y_size + 1) << 3));
                        end
                        sprite_y_pos_t = sprite_y_pos_t+1;
                    end

                    sprite_y_pos<=sprite_y_pos_t;
                    sprite_y_size<=sprite_y_size_t;

                    //setup conditions for drawing
                    spr_x_render<=0;
                    if(xflip) tx<=7;
                    else tx<=0;

                    st<=15;
                end

States 7-12 focus on retrieving all 64 bits of the sprite entry from sprite RAM. Remember, the queue we created only contains indices into sprite RAM, grouped by priority level. We use those indices to retrieve the real entry from the current sprite RAM and set up for drawing. Some basic calculations are done here to set up the x-position of the sprite, because once we have determined that a sprite should be drawn on the current scanline, what remains is to work out where to draw it horizontally. These are similar to the calculations we did for the y-axis in the queueing state, covering cases where the sprite is off screen, flipped, etc.
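
The x-axis wrap handling in state 12 can be sketched as a small combinational block. The module name sprite_x_start and its ports are hypothetical, and x_pos is assumed to already include scroll and offset; the thresholds (448 when flipped, 384 otherwise) and the 7-pixel adjustment are taken from the code above.

```verilog
// Sketch only: turns the 9-bit x position into a signed starting column.
// Hypothetical module; thresholds and the -7 adjustment follow state 12 above.
module sprite_x_start (
    input  wire [8:0]         x_pos,   // 9-bit x position (after scroll and offset)
    input  wire               xflip,
    output wire signed [10:0] x_start  // negative when the sprite hangs off the left edge
);
    wire signed [10:0] x_raw = {2'b00, x_pos};

    // An x-flipped sprite has its start moved left by 7 pixels.
    wire signed [10:0] x_adj   = xflip ? (x_raw - 11'sd7) : x_raw;
    wire signed [10:0] x_limit = xflip ? 11'sd448 : 11'sd384;

    // Coordinates are 9 bits wide, so values past the limit wrap around to negatives.
    assign x_start = (x_adj > x_limit) ? (x_adj - 11'sd512) : x_adj;
endmodule
```

A sprite at x_pos = 500 with no flip therefore starts at column -12, so only the pixels that land at column 0 or later end up in the line buffer.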

                15: begin //finally draw the sprite
                    if(spr_x_render < (sprite_x_size + 1)) begin //if not all the tiles in the sprite have been drawn
                        if(sprite_num <= max_sprite_num) begin //and the sprite is within the active area
                            GFX_CS<=1'b1;
                            TILE_NUMBER<=sprite_num;
                            TILE_NUMBER_OFFS<=tile_offs;
                            TILE_BANK<=sprite_bank;
                        end else begin //the sprite is out of bounds, don't render it and skip to the next sprite.
                            // $display("sprite out of bounds: %d", sprite_x_pos);
                            st<=5;
                            tx<=0;
                            GFX_CS<=1'b0;
                            spr_x_render<=0;
                            buf_we<=1'b0;
                            sprite_queue_i<=sprite_queue_i+1;
                        end
                    end else begin //all the tiles in the sprite have been drawn, go to the next sprite in the queue
                        // $display("all tiles drawn, next sprite");
                        st<=5;
                        tx<=0;
                        GFX_CS<=1'b0;
                        spr_x_render<=0;
                        buf_we<=1'b0;
                        sprite_queue_i<=sprite_queue_i+1;
                    end
                end
                16: st<=17; //wait state
                17: begin //pull the tile slice for a tile in the sprite
                    if(spr_x_render == (sprite_x_size + 1)) begin
                        st<=5;
                        tx<=0;
                        GFX_CS<=1'b0;
                        spr_x_render<=0;
                        buf_we<=1'b0;
                        sprite_queue_i<=sprite_queue_i+1;
                    end else begin
                        if(GFX_OK) begin
                            GFX_CS<=1'b0;
                            TILE_NUMBER<=0;
                            TILE_NUMBER_OFFS<=0;
                            TILE_BANK<=0;
                            sprite_line<=GFX_DATA;
                            $display("%d, %d, %d, %h, %h %h", VRENDER, sprite_x_pos, sprite_x_size, TILE_NUMBER, TILE_NUMBER_OFFS, GFX_DATA);
                            // $display("%d %d %d", tiles_across, tiles_down, cur_row_lines_down);
                            st<=18;
                            buf_we<=1'b1;
                            if(xflip) tx<=7;
                            else tx<= 0;
                            drawn_pixels = 8'h0;
                        end else begin //the sprite was not ready yet from sdram.
                            st<=st;
                        end
                    end
                end
                18: begin
                    if(spr_x_render + 1 < (sprite_x_size + 1)) begin //pre-emptive
                        GFX_CS<=1'b1;
                        TILE_NUMBER<=sprite_num;
                        TILE_NUMBER_OFFS<=tile_offs+32;
                        TILE_BANK<=sprite_bank;
                    end
                    st<=22;
                end
                22: begin //draw the slice, every slice is 8 pixels
                    if(xflip) tx<=tx-1;
                    else tx<= tx+1;

                    // tx <= (xflip ? tx - 1 : tx + 1);
                    
                    if( sprite_code > 0 && buf_code>=0 && buf_code<320) begin //if the sprite is not blank
                        drawn_pixels[tx] = 1'b1;
                        buf_addr<=(FLIPX ? 319-buf_code : buf_code)&'h1FF;
                        buf_data<=((SHIFT_SPRITE_PRI ? sprite_attributes[59:56] + 1 : sprite_attributes[59:56]) << 12) + (palette&'h7F0)+sprite_code;
                    end else begin //it is a blank sprite
                        //do nothing, because other layers of sprites might be on top.
                    end
                    
                    if(xflip ? tx == 0 : tx == 7) st<=30;
                    else st<=st;
                end
                30: begin //go to next slice
                    drawn_pixels = 8'h0;
                    buf_we<=1'b0;
                    if(xflip) tx<=7;
                    else tx<=0;
                    spr_x_render<=spr_x_render+1;
                    st<=17;
                end
            endcase

The final part is rendering, that is, drawing the pixels to a line buffer. A sprite can be composed of multiple 8×8 tiles, as I mentioned in the last post about the system, so state 15 serves as a sort of iterator to keep track of which tile is to be drawn. As an optimization, after fetching the first slice, while it is being unpacked and its 8 pixels are being drawn, I make another request to SDRAM to pre-emptively fetch the next one. This is an important optimization: it increases the efficiency of the object drawing engine and makes it more likely that everything will be drawn on time. Of course, it’s not perfect, but striping glitches should only occur in exceptional circumstances.

Drawing a tile takes 8 cycles. Fetching a tile from SDRAM also takes around the same amount of time.

There are a couple of extra bits I added here as well, like flipping the sprite if the player has the flip DIP switch active, and the SHIFT_SPRITE_PRI option, which is a hack specific to Sorcer Striker.

After the Line Buffer

So, all this state machine does is make the final entries in the line buffer. Each entry holds not only the pixel (ie. its location in a palette), but also the priority level of that pixel. After the data enters the line buffer, it goes to a palette module that is only responsible for looking up the colors in the palette and outputting them at the rate of the pixel clock.
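
To make the layout of a line buffer entry concrete, here is a sketch of the packing done in state 22: priority in the top 4 bits, the palette number shifted up by 4, and the pixel's color index at the bottom. The module name linebuf_pack is hypothetical, the 4-bit pixel width is an assumption, and the SHIFT_SPRITE_PRI adjustment is left out.

```verilog
// Sketch only: packs one sprite pixel into a 16-bit line buffer word,
// equivalent to (pri << 12) + ((palette << 4) & 'h7F0) + pixel.
// Hypothetical module; the 4-bit pixel width is assumed.
module linebuf_pack (
    input  wire [3:0]  pri,     // sprite priority level
    input  wire [5:0]  palette, // palette number from the sprite entry
    input  wire [3:0]  pixel,   // color index within the palette
    output wire [15:0] word,
    output wire        opaque   // pixel value 0 is blank and is not written
);
    assign word   = {pri, 2'b00, palette, pixel};
    assign opaque = (pixel != 4'd0);
endmodule
```

For pri = 2, palette = 6'h15 and pixel = 4'h9 the word is 16'h2159. Blank pixels are skipped rather than written, so whatever is already in the line buffer at that column stays visible.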

Of course, before this point you need to mux the colors from the other layers together and determine which layer should be on top for that particular pixel.

Other Layers?

Drawing the other layers actually isn’t so different from this approach. They are far less complicated, but still pretty similar. Since the other layers do not have tiles in arbitrary parts of the screen and are not movable, just scrollable, they are a lot easier to draw. For instance, there’s no need for any intersection algorithm: the current line being drawn always corresponds to a specific location in RAM. Sure, there’s a scroll and offset value that gets added to that, but it applies to ALL tiles on screen. So, think of these other layers as a flat sheet of paper that can be slid around as a whole, as opposed to independent tiles that can be in any area of the screen.
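
As a sketch of why no intersection test is needed, the row of the tilemap that a scanline falls in can be computed directly from the line number and the scroll value. Everything here is an assumption made for illustration: the module name tile_row_lookup, the 512-line map height implied by the 9-bit wrap, and the 16-pixel tile height set by TILE_SHIFT.

```verilog
// Sketch only: maps the current scanline to a row in a scrolling tile layer.
// Hypothetical module; map height (512 lines) and tile height (16) are assumed.
module tile_row_lookup #(
    parameter TILE_SHIFT = 4          // log2 of the tile height in pixels
) (
    input  wire [8:0] vrender,        // scanline being rendered
    input  wire [8:0] scroll_y,       // layer scroll plus offset
    output wire [8:0] map_y,          // line within the tilemap
    output wire [8:0] tile_row,       // row of tiles that line falls in
    output wire [8:0] line_in_tile    // which line of the tile to fetch
);
    assign map_y        = vrender + scroll_y;   // 9-bit add wraps at 512
    assign tile_row     = map_y >> TILE_SHIFT;
    assign line_in_tile = map_y & ((9'd1 << TILE_SHIFT) - 9'd1);
endmodule
```

The same scroll value is applied to every line, which is the "flat sheet of paper" idea: the whole layer shifts together, so the lookup is a single add and a shift.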