Last time, we talked about the graphics subsystem and how graphics work in general in the FPGA core. You learned some techniques used to draw things on the screen, and how graphics work on a scanline by scanline basis as opposed to drawing whole tiles to a framebuffer and flipping it. Generally, full framebuffers were quite rare back then as you need a lot of RAM to hold something like that, and on top of it, the RAM needs to be pretty fast.
After the last post, you are probably wondering how we actually draw things on the screen. What is the algorithm or setup that allows this? In this post, I will go over the graphics algorithms needed to draw sprites on the screen in the Raizing system. Although the details are specific to objects, the same general approach can be applied to drawing the scroll layer, or any extra tile layer.
Here is the sequence of events, from a system perspective, that leads up to sprites being drawn:
- VBLANK occurs
- The spriteram is wiped clean and reset to 0s.
- The game program writes the sprite data for the next frame to spriteram
- The renderer picks up and draws lines as they come.
Now, in the last post, I talked about the specific Sprite RAM format that the game uses. Each sprite entry takes up 64 bits. Each bit or set of bits corresponds to properties about the sprite. It tells you what the location in ROM is for the particular sprite, in addition to the coordinate location it is to be drawn at, as well as if it is flipped or not on the x or y axis (or both). Basically, you will retrieve the data from ROM for the sprite, and draw the appropriate part of it at the right x-axis location, when the v-line reaches the designated point.
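To make the entry format concrete, here is a small software model in Python (not the core's Verilog) that splits the four 16-bit words of one entry into named fields. The priority, palette, tile number, position and size fields follow the bit slices used in the state machine code in this post. The four flag bits in the top of word 0 are an assumption on my part about the layout, and `SpriteEntry` and `decode_sprite` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SpriteEntry:
    active: bool
    multiconnected: bool
    yflip: bool
    xflip: bool
    priority: int
    palette: int
    sprite_num: int
    x_pos: int
    x_size: int
    y_pos: int
    y_size: int


def decode_sprite(w0: int, w1: int, w2: int, w3: int) -> SpriteEntry:
    """Split the four 16-bit words of one 64-bit sprite entry into fields."""
    return SpriteEntry(
        active=bool(w0 & 0x8000),          # assumed: bit 15
        multiconnected=bool(w0 & 0x4000),  # assumed: bit 14
        yflip=bool(w0 & 0x2000),           # assumed: bit 13
        xflip=bool(w0 & 0x1000),           # assumed: bit 12
        priority=(w0 >> 8) & 0xF,          # word 0, bits 11:8
        palette=(w0 >> 2) & 0x3F,          # word 0, bits 7:2
        sprite_num=w1 & 0x7FFF,            # word 1, low 15 bits
        x_pos=(w2 >> 7) & 0x1FF,           # word 2, bits 15:7
        x_size=w2 & 0xF,                   # word 2, bits 3:0 (tiles - 1)
        y_pos=(w3 >> 7) & 0x1FF,           # word 3, bits 15:7
        y_size=w3 & 0xF,                   # word 3, bits 3:0 (tiles - 1)
    )
```

The bank bits are left out of this sketch on purpose, since they straddle two words in the real code.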
The first phase of drawing sprites is called queueing. A maximum of 256 sprites can exist on screen, with no limit on the number of sprites per scanline. 256 × 64 is 16,384 bits, or 2,048 bytes, which is exactly the amount of sprite RAM the board has, and all of it has to be scanned. Below is the code I use for queueing. It is part of a larger sprite state machine, and consists of 3 states that take one cycle each:
    //queueing phase
    case(st)
        0: begin //begin scanning the sprite position
            if(spr<max_sprite) begin
                if(!spr_idx_queue_reset && !spr_idx_queue[spr] && FAST_OBJ_QUEUE) begin
                    //if the idx queue is initialized, and there's no sprite in this slot, skip over.
                    spr<=spr+1;
                    st<=st;
                end else begin
                    //request word 0 (flags) and word 3 (y data) of the entry in the same cycle
                    GP9001RAM_GCU_ADDR<= sprite_addr_base;
                    GP9001RAM2_GCU_ADDR<= sprite_addr_base+3;
                end
            end else begin
                if(spr_idx_queue_reset) spr_idx_queue_reset<=0; //queue is initialized completely on first line.
                if(sprite_queue_n == 0) begin
                    busy<=0;
                    start<=1'b0;
                    spr_q_we<=1'b0;
                    // $display("%d", c);
                end else begin
                    st<=4; //it found something, go to scanning
                    //setup the conditions for scanning
                    spr_q_we<=1'b0;
                    sprite_queue_priority_n_scan_buf_i<=0;
                    spr_scan_i<=0;
                    //start at the lowest priority level that has a sprite queued
                    priority_i<= pri_has_sprite[0] ? 0 : pri_has_sprite[1] ? 1 : pri_has_sprite[2] ? 2 : pri_has_sprite[3] ? 3 :
                                 pri_has_sprite[4] ? 4 : pri_has_sprite[5] ? 5 : pri_has_sprite[6] ? 6 : pri_has_sprite[7] ? 7 :
                                 pri_has_sprite[8] ? 8 : pri_has_sprite[9] ? 9 : pri_has_sprite[10] ? 10 : pri_has_sprite[11] ? 11 :
                                 pri_has_sprite[12] ? 12 : pri_has_sprite[13] ? 13 : pri_has_sprite[14] ? 14 : pri_has_sprite[15] ? 15 : 16;
                end
            end
        end
        1: begin
            spr_q_we<=1'b1;
        end
        2: begin //check if the sprite is active
            if(GP9001RAM_GCU_DOUT[15]) begin //sprite is active
                if(spr_idx_queue_reset) begin //fill spr idx queue on first line.
                    spr_idx_queue[spr]<=1;
                end
                mc=GP9001RAM_GCU_DOUT[14]; //is a multiconnected sprite
                yfl= GP9001RAM_GCU_DOUT[13]; //is y-flipped
                xfl= GP9001RAM_GCU_DOUT[12]; //is x-flipped
                priority_l= GP9001RAM_GCU_DOUT[11:8]; //get the sprite priority
                sprite_y_size_t = GP9001RAM2_GCU_DOUT[3:0];
                sprite_y_pos_t = !mc ? (GP9001RAM2_GCU_DOUT[15:7] + SPRITE_SCROLL_Y + SPRITE_SCROLL_YOFFS) & 'h1FF :
                                       (multiconnector_y + GP9001RAM2_GCU_DOUT[15:7]) & 'h1FF;
                //a y-flipped sprite starts one sprite height higher, plus the 1 pixel seam offset
                if(yfl) sprite_y_pos_t=sprite_y_pos_t-((sprite_y_size_t+1) << 3) + 1;
                //positions past the bottom of the visible area wrap around to negative coordinates
                if(sprite_y_pos_t > 384 || (sprite_y_pos_t > 448 && yfl)) sprite_y_pos_t = sprite_y_pos_t - 'h200;
                //clip a sprite that starts above line 0 but still reaches the current line
                if(sprite_y_pos_t < 0 && $signed(VRENDER) < $signed(sprite_y_pos_t + ((sprite_y_size_t+1) << 3))) begin
                    sprite_y_size_t = (sprite_y_size_t+1) - (-sprite_y_pos_t>>3);
                    sprite_y_pos_t = 0;
                end
                //queue the sprite if it intersects the line being rendered
                if(VRENDER >= sprite_y_pos_t && VRENDER < (sprite_y_pos_t + ((sprite_y_size_t+1) << 3))) begin
                    $display("queue: %d, %d, %d, %h, %h, %h", VRENDER, sprite_y_size_t, sprite_y_pos_t, sprite_queue_priority_n[((priority_l+1)<<3)-1 -:8]+1, priority_l, spr[7:0]);
                    wr_spr_q <= spr[7:0];
                    wr_spr_q_addr<=((priority_l<<11) | sprite_queue_priority_n[((priority_l+1)<<3)-1 -: 8]);
                    pri_has_sprite[priority_l]<=1'b1;
                    sprite_queue_priority_n[((priority_l+1)<<3)-1 -:8] <= sprite_queue_priority_n[((priority_l+1)<<3)-1 -: 8]+1;
                    sprite_queue_n<=sprite_queue_n+1;
                end
                st<=0; //go to next sprite
                spr<=spr+1;
            end else begin //the sprite is not active, go to next sprite
                st<=0;
                spr<=spr+1;
            end
        end
    endcase
- State 0:
This state keeps track of the sprite in RAM that is currently being considered for queueing, and starts that process by making a request to the dual-port RAM (dpram) that holds the sprite RAM. It also serves as a transition state to the scanning phase. As an optimization, I keep more than one copy of sprite RAM so that I can read two different words of the same entry in a single cycle (here, the first and the last word). Otherwise it would take more cycles to get the data I need. In this case, I need 2 pieces of information to decide whether to queue that sprite for drawing or not:
- I need to know whether the sprite is active or not, as well as the basic properties of the sprite (ie. is it flipped on the x/y axis, priority level).
- I need to know the y-position of the sprite so that I can determine, in conjunction with the properties of the sprite, whether it should be drawn on the current line or not.
- State 1:
This is the wait state. Because a request to dpram takes 1 cycle to get there, and then one more cycle to get the data out, the data coming out is still invalid at this stage. So, I use this opportunity to set the sprite queue write enable to 1 so I can prepare to write to it in the next step.
- State 2:
This is where the bulk of the work happens.
- First of all, assuming the sprite is active, I extract the properties of the sprite that came back from dpram and put them in temporary regs.
- Next, I calculate the actual y position of the sprite. This is not just the y position found in sprite RAM. You also have to add the sprite scroll and the sprite offset to it to get the final position. For a multiconnected sprite, the position is instead added to the position of the sprite it is connected to.
- Moving along, if the sprite is y-flipped, it is mirrored along its y-axis. To account for this, you take the height of the sprite in pixels, which is (size + 1) * 8, and subtract that from the y position calculated above. Then you add 1, because y-flipped sprites are off by 1 according to the way the game program makes the entries (ie. this is the “seam” point in a symmetrical sprite that is composed of a mirrored and a non-mirrored half).
- If this results in the sprite being partly off the screen, that is, its starting coordinate is less than 0, you need to compensate by shifting the position to 0 and then subtracting the part that is “off-screen” from the height.
- At this point, with all the above, I have a final starting y-coordinate for the sprite to be drawn. I know the final compensated height of the sprite to be drawn as well. So, the starting y-position + the height of the sprite is the space it takes up. I know the current v-line that is being rendered (as we only draw scanline by scanline). Using this information, I can detect whether the current sprite I am taking a look at intersects this line or not. If it does, I queue it in the sprite queue. If not, I skip to the next sprite in spriteram and start the process all over again.
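The State 2 steps can be summarized in a short Python model. This is a simplified sketch rather than a translation of the Verilog: it folds scroll and offset into a single `scroll_y` argument, ignores multiconnected sprites, and keeps the start coordinate signed instead of clipping it to 0. The function names are hypothetical.

```python
def sprite_y_span(raw_y: int, y_size: int, scroll_y: int = 0, yflip: bool = False):
    """Return (top, height) of a sprite in signed screen coordinates."""
    height = (y_size + 1) * 8          # size field is "number of 8-pixel tiles minus one"
    top = (raw_y + scroll_y) & 0x1FF   # positions are 9 bits wide and wrap around
    if yflip:
        top = top - height + 1         # flipped sprites start one height up, plus the seam pixel
    if top > 384:
        top -= 0x200                   # large values are really negative (above the screen)
    return top, height


def intersects_line(raw_y: int, y_size: int, line: int,
                    scroll_y: int = 0, yflip: bool = False) -> bool:
    """True if the sprite covers the scanline currently being rendered."""
    top, height = sprite_y_span(raw_y, y_size, scroll_y, yflip)
    return top <= line < top + height
```

For example, a 16-pixel-tall sprite with a raw y of 500 wraps to a start of -12, so it only covers lines 0 through 3 on screen.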
This process is done for a max of 256 entries. Since each entry takes 3 cycles to check, the worst case is 256 × 3 = 768 cycles to scan through all the entries. Because the timing requirement for drawing sprites is so tight, I created an optimization to avoid scanning everything once we know where all the active sprites are for the frame. How it works is that every vblank, on the first scan, I fill in a vector that has 256 bit positions. If I find a sprite that’s active, I set its bit. This way, after the first scan, I can skip the inactive entries on subsequent lines until the next vblank, thereby saving lots of precious cycles that I can use for fetching data from SDRAM, which is the most time consuming process.
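To get a feel for how much this saves, here is a rough cycle-count model in Python. It assumes, based on my reading of the queueing states, that a fully checked entry costs 3 cycles and a skipped entry costs 1 cycle, and it ignores the final transition cycle. `scan_line` is a hypothetical helper, not something from the core.

```python
def scan_line(active, known=None):
    """Model one queueing pass over sprite RAM.

    active: list of bools, one per sprite slot (True = sprite is active)
    known:  None on the first line after vblank, otherwise the bitmap
            built on that first line
    Returns (cycles_spent, bitmap).
    """
    cycles = 0
    bitmap = list(known) if known is not None else [False] * len(active)
    for i, is_active in enumerate(active):
        if known is not None and not known[i]:
            cycles += 1          # skipped in state 0, no RAM read needed
            continue
        cycles += 3              # states 0, 1 and 2
        if known is None and is_active:
            bitmap[i] = True     # remember this slot for later lines
    return cycles, bitmap


# Example: 256 slots, only the first 10 hold active sprites.
slots = [i < 10 for i in range(256)]
first_cycles, bitmap = scan_line(slots)
later_cycles, _ = scan_line(slots, bitmap)
```

Under these assumptions the first line costs the full 768 cycles, while every later line in the frame costs 10 × 3 + 246 × 1 = 276 cycles.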
    //scanning phase
    case(st)
        4: begin
            pri_has_sprite[priority_i]<=0;
            if(sprite_queue_priority_n[((priority_i+1)<<3)-1 -: 8] > 0 && priority_i<max_priority) begin
                //if there are sprites in this priority level
                // $display("scan: %d", priority_i);
                sprite_queue_priority_n_scan_buf_i <= sprite_queue_priority_n[((priority_i+1)<<3)-1 -: 8];
                st<=5;
                //setup the conditions for rendering
                sprite_queue_i<=0;
                tx<=0;
                spr_x_render<=0;
            end else begin
                //there are no sprites in this priority level
                //there are no more priority levels to go, exit
                busy<=0;
                start<=1'b0;
                st<=0;
                // $display("%d", c);
            end
        end
    endcase
Once we have the sprites we want to draw for the scanline all in the queue, the next step is scanning. Scanning entails drawing the sprites in priority order and also RAM order. For each priority level, you must draw the sprites in the same order that they appeared ordinally in sprite RAM, or else some objects will appear out of order layer wise.
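The ordering rule can be modeled in a few lines of Python. This sketch assumes 16 priority levels and that level 0 is visited first, which matches how `priority_i` is picked in the state machine. `build_queues` and `draw_order` are hypothetical names.

```python
def build_queues(hits, levels=16):
    """hits: (ram_index, priority) pairs in the order they were found in sprite RAM."""
    queues = [[] for _ in range(levels)]
    for ram_index, priority in hits:
        queues[priority].append(ram_index)   # RAM order is preserved inside a level
    return queues


def draw_order(queues):
    """Flatten the per-priority queues: level 0 first, RAM order within each level."""
    return [ram_index for queue in queues for ram_index in queue]


# Sprites 0, 1, 3 and 5 intersect the line, with priorities 7, 2, 0 and 2.
order = draw_order(build_queues([(0, 7), (1, 2), (3, 0), (5, 2)]))
```

Here sprite 3 is drawn first, then 1 and 5 in their RAM order, and sprite 0 last.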
The scanning phase has another optimization of its own: a vector that records which priority levels the current queue’s sprites appear in. With it, I save a few cycles because I know exactly which priority levels I need to draw and can jump straight to the next one. This step also sets up the conditions for rendering, the final phase.
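In software terms, the long `pri_has_sprite` conditional chain in the code is a "find the lowest set bit" operation. A minimal Python sketch of the idea, with hypothetical function names and 16 levels assumed:

```python
def lowest_set_bit(mask: int, levels: int = 16) -> int:
    """Index of the lowest set bit, or `levels` if no bit is set."""
    if mask == 0:
        return levels
    return (mask & -mask).bit_length() - 1   # isolate the lowest 1 bit, then locate it


def visit_levels(mask: int):
    """Priority levels in the order they would be visited."""
    visited = []
    while mask:
        level = lowest_set_bit(mask)
        visited.append(level)
        mask &= ~(1 << level)                # clear the level once it has been handled
    return visited
```

With sprites queued at levels 0, 6 and 15, only those three levels are visited and the other thirteen are never looked at.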
The final phase, rendering, is undoubtedly the most complex part of the object drawing process. We will take it step by step:
    6: begin
        if(sprite_queue_i < sprite_queue_priority_n_scan_buf_i) begin
            //if not all the sprites have been rendered from the queue
            // $display("render: %d, %d, %d", VRENDER, priority_i, sprite_queue_priority_n_scan_buf_i);
            spr<=spr_q_out;
            spr_x_render<=0; //it takes 2 clock cycles to get the first data
        end else begin //if all the sprites have been rendered from this priority level
            if(pri_has_sprite> 0) begin // and there are still more priority levels to go
                st<=4; //go back to scanning
                //setup conditions for scanning
                sprite_queue_priority_n_scan_buf_i<=0;
                spr_scan_i<=0;
                //jump to the next priority level that has a sprite queued
                priority_i<= pri_has_sprite[0] ? 0 : pri_has_sprite[1] ? 1 : pri_has_sprite[2] ? 2 : pri_has_sprite[3] ? 3 :
                             pri_has_sprite[4] ? 4 : pri_has_sprite[5] ? 5 : pri_has_sprite[6] ? 6 : pri_has_sprite[7] ? 7 :
                             pri_has_sprite[8] ? 8 : pri_has_sprite[9] ? 9 : pri_has_sprite[10] ? 10 : pri_has_sprite[11] ? 11 :
                             pri_has_sprite[12] ? 12 : pri_has_sprite[13] ? 13 : pri_has_sprite[14] ? 14 : pri_has_sprite[15] ? 15 : 16;
            end else begin //if all is done, end
                busy<=0;
                start<=1'b0;
                st<=0;
                // $display("%d", c);
            end
        end
    end
State 6 is an iterator that keeps track of the sprite that is being drawn in this priority level. Basically, we increment the sprite index to be drawn in the queue from this step, and it also serves as a transition state to the previous scanning state if the priority level is to be incremented (after all sprites are drawn in this priority level).
    7: begin //request words 0 and 1 of the entry
        GP9001RAM_GCU_ADDR<=sprite_addr_base;
        GP9001RAM2_GCU_ADDR <= sprite_addr_base + 1;
    end
    8: begin //request words 2 and 3 of the entry
        GP9001RAM_GCU_ADDR<=sprite_addr_base + 2;
        GP9001RAM2_GCU_ADDR <= sprite_addr_base + 3;
    end
    9: begin
        sprite_attributes[63:48] <= GP9001RAM_GCU_DOUT;
        sprite_attributes[47:32] <= GP9001RAM2_GCU_DOUT;
    end
    10: begin
        sprite_attributes[31:16] <= GP9001RAM_GCU_DOUT;
        sprite_attributes[15:0] <= GP9001RAM2_GCU_DOUT;
    end
    11: begin
        // $display("attrib: %d:%h, cur_pri: %d", sprite_attributes[59:56],sprite_attributes, priority_i);
        xflip<=sprite_attributes[60];
        yflip<=sprite_attributes[61];
        palette<=(sprite_attributes[55:50]<<4);
        sprite_num<=(sprite_attributes[47:32] & 'h7FFF);
        sprite_bank<=sprite_attributes[49:47];
        sprite_x_size<=sprite_attributes[19:16];
        sprite_y_size<=sprite_attributes[3:0];
        if(sprite_attributes[62]) begin //is a multiconnected sprite
            sprite_x_pos <= (multiconnector_x + sprite_attributes[31:23]) & 'h1ff;
            sprite_y_pos <= (multiconnector_y + GP9001RAM2_GCU_DOUT[15:7]) & 'h1ff;
        end else begin
            sprite_x_pos<=(sprite_attributes[31:23]+SPRITE_SCROLL_X+SPRITE_SCROLL_XOFFS) & 'h01FF;
            sprite_y_pos<=(GP9001RAM2_GCU_DOUT[15:7]+SPRITE_SCROLL_Y+SPRITE_SCROLL_YOFFS) & 'h01FF;
        end
    end
    12: begin
        $display("xpos: %d %d %d %d %d", sprite_x_pos, SPRITE_SCROLL_X, SPRITE_SCROLL_XOFFS, (sprite_attributes[31:23]+SPRITE_SCROLL_X+SPRITE_SCROLL_XOFFS) & 'h01FF, sprite_x_size);
        // $display("ypos: %d %d %d %d %d", sprite_attributes[15:7], SPRITE_SCROLL_Y, SPRITE_SCROLL_YOFFS, (sprite_attributes[15:7]+SPRITE_SCROLL_Y+SPRITE_SCROLL_YOFFS) & 'h01FF, sprite_y_size);
        //process flips on x and y axis for sprite
        multiconnector_x<=sprite_x_pos;
        multiconnector_y<=sprite_y_pos;
        sprite_y_pos_t=sprite_y_pos;
        sprite_y_size_t=sprite_y_size;
        if(xflip) begin
            if($signed(sprite_x_pos-7) > 448) begin
                sprite_x_pos <= $signed(sprite_x_pos - 'h200 - 'd7);
            end else begin
                sprite_x_pos<=$signed(sprite_x_pos-7);
            end
        end else begin
            if($signed(sprite_x_pos) > 384) begin
                sprite_x_pos <= $signed(sprite_x_pos - 'h200);
            end
        end
        if(yflip) begin
            if($signed(sprite_y_pos-((sprite_y_size + 1) << 3)) > 384) begin
                sprite_y_pos_t = $signed(sprite_y_pos - 'h200);
            end else begin
                sprite_y_pos_t=$signed(sprite_y_pos - ((sprite_y_size + 1) << 3));
            end
            sprite_y_pos_t = sprite_y_pos_t+1;
        end
        sprite_y_pos<=sprite_y_pos_t;
        sprite_y_size<=sprite_y_size_t;
        //setup conditions for drawing
        spr_x_render<=0;
        if(xflip) tx<=7;
        else tx<=0;
        st<=15;
    end
States 7-12 focus on retrieving all 64 bits of the sprite entry from sprite RAM. Remember, the queue we created only contains pointers to indices in sprite RAM, grouped by priority level. We use this information to retrieve the real entry from the current sprite RAM and set up for drawing. Some basic calculations are done here to set up the x-position of the sprite, because once we have determined that a sprite should be drawn on the current scanline, the question of whether it intersects the line is settled. We now must focus on where to draw it horizontally. So, the calculations here are similar to the ones we did for the y-axis in the queueing state: they handle the cases where the sprite is off screen, flipped, etc.
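The horizontal adjustment in state 12 can be written as a small Python function. As with the other sketches, scroll and offset are folded into one `scroll_x` argument, multiconnected sprites are left out, and the function name is hypothetical. The thresholds 384 and 448 and the 7 pixel adjustment for x-flipped sprites are taken from the code above.

```python
def sprite_screen_x(raw_x: int, scroll_x: int = 0, xflip: bool = False) -> int:
    """Signed x coordinate at which the sprite starts on the current line."""
    x = (raw_x + scroll_x) & 0x1FF   # 9-bit position with wraparound
    if xflip:
        x -= 7                       # flipped sprites are anchored 7 pixels to the left
        if x > 448:
            x -= 0x200               # wrap to a negative (off-screen left) coordinate
    elif x > 384:
        x -= 0x200
    return x
```

A non-flipped sprite at raw x 400 therefore starts at -112, fully off the left edge unless it is very wide.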
        15: begin //finally draw the sprite
            if(spr_x_render < (sprite_x_size + 1)) begin //if not all the tiles in the sprite have been drawn
                if(sprite_num <= max_sprite_num) begin //and the sprite is within the active area
                    GFX_CS<=1'b1;
                    TILE_NUMBER<=sprite_num;
                    TILE_NUMBER_OFFS<=tile_offs;
                    TILE_BANK<=sprite_bank;
                end else begin
                    //the sprite is out of bounds, don't render it and skip to the next sprite.
                    // $display("sprite out of bounds: %d", sprite_x_pos);
                    st<=5;
                    tx<=0;
                    GFX_CS<=1'b0;
                    spr_x_render<=0;
                    buf_we<=1'b0;
                    sprite_queue_i<=sprite_queue_i+1;
                end
            end else begin
                //all the tiles in the sprite have been drawn, go to the next sprite in the queue
                // $display("all tiles drawn, next sprite");
                st<=5;
                tx<=0;
                GFX_CS<=1'b0;
                spr_x_render<=0;
                buf_we<=1'b0;
                sprite_queue_i<=sprite_queue_i+1;
            end
        end
        16: st<=17; //wait state
        17: begin //pull the tile slice for a tile in the sprite
            if(spr_x_render == (sprite_x_size + 1)) begin
                st<=5;
                tx<=0;
                GFX_CS<=1'b0;
                spr_x_render<=0;
                buf_we<=1'b0;
                sprite_queue_i<=sprite_queue_i+1;
            end else begin
                if(GFX_OK) begin
                    GFX_CS<=1'b0;
                    TILE_NUMBER<=0;
                    TILE_NUMBER_OFFS<=0;
                    TILE_BANK<=0;
                    sprite_line<=GFX_DATA;
                    $display("%d, %d, %d, %h, %h %h", VRENDER, sprite_x_pos, sprite_x_size, TILE_NUMBER, TILE_NUMBER_OFFS, GFX_DATA);
                    // $display("%d %d %d", tiles_across, tiles_down, cur_row_lines_down);
                    st<=18;
                    buf_we<=1'b1;
                    if(xflip) tx<=7;
                    else tx<= 0;
                    drawn_pixels = 8'h0;
                end else begin //the sprite was not ready yet from sdram.
                    st<=st;
                end
            end
        end
        18: begin
            if(spr_x_render + 1 < (sprite_x_size + 1)) begin //pre-emptive
                GFX_CS<=1'b1;
                TILE_NUMBER<=sprite_num;
                TILE_NUMBER_OFFS<=tile_offs+32;
                TILE_BANK<=sprite_bank;
            end
            st<=22;
        end
        22: begin //draw the slice, every slice is 8 pixels
            if(xflip) tx<=tx-1;
            else tx<= tx+1;
            // tx <= (xflip ? tx - 1 : tx + 1);
            if( sprite_code > 0 && buf_code>=0 && buf_code<320) begin //if the sprite is not blank
                drawn_pixels[tx] = 1'b1;
                buf_addr<=(FLIPX ? 319-buf_code : buf_code)&'h1FF;
                buf_data<=((SHIFT_SPRITE_PRI ? sprite_attributes[59:56] + 1 : sprite_attributes[59:56]) << 12) + (palette&'h7F0)+sprite_code;
            end else begin
                //it is a blank sprite
                //do nothing, because other layers of sprites might be on top.
            end
            if(xflip ? tx == 0 : tx == 7) st<=30;
            else st<=st;
        end
        30: begin //go to next slice
            drawn_pixels = 8'h0;
            buf_we<=1'b0;
            if(xflip) tx<=7;
            else tx<=0;
            spr_x_render<=spr_x_render+1;
            st<=17;
        end
    endcase
The final part is rendering. That is, drawing the pixels to a line buffer. A sprite can be composed of multiple 8×8 tiles as I mentioned in the last post about the system, so state 15 serves as a sort of iterator to keep track of which tile is to be drawn. As an optimization, after fetching the first data, while it is unpacking and drawing the 8 pixels of the tile, I make another request to SDRAM to pre-emptively fetch the next one. This is an important optimization that increases the efficiency of the object drawing engine and makes it more likely that things will be drawn on time. Of course, it’s not perfect, but striping glitches should only occur in exceptional circumstances.
Drawing one 8-pixel slice of a tile takes 8 cycles, one pixel per cycle. Fetching a slice from SDRAM also takes around the same amount of time.
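A back-of-the-envelope model shows why overlapping the fetch with the draw matters. This Python sketch is an idealized estimate only: it ignores the extra state-machine transition cycles and any SDRAM contention, and the 8-cycle defaults are just the rough figures quoted above. Both function names are hypothetical.

```python
def cycles_without_prefetch(slices: int, fetch: int = 8, draw: int = 8) -> int:
    """Fetch a slice, draw it, then start fetching the next one."""
    return slices * (fetch + draw)


def cycles_with_prefetch(slices: int, fetch: int = 8, draw: int = 8) -> int:
    """Request the next slice while the current one is being drawn."""
    if slices == 0:
        return 0
    # The first fetch cannot be hidden. After that, each step is limited by
    # whichever is slower, and the last draw has nothing left to overlap with.
    return fetch + (slices - 1) * max(fetch, draw) + draw
```

For a sprite that is 4 tiles wide, this gives 64 cycles without prefetching against 40 cycles with it.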
The rendering code also has a couple of extra bits, like flipping the sprite if the player has the flip dip switch active (FLIPX), and a hack specific to Sorcer Striker, the SHIFT_SPRITE_PRI option, which raises the sprite priority by one.
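The write into the line buffer in state 22 can be modeled like this in Python. The packing (priority in the top 4 bits, then the palette base masked with 0x7F0, plus the pixel code) and the 320 pixel width follow the code above. How the screen position of each pixel is derived is not shown in this post, so the `x0 + i` mapping and the way x-flip reverses the slice are assumptions for illustration, as are the function names.

```python
SCREEN_WIDTH = 320


def pack_pixel(priority: int, palette_index: int, code: int) -> int:
    """Combine priority, palette base and pixel code into one line buffer entry."""
    return (priority << 12) + ((palette_index << 4) & 0x7F0) + code


def draw_slice(line_buf, x0, codes, priority, palette_index,
               xflip=False, flip_screen=False):
    """Write one 8-pixel slice into the line buffer, skipping blank pixels."""
    if xflip:
        codes = codes[::-1]                  # assumed: mirror the pixel order
    for i, code in enumerate(codes):
        x = x0 + i
        if code == 0 or not (0 <= x < SCREEN_WIDTH):
            continue                         # transparent or off-screen: leave buffer alone
        addr = SCREEN_WIDTH - 1 - x if flip_screen else x
        line_buf[addr] = pack_pixel(priority, palette_index, code)
```

Leaving the buffer untouched for code 0 is what lets pixels written by earlier sprites show through the blank parts of later ones.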
After the Line Buffer
So, all this state machine does is make the final entries in the line buffer that include not only the pixels (ie. location in a palette), but also the priority level of those pixels. After the data enters the line buffer, it goes to a palette module that is only responsible for calculating colors, matching them in a palette and outputting them at the rate of the pixel clock.
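Reading an entry back out is the reverse of how it was packed. A minimal Python sketch, assuming the layout used when writing `buf_data` (priority in bits 15:12, palette location in the low 11 bits); `unpack_pixel` is a hypothetical name and the real palette module is not shown in this post.

```python
def unpack_pixel(entry: int):
    """Split a line buffer entry into (priority, palette location)."""
    priority = (entry >> 12) & 0xF
    palette_location = entry & 0x7FF
    return priority, palette_location


# An entry written with priority 2, palette base 0x050 and pixel code 1.
priority, location = unpack_pixel((2 << 12) + 0x050 + 1)
```

The priority half is what the layer mixing needs, and the location half is what gets looked up in the palette.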
Of course, before this point you need to mux the colors from the other layers together and determine which layer should be on top for that particular pixel.
Drawing the other layers actually isn’t so different from this approach. They are far less complicated, yes, but still pretty similar. Since the other layers do not have tiles in random parts of the screen and are not movable, just scrollable, they are a lot easier to draw. For instance, there’s no need for any intersection algorithm. You will always know the current line being drawn corresponds to a specific location in the RAM. Sure, there’s a scroll and offset value that gets added to that, but that’s for ALL tiles on screen. So, think of these other layers as a flat sheet of paper that can be moved up and down as opposed to independent tiles that can be in any random area of the screen.
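To illustrate why no intersection test is needed, here is a tiny Python sketch of the lookup for a scrolling tile layer. The tile height and map height are plain parameters here, and the values in the example are made up for illustration rather than taken from the board. `tile_row_for_line` is a hypothetical name.

```python
def tile_row_for_line(line: int, scroll_y: int, tile_height: int, map_height_px: int):
    """Return (tile_row, row_within_tile) for the scanline being drawn."""
    y = (line + scroll_y) % map_height_px    # the whole layer shifts by the same scroll
    return y // tile_height, y % tile_height


# Illustrative values only: 16-pixel tiles on a 512-pixel-tall map.
row, fine_y = tile_row_for_line(line=20, scroll_y=500, tile_height=16, map_height_px=512)
```

The line number plus scroll maps directly to one row of tiles in RAM, so there is nothing to search for.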