The TIA’s design for low transistor count is what makes Atari VCS programming “interesting”:
One way the low transistor count was achieved was by using linear-feedback shift registers (LFSRs) for the horizontal position counters (these replaced the counter chains of early video game hardware). On the one hand, this saved a few transistors; on the other hand, these LFSRs do not count up linearly, but step through their states in a pseudo-random order.
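To get a feeling for what "pseudo-random order" means here, a small Python sketch of a generic 6-bit maximal-length LFSR may help. Mind that the tap positions, the seed and all names (`lfsr6_sequence`, `SEQUENCE`, `POSITION_OF_STATE`) are assumptions made for illustration; this is not meant to reproduce the TIA's actual counter circuit.

```python
def lfsr6_sequence(seed=1):
    """Return every state of a 6-bit Fibonacci LFSR, starting at seed, until it repeats.

    Assumption: feedback polynomial x^6 + x^5 + 1, a textbook maximal-length
    choice, not necessarily what the TIA uses.
    """
    if not 0 < seed < 64:
        raise ValueError("seed must be a non-zero 6-bit value")
    states = []
    state = seed
    while True:
        states.append(state)
        feedback = (state ^ (state >> 1)) & 1   # XOR of the two lowest bits
        state = (state >> 1) | (feedback << 5)  # shift right, feed back into bit 5
        if state == seed:
            return states


SEQUENCE = lfsr6_sequence()

# Each state still maps to exactly one step of the count, it is just not in order.
POSITION_OF_STATE = {state: step for step, state in enumerate(SEQUENCE)}

if __name__ == "__main__":
    print(len(SEQUENCE))   # 63 distinct states before the counter wraps
    print(SEQUENCE[:8])    # [1, 32, 16, 8, 4, 2, 33, 48]
```

The lookup table at the end is the point: the mapping between counter state and position stays unique, but nobody would want to compute it by hand in a few CPU cycles.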
While there’s still a unique match between each sprite position and a given screen pixel, we, as programmers, can’t just tell the TIA chip where to place the sprite. Instead, we have to tell the TIA, “now”, and the TIA will latch the counter value. The problem being, the TIA runs at 3 times the clock speed of the 6507 CPU (the cut-down 6502 used in the VCS). (It actually clocks the CPU once every three TIA “color clocks”, and the 6507 is actually a “slave” [sorry for the term, but I really don’t know a better one in this context] to the TIA. The CPU not being in control is also the reason why we can tell the TIA to put the CPU to sleep until it reaches horizontal sync, in order to synchronize the program with the screen. It is really the TIA which is in control.) Anyway, the CPU runs at a third of the speed of the TIA and we have to wait for a position, with the tightest loop in 6502/6507 code amounting to 5 CPU cycles or 15 screen pixels. Oops.
However, there is also the equivalent of the slipping counter chains of early video games with a stop code of 7, meaning we can displace a sprite by up to 7 pixels to the left or right, in order to fine-adjust the position over a range of up to 15 pixels. This, somewhat fortunately, matches the granularity of our 6502/6507 tight delay loop. There are even registers for this, which actually accept somewhat sane values. (They are only shifted left by 4 bits, into the high nibble. In contrast to this, the bits in the bytes for the “fat” background pixels are not in sequential order. But this saves a few other transistors.) We “only” have to deduce the suitable offset to our coarse position resulting from the wait loop…
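The arithmetic of splitting a target position into a coarse wait and a fine offset can be sketched in Python. This is an idealised model under stated assumptions: it ignores the fixed offset from horizontal blank and any latency of the “now” strobe, it assumes 160 visible pixels per line, and it assumes that the motion registers take a two's complement value in the high nibble where positive means “move left”. The function names are made up for this sketch.

```python
PIXELS_PER_CPU_CYCLE = 3   # the TIA runs 3 color clocks per CPU cycle
LOOP_CYCLES = 5            # tightest 6502/6507 delay loop
COARSE_STEP = PIXELS_PER_CPU_CYCLE * LOOP_CYCLES   # 15 pixels per loop pass


def split_position(x):
    """Split target pixel x into (loop passes, fine offset in -7..+7).

    A positive offset means "move right of the coarse position".
    """
    loops = (x + 7) // COARSE_STEP     # pick the nearest multiple of 15
    offset = x - loops * COARSE_STEP
    return loops, offset


def motion_nibble(offset):
    """Encode a fine offset for a motion register (assumed sign convention:
    positive register values move left, value sits in the high nibble)."""
    if not -7 <= offset <= 7:
        raise ValueError("fine offset must be within -7..+7")
    return ((-offset) & 0x0F) << 4


if __name__ == "__main__":
    loops, offset = split_position(100)
    print(loops, offset, hex(motion_nibble(offset)))   # 7 -5 0x50
```

Every pixel from 0 to 159 ends up with exactly one pair of loop count and offset, which is why a range of 15 pixels for the fine adjustment is just enough.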
The other way the TIA saves a lot of transistors is by not “knowing” anything about vertical positions, screen extent and refresh. It’s just not implemented. Anything vertical is to be handled by our program. Meaning, we have to count scanlines, turn on video after the end of vertical blank, count visible screen lines and orchestrate them (this is what’s called a video kernel), switch off the video at the lower border, count up to vertical sync and orchestrate the VSYNC pattern. (The timers of the 6532 “RIOT” PIA are of some help here, as we can set up timers to keep track of VSYNC, for example.)
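As a rough illustration of this bookkeeping, here is a Python sketch of a frame plan. The line counts per phase (3, 37, 192, 30) and the 228 color clocks per scanline are the conventional NTSC figures, not something stated above, and the names `FRAME_PHASES`, `frame_lines` and `timer_ticks` are invented for this sketch. On the real machine all of this is hand-timed 6507 code, of course.

```python
COLOR_CLOCKS_PER_LINE = 228                     # assumed NTSC value
CPU_CYCLES_PER_LINE = COLOR_CLOCKS_PER_LINE // 3   # CPU runs at a third of the TIA clock

# Assumed conventional NTSC frame layout: (phase name, number of scanlines)
FRAME_PHASES = [
    ("vsync", 3),       # we orchestrate the VSYNC pattern ourselves
    ("vblank", 37),     # video still off, time for game logic
    ("kernel", 192),    # visible lines, the video kernel proper
    ("overscan", 30),   # video off again at the lower border
]


def frame_lines():
    """Yield (phase, line within phase) for every scanline of one frame."""
    for phase, count in FRAME_PHASES:
        for line in range(count):
            yield phase, line


def timer_ticks(lines, interval=64):
    """Rough number of timer ticks covering `lines` scanlines, assuming a
    timer that counts down once every `interval` CPU cycles."""
    return lines * CPU_CYCLES_PER_LINE // interval


if __name__ == "__main__":
    print(sum(1 for _ in frame_lines()))   # 262 scanlines per frame
    print(CPU_CYCLES_PER_LINE)             # 76 CPU cycles per scanline
    print(timer_ticks(37))                 # 43 ticks to sit out vertical blank
```

The timer helper shows why the RIOT is handy: instead of counting 37 lines one by one, we load a single value and poll the timer when the game logic is done.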
The interesting part is that this isn’t as horrible as it sounds, at all. Since the TIA is really in control of things and also the main agency to effect anything outside the CPU, we’re talking to the TIA much like we would talk to an operating system. (A small difference is that the TIA doesn’t return anything. Actually, the same addresses are used to instruct the TIA by writing or just strobing them, and for returning any input values on read access. So it is a rather taciturn companion, but a trustworthy one.) The effect is really much like programming on bare metal and at the same time talking to an OS, which happens to be bare metal as well.
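The shared use of addresses for writing and reading can be modelled with a toy class in Python. This is only a sketch of the idea, assuming the usual register map where a write to address 2 is the “wait for sync” strobe and a read from address 2 returns a collision latch; the class `ToyTIA` and its attributes are hypothetical and model nothing else of the chip.

```python
class ToyTIA:
    """Toy model: one address, two meanings, depending on read or write."""

    WSYNC = 0x02    # write side: strobe, the written value does not matter
    CXP0FB = 0x02   # read side: a collision latch at the very same address

    def __init__(self):
        self.strobes = []                       # log of strobed/written addresses
        self.read_latches = {self.CXP0FB: 0x00}  # values the chip reports on read

    def write(self, address, value=0):
        # The TIA never answers a write; we only record that it happened.
        self.strobes.append(address)

    def read(self, address):
        # A read returns input state, not whatever was written before.
        return self.read_latches.get(address, 0)


if __name__ == "__main__":
    tia = ToyTIA()
    tia.write(ToyTIA.WSYNC, 0xFF)          # "wake me at the start of the next line"
    print(hex(tia.read(ToyTIA.CXP0FB)))    # 0x0, the written 0xFF is not read back
```

So there is no way to ask the TIA what we told it earlier; whatever state we care about, we have to keep track of ourselves.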