In case you were building from scratch, I'd seriously recommend the caching scheme of the Grundy NewBrain's BASIC, which I've been fascinated by ever since I learned about it.
In brief, there's the problem of two opposing goals: faithful listings of the original input on the one hand and performance on the other. The former requires repeated interpretation of the source text, e.g., for any literal constants, which contributes much to the slowness of BASIC, while the latter may convert and tokenize all input once and for all (as early as at input time), but loses the original meaning as conveyed by the source text. (Is it “1000” or “+1E3”?)
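To illustrate the lossy side of early conversion (a trivial sketch of my own, nothing NewBrain-specific): once a literal has been converted to its numeric value, the original spelling can no longer be recovered for a listing.

```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Both spellings collapse to the same numeric value once converted,
     * so a listing reconstructed from the tokenized form can no longer
     * tell which one the programmer actually typed. */
    double a = strtod("1000", NULL);
    double b = strtod("+1E3", NULL);
    printf("%g %g%s\n", a, b, a == b ? " (indistinguishable)" : "");
    return 0;
}
```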
What the Grundy BASIC does is preserve the source text as-is, but, once a line is encountered, its interpreted version is cached (e.g., on a kind of heap) and a link to the cached version is attached to the source line for further use. Should the system run short of memory, parts of the cache can be purged easily, with no further penalty than the need to reinterpret and cache a line yet again. (A visit counter for the cached lines may help in deciding which ones to purge; even a single 1-bit “seen again” flag may be of help.)
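Here's a minimal sketch in C of how such a per-line cache might look. This is not the actual NewBrain implementation; the structures and names (`ProgramLine`, `get_tokens`, `purge_cache`, the stub `tokenize`) are made up for illustration, and the purge pass uses the 1-bit “seen again” variant mentioned above.

```c
#include <stdlib.h>
#include <string.h>

typedef struct CachedLine {
    unsigned char *tokens;   /* tokenized/interpreted form, heap-allocated */
    size_t         len;
    unsigned       seen : 1; /* "seen again" bit, cleared by each purge pass */
} CachedLine;

typedef struct ProgramLine {
    int         number;      /* BASIC line number */
    char       *source;      /* original source text, kept verbatim for LIST */
    CachedLine *cache;       /* NULL until the line is first executed */
} ProgramLine;

/* Stub standing in for the real tokenizer: a real one would convert
 * keywords and literals into their internal representations. */
CachedLine *tokenize(const char *src)
{
    CachedLine *c = malloc(sizeof *c);
    c->len    = strlen(src);
    c->tokens = malloc(c->len);
    memcpy(c->tokens, src, c->len);
    c->seen = 0;
    return c;
}

/* Return the cached form, building it on first use. */
CachedLine *get_tokens(ProgramLine *pl)
{
    if (pl->cache == NULL)
        pl->cache = tokenize(pl->source);
    pl->cache->seen = 1;     /* mark as recently used */
    return pl->cache;
}

/* On memory pressure: free cached forms not seen since the last pass.
 * The source text stays untouched, so a purged line can simply be
 * re-tokenized and re-cached the next time it is executed. */
void purge_cache(ProgramLine *prog, size_t nlines)
{
    for (size_t i = 0; i < nlines; i++) {
        CachedLine *c = prog[i].cache;
        if (c == NULL)
            continue;
        if (!c->seen) {
            free(c->tokens);
            free(c);
            prog[i].cache = NULL;
        } else {
            c->seen = 0;     /* give it one more round before purging */
        }
    }
}
```

The interpreter would call `get_tokens()` whenever it executes a line and `purge_cache()` whenever an allocation fails, which keeps the listing exact while hot lines run from their cached form.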
I think this is a very elegant solution to the problem, especially if there are no tight memory constraints.