I’m trying to tell you limited context is a feature not a bug, even other bots do the same thing like Replika. Even when all past data is stored serverside and available, it won’t matter because you need to reduce the weighting or you prevent significant change in output values (and less change as the history grows larger). Time decay of information is important to making these systems useful.
That’s kind of the point and how’s it different than a human. A human is going to weight local/recent contextual information as much more relevant to the conversation because they’re actively learning and storing the information (our brains work on more of an associative memory basis than temporal). However, with our current models it’s simulated by decaying weights over the data stream. So when you get conflicts between contextual correct vs “global” correct output, global has a tendency to win out that is more obvious. Remember you can’t actually make changes to the model as a user without active learning. Thus the model will always eventually return to it’s original behaviour as long as you can fill up the memory.