GPT-4 has gotten “dumber”, and is now suspected of caching historical replies: one joke told 800 times, and it won’t tell a new one

Original source: QbitAI (量子位)

Image source: Generated by Unbounded AI

A netizen has found another piece of evidence that GPT-4 has become “dumber”.

His suspicion:

OpenAI caches historical responses, letting GPT-4 simply replay previously generated answers.

The most obvious example is joke-telling.

His evidence shows that even after he turned up the model’s temperature, GPT-4 kept repeating the same “scientists and atoms” joke.

That’s the classic “Why don’t scientists trust atoms? Because they make up everything.”

In principle, the higher the temperature, the more likely the model is to produce unexpected tokens, so the same joke should not keep coming back.
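To make the setup concrete, here is a minimal sketch of such a test using the openai Python SDK (v1.x); the model name, prompt, and call count are illustrative assumptions, not the netizen’s exact script.

```python
# A sketch of the repetition test described here, using the openai SDK (v1.x).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

jokes = set()
for _ in range(10):
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Tell me a joke."}],
        temperature=2.0,  # maximum randomness; identical replies are suspicious
    )
    jokes.add(resp.choices[0].message.content.strip())

print(f"{len(jokes)} distinct joke(s) across 10 calls")
```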

Not only that: leaving the parameters alone and instead rewording the prompt, explicitly asking for a new, different joke, doesn’t help either.

In the discoverer’s words:

This suggests that GPT-4 not only uses a cache, but also clusters similar queries rather than matching a prompt exactly.

The benefit is self-evident: responses come back faster.
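The “clustering” claim amounts to a semantic cache: looking up an incoming query by embedding similarity rather than by exact text. Below is a hypothetical sketch of how such a cache could work; the threshold and structure are assumptions, and this is not OpenAI’s actual implementation.

```python
# Hypothetical semantic-cache sketch; NOT OpenAI's actual implementation.
import numpy as np

class SemanticCache:
    def __init__(self, embed_fn, threshold=0.9):
        self.embed_fn = embed_fn    # maps text -> unit-norm embedding vector
        self.threshold = threshold  # cosine-similarity cutoff for a "hit"
        self.entries = []           # list of (embedding, cached_answer) pairs

    def lookup(self, query):
        q = self.embed_fn(query)
        for emb, answer in self.entries:
            # Cosine similarity, since embeddings are unit-normalized.
            if float(np.dot(q, emb)) >= self.threshold:
                return answer       # similar enough: serve the cached reply
        return None                 # miss: caller must generate a fresh reply

    def store(self, query, answer):
        self.entries.append((self.embed_fn(query), answer))
```

With a loose threshold, “tell me a joke” and “tell me a new, different joke” would land in the same cluster, which would explain why rewording the prompt changes nothing.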

But nobody is happy to pay for a pricey subscription only to be served answers retrieved from a cache.

After reading this, some people wondered:

If that’s the case, isn’t it unfair that we keep using GPT-4 to evaluate the answers of other large models?

Of course, some don’t believe an external cache is responsible; perhaps the model itself is simply that repetitive:

Previous studies have shown that ChatGPT repeats the same 25 jokes 90% of the time.

How so? Let’s take a closer look.

Hard evidence that GPT-4 replies from a cache

Beyond GPT-4 ignoring the temperature value, this netizen also found:

Changing the model’s top_p value is just as useless; GPT-4 still comes back with the same joke.

(top_p controls nucleus sampling: lower it for more focused, fact-based answers, raise it for more diverse ones.)
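A sketch of the top_p test, reusing the client from the earlier snippet; the swept values are arbitrary.

```python
# Sweep top_p and check whether the reply changes (values are arbitrary).
for top_p in (0.1, 0.5, 1.0):
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Tell me a joke."}],
        top_p=top_p,
    )
    print(top_p, "->", resp.choices[0].message.content.strip()[:60])
```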

The only workaround is to crank up the parameter n (the number of completions sampled per request), which finally yields “non-cached” answers, and with them new jokes.

The “cost”, however, is slower responses, since genuinely new content has to be generated rather than fetched.
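In the OpenAI API, n requests several independent completions in a single call. A sketch of the workaround, again reusing the client from above:

```python
# Ask for several completions in one request; per the netizen, this is what
# finally surfaces genuinely new, "non-cached" jokes.
resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    n=5,  # five independent completions in a single call
)
for choice in resp.choices:
    print(choice.message.content.strip()[:60])
```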

It’s worth mentioning that others seem to have observed a similar phenomenon with local models.

One commenter suggested that the “prefix-match hit” in the screenshot appears to confirm that a cache is indeed in play.
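For context, prefix caching is a real, documented feature of local inference engines. Here is a sketch using vLLM; the flag name follows recent versions of the library, and the model choice is an assumption.

```python
# Sketch: enabling prefix caching in vLLM. A second prompt sharing a long
# prefix with the first can reuse cached KV blocks -- the kind of
# "prefix-match hit" the screenshot reportedly shows.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-chat-hf", enable_prefix_caching=True)
params = SamplingParams(temperature=1.0, max_tokens=64)

outputs = llm.generate(["Tell me a joke.", "Tell me a joke about atoms."], params)
for out in outputs:
    print(out.outputs[0].text.strip()[:60])
```

Note that a prefix cache of this kind reuses computation (attention KV blocks), not final answers, so a hit by itself doesn’t prove that whole replies are being recycled.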

So the question becomes: how exactly does the large model cache our chat content?

Good question. The second example shown at the beginning makes it clear that some kind of “clustering” is involved, but it’s unclear how that could be applied to deep, multi-turn conversations.

Setting that question aside, this reminded some people of ChatGPT’s statement that “your data is stored with us, but once the chat ends, the conversation content will be deleted”, and things suddenly clicked.

This inevitably raises data-security worries for some:

Does this mean that the chats we initiate are still saved in their database?

Of course, this worry may be overblown:

Maybe only the query embeddings and cached answers are stored.

So, as the discoverer himself said:

I’m not too worried about the caching itself.
What worries me is that OpenAI is crudely lumping our questions together to answer them, ignoring settings like temperature and directly aggregating prompts with obviously different meanings. That would be harmful and could break many (GPT-4-based) applications.

Of course, not everyone agrees that the above findings prove that OpenAI is really using cached replies.

Their reasoning: the test case the author chose happens to be joke-telling.

After all, in June of this year, two German researchers found that when they asked ChatGPT for a random joke 1,008 times, 90% of the results were variations on the same 25 jokes.

“Scientists and atoms” was the most frequent of all, appearing 119 times.
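A sketch of the tally such a test implies; the model name and whitespace normalization here are assumptions, not the researchers’ exact method.

```python
# Sample many jokes, then count how often each distinct reply appears.
from collections import Counter

def ask_for_joke():
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumption; the study tested ChatGPT
        messages=[{"role": "user", "content": "Tell me a joke!"}],
    )
    # Collapse whitespace so trivially different formatting counts as equal.
    return " ".join(resp.choices[0].message.content.split())

counts = Counter(ask_for_joke() for _ in range(1008))
for joke, n in counts.most_common(5):
    print(n, joke[:60])
```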

So it’s easy to see why it looks as though a previous answer is being replayed from a cache.

Hence some netizens proposed testing with other types of questions before drawing conclusions.

The author, however, insists that the question type doesn’t matter: simply measuring latency makes it easy to tell whether an answer is cached.
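A sketch of that latency check; a reply served from a cache should return in a fraction of the time needed to generate one token by token.

```python
# Time each identical request; suspiciously fast replies suggest a cache hit.
import time

for i in range(5):
    t0 = time.perf_counter()
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Tell me a joke."}],
    )
    dt = time.perf_counter() - t0
    print(f"call {i}: {dt:.2f}s -> {resp.choices[0].message.content[:40]}")
```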

Finally, let’s look at this question from a “different perspective”:

What’s wrong with GPT-4 telling the same joke every time?

Haven’t we always emphasized that large models should output consistent, reliable answers? Look how obedient it’s being (doge).

So, does GPT-4 use a cache or not? Have you observed anything similar?

Reference Links:
