
OpenAI launched the "game-changing" product GPT-4. Can it fully detect vulnerabilities in smart contracts?

Summary: Is AI security auditing reliable?
Beosin
2023-03-16 11:53:13

Author: Beosin

In the early morning of March 15, Beijing time, the artificial intelligence startup OpenAI officially announced the latest generation of its AI language model, GPT-4.

OpenAI stated in its announcement that GPT-4 is the latest milestone in its effort to scale up deep learning.

So what surprises will the evolved GPT-4 bring us?

How "explosive" is the evolved GPT-4?

According to OpenAI's official introduction, GPT-4 is a large multimodal model: it accepts images as well as text (up to roughly 25,000 words) as input.

Where does this striking capability show? Take the photo below as an example.

Ask it what will happen if the glove in the picture falls.

It answers: the glove will fall onto the wooden board, and the ball will be sent flying. (Can you imagine this kind of reasoning?)

It can even build a website from a rough sketch drawn on paper.

Upload a photo to GPT-4, and it can immediately generate the HTML code for the website!

GPT-4 is evidently more reliable and creative than GPT-3.5, and it can handle more nuanced instructions.

GPT-4 has also improved significantly over its predecessor in accuracy and reasoning. On the Uniform Bar Exam, GPT-4 scored around the top 10% of test takers, while GPT-3.5 scored around the bottom 10%. On SAT Math, GPT-4 scored 700 against GPT-3.5's 590, an improvement of 110 points. GPT-4 also performs much better than GPT-3.5 on other standardized tests.

In the official demonstration, GPT-4 needed only about 1-2 seconds to recognize the hand-drawn website sketch, then generated the page code in real time, producing a website almost identical to the drawing.

Besides ordinary images, GPT-4 can also handle more complex image information, including tables, screenshots of exam questions, screenshots of papers, comics, etc., such as providing summaries and key points directly from professional papers.

Faced with something this powerful, don't you feel your job might be at risk?

GPT-4 can interpret papers. Source: OpenAI official website

What happens when GPT-4 is used to audit smart contracts?

Last December, we published a research article on ChatGPT to see what would happen when it audited smart contracts. Further reading: The "strongest AI" ChatGPT that is sweeping the internet, can it detect smart contract vulnerabilities?

On March 15, Coinbase executive Conor Grogan posted on social media that he had pasted a live Ethereum smart contract into GPT-4, and the AI immediately pointed out security vulnerabilities and even showed how they could be exploited.

Conor Grogan said the contract had in fact been exploited by hackers in 2018. He also tried Euler's smart contract, but it was too long for GPT-4 to process. In his view, AI will ultimately make smart contracts safer and easier to build.

Some community members also suggested that ChatGPT appears able to identify the vulnerability behind the roughly $200 million Euler Finance theft that occurred two days earlier. Related reading: Reviewing the ins and outs of the $200 million theft case of Euler Finance, what insights does this event bring us?

But is it really that simple?

Image source: Internet

In fact, like the earlier GPT models, GPT-4 still has certain limitations.

OpenAI itself states that the model is not fully reliable and can make reasoning errors: "GPT-4 generally lacks knowledge of events that occurred after the vast majority of its data cuts off (September 2021), and it does not learn from experience… It can sometimes make simple reasoning errors, it can be overly gullible in accepting obviously false statements from a user, and sometimes it fails at hard problems the same way humans do, such as introducing security vulnerabilities into the code it produces."

For this reason, OpenAI urges users to take great care when using language models: ideally, pair them with human review, ground them with additional context, or avoid using them in high-stakes situations altogether.

ChatGPT vs. Beosin VaaS: who audits contracts better?

Beosin's formal verification experts said: "ChatGPT can learn complex contract patterns and understand and classify contracts along different dimensions. This can help static detection techniques strengthen their expert models, increase the types of vulnerabilities they can identify, and reduce false negative and false positive rates. It can also help link attribute-based testing and verification techniques with domain attribute libraries, so that testing and verification become fully automated through automatic contract recognition and attribute insertion. However, ChatGPT struggles to identify deep, domain-specific logic vulnerabilities that evolve rapidly. These are often closely tied to project requirements and need domain security experts to act as judges, continually summarizing their findings into domain attribute libraries that decide whether a contract is secure."

We likewise found that ChatGPT cannot solve every problem: many vulnerabilities can only be caught by rigorous expert audits or by formal verification tools such as Beosin VaaS.

Beosin VaaS is a globally leading "one-click" formal verification platform for smart contracts. It has an accuracy rate of over 97% and automatically detects more than 80 common security vulnerabilities and functional logic defects, pinpointing the risky code and providing expert remediation advice. It also supports detection of hundreds of common security vulnerabilities and business logic defects in smart contracts on all EVM and WASM public chains, helping developers improve the security of their contracts.

Formal verification tool Beosin VaaS: https://vaas.beosin.com/

For example, in the March 15 attack on Poolz Finance's LockedDeal contract, which we issued a warning about, the attacker called the vulnerable function CreateMassPools in the LockedDeal contract and triggered an integer overflow through the parameter _StartAmount. We tested this vulnerability and found that the VaaS tool could detect it, but ChatGPT could not.
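To make the overflow concrete, here is a minimal Python sketch that models uint256 arithmetic. It assumes, for illustration only, that the contract adds up the entries of _StartAmount without an overflow check. The helper names unchecked_sum and checked_sum are hypothetical and are not taken from the LockedDeal source.

```python
UINT256_MAX = 2**256 - 1  # largest value a Solidity uint256 can hold


def unchecked_sum(amounts):
    """Add amounts the way unchecked uint256 arithmetic does: wrap modulo 2**256."""
    total = 0
    for amount in amounts:
        total = (total + amount) & UINT256_MAX
    return total


def checked_sum(amounts):
    """Add amounts but refuse any total that does not fit in a uint256."""
    total = 0
    for amount in amounts:
        total += amount
        if total > UINT256_MAX:
            raise OverflowError("uint256 overflow while summing amounts")
    return total


# Hypothetical attacker input: two entries whose true sum exceeds 2**256 - 1.
start_amounts = [UINT256_MAX, 2]
wrapped_total = unchecked_sum(start_amounts)  # wraps around to a tiny number
```

Under these assumptions the wrapped total is tiny even though the individual entries are enormous, so a contract that charges the caller based on the total while recording each entry separately would be badly out of balance. The checked variant rejects the same input.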

At the same time, ChatGPT also fails to detect deep logical issues related to k-value verification.

In DEXs like Uniswap, the actual token exchange and transfer is implemented in the Pair contract's swap() function. To stop attackers from bypassing the Router contract and calling swap() on the Pair contract directly, the Pair's swap() function must perform k-value verification itself: after the swap, the product of the pair's reserves (the k-value) must be preserved. If the code that performs this check is flawed, an attacker can drain a large share of the tokens in the Pair in exchange for a tiny amount of tokens.

The contract's cheapSwap function does not check k-value
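The effect of a missing k-value check can be sketched with a toy constant-product pair in Python. This is a simplified model under stated assumptions: fees are ignored, token transfers are reduced to integer bookkeeping, and the class and the method name cheap_swap are hypothetical stand-ins rather than the code of any real contract.

```python
class Pair:
    """Toy constant-product pair (assumption: no fees, integer reserves)."""

    def __init__(self, reserve0: int, reserve1: int) -> None:
        self.reserve0 = reserve0
        self.reserve1 = reserve1

    def _apply(self, amount0_in: int, amount1_out: int, check_k: bool) -> None:
        new0 = self.reserve0 + amount0_in
        new1 = self.reserve1 - amount1_out
        if amount1_out <= 0 or new1 <= 0:
            raise ValueError("invalid output amount")
        # k-value verification: the reserve product must not shrink.
        if check_k and new0 * new1 < self.reserve0 * self.reserve1:
            raise ValueError("K: reserve product decreased")
        self.reserve0, self.reserve1 = new0, new1

    def swap(self, amount0_in: int, amount1_out: int) -> None:
        """Swap with k-value verification."""
        self._apply(amount0_in, amount1_out, check_k=True)

    def cheap_swap(self, amount0_in: int, amount1_out: int) -> None:
        """Hypothetical vulnerable variant that skips the k-value check."""
        self._apply(amount0_in, amount1_out, check_k=False)


# Paying 1 unit of token0 to withdraw 900 units of token1:
guarded = Pair(1000, 1000)
unguarded = Pair(1000, 1000)
unguarded.cheap_swap(1, 900)  # accepted because k is never verified
```

In this model, guarded.swap(1, 900) raises an error because the reserve product would fall from 1,000,000 to 100,100, while the unguarded pair accepts the same call and gives up 90% of its token1 reserve for a single unit of token0.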

By studying the k-value verification issue, we summarized its characteristics and extracted general attributes that the VaaS tool can check. We then analyzed node data and extracted contract information from a total of 140,000 addresses on ETH and BSC. The contracts at these addresses all implement similar business logic and may therefore have k-value verification issues.

Beyond the VaaS tool itself, Beosin's formal verification experts take the security issues distilled by security auditors and abstract them, using rigorous mathematical logic, into reusable security attribute invariants. These are handed to a hybrid machine engine for automated detection, testing, and verification. Practice has shown that these reusable invariants are effective at uncovering new, subtle vulnerabilities in smart contracts. This is work that AI such as ChatGPT cannot replace.
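The idea of a reusable invariant can be illustrated with a small Python sketch. It is not Beosin's engine and it is not formal verification: it uses plain randomized testing, and the invariant (total token supply is conserved), the two transfer functions, and the self-transfer bug are all hypothetical examples chosen for illustration.

```python
import random


def supply_conserved(balances, total_supply):
    """Reusable invariant: the balances must always add up to the total supply."""
    return sum(balances.values()) == total_supply


def safe_transfer(balances, src, dst, amount):
    """Correct transfer: debit and credit are applied one after the other."""
    if balances[src] < amount:
        return False
    balances[src] -= amount
    balances[dst] += amount
    return True


def buggy_transfer(balances, src, dst, amount):
    """Hypothetical bug: cached balances let a self-transfer mint tokens."""
    if balances[src] < amount:
        return False
    src_balance = balances[src]
    dst_balance = balances[dst]
    balances[src] = src_balance - amount
    balances[dst] = dst_balance + amount  # overwrites the debit when src == dst
    return True


def find_violation(transfer, trials=1000, seed=0):
    """Run random transfers and return the first call that breaks the invariant."""
    rng = random.Random(seed)
    accounts = ["a", "b", "c"]
    balances = {account: 100 for account in accounts}
    total_supply = sum(balances.values())
    for _ in range(trials):
        src, dst = rng.choice(accounts), rng.choice(accounts)
        amount = rng.randint(0, 50)
        transfer(balances, src, dst, amount)
        if not supply_conserved(balances, total_supply):
            return (src, dst, amount)
    return None
```

The same supply_conserved check is applied unchanged to both implementations: find_violation(safe_transfer) finds nothing, while find_violation(buggy_transfer) returns a counterexample in which the sender and the receiver are the same account. Writing the property once and reusing it across many contracts is the point of the invariant-based approach described above.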

An opinion piece titled "The False Promise of ChatGPT", published on the New York Times website on March 8, strikes a similar note of caution. Its authors wrote: "Today our supposedly revolutionary advancements in artificial intelligence are indeed cause for both concern and optimism. Optimism because intelligence is the means by which we solve problems. Concern because we fear that the most popular and fashionable strain of AI (machine learning) will degrade our science and debase our ethics by incorporating into our technology a fundamentally flawed conception of language and knowledge."
