Web3 data exploration and analysis using SQL on Dune
I want to believe, you aren’t here by mistake. You want to know how to be an efficient data analyst in web3. Are you trying to build your Web3 analytics dashboard or just getting interested in web3 data? Either way, Welcome!
As a data person, the most challenging part of Web3 data is understanding the data itself. In my experience, one thing that makes it easy for me is to see each as a giant data warehouse and protocols on top of the Ethereum blockchain as various schemas containing multiple tables and views in the warehouse. PS: This applies to other blockchains too.
For this article, I’m going to try to make things succinct and highlight them using these topics:
- Web3 Data Flow and Landscape: Understanding Web3, decentralized data storage,
- Tools for decoding transaction data: an overview of tools along with the web3 data stack, and how to leverage them
Let’s get into it!
The crux of web3 is focused on becoming more independent and autonomous.
This is achieved through blockchains and distributed peer-to-peer networks instead of server-client relationships. This makes for a different data flow and format than we’re used to in Web2.
From research, I found out data mappers like Dune Analytics and Flipside Crypto with decoded blockchain data and built-in visualization tooling for data scientists interested in Web3 to work and build on.
The image above shows the different ways data can be extracted for analysis from different blockchain schemas, using Ethereum as an example in this case.
If you’ve ever made a transaction on Ethereum (or any smart contract blockchain), then you’ve probably looked it up on a block explorer and seen this kind of information:
Learning to read the details of a transaction will be the foundation for all Ethereum data analysis and knowledge, but it’s important to understand the flow of a token and transaction data.
I’m probably biased toward using the Ethereum blockchain to explain decoding blockchain data but from experience, Ethereum is easier to understand when you learn how data is stored and managed on smart contracts. That being said it’s important to note that many of the concepts I will be talking about will apply broadly to all EVM-compatible chains and smart contracts eg Polygon, Fantom Opera, BSC, Arbitrum One, etc.
A smart contract transaction is comparable to a backend API call in a smart contract powered web3 application. The details of the activities on a smart contract and its resulting application state changes are recorded in data elements known as transactions, calls, and logs. The transaction data element represents the function call initiated by a user (or EOA to be more precise), the call data elements represent additional function calls initiated within the transaction by the smart contract, and the log data elements represent events that have occurred during the transaction execution.
To understand the transaction data on the Ethereum blockchain, we can try using Etherscan(as a base for transaction examples, I used a Mirror Crowdfund contract). This is like checking the dashboard of your car for pointers to fix what’s wrong with your car but as a data scientist, we must pop the hood of the car and know what’s happening behind the scenes. Typically there are 3 distinct types of transactions:
- Transfer of value in the form of the base currency from one User (EOA) to another, eg John sends Jane 3 ETH on the Ethereum network
- Creation of a smart contract by an EOA, eg Jack commits code to an address on the blockchain, creating a smart contract that a smart contract to transfer ownership of an NFT once a certain amount of resources have been transferred to the seller’s account(or wallet ).
- Call to a smart contract by an EOA, eg Jane calls Jack’s smart contract to exchange 999ETH for the NFT.
When a user makes a request on a blockchain, what happens underneath the hood is:
- The EOA associated with the user initiates a transaction that specifies the target smart contract address, the target function, the arguments for that function, the transaction payment (if any), and the gas fee that it is willing to pay (if any).
- The transaction is broadcast to the network and picked up by a willing miner who executes the specified function in the target smart contract.
- If execution is successful, the smart contract emits events that mark the completion of certain milestones. The resulting event data structure is called `logs`.
- The target smart contract may initiate internal transactions (additional calls) to other smart contracts. These internal transactions create data structures called `traces`, and may also emit additional log events during their respective executions.
How the data is structured
Now that you’ve understood the activities that happen behind the scene and corresponding data elements. It’s time to itemize the various elements that make up the web3 landscape. The transaction and trace data structures contain details of the smart contract function call, in this order;
- On the blockchain, each transaction has a unique hash `transaction hash` with all sorts of transaction details. For example, we can look up this transaction hash
- `From` is the sending party of the transaction, and `To` is the receiving party of the transaction, which can also be a contract address. In this case, it’s a transfer from
Valueis the ETH value that was transferred. Sometimes the ETH value can be 0 and in this case, it is.
Transaction Fee = Gas Used by Transaction * Gas Priceand you can find more details related to Ethereum gas here.
- Input data contains additional information about the transaction. In this case, it’s a transfer function with details including the address and value of the transaction.
- The data itself is bytecode and has datatype
- The first 8 characters
MethodIdof the function signature, which is not always unique (but should).
- Every subsequent 64 characters is a different input variable. Here we have two, corresponding to the
Based on the information above, we can use the following query to pull data on Dune –
Tools for accessing and working with Web3 data
Now that we have a good understanding of elements of Web3 data, what it looks like, and have queried our first transaction data on the blockchain, you might wonder how to quickly get yourself familiar with a protocol and its data. There is an array of great tools to help with that.
As earlier mentioned, using a block explorer is a great way to get insights into a transaction on a blockchain. Block explorers are online browsers that can show the details of all transactions that have ever happened on a blockchain network. See the screenshot above for an example and it’s important to know all major blockchains have explorers — prominent examples include Etherscan, Polygonscan, BSCScan, Solana beach, etc
Even though block explorers are great for interrogating individual records within the blockchain ledger, they are not great for answering questions that require aggregation or transformation of the data. For example, if you wanted to know how many NFTs were sold through the Opensea exchange in the last 3 months, or if one would like to know the frequency of flow of transactions from `Coinbase` to `Axie Infinity`, it would be very difficult to answer that with just block explorers. For that one will need direct access to the data. This has been covered in my previous post.
Analyzing The Data
Dune analytics is a great resource for accessing and analyzing blockchain data. It has both raw and decoded data for Ethereum, Solana, Xdai, Polygon, Optimism, and BSC as of the writing of this post. It offers a PostgreSQL interface for querying the datasets and a simple point and click interface for creating simple dashboards on top of the query results. The community of users on Dune is also quite active and has generated an extensive library of example queries and dashboards to learn from. Here are a couple of examples analyses that have been created on Dune
If you’re completely new to SQL, I would recommend these useful tips and guidelines for Dune Analytics
- Inline Ethereum addresses: When working in Dune, Ethereum addresses are stored as PostgresSQL bytearrays which are encoded with
xprefix rather than the customary
0xwhen you look up on Etherscan.
- Identifying contract addresses in transactions: When trying to differentiate a contract address from a normal one, and you can do this by check if
CREATEopcode was called in
3. Leveraging Labels on Dune: A label is a piece of metadata about an address, a tag or metadata if you will. It comes in the form of a key-value pair. The key is the label type, and the value the label name. Essentially you can either use labels to look up addresses and see what they are (see the label table below). You can get more insights on what labels are and how to use them here
Using this UDF if you wanted to get labels for addresses you look at, you do this;
4. Quotation mark for table and column names: PostgresSQL doesn’t recognise camel case columns and table names without quotation marks. So in Postgres, double quotes are reserved for tables and columns, whereas single quotes are reserved for values as a result, Dune would throw an error if you query a table with upper case in the table name.
5. Removing Decimals: When working with token amounts, remember to check the decimals in either
6. Generating time series: When dealing with a dataset with discontinuous time series, use the following
Hopefully, this was a useful discussion and you already know enough to do basic Web3 data exploration and analysis using SQL on Dune, what it looks like, and how to work with it. When analyzing economic activity and user behavior in web3, it is important to develop an interest in the understanding of how the specific smart contract works, which goes on to help identify the key functions and events involved in the metric of interest.
Web3 is going through rapid development and high variance, where new ideas, products, communities, and experiments spring up every day. It currently touches aspects of life, including but not limited to payment, finance, art, music, gaming, community, governance, and identity, which makes it exciting for anyone to participate.
So please understand that a combination of actually using the Web3 product, examining the data exhaust on block explorers like Etherscan, and developing the smart contract source code, is a crucial requisite for the right strategy for data analysis.
I’m still learning, and feel free to reach out if you need help getting started.