Data Insertion and Data Persistence in a Vector Database
In the previous post in the Deep Dive series, we introduced how data is processed in Milvus, the world’s most advanced vector database. This article examines the components involved in data insertion, illustrates the data model in detail, and explains how data persistence is achieved in Milvus.
Milvus Architecture Recap
The SDK sends data requests to the proxy, which acts as the portal, via a load balancer. The proxy then interacts with the coordinator service to write DDL (data definition language) and DML (data manipulation language) requests into message storage.
Worker nodes, including the query node, data node, and index node, consume the requests from message storage. More specifically, the query node is in charge of data query; the data node is responsible for data insertion and data persistence; and the index node mainly deals with index building and query acceleration.
The bottom layer is object storage, which mainly leverages MinIO, S3, and AzureBlob for storing logs, delta binlogs, and index files.
The Portal of Data Insertion Requests
The proxy serves as a portal of data insertion requests.
- Initially, the proxy accepts data insertion requests from SDKs and distributes those requests into several buckets using a hash algorithm.
- Then the proxy asks the data coord to assign segments, the smallest unit of data storage in Milvus.
- Afterward, the proxy writes the information about the assigned segments into the message store so that this information is not lost.
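The bucketing step can be pictured with a minimal sketch. It assumes rows are hashed by primary key and that one bucket maps to one channel. The hash function (crc32) and the names bucket_rows and num_channels are illustrative choices, not taken from the Milvus source.

```python
import zlib
from collections import defaultdict


def bucket_rows(rows, num_channels):
    """Group rows into buckets by hashing each row's primary key."""
    buckets = defaultdict(list)
    for row in rows:
        # crc32 is a stand-in hash (an assumption); it is deterministic across runs.
        key = str(row["pk"]).encode("utf-8")
        channel = zlib.crc32(key) % num_channels
        buckets[channel].append(row)
    return dict(buckets)


# Eight hypothetical rows spread over two buckets.
rows = [{"pk": i, "vector": [0.1 * i, 0.2 * i]} for i in range(8)]
buckets = bucket_rows(rows, num_channels=2)
```

Because the hash depends only on the key, the same row always lands in the same bucket, which is what allows each bucket to be handled independently downstream.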
Data Coord and Data Node
The main function of data coord is to manage channel and segment allocation, while the main function of data node is to consume and persist inserted data.
Data coord serves in the following aspects:
- Allocate segment space. Data coord allocates space in growing segments to the proxy so that the proxy can use free space in segments to insert data.
- Record segment allocation and the expiry time of the allocated space in the segment. The space within each segment allocated by the data coord is not permanent. Therefore, the data coord also needs to keep a record of the expiry time of each segment allocation.
- Automatically flush segment data. If the segment is full, the data coord automatically triggers data flush.
- Allocate channels to data nodes. A collection can have multiple vchannels. Data coord determines which vchannels are consumed by which data nodes.
Data node serves in the following aspects:
- Consume data. Data node consumes data from the channels assigned by the data coord and creates a sequence for the data.
- Data persistence. Data node caches inserted data in memory and automatically flushes the data to disk when the data volume reaches a certain threshold.
As shown in the image above, the collection has four vchannels (V1, V2, V3, and V4), and there are two data nodes. The data coord will likely assign one data node to consume data from V1 and V2, and the other to consume data from V3 and V4. A vchannel cannot be assigned to multiple data nodes. Otherwise the same data would be consumed more than once, and the same batch of data would be inserted into the same segment repeatedly.
Root Coord and Time Tick
Root coord manages the TSO (timestamp oracle) and publishes time tick messages globally. Each data insertion request has a timestamp assigned by root coord. Time tick is the cornerstone of Milvus: it acts as the system's clock and indicates the point in time the Milvus system has reached.
Each data insertion request carries a timestamp when data are written in Milvus. Each time, the data node consumes data whose timestamps are within a certain range during data consumption.
The image above shows the process of data insertion. The timestamps are represented by the numbers 1, 2, 6, 5, 7, and 8. The data are written into the system by two proxies, p1 and p2. During data consumption, if the current time of the time tick is 5, data nodes can only read data 1 and 2. During the second read, if the current time of the time tick becomes 9, data 6, 7, and 8 can be read by the data node.
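The two reads described above can be replayed in a few lines. This is only a sketch: it assumes a message becomes readable once its timestamp is strictly below the current time tick, and it treats 5 and 9 as tick values rather than insert timestamps. The function name consume_until is hypothetical.

```python
def consume_until(pending, time_tick):
    """Split pending timestamps into those readable at this tick and those that must wait."""
    # Assumption: a message is readable when its timestamp is strictly below the tick.
    readable = sorted(ts for ts in pending if ts < time_tick)
    waiting = [ts for ts in pending if ts >= time_tick]
    return readable, waiting


pending = [1, 2, 6, 7, 8]  # insert timestamps written by proxies p1 and p2

first_read, pending = consume_until(pending, time_tick=5)
print(first_read)   # [1, 2]

second_read, pending = consume_until(pending, time_tick=9)
print(second_read)  # [6, 7, 8]
```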
Data Organization: Collection, Partition, Shard (Channel), Segment
Read this article first to understand the data model in Milvus and the concepts of collection, shard, partition, and segment.
In summary, the largest data unit in Milvus is a collection, which can be likened to a table in a relational database. A collection can have multiple shards (each corresponding to a channel) and multiple partitions within each shard. In the illustration above, channels (shards) are the vertical bars and partitions are the horizontal ones. At each intersection is a segment, the smallest unit for data allocation. In Milvus, indexes are built on segments. During a query, Milvus also balances query loads across query nodes, and this balancing is done in units of segments. A segment contains several binlogs, and a binlog file is generated when the segment data are consumed.
Segments in Milvus have three statuses: growing, sealed, and flushed.
A growing segment is a newly created segment that can be assigned to the proxy for data insertion. The internal space of a growing segment can be used, allocated, or free.
- Used: this part of the space of a growing segment has been consumed by the data node.
- Allocated: this part of the space of a growing segment has been requested by the proxy and assigned by the data coord. Allocated space will expire after a certain period.
- Free: this part of the space of a growing segment has not been used. The free space equals the total space of the segment minus the used and allocated space, so the free space of a segment increases as allocated space expires.
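The relationship between used, allocated, and free space can be sketched as follows. The class and field names (GrowingSegment, Allocation, expires_at) and the row-based units are assumptions made for illustration. They do not mirror the actual Milvus data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Allocation:
    rows: int
    expires_at: int  # logical time at which this reservation lapses


@dataclass
class GrowingSegment:
    capacity: int
    used: int = 0
    allocations: list = field(default_factory=list)

    def allocated(self, now):
        # Only reservations that have not yet expired count as allocated.
        return sum(a.rows for a in self.allocations if a.expires_at > now)

    def free(self, now):
        # free = total - used - allocated
        return self.capacity - self.used - self.allocated(now)


seg = GrowingSegment(capacity=1000, used=200)
seg.allocations.append(Allocation(rows=300, expires_at=10))

free_before = seg.free(now=5)   # 500: the reservation is still valid
free_after = seg.free(now=10)   # 800: the reservation has expired
```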
A sealed segment is a closed segment that can no longer be assigned to the proxy for data insertion.
A growing segment is sealed in the following circumstances:
- If the used space in a growing segment reaches 75% of the total space, the segment will be sealed.
- Flush() is manually called by a Milvus user to persist all data in a collection.
- Growing segments that have not been sealed after a long period are sealed, because too many growing segments cause data nodes to over-consume memory.
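The three sealing conditions can be combined into a single predicate. This is a sketch under stated assumptions: only the 75% proportion comes from the text above. The one-hour lifetime and all parameter names are hypothetical placeholders.

```python
def should_seal(used_rows, capacity_rows, age_seconds, flush_requested,
                seal_proportion=0.75, max_lifetime_seconds=3600):
    """Return True if a growing segment should be sealed."""
    if flush_requested:
        # A user called flush manually.
        return True
    if used_rows >= seal_proportion * capacity_rows:
        # Used space has reached the sealing proportion (75% in the text).
        return True
    # The segment has been growing for too long (the limit here is an assumed value).
    return age_seconds >= max_lifetime_seconds
```

For example, a segment with 750 of 1,000 rows used would be sealed under the first size check, while one with 100 rows used and no flush request would stay open until the lifetime limit is reached.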
A flushed segment is a segment that has already been written to disk. Flush refers to storing segment data in object storage for data persistence. A sealed segment can only be flushed once its allocated space has expired. When flushed, the sealed segment turns into a flushed segment.
A channel is assigned:
- When a data node starts or shuts down; or
- When the proxy requests segment space allocation.
There are several strategies for channel allocation. Milvus supports two of them:
1. Consistent hashing
The default strategy in Milvus. This strategy leverages the hashing technique to assign each channel a position on the ring, then searches in a clockwise direction to find the nearest data node to a channel. Thus, in the illustration above, channel 1 is assigned to data node 2, while channel 2 is assigned to data node 3.
However, one problem with this strategy is that an increase or decrease in the number of data nodes (e.g., a new data node starts or a data node suddenly shuts down) can affect channel allocation. To solve this issue, the data coord monitors the status of data nodes via etcd, so that it is notified immediately of any change in the status of data nodes. The data coord then determines which data node each channel should be allocated to.
2. Load balancing
The second strategy is to allocate channels of the same collection to different data nodes, ensuring the channels are evenly assigned. The purpose of this strategy is to achieve load balance.
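Both strategies can be sketched side by side. This is an illustrative sketch, not the Milvus implementation: the ring uses an MD5-based position without virtual nodes, the load-based variant simply picks the node with the fewest channels so far, and all function and node names are hypothetical.

```python
import bisect
import hashlib


def ring_position(name):
    """Map a name to a position on the hash ring (MD5 is a stand-in choice)."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def assign_by_consistent_hash(channels, nodes):
    """Assign each channel to the nearest node in the clockwise direction."""
    ring = sorted((ring_position(n), n) for n in nodes)
    positions = [pos for pos, _ in ring]
    assignment = {}
    for ch in channels:
        # First node at or after the channel's position; wrap around at the end.
        idx = bisect.bisect_left(positions, ring_position(ch)) % len(ring)
        assignment[ch] = ring[idx][1]
    return assignment


def assign_by_load(channels, nodes):
    """Assign each channel to the node that currently holds the fewest channels."""
    counts = {n: 0 for n in nodes}
    assignment = {}
    for ch in channels:
        target = min(nodes, key=lambda n: counts[n])
        assignment[ch] = target
        counts[target] += 1
    return assignment


channels = ["V1", "V2", "V3", "V4"]
nodes = ["node-1", "node-2", "node-3"]
hashed = assign_by_consistent_hash(channels, nodes)
balanced = assign_by_load(channels, nodes)
```

A useful property of the ring is that removing a node only moves the channels that node was holding. The load-based variant gives an even spread but makes no such stability guarantee.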
Data Allocation: When and How
The process of data allocation starts with the client. It first sends data insertion requests with a timestamp t1 to the proxy. Then the proxy sends a request to the data coord for segment allocation.
Upon receiving the segment allocation request, the data coord checks segment status and allocates the segment. If the current space of the created segments is sufficient for the newly inserted rows of data, the data coord allocates those created segments. However, if the space available in current segments is not sufficient, the data coord will allocate a new segment. The data coord can return one or more segments upon each request. In the meantime, the data coord also saves the assigned segment in the meta server for data persistence.
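The allocation decision can be sketched as below. It is a simplified sketch: existing segments are filled first and any remainder spills into newly created segments, which is one possible reading of the description above. The dictionary layout, the function name allocate_rows, and the ID scheme are all assumptions.

```python
def allocate_rows(segments, requested_rows, segment_capacity, expires_at):
    """Return a list of (segment_id, rows, expires_at) covering requested_rows."""
    assignments = []
    remaining = requested_rows

    # Use free space in existing growing segments first.
    for seg in segments:
        if remaining == 0:
            break
        take = min(seg["free"], remaining)
        if take > 0:
            seg["free"] -= take
            assignments.append((seg["id"], take, expires_at))
            remaining -= take

    # Create new segments when the existing space is not sufficient.
    while remaining > 0:
        new_id = max((s["id"] for s in segments), default=0) + 1
        seg = {"id": new_id, "free": segment_capacity}
        segments.append(seg)
        take = min(segment_capacity, remaining)
        seg["free"] -= take
        assignments.append((new_id, take, expires_at))
        remaining -= take

    return assignments


segments = [{"id": 1, "free": 100}]
assignments = allocate_rows(segments, requested_rows=250,
                            segment_capacity=200, expires_at=2000)
print(assignments)  # [(1, 100, 2000), (2, 150, 2000)]
```

The sketch returns more than one segment for a single request when the rows do not fit in one, matching the note that the data coord can return one or more segments.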
Subsequently, the data coord returns the information of the assigned segment (including segment ID, number of rows, expiry time t2, etc.) to the proxy. The proxy sends this information to the message store so that it is properly recorded. Note that the value of t1 must be smaller than that of t2. The default value of t2 is 2,000 milliseconds, and it can be changed by configuring the parameter segment.assignmentExpiration.
Binlog File Structure and Data Persistence
The data node subscribes to the message store because data insertion requests are kept in the message store, and the data nodes can thus consume insert messages. The data nodes first place insert requests in an insert buffer, and as the requests accumulate, they will be flushed to object storage after reaching a threshold.
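The buffering behavior can be illustrated with a small sketch. It assumes a row-count threshold and a caller-supplied flush function standing in for the write to object storage. The class name InsertBuffer and its parameters are hypothetical.

```python
class InsertBuffer:
    """Accumulate inserted rows in memory and flush them once a threshold is reached."""

    def __init__(self, flush_threshold, flush_fn):
        self.flush_threshold = flush_threshold
        self.flush_fn = flush_fn  # stand-in for writing to object storage
        self.rows = []

    def add(self, rows):
        self.rows.extend(rows)
        if len(self.rows) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if self.rows:
            self.flush_fn(list(self.rows))
            self.rows.clear()


flushed = []  # collects every batch handed to the flush function
buf = InsertBuffer(flush_threshold=3, flush_fn=flushed.append)
buf.add([1, 2])   # below the threshold, stays in memory
buf.add([3, 4])   # threshold reached, the whole buffer is flushed
print(flushed)    # [[1, 2, 3, 4]]
```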
Binlog File Structure
The binlog file structure in Milvus is similar to that in MySQL. Binlog is used to serve two functions: data recovery and index building.
A binlog contains many events. Each event has an event header and event data.
Metadata, including binlog creation time, write node ID, event length, NextPosition (offset of the next event), etc., are written in the event header.
Event data can be divided into two parts: fixed and variable.
The fixed part in the event data of an
The variable part stores the inserted data. The inserted data are serialized into Parquet format and stored in this file.
If there are multiple columns in the schema, Milvus will store binlogs in columns.
As illustrated in the image above, the first column is the primary key binlog. The second one is the timestamp column. The rest are the columns defined in the schema. The file path of binlogs in MinIO is also indicated in the image above.
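The column-wise layout can be sketched by splitting rows into one list per field, with the primary key first and the timestamp second. The field names and the function rows_to_columns are hypothetical, and the sketch stops short of the Parquet serialization and binlog event framing.

```python
def rows_to_columns(rows, pk_field, ts_field, schema_fields):
    """Split rows into per-column lists: primary key, timestamp, then schema fields."""
    order = [pk_field, ts_field] + [
        f for f in schema_fields if f not in (pk_field, ts_field)
    ]
    # Each entry would become a separate binlog in the column-wise layout.
    return {name: [row[name] for row in rows] for name in order}


rows = [
    {"id": 1, "ts": 100, "age": 30, "embedding": [0.1, 0.2]},
    {"id": 2, "ts": 101, "age": 41, "embedding": [0.3, 0.4]},
]
columns = rows_to_columns(rows, pk_field="id", ts_field="ts",
                          schema_fields=["id", "age", "embedding"])
print(list(columns))  # ['id', 'ts', 'age', 'embedding']
```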
With the official announcement of the general availability of Milvus 2.0, we orchestrated this Milvus Deep Dive blog series to provide an in-depth interpretation of the Milvus architecture and source code. Topics covered in this blog series include: