
Redis decentralization

Publish: 2021-05-21 08:05:43
1. Difference between redis and memcached

Problems encountered by the traditional MySQL + memcached architecture
MySQL is suitable for mass data storage, and hot data is loaded into memcached to speed up access. Many companies used this architecture, but as business data volume and traffic kept growing, they ran into many problems:
1. MySQL has to be sharded again and again, and memcached has to keep expanding, which takes up a lot of development time
2. Keeping data consistent between memcached and MySQL is hard
3. When the memcached hit rate is low or a machine goes down, a large number of requests penetrate directly to the DB, which MySQL cannot support
4. Cache synchronization across machines
Many NoSQL products are in full bloom; how should we choose among them?
In recent years many kinds of NoSQL products have emerged in the industry, so how to use them correctly and get the most out of their advantages is a problem that deserves deep study and thought. In the final analysis, the most important thing is to understand the positioning of each product. In general, these NoSQL systems mainly solve the following problems:
1. Small data volume with high-speed read/write access. This kind of product guarantees high-speed access by keeping all data in memory, while also providing persistence to disk. This is in fact the main application scenario of Redis
2. Massive data storage, distributed system support, data consistency guarantee, convenient cluster node addition / deletion
3. Dynamo and BigTable are the most representative papers in this field. The former is a completely decentralized design, in which cluster information is propagated between nodes via gossip to guarantee eventual consistency of the data. The latter is a centralized design that guarantees strong consistency through a distributed lock service; data is written to memory and a redo log first and then periodically compacted to disk, turning random writes into sequential writes to improve write performance
4. Schema-free, auto-sharding, etc. For example, some common document databases such as MongoDB are schema-free, store JSON-format data directly, and support auto-sharding and similar features
In the face of these different types of NoSQL products, we need to choose the most appropriate one according to our business scenario
Which scenarios suit Redis, and how to use it correctly
As analyzed above, Redis is best suited to scenarios where all the data fits in memory. Although Redis also provides persistence, it is really more of a disk-backed feature, quite different from persistence in the traditional sense. So you may wonder: Redis looks like an enhanced memcached, so when should you use memcached and when Redis?

If you simply compare the differences between Redis and memcached, most people reach the following conclusions:

1. Redis supports not only simple key/value data but also list, set, zset (sorted set), hash and other data structures

2. Redis supports data backup, i.e. master-slave replication

3. Redis supports persistence: it can keep in-memory data on disk and reload it after a restart

Beyond these points, we can look into the internal structure of Redis to observe more essential differences and understand its design

In Redis (with the old virtual-memory feature), not all data is kept in memory at all times; this is the biggest difference from memcached. Redis caches all key information, and when it finds memory usage exceeding a certain threshold it triggers a swap operation: using the heuristic swappability = age * log(size_in_memory), it computes which keys' values should be swapped to disk, persists those values, and clears them from memory. This feature lets Redis hold more data than fits in its own memory, although the machine's memory must still hold all the keys, since keys are never swapped. While a child thread is swapping in-memory data to disk, it shares that memory with the main thread providing the service, so if a value pending swap is updated, Redis blocks the write until the child thread completes the swap operation
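The swap heuristic above can be sketched in a few lines (an illustrative Python sketch, not the actual C implementation; the 1 + size inside the log is our own guard against log(0)):

```python
import math

def swappability(age_seconds: float, size_in_memory: int) -> float:
    """Score from the old Redis VM heuristic: older and larger values
    are better candidates for swapping to disk."""
    return age_seconds * math.log(1 + size_in_memory)

def pick_swap_candidates(samples, n=2):
    """samples: list of (key, age_seconds, size_in_memory) tuples."""
    ranked = sorted(samples, key=lambda s: swappability(s[1], s[2]), reverse=True)
    return [key for key, _, _ in ranked[:n]]

samples = [("hot", 1, 64), ("old_big", 3600, 4096), ("old_small", 3600, 16)]
print(pick_swap_candidates(samples, n=1))  # ['old_big']: old and large ranks first
```

A recently touched key scores near zero regardless of size, which is why hot data stays in memory.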

A comparison of memory use before and after enabling Redis's virtual-memory model:
VM off: 300k keys, 4096-byte values: 1.3 GB used
VM on: 300k keys, 4096-byte values: 73 MB used
VM off: 1 million keys, 256-byte values: 430.12 MB used
VM on: 1 million keys, 256-byte values: 160.09 MB used
VM on: 1 million keys, values as large as you want, still: 160.09 MB used

When Redis reads a key whose value is not in memory, it must load the value from the swap file before returning it to the requester, which raises an I/O thread pool question. By default Redis blocks: it serves the request only after the value has been fully loaded from the swap file, one request at a time. This strategy suits batch operations with few clients, but in a large website application it clearly cannot handle high concurrency. So when running Redis we can set the size of its I/O thread pool so that read requests needing data from the swap file are handled concurrently, reducing the blocking time
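A minimal sketch of the idea behind that I/O thread pool, with a dict standing in for the swap file (all names here are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in "swap file": loading a value would be a slow, blocking disk read.
SWAP_FILE = {"k1": "v1", "k2": "v2", "k3": "v3"}

def load_from_swap(key):
    # in the real VM implementation this is a blocking read from disk
    return key, SWAP_FILE[key]

def mget_with_pool(keys, pool_size=4):
    """Serve reads whose values live on disk through a small I/O pool,
    so one slow load does not serialize all the others."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return dict(pool.map(load_from_swap, keys))

print(mget_with_pool(["k1", "k3"]))  # {'k1': 'v1', 'k3': 'v3'}
```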

If you want to use Redis in an environment with massive data, understanding Redis's memory design and its blocking behavior is indispensable.

Supplementary knowledge points:
Comparison of memcached and Redis
1. Network IO model
Memcached uses a multi-threaded, non-blocking IO multiplexing network model, divided into a listening main thread and worker sub-threads. The listening thread accepts network connections and, after receiving a request, passes the connection descriptor to a worker thread over a pipe for read/write IO. The network layer uses the event library encapsulated by libevent. The multi-threaded model can exploit multiple cores, but it introduces cache-coherence and locking problems. For example, stats is the most commonly used memcached command, yet every memcached operation must lock and count global variables for it, which costs performance.

(memcached network IO model)
Redis uses a single-threaded IO multiplexing model and encapsulates a simple ae event-handling framework, with implementations mainly for epoll, kqueue and select. For simple IO-only operations, a single thread can maximize the speed advantage, but Redis also provides some computation features such as sorting and aggregation; for these, the single-threaded model seriously hurts overall throughput, because all IO scheduling is blocked while the CPU computation runs
2. Memory management
Memcached uses a pre-allocated memory pool, managing memory with slabs and chunks of different sizes and selecting an appropriately sized chunk for each item. The memory pool saves the cost of allocating/releasing memory and reduces fragmentation, but it brings a certain amount of wasted space, and new data may be evicted even while plenty of memory remains free; for the reasons, see timyang's article: http://timyang.net/data/Memcached-lru-evictions/
Redis allocates memory on demand to store data and rarely uses a free list to optimize allocation, which leads to memory fragmentation to a certain extent. Based on the storage command's parameters, Redis stores data with an expiration time separately and calls it temporary data. Non-temporary data is never deleted: even when physical memory runs out, the swap mechanism will not delete any non-temporary data (though it may try to evict some temporary data). In this respect Redis is more suitable as storage than as a cache
3. Data consistency
Memcached provides the cas command, which guarantees consistency when the same data is touched by multiple concurrent operations. Redis has no cas command, but it provides transactions (and the WATCH command for optimistic check-and-set), which guarantee that a series of commands executes atomically without being interrupted by any other operation.
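The consistency idea behind both memcached's gets/cas pair and Redis's WATCH/MULTI can be illustrated with a toy versioned store (a sketch of the semantics, not either system's real implementation):

```python
class CasStore:
    """Toy key/value store with memcached-style gets/cas semantics."""
    def __init__(self):
        self._data = {}  # key -> (value, version)

    def gets(self, key):
        return self._data.get(key, (None, 0))

    def set(self, key, value):
        _, version = self._data.get(key, (None, 0))
        self._data[key] = (value, version + 1)

    def cas(self, key, value, version):
        """Write only if nobody changed the key since we read it."""
        _, current = self._data.get(key, (None, 0))
        if current != version:
            return False  # lost the race: the caller should re-read and retry
        self._data[key] = (value, version + 1)
        return True

store = CasStore()
store.set("counter", 1)
value, ver = store.gets("counter")
store.set("counter", 5)                      # a concurrent writer sneaks in
print(store.cas("counter", value + 1, ver))  # False: our read is now stale
```

A client that gets False simply re-reads and retries, which is exactly the retry loop used around WATCH/MULTI/EXEC.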
4. Storage mode and other aspects
Memcached only supports simple key/value storage and does not support enumeration, persistence, replication and similar features
Besides key/value, Redis supports many data structures such as list, set, sorted set and hash, and provides the KEYS command for enumeration, although it cannot safely be used online; instead, Redis ships a tool that scans its dump (RDB) files directly and enumerates all the data. Redis also provides persistence and replication
5. Client support in different languages
In terms of clients for different languages, both memcached and Redis have a wealth of third-party clients to choose from. However, memcached has been around longer, so many of its clients are more mature and stable, while the Redis protocol is more complex than memcached's. In addition, the author keeps adding new features, and third-party clients may not keep up, so sometimes you may need to modify a third-party client to make the most of it
From the comparison above: when you don't want data to be evicted, or need more data types than key/value, or need persistence, Redis is more suitable than memcached
About some peripheral features of Redis
Besides storage, Redis provides some other features, such as aggregate computation, Pub/Sub and scripting. For this kind of feature we need to understand the implementation principle and its limitations clearly before using it properly. For example, Pub/Sub has no persistence support, so a consumer loses all messages published while its connection is broken or being re-established. Features like aggregate computation and scripting are limited by the single-threaded Redis model and cannot reach high throughput, so they must be used with caution
Generally speaking, the Redis author is a very diligent developer, and we can often see him trying out a variety of new ideas; these features deserve a deep understanding before use. Summary:
1. The best way to use redis is to use all data in memory
2. Redis is used as a substitute for memcached in more scenarios
3. When more data types than key / value are needed, redis is more suitable
4. When the stored data must not be evicted, Redis is more suitable.
2. The Redis Cluster design weighed decentralization against middleware and chose the former: every node in the cluster is an equal peer, and each node keeps both its own data and the state of the whole cluster
Each node connects to all the other nodes, and these connections stay active, which guarantees that we only need to connect to any one node in the cluster to learn about the data held on the other nodes.
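Because every node is a peer, a client only needs a deterministic key-to-slot mapping to find the right owner; Redis Cluster maps each key to one of 16384 hash slots using CRC16 (the XMODEM variant). A sketch of the slot computation (hash tags in `{...}`, which Redis also honors, are omitted here):

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (poly 0x1021, init 0), the variant Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Hash slot a key belongs to; every cluster node agrees on this mapping."""
    return crc16_xmodem(key.encode()) % 16384

print(crc16_xmodem(b"123456789") == 0x31C3)  # True: standard CRC-16/XMODEM check value
print(0 <= key_slot("user:1000") < 16384)    # True: slots always fall in 0..16383
```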
3. Redis is an in-memory database with very fast access, so it can solve a number of cache-type problems, such as:
1, session cache
2, full page cache (FPC)
3, queue
4, leaderboard / counter
5, publish / subscribe
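The session-cache case above is essentially SETEX/GET with lazy expiry; here is a minimal self-contained sketch (SessionCache is a hypothetical name, and the clock is injectable so the example is deterministic):

```python
import time

class SessionCache:
    """Minimal TTL cache mimicking SETEX/GET semantics."""
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}  # session_id -> (data, expires_at)

    def setex(self, session_id, ttl_seconds, data):
        self._store[session_id] = (data, self._clock() + ttl_seconds)

    def get(self, session_id):
        entry = self._store.get(session_id)
        if entry is None:
            return None
        data, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[session_id]  # lazy expiry, much like Redis
            return None
        return data

# fake clock so the example runs instantly and deterministically
now = [0.0]
cache = SessionCache(clock=lambda: now[0])
cache.setex("sess:42", 30, {"user": "alice"})
print(cache.get("sess:42"))  # {'user': 'alice'}
now[0] += 31
print(cache.get("sess:42"))  # None: the session expired
```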
4. The former (Dynamo) is a completely decentralized design, in which cluster information is propagated between nodes via gossip to guarantee eventual consistency of the data
The latter (BigTable) is a centralized design that guarantees strong consistency through a distributed lock service. Data is written to memory and a redo log first and then periodically compacted to disk, turning random writes into sequential writes to improve write performance.
5. As a high-performance K/V database, Redis's performance is unmatched as long as the data is not swapped out. Recently I was building a cache for system attachments: I tried putting the attachments into Redis and wrote a method to save the files. public class TestRedis { Jedis redis = new Jedis("localhost"); ...
6. Redis handles the real-time reads and writes, while a queue processor writes the data to MySQL periodically; conflicts must be avoided. When starting up, read all table key values from MySQL and store them in Redis; when writing data to Redis, let the Redis primary key auto-increment and read it back. If the MySQL update fails, you need to clear the cache and re-synchronize the Redis primary key in time. In this way Redis serves the real-time reads and writes, while the MySQL data is processed asynchronously through the queue to relieve the pressure on MySQL. However, this approach mainly targets high-concurrency scenarios, and a highly available Redis cluster architecture is comparatively complex, so it is generally not recommended.
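The read/write split described above is a write-behind pattern and can be sketched as follows (the dicts are stand-ins for Redis and MySQL, not real drivers; all names are hypothetical):

```python
import queue

redis_cache = {}           # fast, in-memory store (stand-in for Redis)
mysql_table = {}           # slow, durable store (stand-in for MySQL)
write_queue = queue.Queue()

def write(key, value):
    """Write to the cache immediately; durability is deferred to the queue."""
    redis_cache[key] = value
    write_queue.put((key, value))

def flush_to_mysql():
    """Queue processor: drain pending writes into MySQL in one batch."""
    while not write_queue.empty():
        key, value = write_queue.get()
        mysql_table[key] = value

def read(key):
    # reads hit the cache first, falling back to the durable store
    return redis_cache.get(key, mysql_table.get(key))

write("user:1", "alice")
print(read("user:1"))        # alice: served from the cache before any flush
flush_to_mysql()
print(mysql_table["user:1"]) # alice: now durable
```

The real version must also handle the failure path the text mentions: if the MySQL write fails, the cached entry has to be invalidated.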
7.

Salvatore Sanfilippo, the author of Redis, has compared these two memory-based data storage systems:

1. Redis supports server-side data operations: compared with memcached, Redis has more data structures and supports richer data operations. In memcached you usually need to fetch the data to the client, make the modification, and set it back, which greatly increases the number of network IOs and the data volume. In Redis, these complex operations are usually as efficient as a plain get/set. So if your cache needs to support more complex structures and operations, Redis is a good choice

2. Memory utilization: with simple key/value storage, memcached has higher memory utilization; but if Redis uses hash structures for key/value storage, its memory utilization will be higher than memcached's thanks to its combined compression

3. Performance: since Redis uses only a single core while memcached can use multiple cores, on average Redis performs better than memcached for small data on a per-core basis. For data over 100k, memcached outperforms Redis; although Redis has recently optimized its performance for large values, it still lags slightly behind memcached


the specific reasons for the above conclusions are as follows:

1. Different data types are supported

Unlike memcached, whose data records support only a simple key/value structure, Redis supports far more data types. The most commonly used are string, hash, list, set and sorted set. Internally, Redis uses a redisObject structure to represent all keys and values. The main fields of redisObject are shown in the figure:

type indicates the concrete data type of a value object, and encoding is the internal storage format of that data type inside Redis. For example, type=string means the value holds an ordinary string, and the corresponding encoding can be raw or int. If it is int, Redis actually stores and represents the string as a number, provided the string itself can be expressed numerically, such as "123" or "456". The vm field only has memory actually allocated when Redis's virtual memory feature is turned on, which is off by default
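That int-vs-raw decision can be sketched as follows (illustrative only; real Redis stores the number as a C long and also has an embstr encoding for short strings):

```python
def choose_string_encoding(value: str) -> str:
    """Rough analogue of Redis choosing 'int' encoding when a string
    value is a plain integer, and 'raw' otherwise."""
    try:
        n = int(value)
        # must fit a signed 64-bit C long, and round-trip exactly
        # (so "012" stays raw, since its canonical form is "12")
        if -(2**63) <= n < 2**63 and str(n) == value:
            return "int"
    except ValueError:
        pass
    return "raw"

print(choose_string_encoding("123"))    # int
print(choose_string_encoding("12.5"))   # raw
print(choose_string_encoding("hello"))  # raw
```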

1) string

  • common commands: set / get / decr / incr / mget, etc

  • application scenario: string is the most commonly used data type, and ordinary key / value storage can be classified as this type

  • implementation: a string is stored internally as, by default, a plain string referenced by redisObject. When operations such as incr and decr are encountered, it is converted to a numeric type for the calculation, and the redisObject's encoding field is then int

2) hash

  • common commands: hget / hset / hgetall

  • application scenario: we want to store a user information object data, including user ID, user name, age and birthday. Through user ID, we want to get the user's name or age or birthday

  • implementation: a Redis hash internally stores the value as a HashMap and provides an interface for accessing map members directly. As shown in the figure, the key is the user ID and the value is a map whose keys are the member attribute names and whose values are the attribute values. Data can thus be modified and accessed directly via the key of the internal map (called a field in Redis), i.e. via key (user ID) + field (attribute name). There are currently two implementations of this HashMap: when a hash has few members, Redis saves memory by using a compact layout similar to a one-dimensional array instead of a real HashMap, and the encoding of the value's redisObject is zipmap; when the member count grows, Redis automatically converts it to a real HashMap with encoding ht

3) list

  • common commands: lpush / rpush / lpop / rpop / lrange, etc

  • application scenarios: there are many application scenarios of redis list, which is also one of the most important data structures of redis. For example, Twitter's follow list and fan list can be implemented with redis's list structure

  • implementation: a Redis list is a doubly linked list that supports reverse lookup and traversal, which makes these operations convenient at the cost of some extra memory overhead. Many internal Redis components, including the send buffer queue, also use this data structure

4) set

  • common commands: sadd / spop / smembers / sunion, etc

  • application scenario: externally, a Redis set behaves much like a list, with the special property that a set deduplicates automatically. When you need to store list data without duplicates, a set is a good choice; a set also provides an important interface for checking whether a member is in the set, which a list cannot provide

  • implementation: internally a set is a HashMap whose values are always null. It deduplicates quickly by computing hashes, which is also why a set can test whether a member belongs to it

5) sorted set

  • common commands: zadd / zrange / zrem / zcard, etc

  • application scenario: a sorted set is used like a set, except that a set is unordered while a sorted set orders its members by an extra score parameter supplied on insert; that is, it stays automatically sorted. When you need an ordered, duplicate-free collection, choose the sorted set data structure. For example, Twitter's public timeline can be stored with the publication time as the score, so that fetching it returns entries automatically sorted by time

  • implementation: a Redis sorted set uses a HashMap and a skip list together to store the data in order: the member-to-score mapping lives in the HashMap, while all members are stored in the skip list, sorted by the scores held in the HashMap. The skip list structure delivers high lookup efficiency and is relatively simple to implement
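Putting the data types above to work, the leaderboard/timeline scenario can be sketched with a toy sorted set (sorting on demand here, where Redis maintains a real skip list; MiniZSet is a hypothetical name):

```python
class MiniZSet:
    """Toy sorted set: a dict gives O(1) member->score lookup (the HashMap
    side of Redis); ranked reads sort on demand instead of using a skip list."""
    def __init__(self):
        self._scores = {}

    def zadd(self, member, score):
        self._scores[member] = score

    def zrange_withscores(self, start, stop):
        """Ascending by score, inclusive range, like ZRANGE ... WITHSCORES."""
        ranked = sorted(self._scores.items(), key=lambda kv: (kv[1], kv[0]))
        return ranked[start:stop + 1]

    def zrevrange(self, start, stop):
        """Descending by score, like ZREVRANGE: ideal for leaderboards."""
        ranked = sorted(self._scores, key=lambda m: (-self._scores[m], m))
        return ranked[start:stop + 1]

board = MiniZSet()
board.zadd("alice", 300)
board.zadd("bob", 150)
board.zadd("carol", 420)
print(board.zrevrange(0, 1))  # ['carol', 'alice']: the top two scores
```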

2. Different memory management mechanisms

In Redis (with the old virtual-memory feature), not all data is kept in memory at all times; this is the biggest difference from memcached. When physical memory runs out, Redis can swap values that have not been used for a long time to disk. Redis caches all key information, and when it finds memory usage exceeding a certain threshold it triggers a swap operation: using the heuristic swappability = age * log(size_in_memory), it computes which keys' values should be swapped to disk, persists those values, and clears them from memory. This feature lets Redis hold more data than fits in its own memory, although the machine's memory must still hold all the keys, since keys are never swapped. While a child thread is swapping in-memory data to disk, it shares that memory with the main thread providing the service, so if a value pending swap is updated, Redis blocks the write until the child thread completes the swap operation. When Redis reads a key whose value is not in memory, it must load the value from the swap file before returning it to the requester, which raises an I/O thread pool question. By default Redis blocks: it serves the request only after the value has been fully loaded from the swap file. This strategy suits batch operations with few clients, but in a large website application it clearly cannot handle high concurrency, so when running Redis we can set the size of its I/O thread pool so that read requests needing data from the swap file are handled concurrently, reducing the blocking time

For memory-based database systems such as Redis and memcached, the efficiency of memory management is a key factor affecting system performance. malloc/free in traditional C is the most common way to allocate and release memory, but it has several defects: first, mismatched malloc and free calls easily cause memory leaks; second, frequent calls create large amounts of unreclaimable memory fragments, reducing memory utilization; finally, as a system call its overhead is far greater than an ordinary function call. Therefore, to improve memory management efficiency, high-performance solutions do not call malloc/free directly. Redis and memcached each use their own memory management mechanism, but the implementations differ considerably; both are introduced below

By default, memcached manages memory with the slab allocation mechanism. Its main idea is to pre-divide allocated memory into blocks of specific lengths according to predetermined sizes, to store key/value records of the corresponding lengths, which completely avoids memory fragmentation. The slab allocator is designed only for storing external data: all key/value data lives in the slab system, while memcached's other memory requests go through ordinary malloc/free, because their number and frequency will not affect the performance of the whole system. The principle of slab allocation is quite simple. As shown in the figure, it first requests a large block of memory from the operating system, splits it into chunks of various sizes, and groups chunks of the same size into a slab class. The chunk is the smallest unit used to store key/value data. The size of each slab class can be controlled by setting a growth factor when memcached starts. If the growth factor in the figure is 1.25 and the first group's chunks are 88 bytes, then the second group's chunks are 112 bytes, and so on
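The slab-class progression is easy to reproduce (a sketch assuming 8-byte alignment, which is memcached's default; parameter names are our own):

```python
def slab_chunk_sizes(base=88, growth_factor=1.25, max_size=1024, align=8):
    """Chunk size of each slab class: each class grows by the growth
    factor, rounded up to the alignment boundary."""
    sizes = [base]
    while True:
        nxt = int(sizes[-1] * growth_factor)
        nxt = (nxt + align - 1) // align * align  # round up to alignment
        if nxt > max_size:
            break
        sizes.append(nxt)
    return sizes

sizes = slab_chunk_sizes()
print(sizes[:4])  # [88, 112, 144, 184]: 88 * 1.25 = 110, aligned up to 112, etc.
```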

When memcached receives data from a client, it first selects the most appropriate slab class according to the data size, then looks up memcached's free-chunk list for that slab class to find a chunk that can store the data. When a data record expires or is discarded, its chunk can be reclaimed and returned to the free list. This shows that memcached's memory management system is efficient and causes no memory fragmentation, but its biggest drawback is wasted space: since each chunk allocates a fixed length of memory, variable-length data cannot use that space fully. As shown in the figure, caching 100 bytes of data in a 128-byte chunk wastes the remaining 28 bytes

The memory management of Redis is mainly implemented by zmalloc.h and zmalloc.c in the source code. To make memory management easier, after allocating a block of memory Redis stores the block's size in the block's header

8. Dynamo and BigTable are the most representative papers in this field. The former is a completely decentralized design, in which cluster information is propagated between nodes via gossip to guarantee eventual consistency of the data
The latter is a centralized design that guarantees strong consistency through a distributed lock service. Data is written to memory and a redo log first and then periodically compacted to disk, turning random writes into sequential writes to improve write performance.
9. Based on the above, a Redis cluster solution is particularly important. There are usually three approaches: the official Redis Cluster; sharding through a proxy; and a smart client. Each of the three schemes has its own advantages and disadvantages.

Redis Cluster (official): although the official version has been out for more than a year, best practices are still lacking; the protocol was heavily modified, so not all mainstream clients support it yet, and some that do have not been validated in large-scale production; and the decentralized design couples the whole system tightly, which makes painless business upgrades difficult

Proxy: many mainstream Redis clusters now use a proxy, such as Codis, which has been open source for a long time. This scheme has many advantages: because it speaks the original Redis protocol, clients need no upgrade, which is friendlier to the business, and upgrades are relatively smooth since multiple proxies can be upgraded one by one. The drawback is an average performance overhead of about 30%, because every request takes one extra hop. Moreover, since a native client cannot bind several proxies at once, if the connected proxy goes down, manual intervention is still required, unless the original client is wrapped, smart-client style, to support reconnecting to other proxies, which in turn brings some of the drawbacks of client-side sharding. And although multiple proxies can be deployed and adding proxies dynamically raises capacity, all clients share all proxies, so an abnormal service may affect the other services; giving each service its own proxy also adds extra deployment work.
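A smart client avoids the proxy hop by mapping keys to nodes itself, commonly with consistent hashing so that adding or removing a node only remaps a fraction of the keys. A toy hash ring (node addresses and class name are hypothetical):

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring: the core of a 'smart client', where keys
    map straight to a Redis node with no proxy in between."""
    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (point, node); vnodes smooth the load
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._points = [p for p, _ in self._ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        # first ring point clockwise from the key's hash owns the key
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["redis-a:6379", "redis-b:6379", "redis-c:6379"])
print(ring.node_for("user:1000"))  # deterministic: the same key always maps to the same node
```

The flip side, as noted above, is that failover and reconnection logic now live in every client.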