Is a check a digital currency?
The purpose of bitcoin:
1. You can buy virtual goods, just like game currency: game equipment, clothes, weapons, etc.
2. Investment and financing. On a trading platform you can earn a certain price difference through trading, using it as a financial management tool.
3. It can be used to purchase goods or services.
4. It can serve as a means of marketing. Dell once accepted bitcoin payments.
Bitcoin has exclusive ownership: a private key is needed to control bitcoin, and it can be stored in isolation on any storage medium. No one can get at it except the user himself.
Bitcoin has no hidden costs and no cumbersome quotas or procedures. If you know the other party's bitcoin address, you can pay them.
Advantages of centralized data processing:
1. Simple deployment structure.
2. Data is easy to back up: only the data on the central computer needs backing up.
3. As long as the central computer is well protected, and since terminals generally need no external devices, the probability of virus infection is very low.
4. The central computer is very powerful, so terminals only need simple, cheap equipment.
Disadvantages:
1. The central computer has to perform all the operations, so when there are many terminals, the response speed slows down.
2. If end users have different needs, configuring programs and resources for each user separately is difficult and inefficient in a centralized system.
Advantages of distributed data processing:
1. Every machine in the distributed network can store and process data, which lowers the performance requirements per machine; there is no need to buy expensive high-performance machines, which greatly reduces the cost of hardware investment.
2. Excellent scalability. When the system's storage or computing capacity becomes insufficient, it can be increased simply by adding cheap PCs.
3. Very strong processing capability: a huge computing task, reasonably partitioned, can be processed in parallel by the machines in the distributed network.
Disadvantages:
1. When the computing program runs at full load, it still puts a certain amount of pressure on every part of each computer.
2. For the project side, the volunteers participating in distributed computing are not the project's own personnel and cannot all be trusted, so a certain amount of redundant computation must be introduced to guard against calculation errors and malicious cheating.
Extended information:
Distributed computing means that information is not confined to one program or one computer, but is spread across multiple programs; one or more computers can run several programs at the same time and share information through the network. Compared with other approaches, the distributed approach has obvious advantages:
1. Sharing resources is more convenient.
2. The computing load can be balanced, with multiple computers processing tasks at the same time.
3. According to actual needs, the most suitable computer can be chosen to run a program.
The soul of distributed computing is balancing load and sharing resources; its advantages are efficiency, speed and accuracy. And "distributed" is a very specific concept: without distributed computing, the cloud would be impossible, but distributed computing is not always cloud computing.
Distributed means decomposing tasks through application design; cloud computing works through something grid-like, with the system combining resources automatically.
What is distributed computing? Distributed computing is a branch of computer science that studies how to divide a problem requiring enormous computing power into many small parts, assign those parts to many computers for processing, and finally combine the partial results into the final answer. Recent distributed computing projects have used the idle computing power of thousands of volunteers' computers around the world, connected through the Internet, to analyze electrical signals from outer space in search of hidden black holes and possible alien intelligent life; to search for Mersenne primes with more than 10 million digits; and to search for more effective drugs against HIV. These projects are huge and demand a staggering amount of computation; a single computer or individual could not possibly complete the calculation in an acceptable time.
Distributed computing, then, is a computing science that uses the idle processing capacity of CPUs on computers across the Internet to solve large-scale computing problems. Let's see how it works:
First, we need a problem that takes a great deal of computing power to solve. Such problems are generally interdisciplinary, challenging and urgent scientific research topics. The most famous include:
1. Solving complex mathematical problems, for example GIMPS (finding the largest Mersenne prime).
2. Research to find the most secure cryptosystems, for example RC5-72 (password cracking).
3. Biopathological research, for example Folding@home (studying protein folding, misfolding, aggregation and related diseases).
4. Drug research for various diseases, for example United Devices.
5. Signal processing, for example SETI@home (Search for Extraterrestrial Intelligence at home).
From these practical examples we can see that such projects are very large and require an astonishing amount of computation; a single computer or individual could never complete the calculation in an acceptable time. In the past, these problems were solved by supercomputers, but supercomputers are very expensive to buy and maintain, which an ordinary research organization cannot afford. As science developed, a low-cost, efficient and easy-to-maintain computing method emerged as the times required: distributed computing.
With the popularity of computers, personal computers have entered thousands of households, and along with that comes underuse. More and more computers sit idle; even when powered on, the potential of the CPU is far from fully exploited. A home computer spends most of its time "waiting", and even when users are actually at the machine, the processor still sits largely silent, waiting for input but doing nothing. The emergence of the Internet made it possible to connect and harness all these computers with their spare computing resources.
Then a problem is chosen that is very complex but suitable for division into a large number of smaller computing fragments, and a research institute develops, through a great deal of hard work, a computing server and a client. The server divides the computing problem into many small parts, assigns those parts to many networked computers for parallel processing, and finally integrates the results to obtain the final answer.
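As a rough illustration of the split/assign/merge pattern just described, here is a minimal single-machine sketch; Python's multiprocessing pool stands in for the networked volunteer computers, and `count_matches` is an invented work unit, not any real project's code.

```python
from multiprocessing import Pool

def count_matches(chunk):
    """Worker side: process one small piece independently (no shared state)."""
    return sum(1 for x in chunk if x % 7 == 0)

def split(data, n_parts):
    """Server side: divide the big problem into independent work units."""
    size = (len(data) + n_parts - 1) // n_parts
    return [data[i:i + size] for i in range(0, len(data), size)]

if __name__ == "__main__":
    data = list(range(1_000_000))        # the "huge" problem
    chunks = split(data, n_parts=8)      # divide into small parts
    with Pool(processes=8) as pool:      # assign to parallel workers
        partials = pool.map(count_matches, chunks)
    print(sum(partials))                 # integrate the partial results
```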
Of course, this looks primitive and difficult, but as the number of participants and computers involved grew, such computing plans became very fast and proved feasible. Today the processing capacity of some large distributed computing projects can reach or even surpass that of the world's fastest supercomputers.
You can also choose to participate in some projects and donate CPU time. The CPU time you provide appears in the project's contribution statistics, and you can compete with other participants for contribution rankings. You can also join an existing computing team or set one up yourself, which is a good way to keep participants motivated.
With the increasing number of private teams, many large organizations (such as companies, schools and various websites) have also started to form their own teams. At the same time, a large number of communities devoted to distributed computing technology and project discussion have formed. Most of these communities translate and produce guides for distributed computing projects, publish relevant technical articles, and provide the necessary technical support.
Who is likely to join these projects? Anyone, of course! If you have joined a project and are considering joining a computing team, you will find your home in the China Distributed Computing Center and Forum. Anyone can join any distributed computing team formed on our website. I hope you have fun at the China Distributed Computing Center and Forum.
To participate in distributed computing, the most meaningful way to realize your personal computer's value, you just need to download the relevant program; it then runs at the lowest priority on your computer, with little impact on normal use. If you want to do something useful with your spare capacity, why hesitate? Act now; your small effort may leave a great mark on the history of human science.
Professional definition (the definition of distributed computing given by the China Institute of Science and Technology Information)
Distributed computing is a computing method proposed in recent years. It means that two or more pieces of software share information with each other; they can run on the same computer or on multiple computers connected through a network. Compared with other approaches, distributed computing has the following advantages:
1. Rare resources can be shared.
2. The computing load can be balanced across multiple computers.
3. Programs can be placed on the computers best suited to run them.
Sharing rare resources and balancing load are among the core ideas of distributed computing. In fact, grid computing is one kind of distributed computing. If a piece of work is distributed, then not just one computer but a whole computer network participates in it; this "ants moving a mountain" approach gives formidable data processing capability. The essence of grid computing is to combine and share resources while ensuring system security.
1. Cloud computing in the narrow sense
Cloud computing in the narrow sense refers to the delivery and usage model of IT infrastructure: obtaining the required resources (hardware, platform, software) over the network, on demand and in an easily scalable way. The network that provides the resources is called the "cloud". From the user's point of view, the resources in the "cloud" can be expanded infinitely, obtained at any time, used on demand, expanded at will, and paid for by use. This is often described as using IT infrastructure the way we use water and electricity.
2. Generalized cloud computing
Generalized cloud computing refers to the delivery and usage model of services: obtaining the required services over the network, on demand and in an easily scalable way. Such services can be IT and software, Internet-related, or any other service.
Explanation:
This resource pool is called the "cloud". A "cloud" is a set of virtual computing resources that can maintain and manage itself, usually large server clusters including compute servers, storage servers and bandwidth resources. Cloud computing pools all these computing resources and manages them automatically with software, without human involvement. This lets application providers focus on their own business without worrying about tedious details, which is conducive to innovation and cost reduction.
Someone made an analogy: it is like moving from the old model of individual generators to the centralized power supply of a power plant. It means computing power can circulate as a commodity, like gas, water and electricity: easy to use and cheap. The biggest difference is that it is transmitted over the Internet.
Cloud computing is the development of parallel computing, distributed computing and grid computing, or rather the commercial realization of these computer science concepts. Cloud computing is the result of the mixed evolution of virtualization, utility computing, IaaS, PaaS and SaaS.
In general, cloud computing can be regarded as a commercial evolution of grid computing. As early as 2002, Liu Peng of China proposed the concept of a computing pool to solve the impracticality of traditional grid computing ideas: "Connect the high-performance computers scattered in various places with a high-speed network, glue them together organically with specially designed middleware, accept the computing requests posed by scientific workers everywhere through a web interface, and assign them to suitable nodes to run. A computing pool can greatly improve the quality of service and the utilization of resources, while avoiding the inefficiency and complexity caused by partitioning applications across nodes, and can meet practical requirements under current conditions." Replace "high-performance computers" with "server clusters" and "scientific workers" with "commercial users", and this is very close to today's cloud computing.
Cloud computing has the following characteristics:
(1) Very large scale. A "cloud" has considerable scale: Google's cloud has more than a million servers, and the "clouds" of Amazon, IBM, Microsoft, Yahoo and others each have hundreds of thousands. An enterprise private cloud typically has hundreds to thousands of servers. The "cloud" gives users unprecedented computing power.
(2) Virtualization. Cloud computing lets users obtain application services at any location, from all kinds of terminals. The requested resources come from the "cloud" rather than from a fixed, tangible entity. The application runs somewhere in the "cloud", but users need not know or worry about where. With just a laptop or a mobile phone, we can get everything we need through network services, even supercomputing tasks.
(3) High reliability. The "cloud" uses fault tolerance through multiple data copies and the homogeneity and interchangeability of computing nodes to ensure high service reliability; using cloud computing can be more reliable than using a local computer.
(4) Generality. Cloud computing is not tied to any specific application; with the support of the "cloud", an endless variety of applications can be constructed.
these two concepts are often heard. You can simply understand them as follows: distributed improves efficiency by shortening the execution time of a single task, while cluster improves efficiency by increasing the number of tasks executed per unit time
An intuitive description:
If a task consists of 10 subtasks and each subtask takes 1 hour on its own, then executing the task on one server takes 10 hours.
With the distributed solution, 10 servers are provided and each server handles just one subtask (ignoring dependencies between subtasks); the task completes in only 1 hour.
With the cluster solution, 10 servers are also provided, but each can handle the whole task independently. Suppose 10 tasks arrive at the same time: the 10 servers work simultaneously and 10 hours later all 10 tasks finish together, so that, on average, one task is completed per hour.
Cluster concept:
1. Two key features
cluster is a group of service entities working together to provide a service platform with more scalability and availability than a single service entity. In the view of clients, a cluster is like a service entity, but in fact, a cluster is composed of a group of service entities. Compared with a single service entity, cluster provides the following two key features:
· scalability - the performance of a cluster is not limited by a single service entity; new service entities can join the cluster dynamically to increase its performance
· high availability - the cluster uses redundant service entities to avoid loss of service. In a cluster the same service can be provided by multiple service entities; if one fails, another takes over. The cluster's ability to recover from one service entity to another enhances the availability of applications.
2. Two capabilities
in order to have the characteristics of scalability and high availability, the cluster must have the following two capabilities:
· load balancing - load balancing distributes tasks more evenly across the computing and network resources in the cluster environment
· error recovery - when, for some reason, a resource executing a task fails, the corresponding resource in another service entity completes the task in its place. The capacity for one entity's resource to take over transparently when another's cannot work is called error recovery.
Both load balancing and error recovery require that each service entity have resources that can perform the same task, and for each resource performing the same task, the information view (information context) required must be identical.
3. Two technologies
to realize cluster business, the following two technologies are necessary:
· cluster address - a cluster is composed of multiple service entities, and a cluster client reaches the functions of each entity by accessing the cluster address. Having a single cluster address (also called a single image) is a basic feature of a cluster. The facility that maintains the cluster address is called the load balancer: internally it manages the joining and leaving of service entities, and externally it translates the cluster address into internal service entity addresses. Some load balancers implement real load balancing algorithms, while others support only task forwarding. Load balancers that implement only task forwarding suit active-standby clusters, in which only one service entity works at a time; when it fails, the load balancer forwards subsequent tasks to another service entity.
· internal communication - to work together and achieve load balancing and error recovery, the entities in a cluster must communicate frequently, for example heartbeat messages between the load balancer and the service entities, and the exchange of task execution context between service entities
Having a single cluster address lets clients access the computing services provided by the cluster: the internal address of each service entity is hidden behind the one cluster address, so the services a customer needs can be distributed among the service entities. Internal communication is the basis of normal cluster operation; it gives the cluster its capacity for load balancing and error recovery.
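To make the "single cluster address" idea concrete, here is a toy sketch; the round-robin policy and the callable entities are illustrative assumptions, not how any particular load balancer works.

```python
import itertools

class LoadBalancer:
    """Clients see one entry point; internal entity addresses stay hidden."""
    def __init__(self, entities):
        self._entities = list(entities)        # internal service entities
        self._rr = itertools.cycle(self._entities)

    def handle(self, request):
        entity = next(self._rr)                # simple round-robin dispatch
        try:
            return entity(request)
        except Exception:
            # crude error recovery: fail over to the next service entity
            return next(self._rr)(request)

# The client only ever talks to `cluster`, never to a node directly.
cluster = LoadBalancer([lambda r: "node1 handled " + r,
                        lambda r: "node2 handled " + r])
print(cluster.handle("GET /"))   # node1 handled GET /
print(cluster.handle("GET /"))   # node2 handled GET /
```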
Distributed Concept:
Distributed computing, as said before, studies how to divide a problem that requires a huge amount of computing power into many small parts, assign those parts to many computers for processing, and finally combine the partial results into the final answer. Distributed network storage technology stores data on a number of independent machines. A distributed network storage system adopts a scalable architecture, using multiple storage servers to share the storage load and a location server to locate stored information. This not only solves the bottleneck of the single storage server in traditional centralized storage, but also improves the reliability, availability and scalability of the system.
Distributed means that different services are spread across different locations, while a cluster means several servers working together on the same service. Every node in a distributed system can itself be a cluster, but a cluster is not necessarily distributed. Each distributed node handles a different service, so if a node fails, that service becomes inaccessible.
Let me give you an answer; there are not many references on this at the moment.
Lamport's algorithm
When requesting the critical section: the process sends $request(T_i, i)$ to all other processes and saves the request in its local request queue.
When a process receives a request, it inserts the request into its local queue ordered by timestamp, then returns a $reply(T_j)$ carrying its own current timestamp.
After sending its request, the process waits until both conditions hold:
- it has received a $reply(t^*)$ from every other process with $t^* > T_i$, where $T_i$ is the local request's timestamp; in other words, each reply was sent only after my request had been received;
- its own request $(T_i, i)$ is at the head of its local queue.
Only then may it enter the critical section.
On exiting the critical section, the process deletes its own request from the local queue and broadcasts a timestamped release message informing all processes that its request can be deleted.
Therefore the message overhead of Lamport's algorithm is $3(N-1)$: $(N-1)$ requests, $(N-1)$ replies, and $(N-1)$ release notifications.
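To make the message flow concrete, here is a schematic Python sketch of the state one process keeps; `send` is an assumed network primitive and the whole class is illustrative, not a production implementation.

```python
import heapq

class LamportProcess:
    def __init__(self, pid, peers, send):
        self.pid, self.peers, self.send = pid, peers, send
        self.clock = 0
        self.queue = []          # pending requests, ordered by (T, pid)
        self.replies = set()

    def request_cs(self):
        self.clock += 1
        self.t_req = self.clock
        heapq.heappush(self.queue, (self.t_req, self.pid))
        self.replies.clear()
        for p in self.peers:                     # N-1 request messages
            self.send(p, ("request", self.t_req, self.pid))

    def on_request(self, t, sender):
        self.clock = max(self.clock, t) + 1
        heapq.heappush(self.queue, (t, sender))
        self.send(sender, ("reply", self.clock, self.pid))   # N-1 replies

    def on_reply(self, t, sender):
        self.clock = max(self.clock, t) + 1
        self.replies.add(sender)

    def can_enter(self):
        # both entry conditions from the text: a reply from everyone,
        # and my own request at the head of the local queue
        return (self.replies == set(self.peers)
                and self.queue and self.queue[0] == (self.t_req, self.pid))

    def release_cs(self):
        heapq.heappop(self.queue)                # my request is at the head
        for p in self.peers:                     # N-1 release messages
            self.send(p, ("release", self.t_req, self.pid))

    def on_release(self, t, sender):
        self.queue = [e for e in self.queue if e != (t, sender)]
        heapq.heapify(self.queue)
```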
Ricart-Agrawala algorithm
This algorithm mainly optimizes the message traffic of the previous one. The redundancy in the previous algorithm is mainly:
1. Replying early is useless: even holding the reply, the requester cannot enter the critical section until the holder releases it, so it is better to wait until release time to reply.
2. The release message is broadcast to every process, even those that never requested the critical section.
This algorithm optimizes point 1.
This algorithm removes the $release$ message: $release$ no longer has to be sent.
Why? From the analysis above, it is pointless to send $release$ to processes that never applied for the critical section, so we simply don't. The processes that did apply are exactly the ones that sent us a $request$, so we only need to notify the processes waiting for our $reply$.
Continuing the analysis: replying with $reply$ immediately is actually unnecessary. The requester cannot enter right after receiving it anyway, so we may as well reply at the moment it can actually enter (for the requester this makes no difference, right?).
Based on this idea, we delay the $reply$ until the resource is released. Notice that at the moment we send the $reply$ we no longer occupy the critical-section resource, so the reply also announces the release; the two messages merge into one, cutting the message count by one third.
As a result, the message overhead of the algorithm drops to $2(N-1)$. (The blogger has slightly modified and supplemented the original author's wording so that the meaning is easier to follow.)
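Under the same assumptions (`send` standing in for the network), a matching sketch of the Ricart-Agrawala refinement shows the one structural change: replies are deferred while we hold an older claim, so the deferred reply doubles as the release.

```python
class RAProcess:
    def __init__(self, pid, peers, send):
        self.pid, self.peers, self.send = pid, peers, send
        self.clock = 0
        self.requesting = False
        self.t_req = None
        self.replies = set()
        self.deferred = []       # requests we will answer at release time

    def request_cs(self):
        self.clock += 1
        self.t_req = self.clock
        self.requesting = True
        self.replies = set()
        for p in self.peers:                     # N-1 request messages
            self.send(p, ("request", self.t_req, self.pid))

    def on_request(self, t, sender):
        self.clock = max(self.clock, t) + 1
        mine_is_older = self.requesting and (self.t_req, self.pid) < (t, sender)
        if mine_is_older:
            self.deferred.append(sender)         # hold the reply for now
        else:
            self.send(sender, ("reply", self.clock, self.pid))

    def on_reply(self, t, sender):
        self.replies.add(sender)
        # enter the CS once all N-1 replies arrive: 2(N-1) messages total

    def release_cs(self):
        self.requesting = False
        for p in self.deferred:   # deferred replies double as the release
            self.send(p, ("reply", self.clock, self.pid))
        self.deferred.clear()
```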
in short, distributed computing improves efficiency by shortening the execution time of a single task, while cluster computing improves efficiency by increasing the number of tasks executed per unit time
for example,
If a task is composed of 10 subtasks, and each subtask takes 1 hour to execute independently, it takes 10 hours to execute the task on a server
With the distributed solution, 10 servers are provided and each is responsible for one subtask (ignoring dependencies between subtasks); the task then takes only one hour. A typical representative of this working mode is Hadoop's map/reduce distributed computing model. The cluster solution also provides 10 servers, each of which can handle the whole task independently. Suppose 10 tasks arrive at the same time: the 10 servers work simultaneously, and 10 hours later all 10 tasks finish together, so that, on average, one task is completed per hour. (Note the difference between a task and a subtask.)
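A quick back-of-the-envelope check of the example (the figures come from the text above, not from measurements):

```python
SUBTASK_HOURS, SUBTASKS, SERVERS = 1, 10, 10

# distributed: one task is split across all servers -> latency shrinks
distributed_latency = SUBTASK_HOURS * SUBTASKS / SERVERS     # 1 hour

# cluster: each server runs a whole task -> throughput grows
single_task_latency = SUBTASK_HOURS * SUBTASKS               # still 10 hours
tasks_done_in_that_time = SERVERS                            # 10 tasks
avg_hours_per_task = single_task_latency / tasks_done_in_that_time  # 1 hour

print(distributed_latency, single_task_latency, avg_hours_per_task)
```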
(2) Zhihu: https://www.zhihu.com/question/20004877
this ape friend is very simple and clear:
distributed: a business is divided into multiple sub businesses and deployed on different servers
Cluster: the same business, deployed on multiple servers
another ape friend expressed it from another perspective:
cluster is a physical form, and distributed is a working mode
this ape friend's description is also very concise, but it is more abstract:
according to my understanding, cluster is to solve the problem of high availability, while distributed is to solve the problem of high performance and high concurrency
(3) Baike: http://baike.baidu.com/view/4804677.htm and http://baike.baidu.com/view/3022776.htm
Cluster:
A cluster is a group of independent computers interconnected through a high-speed network; they form a group and are managed as a single system. When a client interacts with a cluster, the cluster appears to be a single independent server. Cluster configurations are used to improve availability and scalability.
distributed:
A network-based computer processing technology, the counterpart of centralized processing. As personal computers have become far more powerful and widespread, it has become possible to spread processing power across all the computers on a network. Distributed computing is the opposite concept to centralized computing, and its data can be spread over a very large area.
do you feel like you don't understand after reading these? Bloggers are the same! So let's move on
The blogger mentioned above having worked with Dubbo, a distributed service framework. Let's see why it calls itself a distributed service architecture (http://dubbo.io/User+Guide-zh.htm#UserGuide): when there are more and more vertical applications, interaction between applications becomes inevitable, so the core business is extracted as independent services, gradually forming a stable service center, which lets front-end applications respond to changing market demand more quickly.
at this time, the distributed service framework (RPC) for improving business reuse and integration is the key
By chance, it was found that "Git is a distributed version control system". Why is it distributed?
Git is a distributed version control system, as opposed to centralized version control such as SVN. In short, distributed version control means everyone can create an independent code repository for their own management, and all version control operations can be completed locally; the code each person modifies can be pushed and merged into another repository. Centralized version control such as SVN, by contrast, has only one central repository that all developers must depend on, and every version control operation requires a connection to the server. Many companies like centralized version control for tighter control of the code; if you develop on your own, you can choose a distributed system such as Git.
From the perspective of an ordinary developer, Git offers the following functions:
1. Clone the complete git warehouse (including code and version information) from the server to the stand-alone
2. Create branches and modify code on your own machine according to different development purposes
3. Submit the code on the branch created on the stand-alone computer
4. Merge branches on a single machine
5. Fetch the latest version of the code on the server and merge it with your main branch
6. Generate a patch and send it to the main developer
7. Look at the feedback of the main developer. If the main developer finds that there is a conflict between two general developers (a conflict that can be solved by cooperation between them), he will ask them to solve the conflict first and then submit it by one of them. If the main developer can solve the problem on his own, or if there is no conflict, he can do it by himself
8. Generally, developers can use the pull command to resolve conflicts, and then submit patches to the main developer
after looking at the descriptions of Dubbo and git, it seems that they are similar to the above "distributed: one service is divided into multiple sub services and deployed on different servers; Cluster: the same service is deployed on multiple servers"
Dubbo extracts the core business as independent service modules; each module only depends on interfaces, and with the interfaces decoupled, developers can each complete their own service module, finally assembling a complete system. Their goal is to complete one system, and each sub-service module is equivalent to one sub-business. Git is similar.
In fact, in many cases a distributed system cannot do without clusters, which is reflected in Dubbo, Hadoop and Elasticsearch.
Now the concept of distribution should be relatively clear, while the concept of a cluster may still be vague. Besides, how do clusters cooperate with distribution? Next, let's look at clusters in more detail.
clusters are mainly divided into three categories (high availability cluster, load balancing cluster and scientific computing cluster)
High Availability Cluster
Load Balance Cluster
High Performance Computing Cluster
1. High availability cluster
An HA cluster with two nodes has many popular, unscientific names, such as "dual-machine hot standby", "dual-machine mutual standby" and "dual-machine standby".
What a high availability cluster solves is ensuring that users' applications can keep providing service externally. Note that a high availability cluster protects not the business data but the user's business program, keeping it serving without interruption and minimizing the impact of software, hardware and human failures on the business.
2. Load balance cluster
Load Balancing System: all nodes in the cluster are active, and they share the workload of the system. Generally, web server cluster, database cluster and application server cluster all belong to this type
Load balancing clusters are generally used for web servers and database servers that answer network requests. Such a cluster, on receiving a request, checks which servers are less busy and hands the request to them. From the standpoint of checking the state of other servers, load balancing clusters are very close to fault-tolerant clusters; the difference is that load balancing clusters have more nodes.
3. High performance computing cluster
High performance computing cluster, HPC cluster for short. This kind of cluster is dedicated to providing computing power that a single computer cannot provide.
High Performance Computing classification:
3.1 High throughput computing
One kind of high performance computing task can be divided into several parallel subtasks that have nothing to do with each other. SETI@home (Search for Extraterrestrial Intelligence at home) is one such application.
this project uses idle computing resources on the Internet to search for aliens. The server of SETI project sends a set of data and data patterns to the computing nodes participating in SETI on the Internet. The computing nodes search with the given patterns on the given data, and then send the search results to the server. The server is responsible for integrating the data returned from each computing node into complete data. Because a common feature of this type of application is to search some patterns on massive data, this kind of computing is called high throughput computing
The so-called Internet computing belongs to this category. According to Flynn's taxonomy, high throughput computing belongs to the SIMD (single instruction, multiple data) category.
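A toy version of that SETI-style exchange may make the shape clearer: the "server" hands each "node" a data block plus the pattern, the nodes search independently (single instruction, multiple data), and the server merges the hits. All names and data here are invented for illustration.

```python
def node_search(block, pattern):
    """One computing node: scan its block for the given pattern."""
    return [i for i in range(len(block) - len(pattern) + 1)
            if block[i:i + len(pattern)] == pattern]

def server(signal, pattern, n_nodes):
    """Split the signal, farm out the chunks, merge the results."""
    step = len(signal) // n_nodes
    hits = []
    for n in range(n_nodes):
        lo = n * step
        # overlap chunk ends so matches straddling a boundary are not lost
        hi = len(signal) if n == n_nodes - 1 else (n + 1) * step + len(pattern) - 1
        hits += [lo + i for i in node_search(signal[lo:hi], pattern)]
    return sorted(set(hits))

print(server("abcabdabcab", "abc", n_nodes=3))   # -> [0, 6]
```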
3.2 Distributed computing
The other kind of computing is just the opposite of high throughput computing: although it can also be divided into several parallel subtasks, the subtasks are closely related and require a large amount of data exchange between them. According to Flynn's taxonomy, this kind of distributed high-performance computing belongs to the MIMD (multiple instruction, multiple data) category.
let's talk about the application scenarios of these clusters:
high availability clusters are not explained here
I think Dubbo leans toward the load balancing cluster; ape friends who have used it should know (and if you haven't, you can look it up). In Dubbo the same service can have multiple providers, so when a consumer arrives, which provider should it consume? That is where the load balancing mechanism comes in.
Elasticsearch leans more toward the distributed computing of the scientific computing cluster.
here, many ape friends may know some terms of cluster: cluster fault tolerance and load balancing
take Dubbo as an example:
Cluster fault tolerance (http://dubbo.io/User+Guide-zh.htm#UserGuide): Dubbo provides the following fault-tolerance strategies.
Cluster fault-tolerance modes:
The cluster fault-tolerance strategy can be extended by yourself; see: cluster extension.
Failover Cluster
Automatic failover: when a call fails, retry on another server. (This is the default.)
Usually used for read operations, but retries introduce longer latency.
You can use retries="2" to set the number of retries (not counting the first call).
Failfast Cluster
Fail fast: only one call is made, and if it fails an error is reported immediately.
Usually used for non-idempotent write operations, such as inserting new records.
Failsafe Cluster
Fail safe: when an exception occurs, simply ignore it.
Usually used for operations such as writing audit logs.
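In plain Python the three strategies differ only in how a failed call is treated. The sketch below gives simplified analogues, not Dubbo's actual API; `invokers` is a list of callables standing in for service providers.

```python
def failover(invokers, request, retries=2):
    """Retry on other providers; suits idempotent reads (may add latency)."""
    last = None
    for invoker in invokers[:retries + 1]:   # first call + `retries` retries
        try:
            return invoker(request)
        except Exception as exc:
            last = exc
    raise last                               # assumes at least one invoker

def failfast(invokers, request):
    """One attempt only; suits non-idempotent writes such as inserts."""
    return invokers[0](request)              # any exception propagates at once

def failsafe(invokers, request):
    """Swallow failures; suits best-effort work such as audit logging."""
    try:
        return invokers[0](request)
    except Exception:
        return None
```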
The first process subset is (1, 2, 4, 10); the second process subset is (2, 3, 5, 11); after that, each column's numbers increase by one, and after 13 they wrap around to 1.
The types of distributed system can be roughly classified into three categories:
1. Distributed data, but with only one global database and no local databases.
2. Hierarchical, where each layer has its own database.
3. Fully distributed, with no central control part in the network; there are many kinds of connections between nodes, such as loose connection, tight connection, dynamic connection and broadcast-notification connection.
Extended information:
Indicators for measuring a distributed system:
1. Performance: the system's throughput is the total amount of data the system can process in a given time, usually measured by the data processed per second; the system's response latency is the time the system needs to complete a given function.
System concurrency is the system's ability to perform a given function for many clients at the same time, usually measured in QPS (queries per second). These three performance indicators often constrain one another: a system pursuing high throughput finds it hard to achieve low latency, and when the average response time is long it is hard to raise QPS.
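A tiny numeric sketch of how the three indicators relate (all numbers invented for illustration); the last line uses Little's law, concurrency ≈ QPS × latency, which is one way to see why they constrain each other:

```python
requests_completed = 12_000          # over a 60-second window
bytes_processed = 480_000_000

throughput = bytes_processed / 60    # data handled per second
qps = requests_completed / 60        # queries per second -> 200
avg_latency_s = 0.25                 # measured per-request response delay

in_flight = qps * avg_latency_s      # ~50 requests concurrently in flight
print(throughput, qps, in_flight)
```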
2. Availability: the availability of the system refers to the ability of the system to provide services correctly in the face of various exceptions
the availability of a system can be measured by the ratio of the time when the system stops service to the time when the system is in normal service, or by the ratio of the number of failures to the number of successes of a function. Availability is an important index of distributed system, which measures the robustness of the system and reflects the fault tolerance ability of the system
3. Scalability: the scalability of a distributed system is its ability to improve performance (throughput, latency, concurrency), storage capacity and computing power by enlarging the cluster. A good distributed system always pursues "linear scalability", meaning that some indicator of the system grows linearly with the number of machines in the cluster.
4. Consistency: to improve availability, a distributed system inevitably uses replicas, which raises the problem of replica consistency. The stronger the consistency model, the easier the system is for users to use.