Introduction to RAID (Redundant Arrays of Independent Disks)

Alan   utmel.com   2021-10-21 17:40:09

Catalog

Ⅰ Introduction
Ⅱ Features of RAID
Ⅲ Classification of RAID
Ⅳ Working of RAID
Ⅴ RAID Level
Ⅵ Advantages and disadvantages of RAID
Ⅶ Applications of RAID technology
Ⅷ Solid State Drive RAID Technology


Ⅰ Introduction

RAID (redundant array of independent disks) is a method of storing data across multiple hard disks, in most levels with redundancy, so that the same information can be recovered from more than one place. Spreading data over several drives lets input and output operations overlap in a balanced manner, which improves performance. Adding drives by itself makes it more likely that some drive in the array fails, so it is the storage of redundant data that raises the fault tolerance of the array as a whole.

A 1988 paper from the University of California, Berkeley, "A Case for Redundant Arrays of Inexpensive Disks", introduced the term RAID and defined five RAID levels. The research was a response to the rapid growth of CPU performance at that time: CPU performance was growing by about 30-50% every year, while hard disk drives were improving by only about 7%. The research team hoped to find a new technology that could improve disk performance immediately and so balance the computing power of the computer in the short term. The main goals of the Berkeley research group at that time were efficiency and cost.

The research team also designed fault tolerance and logical data redundancy into the scheme, which led to the RAID theory. Inexpensive disks were the main focus at the beginning of the research, but large combinations of cheap disks later proved unsuitable for real production environments. "Independent" therefore replaced "inexpensive" in the name, and RAID came to mean an array of many independent disks.

Ⅱ Features of RAID

RAID technology mainly has the following three basic functions:

(1) By striping the data on the disk, block access to the data is realized, which reduces the mechanical seek time of the disk and improves the data access speed.

(2) By reading several disks in an array at the same time, the mechanical seek time of the disks is reduced and the data access speed is improved.

(3) Mirroring or storing parity information provides redundant protection of data.
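
The parity idea in item (3) can be sketched in a few lines. This is only an illustration, assuming the simple byte-wise XOR parity commonly used by parity-based RAID levels; the function names `xor_parity` and `recover_missing` are hypothetical and do not come from any RAID product.

```python
from functools import reduce

def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def recover_missing(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors and the parity."""
    return xor_parity(list(surviving_blocks) + [parity])

data = [b"AAAA", b"BBBB", b"CCCC"]   # one block per data disk
parity = xor_parity(data)            # stored on a separate disk
rebuilt = recover_missing([data[0], data[2]], parity)  # the disk holding data[1] is lost
```

Because XOR is its own inverse, combining the surviving blocks with the parity yields exactly the block that was lost. This works only when it is known which single block is missing.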

Ⅲ Classification of RAID

There are three types of RAID implementation: an external RAID cabinet, an internal RAID card, and software emulation.

Large servers often use external RAID cabinets. They support hot swapping, but these products are very expensive.

An internal RAID card is cheap, but installing it requires considerable skill, so it is best suited to technical personnel. A hardware array can provide functions such as online expansion, dynamic modification of the array level, automatic data recovery, drive roaming, and high-speed caching. It can provide solutions for performance, data protection, reliability, availability, and manageability.

Software emulation means that multiple hard disks attached to an ordinary SCSI card are configured into logical disks, and so into an array, by the disk management function of the network operating system itself. A software array can provide data redundancy, but it reduces the performance of the disk subsystem, in some cases by as much as about 30%. It therefore slows the machine down and is not suitable for servers with heavy data traffic.

Ⅳ Working of RAID

As an independent system, the RAID array is attached directly to the host from outside or connected to the host through a network. The array has multiple ports, which can be connected to different hosts or to different ports of the same host. A host connected to several ports of the array can achieve a higher transmission speed.

Just as a single disk in a PC has an integrated cache, a disk array contains a certain amount of buffer memory to speed up its interaction with the host. The host interacts with the cache of the disk array, and the cache interacts with the data on the individual disks.

In applications, some commonly used data needs to be read frequently. The array identifies this frequently read data with internal algorithms and keeps it in the cache to speed up the host's reads. Data that is not in the cache is read directly from the disk and transmitted to the host by the array. Data written by the host is written only to the cache at first, so the host can complete the write operation immediately; the cached data is then written to the disk gradually in the background.
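
The read and write paths described above can be sketched as a toy cache. This is a simplified model under stated assumptions, not the algorithm of any real controller: a plain dictionary stands in for the disks, the eviction policy is assumed to be least-recently-used, and the class name `ArrayCache` is hypothetical.

```python
from collections import OrderedDict

class ArrayCache:
    """Toy read cache with write-back, in front of a dict standing in for the disks."""

    def __init__(self, disk, capacity=2):
        self.disk = disk              # block number -> data (the "disks")
        self.capacity = capacity      # maximum number of cached blocks
        self.cache = OrderedDict()    # block number -> data, least recently used first
        self.dirty = set()            # cached blocks not yet written to disk

    def read(self, block):
        if block in self.cache:                 # cache hit
            self.cache.move_to_end(block)
            return self.cache[block]
        data = self.disk[block]                 # cache miss: read from disk
        self._store(block, data)
        return data

    def write(self, block, data):
        self._store(block, data)                # the host sees the write as complete here
        self.dirty.add(block)

    def flush(self):
        """Write every dirty block back to disk (a real array does this in the background)."""
        for block in sorted(self.dirty):
            self.disk[block] = self.cache[block]
        self.dirty.clear()

    def _store(self, block, data):
        self.cache[block] = data
        self.cache.move_to_end(block)
        while len(self.cache) > self.capacity:
            old, old_data = self.cache.popitem(last=False)
            if old in self.dirty:               # never drop data that is not on disk yet
                self.disk[old] = old_data
                self.dirty.discard(old)
```

In this sketch a write changes only the cache until `flush` runs or the block is evicted. That is why write-back caches in real arrays are usually protected against power loss.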

Ⅴ RAID Level

RAID JBOD

RAID JBOD.png

JBOD stands for Just a Bunch Of Disks. It concatenates multiple hard disks end to end to form one large storage device. Strictly speaking, JBOD does not count as a RAID level, because it provides neither striping nor redundancy.

RAID 0

RAID 0.gif

RAID 0 creates a stripe set across N hard disks. The principle is to divide the data into strips, similar to the interlaced scanning of a display, and to spread them over all the hard disks so that the disks are read and written at the same time. In the ideal case, the parallel operation of the hard disks increases read and write speed by up to N times.
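
The striping rule can be written as a small address mapping. The sketch below assumes a strip size of exactly one block and a simple round-robin order; real arrays use configurable strip sizes, and the function names are illustrative only.

```python
def locate_block(logical_block, num_disks):
    """Map a logical block number to (disk index, block offset on that disk)."""
    if num_disks < 1 or logical_block < 0:
        raise ValueError("need at least one disk and a non-negative block number")
    return logical_block % num_disks, logical_block // num_disks

def stripe(blocks, num_disks):
    """Distribute a list of blocks round-robin over num_disks lists."""
    disks = [[] for _ in range(num_disks)]
    for i, block in enumerate(blocks):
        disk, _ = locate_block(i, num_disks)
        disks[disk].append(block)
    return disks

layout = stripe(["B0", "B1", "B2", "B3", "B4", "B5"], 3)
# layout == [["B0", "B3"], ["B1", "B4"], ["B2", "B5"]]
```

Consecutive logical blocks land on different disks, so a large sequential transfer keeps all N disks busy at once.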

RAID 1

RAID 1.gif

RAID 1 is called disk mirroring. The principle is to mirror the data of one disk onto another disk. That is, when data is written to one disk, a mirror copy is written to a second disk at the same time, with little effect on performance. This maximizes the reliability and repairability of the system.
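
Mirroring can be modeled with two dictionaries standing in for the two disks. This is a minimal sketch assuming a two-way mirror; the class `Mirror` and its methods are hypothetical names used only for illustration.

```python
class Mirror:
    """Toy two-way mirror: every write goes to both disks."""

    def __init__(self):
        self.disks = [{}, {}]          # two dicts stand in for two physical disks
        self.failed = [False, False]

    def write(self, block, data):
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                disk[block] = data

    def read(self, block):
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:     # serve the read from the first healthy disk
                return disk[block]
        raise IOError("both disks have failed")

    def fail(self, i):
        self.failed[i] = True
        self.disks[i] = {}             # the contents of the failed disk are gone

    def rebuild(self, i):
        """Replace disk i and copy everything from the surviving disk."""
        if self.failed[1 - i]:
            raise IOError("no surviving copy to rebuild from")
        self.disks[i] = dict(self.disks[1 - i])
        self.failed[i] = False
```

After `fail(0)`, reads are still served from the second disk, and `rebuild(0)` restores the mirror by copying the surviving disk in full.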

RAID0+1

RAID0+1.gif

As the name suggests, RAID 0+1 is a combination of RAID 0 and RAID 1: stripe sets are created and then mirrored. Because this configuration combines the advantages of striping and mirroring, it is called RAID 0+1.

RAID2

RAID 2 (striping with Hamming code verification). Conceptually, RAID 2 is similar to RAID 3. Both stripe data across different hard disks, and the unit of striping is a bit or a byte. However, RAID 2 uses a Hamming code to provide error checking and recovery. This encoding requires multiple disks to store the check and recovery information, which makes RAID 2 more complicated to implement.
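
To show what Hamming code verification involves, here is a minimal sketch of the classic Hamming(7,4) code. The source does not say which Hamming code RAID 2 uses, so the choice of (7,4) is an assumption made for illustration. In a RAID 2 style layout, each of the seven bits would sit on a different disk.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits [d1, d2, d3, d4] into 7 bits, with parity at positions 1, 2 and 4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4      # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4      # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4      # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(codeword):
    """Return (corrected codeword, 1-based position of the flipped bit, or 0 if none)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3    # the syndrome is the position of the bad bit
    if position:
        c[position - 1] ^= 1
    return c, position

code = hamming74_encode([1, 0, 1, 1])
damaged = list(code)
damaged[4] ^= 1                        # flip the bit at position 5
fixed, position = hamming74_correct(damaged)
```

Three of every seven bits are check bits, which illustrates why the scheme needs several extra disks.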

RAID3

RAID 3.png

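RAID 3 (parallel transmission with parity check code). This check code differs from the one in RAID 2: parity by itself can only detect errors, not locate and correct them. When it is known which disk has failed, however, the missing data can be rebuilt from the remaining disks and the parity. RAID 3 processes one stripe at a time when accessing data, which can increase read and write speed.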

RAID4

RAID 4 (independent disk structure with parity check code). RAID 4 is very similar to RAID 3. The difference is that it accesses data by block rather than by bit or byte, that is, one disk at a time.

RAID 5

RAID 5.png

RAID 5 (independent disk structure with distributed parity). As its schematic diagram shows, the parity check codes are spread over all the disks, where p0 represents the parity value of stripe 0. The read efficiency of RAID 5 is very high, the write efficiency is average, and the efficiency of block-based access is good.
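
The way parity moves from disk to disk can be shown with a small layout generator. The rotation used here (parity for stripe s on disk `(num_disks - 1 - s) % num_disks`) is one common convention chosen for illustration. Real controllers use several different rotations, and the function name is hypothetical.

```python
def raid5_layout(num_disks, num_stripes):
    """Return rows of labels showing where data (Dn) and parity (Pn) blocks land."""
    if num_disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    rows, next_data = [], 0
    for s in range(num_stripes):
        parity_disk = (num_disks - 1 - s) % num_disks   # assumed rotation rule
        row = []
        for d in range(num_disks):
            if d == parity_disk:
                row.append(f"P{s}")
            else:
                row.append(f"D{next_data}")
                next_data += 1
        rows.append(row)
    return rows

for row in raid5_layout(4, 4):
    print(" ".join(f"{label:>3}" for label in row))
```

With four disks and four stripes, each disk holds exactly one parity block, so no single disk becomes the parity bottleneck that RAID 4 has.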

RAID6

RAID 6 is an independent disk structure with two independent, distributed parity codes. It is an extension of RAID 5 that can tolerate the failure of two disks, and it is mainly used where data must not be lost.

RAID7

RAID 7 (optimized high-speed data transfer disk structure). All I/O transfers in RAID 7 are carried out asynchronously and can be controlled separately, which improves the parallelism of the system and the speed at which it accesses data.

RAID10

RAID 10 (highly reliable and efficient disk structure). This structure is simply a striped structure plus a mirrored structure. Each of the two has its own advantages and disadvantages, so they complement each other to achieve both high speed and high reliability.

RAID53

RAID 53 (efficient data transfer disk structure). Like the previous level, this structure reuses and combines earlier structures. It unifies RAID 3 with the striped structure, so it is faster and also fault tolerant.

RAID 5E

RAID 5E is an improvement on RAID 5. As in RAID 5, the parity information is evenly distributed over all the hard disks. However, a portion of unused space is reserved on each hard disk and is not striped. With this reserved space, the array allows up to two physical hard drives to fail.

RAID 5EE

Compared with RAID 5E, RAID 5EE distributes data more efficiently. Part of the space on each hard disk serves as a distributed hot spare, and this space is part of the array. When a physical hard disk in the array fails, data reconstruction is therefore faster.

Ⅵ Advantages and disadvantages of RAID

Advantages of RAID

RAID increases the transmission rate. It greatly improves the data throughput of the storage system by storing and reading data on multiple disks at the same time. In a RAID array, many disk drives transmit data simultaneously while appearing logically as one disk drive, so RAID can reach several times, tens of times, or even hundreds of times the speed of a single disk drive. This is the problem that RAID originally set out to solve: CPU speed was increasing rapidly at the time, while the data transfer rate of disk drives could not be raised to match, and a solution was needed to resolve the mismatch between the two. RAID succeeded in providing one.

RAID provides fault tolerance through data verification. Apart from the CRC (cyclic redundancy check) codes written on the disk, an ordinary disk drive provides no fault tolerance. RAID fault tolerance is built on top of the hardware fault tolerance of each disk drive, so it provides higher security. Many RAID modes have relatively complete mutual verification and recovery measures, or even direct mirror backup, which greatly improves the fault tolerance of the RAID system and the stability and redundancy of the system.

Disadvantages of RAID

RAID 0 has no redundancy. If one physical disk is damaged, none of the data can be used. The disk utilization of RAID 1 can reach at most 50%, which is the lowest among all RAID levels.

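RAID 0+1 is a compromise between RAID 0 and RAID 1: it provides data security for the system while keeping the speed of striping. Its degree of protection is lower than that of pure mirroring across the same disks, and because every stripe set is stored twice, its disk space utilization is 50%, no better than that of a two-disk mirror.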
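
The utilization figures can be compared with a short calculation. The sketch below assumes equally sized disks and the usual textbook definitions of each level. In particular, it assumes that RAID 1 stores every block on all disks of the mirror set, and that RAID 0+1 and RAID 10 use two-way mirroring, so every block is stored exactly twice. The names `MIN_DISKS` and `usable_fraction` are illustrative.

```python
# Minimum number of disks commonly quoted for each level (an assumption of this sketch).
MIN_DISKS = {"0": 2, "1": 2, "0+1": 4, "10": 4, "5": 3, "6": 4}

def usable_fraction(level, num_disks):
    """Fraction of raw capacity available for data, for equally sized disks."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported level: {level}")
    if num_disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level in ("0+1", "10") and num_disks % 2:
        raise ValueError(f"RAID {level} needs an even number of disks")
    if level == "0":
        return 1.0                      # no redundancy at all
    if level == "1":
        return 1.0 / num_disks          # every disk holds a full copy
    if level in ("0+1", "10"):
        return 0.5                      # every block is stored twice
    parity_disks = 1 if level == "5" else 2
    return (num_disks - parity_disks) / num_disks

for level, disks in [("0", 4), ("1", 2), ("0+1", 4), ("5", 4), ("6", 4)]:
    print(f"RAID {level} on {disks} disks: {usable_fraction(level, disks):.0%} usable")
```

Under these assumptions, only the parity-based levels exceed 50% utilization while still providing redundancy.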

Ⅶ Applications of RAID technology

DAS--Direct Attached Storage

DAS is server-centric. Traditional network storage connects the RAID disk array directly to a server of the network system. This form of network storage structure is called DAS (Direct Attached Storage).

NAS--Network Attached Storage

NAS is data-centric. NAS is the abbreviation of Network Attached Storage. In the NAS storage structure, the storage system is no longer attached to a specific server or client through the I/O bus. It is directly connected to the network through the network interface and is accessed by the user through the network.

SAN--Storage Area Networks

SAN is network-centric. A SAN is a high-speed storage network similar to an ordinary local area network. It provides an easy way to connect to existing LANs, allows companies to increase storage capacity independently, and keeps network performance from being affected by data access. This independent, dedicated storage network gives SAN many advantages: high scalability; easy management; centralized management software that enables remote management and unattended operation; and strong fault tolerance.

SAN is mainly used in working environments that need large storage capacity, such as large-scale PACS in hospitals. However, low demand and high cost have held back the SAN market.

Ⅷ Solid State Drive RAID Technology

At present, there are three main types of RAID array technologies for solid-state drives. One of them, hybrid RAID, combines solid-state drives with mechanical hard drives so that the characteristics of the two complement each other.

Solid-state drives currently cost more than mechanical hard drives, so a hybrid RAID array composed of solid-state drives and mechanical hard drives has a greater advantage in cost control than a pure solid-state RAID array.

However, RAID arrays composed entirely of solid-state drives perform better than hybrid RAID arrays. At present, most solid-state drive manufacturers also use chip-level RAID inside their drives to further improve performance and reduce power consumption.

For the embedded RAID technology iRAID, preliminary research results suggest that a RAID system will no longer have to be a group of independent drives; in the future there may be only a single high-density disk. This would allow the disk arrays of such storage systems to improve considerably in performance, power consumption, and physical size.

Embedded RAID technology will become one of the main research directions of solid-state drive RAID array technology. It has broad application prospects in education, entertainment, defense, and other fields, especially in areas with complex working environments and high data security requirements. Because little research has been done on evaluating the reliability of solid-state drive RAID, the evaluation system and methods for RAID reliability need to be improved as soon as possible. Reliability analysis will therefore also become one of the research focuses of solid-state drive RAID technology.
