
20 Aug 2013

SAN interview questions - PART 1


1. WHAT ARE THE BENEFITS OF FIBER CHANNEL SANS?

 Fiber Channel SANs offer exceptional reliability, scalability, consolidation, and performance. They provide significant advantages over direct-attached storage through improved storage utilization, higher data availability, reduced management costs, and highly scalable capacity and performance. FC SANs also reduce total cost of ownership (TCO).



    2. WHAT ENVIRONMENT IS MOST SUITABLE FOR FIBER CHANNEL SANS?
    A> Typically, Fiber Channel SANs are most suitable for large data centers running business-critical data, as well as applications that require high-bandwidth performance such as medical imaging, streaming media, and large databases.

    B> Fiber Channel SAN solutions can easily scale to meet the most demanding performance and availability requirements.

    [Figure: a typical SAN architecture]




    3. WHAT CUSTOMER PROBLEMS DO FIBER CHANNEL SANS SOLVE?

    - They enable a highly effective backup and recovery approach, including LAN-free and server-free backup models.

    - By providing flexible connectivity options and resource sharing, Fibre Channel SANs also greatly reduce the number of physical devices and disparate systems that must be purchased and managed, which can dramatically lower capital expenditures.

    - Heterogeneous SAN management provides a single point of control for all devices on the SAN, lowering costs and freeing personnel to do other tasks.



    4. HOW LONG HAS FIBER CHANNEL BEEN AROUND?
Development started in 1988, ANSI standard approval occurred in 1994, and large deployments began in 1998. Fiber Channel is a mature, safe, and widely deployed solution for high-speed (1, 2, 4, 8, and 16 Gbit/s) communications and is the foundation for the majority of SAN installations throughout the world.



   5. WHAT ARE THE BENEFITS OF 4 GBIT/S FIBER CHANNEL?
Benefits include twice the performance of 2 Gbit/s with little or no price increase, investment protection through backward compatibility with 2 Gbit/s, higher reliability because fewer SAN components (switch and HBA ports) are required, and the ability to replicate, back up, and restore data more quickly. 4 Gbit/s Fiber Channel systems are ideally suited for applications that need to transfer large amounts of data quickly, such as remote replication across a SAN, streaming video on demand, modeling and rendering, and large databases. 4 Gbit/s technology is shipping today.



   6. HOW IS FIBER CHANNEL DIFFERENT FROM ISCSI?
Fiber Channel generally provides high performance and high availability for business-critical applications, usually in the corporate data center. (High-end)

In contrast, iSCSI is generally used to provide SANs for smaller regional or departmental data centers. (Mid-range)



   7. WHEN SHOULD I DEPLOY FIBER CHANNEL INSTEAD OF ISCSI?
For environments consisting of high-end servers that require high bandwidth or data center environments with business-critical data, Fiber Channel is a better fit than iSCSI. For environments consisting of many midrange or low-end servers, an IP SAN solution often delivers the most appropriate price/performance.



    8. NAME SOME OF THE SAN TOPOLOGIES

    Point-to-point, arbitrated loop, and switched fabric topologies
           
       
 
  
9. WHY IS A SEPARATE NETWORK NEEDED FOR STORAGE?
WHY CAN'T THE LAN BE USED?

LAN hardware and operating systems are geared to user traffic, and LANs are tuned for a fast user response to messaging requests.
With a SAN, the storage units can be secured separately from the servers and kept totally apart from the user network. This improves storage access in data blocks (bulk data transfers), which is advantageous for server-less backups.

If the customer wants to implement an iSCSI SAN and the normal network on the same server or storage, separate adapters must be used for each.

For example: 1 network card for normal management or network teaming, and another card in another slot for the iSCSI SAN.



10. WHAT ARE THE ADVANTAGES OF RAID?

“Redundant Array of Inexpensive Disks”
Depending on how we configure the array, we can have the
- data mirrored [RAID 1] (duplicate copies on separate drives)
- striped [RAID 0] (interleaved across several drives), or
- parity protected [RAID 5] (extra parity data written so that lost data can be reconstructed).
These can be used in combination to deliver the balance of performance and reliability that the user requires.



11. DEFINE RAID. WHICH ONE DO YOU RECOMMEND AS A GOOD CHOICE?
RAID (Redundant Array of Independent Disks) is a technology to achieve redundancy with faster I/O. There are many levels of RAID to meet different customer needs: R0, R1, R3, R4, R5, R6, and R10.

R0 - Striped set without parity
  > High performance
  > No redundancy
  > Not good for critical data, but good for data streaming.
  > Any single drive failure leads to data loss.

R1 - Mirrored set without parity.
  > Data is mirrored as 2 sets.
  > Data is still accessible after 1 drive failure. More than 1 failure will cause data loss.
  > Using RAID 1 with a separate controller for each disk is sometimes called duplexing.


R3 - Striped set with dedicated parity

Not commonly used

R4 - Block level striping with parity.

Not commonly used

R5 - Striped set with distributed parity.
> RAID 5 computes parity with XOR. Each stripe needs at least two data blocks plus one parity block, so the minimum number of drives in RAID 5 is 3.
> Here is the XOR table:

0 X 0 = 0

1 X 1 = 0

1 X 0 = 1

0 X 1 = 1

> The parity is distributed across all drives, and the data stays accessible after a single drive failure (with n-1 drives still working).

> The array will have data loss in the event of a second drive failure.
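
The XOR rule is what lets RAID 5 rebuild a lost block: XOR-ing the surviving blocks of a stripe with the parity block yields the missing one. A minimal Python sketch, assuming three tiny made-up data blocks (the helper names xor_parity and rebuild_missing are illustrative and not taken from any RAID product):

```python
from functools import reduce

def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing block from the survivors plus the parity block."""
    return xor_parity(list(surviving_blocks) + [parity])

# Hypothetical stripe: three data blocks of two bytes each.
data = [b"\x0f\xa0", b"\x33\x55", b"\xc1\x7e"]
parity = xor_parity(data)

# Pretend the drive holding data[1] failed and rebuild its block.
recovered = rebuild_missing([data[0], data[2]], parity)
print(recovered == data[1])
```

The same function serves for both computing parity and rebuilding, because XOR is its own inverse.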



R6 - Striped set with dual distributed Parity.


      > Same as RAID-5, but calculates 2 parity blocks for double redundancy, and the parity is distributed across the drives.
      > Array continues to operate with up to two failed drives. This makes larger RAID groups more practical, especially for high availability systems.
      > HP calls its RAID-6 implementation ADG (Advanced Data Guarding).

CHOICE OF RAID:

Every RAID level suits a particular purpose; the right choice depends purely on the customer's needs and the setup.
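
One input to that choice is how much capacity each level leaves usable. The sketch below is a rough Python estimate, assuming equally sized drives and the commonly quoted minimum drive counts; the function name usable_capacity and the 600 GB drive size are made up for illustration.

```python
def usable_capacity(level, drives, drive_size_gb):
    """Return the usable capacity in GB for an array of equally sized drives."""
    minimum = {"RAID 0": 2, "RAID 1": 2, "RAID 5": 3, "RAID 6": 4, "RAID 10": 4}
    if level not in minimum:
        raise ValueError(f"unsupported level: {level}")
    if drives < minimum[level]:
        raise ValueError(f"{level} needs at least {minimum[level]} drives")

    if level == "RAID 0":
        data_drives = drives            # no redundancy, all space is usable
    elif level == "RAID 1":
        data_drives = 1                 # every drive holds the same copy
    elif level == "RAID 5":
        data_drives = drives - 1        # one drive's worth of parity
    elif level == "RAID 6":
        data_drives = drives - 2        # two drives' worth of parity
    else:                               # RAID 10
        if drives % 2:
            raise ValueError("RAID 10 needs an even number of drives")
        data_drives = drives // 2       # half the drives are mirror copies
    return data_drives * drive_size_gb

for level in ("RAID 0", "RAID 5", "RAID 6", "RAID 10"):
    print(level, usable_capacity(level, 4, 600), "GB")
```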


12. WHAT ARE THE LAYERS OF SAN?

1.     Client layer

2.    Server layer

3.    Fabric layer

4.    Storage layer



13. What is the difference between RAID 0+1 and RAID 1+0?
RAID 0+1 (mirrored stripes)
The data is striped first, and the striped sets are then mirrored.
RAID 1+0 (striped mirrors)
The data is mirrored first, and the mirrored pairs are then striped.
RAID 1+0 is generally preferred for high performance and high data protection because rebuilding a RAID 1+0 array takes less time than rebuilding RAID 0+1.
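
Besides rebuild time, the two layouts also differ in which double-drive failures they survive. The Python sketch below enumerates every two-drive failure for a hypothetical 4-drive array, assuming drives 0 and 1 form one group and drives 2 and 3 the other in both layouts; the names are illustrative only.

```python
from itertools import combinations

DRIVES = range(4)
MIRROR_PAIRS = [{0, 1}, {2, 3}]   # RAID 1+0: two mirrored pairs, striped together
STRIPE_SETS = [{0, 1}, {2, 3}]    # RAID 0+1: two striped sets, mirrored to each other

def raid10_survives(failed):
    """RAID 1+0 survives while every mirrored pair keeps one working drive."""
    return all(pair - failed for pair in MIRROR_PAIRS)

def raid01_survives(failed):
    """RAID 0+1 survives while at least one striped set is completely intact."""
    return any(not (stripe_set & failed) for stripe_set in STRIPE_SETS)

two_drive_failures = [set(combo) for combo in combinations(DRIVES, 2)]
raid10_ok = sum(raid10_survives(failed) for failed in two_drive_failures)
raid01_ok = sum(raid01_survives(failed) for failed in two_drive_failures)

print(f"RAID 1+0 survives {raid10_ok} of {len(two_drive_failures)} double failures")
print(f"RAID 0+1 survives {raid01_ok} of {len(two_drive_failures)} double failures")
```

Under these assumptions RAID 1+0 tolerates more of the possible double failures, which is consistent with it being the preferred layout.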

14. When are JBODs used?
“Just a Bunch of Disks”
A JBOD is a collection of disks that share a common connection to the server, without any RAID.
There is no intelligent storage controller or cache in a JBOD. Most JBODs are used as DAS (direct-attached storage).

15. Differentiate RAID & JBOD?
RAID: “Redundant Array of Inexpensive Disks”
Fault-tolerant grouping of disks that server sees as a single disk volume
Combination of parity-checking, mirroring, striping
JBOD: “Just a Bunch of Disks”
Drives independently attached to the I/O channel
Scalable, but requires server to manage multiple volumes.
Does not provide protection in case of drive failure

16. What is an HBA?
Host bus adapters (HBAs) are needed to connect the server (host) to the storage. They act as the initiators in an FC or iSCSI environment. Some of the market-leading vendors are QLogic, Emulex, and Brocade.

17. What are the advantages of SAN?
Massively extended scalability
Greatly enhanced device connectivity
Storage consolidation
LAN-free backup
Server-less (active-fabric) backup
Server clustering
Heterogeneous data sharing
Disaster recovery - Remote mirroring.

18. What is the difference b/w SAN and NAS?



19. What does a typical storage area network consist of, if we consider it for implementation in a small business setup?
Following are the essential components of a SAN:
- Fabric Switch
- FC Controllers
- JBOD's


20. Can you briefly explain each of the Storage area components?
Fabric Switch: A device that interconnects multiple network devices. Switches range from 16 ports to 512 ports and connect a corresponding number of machine nodes. Vendors that manufacture these switches include Brocade and Cisco.
FC Controllers: These handle data transfer and are connected to the HBAs, which sit in the PCI slots of the server. Arrays and volumes are configured on the controller. The controller has a cache and a battery for better performance.

JBOD: Just a Bunch of Disks is a storage box. It consists of an enclosure that hosts a set of hard drives in many combinations, such as SCSI, SAS, FC, and SATA drives. It has no intelligent controller, only an I/O card that in turn connects to the HBA in the server’s PCI slot.

21. What is the most critical component in SAN?
Each component has its own criticality with respect to business needs of a company !

22. How is a SAN managed?
There are many software packages for managing SANs. To name a few:
- SANtricity
- IBM Tivoli Storage Manager
- Navisphere
- Veritas Volume Manager
- HP Command View

23. Which one is the Default ID for SCSI HBA?
Generally the default ID for SCSI HBA or the storage controller is 7.
(SCSI- Small Computer System Interface
HBA - Host Bus Adaptor)


24. What is the highest and lowest priority of SCSI?
There are 16 different IDs that can be assigned to SCSI devices. In order of priority, from highest to lowest, they are: 7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8.
The highest priority SCSI ID is 7 and the lowest priority ID is 8.
A SCSI bus supports 16 devices, so a SCSI HBA supports 15 attached devices (1 ID is used by the HBA itself).
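
The priority order is not numeric, which is easy to get wrong. A small Python sketch of the ordering, assuming a 16-ID bus as described here (the function name arbitration_winner is made up for illustration):

```python
# SCSI IDs listed from highest to lowest arbitration priority.
PRIORITY_ORDER = [7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8]

def arbitration_winner(contending_ids):
    """Return the ID that wins when several devices arbitrate for the bus."""
    return min(contending_ids, key=PRIORITY_ORDER.index)

print(arbitration_winner([3, 15, 8]))   # ID 3 outranks IDs 15 and 8
```

This is why the HBA normally sits at ID 7: it wins arbitration against every other device on the bus.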

25. How do you install device drivers of HBA for the first time during OS installation?
In some scenarios you have to install the operating system on drives connected through a SCSI HBA or SCSI RAID controller, but most OS installers do not include drivers for those controllers. In that case you need to supply the drivers externally. If you are installing Windows, press F6 during the installation and provide the driver disk or CD that came with the HBA.
If you are installing Linux, type "linux dd" at the installer boot prompt to load a driver disk.


26. What is Array?
An array is a group of independent physical disks on which volumes or RAID volumes are configured.

27. Which are the SAN topologies?
Point to Point topology
Fiber channel Arbitrated Loop
Switched Fabric


28. Which are the 4 types of SAN architecture?
a. Core-edge
b. Full-Mesh
c. Partial-Mesh
d. Cascade

29. Which command is used in Linux to know the driver version of any hardware device?
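dmesg (most drivers log their version in the kernel messages when they load; "modinfo" followed by the module name also shows a module's version field)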

30. Can you name some of the states of RAID array?
a. Online
b. Degraded (working, but 1 more drive failure leads to data loss)
c. Rebuilding (usually happens after failed drive replacement)
d. Failed

31. What is the normal rebuilding rate?
It depends on the manufacturer and the I/O load.
As a rough figure, rebuilding 1 GB of data takes 15 ~ 20 minutes.

32. Name the features of SCSI-3 standard?
QAS: Quick arbitration and selection
Domain Validation
CRC: Cyclic redundancy check


33. Can we assign a hot spare to R0 (RAID 0) array?
No. Since R0 is not a redundant array, failure of any disk results in failure of the entire array, so there is nothing from which to rebuild the data onto a hot spare.

34. Can you name some of the available tape media types?
Many types of tape media are available for backing up data. Some of them are:
DLT: digital linear tape - technology for tape backup/archive of networks and servers; DLT technology addresses midrange to high-end tape backup requirements.
LTO: linear tape open; a new standard tape format developed by HP, IBM, and Seagate.
AIT: advanced intelligent tape; a helical scan technology developed by Sony for tape backup/archive of networks and servers, specifically addressing midrange to high-end backup requirements.


35. What is HA?
HA (High Availability) is a technology to achieve failover with very little delay. It is a practical requirement of data centers these days, when customers expect the servers to be running 24 hours a day, 7 days a week, 365 days a year, usually referred to as 24x7x365. To achieve this, a redundant infrastructure is created so that if one database server or one app server fails, a replica database or app server is ready to take over the operations. The end customer never experiences an outage when there is an HA network infrastructure.
In a basic HA setup with redundant servers, switches, and storage, if 1 server, 1 switch, and 1 storage unit fail at the same time, we will still be able to access the data even after these 3 major box failures.

36. What is virtualization?
Virtualization is logical representation of physical devices. It is the technique of managing and presenting storage devices and resources functionally, regardless of their physical layout or location. Virtualization is the pooling of physical storage from multiple network storage devices into what appears to be a single storage device that is managed from a central console. Storage virtualization is commonly used in a storage area network (SAN). The management of storage devices can be tedious and time-consuming. Storage virtualization helps the storage administrator perform the tasks of backup, archiving, and recovery more easily, and in less time, by disguising the actual complexity of the SAN.

37. DESCRIBE IN BRIEF THE COMPOSITION OF FC FRAME.

Start of Frame (SOF) delimiter (4 bytes)
Frame header (24 bytes; includes destination ID and source ID)
Data payload (up to 2112 bytes; encapsulates the SCSI instruction)
CRC (4 bytes; error checking)
End of Frame (EOF) delimiter (4 bytes)
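
The framing overhead can be worked out from the field sizes. The Python sketch below assumes the commonly documented sizes (4-byte SOF, 24-byte header, up to 2112 bytes of payload, 4-byte CRC, 4-byte EOF); the constant and function names are made up for illustration.

```python
SOF_BYTES = 4            # Start of Frame delimiter
HEADER_BYTES = 24        # frame header with source and destination IDs
MAX_PAYLOAD_BYTES = 2112 # largest data field
CRC_BYTES = 4            # error check
EOF_BYTES = 4            # End of Frame delimiter

def frame_size(payload_bytes):
    """Return the total frame length in bytes for a given payload length."""
    if not 0 <= payload_bytes <= MAX_PAYLOAD_BYTES:
        raise ValueError("payload must be between 0 and 2112 bytes")
    return SOF_BYTES + HEADER_BYTES + payload_bytes + CRC_BYTES + EOF_BYTES

def payload_efficiency(payload_bytes):
    """Return the fraction of the frame that carries payload."""
    return payload_bytes / frame_size(payload_bytes)

print(frame_size(MAX_PAYLOAD_BYTES))
print(round(payload_efficiency(MAX_PAYLOAD_BYTES), 4))
```

A full-sized frame is almost entirely payload, while small frames spend a larger share on the fixed 36 bytes of framing.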

38. WHAT IS STORAGE VIRTUALIZATION?
Storage virtualization is the amalgamation of multiple network storage devices into a single storage unit.


39. WHAT ARE THE PROTOCOLS USED IN PHYSICAL, DATALINK & NETWORK LAYER?
a) Ethernet
b) SCSI
c) Fibre Channel

40. WHAT ARE THE TYPES OF DISK ARRAY USED IN SAN?
a) JBOD
b) RAID

41. WHAT ARE THE DIFFERENT TYPES OF PROTOCOLS USED IN TRANSPORTATION & SESSION LAYERS OF SAN?
a) Fibre Channel Protocol (FCP)
b) Internet SCSI (iSCSI)
c) Fibre Channel over IP (FCIP)

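42. WHAT IS THE TYPE OF ENCODING USED IN FIBER CHANNEL?

8b/10b, as this encoding technique is able to detect almost all bit errors. How:
1 byte is divided into 2 groups (5 bits + 3 bits). Each group is mapped to a code word that is 1 bit longer, so together they form 10 bits.
1 byte = 5 bit + 3 bit = 8b
(5 bit + 1 bit) + (3 bit + 1 bit) = 10b
The extra bits are not simple parity bits: the code words are chosen to keep the number of 1s and 0s on the line balanced, so an invalid code word or a balance violation reveals a bit error.
That’s why it is called 8b/10b encoding technique.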
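
Because every 8 data bits travel as 10 bits on the line, only 80% of the line rate carries data. The Python sketch below works this out, assuming the commonly quoted line rates of 1.0625, 2.125, 4.25, and 8.5 Gbaud for 1, 2, 4, and 8 Gbit/s Fibre Channel; the function name is made up and framing overhead is ignored.

```python
def data_mbytes_per_sec(line_rate_gbaud):
    """Return the data rate in MB/s left after 8b/10b encoding overhead."""
    data_bits_per_sec = line_rate_gbaud * 1e9 * 8 / 10   # 8 data bits per 10 line bits
    return data_bits_per_sec / 8 / 1e6                   # bits to bytes, then to MB

for name, gbaud in [("1GFC", 1.0625), ("2GFC", 2.125), ("4GFC", 4.25), ("8GFC", 8.5)]:
    print(name, data_mbytes_per_sec(gbaud), "MB/s")
```

This is where the familiar round figures of roughly 100, 200, 400, and 800 MB/s per direction come from.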

43. HOW MANY CLASSES OF SERVICES ARE AVAILABLE IN FC?
Seven classes of service are available in Fiber Channel:
Class-1: Dedicated connection between two communicators with acknowledgement of frame delivery. Class 1 is rarely used.
Class-2: Connectionless, but provides acknowledgement.
Class-3: Connectionless and provides no notification of delivery.
Class 3 is a commonly used class of service in Fiber Channel networks.
Class-4: Allows fractional bandwidth for virtual circuits.
Class-5: Called isochronous service, it is intended for applications that require immediate delivery of the data as it arrives, with no buffering. It is not clearly defined yet.
Class-6: Provides multicast, dedicated connection with acknowledgment.
Class-F: Used for switch-to-switch communication in the fabric (E-port to E-port).


44. WHAT ARE THE MAIN CONSTRAINTS OF SCSI?
a) Deployment distance (max. of 25 m)
b) Number of devices that can be interconnected (16)

45. WHAT IS A FABRIC?
Interconnection of Fiber Channel Switches

46. WHAT ARE THE SERVICES PROVIDED BY FABRIC TO ALL NODES?
a) Fabric Login
b) Simple Name Server (SNS)
c) Fabric Address Notification
d) Registered State Change Notification
e) Broadcast Server




47. WHAT IS THE DIFFERENCE BETWEEN LUN & WWN?
LUN (Logical Unit Number): a unique number assigned to each volume of a storage device.
WWN (World Wide Name): a 64-bit address that is hard-coded into a Fibre Channel HBA and is used to identify an individual port (N_Port or F_Port) in the fabric.
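
A WWN is usually written as eight colon-separated hexadecimal bytes. The Python sketch below converts between the 64-bit number and that notation; the helper names and the sample WWN are made up for illustration.

```python
def format_wwn(value):
    """Format a 64-bit integer as eight colon-separated hex bytes."""
    if not 0 <= value < 2**64:
        raise ValueError("a WWN must fit in 64 bits")
    return ":".join(f"{byte:02x}" for byte in value.to_bytes(8, "big"))

def parse_wwn(text):
    """Parse the colon-separated notation back into a 64-bit integer."""
    parts = text.split(":")
    if len(parts) != 8 or any(len(part) != 2 for part in parts):
        raise ValueError("expected eight two-digit hex bytes")
    return int("".join(parts), 16)

sample = 0x10000000C9123456          # hypothetical port WWN
print(format_wwn(sample))
print(parse_wwn(format_wwn(sample)) == sample)
```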

48. WHAT ARE THE LAYERS OF FC PROTOCOL?

Below are the 5 layers of the FC protocol:

a) FC0 Physical Media
b) FC1 Encoder and Decoder
c) FC2 Framing and Flow control
d) FC3 Common Services
e) FC4 Upper Level Protocol Mapping

49. WHAT IS ZONING
A fabric management service that can be used to create logical subsets of devices within a SAN. This enables partitioning of resources for management and access control purposes.
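
The access-control effect of zoning can be pictured as a membership test: two ports may talk only if some zone contains both. A minimal Python sketch, assuming WWN-based zones with made-up zone names and WWNs (this models the idea only, not any switch vendor's configuration syntax):

```python
# Hypothetical zone set: zone name -> member port WWNs.
ZONES = {
    "zone_db": {"10:00:00:00:c9:aa:aa:01", "50:06:01:60:bb:bb:00:01"},
    "zone_backup": {"10:00:00:00:c9:aa:aa:02", "50:06:01:60:bb:bb:00:02"},
}

def can_communicate(wwn_a, wwn_b, zones=ZONES):
    """Return True if at least one zone contains both ports."""
    return any(wwn_a in members and wwn_b in members for members in zones.values())

db_host = "10:00:00:00:c9:aa:aa:01"
db_array_port = "50:06:01:60:bb:bb:00:01"
backup_array_port = "50:06:01:60:bb:bb:00:02"

print(can_communicate(db_host, db_array_port))       # same zone
print(can_communicate(db_host, backup_array_port))   # no shared zone
```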

50. WHAT ARE THE MAJOR CLASSIFICATION OF ZONING?
a) Soft  Zoning  (wwn based – Highly recommended)
b) Hard  Zoning (port-based)


51. WHAT ARE THE DIFFERENT LEVELS OF ZONING?
    a) Port Level zoning
b) WWN Level zoning
c) Device Level zoning
d) Protocol Level zoning
e) LUN Level zoning

52. WHAT ARE THE 3 PROMINENT CHARACTERISTICS OF SAS PROTOCOL?
    a) Native Command Queuing (NCQ)
b) Port Multiplier
c) Port Selector

53. WHAT ARE THE 5 STATES OF ARBITRATED LOOP IN FC?
a) Loop Initialization
b) Loop Monitoring
c) Loop Arbitration
d) Open Loop
e) Close Loop

54. HOW DOES AN FC SWITCH MAINTAIN THE ADDRESS?
An FC switch uses the simple name server (SNS) to maintain the mapping table.

 
55. WHAT IS VIRTUALIZATION?
    A technique of hiding the physical characteristics of computer resources from the way in which other systems, applications, or end users interact with those resources. It includes aggregation, spanning, or concatenation of multiple resources into larger resource pools.

56. WHAT IS MULTIPATH I/O?

    A fault-tolerance technique in which there is more than one physical path between the CPU of a computer system and its mass storage devices, through the buses, controllers, switches, and other bridge devices connecting them.
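
The failover behaviour can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the class name MultipathDevice and the path labels are made up, and a real multipath driver also handles health checks and load balancing.

```python
class MultipathDevice:
    """A storage device reachable over several physical paths."""

    def __init__(self, paths):
        self.paths = list(paths)   # in order of preference
        self.failed = set()

    def mark_failed(self, path):
        self.failed.add(path)

    def mark_restored(self, path):
        self.failed.discard(path)

    def select_path(self):
        """Return the first healthy path, or raise if none is left."""
        for path in self.paths:
            if path not in self.failed:
                return path
        raise RuntimeError("all paths to the device have failed")

device = MultipathDevice(["hba0-switchA", "hba1-switchB"])
print(device.select_path())        # preferred path
device.mark_failed("hba0-switchA")
print(device.select_path())        # I/O continues over the second path
```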

57. WHAT IS STRIPE-SIZE UNIT?
It is a data distribution scheme that complements the way the operating system requests data. It is the granularity at which data is stored on one drive of the array before subsequent data is stored on the next drive. The stripe unit size should be close to the size of the system I/O request.

Three factors influence the choice:

#1 The first, and probably least important, is that the Windows swap file always uses 4 kB cluster allocation units. Given that this file is heavily used, a 4 kB stripe size can almost double swap file read and write speed, because two 4 kB writes at different addresses can execute at the same time (random access).

#2 The second factor is that sequentially reading and writing large data files (for example copying, downloading, video editing) benefits from a large stripe size, with examples of up to 1024 kB.

#3 The third point is that random access of small chunks of data benefits from a lower I/O access time, that is, the time from when a read/write request is issued to when it actually starts.
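
How the stripe unit decides which drive serves a request can be shown with a little arithmetic. A minimal Python sketch, assuming plain striping without parity (RAID 0 style); the function name locate and the 64 kB stripe unit are made up for illustration.

```python
def locate(offset_bytes, stripe_unit_bytes, num_drives):
    """Map a logical byte offset to (drive index, byte offset on that drive)."""
    unit_index = offset_bytes // stripe_unit_bytes      # which stripe unit overall
    drive = unit_index % num_drives                      # units rotate across drives
    row = unit_index // num_drives                       # full stripes before this one
    offset_on_drive = row * stripe_unit_bytes + offset_bytes % stripe_unit_bytes
    return drive, offset_on_drive

STRIPE_UNIT = 64 * 1024   # hypothetical 64 kB stripe unit
for offset in (0, STRIPE_UNIT, 4 * STRIPE_UNIT + 10):
    print(offset, locate(offset, STRIPE_UNIT, 4))
```

A request no larger than one stripe unit usually lands on a single drive, while a request spanning several units is spread over several drives, which is the trade-off the three factors describe.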

58. WHAT IS LUN MASKING?
    The process that makes a LUN available to some hosts and unavailable to others. It is implemented at the HBA level. Many storage controllers also support LUN masking.
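
Conceptually, LUN masking is a lookup table from host to the LUNs it is allowed to see. A minimal Python sketch with made-up host names and LUN numbers (real arrays usually key the table on the host's WWN and are configured through vendor tools):

```python
# Hypothetical masking table: host -> LUNs presented to that host.
MASKING_TABLE = {
    "host_a": {0, 1},
    "host_b": {2},
}

def visible_luns(host, table=MASKING_TABLE):
    """Return the sorted list of LUNs a host is allowed to see."""
    return sorted(table.get(host, set()))

def can_access(host, lun, table=MASKING_TABLE):
    """Return True if the LUN is unmasked for the host."""
    return lun in table.get(host, set())

print(visible_luns("host_a"))
print(can_access("host_b", 0))
```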


59. HOW IS THE CAPACITY OF HARD DRIVE CALCULATED?
Number of Heads X Number of Cylinders X Sectors per track X Sector Size
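
As a worked example of the formula, the Python sketch below uses a classic geometry of 16 heads, 16383 cylinders, and 63 sectors per track with 512-byte sectors; these sample values and the function name are assumptions chosen for illustration.

```python
def chs_capacity_bytes(heads, cylinders, sectors_per_track, sector_size=512):
    """Return the drive capacity in bytes from its CHS geometry."""
    return heads * cylinders * sectors_per_track * sector_size

capacity = chs_capacity_bytes(heads=16, cylinders=16383, sectors_per_track=63)
print(capacity, "bytes")
print(round(capacity / 1e9, 2), "GB")
```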

60. WHAT IS BAD BLOCK RE-ALLOCATION?
A bad sector is remapped or reallocated to a good spare block, and this information is stored in an internal table on the hard disk drive. Bad blocks are identified during the media test of the HDD as well as during the various read/write operations performed during I/O tests. In addition, new-generation HDDs come with a technology called BGMS (background media scan), which continuously scans the HDD media for defects and maps them out when the drive is idle (this is performed after the HDD is attached to the system).

61. WHAT ARE THE 2 TYPES OF RECORDING TECHNIQUES ON TAPES?
    a) Linear Recording
b) Helical Scan Recording.

62. WHAT IS A SNAPSHOT?
    A snapshot of a data object contains an image of the data at a particular point in time. Basically, it is a point-in-time copy of the data.

63. WHAT IS HSM?
Hierarchical storage management - An application that attempts to match the priority of data with the cost of storage.


64. WHAT IS HOT-SWAPPING?
Devices are allowed to be removed and inserted into a system without turning off the system.
For example, hard drives & power supplies are hot-swappable.

65. WHAT IS HOT-SPARING?
A hot spare is a stand-by device that takes over when an active device fails.
Consider a RAID-5 array of four disks (disk 1 to disk 4) with 1 hot spare (drive 5). Normally the hot spare is not involved in any activity. But if any one of the drives from disk 1 to disk 4 fails, drive 5 kicks in and replaces the failed drive. Later on, the failed drive can be replaced at any time. This process doesn’t interrupt I/O.

66. WHAT ARE THE DIFFERENT TYPES OF BACK-UP SYSTEMS?
     a) Offline
b) Online
c) Near Line

67. WHAT IS THE DIFFERENCE BETWEEN ROUTING & MULTIPATHING?
Routing: determined by the switches, independent of SCSI; recreates the network route after a failure.
Multipathing: two initiators to one target; selects the LUN-initiator pair to use.

68. NAME FEW TYPES OF TAPE STORAGE?
     a) Digital Linear Tape
b) Advanced Intelligent Tape
c) Linear Tape Open

69. WHAT IS THE SEQUENCE IN FC?
A sequence is a group of one or more frames that encompasses one or more “information units” of the upper layer protocol.
Example:
A typical I/O operation requires
i) One sequence to transfer the command
ii) One or more sequences to transfer the data
iii) One sequence to transfer the status.

To be continued......


5 comments:

  1. Really a good document .. Thank you very much

  2. I Have Been Searching For San Interview Questions In Online ,Finally I Found this Blog.It's Very Help Full For all the SAN Engineers.Hope Blog Reach As Many Peoples Soon.

  3. Don't Know Why People Are Not Coming To This Page

  4. Very helpful questions & answers.
    Thanks....

  5. xor table is

    0 X 0 = 0

    1 X 1 = 0

    1 X 0 = 1

    0 X 1 = 1
