The Linux System Administrator's Guide
Written by Hemanshu Patel
Sunday, 23 December 2007
Chapter 5. Using Disks and Other Storage Media
"On a clear disk you can seek forever."
When you install or upgrade your system, you need to do a fair amount of work on your disks. You have to make filesystems on your disks so that files can be stored on them and reserve space for the different parts of your system.
This chapter explains all these initial activities. Usually, once you get your system set up, you won't have to go through the work again, except for using floppies. You'll need to come back to this chapter if you add a new disk or want to fine-tune your disk usage.
The basic tasks in administering disks are:
- Format your disk. This does various things to prepare it for use, such as checking for bad sectors. (Formatting is nowadays not necessary for most hard disks.)
- Partition a hard disk, if you want to use it for several activities that aren't supposed to interfere with one another. One reason for partitioning is to store different operating systems on the same disk. Another reason is to keep user files separate from system files, which simplifies back-ups and helps protect the system files from corruption.
- Make a filesystem (of a suitable type) on each disk or partition. The disk means nothing to Linux until you make a filesystem; then files can be created and accessed on it.
- Mount different filesystems to form a single tree structure, either automatically, or manually as needed. (Manually mounted filesystems usually need to be unmounted manually as well.)
Chapter 6 contains information about virtual memory and disk caching, of which you also need to be aware when using disks.
5.1. Two kinds of devices
UNIX, and therefore Linux, recognizes two different kinds of device: random-access block devices (such as disks), and character devices (such as tapes and serial lines), some of which may be serial, and some random-access. Each supported device is represented in the filesystem as a device file. When you read or write a device file, the data comes from or goes to the device it represents. This way no special programs (and no special application programming methodology, such as catching interrupts or polling a serial port) are necessary to access devices; for example, to send a file to the printer, one could just say
$ cat filename > /dev/lp1
and the contents of the file are printed (the file must, of course, be in a form that the printer understands). However, since it is not a good idea to have several people cat their files to the printer at the same time, one usually uses a special program to send the files to be printed (usually lpr). This program makes sure that only one file is being printed at a time, and will automatically send files to the printer as soon as it finishes with the previous file. Something similar is needed for most devices. In fact, one seldom needs to worry about device files at all.
Since devices show up as files in the filesystem (in the /dev directory), it is easy to see just what device files exist, using ls or another suitable command. In the output of ls -l, the first column contains the type of the file and its permissions. For example, inspecting a serial device might give
$ ls -l /dev/ttyS0
crw-rw-r-- 1 root dialout 4, 64 Aug 19 18:56 /dev/ttyS0
The first character in the first column, i.e., `c' in crw-rw-r-- above, tells an informed user the type of the file, in this case a character device. For ordinary files, the first character is `-', for directories it is `d', and for block devices `b'; see the ls man page for further information.
Note that usually all device files exist even though the device itself might not be installed. So just because you have a file /dev/sda, it doesn't mean that you really do have an SCSI hard disk. Having all the device files makes the installation programs simpler, and makes it easier to add new hardware (there is no need to find out the correct parameters for the new device and create its device files).
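The type character that ls -l prints can also be obtained programmatically. The following is a minimal sketch, assuming a Linux system with Python 3; the function name describe is made up for this illustration. For character and block devices it also returns the two numbers that ls -l shows in place of the file size (such as 4, 64 above), which are the device's major and minor numbers.

```python
import os
import stat


def describe(path):
    """Return (type_char, major, minor) for a path.

    type_char is the first character that `ls -l` would print:
    'c' character device, 'b' block device, 'd' directory, '-' regular file.
    major and minor are None for anything that is not a device file.
    """
    st = os.lstat(path)  # lstat: do not follow symbolic links
    type_char = stat.filemode(st.st_mode)[0]
    if type_char in ("c", "b"):
        return type_char, os.major(st.st_rdev), os.minor(st.st_rdev)
    return type_char, None, None


# /dev/null is used here only because it exists on practically every system.
print(describe("/dev/null"))
print(describe("/"))
```

Note that this only inspects the device file itself; as explained above, the existence of a device file says nothing about whether the hardware is present.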
5.2. Hard disks
This subsection introduces terminology related to hard disks. If you already know the terms and concepts, you can skip this subsection.
See Figure 5-1 for a schematic picture of the important parts in a hard disk. A hard disk consists of one or more circular aluminum platters, of which either or both surfaces are coated with a magnetic substance used for recording the data. For each surface, there is a read-write head that examines or alters the recorded data. The platters rotate on a common axis; typical rotation speed is 5400 or 7200 rotations per minute, although high-performance hard disks have higher speeds and older disks may have lower speeds. The heads move along the radius of the platters; this movement combined with the rotation of the platters allows the head to access all parts of the surfaces.
The processor (CPU) and the actual disk communicate through a disk controller. This relieves the rest of the computer from knowing how to use the drive, since the controllers for different types of disks can be made to use the same interface towards the rest of the computer. Therefore, the computer can say just ``hey disk, give me what I want'', instead of a long and complex series of electric signals to move the head to the proper location and waiting for the correct position to come under the head and doing all the other unpleasant stuff necessary. (In reality, the interface to the controller is still complex, but much less so than it would otherwise be.) The controller may also do other things, such as caching, or automatic bad sector replacement.
The above is usually all one needs to understand about the hardware. There are also other things, such as the motor that rotates the platters and moves the heads, and the electronics that control the operation of the mechanical parts, but they are mostly not relevant for understanding the working principles of a hard disk.
The surfaces are usually divided into concentric rings, called tracks, and these in turn are divided into sectors. This division is used to specify locations on the hard disk and to allocate disk space to files. To find a given place on the hard disk, one might say ``surface 3, track 5, sector 7''. Usually the number of sectors is the same for all tracks, but some hard disks put more sectors in outer tracks (all sectors are of the same physical size, so more of them fit in the longer outer tracks). Typically, a sector will hold 512 bytes of data. The disk itself can't handle smaller amounts of data than one sector.
Each surface is divided into tracks (and sectors) in the same way. This means that when the head for one surface is on a track, the heads for the other surfaces are also on the corresponding tracks. All the corresponding tracks taken together are called a cylinder. It takes time to move the heads from one track (cylinder) to another, so by placing the data that is often accessed together (say, a file) so that it is within one cylinder, it is not necessary to move the heads to read all of it. This improves performance. It is not always possible to place files like this; files that are stored in several places on the disk are called fragmented.
The number of surfaces (or heads, which is the same thing), cylinders, and sectors varies a lot; the specification of the number of each is called the geometry of a hard disk. The geometry is usually stored in a special, battery-powered memory location called the CMOS RAM, from where the operating system can fetch it during bootup or driver initialization.
Unfortunately, the BIOS has a design limitation, which makes it impossible to specify a track number that is larger than 1024 in the CMOS RAM, which is too little for a large hard disk. To overcome this, the hard disk controller lies about the geometry, and translates the addresses given by the computer into something that fits reality.
For example, a hard disk might have 8 heads, 2048 tracks, and 35 sectors per track. Its controller could lie to the computer and claim that it has 16 heads, 1024 tracks, and 35 sectors per track, thus not exceeding the limit on tracks, and translate the address that the computer gives it by halving the head number and doubling the track number. The mathematics can be more complicated in reality, because the numbers are not as nice as here (but again, the details are not relevant for understanding the principle). This translation distorts the operating system's view of how the disk is organized, thus making it impractical to use the all-data-on-one-cylinder trick to boost performance.
The translation is only a problem for IDE disks. SCSI disks use a sequential sector number (i.e., the controller translates a sequential sector number to a head, cylinder, and sector triplet), and a completely different method for the CPU to talk with the controller, so they are insulated from the problem. Note, however, that the computer might not know the real geometry of an SCSI disk either.
Since Linux often will not know the real geometry of a disk, its filesystems don't even try to keep files within a single cylinder. Instead, they try to assign sequentially numbered sectors to files, which almost always gives similar performance. The issue is further complicated by on-controller caches, and automatic prefetches done by the controller.
Each hard disk is represented by a separate device file. There can (usually) be only two or four IDE hard disks. These are known as /dev/hda, /dev/hdb, /dev/hdc, and /dev/hdd, respectively. SCSI hard disks are known as /dev/sda, /dev/sdb, and so on. Similar naming conventions exist for other hard disk types; see Chapter 4 for more information.
Note that the device files for the hard disks give access to the entire disk, with no regard to partitions (which will be discussed below), and it's easy to mess up the partitions or the data in them if you aren't careful. The disks' device files are usually used only to get access to the master boot record (which will also be discussed below).
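The geometry example from this section (8 heads, 2048 tracks, 35 sectors per track, reported as 16 heads and 1024 tracks) can be worked through in a few lines of Python. This is only a sketch under stated assumptions: the names Geometry, chs_to_linear, linear_to_chs and translate are invented for the illustration, and the translation is done by going through a sequential sector number, which is one plausible scheme and not necessarily what any particular controller does.

```python
from typing import NamedTuple

SECTOR_BYTES = 512  # typical sector size mentioned in the text


class Geometry(NamedTuple):
    heads: int
    cylinders: int
    sectors: int  # sectors per track


def capacity_bytes(g):
    """Total capacity implied by a geometry."""
    return g.heads * g.cylinders * g.sectors * SECTOR_BYTES


def chs_to_linear(g, cylinder, head, sector):
    """Map (cylinder, head, sector) to a sequential sector number.

    Assumes cylinders and heads count from 0 and sectors from 1.
    """
    return (cylinder * g.heads + head) * g.sectors + (sector - 1)


def linear_to_chs(g, number):
    """Inverse of chs_to_linear for the same geometry."""
    cylinder, rest = divmod(number, g.heads * g.sectors)
    head, sector0 = divmod(rest, g.sectors)
    return cylinder, head, sector0 + 1


def translate(reported, real, cylinder, head, sector):
    """Translate an address in the reported geometry to the real one."""
    return linear_to_chs(real, chs_to_linear(reported, cylinder, head, sector))


real = Geometry(heads=8, cylinders=2048, sectors=35)
reported = Geometry(heads=16, cylinders=1024, sectors=35)

# Both geometries describe the same number of bytes.
print(capacity_bytes(real), capacity_bytes(reported))
# Where does reported cylinder 5, head 11, sector 7 really live?
print(translate(reported, real, 5, 11, 7))
```

In this scheme two real cylinders are folded into each reported cylinder, which is why data that looks like it sits on one cylinder from the operating system's point of view may in fact span two.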
5.3. Storage Area Networks - Draft
A SAN is a dedicated storage network that provides block level access to LUNs. A LUN, or logical unit number, is a virtual disk provided by the SAN. The system administrator has the same access and rights to the LUN as if it were a directly attached disk. The administrator can partition and format the disk in any way he or she chooses.
Two networking protocols commonly used in a SAN are Fibre Channel and iSCSI. A Fibre Channel network is very fast and is not burdened by the other network traffic in a company's LAN. However, it's very expensive. Fibre Channel cards cost around $1000 USD each. They also require special Fibre Channel switches. iSCSI is a newer technology that sends SCSI commands over a TCP/IP network. While this method may not be as fast as a Fibre Channel network, it does save money by using less expensive network hardware.
More To Be Added
5.4. Network Attached Storage - Draft
A NAS uses your company's existing Ethernet network to allow access to shared disks. This is filesystem level access. The system administrator does not have the ability to partition or format the disks since they are potentially shared by multiple computers. This technology is commonly used to provide multiple workstations access to the same data.
Similar to a SAN, a NAS needs to make use of a protocol to allow access to its disks. With a NAS this is either CIFS/Samba or NFS. Traditionally CIFS was used with Microsoft Windows networks, and NFS was used with UNIX & Linux networks. However, with Samba, Linux machines can also make use of CIFS shares.
Does this mean that your Windows 2003 server or your Linux box is a NAS server because it provides access to shared drives over your network? Yes, it is. You could also purchase a NAS device from a number of manufacturers. These devices are specifically designed to provide high speed access to data.
More To Be Added
5.5. Floppies
A floppy disk consists of a flexible membrane covered on one or both sides with a magnetic substance similar to that of a hard disk. The floppy disk itself doesn't have a read-write head; that is included in the drive. A floppy corresponds to one platter in a hard disk, but is removable: one drive can be used to access different floppies, and the same floppy can be read by many drives, whereas the hard disk is one indivisible unit.
Like a hard disk, a floppy is divided into tracks and sectors (and the two corresponding tracks on either side of a floppy form a cylinder), but there are many fewer of them than on a hard disk.
A floppy drive can usually use several different types of disks; for example, a 3.5 inch drive can use both 720 KB and 1.44 MB disks. Since the drive has to operate a bit differently and the operating system must know how big the disk is, there are many device files for floppy drives, one per combination of drive and disk type. Therefore, /dev/fd0H1440 is the first floppy drive (fd0), which must be a 3.5 inch drive, using a 3.5 inch, high density disk (H) of size 1440 KB (1440), i.e., a normal 3.5 inch HD floppy.
The names for floppy drives are complex, however, and Linux therefore has a special floppy device type that automatically detects the type of the disk in the drive. It works by trying to read the first sector of a newly inserted floppy using different floppy types until it finds the correct one. This naturally requires that the floppy is formatted first. The automatic devices are called /dev/fd0, /dev/fd1, and so on.
The parameters the automatic device uses to access a disk can also be set using the program setfdprm. This can be useful if you need to use disks that do not follow any usual floppy sizes, e.g., if they have an unusual number of sectors, or if the autodetecting for some reason fails and the proper device file is missing.
Linux can handle many nonstandard floppy disk formats in addition to all the standard ones.
Some of these require using special formatting programs. We'll skip these disk types for now, but in the meantime you can examine the /etc/fdprm file. It specifies the settings that setfdprm recognizes.
The operating system must know when a disk has been changed in a floppy drive, for example, in order to avoid using cached data from the previous disk. Unfortunately, the signal line that is used for this is sometimes broken, and worse, this won't always be noticeable when using the drive from within MS-DOS. If you are experiencing weird problems using floppies, this might be the reason. The only way to correct it is to repair the floppy drive.
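The naming scheme for floppy device files described above (drive number, a letter for the disk type, and the size in KB) can be taken apart mechanically. The sketch below is only an illustration of that scheme: the function name parse_floppy_device is hypothetical, and the pattern covers only the two forms mentioned in this section, the fixed-format name such as /dev/fd0H1440 and the autodetecting name such as /dev/fd0.

```python
import re

# fd<drive>[<type letter><size in KB>]; the bracketed part is absent
# for the autodetecting devices such as /dev/fd0.
_FLOPPY_RE = re.compile(r"^/dev/fd(\d+)(?:([A-Za-z])(\d+))?$")


def parse_floppy_device(path):
    """Split a floppy device name into (drive, type_letter, size_kb).

    type_letter and size_kb are None for an autodetecting device.
    Raises ValueError if the name does not follow the pattern.
    """
    match = _FLOPPY_RE.match(path)
    if match is None:
        raise ValueError(f"not a floppy device name: {path!r}")
    drive, letter, size = match.groups()
    return int(drive), letter, int(size) if size is not None else None


print(parse_floppy_device("/dev/fd0H1440"))
print(parse_floppy_device("/dev/fd1"))
```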
5.6. CD-ROMs
A CD-ROM drive uses an optically read, plastic coated disk. The information is recorded on the surface of the disk in small `holes' aligned along a spiral from the center to the edge. The drive directs a laser beam along the spiral to read the disk. When the laser hits a hole, the laser is reflected in one way; when it hits smooth surface, it is reflected in another way. This makes it easy to code bits, and therefore information. The rest is easy, mere mechanics.
CD-ROM drives are slow compared to hard disks. Whereas a typical hard disk will have an average seek time less than 15 milliseconds, a fast CD-ROM drive can use tenths of a second for seeks. The actual data transfer rate is fairly high at hundreds of kilobytes per second. The slowness means that CD-ROM drives are not as pleasant to use as hard disks (some Linux distributions provide `live' filesystems on CD-ROMs, making it unnecessary to copy the files to the hard disk, making installation easier and saving a lot of hard disk space), although it is still possible. For installing new software, CD-ROMs are very good, since maximum speed is not essential during installation.
There are several ways to arrange data on a CD-ROM. The most popular one is specified by the international standard ISO 9660. This standard specifies a very minimal filesystem, which is even more crude than the one MS-DOS uses. On the other hand, it is so minimal that every operating system should be able to map it to its native system.
For normal UNIX use, the ISO 9660 filesystem is not usable, so an extension to the standard has been developed, called the Rock Ridge extension. Rock Ridge allows longer filenames, symbolic links, and a lot of other goodies, making a CD-ROM look more or less like any contemporary UNIX filesystem. Even better, a Rock Ridge filesystem is still a valid ISO 9660 filesystem, making it usable by non-UNIX systems as well.
Linux supports both ISO 9660 and the Rock Ridge extensions; the extensions are recognized and used automatically.
The filesystem is only half the battle, however. Most CD-ROMs contain data that requires a special program to access, and most of these programs do not run under Linux (except, possibly, under dosemu, the Linux MS-DOS emulator, or wine, the Windows emulator). Ironically perhaps, wine actually stands for ``Wine Is Not an Emulator''. Wine, more strictly, is an API (Application Program Interface) replacement. Please see the wine documentation for more information.
3.5. The /usr filesystem
The /usr filesystem is often large, since all programs are installed there. All files in /usr usually come from a Linux distribution; locally installed programs and other stuff go below /usr/local. This makes it possible to update the system from a new version of the distribution, or even a completely new distribution, without having to install all programs again. Some of the subdirectories of /usr are listed below (some of the less important directories have been dropped; see the FSSTND for more information).
- /usr/X11R6
The X Window System, all files. To simplify the development and installation of X, the X files have not been integrated into the rest of the system. There is a directory tree below /usr/X11R6 similar to that below /usr itself.
- /usr/bin
Almost all user commands. Some commands are in /bin or in /usr/local/bin.
- /usr/sbin
System administration commands that are not needed on the root filesystem, e.g., most server programs.
- /usr/share/man, /usr/share/info, /usr/share/doc
Manual pages, GNU Info documents, and miscellaneous other documentation files, respectively.
- /usr/include
Header files for the C programming language. This should actually be below /usr/lib for consistency, but the tradition is overwhelmingly in support of this name.
- /usr/lib
Unchanging data files for programs and subsystems, including some site-wide configuration files. The name lib comes from library; originally libraries of programming subroutines were stored in /usr/lib.
- /usr/local
The place for locally installed software and other files. Distributions may not install anything in here. It is reserved solely for the use of the local administrator. This way he can be absolutely certain that no updates or upgrades to his distribution will overwrite any extra software he has installed locally.
3.6. The /var filesystem
The /var filesystem contains data that is changed when the system is running normally. It is specific for each system, i.e., not shared over the network with other computers.
- /var/cache/man
A cache for man pages that are formatted on demand. The source for manual pages is usually stored in /usr/share/man/man?/ (where ? is the manual section; see the manual page for man in section 7); some manual pages might come with a pre-formatted version, which might be stored in /usr/share/man/cat*. Other manual pages need to be formatted when they are first viewed; the formatted version is then stored in /var/cache/man so that the next person to view the same page won't have to wait for it to be formatted.
- /var/games
Any variable data belonging to games in /usr should be placed here. This is in case /usr is mounted read only.
- /var/lib
Files that change while the system is running normally.
- /var/local
Variable data for programs that are installed in /usr/local (i.e., programs that have been installed by the system administrator). Note that even locally installed programs should use the other /var directories if they are appropriate, e.g., /var/lock.
- /var/lock
Lock files. Many programs follow a convention to create a lock file in /var/lock to indicate that they are using a particular device or file. Other programs will notice the lock file and won't attempt to use the device or file.
- /var/log
Log files from various programs, especially login (/var/log/wtmp, which logs all logins and logouts into the system) and syslog (/var/log/messages, where all kernel and system program messages are usually stored). Files in /var/log can often grow indefinitely, and may require cleaning at regular intervals.
- /var/mail
This is the FHS approved location for user mailbox files. Depending on how far your distribution has gone towards FHS compliance, these files may still be held in /var/spool/mail.
- /var/run
Files that contain information about the system that is valid until the system is next booted. For example, /var/run/utmp contains information about people currently logged in.
- /var/spool
Directories for news, printer queues, and other queued work. Each different spool has its own subdirectory below /var/spool, e.g., the news spool is in /var/spool/news. Note that some installations which are not fully compliant with the latest version of the FHS may have user mailboxes under /var/spool/mail.
- /var/tmp
Temporary files that are large or that need to exist for a longer time than what is allowed for /tmp. (Although the system administrator might not allow very old files in /var/tmp either.)
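The lock file convention described under /var/lock can be illustrated with a short Python sketch. It is deliberately generic: the function names and the lock file name example.lock are made up, the demonstration uses a temporary directory instead of /var/lock so that it can be run without special privileges, and real programs may follow stricter rules about what the lock file is called and what it contains.

```python
import os
import tempfile


def try_lock(lock_dir, name):
    """Try to create a lock file; return True if we now hold the lock.

    O_CREAT | O_EXCL makes the creation atomic: if the file already
    exists, some other program holds the lock and we back off.
    """
    path = os.path.join(lock_dir, name)
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock_file:
        # Recording the owner's process ID is an assumption of this sketch.
        lock_file.write(f"{os.getpid()}\n")
    return True


def release_lock(lock_dir, name):
    """Remove the lock file so that other programs may take the lock."""
    os.remove(os.path.join(lock_dir, name))


demo_dir = tempfile.mkdtemp()  # stands in for /var/lock
print(try_lock(demo_dir, "example.lock"))  # first attempt
print(try_lock(demo_dir, "example.lock"))  # second attempt while still held
release_lock(demo_dir, "example.lock")
```

The scheme is purely cooperative: it only works if every program that uses the device or file checks for the lock file first.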
3.7. The /proc filesystem
The /proc filesystem is an illusionary filesystem. It does not exist on a disk. Instead, the kernel creates it in memory. It is used to provide information about the system (originally about processes, hence the name). Some of the more important files and directories are explained below. The /proc filesystem is described in more detail in the proc manual page.
- /proc/1
A directory with information about process number 1. Each process has a directory below /proc with the name being its process identification number.
- /proc/cpuinfo
Information about the processor, such as its type, make, model, and performance.
- /proc/devices
List of device drivers configured into the currently running kernel.
- /proc/dma
Shows which DMA channels are being used at the moment.
- /proc/filesystems
Filesystems configured into the kernel.
- /proc/interrupts
Shows which interrupts are in use, and how many of each there have been.
- /proc/ioports
Which I/O ports are in use at the moment.
- /proc/kcore
An image of the physical memory of the system. This is exactly the same size as your physical memory, but does not really take up that much memory; it is generated on the fly as programs access it. (Remember: unless you copy it elsewhere, nothing under /proc takes up any disk space at all.)
- /proc/kmsg
Messages output by the kernel. These are also routed to syslog.
- /proc/ksyms
Symbol table for the kernel.
- /proc/loadavg
The `load average' of the system; three meaningless indicators of how much work the system has to do at the moment.
- /proc/meminfo
Information about memory usage, both physical and swap.
- /proc/modules
Which kernel modules are loaded at the moment.
- /proc/net
Status information about network protocols.
- /proc/self
A symbolic link to the process directory of the program that is looking at /proc. When two processes look at /proc, they get different links. This is mainly a convenience to make it easier for programs to get at their process directory.
- /proc/stat
Various statistics about the system, such as the number of page faults since the system was booted.
- /proc/uptime
The time the system has been up.
- /proc/version
The kernel version.
Note that while the above files tend to be easily readable text files, they can sometimes be formatted in a way that is not easily digestible. There are many commands that do little more than read the above files and format them for easier understanding. For example, the free program reads /proc/meminfo and converts the amounts given in bytes to kilobytes (and adds a little more information, as well).
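To give an idea of how little such a formatting command has to do, here is a minimal Python sketch that parses text in the style of /proc/meminfo. It makes some assumptions that go beyond the text above: it expects the `Name: value kB' line format that current Linux kernels use (where the amounts are already in kilobytes), the function names are invented, and the sample figures are made up so that the example runs on any system.

```python
def parse_meminfo(text):
    """Return {field name: number} for lines of the form 'Name:  123 kB'.

    Most values are amounts in kB; a few fields are plain counts
    and carry no unit.
    """
    info = {}
    for line in text.splitlines():
        name, colon, rest = line.partition(":")
        if not colon:
            continue  # not a 'Name: value' line
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def used_kb(info):
    """Memory in use, computed simply as total minus free."""
    return info["MemTotal"] - info["MemFree"]


# Made-up sample data in the assumed format.
SAMPLE = """\
MemTotal:        2048000 kB
MemFree:          512000 kB
SwapTotal:       1024000 kB
SwapFree:        1024000 kB
"""

info = parse_meminfo(SAMPLE)
print(f"used: {used_kb(info)} kB of {info['MemTotal']} kB")
```

On a Linux system the same function can be fed the real file, for example with parse_meminfo(open("/proc/meminfo").read()).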
Last Updated ( Sunday, 23 December 2007 )