Chapter 3: Cache Manager Architecture
- The AFS Cache Manager is a kernel-resident agent with the following duties and responsibilities:
- Users are to be given the illusion that files stored in the AFS distributed file system are in fact part of the local unix file system of their client machine. There are several areas in which this illusion is not fully realized:
- Semantics: Full unix semantics are not maintained by the set of agents implementing the AFS distributed file system. The largest deviation involves the time when changes made to a file are seen by others who also have the file open. In AFS, modifications made to a cached copy of a file are not necessarily reflected immediately to the central copy (the one hosted by File Server disk storage), and thus to other cache sites. Rather, the changes are only guaranteed to be visible to others who simultaneously have their own cached copies open when the modifying process executes a unix close() operation on the file.
- This differs from the semantics expected from the single-machine, local unix environment, where writes performed on one open file descriptor are immediately visible to all processes reading the file via their own file descriptors. Thus, instead of the standard "last writer wins" behavior, users see "last closer wins" behavior on their AFS files. Incidentally, other DFSs, such as NFS, do not implement full unix semantics in this case either.
- Partial failures: A panic experienced by a local, single-machine unix file system will, by definition, cause all local processes to terminate immediately. On the other hand, any hard or soft failure experienced by a File Server process or the machine upon which it is executing does not cause any of the Cache Managers interacting with it to crash. Rather, each Cache Manager must reflect its failure to obtain responses from the affected File Server back up to its callers. Network partitions induce the same behavior. From the user's point of view, part of the file system tree has become inaccessible. In addition, certain system calls (e.g., open() and read()) may return unexpected failures to their users. Thus, certain coding practices that have become common amongst experienced (single-machine) unix programmers (e.g., not checking error codes from operations that "can't" fail) can cause programs to misbehave in the face of partial failures.
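- Because dirty chunks are shipped to the File Server at close() time, even close() can fail in AFS. A defensive caller therefore checks every return code, as in this minimal sketch (the AFS path shown is hypothetical):

    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>

    int
    main(void)
    {
        int fd;
        ssize_t nw;

        fd = open("/afs/example.org/user/doc.txt", O_WRONLY);  /* hypothetical path */
        if (fd < 0) {
            /* May fail unexpectedly if the File Server is down or partitioned. */
            fprintf(stderr, "open: %s\n", strerror(errno));
            return 1;
        }
        nw = write(fd, "hello", 5);
        if (nw < 0)
            fprintf(stderr, "write: %s\n", strerror(errno));
        /* In AFS, dirty chunks are stored back at close() time, so even
         * close() may report a failure (e.g., a server crash mid-store). */
        if (close(fd) < 0) {
            fprintf(stderr, "close: %s\n", strerror(errno));
            return 1;
        }
        return 0;
    }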
- To support this transparent access paradigm, the Cache Manager proceeds to:
- Intercept all standard unix operations directed towards AFS objects, mapping them to references aimed at the corresponding copies in the local cache.
- Keep a synchronized local cache of AFS files referenced by the client machine's users. If the chunks involved in an operation reading data from an object are either stale or do not exist in the local cache, then they must be fetched from the File Server(s) on which they reside. This may require a query to the volume location service in order to locate the place(s) of residence. Authentication challenges from File Servers needing to verify the caller's identity are handled by the Cache Manager, and the chunk is then incorporated into the cache.
- Upon receipt of a unix close, all dirty chunks belonging to the object will be flushed back to the appropriate File Server.
- Callback deliveries and withdrawals from File Servers must be processed, keeping the local cache in close synchrony with the state of affairs at the central store.
- Interfaces are also provided for those principals who wish to perform AFS-specific operations, such as Access Control List (ACL) manipulations or changes to the Cache Manager's configuration.
- This chapter takes a tour of the Cache Manager's architecture and examines how it supports these roles and responsibilities. First, the set of AFS agents with which it must interact is discussed. Next, some of the Cache Manager's implementation and interface choices are examined. Finally, the Cache Manager's ability to arbitrarily dispose of callback information without affecting the correctness of the cache consistency algorithm is explained.
- The main AFS agent interacting with a Cache Manager is the File Server. The most common operation performed by the Cache Manager is to act as its users' agent in fetching and storing files to and from the centralized repositories. Related to this activity, a Cache Manager must be prepared to answer queries from a File Server concerning its health. It must also be able to accept callback revocation notices generated by File Servers. Since the Cache Manager not only engages in data transfer but must also determine where the data is located in the first place, it also directs inquiries to Volume Location Server agents. There must also be an interface allowing direct interactions with both common and administrative users. Certain AFS-specific operations must be made available to these parties. In addition, administrative users may desire to dynamically reconfigure the Cache Manager. For example, information about a newly-created cell may be added without restarting the client's machine.
- The above roles and behaviors for the Cache Manager influenced the implementation choices and methods used to construct it, along with the desire to maximize portability. This section begins by showing how the VFS/vnode interface, pioneered and standardized by Sun Microsystems, provides not only the necessary fine-grain access to user file system operations, but also facilitates Cache Manager ports to new hardware and operating system platforms. Next, the use of unix system calls is examined. Finally, the threading structure employed is described.
- As mentioned above, Sun Microsystems has introduced and propagated an important concept in the file system world, that of the Virtual File System (VFS) interface. This abstraction defines a core collection of file system functions which cover all operations required for users to manipulate their data. System calls are written in terms of these standardized routines. Also, the associated vnode concept generalizes the original unix inode idea and provides hooks for differing underlying environments. Thus, to port a system to a new hardware platform, the system programmers have only to construct implementations of this base array of functions consistent with the new underlying machine.
- The VFS abstraction also allows multiple file systems (e.g., vanilla unix, DOS, NFS, and AFS) to coexist on the same machine without interference. Thus, to make a machine AFS-capable, a system designer first extends the base vnode structure in well-defined ways in order to store AFS-specific operations with each file description. Then, the base function array is coded so that calls upon the proper AFS agents are made to accomplish each function's standard objectives. In effect, the Cache Manager consists of code that interprets the standard set of unix operations imported through this interface and executes the AFS protocols to carry them out.
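- As a rough sketch of this arrangement (the structure layout and member names below are hypothetical; real vnodeops vectors differ by platform), an AFS-capable kernel fills the vnode function array with Cache Manager entry points:

    /* Hypothetical sketch of wiring AFS entry points into a VFS/vnode
     * operations vector; real member names and signatures vary by platform. */
    struct vnode;   /* opaque to this sketch */

    struct vnodeops {
        int (*vop_open)(struct vnode *vp, int flags);
        int (*vop_read)(struct vnode *vp, char *buf, unsigned len, long off);
        int (*vop_close)(struct vnode *vp);
        /* ... dozens of further operations in a real system ... */
    };

    /* Cache Manager implementations (bodies elided). */
    int afs_open(struct vnode *vp, int flags);
    int afs_read(struct vnode *vp, char *buf, unsigned len, long off);
    int afs_close(struct vnode *vp);

    /* Installing this vector is what makes the machine AFS-capable. */
    struct vnodeops afs_vnodeops = {
        afs_open,    /* validate callback state, fetch missing chunks */
        afs_read,    /* satisfy reads from cached chunks */
        afs_close,   /* store dirty chunks back to the File Server */
    };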
- As mentioned above, many unix system calls are implemented in terms of the base function array of vnode-oriented operations. In addition, one existing system call has been modified and two new system calls have been added to perform AFS-specific operations apart from the Cache Manager's unix 'emulation' activities. The standard ioctl() system call has been augmented to handle AFS-related operations on objects accessed via open unix file descriptors. The first of the new system calls is pioctl(), which is much like ioctl() except that it names targeted objects by pathname instead of by file descriptor. The second is afs_call(), which is used to initialize the Cache Manager threads, as described in the section immediately following.
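- For instance, a user-level utility reaches the Cache Manager's AFS-specific services through pioctl(). The hedged sketch below fetches a directory's ACL by pathname; the ViceIoctl layout and the VIOCGETAL opcode are as commonly found in AFS client headers, but the header locations and the example path are assumptions:

    #include <stdio.h>
    #include <afs/vice.h>    /* struct ViceIoctl; header location may vary */
    #include <afs/venus.h>   /* VIOCGETAL opcode; header location may vary */

    /* pioctl() is supplied by the AFS client libraries, not by libc. */
    extern int pioctl(char *path, long cmd, struct ViceIoctl *data, int follow);

    int
    main(void)
    {
        char space[2048];
        struct ViceIoctl blob;

        blob.in       = 0;              /* no input arguments for this opcode */
        blob.in_size  = 0;
        blob.out      = space;          /* buffer to receive the ACL text */
        blob.out_size = sizeof(space);

        /* Unlike ioctl(), the target is named by pathname (path assumed). */
        if (pioctl("/afs/example.org/user", VIOCGETAL, &blob, 1) < 0) {
            perror("pioctl");
            return 1;
        }
        printf("%s", space);
        return 0;
    }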
- In order to execute its many roles, the Cache Manager is organized as a multi-threaded entity. It is implemented with (potentially multiple instantiations of) the following three thread classes:
- CallBack Listener: This thread implements the Cache Manager callback RPC interface, as described in Section 6.5.
- Periodic Maintenance: Certain maintenance and checkup activities need to be performed at each of five set intervals. Currently, the frequency of each of these operations is hard-wired. It would be a simple matter, though, to make these times configurable by adding command-line parameters to the Cache Manager. The five intervals and their duties are listed below, followed by a minimal sketch of such a timer-driven thread.
- Thirty seconds: Flush pending writes for NFS clients coming in through the NFS-AFS Translator facility.
- One minute: Make sure local cache usage is below the assigned quota, write out dirty buffers holding directory data, and keep flock()s alive.
- Three minutes: Check for the resuscitation of File Servers previously determined to be down, and check the cache of previously computed access information in light of any newly expired tickets.
- Ten minutes: Check health of all File Servers marked as active, and garbage-collect old RPC connections.
- One hour: Check the status of the root AFS volume as well as all cached information concerning read-only volumes.
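- A minimal sketch of how a single maintenance thread might multiplex these hard-wired periods follows; the intervals come from the list above, while every function name is hypothetical:

    #include <time.h>
    #include <unistd.h>

    /* Hypothetical stubs for the activities listed above. */
    static void FlushNFSWrites(void)   { /* flush pending NFS-AFS Translator writes */ }
    static void CheckCacheSize(void)   { /* enforce cache quota, write dirty dirs, renew flock()s */ }
    static void CheckDownServers(void) { /* probe File Servers believed down; note expired tickets */ }
    static void CheckUpServers(void)   { /* probe healthy File Servers; GC old RPC connections */ }
    static void CheckRootVolume(void)  { /* revalidate root and read-only volume information */ }

    void
    PeriodicMaintenance(void)
    {
        time_t now, next30 = 0, next60 = 0, next180 = 0, next600 = 0, next3600 = 0;

        for (;;) {
            now = time(NULL);
            if (now >= next30)   { FlushNFSWrites();   next30   = now + 30; }
            if (now >= next60)   { CheckCacheSize();   next60   = now + 60; }
            if (now >= next180)  { CheckDownServers(); next180  = now + 180; }
            if (now >= next600)  { CheckUpServers();   next600  = now + 600; }
            if (now >= next3600) { CheckRootVolume();  next3600 = now + 3600; }
            sleep(30);   /* the finest-grain period in the table */
        }
    }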
- Background Operations: The Cache Manager is capable of prefetching file system objects, as well as carrying out delayed stores, occurring sometime after a close() operation. At least two threads are created at Cache Manager initialization time and held in reserve to carry out these objectives. This class of background threads implements the following three operations (a queue-based sketch follows the list):
- Prefetch operation: Fetches particular file system object chunks in the expectation that they will soon be needed.
- Path-based prefetch operation: The prefetch daemon mentioned above operates on objects already at least partly resident in the local cache, referenced by their vnode. The path-based prefetch daemon performs the same actions, but on objects named solely by their unix pathname.
- Delayed store operation: Flush all modified chunks from a file system object to the appropriate File Server's disks.
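- A plausible shape for this mechanism, with every name hypothetical and POSIX threads standing in for the kernel threading actually used, is a request queue serviced by the reserve daemons:

    #include <stddef.h>
    #include <pthread.h>

    /* Hypothetical request codes matching the three operations above. */
    enum bkg_op { BKG_PREFETCH, BKG_PATH_PREFETCH, BKG_STORE };

    struct bkg_req {
        enum bkg_op     op;        /* which operation to perform */
        void           *target;    /* cached vnode, or pathname for BKG_PATH_PREFETCH */
        struct bkg_req *next;      /* singly-linked request queue */
    };

    static struct bkg_req *queue_head;
    static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  queue_cv   = PTHREAD_COND_INITIALIZER;

    /* Each reserve daemon blocks until a request arrives, then services it. */
    static void *
    bkg_daemon(void *arg)
    {
        struct bkg_req *req;

        (void)arg;
        for (;;) {
            pthread_mutex_lock(&queue_lock);
            while (queue_head == NULL)
                pthread_cond_wait(&queue_cv, &queue_lock);
            req = queue_head;
            queue_head = req->next;
            pthread_mutex_unlock(&queue_lock);
            /* Dispatch on req->op: fetch the named chunks, or store
             * the object's dirty chunks back to its File Server. */
        }
        return NULL;
    }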
- The Cache Manager is free to throw away any or all of the callbacks it has received from the set of File Servers from which it has cached files. This housecleaning does not in any way compromise the correctness of the AFS cache consistency algorithm. The File Server RPC interface described in this paper provides a call allowing a Cache Manager to advise of such unilateral jettisoning; however, failure to use this routine still leaves the machine's cache consistent.
- Let us examine the case of a Cache Manager on machine C disposing of its callback on file X from File Server F. The next user access to file X on machine C will cause the Cache Manager to notice that it does not currently hold a callback on the file (although the File Server will think it does), so the Cache Manager on C attempts to revalidate its entry, even though it is entirely possible that the file is still in sync with the central store. In response, the File Server will extend the existing callback information it has and deliver the new promise to the Cache Manager on C.
- Now consider the case where file X is modified by a party on a machine other than C before such an access occurs on C. Under these circumstances, the File Server will break its callback on file X before performing the central update. The Cache Manager on C will receive one of these "break callback" messages. Since it no longer has a callback on file X, the Cache Manager on C will cheerfully acknowledge the File Server's notification and move on to other matters.
- In either case, the callback information for both parties will eventually resynchronize. The only potential penalty is the extra revalidation traffic generated by the Cache Manager; discarding callbacks can reduce performance, but it never causes incorrect operation.
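- In code form, the revalidation step on machine C might look like the hedged sketch below; the helper and field names are hypothetical, though RXAFS_FetchStatus is the File Server RPC that returns fresh status along with a new callback promise:

    /* Hedged sketch of the revalidation path; helper and field names are
     * hypothetical, but RXAFS_FetchStatus is the actual RPC involved. */
    struct cache_entry {
        int  has_callback;    /* do we currently hold a callback promise? */
        long data_version;    /* version of the chunks we have cached */
    };

    /* Wraps RXAFS_FetchStatus; the reply renews the callback and reports
     * the central DataVersion, telling us whether our chunks are stale. */
    extern int FetchStatusAndCallback(struct cache_entry *ce);

    int
    VerifyCacheEntry(struct cache_entry *ce)
    {
        if (ce->has_callback)
            return 0;    /* promise still held: the cache is usable as-is */

        /* No callback, perhaps because we jettisoned it: revalidate.
         * The only cost of having discarded the promise is this call. */
        return FetchStatusAndCallback(ce);
    }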
Chapter 4: Common Definitions and Data Structures
- This chapter discusses the definitions used in common by the File Server and the Cache Manager. They appear in the common.xg file, which is fed to Rxgen to generate the C code instantiations of these definitions.
- struct AFSFid is the type used to name file system objects within AFS.
Fields
- unsigned long Volume - This provides the identifier for the volume in which the object resides.
- unsigned long Vnode - This specifies the index within the given volume corresponding to the object.
- unsigned long Unique - This is a 'uniquifier' or generation number for the slot identified by the Vnode field.
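- Reconstructed from the field list above, the C instantiation generated from common.xg is presumably:

    struct AFSFid {
        unsigned long Volume;   /* volume in which the object resides */
        unsigned long Vnode;    /* index within the given volume */
        unsigned long Unique;   /* uniquifier for the vnode slot */
    };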
- There are three types of callbacks defined by AFS-3:
- EXCLUSIVE: This version of callback has not been implemented. Its intent was to allow a single Cache Manager to have exclusive rights on the associated file data.
- SHARED: This callback type indicates that the status information kept by a Cache Manager for the associated file is up to date. All cached chunks from this file whose version numbers match the status information are thus guaranteed to also be up to date. This type of callback is non-exclusive, allowing any number of other Cache Managers to have callbacks on this file and cache chunks from the file.
- DROPPED: This is used to indicate that the given callback promise has been cancelled by the issuing File Server. The Cache Manager is forced to mark the status of its cache entry as unknown, forcing it to stat the file the next time a user attempts to access any chunk from it.
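- In common.xg, these callback types appear as small integer constants. The values shown below follow the AFS-3 definitions but should be verified against the interface file itself:

    /* Callback types; values assumed from the AFS-3 definitions. */
    #define EXCLUSIVE 1   /* never implemented: exclusive rights for one Cache Manager */
    #define SHARED    2   /* cached status, and matching chunks, are up to date */
    #define DROPPED   3   /* promise cancelled; status must be refetched on next access */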
- struct AFSCallBack is the canonical callback structure passed in many File Server RPC interface calls.
Fields
- unsigned long CallBackVersion - Callback version number.
- unsigned long ExpirationTime - Time when the callback expires, measured in seconds.
- unsigned long CallBackType - The type of callback involved, one of EXCLUSIVE, SHARED, or DROPPED.
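- From the fields above, the generated C structure presumably reads:

    struct AFSCallBack {
        unsigned long CallBackVersion;   /* callback version number */
        unsigned long ExpirationTime;    /* expiration time, in seconds */
        unsigned long CallBackType;      /* EXCLUSIVE, SHARED, or DROPPED */
    };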
- AFS-3 sometimes does callbacks in bulk. Up to AFSCBMAX (50) callbacks can be handled at once. Layouts for the two related structures implementing callback arrays, struct AFSCBFids and struct AFSCBs, follow below. Note that the callback descriptor in slot i of the array in the AFSCBs structure applies to the file identifier contained in slot i in the fid array in the matching AFSCBFids structure.
Fields
- u_int AFSCBFids_len - Number of AFS file identifiers stored in the structure, up to a maximum of AFSCBMAX.
- AFSFid *AFSCBFids_val - Pointer to the first element of the array of file identifiers.
Fields
- u_int AFSCBs_len - Number of AFS callback descriptors stored in the structure, up to a maximum of AFSCBMAX.
- AFSCallBack *AFSCBs_val - Pointer to the actual array of callback descriptors.
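- These are Rxgen variable-length arrays, so their generated C instantiations, reconstructed from the two field lists above, presumably read:

    #define AFSCBMAX 50   /* maximum callbacks handled in one bulk call */

    struct AFSCBFids {
        u_int   AFSCBFids_len;   /* number of fids present, at most AFSCBMAX */
        AFSFid *AFSCBFids_val;   /* first element of the fid array */
    };

    struct AFSCBs {
        u_int        AFSCBs_len;   /* number of callback descriptors present */
        AFSCallBack *AFSCBs_val;   /* descriptor i applies to fid i above */
    };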
- struct AFSDBLockDesc describes the state of an AFS lock.
Fields
- char waitStates - Types of lockers waiting for the lock.
- char exclLocked - Does anyone have a boosted, shared or write lock? (A boosted lock allows the holder to have data read-locked and then 'boost' up to a write lock on the data without ever relinquishing the lock.)
- char readersReading - Number of readers that actually hold a read lock on the associated object.
- char numWaiting - Total number of parties waiting to acquire this lock in some fashion.
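- Its C layout, rebuilt from the field list above, is presumably:

    struct AFSDBLockDesc {
        char waitStates;       /* types of lockers waiting for the lock */
        char exclLocked;       /* is a boosted, shared, or write lock held? */
        char readersReading;   /* number of holders of read locks */
        char numWaiting;       /* total parties waiting to acquire the lock */
    };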
- struct AFSDBCacheEntry defines the description of a Cache Manager local cache entry, as made accessible via the RXAFSCB_GetCE() callback RPC call. Note that File Servers do not make the above call. Rather, client debugging programs (such as cmdebug) are the agents which call RXAFSCB_GetCE().
Fields
- long addr - Memory location in the Cache Manager where this description is located.
- long cell - Cell part of the fid.
- AFSFid netFid - Network (standard) part of the fid.
- long Length - Number of bytes in the cache entry.
- long DataVersion - Data version number for the contents of the cache entry.
- struct AFSDBLockDesc lock - Status of the lock object controlling access to this cache entry.
- long callback - Index in callback records for this object.
- long cbExpires - Time when the callback expires.
- short refCount - General reference count.
- short opens - Number of opens performed on this object.
- short writers - Number of writers active on this object.
- char mvstat - The file classification, indicating one of normal file, mount point, or volume root.
- char states - Remembers the state of the given file with a set of bits indicating, from lowest-order to highest-order: stat info valid, read-only file, mount point valid, pending core file, wait-for-store, and mapped file.
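- Gathering the fields above, the entry presumably looks like the following in C:

    struct AFSDBCacheEntry {
        long   addr;                 /* location of this entry within the Cache Manager */
        long   cell;                 /* cell part of the fid */
        AFSFid netFid;               /* network (standard) part of the fid */
        long   Length;               /* number of bytes in the cache entry */
        long   DataVersion;          /* data version of the cached contents */
        struct AFSDBLockDesc lock;   /* lock controlling access to this entry */
        long   callback;             /* index into the callback records */
        long   cbExpires;            /* time when the callback expires */
        short  refCount;             /* general reference count */
        short  opens;                /* number of opens performed */
        short  writers;              /* number of active writers */
        char   mvstat;               /* normal file, mount point, or volume root */
        char   states;               /* status bits described above */
    };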
- struct AFSDBLock is a fuller description of an AFS lock, including a string name used to identify it.
Fields
- char name[16] - String name of the lock.
- struct AFSDBLockDesc lock - Contents of the lock itself.
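- In C, rebuilt from the two fields above, this is presumably:

    struct AFSDBLock {
        char name[16];               /* string name of the lock */
        struct AFSDBLockDesc lock;   /* contents of the lock itself */
    };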
- A maximum size for opaque structures passed via the File Server interface is defined as AFSOPAQUEMAX. Currently, this is set to 1,024 bytes. The AFSOpaque typedef is defined for use by those parameters that wish their contents to travel completely uninterpreted across the network.
- Two common definitions used to specify basic AFS string lengths are AFSNAMEMAX and AFSPATHMAX. AFSNAMEMAX places an upper limit of 256 characters on such things as file and directory names passed as parameters. AFSPATHMAX defines the longest pathname expected by the system, composed of slash-separated instances of the individual directory and file names mentioned above. The longest acceptable pathname is currently set to 1,024 characters.
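- Rendered as C, these limits and the opaque type presumably look as follows; the values are those quoted above, and the structure fields follow the standard Rxgen len/val convention:

    #define AFSOPAQUEMAX 1024   /* largest opaque blob, in bytes */
    #define AFSNAMEMAX    256   /* longest file or directory name */
    #define AFSPATHMAX   1024   /* longest acceptable pathname */

    struct AFSOpaque {
        u_int AFSOpaque_len;    /* number of opaque bytes present */
        char *AFSOpaque_val;    /* the uninterpreted bytes themselves */
    };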