#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2018-12-08 14:45-0200\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

#. Put one translator per line, in the form NAME <EMAIL>, YEAR1, YEAR2
msgctxt "_"
msgid "translator-credits"
msgstr ""

#. (itstool) path: info/title
#: article.translate.xml:6
msgid "Design elements of the FreeBSD VM system"
msgstr ""

#. (itstool) path: affiliation/address
#: article.translate.xml:11
#, no-wrap
msgid "\n"
"	    <email>dillon@apollo.backplane.com</email>\n"
"	  "
msgstr ""

#. (itstool) path: authorgroup/author
#: article.translate.xml:10
msgid ""
"<personname><firstname>Matthew</firstname><surname>Dillon</surname></"
"personname><affiliation> <_:address-1/> </affiliation>"
msgstr ""

#. (itstool) path: legalnotice/para
#: article.translate.xml:18
msgid "FreeBSD is a registered trademark of the FreeBSD Foundation."
msgstr ""

#. (itstool) path: legalnotice/para
#: article.translate.xml:20
msgid "Linux is a registered trademark of Linus Torvalds."
msgstr ""

#. (itstool) path: legalnotice/para
#: article.translate.xml:22
msgid ""
"Microsoft, IntelliMouse, MS-DOS, Outlook, Windows, Windows Media and Windows "
"NT are either registered trademarks or trademarks of Microsoft Corporation "
"in the United States and/or other countries."
msgstr ""

#. (itstool) path: legalnotice/para
#: article.translate.xml:26
msgid ""
"Motif, OSF/1, and UNIX are registered trademarks and IT DialTone and The "
"Open Group are trademarks of The Open Group in the United States and other "
"countries."
msgstr ""

#. (itstool) path: legalnotice/para
#: article.translate.xml:30
msgid ""
"Many of the designations used by manufacturers and sellers to distinguish "
"their products are claimed as trademarks. Where those designations appear in "
"this document, and the FreeBSD Project was aware of the trademark claim, the "
"designations have been followed by the <quote>™</quote> or the <quote>®</"
"quote> symbol."
msgstr ""

#. (itstool) path: info/pubdate
#. (itstool) path: info/releaseinfo
#: article.translate.xml:38 article.translate.xml:40
msgid ""
"$FreeBSD: head/en_US.ISO8859-1/articles/vm-design/article.xml 43184 2013-11-"
"13 07:52:45Z hrs $"
msgstr ""

#. (itstool) path: abstract/para
#: article.translate.xml:43
msgid ""
"The title is really just a fancy way of saying that I am going to attempt to "
"describe the whole VM enchilada, hopefully in a way that everyone can follow."
" For the last year I have concentrated on a number of major kernel "
"subsystems within FreeBSD, with the VM and Swap subsystems being the most "
"interesting and NFS being <quote>a necessary chore</quote>. I rewrote only "
"small portions of the code. In the VM arena the only major rewrite I have "
"done is to the swap subsystem. Most of my work was cleanup and maintenance, "
"with only moderate code rewriting and no major algorithmic adjustments "
"within the VM subsystem. The bulk of the VM subsystem's theoretical base "
"remains unchanged and a lot of the credit for the modernization effort in "
"the last few years belongs to John Dyson and David Greenman. Not being a "
"historian like Kirk I will not attempt to tag all the various features with "
"peoples names, since I will invariably get it wrong."
msgstr ""

#. (itstool) path: legalnotice/para
#: article.translate.xml:60
msgid ""
"This article was originally published in the January 2000 issue of <link "
"xlink:href=\"http://www.daemonnews.org/\">DaemonNews</link>. This version of "
"the article may include updates from Matt and other authors to reflect "
"changes in FreeBSD's VM implementation."
msgstr ""

#. (itstool) path: sect1/title
#: article.translate.xml:68
msgid "Introduction"
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:70
msgid ""
"Before moving along to the actual design let's spend a little time on the "
"necessity of maintaining and modernizing any long-living codebase. In the "
"programming world, algorithms tend to be more important than code and it is "
"precisely due to BSD's academic roots that a great deal of attention was "
"paid to algorithm design from the beginning. More attention paid to the "
"design generally leads to a clean and flexible codebase that can be fairly "
"easily modified, extended, or replaced over time. While BSD is considered an "
"<quote>old</quote> operating system by some people, those of us who work on "
"it tend to view it more as a <quote>mature</quote> codebase which has "
"various components modified, extended, or replaced with modern code. It has "
"evolved, and FreeBSD is at the bleeding edge no matter how old some of the "
"code might be. This is an important distinction to make and one that is "
"unfortunately lost to many people. The biggest error a programmer can make "
"is to not learn from history, and this is precisely the error that many "
"other modern operating systems have made. <trademark class="
"\"registered\">Windows NT</trademark> is the best example of this, and the "
"consequences have been dire. Linux also makes this mistake to some "
"degree—enough that we BSD folk can make small jokes about it every once in a "
"while, anyway. Linux's problem is simply one of a lack of experience and "
"history to compare ideas against, a problem that is easily and rapidly being "
"addressed by the Linux community in the same way it has been addressed in "
"the BSD community—by continuous code development. The <trademark class="
"\"registered\">Windows NT</trademark> folk, on the other hand, repeatedly "
"make the same mistakes solved by <trademark class=\"registered\">UNIX</"
"trademark> decades ago and then spend years fixing them. Over and over again."
" They have a severe case of <quote>not designed here</quote> and <quote>we "
"are always right because our marketing department says so</quote>. I have "
"little tolerance for anyone who cannot learn from history."
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:99
msgid ""
"Much of the apparent complexity of the FreeBSD design, especially in the VM/"
"Swap subsystem, is a direct result of having to solve serious performance "
"issues that occur under various conditions. These issues are not due to bad "
"algorithmic design but instead rise from environmental factors. In any "
"direct comparison between platforms, these issues become most apparent when "
"system resources begin to get stressed. As I describe FreeBSD's VM/Swap "
"subsystem the reader should always keep two points in mind:"
msgstr ""

#. (itstool) path: listitem/para
#: article.translate.xml:110
msgid ""
"The most important aspect of performance design is what is known as "
"<quote>Optimizing the Critical Path</quote>. It is often the case that "
"performance optimizations add a little bloat to the code in order to make "
"the critical path perform better."
msgstr ""

#. (itstool) path: listitem/para
#: article.translate.xml:117
msgid ""
"A solid, generalized design outperforms a heavily-optimized design over the "
"long run. While a generalized design may end up being slower than an heavily-"
"optimized design when they are first implemented, the generalized design "
"tends to be easier to adapt to changing conditions and the heavily-optimized "
"design winds up having to be thrown away."
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:126
msgid ""
"Any codebase that will survive and be maintainable for years must therefore "
"be designed properly from the beginning even if it costs some performance. "
"Twenty years ago people were still arguing that programming in assembly was "
"better than programming in a high-level language because it produced code "
"that was ten times as fast. Today, the fallibility of that argument is "
"obvious  — as are the parallels to algorithmic design and code "
"generalization."
msgstr ""

#. (itstool) path: sect1/title
#: article.translate.xml:136
msgid "VM Objects"
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:138
msgid ""
"The best way to begin describing the FreeBSD VM system is to look at it from "
"the perspective of a user-level process. Each user process sees a single, "
"private, contiguous VM address space containing several types of memory "
"objects. These objects have various characteristics. Program code and "
"program data are effectively a single memory-mapped file (the binary file "
"being run), but program code is read-only while program data is copy-on-"
"write. Program BSS is just memory allocated and filled with zeros on demand, "
"called demand zero page fill. Arbitrary files can be memory-mapped into the "
"address space as well, which is how the shared library mechanism works. Such "
"mappings can require modifications to remain private to the process making "
"them. The fork system call adds an entirely new dimension to the VM "
"management problem on top of the complexity already given."
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:152
msgid ""
"A program binary data page (which is a basic copy-on-write page) illustrates "
"the complexity. A program binary contains a preinitialized data section "
"which is initially mapped directly from the program file. When a program is "
"loaded into a process's VM space, this area is initially memory-mapped and "
"backed by the program binary itself, allowing the VM system to free/reuse "
"the page and later load it back in from the binary. The moment a process "
"modifies this data, however, the VM system must make a private copy of the "
"page for that process. Since the private copy has been modified, the VM "
"system may no longer free it, because there is no longer any way to restore "
"it later on."
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:163
msgid ""
"You will notice immediately that what was originally a simple file mapping "
"has become much more complex. Data may be modified on a page-by-page basis "
"whereas the file mapping encompasses many pages at once. The complexity "
"further increases when a process forks. When a process forks, the result is "
"two processes—each with their own private address spaces, including any "
"modifications made by the original process prior to the call to "
"<function>fork()</function>. It would be silly for the VM system to make a "
"complete copy of the data at the time of the <function>fork()</function> "
"because it is quite possible that at least one of the two processes will "
"only need to read from that page from then on, allowing the original page to "
"continue to be used. What was a private page is made copy-on-write again, "
"since each process (parent and child) expects their own personal post-fork "
"modifications to remain private to themselves and not effect the other."
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:178
msgid ""
"FreeBSD manages all of this with a layered VM Object model. The original "
"binary program file winds up being the lowest VM Object layer. A copy-on-"
"write layer is pushed on top of that to hold those pages which had to be "
"copied from the original file. If the program modifies a data page belonging "
"to the original file the VM system takes a fault and makes a copy of the "
"page in the higher layer. When a process forks, additional VM Object layers "
"are pushed on. This might make a little more sense with a fairly basic "
"example. A <function>fork()</function> is a common operation for any *BSD "
"system, so this example will consider a program that starts up, and forks. "
"When the process starts, the VM system creates an object layer, let's call "
"this A:"
msgstr ""

#. (itstool) path: imageobject/imagedata
#. This is a reference to an external file such as an image or video. When
#. the file changes, the md5 hash will change to let you know you need to
#. update your localized copy. The msgstr is not used at all. Set it to
#. whatever you like once you have updated your copy of the file.
#: article.translate.xml:192
msgctxt "_"
msgid "external ref='fig1' md5='__failed__'"
msgstr ""

#. (itstool) path: textobject/literallayout
#: article.translate.xml:196
#, no-wrap
msgid "+---------------+\n"
"|       A       |\n"
"+---------------+"
msgstr ""

#. (itstool) path: sect1/mediaobject
#: article.translate.xml:190
msgid ""
"<imageobject> <imagedata fileref=\"fig1\"/> </imageobject> <textobject> <_:"
"literallayout-1/> </textobject> <textobject> <phrase>A picture</phrase> </"
"textobject>"
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:206
msgid ""
"A represents the file—pages may be paged in and out of the file's physical "
"media as necessary. Paging in from the disk is reasonable for a program, but "
"we really do not want to page back out and overwrite the executable. The VM "
"system therefore creates a second layer, B, that will be physically backed "
"by swap space:"
msgstr ""

#. (itstool) path: imageobject/imagedata
#. This is a reference to an external file such as an image or video. When
#. the file changes, the md5 hash will change to let you know you need to
#. update your localized copy. The msgstr is not used at all. Set it to
#. whatever you like once you have updated your copy of the file.
#: article.translate.xml:214
msgctxt "_"
msgid "external ref='fig2' md5='__failed__'"
msgstr ""

#. (itstool) path: textobject/literallayout
#: article.translate.xml:218
#, no-wrap
msgid ""
"+---------------+\n"
"|       B       |\n"
"+---------------+\n"
"|       A       |\n"
"+---------------+"
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:226
msgid ""
"On the first write to a page after this, a new page is created in B, and its "
"contents are initialized from A. All pages in B can be paged in or out to a "
"swap device. When the program forks, the VM system creates two new object "
"layers—C1 for the parent, and C2 for the child—that rest on top of B:"
msgstr ""

#. (itstool) path: imageobject/imagedata
#. This is a reference to an external file such as an image or video. When
#. the file changes, the md5 hash will change to let you know you need to
#. update your localized copy. The msgstr is not used at all. Set it to
#. whatever you like once you have updated your copy of the file.
#: article.translate.xml:234
msgctxt "_"
msgid "external ref='fig3' md5='__failed__'"
msgstr ""

#. (itstool) path: textobject/literallayout
#: article.translate.xml:238
#, no-wrap
msgid ""
"+-------+-------+\n"
"|   C1  |   C2  |\n"
"+-------+-------+\n"
"|       B       |\n"
"+---------------+\n"
"|       A       |\n"
"+---------------+"
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:248
msgid ""
"In this case, let's say a page in B is modified by the original parent "
"process. The process will take a copy-on-write fault and duplicate the page "
"in C1, leaving the original page in B untouched. Now, let's say the same "
"page in B is modified by the child process. The process will take a copy-on-"
"write fault and duplicate the page in C2. The original page in B is now "
"completely hidden since both C1 and C2 have a copy and B could theoretically "
"be destroyed if it does not represent a <quote>real</quote> file; however, "
"this sort of optimization is not trivial to make because it is so fine-"
"grained. FreeBSD does not make this optimization. Now, suppose (as is often "
"the case) that the child process does an <function>exec()</function>. Its "
"current address space is usually replaced by a new address space "
"representing a new file. In this case, the C2 layer is destroyed:"
msgstr ""

#. (itstool) path: imageobject/imagedata
#. This is a reference to an external file such as an image or video. When
#. the file changes, the md5 hash will change to let you know you need to
#. update your localized copy. The msgstr is not used at all. Set it to
#. whatever you like once you have updated your copy of the file.
#: article.translate.xml:264
msgctxt "_"
msgid "external ref='fig4' md5='__failed__'"
msgstr ""

#. (itstool) path: textobject/literallayout
#: article.translate.xml:268
#, no-wrap
msgid ""
"+-------+\n"
"|   C1  |\n"
"+-------+-------+\n"
"|       B       |\n"
"+---------------+\n"
"|       A       |\n"
"+---------------+"
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:278
msgid ""
"In this case, the number of children of B drops to one, and all accesses to "
"B now go through C1. This means that B and C1 can be collapsed together. Any "
"pages in B that also exist in C1 are deleted from B during the collapse. "
"Thus, even though the optimization in the previous step could not be made, "
"we can recover the dead pages when either of the processes exit or "
"<function>exec()</function>."
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:285
msgid ""
"This model creates a number of potential problems. The first is that you can "
"wind up with a relatively deep stack of layered VM Objects which can cost "
"scanning time and memory when you take a fault. Deep layering can occur when "
"processes fork and then fork again (either parent or child). The second "
"problem is that you can wind up with dead, inaccessible pages deep in the "
"stack of VM Objects. In our last example if both the parent and child "
"processes modify the same page, they both get their own private copies of "
"the page and the original page in B is no longer accessible by anyone. That "
"page in B can be freed."
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:295
msgid ""
"FreeBSD solves the deep layering problem with a special optimization called "
"the <quote>All Shadowed Case</quote>. This case occurs if either C1 or C2 "
"take sufficient COW faults to completely shadow all pages in B. Lets say "
"that C1 achieves this. C1 can now bypass B entirely, so rather then have C1-"
"&gt;B-&gt;A and C2-&gt;B-&gt;A we now have C1-&gt;A and C2-&gt;B-&gt;A. But "
"look what also happened—now B has only one reference (C2), so we can "
"collapse B and C2 together. The end result is that B is deleted entirely and "
"we have C1-&gt;A and C2-&gt;A. It is often the case that B will contain a "
"large number of pages and neither C1 nor C2 will be able to completely "
"overshadow it. If we fork again and create a set of D layers, however, it is "
"much more likely that one of the D layers will eventually be able to "
"completely overshadow the much smaller dataset represented by C1 or C2. The "
"same optimization will work at any point in the graph and the grand result "
"of this is that even on a heavily forked machine VM Object stacks tend to "
"not get much deeper then 4. This is true of both the parent and the children "
"and true whether the parent is doing the forking or whether the children "
"cascade forks."
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:313
msgid ""
"The dead page problem still exists in the case where C1 or C2 do not "
"completely overshadow B. Due to our other optimizations this case does not "
"represent much of a problem and we simply allow the pages to be dead. If the "
"system runs low on memory it will swap them out, eating a little swap, but "
"that is it."
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:319
msgid ""
"The advantage to the VM Object model is that <function>fork()</function> is "
"extremely fast, since no real data copying need take place. The disadvantage "
"is that you can build a relatively complex VM Object layering that slows "
"page fault handling down a little, and you spend memory managing the VM "
"Object structures. The optimizations FreeBSD makes proves to reduce the "
"problems enough that they can be ignored, leaving no real disadvantage."
msgstr ""

#. (itstool) path: sect1/title
#: article.translate.xml:329
msgid "SWAP Layers"
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:331
msgid ""
"Private data pages are initially either copy-on-write or zero-fill pages. "
"When a change, and therefore a copy, is made, the original backing object "
"(usually a file) can no longer be used to save a copy of the page when the "
"VM system needs to reuse it for other purposes. This is where SWAP comes in. "
"SWAP is allocated to create backing store for memory that does not otherwise "
"have it. FreeBSD allocates the swap management structure for a VM Object "
"only when it is actually needed. However, the swap management structure has "
"had problems historically:"
msgstr ""

#. (itstool) path: listitem/para
#: article.translate.xml:343
msgid ""
"Under FreeBSD 3.X the swap management structure preallocates an array that "
"encompasses the entire object requiring swap backing store—even if only a "
"few pages of that object are swap-backed. This creates a kernel memory "
"fragmentation problem when large objects are mapped, or processes with large "
"runsizes (RSS) fork."
msgstr ""

#. (itstool) path: listitem/para
#: article.translate.xml:352
msgid ""
"Also, in order to keep track of swap space, a <quote>list of holes</quote> "
"is kept in kernel memory, and this tends to get severely fragmented as well. "
"Since the <quote>list of holes</quote> is a linear list, the swap allocation "
"and freeing performance is a non-optimal O(n)-per-page."
msgstr ""

#. (itstool) path: listitem/para
#: article.translate.xml:360
msgid ""
"It requires kernel memory allocations to take place during the swap freeing "
"process, and that creates low memory deadlock problems."
msgstr ""

#. (itstool) path: listitem/para
#: article.translate.xml:366
msgid ""
"The problem is further exacerbated by holes created due to the interleaving "
"algorithm."
msgstr ""

#. (itstool) path: listitem/para
#: article.translate.xml:371
msgid ""
"Also, the swap block map can become fragmented fairly easily resulting in "
"non-contiguous allocations."
msgstr ""

#. (itstool) path: listitem/para
#: article.translate.xml:376
msgid ""
"Kernel memory must also be allocated on the fly for additional swap "
"management structures when a swapout occurs."
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:381
msgid ""
"It is evident from that list that there was plenty of room for improvement. "
"For FreeBSD 4.X, I completely rewrote the swap subsystem:"
msgstr ""

#. (itstool) path: listitem/para
#: article.translate.xml:387
msgid ""
"Swap management structures are allocated through a hash table rather than a "
"linear array giving them a fixed allocation size and much finer granularity."
msgstr ""

#. (itstool) path: listitem/para
#: article.translate.xml:393
msgid ""
"Rather then using a linearly linked list to keep track of swap space "
"reservations, it now uses a bitmap of swap blocks arranged in a radix tree "
"structure with free-space hinting in the radix node structures. This "
"effectively makes swap allocation and freeing an O(1) operation."
msgstr ""

#. (itstool) path: listitem/para
#: article.translate.xml:401
msgid ""
"The entire radix tree bitmap is also preallocated in order to avoid having "
"to allocate kernel memory during critical low memory swapping operations. "
"After all, the system tends to swap when it is low on memory so we should "
"avoid allocating kernel memory at such times in order to avoid potential "
"deadlocks."
msgstr ""

#. (itstool) path: listitem/para
#: article.translate.xml:410
msgid ""
"To reduce fragmentation the radix tree is capable of allocating large "
"contiguous chunks at once, skipping over smaller fragmented chunks."
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:416
msgid ""
"I did not take the final step of having an <quote>allocating hint pointer</"
"quote> that would trundle through a portion of swap as allocations were made "
"in order to further guarantee contiguous allocations or at least locality of "
"reference, but I ensured that such an addition could be made."
msgstr ""

#. (itstool) path: sect1/title
#: article.translate.xml:424
msgid "When to free a page"
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:426
msgid ""
"Since the VM system uses all available memory for disk caching, there are "
"usually very few truly-free pages. The VM system depends on being able to "
"properly choose pages which are not in use to reuse for new allocations. "
"Selecting the optimal pages to free is possibly the single-most important "
"function any VM system can perform because if it makes a poor selection, the "
"VM system may be forced to unnecessarily retrieve pages from disk, seriously "
"degrading system performance."
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:434
msgid ""
"How much overhead are we willing to suffer in the critical path to avoid "
"freeing the wrong page? Each wrong choice we make will cost us hundreds of "
"thousands of CPU cycles and a noticeable stall of the affected processes, so "
"we are willing to endure a significant amount of overhead in order to be "
"sure that the right page is chosen. This is why FreeBSD tends to outperform "
"other systems when memory resources become stressed."
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:442
msgid ""
"The free page determination algorithm is built upon a history of the use of "
"memory pages. To acquire this history, the system takes advantage of a page-"
"used bit feature that most hardware page tables have."
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:446
msgid ""
"In any case, the page-used bit is cleared and at some later point the VM "
"system comes across the page again and sees that the page-used bit has been "
"set. This indicates that the page is still being actively used. If the bit "
"is still clear it is an indication that the page is not being actively used. "
"By testing this bit periodically, a use history (in the form of a counter) "
"for the physical page is developed. When the VM system later needs to free "
"up some pages, checking this history becomes the cornerstone of determining "
"the best candidate page to reuse."
msgstr ""

#. (itstool) path: sidebar/title
#: article.translate.xml:456
msgid "What if the hardware has no page-used bit?"
msgstr ""

#. (itstool) path: sidebar/para
#: article.translate.xml:458
msgid ""
"For those platforms that do not have this feature, the system actually "
"emulates a page-used bit. It unmaps or protects a page, forcing a page fault "
"if the page is accessed again. When the page fault is taken, the system "
"simply marks the page as having been used and unprotects the page so that it "
"may be used. While taking such page faults just to determine if a page is "
"being used appears to be an expensive proposition, it is much less expensive "
"than reusing the page for some other purpose only to find that a process "
"needs it back and then have to go to disk."
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:469
msgid ""
"FreeBSD makes use of several page queues to further refine the selection of "
"pages to reuse as well as to determine when dirty pages must be flushed to "
"their backing store. Since page tables are dynamic entities under FreeBSD, "
"it costs virtually nothing to unmap a page from the address space of any "
"processes using it. When a page candidate has been chosen based on the page-"
"use counter, this is precisely what is done. The system must make a "
"distinction between clean pages which can theoretically be freed up at any "
"time, and dirty pages which must first be written to their backing store "
"before being reusable. When a page candidate has been found it is moved to "
"the inactive queue if it is dirty, or the cache queue if it is clean. A "
"separate algorithm based on the dirty-to-clean page ratio determines when "
"dirty pages in the inactive queue must be flushed to disk. Once this is "
"accomplished, the flushed pages are moved from the inactive queue to the "
"cache queue. At this point, pages in the cache queue can still be "
"reactivated by a VM fault at relatively low cost. However, pages in the "
"cache queue are considered to be <quote>immediately freeable</quote> and "
"will be reused in an LRU (least-recently used) fashion when the system needs "
"to allocate new memory."
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:489
msgid ""
"It is important to note that the FreeBSD VM system attempts to separate "
"clean and dirty pages for the express reason of avoiding unnecessary flushes "
"of dirty pages (which eats I/O bandwidth), nor does it move pages between "
"the various page queues gratuitously when the memory subsystem is not being "
"stressed. This is why you will see some systems with very low cache queue "
"counts and high active queue counts when doing a <command>systat -vm</"
"command> command. As the VM system becomes more stressed, it makes a greater "
"effort to maintain the various page queues at the levels determined to be "
"the most effective."
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:499
msgid ""
"An urban myth has circulated for years that Linux did a better job avoiding "
"swapouts than FreeBSD, but this in fact is not true. What was actually "
"occurring was that FreeBSD was proactively paging out unused pages in order "
"to make room for more disk cache while Linux was keeping unused pages in "
"core and leaving less memory available for cache and process pages. I do not "
"know whether this is still true today."
msgstr ""

#. (itstool) path: sect1/title
#: article.translate.xml:509
msgid "Pre-Faulting and Zeroing Optimizations"
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:511
msgid ""
"Taking a VM fault is not expensive if the underlying page is already in core "
"and can simply be mapped into the process, but it can become expensive if "
"you take a whole lot of them on a regular basis. A good example of this is "
"running a program such as <citerefentry><refentrytitle>ls</"
"refentrytitle><manvolnum>1</manvolnum></citerefentry> or "
"<citerefentry><refentrytitle>ps</refentrytitle><manvolnum>1</manvolnum></"
"citerefentry> over and over again. If the program binary is mapped into "
"memory but not mapped into the page table, then all the pages that will be "
"accessed by the program will have to be faulted in every time the program is "
"run. This is unnecessary when the pages in question are already in the VM "
"Cache, so FreeBSD will attempt to pre-populate a process's page tables with "
"those pages that are already in the VM Cache. One thing that FreeBSD does "
"not yet do is pre-copy-on-write certain pages on exec. For example, if you "
"run the <citerefentry><refentrytitle>ls</refentrytitle><manvolnum>1</"
"manvolnum></citerefentry> program while running <command>vmstat 1</command> "
"you will notice that it always takes a certain number of page faults, even "
"when you run it over and over again. These are zero-fill faults, not program "
"code faults (which were pre-faulted in already). Pre-copying pages on exec "
"or fork is an area that could use more study."
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:529
msgid ""
"A large percentage of page faults that occur are zero-fill faults. You can "
"usually see this by observing the <command>vmstat -s</command> output. These "
"occur when a process accesses pages in its BSS area. The BSS area is "
"expected to be initially zero but the VM system does not bother to allocate "
"any memory at all until the process actually accesses it. When a fault "
"occurs the VM system must not only allocate a new page, it must zero it as "
"well. To optimize the zeroing operation the VM system has the ability to pre-"
"zero pages and mark them as such, and to request pre-zeroed pages when zero-"
"fill faults occur. The pre-zeroing occurs whenever the CPU is idle but the "
"number of pages the system pre-zeros is limited in order to avoid blowing "
"away the memory caches. This is an excellent example of adding complexity to "
"the VM system in order to optimize the critical path."
msgstr ""

#. (itstool) path: sect1/title
#: article.translate.xml:545
msgid "Page Table Optimizations"
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:547
msgid ""
"The page table optimizations make up the most contentious part of the "
"FreeBSD VM design and they have shown some strain with the advent of serious "
"use of <function>mmap()</function>. I think this is actually a feature of "
"most BSDs though I am not sure when it was first introduced. There are two "
"major optimizations. The first is that hardware page tables do not contain "
"persistent state but instead can be thrown away at any time with only a "
"minor amount of management overhead. The second is that every active page "
"table entry in the system has a governing <literal>pv_entry</literal> "
"structure which is tied into the <literal>vm_page</literal> structure. "
"FreeBSD can simply iterate through those mappings that are known to exist "
"while Linux must check all page tables that <emphasis>might</emphasis> "
"contain a specific mapping to see if it does, which can achieve O(n^2) "
"overhead in certain situations. It is because of this that FreeBSD tends to "
"make better choices on which pages to reuse or swap when memory is stressed, "
"giving it better performance under load. However, FreeBSD requires kernel "
"tuning to accommodate large-shared-address-space situations such as those "
"that can occur in a news system because it may run out of <literal>pv_entry</"
"literal> structures."
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:567
msgid ""
"Both Linux and FreeBSD need work in this area. FreeBSD is trying to maximize "
"the advantage of a potentially sparse active-mapping model (not all "
"processes need to map all pages of a shared library, for example), whereas "
"Linux is trying to simplify its algorithms. FreeBSD generally has the "
"performance advantage here at the cost of wasting a little extra memory, but "
"FreeBSD breaks down in the case where a large file is massively shared "
"across hundreds of processes. Linux, on the other hand, breaks down in the "
"case where many processes are sparsely-mapping the same shared library and "
"also runs non-optimally when trying to determine whether a page can be "
"reused or not."
msgstr ""

#. (itstool) path: sect1/title
#: article.translate.xml:580
msgid "Page Coloring"
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:582
msgid ""
"We will end with the page coloring optimizations. Page coloring is a "
"performance optimization designed to ensure that accesses to contiguous "
"pages in virtual memory make the best use of the processor cache. In ancient "
"times (i.e. 10+ years ago) processor caches tended to map virtual memory "
"rather than physical memory. This led to a huge number of problems including "
"having to clear the cache on every context switch in some cases, and "
"problems with data aliasing in the cache. Modern processor caches map "
"physical memory precisely to solve those problems. This means that two side-"
"by-side pages in a processes address space may not correspond to two side-by-"
"side pages in the cache. In fact, if you are not careful side-by-side pages "
"in virtual memory could wind up using the same page in the processor "
"cache—leading to cacheable data being thrown away prematurely and reducing "
"CPU performance. This is true even with multi-way set-associative caches "
"(though the effect is mitigated somewhat)."
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:598
msgid ""
"FreeBSD's memory allocation code implements page coloring optimizations, "
"which means that the memory allocation code will attempt to locate free "
"pages that are contiguous from the point of view of the cache. For example, "
"if page 16 of physical memory is assigned to page 0 of a process's virtual "
"memory and the cache can hold 4 pages, the page coloring code will not "
"assign page 20 of physical memory to page 1 of a process's virtual memory. "
"It would, instead, assign page 21 of physical memory. The page coloring code "
"attempts to avoid assigning page 20 because this maps over the same cache "
"memory as page 16 and would result in non-optimal caching. This code adds a "
"significant amount of complexity to the VM memory allocation subsystem as "
"you can well imagine, but the result is well worth the effort. Page Coloring "
"makes VM memory as deterministic as physical memory in regards to cache "
"performance."
msgstr ""

#. (itstool) path: sect1/title
#: article.translate.xml:615
msgid "Conclusion"
msgstr ""

#. (itstool) path: sect1/para
#: article.translate.xml:617
msgid ""
"Virtual memory in modern operating systems must address a number of "
"different issues efficiently and for many different usage patterns. The "
"modular and algorithmic approach that BSD has historically taken allows us "
"to study and understand the current implementation as well as relatively "
"cleanly replace large sections of the code. There have been a number of "
"improvements to the FreeBSD VM system in the last several years, and work is "
"ongoing."
msgstr ""

#. (itstool) path: sect1/title
#: article.translate.xml:627
msgid "Bonus QA session by Allen Briggs <email>briggs@ninthwonder.com</email>"
msgstr ""

#. (itstool) path: question/para
#: article.translate.xml:633
msgid ""
"What is <quote>the interleaving algorithm</quote> that you refer to in your "
"listing of the ills of the FreeBSD 3.X swap arrangements?"
msgstr ""

#. (itstool) path: answer/para
#: article.translate.xml:639
msgid ""
"FreeBSD uses a fixed swap interleave which defaults to 4. This means that "
"FreeBSD reserves space for four swap areas even if you only have one, two, "
"or three. Since swap is interleaved the linear address space representing "
"the <quote>four swap areas</quote> will be fragmented if you do not actually "
"have four swap areas. For example, if you have two swap areas A and B "
"FreeBSD's address space representation for that swap area will be "
"interleaved in blocks of 16 pages:"
msgstr ""

#. (itstool) path: answer/literallayout
#: article.translate.xml:648
#, no-wrap
msgid "A B C D A B C D A B C D A B C D"
msgstr ""

#. (itstool) path: answer/para
#: article.translate.xml:650
msgid ""
"FreeBSD 3.X uses a <quote>sequential list of free regions</quote> approach "
"to accounting for the free swap areas. The idea is that large blocks of free "
"linear space can be represented with a single list node (<filename>kern/"
"subr_rlist.c</filename>). But due to the fragmentation the sequential list "
"winds up being insanely fragmented. In the above example, completely unused "
"swap will have A and B shown as <quote>free</quote> and C and D shown as "
"<quote>all allocated</quote>. Each A-B sequence requires a list node to "
"account for because C and D are holes, so the list node cannot be combined "
"with the next A-B sequence."
msgstr ""

#. (itstool) path: answer/para
#: article.translate.xml:662
msgid ""
"Why do we interleave our swap space instead of just tack swap areas onto the "
"end and do something fancier? Because it is a whole lot easier to allocate "
"linear swaths of an address space and have the result automatically be "
"interleaved across multiple disks than it is to try to put that "
"sophistication elsewhere."
msgstr ""

#. (itstool) path: answer/para
#: article.translate.xml:668
msgid ""
"The fragmentation causes other problems. Being a linear list under 3.X, and "
"having such a huge amount of inherent fragmentation, allocating and freeing "
"swap winds up being an O(N) algorithm instead of an O(1) algorithm. Combined "
"with other factors (heavy swapping) and you start getting into O(N^2) and "
"O(N^3) levels of overhead, which is bad. The 3.X system may also need to "
"allocate KVM during a swap operation to create a new list node which can "
"lead to a deadlock if the system is trying to pageout pages in a low-memory "
"situation."
msgstr ""

#. (itstool) path: answer/para
#: article.translate.xml:678
msgid ""
"Under 4.X we do not use a sequential list. Instead we use a radix tree and "
"bitmaps of swap blocks rather than ranged list nodes. We take the hit of "
"preallocating all the bitmaps required for the entire swap area up front but "
"it winds up wasting less memory due to the use of a bitmap (one bit per "
"block) instead of a linked list of nodes. The use of a radix tree instead of "
"a sequential list gives us nearly O(1) performance no matter how fragmented "
"the tree becomes."
msgstr ""

#. (itstool) path: question/para
#: article.translate.xml:691
msgid ""
"How is the separation of clean and dirty (inactive) pages related to the "
"situation where you see low cache queue counts and high active queue counts "
"in <command>systat -vm</command>? Do the systat stats roll the active and "
"dirty pages together for the active queue count?"
msgstr ""

#. (itstool) path: question/para
#: article.translate.xml:697
msgid "I do not get the following:"
msgstr ""

#. (itstool) path: blockquote/para
#: article.translate.xml:700
msgid ""
"It is important to note that the FreeBSD VM system attempts to separate "
"clean and dirty pages for the express reason of avoiding unnecessary flushes "
"of dirty pages (which eats I/O bandwidth), nor does it move pages between "
"the various page queues gratuitously when the memory subsystem is not being "
"stressed. This is why you will see some systems with very low cache queue "
"counts and high active queue counts when doing a <command>systat -vm</"
"command> command."
msgstr ""

#. (itstool) path: answer/para
#: article.translate.xml:712
msgid ""
"Yes, that is confusing. The relationship is <quote>goal</quote> verses "
"<quote>reality</quote>. Our goal is to separate the pages but the reality is "
"that if we are not in a memory crunch, we do not really have to."
msgstr ""

#. (itstool) path: answer/para
#: article.translate.xml:717
msgid ""
"What this means is that FreeBSD will not try very hard to separate out dirty "
"pages (inactive queue) from clean pages (cache queue) when the system is not "
"being stressed, nor will it try to deactivate pages (active queue -&gt; "
"inactive queue) when the system is not being stressed, even if they are not "
"being used."
msgstr ""

#. (itstool) path: question/para
#: article.translate.xml:727
msgid ""
"In the <citerefentry><refentrytitle>ls</refentrytitle><manvolnum>1</"
"manvolnum></citerefentry> / <command>vmstat 1</command> example, would not "
"some of the page faults be data page faults (COW from executable file to "
"private page)? I.e., I would expect the page faults to be some zero-fill and "
"some program data. Or are you implying that FreeBSD does do pre-COW for the "
"program data?"
msgstr ""

#. (itstool) path: answer/para
#: article.translate.xml:735
msgid ""
"A COW fault can be either zero-fill or program-data. The mechanism is the "
"same either way because the backing program-data is almost certainly already "
"in the cache. I am indeed lumping the two together. FreeBSD does not pre-COW "
"program data or zero-fill, but it <emphasis>does</emphasis> pre-map pages "
"that exist in its cache."
msgstr ""

#. (itstool) path: question/para
#: article.translate.xml:746
msgid ""
"In your section on page table optimizations, can you give a little more "
"detail about <literal>pv_entry</literal> and <literal>vm_page</literal> (or "
"should vm_page be <literal>vm_pmap</literal>—as in 4.4, cf. pp. 180-181 of "
"McKusick, Bostic, Karel, Quarterman)? Specifically, what kind of operation/"
"reaction would require scanning the mappings?"
msgstr ""

#. (itstool) path: question/para
#: article.translate.xml:753
msgid ""
"How does Linux do in the case where FreeBSD breaks down (sharing a large "
"file mapping over many processes)?"
msgstr ""

#. (itstool) path: answer/para
#: article.translate.xml:758
msgid ""
"A <literal>vm_page</literal> represents an (object,index#) tuple. A "
"<literal>pv_entry</literal> represents a hardware page table entry (pte). If "
"you have five processes sharing the same physical page, and three of those "
"processes's page tables actually map the page, that page will be represented "
"by a single <literal>vm_page</literal> structure and three "
"<literal>pv_entry</literal> structures."
msgstr ""

#. (itstool) path: answer/para
#: article.translate.xml:766
msgid ""
"<literal>pv_entry</literal> structures only represent pages mapped by the "
"MMU (one <literal>pv_entry</literal> represents one pte). This means that "
"when we need to remove all hardware references to a <literal>vm_page</"
"literal> (in order to reuse the page for something else, page it out, clear "
"it, dirty it, and so forth) we can simply scan the linked list of "
"<literal>pv_entry</literal>'s associated with that <literal>vm_page</"
"literal> to remove or modify the pte's from their page tables."
msgstr ""

#. (itstool) path: answer/para
#: article.translate.xml:776
msgid ""
"Under Linux there is no such linked list. In order to remove all the "
"hardware page table mappings for a <literal>vm_page</literal> linux must "
"index into every VM object that <emphasis>might</emphasis> have mapped the "
"page. For example, if you have 50 processes all mapping the same shared "
"library and want to get rid of page X in that library, you need to index "
"into the page table for each of those 50 processes even if only 10 of them "
"have actually mapped the page. So Linux is trading off the simplicity of its "
"design against performance. Many VM algorithms which are O(1) or (small N) "
"under FreeBSD wind up being O(N), O(N^2), or worse under Linux. Since the "
"pte's representing a particular page in an object tend to be at the same "
"offset in all the page tables they are mapped in, reducing the number of "
"accesses into the page tables at the same pte offset will often avoid "
"blowing away the L1 cache line for that offset, which can lead to better "
"performance."
msgstr ""

#. (itstool) path: answer/para
#: article.translate.xml:793
msgid ""
"FreeBSD has added complexity (the <literal>pv_entry</literal> scheme) in "
"order to increase performance (to limit page table accesses to "
"<emphasis>only</emphasis> those pte's that need to be modified)."
msgstr ""

#. (itstool) path: answer/para
#: article.translate.xml:798
msgid ""
"But FreeBSD has a scaling problem that Linux does not in that there are a "
"limited number of <literal>pv_entry</literal> structures and this causes "
"problems when you have massive sharing of data. In this case you may run out "
"of <literal>pv_entry</literal> structures even though there is plenty of "
"free memory available. This can be fixed easily enough by bumping up the "
"number of <literal>pv_entry</literal> structures in the kernel config, but "
"we really need to find a better way to do it."
msgstr ""

#. (itstool) path: answer/para
#: article.translate.xml:808
msgid ""
"In regards to the memory overhead of a page table verses the "
"<literal>pv_entry</literal> scheme: Linux uses <quote>permanent</quote> page "
"tables that are not throw away, but does not need a <literal>pv_entry</"
"literal> for each potentially mapped pte. FreeBSD uses <quote>throw away</"
"quote> page tables but adds in a <literal>pv_entry</literal> structure for "
"each actually-mapped pte. I think memory utilization winds up being about "
"the same, giving FreeBSD an algorithmic advantage with its ability to throw "
"away page tables at will with very low overhead."
msgstr ""

#. (itstool) path: question/para
#: article.translate.xml:823
msgid ""
"Finally, in the page coloring section, it might help to have a little more "
"description of what you mean here. I did not quite follow it."
msgstr ""

#. (itstool) path: answer/para
#: article.translate.xml:829
msgid ""
"Do you know how an L1 hardware memory cache works? I will explain: Consider "
"a machine with 16MB of main memory but only 128K of L1 cache. Generally the "
"way this cache works is that each 128K block of main memory uses the "
"<emphasis>same</emphasis> 128K of cache. If you access offset 0 in main "
"memory and then offset 128K in main memory you can wind up throwing away the "
"cached data you read from offset 0!"
msgstr ""

#. (itstool) path: answer/para
#: article.translate.xml:837
msgid ""
"Now, I am simplifying things greatly. What I just described is what is "
"called a <quote>direct mapped</quote> hardware memory cache. Most modern "
"caches are what are called 2-way-set-associative or 4-way-set-associative "
"caches. The set-associatively allows you to access up to N different memory "
"regions that overlap the same cache memory without destroying the previously "
"cached data. But only N."
msgstr ""

#. (itstool) path: answer/para
#: article.translate.xml:845
msgid ""
"So if I have a 4-way set associative cache I can access offset 0, offset "
"128K, 256K and offset 384K and still be able to access offset 0 again and "
"have it come from the L1 cache. If I then access offset 512K, however, one "
"of the four previously cached data objects will be thrown away by the cache."
msgstr ""

#. (itstool) path: answer/para
#: article.translate.xml:851
msgid ""
"It is extremely important… <emphasis>extremely</emphasis> important for most "
"of a processor's memory accesses to be able to come from the L1 cache, "
"because the L1 cache operates at the processor frequency. The moment you "
"have an L1 cache miss and have to go to the L2 cache or to main memory, the "
"processor will stall and potentially sit twiddling its fingers for "
"<emphasis>hundreds</emphasis> of instructions worth of time waiting for a "
"read from main memory to complete. Main memory (the dynamic ram you stuff "
"into a computer) is <emphasis>slow</emphasis>, when compared to the speed of "
"a modern processor core."
msgstr ""

#. (itstool) path: answer/para
#: article.translate.xml:863
msgid ""
"Ok, so now onto page coloring: All modern memory caches are what are known "
"as <emphasis>physical</emphasis> caches. They cache physical memory "
"addresses, not virtual memory addresses. This allows the cache to be left "
"alone across a process context switch, which is very important."
msgstr ""

#. (itstool) path: answer/para
#: article.translate.xml:869
msgid ""
"But in the <trademark class=\"registered\">UNIX</trademark> world you are "
"dealing with virtual address spaces, not physical address spaces. Any "
"program you write will see the virtual address space given to it. The actual "
"<emphasis>physical</emphasis> pages underlying that virtual address space "
"are not necessarily physically contiguous! In fact, you might have two pages "
"that are side by side in a processes address space which wind up being at "
"offset 0 and offset 128K in <emphasis>physical</emphasis> memory."
msgstr ""

#. (itstool) path: answer/para
#: article.translate.xml:878
msgid ""
"A program normally assumes that two side-by-side pages will be optimally "
"cached. That is, that you can access data objects in both pages without "
"having them blow away each other's cache entry. But this is only true if the "
"physical pages underlying the virtual address space are contiguous (insofar "
"as the cache is concerned)."
msgstr ""

#. (itstool) path: answer/para
#: article.translate.xml:885
msgid ""
"This is what Page coloring does. Instead of assigning <emphasis>random</"
"emphasis> physical pages to virtual addresses, which may result in non-"
"optimal cache performance, Page coloring assigns <emphasis>reasonably-"
"contiguous</emphasis> physical pages to virtual addresses. Thus programs can "
"be written under the assumption that the characteristics of the underlying "
"hardware cache are the same for their virtual address space as they would be "
"if the program had been run directly in a physical address space."
msgstr ""

#. (itstool) path: answer/para
#: article.translate.xml:895
msgid ""
"Note that I say <quote>reasonably</quote> contiguous rather than simply "
"<quote>contiguous</quote>. From the point of view of a 128K direct mapped "
"cache, the physical address 0 is the same as the physical address 128K. So "
"two side-by-side pages in your virtual address space may wind up being "
"offset 128K and offset 132K in physical memory, but could also easily be "
"offset 128K and offset 4K in physical memory and still retain the same cache "
"performance characteristics. So page-coloring does <emphasis>not</emphasis> "
"have to assign truly contiguous pages of physical memory to contiguous pages "
"of virtual memory, it just needs to make sure it assigns contiguous pages "
"from the point of view of cache performance and operation."
msgstr ""