From: Ashish Vulimiri [vulimir1@illinois.edu]
Sent: Thursday, April 15, 2010 12:39 PM
To: Gupta, Indranil
Subject: 525 review 04/15

Characterizing Flash Memory: Anomalies, Observations, and Applications, L. Grupp et al, MICRO 2009

In this paper, the authors show how to identify several characteristics of flash storage devices using black-box testing, and demonstrate differences between the claimed (published) and actual specifications of devices from five different manufacturers. They then use these observed characteristics to optimise for specific requirements, such as improvements in latency, power consumption or reliability. The two specific characteristics they use for these optimisations are:

1. Multi-level cell devices have predictable, periodic variations in write latency -- used to optimise for latency (with the incidental benefit of lower power consumption)
2. Write performance actually begins to improve (again, in a predictable fashion) as the devices wear out -- the predictability is used to optimise for reliability

In addition, they present detailed characterisations of the power usage and reliability of flash devices.

Comments:
* This is an important contribution. While the situation they describe (with the hardware manufacturers being unwilling to release detailed specifications) is rather unfortunate, the black-box testing approach would be useful even if this were not the case, since there would need to be some way to verify the manufacturers' claims.
* They don't directly use their characterisation of power usage in their optimisations. This characterisation would probably be more useful at a higher level of system analysis, when looking at the combined energy usage of CPU, storage and the network.
From: Fatemeh Saremi [samaneh.saremi@gmail.com]
Sent: Thursday, April 15, 2010 12:17 PM
To: Gupta, Indranil
Subject: 525 review 04/15

Paper 1: Characterizing Flash Memory: Anomalies, Observations, and Applications

In this paper, the authors empirically characterize the anomalies and behavior of flash memories in terms of performance, power consumption, and reliability. They investigate different usage patterns that affect these metrics and quantify the results. They show that the actual performance of flash memories is highly variable and significantly worse than what is reported in publicly available datasheets. They measure flash devices from five different manufacturers and present their performance variability. They identify two sources of variation in the program time of the devices: first, the wide variation in program speed and energy consumption between the fast and slow pages in MLC devices, and second, the change in SLC program latency as the chips age. They develop a flash translation layer (FTL) called Mango that exploits the first variation to improve performance and/or power efficiency. They also propose using a write-once-memory (WOM) coding scheme to improve flash longevity while providing reliability.

The work is interesting and valuable, as it gives system designers a good picture of how exactly the performance of flash memory is affected by different factors, e.g. usage patterns.

The variation-aware FTL does not preserve the baseline FTL's property of evenly distributing erase and program operations across the flash memory, and would therefore result in non-uniform wear-out. During some periods, the variation-aware FTL needs to perform garbage collection more frequently than usual, which results in higher latency. The amortized cost might be negligible, but what about the worst-case time?
The results presented cover only the average case and leave the worst case unclear, which might be significant.

Mango requires priorities to be assigned to incoming I/O requests. The question is how the priorities are assigned and which write requests get higher priority. That complicates the story: there would be critical decision points in order to avoid worsening the situation and getting poor performance for some of the high-priority requests.

The results of the variation-aware FTL for SLC devices are not mentioned. Is there any improvement in that case as well? Do SLC devices see performance improvements similar to those of MLC devices?

The lifetime improvement for some of the devices using the WOM coding scheme is very interesting and valuable. However, the redundancy is very significant (about 33%).

Paper 2: Extending SSD Lifetimes with Disk-Based Write Caches

This paper presents a hybrid storage system that uses a hard disk drive (HDD) as a write cache for flash-based solid state devices (SSDs). The authors exploit the performance of reads on SSDs and writes on HDDs to improve the overall performance of the storage system. They hide the inefficiency of the SSD for (non-sequential) writes by using the HDD as a log-structured cache. As a result, the SSD sees a stream of sequential writes that are written to it in batches, which improves performance. Writes to the system do not incur the latency of an SSD write, and the number of writes imposed on the SSD decreases (since writes are performed in batches instead of one by one), which in turn increases the longevity of the SSD (since SSD lifetime is largely determined by the number of write operations). They investigate different traces to find the spatial and temporal locality of access patterns, in order to exploit the design idea effectively and decide what data to cache and for how long.
The two metrics on which they evaluate their system (Write Savings and Read Penalty) appropriately capture the efficiency of the approach.

The idea, while somewhat contrary to traditional designs, is interesting and results in improved performance. It would be very helpful to compare the results with the two extreme cases of using only SSDs or only HDDs. A performance and cost analysis of these two approaches versus the proposed system would show how much this idea helps to bring the widespread replacement of disk drives by SSDs within the realm of possibility.

However, as the SSD is the main storage and the HDD plays only the role of a cache, failure of the cache here would be very costly compared to traditional storage systems, in which a cache failure does not significantly affect the reliability of the stored data (from the user's point of view). It would be valuable to present results on the cost of failures, since the design decisions needed to achieve the performance improvement must be made very carefully and might otherwise decrease the overall reliability of the storage system.

From: Vivek [vivek112@gmail.com]
Sent: Thursday, April 15, 2010 11:57 AM
To: indy@cs.uiuc.edu
Subject: 525 review 4/15

Characterizing Flash Memory: Anomalies, Observations, and Applications

Core Idea: Many FLASH memory manufacturers provide very conservative estimates of the performance of their products. This paper characterizes FLASH memory devices in terms of power, performance, and reliability. The authors develop an experimental framework that can be used to effectively characterize 5 different devices. They also show two examples of how to apply the insights gained through their experimentation: 1. improving the flash translation layer and 2. using an alternative data encoding scheme.

Pros:
- The improvements provided by their two techniques are very promising.
As they suggest, they can design an improved flash translation layer that lowers energy consumption by up to 13%. Moreover, this scheme also reduces latency for critical program operations by up to 44%.
- The alternative data encoding scheme extends device lifetime by up to 5.2 times.
- Their quantification of device characteristics is in itself valuable for future experimental studies and for setting standards on what information FLASH device data sheets should include.
- The experimental framework in section 3 using the Xilinx XUP board is explained in enough detail that the controls and dependent variables are clear.

Cons:
- This could be classified as an engineering study in some sense, because of the focus on characterization. Admittedly they suggest two improved schemes based on what they learned from the experiments, but how could these schemes be used in the real world?
- The paper goes through a lot of detail on experimentation and tests many different devices. The results and conclusions seem clear. However, the analysis (what comes in between results and conclusions) is somewhat less clear.
- The paper could be improved by explaining the broader impact and usage of these devices in more practical settings. While the devices may be unreliable for, say, laptops, are they useful for large-scale systems?

From: Giang Nguyen [nguyen59@illinois.edu]
Sent: Thursday, April 15, 2010 11:54 AM
To: Gupta, Indranil
Subject: 525 review 04/15

CS525 review
nguyen59
04/15

Characterizing Flash Memory: Anomalies, Observations, and Applications

Flash memory has power and performance advantages compared to DRAM. However, the technology has its idiosyncrasies. This paper studies those empirically and suggests application techniques that exploit them.

A flash chip contains one or more planes or banks that are largely independent of one another.
Each plane contains a set of blocks, each made up of 64 (SLC -- single-level cell) or 128 (MLC -- multi-level cell) pages. Each page contains between 2112 and 8448 bytes. NAND flash devices support three primary operations: erase, program, and read. Erase operates on entire blocks and sets all the bits to 1. Program operations write entire pages and can only change 1s to 0s, so an erase operation (of the entire block) is required to arbitrarily modify a page's contents. Read operations read an entire page in parallel.

The basic operation performance experiments show that MLC devices exhibit regular and predictable variation in program latency between pages within a block. A second surprise is that performance increases (50% for SLC, 10-15% for MLC) as the devices wear out.

In the basic operation power consumption experiments for MLC devices, the pages that are the fastest to program also consume dramatically less energy per operation. Also, SLC devices are much more energy efficient than MLC devices.

In the reliability experiments, MLC devices show high error rates from the beginning, while SLC devices show near-zero errors until reaching their rated lifetime and maintain reasonably low rates for up to 6 times their rated lifetime.

The paper describes Mango, a Flash Translation Layer (FTL) that supports priorities for incoming I/O requests. High-priority write requests are assigned to the fast pages, which are also more energy efficient. The possible applications are swapping and netbook scenarios.

Pros:
- Average 50% performance improvement for swapping.
- Average 3% performance improvement for netbook.

Cons:
- Average 3% increase in wear across all pages for swapping.
- Average 55% increase in wear across all pages for netbook.
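The priority-to-fast-page assignment described in this review can be sketched roughly as follows. This is an illustrative toy, not the paper's implementation: the `WritePoint` class, its fields, and especially the rule that even-numbered pages are the fast ones are assumptions made for the example (the real fast/slow layout is device-specific).

```python
from dataclasses import dataclass


@dataclass
class WritePoint:
    """Hypothetical write point for one MLC block (not the paper's code)."""
    pages_per_block: int = 128   # MLC block size quoted in the review
    next_page: int = 0           # next unprogrammed page in the block
    skipped: int = 0             # slow pages left unprogrammed

    @staticmethod
    def is_fast(page: int) -> bool:
        # ASSUMPTION: even-numbered pages are fast. Real devices differ.
        return page % 2 == 0

    def allocate(self, high_priority: bool):
        """Return the page index for the next write, or None if the block is full."""
        page = self.next_page
        if high_priority:
            # Skip forward past slow pages so the write lands on a fast page.
            while page < self.pages_per_block and not self.is_fast(page):
                page += 1
        self.skipped += page - self.next_page
        if page >= self.pages_per_block:
            self.next_page = self.pages_per_block
            return None
        self.next_page = page + 1
        return page


wp = WritePoint(pages_per_block=8)
order = [wp.allocate(p) for p in (False, True, False, True, True, True)]
print(order, "skipped slow pages:", wp.skipped)
```

Every skipped slow page is capacity that is erased without ever holding data, which is one way to read the wear increases listed under Cons.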
From: Rini Kaushik [rinikaushik@yahoo.com]
Sent: Thursday, April 15, 2010 11:50 AM
To: indy@cs.uiuc.edu
Subject: 525 review 04/15

Rini Kaushik
CS525

DFS: A File System for Virtualized Flash Storage

Pros:
1) Flash memory is becoming increasingly prevalent, and flash has several interesting properties such as low power, high IOPS and reliability. Existing systems use flash either via standard HDD-based file systems such as ext3 or via special file systems such as YAFFS and JFFS. YAFFS and similar file systems assume raw flash and hence need to change with the hardware. In addition, they are unable to use some of the sophisticated software that comes with flash these days to handle wear-leveling, block erasure and mapping. Hence, they need to implement this functionality themselves. On the other hand, HDD-based file systems use two levels of indirection to interface with the flash (FTL and block storage interface). The authors instead propose a file system, DFS, which interfaces with a virtual flash storage layer. This reduces the indirection between file system and flash by one level.
2) DFS doesn't support buffer caching. Buffer caching code is quite complicated and not needed for flash memories. Removing this code leads to simpler and more manageable code.
3) The virtual flash storage layer can evolve with the hardware and can potentially provide a much richer interface than the block-based interface. The file system won't need to make deep assumptions about the raw flash and hence won't need to change drastically as the hardware changes.

Cons/discussion:
1) It would be more interesting to see a performance comparison with some of the existing flash file systems such as JFFS, instead of ext3, which is meant for HDDs.
2) It is not clear how the virtual layer described in the paper is better than the ideas used in the Flash Translation Layer (FTL). The FTL also does wear-leveling, hides bulk erasure latencies etc.
The drawbacks of existing file systems for flash, such as YAFFS, are not clearly presented in the related work. Why do the authors say that there is a drawback in directly accessing the NAND flash chips? Won't that result in a thin software stack, which is better in performance and maintainability than a thick software stack?
3) The motivation for the work and the evaluations/comparisons with ext3 don't make a lot of sense to me. Ext3 is a much more complex system with extra checks for reliability etc. Hence, comparing the CPU utilization and lines of code and saying that DFS is better may be a bit misleading. It is quite possible that DFS may end up needing logic similar to ext3's in the future. A fairer comparison would be with a revamped version of ext3 in which functionality that does not exist in DFS is commented out.
4) It is not clear from the paper who will write the virtual layer.
5) Won't databases and other applications that were written with an HDD in mind need to be rewritten to use the virtual interface? That would make the adoption of DFS very unlikely.
6) The virtual flash storage layer exposes 64-bit block addresses. Traditional file systems/databases may not support 64-bit block addresses. Hence, they may need to be changed to use the virtual layer. This will impede the acceptance of the virtual flash layer.
7) I went through the FTL description in the related work and on wiki and am not able to find many differences between the virtual layer and the FTL. Isn't it possible to layer a file system on top of the FTL?
8) It is not clear to me why YAFFS, which is currently meant for the embedded world, could not be tweaked and scaled up to serve high performance computing workloads.

From: Jayanta Mukherjee [mukherj4@illinois.edu]
Sent: Thursday, April 15, 2010 11:24 AM
To: Gupta, Indranil
Subject: 525 Review 04/15

Flash
Jayanta Mukherjee
NetID: mukherj4

Characterizing Flash Memory: Anomalies, Observations, and Applications, L.
Grupp et al,

In this paper, the authors try to find ways to overcome the idiosyncrasies of flash memory while exploiting its useful characteristics. They empirically characterize flash memory technology from five manufacturers by directly measuring performance, power, and reliability. The authors have developed a flash translation layer (FTL) called Mango that exploits the first variation (the difference between fast and slow pages) to improve performance and/or power efficiency. They implemented Mango in a flash memory simulator to determine the latency of each operation (including garbage collection time if the operation needed to wait for it) and the overall energy consumption for the trace. For the WOM experiments, they programmed the unencoded data into the non-WOM-safe pages and first-generation WOM-encoded data into the WOM-safe pages.

Pros:
1. In this paper, they compare the performance variation among different vendors.
2. They identify some unexpected device characteristics and provide ways to improve responsiveness and energy consumption. They designed an improved flash translation layer (FTL) that can reduce flash energy consumption by up to 13% during battery-powered operation and reduce latency for critical program operations by up to 44%. They demonstrated how an alternative data encoding scheme effectively increases flash device lifetime by up to 5.2 times.
3. The paper contains some details about the fundamental design considerations of flash devices.
4. WOM codes allow the chip to expend less energy to program a given amount of data. They measure device lifetime as the amount of logical data written to the device before it begins to experience the fatal error rate.
5. The FTL maintains a map between logical block addresses (LBAs) and physical flash addresses (PFAs). The FTL maintains a "write point" at which all program operations occur.
6. Mango has certain advantages: Mango exploits the variation in program time between the fast and slow pages by skipping slow pages for improved performance or power/energy efficiency for some operations. Mango adds a priority to incoming I/O requests. For high-priority writes, the FTL will do its best to use fast pages. The FTL also provides a fast garbage collection mode that uses fast pages for garbage-collecting write operations as well. Mango uses the next fast page at the current write point for MLC devices.

Cons:
1. The authors intentionally withheld the vendor names of all 5 flash systems they tested. This anonymity makes it impossible to decide which of these memories to use for any other purpose.
2. In Figure 8, it is not clear why it is called Program Disturb, why A is not present, and why, for every device other than E-MLC8, the bit-error rate is almost zero while the number of reprograms is below 700. Similarly, the plots in Figure 11 are hard to make sense of, and not enough explanation is given for why the characteristics look the way they do. Why does A show a step-function-like response while E does not? It is not clear. How are the plots in Figure 9 useful for comparison, given that they only show the fluctuations without quantifying them to make them more meaningful?

Comments:
This paper provides a lot of data and graphs, but it is really difficult to relate them to any physical hardware because the devices are anonymous. This anonymity makes it harder to use the data for any design decision. Still, it is good to know these characteristics as those of a generic flash device.

DFS: A File System for Virtualized Flash Storage, W. Josephson et al,

The authors present the design, implementation and evaluation of the Direct File System (DFS) for virtualized flash storage. They designed the layers of abstraction for directly accessing flash memory devices.
They have implemented DFS for FusionIO's virtualized flash storage layer and evaluated it with a suite of benchmarks.

DFS has two main novel features:
1. It lays out its files directly in a very large virtual storage address space provided by FusionIO's virtual flash storage layer.
2. It leverages the virtual flash storage layer to perform block allocations and atomic updates.

There are three main aspects of the approach they took in designing DFS:
1. New layers of abstraction for flash memory storage systems.
2. A virtualized flash storage layer, which provides a very large address space and implements dynamic mapping.
3. The design of DFS, which takes full advantage of the virtualized flash storage layer.

Pros:
1. DFS performs better and is much simpler than a traditional Unix file system with similar functionality.
2. As mentioned by the authors, DFS performs better than ext3 while consuming less CPU.
3. DFS is designed to take advantage of the virtualized flash storage layer for simplicity and performance.
4. DFS avoids the complexity of traditional file systems, which adopt complex storage block allocation strategies, sophisticated buffer cache designs, and methods to make the file system crash-recoverable.
5. Using DFS is more advantageous than using ext3.
6. DFS provides client software with the flexibility to directly access flash memory in a single-level store fashion across multiple flash memory devices.
7. It hides the details of the mapping from virtual to physical flash memory pages.
8. The flat virtual block-addressed space provides clients with a backward-compatible block storage interface.

Cons:
1. They currently implement support for atomic multi-block updates in the virtualized flash storage layer. The log-structured, copy-on-write nature of the flash storage layer makes it possible to export such an interface efficiently.
2. DFS does not yet support snapshots.
Although the authors try to justify this as a design decision, it is a shortcoming of the system.
3. There is a limit on the number of files and the maximum file size that DFS can handle.

Comments:
It is a fairly well-written paper with a lot of explanation and detail. Also, identifying and stating the limitations of the system gives an opportunity for further development.

-With regards,
Jayanta Mukherjee
Department of Computer Science
University of Illinois, Urbana-Champaign
Urbana-61801, IL, USA
Mobile:+1-217-778-6650

From: liangliang.cao@gmail.com on behalf of Liangliang Cao [cao4@illinois.edu]
Sent: Thursday, April 15, 2010 10:57 AM
To: Gupta, Indranil
Subject: 525 review 04/15

CS525 reviewed on Flash
Liangliang Cao (cao4@illinois.edu)
April 15, 2010

Paper 1: Characterizing Flash Memory: Anomalies, Observations, and Applications, L. Grupp et al, MICRO 2009.

This paper empirically analyzes the performance, power, and reliability of flash devices from five manufacturers. There are mainly two kinds of flash: SLC and MLC. SLC enjoys a large efficiency advantage over MLC in power consumption and has a lower error rate. The paper also finds the interesting phenomenon that performance increases as the device wears out. The most important results show that performance varies significantly between different flash devices. Based on these observations, the paper builds two applications: a new FTL with lower energy consumption and a flash-aware data encoding method that leads to a longer device life.

Pros:
• This paper builds the UCSD flash test bed and finds a lot of insightful properties of flash devices.
• The performance of the newly designed FTL and encoding scheme is inspiring.

Cons
• There are three factors which affect the reliability of flash: (1) wear-out, (2) program disturb and (3) read disturb. The second and third factors are not addressed well in the current FTL scheme.
• It is not clear whether performance varies among flash devices of the same type from the same manufacturer. It would be interesting to check whether the performance of flash is stable.

Paper 2: DFS: A File System for Virtualized Flash Storage, W. Josephson et al, FAST 2010

This is a seminal paper on building a file system on flash instead of a traditional disk. Instead of using the traditional layers of abstraction, the newly designed DFS (Direct File System) employs a virtualized flash storage layer, which provides a large virtual block-addressed space. The new file system is simpler and also performs better than the Unix ext3 file system.

Pros:
• The new DFS is much simpler, at about one eighth the code size of ext3 with similar functionality.
• Since the virtualized flash storage layer provides backward compatibility with the traditional block storage interface, the new DFS may be consistent with old file systems while having the potential for better performance.
• Compared with ext3, DFS is more energy-efficient and has faster I/O. DFS is consistently better than ext3 in both direct access and buffered access.
• DFS requires less effort in designing a complex cache.

Cons
• The capacity of current flash storage might not be as good as that of traditional disks, which might be a concern for making it popular. Also, it is more and more important to design cluster-based or distributed storage systems. It is not clear to me whether the cost of a flash-based system is competitive. Since flash has an innate advantage in energy and I/O speed, I am sure that there are a lot of directions worth exploring.
• This paper pays little attention to the reliability and security of the new system, which might be important for modern file systems.
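The idea of a virtualized flash storage layer that offers a large, sparse block-addressed space on top of a log-structured, copy-on-write store can be sketched very roughly as below. This is only an illustrative toy under stated assumptions: the `VirtualFlashLayer` class and its methods are hypothetical names invented here, not FusionIO's interface, and garbage collection, wear-leveling and persistence are all omitted.

```python
class VirtualFlashLayer:
    """Toy model: sparse virtual block addresses mapped onto an append-only log."""

    def __init__(self):
        self.log = []       # append-only "physical" log; list index = physical address
        self.mapping = {}   # sparse map: virtual block address -> physical address

    def write(self, vaddr, data):
        # Never overwrite in place: append the new version and remap (copy-on-write).
        self.log.append(data)
        self.mapping[vaddr] = len(self.log) - 1
        return self.mapping[vaddr]

    def read(self, vaddr):
        # Unmapped virtual blocks simply read as None in this sketch.
        paddr = self.mapping.get(vaddr)
        return None if paddr is None else self.log[paddr]

    def live_fraction(self):
        # Fraction of log entries still referenced; the rest is garbage to reclaim.
        if not self.log:
            return 1.0
        return len(set(self.mapping.values())) / len(self.log)


layer = VirtualFlashLayer()
layer.write(2**40, b"inode")      # a huge virtual address costs nothing extra
layer.write(7, b"data-v1")
layer.write(7, b"data-v2")        # the old version stays in the log as garbage
print(layer.read(7), layer.live_fraction())
```

Because only mapped blocks consume space, a file system on top can place files far apart in the virtual address space and leave allocation to the layer below.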
From: Kurchi Subhra Hazra [hazra1@illinois.edu]
Sent: Thursday, April 15, 2010 10:13 AM
To: Gupta, Indranil
Subject: 525 review 04/15

Characterizing Flash Memory: Anomalies, Observations, and Applications
-----------------------------------------------------------------------------------------

Summary
------------
The paper discusses flash memories, their disadvantages, and some methods and applications that can help overcome these disadvantages. Although flash memories offer huge performance gains and power savings while being denser than disks, they are less durable, offer low reliability, and several of their operations have different latencies. Manufacturers provide figures for performance and power demands, but the authors are not sure of the usability of such data, since these figures tend to be vague and conservative. They therefore evaluate eleven flash devices. The key results obtained are:

i) Read latencies of the devices are close to what manufacturers advertise.
ii) Multi-level cell (MLC) devices, where each gate can store more than one bit, exhibit a regular and predictable variation in program latency between pages within a block. This occurs when the bits in one MLC cell are assigned to separate pages. The pages that are faster to program also consume dramatically less energy per operation. The authors term these fast pages and the others slow pages.
iii) Performance improves predictably as the devices begin to wear out. It is this characteristic which also causes flash devices to wear out easily.
iv) In terms of reliability, MLC devices show significant error rates from the very beginning. SLC devices, on the other hand, show almost zero errors until they reach their rated lifetime and maintain reasonably low rates for up to six times their lifetime. In addition, different pages in MLC devices have different reliability.

The authors then propose two applications that exploit the above-mentioned characteristics.
They first propose a variation-aware Flash Translation Layer (FTL) called Mango, which uses fast pages for higher-priority writes. Secondly, they propose using a write-once-memory (WOM) coding scheme that allows data to be written to a block twice before erasing it. Hence, erase energy is spent less frequently than write energy. Besides, this is expected to increase the device lifetime. They also evaluate these two approaches.

Pros
-----
-- The authors come up with useful statistics and interesting behavior patterns of flash devices.
-- Flash devices are the future of the storage industry. This research is hence very relevant and required.

Cons
------
-- It is not clear if the flash devices evaluated here can be considered representative of all flash devices.
-- Some of the information that the authors come up with is well-known already.
-- Although the authors propose two approaches to exploit the observed characteristics of flash devices, the evaluation of these approaches does not show any convincing results.
-- The paper ends somewhat abruptly, without many evaluation results from the usage of the WOM coding scheme. It seems that the authors deliberately evade some discouraging results in this section.

Thanks,
Kurchi Subhra Hazra

From: Virajith Jalaparti [jalapar1@illinois.edu]
Sent: Thursday, April 15, 2010 9:56 AM
To: Gupta, Indranil
Subject: 525 Review 04/15

Review of "DFS: A File System for Virtualized Flash Storage"

The paper presents DFS, a file system customized for flash memory. It uses a virtualized abstraction layer, leveraging that layer's virtual-to-physical block mapping; it eliminates the need for a buffer cache to access files; and it uses an atomic update mechanism for crash recovery. The virtualization layer allows file systems to take advantage of the characteristics of a flash device while providing a block storage interface for backward compatibility.
It provides a large virtual address space while hiding the details of the virtual-to-physical block mapping and the latencies of erasing large blocks of data. It takes care of several other functions such as wear-leveling and recovery of bad pages. The ioDrives used by DFS are arrays of flash memory cards which leverage hardware support to maintain checksums for error detection. Further, the metadata is stored in the form of physical addresses rather than virtual ones in order to speed up lookups. DFS uses inodes similar to those of traditional Unix file systems and partitions the address space into contiguous chunks whose size is configurable at file system initialization. It divides files into two types, small and large, in order to prevent heavy fragmentation of the virtual address space. DFS can perform read/write/erase operations without concerning itself with physical address details, since they are taken care of by the virtualization layer. DFS further supports the use of the buffer cache and also allows applications to bypass it in the interest of performance. The paper provides several experimental results which compare DFS with traditional file systems like ext2/3 adapted to flash devices. These include micro-benchmarks as well as an analysis of the I/O throughput achieved by standard benchmarks.

Comments:
- The paper takes a new stand on the abstractions of flash file systems: it combines the functions of mapping, wear-leveling and reliability into a single virtualized flash storage layer, as opposed to the earlier design in which they were decoupled and required an additional translation layer for compatibility with existing file systems. Thus, it decreases the complexity of the file system stack and allows applications to get direct access to the data.
- However, the paper does not provide sufficient arguments to support some of its design decisions, such as the single large virtual address space, allocation chunks etc.
While some concepts like inodes have been adopted by DFS from traditional Linux file systems, the paper does not explicitly explore the various possibilities for designing a new file system.
- The paper presents several experimental results which show that DFS performs better than ext3, but it does not explore the reasons for this speed-up/efficiency. It would be interesting to see which design decision has led to how much of the performance increase.
- The design provides a fixed block size for the file system and does not allow it to be chosen dynamically by the application, which would optimize performance depending on the type of usage expected of a file. It is possible that files that are read/written in smaller portions would be accessed more efficiently with smaller block sizes.

--
Virajith Jalaparti
PhD Student, Computer Science
University of Illinois at Urbana-Champaign
Web: http://www.cs.illinois.edu/homes/jalapar1/

From: pooja.agarwal.mit@gmail.com on behalf of pooja agarwal [pagarwl@illinois.edu]
Sent: Thursday, April 15, 2010 9:34 AM
To: Indranil Gupta
Subject: 525 review 04/15

DS REVIEW 04/15
By: Pooja Agarwal

Paper – Characterizing Flash Memory: Anomalies, Observations, and Applications
Conference – MICRO 2009

Main Idea: This paper presents insight into the variation in performance of flash memories available from five different vendors. The metrics evaluated include the latency incurred in read, write and erase operations, the power consumption of each of these operations, and the reliability of the memory over time. The authors have done extensive measurements on different varieties of flash drives, including SLC and MLC drives. It is shown that MLC drives show higher variability in latency, power consumption and reliability than SLC drives. The difference between the read, write and erase latencies is also quantified in the paper.
Using the observations from the evaluation, the authors propose a new FTL (flash translation layer) called Mango and the use of WOM (write-once memory) coding to decrease latency and power requirements. Mango essentially allows writing into faster pages and erasing only faster pages when using fast garbage collection. This decreases the latency of each operation. In WOM, to reduce the power requirements, minimal changes are made to a memory cell. Pros: 1) The key contribution of the paper is the extensive evaluation of flash memories, which can provide deeper insight into the performance and problems of flash devices. Cons: 1) The proposed new FTL, Mango, is not a practical design, as writes/reads/deletes are done only on the faster pages, which leads to underutilization of about 50% of the storage space (as approximately every other page is a fast page). 2) The above scheme also decreases the reliability of these faster pages, since they are read and written more often than the slow pages. 3) The WOM encoding requires more space than the size of the data to be stored. In this scheme, they use 3 physical bits to store 2 logical bits, which essentially wastes 1/3rd of the total memory space. Discussion: 1) It is very hard to justify that the proposed schemes are practical enough given the major disadvantages they incur. Uneven load balancing, lower utilization and increased unreliability just for the sake of some improvement in latency seem quite unreasonable. 2) Given that the paper is published in MICRO, a top conference in the field of architecture, it is a bit disappointing to see a scarcity of novelty in the paper, the only useful contribution being the evaluation of different flash memories.
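To make the "3 physical bits for 2 logical bits" trade-off in Con 3 concrete, here is a minimal sketch of the classic Rivest-Shamir write-once-memory code, which lets a 2-bit value be written twice before an erase is needed. Whether the paper uses exactly this table is an assumption on my part; the names FIRST, SECOND, decode and write are hypothetical. The sketch uses the textbook convention in which cells start at 0 and can only be set to 1; real NAND flash is the mirror image (erased cells read 1, programming clears them to 0), so an actual implementation would invert the patterns.

```python
# Rivest-Shamir WOM code: 2 logical bits in 3 write-once cells, two writes per erase.
# Convention (assumption for illustration): cells start at 0 and may only change 0 -> 1.

# First-generation codewords have at most one cell set.
FIRST = {0b00: 0b000, 0b01: 0b100, 0b10: 0b010, 0b11: 0b001}
# Second-generation codewords are the bitwise complements of the first.
SECOND = {value: code ^ 0b111 for value, code in FIRST.items()}
# Every one of the 8 cell patterns decodes to exactly one 2-bit value.
DECODE = {code: value for table in (FIRST, SECOND) for value, code in table.items()}


def decode(cells: int) -> int:
    """Return the 2-bit value currently stored in the 3 cells."""
    return DECODE[cells]


def write(cells: int, value: int) -> int:
    """Return the new cell pattern storing `value`, only ever setting bits.

    Raises ValueError when no codeword for `value` can be reached without
    clearing a cell, i.e. when the block would have to be erased first.
    """
    if decode(cells) == value:
        return cells  # already stored, no cell needs to change
    for candidate in (FIRST[value], SECOND[value]):
        if candidate & cells == cells:  # candidate keeps every set cell set
            return candidate
    raise ValueError("erase required before writing this value")


if __name__ == "__main__":
    cells = 0b000                 # freshly erased
    cells = write(cells, 0b10)    # first write
    print(f"{cells:03b}", decode(cells))
    cells = write(cells, 0b01)    # second write, no erase in between
    print(f"{cells:03b}", decode(cells))
```

The space overhead the review objects to buys a second write per erase cycle, which is where the lifetime gain comes from; whether that is a good trade depends on how scarce capacity is relative to endurance.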
With Regards, Pooja -- Graduate Student Department of Computer Science University of Illinois at Urbana-Champaign From: Shehla Saleem [shehla.saleem@gmail.com] Sent: Thursday, April 15, 2010 8:49 AM To: Gupta, Indranil Subject: 525 review 04/15 Characterizing Flash Memory: Anomalies, Observations, and Applications Because of its speed and power-efficient operation, flash memory has fast become popular and is now common in personal computing devices. However, flash memory faces limitations in terms of durability, data integrity and asymmetry in operation granularity. Understanding the effects and interactions of cost, performance, reliability and usage patterns is very important in order to address the above-mentioned limitations. But as it turns out, not much data is provided by the flash memory manufacturers. They do not provide data on how the power consumption varies with different operations, e.g. read, erase, etc. This hinders research significantly. In this paper, the authors try to address some of the very common issues and concerns regarding flash memory. The authors present measurements of performance, reliability and power consumption for devices from five different manufacturers and show that there is remarkable variation between products from different vendors and that the numbers do not match the vendor-provided datasheets very closely. They then provide the design of a Flash Translation Layer called ‘Mango’, which improves energy consumption by up to 13% and reduces latency by 44%. The idea of ‘Mango’ is to skip slow pages as often as possible. Finally, they present an encoding scheme to increase device lifetime by up to 5.2 times. The paper provides a detailed overview of flash memory technology. It is also comprehensive in terms of the experimental design. The authors design an FPGA-based flash characterization board, and for power measurements they use a current probe and view the results on an oscilloscope.
I also appreciate the choice of not revealing the flash manufacturers’ identities, because that shows the intent of the authors was pure research. For their average power results, they ran their test setup at its fastest and measured the power consumption. Could the results be specific to their test setup in that case, or be off because of limitations of the setup? Also, more work needs to be done on relating the effects of data wear-out and shelf life. The authors do not address this. As for ‘Mango’, there are several shortcomings, e.g. increased wear, increased latency, etc. Their results showed that the use of ‘Mango’ increased wear by an average of 3%, and this may not be insignificant. More work may be needed to convince readers of the usefulness of the approach. Finally, I am surprised to see grammatical and spelling errors in the paper: errors with the use of 'a' and 'an', and in Section 3.5, ‘check’ spelled as ‘chack’! Overall, this paper is detailed and provides a good understanding of many of the major aspects of flash device technology. From: ashameem38@gmail.com on behalf of Shameem [ahmed9@illinois.edu] Sent: Thursday, April 15, 2010 12:51 AM To: Gupta, Indranil Subject: 525 review 04/15 ===================================================================== DFS: A File System for Virtualized Flash Storage ===================================================================== Flash storage is becoming very popular for running the primary file system, not only in laptops but also in data centers. The reasons for this popularity are as follows: flash memory is inexpensive and getting cheaper, flash memory consumes less power than tape/disk, there is no mechanical component in flash memory, and flash memory can be used as non-volatile storage. Currently, there exist several file systems, such as FAT, NTFS, FFS, extN, XFS, and so on, which are designed for disks.
Unfortunately, those file systems were not developed with flash storage in mind and hence are not directly applicable to flash memory. Although there are a few file systems for flash memory, such as JFFS, YAFFS, etc., those file systems are mainly intended for embedded applications which maintain a small number of files and small file sizes. To address these issues, the authors of the paper titled "DFS: A File System for Virtualized Flash Storage" present the design, implementation and evaluation of a new file system named DFS (Direct File System) for virtualized flash storage. The layers of abstraction of DFS are designed for accessing flash memory devices directly. According to the authors, DFS has the following novel features: (1) DFS lays out its files directly in a very large virtual storage address space. (2) The virtual flash storage layer of DFS is used for block allocations and atomic updates. The authors evaluated DFS performance with a set of micro-benchmarks (e.g. random reads, random writes, and CPU utilization) and an application benchmark (performance), and showed that DFS is better than the popular ext3 in two ways: (a) the DFS implementation is about 1/8th the size of ext3 with similar functionality, and (b) DFS shows much better performance than ext3 while using the same memory and less CPU. Pros: 1. DFS is simple and has a short and direct way to access flash memory. 2. As the authors claim, DFS performance is close to the hardware limit. 3. DFS is faster than ext3. Cons / Discussion Points: 1. The authors admit that the DFS directory structure doesn't scale well to large directories. What is the best way to make it scalable for large directories? 2. It seems that the CPU overhead of the device driver is not trivial. 3. The authors didn't compare the performance of DFS against other existing file systems for flash storage. I believe that, without such a comparison, the evaluation is not complete. 4. Between DFS and ext3, which one uses more device driver memory?
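Feature (1) above, laying files out directly in a very large virtual address space, can be illustrated with a small sketch. It rests on assumptions of mine rather than on details confirmed by the paper: each file is taken to own one contiguous region of the virtual space, the region size is fixed when the file system is created, and the names CHUNK_SIZE and virtual_address are hypothetical. The point is only that locating a byte becomes arithmetic, with the virtualized flash layer left to map the result to a physical location.

```python
# Hypothetical sketch: each file owns one contiguous region ("chunk") of a
# large, sparse virtual address space. Values and names are illustrative only.

CHUNK_SIZE = 1 << 32  # assumed bytes of virtual space per file, set at mkfs time


def virtual_address(chunk_index: int, offset: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Map (file's chunk index, byte offset in file) to a virtual address.

    No per-block allocation table is consulted here; the underlying
    virtualized flash layer is assumed to handle virtual-to-physical mapping.
    """
    if chunk_index < 0:
        raise ValueError("chunk index must be non-negative")
    if not 0 <= offset < chunk_size:
        raise ValueError("offset falls outside the file's chunk")
    return chunk_index * chunk_size + offset


if __name__ == "__main__":
    # Byte 4096 of the file stored in chunk 3.
    print(hex(virtual_address(3, 4096)))
```

Under these assumptions the file system itself needs no block allocator, which fits the review's observation that the DFS implementation is much smaller than ext3.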
From: Nathan Dautenhahn [dautenh1@illinois.edu] Sent: Thursday, April 15, 2010 12:04 AM To: indy@cs.uiuc.edu Subject: 525 review 04/15 -------------------------- ( Nathan Dautenhahn ) ( CS 525 Paper Reviews ) ( 4.15.10 ) -------------------------- ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Title: DFS: A File System for Virtualized Flash Storage Authors: William K. Josephson, Lars A. Bongo, David Flynn, Kai Li ==== Summary and Overview ==== This paper discusses the development of DFS, which is a solution to the problem of how to effectively use flash memory in a commodity operating system machine. A key insight the authors provide is that when flash controller logic is implemented in disk controllers, the system loses a lot of performance and other benefits, because the software layer of abstraction using the flash is built to work with magnetic disk; building a system from the ground up, with flash-specific code in the kernel, would be better. The primary contribution of this work is the development of a system with flash memory in mind from the ground up, thereby creating a "short datapath" to flash memory. ==== Comments Questions and Concerns ==== My comments are as follows: - I'm confused as to what a virtualized flash storage layer is. - The introduction discusses highly specific issues relating to the flash storage file system, but none of this is really described/introduced well enough for a less informed audience. I would like to see a simpler intro, or at least some description of what things like virtualized flash storage really mean. I suppose this is a really hard thing to balance, because the simpler you make it, the more you may offend or bore a person who is well versed in the topic. I'm not sure what FAST is about, but if it is about file systems this may work; it is still hard to follow, especially when this content is used to define the motivation.
- This paper provides a very thorough evaluation section that really pushes the point that the authors have done an in-depth evaluation covering a lot of the angles associated with their work. I really like the macro and micro benchmarks chosen to stress their system. - One thing I didn't notice was a reference to the lifetime of flash versus magnetic disk in terms of the added costs. ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Title: Characterizing Flash Memory: Anomalies, Observations, and Applications Authors: Grupp, Caulfield, Coburn, Swanson, Yaakobi, Siegel, and Wolf ==== Summary and Overview ==== This paper discusses the evaluation of the true performance characteristics of flash storage, and then builds upon this knowledge in order to identify key optimizations that can be made to improve flash storage usability and performance. The key issue here is, first, how one can compare the different types of flash storage currently available, and how to establish some level of common ground on which to perform a comparison. The key contributions are: - Comprehensive investigation, evaluation, and analysis of current flash storage devices. - The development of a flash translation layer using knowledge gained from the first step. - The development of a new data encoding scheme to increase the flash lifetime by up to 5.2 times. ==== Comments Questions and Concerns ==== My comments are as follows: - In the intro the authors refer to flash being better than DRAM, but isn't flash storage a replacement for magnetic disk? - I really liked the description of what flash is and how it works at a low level. This really helps to frame the context for the reader. - It was interesting to see how the SLC devices were more closely aligned to spec than the MLC class of flash storage devices. - The authors have produced a very well written paper.
This paper clearly identifies the context of each subproblem, and motivates well the design choices made by the authors. This context really helps to understand how the work is relevant. - The authors have produced a very low-level view of the flash storage mechanisms, which yields a lot of insightful analysis. This in-depth look at flash storage was really cool. - This is one of the first papers where I have seen a large study done to compare different things. I think it's cool that this is a valid purpose in research, and it should not be overlooked, because they have done such an in-depth comparison. It really opened my eyes to different forms of valid research. ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ ==== Common Themes ==== Each of these papers covers the topic of analyzing and improving the uses of flash storage. One focused a lot more on identifying the true hardware characteristics to understand flash, while the other focused on a different abstraction and layer for improved performance. From: gh.hosseinabadi@gmail.com on behalf of Ghazale Hosseinabadi [ghossei2@illinois.edu] Sent: Wednesday, April 14, 2010 7:13 PM To: Gupta, Indranil Subject: 525 review 04/15 Paper1: DFS: A File System for Virtualized Flash Storage In this paper, the design and implementation of the Direct File System (DFS) is presented. Previously, file systems were designed to be run on magnetic disk drives. This paper addresses the problem of designing a file system for flash memories. There are two main challenges in building storage systems using NAND flash. One is that an erase operation takes one or two milliseconds. The other is that an erase block may be erased successfully only a limited number of times. In this paper, the authors have designed new layers of abstraction for flash memory storage systems and a virtualized flash storage layer. Then, they have designed DFS in such a way that it takes advantage of the virtualized flash storage layer.
In the new abstraction layer designed in this paper, flash storage controller hardware is embedded to provide direct access to flash memory chips and to the virtualized flash storage layer. The virtual flash storage layer provides an abstraction that enables client software to operate on flash memory devices. It also provides backward compatibility with the traditional block storage interface. In the design of DFS, properties of the virtualized flash storage layer such as its large virtualized address space, direct data access and crash recovery capability are exploited. The performance of DFS is then evaluated through simulations in order to investigate how the layers of abstraction perform and how DFS differs from existing file systems in terms of performance efficiency. Pros: The design of file systems for operation on flash memories is novel and brings new benefits. The designed DFS is layered and structured. The virtualized flash storage layer makes the functionality of the system simple. DFS is faster than the existing design, ext3. Cons: It is not clear which component of the design has the most impact on the performance improvement. It would be interesting to analyze the effect of each part separately, although the design exploits all features of flash memories at the same time. As far as I know, flash memories are still expensive and also not as reliable as magnetic storage systems. It is important to consider cost and reliability in new designs. Paper 2: Characterizing Flash Memory: Anomalies, Observations, and Applications In this paper, the performance, cost and reliability of flash memories are investigated. The authors also study the relationship between these parameters and different usage patterns. The hardware architecture of flash memories is explained. NAND flash devices support three basic operations: erase, program, and read. Erase operates on entire blocks and sets all the bits in the block to 1.
Program writes entire pages at once and can only change 1s to 0s. Read operations read an entire page in parallel. Other commands such as copyback-read and copyback-program are also available for flash memories. The average latency of the basic flash memory operations and also the average programming speed for 10 program/erase cycles for 16 blocks on each chip are measured. Peak and average power and energy consumption for flash operations are also measured. Data saved on flash memories might be lost because of wear-out, program disturb and read disturb. Bit error rate versus program/erase cycles in the case of wear-out is measured. Bit error rate versus number of programs in the case of program disturb is also plotted. Per-page error rates versus page number within a block are measured. The results show large variation in error rates among pages in a single block. Per-page error rates versus page number within a block in the case of program disturb are also measured. The results show that varied error patterns emerge when a page is repeatedly reprogrammed. The reprogrammed page itself consistently shows no errors. Pros: Different features of flash memories are investigated through measurements. Cons: No theoretical analysis is presented. It would be interesting to relate the different performance parameters of flash memories to physical hardware features. From: gildong2@gmail.com on behalf of Hyun Duk Kim [hkim277@illinois.edu] Sent: Wednesday, April 14, 2010 3:49 PM To: Gupta, Indranil Subject: 525 review 04/15 525 review 04/15 Hyun Duk Kim (hkim277) * Characterizing Flash Memory: Anomalies, Observations, and Applications, L. Grupp et al, MICRO 2009 This paper observes characteristics of flash memory and suggests methods to improve its performance based on these observations. Flash memory has characteristics such as limited durability, data integrity problems, and asymmetry in operation granularity.
The authors experiment with different flash memory devices to understand the trade-offs between performance, cost and reliability, and how different usage patterns affect these characteristics. Based on their observations, they suggest two applications and show performance improvements. This paper suggests effective methods to improve the performance of flash devices in the software layer. From hardware analysis, the authors identified hardware characteristics and used them to design better software to manage the device, and their clever ideas increased performance. This shows that technology can be improved through software as well as through hardware improvement itself. However, their proposal may ignore the hardware's own characteristics. For example, the second suggestion limits writing to a single write operation. If only one write is allowed, someone may suggest using another device such as ROM instead. Repeated writing and reading is one of the most fundamental characteristics of flash. System designers may need to consider other options if they have to sacrifice it for another purpose. The experiment on flash performance variation may be affected by flash manufacturing variation. The authors used 11 devices to understand flash performance variations. They explained different characteristics depending on device type or manufacturer. Although their experiment can say at least that 'there is significant variation from the data sheets', it is difficult to generalize detailed conclusions (e.g. in what sense MLC is better than SLC) because there can be variation among devices. The selected devices may have their own variations. Sometimes performance varies even with manufacturing date or factory location. Therefore, with experiments on only 11 devices, it may be difficult to trust all the results. * DFS: A File System for Virtualized Flash Storage, W. Josephson et al, FAST 2010 This paper presents the Direct File System (DFS) for virtualized flash storage.
Instead of the traditional block storage interface with a flash translation layer, the authors suggest a virtual storage layer. This large virtual layer provides client software with a flat address space and the flexibility of direct access, hidden mapping, and backward compatibility. DFS is a Unix file system implementation that takes advantage of the virtual flash storage layer. According to the experimental results, because of its simple design, DFS shows better performance than a traditional file system with similar functionality. This paper presents a full file system for flash storage. While previous work such as the flash translation layer tries to improve performance by adding an additional component to the existing system, this paper suggests a full file system structure. Building DFS on the virtual storage layer, the authors suggest how file systems should be structured. If reliability were considered in terms of flash lifetime, we might be able to develop a more stable system. One of the issues with flash is unreliability. Flash devices can be used only a limited number of times. Compared with magnetic disks, flash devices can easily fail after extensive usage. Therefore, a sudden crash may lead to problems. Although DFS has a simple crash recovery strategy, more direct consideration of lifetime or usage count information could help to predict and recover from crashes better. For example, we may keep more detailed logs for old units than for new units. ------ Best Regards, Hyun Duk Kim Ph.D. Candidate Computer Science University of Illinois at Urbana-Champaign http://gildong2.com