What's a good explanation of the benefits and drawbacks of lossy compression algorithms?


When it comes to storing data, memory space can be an issue. Naturally, you want to optimise the space you have — and that means reducing the size of files as much as possible.

This is where compression comes in: techniques that reduce the size of data while keeping the information it carries as intact as possible. There are two categories of compression to know about: lossy and lossless.

But what are the differences, and which is better? Let’s take a closer look at lossy vs lossless compression.


Lossy compression

Lossy compression reduces file size by permanently discarding data that the computer doesn't strictly need to represent the content in question.

In other words, it removes data that contributes little to how the content looks or sounds. It's also known as irreversible compression because the discarded data is gone for good: files cannot be restored to their original state after lossy compression.

Lossy compression is most commonly used for storing images but is also applied to other multimedia data like audio and video. In images, lossy compression algorithms remove the details that the eye cannot distinguish: subtle colour variations and fine texture, for instance.

A well-known example of a lossy compression format is JPEG, used for digital images.
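
If you want to see this trade-off yourself, the short sketch below uses the Pillow imaging library (an assumption: it is not mentioned in this article, and "photo.png" is a placeholder file name) to re-save the same picture as JPEG at several quality settings and print the resulting sizes.

    import io
    from PIL import Image  # Pillow: pip install Pillow

    image = Image.open("photo.png").convert("RGB")  # placeholder input file

    for quality in (95, 75, 50, 10):
        buffer = io.BytesIO()
        # Lower quality means more aggressive lossy compression and a smaller file.
        image.save(buffer, format="JPEG", quality=quality)
        print(f"JPEG quality {quality}: {buffer.tell():,} bytes")

At quality 95 the output is close to the original; at quality 10 the file is tiny but compression artefacts become visible.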


Lossless compression

Lossless compression is the other side of the lossy vs lossless compression coin. Where lossy compression works by removing extraneous data, lossless compression encodes digital files without losing the extra details. Lossless compression is also known as reversible compression because it allows files to be restored.

In other words, lossless compression does not sacrifice the quality of the compressed image or file. The trade-off is that it doesn't reduce the size of the files in question as much.

Lossless compression can be used on images but is more commonly used on text and data files.
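
As a quick illustration, the sketch below uses Python's built-in zlib module (a lossless compressor) on some repetitive text and then restores it byte for byte; nothing is lost in the round trip.

    import zlib

    original = b"the quick brown fox jumps over the lazy dog " * 200

    compressed = zlib.compress(original, level=9)
    restored = zlib.decompress(compressed)

    print(f"original:   {len(original):,} bytes")
    print(f"compressed: {len(compressed):,} bytes")
    print(f"restored equals original: {restored == original}")  # True: no information lost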


Lossy vs lossless: pros and cons

Lossy
  • Pros

In the lossy vs lossless compression question, a pro of the lossy method is that it offers a greater reduction in file size.

In turn, this means that lossy compression lets you store more data in the same amount of space. Smaller file sizes also make for faster loading and better performance.

  • Cons

The cons of lossy compression are twofold. First, lossy compression sacrifices the quality of the data compressed.

Second, the removal of the unneeded data is irreversible. So, with lossy compression, you cannot restore the files to their original quality.

Lossless
  • Pros

The main pro of lossless compression over lossy is that it doesn’t sacrifice quality to reduce file size. All the data remains.

Additionally, the compression can be reversed, allowing files to be restored to their original state.

  • Cons

However, in the lossy vs lossless question, it's important to note that lossless compression cannot reduce the size of files as much as lossy can.


Lossy vs Lossless compression

To recap: lossy compression is a compression strategy that involves permanently removing unneeded detail. Lossless compression keeps all of the data, but at the cost of a smaller reduction in file size.

Both lossy and lossless compression have their benefits and their drawbacks. And both have their uses in the world of data storage and management.



Compression

Peter Wayner, in Disappearing Cryptography (Third Edition), 2009

5.4 Summary

Compression algorithms are normally used to reduce the size of a file without removing information. This can increase their entropy and make the files appear more random because all of the possible bytes become more common. The compression algorithms can also be useful when they're used to produce mimicry by running the compression functions in reverse. This is described in Chapter 6.

The Disguise: Compression algorithms generally produce data that looks more random. That is, there is a more even distribution of the data.
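
One way to observe this evening-out is to estimate the entropy of a file's byte distribution before and after compression. The sketch below is my own illustration (not from the book) and uses Python's zlib for the compression step.

    import math
    import zlib
    from collections import Counter

    def byte_entropy(data: bytes) -> float:
        """Shannon entropy of the byte distribution, in bits per byte (8.0 is the maximum)."""
        counts = Counter(data)
        total = len(data)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    text = ("compression makes the byte distribution more even " * 500).encode()
    packed = zlib.compress(text, level=9)

    print(f"entropy before: {byte_entropy(text):.2f} bits/byte")
    print(f"entropy after:  {byte_entropy(packed):.2f} bits/byte")  # closer to 8.0, i.e. more random-looking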

How Secure Is It? Not secure at all. Most compression algorithms transmit the table or dictionary at the beginning of the file. This may not be necessary because both parties could agree on such a table in advance. Although I don't know how to figure out the mapping between the letters and the bits in the Huffman algorithm, I don't believe it would be hard to figure out.

How to Use It: Many compression programs are available for all computers. They often use proprietary algorithms that are better than the versions offered here and make an ideal first pass for any encryption program.

Further Reading

My book, Compression Algorithms for Real Programmers, is an introduction to some of the most common compression algorithms. [Way00]

The Mathematical Theory of Communication by Claude E. Shannon and Warren Weaver is still in print after almost 60 years and over 20 printings. [SW63]

Khalid Sayood's long book, Introduction to Data Compression, is an excellent, deep introduction. [Say00]

Jacob Seidelin suggests compressing text by turning it into an 8-bit PNG file. He provides the Javascript code on his blog, nihilogic. The result looks much like white noise. This may be a more practical way to hide information in the least significant bits of images. [Sei08]


URL: https://www.sciencedirect.com/science/article/pii/B9780123744791500101

Dictionary-based compression

Ida Mengyi Pu, in Fundamental Data Compression, 2006

Summary

Dictionary compression algorithms use no statistical models. They focus on the memory of strings already seen. The memory may be an explicit dictionary that can be extended indefinitely, or an implicit, limited dictionary in the form of a sliding window. Each string seen is stored in a dictionary with an index, and the indices of the seen strings are used as codewords. The compression and decompression algorithms each maintain their own dictionary, but the two dictionaries are identical. Many variations are based on three representative families, namely LZ77, LZ78 and LZW. Implementation issues include the choice of the size of the buffers, the dictionary and the indices.
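
Of the three families, LZW is the easiest to sketch. The toy implementation below is a simplified illustration (codes are left as Python integers rather than packed into bits): the compressor grows its dictionary as it scans the input, and the decompressor rebuilds an identical dictionary from the code stream alone, just as the summary describes.

    def lzw_compress(data: bytes) -> list:
        # The dictionary initially holds every single-byte string, indexed 0..255.
        dictionary = {bytes([i]): i for i in range(256)}
        current = b""
        codes = []
        for value in data:
            candidate = current + bytes([value])
            if candidate in dictionary:
                current = candidate                      # keep extending the match
            else:
                codes.append(dictionary[current])        # emit the index of the seen string
                dictionary[candidate] = len(dictionary)  # new entry: seen string + next byte
                current = bytes([value])
        if current:
            codes.append(dictionary[current])
        return codes

    def lzw_decompress(codes: list) -> bytes:
        # Starts from the same single-byte dictionary and recreates the compressor's entries.
        dictionary = {i: bytes([i]) for i in range(256)}
        previous = dictionary[codes[0]]
        output = [previous]
        for code in codes[1:]:
            if code in dictionary:
                entry = dictionary[code]
            else:                                        # code was only just defined by the compressor
                entry = previous + previous[:1]
            output.append(entry)
            dictionary[len(dictionary)] = previous + entry[:1]
            previous = entry
        return b"".join(output)

    sample = b"TOBEORNOTTOBEORTOBEORNOT"
    codes = lzw_compress(sample)
    assert lzw_decompress(codes) == sample
    print(f"{len(sample)} input bytes encoded as {len(codes)} codes")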


URL: https://www.sciencedirect.com/science/article/pii/B9780750663106500106

Introduction

Ida Mengyi Pu, in Fundamental Data Compression, 2006

1.1.2 Decompression

By the nature of data compression, a compression algorithm is of no use unless a means of decompression is also provided. When compression algorithms are discussed in general, the word compression alone actually implies the context of both compression and decompression.

In this book, we sometimes do not even discuss the decompression algorithms when the decompression process is obvious or can be easily derived from the compression process. However, as a reader, you should always make sure that you know the decompression solutions as well as the ones for compression.

In many practical cases, the efficiency of the decompression algorithm is of more concern than that of the compression algorithm. For example, movies, photos, and audio data are often compressed once by the artist and then the same version of the compressed files is decompressed many times by millions of viewers or listeners.

Alternatively, the efficiency of the compression algorithm is sometimes more important. For example, audio or video data from some real-time programs may need to be recorded directly to limited computer storage, or transmitted to a remote destination through a narrow signal channel.

Depending on specific problems, we sometimes consider compression and decompression as two separate synchronous or asynchronous processes.

Figure 1.1 shows a platform based on the relationship between compression and decompression algorithms.


Figure 1.1. Compressor and decompressor

A compression algorithm is often called the compressor, and the decompression algorithm the decompressor.

The compressor and decompressor can be located at two ends of a communication channel, at the source and at the destination respectively. In this case, the compressor at the source is often called the coder and the decompressor at the destination of the message is called the decoder. Figure 1.2 shows a platform based on the relationship between a coder and decoder connected by a transmission channel.


Figure 1.2. Coder and decoder

There is no substantial difference between the platform in Figure 1.1 and that in Figure 1.2 in terms of the compression algorithms discussed in this book. However, certain concepts may be discussed and understood more conveniently on one platform than on the other. For example, it might be easier to introduce the information theory in Chapter 2 on the coder-decoder platform. Then again, it might be more convenient to discuss the symmetric properties of a compression algorithm and its decompression algorithm on the compressor-decompressor platform.


URL: https://www.sciencedirect.com/science/article/pii/B9780750663106500040

Digital image basics

Margot Note, in Managing Image Collections, 2011

Compression

Compression algorithms reduce the number of bytes required to represent data and the amount of memory required to store images. Compression allows a larger number of images to be stored on a given medium and increases the amount of data that can be sent over the internet. It relies on two main strategies: redundancy reduction and irrelevancy reduction.

Redundancy reduction, used during lossless encoding, searches for patterns that can be expressed more efficiently. An image viewed after lossless compression will appear identical to the way it was before being compressed. Lossless compression techniques can reduce the size of images by up to half. The resulting compressed file may still be large and unsuitable for network dissemination. However, lossless compression does provide for more efficient storage when it is imperative that all the information stored in an image should be preserved for future use.

Irrelevancy reduction, a lossy compression, utilizes a means for averaging or discarding the least significant information, based on an understanding of visual perception, to create smaller file sizes. Lossy compression reduces the image’s quality but can achieve dramatic storage savings. It should not be used when image quality and integrity are important, such as in archival copies of digital images.

Not all images respond to lossy compression in the same manner. As an image is compressed, particular kinds of visual characteristics, such as subtle tonal variations, may produce what are known as artifacts (unintended visual effects), though these may go largely unnoticed, due to the continuously variable nature of photographic images. Other kinds of images, such as pages of text or line illustrations, will show the artifacts of lossy compression more clearly. These may accumulate over generations, especially if different compression schemes are used, so artifacts that were imperceptible in one generation may become ruinous over many. This is why uncompressed archival master files should be maintained from which compressed derivative files can be generated for access and other purposes.
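
The difference between the two strategies can be checked directly. The sketch below (again assuming the Pillow library, with "master.tif" as a placeholder for an uncompressed master file) saves the same image both ways: the lossless PNG decodes back to identical pixels, while the lossy JPEG is usually far smaller but no longer pixel-identical.

    import io
    from PIL import Image  # Pillow: pip install Pillow

    image = Image.open("master.tif").convert("RGB")  # placeholder master file

    def roundtrip(fmt, **options):
        buffer = io.BytesIO()
        image.save(buffer, format=fmt, **options)
        buffer.seek(0)
        decoded = Image.open(buffer).convert("RGB")
        identical = decoded.tobytes() == image.tobytes()
        print(f"{fmt}: {buffer.getbuffer().nbytes:,} bytes, pixels identical: {identical}")

    roundtrip("PNG")               # redundancy reduction only: pixels come back identical
    roundtrip("JPEG", quality=75)  # irrelevancy reduction: smaller file, pixels not identical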


URL: https://www.sciencedirect.com/science/article/pii/B9781843345992500027

Introduction

Khalid Sayood, in Introduction to Data Compression (Fifth Edition), 2018

1.1.3 Measures of Performance

A compression algorithm can be evaluated in a number of different ways. We could measure the relative complexity of the algorithm, the memory required to implement the algorithm, how fast the algorithm performs on a given machine, the amount of compression, and how closely the reconstruction resembles the original. In this book we will mainly be concerned with the last two criteria. Let us take each one in turn.

A very logical way of measuring how well a compression algorithm compresses a given set of data is to look at the ratio of the number of bits required to represent the data before compression to the number of bits required to represent the data after compression. This ratio is called the compression ratio. Suppose storing an image made up of a square array of 256×256 pixels requires 65,536 bytes. The image is compressed and the compressed version requires 16,384 bytes. We would say that the compression ratio is 4:1. We can also represent the compression ratio by expressing the reduction in the amount of data required as a percentage of the size of the original data. In this particular example, the compression ratio calculated in this manner would be 75%.

Another way of reporting compression performance is to provide the average number of bits required to represent a single sample. This is generally referred to as the rate. For example, in the case of the compressed image described above, the average number of bits per pixel in the compressed representation is 2. Thus, we would say that the rate is 2 bits per pixel.
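
The arithmetic behind both figures is simple enough to write out; the following lines just reproduce the numbers from the example above.

    original_bytes = 256 * 256      # 65,536 bytes: one byte per pixel
    compressed_bytes = 16_384

    ratio = original_bytes / compressed_bytes
    reduction = (original_bytes - compressed_bytes) / original_bytes * 100
    bits_per_pixel = compressed_bytes * 8 / (256 * 256)

    print(f"compression ratio: {ratio:.0f}:1")                        # 4:1
    print(f"size reduction:    {reduction:.0f}%")                     # 75%
    print(f"rate:              {bits_per_pixel:.0f} bits per pixel")  # 2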

In lossy compression, the reconstruction differs from the original data. Therefore, in order to determine the efficiency of a compression algorithm, we have to have some way of quantifying the difference. The difference between the original and the reconstruction is often called the distortion. (We will describe several measures of distortion in Chapter 8.) Lossy techniques are generally used for the compression of data that originate as analog signals, such as speech and video. In compression of speech and video, the final arbiter of quality is human. Because human responses are difficult to model mathematically, many approximate measures of distortion are used to determine the quality of the reconstructed waveforms. We will discuss this topic in more detail in Chapter 8.

Other terms that are also used when talking about differences between the reconstruction and the original are fidelity and quality. When we say that the fidelity or quality of a reconstruction is high, we mean that the difference between the reconstruction and the original is small. Whether this difference is a mathematical difference or a perceptual difference should be evident from the context.
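
The specific distortion measures are left to Chapter 8; as a stand-in, the sketch below computes two widely used ones, mean squared error and peak signal-to-noise ratio, for a pair of 8-bit sample sequences (the numbers are invented purely for illustration).

    import math

    def mse(original, reconstruction):
        """Mean squared error between two equal-length sample sequences."""
        return sum((o - r) ** 2 for o, r in zip(original, reconstruction)) / len(original)

    def psnr(original, reconstruction, peak=255):
        """Peak signal-to-noise ratio in dB for 8-bit samples (higher means closer)."""
        error = mse(original, reconstruction)
        return float("inf") if error == 0 else 10 * math.log10(peak ** 2 / error)

    original = [52, 55, 61, 66, 70, 61, 64, 73]        # made-up 8-bit samples
    reconstruction = [50, 56, 60, 68, 71, 60, 65, 70]  # after some hypothetical lossy codec

    print(f"MSE:  {mse(original, reconstruction):.2f}")
    print(f"PSNR: {psnr(original, reconstruction):.1f} dB")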


URL: https://www.sciencedirect.com/science/article/pii/B978012809474700001X

Communicating pictures: delivery across networks

David R. Bull, Fan Zhang, in Intelligent Image and Video Compression (Second Edition), 2021

Abstract

Video compression algorithms rely on spatio-temporal prediction combined with variable-length entropy encoding to achieve high compression ratios but, as a consequence, they produce an encoded bitstream that is inherently sensitive to channel errors. This becomes a major problem when video information is transmitted over unreliable networks, since any errors introduced into the bitstream during transmission will rapidly propagate to other regions in the image sequence.

In order to promote reliable delivery over lossy channels, it is usual to invoke various error detection and correction methods. This chapter firstly introduces the requirements for an effective error-resilient video encoding system and then goes on to explain how errors arise and how they propagate spatially and temporally. We then examine a range of techniques in Sections 11.4, 11.5, and 11.6 that can be employed to mitigate the effects of errors and error propagation. We initially consider methods that rely on the manipulation of network parameters or the exploitation of network features to achieve this; we then go on to consider methods where the bitstream generated by the codec is made inherently robust to errors (Section 11.7). We present decoder-only methods that conceal rather than correct bitstream errors in Section 11.8. These deliver improved subjective quality without adding transmission overhead. Finally in Section 11.9, we describe congestion management techniques, in particular HTTP adaptive streaming (HAS), that are widely employed to support reliable streaming of video under dynamic network conditions.


URL: https://www.sciencedirect.com/science/article/pii/B9780128203538000207

A framework for accelerating bottlenecks in GPU execution with assist warps

N. Vijaykumar, ... O. Mutlu, in Advances in GPU Research and Practice, 2017

5.1.3 Implementing other algorithms

The BDI compression algorithm is naturally amenable to implementation using assist warps because of its data-parallel nature and simplicity. The CABA framework can also be used to realize other algorithms. The challenge in implementing algorithms like FPC [59] and C-Pack [17], which have variable-length compressed words, is primarily in the placement of compressed words within the compressed cache lines. In BDI, the compressed words are in fixed locations within the cache line, and for each encoding, all the compressed words are of the same size and can, therefore, be processed in parallel. In contrast, C-Pack may employ multiple dictionary values as opposed to just one base in BDI. In order to realize algorithms with variable-length words and dictionary values with assist warps, we leverage the coalescing/address generation logic [60, 61] already available in the GPU cores. We make two minor modifications to these algorithms [17, 59] to adapt them for use with CABA. First, similar to prior works [17, 54, 59], we observe that a few encodings are sufficient to capture almost all the data redundancy. In addition, the impact of any loss in compressibility because of fewer encodings is minimal as the benefits of bandwidth compression come only in multiples of a single DRAM burst (e.g., 32B for GDDR5 [62]). We exploit this to reduce the number of supported encodings. Second, we place all the metadata containing the compression encoding at the head of the cache line to be able to determine how to decompress the entire line upfront. In the case of C-Pack, we place the dictionary entries after the metadata.
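
As a rough illustration of why BDI is so easy to process in parallel, the sketch below encodes a block of 32-bit words as one base value plus narrow, fixed-size deltas. This is a simplified rendition of the base-plus-delta idea only, not the CABA implementation described in the chapter.

    def base_delta_encode(words, delta_bytes=1):
        """Try to encode a block of 32-bit words as (base, deltas).

        Returns None if any delta does not fit in `delta_bytes` signed bytes,
        in which case the block would be kept uncompressed.
        """
        base = words[0]
        limit = 1 << (8 * delta_bytes - 1)
        deltas = [w - base for w in words]
        if all(-limit <= d < limit for d in deltas):
            return base, deltas      # every delta has the same fixed size
        return None

    block = [0x1000_0000 + 3 * i for i in range(8)]  # 8 words = 32 bytes uncompressed
    encoded = base_delta_encode(block)
    if encoded is not None:
        base, deltas = encoded
        compressed_size = 4 + len(deltas)            # 4-byte base + 1 byte per delta
        print(f"32 bytes -> {compressed_size} bytes: base={base:#x}, deltas={deltas}")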

We note that it can be challenging to implement complex algorithms efficiently with the simple computational logic available in GPU cores. Fortunately, SFUs [63, 64] are already in the GPU SMs, used to perform efficient computations of elementary mathematical functions. SFUs could potentially be extended to implement primitives that enable the fast iterative comparisons performed frequently in some compression algorithms. This would enable more efficient execution of the described algorithms, as well as implementation of more complex compression algorithms, using CABA. We leave the exploration of an SFU-based approach to future work.

We now present a detailed overview of mapping the FPC and C-Pack algorithms into assist warps.


URL: https://www.sciencedirect.com/science/article/pii/B978012803738600015X

Information Technology Systems Infrastructure

Thomas Norman CPP, PSP, CSC, in Integrated Security Systems Design (Second Edition), 2014

Advantages and Disadvantages

Each JPEG image is a new fresh image. This is very useful where the frame rate must be very low, such as on an offshore oil platform with a very low-bandwidth satellite uplink, or where only a dial-up modem connection is available for network connectivity. I used JPEG on an offshore platform with only a 64 kb/s satellite connection available. MPEG is most useful where there is adequate data bandwidth available for a fast-moving image but where it is desirable to conserve network resources for future growth and for network stability.

More on Compression Algorithms:

MJPEG: MJPEG is a compression scheme that uses the JPEG-compression method on each individual frame of video. Thus each frame is an entire picture.

MPEG-4: MPEG-4 is a compression scheme that uses a JPEG “Initial Frame (I-Frame),” followed by “Partial Frames (P-Frames),” each of which only addresses the pixels where changes have occurred from the previous frame. After some period of seconds, enough changes have occurred that a new I-frame is sent and the process is started all over again.

H.264: H.264 is a compression scheme that operates much like MPEG-4 but stores video far more efficiently; however, it relies on a more robust video-rendering engine in the workstation to view the compressed video.
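
To make the I-frame/P-frame idea concrete, here is a toy sketch (nothing like a real codec, and with frames reduced to small tuples of pixel values): the first frame is stored in full, and each later frame records only the pixels that changed since the previous one.

    def encode_sequence(frames):
        """frames: list of equal-length tuples of pixel values."""
        i_frame = frames[0]                  # the full initial picture
        p_frames = []
        previous = i_frame
        for frame in frames[1:]:
            changes = [(i, new) for i, (new, old) in enumerate(zip(frame, previous)) if new != old]
            p_frames.append(changes)         # only the pixels that differ from the previous frame
            previous = frame
        return i_frame, p_frames

    def decode_sequence(i_frame, p_frames):
        frames = [list(i_frame)]
        for changes in p_frames:
            frame = list(frames[-1])         # start from the previous reconstructed frame
            for index, value in changes:
                frame[index] = value
            frames.append(frame)
        return frames

    frames = [
        (10, 10, 10, 10),
        (10, 10, 99, 10),   # one pixel changed
        (10, 10, 99, 50),   # one more pixel changed
    ]
    i_frame, p_frames = encode_sequence(frames)
    print(p_frames)                          # [[(2, 99)], [(3, 50)]]
    assert [tuple(f) for f in decode_sequence(i_frame, p_frames)] == frames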


URL: https://www.sciencedirect.com/science/article/pii/B9780128000229000115

Fundamentals and Standards of Compression and Communication

Stephen P. Yanek, ... Joan E. Fetter, in Handbook of Medical Imaging, 2000

2.2 Moving Picture Experts Group (MPEG) Compression

Standardization of compression algorithms for video was first initiated by CCITT for teleconferencing and video telephony. The digital storage media for the purpose of this standard include digital audio tape (DAT), CD-ROM, writeable optical disks, magnetic tapes, and magnetic disks, as well as communications channels for local and wide area networks, LANs and WANs, respectively. Unlike still image compression, full motion image compression has time and sequence constraints. The compression level is described in terms of a compression rate for a specific resolution.

The MPEG standards consist of a number of different standards. The original MPEG standard did not take into account the requirements of high-definition television (HDTV). The MPEG-2 standards, released at the end of 1993, include HDTV requirements in addition to other enhancements. The MPEG-2 suite of standards consists of standards for MPEG-2 audio, MPEG-2 video, and MPEG-2 systems. It is also defined at different levels to accommodate different rates and resolutions as described in Table 2.

TABLE 2. MPEG-2 resolutions, rates, and metrics [2]

Level   Pixel to line ratio   Compression and decompression rate   Lines per frame   Frames per second   Pixels per second
High    1920                  Up to 60 Mbits per second            1152              60                  62.7 million
High    1440                  Up to 60 Mbits per second            1152              60                  47 million
Main    720                   Up to 15 Mbits per second            576               30                  10.4 million
Low     352                   Up to 4 Mbits per second             288               30                  2.53 million

Moving pictures consist of sequences of video pictures or frames that are played back at a fixed number of frames per second. Motion compensation is the basis for most compression algorithms for video. In general, motion compensation assumes that the current picture (or frame) is a revision of a previous picture (or frame). Subsequent frames may differ slightly as a result of moving objects or a moving camera, or both. Motion compensation attempts to account for this movement. To make the process of comparison more efficient, a frame is not encoded as a whole. Rather, it is split into blocks, and the blocks are encoded and then compared. Motion compensation is a central part of MPEG-2 (as well as MPEG-4) standards. It is the most demanding of the computational algorithms of a video encoder.
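
A minimal version of that block-matching step might look like the sketch below, an illustration rather than the actual MPEG algorithm: for one block of the current frame, search a small window of the previous frame for the position with the lowest sum of absolute differences, and keep the offset as the motion vector.

    def sad(block_a, block_b):
        """Sum of absolute differences between two equal-sized 2D blocks."""
        return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b) for a, b in zip(row_a, row_b))

    def block(frame, top, left, size):
        return [row[left:left + size] for row in frame[top:top + size]]

    def best_motion_vector(current, previous, top, left, size=4, search=2):
        """Exhaustive search over a (2*search+1)^2 window around the block's position."""
        target = block(current, top, left, size)
        best = None
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                y, x = top + dy, left + dx
                if 0 <= y and 0 <= x and y + size <= len(previous) and x + size <= len(previous[0]):
                    cost = sad(target, block(previous, y, x, size))
                    if best is None or cost < best[0]:
                        best = (cost, (dy, dx))
        return best  # (SAD, motion vector)

    # Tiny 6x6 frames: the bright 4x4 block moves one pixel to the right between frames.
    previous = [[0] * 6 for _ in range(6)]
    current = [[0] * 6 for _ in range(6)]
    for y in range(1, 5):
        for x in range(1, 5):
            previous[y][x] = 10 * y + x
            current[y][x + 1] = 10 * y + x

    print(best_motion_vector(current, previous, top=1, left=2))  # (0, (0, -1)): perfect match one pixel left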

The established standards for image and video compression developed by JPEG and MPEG have been in existence, in one form or another, for over a decade. When first introduced, both processes were implemented via codec engines that were entirely in software and very slow in execution on the computers of that era. Dedicated hardware engines have been developed and real-time video compression of standard television transmission is now an everyday process, albeit with hardware costs that range from $10,000 to $100,000, depending on the resolution of the video frame. JPEG compression of fixed or still images can be accomplished with current generation PCs. Both JPEG and MPEG standards are in general usage within the multimedia image compression world. However, it seems that the DCT is reaching the end of its performance potential since much higher compression capability is needed by most of the users in multimedia applications. The image compression standards are in the process of turning away from DCT toward wavelet compression.


URL: https://www.sciencedirect.com/science/article/pii/B9780120777907500540

Video Compression

Khalid Sayood, in Introduction to Data Compression (Fifth Edition), 2018

19.14.3 Compression Algorithms for Packet Video

Almost any compression algorithm can be modified to perform in the ATM environment, but some approaches seem more suited to this environment. We briefly present two approaches (see the original papers for more details).

One compression scheme that functions in an inherently layered manner is subband coding. In subband coding, the lower-frequency bands can be used to provide the basic reconstruction, with the higher-frequency bands providing the enhancement. As an example, consider the compression scheme proposed for packet video by Karlsson and Vetterli [303]. In their scheme, the video is divided into 11 bands. First, the video signal is divided into two temporal bands. Each band is then split into four spatial bands. The low-low band of the temporal low-frequency band is then split into four spatial bands. A graphical representation of this splitting is shown in Fig. 19.31. The subband denoted 1 in the figure contains the basic information about the video sequence. Therefore, it is transmitted with the highest priority. If the data in all the other subbands are lost, it will still be possible to reconstruct the video using only the information in this subband. We can also prioritize the output of the other bands, and if the network starts getting congested and we are required to reduce our rate, we can do so by not transmitting the information in the lower-priority subbands. Subband 1 also generates the least variable data rate. This is very helpful when negotiating with the network for the amount of priority traffic.


Figure 19.31. Analysis filter bank.

Given the similarity of the ideas behind progressive transmission and subband coding, it should be possible to use progressive transmission algorithms as a starting point in the design of layered compression schemes for packet video. Chen, Sayood, and Nelson [304] use a DCT-based progressive transmission scheme [305] to develop a compression algorithm for packet video. In their scheme, they first encode the difference between the current frame and the prediction for the current frame using a 16×16 DCT. They only transmit the DC coefficient and the three lowest-order AC coefficients to the receiver. The coded coefficients make up the highest-priority layer.

The reconstructed frame is then subtracted from the original. The sum of squared errors is calculated for each 16×16 block. Blocks with squared error greater than a prescribed threshold are subdivided into four 8×8 blocks, and the coding process is repeated using an 8×8 DCT. The coded coefficients make up the next layer. Because only blocks that fail to meet the threshold test are subdivided, information about which blocks are subdivided is transmitted to the receiver as side information.

The process is repeated with 4×4 blocks, which make up the third layer, and 2×2 blocks, which make up the fourth layer. Although this algorithm is a variable-rate coding scheme, the rate for the first layer is constant. Therefore, the user can negotiate with the network for a fixed amount of high-priority traffic. In order to remove the effect of delayed packets from the prediction, only the reconstruction from the higher-priority layers is used for prediction.

This idea can be used with many different progressive transmission algorithms to make them suitable for use over ATM networks.


URL: https://www.sciencedirect.com/science/article/pii/B9780128094747000197

What are the advantages and disadvantages of lossy compression?

Lossy advantages and disadvantages:
  • Advantages: very small file sizes, and lots of tools, plugins, and software support it.
  • Disadvantages: quality degrades at higher compression ratios, and you can't get the original back after compressing.

What are the relative benefits and drawbacks of lossy versus lossless compression?

Lossy compression permanently removes data it deems unnecessary from the image. It uses many different techniques to achieve this, resulting in much smaller file sizes. Lossless compression also shrinks the file, but it does so by encoding the existing data more efficiently, so the original can be restored exactly if needed. The goal is to keep quality high, yet reduce the file size.

Why is using lossy compression beneficial?

  • Solid image quality: by and large, lossy compression can produce an image with passable or even unnoticeable differences from the original. It's all about balance.
  • Faster website load times: reducing resolution and file size means that your images will load quicker online.