What Is End-to-End Learning?

Photo by Su San Lee on Unsplash

One of the most important skills for those who work with Machine Learning is knowing which method is the right choice for a given problem. Some choices are trivial (e.g. supervised or unsupervised, regression or classification) because they follow from the problem formulation itself. However, even after defining what you are trying to solve, there is usually a myriad of algorithms that can be used.

For example, imagine you want to develop a system able to predict a categorical variable. To solve this problem, a Classification Tree, K-nearest neighbors, or even an Artificial Neural Network can be used. Of course, there is a reason for many different algorithms to exist, even when they solve similar problems: each one has its particularities from which we can benefit.

What makes the task even harder is that some problems, like speech recognition and autonomous driving, require an architecture consisting of many layers (e.g. preprocessing, feature extraction, optimization, prediction, decision making). For each layer, many different algorithms may be used.

The issue is that achieving better results requires changes to the inner layers and their corresponding algorithms. However, as each layer is responsible for solving a particular task, it becomes really difficult to determine how such changes will affect the system as a whole.

End-to-end (E2E) learning refers to training a possibly complex learning system as a single model (specifically a Deep Neural Network) that represents the complete target system, bypassing the intermediate layers usually present in traditional pipeline designs.

End-to-end learning

End-to-end learning is a hot topic in the Deep Learning field because it takes advantage of the structure of Deep Neural Networks (DNNs), composed of several layers, to solve complex problems. Similar to the human brain, each DNN layer (or group of layers) can specialize in the intermediate tasks such problems require. Tobias Glasmachers describes how E2E is framed in the Deep Learning context [1]:

“This elegant although straightforward and somewhat brute-force technique [E2E] has been popularized in the context of deep learning. It is a seemingly natural consequence of deep neural architectures blurring the classic boundaries between learning machine and other processing components by casting a possibly complex processing pipeline into the coherent and flexible modeling language of neural networks.”

That alternative approach has been successfully applied to solve many complex problems. Below you can find how E2E is applied for Speech Recognition and Autonomous Driving problems.

Speech Recognition

Photo by Arthur Caranta

The traditional design of a spoken language understanding system is a pipeline structure with several different components, exemplified by the following sequence:

Audio (input) -> feature extraction -> phoneme detection -> word composition -> text transcript (output).

A clear limitation of this pipelined architecture is that each module has to be optimized separately under different criteria. The E2E approach consists of replacing the aforementioned chain with a single Neural Network, allowing the use of a single optimization criterion for enhancing the system:

Audio (input) -> Neural Network -> transcript (output)
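The difference between the two diagrams can be sketched with a deliberately tiny numerical toy. The code below is not a speech recognizer: each "stage" is a single scalar weight, and the names `audio`, `phonemes`, `transcripts`, and `fit_scale` are hypothetical stand-ins chosen for illustration. It only shows that the pipeline needs a label and a fitting criterion per stage, while the E2E variant is fitted once against the final output.

```python
def fit_scale(inputs, targets):
    """Least-squares fit of one weight w so that w * input approximates target."""
    numerator = sum(x * t for x, t in zip(inputs, targets))
    denominator = sum(x * x for x in inputs)
    return numerator / denominator


# Toy data (assumed for illustration): input, intermediate label, final label.
audio = [1.0, 2.0, 3.0, 4.0]
phonemes = [2.0 * x for x in audio]        # intermediate labels, needed only by the pipeline
transcripts = [3.0 * h for h in phonemes]  # final labels

# Pipeline: every stage is optimized separately, against its own target.
w_stage1 = fit_scale(audio, phonemes)
w_stage2 = fit_scale(phonemes, transcripts)


def pipeline(x):
    """Chain the two separately fitted stages."""
    return w_stage2 * (w_stage1 * x)


# End-to-end: one model, one criterion, no intermediate labels.
w_e2e = fit_scale(audio, transcripts)


def end_to_end(x):
    """Map the input directly to the final output."""
    return w_e2e * x
```

In this toy both variants end up computing the same function; the point is only that `end_to_end` never saw the `phonemes` labels, whereas the pipeline could not have been fitted without them.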

Mike Lewis et al. introduce an E2E learning approach for natural language negotiations [2]. The resulting system is a dialogue agent based on a single Neural Network able to negotiate to achieve an agreement. This was done by training the NN using data from a large dataset of human-human negotiation records containing a variety of different negotiation tactics.

Figure from Mike Lewis et al. [2]

Another benefit of the E2E approach is that it is possible to design a model that performs well without deep knowledge about the problem, despite its complexity. Ronan Collobert et al. explain how a unified Neural Network architecture and an appropriate learning algorithm for Natural Language Processing (NLP) can be used to avoid task-specific engineering and lots of prior knowledge [3]:

“[…] we try to excel on multiple benchmarks while avoiding task-specific engineering. Instead we use a single learning system able to discover adequate internal representations. […] Our desire to avoid task-specific engineered features prevented us from using a large body of linguistic knowledge. Instead we reach good performance levels in most of the tasks by transferring intermediate representations discovered on large unlabeled data sets. We call this approach “almost from scratch” to emphasize the reduced (but still important) reliance on a priori NLP knowledge.”

Autonomous driving

Autonomous driving systems are a remarkable example of complex systems composed of many layers. Following the architecture proposed by Alexandru Serban et al., we can design an autonomous driving system using 5 different layers [4]:

Figure from Alexandru Serban et al. [4]

The input data comes from several sensors (cameras, LIDAR, radars, etc.) and is processed in the sensor fusion layer to extract the relevant features (e.g. object detection). With all the data processed and the relevant features extracted, a “world model” is created in the second layer. That model comprises the complete picture of the surrounding environment together with the vehicle's internal state.

From this model, the system must choose which decisions to make in the behavior layer. According to the vehicle’s goals, it generates multiple behavior options based on the system policy and selects the best one by applying some optimization criterion.

With the decisions taken, the planning layer determines the maneuvers the vehicle must execute to satisfy the chosen behavior. Finally, in the vehicle control layer, the control values are sent to the actuator interface modules.
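The five layers described above can be sketched as a chain of functions, each consuming the output of the previous one. This is a minimal illustration only: every function name, the two-second-gap policy, and the throttle/brake scaling are hypothetical choices made here, not part of the architecture in [4].

```python
from dataclasses import dataclass


@dataclass
class WorldModel:
    obstacle_distance_m: float  # surrounding environment (reduced to one number here)
    speed_mps: float            # vehicle internal state


def sensor_fusion(camera_m, lidar_m, radar_m):
    """Layer 1: fuse raw range readings, keeping the most conservative estimate."""
    return min(camera_m, lidar_m, radar_m)


def build_world_model(obstacle_distance_m, speed_mps):
    """Layer 2: combine extracted features with the vehicle state."""
    return WorldModel(obstacle_distance_m, speed_mps)


def choose_behavior(world):
    """Layer 3: pick a behavior under an assumed policy (keep a 2-second gap)."""
    safe_gap_m = 2.0 * world.speed_mps
    return "follow_lane" if world.obstacle_distance_m > safe_gap_m else "slow_down"


def plan_maneuver(behavior, world):
    """Layer 4: turn the chosen behavior into a target speed."""
    return world.speed_mps if behavior == "follow_lane" else 0.5 * world.speed_mps


def vehicle_control(target_speed_mps, world):
    """Layer 5: convert the plan into actuator values clamped to [0, 1]."""
    error = target_speed_mps - world.speed_mps
    throttle = max(0.0, min(1.0, error / 10.0))
    brake = max(0.0, min(1.0, -error / 10.0))
    return {"throttle": throttle, "brake": brake}


def drive_step(camera_m, lidar_m, radar_m, speed_mps):
    """Run one pass through all five layers."""
    distance = sensor_fusion(camera_m, lidar_m, radar_m)
    world = build_world_model(distance, speed_mps)
    behavior = choose_behavior(world)
    target = plan_maneuver(behavior, world)
    return behavior, vehicle_control(target, world)
```

Each function here is written and tuned by hand, which is exactly the per-layer engineering that an E2E system tries to replace with a single learned mapping.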

Photo by Bram Van Oost on Unsplash

In the paper “End to End Learning for Self-Driving Cars”, Mariusz Bojarski et al. propose an E2E system capable of controlling an autonomous car directly from the pixels provided by the embedded cameras [5]. The system was able to learn internal representations of intermediate steps, such as detecting useful road features, with only the human steering angle as the training signal. Convolutional Neural Networks (CNNs) play an important role in the proposed system because of their capacity to extract useful features from image data:

“The breakthrough of CNNs is that features are learned automatically from training examples. The CNN approach is especially powerful in image recognition tasks because the convolution operation captures the 2D nature of images.”

The designed CNN goes beyond pattern recognition to learn the entire processing pipeline needed to steer an automobile. The network architecture consists of 9 layers: a normalization layer, 5 convolutional layers, and 3 fully connected layers. The system was trained on recorded real-world driving data collected in central New Jersey, Illinois, Michigan, Pennsylvania, and New York. The following figure shows the block diagram of the training system design:

Figure from Mariusz Bojarski et al. [5]
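To make the 9-layer architecture more concrete, the sketch below traces how the feature-map shape shrinks through the five convolutional layers. The text above only gives the layer counts; the input size (66x200), the kernel sizes, strides, channel counts, and fully connected widths used here are assumptions taken from the description in [5] as best recalled, and should be checked against the paper before being relied on.

```python
def conv_out(size, kernel, stride):
    """Output length of a convolution applied without padding."""
    return (size - kernel) // stride + 1


# Assumed layer sizes: (output channels, kernel size, stride).
CONV_LAYERS = [(24, 5, 2), (36, 5, 2), (48, 5, 2), (64, 3, 1), (64, 3, 1)]
FC_LAYERS = [100, 50, 10]  # assumed widths of the 3 fully connected layers


def feature_map_shapes(height=66, width=200):
    """Return the (channels, height, width) shape after each convolutional layer."""
    shapes = []
    for channels, kernel, stride in CONV_LAYERS:
        height = conv_out(height, kernel, stride)
        width = conv_out(width, kernel, stride)
        shapes.append((channels, height, width))
    return shapes


shapes = feature_map_shapes()
last_channels, last_height, last_width = shapes[-1]
flattened = last_channels * last_height * last_width  # input size of the first FC layer

# 1 normalization layer + 5 convolutional layers + 3 fully connected layers.
total_layers = 1 + len(CONV_LAYERS) + len(FC_LAYERS)
```

Under these assumptions the image is reduced to a flat feature vector before the fully connected layers map it to a single steering value.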

With approximately 72 hours of driving data, the system was able to learn how to steer the car in different road types and weather conditions:

“A small amount of training data from less than a hundred hours of driving was sufficient to train the car to operate in diverse conditions, on highways, local and residential roads in sunny, cloudy, and rainy conditions. The CNN is able to learn meaningful road features from a very sparse training signal (steering alone). The system learns for example to detect the outline of a road without the need of explicit labels during training.”
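The "steering alone" training signal can be written as an ordinary regression objective. As a sketch, assume the network $f_\theta$ maps a camera frame $x_i$ to a steering command and $s_i$ is the human steering command recorded for that frame; the symbols are introduced here for illustration, and mean squared error is assumed as the criterion:

```latex
\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \bigl( f_\theta(x_i) - s_i \bigr)^2
```

No term in this objective mentions lanes, road edges, or obstacles. Any such intermediate representation has to emerge inside $f_\theta$ because it helps reduce the steering error.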

Limitations of E2E

If using a single DNN between input and output works for the aforementioned examples, why not use it as a general approach for solving every Machine Learning problem?

There are many reasons why E2E can be an infeasible option:

A huge amount of data is necessary: Incorporating prior knowledge into training is considered a key element for increasing performance in many applications. Because E2E learning does not integrate this prior knowledge, more training examples must be provided.
Difficult to improve or modify the system: If a structural change must be applied (e.g. increasing the input dimensions by adding more features), the old model is of no use and the whole DNN has to be replaced and trained all over again.
Highly efficient available modules cannot be used: Many existing techniques solve specific tasks efficiently. As an example, state-of-the-art object recognition systems are widely available, but as soon as one is integrated into an E2E system, the system can no longer be considered E2E.
Difficult to validate: If a high level of validation is necessary, E2E may become infeasible. Due to the complex architecture, the potential number of input/output pairs can be large enough to make validation impossible. This is especially important for some sectors like the automotive industry.

On top of these issues, E2E may not work for some applications, as shown in [1]:

“We have demonstrated that end-to-end learning can be very inefficient for training neural network models composed of multiple non-trivial modules. End-to-end learning can even break down entirely; in the worst case none of the modules manages to learn. In contrast, each module is able to learn if the other modules are already trained and their weights frozen. This suggests that training of complex learning machines should proceed in a structured manner, training simple modules first and independent of the rest of the network.”
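The flavor of this observation can be reproduced in a contrived toy, sketched below. It does not replicate the experiments in [1]: the model is just two scalar "modules" chained as `w2 * (w1 * x)`, and the zero initialization, learning rate, and data are assumptions picked so that joint training stalls. The `train` function and its `freeze_w1` flag are hypothetical names.

```python
def train(w1, w2, xs, ys, lr=0.01, steps=200, freeze_w1=False):
    """Gradient descent on the mean squared error of y_hat = w2 * (w1 * x)."""
    n = len(xs)
    for _ in range(steps):
        errors = [w2 * w1 * x - y for x, y in zip(xs, ys)]
        # Each module's gradient is scaled by the other module's weight.
        grad_w1 = sum(2 * e * w2 * x for e, x in zip(errors, xs)) / n
        grad_w2 = sum(2 * e * w1 * x for e, x in zip(errors, xs)) / n
        if not freeze_w1:
            w1 -= lr * grad_w1
        w2 -= lr * grad_w2
    return w1, w2


xs = [1.0, 2.0, 3.0]
ys = [6.0 * x for x in xs]  # target function: y = 6x

# End-to-end from an all-zero start: both gradients vanish, so nothing is learned.
e2e_w1, e2e_w2 = train(0.0, 0.0, xs, ys)

# Structured training: module 1 is assumed already trained (w1 = 2) and frozen,
# so module 2 receives a useful gradient and converges to w2 = 3.
frozen_w1, learned_w2 = train(2.0, 0.0, xs, ys, freeze_w1=True)
```

With both weights at zero, each module's gradient is multiplied by the other's zero weight, so the jointly trained model never moves. Once one module is fixed at a sensible value, the other learns without difficulty.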

Conclusion

End-to-end learning is indisputably a great tool for solving elaborate tasks. The idea of using a single model that can specialize to predict the outputs directly from the inputs allows the development of otherwise extremely complex systems that can be considered state-of-the-art. However, every enhancement comes with a price: while well established in academia, E2E still meets reluctance in industry due to the need for a large amount of training data and the difficulty of validation.

References

[1] Glasmachers, Tobias. “Limits of end-to-end learning.” arXiv preprint arXiv:1704.08305 (2017).

[2] Lewis, Mike, et al. “Deal or no deal? End-to-end learning for negotiation dialogues.” arXiv preprint arXiv:1706.05125 (2017).

[3] Collobert, Ronan, et al. “Natural language processing (almost) from scratch.” Journal of Machine Learning Research 12.Aug (2011): 2493–2537.

[4] Serban, Alexandru Constantin, Erik Poll, and Joost Visser. “A Standard Driven Software Architecture for Fully Autonomous Vehicles.” 2018 IEEE International Conference on Software Architecture Companion (ICSA-C). IEEE, 2018.

[5] Bojarski, Mariusz, et al. “End to end learning for self-driving cars.” arXiv preprint arXiv:1604.07316 (2016).
