What you’ll learn
- What are transformer networks?
- State-of-the-art architectures for CV applications such as Image Classification, Semantic Segmentation, Object Detection, and Video Processing
- Practical application of SoTA architectures like ViT, DETR, and Swin using the Hugging Face vision transformers
- Attention mechanisms as a general Deep Learning idea
- Inductive Bias and the landscape of DL models in terms of modeling assumptions
- Transformer applications in NLP and Machine Translation
- Transformers in Computer Vision
- Different types of attention in Computer Vision
This course includes:
- 5.5 hours on-demand video
- 1 article
- 1 downloadable resource
- Access on mobile and TV
- Full lifetime access
- Certificate of completion
Description
Transformer networks are the current trend in deep learning. Transformer models have taken the world of NLP by storm since 2017, and have since become the mainstream model in almost all NLP tasks. Transformers in CV still lag behind, but they have been taking over since 2020.
We will start by introducing attention and transformer networks. Since transformers were first introduced in NLP, they are easiest to describe with an NLP example first. From there, we will examine the pros and cons of this architecture. We will also discuss the importance of unsupervised or semi-supervised pre-training for transformer architectures, briefly covering Large Language Models (LLMs) such as BERT and GPT.
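To make the attention idea concrete before the lectures, here is a minimal PyTorch sketch of scaled dot-product self-attention; the tensor shapes and sizes are illustrative assumptions, not course code:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_model)
    d_k = q.size(-1)
    # Similarity of every query with every key, scaled for numerical stability
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    # Each output token becomes a weighted average of the value vectors
    weights = F.softmax(scores, dim=-1)
    return weights @ v

tokens = torch.randn(1, 5, 64)   # a toy "sentence" of 5 token embeddings
out = scaled_dot_product_attention(tokens, tokens, tokens)  # self-attention
print(out.shape)  # torch.Size([1, 5, 64])
```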
This will pave the way to introducing transformers in CV. Here we extend the attention idea to the 2D spatial domain of the image. We will discuss how convolution can be generalized using self-attention within the encoder-decoder meta-architecture, and see how this generic architecture is almost the same for images as for text, which makes the transformer a general-purpose function approximator. We will also cover channel and spatial attention, and local vs. global attention, among other topics.
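As an illustration of how self-attention moves from words to image regions, the following sketch cuts an image into ViT-style patches and runs standard self-attention over them; the class name, patch size, and embedding dimension are assumptions for the example, not course code:

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Turn an image into a sequence of patch tokens (ViT-style), so the
    same self-attention used for words can run over image regions."""
    def __init__(self, patch_size=16, in_ch=3, dim=768):
        super().__init__()
        # A strided convolution cuts the image into patches and projects each one
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                      # x: (B, 3, 224, 224)
        x = self.proj(x)                       # (B, dim, 14, 14)
        return x.flatten(2).transpose(1, 2)    # (B, 196, dim): a "sentence" of patches

patches = PatchEmbedding()(torch.randn(1, 3, 224, 224))
attn = nn.MultiheadAttention(embed_dim=768, num_heads=8, batch_first=True)
out, _ = attn(patches, patches, patches)       # global self-attention across all patches
print(out.shape)                               # torch.Size([1, 196, 768])
```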
In the next three modules, we will discuss the specific networks that solve the big problems in CV: classification, object detection, and segmentation. We will cover the Vision Transformer (ViT) from Google, the Shifted Window Transformer (Swin) from Microsoft, the Detection Transformer (DETR) from Facebook Research, the Segmentation Transformer (SETR), and many others. We will then discuss the application of transformers to video processing, using spatio-temporal transformers applied to moving object detection, along with a multi-task learning setup.
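For readers who want a preview of how one of these detectors is driven in code, below is a hedged sketch of running DETR through the transformers library; facebook/detr-resnet-50 is a public Hugging Face checkpoint, the image file name is a placeholder, and exact class or method names may vary with the library version:

```python
import torch
from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

image = Image.open("street_scene.jpg")           # any local RGB image (hypothetical file name)
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw logits and boxes into (label, score, box) triples above a 0.9 threshold
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
detections = processor.post_process_object_detection(
    outputs, target_sizes=target_sizes, threshold=0.9
)[0]
for score, label, box in zip(detections["scores"], detections["labels"], detections["boxes"]):
    print(model.config.id2label[label.item()], round(score.item(), 3), box.tolist())
```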
Finally, we will show how these pre-trained architectures can easily be applied in practice with the popular Hugging Face transformers library, using its pipeline interface.
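As a taste of that workflow, a minimal pipeline call might look like the sketch below; google/vit-base-patch16-224 is a public ViT checkpoint on the Hugging Face hub, and the image path is a placeholder:

```python
from transformers import pipeline

# Build an image-classification pipeline backed by a pre-trained ViT checkpoint
classifier = pipeline("image-classification", model="google/vit-base-patch16-224")

# Accepts a local path, URL, or PIL image; returns the top labels with scores
print(classifier("cat.jpg"))
```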
Who this course is for:
- Intermediate to Advanced CV Engineers
- Intermediate to Advanced CV Researchers
How to Get this course FREE?
Get a 100% discount on Udemy courses by clicking the Apply Here button. This course's coupon code is automatically added to the Apply Here button.
Apply this coupon: 6EF8CF3F64544EDD9DB6 (for a 100% discount)
For the latest Udemy course coupons, join our official free Telegram group: https://t.me/freecourseforall
Note: A maximum of 1000 learners can use the promo code and get this Udemy course 100% free. After that, you will get the course at a discounted price.