Translation: How to Read Deep Learning Paper as a Software Engineer

Transcript Translation

How to Read Deep Learning Paper as a Software Engineer - https://www.youtube.com/watch?v=nL7lAo95D-o

Deep learning papers can look daunting to read, especially if you don't have a strong theoretical background in machine learning or deep learning. Some papers can be so dense with jargon, formulas and magical-looking results that you might feel you are missing ten years' worth of background to even start looking at the title. In this video, I'll show you how I read most deep learning papers easily, even in a deep learning subfield that is new to me.

To start out, there are two ways to read a paper, depending on your goals. The first way is to kind of understand what was done, but not in depth, usually to use the results of the research or to understand another paper that uses these results. This way of reading relies heavily on trusting that the paper is right. It's not too difficult to do: if you read the abstract and conclusion of a paper, you can get the gist of it. It's a bit like knowing roughly what React is doing, but being more interested in using it than in understanding how it's implemented, like the shadow DOM.

The second way to read a paper is where people get lost. The goal is to understand a research project so deeply that you can reproduce its results. In that setup, there is little trust and you actually need to understand the paper's methods. This way of reading papers is what I'll be covering today.

Before we start, let me preface it with a warning about deep learning research in general. It's a highly empirical field, meaning a lot of things get discovered just because they were tried. It also means that whenever you read a deep learning paper, you need to understand the context in which it was written. Most of the stuff used within can be slightly wrong, or at least the reasoning for why it worked is wrong. Be acutely aware of that. It's easier to see this in very old deep learning papers, with the knowledge that we have now.

Okay, so the full method I use looks somewhat like this. There are about seven steps.
It involves multiple reads, three of them, which is very normal for any research paper. Papers in general are not linear; they are a structure with a lot of interdependencies.

The first step to reading a paper is to have enough contextual information about it to orient yourself properly. Going into a new field is highly disorienting. If you don't have context, you will get lost. Researchers with lots of experience in the field already have the context, which is why they can grasp a paper easily. My favorite technique to quickly absorb all the context before reading a paper is the following. I find two or three blog posts summarizing the main findings of the paper. They don't have to be well written or right; they just help me orient myself with the nomenclature and things like that. After that, I find five or ten diverse videos on the paper's main findings. Again, you are going for a diverse set of opinions about what is important in the paper. I would suggest varying the length; anywhere between 6 and 60 minutes is good. The first material you ingest on the paper's subject will feel disorienting. However, after a few you will start to notice patterns that tell you where the real focus of the paper lies and where its strengths are. This also means that as you read more papers in the field, it will get easier and easier to absorb the context quickly.

Now that you have the context in your head, you can open up the paper for a first casual read. I start with a linear read from end to end, where I note down on a separate sheet every element that I don't understand. Everything I understand is fine; I just keep reading over it. I don't research the things I don't understand in depth during my first read; I just make a note of them. There are usually five kinds of things in a paper that I take note of because they make it harder to understand.
These are unknowns I need to categorize and elucidate to make sense of the whole research. The first category of unknowns is critical elements of the paper that aren't explained by the authors and are assumed to be understood. The second type is critical topics that are explained in the paper but need some work to grasp fully. The third type is elements that the authors didn't fully understand, so they kind of waved them away. This type of unknown is usually very hard to get an answer for and is a core target for criticism in the years following publication. The fourth type is, surprisingly, stuff that the authors got wrong. It does happen. It doesn't mean the paper is garbage, though; sometimes things work out anyway. The fifth type is elements of the paper that the reviewers wanted added before publication but that provide no value whatsoever to the paper. These usually stick out.

Once you are done noting down all of the unknowns, you are ready to start filling some gaps. Try to sort the stuff you didn't understand into these five categories. Only tackle the external unknowns for now: research whatever you don't understand and make sure you come back to the paper with a better understanding. Don't get too lost here. If it's a specific method the paper makes use of, leverage the first way of reading a paper, which is to superficially understand the intuition behind it, or better yet, just read a blog post that contains the technical information you need.

Now comes the second read-through. Its whole purpose is to reduce the number of internal unknowns to a minimum. In that read, I like to start with the abstract and the introduction to understand what is set up in the paper. Try to understand the paper's motivation, because it will help with understanding why the methods are the way they are. Then you jump to the discussion and conclusion sections at the end of the paper to see the end point of the experiment.
It's important that you deeply understand what the authors got out of the research. Generally, the conclusion follows from the motivation at the start in a connected fashion, since the paper was not written introduction first, with the conclusion reached six months later. It was all written together in a recursive fashion. Finally, I take each of the figures and try to understand what everything in it means: the axes, what the graphs are highlighting, the units, all of this stuff. At this point I feel a very deep connection with the whole research. I use the introduction and the conclusion as anchors and try to fit the results shown in the figures into a sort of logical timeline. If something feels disconnected, it's a hint that there may be something else I didn't understand, apart from the specific methods or results the authors got.

Now there are two ways to finish up the second read-through. Either you read through the technical details slowly, making sure you understand each step conceptually, including the formulas, or you first open up the code base for the paper, if it's provided, and start mapping out each of its sections. I like the second way because, as a software engineer, it will give you confidence that this stuff is not magic. At the end of the day, it's code that was written and run to generate whatever results were reported. Do a good pass on the code base until you feel reasonably comfortable about where things are and have cross-referenced some of the names in the code with those in the paper. A disclaimer, though: research code is usually very messy and doesn't have a standardized structure, so keep that in mind.

Once you are done with the code, you are ready to go through the methods and results sections in depth. Go slowly and carefully, and understand the methodology behind the paper step by step. There are a few recurring elements in a deep learning experiment that are almost always there.
Understanding that structure helps a lot in knowing how to navigate the methods section. First, there is the data that is fed to the deep learning model. The data has a specific structure that dictates the choice of architecture for the model. As you dive into a specific field, you will see the same kinds of datasets used over and over again. Then there is the architecture of the model itself: the layers used, the connections between layers, and so on. There might be a lot of different sub-elements in an architecture that you don't necessarily understand at first, and that's okay. After that comes the type of gradient descent optimizer and the training regimen used. There are usually a lot of empirical details hidden here, so it's fine if the why isn't always 100% clear in the training. Finally, there is the pipeline from the raw data to whatever result is generated. Sometimes a paper will combine multiple architectures in specific ways, like with an ensemble method.

In any given piece of research, there are usually only a few tweaks at the core. As you read more papers, you will start to spot them easily. Break each of the steps in the methods down and try to understand the logical flow. If something feels weird or surprising, check the code or other resources for the reasons why the authors did it the way they did. Finally, go slowly through the results section and fill in the blanks in your timeline between the introduction and the conclusion with the numerical or qualitative results.
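
As a rough illustration of the four recurring elements described above (data, architecture, optimizer and training regimen, pipeline), here is a minimal sketch in plain Python. It is not taken from the video or from any paper: the toy dataset, the `LinearModel` class, and all function names and hyperparameters are assumptions chosen only to keep the example self-contained. Real research code will use a deep learning framework and be far messier, but the same four parts are usually there to be found.

```python
import random


# 1. Data: a toy dataset of (x, y) pairs with y = 2x + 1.
#    (Hypothetical stand-in for whatever dataset a paper uses.)
def make_data(n=50, seed=0):
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + 1.0) for x in xs]


# 2. Architecture: a single linear unit, y_hat = w * x + b.
class LinearModel:
    def __init__(self):
        self.w = 0.0
        self.b = 0.0

    def predict(self, x):
        return self.w * x + self.b


# 3. Optimizer and training regimen: full-batch gradient descent
#    on the mean squared error, with a fixed learning rate.
def mse(model, data):
    return sum((model.predict(x) - y) ** 2 for x, y in data) / len(data)


def train(model, data, lr=0.1, epochs=200):
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (model.predict(x) - y) * x for x, y in data) / n
        grad_b = sum(2 * (model.predict(x) - y) for x, y in data) / n
        model.w -= lr * grad_w
        model.b -= lr * grad_b
    return model


# 4. Pipeline: raw data -> trained model -> reported result.
def run_experiment():
    data = make_data()
    model = LinearModel()
    loss_before = mse(model, data)
    train(model, data)
    loss_after = mse(model, data)
    return model, loss_before, loss_after


if __name__ == "__main__":
    model, before, after = run_experiment()
    print(f"loss before: {before:.4f}, loss after: {after:.6f}")
    print(f"learned w={model.w:.3f}, b={model.b:.3f}")
```

When mapping a paper's code base, it can help to label each file or function with one of these four roles before reading the methods section in depth.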
And once done, I usually like to do a final read from start to finish and look at my notes to ensure I completely understand the flow. Again, if something feels weird during that read, it might be that I still don't understand something explained within the paper, or that the authors didn't explain something fully for a literal lack of knowledge, or that the authors were flat-out wrong, or that a reviewer asked them to add some superfluous garbage information, which feels disjointed. Anyway, in all cases, after this third read, you should understand the paper well enough to take on a reproducibility effort if need be.

I hope you enjoyed the video. Don't forget to like if this was the case and leave a comment if you have any questions. I'm here to help. Have a great week, everyone, and see you in the next video.
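
The note-taking scheme described in the transcript (log every unknown during the first read without researching it, sort the unknowns into five categories afterwards, then work through the external ones before the second read) could be sketched as a small data structure. This is only one possible way to organize such notes. The terms "external" and "internal" come from the transcript; every other name here (`UnknownKind`, `Unknown`, `ReadingNotes`, and the example notes) is hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class UnknownKind(Enum):
    EXTERNAL = 1        # critical, but assumed known and never explained
    INTERNAL = 2        # explained in the paper, but needs work to grasp
    HAND_WAVED = 3      # the authors did not fully understand it themselves
    AUTHOR_ERROR = 4    # something the authors got wrong
    REVIEWER_ADDED = 5  # added for reviewers, adds no value to the paper


@dataclass
class Unknown:
    note: str
    kind: Optional[UnknownKind] = None  # left empty during the first read
    resolved: bool = False


@dataclass
class ReadingNotes:
    unknowns: List[Unknown] = field(default_factory=list)

    def log(self, note):
        """First read: write the unknown down without researching it."""
        item = Unknown(note)
        self.unknowns.append(item)
        return item

    def todo(self, kind):
        """Unresolved unknowns of one category."""
        return [u for u in self.unknowns if u.kind is kind and not u.resolved]


# Example usage with made-up notes.
notes = ReadingNotes()
a = notes.log("What is layer normalization?")
b = notes.log("Why this particular learning-rate schedule?")

# After the first read: categorize, then tackle only the external unknowns.
a.kind = UnknownKind.EXTERNAL
b.kind = UnknownKind.HAND_WAVED
print([u.note for u in notes.todo(UnknownKind.EXTERNAL)])
```

Keeping the category optional mirrors the order in the transcript: notes are written first and categorized only after the first read is finished.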

How to Read Deep Learning Paper as a Software Engineer - https://www.youtube.com/watch?v=nL7lAo95D-o

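Deep learning papers can be hard to read, especially if you don't have a strong theoretical background in machine learning or deep learning. Some papers are so dense with jargon, formulas and magical-looking results that you may feel you lack ten years' worth of background to even start looking at the title. In this video, I'll show you how I read most deep learning papers easily, even in a deep learning subfield that is new to me.

To start out, there are two ways to read a paper, depending on your goals. The first way is to understand what was done to some degree, but not in depth, usually in order to use the results of the research or to understand another paper that uses these results. This way of reading relies heavily on trusting that the paper is right. It's not that difficult: if you read the abstract and conclusion of a paper, you can get the gist of it. It's a bit like knowing roughly what React does, but being more interested in using it than in understanding how it's implemented, like the shadow DOM.

The second way to read a paper is where people get lost. The goal is to understand a research project so deeply that you can reproduce its results. In that setup, there is little trust and you actually need to understand the paper's methods. This way of reading papers is what I'll be covering today.

Before we start, let me begin with a warning about deep learning research in general. It's a highly empirical field, so a lot of things get discovered just because they were tried. It also means that whenever you read a deep learning paper, you need to understand the context in which it was written. Most of what is used within can be slightly wrong, or at least the reasoning for why it worked is wrong. Be acutely aware of that. It's easier to see this in very old deep learning papers, with the knowledge that we have now.

Okay, so the full method I use looks like this. There are about seven steps. It involves multiple reads, three of them, which is very normal for any research paper. Papers in general are not linear; they are a structure with a lot of interdependencies.

The first step to reading a paper is to have enough contextual information to orient yourself properly. Going into a new field is highly disorienting. If you don't have context, you will get lost. Researchers with lots of experience in the field already have the context, which is why they can grasp a paper easily. My favorite technique to quickly absorb all the context before reading a paper is the following. I find two or three blog posts summarizing the main findings of the paper. They don't have to be well written or right; they just help me orient myself with the nomenclature and things like that. After that, I find five or ten diverse videos on the paper's main findings. Again, you are going for a diverse set of opinions about what is important in the paper. I would suggest varying the length; anywhere between 6 and 60 minutes is good. The first material you take in on the paper's subject will feel disorienting. However, after a few you will start to notice patterns that tell you where the real focus of the paper lies and where its strengths are. This also means that as you read more papers in the field, it gets easier and easier to absorb the context quickly.

Now that you have the context in your head, you can open up the paper for a first casual read. I start with a linear read from start to finish, noting down on a separate sheet every element that I don't understand. Everything I understand is fine; I just keep reading. I don't research the things I don't understand in depth during my first read; I just write them down. There are usually five kinds of things in a paper that I take note of because they make it harder to understand. These are unknowns I need to categorize and elucidate to make sense of the whole research. The first category of unknowns is critical elements of the paper that aren't explained by the authors and are assumed to be understood. The second type is critical topics that are explained in the paper but need some work to grasp fully.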
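The third type of unknown is elements that the authors didn't fully understand, so they waved them away. This type of unknown is usually very hard to get an answer for and is a core target for criticism in the years following publication. The fourth type is, surprisingly, things that the authors got wrong. It does happen. It doesn't mean the paper is garbage, though; sometimes things work out anyway. The fifth type is elements of the paper that the reviewers wanted added before publication but that provide no value whatsoever to the paper. These usually stick out.

Once you are done noting down all of the unknowns, you are ready to start filling some gaps. Try to sort what you didn't understand into these five categories. Only tackle the external unknowns for now: research whatever you don't understand and come back to the paper with a better understanding. Don't get too lost here. If it's a specific method the paper makes use of, leverage the first way of reading a paper, which is to superficially understand the intuition behind it, or better yet, just read a blog post that contains the technical information you need.

Now comes the second read-through. Its whole purpose is to reduce the number of internal unknowns to a minimum. In that read, I like to start with the abstract and the introduction to understand what is set up in the paper. Try to understand the paper's motivation, because it will help with understanding why the methods are the way they are. Then you jump to the discussion and conclusion sections at the end of the paper to see the end point of the experiment. It's important that you deeply understand what the authors got out of the research. Generally, the conclusion follows from the motivation at the start in a connected fashion, since the paper was not written introduction first, with the conclusion reached six months later. It was all written together in a recursive fashion. Finally, I take each of the figures and try to understand what everything in it means: the axes, what the graphs are highlighting, the units, all of it. At this point I feel a very deep connection with the whole research. I use the introduction and the conclusion as anchors and try to fit the results shown in the figures into a sort of logical timeline. If something feels disconnected, it's a hint that there may be something else I didn't understand, apart from the specific methods or results the authors got.

Now there are two ways to finish up the second read-through. Either you read through the technical details slowly, making sure you understand each step conceptually, including the formulas, or you first open up the code base for the paper, if it's provided, and start mapping out each of its sections. I like the second way because, as a software engineer, it will give you confidence that this stuff is not magic. At the end of the day, it's code that was written and run to generate whatever results were reported. Do a good pass on the code base until you feel reasonably comfortable about where things are and have cross-referenced some of the names in the code with those in the paper. A disclaimer, though: research code is usually very messy and doesn't have a standardized structure, so keep that in mind.

Once you are done with the code, you are ready to go through the methods and results sections in depth. Go slowly and carefully, and understand the methodology behind the paper step by step. There are a few recurring elements in a deep learning experiment that are almost always there. Understanding that structure helps a lot in knowing how to navigate the methods section. First, there is the data that is fed to the deep learning model. The data has a specific structure that dictates the choice of architecture for the model. As you dive into a specific field, you will see the same kinds of datasets used over and over again. Then there is the architecture of the model itself: the layers used, the connections between layers, and so on. There might be a lot of different sub-elements in an architecture that you don't necessarily understand at first, and that's okay. After that comes the type of gradient descent optimizer and the training regimen used.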
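There are usually a lot of empirical details hidden here, so it's fine if the why isn't always 100% clear in the training. Finally, there is the pipeline from the raw data to whatever result is generated. Sometimes a paper will combine multiple architectures in specific ways, like with an ensemble method.

In any given piece of research, there are usually only a few tweaks at the core. As you read more papers, you will start to spot them easily. Break each of the steps in the methods down and try to understand the logical flow. If something feels weird or surprising, check the code or other resources for the reasons why the authors did it the way they did. Finally, go slowly through the results section and fill in the blanks in your timeline between the introduction and the conclusion with the numerical or qualitative results.

And once done, I usually like to do a final read from start to finish and look at my notes to ensure I completely understand the flow. Again, if something feels weird during that read, it might be that I still don't understand something explained within the paper, or that the authors didn't explain something fully for a literal lack of knowledge, or that the authors were flat-out wrong, or that a reviewer asked them to add some superfluous garbage information, which feels disjointed. In all cases, after this third read, you should understand the paper well enough to take on a reproducibility effort if need be.

I hope you enjoyed the video. Don't forget to like if this was the case and leave a comment if you have any questions. I'm here to help. Have a great week, everyone, and see you in the next video.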