How 🤗 Transformers solve tasks
In What 🤗 Transformers can do, you learned about natural language processing (NLP), speech and audio, and computer vision tasks, along with some of their important applications. This page looks closely at how models solve these tasks and explains what happens inside the model. There are many ways to solve a given task, and some models implement particular techniques or approach a task from a new angle, but for Transformer models the general idea is the same. Thanks to its flexible architecture, most models are a variant of an encoder, a decoder, or an encoder-decoder structure. In addition to Transformer models, our library also includes several convolutional neural networks (CNNs), which are still used today for computer vision tasks. We also explain how a modern CNN works.
To explain how tasks are solved, we walk through what happens inside each of the following models to produce useful predictions:
- Wav2Vec2 for audio classification and automatic speech recognition (ASR)
- Vision Transformer (ViT) and ConvNeXT for image classification
- DETR for object detection
- Mask2Former for image segmentation
- GLPN for depth estimation
- BERT for NLP tasks that use an encoder, such as text classification, token classification, and question answering
- GPT-2 for NLP tasks that use a decoder, such as text generation
- BART for NLP tasks that use an encoder-decoder, such as summarization and translation
Before going further, it helps to have some basic knowledge of the original Transformer architecture. Knowing how encoders, decoders, and attention work will help you understand how the different Transformer models work. If you are just getting started or need a refresher, check out our course for more information!
Speech and audio
Wav2Vec2 is a self-supervised model pretrained on unlabeled speech data and finetuned on labeled data for audio classification and automatic speech recognition.
This model has four main components:
- A feature encoder takes the raw audio waveform, normalizes it to zero mean and unit variance, and converts it into a sequence of feature vectors, one for every 20ms of audio.
- Waveforms are continuous by nature, so they cannot be divided into separate units the way a sequence of text can be split into words. For this reason, the feature vectors are passed to a quantization module, which tries to learn discrete speech units. Each speech unit is chosen from a collection of codewords known as a codebook (you can think of it as the vocabulary). The vector, or speech unit, that best represents the continuous audio input is selected from the codebook and forwarded through the model.
- About half of the feature vectors are randomly masked, and the masked feature vectors are fed to a context network, which is a Transformer encoder that also adds relative positional embeddings.
- The pretraining objective of the context network is a contrastive task. The model has to pick the true quantized speech representation of a masked position out of a set of false ones, which encourages it to find the most similar context vector and quantized speech unit (the target label).
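The contrastive task in the last step can be sketched in a few lines. This is only an illustration under stated assumptions: cosine similarity as the similarity measure, a softmax over one true unit and two distractors, and a temperature of 0.1 are choices made for this example, and `contrastive_loss` is a hypothetical helper, not a library function.

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def contrastive_loss(context, true_unit, distractors, temperature=0.1):
    # Score the true quantized unit against the distractors and return
    # the negative log-probability assigned to the true unit.
    candidates = [true_unit] + distractors
    scores = [cosine(context, c) / temperature for c in candidates]
    top = max(scores)
    log_denominator = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_denominator - scores[0]

context = [1.0, 0.0]                      # context vector at a masked position
true_unit = [0.9, 0.1]                    # quantized unit for that position
distractors = [[0.0, 1.0], [-1.0, 0.0]]   # quantized units from other positions

loss_good = contrastive_loss(context, true_unit, distractors)
loss_bad = contrastive_loss(context, distractors[0], [true_unit, distractors[1]])
```

The loss is small when the context vector is closest to the true unit (`loss_good`) and large when a distractor is closer (`loss_bad`), which is the pressure that pushes the context network toward useful representations.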
Now that Wav2Vec2 is pretrained, you can finetune it on your data for audio classification or automatic speech recognition!
Audio classification
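To use the pretrained model for audio classification, add a sequence classification head on top of the base Wav2Vec2 model. The classification head is a linear layer that accepts the encoder's hidden states, which represent the features learned from each audio frame. Because the number of frames, and therefore the length of the hidden-state sequence, varies from clip to clip, the hidden states are first pooled into a single fixed-length vector and then transformed into logits over the class labels. The cross-entropy loss is calculated between the logits and the target to find the most likely class.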
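A minimal sketch of that head, assuming mean pooling over frames (the text above only says the hidden states are pooled) and tiny hand-written weights chosen for illustration:

```python
import math

def mean_pool(hidden_states):
    # Average over the time axis so clips of any length give one vector.
    n = len(hidden_states)
    width = len(hidden_states[0])
    return [sum(frame[i] for frame in hidden_states) / n for i in range(width)]

def linear(x, weights, bias):
    # One output logit per row of the weight matrix.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def cross_entropy(logits, target):
    # Negative log-softmax of the target class, computed stably.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[target]

short_clip = [[1.0, 0.0], [3.0, 2.0]]                          # 2 frames
long_clip = [[1.0, 0.0], [3.0, 2.0], [2.0, 1.0], [2.0, 1.0]]   # 4 frames
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]               # 3 classes (illustrative)
bias = [0.0, 0.0, 0.0]

pooled_short = mean_pool(short_clip)
pooled_long = mean_pool(long_clip)
logits = linear(pooled_long, weights, bias)
predicted = max(range(len(logits)), key=logits.__getitem__)
loss = cross_entropy(logits, target=0)
```

Both clips pool to a vector of the same size even though they have different numbers of frames, so the same linear layer can classify either one.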
Ready to try audio classification? Check out our complete audio classification guide to learn how to finetune Wav2Vec2 and use it for inference!
Automatic speech recognition
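To use the pretrained model for automatic speech recognition, add a language modeling head on top of the base Wav2Vec2 model for connectionist temporal classification (CTC). The language modeling head is a linear layer that accepts the encoder's hidden states and transforms them into logits. Each logit represents a token class (the number of tokens comes from the task vocabulary). The CTC loss is calculated between the logits and the targets to find the most likely sequence of tokens, which is then decoded into a transcription.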
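The final decoding step can be illustrated with greedy CTC decoding: take the best token per frame, collapse consecutive repeats, then drop the blank token. This is a sketch only; the four-symbol vocabulary, the `_` blank symbol, and the logits are made up for the example, and real systems may use beam search instead of the greedy rule shown here.

```python
from itertools import groupby

BLANK = "_"  # assumed blank symbol for this example

def ctc_greedy_decode(frame_logits, vocab):
    # 1. Pick the most likely token index for every audio frame.
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_logits]
    # 2. Collapse runs of the same index into a single index.
    collapsed = [index for index, _ in groupby(best)]
    # 3. Remove blanks and join the remaining tokens.
    return "".join(vocab[i] for i in collapsed if vocab[i] != BLANK)

vocab = [BLANK, "c", "a", "t"]
frame_logits = [
    [0.1, 2.0, 0.0, 0.0],  # c
    [0.1, 1.5, 0.2, 0.0],  # c (repeat, collapsed)
    [3.0, 0.0, 0.0, 0.0],  # blank
    [0.0, 0.0, 2.5, 0.1],  # a
    [0.0, 0.1, 0.2, 1.9],  # t
    [0.0, 0.0, 0.3, 2.2],  # t (repeat, collapsed)
]
transcription = ctc_greedy_decode(frame_logits, vocab)
```

The blank token is what lets CTC emit genuinely doubled letters: two identical tokens separated by a blank survive the collapse step, while two adjacent identical tokens do not.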
Ready to try automatic speech recognition? Check out our complete automatic speech recognition guide to learn how to finetune Wav2Vec2 and use it for inference!
Computer vision
There are two ways to approach computer vision tasks:
- Split an image into a sequence of patches and process them in parallel with a Transformer.
- Use a modern CNN, such as ConvNeXT, which relies on convolutional layers but adopts modern network designs.
A third approach mixes Transformers with convolutions (for example, Convolutional Vision Transformer or LeViT). We do not discuss those here because they simply combine the two approaches examined on this page.
ViT and ConvNeXT are commonly used for image classification, but for other vision tasks such as object detection, segmentation, and depth estimation, DETR, Mask2Former, and GLPN respectively are better suited.
Image classification
Both ViT and ConvNeXT can be used for image classification. The main difference is that ViT uses an attention mechanism while ConvNeXT uses convolutions.
Transformer
ViT replaces convolutions entirely with a pure Transformer architecture. If you are familiar with the original Transformer, you are already most of the way to understanding ViT.
The main change ViT introduced is in how images are fed to a Transformer:
- An image is split into square, non-overlapping patches, each of which is turned into a vector, or patch embedding. The patch embeddings are generated by a 2D convolutional layer that creates the proper input dimensions (for a base Transformer, 768 values for each patch embedding). A 224x224 pixel image can be split into 196 patches of 16x16 pixels. Just as text is tokenized into words, an image is "tokenized" into a sequence of patches.
- A learnable embedding, a special `[CLS]` token, is added to the beginning of the patch embeddings, just like in BERT. The final hidden state of the `[CLS]` token is used as the input to the attached classification head; the other outputs are ignored. This token helps the model learn how to encode a representation of the image.
- The last thing added to the patch and learnable embeddings is the position embeddings, because the model does not know how the image patches are ordered. The position embeddings are also learnable and have the same size as the patch embeddings. Finally, all of the embeddings are passed to the Transformer encoder.
- The output, specifically only the output for the `[CLS]` token, is passed to a multilayer perceptron (MLP) head. ViT's pretraining objective is simply classification. Like other classification heads, the MLP head converts the output into logits over the class labels and calculates the cross-entropy loss to find the most likely class.
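The patch arithmetic above is easy to check by hand. The sketch below splits a single-channel 224x224 image into 16x16 patches; the single channel and the synthetic pixel values are simplifying assumptions, and `patchify` is a hypothetical helper (in ViT the split and the projection to embeddings are done together by a convolutional layer).

```python
def patchify(image, patch_size):
    # Split a square-grid image (list of rows) into flattened,
    # non-overlapping patches, scanning left to right, top to bottom.
    height, width = len(image), len(image[0])
    assert height % patch_size == 0 and width % patch_size == 0
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches

# Synthetic single-channel image, values chosen only so patches differ.
image = [[(row * 224 + col) % 256 for col in range(224)] for row in range(224)]
patches = patchify(image, 16)

num_patches = len(patches)            # (224 / 16) ** 2
sequence_length = num_patches + 1     # one extra position for the [CLS] token
```

Each flattened patch would then be projected to the model's embedding size, and a position embedding of the same size would be added for each of the `sequence_length` positions.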
Ready to try image classification? Check out our complete image classification guide to learn how to finetune ViT and use it for inference!
CNN
This section explains convolutions only briefly, so it helps to already understand how they change the shape and size of an image. If you are unfamiliar with convolutions, check out the Convolution Neural Networks chapter from the fastai book!
ConvNeXT is a CNN architecture that adopts new, modern network designs to improve performance. However, convolutions are still at the core of the model. At a high level, a convolution is an operation in which a small matrix (the kernel) is multiplied by a small window of the image pixels. It computes a feature from that window, such as a particular texture or the curvature of a line, and then slides over to the next window of pixels. The distance the convolution moves each time is known as the stride.
A basic convolution without padding or strides, taken from [Convolution Arithmetic for Deep Learning](https://arxiv.org/abs/1603.07285).
You can feed this output to another convolutional layer, and with each successive layer the network learns more complex and abstract things, such as hotdogs or rockets. Between convolutional layers, it is common to add a pooling layer to reduce dimensionality and make the model more robust to variations in a feature's position.
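To make the sliding-window description concrete, here is a small sketch of a 2D convolution on a square image with no padding, together with the standard output-size formula. The 4x4 image and 2x2 kernel are made-up values, and, as in most deep learning libraries, the kernel is applied without flipping.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    # Number of window positions along one dimension.
    return (input_size + 2 * padding - kernel_size) // stride + 1

def conv2d(image, kernel, stride=1):
    # Slide a square kernel over a square image (no padding) and
    # sum the element-wise products inside each window.
    k = len(kernel)
    out = conv_output_size(len(image), k, stride)
    return [
        [
            sum(
                image[i * stride + a][j * stride + b] * kernel[a][b]
                for a in range(k)
                for b in range(k)
            )
            for j in range(out)
        ]
        for i in range(out)
    ]

image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
kernel = [[1, 0], [0, 1]]  # responds to values repeated along the diagonal

feature_map = conv2d(image, kernel)             # 3x3 output, stride 1
strided_map = conv2d(image, kernel, stride=2)   # 2x2 output, stride 2
```

A larger stride visits fewer windows, so the output shrinks: with stride 2 the 4x4 input yields a 2x2 feature map instead of 3x3.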
ConvNeXT modernizes a CNN in five ways:
- It changes the number of blocks in each stage and "patchifies" an image with a larger stride and a matching kernel size. The non-overlapping sliding window makes this patchifying strategy similar to how ViT splits an image into patches.
- A bottleneck layer shrinks the number of channels and then restores it, because a 1x1 convolution is faster and lets you increase the depth. An inverted bottleneck does the opposite: it expands the number of channels and then shrinks them, which is more memory efficient.
- It replaces the typical 3x3 convolutional layer in the bottleneck layer with a depthwise convolution, which applies a convolution to each input channel separately and then stacks the results back together at the end. This allows the network to be made wider for better performance.
- ViT has a global receptive field, which means it can see more of an image at once thanks to its attention mechanism. ConvNeXT tries to replicate this effect by increasing the kernel size to 7x7.
- ConvNeXT also makes several layer design changes that imitate Transformer models. There are fewer activation and normalization layers, the activation function is switched from ReLU to GELU, and LayerNorm is used instead of BatchNorm.
The output from the convolution blocks is passed to a classification head, which converts the output into logits and calculates the cross-entropy loss to find the most likely label.
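A quick weight count shows why depthwise convolutions make a 7x7 kernel affordable. The sketch below counts weights only (biases ignored); the channel width of 96 and the expansion factor of 4 are illustrative assumptions, not values taken from the text above.

```python
def standard_conv_params(c_in, c_out, k):
    # Every output channel has a k x k filter for every input channel.
    return c_in * c_out * k * k

def depthwise_conv_params(channels, k):
    # One k x k filter per channel; channels are not mixed.
    return channels * k * k

def pointwise_conv_params(c_in, c_out):
    # A 1x1 convolution mixes channels without looking at neighbours.
    return c_in * c_out

channels = 96    # illustrative width
expansion = 4    # illustrative inverted-bottleneck expansion factor

standard_3x3 = standard_conv_params(channels, channels, 3)
depthwise_7x7 = depthwise_conv_params(channels, 7)
inverted_bottleneck = (
    pointwise_conv_params(channels, expansion * channels)    # expand
    + pointwise_conv_params(expansion * channels, channels)  # shrink
)
```

Under these assumptions the 7x7 depthwise layer needs far fewer weights than a standard 3x3 convolution over the same channels, while the 1x1 layers of the inverted bottleneck carry most of the channel-mixing cost.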
Object detection
DETR, DEtection TRansformer, is an end-to-end object detection model that combines a CNN with a Transformer encoder-decoder.
A pretrained CNN backbone takes an image, represented by its pixel values, and creates a low-resolution feature map of it. A 1x1 convolution is applied to the feature map to reduce dimensionality, producing a new feature map with a high-level image representation. Since the Transformer is a sequential model, the feature map is flattened into a sequence of feature vectors that are combined with positional embeddings.
The feature vectors are passed to the encoder, which learns the image representations using its attention layers. Next, the encoder hidden states are combined with object queries in the decoder. Object queries are learned embeddings that focus on different regions of an image, and they are updated as they progress through each attention layer. The decoder hidden states are passed to a feedforward network that predicts the bounding box coordinates and class label for each object query, or `no object` if there isn't one.
DETR decodes all object queries in parallel to output N final predictions, where N is the number of queries. Unlike a typical autoregressive model that predicts one element at a time, object detection is a set prediction task (`bounding box`, `class label`) that makes N predictions in a single pass.
During training, DETR uses a bipartite matching loss to compare a fixed number of predictions with a fixed set of ground truth labels. If there are fewer ground truth labels than the N predictions, the set is padded with a `no object` class. This loss function encourages DETR to find a one-to-one assignment between the predictions and the ground truth labels. If either the bounding box or the class label is incorrect, a loss is incurred. Likewise, DETR is penalized if it predicts an object that does not exist. This encourages DETR to find the other objects in an image instead of focusing on one very prominent object.
An object detection head is added on top of DETR to find the class label and the coordinates of the bounding box. The object detection head has two components: a linear layer that transforms the decoder hidden states into logits over the class labels, and an MLP that predicts the bounding box.
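The one-to-one assignment behind the bipartite matching loss can be sketched by brute force. This is an illustration under simplifying assumptions: the pair cost is `1 - p(class)` plus an L1 box distance, matching a prediction to `no object` costs nothing, and every permutation is tried, which only works for tiny N (DETR uses the Hungarian algorithm and a richer box cost). All names and numbers below are hypothetical.

```python
from itertools import permutations

NO_OBJECT = None  # padding entry for missing ground truth

def box_l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def pair_cost(pred, target):
    # Cost of assigning one prediction to one ground-truth entry.
    if target is NO_OBJECT:
        return 0.0
    class_cost = 1.0 - pred["probs"][target["label"]]
    return class_cost + box_l1(pred["box"], target["box"])

def match(predictions, targets):
    # Pad targets with "no object" up to N predictions
    # (assumes len(predictions) >= len(targets)), then search
    # all one-to-one assignments for the cheapest one.
    padded = targets + [NO_OBJECT] * (len(predictions) - len(targets))
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(padded))):
        cost = sum(pair_cost(p, padded[t]) for p, t in zip(predictions, perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    # Report (prediction index, target index or None for "no object").
    return [(i, t if padded[t] is not NO_OBJECT else None) for i, t in enumerate(best_perm)]

predictions = [
    {"probs": {"cat": 0.1, "dog": 0.8}, "box": (0.5, 0.5, 0.2, 0.2)},
    {"probs": {"cat": 0.9, "dog": 0.05}, "box": (0.2, 0.3, 0.1, 0.1)},
    {"probs": {"cat": 0.3, "dog": 0.3}, "box": (0.8, 0.8, 0.1, 0.1)},
]
targets = [
    {"label": "cat", "box": (0.2, 0.3, 0.1, 0.1)},
    {"label": "dog", "box": (0.5, 0.5, 0.2, 0.2)},
]
assignment = match(predictions, targets)
```

Here the first prediction is matched to the dog, the second to the cat, and the third to `no object`. Once the assignment is fixed, the training loss is computed pair by pair, so a prediction matched to `no object` is penalized if it confidently claims a class.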
Ready to try object detection? Check out our complete object detection guide to learn how to finetune DETR and use it for inference!
Image segmentation
Mask2Former is a universal architecture for solving all types of image segmentation tasks. Traditional segmentation models are typically tailored to one particular subtask of image segmentation, such as instance, semantic, or panoptic segmentation. Mask2Former frames each of these tasks as a mask classification problem. Mask classification groups pixels into N segments and predicts N masks, with their corresponding class labels, for a given image. This section explains how Mask2Former works, and at the end you can try finetuning SegFormer.
Mask2Former has three main components:
- A Swin backbone accepts an image and creates a low-resolution image feature map from 3 consecutive 3x3 convolutions.
- The feature map is passed to a pixel decoder, which gradually upsamples the low-resolution features into high-resolution per-pixel embeddings. The pixel decoder actually generates multi-scale features (containing both low- and high-resolution features) at resolutions of 1/32, 1/16, and 1/8 of the original image.
- Each of these feature maps of differing scales is fed, one at a time, to a Transformer decoder layer in order to capture small objects from the high-resolution features. The key to Mask2Former is the masked attention mechanism in the decoder. Unlike cross-attention, which can attend to the entire image, masked attention focuses only on a certain region of the image. This is faster and leads to better performance because the local features of an image are enough for the model to learn from.
Like DETR, Mask2Former also uses learned object queries and combines them with the image features from the pixel decoder to make a set prediction (`class label`, `mask prediction`). The decoder hidden states are passed to a linear layer and transformed into logits over the class labels. The cross-entropy loss is calculated between the logits and the ground truth class label to find the most likely one.
The mask predictions are generated by combining the pixel embeddings with the final decoder hidden states. The sigmoid cross-entropy and dice losses are calculated between the mask logits and the ground truth mask to find the most likely mask.
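The mask branch can be sketched as follows. The text above only says the pixel embeddings and decoder hidden states are "combined"; taking a dot product between each pixel embedding and one query's hidden state is an assumption made for this example, as are the four-pixel image, the smoothing constant `eps`, and all helper names.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mask_logits(pixel_embeddings, query_state):
    # One logit per pixel: similarity between the pixel and the query.
    return [sum(p * q for p, q in zip(pixel, query_state)) for pixel in pixel_embeddings]

def sigmoid_cross_entropy(logits, target_mask):
    # Mean binary cross-entropy over pixels.
    total = 0.0
    for z, t in zip(logits, target_mask):
        p = sigmoid(z)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(logits)

def dice_loss(logits, target_mask, eps=1.0):
    # 1 - overlap ratio between the predicted and true masks.
    probs = [sigmoid(z) for z in logits]
    intersection = sum(p * t for p, t in zip(probs, target_mask))
    return 1.0 - (2.0 * intersection + eps) / (sum(probs) + sum(target_mask) + eps)

# A flattened 2x2 image: 4 pixels with 2-dimensional embeddings.
pixel_embeddings = [[2.0, 0.0], [1.5, 0.5], [-2.0, 0.0], [-1.5, -0.5]]
query_state = [2.0, 0.0]

logits = mask_logits(pixel_embeddings, query_state)
matching_mask = [1, 1, 0, 0]     # agrees with the prediction
mismatched_mask = [0, 0, 1, 1]   # the opposite region

dice_match = dice_loss(logits, matching_mask)
dice_mismatch = dice_loss(logits, mismatched_mask)
bce_match = sigmoid_cross_entropy(logits, matching_mask)
bce_mismatch = sigmoid_cross_entropy(logits, mismatched_mask)
```

Both losses are low when the predicted mask overlaps the ground truth and high when it covers the wrong region; the dice term measures overlap as a whole, while the cross-entropy term scores each pixel independently.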
Ready to tackle segmentation tasks? Check out our complete image segmentation guide to learn how to finetune SegFormer and use it for inference!
Depth estimation
GLPN, Global-Local Path Network, is a Transformer-based depth estimation model that combines a SegFormer encoder with a lightweight decoder, a design suited to dense prediction tasks such as segmentation and depth estimation.
Like ViT, an image is split into a sequence of patches, except that these image patches are smaller, which is better for dense prediction tasks such as segmentation or depth estimation. The image patches are transformed into patch embeddings (see the image classification section for details on how patch embeddings are created), which are passed to the encoder.
The encoder accepts the patch embeddings and passes them through several encoder blocks. Each block consists of attention and Mix-FFN layers; the purpose of the latter is to provide positional information. At the end of each encoder block is a patch merging layer for creating hierarchical representations. The features of each group of neighboring patches are concatenated, and a linear layer is applied to the concatenated features to reduce the number of patches to a resolution of 1/4. This becomes the input to the next encoder block, where the whole process is repeated until you have image features at resolutions of 1/8, 1/16, and 1/32 of the original image.
The lightweight decoder takes the last feature map (1/32 scale) from the encoder and upsamples it to 1/16 scale. The feature is then passed to a Selective Feature Fusion (SFF) module, which selects and combines local and global features using an attention map for each feature, and is then upsampled to 1/8 scale. This process is repeated until the decoded features are the same size as the original image.
The decoded features are then used to make the final prediction for semantic segmentation, depth estimation, or another dense prediction task. For semantic segmentation, the features are transformed into logits over the number of classes and optimized with the cross-entropy loss. For depth estimation, the features are transformed into a depth map, and a mean absolute error (MAE) or mean squared error (MSE) loss is used.
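Two pieces of the description above reduce to simple arithmetic: the sizes of the hierarchical feature maps and the depth losses. The sketch below assumes a 480x640 input and tiny 2x2 depth maps purely for illustration; the helper names are hypothetical.

```python
def feature_resolutions(height, width, scales=(4, 8, 16, 32)):
    # Spatial size of the encoder feature map at each 1/scale resolution.
    return [(height // s, width // s) for s in scales]

def mean_absolute_error(predicted, target):
    errors = [abs(p - t) for p_row, t_row in zip(predicted, target) for p, t in zip(p_row, t_row)]
    return sum(errors) / len(errors)

def mean_squared_error(predicted, target):
    errors = [(p - t) ** 2 for p_row, t_row in zip(predicted, target) for p, t in zip(p_row, t_row)]
    return sum(errors) / len(errors)

pyramid = feature_resolutions(480, 640)

predicted_depth = [[1.0, 2.0], [3.0, 4.5]]   # per-pixel depth predictions
true_depth = [[1.0, 2.5], [2.0, 4.5]]        # ground-truth depth map

mae = mean_absolute_error(predicted_depth, true_depth)
mse = mean_squared_error(predicted_depth, true_depth)
```

MSE punishes the one-unit error at the bottom-left pixel more heavily relative to the half-unit error than MAE does, which is the usual trade-off between the two losses.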
Natural language processing
The Transformer was initially designed for machine translation, and since then it has become the default architecture for solving most NLP tasks. Some tasks lend themselves to the Transformer's encoder structure, while others are better suited to the decoder. Still other tasks make use of the Transformer's full encoder-decoder structure.
Text classification
BERT is an encoder-only model and was the first model to effectively implement deep bidirectionality, learning richer representations of text by attending to the words on both sides.
BERT uses WordPiece tokenization to generate a token embedding of the text. To distinguish a single sentence from a pair of sentences, a special `[SEP]` token is added. A special `[CLS]` token is added to the beginning of every text sequence, and the final output for the `[CLS]` token is used as the input to the classification head for classification tasks. BERT also adds a segment embedding that indicates whether a token belongs to the first or the second sentence of a pair.
BERT is pretrained with two objectives: masked language modeling and next-sentence prediction. In masked language modeling, some of the input tokens are randomly masked, and the model has to predict them. This solves the problem with bidirectionality, where the model could otherwise see all the words and trivially "predict" the next word. The final hidden states of the masked tokens are passed to a feedforward network with a softmax over the vocabulary to predict the masked word.
The second pretraining objective is next-sentence prediction. The model must predict whether sentence B follows sentence A. Half of the time sentence B is the next sentence, and the other half of the time sentence B is a random sentence. The prediction, whether it is the next sentence or not, is passed to a feedforward network with a softmax over the two classes (`IsNext` and `NotNext`).
The input embeddings are passed through multiple encoder layers to output the final hidden states.
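The input preparation described above can be sketched as follows. The 15% masking rate is the value used in the BERT paper rather than something stated in the text above, and the sketch is simplified: it always substitutes `[MASK]`, whereas BERT sometimes keeps the token or swaps in a random one. `mask_tokens` and `segment_ids` are hypothetical helpers.

```python
import random

MASK = "[MASK]"
SPECIAL = ("[CLS]", "[SEP]")

def segment_ids(tokens):
    # 0 for the first sentence (through its [SEP]), 1 for the second.
    ids, current = [], 0
    for token in tokens:
        ids.append(current)
        if token == "[SEP]":
            current = 1
    return ids

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    # Randomly replace ordinary tokens with [MASK]. The label is the
    # original token at masked positions and None elsewhere, so only
    # masked positions contribute to the loss.
    rng = random.Random(seed)
    inputs, labels = [], []
    for token in tokens:
        if token not in SPECIAL and rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(token)
        else:
            inputs.append(token)
            labels.append(None)
    return inputs, labels

tokens = ["[CLS]", "the", "cat", "sat", "[SEP]", "it", "purred", "[SEP]"]
segments = segment_ids(tokens)
inputs, labels = mask_tokens(tokens)
all_masked, all_labels = mask_tokens(tokens, mask_prob=1.0)
```

Because the model sees the tokens on both sides of each `[MASK]`, it can use bidirectional context without ever being shown the word it has to predict.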
To use the pretrained model for text classification, add a sequence classification head on top of the base BERT model. The sequence classification head is a linear layer that accepts the final hidden states and transforms them into logits. The cross-entropy loss is calculated between the logits and the target to find the most likely label.
Ready to try text classification? Check out our complete text classification guide to learn how to finetune DistilBERT and use it for inference!
Token classification
To use BERT for token classification tasks such as named entity recognition (NER), add a token classification head on top of the base BERT model. The token classification head is a linear layer that accepts the final hidden states and transforms them into logits. The cross-entropy loss is calculated between the logits and the label of each token to find the most likely label.
Ready to try token classification? Check out our complete token classification guide to learn how to finetune DistilBERT and use it for inference!
Question answering
To use BERT for question answering, add a span classification head on top of the base BERT model. This linear layer accepts the final hidden states and computes the span start and end logits for the text span that corresponds to the answer. The cross-entropy loss is calculated between the logits and the labeled positions to find the most likely span of text.
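At inference time, the start and end logits have to be turned into one span. A common approach, sketched here under assumptions, is to pick the pair with the highest combined score such that the end does not come before the start. The tokens, the logits, and the `max_answer_len` limit are invented for this example.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    # Search all (start, end) pairs with start <= end and a bounded
    # length, and keep the pair with the highest summed logit.
    best, best_score = None, float("-inf")
    for start, start_logit in enumerate(start_logits):
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_logit + end_logits[end]
            if score > best_score:
                best, best_score = (start, end), score
    return best

tokens = ["[CLS]", "who", "wrote", "it", "[SEP]", "jane", "austen", "wrote", "it", "[SEP]"]
start_logits = [0.1, 0.0, 0.0, 0.0, 0.0, 4.0, 1.0, 0.2, 0.0, 0.0]
end_logits = [0.1, 0.0, 0.0, 0.0, 0.0, 1.0, 5.0, 0.3, 0.0, 0.0]

start, end = best_span(start_logits, end_logits)
answer = " ".join(tokens[start:end + 1])
```

Scoring the pair jointly, rather than taking the best start and best end independently, rules out spans whose end precedes their start.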
Ready to try question answering? Check out our complete question answering guide to learn how to finetune DistilBERT and use it for inference!
💡 Notice how easy it is to use BERT for different tasks once it has been pretrained. All you need to do is add a specific head to the pretrained model to turn the hidden states into your desired output!
Text generation
GPT-2 is a decoder-only model pretrained on a large amount of text. Given a prompt, it can generate convincing text, and it can complete other NLP tasks such as question answering despite not being explicitly trained to do so.
GPT-2 uses byte pair encoding (BPE) to tokenize words and generate a token embedding. Positional encodings are added to the token embeddings to indicate the position of each token in the sequence. The input embeddings are passed through multiple decoder blocks to output the final hidden states. Within each decoder block, GPT-2 uses a masked self-attention layer, which means GPT-2 cannot attend to future tokens; it is only allowed to attend to tokens on the left. This is different from BERT's `mask` token: in masked self-attention, an attention mask is used to set the attention score to `0` for future tokens.
The output from the decoder is passed to a language modeling head, which performs a linear transformation to convert the final hidden states into logits. The label for each position is the next token in the sequence, so the logits and the labels are shifted by one position relative to each other. The cross-entropy loss is calculated between the shifted logits and the labels to output the next most likely token.
GPT-2's pretraining objective is based entirely on causal language modeling: predicting the next word in a sequence. This makes GPT-2 especially good at tasks that involve generating text.
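The masked self-attention and label shifting described above can be illustrated with plain lists. This is a sketch under assumptions: the raw attention scores and token ids are invented, and future positions are simply left out of the softmax, which gives them a weight of exactly 0 (implementations commonly get the same effect by setting those scores to a very large negative number before the softmax).

```python
import math

def causal_attention_weights(scores):
    # Row i holds the raw attention scores of token i over all tokens.
    # Only positions 0..i take part in the softmax; later positions
    # receive a weight of exactly 0.
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]
        top = max(visible)
        exps = [math.exp(s - top) for s in visible]
        total = sum(exps)
        weights.append([e / total for e in exps] + [0.0] * (len(row) - i - 1))
    return weights

def shift_for_next_token(token_ids):
    # Position t is trained to predict token t + 1.
    return token_ids[:-1], token_ids[1:]

scores = [
    [0.0, 5.0, 5.0],   # large scores on future tokens are ignored
    [1.0, 1.0, 5.0],
    [0.0, 0.0, 0.0],
]
weights = causal_attention_weights(scores)
inputs, labels = shift_for_next_token([10, 11, 12, 13])
```

The first token can only attend to itself, the second splits its attention over the first two positions, and only the last token sees the whole sequence.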
Ready to try text generation? Check out our complete causal language modeling guide to learn how to finetune DistilGPT-2 and use it for inference!
For more information about text generation, check out the text generation strategies guide!
Summarization
Encoder-decoder models such as BART and T5 are designed for the sequence-to-sequence pattern of a summarization task. This section explains how BART works, and at the end you can try finetuning T5.
BART's encoder architecture is very similar to BERT's and accepts a token and positional embedding of the text. BART is pretrained by corrupting the input and then reconstructing it with the decoder. Unlike other encoders with specific corruption strategies, BART can apply any type of corruption, although the text infilling strategy works best. In text infilling, a number of text spans are each replaced with a single `mask` token. This is important because the model has to predict the masked tokens, and it also teaches the model to predict how many tokens are missing. The input embeddings and masked spans are passed through the encoder to output the final hidden states, but unlike BERT, BART does not add a final feedforward network at the end to predict a word.
The encoder's output is passed to the decoder, which must predict the masked tokens and any uncorrupted tokens from the encoder's output. This gives the decoder additional context to help it restore the original text. The output from the decoder is passed to a language modeling head, which performs a linear transformation to convert the hidden states into logits. The cross-entropy loss is calculated between the logits and the labels, which are simply the tokens shifted to the right.
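Text infilling is easy to picture on a short token list. The sketch below replaces each chosen span with one mask token, so the corrupted input no longer reveals how many tokens are missing. The spans are picked by hand here rather than sampled, and the `<mask>`, `<s>`, and `</s>` token names and the `text_infill` helper are assumptions for illustration.

```python
MASK = "<mask>"

def text_infill(tokens, spans):
    # Replace each (start, length) span with a single mask token.
    # Spans are assumed not to overlap.
    corrupted, position = [], 0
    for start, length in sorted(spans):
        corrupted.extend(tokens[position:start])
        corrupted.append(MASK)
        position = start + length
    corrupted.extend(tokens[position:])
    return corrupted

tokens = ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]

# Mask a two-token span ("quick brown") and a one-token span ("over").
corrupted = text_infill(tokens, spans=[(1, 2), (5, 1)])

# The decoder reconstructs the full original text: at each step it
# reads the previous original token and is trained to emit the next.
decoder_inputs = ["<s>"] + tokens
decoder_labels = tokens + ["</s>"]
```

Both masked spans collapse to a single `<mask>` even though one hides two tokens and the other hides one, which is why the model also has to learn how much text is missing.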
Ready to try summarization? Check out our complete summarization guide to learn how to finetune T5 and use it for inference!
For more information about text generation, check out the text generation strategies guide!
Translation
Translation is another example of a sequence-to-sequence task, which means you can use an encoder-decoder model such as BART or T5 for it. This section explains how BART works, and at the end you can try finetuning T5.
BART adapts to translation by adding a separate, randomly initialized encoder that maps the source language to an input that can be decoded into the target language. This new encoder's embeddings are passed to the pretrained encoder in place of the original word embeddings. Training happens in two steps. In the first step, the source encoder, the positional embeddings, and the input embeddings are updated with the cross-entropy loss from the model output, while the remaining model parameters are frozen. In the second step, all of the model parameters are trained together.
BART has since been followed by mBART, a multilingual version intended for translation and pretrained on many different languages.
Ready to try translation? Check out our complete translation guide to learn how to finetune T5 and use it for inference!
For more information about text generation, check out the text generation strategies guide!